WO2021000572A1 - Data processing method, device, and electronic device - Google Patents

Data processing method, device, and electronic device

Info

Publication number
WO2021000572A1
Authority
WO
WIPO (PCT)
Prior art keywords
split
party
data
node
value
Prior art date
Application number
PCT/CN2020/071577
Other languages
English (en)
French (fr)
Inventor
李漓春
张晋升
王华忠
Original Assignee
创新先进技术有限公司
Priority date
Filing date
Publication date
Application filed by 创新先进技术有限公司
Priority to US 16/779,231 (published as US 2020/0167661 A1)
Priority to US 16/945,780 (published as US 2020/0364582 A1)
Publication of WO2021000572A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/243 - Classification techniques relating to the number of classes
    • G06F 18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/602 - Providing cryptographic facilities or services
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2221/00 - Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/21 - Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/2107 - File encryption

Definitions

  • The embodiments of this specification relate to the field of computer technology, and in particular to a data processing method, device, and electronic device.
  • In business practice, one party usually holds a model that needs to be kept confidential (hereinafter referred to as the model party), and another party holds business data that needs to be kept confidential (hereinafter referred to as the data party). How to enable the model party and/or the data party to obtain the prediction result of applying the model to the business data, under the condition that the model party does not leak the model and the data party does not leak the business data, is a technical problem that urgently needs to be solved.
  • The purpose of the embodiments of this specification is to provide a data processing method, device, and electronic device, so that the model party and/or the data party can obtain the prediction result of applying the model to the business data, under the condition that the model party does not leak its own model and the data party does not leak its own business data, or under the condition that the model party does not leak its own model and business data and the data party does not leak its own business data.
  • According to a first aspect, a data processing method is provided, which is applied to a model party and includes: selecting, from a decision forest, a split node associated with business data held by the data party as a target split node, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, each split node corresponding to a true split condition and each leaf node corresponding to a leaf value; generating a false split condition for the target split node; and sending a split condition set corresponding to the target split node to the data party, the split condition set including the false split condition and the true split condition.
  • According to a second aspect, a data processing device is provided, which is set on the model party and includes: a selection unit configured to select, from a decision forest, a split node associated with business data held by the data party as a target split node, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, each split node corresponding to a true split condition and each leaf node corresponding to a leaf value; a generating unit configured to generate a false split condition for the target split node; and a sending unit configured to send a split condition set corresponding to the target split node to the data party, the split condition set including the false split condition and the true split condition.
  • According to a third aspect, an electronic device is provided, including: a memory configured to store computer instructions; and a processor configured to execute the computer instructions to implement the method steps described in the first aspect.
  • According to a fourth aspect, a data processing method is provided, which is applied to a data party, the data party holding business data and a split condition set corresponding to a target split node, the target split node being a split node in a decision forest that is associated with the business data. The method includes: determining the values of the split conditions in the split condition set according to the business data, to obtain a value set; encrypting the values in the value set using a random number, to obtain a value-ciphertext set; executing a secure data selection algorithm in cooperation with the model party by using the value-ciphertext set as input; and executing a multi-party secure computation algorithm in cooperation with the model party by using the random number as input, so that the model party and/or the data party obtains the prediction result of the decision forest.
  • According to a fifth aspect, a data processing device is provided, which is set on a data party, the data party holding business data and a split condition set corresponding to a target split node, the target split node being a split node in a decision forest that is associated with the business data. The device includes: a determining unit configured to determine the values of the split conditions in the split condition set according to the business data, to obtain a value set; an encryption unit configured to encrypt the values in the value set using a random number, to obtain a value-ciphertext set; a first calculation unit configured to execute a secure data selection algorithm in cooperation with the model party by using the value-ciphertext set as input; and a second calculation unit configured to execute a multi-party secure computation algorithm in cooperation with the model party by using the random number as input, so that the model party and/or the data party can obtain the prediction result of the decision forest.
  • According to a sixth aspect, an electronic device is provided, including: a memory configured to store computer instructions; and a processor configured to execute the computer instructions to implement the method steps described in the fourth aspect.
  • According to a seventh aspect, a data processing method is provided, which is applied to a model party, the model party holding a decision forest, the decision forest including a target split node, the target split node being associated with business data held by the data party and corresponding to a split condition set, the split condition set including a true split condition and false split conditions. The method includes: taking the rank of the true split condition in the split condition set as a data selection value, and executing a secure data selection algorithm in cooperation with the data party by using the data selection value as input, to obtain the value ciphertext of the true split condition; and executing a multi-party secure computation algorithm in cooperation with the data party by using the value ciphertext as input, so that the model party and/or the data party obtains the prediction result of the decision forest.
  • According to an eighth aspect, a data processing device is provided, which is set on a model party, the model party holding a decision forest, the decision forest including a target split node, the target split node being associated with business data held by the data party and corresponding to a split condition set, the split condition set including a true split condition and false split conditions. The device includes: a first calculation unit configured to take the rank of the true split condition in the split condition set as the data selection value, and to execute a secure data selection algorithm in cooperation with the data party by using the data selection value as input, to obtain the value ciphertext of the true split condition; and a second calculation unit configured to execute a multi-party secure computation algorithm in cooperation with the data party by using the value ciphertext as input, so that the model party and/or the data party can obtain the prediction result of the decision forest.
  • According to a ninth aspect, an electronic device is provided, including: a memory configured to store computer instructions; and a processor configured to execute the computer instructions to implement the method steps described in the seventh aspect.
  • The data processing method of these embodiments adds false split conditions to the split nodes associated with the business data held by the data party for obfuscation. In this way, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
  • FIG. 1 is a schematic structural diagram of a decision tree according to an embodiment of this specification;
  • FIG. 2 is a flowchart of a data processing method according to an embodiment of this specification;
  • FIG. 3 is a flowchart of a data processing method according to an embodiment of this specification;
  • FIG. 4 is a schematic structural diagram of a decision tree according to an embodiment of this specification;
  • FIG. 5 is a flowchart of a data processing method according to an embodiment of this specification;
  • FIG. 6 is a flowchart of a data processing method according to an embodiment of this specification;
  • FIG. 7 is a schematic diagram of the functional structure of a data processing device according to an embodiment of this specification;
  • FIG. 8 is a schematic diagram of the functional structure of a data processing device according to an embodiment of this specification;
  • FIG. 9 is a schematic diagram of the functional structure of a data processing device according to an embodiment of this specification;
  • FIG. 10 is a schematic diagram of the functional structure of an electronic device according to an embodiment of this specification.
  • Multi-party secure computation (Secure Multi-Party Computation, MPC) is an algorithm that protects data privacy and security. Multiple participants can use multi-party secure computation techniques to perform collaborative computation and obtain a computation result without leaking their own data. Multi-party secure computation can implement any type of mathematical operation, such as the four arithmetic operations (for example addition, subtraction, multiplication, and division) and logical operations (for example AND, OR, and XOR).
  • In practical applications, multi-party secure computation can be implemented in multiple ways. For example, participants P_1, ..., P_n can jointly compute the function f(x_1, ..., x_n) = (y_1, ..., y_n) = y, where n >= 2; x_1, ..., x_n are the data held by participants P_1, ..., P_n respectively; y is the computation result; and y_1, ..., y_n are the shares of the computation result y held by participants P_1, ..., P_n after the computation, with y_1 + y_2 + ... + y_n = y. As another example, the participants can jointly compute the function f(x_1, ..., x_n) = y, and one or more of the participants P_1, ..., P_n can hold the computation result y after the computation.
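  • For illustration only, the following Python sketch shows one common way to realize the share-based outcome described above, namely additive secret sharing modulo 2^32. The modulus, function names, and data layout are assumptions made for this sketch; the embodiments do not prescribe a particular MPC protocol.

```python
# Minimal sketch of additive secret sharing: a result y is split into n shares
# y_1, ..., y_n with y_1 + ... + y_n = y (mod 2**32), matching the share-based
# MPC outcome described above. Illustrative only; not a full MPC protocol.
import secrets

MODULUS = 2 ** 32

def share(value: int, n: int) -> list[int]:
    """Split `value` into n additive shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine additive shares into the original value."""
    return sum(shares) % MODULUS

if __name__ == "__main__":
    y = 1234
    parts = share(y, 3)                 # three participants each hold one share
    assert reconstruct(parts) == y
    print(parts, reconstruct(parts))
```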
  • The secure data selection algorithm is a privacy-preserving data selection algorithm; specifically, it may include algorithms such as oblivious transfer (OT) and private information retrieval (PIR).
  • Oblivious transfer is a privacy-preserving two-party communication protocol that enables the two communicating parties to transfer data in a way that keeps the receiver's choice hidden.
  • The sender can hold multiple pieces of data. Through oblivious transfer, the receiver can obtain one or more of these pieces of data. In this process, the sender does not know which pieces of data the receiver receives, and the receiver cannot obtain any data other than the pieces it receives.
  • Private information retrieval is a privacy-preserving secure retrieval protocol. The server can hold multiple pieces of data. The requester can retrieve one or more pieces of data from the server's multiple pieces of data. The server does not know which pieces of data the requester retrieves, and the requester does not learn any data other than the data it retrieves.
  • A decision tree is a supervised machine learning model; it may be, for example, a binary tree.
  • A decision tree may include multiple nodes. Each node may correspond to position information, which indicates the position of the node in the decision tree, for example the number of the node.
  • The multiple nodes can form multiple prediction paths. The starting node of a prediction path is the root node of the decision tree, and the ending node is a leaf node of the decision tree.
  • the decision tree may include a regression decision tree, a classification decision tree, and the like.
  • the prediction result of the regression decision tree may be a specific value.
  • the prediction result of the classification decision tree may be a specific category.
  • a vector can usually be used to represent the category.
  • the vector [1 0 0] can represent category A
  • the vector [0 1 0] can represent category B
  • the vector [0 0 1] can represent category C.
  • the vector here is only an example, and other mathematical methods can also be used to represent the category in practical applications.
  • Split node: when a node in the decision tree can be split downward, the node can be called a split node.
  • The split nodes may include the root node and nodes other than the leaf nodes and the root node.
  • A split node corresponds to a split condition and a data type; the split condition can be used to select a prediction path, and the data type indicates which type of data the split condition applies to.
  • Leaf node: when a node in the decision tree cannot be split downward, the node can be called a leaf node.
  • the leaf node corresponds to a leaf value.
  • the leaf values corresponding to different leaf nodes can be the same or different.
  • Each leaf value can represent a prediction result.
  • the leaf value can be a numeric value or a vector.
  • the leaf value corresponding to the leaf node of the regression decision tree can be a numerical value
  • the leaf value corresponding to the leaf node of the classification decision tree can be a vector.
  • the decision tree Tree1 may include nodes 1, 2, 3, 4, and 5.
  • the location information of nodes 1, 2, 3, 4, and 5 are 1, 2, 3, 4, and 5, respectively.
  • node 1 is the root node
  • nodes 1 and 2 are split nodes
  • nodes 3, 4, and 5 are leaf nodes.
  • Nodes 1, 2 and 4 can form a predicted path
  • nodes 1, 2 and 5 can form another predicted path
  • nodes 1 and 3 can form another predicted path.
  • The split conditions and data types corresponding to split nodes 1 and 2 can be shown in Table 1 below.
  • leaf values corresponding to leaf nodes 3, 4, and 5 can be shown in Table 2 below.
  • In the decision tree Tree1, the split conditions "age is greater than 20" and "annual income is greater than 50,000" can be used to select a prediction path. When a split condition is met, the prediction path on the left can be selected; when the split condition is not met, the prediction path on the right can be selected.
  • Specifically, for node 1, when the split condition "age is greater than 20" is met, the prediction path on the left can be selected, jumping to node 2; when the split condition "age is greater than 20" is not met, the prediction path on the right can be selected, jumping to node 3.
  • For node 2, when the split condition "annual income is greater than 50,000" is met, the prediction path on the left can be selected, jumping to node 4; when the split condition "annual income is greater than 50,000" is not met, the prediction path on the right can be selected, jumping to node 5.
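  • As a plain, non-private illustration of the prediction-path selection just described, the following Python sketch encodes Tree1 with the split conditions of Table 1 and the leaf values of Table 2. The dictionary encoding and key names are assumptions made for this sketch, not a format defined by the embodiments.

```python
# Plain traversal of decision tree Tree1 (FIG. 1): meeting a split condition
# selects the left prediction path, otherwise the right one.
TREE1 = {
    1: {"type": "split", "condition": lambda d: d["age"] > 20,               "left": 2, "right": 3},
    2: {"type": "split", "condition": lambda d: d["annual_income"] > 50_000, "left": 4, "right": 5},
    3: {"type": "leaf", "value": 200},
    4: {"type": "leaf", "value": 700},
    5: {"type": "leaf", "value": 500},
}

def predict(tree: dict, data: dict) -> int:
    node_id = 1                                  # start at the root node
    while tree[node_id]["type"] == "split":
        node = tree[node_id]
        # Meeting the condition selects the left path, otherwise the right path.
        node_id = node["left"] if node["condition"](data) else node["right"]
    return tree[node_id]["value"]

print(predict(TREE1, {"age": 25, "annual_income": 60_000}))  # path 1 -> 2 -> 4, leaf value 700
print(predict(TREE1, {"age": 18, "annual_income": 80_000}))  # path 1 -> 3, leaf value 200
```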
  • One or more decision trees can constitute a decision forest.
  • the decision forest may include a regression decision forest and a classification decision forest.
  • the regression decision forest may include one or more regression decision trees.
  • When the regression decision forest includes one regression decision tree, the prediction result of that regression decision tree can be used as the prediction result of the regression decision forest.
  • When the regression decision forest includes multiple regression decision trees, the prediction results of the multiple regression decision trees can be summed, and the sum can be used as the prediction result of the regression decision forest.
  • The classification decision forest may include one or more classification decision trees. When the classification decision forest includes one classification decision tree, the prediction result of that classification decision tree can be used as the prediction result of the classification decision forest.
  • When the classification decision forest includes multiple classification decision trees, the prediction results of the multiple classification decision trees can be counted, and the statistical result can be used as the prediction result of the classification decision forest.
  • It is worth noting that, in some scenarios, the prediction result of a classification decision tree can be expressed as a vector, and the vector can be used to represent a category.
  • In that case, the vectors predicted by the multiple classification decision trees in the classification decision forest can be summed, and the sum can be used as the prediction result of the classification decision forest.
  • a certain classification decision forest may include classification decision trees Tree2, Tree3, Tree4.
  • the prediction result of the classification decision tree Tree2 can be expressed as a vector [1 0 0], and the vector [1 0 0] represents category A.
  • the prediction result of the classification decision tree Tree3 can be expressed as a vector [0 1 0], and the vector [0 1 0] represents category B.
  • The prediction result of the classification decision tree Tree4 can be expressed as the vector [1 0 0], which likewise represents category A (the vector [0 0 1] would represent category C).
  • the vectors [1 0 0], [0 1 0] and [1 0 0] can be summed to obtain the vector [2 1 0] as the prediction result of the classification decision forest.
  • the vector [2 1 0] indicates that in the classification decision forest, the number of times that the prediction result is category A is 2, the number of times that the prediction result is category B is 1, and the number of times that the prediction result is category C is 0 times.
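  • The vote-summing step above can be reproduced with the following short Python sketch, which sums the per-tree category vectors [1 0 0], [0 1 0], and [1 0 0] to obtain [2 1 0]; the helper name forest_prediction is an assumption made for this sketch.

```python
# Element-wise sum of the one-hot category vectors predicted by the trees of a
# classification decision forest, as in the Tree2/Tree3/Tree4 example above.
CATEGORIES = ["A", "B", "C"]

def forest_prediction(tree_vectors: list[list[int]]) -> list[int]:
    return [sum(column) for column in zip(*tree_vectors)]

votes = forest_prediction([[1, 0, 0], [0, 1, 0], [1, 0, 0]])
print(votes)                                   # [2, 1, 0]
print(CATEGORIES[votes.index(max(votes))])     # most frequently predicted category: "A"
```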
  • This specification provides an embodiment of a data processing system.
  • the data processing system may include a model party and a data party.
  • the model party and the data party can be devices such as servers, mobile phones, tablets, or personal computers respectively; or, they can also be systems composed of multiple devices, such as a server cluster composed of multiple servers.
  • the model party can hold the decision forest that needs to be kept secret, and the data party can hold the business data that needs to be kept secret. In practical applications, in some cases, the data party holds all business data. In other cases, the model party holds part of the business data, and the data party holds another part of the business data. For example, the model party holds transaction business data, and the data party holds lending business data.
  • the model party and the data party can perform collaborative calculations so that the model party and/or the data party can obtain the prediction results based on the decision forest for all business data.
  • this specification provides an embodiment of the data processing method. This embodiment is applied to the preprocessing stage. This embodiment takes the model party as the execution subject and may include the following steps.
  • Step S10: Select a split node associated with the business data held by the data party from the decision forest as a target split node, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, each split node corresponding to a true split condition and each leaf node corresponding to a leaf value.
  • Each split node in the decision forest may correspond to a split condition. To distinguish them from the false split conditions introduced later, these split conditions are referred to as true split conditions.
  • the association of the split node with the business data held by the data party can be understood as: the data type corresponding to the split node is the same as the data type of the business data held by the data party.
  • the model party can obtain in advance the data type of the business data held by the data party. In this way, the model party can select the split node with the same data type as the data type of the business data held by the data party from the decision forest as the target split node.
  • the number of the target split node may be one or more.
  • the data party holds all business data, and the model party does not hold any business data. All split nodes in the decision forest are associated with the business data held by the data party. In this way, all split nodes in the decision forest are target split nodes.
  • the data party holds a part of the entire business data, and the model party holds another part of the entire business data. Part of the split nodes in the decision forest is associated with the business data held by the data party, and another part of the split nodes is associated with the business data held by the model party. In this way, some split nodes in the decision forest are target split nodes.
  • Step S12 Generate a false split condition for the target split node.
  • the model party may generate at least one false split condition for each target split node.
  • the false split condition may be randomly generated, or may also be generated according to a preset rule.
  • Step S14 Send the split condition set corresponding to the target split node to the data party, where the split condition set includes a false split condition and a real split condition.
  • each target split node may correspond to a false split condition and a real split condition, and the set formed by the false split condition and the real split condition may be used as the split condition set corresponding to the target split node.
  • the model party may send the split condition set corresponding to each target split node to the data party.
  • the data party can receive the split condition set corresponding to the target split node.
  • The split conditions in a split condition set can have a certain order, and the rank at which the true split condition appears is random. Obfuscation by means of the false split conditions means that the data party does not know which split condition in the split condition set is the true split condition, thereby protecting the privacy of the decision forest.
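  • The preprocessing of steps S10 to S14 can be illustrated, for a single target split node, by the following hedged Python sketch. The (feature, threshold) encoding of split conditions and the rule used to generate false conditions are assumptions made for this sketch; the embodiments only require that false conditions be generated randomly or by a preset rule and mixed with the true condition at a random rank.

```python
# Model-party preprocessing for one target split node: generate false split
# conditions, shuffle them together with the true condition, and remember the
# rank of the true condition for later use as the data selection value.
import random

def build_condition_set(true_condition: tuple[str, float], n_false: int = 3):
    """Return (condition_set, rank_of_true_condition) for one target split node."""
    feature, threshold = true_condition
    # Illustrative rule: false conditions on the same feature with perturbed thresholds.
    false_conditions = [(feature, threshold + random.uniform(-10, 10))
                        for _ in range(n_false)]
    condition_set = false_conditions + [true_condition]
    random.shuffle(condition_set)
    return condition_set, condition_set.index(true_condition)

conditions, true_rank = build_condition_set(("age", 20.0))
print(conditions)   # sent to the data party; the true condition is not distinguishable
print(true_rank)    # kept by the model party as the data selection value
```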
  • the model party may retain leaf values corresponding to leaf nodes in the decision forest.
  • all split nodes in the decision forest are associated with business data held by the data party. That is, all split nodes in the decision forest are target split nodes. In other embodiments, a part of the split nodes in the decision forest is associated with the business data held by the data party, and another part of the split nodes is associated with the business data held by the model party. That is, the decision forest includes the target split node and other split nodes except the target split node.
  • the correlation between the split node and the business data held by the model party can be understood as: the data type corresponding to the split node is the same as the data type of the business data held by the model party. In this way, the model party can retain the true split conditions corresponding to the other split nodes.
  • the model party may also send the location information of the split nodes and the location information of the leaf nodes in the decision forest to the data party.
  • the data party can receive the location information of the split nodes and the location information of the leaf nodes in the decision forest; it can reconstruct the topological structure of the decision tree in the decision forest based on the location information of the split nodes and the location information of the leaf nodes in the decision forest.
  • the topological structure of the decision tree may include the connection relationship between split nodes and leaf nodes in the decision tree.
  • In the data processing method of this embodiment, the model party can select the split nodes associated with the business data held by the data party from the decision forest as target split nodes, can generate false split conditions for the target split nodes, and can send the split condition set corresponding to each target split node to the data party, the split condition set including false split conditions and the true split condition.
  • In this way, on the one hand, obfuscation through false split conditions protects the privacy of the decision forest; on the other hand, it becomes convenient to use the decision forest to make predictions on the full set of business data.
  • this specification provides another embodiment of the data processing method. This embodiment is applied to the prediction stage and may include the following steps.
  • Step S20: The data party determines, according to the business data it holds, the values of the split conditions in the split condition set corresponding to each target split node, to obtain a value set; the target split node is a split node in the decision forest that is associated with the business data held by the data party.
  • the data party can obtain the set of split conditions corresponding to the target split node in the decision forest.
  • the target split node is a split node associated with the business data held by the data party in the decision forest, and the set of split conditions may include false split conditions and real split conditions.
  • the data party can determine the value of the split condition in the split condition set corresponding to the target split node according to the business data it holds to obtain the value set.
  • the value set may include at least two values, and the at least two values may include a value of a real split condition and a value of at least one false split condition.
  • the value of the splitting condition can be used to characterize whether the service data meets the splitting condition. If so, the value of the splitting condition can be a first value, and if not, the value of the splitting condition can be a second value. For example, the first value may be 1, and the second value may be zero.
  • In practical applications, for each target split node in the decision forest, the data party can determine the value of each split condition in the split condition set corresponding to that target split node according to the business data it holds, and can use the determined values as the values in the value set corresponding to that target split node.
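  • The following Python sketch illustrates step S20 for a single target split node, reusing the illustrative (feature, threshold) encoding assumed in the preprocessing sketch above: every split condition in the received set, true and false alike, is evaluated on the held business data.

```python
# Data-party side of step S20: compute the value (1 = satisfied, 0 = not satisfied)
# of every split condition in the condition set for one target split node.
def value_set(condition_set: list[tuple[str, float]], business_data: dict) -> list[int]:
    return [1 if business_data[feature] > threshold else 0
            for feature, threshold in condition_set]

conditions = [("age", 27.3), ("age", 20.0), ("age", 14.6), ("age", 31.9)]
print(value_set(conditions, {"age": 25}))   # -> [0, 1, 1, 0]
```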
  • Step S22 The data party encrypts the values in the value set using random numbers to obtain the value ciphertext set.
  • the set of valued ciphertexts includes at least two valued ciphertexts, and the at least two valued ciphertexts may include a valued ciphertext of a true split condition and a value of at least one false split condition Ciphertext.
  • the data party may generate a random number for each target split node.
  • For each target split node in the decision forest, the data party can use the random number of that target split node to encrypt each value in the value set corresponding to the target split node, and can use the encryption results as the value ciphertexts in the value-ciphertext set corresponding to the target split node.
  • The specific encryption method is not limited in this embodiment; for example, a value can be encrypted by XORing it with the random number.
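  • The XOR-based example mentioned above can be sketched as follows. Using a single random bit per target split node is an illustrative assumption that matches 0/1 condition values; the embodiments leave the concrete encryption method open.

```python
# Data-party side of step S22: draw one random number (here a random bit) per
# target split node and XOR-mask every value in that node's value set with it.
import secrets

def encrypt_value_set(values: list[int]) -> tuple[list[int], int]:
    r = secrets.randbits(1)                  # random mask for this split node
    ciphertexts = [v ^ r for v in values]    # XOR each 0/1 value with the mask
    return ciphertexts, r                    # r stays with the data party

ciphertexts, r = encrypt_value_set([0, 1, 1, 0])
print(ciphertexts, r)
```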
  • Step S24: For each target split node in the decision forest, the model party uses the data selection value corresponding to the target split node as input, the data party uses the value-ciphertext set corresponding to the target split node as input, and the two cooperate to execute a secure data selection algorithm. The model party selects the value ciphertext of the true split condition from the value-ciphertext set input by the data party.
  • the data selection value is used as the input of the model party in the process of executing the secure data selection algorithm, and can be used to select the value ciphertext from the value ciphertext set input by the data party during the process of executing the secure data selection algorithm.
  • the model party may specifically use the position of the true split condition in the split condition set corresponding to the target split node as the data selection value corresponding to the target split node.
  • a set of split conditions includes 4 split conditions, Condition1, Condition2, Condition3, Condition4. Among them, Condition1, Condition2, and Condition4 are false split conditions, and Condition3 is a real split condition.
  • the order of the split conditions in the set of split conditions is Condition1, Condition2, Condition3, and Condition4. Then, the rank of the true split condition Condition3 is 3.
  • For each target split node in the decision forest, the model party may use the data selection value corresponding to the target split node as input, the data party may use the value-ciphertext set corresponding to the target split node as input, and the two cooperate to execute the secure data selection algorithm, so that the model party can select the value ciphertext of the true split condition from the value-ciphertext set.
  • According to the properties of the secure data selection algorithm, the data party does not know which value ciphertext the model party selected, and the model party cannot learn any value ciphertext other than the one it selected.
  • The secure data selection algorithm may include an oblivious transfer algorithm, a private information retrieval algorithm, and the like.
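  • The following Python stand-in only mimics the interface of step S24 in plaintext, so that the role of the data selection value (the rank of the true split condition) is easy to see. It is not an oblivious transfer or private information retrieval implementation and provides none of their privacy guarantees.

```python
# Plaintext stand-in for the secure data selection step: the model party's rank
# picks one element of the data party's value-ciphertext set. In a real OT/PIR
# protocol the data party would not learn the rank, and the model party would
# not learn any element other than the selected one.
def mock_secure_select(value_ciphertext_set: list[int], data_selection_value: int) -> int:
    return value_ciphertext_set[data_selection_value]

# Continuing the Condition1..Condition4 example: the true condition has rank 3,
# i.e. index 2 when counting from 0.
print(mock_secure_select([1, 0, 0, 1], 2))
```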
  • Step S26 The model party uses the valued ciphertext of the true split condition as input, and the data party uses random numbers as input, and the two parties cooperate to execute a multi-party security calculation algorithm.
  • the model party and/or data party obtain the prediction results of the decision forest.
  • After step S24, the model party has obtained the value ciphertext of the true split condition corresponding to each target split node.
  • For each decision tree in the decision forest, the model party can use the value ciphertexts of the true split conditions corresponding to the target split nodes in that decision tree, together with the leaf values corresponding to the leaf nodes, as input; the data party can use the random numbers corresponding to the target split nodes in that decision tree as input; and the two cooperate to execute the multi-party secure computation algorithm.
  • the model party and/or the data party can obtain the prediction result of the decision tree.
  • the model party and/or the data party may determine the prediction result of the decision forest based on the prediction result of the decision tree in the decision forest. As for the specific determination method, please refer to the previous description, which will not be repeated here.
  • all split nodes in the decision forest are associated with business data held by the data party. That is, all split nodes in the decision forest are target split nodes. In other embodiments, a part of the split nodes in the decision forest is associated with the business data held by the data party, and another part of the split nodes is associated with the business data held by the model party. That is, the decision forest includes the target split node and other split nodes except the target split node. In this way, the model party can determine the value of the true split condition corresponding to the other split nodes according to the business data it holds.
  • For each decision tree in the decision forest, the model party can use the value ciphertexts of the true split conditions corresponding to the target split nodes in that decision tree, the values of the true split conditions corresponding to the other split nodes, and the leaf values corresponding to the leaf nodes as input; the data party can take the random numbers corresponding to the target split nodes in that decision tree as input; and the two cooperate to execute the multi-party secure computation algorithm.
  • the model party and/or the data party can obtain the prediction result of the decision tree.
  • Depending on the type of multi-party secure computation algorithm used, the model party and/or the data party may obtain the prediction result of the decision tree in different ways. For example, by executing the multi-party secure computation, the model party and the data party can each obtain one share of the prediction result of the decision tree. For ease of distinction, the share obtained by the model party can be called the first share, and the share obtained by the data party can be called the second share.
  • the model party can send the first share to the data party.
  • the data party can receive the first share; the first share and the second share can be added to obtain the prediction result of the decision tree. Alternatively, the data party may send the second share to the model party.
  • the model party can receive the second share; the first share and the second share can be added to obtain the prediction result of the decision tree.
  • the model party may send the first share to the data party, and the data party may receive the first share; and the data party may send the second share to the model party, and the model party may receive the second share.
  • both the model party and the data party can obtain the prediction results of the decision tree.
  • As another example, by executing the multi-party secure computation, the model party and/or the data party can directly obtain the prediction result of the decision tree.
  • the decision tree Tree2 may include nodes C1, C2, C3, C4, C5, O6, O7, O8, O9, O10, and O11.
  • nodes C1, C2, C3, C4, and C5 are split nodes
  • nodes O6, O7, O8, O9, O10, and O11 are leaf nodes.
  • the branch on the left of the split node is a branch with a value of 0, which specifically represents a branch that does not meet the splitting condition
  • the branch on the right of the split node is a branch with a value of 1, which specifically represents a branch that meets the splitting condition .
  • the model party holds the decision tree Tree2.
  • the data party holds all business data.
  • the split nodes C1, C2, C3, C4, and C5 in the decision tree Tree2 are all associated with the business data held by the data party.
  • the prediction result of the decision tree Tree2 can be expressed as the following formula.
  • v_Tree2 = ((v_o8 x (1 - v_c4) + v_o9 x v_c4) x (1 - v_c2) + (v_o10 x (1 - v_c5) + v_o11 x v_c5) x v_c2) x (1 - v_c1) + (v_o6 x (1 - v_c3) + v_o7 x v_c3) x v_c1    (1)
  • In formula (1), v_Tree2 represents the prediction result of the decision tree Tree2; v_o6 represents the leaf value of leaf node O6, and so on, with v_o11 representing the leaf value of leaf node O11; v_c1 represents the value ciphertext of the true split condition corresponding to split node C1, and so on, with v_c5 representing the value ciphertext of the true split condition corresponding to split node C5.
  • The model party can use v_c1, ..., v_c5 and v_o6, ..., v_o11 as input, the data party can use the random numbers of split nodes C1, C2, C3, C4, and C5 as input, and the two cooperate to execute the multi-party secure computation algorithm.
  • After the multi-party secure computation algorithm is executed, the model party can obtain one share v1_Tree2 of v_Tree2, and the data party can obtain the other share v2_Tree2 of v_Tree2.
  • The model party can send v1_Tree2 to the data party.
  • The data party can receive v1_Tree2 and can add v1_Tree2 and v2_Tree2 to obtain v_Tree2.
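  • The following Python sketch evaluates formula (1) in the clear on 0/1 split-condition values and arbitrary illustrative leaf values, showing that the polynomial selects exactly the leaf value reached by ordinary path traversal. In the actual protocol, v_c1, ..., v_c5 are available only as ciphertexts or shares and the formula is evaluated under the multi-party secure computation algorithm.

```python
# Plaintext sanity check of formula (1) for decision tree Tree2 (FIG. 4).
def tree2_formula(v: dict, leaf: dict) -> int:
    """v: 0/1 values for split nodes C1..C5; leaf: leaf values for O6..O11."""
    return (((leaf["O8"] * (1 - v["C4"]) + leaf["O9"] * v["C4"]) * (1 - v["C2"])
             + (leaf["O10"] * (1 - v["C5"]) + leaf["O11"] * v["C5"]) * v["C2"]) * (1 - v["C1"])
            + (leaf["O6"] * (1 - v["C3"]) + leaf["O7"] * v["C3"]) * v["C1"])

leaves = {"O6": 10, "O7": 20, "O8": 30, "O9": 40, "O10": 50, "O11": 60}   # illustrative values
# C1 not met (value 0, left branch), C2 met (right branch), C5 met -> leaf O11.
print(tree2_formula({"C1": 0, "C2": 1, "C3": 0, "C4": 0, "C5": 1}, leaves))   # 60
```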
  • With the data processing method of this embodiment, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
  • this specification provides another embodiment of the data processing method.
  • This embodiment takes the data party as the execution subject and may include the following steps.
  • Step S30 Determine the value of the split condition in the split condition set according to the held business data to obtain the value set.
  • Step S32 Use random numbers to encrypt the values in the value set to obtain the value ciphertext set.
  • Step S34 Take the valued ciphertext set as an input and execute a secure data selection algorithm in cooperation with the model party.
  • Step S36 Cooperate with the model party to execute a multi-party security calculation algorithm using random numbers as input, so that the model party and/or the data party can obtain the prediction result of the decision forest.
  • For the specific processes of step S30, step S32, step S34, and step S36, refer to the embodiment corresponding to FIG. 2; details are not repeated here.
  • With the data processing method of this embodiment, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
  • this specification provides another embodiment of the data processing method.
  • This embodiment takes the model party as the execution subject and may include the following steps.
  • Step S40: Take the rank of the true split condition in the split condition set as the data selection value, and use the data selection value as input to execute the secure data selection algorithm in cooperation with the data party, to obtain the value ciphertext of the true split condition.
  • Step S42: Use the value ciphertext as input to execute the multi-party secure computation algorithm in cooperation with the data party, so that the model party and/or the data party can obtain the prediction result of the decision forest.
  • For the specific processes of step S40 and step S42, refer to the embodiment corresponding to FIG. 2; details are not repeated here.
  • With the data processing method of this embodiment, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
  • This specification also provides an embodiment of a data processing device. This embodiment can be set on the model side.
  • the device may include the following units.
  • the selecting unit 50 is configured to select a split node associated with the business data held by the data party as a target split node from a decision forest, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, the split node corresponding to a true split condition, and the leaf node corresponding to a leaf value;
  • the generating unit 52 is configured to generate a false split condition for the target split node
  • the sending unit 54 is configured to send a split condition set corresponding to the target split node to the data party, where the split condition set includes a false split condition and a real split condition.
  • This specification also provides an embodiment of a data processing device.
  • This embodiment may be set on a data party, the data party holds business data and a split condition set corresponding to a target split node, and the target split node is a split node associated with the business data in the decision forest.
  • the device may include the following units.
  • the determining unit 60 is configured to determine the value of the split condition in the set of split conditions according to the service data to obtain a value set;
  • the encryption unit 62 is configured to encrypt the values in the value set using random numbers to obtain the value ciphertext set;
  • the first calculation unit 64 is configured to use the valued ciphertext set as an input to execute a secure data selection algorithm in cooperation with the model party;
  • the second calculation unit 66 is configured to use a random number as an input to cooperate with the model party to execute a multi-party security calculation algorithm, so that the model party and/or the data party can obtain the prediction result of the decision forest.
  • This specification also provides an embodiment of a data processing device.
  • This embodiment can be set on the model side, the model side holds a decision forest, the decision forest includes a target split node, the target split node is associated with the business data held by the data side and corresponds to a set of split conditions,
  • the set of splitting conditions includes real splitting conditions and false splitting conditions.
  • the device may include the following units.
  • the first calculation unit 70 is configured to take the rank of the true split condition in the split condition set as the data selection value, and to use the data selection value as input to execute the secure data selection algorithm in cooperation with the data party, to obtain the value ciphertext of the true split condition;
  • the second calculation unit 72 is configured to use the value ciphertext as input to execute the multi-party secure computation algorithm in cooperation with the data party, so that the model party and/or the data party can obtain the prediction result of the decision forest.
  • FIG. 10 is a schematic diagram of the hardware structure of an electronic device in this embodiment.
  • the electronic device may include one or more (only one is shown in the figure) processor, memory, and transmission module.
  • the hardware structure shown in FIG. 10 is only for illustration, and it does not limit the hardware structure of the above electronic device.
  • the electronic device may also include more or fewer component units than shown in FIG. 10; or, have a configuration different from that shown in FIG. 10.
  • the memory may include a high-speed random access memory; or, it may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the storage may also include a remotely set network storage.
  • the remotely set network storage can be connected to the electronic device through a network such as the Internet, an intranet, a local area network, a mobile communication network, and the like.
  • the memory may be used to store program instructions or modules of application software, such as the program instructions or modules of the embodiment corresponding to FIG. 2 of this specification, the program instructions or modules of the embodiment corresponding to FIG. 5 of this specification, and the embodiment corresponding to FIG. 6 Program instructions or modules.
  • the processor can be implemented in any suitable way.
  • the processor may take the form of, for example, a microprocessor or a processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on.
  • the processor can read and execute program instructions or modules in the memory.
  • the transmission module can be used for data transmission via a network, for example, data transmission via a network such as the Internet, an intranet, a local area network, a mobile communication network, and the like.
  • A programmable logic device (PLD), for example a field programmable gate array (FPGA), can be programmed using hardware description languages (HDL), such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, and Verilog.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or Any combination of these devices.
  • This specification can be used in many general-purpose or special-purpose computer system environments or configurations.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This specification can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network.
  • program modules can be located in local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of this specification provide a data processing method, a data processing device, and an electronic device. The method includes: determining, according to the business data, the values of the split conditions in a split condition set to obtain a value set; encrypting the values in the value set using a random number to obtain a value-ciphertext set; executing a secure data selection algorithm in cooperation with the model party by using the value-ciphertext set as input; and executing a multi-party secure computation algorithm in cooperation with the model party by using the random number as input, so that the model party and/or the data party obtains the prediction result of the decision forest.

Description

Data processing method, device, and electronic device
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to a data processing method, device, and electronic device.
Background
In business practice, one party usually holds a model that needs to be kept confidential (hereinafter referred to as the model party), and another party holds business data that needs to be kept confidential (hereinafter referred to as the data party). How to enable the model party and/or the data party to obtain the prediction result of applying the model to the business data, under the condition that the model party does not leak the model and the data party does not leak the business data, is a technical problem that urgently needs to be solved.
Summary
The purpose of the embodiments of this specification is to provide a data processing method, device, and electronic device, so that the model party and/or the data party can obtain the prediction result of applying the model to the business data, under the condition that the model party does not leak its own model and the data party does not leak its own business data, or under the condition that the model party does not leak its own model and business data and the data party does not leak its own business data.
To achieve the above purpose, the technical solutions provided by one or more embodiments of this specification are as follows.
According to a first aspect of one or more embodiments of this specification, a data processing method is provided, which is applied to a model party and includes: selecting, from a decision forest, a split node associated with business data held by the data party as a target split node, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, each split node corresponding to a true split condition and each leaf node corresponding to a leaf value; generating a false split condition for the target split node; and sending a split condition set corresponding to the target split node to the data party, the split condition set including the false split condition and the true split condition.
According to a second aspect of one or more embodiments of this specification, a data processing device is provided, which is set on the model party and includes: a selection unit configured to select, from a decision forest, a split node associated with business data held by the data party as a target split node, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, each split node corresponding to a true split condition and each leaf node corresponding to a leaf value; a generating unit configured to generate a false split condition for the target split node; and a sending unit configured to send a split condition set corresponding to the target split node to the data party, the split condition set including the false split condition and the true split condition.
According to a third aspect of one or more embodiments of this specification, an electronic device is provided, including: a memory configured to store computer instructions; and a processor configured to execute the computer instructions to implement the method steps according to the first aspect.
According to a fourth aspect of one or more embodiments of this specification, a data processing method is provided, which is applied to a data party, the data party holding business data and a split condition set corresponding to a target split node, the target split node being a split node in a decision forest that is associated with the business data. The method includes: determining the values of the split conditions in the split condition set according to the business data, to obtain a value set; encrypting the values in the value set using a random number, to obtain a value-ciphertext set; executing a secure data selection algorithm in cooperation with the model party by using the value-ciphertext set as input; and executing a multi-party secure computation algorithm in cooperation with the model party by using the random number as input, so that the model party and/or the data party obtains the prediction result of the decision forest.
According to a fifth aspect of one or more embodiments of this specification, a data processing device is provided, which is set on a data party, the data party holding business data and a split condition set corresponding to a target split node, the target split node being a split node in a decision forest that is associated with the business data. The device includes: a determining unit configured to determine the values of the split conditions in the split condition set according to the business data, to obtain a value set; an encryption unit configured to encrypt the values in the value set using a random number, to obtain a value-ciphertext set; a first calculation unit configured to execute a secure data selection algorithm in cooperation with the model party by using the value-ciphertext set as input; and a second calculation unit configured to execute a multi-party secure computation algorithm in cooperation with the model party by using the random number as input, so that the model party and/or the data party can obtain the prediction result of the decision forest.
According to a sixth aspect of one or more embodiments of this specification, an electronic device is provided, including: a memory configured to store computer instructions; and a processor configured to execute the computer instructions to implement the method steps according to the fourth aspect.
According to a seventh aspect of one or more embodiments of this specification, a data processing method is provided, which is applied to a model party, the model party holding a decision forest, the decision forest including a target split node, the target split node being associated with business data held by the data party and corresponding to a split condition set, the split condition set including a true split condition and false split conditions. The method includes: taking the rank of the true split condition in the split condition set as a data selection value, and executing a secure data selection algorithm in cooperation with the data party by using the data selection value as input, to obtain the value ciphertext of the true split condition; and executing a multi-party secure computation algorithm in cooperation with the data party by using the value ciphertext as input, so that the model party and/or the data party obtains the prediction result of the decision forest.
According to an eighth aspect of one or more embodiments of this specification, a data processing device is provided, which is set on a model party, the model party holding a decision forest, the decision forest including a target split node, the target split node being associated with business data held by the data party and corresponding to a split condition set, the split condition set including a true split condition and false split conditions. The device includes: a first calculation unit configured to take the rank of the true split condition in the split condition set as the data selection value, and to execute a secure data selection algorithm in cooperation with the data party by using the data selection value as input, to obtain the value ciphertext of the true split condition; and a second calculation unit configured to execute a multi-party secure computation algorithm in cooperation with the data party by using the value ciphertext as input, so that the model party and/or the data party can obtain the prediction result of the decision forest.
According to a ninth aspect of one or more embodiments of this specification, an electronic device is provided, including: a memory configured to store computer instructions; and a processor configured to execute the computer instructions to implement the method steps according to the seventh aspect.
As can be seen from the technical solutions provided by the above embodiments of this specification, the data processing method of these embodiments adds false split conditions to the split nodes associated with the business data held by the data party for obfuscation. In this way, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in this specification; a person of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic structural diagram of a decision tree according to an embodiment of this specification;
FIG. 2 is a flowchart of a data processing method according to an embodiment of this specification;
FIG. 3 is a flowchart of a data processing method according to an embodiment of this specification;
FIG. 4 is a schematic structural diagram of a decision tree according to an embodiment of this specification;
FIG. 5 is a flowchart of a data processing method according to an embodiment of this specification;
FIG. 6 is a flowchart of a data processing method according to an embodiment of this specification;
FIG. 7 is a schematic diagram of the functional structure of a data processing device according to an embodiment of this specification;
FIG. 8 is a schematic diagram of the functional structure of a data processing device according to an embodiment of this specification;
FIG. 9 is a schematic diagram of the functional structure of a data processing device according to an embodiment of this specification;
FIG. 10 is a schematic diagram of the functional structure of an electronic device according to an embodiment of this specification.
Detailed Description
The technical solutions in the embodiments of this specification will be described clearly and completely below with reference to the drawings in the embodiments of this specification. Obviously, the described embodiments are only some rather than all of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the protection scope of this specification.
Multi-party secure computation (Secure Multi-Party Computation, MPC) is an algorithm that protects data privacy and security. Multiple participants can use multi-party secure computation techniques to perform collaborative computation and obtain a computation result without leaking their own data. Multi-party secure computation can implement any type of mathematical operation, such as the four arithmetic operations (for example addition, subtraction, multiplication, and division) and logical operations (for example AND, OR, and XOR).
In practical applications, multi-party secure computation can be implemented in multiple ways. For example, using multi-party secure computation, participants P_1, ..., P_n can jointly compute the function f(x_1, ..., x_n) = (y_1, ..., y_n) = y, where n >= 2; x_1, ..., x_n are the data held by participants P_1, ..., P_n respectively; y is the computation result; and y_1, ..., y_n are the shares of the computation result y held by participants P_1, ..., P_n after the computation, with y_1 + y_2 + ... + y_n = y. As another example, using multi-party secure computation, participants P_1, ..., P_n can jointly compute the function f(x_1, ..., x_n) = y, and one or more of the participants P_1, ..., P_n can hold the computation result y after the computation.
The secure data selection algorithm is a privacy-preserving data selection algorithm; specifically, it may include algorithms such as oblivious transfer (OT) and private information retrieval (PIR).
Oblivious transfer is a privacy-preserving two-party communication protocol that enables the two communicating parties to transfer data in a way that keeps the receiver's choice hidden. The sender can hold multiple pieces of data. Through oblivious transfer, the receiver can obtain one or more of these pieces of data. In this process, the sender does not know which pieces of data the receiver receives, and the receiver cannot obtain any data other than the pieces it receives.
Private information retrieval is a privacy-preserving secure retrieval protocol. The server can hold multiple pieces of data. The requester can retrieve one or more pieces of data from the server's multiple pieces of data. The server does not know which pieces of data the requester retrieves, and the requester does not learn any data other than the data it retrieves.
Decision tree: a supervised machine learning model. The decision tree may be, for example, a binary tree. The decision tree may include multiple nodes. Each node may correspond to position information, which indicates the position of the node in the decision tree, for example the number of the node. The multiple nodes can form multiple prediction paths. The starting node of a prediction path is the root node of the decision tree, and the ending node is a leaf node of the decision tree.
The decision tree may include a regression decision tree, a classification decision tree, and the like. The prediction result of a regression decision tree may be a specific numerical value. The prediction result of a classification decision tree may be a specific category. It is worth noting that, to facilitate analysis and computation, a vector can usually be used to represent a category. For example, the vector [1 0 0] may represent category A, the vector [0 1 0] may represent category B, and the vector [0 0 1] may represent category C. Of course, the vectors here are only examples; in practical applications, other mathematical methods can also be used to represent categories.
Split node: when a node in the decision tree can be split downward, the node can be called a split node. The split nodes may include the root node and nodes other than the leaf nodes and the root node. A split node corresponds to a split condition and a data type; the split condition can be used to select a prediction path, and the data type indicates which type of data the split condition applies to.
Leaf node: when a node in the decision tree cannot be split downward, the node can be called a leaf node. A leaf node corresponds to a leaf value. The leaf values corresponding to different leaf nodes may be the same or different. Each leaf value can represent a prediction result. The leaf value may be a numerical value, a vector, or the like. For example, the leaf value corresponding to a leaf node of a regression decision tree may be a numerical value, and the leaf value corresponding to a leaf node of a classification decision tree may be a vector.
To better understand the above terms, an example scenario is described below.
Referring to FIG. 1, in this example scenario the decision tree Tree1 may include nodes 1, 2, 3, 4, and 5. The position information of nodes 1, 2, 3, 4, and 5 is 1, 2, 3, 4, and 5, respectively. Node 1 is the root node; nodes 1 and 2 are split nodes; and nodes 3, 4, and 5 are leaf nodes. Nodes 1, 2, and 4 can form one prediction path, nodes 1, 2, and 5 can form another prediction path, and nodes 1 and 3 can form another prediction path.
The split conditions and data types corresponding to split nodes 1 and 2 can be as shown in Table 1 below.
Table 1
Split node | Split condition | Data type
1 | Age is greater than 20 | Age
2 | Annual income is greater than 50,000 | Income
The leaf values corresponding to leaf nodes 3, 4, and 5 can be as shown in Table 2 below.
Table 2
Leaf node | Leaf value
3 | 200
4 | 700
5 | 500
In the decision tree Tree1, the split conditions "age is greater than 20" and "annual income is greater than 50,000" can be used to select a prediction path. When a split condition is met, the prediction path on the left can be selected; when the split condition is not met, the prediction path on the right can be selected. Specifically, for node 1, when the split condition "age is greater than 20" is met, the prediction path on the left can be selected, jumping to node 2; when the split condition "age is greater than 20" is not met, the prediction path on the right can be selected, jumping to node 3. For node 2, when the split condition "annual income is greater than 50,000" is met, the prediction path on the left can be selected, jumping to node 4; when the split condition "annual income is greater than 50,000" is not met, the prediction path on the right can be selected, jumping to node 5.
One or more decision trees can constitute a decision forest. The decision forest may include a regression decision forest and a classification decision forest. The regression decision forest may include one or more regression decision trees. When the regression decision forest includes one regression decision tree, the prediction result of that regression decision tree can be used as the prediction result of the regression decision forest. When the regression decision forest includes multiple regression decision trees, the prediction results of the multiple regression decision trees can be summed, and the sum can be used as the prediction result of the regression decision forest. The classification decision forest may include one or more classification decision trees. When the classification decision forest includes one classification decision tree, the prediction result of that classification decision tree can be used as the prediction result of the classification decision forest. When the classification decision forest includes multiple classification decision trees, the prediction results of the multiple classification decision trees can be counted, and the statistical result can be used as the prediction result of the classification decision forest. It is worth noting that, in some scenarios, the prediction result of a classification decision tree can be expressed as a vector, and the vector can be used to represent a category. In that case, the vectors predicted by the multiple classification decision trees in the classification decision forest can be summed, and the sum can be used as the prediction result of the classification decision forest. For example, a classification decision forest may include classification decision trees Tree2, Tree3, and Tree4. The prediction result of the classification decision tree Tree2 can be expressed as the vector [1 0 0], which represents category A. The prediction result of the classification decision tree Tree3 can be expressed as the vector [0 1 0], which represents category B. The prediction result of the classification decision tree Tree4 can be expressed as the vector [1 0 0], which likewise represents category A (the vector [0 0 1] would represent category C). Then the vectors [1 0 0], [0 1 0], and [1 0 0] can be summed to obtain the vector [2 1 0] as the prediction result of the classification decision forest. The vector [2 1 0] indicates that, in the classification decision forest, the prediction result was category A two times, category B one time, and category C zero times.
This specification provides an embodiment of a data processing system.
The data processing system may include a model party and a data party. The model party and the data party may each be a device such as a server, a mobile phone, a tablet computer, or a personal computer; alternatively, each may be a system composed of multiple devices, for example a server cluster composed of multiple servers. The model party can hold a decision forest that needs to be kept confidential, and the data party can hold business data that needs to be kept confidential. In practical applications, in some cases the data party holds all of the business data; in other cases, the model party holds one part of the business data and the data party holds another part of the business data. For example, the model party may hold transaction business data and the data party may hold lending business data. The model party and the data party can perform collaborative computation so that the model party and/or the data party obtains the prediction result of applying the decision forest to the full set of business data.
Referring to FIG. 2, based on the foregoing data processing system embodiment, this specification provides an embodiment of a data processing method. This embodiment is applied in a preprocessing stage. This embodiment takes the model party as the executing entity and may include the following steps.
Step S10: Select, from the decision forest, a split node associated with the business data held by the data party as a target split node, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, each split node corresponding to a true split condition and each leaf node corresponding to a leaf value.
In some embodiments, each split node in the decision forest may correspond to a split condition. To distinguish them from the false split conditions introduced later, these split conditions are referred to as true split conditions.
In some embodiments, a split node being associated with the business data held by the data party can be understood as follows: the data type corresponding to the split node is the same as the data type of the business data held by the data party. The model party can obtain in advance the data type of the business data held by the data party. In this way, the model party can select, from the decision forest, the split nodes whose corresponding data type is the same as the data type of the business data held by the data party as the target split nodes.
In some embodiments, the number of target split nodes may be one or more. Specifically, in some implementations, the data party holds all of the business data and the model party does not hold any business data; all split nodes in the decision forest are associated with the business data held by the data party, so all split nodes in the decision forest are target split nodes. In other implementations, the data party holds one part of the business data and the model party holds another part of the business data; some split nodes in the decision forest are associated with the business data held by the data party, and the other split nodes are associated with the business data held by the model party, so some of the split nodes in the decision forest are target split nodes.
Step S12: Generate a false split condition for the target split node.
In some embodiments, the model party may generate at least one false split condition for each target split node. The false split condition may be generated randomly, or may be generated according to a preset rule.
Step S14: Send the split condition set corresponding to the target split node to the data party, the split condition set including the false split condition and the true split condition.
In some embodiments, after step S12, each target split node may correspond to a false split condition and a true split condition, and the set formed by the false split condition and the true split condition may be used as the split condition set corresponding to that target split node. The model party may send the split condition set corresponding to each target split node to the data party. The data party may receive the split condition sets corresponding to the target split nodes. The split conditions in a split condition set may have a certain order, and the rank at which the true split condition appears is random. Obfuscation by means of the false split conditions means that the data party does not know which split condition in the split condition set is the true split condition, thereby protecting the privacy of the decision forest.
In some embodiments, the model party may retain the leaf values corresponding to the leaf nodes in the decision forest.
In some implementations, all split nodes in the decision forest are associated with the business data held by the data party; that is, all split nodes in the decision forest are target split nodes. In other implementations, some of the split nodes in the decision forest are associated with the business data held by the data party and the other split nodes are associated with the business data held by the model party; that is, the decision forest includes the target split nodes and other split nodes besides the target split nodes. A split node being associated with the business data held by the model party can be understood as follows: the data type corresponding to the split node is the same as the data type of the business data held by the model party. In this case, the model party can retain the true split conditions corresponding to the other split nodes.
In some embodiments, the model party may also send the position information of the split nodes and the position information of the leaf nodes in the decision forest to the data party. The data party may receive the position information of the split nodes and the position information of the leaf nodes in the decision forest, and may reconstruct the topological structure of the decision trees in the decision forest based on this position information. The topological structure of a decision tree may include the connection relationships between the split nodes and the leaf nodes in the decision tree.
In the data processing method of this embodiment, the model party can select the split nodes associated with the business data held by the data party from the decision forest as target split nodes, can generate false split conditions for the target split nodes, and can send the split condition set corresponding to each target split node to the data party, the split condition set including false split conditions and the true split condition. In this way, on the one hand, obfuscation through false split conditions protects the privacy of the decision forest; on the other hand, it becomes convenient to use the decision forest to make predictions on the full set of business data.
Referring to FIG. 3, based on the foregoing data processing system embodiment, this specification provides another embodiment of the data processing method. This embodiment is applied in a prediction stage and may include the following steps.
Step S20: The data party determines, according to the business data it holds, the values of the split conditions in the split condition set corresponding to each target split node, to obtain a value set; the target split node is a split node in the decision forest that is associated with the business data held by the data party.
In some embodiments, the data party can obtain the split condition set corresponding to each target split node in the decision forest. The target split node is a split node in the decision forest that is associated with the business data held by the data party, and the split condition set may include false split conditions and the true split condition. The data party can determine, according to the business data it holds, the values of the split conditions in the split condition set corresponding to the target split node, to obtain a value set. The value set may include at least two values, and the at least two values may include the value of the true split condition and the value of at least one false split condition.
The value of a split condition can be used to characterize whether the business data satisfies the split condition; if so, the value of the split condition may be a first numerical value, and if not, the value of the split condition may be a second numerical value. For example, the first numerical value may be 1 and the second numerical value may be 0. In practical applications, for each target split node in the decision forest, the data party can determine the value of each split condition in the split condition set corresponding to that target split node according to the business data it holds, and can use the determined values as the values in the value set corresponding to that target split node.
Step S22: The data party encrypts the values in the value set using random numbers, to obtain a value-ciphertext set.
In some embodiments, the value-ciphertext set includes at least two value ciphertexts, and the at least two value ciphertexts may include the value ciphertext of the true split condition and the value ciphertext of at least one false split condition.
In some embodiments, the data party may generate a random number for each target split node. For each target split node in the decision forest, the data party can use the random number of that target split node to encrypt each value in the value set corresponding to the target split node, and can use the encryption results as the value ciphertexts in the value-ciphertext set corresponding to the target split node. The specific encryption method is not limited in this embodiment; for example, a value can be encrypted by XORing it with the random number.
Step S24: For each target split node in the decision forest, the model party uses the data selection value corresponding to the target split node as input, the data party uses the value-ciphertext set corresponding to the target split node as input, and the two cooperate to execute a secure data selection algorithm. The model party selects the value ciphertext of the true split condition from the value-ciphertext set input by the data party.
In some embodiments, the data selection value is the model party's input in the process of executing the secure data selection algorithm and can be used to select a value ciphertext from the value-ciphertext set input by the data party during the execution of the secure data selection algorithm. Specifically, the model party may use the rank of the true split condition in the split condition set corresponding to a target split node as the data selection value corresponding to that target split node. For example, a split condition set includes four split conditions: Condition1, Condition2, Condition3, and Condition4. Among them, Condition1, Condition2, and Condition4 are false split conditions, and Condition3 is the true split condition. The order of the split conditions in the split condition set is Condition1, Condition2, Condition3, Condition4. Then the rank of the true split condition Condition3 is 3.
In some embodiments, for each target split node in the decision forest, the model party may use the data selection value corresponding to the target split node as input, the data party may use the value-ciphertext set corresponding to the target split node as input, and the two cooperate to execute the secure data selection algorithm, so that the model party can select the value ciphertext of the true split condition from the value-ciphertext set. According to the properties of the secure data selection algorithm, the data party does not know which value ciphertext the model party selected, and the model party cannot learn any value ciphertext other than the one it selected. The secure data selection algorithm may include an oblivious transfer algorithm, a private information retrieval algorithm, and the like.
Step S26: the model party takes the ciphertexts of the real split conditions' values as its input, the data party takes the random numbers as its input, and the two parties cooperatively execute a multi-party secure computation algorithm. The model party and/or the data party obtains the prediction result of the decision forest.
In some embodiments, after step S24 the model party has obtained the ciphertext of the real split condition's value for each target split node. For each decision tree in the decision forest, the model party can take as its input the value ciphertexts of the real split conditions of the target split nodes in that tree together with the leaf values of the leaf nodes, and the data party can take as its input the random numbers of the target split nodes in that tree; the two parties cooperatively execute a multi-party secure computation algorithm, and the model party and/or the data party can obtain the prediction result of that decision tree. The model party and/or the data party can then determine the prediction result of the decision forest from the prediction results of its decision trees. For the specific way of determining it, see the earlier description, which is not repeated here.
In some implementations, all split nodes in the decision forest are associated with the business data held by the data party; that is, all split nodes in the decision forest are target split nodes. In other implementations, some split nodes are associated with the business data held by the data party and the others with the business data held by the model party; that is, the decision forest includes target split nodes and other split nodes besides the target split nodes. In that case the model party can determine, from the business data it holds, the values of the real split conditions of those other split nodes. For each decision tree in the decision forest, the model party can take as its input the value ciphertexts of the real split conditions of the target split nodes in that tree, the values of the real split conditions of the other split nodes, and the leaf values of the leaf nodes; the data party can take as its input the random numbers of the target split nodes in that tree. The two parties cooperatively execute a multi-party secure computation algorithm, and the model party and/or the data party can obtain the prediction result of that decision tree.
In some embodiments, depending on the type of multi-party secure computation algorithm used, the way in which the model party and/or the data party obtains the prediction result of a decision tree may differ. For example, by executing the multi-party secure computation, the model party and the data party may each obtain one share of the decision tree's prediction result. For ease of distinction, the share obtained by the model party is called the first share and the share obtained by the data party the second share. The model party may send the first share to the data party; the data party receives it and adds the first share and the second share to obtain the decision tree's prediction result. Alternatively, the data party may send the second share to the model party; the model party receives it and adds the two shares to obtain the prediction result. Or the model party may send the first share to the data party and the data party may send the second share to the model party, so that by adding the first share and the second share both parties obtain the decision tree's prediction result. As another example, by executing the multi-party secure computation, the model party and/or the data party may directly obtain the decision tree's prediction result.
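For the share-based variant described above, a minimal sketch of additive secret sharing is shown below. The modulus and helper names are illustrative assumptions, and it presumes the prediction has been encoded as an integer (for example, fixed point); real multi-party secure computation frameworks generate such shares internally.

```python
import secrets

MODULUS = 2 ** 61 - 1  # illustrative modulus for additive sharing

def split_into_shares(prediction):
    """Split an integer-encoded prediction into a first and a second share."""
    first_share = secrets.randbelow(MODULUS)
    second_share = (prediction - first_share) % MODULUS
    return first_share, second_share

def reconstruct(first_share, second_share):
    """Either party (or both) recovers the prediction by adding the shares."""
    return (first_share + second_share) % MODULUS
```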
An application scenario example is described below. It should be noted that the purpose of this example is only to better illustrate the embodiments of this specification; it does not constitute an undue limitation on the embodiments.
Please refer to FIG. 4. In this scenario example, decision tree Tree2 may include nodes C1, C2, C3, C4, C5, O6, O7, O8, O9, O10 and O11, where nodes C1, C2, C3, C4 and C5 are split nodes and nodes O6, O7, O8, O9, O10 and O11 are leaf nodes. In Tree2, the branch to the left of a split node is the branch whose value is 0, representing that the split condition is not satisfied; the branch to the right of a split node is the branch whose value is 1, representing that the split condition is satisfied.
In this scenario example, the model party holds decision tree Tree2 and the data party holds all of the business data. Split nodes C1, C2, C3, C4 and C5 of Tree2 are all associated with the business data held by the data party.
The prediction result of decision tree Tree2 can be expressed by the following formula.
v_Tree2 = ((v_o8 × (1 - v_c4) + v_o9 × v_c4) × (1 - v_c2) + (v_o10 × (1 - v_c5) + v_o11 × v_c5) × v_c2) × (1 - v_c1) + (v_o6 × (1 - v_c3) + v_o7 × v_c3) × v_c1
        = v_o8 × (1 - v_c4) × (1 - v_c2) × (1 - v_c1) + v_o9 × v_c4 × (1 - v_c2) × (1 - v_c1) + v_o10 × (1 - v_c5) × v_c2 × (1 - v_c1) + v_o11 × v_c5 × v_c2 × (1 - v_c1) + v_o6 × (1 - v_c3) × v_c1 + v_o7 × v_c3 × v_c1        (1)
In the above formula (1):
v_Tree2 denotes the prediction result of decision tree Tree2;
v_o6 denotes the leaf value of leaf node O6, and so on, up to v_o11 denoting the leaf value of leaf node O11;
v_c1 denotes the ciphertext of the value of the real split condition of split node C1, and so on, up to v_c5 denoting the ciphertext of the value of the real split condition of split node C5.
The model party can take v_c1, ..., v_c5 and v_o6, ..., v_o11 as its input, the data party can take the random numbers of split nodes C1, C2, C3, C4 and C5 as its input, and the two parties cooperatively execute a multi-party secure computation algorithm. After executing the multi-party secure computation algorithm, the model party can obtain one share v1_Tree2 of v_Tree2 and the data party the other share v2_Tree2. The model party can send v1_Tree2 to the data party; the data party can receive v1_Tree2 and add v1_Tree2 and v2_Tree2 to obtain v_Tree2.
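As a plaintext sanity check of formula (1), the sketch below evaluates the expanded polynomial for Tree2 from cleartext condition values and leaf values. In the protocol these inputs are the parties' masked or shared inputs to the multi-party secure computation, not values either party sees in the clear; the function name and data layout are illustrative.

```python
def tree2_prediction(c, o):
    """Evaluate formula (1) for Tree2.

    `c` maps split node name -> 0/1 real-condition value and `o` maps leaf
    node name -> leaf value. Exactly one product term survives: the one for
    the leaf reached along the path selected by the condition values.
    """
    return (o["O8"] * (1 - c["C4"]) * (1 - c["C2"]) * (1 - c["C1"])
            + o["O9"] * c["C4"] * (1 - c["C2"]) * (1 - c["C1"])
            + o["O10"] * (1 - c["C5"]) * c["C2"] * (1 - c["C1"])
            + o["O11"] * c["C5"] * c["C2"] * (1 - c["C1"])
            + o["O6"] * (1 - c["C3"]) * c["C1"]
            + o["O7"] * c["C3"] * c["C1"])
```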
With the data processing method of this embodiment, by adding fake split conditions for obfuscation to the split nodes associated with the business data held by the data party, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
Please refer to FIG. 5. Based on the same inventive concept, this specification provides another embodiment of the data processing method. This embodiment takes the data party as the executing subject and may include the following steps.
Step S30: based on the business data held, determine the values of the split conditions in the split condition set, obtaining a value set.
Step S32: encrypt the values in the value set using a random number, obtaining a value ciphertext set.
Step S34: take the value ciphertext set as input and cooperatively execute a secure data selection algorithm with the model party.
Step S36: take the random number as input and cooperatively execute a multi-party secure computation algorithm with the model party, so that the model party and/or the data party obtains the prediction result of the decision forest.
For the specific processes of steps S30, S32, S34 and S36, refer to the embodiment corresponding to FIG. 2, which is not repeated here.
With the data processing method of this embodiment, by adding fake split conditions for obfuscation to the split nodes associated with the business data held by the data party, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
Please refer to FIG. 6. Based on the same inventive concept, this specification provides another embodiment of the data processing method. This embodiment takes the model party as the executing subject and may include the following steps.
Step S40: take the rank of the real split condition in the split condition set as the data selection value, and cooperatively execute a secure data selection algorithm with the data party using the data selection value as input, obtaining the ciphertext of the real split condition's value.
Step S42: take the value ciphertext as input and cooperatively execute a multi-party secure computation algorithm with the data party, so that the model party and/or the data party obtains the prediction result of the decision forest.
For the specific processes of steps S40 and S42, refer to the embodiment corresponding to FIG. 2, which is not repeated here.
With the data processing method of this embodiment, by adding fake split conditions for obfuscation to the split nodes associated with the business data held by the data party, the model party and/or the data party can obtain the prediction result of the decision forest under the condition that the model party does not leak the decision forest it holds and the data party does not leak the business data it holds, or under the condition that the model party does not leak the decision forest and business data it holds and the data party does not leak the business data it holds.
Please refer to FIG. 7. This specification also provides an embodiment of a data processing apparatus. This embodiment can be deployed at the model party. The apparatus may include the following units:
a selecting unit 50, configured to select, from a decision forest, the split nodes associated with the business data held by the data party as target split nodes, the decision forest including at least one decision tree, the decision tree including at least one split node and at least two leaf nodes, the split node corresponding to a real split condition and the leaf node corresponding to a leaf value;
a generating unit 52, configured to generate fake split conditions for the target split nodes; and
a sending unit 54, configured to send the split condition set corresponding to each target split node to the data party, the split condition set including the fake split conditions and the real split condition.
Please refer to FIG. 8. This specification also provides an embodiment of a data processing apparatus. This embodiment can be deployed at the data party, where the data party holds business data and the split condition set corresponding to a target split node, the target split node being a split node in a decision forest associated with the business data. The apparatus may include the following units:
a determining unit 60, configured to determine, based on the business data, the values of the split conditions in the split condition set, obtaining a value set;
an encrypting unit 62, configured to encrypt the values in the value set using a random number, obtaining a value ciphertext set;
a first computing unit 64, configured to take the value ciphertext set as input and cooperatively execute a secure data selection algorithm with the model party; and
a second computing unit 66, configured to take the random number as input and cooperatively execute a multi-party secure computation algorithm with the model party, so that the model party and/or the data party obtains the prediction result of the decision forest.
Please refer to FIG. 9. This specification also provides an embodiment of a data processing apparatus. This embodiment can be deployed at the model party, where the model party holds a decision forest, the decision forest includes target split nodes, each target split node is associated with business data held by the data party and corresponds to a split condition set, and the split condition set includes a real split condition and fake split conditions. The apparatus may include the following units:
a first computing unit 70, configured to take the rank of the real split condition in the split condition set as a data selection value, and cooperatively execute a secure data selection algorithm with the data party using the data selection value as input, obtaining the ciphertext of the real split condition's value; and
a second computing unit 72, configured to take the value ciphertext as input and cooperatively execute a multi-party secure computation algorithm with the data party, so that the model party and/or the data party obtains the prediction result of the decision forest.
An embodiment of the electronic device of this specification is described below. FIG. 10 is a schematic diagram of the hardware structure of an electronic device in this embodiment. As shown in FIG. 10, the electronic device may include one or more processors (only one is shown in the figure), a memory and a transmission module. Of course, a person of ordinary skill in the art can understand that the hardware structure shown in FIG. 10 is only illustrative and does not limit the hardware structure of the electronic device; in practice the electronic device may include more or fewer component units than shown in FIG. 10, or have a configuration different from that shown in FIG. 10.
The memory may include high-speed random access memory, or may further include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. Of course, the memory may also include a remotely located network memory, which may be connected to the electronic device through a network such as the Internet, an intranet, a local area network or a mobile communication network. The memory may be used to store program instructions or modules of application software, for example the program instructions or modules of the embodiment corresponding to FIG. 2 of this specification, the embodiment corresponding to FIG. 5 of this specification, and the embodiment corresponding to FIG. 6.
The processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on. The processor can read and execute the program instructions or modules in the memory.
The transmission module may be used for data transmission via a network, for example via a network such as the Internet, an intranet, a local area network or a mobile communication network.
It should be noted that the embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments and the electronic device embodiments are substantially similar to the data processing method embodiments, their descriptions are relatively brief, and for relevant details reference may be made to the corresponding parts of the description of the data processing method embodiments.
In addition, it can be understood that, after reading this specification, a person skilled in the art can conceive of combining some or all of the embodiments listed in this specification in any manner without creative effort, and such combinations also fall within the scope disclosed and protected by this specification.
In the 1990s, an improvement of a technology could be clearly distinguished as an improvement in hardware (for example, an improvement of circuit structures such as diodes, transistors and switches) or an improvement in software (an improvement of a method flow). However, with the development of technology, many of today's improvements of method flows can already be regarded as direct improvements of hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs the design to "integrate" a digital system onto a single PLD, without needing a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most widely used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. A person skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The systems, apparatuses, modules or units described in the above embodiments may specifically be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
From the description of the above implementations, a person skilled in the art can clearly understand that this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solution of this specification, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of this specification or in certain parts of the embodiments.
This specification can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on.
This specification can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. This specification can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
Although this specification has been depicted through embodiments, a person of ordinary skill in the art knows that this specification has many variations and changes without departing from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of this specification.

Claims (16)

  1. A data processing method, applied to a model party, comprising:
    selecting, from a decision forest, split nodes associated with business data held by a data party as target split nodes, wherein the decision forest comprises at least one decision tree, the decision tree comprises at least one split node and at least two leaf nodes, the split node corresponds to a real split condition, and the leaf node corresponds to a leaf value;
    generating fake split conditions for the target split nodes; and
    sending a split condition set corresponding to each target split node to the data party, the split condition set comprising the fake split conditions and the real split condition.
  2. The method according to claim 1, wherein a split node in the decision forest corresponds to a data type, and the data type corresponding to the target split node is the same as the data type of the business data.
  3. The method according to claim 1, wherein the data party holds all of the business data; or the model party holds one part of the overall business data and the data party holds the other part of the overall business data.
  4. The method according to claim 1, wherein the decision forest further comprises other split nodes, the other split nodes being associated with business data held by the model party; the method further comprising:
    keeping the real split conditions corresponding to the other split nodes and the leaf values corresponding to the leaf nodes.
  5. A data processing apparatus, deployed at a model party, comprising:
    a selecting unit, configured to select, from a decision forest, split nodes associated with business data held by a data party as target split nodes, wherein the decision forest comprises at least one decision tree, the decision tree comprises at least one split node and at least two leaf nodes, the split node corresponds to a real split condition, and the leaf node corresponds to a leaf value;
    a generating unit, configured to generate fake split conditions for the target split nodes; and
    a sending unit, configured to send a split condition set corresponding to each target split node to the data party, the split condition set comprising the fake split conditions and the real split condition.
  6. An electronic device, comprising:
    a memory, configured to store computer instructions; and
    a processor, configured to execute the computer instructions to implement the method steps according to any one of claims 1-4.
  7. A data processing method, applied to a data party, the data party holding business data and a split condition set corresponding to a target split node, the target split node being a split node in a decision forest associated with the business data, the method comprising:
    determining, based on the business data, values of the split conditions in the split condition set, obtaining a value set;
    encrypting the values in the value set using a random number, obtaining a value ciphertext set;
    cooperatively executing a secure data selection algorithm with a model party using the value ciphertext set as input; and
    cooperatively executing a multi-party secure computation algorithm with the model party using the random number as input, so that the model party and/or the data party obtains a prediction result of the decision forest.
  8. The method according to claim 7, wherein the data party holds all of the business data, or the model party holds one part of the overall business data and the data party holds the other part of the overall business data;
    the split condition set comprises a real split condition and fake split conditions; and
    the secure data selection algorithm is selected from an oblivious transfer algorithm and a private information retrieval algorithm.
  9. The method according to claim 7, wherein there is at least one target split node, and the encrypting the values in the value set using a random number to obtain a value ciphertext set comprises:
    generating a random number for each target split node; and
    encrypting each value in the value set corresponding to a target split node using the random number of that target split node, obtaining the value ciphertext set.
  10. The method according to claim 7, wherein the encrypting the values in the value set using a random number to obtain a value ciphertext set comprises:
    performing an XOR operation between the random number and each value in the value set, and using the operation results as the value ciphertexts in the value ciphertext set.
  11. A data processing apparatus, deployed at a data party, the data party holding business data and a split condition set corresponding to a target split node, the target split node being a split node in a decision forest associated with the business data, the apparatus comprising:
    a determining unit, configured to determine, based on the business data, values of the split conditions in the split condition set, obtaining a value set;
    an encrypting unit, configured to encrypt the values in the value set using a random number, obtaining a value ciphertext set;
    a first computing unit, configured to cooperatively execute a secure data selection algorithm with a model party using the value ciphertext set as input; and
    a second computing unit, configured to cooperatively execute a multi-party secure computation algorithm with the model party using the random number as input, so that the model party and/or the data party obtains a prediction result of the decision forest.
  12. An electronic device, comprising:
    a memory, configured to store computer instructions; and
    a processor, configured to execute the computer instructions to implement the method steps according to any one of claims 7-10.
  13. A data processing method, applied to a model party, the model party holding a decision forest, the decision forest comprising a target split node, the target split node being associated with business data held by a data party and corresponding to a split condition set, the split condition set comprising a real split condition and fake split conditions, the method comprising:
    using the rank of the real split condition in the split condition set as a data selection value, and cooperatively executing a secure data selection algorithm with the data party using the data selection value as input, obtaining a ciphertext of the value of the real split condition; and
    cooperatively executing a multi-party secure computation algorithm with the data party using the value ciphertext as input, so that the model party and/or the data party obtains a prediction result of the decision forest.
  14. The method according to claim 13, wherein the decision forest further comprises other split nodes, the other split nodes being associated with business data held by the model party and corresponding to real split conditions; the method further comprising:
    determining, based on the business data held by the model party, values of the real split conditions corresponding to the other split nodes;
    wherein the cooperatively executing the multi-party secure computation algorithm with the data party using the value ciphertext as input comprises:
    cooperatively executing the multi-party secure computation algorithm with the data party using, as input, the ciphertext of the value of the real split condition corresponding to the target split node and the values of the real split conditions corresponding to the other split nodes.
  15. A data processing apparatus, deployed at a model party, the model party holding a decision forest, the decision forest comprising a target split node, the target split node being associated with business data held by a data party and corresponding to a split condition set, the split condition set comprising a real split condition and fake split conditions, the apparatus comprising:
    a first computing unit, configured to use the rank of the real split condition in the split condition set as a data selection value, and cooperatively execute a secure data selection algorithm with the data party using the data selection value as input, obtaining a ciphertext of the value of the real split condition; and
    a second computing unit, configured to cooperatively execute a multi-party secure computation algorithm with the data party using the value ciphertext as input, so that the model party and/or the data party obtains a prediction result of the decision forest.
  16. An electronic device, comprising:
    a memory, configured to store computer instructions; and
    a processor, configured to execute the computer instructions to implement the method steps according to any one of claims 13-14.
PCT/CN2020/071577 2019-07-01 2020-01-11 数据处理方法、装置和电子设备 WO2021000572A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/779,231 US20200167661A1 (en) 2019-07-01 2020-01-31 Performing data processing based on decision tree
US16/945,780 US20200364582A1 (en) 2019-07-01 2020-07-31 Performing data processing based on decision tree

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910583525.5 2019-07-01
CN201910583525.5A CN110427969B (zh) 2019-07-01 2019-07-01 数据处理方法、装置和电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/779,231 Continuation US20200167661A1 (en) 2019-07-01 2020-01-31 Performing data processing based on decision tree

Publications (1)

Publication Number Publication Date
WO2021000572A1 true WO2021000572A1 (zh) 2021-01-07

Family

ID=68409894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071577 WO2021000572A1 (zh) 2019-07-01 2020-01-11 数据处理方法、装置和电子设备

Country Status (3)

Country Link
CN (1) CN110427969B (zh)
TW (1) TWI729698B (zh)
WO (1) WO2021000572A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722739A (zh) * 2021-09-06 2021-11-30 京东科技控股股份有限公司 梯度提升树模型的生成方法、装置、电子设备和存储介质
CN114900442A (zh) * 2022-05-27 2022-08-12 中金金融认证中心有限公司 用于对业务数据进行预测的方法及其相关产品

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427969B (zh) * 2019-07-01 2020-11-27 创新先进技术有限公司 数据处理方法、装置和电子设备
CN111046425B (zh) * 2019-12-12 2021-07-13 支付宝(杭州)信息技术有限公司 多方联合进行风险识别的方法和装置
CN111046408A (zh) * 2019-12-13 2020-04-21 支付宝(杭州)信息技术有限公司 判断结果处理方法、查询方法、装置、电子设备和系统
CN111144576A (zh) * 2019-12-13 2020-05-12 支付宝(杭州)信息技术有限公司 模型训练方法、装置和电子设备
CN112052875A (zh) * 2020-07-30 2020-12-08 华控清交信息科技(北京)有限公司 一种训练树模型的方法、装置和用于训练树模型的装置
CN113177212B (zh) * 2021-04-25 2022-07-19 支付宝(杭州)信息技术有限公司 联合预测方法和装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070192341A1 (en) * 2006-02-01 2007-08-16 Oracle International Corporation System and method for building decision tree classifiers using bitmap techniques
CN104601596A (zh) * 2015-02-05 2015-05-06 南京邮电大学 一种分类数据挖掘系统中数据隐私保护方法
CN108269012A (zh) * 2018-01-12 2018-07-10 中国平安人寿保险股份有限公司 风险评分模型的构建方法、装置、存储介质及终端
CN109034398A (zh) * 2018-08-10 2018-12-18 深圳前海微众银行股份有限公司 基于联邦训练的特征选择方法、装置及存储介质
CN109359470A (zh) * 2018-08-14 2019-02-19 阿里巴巴集团控股有限公司 多方安全计算方法及装置、电子设备
CN110427969A (zh) * 2019-07-01 2019-11-08 阿里巴巴集团控股有限公司 数据处理方法、装置和电子设备

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7620057B1 (en) * 2004-10-19 2009-11-17 Broadcom Corporation Cache line replacement with zero latency
FR2893796B1 (fr) * 2005-11-21 2008-01-04 Atmel Corp Procede de protection par chiffrement
JP5195149B2 (ja) * 2008-08-11 2013-05-08 富士通株式会社 真偽判定方法
US9002007B2 (en) * 2011-02-03 2015-04-07 Ricoh Co., Ltd. Efficient, remote, private tree-based classification using cryptographic techniques
US11504192B2 (en) * 2014-10-30 2022-11-22 Cilag Gmbh International Method of hub communication with surgical instrument systems
CN104636462B (zh) * 2015-02-06 2017-11-28 中国科学院软件研究所 一种能抵抗统计分析攻击的快速密文检索方法和系统
CN105678222B (zh) * 2015-12-29 2019-05-31 浙江大学 一种基于移动设备的人体行为识别方法
US20230186106A1 (en) * 2016-06-30 2023-06-15 The Trustees Of The University Of Pennsylvania Systems and methods for generating improved decision trees
CN106790165A (zh) * 2016-12-29 2017-05-31 北京信安世纪科技有限公司 一种防止重放攻击的方法
CN107124276B (zh) * 2017-04-07 2020-07-28 西安电子科技大学 一种安全的数据外包机器学习数据分析方法
CN107508799B (zh) * 2017-07-31 2018-12-04 珠海格力电器股份有限公司 一种基于即时通讯的信息呈现方法及装置
CN108900493B (zh) * 2018-06-22 2020-12-15 西安电子科技大学 一种面向大型商场交易记录的隐私保护频繁项集挖掘方法
CN109284626A (zh) * 2018-09-07 2019-01-29 中南大学 面向差分隐私保护的随机森林算法
CN109490704A (zh) * 2018-10-16 2019-03-19 河海大学 一种基于随机森林算法的配电网故障区段定位方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070192341A1 (en) * 2006-02-01 2007-08-16 Oracle International Corporation System and method for building decision tree classifiers using bitmap techniques
CN104601596A (zh) * 2015-02-05 2015-05-06 南京邮电大学 一种分类数据挖掘系统中数据隐私保护方法
CN108269012A (zh) * 2018-01-12 2018-07-10 中国平安人寿保险股份有限公司 风险评分模型的构建方法、装置、存储介质及终端
CN109034398A (zh) * 2018-08-10 2018-12-18 深圳前海微众银行股份有限公司 基于联邦训练的特征选择方法、装置及存储介质
CN109359470A (zh) * 2018-08-14 2019-02-19 阿里巴巴集团控股有限公司 多方安全计算方法及装置、电子设备
CN110427969A (zh) * 2019-07-01 2019-11-08 阿里巴巴集团控股有限公司 数据处理方法、装置和电子设备

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722739A (zh) * 2021-09-06 2021-11-30 京东科技控股股份有限公司 梯度提升树模型的生成方法、装置、电子设备和存储介质
CN113722739B (zh) * 2021-09-06 2024-04-09 京东科技控股股份有限公司 梯度提升树模型的生成方法、装置、电子设备和存储介质
CN114900442A (zh) * 2022-05-27 2022-08-12 中金金融认证中心有限公司 用于对业务数据进行预测的方法及其相关产品
CN114900442B (zh) * 2022-05-27 2024-03-29 中金金融认证中心有限公司 用于对业务数据进行预测的方法及其相关产品

Also Published As

Publication number Publication date
TW202103154A (zh) 2021-01-16
CN110427969B (zh) 2020-11-27
TWI729698B (zh) 2021-06-01
CN110427969A (zh) 2019-11-08

Similar Documents

Publication Publication Date Title
WO2021000572A1 (zh) 数据处理方法、装置和电子设备
TWI745861B (zh) 資料處理方法、裝置和電子設備
TWI730622B (zh) 資料處理方法、裝置和電子設備
WO2020211485A1 (zh) 数据处理方法、装置和电子设备
CN111125727B (zh) 混淆电路生成方法、预测结果确定方法、装置和电子设备
TWI684108B (zh) 資料統計方法和裝置
WO2021027258A1 (zh) 模型参数确定方法、装置和电子设备
US20200175426A1 (en) Data-based prediction results using decision forests
WO2021114585A1 (zh) 模型训练方法、装置和电子设备
CN110580409B (zh) 模型参数确定方法、装置和电子设备
CN113239404A (zh) 一种基于差分隐私和混沌加密的联邦学习方法
CN109446828B (zh) 一种安全多方计算方法及装置
WO2019085677A1 (zh) 基于混淆电路的数据统计方法、装置以及设备
WO2021017424A1 (zh) 数据预处理方法、密文数据获取方法、装置和电子设备
WO2021000575A1 (zh) 数据交互方法、装置和电子设备
CN111428887A (zh) 一种基于多个计算节点的模型训练控制方法、装置及系统
CN111639367A (zh) 基于树模型的两方联合分类方法、装置、设备及介质
US20200167661A1 (en) Performing data processing based on decision tree
US20200293911A1 (en) Performing data processing based on decision tree
US11194824B2 (en) Providing oblivious data transfer between computing devices
US20200293908A1 (en) Performing data processing based on decision tree
TWI729697B (zh) 資料處理方法、裝置和電子設備
CN111046408A (zh) 判断结果处理方法、查询方法、装置、电子设备和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20834655

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20834655

Country of ref document: EP

Kind code of ref document: A1