CN117077038A - Model privacy, data privacy and model consistency protection method of decision tree model - Google Patents


Info

Publication number
CN117077038A
CN117077038A
Authority
CN
China
Prior art keywords
decision
model
node
data
privacy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311115522.1A
Other languages
Chinese (zh)
Inventor
苏明
杨浩然
王南
衣丽萍
张健宁
杜岱玮
刘晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202311115522.1A priority Critical patent/CN117077038A/en
Publication of CN117077038A publication Critical patent/CN117077038A/en
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a model privacy, data privacy and model consistency protection method for a decision tree model. 1. Constructing a model privacy protection scheme: the server side randomly selects a data sample according to the structure of the decision tree model, generates an arithmetic circuit from the decision process, then generates a proof from the circuit, and the user side can verify the proof. 2. Constructing a data privacy protection scheme: the user side encrypts its private data to generate data commitments and branch-node range proofs; the server side verifies the range proofs during decision tree inference. 3. Model consistency verification: the user side sends the consistency verification data set, with commitments completed and randomly shuffled, to the server side; the server side performs decision tree inference and returns the classification results; the user side verifies the correctness of the results returned by the server side and the consistency of the classification results for the plaintext and ciphertext of the same group of samples.

Description

Model privacy, data privacy and model consistency protection method of decision tree model
Technical Field
The invention belongs to the field of data security, and particularly relates to a model privacy, data privacy and model consistency protection method of a decision tree model.
Background
With the rise of the internet, artificial intelligence has seen increasingly wide application. These artificial intelligence products provide users with accurate, personalized and convenient services on the one hand, and mine and collect large amounts of user data and information on the other. As a result, leaks of user data have become frequent, so how to guarantee data privacy during the training of artificial intelligence models is an urgent problem. Meanwhile, for a machine learning privacy-computing service provider, training a machine learning model that delivers accurate predictions often requires high training cost, so the provider may be unwilling to disclose the structure and parameters of the model; moreover, once the parameters and structure of a machine learning model are disclosed, the model is at risk of attack, so the privacy of the machine learning model itself must also be protected. Finally, a model consistency protection scheme is needed to ensure that the machine learning privacy-computing service provider actually provides a true, trusted machine learning decision tree model consistent with the model originally trained on plaintext.
Driven by the dual demands of artificial intelligence and privacy protection, privacy computing technology is gradually coming into view. Privacy computing can help application fields represented by artificial intelligence reasonably introduce more data that is otherwise restricted by privacy and security concerns, promote data fusion across institutions, and make data "usable but invisible", supporting secure data sharing across institutions, markets and domains. In this privacy computing setting, a machine learning model such as a decision tree is typically trained on conventional public data; model inference over existing private data then screens out the customer groups that best fit a product's positioning, providing precise marketing decisions and greatly improving conversion efficiency.
As an important component of modern cryptography, zero-knowledge proof is a special kind of proof system. In a proof system, a prover carries out a series of interactions with a verifier. The prover convinces the verifier that it knows or possesses a certain message, while the proving process reveals no information about that message to the verifier. In a zero-knowledge proof, the system first builds and distributes the information the prover and verifier will need through an initial setup. After the proof interaction starts, the prover sends a proof to the verifier, and the verifier presents a challenge. On receiving the challenge, the prover generates a response from the challenge information and sends it back to the verifier. Depending on the protocol, this interaction may last several rounds, until the verifier finally accepts or rejects the proof based on the prover's responses to its challenges.
A zero-knowledge proof system must have the following three properties:
1) Completeness: if the proposition is true, the prover can convince the verifier of its correctness after the interaction completes;
2) Soundness: if the proposition is false, the prover cannot convince the verifier of its correctness except with negligible probability;
3) Zero-knowledge (zk for short): if the proposition is true, the verifier learns nothing from the proof beyond the fact that the proposition is correct.
When the distribution of the verifier's challenges is random and independent of the messages the prover has already sent, we call such a verifier, and the corresponding zero-knowledge proof, public-coin. A public-coin zero-knowledge proof can be transformed into a non-interactive zero-knowledge proof by the Fiat-Shamir heuristic, which replaces the verifier's random challenge with a hash of the transcript. In a non-interactive protocol, the prover and verifier no longer need to be online at the same time; the proof can be generated offline, stored on chain, and verified by anyone.
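The Fiat-Shamir transformation described above can be illustrated with a minimal sketch: a toy Schnorr proof of knowledge of a discrete logarithm, where the verifier's random challenge is replaced by a hash of the transcript. The tiny group parameters and all names are illustrative only, not secure values and not part of the invention.

```python
import hashlib
import random

p, q, g = 23, 11, 2          # tiny demo group: g has order q modulo p
x = 7                        # prover's secret
y = pow(g, x, p)             # public value y = g^x mod p

def fiat_shamir_challenge(*parts):
    # the hash of the transcript stands in for the verifier's random challenge
    h = hashlib.sha256("|".join(map(str, parts)).encode()).hexdigest()
    return int(h, 16) % q

def prove(x):
    k = random.randrange(1, q)
    t = pow(g, k, p)                      # prover's commitment
    c = fiat_shamir_challenge(g, y, t)    # non-interactive challenge
    s = (k + c * x) % q                   # response
    return t, s

def verify(y, t, s):
    c = fiat_shamir_challenge(g, y, t)    # verifier recomputes the same challenge
    return pow(g, s, p) == (t * pow(y, c, p)) % p

t, s = prove(x)
assert verify(y, t, s)
```

Because the challenge is derived from the transcript, the proof (t, s) can be checked offline by anyone holding the public values, matching the on-chain verification scenario described above.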
Zero-knowledge proofs combine well with blockchains. On the one hand, a zero-knowledge proof can protect the privacy of data or a model, proving statements about them without leakage; on the other hand, a small proof can attest to a large batch of data, which greatly compresses the data volume and improves performance.
Disclosure of Invention
The invention aims to solve the following technical problems:
1. How to ensure, without the server revealing any information about the decision tree model, that a third party believes the existing model achieves a good classification effect, i.e. how to guarantee model privacy.
2. How to ensure that the user side reveals no personal data while obtaining predictions from the decision tree model, i.e. how to guarantee user data privacy.
3. How to let the user side verify that the prediction model the service provider runs on ciphertext is consistent with the prediction model originally trained on plaintext, thereby preventing server-side cheating.
4. How to deploy the root node commitment of the zero-knowledge-proof decision tree generated by the server and the data commitments generated by the user side to the blockchain as attestations, while storing the zero-knowledge proofs on a weakly centralized server. The two parties can open the commitments to obtain evidence when necessary, ensuring the authenticity, traceability and tamper resistance of the model and the data.
Therefore, the invention provides a model privacy, data privacy and model consistency protection method of a decision tree model.
The invention is realized by the following technical scheme:
a model privacy, data privacy and model consistency protection method for a decision tree model, the method comprising the steps of:
step 1, constructing a model privacy protection scheme
The server randomly selects a data sample and makes a decision according to the trusted decision tree model, and constructs a zero-knowledge-proof arithmetic circuit from the decision process; a trusted third party then performs a trusted initialization for the constructed circuit and generates the public parameters, namely a proving key and a verification key, which it sends to the server; the server then generates a proof from the circuit structure and the proving key, and sends the verification key and the proof to the user side; finally, the user side verifies the proof with the verification key;
step 2, constructing a data privacy protection scheme
The server side discloses to the user side a set of branch nodes of the trusted decision tree model, including the attribute and threshold of each branch node's partition condition; based on this disclosed set, the user side produces in turn a range proof over the difference between each branch node's partition threshold and the data sample value of the corresponding attribute, and at the same time, for each data sample, encrypts the private data to generate a data commitment and branch-node range proofs, sending the generated commitment set and range-proof set to the server; finally, the server verifies the range proofs during decision tree inference, obtains the decision tree classification results, and sends them to the user side;
step 3, constructing a model consistency verification scheme
The user side sends the consistency verification data set, with commitments completed and randomly shuffled, to the server; the server performs decision tree inference and returns the classification results; the user side verifies the correctness of the results returned by the server and the consistency of the classification results for the plaintext and ciphertext of the same group of samples.
In the above technical solution, in step 1, the following data need to be recorded in the decision process:
First, record the decision result, the decision path, the leaf node, the sibling-node hashes along the decision path, and the attribute list of the sample arranged in the order of the attributes used by the decision tree model.
Secondly, recording an authentication decision tree ADT of the decision tree model DT, wherein the authentication decision tree comprises the following four types of nodes:
(1) leaf node: the leaf node stores the following contents: hash value leaf.hash of the leaf node, classification leaf.class of the leaf node and leaf node auxiliary information leaf.sup;
(2) branch node of root left subtree: the storage content of the branch node is as follows: hash value node.hash of each branch node, pointers node.lc and node.rc pointing to child nodes, decision attribute node.att of each branch node, decision threshold node.thr of each branch node;
(3) root node right subtree: a random number z;
(4) root node: the root node commitment Comm_ADT.
In the above technical scheme, in step 1, the constructed zero knowledge proof arithmetic circuit is divided into three parts, and the functions thereof are respectively: the first part proves that the attributes used by the sample are all the attributes input by the sample, namely, the attributes of the sample used by the decision tree model are checked; the second part proves the existence of a decision process, namely, in the process of using a decision tree model to make decisions on samples, each decision is performed by comparing an attribute value of the samples with a decision threshold value, and a decision path is obtained according to each decision result; the third part, prove the validity of the decision process, namely prove that each decision can have two possible different results according to the difference of attribute values; through these three parts, a zero knowledge proof arithmetic circuit is constructed that is capable of describing the complete process of data sample decision.
In the above technical solution, in step 1, the first part of the zero-knowledge-proof arithmetic circuit is constructed as follows: first, an attribute tag is designed for each attribute of the data sample, the tag of the i-th attribute of sample x being derived from the attribute value and a random number r' on the finite field; then attribute tags are added to each attribute of the attribute list of the sample arranged in the order of the attributes used by the decision tree model, and the number of occurrences of each input sample attribute in the decision process is recorded; when the sample is given, these occurrence counts are determined values that can also be derived from the decision path; the arithmetic circuit then verifies that the tagged attributes consumed in the decision are consistent with the tagged input attribute list; in this part, the occurrence counts serve as public input, the sample attributes and r' as private input, and the tagged attribute list is provided to the user as the additional information required for verification.
In the above technical solution, in step 1, the second part of the zero-knowledge-proof arithmetic circuit is constructed as follows: take the sample x and the decision path as private input of the circuit; for each decision, compare x_i, the i-th attribute value of sample x, with node.thr, the decision threshold of the branch node on the decision path corresponding to that attribute value; if x_i is not greater than node.thr, the comparison outputs 1, otherwise 0; the comparison result is then checked for consistency against the child-node selection child of the branch node, to verify that the decision path is determined by the decision results; during verification, the decision result is taken as public input of the circuit and related through the circuit to the child pointers of the branch nodes on the decision path, where a pointer node.lc to the left child corresponds to child = 1 and a pointer node.rc to the right child corresponds to child = 0.
In the above technical solution, in step 1, the third part of the zero-knowledge-proof arithmetic circuit is constructed as follows: take the root node commitment Comm_ADT and the sibling-node hashes as public input, and the decision path and the random number z as private input; following the construction of the authentication decision tree, build the arithmetic circuit layer by layer upward from the leaf node according to the decision path and the sibling-node hashes, obtaining the root hash value Comm' inside the circuit; then verify the consistency of Comm' with the root node commitment Comm_ADT.
In the above technical solution, in step 2, when the server verifies a range proof during decision tree inference, it is proved that the value hidden by the commitment lies in a specific interval, thereby completing the comparison of the ciphertext data with the branch node's decision threshold.
In the above technical solution, in step 3, the data are encrypted with the Pedersen commitment algorithm, hiding the concrete contents of the data and generating the commitment corresponding to each data sample, so that data samples containing both plaintext and ciphertext are formed and make up the consistency verification data set.
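As a hedged sketch of the Pedersen commitment step above, the following toy Python example commits to a value under a blinding random number and checks the opening; the tiny group parameters and names are for illustration only (a real deployment would use an elliptic-curve group).

```python
import random

p, q = 23, 11        # tiny demo group: order-q subgroup of Z_p*
g, h = 2, 3          # two order-q generators; log_g(h) assumed unknown to the committer

def commit(m, r):
    # Comm(m; r) = g^m * h^r mod p: r hides m (hiding), and the
    # committer cannot open to a different m (binding)
    return (pow(g, m, p) * pow(h, r, p)) % p

def open_check(c, m, r):
    return c == commit(m, r)

r = random.randrange(q)
c = commit(5, r)
assert open_check(c, 5, r)       # opens to the committed value
assert not open_check(c, 6, r)   # and to nothing else
```

In the consistency verification data set, each private sample would carry such a commitment alongside (or instead of) its plaintext, so the same samples can later be checked for identical classification results.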
The invention has the advantages and beneficial effects that:
(1) For the machine learning server side, model privacy is guaranteed: using the zk-SNARK technique, a third party can be convinced that the existing model achieves a good classification effect without any information about the decision tree model being revealed, and the finally generated proof can be deployed to a blockchain so that any user can verify it.
(2) For the user side, data privacy is guaranteed: the server side uses Bulletproofs range proofs to evaluate the partition conditions, so the user side obtains results fed back by the trusted model without exposing sensitive data.
(3) The user side can verify the consistency of the prediction model the service provider runs on ciphertext with the prediction model originally trained on plaintext, preventing server-side cheating.
(4) This work can store the zero-knowledge proofs on a weakly centralized server and deploy the commitments generated for AI-model and data privacy protection to a blockchain as attestations, providing a reliable platform for data interaction and a secure hosting platform for data, and ensuring the authenticity, traceability and tamper resistance of the model and the data.
(5) This work can offer the server and user sides lower-cost data- and model-sharing schemes that better meet regulatory requirements: at the data level, users become willing to provide and share private data; at the AI-model level, commercially valuable models can be protected and security threats avoided.
Drawings
FIG. 1 is a diagram showing interaction between a server and a client according to the present invention.
FIG. 2 is a logic diagram illustrating the construction of zk-SNARK arithmetic circuits by the model privacy preserving scheme of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the following description is provided with reference to specific examples.
The invention designs a model privacy, data privacy and model consistency protection method for a decision tree model, involving the following two roles:
(1) The server side: the server provides the machine learning technology and can make predictions for given data samples, but it does not want to reveal the machine learning model, so model privacy needs to be protected.
(2) The user side: the user side provides the data, some of which are sensitive and which the user therefore does not want to disclose, so data privacy needs to be protected.
In this embodiment, it is assumed that the server already has a trained trusted decision tree model, and the client already has a selected data set that needs to be privacy protected.
Referring now to fig. 1 and 2, a method for protecting model privacy, data privacy and model consistency of a decision tree model according to the present invention is described, and comprises the following steps.
Step 1, constructing a model privacy protection scheme. That is, the server needs to prove possession of the decision tree without revealing the specific content of the decision tree model, so a proof must be given; this is divided into the following steps.
Step 1.1, firstly, the server randomly selects a data sample (x, y) and makes a decision according to the trained trusted decision tree model DT, and constructs a zk-SNARK arithmetic circuit from the decision process of the data sample (zk-SNARK: Zero-Knowledge Succinct Non-interactive Argument of Knowledge, a succinct, efficiently verifiable, non-interactive form of zero-knowledge proof).
Further, the following data need to be recorded in the decision process:
First, record the decision result, the decision path, the leaf node, the sibling-node hashes along the decision path, and the attribute list of the sample arranged in the order of the attributes used by the decision tree model.
Secondly, recording an authentication decision tree ADT of the decision tree model DT, wherein the authentication decision tree comprises the following four types of nodes:
(1) leaf node: the leaf node stores its hash value leaf.hash, its classification leaf.class, and auxiliary information leaf.sup. The purpose of the auxiliary information is to give leaf nodes with the same predicted classification different hash values; its value can be set to the leaf node number or the memory address where the leaf node is stored.
(2) Branch node of the root's left subtree: the branch node stores its hash value node.hash, the pointers node.lc and node.rc to its child nodes (node.lc points to the left child, node.rc to the right child), its decision attribute node.att, and its decision threshold node.thr.
(3) Right subtree of the root: the random number z. The random number z improves the security of the authentication decision tree: if the commitment of the authentication decision tree must be given multiple times, choosing different random numbers z yields different commitments, preventing information about the authentication decision tree from leaking in the process.
(4) Root node: root node promise Comm ADT I.e. the hash values of the left and right child nodes of the root node.
Further, referring to fig. 2, the zk-SNARK arithmetic circuit is constructed in three parts, whose functions are:
first, the attributes used by the proving sample are all the attributes input by the sample, namely, the attributes of the sample used by the decision tree model are checked. Specifically, first, an attribute tag is designed for each attribute of a data sample, where the sampleThe attribute tag of the ith attribute is +.>Wherein r' is the finite field->A random number on the table; then define +.> Sample arranged in order of attributes for use in decision tree model +.>Is +.>Adding attribute tags to each attribute of the formula I, wherein the attribute tags are +.>The sample property in the decision process will then be entered>At->The number of occurrences in (a) is marked->When a sample is given,to determine the value, it is also possible to rely on +.>Is to be used for verifying arithmetic circuitAnd->Whether or not to establish; in this part, will->As a public input +_>And r' as private input, and +.>Additional information required as proof of authentication is provided to the user.
Secondly, prove the existence of the decision process: when the decision tree model makes decisions on a sample, each decision is obtained by comparing one attribute value of the sample with a decision threshold, and the decision path follows from the decision results. Specifically, take the sample x and the decision path as private input of the arithmetic circuit; for each decision, the comparator comparison_gadget provided in the libsnark library can be used, taking as input x_i, the i-th attribute value of sample x, and node.thr, the decision threshold of the branch node on the decision path corresponding to that attribute value; if x_i is not greater than node.thr, the comparator outputs 1, otherwise 0; the comparison result is then checked for consistency against the child-node selection child of the branch node, verifying that the decision path is determined by the decision results; during verification, the decision result is taken as public input of the circuit and related through the circuit to the child pointers of the branch nodes on the decision path: a pointer node.lc to the left child corresponds to child = 1, and a pointer node.rc to the right child corresponds to child = 0.
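Outside the circuit, the comparison-and-consistency check of this second part amounts to the following plain Python sketch; the "not greater than" convention and the dictionary encoding of path nodes are assumptions for illustration.

```python
def check_decision_path(sample, path):
    # path: list of {"att": attribute index, "thr": threshold,
    #               "child": 1 for left (node.lc), 0 for right (node.rc)}
    for node in path:
        bit = 1 if sample[node["att"]] <= node["thr"] else 0
        if bit != node["child"]:          # recorded choice must match comparison
            return False
    return True

x = {0: 3.2, 1: 7.0}
path = [{"att": 0, "thr": 5.0, "child": 1},   # 3.2 <= 5.0, go left
        {"att": 1, "thr": 6.5, "child": 0}]   # 7.0 >  6.5, go right
assert check_decision_path(x, path)
```

The circuit version performs exactly this check, except that the comparisons are constraint gadgets over private wires rather than Python expressions.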
Thirdly, prove the validity of the decision process, i.e. that each decision can indeed have two different results depending on the attribute values, which requires proving that the decision tree model has a binary tree structure. Specifically, take the root node commitment Comm_ADT and the sibling-node hashes as public input, and the decision path and the random number z as private input; following the construction of the authentication decision tree, build the arithmetic circuit layer by layer upward from the leaf node according to the decision path and the sibling-node hashes, obtaining the root hash value Comm' inside the circuit; then verify the consistency of Comm' with the root node commitment Comm_ADT.
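The layer-by-layer recomputation of Comm' can be sketched as a Merkle-style path check; the hash layout and the simplified branch hashing (over child hashes only) are assumptions for illustration, whereas the text's branch hash also covers the node's attribute and threshold.

```python
import hashlib

def H(*parts):
    return hashlib.sha256("|".join(map(str, parts)).encode()).hexdigest()

def recompute_commitment(leaf_hash, siblings, z):
    # siblings: list of (sibling_hash, sibling_is_left) from the leaf up
    # to the level just below the root; z is the root's random right subtree
    h = leaf_hash
    for sib, sib_is_left in siblings:
        h = H(sib, h) if sib_is_left else H(h, sib)
    return H(h, z)  # Comm', to be compared with the public Comm_ADT

leaf = H("leaf", "classA", 7)
sibs = [(H("leaf", "classB", 8), False),
        (H("sub", "tree"), True)]
comm_adt = recompute_commitment(leaf, sibs, 12345)
assert recompute_commitment(leaf, sibs, 12345) == comm_adt   # deterministic
assert recompute_commitment(leaf, sibs, 54321) != comm_adt   # z is bound in
```

In the actual scheme this recomputation is encoded as circuit constraints, so the server proves knowledge of a valid path and z without revealing either.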
Step 1.2, a trusted third party performs a trusted initialization for the constructed zk-SNARK arithmetic circuit (the server sends the circuit to the trusted third party; the circuit sent need only show its structure, not the values of its variables), generating the public parameters: the proving key pk and the verification key vk, which are then sent to the server.
Step 1.3, the server generates a proof from the zk-SNARK arithmetic circuit and the proving key pk. The proof is a proof of the circuit: it is generated for the public and private inputs according to the circuit and pk. It shows that the server holds private input corresponding to the given public input, and hence that it possesses a trained decision tree model. After generating the proof, the server sends the verification key vk and the proof to the user side.
Step 1.4, finally, the user side verifies the given proof against the verification key and the public input. If verification passes, the user side believes that the server possesses the decision tree model, and the next step is performed.
Further, this embodiment uses the Groth16 algorithm for the zero-knowledge proof, a zero-knowledge proof algorithm that starts from the QAP problem built from the arithmetic circuit (zk-SNARK cannot be applied directly to an arbitrary computational problem; the problem must first be converted into the correct form, called a QAP, i.e. "quadratic arithmetic program"). In the Groth16 algorithm, the prover (server) knows a solution vector of the QAP problem, while the verifier (user side) needs to know only a prefix of this solution; the prover's goal is to prove to the verifier that the vector it knows is indeed a legal solution with that prefix. The Groth16 algorithm consists of three procedures, called the setup, prove and verify processes. The setup process, corresponding to step 1.2, has a trusted third party generate the public parameters for the circuit, i.e. a series of random numbers, from which the proving key pk and verification key vk are constructed; once the public parameters exist, the prove and verify processes can run. The prove process, corresponding to step 1.3, generates the proof string. The verify process, corresponding to step 1.4, checks the validity of that string. The proof string contains three elliptic curve points, denoted here A, B and C; once the verifier holds these three points, it can check, together with the known solution-vector prefix, whether they constitute a valid proof.
Step 2, constructing a data privacy protection scheme. That is, the user side needs to obtain prediction results without giving concrete data contents, so inference of the decision tree model over ciphertext data must be realized; this is divided into the following steps.
Step 2.1: the server side discloses to the user side the set V of branch nodes of the trusted decision tree model, where the disclosed set V contains the attribute v.att and the threshold v.thr of each branch node's splitting condition, but reveals nothing about the nodes' concrete arrangement, hierarchy, or left-right relationships within the decision tree model. In addition, to protect the trusted decision tree model from attack or leakage, the server side may mix in some randomly generated splitting conditions as decoys.
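A minimal sketch of this disclosure step, with hypothetical attribute names and thresholds (the patent does not specify any): the real (attribute, threshold) pairs are mixed with random decoys and shuffled, so the published set V reveals the splitting conditions but neither the tree layout nor which conditions are genuine.

```python
import random

# Hypothetical split conditions of the trusted tree (illustrative values only).
real_conditions = [("age", 30.0), ("income", 55000.0), ("tenure", 2.5)]

def publish_conditions(conditions, n_decoys, attribute_ranges, rng=random):
    """Return a shuffled list of real + decoy (attribute, threshold) pairs.

    attribute_ranges maps each attribute to the (lo, hi) interval from which
    decoy thresholds are drawn."""
    decoys = []
    for _ in range(n_decoys):
        att = rng.choice(list(attribute_ranges))
        lo, hi = attribute_ranges[att]
        decoys.append((att, round(rng.uniform(lo, hi), 2)))
    published = conditions + decoys
    rng.shuffle(published)            # hide ordering and which entries are real
    return published

V = publish_conditions(
    real_conditions, n_decoys=5,
    attribute_ranges={"age": (18, 80), "income": (10_000, 200_000), "tenure": (0, 10)},
)
assert len(V) == len(real_conditions) + 5
assert all(c in V for c in real_conditions)   # every real condition is published
```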
Step 2.2: based on the set V of branch nodes of the trusted decision tree model disclosed by the server side, the user side produces, in turn, a range proof for the difference between each branch node's splitting threshold and the corresponding attribute of the data sample (the range proofs can be implemented with the Bulletproofs algorithm). At the same time, for each data sample x_i, the user side encrypts the private data to generate a data commitment c_i and the branch-node range proofs {Proof_{i,1}, ..., Proof_{i,l}}, and sends the resulting data commitment set C and branch-node range-proof set P to the server side.
Step 2.3: finally, having obtained the two pieces of dense-state data provided by the user side, namely the data commitments and the branch-node range proofs, the server side performs dense-state inference, i.e. it verifies the range proofs during decision tree inference, and sends the decision tree classification result Y to the user side. Concretely, verifying a range proof establishes that the committed hidden value lies in a specific interval, which accomplishes the comparison between the dense-state data and a branch node's decision threshold. For example, when the i-th data sample is currently being judged and the branch node's splitting condition is x_{i,v.att} ≤ v.thr, where x_{i,v.att} denotes the value of attribute v.att of the i-th data sample: (1) if the range verification result is True, this proves x_{i,v.att} ≤ v.thr; (2) otherwise it proves x_{i,v.att} > v.thr.
Step 3: constructing the model consistency verification scheme. That is, it must further be proven to the user side that the decision tree model used by the server side in the dense-state inference step is consistent with the decision tree model already verified as trusted.
First, the user side randomly draws t data samples {x_1, ..., x_t} from a public data set and encrypts the data with the Pedersen commitment algorithm, hiding its concrete content and generating the t commitments {c_1, ..., c_t} corresponding to the t data samples. This yields t pairs of data samples containing both plaintext and ciphertext, which constitute the consistency verification data set T = {(x_1, c_1), ..., (x_t, c_t)}.
The user side then shuffles the consistency verification data set to obtain a scrambled set T'; for example, swapping the 1st and t-th pairs gives T' = {(x_t, c_t), (x_2, c_2), ..., (x_1, c_1)}. The user side then sends T' to the server side.
The server side then performs decision tree inference on all the data samples and sends the decision tree classification results Y to the user side; finally, the user side verifies the correctness of the classification results and checks, for each pair (x_i, c_i), the consistency of the results obtained on the plaintext and on the ciphertext.
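A sketch of this consistency check, with a hypothetical one-node "tree" (a single threshold test) standing in for the decision tree and the commitment side simulated, since only the protocol shape matters here: the user sends t (plaintext, commitment) pairs, the server classifies both copies of each pair, and the user checks that the two labels agree pair by pair.

```python
def trusted_model(x):
    return int(x > 10)          # the model already verified as trusted

def fake_model(x):
    return int(x > 50)          # a substituted model a dishonest server might use

def run_check(samples, dense_model):
    """Return True iff every (plaintext, commitment) pair classifies consistently.

    trusted_model plays the plaintext-side inference; dense_model is whatever
    the server actually runs on the committed copies."""
    for x in samples:
        if trusted_model(x) != dense_model(x):
            return False        # one mismatching pair exposes the server
    return True

# t = 5 samples the user drew (in the real protocol: random draws, then shuffled
# and hidden behind Pedersen commitments)
samples = [3.2, 42.0, 84.4, 17.5, 75.8]
assert run_check(samples, trusted_model)        # honest server always passes
assert not run_check(samples, fake_model)       # 42.0 and 17.5 betray the fake
```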
The security of the consistency verification is specifically described as follows:
First, completeness: if the server side is honest (the two models are identical), then for the same sample (plaintext + ciphertext) the server side's classification results must agree.
Second, soundness: if the server side is dishonest (using a forged model), the probability that it returns correct results for both classifications of all t data samples is expected to be (1/2)^t; with t sufficiently large, the server side cannot pass the test, so soundness is satisfied. Meanwhile, by the perfect hiding property of the Pedersen commitment, the server side cannot derive any information about x_i from c_i.
Third, zero knowledge: the perfect hiding property of the Pedersen commitment guarantees that the server side cannot obtain information about x_i from c_i; and because a random number enters the computation, the server side cannot learn anything about x_i from the classification results either, so zero knowledge is satisfied.
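The soundness bound above can be made concrete with a few lines of arithmetic: if a dishonest server can do no better than guess each of the t paired classifications, it escapes detection with probability (1/2)^t, which vanishes quickly as t grows.

```python
def escape_probability(t: int) -> float:
    """Chance that a guessing server passes all t paired consistency checks."""
    return 0.5 ** t

assert escape_probability(1) == 0.5
assert escape_probability(20) < 1e-6    # 20 samples: roughly one in a million
assert escape_probability(40) < 1e-12   # 40 samples: negligible
```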
The foregoing describes exemplary embodiments of the invention; it should be understood that those skilled in the art may make simple variations, modifications, or other equivalent substitutions without departing from the spirit of the invention, all of which fall within the protection scope of the invention.

Claims (8)

1. A model privacy, data privacy and model consistency protection method of a decision tree model is characterized in that: the method comprises the following steps:
step 1, constructing a model privacy protection scheme
The server side randomly selects a data sample and makes a decision on it with the trusted decision tree model, constructing a zero-knowledge-proof arithmetic circuit from the decision process; a trusted third party then performs a trusted initialization over the constructed arithmetic circuit to generate the public parameters, comprising a proving key and a verification key, which are sent to the server side; the server side then generates a proof from the arithmetic circuit structure and the proving key and sends the verification key and the proof to the user side; finally, the user side verifies with the verification key and the proof;
step 2, constructing a data privacy protection scheme
The server side needs to disclose a set of branch nodes of the trusted decision tree model to the user side, wherein the disclosed set of branch nodes comprises attributes and thresholds of branch node dividing conditions; the user side sequentially performs range proving on the difference between the threshold value of the dividing condition of each branch node and the data sample with the corresponding attribute based on the set of the branch nodes disclosed by the server side, and simultaneously generates a data promise and a branch node range proving by encrypting the privacy data for each data sample, and sends the generated data promise set and the generated branch node range proving set to the server side; finally, the server performs range proving verification in the decision tree deducing process, gives out the classification result of the decision tree, and sends the result to the user side;
step 3, constructing a model consistency verification scheme
The user sends the consistency verification data set completed with promise and random exchange to the server; the server performs decision tree inference and returns a classification result; the user side verifies the correctness of the classification result returned by the server side and the consistency of the classification result of the plaintext and ciphertext of the same group of samples.
2. The method for protecting model privacy, data privacy and model consistency of a decision tree model according to claim 1, wherein: in step 1, the following data need to be recorded in the decision process:
first, recording the decision result, the decision path path_x, the leaf node, the sibling-node hashes of the decision path, and the attribute list of sample x arranged in the attribute order used by the decision tree model;
Secondly, recording an authentication decision tree ADT of the decision tree model DT, wherein the authentication decision tree comprises the following four types of nodes:
(1) leaf node: the leaf node stores the following contents: hash value leaf.hash of the leaf node, classification leaf.class of the leaf node and leaf node auxiliary information leaf.sup;
(2) branch node of root left subtree: the storage content of the branch node is as follows: hash value node.hash of each branch node, pointers node.lc and node.rc pointing to child nodes, decision attribute node.att of each branch node, decision threshold node.thr of each branch node;
(3) root node right subtree: a random number z;
(4) root node: the root node commitment Comm_ADT.
3. The method for protecting model privacy, data privacy and model consistency of decision tree model according to claim 2, wherein: in step 1, the constructed zero knowledge proof arithmetic circuit is divided into three parts, and the functions of the zero knowledge proof arithmetic circuit are as follows: the first part proves that the attributes used by the sample are all the attributes input by the sample, namely, the attributes of the sample used by the decision tree model are checked; the second part proves the existence of a decision process, namely, in the process of using a decision tree model to make decisions on samples, each decision is performed by comparing an attribute value of the samples with a decision threshold value, and a decision path is obtained according to each decision result; the third part, prove the validity of the decision process, namely prove that each decision can have two possible different results according to the difference of attribute values; through these three parts, a zero knowledge proof arithmetic circuit is constructed that is capable of describing the complete process of data sample decision.
4. A method for protecting model privacy, data privacy and model consistency of a decision tree model according to claim 3, wherein: in step 1, the specific method for constructing the first part of the zero-knowledge-proof arithmetic circuit is as follows: first, an attribute tag is designed for each attribute of the data sample, the tag of the i-th attribute of sample x being derived from that attribute together with a random number r' drawn from a finite field; then, attribute tags are added to every attribute in the attribute list of sample x arranged in the attribute order used by the decision tree model, and the number of occurrences in that list of each sample attribute input during the decision process is recorded; finally, an arithmetic circuit is constructed to verify that the recorded counts hold; in this part, the tagged attribute list is taken as a public input, and the attributes used in the decision process as a private input.
5. A method for protecting model privacy, data privacy and model consistency of a decision tree model according to claim 3, wherein: in step 1, the specific method for constructing the second part of the zero-knowledge-proof arithmetic circuit is as follows: the sample attributes and the decision path path_x are taken as private inputs of the arithmetic circuit; for each decision, the i-th attribute value of sample x is compared with the decision threshold of the branch node on the decision path corresponding to that attribute value, the comparison outputting 1 if the attribute value is less than or equal to the threshold and 0 otherwise; the comparison result is then checked for consistency against the branch node's child-node selection, to verify that the decision path is determined by the decision results; during verification, the decision result is taken as a public input of the arithmetic circuit and related through the circuit to the pointers from the branch nodes on the decision path to their child nodes, where the pointer node.lc denotes a branch node selecting its left child and the pointer node.rc denotes a branch node selecting its right child.
6. A method for protecting model privacy, data privacy and model consistency of a decision tree model according to claim 3, wherein: in step 1, the specific method for constructing the third part of the zero-knowledge-proof arithmetic circuit is as follows: the root node commitment Comm_ADT and the sibling-node hashes are taken as public inputs, and the decision path path_x and the random number z as private inputs; from the decision path and the sibling-node hashes, following the construction of the authentication decision tree, the arithmetic circuit is built upward layer by layer from the leaf node, yielding a root-node hash value Comm' inside the circuit; the consistency of the root-node hash value Comm' with the root node commitment Comm_ADT is then verified.
7. The method for protecting model privacy, data privacy and model consistency of a decision tree model according to claim 1, wherein: in step 2, when the server side verifies the range proofs during decision tree inference, it proves that the committed hidden value lies in a specific interval, thereby completing the comparison between the dense-state data and the branch node's decision threshold.
8. The method for protecting model privacy, data privacy and model consistency of a decision tree model according to claim 1, wherein: in step 3, the Pedersen commitment algorithm is used for encrypting data, hiding the specific content of the data, and generating commitments corresponding to the data samples, so that the data samples simultaneously containing plaintext and ciphertext are formed, and a consistency verification data set is formed.
CN202311115522.1A 2023-08-31 2023-08-31 Model privacy, data privacy and model consistency protection method of decision tree model Pending CN117077038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311115522.1A CN117077038A (en) 2023-08-31 2023-08-31 Model privacy, data privacy and model consistency protection method of decision tree model

Publications (1)

Publication Number Publication Date
CN117077038A true CN117077038A (en) 2023-11-17

Family

ID=88716928


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210081807A1 (en) * 2019-09-17 2021-03-18 Sap Se Non-Interactive Private Decision Tree Evaluation
US20210083841A1 (en) * 2019-09-17 2021-03-18 Sap Se Private Decision Tree Evaluation Using an Arithmetic Circuit
WO2022153377A1 (en) * 2021-01-13 2022-07-21 富士通株式会社 Control method, information processing system, information processing device, and control program
CN114841363A (en) * 2022-04-11 2022-08-02 北京理工大学 Privacy protection and verifiable federal learning method based on zero-knowledge proof
US20230269090A1 (en) * 2022-02-18 2023-08-24 Onai Inc. Apparatus for secure multiparty computations for machine-learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANG Han; LIU Yiran; SONG Xiangfu; WANG Hao; ZHENG Zhihua; XU Qiuliang: "Cryptographic Methods for Privacy-Preserving Machine Learning", Journal of Electronics & Information Technology, no. 05 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination