CN110991655A - Method and device for processing model data by combining multiple parties - Google Patents

Publication number
CN110991655A
Authority
CN
China
Prior art keywords
result
model
random number
ciphertexts
owner
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911298674.3A
Other languages
Chinese (zh)
Other versions
CN110991655B (en)
Inventor
韩帅
陈宇
马环宇
雷浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911298674.3A priority Critical patent/CN110991655B/en
Publication of CN110991655A publication Critical patent/CN110991655A/en
Priority to PCT/CN2020/123982 priority patent/WO2021120861A1/en
Application granted granted Critical
Publication of CN110991655B publication Critical patent/CN110991655B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0869 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving random numbers or seeds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3218 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using proof of knowledge, e.g. Fiat-Shamir, GQ, Schnorr, or non-interactive zero-knowledge proofs

Abstract

The embodiments of this specification provide a method and a device for jointly performing model data processing by multiple parties while protecting data privacy. In the method, after a model owner homomorphically encrypts k model parameters to obtain k parameter ciphertexts, it selects a first random number and uses it to encrypt a random first message, obtaining an auxiliary ciphertext. In addition, the model owner and the data owner jointly determine k challenge numbers. The model owner then combines the k challenge numbers with the model parameters and the first message, and with the random numbers used for encryption and the first random number, respectively, to generate a verification message and a verification random number as a zero-knowledge proof. The data owner verifies whether the result of a homomorphic operation on the parameter ciphertexts and the auxiliary ciphertext based on the k challenge numbers is equal to the result of encrypting the verification message with the verification random number. If the two are equal, the data owner performs a homomorphic operation on the parameter ciphertexts using its sample features and returns the result to the model owner, so that the security of each party's private data is ensured during model data processing.

Description

Method and device for processing model data by combining multiple parties
Technical Field
One or more embodiments of the present specification relate to the field of machine learning and the field of data security, and more particularly, to a method and apparatus for model data processing by multi-party federation.
Background
With the development of computer technology, machine learning has been applied in various technical fields to analyze and process business data. The data needed for machine learning often spans multiple domains. For example, in a machine-learning-based merchant classification scenario, an electronic payment platform holds the merchants' transaction flow data, an e-commerce platform stores the merchants' sales data, and a banking institution holds the merchants' loan data. Such data often exists as isolated islands. Because of industry competition, data security, user privacy and similar concerns, data integration faces great resistance, and it is difficult to pool the data scattered across platforms in order to train a machine learning model. Multi-party joint training and joint business processing with machine learning models has therefore been proposed.
In a scenario where multiple parties jointly train and use a machine learning model, protecting data privacy becomes a significant issue. For example, in a multi-party computing scenario, party A holds the user sample feature data to be processed, and party B holds the data processing model. When the data processing model is used to process the sample feature data, if party A sends the sample data directly to party B, the feature values of the user samples are exposed and user privacy is leaked; if party B provides the data processing model to party A for use, the model parameters of the data processing model are exposed.
Therefore, it is desirable to provide an improved scheme for protecting privacy data of parties during processing of model data by multiple parties in a combined manner.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for performing model data processing by combining multiple parties, where after performing homomorphic encryption on multiple model parameters, a model owner also generates a zero-knowledge proof of ciphertext legality for verification by the data owner, thereby further protecting the security of private data of each party against disclosure.
According to a first aspect, there is provided a method for model data processing by combining multiple parties for protecting data privacy, the multiple parties including a model owner and a data owner, the method performed by the model owner, comprising:
respectively encrypting k model parameters in the owned first model by adopting a homomorphic encryption algorithm based on a pre-generated public key and k random numbers generated respectively to obtain k parameter ciphertexts;
randomly selecting a first message and a first random number; encrypting the first message by adopting the homomorphic encryption algorithm based on the public key and the first random number to obtain an auxiliary ciphertext;
sending the k parameter ciphertexts and the auxiliary ciphertext to the data owner;
acquiring k challenge numbers;
linearly combining the k challenge numbers with the k model parameters, superposing the result of the linear combination with the first message, and obtaining a verification message based on the superposition result; combining the k challenge numbers with the k random numbers and the first random number to obtain a verification random number;
sending a zero-knowledge proof to the data owner, wherein the zero-knowledge proof comprises the verification message and the verification random number, so that the data owner verifies whether a first result, obtained by performing a homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertext using the k challenge numbers, is equal to a second result, obtained by encrypting the verification message with the homomorphic encryption algorithm based on the public key and the verification random number;
and receiving a feature operation ciphertext sent by the data owner, wherein the feature operation ciphertext is a result of homomorphic addition operation on the k parameter ciphertexts by using k sample features owned by the data owner under the condition that the data owner passes the verification, and the feature operation ciphertext is used for restoring a feature operation result by the model owner so as to perform service processing based on the feature operation result.
In one embodiment, the first model is a linear regression model, or a logistic regression model.
According to one embodiment, the public key comprises a natural number N and a generator h of a cyclic subgroup in a random number space defined by the natural number N;
in such a case, the step of encrypting the first message to obtain the auxiliary ciphertext may include: performing a power operation on (N +1) using the first message to obtain a first intermediate result; performing group operation based on the generator h by using the first random number to obtain a second intermediate result; and obtaining the auxiliary ciphertext based on the first intermediate result and the second intermediate result.
Accordingly, in one example of the above embodiment, the verification random number may be obtained by: linearly combining the k challenge numbers with the k random numbers, and superposing the result of the linear combination with the first random number to obtain the verification random number.
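A minimal Python sketch of this embodiment may help. It rests on several assumptions that are not taken from the specification: the toy primes `p` and `q`, the choice `h = x^N mod N^2` for an arbitrary base `x`, the modulus `N^2`, multiplication as the way the two intermediate results are combined, and the names `enc_h`, `ver_msg` and `ver_rand`. It only illustrates that, with encryption of the form `(N+1)^m * h^r`, a verification random number formed additively makes the data owner's two results agree.

```python
# Toy sketch only: tiny primes and hypothetical names, not a secure implementation.
p, q = 1019, 1031            # assumed toy primes
N = p * q
N2 = N * N
x = 5                        # assumed arbitrary base coprime to N
h = pow(x, N, N2)            # assumed way of picking the generator h

def enc_h(m, r):
    """c = (N+1)^m * h^r mod N^2 (assumed combination of the two intermediate results)."""
    first = pow(N + 1, m, N2)    # power operation on (N+1) using the message
    second = pow(h, r, N2)       # group operation based on h using the random number
    return first * second % N2

# Model owner's secrets: k = 3 parameters, their random numbers, and (m0, r0).
params = [12, 7, 30]
rands = [101, 202, 303]
m0, r0 = 555, 404
cts = [enc_h(m, r) for m, r in zip(params, rands)]
aux = enc_h(m0, r0)

challenges = [3, 5, 2]       # k challenge numbers, fixed here for reproducibility

# Verification message: linear combination plus first message, reduced modulo N.
ver_msg = (sum(e * m for e, m in zip(challenges, params)) + m0) % N
# Verification random number: linear combination of the random numbers plus r0.
ver_rand = sum(e * r for e, r in zip(challenges, rands)) + r0

# Data owner's check: homomorphic combination versus re-encryption.
first_result = aux
for e, c in zip(challenges, cts):
    first_result = first_result * pow(c, e, N2) % N2
second_result = enc_h(ver_msg, ver_rand)
print(first_result == second_result)   # True for honestly generated ciphertexts
```

The exponents of `h` simply add up under multiplication of ciphertexts, which is why a linear combination (rather than a product) of the random numbers is the natural choice in this variant.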
According to another embodiment, the public key comprises at least a natural number N; in such a case, the step of encrypting the first message to obtain the auxiliary ciphertext may include: performing a power operation on (N +1) using the first message to obtain a first intermediate result; performing an N power operation on the first random number to obtain a second intermediate result; and obtaining the auxiliary ciphertext based on the first intermediate result and the second intermediate result.
Accordingly, in one example of the above embodiment, the verification random number may be obtained by: for each i from 1 to k, raising the i-th random number to the power of the i-th challenge number, thereby obtaining k power operation results; and multiplying the k power operation results together with the first random number to obtain the verification random number.
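The reason for combining the random numbers multiplicatively in this embodiment can be worked through in a short derivation. It assumes, as a reading of the steps above rather than an explicit statement of the specification, that each ciphertext is the product of its two intermediate results modulo $N^2$. The symbols $e_i$ (challenge numbers), $c_0$, $m_0$, $r_0$ (auxiliary ciphertext, first message, first random number) and $m'$, $r'$ (verification message and verification random number) are shorthand introduced only for this sketch.

$$
c_0 \prod_{i=1}^{k} c_i^{e_i}
= (N+1)^{m_0} r_0^{N} \prod_{i=1}^{k} \left( (N+1)^{m_i} r_i^{N} \right)^{e_i}
= (N+1)^{m_0 + \sum_{i=1}^{k} e_i m_i} \left( r_0 \prod_{i=1}^{k} r_i^{e_i} \right)^{N} \pmod{N^2}
$$

With $m' = m_0 + \sum_{i} e_i m_i$ and $r' = r_0 \prod_{i} r_i^{e_i}$, the right-hand side is exactly the encryption of $m'$ under the random number $r'$, so the homomorphic combination of the ciphertexts can be compared directly with a fresh encryption of the verification message.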
In one embodiment, the k challenge numbers are obtained by: receiving, from the data owner, k challenge numbers randomly selected by the data owner.
In another embodiment, the k challenge numbers are obtained by: calculating the k challenge numbers from the k parameter ciphertexts and the auxiliary ciphertext, using a hash algorithm agreed upon with the data owner.
More specifically, in one example, calculating the k challenge numbers may include: arranging the k parameter ciphertexts and the auxiliary ciphertext into a first sequence; adding each of k preset index values at a preset position of the first sequence to obtain k second sequences; and applying a preset hash function to each of the k second sequences to obtain the k challenge numbers.
In another example, calculating the k challenge numbers may include: arranging the k parameter ciphertexts and the auxiliary ciphertext into k sequences according to k preset orderings; and applying a preset hash function to each of the k sequences to obtain the k challenge numbers.
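The two hash-based derivations above can be sketched as follows. Everything concrete here is an assumption made for illustration: SHA-256 as the preset hash function, fixed-width big-endian encoding of the ciphertexts, appending the index at the end of the sequence, rotations as the k preset orderings, truncation to 128 bits, and the function names themselves.

```python
import hashlib

def _to_bytes(n: int, width: int) -> bytes:
    # Fixed-width big-endian encoding so both parties serialize identically.
    return n.to_bytes(width, "big")

def challenges_by_index(ciphertexts, aux, k, width=64, bits=128):
    """First example: one sequence, with a different index value appended per challenge."""
    base = b"".join(_to_bytes(c, width) for c in ciphertexts) + _to_bytes(aux, width)
    out = []
    for i in range(1, k + 1):
        digest = hashlib.sha256(base + _to_bytes(i, 4)).digest()
        out.append(int.from_bytes(digest, "big") >> (256 - bits))
    return out

def challenges_by_ordering(ciphertexts, aux, k, width=64, bits=128):
    """Second example: k different orderings (here: rotations) of the same items."""
    items = [_to_bytes(c, width) for c in ciphertexts] + [_to_bytes(aux, width)]
    out = []
    for i in range(k):
        rotated = items[i:] + items[:i]
        digest = hashlib.sha256(b"".join(rotated)).digest()
        out.append(int.from_bytes(digest, "big") >> (256 - bits))
    return out

# Placeholder integers standing in for real ciphertexts.
cts, aux = [11, 22, 33], 44
model_owner_view = challenges_by_index(cts, aux, k=3)
data_owner_view = challenges_by_index(cts, aux, k=3)
print(model_owner_view == data_owner_view)   # both parties derive the same challenges
```

Because both parties compute the challenges from the same transmitted ciphertexts, no extra round of interaction is needed to agree on them.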
According to an embodiment, the step of obtaining the verification message based on the superposition result may include: reducing the superposition result modulo N and taking the reduced value as the verification message, where N is the natural number in the public key.
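A brief derivation shows why this reduction does not disturb the data owner's check. It assumes that the message enters the ciphertext as a power of $(N+1)$ taken modulo $N^2$, as in the embodiments above; $S$, $m'$ and $t$ are shorthand introduced only for this sketch, with $S = m' + tN$ the unreduced superposition result.

$$
(N+1)^{N} = \sum_{j=0}^{N} \binom{N}{j} N^{j} \equiv 1 + N \cdot N \equiv 1 \pmod{N^2}
\quad\Longrightarrow\quad
(N+1)^{S} = (N+1)^{m'} \left( (N+1)^{N} \right)^{t} \equiv (N+1)^{m'} \pmod{N^2}
$$

Encrypting the reduced value $m'$ therefore gives the same first intermediate result as encrypting $S$ itself.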
In one embodiment, the method further comprises: and decrypting the feature operation ciphertext by adopting a decryption algorithm corresponding to the homomorphic encryption algorithm and using a private key corresponding to the public key to obtain the feature operation result, wherein the feature operation result corresponds to the linear combination of the k sample features and the k model parameters.
According to a second aspect, there is provided a method for model data processing by combining multiple parties for protecting data privacy, the multiple parties including a model owner and a data owner, the method performed by the data owner, comprising:
receiving, from the model owner, k parameter ciphertexts and an auxiliary ciphertext whose encryption validity is to be verified; in the case of valid encryption, the k parameter ciphertexts are obtained by the model owner respectively encrypting k model parameters with a homomorphic encryption algorithm based on a pre-generated public key and k respectively generated random numbers, and the auxiliary ciphertext is obtained by encrypting a random first message with the homomorphic encryption algorithm based on the public key and a first random number;
determining k challenge numbers;
receiving, from the model owner, a zero-knowledge proof including a verification message and a verification random number; wherein the verification message is based on a linear combination of the k challenge numbers with the k model parameters, superposed with the first message, and the verification random number is based on a combination of the k challenge numbers with the k random numbers and the first random number;
performing a homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertext using the k challenge numbers to obtain a first result; encrypting the verification message with the homomorphic encryption algorithm based on the public key and the verification random number to obtain a second result;
in the case that the first result is equal to the second result, performing a homomorphic addition operation on the k parameter ciphertexts using the owned k sample features to obtain a feature operation ciphertext;
and sending the feature operation ciphertext to the model owner, wherein the feature operation ciphertext is used for restoring a feature operation result by the model owner, so that business processing is performed based on the feature operation result.
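The data owner's side of the steps above can be sketched in Python as a verification gate: the feature operation ciphertext is computed only if the proof checks out. The sketch assumes a toy Paillier-style encryption of the form `(N+1)^m * r^N mod N^2`; the primes and the function names `enc`, `prove` and `verify_and_evaluate` are hypothetical and chosen only for illustration.

```python
# Toy sketch only: tiny primes and hypothetical names, not a secure implementation.
p, q = 1019, 1031
N = p * q
N2 = N * N

def enc(m, r):
    """Assumed Paillier-style encryption: (N+1)^m * r^N mod N^2."""
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def prove(params, rands, m0, r0, challenges):
    """Model owner: build the verification message and verification random number."""
    ver_msg = (sum(e * m for e, m in zip(challenges, params)) + m0) % N
    ver_rand = r0
    for e, r in zip(challenges, rands):
        # Reducing modulo N is safe because r^N mod N^2 depends only on r mod N.
        ver_rand = ver_rand * pow(r, e, N) % N
    return ver_msg, ver_rand

def verify_and_evaluate(cts, aux, challenges, proof, features):
    """Data owner: check the proof, and only then use the sample features."""
    ver_msg, ver_rand = proof
    first = aux
    for e, c in zip(challenges, cts):
        first = first * pow(c, e, N2) % N2       # first result
    second = enc(ver_msg, ver_rand)              # second result
    if first != second:
        return None                              # verification failed: return nothing
    result = 1
    for y, c in zip(features, cts):
        result = result * pow(c, y, N2) % N2     # feature operation ciphertext
    return result

params, rands = [12, 7, 30], [101, 202, 303]
m0, r0 = 555, 404
cts = [enc(m, r) for m, r in zip(params, rands)]
aux = enc(m0, r0)
challenges = [3, 5, 2]
features = [4, 1, 9]

proof = prove(params, rands, m0, r0, challenges)
honest = verify_and_evaluate(cts, aux, challenges, proof, features)

swapped = list(cts)
swapped[0] = swapped[0] * (N + 1) % N2           # a ciphertext the proof was not computed for
rejected = verify_and_evaluate(swapped, aux, challenges, proof, features)
print(honest is not None, rejected is None)      # True True
```

Returning `None` on a failed check is one possible design; what matters is that the sample features are never applied to ciphertexts that have not passed verification.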
In one embodiment, the public key comprises a natural number N and a generator h of a cyclic subgroup in a random number space defined by the natural number N; accordingly, the second result may be obtained by: performing a power operation on (N +1) by using the verification message to obtain a first intermediate item; performing group operation based on the generator h by using a verification random number to obtain a second intermediate item; and obtaining the second result based on the first intermediate item and the second intermediate item.
In one embodiment, the first result is obtained by: for each i from 1 to k, raising the i-th parameter ciphertext to the power of the i-th challenge number, thereby obtaining k power operation results; and multiplying the auxiliary ciphertext by the k power operation results, and taking the product as the first result.
According to one embodiment, the feature operation ciphertext is obtained by: for each i from 1 to k, raising the i-th parameter ciphertext to the power of the i-th sample feature, thereby obtaining k power operation results; and multiplying the k power operation results together, and taking the product as the feature operation ciphertext.
According to one embodiment, the step of determining the k challenge numbers may comprise: randomly determining the k challenge numbers and sending them to the model owner.
According to another embodiment, the step of determining the k challenge numbers may comprise: calculating the k challenge numbers from the k parameter ciphertexts and the auxiliary ciphertext, using a hash algorithm agreed upon with the model owner.
More specifically, in one example, calculating the k challenge numbers includes: arranging the k parameter ciphertexts and the auxiliary ciphertext into a first sequence; adding each of k preset index values at a preset position of the first sequence to obtain k second sequences; and applying a preset hash function to each of the k second sequences to obtain the k challenge numbers.
In another example, calculating the k challenge numbers includes: arranging the k parameter ciphertexts and the auxiliary ciphertext into k sequences according to k preset orderings; and applying a preset hash function to each of the k sequences to obtain the k challenge numbers.
According to a specific embodiment, the k sample features may include one of: user attribute features, picture pixel features, audio features, text coding features.
According to a third aspect, there is provided an apparatus for model data processing by combining multiple parties for protecting data privacy, the multiple parties including a model owner and a data owner, the apparatus being deployed in the model owner, comprising:
the parameter encryption unit is configured to encrypt k model parameters in the owned first model respectively based on a pre-generated public key and k random numbers respectively generated by adopting a homomorphic encryption algorithm to obtain k parameter ciphertexts;
a secondary encryption unit configured to randomly select a first message and a first random number; encrypting the first message by adopting the homomorphic encryption algorithm based on the public key and the first random number to obtain an auxiliary ciphertext;
a first transmitting unit configured to transmit the k parameter ciphertexts and the auxiliary ciphertext to the data owner;
a challenge number acquisition unit configured to acquire k challenge numbers;
a combination unit configured to linearly combine the k challenge numbers with the k model parameters, superpose the result of the linear combination with the first message, and obtain a verification message based on the superposition result; and to combine the k challenge numbers with the k random numbers and the first random number to obtain a verification random number;
a second sending unit configured to send a zero-knowledge proof including the verification message and the verification random number to the data owner, so that the data owner verifies whether a first result, obtained by performing a homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertext using the k challenge numbers, is equal to a second result, obtained by encrypting the verification message with the homomorphic encryption algorithm based on the public key and the verification random number;
and the receiving unit is configured to receive a feature operation ciphertext sent by the data owner, wherein the feature operation ciphertext is a result of homomorphic addition operation on the k parameter ciphertexts by using k sample features owned by the data owner under the condition that the data owner passes the verification, and the feature operation ciphertext is used for restoring a feature operation result by the model owner so as to perform service processing based on the feature operation result.
According to a fourth aspect, there is provided an apparatus for model data processing by combining multiple parties for protecting data privacy, the multiple parties including a model owner and a data owner, the apparatus being deployed in the data owner, comprising:
a first receiving unit configured to receive, from the model owner, k parameter ciphertexts and an auxiliary ciphertext whose encryption validity is to be verified; in the case of valid encryption, the k parameter ciphertexts are obtained by the model owner respectively encrypting k model parameters with a homomorphic encryption algorithm based on a pre-generated public key and k respectively generated random numbers, and the auxiliary ciphertext is obtained by encrypting a random first message with the homomorphic encryption algorithm based on the public key and a first random number;
a challenge number determination unit configured to determine k challenge numbers;
a second receiving unit configured to receive, from the model owner, a zero-knowledge proof including a verification message and a verification random number; wherein the verification message is based on a linear combination of the k challenge numbers with the k model parameters, superposed with the first message, and the verification random number is based on a combination of the k challenge numbers with the k random numbers and the first random number;
a verification unit configured to perform a homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertext using the k challenge numbers to obtain a first result, and to encrypt the verification message with the homomorphic encryption algorithm based on the public key and the verification random number to obtain a second result;
a homomorphic operation unit configured to, in the case that the verification unit verifies that the first result is equal to the second result, perform a homomorphic addition operation on the k parameter ciphertexts using the owned k sample features to obtain a feature operation ciphertext;
and the sending unit is configured to send the feature operation ciphertext to the model owner, so that the model owner restores a feature operation result, and business processing is performed based on the feature operation result.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first or second aspect.
According to the method and device provided by the embodiments of this specification, when multiple parties jointly perform model data processing, the model owner and the data owner exchange data by means of homomorphic encryption and homomorphic operations to obtain a feature operation result. Further, in the solution provided in this specification, the model owner also provides a zero-knowledge proof to the data owner to prove that the ciphertexts it sent were generated by validly encrypting the model parameters. With the verification random number and the verification message constructed as in the embodiments, the zero-knowledge proof can verify the validity of multiple parameter ciphertexts at once without revealing any related plaintext information, which further ensures the security of each party's private data during model data processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a process diagram for model data processing by multi-party federation in one embodiment;
FIG. 3 shows a schematic block diagram of a model data processing apparatus deployed in a model owner, according to one embodiment;
FIG. 4 shows a schematic block diagram of a model data processing apparatus deployed in a data owner, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. In this implementation scenario, two participants are schematically shown, where participant A is the model owner and participant B is the data owner.
The data owner B owns the sample feature data to be processed. The sample to which the sample features correspond may be a picture to be analyzed, a user, audio, text, etc., and the sample features may accordingly include picture pixel features, user attribute features (e.g., age, gender, registration duration, occupation, etc.), audio spectrum features, text encoding features, etc.
And the model owner A owns the model for performing business processing according to the sample characteristic data. For example, when the sample is a picture, the business process may include a business process based on image recognition, such as face recognition, object detection, and the like; when the sample is a user, the business process may include business processes based on user classification, such as user crowd division, user service customization, etc.; when the sample is audio, the service processing may include service processing based on audio recognition, such as speech recognition, voiceprint analysis, speech-to-text, and the like; when the sample is text, the business processes may include business processes based on text analysis, such as semantic analysis, intent recognition, and the like.
The model may be a classification model or a regression model, and the model involves linear combination operation of model parameters and sample characteristics according to the requirements of business processing. Typically, the model may be a linear regression model, and the core algorithm thereof is the above linear combination operation. Alternatively, the model may be a logistic regression model that further applies a nonlinear function operation in addition to the linear combination operation. Alternatively, the model may be another model that requires the above linear combination operation.
When the data owner B and the model owner A jointly perform model data processing, the data owner B cannot send the sample data directly to the model owner A, so as to avoid revealing the sample feature values; likewise, the model owner A cannot send the model parameters directly to the data owner B, so as to avoid revealing the values of the model parameters. The data owner B and the model owner A can instead perform the data processing jointly using the following scheme.
First, the model owner A selects a homomorphic encryption algorithm and generates a public key pk and a private key sk under that algorithm. In general, the public key pk includes a natural number N related to the order of the encryption space. The model owner A then uses the homomorphic encryption algorithm to encrypt each of its k model parameters $(m_1, m_2, \ldots, m_k)$, obtaining the parameter ciphertexts $(c_1, c_2, \ldots, c_k)$, where:
$$c_i = \mathrm{Enc}(pk, m_i; r_i) \tag{1}$$
where Enc denotes the homomorphic encryption algorithm and $r_i$ denotes the random number used when encrypting $m_i$.
The model owner A may then send the parameter ciphertexts $(c_1, c_2, \ldots, c_k)$ to the data owner B. The data owner B can then use the k sample features $(y_1, y_2, \ldots, y_k)$ it owns to perform a homomorphic addition operation on the k parameter ciphertexts.
It should be understood that a homomorphic encryption algorithm is an encryption function for which operating on plaintexts and then encrypting gives a result equivalent to encrypting first and then performing a corresponding operation on the ciphertexts. For example, encrypting $v_1$ and $v_2$ with the same public key PK yields $E_{PK}(v_1)$ and $E_{PK}(v_2)$. If $E_{PK}(v_1 + v_2) = E_{PK}(v_1) \cdot E_{PK}(v_2)$ holds, the encryption algorithm is said to be additively homomorphic, and $E_{PK}(v_1) \cdot E_{PK}(v_2)$ is the corresponding homomorphic addition operation.
It is easy to verify that an additively homomorphic encryption algorithm also satisfies the following:
$$E_{PK}\left(\sum_{i=1}^{k} y_i v_i\right) = \prod_{i=1}^{k} E_{PK}(v_i)^{y_i} \tag{2}$$
According to equation (2) above, the data owner B can use the k sample features $(y_1, y_2, \ldots, y_k)$ it owns to perform the following homomorphic addition operation on the k parameter ciphertexts $(c_1, c_2, \ldots, c_k)$, obtaining a feature operation ciphertext X':
$$X' = \prod_{i=1}^{k} c_i^{y_i} \tag{3}$$
Then, the data owner B returns the feature operation ciphertext X' to the model owner A, which can decrypt it using its private key sk. According to property (2) of the homomorphic operation, the model owner recovers the feature operation result X:
$$X = m_1 y_1 + m_2 y_2 + \cdots + m_k y_k \bmod N \tag{4}$$
In this process, thanks to the properties of homomorphic encryption, the model owner A does not reveal the model parameters, and the data owner B does not reveal the sample features.
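The idealized flow of equations (1), (3) and (4) can be traced end to end with a small Python sketch. It assumes a toy Paillier-style scheme of the form `(N+1)^m * r^N mod N^2`, which is one additively homomorphic algorithm consistent with the description; the primes, the decryption formula and the names `enc` and `dec` are assumptions for illustration only.

```python
# Toy sketch only: tiny primes and hypothetical names, not a secure implementation.
p, q = 1019, 1031
N = p * q                        # public: natural number N
N2 = N * N
phi = (p - 1) * (q - 1)          # private key material (assumed form of sk)
mu = pow(phi, -1, N)             # modular inverse of phi mod N (Python 3.8+)

def enc(m, r):
    """Assumed encryption: (N+1)^m * r^N mod N^2."""
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c):
    """Assumed decryption: c^phi = 1 + (m * phi mod N) * N mod N^2, then strip phi."""
    return (pow(c, phi, N2) - 1) // N * mu % N

params = [12, 7, 30]             # model parameters m_1..m_k (model owner A)
rands = [101, 202, 303]          # random numbers r_1..r_k, coprime to N
features = [4, 1, 9]             # sample features y_1..y_k (data owner B)

# Model owner A: equation (1).
cts = [enc(m, r) for m, r in zip(params, rands)]

# Data owner B: equation (3), computed without ever seeing the parameters.
x_cipher = 1
for y, c in zip(features, cts):
    x_cipher = x_cipher * pow(c, y, N2) % N2

# Model owner A: equation (4), computed without ever seeing the features.
x_plain = dec(x_cipher)
print(x_plain)                   # 12*4 + 7*1 + 30*9 = 325
```

The data owner only ever handles ciphertexts, and the model owner only ever sees the single combined result, which matches the privacy argument made above.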
However, the above process is an idealized process. In one possible scenario, a malicious or impersonating model owner may not homomorphically encrypt according to an agreed protocol, but rather elaborately construct a malicious string (c)1,c2,...,ck) And sending the data to a data owner B. If the data owner B still adopts the homomorphic operation shown in formula (3) and returns the result X 'to the model owner, it is possible for the malicious model owner to reversely deduce the sample characteristics in the data owner according to the result X' by means of the characteristics of the carefully constructed character string.
In view of the above risk, according to one embodiment of the present specification, the model owner, after providing the parameter ciphertexts, also provides a ciphertext validity proof P to the data owner to prove that the transmitted parameter ciphertexts were indeed generated using the predetermined homomorphic encryption algorithm and are not maliciously constructed strings. For example, when the model owner sends a ciphertext c to the data owner, the proof P proves that there exist a message m and a random number r satisfying $c = \mathrm{Enc}(pk, m; r)$. The data owner verifies the validity of the received ciphertext based on the proof P and performs the subsequent homomorphic operation only if the verification passes, thereby avoiding the risk of data leakage.
For the above ciphertext validity proof, in one embodiment, the model owner adopts a zero-knowledge proof (ZKP) to protect the security of the private data. The zero-knowledge proof proves, without revealing the message m or the random number r, that the ciphertext c was obtained by applying the predetermined encryption algorithm to the message m using the random number r.
The following describes the process of model data processing by multi-party federation introducing zero knowledge proof.
FIG. 2 illustrates a process diagram for model data processing by multi-party federation in one embodiment. In fig. 2, continuing with the scenario example of fig. 1, a model owner a and a data owner B are also illustrated. It is to be understood, however, that this example can be extended to more participating parties, e.g., there are multiple data owners, each interacting with a model owner. The model owner can respectively obtain the characteristic operation results of each data owner and respectively perform business processing, or the characteristic operation results of each data owner are collected and then are subjected to business processing. For simplicity and clarity of description, the following still takes model owner a and data owner B as examples to describe the process of model data processing.
First, in step 201, the model owner A uses a homomorphic encryption algorithm Z, based on a pre-generated public key pk and k separately generated random numbers $(r_1,r_2,\dots,r_k)$, to encrypt the k model parameters $(m_1,m_2,\dots,m_k)$ of the business model it owns, obtaining k parameter ciphertexts $(c_1,c_2,\dots,c_k)$, where the meaning of each parameter ciphertext $c_i$ is as shown in the aforementioned formula (1).
Specifically, in one embodiment, the homomorphic encryption algorithm Z is the Paillier encryption algorithm. Paillier is a known encryption algorithm whose public key can be expressed as (N, g), where N is a natural number that is the product of two large prime numbers p and q, $N = p \times q$, and g is a number less than $N^2$ that satisfies certain mathematical conditions. In practice, $g = N + 1$ may be taken. According to the Paillier encryption algorithm, when the public key pk is used to encrypt the message m, the ciphertext c may be expressed as:
$$c = g^{m}\cdot r^{N} = (N+1)^{m}\cdot r^{N} \pmod{N^2} \qquad (5)$$
where r is the random number used for encryption and mod is the modulo operation.
Thus, for each model parameter $m_i$, the encryption operation of formula (5) can be applied with the corresponding random number $r_i$ to obtain the corresponding parameter ciphertext $c_i$.
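As an illustration of formula (5), the following is a minimal sketch of textbook Paillier with g = N + 1. The function names (`keygen`, `encrypt`, `decrypt`, `random_unit`) and the tiny primes are assumptions made for this example only; they are not taken from the specification, and real keys use large random primes.

```python
import math
import secrets

def keygen(p, q):
    """Toy key generation from two given primes (illustrative only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1), the private key
    return n, lam

def random_unit(n):
    """Pick a random r in [1, N) that is coprime to N."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r

def encrypt(n, m, r):
    """Formula (5) with g = N + 1: c = (N+1)^m * r^N mod N^2."""
    n2 = n * n
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, lam, c):
    """Textbook Paillier decryption for g = N + 1."""
    n2 = n * n
    u = pow(c, lam, n2)                 # equals 1 + (m * lam mod N) * N
    return ((u - 1) // n) * pow(lam, -1, n) % n

n, lam = keygen(101, 113)
c = encrypt(n, 1234, random_unit(n))
assert decrypt(n, lam, c) == 1234
```

The modular inverse via `pow(lam, -1, n)` needs Python 3.8 or later.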
It can be verified that the Paillier encryption algorithm satisfies homomorphism. In particular, for ciphertexts $c_1$ and $c_2$, where $c_1=\mathrm{Enc}(pk,m_1;r_1)$ and $c_2=\mathrm{Enc}(pk,m_2;r_2)$, the Paillier encryption algorithm satisfies:
$$c_1\cdot c_2 = \mathrm{Enc}(pk,\ m_1+m_2;\ r_1\cdot r_2) \pmod{N^2} \qquad (6)$$
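The additive homomorphism of Paillier can be checked numerically. The sketch below assumes the property takes the form $c_1\cdot c_2 = \mathrm{Enc}(pk, m_1+m_2; r_1\cdot r_2)$, which follows directly from formula (5); the helper `enc` and the small values are illustrative assumptions.

```python
def enc(n, m, r):
    """Formula (5) with g = N + 1."""
    n2 = n * n
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

N = 101 * 113            # toy modulus
m1, r1 = 42, 17
m2, r2 = 58, 23

c1, c2 = enc(N, m1, r1), enc(N, m2, r2)
lhs = c1 * c2 % (N * N)              # multiply the ciphertexts
rhs = enc(N, m1 + m2, r1 * r2)       # messages add, random numbers multiply
assert lhs == rhs
```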
In another embodiment, the existing Paillier algorithm can be improved to give a new encryption algorithm, referred to as the improved Paillier algorithm. The public key of the improved Paillier algorithm can comprise (N, h), where N is a natural number and h is a generator of a cyclic subgroup G of the space defined by the natural number N. More specifically, N can be expressed as the product of two large prime numbers p and q: $N = p \times q$. According to the improved Paillier encryption algorithm, when the public key pk is used to encrypt the message m, the ciphertext c may be expressed as:
$$c = (N+1)^{m}\cdot (h^{N})^{r} \pmod{N^2} \qquad (7)$$
where r is the random number used for encryption and mod is the modulo operation.
Thus, for each model parameter $m_i$, the improved Paillier encryption operation of formula (7) can be applied with the corresponding random number $r_i$ to obtain the corresponding parameter ciphertext $c_i$.
It can be verified that the improved Paillier encryption algorithm also satisfies homomorphism. In particular, for ciphertexts $c_1$ and $c_2$, where $c_1=\mathrm{Enc}(pk,m_1;r_1)$ and $c_2=\mathrm{Enc}(pk,m_2;r_2)$, the improved Paillier encryption algorithm satisfies:
$$c_1\cdot c_2 = \mathrm{Enc}(pk,\ m_1+m_2;\ r_1+r_2) \pmod{N^2} \qquad (8)$$
Comparing formulas (8) and (6), it can be seen that both the Paillier algorithm and the improved Paillier algorithm satisfy the additive homomorphism required by formula (2); they differ only in how the random numbers are combined after the homomorphic addition operation.
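The difference in how the random numbers combine can be seen in a small sketch of formula (7). Here `enc_improved` is a hypothetical helper name, and h = 4 is an arbitrary unit chosen only for illustration; the specification requires h to generate a particular cyclic subgroup, and constructing such an h is not shown here.

```python
def enc_improved(n, h, m, r):
    """Formula (7): c = (N+1)^m * (h^N)^r mod N^2."""
    n2 = n * n
    return pow(n + 1, m, n2) * pow(pow(h, n, n2), r, n2) % n2

N = 101 * 113    # toy modulus
h = 4            # illustrative stand-in for the generator h (assumption)
m1, r1 = 42, 1000
m2, r2 = 58, 2345

c1 = enc_improved(N, h, m1, r1)
c2 = enc_improved(N, h, m2, r2)
# Messages add and, unlike plain Paillier, the random numbers also add.
assert c1 * c2 % (N * N) == enc_improved(N, h, m1 + m2, r1 + r2)
```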
In other embodiments, other homomorphic encryption algorithms may also be employed to encrypt the k model parameters to obtain k parameter ciphertexts (c)1,c2,...,ck)。
To prove the validity of the generated parameter ciphertexts, in step 202 the model owner A randomly selects a message $m_0$ and selects a random number $r_0$ within a predetermined range; for convenience of description, these are hereinafter referred to as the first message and the first random number. The size of the predetermined range is explained in the subsequent steps. Then, the model owner uses the homomorphic encryption algorithm Z, based on the public key pk and the first random number $r_0$, to encrypt the first message $m_0$ and obtain an auxiliary ciphertext $c_0$.
When the homomorphic encryption algorithm Z is the Paillier encryption algorithm, obtaining the auxiliary ciphertext $c_0$ according to formula (5) may comprise: performing a power operation on (N+1) using the first message $m_0$ to obtain a first intermediate result $(N+1)^{m_0}$; raising the first random number $r_0$ to the N-th power to obtain a second intermediate result $r_0^{N}$; and obtaining the auxiliary ciphertext $c_0$ based on the first intermediate result and the second intermediate result.
When the homomorphic encryption algorithm Z is the improved Paillier encryption algorithm, obtaining the auxiliary ciphertext $c_0$ according to formula (7) may comprise: performing a power operation on (N+1) using the first message $m_0$ to obtain a first intermediate result $(N+1)^{m_0}$; performing a group operation based on the generator h using the first random number $r_0$ to obtain a second intermediate result $(h^{N})^{r_0}$; and obtaining the auxiliary ciphertext $c_0$ based on the first intermediate result and the second intermediate result.
When another homomorphic encryption algorithm is used, the first message $m_0$ is encrypted according to the corresponding encryption function to obtain the auxiliary ciphertext $c_0$.
In step 203, the model owner transmits the k parameter ciphertexts $(c_1,c_2,\dots,c_k)$ and the auxiliary ciphertext $c_0$ to the data owner.
It should be noted that although shown as one step in fig. 2, in other embodiments, the parameter ciphertext and the auxiliary ciphertext may be sent separately in two or more steps. For example, k parameter ciphertexts may be transmitted first, and then an auxiliary cipher text may be generated and transmitted. The order of transmission is not limited herein.
Then, at step 204, the model owner obtains k challenge numbers $(e_1,e_2,\dots,e_k)$. The challenge numbers can be obtained in various ways.
In one embodiment, after receiving the auxiliary ciphertext, the data owner B randomly selects or generates k random numbers as k challenge numbers, and sends the k challenge numbers to the model owner. The model owner a acquires the k challenge numbers by receiving the transmission of the data owner B. In this embodiment, the generation of the challenge number is simple and does not require complicated calculations.
In another embodiment, the model owner A and the data owner B agree in advance on a hash algorithm, and use the agreed algorithm to calculate the k challenge numbers $(e_1,e_2,\dots,e_k)$ from the k parameter ciphertexts $(c_1,c_2,\dots,c_k)$ and the auxiliary ciphertext $c_0$, namely:
$$(e_1,e_2,\dots,e_k)=\mathrm{Hash}(c_1,c_2,\dots,c_k,c_0) \qquad (9)$$
It should be understood that, according to formula (9), the model owner A and the data owner B each determine the k challenge numbers in the same agreed manner from the whole set of k+1 ciphertexts, consisting of the k parameter ciphertexts and the auxiliary ciphertext. In this embodiment, because both parties compute the same k challenge numbers independently, the number of interactions between them is reduced, which lowers the communication cost.
The process of calculating the challenge numbers by both parties can be embodied in various ways.
Specifically, in one example, the k parameter ciphertexts and the auxiliary ciphertext may be arranged into a sequence, referred to as a first sequence. The first sequence is, for example, $(c_1 c_2 \dots c_k c_0)$. Then, k predetermined index values are added at a predetermined position of the first sequence to obtain k second sequences. For example, an index P is appended to the tail of the first sequence to obtain $(c_1 c_2 \dots c_k c_0 P)$; when the index P takes k different index values, for example 1, 2, ..., k, k second sequences are obtained. A predetermined hash function is then applied to each of the k second sequences to obtain the k challenge numbers. In other words, in this example, the challenge number $e_i$ can be expressed as:
$$e_i=\mathrm{Hash}(c_1 c_2 \dots c_k c_0\, p_i) \qquad (10)$$
where $p_i$ is the i-th index value of the index P.
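The indexed-hash derivation of formula (10) could be sketched as follows. The choice of SHA-256, the length-prefixed byte encoding, and truncation to 112 bits are all assumptions made for this example; the specification only requires that both parties use the same agreed hash function over all k+1 ciphertexts. The names `encode` and `challenges_by_index` are hypothetical.

```python
import hashlib

def encode(x):
    """Length-prefixed big-endian encoding, so concatenation is unambiguous."""
    b = x.to_bytes((x.bit_length() + 7) // 8 or 1, "big")
    return len(b).to_bytes(4, "big") + b

def challenges_by_index(param_cts, aux_ct, bits=112):
    """e_i = Hash(c1 c2 ... ck c0 p_i) with index values p_i = 1..k."""
    first_seq = b"".join(encode(c) for c in [*param_cts, aux_ct])
    out = []
    for p_i in range(1, len(param_cts) + 1):
        digest = hashlib.sha256(first_seq + encode(p_i)).digest()
        out.append(int.from_bytes(digest, "big") >> (256 - bits))  # keep top bits
    return out

# Both parties run the same function on the same ciphertexts.
es_model_owner = challenges_by_index([111, 222, 333], 999)
es_data_owner = challenges_by_index([111, 222, 333], 999)
assert es_model_owner == es_data_owner
```

Because every challenge depends on all k+1 ciphertexts, changing any single ciphertext changes every challenge number.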
In another example, the challenge numbers may also be calculated as follows. The k parameter ciphertexts $(c_1,c_2,\dots,c_k)$ and the auxiliary ciphertext $c_0$ are arranged into k sequences in k predetermined orderings. For example, taking $c_1,c_2,\dots,c_k$ in turn as the first ciphertext of the sequence and keeping the relative cyclic order of the ciphertexts, k sequences are formed, where sequence 1 is $(c_1 c_2 \dots c_k c_0)$, sequence 2 is $(c_2 c_3 \dots c_0 c_1)$, and sequence k is $(c_k c_0 c_1 \dots c_{k-2} c_{k-1})$. Then, a predetermined hash function is applied to each of the k sequences to obtain the k challenge numbers $(e_1,e_2,\dots,e_k)$.
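The cyclic-ordering variant can be sketched in the same spirit. As before, SHA-256, the byte encoding, and the 112-bit truncation are assumptions for illustration, and `challenges_by_rotation` is a hypothetical name.

```python
import hashlib

def encode(x):
    """Length-prefixed big-endian encoding of a non-negative integer."""
    b = x.to_bytes((x.bit_length() + 7) // 8 or 1, "big")
    return len(b).to_bytes(4, "big") + b

def challenges_by_rotation(param_cts, aux_ct, bits=112):
    """Hash k cyclic rotations of (c1, ..., ck, c0); rotation j starts with c_{j+1}."""
    seq = [*param_cts, aux_ct]
    out = []
    for j in range(len(param_cts)):
        rotated = seq[j:] + seq[:j]
        digest = hashlib.sha256(b"".join(encode(c) for c in rotated)).digest()
        out.append(int.from_bytes(digest, "big") >> (256 - bits))
    return out

es = challenges_by_rotation([111, 222, 333], 999)
assert len(es) == 3
```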
On the basis of the above specific examples, those skilled in the art may also modify the above specific examples, and calculate the k challenge numbers in more ways, which are not listed here. It should be understood that, when k challenge numbers are calculated, calculation needs to be performed based on the entirety of k +1 ciphertexts including k parameter ciphertexts and auxiliary ciphertexts, so as to jointly determine k challenge numbers.
After the model owner A obtains the k challenge numbers, in step 205 it combines the k challenge numbers with the k model parameters and with the k random numbers used in encryption, respectively, to obtain a verification message $m^*$ and a verification random number $r^*$.
Specifically, the k challenge numbers $(e_1,e_2,\dots,e_k)$ and the k model parameters $(m_1,m_2,\dots,m_k)$ may be linearly combined and superimposed on the first message $m_0$, and the verification message $m^*$ is obtained based on the superposition result.
In one example, the verification message $m^*$ is the superposition result itself:
$$m^*=m_0+e_1 m_1+e_2 m_2+\dots+e_k m_k \qquad (11)$$
In this case, when the first message $m_0$ is selected in step 202, the approximate range of the linear combination of the k challenge numbers and the k model parameters should be taken into account, so that the first message lies in a comparable range. This is because, if the value range of $m_0$ differs too much (for example, by several orders of magnitude) from that of the subsequent linear combination term in formula (11), the result of the linear combination is exposed; and since the data owner also holds the k challenge numbers, it may be able to infer the original model parameters from that result. Therefore, the value range of the first message must be such that it obfuscates the result of the subsequent linear combination.
In another example, the superposition result shown in formula (11) is further reduced modulo N, and the modulo result is taken as the verification message $m^*$, namely:
$$m^*=m_0+e_1 m_1+e_2 m_2+\dots+e_k m_k \bmod N \qquad (12)$$
wherein N is a natural number N in a public key of the Paillier encryption algorithm or the improved Paillier encryption algorithm. Since the natural number N is the order of the value space where the encrypted message is located, modulo N of the superposition result does not affect the subsequent encryption result.
When the verification message is calculated using formula (12), the first message $m_0$ selected in step 202 is preferably chosen uniformly from $\{0, 1, \dots, N-1\}$, with no special requirement on its value range, because the modulus operation obfuscates the original linear combination result.
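A minimal sketch of formula (12) follows, with a first message drawn uniformly from {0, ..., N-1}. The function name `verification_message` and the toy values are assumptions for illustration.

```python
import secrets

def verification_message(m0, challenges, params, n):
    """Formula (12): m* = m0 + sum(e_i * m_i) mod N."""
    return (m0 + sum(e * m for e, m in zip(challenges, params))) % n

N = 101 * 113                      # toy modulus
params = [3, 1, 4]                 # model parameters m_1..m_k
challenges = [7, 11, 13]           # challenge numbers e_1..e_k

# Fixed m0 to make the arithmetic visible: 5 + 7*3 + 11*1 + 13*4 = 89.
assert verification_message(5, challenges, params, N) == 89

# In the protocol, m0 is uniform in {0, ..., N-1}, which masks the sum.
m0 = secrets.randbelow(N)
m_star = verification_message(m0, challenges, params, N)
assert 0 <= m_star < N
```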
On the other hand, the model owner also combines the k challenge numbers $(e_1,e_2,\dots,e_k)$ with the k random numbers $(r_1,r_2,\dots,r_k)$ used in encrypting the k model parameters and with the first random number $r_0$ selected in step 202 to obtain the verification random number $r^*$.
When the Paillier encryption algorithm is adopted in steps 201 and 202, the calculation can follow the way random numbers combine in the homomorphic operation of Paillier encryption shown in formula (6). For the k challenge numbers and k random numbers, taking each i-th challenge number $e_i$ and i-th random number $r_i$ in turn, the i-th random number is raised to the power of the i-th challenge number, giving k power operation results $r_1^{e_1}, r_2^{e_2}, \dots, r_k^{e_k}$. The k power operation results are then multiplied with the first random number $r_0$ to obtain the verification random number $r^*$, namely:
$$r^* = r_0\cdot r_1^{e_1}\cdot r_2^{e_2}\cdots r_k^{e_k} \qquad (13)$$
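For plain Paillier, the verification random number is a product of powers. The sketch below reduces the product modulo N, which is an assumption of this example rather than something stated in the text; it is harmless because formula (5) depends on r only through $r^N \bmod N^2$, and that value is unchanged when r is replaced by r mod N. The function name is hypothetical.

```python
def verification_random_paillier(r0, challenges, randoms, n):
    """r* = r0 * prod(r_i ^ e_i), reduced mod N to keep the number small."""
    r_star = r0 % n
    for e, r in zip(challenges, randoms):
        r_star = r_star * pow(r, e, n) % n
    return r_star

N = 101 * 113
# 2 * 5^3 * 7^2 = 12250, and 12250 mod 11413 = 837
assert verification_random_paillier(2, [3, 2], [5, 7], N) == 837
```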
When the improved Paillier encryption algorithm is adopted in steps 201 and 202, the calculation can follow the way random numbers combine in the homomorphic operation of the improved Paillier encryption shown in formula (8). The k challenge numbers $(e_1,e_2,\dots,e_k)$ and the k random numbers $(r_1,r_2,\dots,r_k)$ are linearly combined and superimposed on the first random number $r_0$ to obtain the verification random number $r^*$, namely:
$$r^*=r_0+e_1 r_1+e_2 r_2+\dots+e_k r_k \qquad (14)$$
When the verification random number is calculated using formula (14), it is difficult to determine the order of the space in which the random number lies, so a modulus operation is generally not applied to the verification random number $r^*$. This requires that the first random number be selected in step 202 within a predetermined range that matches the value range of the linear combination term following $r_0$ in formula (14), so that it obfuscates the result of that linear combination. For example, when each of the k random numbers used in encryption is 320 bits and each challenge number is 112 bits, the range of the first random number may be (432 × log k + 112) bits.
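Formula (14) is a plain integer linear combination with no modulus. In the sketch below the bit length of the first random number is left as a caller-supplied parameter, `r0_bits`, which is a hypothetical name; the value passed in the example is arbitrary and only needs to exceed the size of the masked sum.

```python
import secrets

def verification_random_improved(r0, challenges, randoms):
    """Formula (14): r* = r0 + sum(e_i * r_i), computed over the integers."""
    return r0 + sum(e * r for e, r in zip(challenges, randoms))

def sample_first_random(r0_bits):
    """Draw r0 from a range wide enough to mask the linear combination."""
    return secrets.randbits(r0_bits)

# 10 + 2*4 + 3*5 = 33
assert verification_random_improved(10, [2, 3], [4, 5]) == 33

randoms = [secrets.randbits(320) for _ in range(3)]       # 320-bit r_i
challenges = [secrets.randbits(112) for _ in range(3)]    # 112-bit e_i
r0 = sample_first_random(600)   # illustrative width, larger than 320 + 112 bits
r_star = verification_random_improved(r0, challenges, randoms)
```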
After the model owner A has calculated the verification message $m^*$ and the verification random number $r^*$, at step 206 it sends a zero-knowledge proof to the data owner B, the proof including the verification message $m^*$ and the verification random number $r^*$.
After receiving the zero-knowledge proof, the data owner can verify the encryption validity of the k parameter ciphertexts in step 207 based on the proof. Specifically, the data owner uses the k challenge numbers $(e_1,e_2,\dots,e_k)$ to perform a homomorphic addition operation on the k parameter ciphertexts $(c_1,c_2,\dots,c_k)$ and the auxiliary ciphertext $c_0$ to obtain a first result $Q_1$. In particular, taking each i-th challenge number $e_i$ and i-th parameter ciphertext $c_i$ in turn, the i-th parameter ciphertext is raised to the power of the i-th challenge number, giving k power operation results; the auxiliary ciphertext $c_0$ is then multiplied by the k power operation results, and the first result $Q_1$ is obtained based on the product, namely:
$$Q_1 = c_0\cdot c_1^{e_1}\cdot c_2^{e_2}\cdots c_k^{e_k} \qquad (15)$$
On the other hand, the data owner uses the same homomorphic encryption algorithm Z to encrypt the verification message $m^*$ based on the public key pk and the verification random number $r^*$, obtaining a second result $Q_2$, namely:
$$Q_2=\mathrm{Enc}(pk,m^*;r^*) \qquad (16)$$
Then, the first result $Q_1$ and the second result $Q_2$ are compared to determine whether they are equal.
If the k parameter ciphertexts and the auxiliary ciphertext were all validly encrypted with the Paillier encryption algorithm, then, by the additive homomorphism of the Paillier encryption algorithm shown in formula (6), the first result $Q_1$ can be written as:
$$Q_1 = c_0\cdot\prod_{i=1}^{k} c_i^{e_i} = \mathrm{Enc}\Big(pk,\ m_0+\sum_{i=1}^{k} e_i m_i;\ r_0\cdot\prod_{i=1}^{k} r_i^{e_i}\Big) \qquad (17)$$
Given how the verification message $m^*$ is calculated in formula (11) or (12) and how the verification random number $r^*$ is calculated in formula (13), the right side of equation (17) is equal to the second result $Q_2$.
If the k parameter ciphertexts and the auxiliary ciphertext were all validly encrypted with the improved Paillier encryption algorithm, then, by the additive homomorphism of the improved Paillier algorithm shown in formula (8), the first result $Q_1$ can be written as:
$$Q_1 = c_0\cdot\prod_{i=1}^{k} c_i^{e_i} = \mathrm{Enc}\Big(pk,\ m_0+\sum_{i=1}^{k} e_i m_i;\ r_0+\sum_{i=1}^{k} e_i r_i\Big) \qquad (18)$$
Given how the verification message $m^*$ is calculated in formula (11) or (12) and how the verification random number $r^*$ is calculated in formula (14), the right side of equation (18) is equal to the second result $Q_2$.
In short, if the k parameter ciphertexts and the auxiliary ciphertexts are both legally ciphered by adopting the homomorphic encryption algorithm, the first result obtained by the corresponding homomorphic operation should be equal to the second result obtained by adopting the homomorphic encryption algorithm to encrypt the verification message by using the verification random number. Therefore, the data owner B can verify whether the k parameter ciphertexts are legally encrypted by adopting the agreed homomorphic encryption algorithm by verifying whether the first result and the second result are equal.
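The check that the first result equals the second result can be put together end to end for the plain Paillier case. This is a minimal sketch under stated assumptions: toy primes, 16-bit interactive challenges, reduction of $r^*$ modulo N, and hypothetical function names `prove` and `verify`; it is not a production implementation.

```python
import math
import secrets

N = 101 * 113
N2 = N * N

def enc(m, r):
    """Formula (5) with g = N + 1."""
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def rand_unit():
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r

def prove(params, randoms, m0, r0, challenges):
    """Model owner: compute (m*, r*) as in formulas (12) and (13)."""
    m_star = (m0 + sum(e * m for e, m in zip(challenges, params))) % N
    r_star = r0
    for e, r in zip(challenges, randoms):
        r_star = r_star * pow(r, e, N) % N
    return m_star, r_star

def verify(param_cts, aux_ct, challenges, m_star, r_star):
    """Data owner: Q1 = c0 * prod(c_i^e_i), Q2 = Enc(m*; r*), accept iff equal."""
    q1 = aux_ct
    for e, c in zip(challenges, param_cts):
        q1 = q1 * pow(c, e, N2) % N2
    return q1 == enc(m_star, r_star)

params = [3, 1, 4]
randoms = [rand_unit() for _ in params]
cts = [enc(m, r) for m, r in zip(params, randoms)]
m0, r0 = secrets.randbelow(N), rand_unit()
c0 = enc(m0, r0)
challenges = [secrets.randbits(16) for _ in params]

m_star, r_star = prove(params, randoms, m0, r0, challenges)
assert verify(cts, c0, challenges, m_star, r_star)
# A response that does not match the ciphertexts is rejected.
assert not verify(cts, c0, challenges, (m_star + 1) % N, r_star)
```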
If the first result is verified to be equal to the second result, the zero-knowledge proof passes verification, proving that the k parameter ciphertexts are validly encrypted ciphertexts rather than malicious constructions. Then, in step 208, the data owner B uses the k sample features $(y_1,y_2,\dots,y_k)$ it owns to perform a homomorphic addition operation on the k parameter ciphertexts $(c_1,c_2,\dots,c_k)$ to obtain a feature operation ciphertext $X'$. Specifically, for the i-th sample feature $y_i$ and the i-th parameter ciphertext $c_i$, the i-th parameter ciphertext is raised to the power of the i-th sample feature, giving k power operation results; these k results are multiplied together, and the product is used as the feature operation ciphertext $X'$, calculated as shown in the foregoing formula (3).
Then, in step 209, the data owner B sends the above-described feature operation ciphertext X' to the model owner a.
Next, in step 210, the model owner A decrypts the feature operation ciphertext $X'$ using the private key sk corresponding to the public key pk to obtain a feature operation result. As shown in the foregoing formula (4), the feature operation result obtained by decryption is the linear combination of the k sample features $(y_1,y_2,\dots,y_k)$ and the k model parameters $(m_1,m_2,\dots,m_k)$: $X = m_1 y_1 + m_2 y_2 + \dots + m_k y_k \bmod N$.
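Steps 208 to 210 can be sketched with toy Paillier as follows. The primes, the fixed random numbers, and the helper names are assumptions chosen so that the example is deterministic; decryption uses the textbook formula for g = N + 1.

```python
import math

p, q = 101, 113
N, N2 = p * q, (p * q) ** 2
LAM = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # private key (model owner A)

def enc(m, r):
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c):
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * pow(LAM, -1, N) % N

# Model owner A: encrypt the model parameters m_1..m_k.
params = [3, 1, 4]
cts = [enc(m, r) for m, r in zip(params, [17, 23, 29])]

# Data owner B (step 208): X' = prod(c_i ^ y_i) mod N^2, using only ciphertexts.
features = [2, 7, 1]
x_cipher = 1
for y, c in zip(features, cts):
    x_cipher = x_cipher * pow(c, y, N2) % N2

# Model owner A (step 210): decrypt X' to recover the linear combination.
x = dec(x_cipher)
assert x == 3 * 2 + 1 * 7 + 4 * 1   # 17
```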
Then, the model owner can perform model operations required for business processing, such as image recognition, user classification, voice recognition, text processing, and the like, based on the restored feature operation result X.
It should be understood that the above obtained feature operation result can be used in the model training phase and the model using phase. In the model training stage, the k model parameters may be temporary parameters to be optimized, the model owner performs further processing and prediction based on the feature operation result, and then compares the prediction result with the sample label, thereby adjusting the current k model parameters, i.e., updating and optimizing the model. In the stage of using the model, the k model parameters are model parameters that have been trained and tuned, and the model owner may output the prediction result for the current sample for service processing after further processing based on the feature operation result.
It can be seen from the above review of the whole process that, in the process of model data processing by multi-party combination, the model owner and the data owner can interact data in a homomorphic encryption and homomorphic operation manner, so as to obtain a feature operation result. Further, in the solution provided in this specification, the model owner also provides a zero-knowledge proof to the data owner to prove that the sent ciphertext is generated by legally encrypting the model parameter. According to the setting mode of the verification random number and the verification message in the embodiment, the zero-knowledge proof can verify the legality of the multiple parameter ciphertexts at one time without revealing any related plaintext information, so that the safety of privacy data of all parties in the model data process is further ensured.
According to an embodiment of another aspect, an apparatus for joint model data processing is provided, and the apparatus is deployed in a model owner, and the model owner can be implemented by any device, platform or device cluster with computing and processing capabilities. FIG. 3 shows a schematic block diagram of a model data processing apparatus deployed in a model owner, according to one embodiment. As shown in fig. 3, the processing apparatus 300 includes:
the parameter encryption unit 31 is configured to encrypt k model parameters in the owned first model respectively based on a pre-generated public key and k random numbers respectively generated by using a homomorphic encryption algorithm to obtain k parameter ciphertexts;
a secondary encryption unit 32 configured to randomly select a first message and a first random number; encrypting the first message by adopting the homomorphic encryption algorithm based on the public key and the first random number to obtain an auxiliary ciphertext;
a first transmitting unit 33 configured to transmit the k parameter ciphertexts and the auxiliary ciphertexts to a data owner;
a challenge number acquisition unit 34 configured to acquire k challenge numbers;
a combination unit 35 configured to linearly combine the k challenge numbers and the k model parameters, superimpose the linearly combined challenge numbers and the k model parameters on the first message, and obtain a verification message based on a superimposition result; combining the k challenge numbers with the k random numbers and the first random number to obtain a verification random number;
a second sending unit 36, configured to send a zero-knowledge proof including the verification message and the verification random number to the data owner, so that the data owner verifies whether a first result, obtained by performing a homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertext using the k challenge numbers, is equal to a second result, obtained by encrypting the verification message with the homomorphic encryption algorithm based on the public key and the verification random number;
the receiving unit 37 is configured to receive a feature operation ciphertext sent by the data owner, where the feature operation ciphertext is a result of performing homomorphic addition operation on the k parameter ciphertexts by using k sample features owned by the data owner when the data owner passes the verification, and is used by the model owner to restore a feature operation result, so as to perform service processing based on the feature operation result.
In one embodiment, the first model maintained in the model owner is a linear regression model, or a logistic regression model.
According to one embodiment, the public key comprises a natural number N and a generator h of a cyclic subgroup in a random number space defined by the natural number N; furthermore, the auxiliary encryption unit 32 is specifically configured to: performing a power operation on (N +1) using the first message to obtain a first intermediate result; performing group operation based on the generator h by using the first random number to obtain a second intermediate result; and obtaining the auxiliary ciphertext based on the first intermediate result and the second intermediate result.
Accordingly, in an example of the above embodiment, the combining unit 35 is specifically configured to: and linearly combining the k challenge numbers and the k random numbers, and superposing the k challenge numbers and the k random numbers with the first random number to obtain the verification random number.
According to another embodiment, the public key comprises at least a natural number N; correspondingly, the auxiliary encryption unit 32 is specifically configured to: performing a power operation on (N +1) using the first message to obtain a first intermediate result; performing an N power operation on the first random number to obtain a second intermediate result; and obtaining the auxiliary ciphertext based on the first intermediate result and the second intermediate result.
Accordingly, in an example of the above embodiment, the combining unit 35 is specifically configured to: sequentially performing power operation on the ith random number by using the ith challenge number for the ith challenge number and the ith random number to obtain k power operation results; and multiplying the k power operation results with the first random number to obtain the verification random number.
In one embodiment, the challenge number obtaining unit 34 is configured to: receive, from the data owner, the k challenge numbers randomly selected by the data owner.
In another embodiment, the challenge number obtaining unit 34 is configured to: and calculating the k challenge numbers by using a hash algorithm agreed with the data owner based on the k parameter ciphertexts and the auxiliary ciphertexts.
More specifically, in one example, the challenge number obtaining unit 34 obtains the k challenge numbers by calculating: arranging the k parameter ciphertexts and the auxiliary ciphertexts into a first sequence; respectively adding k preset index values at preset positions of the first sequence to obtain k second sequences; and respectively applying a preset hash function to the k second sequences to obtain the k challenge numbers.
In another example, the challenge number obtaining unit 34 obtains the k challenge numbers by calculating as follows: arranging the k parameter ciphertexts and the auxiliary ciphertexts into k sequences according to k preset sequencing modes; and respectively applying a preset hash function to the k sequences to obtain the k challenge numbers.
According to an embodiment, the combination unit 35 is configured to: reduce the superposition result modulo N and take the modulo result as the verification message, where N is the natural number in the public key.
In one embodiment, the apparatus further comprises a decryption unit (not shown) configured to: and decrypting the feature operation ciphertext by adopting a decryption algorithm corresponding to the homomorphic encryption algorithm and using a private key corresponding to the public key to obtain the feature operation result, wherein the feature operation result corresponds to the linear combination of the k sample features and the k model parameters.
According to an embodiment of another aspect, an apparatus for joint model data processing is provided, and the apparatus is deployed in a data owner, and the data owner can be implemented by any device, platform or device cluster with computing and processing capabilities. FIG. 4 shows a schematic block diagram of a model data processing apparatus deployed in a data owner, according to one embodiment. As shown in fig. 4, the processing apparatus 400 includes:
a first receiving unit 41 configured to receive, from the model owner, k parameter ciphertexts whose encryption validity is to be verified and an auxiliary cipher text; under the condition of legal encryption, the k parameter ciphertexts are obtained by respectively encrypting k model parameters by the model owner by adopting a homomorphic encryption algorithm based on a pre-generated public key and k random numbers generated respectively, and the auxiliary ciphertexts are obtained by encrypting a random first message by adopting the homomorphic encryption algorithm based on the public key and the first random number;
a challenge number determination unit 42 configured to determine k challenge numbers;
a second receiving unit 43 configured to receive a zero-knowledge proof including a verification message and a verification random number from the model owner; wherein the verification message is based on a linear combination of the k challenges with the k model parameters and a superposition with the first message, the verification nonce is based on a combination of the k challenges with the k nonces and the first nonce;
the verification unit 44 is configured to perform homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertexts by using the k challenge numbers to obtain a first result; encrypting the verification message based on the public key and the verification random number by adopting the homomorphic encryption algorithm to obtain a second result;
a homomorphic operation unit 45 configured to perform homomorphic addition operation on the k parameter ciphertexts by using the owned k sample features to obtain a feature operation cipher text when the verification unit 44 verifies that the first result is equal to the second result;
and a sending unit 46 configured to send the feature operation ciphertext to the model owner, so that the model owner restores the feature operation result, and thus performs service processing based on the feature operation result.
In one embodiment, the public key comprises a natural number N and a generator h of a cyclic subgroup in a space defined by the natural number N; accordingly, the verification unit 44 is configured to: performing a power operation on (N +1) by using the verification message to obtain a first intermediate item; performing group operation based on the generator h by using a verification random number to obtain a second intermediate item; and obtaining the second result based on the first intermediate item and the second intermediate item.
In one embodiment, the verification unit 44 is configured to: sequentially performing power operation on the ith parameter ciphertext by using the ith challenge number for the ith challenge number and the ith parameter ciphertext to obtain k power operation results; and multiplying the auxiliary ciphertext by the k power operation results, and taking a product result as the first result.
According to one embodiment, the homomorphic operation unit 45 is configured to: for each i from 1 to k, raise the ith parameter ciphertext to the power of the ith sample feature, to obtain k power operation results; and multiply the k power operation results, taking the product as the feature operation ciphertext.
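The feature operation can be illustrated with the short sketch below, which also decrypts the result on the model owner side to show that it equals the linear combination of features and parameters. The Paillier-style decryption with λ = lcm(p-1, q-1), the toy key, the generator choice and all names are assumptions for the example; the embodiment itself does not prescribe them.

```python
import math

P, Q = 65537, 2147483647                          # toy primes (private key material)
N = P * Q
N2 = N * N
H = pow(3, N, N2)                                 # assumed generator h
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1) # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                              # inverse of lambda mod N


def encrypt(message: int, rand: int) -> int:
    return pow(N + 1, message % N, N2) * pow(H, rand, N2) % N2


def decrypt(ct: int) -> int:
    """Paillier-style decryption: L(ct^lambda mod N^2) * mu mod N."""
    u = pow(ct, LAM, N2)
    return (u - 1) // N * MU % N


def feature_ciphertext(param_cts, features):
    """Product of C_i^x_i mod N^2, computed by the data owner."""
    acc = 1
    for ct, x in zip(param_cts, features):
        acc = acc * pow(ct, x, N2) % N2
    return acc


weights = [3, 5, 2]                               # model parameters (toy values)
param_cts = [encrypt(w, r) for w, r in zip(weights, [11, 22, 33])]
features = [10, 20, 30]                           # sample features (toy values)
feature_ct = feature_ciphertext(param_cts, features)
restored = decrypt(feature_ct)                    # model owner side
```

The data owner only ever handles ciphertexts, and the model owner only sees the combined value rather than the individual features.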
According to one embodiment, the challenge number determination unit 42 is configured to: randomly determine the k challenge numbers and send them to the model owner.
According to another embodiment, the challenge number determination unit 42 is configured to: calculate the k challenge numbers based on the k parameter ciphertexts and the auxiliary ciphertext, by using a hash algorithm agreed with the model owner.
More specifically, in one example, the challenge number determination unit 42 calculates the k challenge numbers as follows: arranging the k parameter ciphertexts and the auxiliary ciphertext into a first sequence; adding each of k preset index values at a preset position of the first sequence to obtain k second sequences; and applying a preset hash function to each of the k second sequences to obtain the k challenge numbers.
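A minimal sketch of this index-based derivation is given below. The use of SHA-256, the fixed-width big-endian serialization, the index values 1 to k, and appending the index at the end of the sequence are all assumptions; the example only fixes that both parties must agree on them in advance.

```python
import hashlib


def derive_challenges(param_cts, aux_ct, byte_len=64):
    """Derive k challenge numbers by hashing the first sequence plus an index."""
    k = len(param_cts)
    # First sequence: the k parameter ciphertexts followed by the auxiliary ciphertext.
    first_sequence = b"".join(
        ct.to_bytes(byte_len, "big") for ct in [*param_cts, aux_ct]
    )
    challenges = []
    for index in range(1, k + 1):
        # Second sequence: the first sequence with the index value appended.
        second_sequence = first_sequence + index.to_bytes(4, "big")
        digest = hashlib.sha256(second_sequence).digest()
        challenges.append(int.from_bytes(digest, "big"))
    return challenges


example = derive_challenges([11, 22, 33], 44)   # toy ciphertext values
```

Since both parties hold the same ciphertexts, each can compute the same challenge numbers locally, so no challenge message needs to be exchanged.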
In another example, the challenge number determination unit 42 calculates the k challenge numbers as follows: arranging the k parameter ciphertexts and the auxiliary ciphertext into k sequences according to k preset ordering modes; and applying a preset hash function to each of the k sequences to obtain the k challenge numbers.
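The ordering-based derivation might look like the sketch below, in which the k preset ordering modes are assumed to be cyclic rotations of the ciphertext list. The rotation scheme, SHA-256 and the fixed-width serialization are illustrative assumptions only; any k orderings agreed by both parties would fit the description.

```python
import hashlib


def derive_challenges_by_ordering(param_cts, aux_ct, byte_len=64):
    """Derive k challenge numbers by hashing k rotations of the ciphertext list."""
    items = [ct.to_bytes(byte_len, "big") for ct in [*param_cts, aux_ct]]
    k = len(param_cts)
    challenges = []
    for shift in range(k):
        # Assumed ordering mode number `shift`: rotate the list left by `shift`.
        ordered = items[shift:] + items[:shift]
        digest = hashlib.sha256(b"".join(ordered)).digest()
        challenges.append(int.from_bytes(digest, "big"))
    return challenges


example = derive_challenges_by_ordering([11, 22, 33], 44)   # toy values
```

Rotations yield distinct sequences whenever the ciphertexts are pairwise distinct, which holds with overwhelming probability for randomized encryption.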
According to a specific embodiment, the k sample features may include one of: user attribute features, picture pixel features, audio features, text coding features.
By means of the device 300 and the device 400, the multi-party joint model data processing is realized while the security of private data is protected.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (25)

1. A method for model data processing by a multi-party federation that protects data privacy, the multiple parties including a model owner and a data owner, the method performed by the model owner, comprising:
respectively encrypting k model parameters in the owned first model by adopting a homomorphic encryption algorithm based on a pre-generated public key and k random numbers generated respectively to obtain k parameter ciphertexts;
randomly selecting a first message and a first random number; encrypting the first message by adopting the homomorphic encryption algorithm based on the public key and the first random number to obtain an auxiliary ciphertext;
sending the k parameter ciphertexts and the auxiliary ciphertexts to the data owner;
acquiring k challenge numbers;
linearly combining the k challenge numbers with the k model parameters, superposing the linear combination with the first message, and obtaining a verification message based on the superposition result; combining the k challenge numbers with the k random numbers and the first random number to obtain a verification random number;
sending a zero-knowledge proof to the data owner, wherein the zero-knowledge proof comprises the verification message and a verification random number, so that the data owner verifies whether a first result obtained by homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertexts by using the k challenge numbers is equal to a second result obtained by encrypting the verification message by using the homomorphic encryption algorithm based on the public key and the verification random number;
and receiving a feature operation ciphertext sent by the data owner, wherein the feature operation ciphertext is a result of homomorphic addition operation on the k parameter ciphertexts by using k sample features owned by the data owner under the condition that the data owner passes the verification, and the feature operation ciphertext is used for restoring a feature operation result by the model owner so as to perform service processing based on the feature operation result.
2. The method of claim 1, wherein the first model is a linear regression model, or a logistic regression model.
3. The method of claim 1, wherein the public key comprises a natural number N and a generator h of a cyclic subgroup in a random number space defined by the natural number N;
encrypting the first message by using the homomorphic encryption algorithm based on the public key and the first random number to obtain an auxiliary ciphertext, comprising:
performing a power operation on (N +1) using the first message to obtain a first intermediate result;
performing group operation based on the generator h by using the first random number to obtain a second intermediate result;
and obtaining the auxiliary ciphertext based on the first intermediate result and the second intermediate result.
4. The method of claim 3, wherein combining the k challenge numbers with the k random numbers and the first random number to obtain a verification random number comprises:
linearly combining the k challenge numbers with the k random numbers, and superposing the linear combination with the first random number to obtain the verification random number.
5. The method of claim 1, wherein the public key comprises at least a natural number N;
encrypting the first message by using the homomorphic encryption algorithm based on the public key and the first random number to obtain an auxiliary ciphertext, comprising:
performing a power operation on (N +1) using the first message to obtain a first intermediate result;
performing an N power operation on the first random number to obtain a second intermediate result;
and obtaining the auxiliary ciphertext based on the first intermediate result and the second intermediate result.
6. The method of claim 5, wherein combining the k challenge numbers with the k random numbers and the first random number to obtain a verification random number comprises:
for each i from 1 to k, raising the ith random number to the power of the ith challenge number, to obtain k power operation results;
and multiplying the k power operation results with the first random number to obtain the verification random number.
7. The method of claim 1, wherein the obtaining k challenge numbers comprises:
receiving, from the data owner, the k challenge numbers randomly selected by the data owner.
8. The method of claim 1, wherein the obtaining k challenge numbers comprises:
and calculating the k challenge numbers by using a hash algorithm agreed with the data owner based on the k parameter ciphertexts and the auxiliary ciphertexts.
9. The method of claim 8, wherein calculating the k challenges comprises:
arranging the k parameter ciphertexts and the auxiliary ciphertexts into a first sequence;
respectively adding k preset index values at preset positions of the first sequence to obtain k second sequences;
and respectively applying a preset hash function to the k second sequences to obtain the k challenge numbers.
10. The method of claim 8, wherein calculating the k challenges comprises:
arranging the k parameter ciphertexts and the auxiliary ciphertexts into k sequences according to k preset sequencing modes;
and respectively applying a preset hash function to the k sequences to obtain the k challenge numbers.
11. The method of claim 3, wherein obtaining a verification message based on the superposition result comprises:
taking the superposition result modulo N, and using the modulo result as the verification message.
12. The method of claim 1, further comprising: and decrypting the feature operation ciphertext by adopting a decryption algorithm corresponding to the homomorphic encryption algorithm and using a private key corresponding to the public key to obtain the feature operation result, wherein the feature operation result corresponds to the linear combination of the k sample features and the k model parameters.
13. A method for model data processing by federation of multiple parties for protecting data privacy, the multiple parties including a model owner and a data owner, the method performed by the data owner, comprising:
receiving k parameter ciphertexts and auxiliary ciphertexts of which the encryption validity is to be verified from the model owner; under the condition of legal encryption, the k parameter ciphertexts are obtained by respectively encrypting k model parameters by the model owner by adopting a homomorphic encryption algorithm based on a pre-generated public key and k random numbers generated respectively, and the auxiliary ciphertexts are obtained by encrypting a random first message by adopting the homomorphic encryption algorithm based on the public key and the first random number;
determining k challenge numbers;
receiving, from the model owner, a zero-knowledge proof including a verification message and a verification random number; wherein the verification message is based on a linear combination of the k challenge numbers with the k model parameters, superposed with the first message, and the verification random number is based on a combination of the k challenge numbers with the k random numbers and the first random number;
performing homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertexts by using the k challenge numbers to obtain a first result; encrypting the verification message based on the public key and the verification random number by adopting the homomorphic encryption algorithm to obtain a second result;
under the condition that the first result is equal to the second result, performing a homomorphic addition operation on the k parameter ciphertexts by using the k owned sample features to obtain a feature operation ciphertext;
and sending the feature operation ciphertext to the model owner, wherein the feature operation ciphertext is used for restoring a feature operation result by the model owner, so that business processing is performed based on the feature operation result.
14. The method of claim 13, wherein the public key comprises a natural number N and a generator h of a cyclic subgroup in a random number space defined by the natural number N;
encrypting the verification message based on the public key and the verification random number to obtain a second result, comprising:
performing a power operation on (N +1) using the verification message to obtain a first intermediate term;
performing group operation based on the generator h by using the verification random number to obtain a second intermediate item;
and obtaining the second result based on the first intermediate item and the second intermediate item.
15. The method of claim 13, wherein homomorphically summing the k parameter ciphertexts and the auxiliary ciphertexts using the k challenge numbers to obtain a first result comprises:
for each i from 1 to k, raising the ith parameter ciphertext to the power of the ith challenge number, to obtain k power operation results;
and multiplying the auxiliary ciphertext by the k power operation results to obtain the first result based on a product result.
16. The method of claim 13, wherein homomorphic summing the k parameter ciphertexts using the owned k sample features to obtain a feature operation cipher text, comprises:
for each i from 1 to k, raising the ith parameter ciphertext to the power of the ith sample feature, to obtain k power operation results;
and multiplying the k power operation results to obtain the feature operation ciphertext based on the multiplication result.
17. The method of claim 13, wherein the determining k challenge numbers comprises:
randomly determining the k challenge numbers and sending them to the model owner.
18. The method of claim 13, wherein the determining k challenge numbers comprises:
and calculating to obtain the k challenge numbers by using a hash algorithm agreed with the model owner based on the k parameter ciphertexts and the auxiliary ciphertexts.
19. The method of claim 18, wherein calculating the k challenges comprises:
arranging the k parameter ciphertexts and the auxiliary ciphertexts into a first sequence;
respectively adding k preset index values at preset positions of the first sequence to obtain k second sequences;
and respectively applying a preset hash function to the k second sequences to obtain the k challenge numbers.
20. The method of claim 18, wherein calculating the k challenges comprises:
arranging the k parameter ciphertexts and the auxiliary ciphertexts into k sequences according to k preset sequencing modes;
and respectively applying a preset hash function to the k sequences to obtain the k challenge numbers.
21. The method of claim 13, wherein the k sample features comprise one of: user attribute features, picture pixel features, audio features, text coding features.
22. An apparatus for model data processing by combining multiple parties for protecting data privacy, the multiple parties including a model owner and a data owner, the apparatus being deployed in the model owner, comprising:
the parameter encryption unit is configured to encrypt k model parameters in the owned first model respectively based on a pre-generated public key and k random numbers respectively generated by adopting a homomorphic encryption algorithm to obtain k parameter ciphertexts;
a secondary encryption unit configured to randomly select a first message and a first random number; encrypting the first message by adopting the homomorphic encryption algorithm based on the public key and the first random number to obtain an auxiliary ciphertext;
a first transmitting unit configured to transmit the k parameter ciphertexts and the auxiliary ciphertexts to the data owner;
a challenge number acquisition unit configured to acquire k challenge numbers;
the combination unit is configured to linearly combine the k challenges with the k model parameters, superimpose the linearly combined challenges with the first message, and obtain a verification message based on a superimposition result; combining the k challenge numbers with the k random numbers and the first random number to obtain a verification random number;
a second sending unit configured to send a zero-knowledge proof including the verification message and the verification random number to the data owner, so that the data owner verifies whether a first result, obtained by performing a homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertext by using the k challenge numbers, is equal to a second result, obtained by encrypting the verification message by using the homomorphic encryption algorithm based on the public key and the verification random number;
and the receiving unit is configured to receive a feature operation ciphertext sent by the data owner, wherein the feature operation ciphertext is a result of homomorphic addition operation on the k parameter ciphertexts by using k sample features owned by the data owner under the condition that the data owner passes the verification, and the feature operation ciphertext is used for restoring a feature operation result by the model owner so as to perform service processing based on the feature operation result.
23. An apparatus for model data processing by combining multiple parties for protecting data privacy, the multiple parties including a model owner and a data owner, the apparatus being deployed in the data owner, comprising:
a first receiving unit configured to receive, from the model owner, k parameter ciphertexts whose encryption validity is to be verified and an auxiliary cipher text; under the condition of legal encryption, the k parameter ciphertexts are obtained by respectively encrypting k model parameters by the model owner by adopting a homomorphic encryption algorithm based on a pre-generated public key and k random numbers generated respectively, and the auxiliary ciphertexts are obtained by encrypting a random first message by adopting the homomorphic encryption algorithm based on the public key and the first random number;
a challenge number determination unit configured to determine k challenge numbers;
a second receiving unit configured to receive, from the model owner, a zero-knowledge proof including a verification message and a verification random number; wherein the verification message is based on a linear combination of the k challenge numbers with the k model parameters, superposed with the first message, and the verification random number is based on a combination of the k challenge numbers with the k random numbers and the first random number;
a verification unit configured to perform a homomorphic addition operation on the k parameter ciphertexts and the auxiliary ciphertext by using the k challenge numbers to obtain a first result, and to encrypt the verification message by using the homomorphic encryption algorithm based on the public key and the verification random number to obtain a second result;
a homomorphic operation unit configured to, under the condition that the verification unit verifies that the first result is equal to the second result, perform a homomorphic addition operation on the k parameter ciphertexts by using the k owned sample features to obtain a feature operation ciphertext;
and the sending unit is configured to send the feature operation ciphertext to the model owner, so that the model owner restores a feature operation result, and business processing is performed based on the feature operation result.
24. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-21.
25. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-21.
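For illustration only, the variant of claims 5 and 6, in which the second intermediate result is the N-th power of the random number and the verification random number is a product of powers, can be sketched as follows. The toy primes, the 32-bit challenge range, the reduction of the verification random number modulo N, and all names are assumptions made for the example and do not form part of the claims.

```python
import math
import secrets

P, Q = 65537, 2147483647      # toy primes; real keys are far larger
N = P * Q
N2 = N * N


def rand_unit() -> int:
    """Pick a random number in [1, N) that is coprime to N."""
    while True:
        r = 1 + secrets.randbelow(N - 1)
        if math.gcd(r, N) == 1:
            return r


def encrypt(message: int, rand: int) -> int:
    """(N+1)^message * rand^N mod N^2."""
    return pow(N + 1, message % N, N2) * pow(rand, N, N2) % N2


# Model owner side (toy values).
weights = [17, 4, 250]
rands = [rand_unit() for _ in weights]
param_cts = [encrypt(w, r) for w, r in zip(weights, rands)]
m0, r0 = secrets.randbelow(N), rand_unit()
aux_ct = encrypt(m0, r0)

challenges = [1 + secrets.randbelow(2**32) for _ in weights]

# Verification message: linear combination superposed with m0, taken modulo N.
z = (m0 + sum(e * w for e, w in zip(challenges, weights))) % N
# Verification random number: r0 times the product of r_i^e_i.
s = r0
for e, r in zip(challenges, rands):
    s = s * pow(r, e, N) % N

# Data owner side: first result versus second result.
first = aux_ct
for ct, e in zip(param_cts, challenges):
    first = first * pow(ct, e, N2) % N2
second = encrypt(z, s)
proof_ok = first == second
```

Reducing s modulo N is harmless here because r^N mod N² depends only on r mod N.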
CN201911298674.3A 2019-12-17 2019-12-17 Method and device for processing model data by combining multiple parties Active CN110991655B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911298674.3A CN110991655B (en) 2019-12-17 2019-12-17 Method and device for processing model data by combining multiple parties
PCT/CN2020/123982 WO2021120861A1 (en) 2019-12-17 2020-10-27 Method and apparatus for multi-party joint model data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911298674.3A CN110991655B (en) 2019-12-17 2019-12-17 Method and device for processing model data by combining multiple parties

Publications (2)

Publication Number Publication Date
CN110991655A true CN110991655A (en) 2020-04-10
CN110991655B CN110991655B (en) 2021-04-02

Family

ID=70094376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911298674.3A Active CN110991655B (en) 2019-12-17 2019-12-17 Method and device for processing model data by combining multiple parties

Country Status (2)

Country Link
CN (1) CN110991655B (en)
WO (1) WO2021120861A1 (en)

Also Published As

Publication number Publication date
CN110991655B (en) 2021-04-02
WO2021120861A1 (en) 2021-06-24

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40026957)

GR01 Patent grant
GR01 Patent grant