WO2021190424A1 - Method and apparatus for multi-party joint dimensionality reduction processing of private data - Google Patents
- Publication number
- WO2021190424A1 (PCT/CN2021/081962)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- matrix
- party
- holders
- attribute
- dimensionality reduction
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0442—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
Definitions
- One or more embodiments of this specification relate to the field of machine learning, and more particularly to multi-party joint dimensionality reduction processing for private data.
- the data required for machine learning often involves multiple platforms and multiple fields.
- the electronic payment platform owns the merchant's transaction flow data
- the e-commerce platform stores the merchant's sales data
- the banking institution owns the merchant's loan data.
- Data often exists in the form of data islands. Due to industry competition, data security, user privacy, and other concerns, data integration faces great resistance. How to integrate data scattered across various platforms without data leakage has become a challenge.
- the dimensions of various training data become larger and larger.
- this high-dimensional data often contains redundant information. Redundant information contributes very little to the effect of machine learning, but the resulting high-dimensional feature data may cause a "dimensionality explosion" that makes the machine learning model difficult to process and reduces training efficiency. Therefore, when training and using a model, high-dimensional sample features are often reduced in dimensionality, converting them into low-dimensional features while losing as little information as possible.
- the PCA (Principal Component Analysis) method is a method for statistically analyzing and simplifying data sets. It uses an orthogonal transformation to linearly transform the observed values of a series of possibly correlated variables, projecting them onto a set of linearly uncorrelated variables called principal components. Principal component analysis can be used to reduce the dimensionality of a data set while retaining the features that contribute most to its variance. Therefore, in practice, the PCA method is often used to reduce the dimensionality of high-dimensional features.
- the PCA method of principal component analysis generally requires a unified transformation and principal component extraction for all data.
- how to use the PCA method to reduce the dimensionality of features without leaking private data has become a problem to be solved.
- One or more embodiments of this specification describe a method for multi-party joint dimensionality reduction of private data, so that multiple parties can jointly perform feature dimensionality reduction while ensuring that their private data is not leaked.
- a method for multi-party joint dimensionality reduction processing for private data is provided.
- the private data is distributed among M holders, where any k-th holder stores a k-th original matrix formed by the attribute values of D predetermined attributes of a number of business objects.
- the method is executed by a third party other than the M holders, and includes: receiving the homomorphic summation result of M encrypted matrix products respectively provided by the M holders; wherein the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the public key of the third party, the product of the k-th central matrix and its transposed matrix;
- the k-th central matrix is obtained by performing, among the M holders, global zero averaging of the attributes in the k-th original matrix; the private key corresponding to the public key is used to decrypt the homomorphic summation result to obtain the covariance matrix; the dimensionality reduction transformation matrix is determined based on the covariance matrix and the target dimension d; and the dimensionality reduction transformation matrix is broadcast to the M holders, so that each holder processes its original matrix with it to obtain a corresponding dimensionality reduction matrix.
- after the above steps, the third party also receives the M dimensionality reduction matrices provided by the M holders, and based on them determines the total dimensionality reduction matrix obtained by reducing the dimensionality of the D attributes of all business objects.
- the business object may be one of the following: a user, a merchant, a commodity, or an event; the business prediction analysis includes predicting a classification or regression value of the business object.
- the third party may also perform the following steps to assist each holder in performing global zero averaging: for any attribute i among the D attributes, receiving the encrypted sum for attribute i, which is obtained by homomorphically summing the M encrypted attribute sums provided by the M holders, where the k-th encrypted attribute sum is obtained by the k-th holder homomorphically encrypting, with the public key of the third party, the summation result of the values of attribute i in the k-th original matrix; using the private key to decrypt the encrypted sum to obtain the global sum of attribute i; determining the global mean of attribute i according to the global sum; and broadcasting the global mean to the M holders, so that each of them performs global zero averaging on attribute i in its original matrix.
- the foregoing receiving of the encrypted sum for attribute i may be: receiving the encrypted sum from one of the M holders; or receiving it from a party other than the M holders and the third party.
- the third party receiving the aforementioned homomorphic summation result may include: receiving it from one of the M holders; or receiving it from a party other than the M holders and the third party.
- in one embodiment, a row in the k-th original matrix corresponds to an attribute, and a column corresponds to a business object; in this case, the covariance matrix is the product of the joint matrix, formed by horizontally splicing the central matrices of the M holders, and its transposed matrix; the total dimensionality reduction matrix is obtained by horizontally splicing the M dimensionality reduction matrices.
- in another embodiment, a row of the k-th original matrix corresponds to a business object, and a column corresponds to an attribute; in this case, the covariance matrix is the product of the joint matrix, formed by vertically splicing the central matrices of the M holders, and its transposed matrix; the total dimensionality reduction matrix is obtained by vertically splicing the M dimensionality reduction matrices.
- the third party determines the dimensionality reduction transformation matrix as follows: determining the multiple eigenvalues and corresponding eigenvectors of the covariance matrix; from these eigenvalues, selecting the d largest as the d target eigenvalues; and forming the dimensionality reduction transformation matrix from the d eigenvectors corresponding to the d target eigenvalues.
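The eigenvalue selection just described can be sketched in NumPy; the variable names (X, A, d, T) and matrix sizes are illustrative, not taken from the patent itself:

```python
import numpy as np

# Hedged sketch of determining the transformation matrix from a covariance
# matrix: take the d eigenvectors belonging to the d largest eigenvalues.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # 100 samples, D = 5 attributes
X -= X.mean(axis=0)                        # globally zero-averaged
A = X.T @ X                                # D x D covariance-style matrix

d = 2
eigvals, eigvecs = np.linalg.eigh(A)       # ascending order for symmetric A
top = np.argsort(eigvals)[::-1][:d]        # indices of the d largest eigenvalues
T = eigvecs[:, top]                        # D x d transformation matrix
```

Because the covariance matrix is symmetric, `eigh` returns orthonormal eigenvectors, so the selected columns of T are mutually orthogonal projection directions.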
- a method for multi-party joint dimensionality reduction processing for private data is provided.
- the private data is distributed among M holders, where any k-th holder stores a k-th original matrix formed by the attribute values of D predetermined attributes of a number of business objects.
- the method is executed by the k-th holder and includes: performing, among the M holders, global zero averaging of each attribute in the k-th original matrix to obtain the k-th central matrix; calculating the product matrix of the k-th central matrix and its transposed matrix, and homomorphically encrypting the product matrix with the public key of a third party other than the M holders to obtain the k-th encrypted matrix product; providing the k-th encrypted matrix product so that the third party obtains the homomorphic summation result of the M encrypted matrix products provided by the M holders;
- receiving the dimensionality reduction transformation matrix from the third party; and processing the k-th original matrix with the dimensionality reduction transformation matrix to obtain the k-th dimensionality reduction matrix, which is used for subsequent business prediction analysis.
- after obtaining the k-th dimensionality reduction matrix, the k-th holder also provides it to the third party, so that the third party determines the total dimensionality reduction matrix obtained by reducing the dimensionality of the D attributes of all business objects.
- performing global zero averaging among the M holders for each attribute in the k-th original matrix includes: for any attribute i among the D attributes, calculating the summation result of the values of attribute i in the k-th original matrix; homomorphically encrypting the summation result with the public key to obtain the k-th encrypted attribute sum; providing the k-th encrypted attribute sum so that the third party can obtain the homomorphic summation result of the M encrypted attribute sums provided by the M holders; receiving from the third party the global mean of attribute i determined based on that homomorphic summation result; and subtracting the global mean from the elements corresponding to attribute i in the k-th original matrix, so as to perform global zero averaging on attribute i.
- the k-th holder sends the k-th encrypted attribute sum to a calculation executor, so that the calculation executor homomorphically sums the M encrypted attribute sums and sends the result to the third party; the calculation executor is a party other than the k-th holder among the M holders, or a party other than the M holders and the third party.
- the k-th holder receives the corresponding M-1 encrypted attribute sums from the M-1 other holders, homomorphically sums them together with its own encrypted attribute sum, and sends the homomorphic summation result to the third party.
- in this way, the k-th holder provides the k-th encrypted attribute sum so that the third party can obtain the homomorphic summation result of the M encrypted attribute sums provided by the M holders.
- the k-th holder sends the k-th encrypted matrix product to a calculation executor, so that the calculation executor homomorphically sums the M encrypted matrix products and sends the result to the third party; the calculation executor is a party other than the k-th holder among the M holders, or a party other than the M holders and the third party.
- the k-th holder receives the corresponding M-1 encrypted matrix products from the M-1 other holders, homomorphically sums them together with its own k-th encrypted matrix product, and sends the homomorphic summation result to the third party.
- in this way, the k-th holder provides the k-th encrypted matrix product so that the third party obtains the homomorphic summation result of the M encrypted matrix products respectively provided by the M holders.
- a row in the k-th original matrix corresponds to an attribute, and a column corresponds to a business object; in this case, the dimensionality reduction is performed by multiplying the k-th original matrix with the dimensionality reduction transformation matrix.
- a row in the k-th original matrix corresponds to a business object, and a column corresponds to an attribute; in this case, the dimensionality reduction is likewise performed by multiplying the k-th original matrix with the dimensionality reduction transformation matrix.
- a device for multi-party joint dimensionality reduction processing for private data is provided.
- the private data is distributed among M holders, where any k-th holder stores a k-th original matrix formed by the attribute values of D predetermined attributes of a number of business objects.
- the device is deployed in a third party other than the M holders, and includes: a receiving unit configured to receive the homomorphic summation result of the M encrypted matrix products respectively provided by the M holders, wherein the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the public key of the third party, the product of the k-th central matrix and its transposed matrix, and the k-th central matrix is obtained by performing, among the M holders, global zero averaging of the attributes in the k-th original matrix; a decryption unit configured to decrypt the homomorphic summation result with the private key corresponding to the public key to obtain the covariance matrix; a determining unit configured to determine the dimensionality reduction transformation matrix based on the covariance matrix and the target dimension d; and a broadcasting unit configured to broadcast the dimensionality reduction transformation matrix to the M holders, so that each holder processes its original matrix with it to obtain a corresponding dimensionality reduction matrix.
- a device for multi-party joint dimensionality reduction processing for private data is provided.
- the private data is distributed among M holders, where any k-th holder stores a k-th original matrix formed by the attribute values of D predetermined attributes of a number of business objects.
- the device is deployed in the k-th holder, and includes: an averaging unit configured to perform, among the M holders, global zero averaging of each attribute in the k-th original matrix to obtain the k-th central matrix;
- an encryption unit configured to calculate the product matrix of the k-th central matrix and its transposed matrix, and to homomorphically encrypt the product matrix with the public key of a third party other than the M holders, obtaining the k-th encrypted matrix product;
- a providing unit configured to provide the k-th encrypted matrix product, so that the third party obtains the homomorphic summation result of the M encrypted matrix products respectively provided by the M holders;
- a receiving unit configured to receive the dimensionality reduction transformation matrix broadcast by the third party.
- a computer-readable storage medium is provided, having a computer program stored thereon; when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect or the second aspect.
- a computing device is provided, including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of the first aspect or the second aspect is implemented.
- a neutral third party is introduced.
- the third party aggregates the matrix products of each holder to obtain the covariance matrix.
- the dimensionality reduction transformation matrix used for dimensionality reduction processing is determined.
- each holder can reduce the dimensionality of the local data based on the dimensionality reduction transformation matrix, and finally form the overall dimensionality-reduced data. In this way, the security of private data is ensured.
- Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification
- Figure 2 shows the execution process of the PCA method of principal component analysis
- FIG. 3 shows a schematic diagram of multiple parties jointly performing zero-averaging processing in an embodiment
- FIG. 4 shows a schematic diagram of a process of joint dimensionality reduction by multiple parties based on their central matrix in an embodiment
- Fig. 5 shows a schematic block diagram of an apparatus for joint dimensionality reduction deployed in a third party according to an embodiment
- Fig. 6 shows a schematic block diagram of an apparatus for joint dimensionality reduction deployed in the k-th holder according to an embodiment.
- FIG 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
- the training data is provided by multiple holders 1, 2, ..., M, and each holder owns a part of the training data.
- the training data can be attribute feature data of business objects, and the business objects can be users, merchants, commodities, events (such as transaction events, login events) and other business objects that need to be analyzed.
- each holder has data of the same attribute items of different object samples.
- the attribute feature data of these business objects is private data and can be stored as a private data matrix. For the security of the private data, every holder needs to keep its private data locally, not output plaintext data, and not perform plaintext aggregation.
- the above multiple holders adopt the principal component analysis PCA method to jointly perform dimensionality reduction of the training data.
- the core step of the PCA method is to form a covariance matrix based on the data matrix to be reduced in dimension, and to solve the eigenvalues and eigenvectors of the covariance matrix.
- a neutral third-party device is introduced in order to ensure the security of the privacy data of each holder.
- each holder k performs operations on its private data matrix locally to obtain the product matrix C k , and then aggregates it to the third party through homomorphic encryption; the third party completes the computation over the covariance matrix to obtain the dimensionality reduction transformation matrix. Each holder can then perform dimensionality reduction on its local private matrix based on the dimensionality reduction transformation matrix, achieving secure joint data dimensionality reduction.
- a D*N-dimensional original matrix Y can be formed based on the D-dimensional feature data of the above N samples. For example, a row can be used to represent an attribute, and a column can be used to represent a sample, thus forming an original matrix Y with D rows and N columns.
- in step 202, zero-averaging processing, also called centering, is performed on the original matrix to obtain a central matrix X.
- the goal of centering is that, for any of the D attributes, the mean of the attribute values of all N samples on that attribute is 0.
- for any attribute i, the mean of the attribute values of all N samples on attribute i can be obtained first, and then the mean is subtracted from the elements corresponding to attribute i in the original matrix. For example, when a row represents an attribute, the average of each row of the original matrix is obtained, and each element of the row is reduced by that average, yielding the zero-averaged central matrix X.
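A minimal NumPy illustration of this row-wise centering (the matrix shape is illustrative):

```python
import numpy as np

# With rows as attributes and columns as samples, subtract each row's mean
# so that every attribute averages to zero across all samples.
rng = np.random.default_rng(1)
Y = rng.uniform(0.0, 10.0, size=(4, 6))    # D = 4 attributes, N = 6 samples
X = Y - Y.mean(axis=1, keepdims=True)      # central matrix X
```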
- in step 204, the eigenvalues λ (eigenvalue) and eigenvectors ν (eigenvector) of the covariance matrix A are solved. The eigenvalue λ and eigenvector ν of the covariance matrix A satisfy: Aν = λν.
- the covariance matrix is a symmetric matrix, there are multiple eigenvalues and corresponding multiple eigenvectors, and the multiple eigenvectors are orthogonal to each other.
- the eigenvalues and eigenvectors of the covariance matrix can be solved by a variety of algorithms, such as eigenvalue decomposition. Multiple eigenvectors can form an eigen matrix.
- step 205 the dimensionality reduction transformation matrix T is determined.
- an eigenvector means a projection direction in the original D-dimensional space.
- the multiple eigenvectors are orthogonal to each other, which means that the multiple projection directions are orthogonal to each other.
- the essence of PCA feature dimensionality reduction is to find d mutually orthogonal projection directions in the original D-dimensional space as d coordinate directions, and to project the original sample points into the d-dimensional mapping space formed by those d coordinate directions, so that the variance of the projected sample points is as large as possible.
- the variance after projection in various directions can be represented by eigenvalues.
- specifically, the obtained eigenvalues can be sorted from largest to smallest, the first d taken as the target eigenvalues, and the d eigenvectors corresponding to these d target eigenvalues determined.
- the d eigenvectors correspond to the d coordinate directions selected for dimensionality reduction.
- the above d eigenvectors constitute a dimensionality reduction transformation matrix T.
- in step 206, by applying the dimensionality reduction transformation matrix T to the original matrix Y, a dimensionality reduction matrix Y′ can be obtained, where Y′ is d*N-dimensional.
- transforming the original matrix Y into the matrix Y′ is equivalent to reducing the original D-dimensional features to d dimensions.
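The steps above can be sketched end to end; this assumes the D x N layout described here, with T built from the top-d eigenvectors (variable names and sizes are illustrative):

```python
import numpy as np

# Project the D x N original matrix into d dimensions using the
# transformation matrix T formed from the top-d eigenvectors.
rng = np.random.default_rng(2)
D, N, d = 5, 50, 2
Y = rng.standard_normal((D, N))                # original matrix
X = Y - Y.mean(axis=1, keepdims=True)          # central matrix
A = X @ X.T                                    # D x D covariance matrix
vals, vecs = np.linalg.eigh(A)
T = vecs[:, np.argsort(vals)[::-1][:d]]        # D x d, top-d eigenvectors
Y_prime = T.T @ Y                              # d x N dimensionality-reduced matrix
```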
- a third party is introduced in addition to the M holders.
- the third party does not need to be a trusted computing device, but it needs to be a neutral third party, that is, one that will not collude with any holder to jointly reverse-derive the data of other parties. Based on this third party, through homomorphic encryption, the covariance matrix jointly formed from the private data of the holders is determined, and the dimensionality reduction transformation matrix is then obtained, thereby performing secure joint dimensionality reduction.
- the business object is a user.
- One of the M data holders is a social platform, and has basic user attributes of n users. These attributes include, for example, user id, age, gender, occupation, and region.
- another of the M data holders is, for example, a second social platform that has the same basic attribute features for another m users. In this way, feature data for the same attribute items of different users is horizontally distributed across different data holders.
- the business object is a merchant.
- One of the M data holders is an e-commerce platform and has attributes of n merchants. These attributes include, for example, merchant id, operating time, merchant category, accumulated sales volume, and so on.
- another of the M data holders is, for example, a second e-commerce platform that has the aforementioned attribute features for another m merchants. In this way, feature data for the same attribute items of different merchants is horizontally distributed across different data holders.
- the business object may also be a commodity to be analyzed, an event, etc., where the event may include a transaction event, a login event, a purchase event, a social event, and so on.
- each holder stores the attribute characteristics of the corresponding item.
- business objects are also referred to as samples.
- the attribute feature data of the samples owned by each holder is private data and can be stored in the form of a matrix. For example, assume that among the M data holders, any k-th holder stores the values of D predetermined attributes for N k samples; the D-dimensional attribute values of these N k samples can then form an original matrix Y k . The other holders respectively store the attribute values of the same D attributes for other samples, forming their corresponding original matrices.
- since each holder has the same attribute items for different samples, to facilitate subsequent processing, in the preliminary stage before joint dimensionality reduction each holder needs to align in the attribute dimension, that is, arrange the sample data in a unified, predetermined attribute order. Through this attribute alignment, the D attribute items are arranged in the same order in every holder's original matrix.
- in the original matrix, each column represents an attribute and each row represents a sample, so the k-th holder forms an original matrix of dimension N k *D. The following description first assumes this matrix arrangement.
- each holder similarly forms the original matrix. If the original matrix of each holder is spliced along the vertical direction, the original full matrix Y can be formed:
- the original full matrix is an N*D-dimensional matrix, in which each column represents an attribute (D columns in total) and each row represents a sample (N rows in total), where N is the number of all samples.
- the holders do not directly aggregate the plaintext of the original data; the original full matrix is merely assumed for convenience of description.
- in order to reduce the dimensionality of the data, the attributes must first be zero-averaged. However, with horizontally distributed data, for any attribute i among the D attributes, the k-th holder only has the attribute values of its own N k samples, while zero averaging requires the values of attribute i over all N samples, that is, a global mean computed among the M holders. Therefore, in this step, all parties need to cooperate to perform the zero-averaging together.
- Fig. 3 shows a schematic diagram of the zero-averaging process performed by multiple parties in one embodiment.
- homomorphic encryption is used to perform zero-averaging processing.
- the global zero averaging of the attribute is realized in the following way.
- in step S31, each holder k calculates, from its original matrix Y k , the summation result S k of the attribute values of attribute i.
- since a column in the original matrix Y k represents an attribute, the sum of the elements in the i-th column of Y k corresponding to attribute i is calculated, giving the summation result S k .
- in step S32, each holder k uses the public key of the third party to homomorphically encrypt the above summation result S k , obtaining the corresponding encrypted attribute sum Enc(S k ).
- homomorphic encryption is an encryption scheme in which performing an operation on plaintexts and then encrypting is equivalent to encrypting first and then performing a corresponding operation on the ciphertexts. For example, encrypting v 1 and v 2 with the same public key PK yields E PK (v 1 ) and E PK (v 2 ); if E PK (v 1 +v 2 ) = E PK (v 1 ) ⊕ E PK (v 2 ), then the encryption algorithm satisfies additive homomorphism, where ⊕ is the corresponding homomorphic addition operation. The operation ⊕ on ciphertexts can correspond to regular addition, multiplication, and so on; in Paillier's algorithm it corresponds to regular multiplication.
- in the following, Enc represents an encryption algorithm that satisfies additive homomorphism, such as the Paillier algorithm.
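As an illustration of additive homomorphism, here is a toy Paillier implementation with tiny fixed primes (insecure, for exposition only; the prime choices and function names are assumptions, not from the patent). Multiplying two ciphertexts decrypts to the sum of the plaintexts:

```python
import math
import random

# Toy Paillier cryptosystem -- small fixed primes, for illustration only.
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)      # Carmichael value lambda(n)
g = n + 1                         # standard generator choice
mu = pow(lam, -1, n)              # since L(g^lam mod n^2) = lam mod n

_rng = random.Random(0)

def encrypt(m):
    # c = g^m * r^n mod n^2, with random r coprime to n
    while True:
        r = _rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

c1, c2 = encrypt(17), encrypt(25)
total = decrypt((c1 * c2) % n2)   # ciphertext multiplication = plaintext addition
```

Here `total` recovers 17 + 25, matching the ⊕-as-multiplication property of Paillier described above.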
- in step S33, the M holders gather their encrypted attribute sums Enc(S k ) at a certain calculation executor, and the calculation executor performs a homomorphic addition over the M encrypted attribute sums, obtaining the homomorphic summation result Enc(S), namely Enc(S) = Enc(S 1 ) ⊕ Enc(S 2 ) ⊕ … ⊕ Enc(S M ).
- the above-mentioned calculation execution party is one of M holders, for example, the k-th holder shown in FIG. 3.
- the calculation executor can also be another party other than the M holders, but it cannot be the aforementioned third party P, so as to prevent the third party P from decrypting with its private key and obtaining each holder's attribute sum S k in plaintext, causing privacy leakage.
- step S34 the above-mentioned calculation execution party sends the homomorphic addition result Enc(S) to the third party P.
- step S35 the third party P uses the private key corresponding to the public key to decrypt the homomorphic addition result Enc(S).
- the result obtained by decrypting Enc(S) is the global sum of attribute i: S = S 1 + S 2 + … + S M .
- the total number of samples N can be obtained in a variety of ways.
- in one implementation, each holder k reports its sample count N k to the third party P in advance, and the third party P obtains the total sample count N through formula (3).
- in another implementation, when sending its encrypted attribute sum Enc(S k ), each holder also sends its sample count N k ; the calculation executor aggregates the sample counts and forwards them to the third party P.
- in step S36, the third party P calculates the global mean of attribute i based on the global sum S and the total sample count N. Then, in step S37, the third party P broadcasts the global mean to the M holders.
- each holder k subtracts the global mean S′ from the elements corresponding to the attribute i in its original matrix Y k, so as to perform global zero-averaging on the attribute i.
- the global zero averaging is realized for each attribute in the original matrix.
- the matrix obtained by performing global zero averaging on all attributes of the original matrix Y k at the k-th holder can be called the k-th central matrix, denoted X k .
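The whole zero-averaging round can be simulated in the clear; the homomorphic layer is omitted here, since the aggregated sums it protects are numerically identical, and the holder sizes and D are illustrative:

```python
import numpy as np

# Clear-text simulation of steps S31-S37: per-holder attribute sums are
# aggregated, the third party derives the global means, and each holder
# centers its own matrix locally to obtain its central matrix X_k.
rng = np.random.default_rng(3)
holders = [rng.uniform(0, 5, size=(nk, 3)) for nk in (4, 6, 5)]  # M=3, D=3

col_sums = sum(Yk.sum(axis=0) for Yk in holders)   # S31/S33: aggregated sums
N = sum(Yk.shape[0] for Yk in holders)             # total sample count
global_mean = col_sums / N                         # computed by the third party

# after the broadcast, each holder centers locally
central = [Yk - global_mean for Yk in holders]
```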
- The embodiment of FIG. 3 uses homomorphic encryption to perform global zero-averaging of attributes. In other embodiments, global zero-averaging can also be achieved by other secure computation methods, for example by secure multi-party computation (MPC).
- Specifically, when columns represent attributes, the k-th holder can locally compute the sum of each of the D columns, forming a D-dimensional column-sum vector S_k. The k-th holder and the other holders then use MPC to sum the column-sum vectors and sample counts of all holders, obtaining the total column vector S and the total sample count N. This MPC step can be implemented, for example, with secret-sharing-based addition.
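The secret-sharing addition mentioned above can be sketched as follows: each party splits its private column sum into random additive shares modulo a large prime, sends one share to each party, and the parties publish only sums of shares. This is an illustrative toy, not a hardened protocol:

```python
import random

P = 2**61 - 1  # a large prime modulus (illustrative choice)

def share(value, n_parties, rnd):
    """Split `value` into n_parties additive shares modulo P."""
    shares = [rnd.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

rnd = random.Random(42)
secrets = [11, 23, 8]                    # each party's private column sum
n = len(secrets)
# Party i sends share j of its secret to party j.
dealt = [share(s, n, rnd) for s in secrets]
# Each party j locally adds the shares it received and publishes the result.
partials = [sum(dealt[i][j] for i in range(n)) % P for j in range(n)]
total = sum(partials) % P                # reconstructed global sum
assert total == sum(secrets) % P
```

No single party ever sees another party's secret, yet the published partial sums reveal exactly the global sum.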
- Based on the total column vector S and the total sample count N, a neutral third party or any of the M holders can compute the total mean vector. The total mean vector is a D-dimensional vector whose i-th element is the global mean of the i-th column of the original full matrix Y. Each holder k then uses the total mean vector to zero-average its columns and obtain the corresponding k-th central matrix X_k.
- Through the various methods above, each holder performs global zero-averaging on each attribute and obtains its central matrix. If the central matrices of all holders were concatenated vertically, they would form a joint matrix X. This joint matrix is an N*D-dimensional, zero-averaged matrix. Since a central matrix may still leak privacy, the holders cannot actually concatenate their central matrices; the joint matrix is assumed only for convenience of description.
- the covariance matrix is a square matrix of D*D dimensions.
- Fig. 4 shows a schematic diagram of a process in which multiple parties perform joint dimensionality reduction based on their central matrix in an embodiment.
- In step S41, each holder k locally computes the product matrix of its k-th central matrix X_k and that matrix's transpose, which can be denoted C_k. In step S42, each holder k uses the aforementioned third-party public key PK to homomorphically encrypt the product matrix, obtaining the k-th encrypted matrix product Enc(C_k). Homomorphic encryption of a matrix amounts to homomorphic encryption of each element of the matrix.
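As a concrete illustration of the additive homomorphism relied on here, the sketch below implements a toy Paillier cryptosystem with tiny primes (insecure, for exposition only): multiplying two ciphertexts modulo n² adds the underlying plaintexts.

```python
import math
import random

# Toy Paillier keypair with tiny primes -- insecure, illustration only.
p, q = 17, 19
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1                                   # standard simple choice of g

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)         # modular inverse of L(g^lam mod n^2)

def encrypt(m, rnd=random.Random(7)):
    r = rnd.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rnd.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
c = (encrypt(5) * encrypt(7)) % n2
assert decrypt(c) == 12
```

In the protocol, each matrix element of C_k would be encrypted this way, and the operation executor multiplies corresponding ciphertexts element-wise to obtain Enc(C).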
- In step S43, the M holders gather their encrypted matrix products Enc(C_k) at a certain operation executor, which performs a homomorphic addition over the M encrypted matrix products to obtain the homomorphic sum Enc(C), namely:
- The operation executor may be any one of the M holders, for example the M-th holder as shown in FIG. 4, or another party other than the M holders and the third party P. The operation executor that processes the M encrypted matrix products in step S43 may be the same as or different from the executor that processes the M encrypted attribute sums in step S33 of FIG. 3; no limitation is imposed here.
- In step S44, the operation executor sends the homomorphic sum Enc(C) to the third party P. In step S45, the third party P uses the private key SK corresponding to the public key PK to decrypt Enc(C); the decryption result is the covariance matrix C:
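The algebraic fact exploited here is that the joint covariance decomposes into per-holder terms: with rows as samples, C = XᵀX = Σ_k X_kᵀX_k, so homomorphically summing locally computed products yields exactly the joint covariance. A plaintext numpy check with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
# Matrices of three hypothetical holders (rows = samples, columns = attributes).
Xs = [rng.normal(size=(n_k, D)) for n_k in (6, 4, 10)]
mean = np.vstack(Xs).mean(axis=0)
Xs = [X - mean for X in Xs]            # global zero-averaging

# Sum of locally computable products X_k^T X_k ...
C_sum = sum(X.T @ X for X in Xs)
# ... equals the covariance of the (hypothetical) joint matrix.
X_joint = np.vstack(Xs)
assert np.allclose(C_sum, X_joint.T @ X_joint)
assert C_sum.shape == (D, D)
```

Because each term X_kᵀX_k is computable locally, only these D*D products (under encryption) ever leave a holder.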
- In step S46, the third party P determines the dimensionality reduction transformation matrix T based on the covariance matrix C and the target dimension d. Specifically, the third party P determines the eigenvalues λ of the covariance matrix C and the corresponding eigenvectors v; the eigenvalues can be solved, for example, by Jacobi iteration. From the eigenvalues, the d largest are selected as the d target eigenvalues λ_1, λ_2, ..., λ_d, and the corresponding d eigenvectors v_1, v_2, ..., v_d are determined. Based on these d eigenvectors, the dimensionality reduction transformation matrix T is formed, which can be arranged as a D*d-dimensional matrix.
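Step S46 can be sketched with numpy's symmetric eigensolver (a stand-in for the Jacobi iteration named above); the d eigenvectors with the largest eigenvalues are stacked as columns of a D*d matrix T:

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 5, 2
X = rng.normal(size=(30, D))
X -= X.mean(axis=0)
C = X.T @ X                         # D*D covariance matrix

# eigh returns eigenvalues in ascending order for symmetric matrices.
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1][:d]  # indices of the d largest eigenvalues
T = vecs[:, order]                  # D*d transformation matrix

assert T.shape == (D, d)
# Columns of T are orthonormal eigenvector directions.
assert np.allclose(T.T @ T, np.eye(d))
```

Only the third party runs this step; the holders never see C, only the broadcast matrix T.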
- In step S47, the third party P broadcasts the computed transformation matrix T to the M holders. In step S48, each holder k processes its k-th original matrix Y_k with the transformation matrix T to obtain the corresponding k-th reduced matrix Y'_k. As noted above, when rows represent samples and columns represent attributes, the reduction can be performed as Y'_k = Y_k T. The k-th holder thus obtains an N_k*d-dimensional reduced matrix Y'_k, which amounts to reducing each sample's original D-dimensional features to d dimensions.
- Each holder can then use its reduced matrix for further processing such as data analysis and model training, so as to perform business prediction analysis on the business objects. In one implementation, the holders may also gather their respective reduced matrices Y'_k at the third party P to form a total reduced matrix, facilitating joint training and processing. Specifically, each holder k sends its k-th reduced matrix Y'_k to the third party P, which receives the M reduced matrices provided by the M holders and, based on them, forms the total reduced matrix Y' obtained by reducing the D attributes of all N business objects, namely:
- The total reduced matrix Y' is equivalent to compressing the D attribute features of the original full matrix Y down to d dimensions. It can be used by all holders to jointly conduct efficient machine learning for analyzing and predicting business objects.
- The above describes joint dimensionality reduction when rows represent samples and columns represent attributes. The process can be implemented similarly when rows represent attributes and columns represent samples, with only a few steps modified. In that case, the original matrix Y_k of each holder k is a D*N_k-dimensional matrix with D rows and N_k columns, and the original full matrix Y, formed by concatenating the holders' original matrices horizontally, is a D*N-dimensional matrix. Each holder can still zero-average its attributes by the method of FIG. 3; after each holder obtains its central matrix, the central matrices, if concatenated horizontally, can be assumed to form a joint matrix X:
- In this case, formula (12) still holds, and the method of FIG. 4 can still be used to obtain the covariance matrix C and the transformation matrix T, which can now be arranged as a d*D-dimensional matrix. After receiving T, each holder can process its original matrix as Y'_k = T Y_k, where T is a d*D-dimensional matrix and the original matrix Y_k is D*N_k-dimensional, yielding a d*N_k-dimensional reduced matrix Y'_k. The reduced matrices of the M holders can be concatenated horizontally to form the final total reduced matrix of d*N dimensions. In practical machine learning, since samples are usually arranged as rows and attribute features as columns, this total reduced matrix is typically transposed once more to serve as the final total reduced matrix, namely Y' = (Y'_1 Y'_2 ... Y'_M)^T.
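The column-per-sample variant just described can be checked the same way: a d*D transformation T is applied on the left, the per-holder results are concatenated horizontally, and a final transpose restores the usual samples-by-features layout (illustrative data; T here is a stand-in for the real transformation):

```python
import numpy as np

rng = np.random.default_rng(3)
D, d = 4, 2
# Hypothetical holders with D*N_k matrices (columns = samples).
Ys = [rng.normal(size=(D, n_k)) for n_k in (3, 5, 2)]
T = rng.normal(size=(d, D))          # stand-in for the d*D transformation

reduced = [T @ Y for Y in Ys]        # each Y'_k is d*N_k
total = np.hstack(reduced)           # d*N total reduced matrix
final = total.T                      # N*d, samples as rows

N = sum(Y.shape[1] for Y in Ys)
assert total.shape == (d, N)
# Per-holder reduction then concatenation equals reducing the full matrix.
assert np.allclose(final, (T @ np.hstack(Ys)).T)
```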
- Reviewing the above process: by introducing a neutral third party and using homomorphic encryption, the data holders can jointly perform feature dimensionality reduction without leaking private data, and thereby conduct shared machine learning and joint training more effectively.
- According to an embodiment of another aspect, a device for multi-party joint dimensionality reduction of private data is provided, where the private data is distributed among M holders and the device is deployed at a third party other than the M holders. Each holder, as well as the third party, can be implemented as any device, platform, or device cluster with data storage, computing, and processing capabilities. Among the M holders, any k-th holder stores a k-th original matrix composed of the attribute values of several business objects for the predetermined D attributes.
- Fig. 5 shows a schematic block diagram of an apparatus for joint dimensionality reduction deployed in a third party according to an embodiment.
- The device 500 deployed at the third party includes: a receiving unit 52 configured to receive the homomorphic sum of the M encrypted matrix products provided by the M holders, where the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the third party's public key, the product of the k-th central matrix and its transpose, and the k-th central matrix is obtained by globally zero-averaging, among the M holders, each attribute of the k-th original matrix; a decryption unit 53 configured to decrypt the homomorphic sum with the private key corresponding to the public key to obtain the covariance matrix; a determining unit 54 configured to determine a dimensionality reduction transformation matrix based on the covariance matrix and the target dimension d; and a broadcasting unit 55 configured to broadcast the transformation matrix to the M holders.
- According to one implementation, the device 500 deployed at the third party further includes a total matrix determining unit (not shown) configured to receive the M reduced matrices respectively provided by the M holders, and to determine, based on them, the total reduced matrix obtained by reducing the D attributes of all business objects.
- the business object may be one of the following: user, merchant, commodity, event.
- the business prediction analysis includes predicting the classification or regression value of the business object.
- According to one implementation, the device 500 deployed at the third party further includes an averaging auxiliary unit 51, which in turn includes (not shown): an encrypted-sum receiving module configured to receive, for any attribute i of the D attributes, the encrypted sum for that attribute, obtained by homomorphically adding the M encrypted attribute sums provided by the M holders, where the k-th encrypted attribute sum is obtained by the k-th holder homomorphically encrypting, with the third party's public key, the sum of the values of attribute i in the k-th original matrix; a global-sum determining module configured to decrypt the encrypted sum with the private key to obtain the global sum of attribute i; a mean determining module configured to determine the global mean of attribute i from the global sum; and a mean broadcasting module configured to broadcast the global mean to the M holders so that each of them performs global zero-averaging on attribute i in its original matrix.
- In different examples of this implementation, the encrypted-sum receiving module may be configured to receive the encrypted sum from one of the M holders, or from a party other than the M holders and the third party. Likewise, the receiving unit 52 may be configured to receive the homomorphic sum from one of the M holders, or from a party other than the M holders and the third party.
- In one embodiment, a row of the k-th original matrix corresponds to an attribute and a column corresponds to a business object. In this case, the covariance matrix obtained by the decryption unit 53 is the product of the joint matrix, assumed to be formed by horizontally concatenating the central matrices of the M holders, and its transpose; correspondingly, the total reduced matrix is the matrix obtained by horizontally concatenating the M reduced matrices.
- In another embodiment, a row of the k-th original matrix corresponds to a business object and a column corresponds to an attribute. In this case, the covariance matrix obtained by the decryption unit 53 is the product of the joint matrix, assumed to be formed by vertically concatenating the central matrices of the M holders, and its transpose; correspondingly, the total reduced matrix is the matrix obtained by vertically concatenating the M reduced matrices.
- According to one implementation, the determining unit 54 is specifically configured to: determine the eigenvalues of the covariance matrix and the corresponding eigenvectors; select from them the d eigenvalues with the largest values as the d target eigenvalues; and form the dimensionality reduction transformation matrix from the d eigenvectors corresponding to the d target eigenvalues.
- According to an embodiment of yet another aspect, an apparatus for multi-party joint dimensionality reduction of private data is provided, where the private data is distributed among M holders and the apparatus is deployed at any k-th holder among them. The k-th holder stores a k-th original matrix composed of the attribute values of several business objects for the predetermined D attributes. The apparatus cooperates with a third party other than the M holders to perform the dimensionality reduction. Again, each holder and the third party can be implemented as any device, platform, or device cluster with data storage, computing, and processing capabilities.
- Fig. 6 shows a schematic block diagram of an apparatus for joint dimensionality reduction deployed at the k-th holder according to an embodiment. As shown in FIG. 6, the apparatus 600 deployed at the k-th holder includes: an averaging unit 61 configured to perform, on each attribute of the k-th original matrix, global zero-averaging among the M holders to obtain the k-th central matrix; an encryption unit 62 configured to compute the product matrix of the k-th central matrix and its transpose and to homomorphically encrypt the product matrix with the public key of a third party other than the M holders, obtaining the k-th encrypted matrix product; a providing unit 63 configured to provide the k-th encrypted matrix product so that the third party obtains the homomorphic sum of the M encrypted matrix products respectively provided by the M holders; a receiving unit 64 configured to receive the dimensionality reduction transformation matrix from the third party; and a dimensionality reduction processing unit 65 configured to process the k-th original matrix with the transformation matrix to obtain the k-th reduced matrix, used to perform business prediction analysis on the business objects by machine learning.
- According to one implementation, the apparatus 600 further includes a sending unit (not shown) configured to provide the k-th reduced matrix to the third party so that it can determine the total reduced matrix obtained by reducing the D attributes of all business objects.
- In one embodiment, the averaging unit 61 adopts secure multi-party computation (MPC) and cooperates with the corresponding units in the other holders to perform global zero-averaging.
- In another embodiment, the averaging unit 61 specifically includes (not shown): a sum computing module configured to compute, for any attribute i of the D attributes, the sum of the values of attribute i in the k-th original matrix; a sum encrypting module configured to homomorphically encrypt the sum with the public key to obtain the k-th encrypted attribute sum; an encrypted-sum providing module configured to provide the k-th encrypted attribute sum so that the third party obtains the homomorphic sum of the M encrypted attribute sums provided by the M holders; a mean receiving module configured to receive from the third party the global mean of attribute i determined from the homomorphic sum; and a mean processing module configured to subtract the global mean from the elements corresponding to attribute i in the k-th original matrix, so as to globally zero-average attribute i.
- In a specific example, the encrypted-sum providing module is configured to send the k-th encrypted attribute sum to an operation executor, so that the operation executor homomorphically adds the M encrypted attribute sums and sends the result to the third party; the operation executor is a party among the M holders other than the k-th holder, or another party other than the M holders and the third party.
- In another specific example, the encrypted-sum providing module is configured to receive the corresponding M-1 encrypted attribute sums from the M-1 other holders among the M holders, homomorphically add the k-th encrypted attribute sum to them, and send the homomorphic sum to the third party. Through these various means, the encrypted-sum providing module provides the k-th encrypted attribute sum so that the third party obtains the homomorphic sum of the M encrypted attribute sums provided by the M holders.
- In one embodiment, the providing unit 63 is specifically configured to send the k-th encrypted matrix product to an operation executor, so that the operation executor homomorphically sums the M encrypted matrix products and sends the result to the third party, where the operation executor is a party among the M holders other than the k-th holder, or a party other than the M holders and the third party. In another embodiment, the providing unit 63 is specifically configured to receive the corresponding M-1 encrypted matrix products from the M-1 other holders, homomorphically sum them with the k-th encrypted matrix product, and send the homomorphic sum to the third party. Through these various means, the providing unit 63 provides the k-th encrypted matrix product so that the third party obtains the homomorphic sum of the M encrypted matrix products respectively provided by the M holders.
- In one embodiment, a row of the k-th original matrix corresponds to an attribute and a column corresponds to a business object; in this case, the dimensionality reduction processing unit 65 performs the reduction by multiplying the transformation matrix by the k-th original matrix. In another embodiment, a row corresponds to a business object and a column corresponds to an attribute; in this case, the unit 65 performs the reduction by multiplying the k-th original matrix by the transformation matrix.
- According to an embodiment of another aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed in a computer, the computer is caused to execute the methods described in conjunction with FIG. 3 and FIG. 4. According to an embodiment of yet another aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the code, the methods described in conjunction with FIG. 3 and FIG. 4 are implemented.
- The functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
Abstract
Embodiments of this specification provide a method and apparatus for multi-party joint dimensionality reduction of private data. Each data holder among the multiple parties computes the product of its locally held private data matrix and that matrix's transpose, homomorphically encrypts the product matrix with a third party's public key, and the encrypted product matrices are then aggregated at a computing platform, which performs a homomorphic addition and sends the result to the third party. By decrypting the homomorphic sum, the third party obtains the covariance matrix required for principal component analysis, determines the dimensionality reduction transformation matrix, and broadcasts it to the holders. Each holder can then use this transformation matrix to perform dimensionality reduction.
Description
One or more embodiments of this specification relate to the field of machine learning, and in particular to multi-party joint dimensionality reduction of private data.
The data required for machine learning often spans multiple platforms and domains. For example, in a machine-learning-based merchant classification scenario, an electronic payment platform holds merchants' transaction records, an e-commerce platform stores merchants' sales data, and a bank holds merchants' loan data. Data often exists in isolated silos. Because of industry competition, data security, and user privacy concerns, data integration faces great resistance; how to integrate data scattered across platforms without leaking it has become a challenge.
On the other hand, as data volume grows, the dimensionality of training data keeps increasing. Although large amounts of high-dimensional data can enrich training samples, such data often contains redundant information. Redundancy contributes little to learning quality, yet the resulting high-dimensional features can cause a "dimension explosion" that makes models hard to handle and hurts training efficiency. Therefore, during model training and use, high-dimensional sample features are often reduced to low-dimensional features while losing as little information as possible.
Principal component analysis (PCA) is a statistical method for simplifying a data set. It applies an orthogonal linear transformation to observations of possibly correlated variables, projecting them onto a set of linearly uncorrelated variables called principal components. PCA can reduce a data set's dimensionality while preserving the features that contribute most to its variance, so in practice it is commonly used to reduce high-dimensional features.
However, PCA generally requires a unified transformation and principal-component extraction over all of the data. When multiple parties each hold part of the training data and wish to train a model jointly, how to apply PCA for feature dimensionality reduction without leaking private data remains an open problem.
An improved scheme is therefore desired that allows multiple parties to jointly reduce dimensionality of private data while keeping that data secure.
Summary of the Invention
One or more embodiments of this specification describe methods for multi-party joint dimensionality reduction of private data, enabling multiple parties to jointly reduce feature dimensionality while keeping their respective private data secure.
According to a first aspect, a method for multi-party joint dimensionality reduction of private data is provided, the private data being distributed among M holders, where any k-th holder stores a k-th original matrix composed of the attribute values of several business objects for predetermined D attributes. The method is executed by a third party other than the M holders and includes: receiving the homomorphic sum of the M encrypted matrix products respectively provided by the M holders, where the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the third party's public key, the product of the k-th central matrix and its transpose, and the k-th central matrix is obtained by globally zero-averaging, among the M holders, each attribute of the k-th original matrix; decrypting the homomorphic sum with the private key corresponding to the public key to obtain the covariance matrix; determining a dimensionality reduction transformation matrix based on the covariance matrix and the target dimension d; and broadcasting the transformation matrix to the M holders so that each holder processes its original matrix with it to obtain the corresponding reduced matrix, which is used to perform business prediction analysis on the business objects by machine learning.
According to one implementation, after the above steps, the third party further receives the M reduced matrices respectively provided by the M holders and, based on them, determines the total reduced matrix obtained by reducing the D attributes of all business objects.
In different embodiments, the business object may be one of: a user, a merchant, a commodity, an event; the business prediction analysis includes predicting a classification or regression value of the business object.
According to one implementation, before obtaining the homomorphic sum of the M encrypted matrix products, the third party may further perform the following steps to assist the holders in global zero-averaging: for any attribute i of the D attributes, receiving the encrypted sum for attribute i, obtained by homomorphically adding the M encrypted attribute sums provided by the M holders, where the k-th encrypted attribute sum is obtained by the k-th holder homomorphically encrypting, with the third party's public key, the sum of the values of attribute i in the k-th original matrix; decrypting the encrypted sum with the private key to obtain the global sum of attribute i; determining the global mean of attribute i from the global sum; and broadcasting the global mean to the M holders so that each of them globally zero-averages attribute i in its original matrix.
In different examples of this implementation, receiving the encrypted sum for attribute i may comprise receiving it from one of the M holders, or from a party other than the M holders and the third party.
According to different embodiments, the third party may receive the aforementioned homomorphic sum from one of the M holders, or from a party other than the M holders and the third party.
In one embodiment, a row of the k-th original matrix corresponds to an attribute and a column corresponds to a business object; in this case, the covariance matrix is the product of the joint matrix, assumed to be formed by horizontally concatenating the central matrices of the M holders, and its transpose, and the total reduced matrix is the matrix obtained by horizontally concatenating the M reduced matrices.
In another embodiment, a row of the k-th original matrix corresponds to a business object and a column corresponds to an attribute; in this case, the covariance matrix is the product of the joint matrix, assumed to be formed by vertically concatenating the central matrices of the M holders, and its transpose, and the total reduced matrix is the matrix obtained by vertically concatenating the M reduced matrices.
According to one implementation, the third party determines the transformation matrix as follows: determining the eigenvalues of the covariance matrix and the corresponding eigenvectors; selecting from them the d eigenvalues with the largest values as the d target eigenvalues; and forming the transformation matrix from the d eigenvectors corresponding to the d target eigenvalues.
According to a second aspect, a method for multi-party joint dimensionality reduction of private data is provided, the private data being distributed among M holders, where any k-th holder stores a k-th original matrix composed of the attribute values of several business objects for predetermined D attributes. The method is executed by the k-th holder and includes: performing, on each attribute of the k-th original matrix, global zero-averaging among the M holders to obtain the k-th central matrix; computing the product matrix of the k-th central matrix and its transpose, and homomorphically encrypting the product matrix with the public key of a third party other than the M holders to obtain the k-th encrypted matrix product; providing the k-th encrypted matrix product so that the third party obtains the homomorphic sum of the M encrypted matrix products respectively provided by the M holders; receiving the dimensionality reduction transformation matrix from the third party; and processing the k-th original matrix with the transformation matrix to obtain the k-th reduced matrix, used to perform business prediction analysis on the business objects by machine learning.
According to one implementation, after obtaining the k-th reduced matrix, the k-th holder further provides it to the third party so that the third party can determine the total reduced matrix obtained by reducing the D attributes of all business objects.
In one embodiment, globally zero-averaging each attribute of the k-th original matrix among the M holders specifically includes: for any attribute i of the D attributes, computing the sum of the values of attribute i in the k-th original matrix; homomorphically encrypting the sum with the public key to obtain the k-th encrypted attribute sum; providing the k-th encrypted attribute sum so that the third party obtains the homomorphic sum of the M encrypted attribute sums provided by the M holders; receiving from the third party the global mean of attribute i determined from the homomorphic sum; and subtracting the global mean from the elements corresponding to attribute i in the k-th original matrix, so as to globally zero-average attribute i.
Further, in a specific example, the k-th holder sends the k-th encrypted attribute sum to an operation executor, so that the operation executor homomorphically adds the M encrypted attribute sums and sends the result to the third party; the operation executor is a party among the M holders other than the k-th holder, or another party other than the M holders and the third party.
In another specific example, the k-th holder receives the corresponding M-1 encrypted attribute sums from the M-1 other holders, homomorphically adds the k-th encrypted attribute sum to them, and sends the homomorphic sum to the third party.
Through these various means, the k-th holder provides the k-th encrypted attribute sum so that the third party obtains the homomorphic sum of the M encrypted attribute sums provided by the M holders.
In one embodiment, the k-th holder sends the k-th encrypted matrix product to an operation executor, so that the operation executor homomorphically sums the M encrypted matrix products and sends the result to the third party; the operation executor is a party among the M holders other than the k-th holder, or another party other than the M holders and the third party.
In another embodiment, the k-th holder receives the corresponding M-1 encrypted matrix products from the M-1 other holders, homomorphically sums the k-th encrypted product with them, and sends the homomorphic sum to the third party.
Through these various means, the k-th holder provides the k-th encrypted matrix product so that the third party obtains the homomorphic sum of the M encrypted matrix products respectively provided by the M holders.
In one embodiment, a row of the k-th original matrix corresponds to an attribute and a column corresponds to a business object; in this case, the reduction is performed by multiplying the transformation matrix by the k-th original matrix.
In another embodiment, a row of the k-th original matrix corresponds to a business object and a column corresponds to an attribute; in this case, the reduction is performed by multiplying the k-th original matrix by the transformation matrix.
According to a third aspect, an apparatus for multi-party joint dimensionality reduction of private data is provided, the private data being distributed among M holders, where any k-th holder stores a k-th original matrix composed of the attribute values of several business objects for predetermined D attributes. The apparatus is deployed at a third party other than the M holders and includes: a receiving unit configured to receive the homomorphic sum of the M encrypted matrix products respectively provided by the M holders, where the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the third party's public key, the product of the k-th central matrix and its transpose, and the k-th central matrix is obtained by globally zero-averaging, among the M holders, each attribute of the k-th original matrix; a decryption unit configured to decrypt the homomorphic sum with the private key corresponding to the public key to obtain the covariance matrix; a determining unit configured to determine a dimensionality reduction transformation matrix based on the covariance matrix and the target dimension d; and a broadcasting unit configured to broadcast the transformation matrix to the M holders so that each holder processes its original matrix with it to obtain the corresponding reduced matrix, used to perform business prediction analysis on the business objects by machine learning.
According to a fourth aspect, an apparatus for multi-party joint dimensionality reduction of private data is provided, the private data being distributed among M holders, where any k-th holder stores a k-th original matrix composed of the attribute values of several business objects for predetermined D attributes. The apparatus is deployed at the k-th holder and includes: an averaging unit configured to perform, on each attribute of the k-th original matrix, global zero-averaging among the M holders to obtain the k-th central matrix; an encryption unit configured to compute the product matrix of the k-th central matrix and its transpose and to homomorphically encrypt it with the public key of a third party other than the M holders, obtaining the k-th encrypted matrix product; a providing unit configured to provide the k-th encrypted matrix product so that the third party obtains the homomorphic sum of the M encrypted matrix products respectively provided by the M holders; a receiving unit configured to receive the dimensionality reduction transformation matrix from the third party; and a dimensionality reduction processing unit configured to process the k-th original matrix with the transformation matrix to obtain the k-th reduced matrix, used to perform business prediction analysis on the business objects by machine learning.
According to a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed in a computer, the computer is caused to execute the method of the first or second aspect.
According to a sixth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the code, the method of the first or second aspect is implemented.
According to the methods and apparatus provided by the embodiments of this specification, to keep each holder's private data secure, a neutral third party is introduced; via homomorphic encryption, the third party aggregates the holders' matrix products to obtain the covariance matrix and thereby determines the dimensionality reduction transformation matrix. Each holder can then reduce its local data based on this transformation matrix, and the overall reduced data can finally be formed. In this way, the security of the private data is ensured.
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification;
Fig. 2 shows the execution process of the principal component analysis (PCA) method;
Fig. 3 is a schematic diagram of multiple parties jointly performing zero-averaging in an embodiment;
Fig. 4 is a schematic diagram of the process in which multiple parties perform joint dimensionality reduction based on their central matrices in an embodiment;
Fig. 5 is a schematic block diagram of an apparatus for joint dimensionality reduction deployed at a third party according to an embodiment;
Fig. 6 is a schematic block diagram of an apparatus for joint dimensionality reduction deployed at the k-th holder according to an embodiment.
The solutions provided in this specification are described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. As shown in Fig. 1, in a shared learning scenario, the training data is jointly provided by multiple holders 1, 2, ..., M, each owning part of it. The training data may be attribute feature data of business objects; a business object may be a user, merchant, commodity, or event (e.g., a transaction event or login event) to be analyzed. In one distribution pattern, the holders own data for the same attribute items of different object samples. Such attribute feature data is private and can be stored as a private data matrix. To keep the private data secure, each holder must keep it local, outputting no plaintext data and performing no plaintext aggregation.
In the scenario of one embodiment of this specification, the multiple holders use PCA to jointly reduce the dimensionality of the training data. As known to those skilled in the art, the core steps of PCA are to form a covariance matrix from the data matrix to be reduced and to solve for that matrix's eigenvalues and eigenvectors. In the embodiments of this specification, to keep each holder's private data secure, a neutral third-party device is introduced. Each holder k locally operates on its private data matrix to obtain a product matrix C_k, and the product matrices are aggregated, via homomorphic encryption, at the third party, which carries out the covariance matrix computation and obtains the dimensionality reduction transformation matrix. Each holder can then reduce its local private matrix based on this transformation matrix, achieving secure joint data dimensionality reduction.
To describe the above process more clearly, the execution of the PCA method is first described with reference to Fig. 2.
In Fig. 2, assume there is D-dimensional feature data for N samples, where each feature dimension corresponds to the value of one attribute of a sample, and assume the D-dimensional features are to be reduced to d dimensions, where d < D.
First, in step 201, a D*N-dimensional original matrix Y can be formed from the D-dimensional feature data of the N samples. For example, each row may represent an attribute and each column a sample, forming an original matrix Y with D rows and N columns.
Next, in step 202, the original matrix is zero-averaged, also called centered, yielding the central matrix X. The goal of centering is that, for any of the D attributes, the mean of the values of that attribute over all N samples is 0. Operationally, for attribute i, one first computes the mean of the values of attribute i over all N samples, and then subtracts that mean from the elements corresponding to attribute i in the original matrix. For example, when a row represents an attribute, for each row of the original matrix one computes the row mean and subtracts it from every element of the row, obtaining the zero-averaged central matrix X.
Then, in step 203, the covariance matrix A is computed; specifically, A = XX^T, where X^T is the transpose of X.
Next, in step 204, the eigenvalues λ and eigenvectors ν of the covariance matrix A are solved. Mathematically, they satisfy:
Aν = λν    (1)
The covariance matrix is symmetric, with multiple eigenvalues and corresponding eigenvectors that are mutually orthogonal. The eigenvalues and eigenvectors can be solved by various algorithms, for example eigenvalue decomposition, and the eigenvectors can form an eigenmatrix.
Based on the eigenvalues and eigenvectors obtained above, in step 205 the dimensionality reduction transformation matrix T is determined.
It should be understood that, physically, an eigenvector represents a projection direction in the original D-dimensional space, and mutually orthogonal eigenvectors mean mutually orthogonal projection directions. The essence of PCA feature reduction is to find d mutually orthogonal projection directions in the original D-dimensional space as d coordinate directions, and to project the original sample points into the d-dimensional mapping space they span, so that the variance of the projected sample points is as large as possible. The variance after projection along each direction is reflected by the corresponding eigenvalue.
Based on this idea, the eigenvalues can be sorted in descending order, the top d taken as target eigenvalues, and the d eigenvectors corresponding to them determined. These d eigenvectors correspond to the d coordinate directions selected for the reduction, and they constitute the dimensionality reduction transformation matrix T.
Then, in step 205, by applying the transformation matrix T to the original matrix Y, the reduced matrix Y' is obtained, which is d*N-dimensional. Transforming Y into Y' thus amounts to reducing the original D-dimensional features to d dimensions.
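The single-matrix PCA procedure of steps 201-205 can be sketched as follows, with rows as attributes and columns as samples to match the layout above (illustrative data; how T is applied, here as T^T Y, depends on whether eigenvectors are stored as rows or columns):

```python
import numpy as np

rng = np.random.default_rng(4)
D, N, d = 5, 40, 2
Y = rng.normal(size=(D, N))            # step 201: D*N original matrix

X = Y - Y.mean(axis=1, keepdims=True)  # step 202: zero-average each row (attribute)
A = X @ X.T                            # step 203: D*D covariance matrix

vals, vecs = np.linalg.eigh(A)         # step 204: eigenvalues / eigenvectors
order = np.argsort(vals)[::-1][:d]     # keep the d largest eigenvalues
T = vecs[:, order]                     # D*d matrix of principal directions

Y_reduced = T.T @ Y                    # step 205: project, giving a d*N matrix
assert Y_reduced.shape == (d, N)
```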
The above describes how, in the PCA method, the original matrix is reduced by eigendecomposition of the covariance matrix. However, in the shared learning scenario of Fig. 1, each data holder holds only part of the sample data and plaintext aggregation is impossible, so a matrix representing all of the sample data cannot be formed directly, and the covariance matrix cannot be directly obtained and solved.
To this end, in the embodiments of this specification, a third party is introduced in addition to the M holders. The third party need not be a trusted computing device, but it must be neutral, that is, it will not collude with any holder to reverse-engineer data. Based on this third party and homomorphic encryption, the covariance matrix jointly formed by the holders' private data is determined, the dimensionality reduction transformation matrix is obtained, and secure joint reduction is performed.
The specific implementation of joint dimensionality reduction is described below.
It should be noted that the scheme of this embodiment is designed for horizontally partitioned data, that is, each holder stores private data of the same items for different business objects. Specifically, as shown in Fig. 1, assume there are M data holders; with horizontal partitioning, each holder owns feature data of the same attribute items for different business objects.
For example, in one case the business object is a user. One of the M data holders is a social platform owning basic attribute features of n users, such as user id, age, gender, occupation, and region; another holder is, for example, another social platform owning the same basic attribute features of m other users. Thus feature data of the same attribute items for different users is horizontally distributed across different data holders.
In another example, the business object is a merchant. One of the M data holders is an e-commerce platform owning attribute features of n merchants, such as merchant id, length of operation, merchant category, and cumulative sales; another holder is, for example, another e-commerce platform owning the same attribute features of m other merchants. Thus feature data of the same attribute items for different merchants is horizontally distributed across different data holders.
In other embodiments, the business object may also be a commodity or event to be analyzed, where events may include transaction, login, purchase, and social events. For each kind of business object, the holders store the attribute features of the corresponding items. In the description below, business objects are also called samples.
The attribute feature data of the samples owned by each holder is private data and can be stored in matrix form. For example, assume any one of the M data holders, say the k-th holder, stores the values of D attributes for N_k samples; then the D-dimensional attribute values of these N_k samples can form an original matrix Y_k. The other holders each store the values of the same D attributes for other samples, forming corresponding original matrices.
It should be understood that, since the holders own data of the same attribute items for different samples, in a preparatory phase before joint reduction the holders need to align on the attribute dimension, that is, unify the attribute order and arrange their sample data according to a predetermined attribute order. Through this alignment, the D attribute items are arranged in the same order in every holder's original matrix.
In one embodiment, when forming the original matrix, each column represents an attribute and each row a sample, so the k-th holder forms an N_k*D-dimensional original matrix. The description below first follows this matrix layout.
It can be understood that each holder forms its original matrix similarly. If the holders' original matrices were concatenated vertically, they would form the original full matrix Y. This full matrix is an N*D-dimensional matrix, where each column represents an attribute (D columns in total) and each row a sample (N rows in total, N being the total number of samples), and the rows are aligned with respect to attribute order.
As stated above, the holders do not directly aggregate the raw data in plaintext; the original full matrix is assumed only for convenience of description.
Referring to the introduction of the PCA method with reference to Fig. 2, data reduction first requires zero-averaging each attribute. However, with horizontally partitioned data, for any attribute i of the D attributes, the k-th holder owns only N_k samples' values of that attribute, while zero-averaging must be computed over the values of attribute i for all N samples, that is, a global averaging computation among the M holders. This step therefore requires the parties to cooperate in jointly performing the zero-averaging.
Fig. 3 is a schematic diagram of multiple parties jointly performing zero-averaging in an embodiment. In the example of Fig. 3, zero-averaging is performed via homomorphic encryption.
Specifically, as shown in Fig. 3, assume each holder k owns the corresponding original matrix Y_k, where k = 1 to M. For any attribute i of the D attributes, global zero-averaging of that attribute is achieved as follows.
First, in step S31, each holder k computes the sum S_k of the values of attribute i in its original matrix Y_k. When a column of Y_k represents an attribute, the sum of the elements of the i-th column of Y_k, corresponding to attribute i, is this sum S_k.
Then, in step S32, each holder k homomorphically encrypts the sum S_k with the third party's public key, obtaining the corresponding encrypted attribute sum Enc(S_k).
It should be understood that homomorphic encryption is an encryption scheme in which operating on plaintexts and then encrypting is equivalent to encrypting first and then performing the corresponding operation on the ciphertexts. For example, encrypting v_1 and v_2 with the same public key PK yields E_PK(v_1) and E_PK(v_2); additive homomorphism holds if:
In step S32, Enc denotes an encryption algorithm satisfying additive homomorphism, for example the Paillier algorithm.
Next, in step S33, the M holders gather their encrypted attribute sums Enc(S_k) at a certain operation executor, which performs a homomorphic addition over the M encrypted attribute sums, obtaining the homomorphic sum Enc(S), namely:
In one embodiment, the operation executor is one of the M holders, for example the k-th holder as shown in Fig. 3.
In another embodiment, the operation executor may also be a party other than the M holders, but it cannot be the aforementioned third party P; otherwise the third party P could decrypt with its private key and obtain each holder's attribute sum S_k in plaintext, causing a privacy leak.
Then, in step S34, the operation executor sends the homomorphic sum Enc(S) to the third party P.
In step S35, the third party P decrypts the homomorphic sum Enc(S) with the private key corresponding to the public key. By the additive homomorphism of formula (4) and the operation of formula (5), the decryption result is the global sum of attribute i:
Then, in step S36, the third party P obtains from this global sum the global mean of attribute i, S' = S/N, where N is the total number of business object samples, as shown in formula (3).
The total sample count N can be obtained in several ways. In one example, each holder k reports its sample count N_k to the third party P in advance during the preparatory phase, and the third party P obtains the total sample count N through formula (3). In another example, in step S33 above, each holder sends its sample count N_k together with its encrypted attribute sum Enc(S_k), and the operation executor aggregates the sample counts and forwards them to the third party P.
Thus the third party P computes the global mean of attribute i from the global sum S and the total sample count N. Then, in step S37, the third party P broadcasts this global mean to the M holders.
In step S38, each holder k subtracts the global mean S' from the elements corresponding to attribute i in its original matrix Y_k, thereby performing global zero-averaging on attribute i.
By executing the above process for each of the D attributes, global zero-averaging is achieved for every attribute of the original matrix. The matrix obtained by the k-th holder after global zero-averaging of all attributes of its original matrix Y_k may be called the k-th central matrix, denoted X_k.
The embodiment of Fig. 3 performs global zero-averaging of attributes via homomorphic encryption. In other embodiments, global zero-averaging can also be achieved by other secure computation methods, for example secure multi-party computation (MPC).
Specifically, when columns represent attributes, for each of the D columns the k-th holder can locally compute the column sum, forming a D-dimensional column-sum vector S_k.
Then the k-th holder and the other holders use MPC to sum the column-sum vectors and sample counts of all holders, obtaining the total column vector S and the total sample count N, where:
The above MPC step can be implemented, for example, with secret-sharing-based addition.
Through the various means above, each holder globally zero-averages each attribute and obtains its corresponding central matrix. If the holders' central matrices were concatenated vertically, a joint matrix X would be formed. This joint matrix is an N*D-dimensional, zero-averaged matrix. Since a central matrix may still leak privacy, the holders cannot actually concatenate their central matrices; the joint matrix is assumed only for convenience of description.
Assuming the joint matrix above is formed, and referring to step 203 of the PCA method shown in Fig. 2, the covariance matrix C = X^T X can be computed as follows:
It can be understood that this covariance matrix is a D*D-dimensional square matrix.
From the rightmost expression of formula (10), it can be seen that the covariance matrix decomposes into M locally computable component matrices. The subsequent reduction transformation proceeds on this basis. The process of joint reduction after each holder forms its central matrix is described below.
Fig. 4 is a schematic diagram of the process in which multiple parties perform joint dimensionality reduction based on their central matrices in an embodiment.
In one embodiment, the above operation executor is any one of the M holders, for example the M-th holder as shown in Fig. 4.
In another embodiment, the operation executor may also be a party other than the M holders and the third party P.
It should be understood that the operation executor that processes the M encrypted matrix products in step S43 may be the same as or different from the executor that processes the M encrypted attribute sums in step S33 of Fig. 3; no limitation is imposed here.
Then, in step S44, the operation executor sends the homomorphic sum Enc(C) to the third party P.
In step S45, the third party P decrypts the homomorphic sum Enc(C) with the private key SK corresponding to the public key PK. By the additive homomorphism of formula (4) and the operation of formula (11), the decryption result is the covariance matrix C:
Then, in step S46, the third party P determines the dimensionality reduction transformation matrix T based on the covariance matrix C and the target dimension d.
Specifically, the third party P can determine the eigenvalues λ of the covariance matrix C and the corresponding eigenvectors v; the eigenvalues can be solved, for example, by Jacobi iteration. Then, from the eigenvalues, the d largest are selected as the d target eigenvalues λ_1, λ_2, ..., λ_d, and the corresponding d eigenvectors v_1, v_2, ..., v_d are determined. The transformation matrix T can then be formed from these d eigenvectors and arranged as a D*d-dimensional matrix.
Then, in step S47, the third party P broadcasts the computed transformation matrix T to the M holders.
In step S48, each holder k can process its k-th original matrix Y_k with the transformation matrix T, obtaining the corresponding k-th reduced matrix Y'_k.
As stated above, when rows represent samples and columns represent attributes, the k-th original matrix is N_k*D-dimensional and the transformation matrix is D*d-dimensional, so the reduction can be performed as:
Y'_k = Y_k T    (13)
The k-th holder thus obtains the N_k*d-dimensional reduced matrix Y'_k, which amounts to reducing each sample's original D-dimensional features to d dimensions.
Each holder can then use its reduced matrix for further processing such as data analysis and model training, so as to perform business prediction analysis on the business objects.
In one implementation, the holders may also gather their respective reduced matrices Y'_k at the third party P to form a total reduced matrix, facilitating joint training and processing of the data.
Specifically, each holder k sends its k-th reduced matrix Y'_k to the third party P, which thus receives the M reduced matrices respectively provided by the M holders. Based on them, the third party P forms the total reduced matrix Y' obtained by reducing the D attributes of all N business objects, namely:
This total reduced matrix Y' amounts to compressing the D attribute features of the original full matrix Y down to d dimensions. It can be used by all holders to jointly conduct efficient machine learning for analyzing and predicting business objects.
The multi-party joint dimensionality-reduction process above was described for the matrix arrangement in which rows represent samples and columns represent attributes. When rows represent attributes and columns represent samples, the process can be carried out analogously, with only a few steps modified accordingly.
Specifically, when one row represents an attribute and one column represents a sample, each holder k's original matrix Y_k is a D*N_k matrix with D rows and N_k columns, and the original full matrix Y is formed by concatenating the holders' original matrices Y_k horizontally:

Y = (Y_1 Y_2 ... Y_M) (15)

This original full matrix is a D*N matrix.
Each holder can still zero-mean the attributes following the method of Fig. 3. After each holder obtains its central matrix, horizontally concatenating the central matrices would form the assumed joint matrix X:

X = (X_1 X_2 ... X_M) (16)
In this case, formula (12) still holds, and the method shown in Fig. 4 can still be used to obtain the covariance matrix C and the transformation matrix T. In this embodiment, the transformation matrix T can be arranged as a d*D matrix.
After obtaining the transformation matrix, each holder k can process its original matrix as:

Y'_k = T Y_k (17)
Here T is a d*D matrix and the original matrix Y_k is D*N_k, so a d*N_k reduced matrix Y'_k is obtained. The M holders' respective reduced matrices can be concatenated horizontally to form the final total reduced matrix:

Y' = (Y'_1 Y'_2 ... Y'_M) (18)
This yields a d*N total reduced matrix. In practical machine-learning work, samples are usually arranged as rows and attribute features as columns, so after obtaining the total reduced matrix of formula (18), it is typically transposed as a whole to serve as the final total reduced matrix:

Y' = (Y'_1 Y'_2 ... Y'_M)^T (19)
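For the row-as-attribute arrangement, formulas (17) through (19) amount to left-multiplying each D*N_k block by the d*D matrix T, concatenating the d*N_k results horizontally, and transposing the whole. A compact sketch (the numbers and the trivially chosen T are illustrative assumptions):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def hconcat(blocks):
    # Join blocks side by side; all blocks share the same row count.
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]

T = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]                      # d*D with d = 2, D = 3
Y1 = [[1.0], [2.0], [3.0]]                 # D*N_1 with N_1 = 1
Y2 = [[4.0, 7.0], [5.0, 8.0], [6.0, 9.0]]  # D*N_2 with N_2 = 2

reduced = [matmul(T, Yk) for Yk in (Y1, Y2)]  # each d*N_k   (formula 17)
Y_total = hconcat(reduced)                    # d*N          (formula 18)
Y_final = transpose(Y_total)                  # N*d          (formula 19)
print(Y_final)  # [[1.0, 2.0], [4.0, 5.0], [7.0, 8.0]]
```

After the final transpose, each of the N rows is one business object with d reduced features, matching the row-as-sample convention of the first arrangement.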
Reviewing the process above: by introducing a neutral third party and using homomorphic encryption, the data holders can jointly perform feature dimensionality reduction without leaking their private data, enabling more effective shared machine learning and joint training.
According to an embodiment of another aspect, an apparatus for multi-party joint dimensionality reduction of private data is provided, where the private data is distributed among M holders and the apparatus is deployed at a third party other than the M holders. It should be understood that each holder, as well as the third party, can be implemented as any device, platform, or device cluster with data storage, computing, and processing capabilities. Among the M holders, any k-th holder stores a k-th original matrix composed of the attribute values of a number of business objects for D predetermined attributes.
Fig. 5 is a schematic block diagram of an apparatus for joint dimensionality reduction deployed at the third party, according to one embodiment. As shown in Fig. 5, the apparatus 500 deployed at the third party includes: a receiving unit 52, configured to receive the homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders, where the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the third party's public key, the product of the k-th central matrix and its transpose, the k-th central matrix being obtained by globally zero-meaning each attribute of the k-th original matrix across the M holders; a decryption unit 53, configured to decrypt the homomorphic-sum result with the private key corresponding to the public key to obtain the covariance matrix; a determination unit 54, configured to determine the dimensionality-reduction transformation matrix based on the covariance matrix and the target dimension d; and a broadcast unit 55, configured to broadcast the transformation matrix to the M holders, so that each holder processes its original matrix with the transformation matrix to obtain a corresponding reduced matrix, the reduced matrix being used to perform business prediction and analysis on the business objects through machine learning.
According to one implementation, the apparatus 500 deployed at the third party further includes a total-matrix determination unit (not shown), configured to receive the M reduced matrices respectively provided by the M holders and, based on these M reduced matrices, determine the total reduced matrix obtained by reducing the D attributes of all business objects.
In different embodiments, the business object may be one of: a user, a merchant, a commodity, or an event. The business prediction and analysis includes predicting a classification or regression value of the business object.
According to one implementation, the apparatus 500 deployed at the third party further includes a mean-normalization auxiliary unit 51, which in turn includes (not shown): an encrypted-sum receiving module, configured to, for any attribute i among the D attributes, receive an encrypted total for attribute i, the encrypted total being obtained by homomorphically adding the M encrypted attribute sums provided by the M holders, where the k-th encrypted attribute sum is obtained by the k-th holder homomorphically encrypting, with the third party's public key, the sum of the values of attribute i in the k-th original matrix; a global-sum determination module, configured to decrypt the encrypted total with the private key to obtain the global sum of attribute i; a mean determination module, configured to determine the global mean of attribute i from the global sum; and a mean broadcast module, configured to broadcast the global mean to the M holders so that each of them globally zero-means attribute i in its original matrix.
In different embodiments of the above implementation, the encrypted-sum receiving module may be configured to receive the encrypted total from one of the M holders, or from a further party other than the M holders and the third party.
According to different embodiments, the receiving unit 52 may be configured to receive the homomorphic-sum result from one of the M holders, or from a further party other than the M holders and the third party.
In one embodiment, in the k-th original matrix one row corresponds to an attribute and one column corresponds to a business object; in that case, the covariance matrix obtained by the decryption unit 53 is the product of the joint matrix, assumed to be formed by horizontally concatenating the M holders' respective central matrices, and its transpose; correspondingly, the total reduced matrix is the matrix obtained by horizontally concatenating the M reduced matrices.
In another embodiment, in the k-th original matrix one row corresponds to a business object and one column corresponds to an attribute; in that case, the covariance matrix obtained by the decryption unit 53 is the product of the joint matrix, assumed to be formed by vertically concatenating the M holders' respective central matrices, and its transpose; correspondingly, the total reduced matrix is the matrix obtained by vertically concatenating the M reduced matrices.
According to one implementation, the determination unit 54 is specifically configured to: determine multiple eigenvalues of the covariance matrix and the corresponding eigenvectors; select, from these eigenvalues, the d largest ones as the d target eigenvalues; and form the dimensionality-reduction transformation matrix from the d eigenvectors corresponding to the d target eigenvalues.
According to yet another embodiment, an apparatus for multi-party joint dimensionality reduction of private data is provided. The private data is distributed among M holders, and the apparatus is deployed at any k-th holder among the M data holders, where the k-th holder stores a k-th original matrix composed of the attribute values of a number of business objects for D predetermined attributes. The apparatus cooperates with a third party other than the M holders to perform the data dimensionality reduction. It should be understood that each holder, as well as the third party, can be implemented as any device, platform, or device cluster with data storage, computing, and processing capabilities. Fig. 6 is a schematic block diagram of an apparatus for joint dimensionality reduction deployed at the k-th holder, according to one embodiment. As shown in Fig. 6, the apparatus 600 deployed at the k-th holder includes: a mean-normalization unit 61, configured to globally zero-mean each attribute of the k-th original matrix across the M holders to obtain the k-th central matrix; an encryption unit 62, configured to compute the product matrix of the k-th central matrix and its transpose and homomorphically encrypt this product matrix with the public key of the third party other than the M holders to obtain the k-th encrypted matrix product; a providing unit 63, configured to provide the k-th encrypted matrix product so that the third party obtains the homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders; a receiving unit 64, configured to receive the dimensionality-reduction transformation matrix from the third party; and a dimensionality-reduction unit 65, configured to process the k-th original matrix with the transformation matrix to obtain the k-th reduced matrix, used to perform business prediction and analysis on the business objects through machine learning.
According to one implementation, the apparatus 600 further includes a sending unit (not shown), configured to provide the k-th reduced matrix to the third party so that it determines the total reduced matrix obtained by reducing the D attributes of all business objects.
According to one implementation, the mean-normalization unit 61 performs the global zero-mean normalization cooperatively with the corresponding units of the other holders by means of multi-party secure computation (MPC).
According to another implementation, the mean-normalization unit 61 specifically includes (not shown): a sum computation module, configured to, for any attribute i among the D attributes, compute the sum of the values of attribute i in the k-th original matrix; a sum encryption module, configured to homomorphically encrypt the sum with the public key to obtain the k-th encrypted attribute sum; an encrypted-sum providing module, configured to provide the k-th encrypted attribute sum so that the third party obtains the homomorphic-sum result of the M encrypted attribute sums provided by the M holders; a mean receiving module, configured to receive from the third party the global mean of attribute i determined from the homomorphic-sum result; and a mean processing module, configured to subtract the global mean from every element corresponding to attribute i in the k-th original matrix, so as to globally zero-mean attribute i.
Further, in one example, the encrypted-sum providing module is configured to send the k-th encrypted attribute sum to a computation executor, so that the executor homomorphically adds the M encrypted attribute sums and sends the homomorphic-sum result to the third party, the executor being one of the M holders other than the k-th holder, or a further party other than the M holders and the third party.
In another example, the encrypted-sum providing module is configured to receive, from the M-1 other holders among the M holders, the corresponding M-1 encrypted attribute sums, homomorphically add the k-th encrypted attribute sum and these M-1 encrypted attribute sums, and send the homomorphic-sum result to the third party.
Through the various approaches above, the encrypted-sum providing module provides the k-th encrypted attribute sum so that the third party obtains the homomorphic-sum result of the M encrypted attribute sums provided by the M holders.
In one embodiment, the providing unit 63 is specifically configured to send the k-th encrypted matrix product to a computation executor, so that the executor homomorphically adds the M encrypted matrix products and sends the homomorphic-sum result to the third party, the executor being one of the M holders other than the k-th holder, or a further party other than the M holders and the third party.
In another embodiment, the providing unit 63 is specifically configured to receive, from the M-1 other holders among the M holders, the corresponding M-1 encrypted matrix products, homomorphically add the k-th encrypted matrix product and these M-1 encrypted matrix products, and send the homomorphic-sum result to the third party.
Through the various approaches above, the providing unit 63 provides the k-th encrypted matrix product so that the third party obtains the homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders.
In one embodiment, in the k-th original matrix one row corresponds to an attribute and one column corresponds to a business object; in that case, the dimensionality-reduction unit 65 performs the reduction by multiplying the transformation matrix by the k-th original matrix.
In another embodiment, in the k-th original matrix one row corresponds to a business object and one column corresponds to an attribute; in that case, the dimensionality-reduction unit 65 performs the reduction by multiplying the k-th original matrix by the transformation matrix.
Through the above apparatuses, privacy-preserving multi-party joint dimensionality reduction is achieved.
According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Figs. 3 and 4.
According to yet another embodiment, a computing device is also provided, including a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method described in conjunction with Figs. 3 and 4 is implemented.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on a computer-readable medium, or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within its scope of protection.
Claims (22)
- 1. A method for multi-party joint dimensionality reduction of private data, the private data being distributed among M holders, wherein any k-th holder stores a k-th original matrix composed of the attribute values of a number of business objects for D predetermined attributes, the method being performed by a third party other than the M holders and comprising: receiving a homomorphic-sum result of M encrypted matrix products respectively provided by the M holders, wherein the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the third party's public key, the product of a k-th central matrix and its transpose, the k-th central matrix being obtained by globally zero-meaning each attribute of the k-th original matrix across the M holders; decrypting the homomorphic-sum result with the private key corresponding to the public key to obtain a covariance matrix; determining a dimensionality-reduction transformation matrix based on the covariance matrix and a target dimension d; and broadcasting the transformation matrix to the M holders, so that each holder processes its original matrix with the transformation matrix to obtain a corresponding reduced matrix, the reduced matrix being used to perform business prediction and analysis on the business objects through machine learning.
- 2. The method of claim 1, further comprising: receiving M reduced matrices respectively provided by the M holders; and determining, based on the M reduced matrices, a total reduced matrix obtained by reducing the D attributes of all business objects.
- 3. The method of claim 1, wherein the business object is one of: a user, a merchant, a commodity, an event; and the business prediction and analysis comprises predicting a classification or regression value of the business object.
- 4. The method of claim 1, further comprising, before obtaining the homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders: for any attribute i among the D attributes, receiving an encrypted total for attribute i, the encrypted total being obtained by homomorphically adding M encrypted attribute sums provided by the M holders, wherein the k-th encrypted attribute sum is obtained by the k-th holder homomorphically encrypting, with the third party's public key, the sum of the values of attribute i in the k-th original matrix; decrypting the encrypted total with the private key to obtain a global sum of attribute i; determining a global mean of attribute i from the global sum; and broadcasting the global mean to the M holders, so that each of them globally zero-means attribute i in its original matrix.
- 5. The method of claim 4, wherein receiving the encrypted total for attribute i comprises: receiving the encrypted total from one of the M holders; or receiving the encrypted total from a further party other than the M holders and the third party.
- 6. The method of claim 1, wherein receiving the homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders comprises: receiving the homomorphic-sum result from one of the M holders; or receiving the homomorphic-sum result from a further party other than the M holders and the third party.
- 7. The method of claim 2, wherein in the k-th original matrix one row corresponds to an attribute and one column corresponds to a business object; the covariance matrix is the product of a joint matrix, assumed to be formed by horizontally concatenating the M holders' respective central matrices, and its transpose; and determining the total reduced matrix obtained by reducing the D attributes of all business objects comprises: horizontally concatenating the M reduced matrices to obtain the total reduced matrix.
- 8. The method of claim 2, wherein in the k-th original matrix one row corresponds to a business object and one column corresponds to an attribute; the covariance matrix is the product of a joint matrix, assumed to be formed by vertically concatenating the M holders' respective central matrices, and its transpose; and determining the total reduced matrix obtained by reducing the D attributes of all business objects comprises: vertically concatenating the M reduced matrices to obtain the total reduced matrix.
- 9. The method of claim 1, wherein determining the dimensionality-reduction transformation matrix based on the covariance matrix and the target dimension comprises: determining multiple eigenvalues of the covariance matrix and corresponding eigenvectors; selecting, from the eigenvalues, the d largest ones as d target eigenvalues; and forming the dimensionality-reduction transformation matrix from the d eigenvectors corresponding to the d target eigenvalues.
- 10. A method for multi-party joint dimensionality reduction of private data, the private data being distributed among M holders, wherein any k-th holder stores a k-th original matrix composed of the attribute values of a number of business objects for D predetermined attributes, the method being performed by the k-th holder and comprising: globally zero-meaning each attribute of the k-th original matrix across the M holders to obtain a k-th central matrix; computing the product matrix of the k-th central matrix and its transpose, and homomorphically encrypting the product matrix with the public key of a third party other than the M holders to obtain a k-th encrypted matrix product; providing the k-th encrypted matrix product, so that the third party obtains a homomorphic-sum result of M encrypted matrix products respectively provided by the M holders; receiving a dimensionality-reduction transformation matrix from the third party; and processing the k-th original matrix with the transformation matrix to obtain a k-th reduced matrix, used to perform business prediction and analysis on the business objects through machine learning.
- 11. The method of claim 10, further comprising, after obtaining the k-th reduced matrix: providing the k-th reduced matrix to the third party, so that it determines a total reduced matrix obtained by reducing the D attributes of all business objects.
- 12. The method of claim 10, wherein globally zero-meaning each attribute of the k-th original matrix across the M holders comprises: for any attribute i among the D attributes, computing the sum of the values of attribute i in the k-th original matrix; homomorphically encrypting the sum with the public key to obtain a k-th encrypted attribute sum; providing the k-th encrypted attribute sum, so that the third party obtains a homomorphic-sum result of the M encrypted attribute sums provided by the M holders; receiving from the third party a global mean of attribute i determined from the homomorphic-sum result; and subtracting the global mean from every element corresponding to attribute i in the k-th original matrix, so as to globally zero-mean attribute i.
- 13. The method of claim 12, wherein providing the k-th encrypted attribute sum, so that the third party obtains the homomorphic-sum result of the M encrypted attribute sums provided by the M holders, comprises: sending the k-th encrypted attribute sum to a computation executor, so that the executor homomorphically adds the M encrypted attribute sums and sends the homomorphic-sum result to the third party, wherein the executor is one of the M holders other than the k-th holder, or a further party other than the M holders and the third party.
- 14. The method of claim 12, wherein providing the k-th encrypted attribute sum, so that the third party obtains the homomorphic-sum result of the M encrypted attribute sums provided by the M holders, comprises: receiving, from M-1 other holders among the M holders, the corresponding M-1 encrypted attribute sums; homomorphically adding the k-th encrypted attribute sum and the M-1 encrypted attribute sums; and sending the homomorphic-sum result to the third party.
- 15. The method of claim 10, wherein providing the k-th encrypted matrix product, so that the third party obtains the homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders, comprises: sending the k-th encrypted matrix product to a computation executor, so that the executor homomorphically adds the M encrypted matrix products and sends the homomorphic-sum result to the third party, wherein the executor is one of the M holders other than the k-th holder, or a further party other than the M holders and the third party.
- 16. The method of claim 10, wherein providing the k-th encrypted matrix product, so that the third party obtains the homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders, comprises: receiving, from M-1 other holders among the M holders, the corresponding M-1 encrypted matrix products; homomorphically adding the k-th encrypted matrix product and the M-1 encrypted matrix products; and sending the homomorphic-sum result to the third party.
- 17. The method of claim 10, wherein in the k-th original matrix one row corresponds to an attribute and one column corresponds to a business object, and processing the k-th original matrix with the transformation matrix comprises: multiplying the transformation matrix by the k-th original matrix.
- 18. The method of claim 10, wherein in the k-th original matrix one row corresponds to a business object and one column corresponds to an attribute, and processing the k-th original matrix with the transformation matrix comprises: multiplying the k-th original matrix by the transformation matrix.
- 19. An apparatus for multi-party joint dimensionality reduction of private data, the private data being distributed among M holders, wherein any k-th holder stores a k-th original matrix composed of the attribute values of a number of business objects for D predetermined attributes, the apparatus being deployed at a third party other than the M holders and comprising: a receiving unit, configured to receive a homomorphic-sum result of M encrypted matrix products respectively provided by the M holders, wherein the k-th encrypted matrix product provided by the k-th holder is obtained by homomorphically encrypting, with the third party's public key, the product of a k-th central matrix and its transpose, the k-th central matrix being obtained by globally zero-meaning each attribute of the k-th original matrix across the M holders; a decryption unit, configured to decrypt the homomorphic-sum result with the private key corresponding to the public key to obtain a covariance matrix; a determination unit, configured to determine a dimensionality-reduction transformation matrix based on the covariance matrix and a target dimension d; and a broadcast unit, configured to broadcast the transformation matrix to the M holders, so that each holder processes its original matrix with the transformation matrix to obtain a corresponding reduced matrix, the reduced matrix being used to perform business prediction and analysis on the business objects through machine learning.
- 20. An apparatus for multi-party joint dimensionality reduction of private data, the private data being distributed among M holders, wherein any k-th holder stores a k-th original matrix composed of the attribute values of a number of business objects for D predetermined attributes, the apparatus being deployed at the k-th holder and comprising: a mean-normalization unit, configured to globally zero-mean each attribute of the k-th original matrix across the M holders to obtain a k-th central matrix; an encryption unit, configured to compute the product matrix of the k-th central matrix and its transpose and homomorphically encrypt the product matrix with the public key of a third party other than the M holders to obtain a k-th encrypted matrix product; a providing unit, configured to provide the k-th encrypted matrix product, so that the third party obtains a homomorphic-sum result of the M encrypted matrix products respectively provided by the M holders; a receiving unit, configured to receive a dimensionality-reduction transformation matrix from the third party; and a dimensionality-reduction unit, configured to process the k-th original matrix with the transformation matrix to obtain a k-th reduced matrix, used to perform business prediction and analysis on the business objects through machine learning.
- 21. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-18.
- 22. A computing device comprising a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method of any one of claims 1-18 is implemented.
Applications Claiming Priority (2)
- CN202010220436.7 — priority date 2020-03-25
- CN202010220436.7A (granted as CN111400766B), filed 2020-03-25: Method and apparatus for multi-party joint dimensionality reduction of private data
Publications (1)
- WO2021190424A1, published 2021-09-30
Family ID: 71429141
Family Applications (1)
- PCT/CN2021/081962 (WO2021190424A1), filed 2021-03-22: Method and apparatus for multi-party joint dimensionality reduction of private data
Country Status (2)
- CN: CN111400766B
- WO: WO2021190424A1
Also Published As
- CN111400766A, published 2020-07-10
- CN111400766B, granted 2021-08-06
Legal Events
- 121: EPO informed by WIPO that EP was designated in this application (Ref: 21774468, EP, A1)
- NENP: Non-entry into the national phase (DE)
- 122: PCT application non-entry into European phase (Ref: 21774468, EP, A1)