CN110889139A - Method and device for multi-party combined dimensionality reduction processing aiming at user privacy data - Google Patents

Method and device for multi-party combined dimensionality reduction processing aiming at user privacy data Download PDF

Info

Publication number
CN110889139A
CN110889139A CN201911174422.XA CN201911174422A
Authority
CN
China
Prior art keywords
matrix
kth
holding
party
transformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911174422.XA
Other languages
Chinese (zh)
Other versions
CN110889139B (en)
Inventor
刘颖婷
陈超超
王力
周俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911174422.XA priority Critical patent/CN110889139B/en
Publication of CN110889139A publication Critical patent/CN110889139A/en
Application granted granted Critical
Publication of CN110889139B publication Critical patent/CN110889139B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a method and an apparatus for multi-party joint dimension reduction of user privacy data, where each of multiple data holders locally owns a portion of the user data as private data. To keep each holder's private data secure, the covariance matrix that would be formed from the holders' private data is decomposed into matrices that each holder can compute locally or compute securely via secret-shared matrix multiplication (SMM), and the eigen matrix of the covariance matrix is determined jointly through secure multi-party computation (MPC). Each holder can then reduce the dimensionality of its local data based on the eigen matrix, finally forming the dimension-reduced user feature data. In this way, the security of the user privacy data is ensured.

Description

Method and device for multi-party combined dimensionality reduction processing aiming at user privacy data
Technical Field
One or more embodiments of the present specification relate to the field of machine learning, and more particularly, to a method and apparatus for multi-party federated dimension reduction for private data.
Background
Data required for machine learning often span multiple platforms and domains. For example, in a machine-learning-based merchant classification scenario, an electronic payment platform holds merchants' transaction flow data, an e-commerce platform stores their sales data, and a banking institution holds their loan data. Such data frequently exist as isolated islands. Because of industry competition, data security, user privacy and similar concerns, data integration meets great resistance, and how to integrate the data scattered across platforms without leaking them has become a challenge.
On the other hand, as the amount of data grows, the dimensionality of the training data becomes larger and larger. Although large amounts of high-dimensional data can enrich the training samples for machine learning, high-dimensional data often contain redundant information. Such redundancy contributes little to the learning effect, while the resulting high-dimensional features may cause a "dimension explosion" that makes the machine learning model hard to handle and hurts training efficiency. Therefore, when a model is trained and used, high-dimensional sample features are often reduced to low-dimensional features while losing as little information as possible.
Principal component analysis (PCA) is a statistical method for simplifying data sets. It applies an orthogonal transformation to linearly transform the observations of a set of possibly correlated variables, projecting them onto the values of a set of linearly uncorrelated variables called principal components. PCA can reduce the dimensionality of a data set while retaining the features that contribute most to its variance. Therefore, in practice, the PCA method is often used to reduce the dimensionality of high-dimensional features.
However, PCA typically requires that a uniform transformation and principal component extraction be applied to all of the data. When multiple parties each hold part of the training data and wish to train a model jointly, how to perform feature dimension reduction with PCA without revealing their private data becomes a problem to be solved.
Therefore, an improved scheme is desired that performs multi-party joint dimension reduction on private data while ensuring that the private data are not leaked.
Disclosure of Invention
One or more embodiments of this specification describe a method for multi-party joint dimension reduction of private data, allowing multiple parties to perform feature dimension reduction together while ensuring that their respective private data are not leaked.
According to a first aspect, a method for performing multi-party joint dimensionality reduction processing on user privacy data is provided, and the method is executed by any kth holder of M data holders, wherein the kth holder stores attribute values of multiple user attributes of N users, and M-1 other holders of the M data holders respectively store attribute values of other user attributes of the N users; the method comprises the following steps:
constructing a kth original matrix based on attribute values of a plurality of user attributes of the N users according to a preset user sequence;
carrying out zero-mean processing on each attribute in the multiple user attributes locally to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
locally calculating a first matrix obtained by multiplying the kth central matrix by its transpose, and obtaining a plurality of second matrices by performing secret-shared matrix multiplication (SMM) with each of the M-1 other holding parties based on their respective central matrices; splicing the first matrix and the plurality of second matrices to form a kth decomposition matrix; wherein the sum of the kth decomposition matrix and the corresponding decomposition matrices of the M-1 other holding parties forms the covariance matrix corresponding to the joint matrix;
determining each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on a safe multi-party calculation MPC, and transforming the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
determining a dimension reduction transformation matrix based on the sum of transformation matrices respectively obtained by the M data holding parties, the plane rotation matrices and a dimension reduction target dimension d', and determining a kth dimension reduction transformation sub-matrix corresponding to the kth holding party in the dimension reduction transformation matrix;
processing the kth original matrix by using the kth dimensionality reduction transformation sub-matrix to obtain a kth dimensionality reduction matrix;
and summing the dimensionality reduction matrixes in all the holding parties by using the MPC mode to obtain a user characteristic matrix after dimensionality reduction processing is carried out on all the user attributes of the N users, wherein the user characteristic matrix is used for carrying out user analysis.
According to one embodiment, one row of the k-th original matrix corresponds to one attribute, and one column corresponds to one user; in such a case, the zero-averaging process may include, for each row of the k-th original matrix, calculating a mean value of the row, and subtracting the mean value from all elements of the row, thereby obtaining the k-th central matrix.
In one embodiment, the joint matrix is a matrix formed by longitudinally splicing the k-th central matrix and the central matrix corresponding to each of the M-1 other holding parties.
According to one embodiment, the kth decomposition matrix is obtained by stitching:
dividing the kth decomposition matrix to be formed into a square matrix of M × M blocks;
and filling a block at a k-th row and a k-th column position of the square matrix with the first matrix, filling blocks at other positions of the k-th row and the k-th column with the plurality of second matrices calculated together with other possessors, respectively, filling all blocks at other positions in the square matrix with 0, and taking the filled square matrix as the k-th decomposition matrix.
In one embodiment, each position on the non-diagonal line in the covariance matrix comprises a first position; the plane rotation matrix corresponding to the first position may be determined by:
determining a second position and a third position obtained by respectively mapping the first position on a diagonal line in a transverse direction and a longitudinal direction;
obtaining a second calculation result of a difference between the element value of the second position and the element value of the third position in the covariance matrix after the previous iteration, wherein the second calculation result is obtained by adding a second holding party and a third holding party in the M data holding parties through secret sharing, the second holding party possesses the element value of the second position, and the third holding party possesses the element value of the third position;
obtaining a first calculation result by utilizing the secret sharing multiplication and a first holding party holding the element value of the first position, wherein the first calculation result is the ratio of the element value of the first position to the second calculation result;
and according to the first calculation result, obtaining a rotation angle parameter of the plane rotation matrix, and further determining the plane rotation matrix corresponding to the first position.
In one embodiment, the dimension reduction transformation matrix is determined as follows:
summing the transformation matrixes obtained by the M data holders respectively to obtain a sum matrix, and determining elements on a diagonal line of the sum matrix as a plurality of eigenvalues of the covariance matrix;
multiplying and superposing the plane rotation matrixes to obtain an intrinsic matrix;
determining, from the plurality of eigenvalues, the d' eigenvalues with the largest values as target eigenvalues, and determining the d' eigenvectors corresponding to the target eigenvalues from the eigen matrix; the d' eigenvectors constitute the dimension reduction transformation matrix.
According to one embodiment, determining the kth dimension reduction transformation sub-matrix corresponding to the kth holding party in the dimension reduction transformation matrix comprises:
determining the arrangement range of the k-th central matrix in the combined matrix;
and selecting a part corresponding to the arrangement range from the dimension reduction transformation matrix to form the kth dimension reduction transformation sub-matrix.
According to a second aspect, a method for multi-party joint dimension reduction processing on user privacy data is provided, and the method is executed by any kth holder of M data holders, wherein the kth holder stores attribute values of D user attributes of a plurality of users, and M-1 other holders of the M data holders respectively store attribute values of the D user attributes of other users; the method comprises the following steps:
constructing a kth original matrix based on attribute values of D user attributes of the plurality of users according to a preset user attribute sequence;
performing zero-mean processing on each attribute in the D user attributes by a safe multi-party calculation MPC to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
locally calculating a matrix obtained by multiplying the kth central matrix by a transposed matrix of the kth central matrix to be used as a kth decomposition matrix; the sum of the k decomposition matrix and the corresponding decomposition matrix in the M-1 other holding parties forms a covariance matrix of the joint matrix;
determining each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on a safe multi-party calculation MPC, and transforming the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
determining a dimension reduction transformation matrix based on the sum of transformation matrices obtained by the M data holders, the plane rotation matrices and the dimension reduction target dimension;
processing the kth original matrix by using the dimensionality reduction transformation matrix to obtain a kth dimensionality reduction matrix;
and splicing the dimensionality reduction matrixes in all the holding parties to obtain a user characteristic matrix obtained after dimensionality reduction treatment is carried out on the D user attributes of all the users, wherein the user characteristic matrix is used for carrying out user analysis.
According to one embodiment, one row of the k-th original matrix corresponds to one attribute, and one column corresponds to one user; in such a case, the zero-averaging process may include:
locally calculating the sum of each row of the kth original matrix to form a D-dimensional row-sum vector;
jointly summing, with the other holding parties in an MPC (secure multi-party computation) manner, the row-sum vectors and the sample numbers of the holding parties, to obtain a total row-sum vector and a total sample number;
obtaining a total mean vector, which is the total row-sum vector divided by the total number of samples;
and for any j-th row in the k-th original matrix, subtracting j-th element in the total mean vector from each element in the j-th row to obtain a k-th central matrix.
Further, the above-mentioned total mean vector may be obtained by: calculating the total mean vector from the total row-sum vector and the total sample number; or receiving the total mean vector broadcast by another party, where the other party is one of the M-1 other holding parties or, alternatively, a neutral third party.
In one embodiment, in a case where one row of the kth original matrix corresponds to one attribute and one column corresponds to one sample, the joint matrix is a matrix formed by laterally splicing the kth central matrix and the central matrices corresponding to the M-1 other possessors, respectively.
According to one embodiment, each location on the off-diagonal in the covariance matrix comprises a first location; the plane rotation matrix corresponding to the first position may be determined as follows:
determining a second position and a third position obtained by respectively mapping the first position on a diagonal line in a transverse direction and a longitudinal direction;
locally calculating the difference between the element at the second position and the element at the third position in the kth decomposition matrix after the previous iteration to obtain a kth difference value; the sum of the difference values calculated by each holding party is calculated by using the secret sharing addition and the other holding parties in a coordinated mode, so that a second calculation result is obtained;
obtaining elements and values of the first positions corresponding to all holding parties by adopting secret sharing addition based on the elements of the first positions in the k-th decomposition matrix after the previous iteration;
calculating a ratio of the element sum value to the second calculation result as a first calculation result;
and according to the first calculation result, obtaining a rotation angle parameter of the plane rotation matrix, and further determining the plane rotation matrix corresponding to the first position.
According to one embodiment, the dimension-reducing transformation matrix is determined as follows:
summing the transformation matrixes obtained by the M data holders respectively to obtain a sum matrix, and determining elements on a diagonal line of the sum matrix as a plurality of eigenvalues of the covariance matrix;
multiplying and superposing the plane rotation matrixes to obtain an intrinsic matrix;
determining, from the plurality of eigenvalues, the d' eigenvalues with the largest values as target eigenvalues, and determining the d' eigenvectors corresponding to the target eigenvalues from the eigen matrix; the d' eigenvectors constitute the dimension reduction transformation matrix.
In one embodiment, the overall dimensionality reduction matrix is obtained by vertically splicing the transposed dimensionality reduction matrices of the respective holding parties.
According to a third aspect, a device for performing multi-party joint dimensionality reduction processing on user privacy data is provided, and the device is deployed in a kth holding party of any one of M data holding parties, wherein the kth holding party stores attribute values of multiple user attributes of N users, and M-1 other holding parties of the M data holding parties respectively store attribute values of other user attributes of the N users; the device comprises:
the original matrix construction unit is configured to construct a kth original matrix based on attribute values of a plurality of user attributes of the N users according to a preset user sequence;
the centralized processing unit is configured to locally perform zero-mean processing on each attribute in the multiple attributes to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
the decomposition matrix forming unit is configured to locally calculate a first matrix obtained by multiplying the kth central matrix by its transpose, and to obtain a plurality of second matrices by performing secret-shared matrix multiplication (SMM) with each of the M-1 other holding parties based on their respective central matrices; splice the first matrix and the plurality of second matrices to form a kth decomposition matrix; wherein the sum of the kth decomposition matrix and the corresponding decomposition matrices of the M-1 other holding parties forms the covariance matrix corresponding to the joint matrix;
the rotation transformation unit is configured to determine each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on the safe multi-party calculation MPC, and transform the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
a dimension reduction matrix determining unit configured to determine a dimension reduction transformation matrix based on a sum of transformation matrices obtained by the M data holders, the plane rotation matrices, and a dimension reduction target dimension d', and determine a kth dimension reduction transformation sub-matrix corresponding to the kth holder in the dimension reduction transformation matrix;
the dimensionality reduction processing unit is configured to process the kth original matrix by using the kth dimensionality reduction transformation sub-matrix to obtain a kth dimensionality reduction matrix;
and the dimension reduction comprehensive unit is configured to sum dimension reduction matrixes in all the holding parties by using the MPC mode to obtain a user feature matrix after dimension reduction processing is performed on all the attributes of the N users, and the user feature matrix is used for user analysis.
According to a fourth aspect, a device for performing multi-party joint dimension reduction on user privacy data is provided, where the device is deployed in a kth holding party of any one of M data holding parties, where the kth holding party stores attribute values of D user attributes of multiple users, and M-1 other holding parties of the M data holding parties respectively store attribute values of the D user attributes of other users; the device comprises:
the original matrix construction unit is configured to construct a kth original matrix based on attribute values of D user attributes of the plurality of users according to a preset user attribute sequence;
the centralized processing unit is configured to perform zero-mean processing on each attribute in the D user attributes through the safe multi-party calculation MPC to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
a decomposition matrix forming unit configured to locally calculate a matrix obtained by multiplying the kth center matrix by a transpose matrix thereof as a kth decomposition matrix; the sum of the k decomposition matrix and the corresponding decomposition matrix in the M-1 other holding parties forms a covariance matrix of the joint matrix;
the rotation transformation unit is configured to determine each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on the safe multi-party calculation MPC, and transform the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
a dimension reduction matrix determination unit configured to determine a dimension reduction transformation matrix based on a sum of transformation matrices obtained by the M data holders, the plane rotation matrices, and a dimension reduction target dimension;
the dimension reduction processing unit is configured to process the kth original matrix by using the dimension reduction transformation matrix to obtain a kth dimension reduction matrix;
and the dimension reduction comprehensive unit is configured to splice dimension reduction matrixes in all the holding parties to obtain a user feature matrix after dimension reduction processing is carried out on the D user attributes of all the users, and the user feature matrix is used for carrying out user analysis.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first or second aspect.
According to the method and the apparatus provided by the embodiments of this specification, in order to ensure the security of each holder's private data, the covariance matrix to be formed is decomposed into decomposition matrices that each holder can calculate locally or calculate securely by secret-shared matrix multiplication, and the eigen matrix of the covariance matrix is determined jointly through secure multi-party computation (MPC). Each holder can then reduce the dimensionality of its local data based on the eigen matrix, finally forming the overall dimension-reduced user feature data, so that the security of the private data is ensured.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 shows the implementation of the Principal Component Analysis (PCA) method;
FIG. 3 illustrates a flow diagram of a joint dimension reduction method applied to a longitudinal distribution of data, according to one embodiment;
FIG. 4 illustrates a flow diagram of a joint dimension reduction method applied to a lateral distribution of data, according to one embodiment;
FIG. 5 shows a schematic block diagram of an apparatus deployed in a kth owner for federated dimension reduction, in accordance with one embodiment;
FIG. 6 shows a schematic block diagram of an apparatus deployed in a kth owner for federated dimension reduction, in accordance with one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. As shown in fig. 1, in a shared learning scenario, training data is provided in common by multiple owners 1, 2, …, M, each owner having a portion of the training data. The training data may be a vertical distribution, e.g., data where each holder has different attribute items of the same batch of samples; it may also be a horizontal distribution, for example, each holder has data of the same attribute item for different samples. In order to secure the private data, each owner needs to keep the private data locally, not output plaintext data, and not perform plaintext aggregation.
In the context of an embodiment of the present specification, the above multiple owners jointly perform the dimensionality reduction of the training data by using a Principal Component Analysis (PCA) method. As known to those skilled in the art, the core step of the PCA method is to form a covariance matrix based on the data matrix of the dimension to be reduced and to solve the eigenvalues and eigenvectors of the covariance matrix. In the embodiment of the present specification, in order to ensure the security of private data of each owner, a covariance matrix to be formed is decomposed into decomposition matrices that can be locally calculated by each owner, and local calculation is performed; and the eigen matrix of the covariance matrix is determined together by a safe multi-party MPC calculation mode. In this way, each owner can perform dimensionality reduction on the local data based on the eigen matrix, and eventually form overall dimensionality reduced data.
In order to describe the above process more clearly, first, an implementation of the principal component analysis PCA method is described with reference to fig. 2.
In fig. 2, it is assumed that there are D-dimensional feature data of N samples, and each dimension feature data may correspond to an attribute value of an attribute of a sample. Assume that it is desired to reduce the D-dimensional feature to the D 'dimension, where D' < D.
Then, first, in step 201, an original matrix Y may be formed based on the above D-dimensional feature data of the N samples. For example, each attribute may be represented by a row and each sample by a column, thus forming an original matrix Y of D rows and N columns.
Then, in step 202, a zero-averaging process, or centering process, is performed on the original matrix to obtain a central matrix X. The goal of the centering process is to make the mean of the attribute values of all N samples on any one of the D-dimensional attributes to be 0. Operationally, for attribute j, the average of all N samples for the attribute value of attribute j may be first found, and then the average is subtracted from the element corresponding to attribute j in the original matrix. For example, in the case that a row represents an attribute, for each row of the original matrix, the mean value of the row is obtained, and then the corresponding mean value is subtracted from each element of the row, so as to obtain the zero-averaged central matrix X.
Then, in step 203, a covariance matrix A is calculated, specifically A = XX^T, where X^T is the transpose of the matrix X.
Next, at step 204, the eigenvalues λ (eigen value) and eigenvectors v (eigen vector) of the covariance matrix A are solved. Mathematically, the eigenvalues λ and eigenvectors v of the covariance matrix a satisfy:
the Av ═ λ v (1) covariance matrix is a symmetric matrix, and there are a plurality of eigenvalues and a corresponding plurality of eigenvectors that are orthogonal to each other. The eigenvalues and eigenvectors of the covariance matrix can be solved by a variety of algorithms. The plurality of eigenvectors may form an eigenmatrix.
Based on the eigenvalues and eigenvectors obtained above, a dimension-reducing transformation matrix T is determined in step 205.
It is to be understood that, in a physical sense, one eigenvector corresponds to one projection direction in the original D-dimensional space, and the eigenvectors being mutually orthogonal means the projection directions are mutually orthogonal. The essence of PCA feature dimension reduction is to find d' orthogonal projection directions in the original D-dimensional space to serve as d' coordinate directions, and to project the original sample points into the d'-dimensional mapping space formed by these coordinate directions, such that the variance of the projected sample points is as large as possible. The variance after projection in each direction can be represented by the corresponding eigenvalue.
Based on this idea, the obtained eigenvalues may be sorted in descending order, the first d' eigenvalues taken out as target eigenvalues, and the d' eigenvectors corresponding to these d' target eigenvalues determined. These d' eigenvectors correspond to the d' coordinate directions selected for dimension reduction, and they form the dimension-reduction transformation matrix T.
Then, in step 205, a dimension-reduced matrix Y' is obtained by applying the dimension-reduction transformation matrix T to the original matrix Y; the matrix Y' has dimension d' × N. Thus, transforming the original matrix Y into the matrix Y' is equivalent to reducing the original D-dimensional features to d'-dimensional features.
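As a concrete reference for the flow of FIG. 2, the following is a minimal single-party NumPy sketch of ordinary (non-private) PCA as just described; np.linalg.eigh stands in here for whichever eigen-solver is used in step 204, and names such as d_prime are illustrative only.

```python
import numpy as np

def pca_reduce(Y, d_prime):
    """Plain single-party PCA as in FIG. 2: center, covariance, eigen-decompose, project."""
    # Y: D x N original matrix (one row per attribute, one column per sample)
    X = Y - Y.mean(axis=1, keepdims=True)      # step 202: zero-average each row
    A = X @ X.T                                # step 203: covariance matrix A = X X^T (D x D)
    eigvals, eigvecs = np.linalg.eigh(A)       # step 204: eigenvalues and eigenvectors of A
    top = np.argsort(eigvals)[::-1][:d_prime]  # indices of the d' largest eigenvalues
    T = eigvecs[:, top].T                      # step 205: d' x D dimension-reduction matrix T
    return T @ Y                               # dimension-reduced matrix Y', d' x N

Y = np.random.rand(10, 200)                    # toy data: D = 10 attributes, N = 200 samples
Y_reduced = pca_reduce(Y, d_prime=3)
print(Y_reduced.shape)                         # (3, 200)
```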
In performing principal component analysis, PCA, step 204 may be performed in a variety of ways, including by way of eigenvalue decomposition, to solve for the eigenvalues and eigenvectors of the covariance matrix.
The goal of eigenvalue decomposition is to find an orthogonal matrix U such that:
U^T A U = D    (2)
where D is a diagonal matrix, i.e., the non-diagonal elements are all 0. In this case, the jth element on the diagonal of the diagonal matrix D is the jth eigenvalue, and the vector of the jth column in the matrix U is the corresponding eigenvector.
In practice, the above orthogonal matrix may be found by means of Jacobi iteration. In the Jacobi iteration scheme, a plane rotation matrix P_ij is selected in each iteration to transform the covariance matrix A, i.e.:

A_{l+1} = P_ij^T A_l P_ij    (3)

where A_l is the covariance matrix of the l-th iteration and A_{l+1} is the covariance matrix of the (l+1)-th iteration. The plane rotation matrix P_ij takes the following form: its (i, i) and (j, j) elements are cos θ, its (i, j) element is -sin θ, its (j, i) element is sin θ, the remaining diagonal elements (outside rows and columns i and j) are 1, and all other elements not mentioned are 0. It is easy to verify that the plane rotation matrix P_ij is an orthogonal matrix.
In practical operation, a specific planar rotation matrix may be selected as follows.
For the covariance matrix A_l of the l-th iteration, the element position (i, j) to be processed can be determined from it; for example, the off-diagonal element a_ij of A_l with the maximum absolute value may be selected and its position (i, j) taken. Then the elements a_ij, a_ii and a_jj are taken from A_l and the following calculation is performed:

tan 2θ = 2a_ij / (a_ii - a_jj)    (4)

Through formula (4), the rotation angle θ can be calculated, so that sin θ and cos θ are obtained and filled into the corresponding positions, thereby determining the specific plane rotation matrix P_ij.
When the covariance matrix is iterated continuously with plane rotation matrices of the above form, the elements outside the main diagonal of the covariance matrix gradually tend to zero until a diagonal matrix D is obtained. At that point, the eigenvalues can be read off the diagonal matrix D, and the product of the plane rotation matrices serves as the orthogonal matrix U, from which each eigenvector is determined.
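For illustration, here is a small in-the-clear NumPy sketch of the Jacobi iteration just described, applied to an ordinary symmetric matrix; it is a generic textbook routine, not the multi-party protocol of the later embodiments, and the tolerance and iteration cap are arbitrary choices.

```python
import numpy as np

def jacobi_eigen(A, tol=1e-10, max_iter=500):
    """Diagonalize a symmetric matrix with Jacobi plane rotations (cf. formulas (2)-(4))."""
    A = A.astype(float).copy()
    n = A.shape[0]
    U = np.eye(n)                                   # accumulates the product of the P_ij
    for _ in range(max_iter):
        off = np.abs(A - np.diag(np.diag(A)))       # magnitudes of the off-diagonal elements
        i, j = np.unravel_index(np.argmax(off), off.shape)
        if off[i, j] < tol:                         # off-diagonal part is numerically zero
            break
        # rotation angle from tan(2*theta) = 2*a_ij / (a_ii - a_jj), cf. formula (4)
        theta = 0.5 * np.arctan2(2.0 * A[i, j], A[i, i] - A[j, j])
        P = np.eye(n)
        P[i, i] = P[j, j] = np.cos(theta)
        P[i, j], P[j, i] = -np.sin(theta), np.sin(theta)
        A = P.T @ A @ P                             # formula (3): A_{l+1} = P_ij^T A_l P_ij
        U = U @ P                                   # U converges to the orthogonal matrix of (2)
    return np.diag(A), U                            # eigenvalues and eigenvector matrix

S = np.random.rand(5, 5)
S = S + S.T                                         # symmetric test matrix
vals, U = jacobi_eigen(S)
print(np.allclose(U @ np.diag(vals) @ U.T, S))      # True (up to tolerance)
```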
The above describes how the original matrix is reduced in dimension by eigenvalue decomposition of the covariance matrix in the PCA method. However, in the shared learning scenario shown in FIG. 1, each data holder holds only part of the sample data and plaintext aggregation is not allowed, so a full matrix representing all the sample data cannot be formed directly, and it is difficult to directly obtain and solve the covariance matrix.
Therefore, in the embodiment of the present specification, the covariance matrix to be formed is decomposed into decomposition matrices that can be locally calculated by each owner, and local calculation is performed; then, an eigen matrix of the covariance matrix is determined jointly through a safe multi-party MPC calculation mode, and therefore joint dimensionality reduction is conducted.
The following describes specific implementation steps of joint dimensionality reduction for the cases of longitudinal distribution and transverse distribution of data, respectively.
FIG. 3 illustrates a flow diagram of a method for joint dimensionality reduction, which is applicable to the case of longitudinal distribution of data, according to one embodiment.
Assume that there are M data holders. In the case of longitudinal distribution of data, each holder possesses different attribute feature data of the same batch of samples. For example, in one specific example, one of the M data holders is an electronic payment platform. The platform has attribute values for attributes of the N users related to transaction flow, including, for example, number of payments, payment amount, payment object, and the like. Another holder of the M data holders is, for example, a banking institution that holds attribute values of the attributes of the N users related to the loan, including, for example, the amount of the loan, the number of times of the loan, the amount of the loan that has been returned, and the like. In this way, different attribute data of the same batch of samples (users) are distributed longitudinally to different data holders.
The method of fig. 3 is performed by any one of the M data holders, e.g., the kth holder. Correspondingly, the kth holder stores D of N sampleskThe attribute values of the item attributes and the attribute values of the other item attributes of the N same samples are stored in the other holders respectively.
To perform joint dimensionality reduction, as shown in FIG. 3, first, at step 301, the kth holder constructs a kth original matrix Y_k based on the attribute values of its D_k attributes of the N samples, in a predetermined sample order.
It should be understood that, since each owner possesses different attribute data of the same batch of N samples, in order to facilitate subsequent processing, each owner needs to align the attribute data in the dimension of the sample, that is, unify the order of the samples, and arrange the attribute data according to a predetermined sample order.
In one embodiment, in forming the original matrix, each row represents one attribute and each column one sample. Thus, the kth holder forms an original matrix of dimension D_k × N. The following description assumes this matrix arrangement.
It will be appreciated that each owner similarly forms the original matrix. If the original matrices of the respective owners are stitched along the longitudinal direction, an original full matrix Y can be formed:
Y = [ Y_1
      Y_2
      ⋮
      Y_M ]    (5)
the original full matrix is a D-by-N dimensional matrix, wherein each row represents an attribute and has D rows in total; each column represents a sample, and N columns are provided, wherein D is the number of all attribute items:
D = D_1 + D_2 + … + D_M    (6)
and the rows are aligned with respect to the sample order.
It should be understood, however, that the original full matrix is merely a matrix that is assumed to be formed for ease of description, as the respective owners do not perform a plaintext direct aggregation of the original data.
Next, at step 302, the kth holder locally performs zero-averaging on each of its D_k attributes to obtain the kth central matrix X_k.
It will be appreciated that, for each attribute j among the D_k attributes, the kth holder locally possesses the attribute values of all N samples for attribute j, so the zero-averaging for that attribute can be performed locally at the kth holder.
In the case of representing attributes by rows, for each row of the original matrix Y_k, the mean value of the row may be calculated and subtracted from all elements of that row, obtaining the zero-mean central matrix X_k.
It can be understood that each owner similarly performs zero equalization processing to obtain a corresponding central matrix. If the central matrices of the various owners are stitched along the vertical direction, a joint matrix X can be formed:
X = [ X_1
      X_2
      ⋮
      X_M ]    (7)
the joint matrix is also a D x N dimensional matrix.
It should be understood that the joint matrix is a zero-mean matrix, which can be used as a basis for calculating the covariance matrix of the multi-party joint. However, since the central matrix still reveals privacy, each owner cannot directly splice the central matrix. The joint matrix is merely a matrix assumed to be formed for convenience of description.
Assuming the above joint matrix were formed, the covariance matrix C = XX^T could be calculated as follows:

C = XX^T =
  [ X_1X_1^T  X_1X_2^T  …  X_1X_M^T ]
  [ X_2X_1^T  X_2X_2^T  …  X_2X_M^T ]
  [    ⋮          ⋮      ⋱     ⋮    ]
  [ X_MX_1^T  X_MX_2^T  …  X_MX_M^T ]    (8)
It is understood that the covariance matrix is a square matrix in the dimension D x D.
As can be seen from the rightmost representation of equation (8), the covariance matrix can be divided into M × M blocks, where every block in the kth block row and kth block column involves the kth holder's central matrix X_k. Thus, it is conceivable to split the covariance matrix C into several decomposition matrices that the holders can safely compute.
Specifically, in step 303, the kth holder may locally compute the matrix obtained by multiplying the kth central matrix X_k by its transpose X_k^T, namely X_kX_k^T. For simplicity, it is referred to as the first matrix.
In addition, the kth holder may use secret-shared matrix multiplication (SMM) with each of the other holders j, based on their respective central matrices, to obtain a plurality of cross matrices, simply referred to as second matrices. The second matrices obtained by the SMM operation between the kth holder and another jth holder comprise the product X_kX_j^T of the kth central matrix and the transpose of the jth central matrix, and the product X_jX_k^T of the jth central matrix and the transpose of the kth central matrix.
Thus, the kth holder can splice the first matrix and the plurality of second matrices to form the kth decomposition matrix C_k:

C_k =
  [ 0         …  X_1X_k^T  …  0        ]
  [ ⋮              ⋮            ⋮       ]
  [ X_kX_1^T  …  X_kX_k^T  …  X_kX_M^T ]
  [ ⋮              ⋮            ⋮       ]
  [ 0         …  X_MX_k^T  …  0        ]    (9)

in which only the kth block row and the kth block column are nonzero.
That is, the decomposition matrix is likewise divided into M × M blocks forming a square matrix: the first matrix X_kX_k^T fills the block at the (k, k) position, the second matrices X_kX_j^T and X_jX_k^T computed jointly with the other holders fill the remaining blocks of the kth block row and the kth block column, respectively, and the blocks at all remaining positions are filled with 0.
It will be appreciated that other owners may correspondingly derive their decomposition matrices. It is easy to verify that the sum of the corresponding decomposition matrices in all the holders can form or restore the covariance matrix C corresponding to the joint matrix X, that is:
C = C_1 + C_2 + … + C_M    (10)

When the kth holder builds the kth decomposition matrix as above, secret-shared matrix multiplication (SMM) is required. The SMM method is a known privacy-preserving matrix multiplication method by which two parties can obtain the result of a matrix product without either party's original matrix data being leaked. This process is briefly described below.
Assume that a first holder owns a matrix A and a second holder owns a matrix B. The first and second holders generate random matrices A' and B', respectively.

The first holder extracts the even-numbered columns of the random matrix A' to form A'_e, and the odd-numbered columns to form A'_o.

The second holder extracts the even-numbered rows of the random matrix B' to form B'_e, and the odd-numbered rows to form B'_o.
The first holder calculates A_1 and A_2 and sends them to the second holder, where:

A_1 = A + A';  A_2 = A'_e + A'_o    (11)

The second holder calculates B_1 and B_2 and sends them to the first holder, where:

B_1 = B' - B;  B_2 = B'_e - B'_o    (12)
the first owner locally computes P ═ (a + 2A') B1+(A2+A′o)B2
The second owner locally calculates Q ═ A1(2B-B′)-A2(B2+B′e)
The first and second owners then exchange P and Q. It can be verified that: p + Q ═ AB
Thus, both holders get the result of the matrix multiplication without exposing the original matrices a and B.
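The following NumPy sketch mirrors the exchange described above and checks numerically that P + Q = AB; it assumes the shared (inner) dimension is even so that the even/odd splits of A' and B' line up, and it is a toy illustration rather than a hardened implementation.

```python
import numpy as np

def smm_demo(A, B):
    """Secret-shared matrix multiplication as in equations (11)-(12): the two locally computed
    messages P and Q satisfy P + Q = AB. Assumes the shared dimension A.shape[1] is even."""
    rng = np.random.default_rng(0)
    Ap = rng.random(A.shape)                   # random matrix A' chosen by the first holder
    Bp = rng.random(B.shape)                   # random matrix B' chosen by the second holder
    Ae, Ao = Ap[:, 0::2], Ap[:, 1::2]          # "even" / "odd" columns of A'
    Be, Bo = Bp[0::2, :], Bp[1::2, :]          # "even" / "odd" rows of B'
    A1, A2 = A + Ap, Ae + Ao                   # sent by the first holder  (equation (11))
    B1, B2 = Bp - B, Be - Bo                   # sent by the second holder (equation (12))
    P = (A + 2 * Ap) @ B1 + (A2 + Ao) @ B2     # computed locally by the first holder
    Q = A1 @ (2 * B - Bp) - A2 @ (B2 + Be)     # computed locally by the second holder
    return P, Q

A, B = np.random.rand(3, 6), np.random.rand(6, 4)
P, Q = smm_demo(A, B)
print(np.allclose(P + Q, A @ B))               # True: the product is recovered from P + Q
```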
It is understood that the second matrices in the kth block row and kth block column of equation (9) can be calculated using the above SMM method, so each holder can compute and splice together its decomposition matrix. However, the decomposition matrix and the covariance matrix themselves may still reveal data privacy; therefore, the holders do not directly aggregate the decomposition matrices into the covariance matrix C according to equation (10), but instead perform the rotation transformations required for eigenvalue decomposition on their local decomposition matrices.
Then, in step 304, the kth holder determines, based on secure multi-party computation (MPC), the plane rotation matrices corresponding to the off-diagonal positions of the covariance matrix C, and transforms the kth decomposition matrix C_k with these plane rotation matrices to obtain the kth transformation matrix.
Specifically, suppose that in the current iteration a certain off-diagonal position (i, j) of the covariance matrix is selected, where i ≠ j. In practice, the positions of the covariance matrix may be traversed in turn as the iterations of successive rounds. For the position (i, j) selected in the current iteration, secure multi-party computation (MPC) is used to obtain

c_ij / (c_ii - c_jj)

which is called the first calculation result, where c_ij, c_ii and c_jj are the elements at the corresponding positions of the covariance matrix assumed to be formed after the previous iteration.
In one embodiment, the first calculation result may be obtained by secret-sharing addition and multiplication. Specifically, the currently selected position, hereinafter called the first position (i, j), may be projected laterally and longitudinally onto the diagonal to obtain the second position (i, i) and the third position (j, j). Suppose the element c_ii at the second position lies in the decomposition matrix locally maintained by a second holder, and the element c_jj at the third position lies in the decomposition matrix locally maintained by a third holder. The second and third holders can then use secret-sharing addition to obtain the result of c_ii - c_jj, referred to as the second calculation result, without revealing the original values of c_ii and c_jj. With the second calculation result obtained, secret-sharing multiplication is then used together with the first holder, which owns the first-position element c_ij, to carry out the joint operation and finally obtain the first calculation result.
In other embodiments, the first calculation result may also be obtained by other MPC manners, such as homomorphic encryption.
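As a toy illustration of the secret-sharing addition used for the second calculation result, the sketch below lets the two holders reveal only the difference c_ii - c_jj, not their individual values; real implementations work over a finite ring rather than floating point, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

c_ii, c_jj = 7.3, 2.1        # held privately by the "second" and "third" holder, respectively

# Each holder splits its signed contribution into two additive shares using a random mask.
r2 = rng.normal()            # random mask chosen by the second holder
share2_kept, share2_sent = r2, c_ii - r2         # keeps r2, sends c_ii - r2

r3 = rng.normal()            # random mask chosen by the third holder
share3_kept, share3_sent = r3, -c_jj - r3        # keeps r3, sends -c_jj - r3

# Each party adds what it kept to what it received; neither partial sum reveals c_ii or c_jj.
partial_second = share2_kept + share3_sent       # r2 - c_jj - r3
partial_third = share3_kept + share2_sent        # r3 + c_ii - r2

# Publishing only the two partial sums reveals the difference, not the inputs.
second_result = partial_second + partial_third
print(np.isclose(second_result, c_ii - c_jj))    # True
```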
Similar to the aforementioned formula (4),

tan 2θ = 2c_ij / (c_ii - c_jj)

Therefore, based on the first calculation result, the rotation angle θ, and hence sin θ and cos θ, can be calculated, and the plane rotation matrix P_ij corresponding to the position (i, j) in the current iteration is thereby determined: cos θ at positions (i, i) and (j, j), -sin θ at (i, j), sin θ at (j, i), 1 at the remaining diagonal positions and 0 elsewhere.
On the basis of the plane rotation matrix P_ij obtained by the MPC method, each holder can apply a rotation transformation to its local decomposition matrix using that plane rotation matrix. For the kth holder, this is:

C'_k = P_ij^T C_k P_ij

New matrix elements are thus obtained, in which only the elements lying in rows i and j and columns i and j of the decomposition matrix are recombined through sin θ and cos θ; the superscript (0) denotes an element of the decomposition matrix before the current iterative transformation, and the superscript (1) the element after the current iterative transformation.
Then C'_k is used in place of C_k and the next iteration is performed, until all positions in the covariance matrix have been traversed. After the transformations by the plane rotation matrices corresponding to all the positions, the kth holder obtains the kth transformation matrix C'_k.
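A minimal sketch of what the kth holder does locally in one such iteration, assuming the rotation angle θ for the selected position (i, j) has already been determined jointly through the MPC steps above; the helper name is an assumption of this sketch.

```python
import numpy as np

def rotate_local_decomposition(C_k, i, j, theta):
    """One local update of the kth decomposition matrix: C'_k = P_ij^T C_k P_ij,
    where theta is assumed to have been determined jointly beforehand."""
    D = C_k.shape[0]
    P = np.eye(D)
    P[i, i] = P[j, j] = np.cos(theta)
    P[i, j], P[j, i] = -np.sin(theta), np.sin(theta)
    return P.T @ C_k @ P

# Example: a holder updates its local 5 x 5 decomposition matrix for position (i, j) = (0, 2).
C_k = np.random.rand(5, 5)
C_k = (C_k + C_k.T) / 2          # decomposition matrices are symmetric, as in equation (9)
C_k_next = rotate_local_decomposition(C_k, 0, 2, theta=0.3)
print(C_k_next.shape)            # (5, 5)
```

Because every holder applies the same P_ij, the sum of the transformed decomposition matrices equals P_ij^T (C_1 + … + C_M) P_ij; the rotation thus acts on the covariance matrix without that matrix ever being assembled.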
Next, in step 305, a dimension reduction transformation matrix is determined based on the sum of the transformation matrices obtained by the M data holders, the plane rotation matrices, and the target dimension of the dimension reduction.
It can be understood that through multiple rotations of the plane rotation matrix, the transformation matrix of each owner does not contain any more privacy information, and can be directly aggregated. Therefore, the sum of the transformation matrices obtained by the M data holders, respectively, can be directly calculated to obtain a sum matrix C':
C' = C'_1 + C'_2 + … + C'_M    (15)
of course, for security, the sum matrix C' may be obtained by adding secret shares in this step.
It is understood that, since the sum of the decomposition matrices is the covariance matrix C, the sum C' of the transformation matrices obtained by applying the rotation transformations to the decomposition matrices is equivalent to the matrix obtained by applying those rotation transformations to the covariance matrix C. Through the rotation transformations of the plane rotation matrices, the off-diagonal elements of the covariance matrix gradually tend to zero, so the resulting matrix C' is a diagonal matrix, or an approximately diagonal matrix, of dimension D × D. The elements on the diagonal of the sum matrix C' can then be determined as the D eigenvalues of the covariance matrix, D being, as described above, the sum of the numbers of attribute items of the holders: D = D_1 + D_2 + … + D_M.
accordingly, according to the above-mentioned plane rotation matrices, an eigen matrix P can be obtained:
P=ΠPij(16) that is, the eigenmatrix P is the result of the multiplication of the respective plane rotation matrices.
Based on the eigenvalues determined from the matrix C', the eigen matrix P, and the dimension-reduction target dimension d', the dimension-reduction transformation matrix T can be determined, where the target dimension d' < D.
Specifically, the d' largest of the D eigenvalues obtained from the matrix C' may be taken out as target eigenvalues, and the d' eigenvectors corresponding to these d' target eigenvalues determined from the eigen matrix P. These d' eigenvectors form the dimension-reduction transformation matrix T, whose dimension is d' × D.
In the case of vertical data distribution, the kth holder further determines, from the dimension-reduction transformation matrix T, the kth dimension-reduction transformation sub-matrix T_k corresponding to that holder. It is understood that, of the total D attributes of the samples, the kth holder holds D_k of them, corresponding to the D_k rows of the kth range in the original full matrix; the kth range is the arrangement range of the kth holder's central matrix X_k in the joint matrix X. The part of the D columns of the dimension-reduction transformation matrix T corresponding to this arrangement range is then selected to form the kth dimension-reduction transformation sub-matrix T_k. This is equivalent to splitting T by columns into transformation sub-matrices suited to the respective holders. Correspondingly, the kth dimension-reduction transformation sub-matrix T_k of the kth holder has dimension d' × D_k.
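The following NumPy sketch illustrates this selection, assuming the eigenvalues (the diagonal of C') and the eigen matrix P have already been obtained as above; dims, holding each holder's attribute count D_k, is an illustrative name.

```python
import numpy as np

def reduction_submatrix(eigvals, P, d_prime, dims, k):
    """Build T from the eigenvectors of the d' largest eigenvalues, then slice out T_k (d' x D_k).
    Columns of P are the eigenvectors, cf. the matrix U of equation (2) and equation (16)."""
    top = np.argsort(eigvals)[::-1][:d_prime]
    T = P[:, top].T                              # d' x D dimension-reduction transformation matrix
    offsets = np.cumsum([0] + list(dims))        # arrangement ranges of the holders in the joint matrix
    return T[:, offsets[k]:offsets[k + 1]]       # the columns belonging to the kth holder

eigvals = np.array([5.0, 1.0, 3.0, 0.2, 2.0, 0.1, 0.4, 0.6, 0.9])
P = np.linalg.qr(np.random.rand(9, 9))[0]        # any orthogonal matrix stands in for the eigen matrix here
T_1 = reduction_submatrix(eigvals, P, d_prime=2, dims=[3, 2, 4], k=1)
print(T_1.shape)                                 # (2, 2): d' x D_k for D_k = 2
```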
Then, in step 306, the kth holder uses the kth dimension-reduction transformation sub-matrix T_k to process the locally owned kth original matrix Y_k, obtaining the kth dimension-reduction matrix Y'_k, namely:

Y'_k = T_k Y_k    (17)

Since T_k has dimension d' × D_k and Y_k has dimension D_k × N, it follows that the kth dimension-reduction matrix Y'_k has dimension d' × N.
Next, in step 307, the dimension-reduction matrices of the holders are summed in an MPC manner to obtain the total dimension-reduction matrix Y' resulting from dimension-reduction processing of all the attributes of the N samples:

Y' = Y'_1 + Y'_2 + … + Y'_M    (18)
specifically, a secret sharing addition method may be adopted, and each of the holding parties jointly calculates the sum of elements at the same corresponding position of the dimensionality reduction matrix, so as to obtain a total dimensionality reduction matrix Y'.
The dimension of the total dimension-reduction matrix Y' is d' × N, which is equivalent to reducing the D attribute features of the original full matrix Y to d' dimensions. The total dimension-reduction matrix Y' can then be used jointly by the holders for efficient machine learning.
Specifically, under the condition that the sample is a user, the total dimension reduction matrix is a dimension-reduced user feature matrix, and the user feature matrix comprises dimension-reduced feature expressions of the users. By using the user feature matrix, user analysis can be effectively performed, for example, a machine learning mode is adopted, user categories are predicted, user belonged groups are analyzed, and the like, so that services are provided for users.
It can be seen that, in the above dimension reduction process, the original privacy data will not be revealed by each data holder, thereby implementing the joint dimension reduction process of privacy protection.
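To make the linear-algebra identity behind steps 305-307 concrete, the following in-the-clear NumPy sketch (all cryptographic protections omitted, and np.linalg.eigh standing in for the jointly executed Jacobi iteration) checks that summing the per-holder products T_k Y_k reproduces T applied to the pooled original matrix; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
dims, N, d_prime = [3, 2, 4], 100, 2
Ys = [rng.random((Dk, N)) for Dk in dims]               # each holder's original matrix Y_k (D_k x N)

# Pooled computation (what a single trusted party could do): PCA transformation matrix T.
Y = np.vstack(Ys)                                       # original full matrix, D x N
X = Y - Y.mean(axis=1, keepdims=True)                   # joint centered matrix
eigvals, eigvecs = np.linalg.eigh(X @ X.T)              # eigen-decomposition of the covariance matrix
T = eigvecs[:, np.argsort(eigvals)[::-1][:d_prime]].T   # d' x D

# Decomposed computation: each holder applies only its own column slice T_k to its own Y_k.
offsets = np.cumsum([0] + dims)
Y_parts = [T[:, offsets[k]:offsets[k + 1]] @ Ys[k] for k in range(len(dims))]

print(np.allclose(sum(Y_parts), T @ Y))                 # True: the per-holder sum reproduces T Y
```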
The above description uses the concrete representation in which rows of the original matrix represent attributes and columns represent samples. It will be appreciated that rows and columns may be interchanged; that is, in another embodiment, the original matrix may be formed with one row per sample and one column per attribute. In that case, the original matrices Y_k of the data holders are spliced laterally to form the original full matrix, zero-averaging is performed on each column, and when the covariance matrix is assumed to be calculated, it may be taken as C = X^T X. The subsequent steps 303-307 are performed similarly to the above, with rows and columns exchanged.
The above describes the process of performing feature dimension reduction jointly by multiple data holders in the case of longitudinal distribution of data. The following is described for the case of the data lateral distribution.
FIG. 4 illustrates a flow diagram of a joint dimension reduction method, which is applicable to the case of horizontally distributed data, according to one embodiment.
Assume that there are M data holders. In the case of a horizontal distribution of data, each holder possesses feature data of the same attribute items for different samples. For example, in one specific example, one of the M data holders is a social platform that has basic user attribute features of a number of users, the attributes including user id, age, gender, occupation and region. Another of the M data holders is, for example, another social platform that possesses the same basic user attribute features of another batch of users. In this way, feature data of the same attribute items for different users are distributed laterally across different data holders.
The method of FIG. 4 is performed by any one of the M data holders, for example the kth holding party. The kth holding party stores the attribute values of the D attributes of N_k samples, while the other holding parties respectively store the attribute values of the D attributes of other samples.
To perform joint dimensionality reduction, as shown in FIG. 4, first in step 401, the kth holding party constructs the kth original matrix Y_k based on the attribute values of the D attributes of its N_k samples, arranged according to a predetermined attribute order.
It should be understood that, since the holding parties have data for the same attribute items but different samples, to facilitate subsequent processing each holding party needs to align the attribute dimension, that is, unify the order of the attributes and arrange its sample data according to the same predetermined attribute order.
In one embodiment, when forming the original matrix, each row represents one attribute and each column represents one sample. Thus, the kth holding party forms an original matrix of dimension D×N_k. The following description is made in conjunction with this matrix arrangement.
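A minimal sketch of this matrix arrangement follows, assuming the attribute names and the numeric encoding shown are agreed in advance by all holding parties (both are illustrative assumptions, not taken from the patent).

```python
import numpy as np

# Agreed-upon attribute order shared by all holding parties (hypothetical names).
ATTRIBUTE_ORDER = ["age", "gender", "occupation", "region", "income"]

def build_original_matrix(user_records, attribute_order=ATTRIBUTE_ORDER):
    """Build the kth original matrix Y_k of shape D x N_k:
    one row per attribute (in the agreed order), one column per local sample."""
    columns = [[record[attr] for attr in attribute_order] for record in user_records]
    return np.array(columns, dtype=float).T

# Toy local data of the kth holding party (values already numerically encoded).
local_users = [
    {"age": 31, "gender": 0, "occupation": 2, "region": 5, "income": 8200},
    {"age": 45, "gender": 1, "occupation": 7, "region": 3, "income": 9100},
]
Y_k = build_original_matrix(local_users)
print(Y_k.shape)   # (5, 2): D = 5 attributes, N_k = 2 local samples
```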
It will be appreciated that each owner similarly forms the original matrix. If the original matrices of the respective owners are stitched along the transversal direction, an original full matrix Y can be formed:
Y = (Y_1  Y_2  …  Y_M)   (19)
The original full matrix is a D×N matrix, in which each row represents one attribute, for a total of D rows, and each column represents one sample, for a total of N columns, where N is the total number of samples:

N = N_1 + N_2 + … + N_M   (20)

and all columns follow the same predetermined attribute order.
As previously described, each owner does not perform a plaintext direct aggregation of the original data, and the original full matrix is simply a matrix assumed to be formed for ease of description.
Then, in step 402, the kth holding party performs zero-averaging on each of the D attributes by secure multi-party computation (MPC) to obtain the kth central matrix X_k.
It should be understood that, unlike the case of longitudinally distributed data, in the case of laterally distributed data the kth holding party owns, for any attribute j among the D attributes, only the attribute values of its N_k samples for that attribute, whereas zero-averaging requires the attribute values of attribute j over all N samples. Therefore, in this step, the zero-averaging process needs to be carried out together with the other data holders by means of MPC.
Specifically, with one row representing one attribute, the kth holding party may locally compute the sum of each of the D rows, thereby forming a D-dimensional row-sum vector S_k.
Then, the kth holding party and the other holding parties use MPC to sum the row-sum vectors and the sample numbers of the respective holding parties, obtaining a total row-sum vector S and a total sample number N, where:

S = S_1 + S_2 + … + S_M   (21)
the total number of samples N is calculated using the aforementioned equation (20).
The MPC scheme may specifically be implemented by using secret sharing addition.
Then, a neutral third party, or any one of the M holding parties, can calculate the overall mean vector S̄:

S̄ = S / N   (22)

The overall mean vector S̄ is a D-dimensional vector whose jth element represents the mean value of the jth row of the original full matrix Y.
The party that calculates the overall mean vector S̄ then broadcasts it to each of the M holding parties, and each holding party performs zero-averaging using it. Specifically, after receiving or calculating the overall mean vector S̄, the kth holding party subtracts, for each row j of its local original matrix Y_k, the jth element of S̄ from every element of that row, thereby obtaining the kth central matrix X_k.
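The zero-averaging described above can be sketched as follows. The shared_sum helper is a toy stand-in for secret-sharing addition over floating-point values; a real deployment would work over a finite ring with fixed-point encodings, and the total sample count would normally be exchanged as an integer.

```python
import numpy as np

def shared_sum(values, rng):
    """Toy additive-secret-sharing sum: each contribution is split into random
    shares, partial sums are exchanged, and only the total is reconstructed."""
    num_parties = len(values)
    shares = []
    for v in values:
        parts = [rng.normal(size=np.shape(v)) for _ in range(num_parties - 1)]
        parts.append(np.asarray(v, dtype=float) - sum(parts))
        shares.append(parts)
    partials = [sum(shares[k][j] for k in range(num_parties)) for j in range(num_parties)]
    return sum(partials)

rng = np.random.default_rng(2)
D, M = 4, 3
local_matrices = [rng.normal(size=(D, rng.integers(5, 9))) for _ in range(M)]  # Y_k, D x N_k

# Step 1: each party computes its local row-sum vector S_k and sample count N_k.
row_sums = [Yk.sum(axis=1) for Yk in local_matrices]
counts = [Yk.shape[1] for Yk in local_matrices]

# Step 2: total row-sum vector S and total sample count N via secret-sharing addition.
S = shared_sum(row_sums, rng)
N = shared_sum(counts, rng)

# Step 3: the overall mean vector is broadcast; each party centers its rows locally.
mean_vector = S / N
central_matrices = [Yk - mean_vector[:, None] for Yk in local_matrices]
```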
It can be understood that each owner similarly performs zero equalization processing to obtain a corresponding central matrix. If the central matrices of the various owners are stitched along the lateral direction, a joint matrix X can be formed:
X = (X_1  X_2  …  X_M)   (23)
The joint matrix is a D×N matrix that has been zero-averaged. As described above, since the central matrices would still reveal private data, the holding parties cannot actually splice them directly; the joint matrix is merely assumed to be formed for convenience of description.
Assuming the above joint matrix is formed, the covariance matrix C = XX^T may be calculated as follows:

C = XX^T = (X_1 X_2 … X_M)(X_1 X_2 … X_M)^T = X_1X_1^T + X_2X_2^T + … + X_MX_M^T   (24)
It is understood that the covariance matrix is a square matrix in the dimension D x D.
As can be seen from the rightmost representation of equation (24), the covariance matrix can be decomposed into M locally computed decomposition matrices.
Thus, in step 403, the kth holding party may locally compute the product of the kth central matrix X_k and its transpose X_k^T, and take the resulting matrix as the kth decomposition matrix C_k:

C_k = X_kX_k^T   (25)
It will be appreciated that the other holding parties may correspondingly derive their own decomposition matrices. The sum of the corresponding decomposition matrices of all holding parties then forms, i.e., restores, the covariance matrix C corresponding to the joint matrix X, that is:
C = C_1 + C_2 + … + C_M   (26)
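A short sketch of step 403 and of the relation in equation (26): each party computes C_k = X_k X_k^T locally, and the sum of these local matrices equals the covariance matrix of the (never materialized) joint matrix. The plaintext check at the end is for illustration only and would not be performed in the actual protocol.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 4
central_matrices = [rng.normal(size=(D, n)) for n in (6, 5, 7)]  # X_k, already zero-averaged

# Step 403: each holding party computes its decomposition matrix locally.
decomposition = [Xk @ Xk.T for Xk in central_matrices]           # C_k = X_k X_k^T, D x D

# Conceptual check (never done in plaintext in the real protocol):
# the covariance of the joint matrix equals the sum of the local C_k.
X_joint = np.hstack(central_matrices)
assert np.allclose(X_joint @ X_joint.T, sum(decomposition))
```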
Next, in a manner similar to that of FIG. 3, in step 404, the kth holding party determines, based on secure multi-party computation MPC, the plane rotation matrices corresponding to the respective off-diagonal positions of the covariance matrix C, and transforms the kth decomposition matrix C_k using these plane rotation matrices to obtain the kth transformation matrix.
In particular, the off-diagonal positions of the covariance matrix may be traversed in turn. For the position (i, j) selected in the current iteration, secure multi-party computation MPC is used to obtain

c_ij / (c_ii − c_jj)

which is called the first calculation result, where c_ij, c_ii and c_jj are the elements at the corresponding positions of the covariance matrix assumed to be formed after the previous iteration.
In one embodiment, the first calculation result may be obtained by means of secret sharing. Unlike the case of longitudinally distributed data, in the case of laterally distributed data, according to equations (25) and (26), each element of the covariance matrix is the sum of the elements at the same position in the decomposition matrices of the respective holding parties. Thus, in one embodiment, when solving for the first calculation result, the position (i, j) selected in the current iteration (referred to as the first position for simplicity) is first mapped onto the diagonal in the lateral and longitudinal directions, respectively, to obtain a second position (i, i) and a third position (j, j). The kth holding party locally calculates the difference between the element at the second position and the element at the third position in its kth decomposition matrix after the previous iteration, obtaining a kth difference value. Then, using secret-sharing addition, all holding parties cooperatively calculate the sum of these difference values, thereby obtaining the second calculation result c_ii − c_jj. Next, based on the elements at the first position in their local decomposition matrices, the holding parties obtain the element sum at the first position by secret-sharing addition; the ratio of this sum to the second calculation result is the first calculation result.
In other embodiments, the first calculation result may also be obtained by other MPC manners, such as homomorphic encryption.
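The secret-sharing route described above can be sketched as follows. The shared_sum helper again stands in for secret-sharing addition; note that in this simplified sketch the final division is done in the clear, whereas the protocol described above would also keep the ratio under MPC, so this last step is an assumption made purely for illustration.

```python
import numpy as np

def shared_sum(values, rng):
    """Toy additive-secret-sharing sum of one scalar contribution per party."""
    n = len(values)
    shares = []
    for v in values:
        parts = list(rng.normal(size=n - 1))
        parts.append(float(v) - sum(parts))
        shares.append(parts)
    partials = [sum(shares[k][p] for k in range(n)) for p in range(n)]
    return sum(partials)

rng = np.random.default_rng(4)
D, M = 4, 3
decomposition = [rng.normal(size=(D, D)) for _ in range(M)]   # stand-ins for the C_k

i, j = 0, 2                                    # first position (i, j) of the current iteration
local_diffs = [Ck[i, i] - Ck[j, j] for Ck in decomposition]   # kth difference value
local_elems = [Ck[i, j] for Ck in decomposition]              # element at the first position

second_result = shared_sum(local_diffs, rng)   # c_ii - c_jj of the (virtual) covariance matrix
elem_sum = shared_sum(local_elems, rng)        # c_ij of the (virtual) covariance matrix
first_result = elem_sum / second_result        # ratio used to derive the rotation angle
```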
Based on the first calculation result, the rotation angle θ, and hence sin θ and cos θ, can be calculated, and the plane rotation matrix P_ij corresponding to the position (i, j) of the current iteration is then determined as shown in the aforementioned equation (4).
Having obtained the plane rotation matrix P_ij in the MPC manner, each holding party can use it to apply a rotation transformation to its local decomposition matrix. For the kth holding party, this may be performed as:

C′_k = P_ij^T C_k P_ij   (27)
Then, C′_k is used in place of C_k and the next iteration is performed, until all off-diagonal positions of the covariance matrix have been traversed. After the transformations by the plane rotation matrices corresponding to the respective positions, the kth holding party obtains the kth transformation matrix C′_k.
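A local view of one such iteration can be sketched as follows. The angle formula assumes the standard Jacobi relation tan 2θ = 2·c_ij/(c_ii − c_jj), i.e., twice the first calculation result; the patent's own equation (4), which is not reproduced in this passage, may parameterize the angle differently, so the θ computation here is an assumption.

```python
import numpy as np

def plane_rotation(dim, i, j, theta):
    """Jacobi plane rotation P_ij: an identity matrix with a 2x2 rotation
    embedded at rows/columns i and j."""
    P = np.eye(dim)
    c, s = np.cos(theta), np.sin(theta)
    P[i, i] = c
    P[j, j] = c
    P[i, j] = s
    P[j, i] = -s
    return P

# first_result = c_ij / (c_ii - c_jj), obtained jointly as sketched above.
first_result = 0.37                           # illustrative value
theta = 0.5 * np.arctan(2.0 * first_result)   # standard Jacobi angle (assumption)

D, i, j = 4, 0, 2
P_ij = plane_rotation(D, i, j, theta)

rng = np.random.default_rng(5)
C_k = rng.normal(size=(D, D))                 # the kth party's current decomposition matrix
C_k_next = P_ij.T @ C_k @ P_ij                # equation (27): local rotation transform
```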
Next, in step 405, a dimension reduction transformation matrix T is determined based on the sum of the transformation matrices obtained by the M data holders, the plane rotation matrices, and the target dimension of the dimension reduction.
As described above, after the multiple rotations by the plane rotation matrices, the transformation matrices of the respective holding parties no longer contain private information and can be aggregated directly. Therefore, the sum of the transformation matrices obtained by the M data holders can be calculated directly to obtain a sum matrix C′:
C′ = C′_1 + C′_2 + … + C′_M   (28)
alternatively, for further security, the sum matrix may be obtained by addition of secret sharing. The sum matrix C' is a diagonal matrix of dimension D x D, or an approximate diagonal matrix, whose diagonal elements may be the D eigenvalues of the covariance matrix.
Furthermore, from the above-mentioned respective plane rotation matrices, an eigenmatrix P can be obtained:
P = Π P_ij   (29)
that is, the eigenmatrix P is the result of the multiplication of the respective plane rotation matrices.
Then, based on the eigenvalues determined from the sum matrix C′, the eigen matrix P, and the target dimension d′ of the dimensionality reduction, the dimensionality reduction transformation matrix T can be determined, where d′ < D.

Similarly, from the D eigenvalues obtained from the matrix C′, the d′ largest eigenvalues are taken as target eigenvalues, and the d′ eigenvectors corresponding to these target eigenvalues are determined from the eigen matrix P. These d′ eigenvectors form the dimensionality reduction transformation matrix T, whose dimension is d′×D. In the case of a lateral data distribution, the dimensionality reduction transformation matrix T can be applied directly to the original matrix of each kth holding party.
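Step 405 can be sketched as follows, assuming the summed matrix C′ and the list of recorded plane rotation matrices are available to the party assembling T (the helper name build_reduction_matrix is illustrative):

```python
import numpy as np

def build_reduction_matrix(transform_matrices, rotations, d_prime):
    """Assemble the d' x D dimensionality reduction matrix T from the summed
    transformation matrices and the recorded plane rotations."""
    C_sum = sum(transform_matrices)            # equation (28); ~diagonal, D x D
    eigenvalues = np.diag(C_sum)               # D eigenvalue estimates
    P = np.eye(C_sum.shape[0])
    for P_ij in rotations:                     # equation (29): P = product of rotations
        P = P @ P_ij
    top = np.argsort(eigenvalues)[::-1][:d_prime]   # indices of the d' largest eigenvalues
    return P[:, top].T                         # d' eigenvectors as rows: T is d' x D

# Tiny illustration with two parties' transformation matrices and a single
# recorded rotation; the values are arbitrary stand-ins.
C1 = np.diag([4.0, 1.0, 0.5])
C2 = np.diag([2.0, 0.5, 0.25])
T = build_reduction_matrix([C1, C2], [np.eye(3)], d_prime=2)
print(T.shape)        # (2, 3): rows are eigenvectors of the 2 largest eigenvalues
```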
Thus, in step 406, the kth holding party processes its locally held kth original matrix Y_k using the above dimensionality reduction transformation matrix T to obtain the kth dimensionality reduction matrix Y′_k, namely:

Y′_k = T Y_k   (30)

Since T has dimension d′×D and Y_k has dimension D×N_k, it can be inferred that the kth dimensionality reduction matrix Y′_k has dimension d′×N_k.
Next, in step 407, the dimensionality reduction matrices of the respective holding parties are transposed and spliced to obtain a total dimensionality reduction matrix Y′ resulting from dimensionality reduction of the D attributes of all samples:

Y′ = (Y′_1^T
      Y′_2^T
      …
      Y′_M^T)   (31)
It can be understood that after transposition the kth dimensionality reduction matrix Y′_k has dimension N_k×d′, and longitudinally splicing the transposed dimensionality reduction matrices yields a total dimensionality reduction matrix Y′ of dimension N×d′. The total dimensionality reduction matrix Y′ is equivalent to compressing the D attribute features of the original full matrix Y down to d′. The total dimensionality reduction matrix Y′ can then be used jointly by the holding parties for efficient machine learning.
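Steps 406 and 407 then amount to a local projection followed by a transpose-and-stack, sketched below on random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(6)
D, d_prime = 5, 2
T = rng.normal(size=(d_prime, D))                   # dimensionality reduction matrix from step 405
local_matrices = [rng.normal(size=(D, n)) for n in (4, 6, 3)]   # original Y_k of each party

# Step 406: each party reduces its own samples locally: Y'_k = T Y_k (d' x N_k).
reduced = [T @ Yk for Yk in local_matrices]

# Step 407: transpose each Y'_k (N_k x d') and stack them vertically,
# yielding the total reduced matrix of shape N x d'.
total_reduced = np.vstack([Yk_red.T for Yk_red in reduced])
print(total_reduced.shape)        # (13, 2): N = 4 + 6 + 3 samples, d' = 2 features
```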
Similarly, in the above dimensionality reduction process, no data holder reveals its original privacy data, so that privacy-preserving joint dimensionality reduction is realized.
It will be appreciated that the above description was made with the concrete representation in which attributes are represented by rows and samples by columns in the original matrix. The rows and columns may be interchanged. In another embodiment, when the original matrix is formed with laterally distributed data, one row may correspond to one sample and one column to one attribute. In that case, the original matrices Y_k of the data holders need to be spliced longitudinally to form the original full matrix. Subsequently, zero-averaging is performed on each column, and when the covariance matrix is assumed to be calculated, it may be taken as C = X^T X. The subsequent steps 403-407 are performed similarly as described above, except that rows and columns are exchanged.
Through the mode described above, each data holder can jointly perform feature dimension reduction processing without revealing privacy data, so that shared machine learning and joint training can be performed more effectively.
According to another aspect of the embodiments, an apparatus for multi-party joint dimension reduction for private data is provided, and the apparatus is deployed in the kth holder of any M data holders, and each holder may be implemented as any device, platform or device cluster having data storage, computation and processing capabilities. Figure 5 shows a schematic block diagram of an apparatus deployed in a kth owner for federated dimension reduction, in accordance with one embodiment. In the case of a vertical data distribution, the kth owner stores attribute values of a plurality of attributes of N samples, and M-1 other owners among the M data owners store attribute values of other attributes of the N samples, respectively. As shown in fig. 5, the apparatus 500 disposed in the kth holder includes:
an original matrix constructing unit 51 configured to construct a kth original matrix based on attribute values of the plurality of attributes of the N samples in a predetermined sample order;
a centralization processing unit 52 configured to locally perform zero-mean processing on each of the multiple attributes to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
a decomposition matrix forming unit 53 configured to locally calculate a first matrix obtained by multiplying the kth central matrix by its transpose, and to obtain a plurality of second matrices by respectively performing secret-shared matrix multiplication SMM with the M-1 other holding parties based on the kth central matrix and their respective central matrices; splice the first matrix and the plurality of second matrices to form a kth decomposition matrix; the sum of the kth decomposition matrix and the corresponding decomposition matrices of the M-1 other holding parties forms the covariance matrix corresponding to the joint matrix;
a rotation transformation unit 54 configured to determine each planar rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on the secure multi-party calculation MPC, and transform the kth decomposition matrix by using each planar rotation matrix to obtain a kth transformation matrix;
a dimension reduction matrix determining unit 55 configured to determine a dimension reduction transformation matrix based on a sum of transformation matrices obtained by the M data holders, the plane rotation matrices, and a dimension reduction target dimension d', and determine a kth dimension reduction transformation sub-matrix corresponding to the kth holder in the dimension reduction transformation matrix;
a dimension reduction processing unit 56 configured to process the kth original matrix by using the kth dimension reduction transformation subarray to obtain a kth dimension reduction matrix;
and the dimension reduction comprehensive unit 57 is configured to sum the dimension reduction matrixes in each holding party by using the MPC mode to obtain a total dimension reduction matrix obtained by performing dimension reduction processing on all the attributes of the N samples.
In one embodiment, one row of the k-th original matrix corresponds to one attribute and one column corresponds to one sample. In such a case, the centering processing unit 52 is configured to calculate a mean value of each row of the k-th original matrix, and subtract the mean value from all elements of the row, so as to obtain the k-th center matrix.
Accordingly, in one embodiment, the joint matrix may be a matrix formed by longitudinally splicing the k-th central matrix and the central matrix corresponding to each of the M-1 other owners.
According to an embodiment, the decomposition matrix forming unit 53 is specifically configured to:
dividing a kth decomposition matrix to be formed into M × M square matrixes formed by blocks;
and filling a block at a k-th row and a k-th column position of the square matrix with the first matrix, filling blocks at other positions of the k-th row and the k-th column with the plurality of second matrices calculated together with other possessors, respectively, filling all blocks at other positions in the square matrix with 0, and taking the filled square matrix as the k-th decomposition matrix.
In one embodiment, the non-diagonal positions in the covariance matrix include a first position; the rotation transformation unit 54 is specifically configured to:
determining a second position and a third position obtained by respectively mapping the first position on a diagonal line in a transverse direction and a longitudinal direction;
obtaining a second calculation result of a difference between the element value of the second position and the element value of the third position in the covariance matrix after the previous iteration, wherein the second calculation result is obtained by adding a second holding party and a third holding party in the M data holding parties through secret sharing, the second holding party possesses the element value of the second position, and the third holding party possesses the element value of the third position;
obtaining a first calculation result by utilizing the secret sharing multiplication and a first holding party holding the element value of the first position, wherein the first calculation result is the ratio of the element value of the first position to the second calculation result;
and according to the first calculation result, obtaining a rotation angle parameter of the plane rotation matrix, and further determining the plane rotation matrix corresponding to the first position.
In one embodiment, the dimension reduction matrix determination unit 55 is configured to:
summing the transformation matrixes obtained by the M data holders respectively to obtain a sum matrix, and determining elements on a diagonal line of the sum matrix as a plurality of eigenvalues of the covariance matrix;
multiplying and superposing the plane rotation matrixes to obtain an intrinsic matrix;
determining the first d 'eigenvalues with larger values from the plurality of eigenvalues as target eigenvalues, and determining d' eigenvectors corresponding to the target eigenvalues from the eigen matrix; the d' eigenvectors constitute the dimension reduction transformation matrix.
In one embodiment, the dimension reduction matrix determination unit 55 is further configured to:
determining the arrangement range of the k-th central matrix in the combined matrix;
and selecting a part corresponding to the arrangement range from the dimension reduction transformation matrix to form the kth dimension reduction transformation sub-matrix.
According to an embodiment of another aspect, an apparatus for multi-party federated dimension reduction for private data is provided, where the apparatus is deployed in a kth holding party of any M data holding parties, and each holding party may be implemented as any device, platform or device cluster having data storage, computation and processing capabilities. Figure 6 shows a schematic block diagram of an apparatus deployed in a kth owner for federated dimension reduction, in accordance with one embodiment. In the case of a horizontal data distribution, the kth holder stores attribute values of the D item attributes of a plurality of samples, and M-1 other holders among the M data holders store attribute values of the D item attributes of other samples, respectively. As shown in fig. 6, the apparatus 600 disposed in the kth holder includes:
an original matrix constructing unit 61 configured to construct a kth original matrix based on attribute values of the D-term attributes of the plurality of samples in a predetermined attribute order;
the centralized processing unit 62 is configured to perform zero-mean processing on each attribute in the D-item attributes through the secure multi-party computing MPC to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
a decomposition matrix forming unit 63 configured to locally calculate a matrix obtained by multiplying the kth center matrix by a transpose matrix thereof as a kth decomposition matrix; the sum of the k decomposition matrix and the corresponding decomposition matrix in the M-1 other holding parties forms a covariance matrix of the joint matrix;
a rotation transformation unit 64 configured to determine each planar rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on the secure multi-party calculation MPC, and transform the kth decomposition matrix by using each planar rotation matrix to obtain a kth transformation matrix;
a dimension reduction matrix determination unit 65 configured to determine a dimension reduction transformation matrix based on the sum of transformation matrices obtained by the M data holders, the plane rotation matrices, and a dimension reduction target dimension;
a dimension reduction processing unit 66 configured to process the kth original matrix by using the dimension reduction transformation matrix to obtain a kth dimension reduction matrix;
and the dimension reduction comprehensive unit 67 is configured to splice the dimension reduction matrixes in each holding party to obtain a total dimension reduction matrix obtained by performing dimension reduction processing on the D-item attributes of all the samples.
In one embodiment, one row of the k-th original matrix corresponds to one attribute and one column corresponds to one sample. In such a case, the centralized processing unit 62 is specifically configured to:
locally calculating the sum of each row of the kth original matrix to form a D-dimensional row and value vector;
summing the row vector and the sample number of each holding party with the other holding parties in an MPC manner to obtain a total row vector and a total sample number;
obtaining a total mean vector, which is the total row vector divided by the total number of samples;
and for any j-th row in the k-th original matrix, subtracting j-th element in the total mean vector from each element in the j-th row to obtain a k-th central matrix.
Further, the above-mentioned overall mean vector may be obtained by: calculating the total mean vector according to the total row vector and the total sample number; or receiving the total mean vector broadcast by other parties, wherein the other parties are one of the M-1 other holding parties; alternatively, the other party is a neutral third party.
In one embodiment, in a case where one row of the k-th original matrix corresponds to one attribute and one column corresponds to one sample, the joint matrix may be a matrix formed by laterally splicing the k-th central matrix and the central matrices corresponding to the M-1 other possessors, respectively.
According to one embodiment, each position on the non-diagonal line in the covariance matrix comprises a first position; the rotation transformation unit 64 may be configured to:
determining a second position and a third position obtained by respectively mapping the first position on the diagonal line in the transverse direction and the longitudinal direction:
locally calculating the difference between the element at the second position and the element at the third position in the kth decomposition matrix after the previous iteration to obtain a kth difference value; the sum of the difference values calculated by each holding party is calculated by using the secret sharing addition and the other holding parties in a coordinated mode, so that a second calculation result is obtained;
obtaining, by secret-sharing addition, the element sum value at the first position over all holding parties, based on the element at the first position in the kth decomposition matrix after the previous iteration;
calculating a ratio of the element sum value to the second calculation result as a first calculation result;
and according to the first calculation result, obtaining a rotation angle parameter of the plane rotation matrix, and further determining the plane rotation matrix corresponding to the first position.
In one embodiment, the dimension reduction matrix determining unit 65 is specifically configured to:
summing the transformation matrixes obtained by the M data holders respectively to obtain a sum matrix, and determining elements on a diagonal line of the sum matrix as a plurality of eigenvalues of the covariance matrix;
multiplying and superposing the plane rotation matrixes to obtain an intrinsic matrix;
determining the first d 'eigenvalues with larger values from the plurality of eigenvalues as target eigenvalues, and determining d' eigenvectors corresponding to the target eigenvalues from the eigen matrix; the d' eigenvectors constitute the dimension reduction transformation matrix.
According to an embodiment, the dimensionality reduction integration unit 67 is configured to vertically splice the transposed dimensionality reduction matrix of each possessor to obtain the total dimensionality reduction matrix.
Through the device, multi-party joint dimension reduction of privacy protection is realized.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3 and 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 3 and 4.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (18)

1. A method for carrying out multi-party joint dimension reduction processing on user privacy data is executed by any kth holding party in M data holding parties, wherein the kth holding party stores attribute values of multiple user attributes of N users, and M-1 other holding parties in the M data holding parties respectively store attribute values of other user attributes of the N users; the method comprises the following steps:
constructing a kth original matrix based on attribute values of a plurality of user attributes of the N users according to a preset user sequence;
carrying out zero-mean processing on each attribute in the multiple user attributes locally to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
locally calculating a first matrix obtained by multiplying the k-th central matrix by a transpose matrix of the k-th central matrix, and respectively multiplying the first matrix by the M-1 other holding parties based on respective central matrixes by using a secret shared matrix multiplication SMM to obtain a plurality of second matrixes; based on the first matrix and the plurality of second matrices, splicing to form a kth decomposition matrix; the sum of the k decomposition matrix and the corresponding decomposition matrix in the M-1 other possessors forms a covariance matrix corresponding to the joint matrix;
determining each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on a safe multi-party calculation MPC, and transforming the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
determining a dimension reduction transformation matrix based on the sum of transformation matrices respectively obtained by the M data holding parties, the plane rotation matrices and a dimension reduction target dimension d', and determining a kth dimension reduction transformation sub-matrix corresponding to the kth holding party in the dimension reduction transformation matrix;
processing the kth original matrix by using the kth dimensionality reduction transformation subarray to obtain a kth dimensionality reduction matrix;
and summing the dimensionality reduction matrixes in all the holding parties by using the MPC mode to obtain a user characteristic matrix after dimensionality reduction processing is carried out on all the user attributes of the N users, wherein the user characteristic matrix is used for carrying out user analysis.
2. The method of claim 1, wherein one row of the k-th original matrix corresponds to one user attribute, and one column corresponds to one user;
the locally performing zero-mean processing on each attribute of the plurality of user attributes includes:
for each row of the k-th original matrix, calculating the mean value of the row, and subtracting the mean value from all elements of the row to obtain the k-th central matrix.
3. The method according to claim 2, wherein the joint matrix is a matrix formed by longitudinally splicing the k-th central matrix with the central matrix corresponding to each of the M-1 other holders.
4. The method of claim 1, wherein stitching to form a kth decomposition matrix based on the first matrix and the plurality of second matrices comprises:
dividing a kth decomposition matrix to be formed into M × M square matrixes formed by blocks;
and filling a block at a k-th row and a k-th column position of the square matrix with the first matrix, filling blocks at other positions of the k-th row and the k-th column with the plurality of second matrices calculated together with other possessors, respectively, filling all blocks at other positions in the square matrix with 0, and taking the filled square matrix as the k-th decomposition matrix.
5. The method of claim 1, wherein each position in the covariance matrix that is off-diagonal comprises a first position; the determining each plane rotation matrix corresponding to each position on the off-diagonal in the covariance matrix based on the secure multi-party computing MPC comprises:
determining a second position and a third position obtained by respectively mapping the first position on a diagonal line in a transverse direction and a longitudinal direction;
obtaining a second calculation result of a difference between the element value of the second position and the element value of the third position in the covariance matrix after the previous iteration, wherein the second calculation result is obtained by adding a second holding party and a third holding party in the M data holding parties through secret sharing, the second holding party possesses the element value of the second position, and the third holding party possesses the element value of the third position;
obtaining a first calculation result by utilizing the secret sharing multiplication and a first holding party holding the element value of the first position, wherein the first calculation result is the ratio of the element value of the first position to the second calculation result;
and according to the first calculation result, obtaining a rotation angle parameter of the plane rotation matrix, and further determining the plane rotation matrix corresponding to the first position.
6. The method of claim 1, wherein determining a dimensionality reduction transformation matrix based on a sum of transformation matrices obtained by the M data-holders, the respective plane rotation matrices, and a dimensionality reduction target dimension comprises:
summing the transformation matrixes obtained by the M data holders respectively to obtain a sum matrix, and determining elements on a diagonal line of the sum matrix as a plurality of eigenvalues of the covariance matrix;
multiplying and superposing the plane rotation matrixes to obtain an intrinsic matrix;
determining the first d 'eigenvalues with larger values from the plurality of eigenvalues as target eigenvalues, and determining d' eigenvectors corresponding to the target eigenvalues from the eigen matrix; the d' eigenvectors constitute the dimension reduction transformation matrix.
7. The method of claim 1, wherein determining a kth dimension-reducing transformation sub-matrix corresponding to the kth holding party in the dimension-reducing transformation matrix comprises:
determining the arrangement range of the k-th central matrix in the combined matrix;
and selecting a part corresponding to the arrangement range from the dimension reduction transformation matrix to form the kth dimension reduction transformation sub-matrix.
8. A method for carrying out multi-party joint dimension reduction processing on user privacy data is executed by any kth holding party in M data holding parties, wherein the kth holding party stores attribute values of D user attributes of a plurality of users, and M-1 other holding parties in the M data holding parties respectively store attribute values of the D user attributes of other users; the method comprises the following steps:
constructing a kth original matrix based on attribute values of D user attributes of the plurality of users according to a preset user attribute sequence;
performing zero-mean processing on each attribute in the D user attributes by a safe multi-party calculation MPC to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
locally calculating a matrix obtained by multiplying the kth central matrix by a transposed matrix of the kth central matrix to be used as a kth decomposition matrix; the sum of the k decomposition matrix and the corresponding decomposition matrix in the M-1 other holding parties forms a covariance matrix of the joint matrix;
determining each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on a safe multi-party calculation MPC, and transforming the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
determining a dimension reduction transformation matrix based on the sum of transformation matrices obtained by the M data holders, the plane rotation matrices and the dimension reduction target dimension;
processing the kth original matrix by using the dimensionality reduction transformation matrix to obtain a kth dimensionality reduction matrix;
and splicing the dimensionality reduction matrixes in all the holding parties to obtain a user characteristic matrix obtained after dimensionality reduction treatment is carried out on the D user attributes of all the users, wherein the user characteristic matrix is used for carrying out user analysis.
9. The method of claim 8, wherein one row of the k-th original matrix corresponds to one user attribute, and one column corresponds to one user;
the zero-mean processing is performed on each attribute in the D user attributes through the secure multi-party computing MPC, and the zero-mean processing comprises the following steps:
locally calculating the sum of each row of the kth original matrix to form a D-dimensional row and value vector;
summing the row vector and the sample number of each holding party with the other holding parties in an MPC manner to obtain a total row vector and a total number of users;
obtaining a total mean vector, wherein the total mean vector is the total row vector divided by the total number of users;
and for any j-th row in the k-th original matrix, subtracting j-th element in the total mean vector from each element in the j-th row to obtain a k-th central matrix.
10. The method of claim 9, wherein obtaining an overall mean vector comprises:
calculating the total mean vector according to the total row vector and the total user number; or
Receiving the overall mean vector broadcast by other parties, wherein the other parties are one of the M-1 other holding parties; alternatively, the other party is a neutral third party.
11. The method of claim 8, wherein one row of the k-th original matrix corresponds to one user attribute, and one column corresponds to one user; the joint matrix is a matrix formed by transversely splicing the k-th central matrix and the central matrices corresponding to the M-1 other holding parties.
12. The method of claim 8, wherein each position in the covariance matrix that is off-diagonal comprises a first position; the determining each plane rotation matrix corresponding to each position on the off-diagonal in the covariance matrix based on the secure multi-party computing MPC comprises:
determining a second position and a third position obtained by respectively mapping the first position on a diagonal line in a transverse direction and a longitudinal direction;
locally calculating the difference between the element at the second position and the element at the third position in the kth decomposition matrix after the previous iteration to obtain a kth difference value; the sum of the difference values calculated by each holding party is calculated by using the secret sharing addition and the other holding parties in a coordinated mode, so that a second calculation result is obtained;
obtaining, by secret-sharing addition, the element sum value at the first position over all holding parties, based on the element at the first position in the kth decomposition matrix after the previous iteration;
calculating a ratio of the element sum value to the second calculation result as a first calculation result;
and according to the first calculation result, obtaining a rotation angle parameter of the plane rotation matrix, and further determining the plane rotation matrix corresponding to the first position.
13. The method of claim 8, wherein determining a dimensionality reduction transformation matrix based on a sum of transformation matrices obtained by the M data-holders, the respective plane rotation matrices, and a dimensionality reduction target dimension comprises:
summing the transformation matrixes obtained by the M data holders respectively to obtain a sum matrix, and determining elements on a diagonal line of the sum matrix as a plurality of eigenvalues of the covariance matrix;
multiplying and superposing the plane rotation matrixes to obtain an intrinsic matrix;
determining the first d 'eigenvalues with larger values from the plurality of eigenvalues as target eigenvalues, and determining d' eigenvectors corresponding to the target eigenvalues from the eigen matrix; the d' eigenvectors constitute the dimension reduction transformation matrix.
14. The method of claim 8, wherein the step of splicing the dimensionality reduction matrices in the respective holding parties to obtain a total dimensionality reduction matrix obtained by performing dimensionality reduction on the D attributes of all the samples comprises:
and longitudinally splicing the transposed dimension reduction matrixes of all holding parties to obtain the total dimension reduction matrix.
15. A device for multi-party joint dimension reduction processing aiming at user privacy data is deployed in any kth holding party of M data holding parties, wherein the kth holding party stores attribute values of multiple user attributes of N users, and M-1 other holding parties of the M data holding parties respectively store attribute values of other user attributes of the N users; the device comprises:
the original matrix construction unit is configured to construct a kth original matrix based on attribute values of a plurality of user attributes of the N users according to a preset user sequence;
the centralized processing unit is configured to locally perform zero-mean processing on each attribute in the multiple user attributes to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
the decomposition matrix forming unit is configured to locally calculate a first matrix obtained by multiplying the k-th central matrix by a transpose matrix of the k-th central matrix, and multiply the first matrix and the M-1 other possessors respectively to obtain a plurality of second matrices based on respective central matrices by using a secret shared matrix multiplication SMM; based on the first matrix and the plurality of second matrices, splicing to form a kth decomposition matrix; the sum of the k decomposition matrix and the corresponding decomposition matrix in the M-1 other possessors forms a covariance matrix corresponding to the joint matrix;
the rotation transformation unit is configured to determine each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on the safe multi-party calculation MPC, and transform the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
a dimension reduction matrix determining unit configured to determine a dimension reduction transformation matrix based on a sum of transformation matrices obtained by the M data holders, the plane rotation matrices, and a dimension reduction target dimension d', and determine a kth dimension reduction transformation sub-matrix corresponding to the kth holder in the dimension reduction transformation matrix;
the dimensionality reduction processing unit is configured to process the kth original matrix by using the kth dimensionality reduction transformation sub-matrix to obtain a kth dimensionality reduction matrix;
and the dimension reduction comprehensive unit is configured to sum dimension reduction matrixes in all the holding parties by using the MPC mode to obtain a user feature matrix after dimension reduction processing is carried out on all the user attributes of the N users, and the user feature matrix is used for carrying out user analysis.
16. A device for multi-party joint dimension reduction processing aiming at user privacy data is deployed in any kth holding party of M data holding parties, wherein the kth holding party stores attribute values of D user attributes of a plurality of users, and M-1 other holding parties of the M data holding parties respectively store attribute values of the D user attributes of other users; the device comprises:
the original matrix construction unit is configured to construct a kth original matrix based on attribute values of D user attributes of the plurality of users according to a preset user attribute sequence;
the centralized processing unit is configured to perform zero-mean processing on each attribute in the D user attributes through the safe multi-party calculation MPC to obtain a kth central matrix; the kth central matrix forms a joint matrix under the condition that the kth central matrix is spliced with the central matrices corresponding to the M-1 other holding parties respectively;
a decomposition matrix forming unit configured to locally calculate a matrix obtained by multiplying the kth center matrix by a transpose matrix thereof as a kth decomposition matrix; the sum of the k decomposition matrix and the corresponding decomposition matrix in the M-1 other holding parties forms a covariance matrix of the joint matrix;
the rotation transformation unit is configured to determine each plane rotation matrix corresponding to each position on the off-diagonal line in the covariance matrix based on the safe multi-party calculation MPC, and transform the kth decomposition matrix by using each plane rotation matrix to obtain a kth transformation matrix;
a dimension reduction matrix determination unit configured to determine a dimension reduction transformation matrix based on a sum of transformation matrices obtained by the M data holders, the plane rotation matrices, and a dimension reduction target dimension;
the dimension reduction processing unit is configured to process the kth original matrix by using the dimension reduction transformation matrix to obtain a kth dimension reduction matrix;
and the dimension reduction comprehensive unit is configured to splice dimension reduction matrixes in all the holding parties to obtain a user feature matrix after dimension reduction processing is carried out on the D user attributes of all the users, and the user feature matrix is used for carrying out user analysis.
17. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-14.
18. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-14.