Disclosure of Invention
In view of the above, the present specification provides a differential privacy protection method for sensitive data, where the sensitive data is a matrix X of dimensions n × d, X is decomposable into the product of a matrix P of dimensions n × k and a matrix Q of dimensions k × d, and n, k and d are natural numbers, the method including:
according to the value ranges of X, P and Q, determining a supremum B not less than the maximum value of |log p(x_i | P, Q)|; x_i is the ith row of the matrix X, i is a natural number from 1 to n, and p(x_i | P, Q) is the likelihood function of x_i given P and Q;
sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q), where the P and Q obtained by sampling are output data satisfying ε-differential privacy, and π(P, Q) is a prior distribution over P and Q.
The data mining method with differential privacy protection provided by the specification comprises the following steps:
acquiring a matrix P of dimensions n × k, where the matrix P is obtained by sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q); X is a matrix of dimensions n × d decomposable into the product of the matrix P and a matrix Q of dimensions k × d; x_i is the ith row of the matrix X; p(x_i | P, Q) is the likelihood function of x_i given P and Q; π(P, Q) is a prior distribution over P and Q; B is determined according to the value ranges of X, P and Q and is a supremum not less than the maximum value of |log p(x_i | P, Q)|; n, k and d are natural numbers, i is a natural number from 1 to n, and ε is a differential privacy protection parameter;
and generating training samples using the matrix P, and training a data mining model.
The present specification also provides an apparatus for differential privacy protection of sensitive data, where the sensitive data is a matrix X of dimensions n × d, X is decomposable into the product of a matrix P of dimensions n × k and a matrix Q of dimensions k × d, and n, k and d are natural numbers, the apparatus including:
a supremum determining unit for determining, according to the value ranges of X, P and Q, a supremum B not less than the maximum value of |log p(x_i | P, Q)|; x_i is the ith row of the matrix X, i is a natural number from 1 to n, and p(x_i | P, Q) is the likelihood function of x_i given P and Q;
a posterior sampling unit for sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q), where the P and Q obtained by sampling are output data satisfying ε-differential privacy, and π(P, Q) is a prior distribution over P and Q.
The present specification further provides a data mining apparatus with differential privacy protection, including:
a protected data acquisition unit for acquiring a matrix P of dimensions n × k, where the matrix P is obtained by sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q); X is a matrix of dimensions n × d decomposable into the product of the matrix P and a matrix Q of dimensions k × d; x_i is the ith row of the matrix X; p(x_i | P, Q) is the likelihood function of x_i given P and Q; π(P, Q) is a prior distribution over P and Q; B is determined according to the value ranges of X, P and Q and is a supremum not less than the maximum value of |log p(x_i | P, Q)|; n, k and d are natural numbers, i is a natural number from 1 to n, and ε is a differential privacy protection parameter;
and a model training unit for generating training samples using the matrix P and training a data mining model.
This specification provides a computer device comprising a memory and a processor, where the memory stores a computer program executable by the processor, and when the processor runs the computer program, the steps of the above differential privacy protection method for sensitive data are performed.
This specification provides a computer device comprising a memory and a processor, where the memory stores a computer program executable by the processor, and when the processor runs the computer program, the steps of the above data mining method with differential privacy protection are performed.
The present specification provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above differential privacy protection method for sensitive data are performed.
The present specification also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above data mining method with differential privacy protection are performed.
As can be seen from the above technical solutions, in the embodiments of the present specification, the sensitive data matrix X is decomposed into a matrix P and a matrix Q, a supremum B not less than the maximum value of |log p(x_i | P, Q)| is determined, and, using the property that statistics obtained by sampling from the posterior distribution of (P, Q) satisfy 4B-differential privacy, P and Q are sampled from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q). A matrix decomposition result satisfying ε-differential privacy can thus be obtained without matrix inversion, so that large-scale private data can be protected quickly and efficiently; moreover, the method is applicable to a matrix X with an arbitrary distribution and can be widely applied in various scenarios.
Detailed Description
Matrix Factorization (MF), which decomposes an original matrix into the product of several matrices, is a dimension reduction technique. It achieves the purpose of compressing, representing and approximating the original matrix by finding effective low-dimensional features in it. The low-rank matrices resulting from matrix factorization thus retain most of the information content of the original matrix while being different from it, so the technique can be used to process sensitive data.
Assuming that the sensitive data has n (n is a natural number) records, and each record has d (d is a natural number) data items, the sensitive data can be expressed as a matrix X with n X d dimensions. Matrix decomposition is performed on X, that is, an n × k (k is a natural number) dimensional matrix P and a k × d dimensional matrix Q satisfying formula 1 are found:
X = PQ    (Formula 1)
The low-rank matrix P or Q is an approximation of the sensitive data (i.e., the matrix X) and retains a large amount of its information. If the low-rank matrix P or Q is to be used as privacy-protected data, it must additionally satisfy the condition that the sensitive data matrix is difficult to derive from it in reverse; a matrix P or Q satisfying the differential privacy requirement meets this condition.
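As a purely illustrative sketch (not part of the claimed method), one standard way to obtain a rank-k factorization X ≈ PQ as in Formula 1 is a truncated singular value decomposition; the matrix sizes and data below are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 6, 4, 2
X = rng.random((n, d))                 # toy stand-in for the n x d sensitive matrix

# Truncated SVD: keep the k largest singular values/vectors
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U[:, :k] * s[:k]                   # n x k low-rank factor
Q = Vt[:k, :]                          # k x d low-rank factor

# Relative Frobenius error of the rank-k approximation P @ Q
approx_err = np.linalg.norm(X - P @ Q) / np.linalg.norm(X)
```

Note that this plain SVD factorization offers no privacy by itself; the differential privacy guarantee in this specification comes from the posterior sampling described below.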
Embodiments of the present specification provide a new differential privacy protection method for sensitive data and a new data mining method with differential privacy protection. Value ranges are set for P and Q, a value not less than the maximum of |log p(x_i | P, Q)| is taken as a supremum B, and, based on the property that statistics sampled from the (P, Q) posterior distribution satisfy 4B-differential privacy, P and Q satisfying ε-differential privacy are obtained by sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q). The embodiments do not involve matrix inversion when sampling from the posterior distribution, can obtain the decomposed matrices quickly and efficiently, are suitable for processing large-scale or online data, and impose no requirement on the distribution function of the sensitive data.
Embodiments of the present description may be implemented on any device with computing and storage capabilities, such as a mobile phone, a tablet Computer, a PC (Personal Computer), a notebook, a server, and so on; the functions in the embodiments of the present specification may also be implemented by a logical node operating in two or more devices.
In the embodiment of the present specification, a flow of a differential privacy protection method for sensitive data is shown in fig. 1.
Step 110, according to the value ranges of X, P and Q, determining a supremum B not less than the maximum value of |log p(x_i | P, Q)|.
In the embodiments of the present specification, the matrices X, P and Q each have a value range. That is, for the matrix X there are two real numbers max_X and min_X satisfying Formula 2:
min_X ≤ x_{i,j} ≤ max_X    (Formula 2)
In Formula 2, x_{i,j} is the element in row i and column j of the matrix X, i is a natural number from 1 to n, and j is a natural number from 1 to d. Similarly, there are two such real numbers for each of the matrices P and Q.
In practical application scenarios, most raw data from a business system has a value range; even individual data items without a value range can be converted into data items with one through simple processing, without affecting the meaning they represent. For example, a data item without a value range can be mapped to one with a value range through a suitable mathematical function; alternatively, such a data item can be divided into several levels and represented by its level value, thereby defining a value range.
In one implementation, the raw data may be represented as an n × d matrix X_p, and the matrix X obtained by normalizing X_p by columns is taken as the sensitive data matrix. Thus, x_{i,j} has the value range [0, 1].
For the matrices P and Q, the value ranges of their elements can be preset. Because the information contained in the matrix X is embodied in the relative values of the elements in P and Q, limiting their value ranges does not affect how closely the information carried in P and Q approximates the information in X.
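The column-wise normalization mentioned above can be sketched as follows; the function name and the min-max scheme are illustrative assumptions, since the specification does not fix a particular normalization.

```python
import numpy as np

def normalize_columns(Xp):
    """Min-max normalize each column of the raw data matrix X_p to [0, 1]."""
    lo = Xp.min(axis=0)
    hi = Xp.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (Xp - lo) / span

# Toy raw data: 3 records with 2 data items each
Xp = np.array([[1.0, 50.0],
               [3.0, 150.0],
               [2.0, 100.0]])
X = normalize_columns(Xp)               # every x_ij now lies in [0, 1]
```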
p(x_i | P, Q) is the likelihood function of x_i given P and Q, where x_i is the ith row of the matrix X; it indicates the likelihood that x_i is observed given P and Q. log p(x_i | P, Q) is the logarithmic form of the likelihood function.
In the embodiments of the present specification, the matrix X obeys a certain distribution, and the form of its distribution function can be predetermined according to factors such as the requirements of the actual application scenario and the characteristics of the sensitive data; the parameters of the distribution function of X can be determined from the sensitive data according to the prior art and are not described in detail.
For a matrix X that follows any distribution, once the form and parameters of its distribution function have been determined, log p(x_i | P, Q) is determined by the values of the elements in x_i, P and Q together with the form and parameters of the distribution function of X. Thus, the value range of log p(x_i | P, Q) can be obtained from the value ranges of the elements in x_i, P and Q and the parameters of the distribution function of X, thereby determining the range of values of |log p(x_i | P, Q)|.
How to determine the value range of log p(x_i | P, Q) is described below, taking an X that follows a normal distribution as an example. Assuming that the standard deviation of the normal distribution of X is σ, Formula 3 holds:
log p(x_i | P, Q) = −(x_i − p_i·Q)(x_i^T − Q^T·p_i^T) / (2σ²) + Const = −Σ_{j=1..d} (x_{i,j} − p_i·q_j)² / (2σ²) + Const    (Formula 3)
In Formula 3, Q^T is the transposed matrix of Q, p_i is row i of P, q_j is column j of Q, and Const is determined by the mean and standard deviation σ of the normal distribution of X and is a constant independent of the values of the elements in x_i, P and Q.
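Formula 3 can be sketched in code as follows; the additive constant Const is omitted since it does not depend on x_i, P or Q, and the numeric values are illustrative only.

```python
import numpy as np

def log_lik_row(x_i, p_i, Q, sigma):
    """log p(x_i | P, Q) for normally distributed X, per Formula 3, up to Const."""
    resid = x_i - p_i @ Q                    # row of residuals x_ij - p_i . q_j
    return -float(resid @ resid) / (2.0 * sigma ** 2)

# Toy values: d = 2 data items, k = 2 latent dimensions
x_i = np.array([0.5, 0.5])                   # ith row of X
p_i = np.array([1.0, 0.0])                   # ith row of P
Q = np.array([[0.5, 0.5],
              [0.0, 1.0]])                   # k x d matrix Q
val = log_lik_row(x_i, p_i, Q, sigma=1.0)    # here p_i @ Q == x_i, so val == 0
```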
It can be seen that, by applying the value ranges of the elements in x_i, P and Q to Formula 3, the range of possible values of −(x_{i,j} − p_i·q_j)² can be determined, thereby obtaining the value range of log p(x_i | P, Q). For an X that follows a normal distribution, the value range of log p(x_i | P, Q) is determined by the value ranges of the matrices X, P and Q, together with the mean and variance of the normal distribution that X follows.
When the matrix X has a distribution function of another form, such as a Poisson distribution, a discrete distribution or a Beta distribution, the value range of log p(x_i | P, Q) can likewise be obtained from the expression of the corresponding distribution and the value ranges and distribution parameters of the elements in X, P and Q; this will not be described in detail.
In the embodiments of the present specification, a value not less than the maximum of |log p(x_i | P, Q)| may be taken as the supremum B. In other words, the supremum B satisfies Formula 4: for all possible x_i, P and Q,
B ≥ max |log p(x_i | P, Q)|    (Formula 4)
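A supremum B satisfying Formula 4 for the normal-distribution case of Formula 3 can be bounded from the value ranges alone. The sketch below is illustrative (the function name and ranges are assumptions, and Const is again omitted): it takes the largest possible |p_i·q_j| implied by the element ranges and the worst-case squared deviation per data item.

```python
def supremum_B(d, x_rng, p_rng, q_rng, k, sigma):
    """Return a B >= max |log p(x_i|P,Q)| (up to Const) for normally distributed X.
    x_rng, p_rng, q_rng are (min, max) element ranges; k is the inner dimension."""
    # largest possible |p_i . q_j| given element-wise ranges (k terms in the dot product)
    pq_max = k * max(abs(p_rng[0] * q_rng[0]), abs(p_rng[0] * q_rng[1]),
                     abs(p_rng[1] * q_rng[0]), abs(p_rng[1] * q_rng[1]))
    # largest possible |x_ij - p_i . q_j|
    dev = max(abs(x_rng[1] + pq_max), abs(x_rng[0] - pq_max))
    # each row x_i contributes d squared-deviation terms, each at most dev^2
    return d * dev ** 2 / (2.0 * sigma ** 2)

# Example: x_ij in [0,1], elements of P and Q in [-1,1], k = 2, sigma = 1
B = supremum_B(d=4, x_rng=(0.0, 1.0), p_rng=(-1.0, 1.0), q_rng=(-1.0, 1.0), k=2, sigma=1.0)
```

Any looser value than this bound also satisfies Formula 4; a tighter B yields less noisy sampling in Formula 5.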
Step 120, sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q); the P and Q obtained by sampling are output data satisfying ε-differential privacy, and π(P, Q) is a prior distribution over P and Q.
Suppose that one record x_k in the matrix X is replaced by x′_k, yielding a matrix X′, and let θ = (P, Q). Consider sampling from the ordinary posterior distribution of (P, Q), p(θ | X) ∝ exp(Σ_{i=1..n} log p(x_i | θ)) · π(θ). Since |log p(x_i | θ)| ≤ B, replacing x_k with x′_k changes the exponent by at most 2B and the normalizing constant by at most a further factor of exp(2B), so p(θ | X) ≤ exp(4B) · p(θ | X′). It can thus be seen that statistics sampled from this posterior distribution satisfy 4B-differential privacy. Then, for any differential privacy protection parameter ε, sampling is performed from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | θ)) · π(θ), as shown in Formula 5:
p(P, Q | X) ∝ exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q)    (Formula 5)
Any P and Q obtained by sampling according to Formula 5 satisfy ε-differential privacy and can be used as the output of the differential-privacy matrix decomposition.
In the embodiments of the present specification, any sampling method may be used; this is not limited. For example, Markov chain Monte Carlo (MCMC) or stochastic gradient Hamiltonian Monte Carlo (SGHMC) may be used as the sampling method.
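As one concrete instance, a Metropolis-Hastings sampler targeting the density of Formula 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes the Gaussian model of Formula 3, a flat prior π on the preset element range [−1, 1], and toy step counts; the clipping of proposals to the range is a simplification.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik_sum(X, P, Q, sigma):
    # sum_i log p(x_i | P, Q) per Formula 3; Const is dropped because
    # constants cancel in the Metropolis acceptance ratio
    R = X - P @ Q
    return -np.sum(R * R) / (2.0 * sigma ** 2)

def mh_sample(X, k, eps, B, sigma=1.0, steps=200, step_size=0.05):
    """Metropolis-Hastings sketch targeting Formula 5:
    density proportional to exp((eps/(4B)) * sum_i log p(x_i|P,Q)) * pi(P,Q)."""
    n, d = X.shape
    P = rng.uniform(-1, 1, (n, k))
    Q = rng.uniform(-1, 1, (k, d))
    log_target = (eps / (4 * B)) * log_lik_sum(X, P, Q, sigma)
    for _ in range(steps):
        # random-walk proposal, kept inside the preset value range [-1, 1]
        P2 = np.clip(P + step_size * rng.standard_normal(P.shape), -1, 1)
        Q2 = np.clip(Q + step_size * rng.standard_normal(Q.shape), -1, 1)
        lt2 = (eps / (4 * B)) * log_lik_sum(X, P2, Q2, sigma)
        if np.log(rng.random()) < lt2 - log_target:   # accept / reject
            P, Q, log_target = P2, Q2, lt2
    return P, Q

X = rng.uniform(0, 1, (6, 4))          # toy normalized sensitive matrix
P, Q = mh_sample(X, k=2, eps=1.0, B=18.0)
```

In practice SGHMC scales better to large n, since it uses mini-batch gradients rather than full-data acceptance tests.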
Matrices P and Q that satisfy the ε-differential privacy requirement can be used as data for which privacy protection has been accomplished. For example, the n × k matrix P may be provided to a data mining party as a data source, or part of a data source, for data mining, so that the information carried by the sensitive data can be used as the input of a machine learning model to train a new model while the sensitive data itself is protected.
In the embodiment of the present specification, a flow of a data mining method for differential privacy protection is shown in fig. 2.
In step 210, a matrix P of dimensions n × k is obtained. The matrix P is obtained by sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q); ε is a differential privacy protection parameter; X is an n × d matrix decomposable into the product of the matrix P and a k × d matrix Q; x_i is the ith row of the matrix X; p(x_i | P, Q) is the likelihood function of x_i given P and Q; π(P, Q) is a prior distribution over P and Q; B is determined according to the value ranges of X, P and Q and is a supremum not less than the maximum value of |log p(x_i | P, Q)|.
By using the differential privacy protection method for sensitive data in the embodiment of the present specification, a party having sensitive data decomposes a sensitive data matrix X into matrices P and Q, and then provides the matrix P with dimensions n × k as a data source after differential privacy protection to a party performing data mining. The party performing data mining may obtain the matrix P from the party owning sensitive data in any way, and the embodiments of the present specification are not limited.
In step 220, the data mining model is trained using the obtained training samples.
After obtaining the matrix P, the party performing data mining may train the data mining model using the matrix P as a set of n training samples (i.e., using each row of P as a data record); or the matrix P may be used as a partial data source and, after data fusion with other data sources, the generated training samples may be used to train the data mining model.
The specific data fusion mode can be determined according to factors such as the characteristics of the data in the actual application scenario and the type of the data mining model, and is not limited. For example, assuming that the matrix P contains the differential-privacy-protected data of n users, a trusted data mining party may splice t (t is a natural number) non-sensitive data items of the same n users with the matrix P to form n training samples of dimension (k + t).
In the embodiments of the present specification, the type of the data mining model and the specific training method are not limited.
It can be seen that, in the embodiments of the present specification, the sensitive data matrix X is decomposed into a matrix P and a matrix Q having certain value ranges, a value not less than the maximum of |log p(x_i | P, Q)| is taken as the supremum B, and, using the property that statistics obtained by sampling from the (P, Q) posterior distribution satisfy 4B-differential privacy, P and Q are sampled from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q). A matrix decomposition result satisfying ε-differential privacy can thus be obtained without matrix inversion, a decomposition of large-scale or online data can be obtained quickly and efficiently, and no requirement is imposed on the distribution function of X.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In one application example of this specification, a data provider entrusts a trusted data miner to mine user data and provides the data miner with part of the data source required for mining. The data provider is the party holding the original data containing sensitive user information, while the data miner holds non-sensitive information about the same user group. Before providing data to the data miner, the data provider needs to apply differential privacy protection to the original data. The specific data processing procedure is shown in fig. 3.
At the data provider, the d sensitive data items of n users are formed into an n × d original data matrix, and the matrix X is obtained by normalizing this matrix by columns. The matrix X follows a normal distribution with mean μ and standard deviation σ.
At the data provider, the value range of log p(x_i | P, Q) is determined according to Formula 3, and a B satisfying Formula 4 is taken as the supremum. Then, according to Formula 5, P and Q satisfying ε-differential privacy are obtained by sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q).
The data provider provides the matrix P to the data miner.
At the data miner, t non-sensitive data items of the same n users are formed into an n × t non-sensitive data matrix. After obtaining the matrix P, the data miner performs data fusion between the non-sensitive data matrix and the matrix P according to the user to which each row p_i of P belongs, generating an n × (k + t)-dimensional matrix Y, where each row of Y comprises the t non-sensitive data items of one user and that user's k data items in the matrix P.
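With the rows already aligned by user, the data fusion step reduces to a column-wise splice; the sketch below uses placeholder matrices (all sizes and values are illustrative assumptions).

```python
import numpy as np

n, k, t = 5, 2, 3
P = np.ones((n, k))              # stand-in for the sampled privacy-protected matrix
nonsensitive = np.zeros((n, t))  # stand-in for the miner's own n x t data, same row order

# n x (k + t) training matrix Y: each row holds one user's t non-sensitive
# items followed by that user's k items from P
Y = np.hstack([nonsensitive, P])
```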
At the data miner, the data mining model is trained using the matrix Y as the training samples, yielding the resulting model.
Corresponding to the implementations of the above flows, embodiments of this specification further provide a differential privacy protection apparatus for sensitive data and a data mining apparatus with differential privacy protection. Both apparatuses can be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, the logical apparatus is formed by the Central Processing Unit (CPU) of the device reading the corresponding computer program instructions into memory and running them. In terms of hardware, in addition to the CPU, memory and storage shown in fig. 4, the device in which the apparatus is located generally also includes other hardware such as a chip for transmitting and receiving wireless signals and/or a board for implementing network communication functions.
Fig. 5 illustrates a differential privacy protection apparatus for sensitive data according to an embodiment of the present specification, where the sensitive data is a matrix X of dimensions n × d, X is decomposable into the product of a matrix P of dimensions n × k and a matrix Q of dimensions k × d, and n, k and d are natural numbers. The apparatus includes a supremum determining unit and a posterior sampling unit, where: the supremum determining unit is used for determining, according to the value ranges of X, P and Q, a supremum B not less than the maximum value of |log p(x_i | P, Q)|; x_i is the ith row of the matrix X, i is a natural number from 1 to n, and p(x_i | P, Q) is the likelihood function of x_i given P and Q; the posterior sampling unit is used for sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q), where the P and Q obtained by sampling are output data satisfying ε-differential privacy, and π(P, Q) is a prior distribution over P and Q.
Optionally, the apparatus further includes a sampling data output unit, configured to provide the matrix P to a data mining party, so that the data mining party performs data mining on the matrix P as at least part of the data source.
Optionally, the apparatus further includes a normalization processing unit, configured to perform normalization processing on the matrix X by columns before determining the supremum boundary B.
Optionally, X follows a normal distribution, and the value range of log p(x_i | P, Q) is determined according to the value ranges of X, P and Q, and the mean and variance of the normal distribution.
Optionally, the posterior sampling unit is specifically configured to sample from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q) using a Markov chain Monte Carlo sampling method or a stochastic gradient Hamiltonian Monte Carlo sampling method.
Fig. 6 illustrates a data mining apparatus with differential privacy protection according to an embodiment of the present specification. The apparatus includes a protected data acquisition unit and a model training unit, where: the protected data acquisition unit is used for acquiring a matrix P of dimensions n × k; the matrix P is obtained by sampling from the posterior distribution proportional to exp((ε/4B) · Σ_{i=1..n} log p(x_i | P, Q)) · π(P, Q); X is an n × d matrix decomposable into the product of the matrix P and a k × d matrix Q; x_i is the ith row of the matrix X; p(x_i | P, Q) is the likelihood function of x_i given P and Q; π(P, Q) is a prior distribution over P and Q; B is determined according to the value ranges of X, P and Q and is a supremum not less than the maximum value of |log p(x_i | P, Q)|; n, k and d are natural numbers, i is a natural number from 1 to n, and ε is a differential privacy protection parameter; the model training unit is used for generating training samples using the matrix P and training the data mining model.
Optionally, the model training unit is specifically configured to: and taking the matrix P as a partial data source, performing data fusion with other data sources to generate a training sample, and training the data mining model.
Embodiments of the present description provide a computer device that includes a memory and a processor. Wherein the memory has stored thereon a computer program executable by the processor; the processor, when executing the stored computer program, performs the steps of the differential privacy protection method for sensitive data in the embodiments of the present specification. For a detailed description of the steps of the differential privacy protection method for sensitive data, reference is made to the preceding contents, which are not repeated.
Embodiments of the present description provide a computer device that includes a memory and a processor. Wherein the memory has stored thereon a computer program executable by the processor; the processor, when executing the stored computer program, performs the steps of the differential privacy preserving data mining method of the embodiments of the present specification. For a detailed description of the steps of the data mining method for differential privacy protection, refer to the previous contents and are not repeated.
Embodiments of the present description provide a computer-readable storage medium having stored thereon computer programs which, when executed by a processor, perform the steps of the differential privacy protection method for sensitive data in embodiments of the present description. For a detailed description of the steps of the differential privacy protection method for sensitive data, reference is made to the preceding contents, which are not repeated.
Embodiments of the present description provide a computer-readable storage medium having stored thereon computer programs which, when executed by a processor, perform the steps of the differential privacy preserving data mining method of embodiments of the present description. For a detailed description of the steps of the data mining method for differential privacy protection, refer to the previous contents and are not repeated.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.