Summary of the invention
The embodiments of this specification aim to provide a more effective data encryption method that addresses deficiencies in the prior art.
To achieve the above object, one aspect of this specification provides a data encryption method that implements a differential privacy algorithm, comprising: obtaining a raw data matrix, where the raw data matrix defines a plurality of points in a first dimensional space, the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector; obtaining a first parameter that defines the differential privacy algorithm and a second parameter that indicates the validity of the data after encryption; obtaining an intermediate data matrix, where the intermediate data matrix defines a plurality of points in the first dimensional space, these points are obtained by respectively perturbing the points defined by the raw data matrix, and the perturbation is an offset based on the first parameter and the second parameter; and multiplying the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, where the projection matrix projects the points defined by the intermediate data matrix to respectively corresponding points in a second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, and where the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.
In one embodiment, in the above data encryption method, obtaining the intermediate data matrix includes: performing singular value decomposition on the raw data matrix so that the raw data matrix is expressed as the product of three matrices, where the number of diagonal elements of the diagonal matrix in the middle of the product is equal to the number of dimensions of the second dimensional space; determining a perturbation parameter based on the first parameter and the second parameter; offsetting each diagonal element of the diagonal matrix based on the perturbation parameter; and calculating the product of the three matrices after the offset, as the intermediate data matrix.
In one embodiment, in the above data encryption method, performing singular value decomposition on the raw data matrix includes: performing a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, where the mean is the average of the values of the plurality of points in the same dimension; and performing singular value decomposition on the raw data matrix after the mean-removal operation.
In one embodiment, in the above data encryption method, the encrypted data matrix defines a plurality of points in the second dimensional space, these points correspond respectively to the points defined by the raw data matrix, and the difference between the distance between two points defined by the encrypted data matrix and the distance between the corresponding two points defined by the raw data matrix is related to the perturbation parameter.
In one embodiment, in the above data encryption method, the projection matrix is obtained at random from a random matrix, each matrix element of the random matrix is a random variable, and the random variables are mutually independent and identically distributed, where the random matrix satisfies: the expected value of the product of the transpose of the random matrix and the random matrix is the identity matrix.
In one embodiment, in the above data encryption method, the second dimensional space is an r-dimensional space, and the random variables follow a Gaussian distribution with expected value 0 and variance 1/r.
In one embodiment, in the above data encryption method, the second dimensional space is an r-dimensional space, and the random variables follow a uniform distribution over an interval.
In one embodiment, in the above data encryption method, the second dimensional space is an r-dimensional space, and the random variables follow a discrete distribution in which each possible value is taken with a respective probability.
In one embodiment, in the above data encryption method, the differential privacy algorithm is an (ε, δ)-differential privacy algorithm, and the first parameter includes ε and δ.
In one embodiment, in the above data encryption method, the second parameter includes η and ν, where η denotes the maximum relative error that arises in the distance between points defined by the raw data matrix after processing by the method, and ν denotes the maximum probability that the method fails over multiple executions; and determining the number of dimensions of the second dimensional space and the value of the perturbation parameter based on the first parameter and the second parameter includes determining the number of dimensions of the second dimensional space based on η and ν.
In one embodiment, in the above data encryption method, determining the number of dimensions of the second dimensional space and the value of the perturbation parameter based on the first parameter and the second parameter includes determining the value of the perturbation parameter based on ε, δ and the number of dimensions of the second dimensional space.
Another aspect of this specification provides a data encryption device that implements a differential privacy algorithm, comprising: a first acquiring unit configured to obtain a raw data matrix, where the raw data matrix defines a plurality of points in a first dimensional space, the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector; a second acquiring unit configured to obtain a first parameter that defines the differential privacy algorithm and a second parameter that indicates the validity of the data after encryption; a perturbation unit configured to obtain an intermediate data matrix, where the intermediate data matrix defines a plurality of points in the first dimensional space, these points are obtained by respectively perturbing the points defined by the raw data matrix, and the perturbation is an offset based on the first parameter and the second parameter; and a projection unit configured to multiply the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, where the projection matrix projects the points defined by the intermediate data matrix to respectively corresponding points in a second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, and where the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.
In one embodiment, in the above data encryption device, the perturbation unit further includes the following subunits: a decomposition subunit configured to perform singular value decomposition on the raw data matrix so that the raw data matrix is expressed as the product of three matrices, where the number of diagonal elements of the diagonal matrix in the middle of the product is equal to the number of dimensions of the second dimensional space; a determining subunit configured to determine a perturbation parameter based on the first parameter and the second parameter; an offset subunit configured to offset each diagonal element of the diagonal matrix based on the perturbation parameter; and a calculation subunit configured to calculate the product of the three matrices after the offset, as the intermediate data matrix.
In one embodiment, in the above data encryption device, the decomposition subunit is further configured to perform a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, where the mean is the average of the values of the plurality of points in the same dimension, and to perform singular value decomposition on the raw data matrix after the mean-removal operation.
In the data encryption scheme according to the embodiments of this specification, perturbing the raw data based on the differential privacy parameters and the data validity parameters balances the tension between security and validity and provides rigorous quantitative guarantees for both. In addition, projecting the data with a projection matrix based on the J-L lemma further ensures data validity.
Specific embodiment
The embodiments of this specification are described below with reference to the accompanying drawings.
Fig. 1 shows an application scenario of the data encryption method according to an embodiment of this specification. The scenario includes multiple data providers 11 and a data processor 12. The data providers 11 are, for example, shopping websites, social apps and the like; each data provider owns its own user group and the feature data of that user group. The data processor 12 usually has big-data processing capability and is, for example, Ant Financial. The data providers 11 each provide the encrypted feature data of their users to the data processor 12, so that the data processor 12 can model and analyze the encrypted data of each data provider 11 separately. In this scenario, differential privacy technology protects user privacy while preserving the availability of the data. Because the data processor 12 handles the data of the different data providers 11 separately, a data provider 11 can encrypt its raw data locally and then send the result to the data processor 12, without disclosing any part of the encryption process. After receiving the encrypted data, the data processor 12 can model and analyze it but cannot obtain any information related to the raw data.
On the server of a data provider 11, a perturbation is first applied to the raw data matrix, based on the parameters of the differential privacy algorithm and the data validity parameters, to obtain an intermediate data matrix. The intermediate data matrix is then multiplied by a projection matrix to obtain the encrypted data matrix. The projection matrix is a randomly obtained projection matrix that satisfies the J-L lemma (Johnson-Lindenstrauss lemma). With this processing, the resulting encrypted data matrix satisfies both the differential privacy standard and the data validity standard, which effectively balances data validity and privacy security.
Fig. 2 shows a flowchart of the data encryption method according to an embodiment of this specification. In one embodiment, the method is executed on the server side of a data provider; however, it is not limited to this and may, for example, be executed on a third-party server, on the server side of the data processor, and so on. The method implements a differential privacy algorithm and comprises the following steps: in step S21, obtaining a raw data matrix, where the raw data matrix defines a plurality of points in a first dimensional space, the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector; in step S22, obtaining a first parameter that defines the differential privacy algorithm and a second parameter that indicates the validity of the data after encryption; in step S23, obtaining an intermediate data matrix, where the intermediate data matrix defines a plurality of points in the first dimensional space, these points are obtained by respectively perturbing the points defined by the raw data matrix, and the perturbation is an offset based on the first parameter and the second parameter; and in step S24, multiplying the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, where the projection matrix projects the points defined by the intermediate data matrix to respectively corresponding points in a second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, and where the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.
First, in step S21, a raw data matrix is obtained. The raw data matrix defines a plurality of points in the first dimensional space, where the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector. For example, a matrix A with n rows and d columns is stored in advance on the data provider's server, and this stored matrix A is obtained as the raw data matrix. In another embodiment, in response to a query request from the data processor's server, the data provider's server randomly selects d feature values for each of n users from its stored user data to form a matrix A with n rows and d columns, which serves as the raw data matrix.
In a raw data matrix with, for example, n rows and d columns, n represents the number of users and is on the order of millions, tens of millions, and so on. d is the number of features of each user. For example, when the data provider is a shopping website, the features of each user include gender, age, address, types of goods purchased, prices of goods purchased, shopping times and the like, and their number is on the order of thousands, tens of thousands, and so on. Each matrix element $A_{ij}$ of the matrix A represents the value of the j-th feature of the i-th user, where 1 ≤ i ≤ n and 1 ≤ j ≤ d. The raw data matrix A can be understood as defining n points in a d-dimensional space; that is, each row of A can be regarded as the feature vector of one user, and the number of dimensions of the d-dimensional space is the number of dimensions of the user feature vector, i.e. the number of columns of A. It can be understood that the raw data matrix is not limited to a matrix with n rows and d columns; it may, for example, be a matrix with d rows and n columns.
In step S22, a first parameter that defines the differential privacy algorithm and a second parameter that indicates the validity of the data after encryption are obtained. In the embodiments of this specification, the differential privacy algorithm can be chosen from various differential privacy algorithms according to the needs of the specific scenario, such as an (ε, δ)-differential privacy algorithm, an ε-differential privacy algorithm, a random differential privacy algorithm, and so on. The parameters of the chosen differential privacy algorithm are obtained accordingly. For example, for an (ε, δ)-differential privacy algorithm A(X), for all inputs X and X′ that differ in only one user feature and for all possible output sets S, the condition shown in the following formula (1) holds:
$$\Pr[A(X) \in S] \le e^{\varepsilon}\,\Pr[A(X') \in S] + \delta \qquad (1)$$
Here, ε quantifies the maximum relative error in the likelihood that a feature of a single data record is leaked, and δ quantifies the percentage of records, among all records, whose privacy may be leaked. The smaller the values of ε and δ, the higher the privacy security. The values of ε and δ can be determined according to the specific application scenario. For example, in a scenario where the raw data matrix represents multiple feature values of n users, ε can be set to a value less than ln 10 and δ can be set to a value less than 0.1.
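For outputs drawn from a finite set, the (ε, δ) condition can be checked directly. The following is a minimal sketch, assuming the standard form of the condition, Pr[A(X) ∈ S] ≤ e^ε · Pr[A(X′) ∈ S] + δ, applied in both directions; the function name `satisfies_dp` and the two example distributions are hypothetical and do not come from the specification.

```python
import math

def satisfies_dp(p, q, eps, delta):
    """Check the (eps, delta) condition for two discrete output distributions.

    p and q map each possible output to its probability under two inputs that
    differ in one user feature. For discrete outputs the worst-case set S is
    the set of outputs where one probability exceeds exp(eps) times the other,
    so it is enough to sum that excess and compare it with delta.
    """
    outputs = set(p) | set(q)
    bound = math.exp(eps)
    excess_pq = sum(max(p.get(o, 0.0) - bound * q.get(o, 0.0), 0.0) for o in outputs)
    excess_qp = sum(max(q.get(o, 0.0) - bound * p.get(o, 0.0), 0.0) for o in outputs)
    return excess_pq <= delta and excess_qp <= delta

# Hypothetical output distributions of a mechanism on two neighbouring inputs.
p = {"a": 0.6, "b": 0.4}
q = {"a": 0.5, "b": 0.5}
print(satisfies_dp(p, q, eps=0.25, delta=0.0))
print(satisfies_dp(p, q, eps=0.1, delta=0.0))
```

A smaller ε tightens the multiplicative bound and a smaller δ tightens the additive slack, which matches the statement that smaller values mean higher privacy security.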
The parameter that indicates the validity of the encrypted data can be determined according to the specific application scenario. For example, in one embodiment, the parameters that indicate the validity of the encrypted data include η and ν, where η denotes the maximum relative error that arises in the distance between points defined by the raw data matrix after processing by the method, and ν denotes the maximum probability that the method fails over multiple executions. The smaller the values of η and ν, the higher the data validity. The values of η and ν can be chosen according to the specific application scenario; for example, in a scenario where the raw data matrix represents multiple feature values of n users, both η and ν can be set to values less than 0.1. The parameters that indicate the validity of the encrypted data are not limited to η and ν. For example, in a scenario where the raw data matrix is encrypted by adding a Gaussian perturbation, the validity of the encrypted data can be indicated by the variance σ of the Gaussian perturbation: the smaller σ is, the better the validity of the data.
In step S23, an intermediate data matrix is obtained. The intermediate data matrix defines a plurality of points in the first dimensional space, and these points are obtained by respectively perturbing the points defined by the raw data matrix, where the perturbation is an offset based on the first parameter and the second parameter.
In one embodiment, the intermediate data matrix is obtained through the following steps shown in Fig. 3:
In step S31, singular value decomposition is performed on the raw data matrix so that the raw data matrix is expressed as the product of three matrices, where the number of diagonal elements of the diagonal matrix in the middle of the product is equal to the number of dimensions of the second dimensional space.
As is known to those skilled in the art, and with reference to the singular value decomposition of a matrix A shown in Fig. 4, for any matrix $A \in \mathbb{R}^{m \times n}$ with m rows and n columns, the singular value decomposition of A can be expressed as the product of three matrices: $A = U \Sigma V^{T}$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and Σ is a diagonal matrix. The nonzero entries on the main diagonal of Σ are called the singular values of A. The singular values are arranged from largest to smallest along the main diagonal, which the figure indicates schematically by the size of the squares.
Refer to the reduction of the diagonal matrix in singular value decomposition shown in Fig. 5. In singular value decomposition, a singular value on the diagonal that is smaller than a predetermined threshold is of little importance to the original matrix A. Therefore, by taking the number of singular values to be r and discarding the smaller singular values after the r-th one, the singular value decomposition of A can be expressed in the reduced form $A = U \Sigma V^{T}$, where $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$, and $\Sigma \in \mathbb{R}^{r \times r}$ is a square matrix whose diagonal holds the retained singular values. This reduction lowers the amount of computation while preserving data validity.
In this embodiment, the raw data matrix is decomposed into the product of three matrices by singular value decomposition, where the diagonal matrix in the middle is the reduced diagonal matrix and the number of its diagonal elements is the number of dimensions of the second dimensional space. The second dimensional space is the space obtained when the projection matrix described below projects the intermediate data matrix. The number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter. In one embodiment, the second parameter includes η and ν, and the number of dimensions of the second dimensional space is determined based on η and ν. In one embodiment, with r denoting the number of dimensions of the second dimensional space, the value of r is obtained based on the following formula (2):
The order of magnitude of r follows from this formula.
In one embodiment, the value of r is defined more specifically by the following formula (3):
In one embodiment, performing singular value decomposition on the raw data matrix includes: performing a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, where the mean is the average of the values of the plurality of points in the same dimension; and performing singular value decomposition on the raw data matrix after the mean-removal operation. For example, the raw data matrix is a matrix with n rows and d columns, where n is the number of users and d is the number of features of each user, and the mean-removal operation on the raw data matrix is the operation shown in formula (4), where A denotes the raw data matrix:
$$A \leftarrow A - \frac{1}{n}\,\mathbf{1}\mathbf{1}^{T}A \qquad (4)$$
The $\mathbf{1}$ in formula (4) denotes the all-ones column vector.
The mean-removal operation shown in formula (4) is equivalent to moving the feature vectors of all users to the vicinity of the origin of the d-dimensional space, which simplifies the computation. It can be understood that this mean-removal operation is not required; without it, singular value decomposition can be performed on the raw data matrix in the same way.
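The mean-removal operation can be sketched in a few lines. The sketch assumes that the operation subtracts from every entry the average of its column, that is, the average over all users of the same feature, as the text describes; the function name `remove_mean` and the sample values are hypothetical.

```python
def remove_mean(a):
    """Subtract each column's mean from an n-row, d-column matrix (list of rows)."""
    n = len(a)
    d = len(a[0])
    # Mean of each feature (column) over all n users (rows).
    means = [sum(row[j] for row in a) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in a]

# Three hypothetical users with two features each.
a = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
centered = remove_mean(a)
print(centered)
```

After the operation every column sums to zero, so the points are centered on the origin of the feature space.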
In step S32, a perturbation parameter is determined based on the first parameter and the second parameter. The perturbation parameter determines the offset applied to the diagonal elements of the diagonal matrix Σ from step S31; offsetting these diagonal elements perturbs the raw data matrix (or the de-meaned raw data matrix). In one embodiment, for the de-meaned raw data matrix, with w denoting the perturbation parameter, w can be determined by the following formula (5):
Here, r is the number of dimensions of the second dimensional space determined above by formula (2) or (3).
The order of magnitude of w can be determined from formula (5).
In one embodiment, for the de-meaned raw data matrix, w can be determined more specifically by the following formula (6):
For a raw data matrix that has not been de-meaned, the value of w can be obtained similarly, for example by choosing w based on ε, δ, η and ν, so that the processing of the raw data matrix satisfies differential privacy while preserving data validity.
In step S33, each diagonal element of the diagonal matrix is offset based on the perturbation parameter. Specifically, the offset operation can be the operation shown in the following formula (7):
Here, $I_{n \times d}$ denotes the n × d identity matrix. The process shown in formula (7) is: first square the nonzero entries of Σ, then add the square of w, and then take the square root of the sum.
In step S34, the product of the three matrices after the offset is calculated and used as the intermediate data matrix. Applying an offset related to the perturbation parameter to Σ and then computing the product in formula (7) amounts to applying an offset related to the perturbation parameter to every entry of the raw data matrix (or the de-meaned raw data matrix), which yields the perturbed raw data matrix, i.e. the intermediate data matrix. Because the intermediate data matrix is obtained simply by offsetting the matrix elements of the raw data matrix, it still defines a plurality of points in the first dimensional space; that is, when the raw data matrix is an n × d matrix, the intermediate data matrix is also an n × d matrix. Moreover, since every feature value of each user undergoes an offset related to the perturbation parameter, the vector sum, in the first dimensional space, of the offsets of all features of a user is the offset of the point corresponding to that user, and this offset is likewise related to the perturbation parameter. In other words, the points defined by the intermediate data matrix are obtained by respectively perturbing the points defined by the raw data matrix, where the perturbation is an offset based on the first parameter and the second parameter.
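Steps S31 to S34 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the specification's implementation: the offset follows the verbal description of formula (7) (square each retained singular value, add w squared, take the square root), the values of r and w are arbitrary placeholders because formulas (2), (3), (5) and (6) are not reproduced here, and the function name `perturb_singular_values` is hypothetical.

```python
import numpy as np

def perturb_singular_values(a, r, w):
    """Return U_r * sqrt(Sigma_r^2 + w^2) * V_r^T for an n x d matrix a."""
    # S31: singular value decomposition, keeping the r largest singular values.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    u_r, s_r, vt_r = u[:, :r], s[:r], vt[:r, :]
    # S33: offset each retained singular value by the perturbation parameter w.
    s_shifted = np.sqrt(s_r ** 2 + w ** 2)
    # S34: product of the three matrices after the offset (still n x d).
    return (u_r * s_shifted) @ vt_r

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 4))   # hypothetical raw data: n = 6 users, d = 4 features
a = a - a.mean(axis=0)        # optional mean removal
b = perturb_singular_values(a, r=3, w=0.5)  # r and w chosen arbitrarily here
print(b.shape)
```

The result has the same shape as the input, which agrees with the statement that the intermediate data matrix still defines points in the first dimensional space.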
It can be understood that, in the embodiments of this specification, the method of obtaining the intermediate data matrix is not limited to the singular value decomposition method described above. For example, the intermediate data matrix can be obtained by applying a random Gaussian perturbation or a random Laplace perturbation to each matrix element of the raw data matrix.
Returning to Fig. 2, in step S24 the intermediate data matrix is multiplied by a projection matrix to obtain the encrypted data matrix. The projection matrix projects the points defined by the intermediate data matrix to respectively corresponding points in the second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, where the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.
For example, when the raw data matrix is, as described above, a matrix A with n rows and d columns, where n is the number of users and d is the number of features of each user, the intermediate data matrix B is also a matrix with n rows and d columns. For this intermediate data matrix, the projection matrix M is therefore a matrix with d rows and r columns, where r is the number of dimensions of the second dimensional space into which M projects the intermediate data matrix. r is obtained based on the first parameter and the second parameter and, in one embodiment, can be determined by the aforementioned formula (2) or (3).
Right-multiplying the intermediate data matrix B by the projection matrix M yields an encrypted data matrix with n rows and r columns; this process can be understood as projecting n points of the d-dimensional space to n points of the r-dimensional space. It can be understood that the projection is not limited to right-multiplication. For example, when the data matrix to be projected is a matrix with d rows and n columns, the projection matrix can be a matrix with r rows and d columns, and left-multiplying the data matrix by this projection matrix yields an encrypted data matrix with r rows and n columns. This process can likewise be understood as projecting n points of the d-dimensional space to n points of the r-dimensional space.
The projection matrix M satisfies the J-L lemma. That is, after M projects n points of the d-dimensional space to n points of the r-dimensional space, the corresponding points in the two spaces satisfy the following formula (8):
$$(1 - \lambda_{JL})\,\lVert x - y \rVert^{2} \le \lVert xM - yM \rVert^{2} \le (1 + \lambda_{JL})\,\lVert x - y \rVert^{2} \qquad (8)$$
Here, $\lambda_{JL}$ is a predetermined small real number, for example $0 < \lambda_{JL} < 0.1$; x and y are two points of the d-dimensional space, and $\lVert x - y \rVert^{2}$ is the square of the Euclidean distance between x and y in the d-dimensional space. xM and yM are the two points of the r-dimensional space to which the projection matrix maps x and y, and $\lVert xM - yM \rVert^{2}$ is the square of the Euclidean distance between xM and yM in the r-dimensional space. It follows from formula (8) that the squared Euclidean distance between xM and yM differs from that between x and y by a factor within $1 \pm \lambda_{JL}$, so the ratio of the Euclidean distance between x and y to the Euclidean distance between xM and yM lies within a certain range. Because $\lambda_{JL}$ is small, for example $\lambda_{JL} = 0.05$, the Euclidean distance between xM and yM can be regarded as approximately unchanged compared with the Euclidean distance between x and y. By generating a projection matrix M that satisfies the J-L lemma and using it to project multiple points of the d-dimensional space to multiple points of the r-dimensional space, the points of the d-dimensional space are encrypted by the projection matrix M. At the same time, because the J-L lemma guarantees data validity, analysis results for the points of the d-dimensional space can be obtained by learning from the points of the r-dimensional space. In one embodiment, r < d, so the projection matrix M also reduces the dimensionality of the points of the d-dimensional space, which lowers the computational complexity.
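The distance-preservation property can be measured empirically. The sketch below projects a few points with a random matrix and reports the largest relative change in squared pairwise distance, which plays the role of the bound in the J-L condition. The sizes d, r and n, the Gaussian entries with variance 1/r, and both function names are assumptions chosen only for illustration.

```python
import math
import random

def project(points, m):
    """Right-multiply an n x d list of points by a d x r matrix m."""
    r = len(m[0])
    return [[sum(x[k] * m[k][j] for k in range(len(m))) for j in range(r)]
            for x in points]

def max_sq_distance_distortion(points, projected):
    """Largest |(projected squared distance / original squared distance) - 1| over all pairs."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            before = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            after = sum((a - b) ** 2 for a, b in zip(projected[i], projected[j]))
            if before > 0:
                worst = max(worst, abs(after / before - 1.0))
    return worst

random.seed(0)
d, r, n = 50, 20, 5  # hypothetical sizes
points = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
m = [[random.gauss(0, math.sqrt(1 / r)) for _ in range(r)] for _ in range(d)]
lam = max_sq_distance_distortion(points, project(points, m))
print(lam)
```

A larger r generally gives a smaller measured distortion, at the cost of less dimensionality reduction.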
In one embodiment, the projection matrix M that satisfies the J-L lemma can be a real matrix that satisfies $M^{T}M = I$, i.e. M is an orthogonal matrix, where I is the identity matrix. For example, when n = 3 and r = 3, i.e. M is a 3 × 3 matrix, M can be the orthogonal matrix shown below:
When this orthogonal matrix projects 3 points of the d-dimensional space (for example d = 5), such as points representing the feature vectors of 3 users, to 3 points of the r = 3 dimensional space, for example by multiplying the orthogonal matrix with the raw data matrix (a 3 × 5 matrix), the distance between any two of the 3 points in the r-dimensional space is guaranteed to be essentially the same as the distance between the corresponding two points in the d-dimensional space, i.e. the J-L lemma is satisfied. However, because an arbitrarily obtained real matrix is used as the projection matrix M, M itself has no randomness and therefore does not contribute to the security of the differential privacy algorithm. Here, the projection matrix is not limited to a square matrix; for example, M may also be a 3 × 2 matrix, as long as it satisfies $M^{T}M = I$.
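That an orthogonal matrix leaves pairwise distances unchanged can be checked numerically. The rotation matrix below is a hypothetical stand-in and is not the matrix of the original example; the points x and y and the helper functions are likewise illustrative.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# A 3 x 3 rotation about the third axis, which is an orthogonal matrix.
theta = math.pi / 6
m = [[math.cos(theta), -math.sin(theta), 0.0],
     [math.sin(theta), math.cos(theta), 0.0],
     [0.0, 0.0, 1.0]]

x = [[1.0, 2.0, 3.0]]    # two hypothetical points, written as row vectors
y = [[-2.0, 0.5, 4.0]]
xm, ym = matmul(x, m), matmul(y, m)
dist_before = math.dist(x[0], y[0])
dist_after = math.dist(xm[0], ym[0])
print(dist_before, dist_after)
```

Because the matrix is fixed rather than random, anyone who knows it can invert the projection, which is the sense in which such a matrix adds nothing to the security of the scheme.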
In one embodiment, the projection matrix M that satisfies the J-L lemma can be obtained at random from a random matrix Q, each matrix element of which is a random variable, the random variables being mutually independent and identically distributed. The random matrix satisfies: the expected value of the product of the transpose of the random matrix and the random matrix is the identity matrix, i.e. $E(Q^{T}Q) = I$. For example, the random matrix Q can be the following matrix composed of random variables $f_{ij}(x)$ (i = 1, 2, 3; j = 1, 2, 3):
Here, the $f_{ij}(x)$ are independent and identically distributed random variables, and $E(Q^{T}Q) = I$. When computing the projection matrix M, for each $f_{ij}(x)$ a random value of x within a predetermined range, for example a random value in [0, 1], is drawn independently, and the value of $f_{ij}(x)$ is then computed from the function $f_{ij}(x)$, which gives each matrix element of the projection matrix M. The random matrix Q is not limited to a square matrix; for example, Q may also be a 3 × 2 matrix, as long as it satisfies $E(Q^{T}Q) = I$. Obtaining the projection matrix M at random from the random matrix Q adds randomness that further increases the security of the differential privacy algorithm.
In one embodiment, as an example of the above random matrix Q, for an r-dimensional second dimensional
space, f_ij(x) follows a Gaussian distribution with expected value 0 and variance 1/r, that is,
f_ij(x) ~ N(0, 1/r). In other words, f_ij(x) is the inverse function of the Gaussian cumulative
distribution function, where x takes values in [0, 1] and represents the Gaussian cumulative
distribution probability of each value of f_ij(x) (the probability integral from -∞ to that value).
When calculating the projection matrix M, for each f_ij(x), a random value in [0, 1] is obtained
independently as the value of x, and the value of f_ij(x) is then calculated through the expression of
f_ij(x), thereby obtaining each matrix element of the projection matrix M.
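The inverse-CDF sampling described above can be sketched as follows. This is an illustrative sketch rather than the implementation of this specification: the function name, the r × d orientation of M (r rows, so that E(M^T * M) = I holds when the variance is 1/r), the clipping constant, and the row-wise layout of the data points are all assumptions.

```python
import numpy as np
from statistics import NormalDist

def gaussian_projection_matrix(r, d, rng):
    """Draw an r x d matrix whose entries are i.i.d. N(0, 1/r) via the inverse CDF."""
    # Independent random x in [0, 1] for every matrix element.
    x = rng.uniform(0.0, 1.0, size=(r, d))
    # The inverse CDF is undefined at exactly 0 and 1, so keep x strictly inside (0, 1).
    x = np.clip(x, 1e-12, 1.0 - 1e-12)
    inv_cdf = NormalDist(mu=0.0, sigma=(1.0 / r) ** 0.5).inv_cdf
    # f_ij(x): inverse of the Gaussian cumulative distribution function.
    return np.vectorize(inv_cdf)(x)

rng = np.random.default_rng(0)
r, d = 50, 400
M = gaussian_projection_matrix(r, d, rng)

# Project 3 hypothetical points (rows) from d dimensions down to r dimensions.
X = rng.normal(size=(3, d))
Y = X @ M.T
print(M.shape, Y.shape)
```

Drawing x uniformly and mapping it through the inverse cumulative distribution function is equivalent to sampling directly from N(0, 1/r); the two-step form is used here only because it mirrors the description above.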
In one embodiment, as an example of the above random matrix Q, for an r-dimensional second dimensional
space, f_ij(x) follows a uniform distribution on a given interval. The inverse function of its
cumulative distribution function, i.e., the expression of f_ij(x) in terms of x, can similarly be
obtained from the probability distribution of the variable, where x takes values in [0, 1] and
represents the cumulative distribution probability of each value of f_ij(x). By independently
obtaining, for each f_ij(x), a random value on [0, 1] as the value of x, and calculating the value of
f_ij(x) through the expression of f_ij(x), each matrix element of the projection matrix M can
similarly be obtained.
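The uniform case can be sketched in the same way. The interval is not legible in this passage, so the sketch below assumes the symmetric interval [-a, a] with a = sqrt(3/r), chosen only because it gives variance 1/r; this interval, the function name, and the r × d orientation of Q are assumptions for illustration, not values taken from this specification. The sketch also estimates E(Q^T * Q) by averaging over many independent draws.

```python
import numpy as np

def uniform_projection_matrices(r, d, trials, rng):
    """Draw `trials` independent r x d matrices with entries uniform on [-a, a]."""
    a = np.sqrt(3.0 / r)        # ASSUMED half-width; gives variance a^2 / 3 = 1 / r
    x = rng.uniform(0.0, 1.0, size=(trials, r, d))
    # Inverse CDF of the uniform distribution on [-a, a]: f(x) = a * (2x - 1).
    return a * (2.0 * x - 1.0)

rng = np.random.default_rng(1)
r, d, trials = 4, 3, 20000
Q = uniform_projection_matrices(r, d, trials, rng)

# Monte Carlo estimate of E(Q^T * Q); it should be close to the d x d identity matrix.
expected_gram = np.einsum("tij,tik->jk", Q, Q) / trials
print(np.round(expected_gram, 2))
```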
In one embodiment, as an example of the above random matrix Q, for an r-dimensional second dimensional
space, f_ij(x) follows a distribution that takes three values, one of which is 0, each with a
respective probability. Here, f_ij(x) is a discrete random variable, and each matrix element of the
projection matrix M can be obtained similarly with reference to the above description.
After the encrypted data matrix is obtained, the data provider can send the encrypted data matrix to
the data processing party, so that modeling analysis can be performed on the encrypted data matrix.
The encrypted data matrix defines multiple points in the second dimensional space, and these points
correspond respectively to the multiple points defined by the raw data matrix. The difference between
the distance between two points among the multiple points defined by the encrypted data matrix and
the distance between the corresponding two points among the multiple points defined by the raw data
matrix is related to the disturbance parameter. In one embodiment, in the case described above where
singular value decomposition is performed on the mean-removed raw data matrix to obtain the
intermediate data matrix, this difference is approximately w^2, where the value of w is determined by
formula (5) or (6) above.
Fig. 6 shows a data encryption device 600 according to an embodiment of this specification. The device
implements a differential privacy algorithm and comprises:
a first acquisition unit 61, configured to obtain a raw data matrix, wherein the raw data matrix
defines multiple points in a first dimensional space, the number of the multiple points corresponds
to the number of users, each point corresponds to the feature vector of a user, and the number of
dimensions of the first dimensional space is the number of dimensions of the feature vector;
a second acquisition unit 62, configured to obtain a first parameter for defining the differential
privacy algorithm and a second parameter for indicating the validity of the data after encryption;
a disturbance unit 63, configured to obtain an intermediate data matrix, wherein the intermediate
data matrix defines multiple points in the first dimensional space, and the multiple points defined
by the intermediate data matrix are obtained by respectively disturbing the multiple points defined
by the raw data matrix, wherein the disturbance is an offset based on the first parameter and the
second parameter; and
a projection unit 64, configured to multiply the intermediate data matrix by a projection matrix to
obtain an encrypted data matrix, wherein the projection matrix is used to project the multiple points
defined by the intermediate data matrix to respectively corresponding multiple points in a second
dimensional space, such that the ratio of the Euclidean distance between any two points in the second
dimensional space to the Euclidean distance between the corresponding two points in the first
dimensional space is within a certain range, wherein the number of dimensions of the second
dimensional space is obtained based on the first parameter and the second parameter.
In one embodiment, in the above data encryption device, the disturbance unit 63 further comprises the
following subunits:
a decomposition subunit 631, configured to perform singular value decomposition on the raw data
matrix so as to express the raw data matrix as the product of three matrices, wherein the number of
diagonal elements of the diagonal matrix located in the middle of the product of the three matrices
is equal to the number of dimensions of the second dimensional space;
a determination subunit 632, configured to determine a disturbance parameter based on the first
parameter and the second parameter;
an offset subunit 633, configured to offset each diagonal element of the diagonal matrix based on the
disturbance parameter; and
a computation subunit 634, configured to calculate the product of the three matrices after the
offset, as the intermediate data matrix.
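The subunits above can be sketched as a single function. The disturbance parameter w is taken as an input because formulas (5) and (6) fall outside this passage, and the offset rule sigma' = sqrt(sigma^2 + w^2) is an assumption chosen purely for illustration; the offset actually used in this specification may differ. The function and variable names are likewise hypothetical.

```python
import numpy as np

def disturb_by_svd(X, r, w):
    """Truncated SVD of X with r singular values, each offset by the parameter w."""
    # Decomposition: X = U * diag(s) * Vt, keeping only the r largest singular values.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
    # Offset of each diagonal element (ASSUMED rule, not taken from the specification).
    s_offset = np.sqrt(s ** 2 + w ** 2)
    # Product of the three matrices after the offset: the intermediate data matrix.
    return U @ np.diag(s_offset) @ Vt, s, s_offset

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))      # 6 hypothetical users (rows), 4 feature dimensions
X_mid, s, s_offset = disturb_by_svd(X, r=3, w=0.5)
print(X_mid.shape)
print(s, s_offset)
```

With w = 0 and r equal to the rank of X, the function returns X unchanged, which is a convenient sanity check on the decomposition step.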
In one embodiment, in the above data encryption device, the decomposition subunit 631 is further
configured to perform a mean-removal operation on the value of each dimension of each point defined
by the raw data matrix, wherein the mean is the average of the values of the multiple points in the
same dimension; and to perform singular value decomposition on the raw data matrix after the
mean-removal operation.
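The mean-removal step followed by singular value decomposition can be sketched as below. The sketch assumes that each row of the matrix is one point (user) and each column is one dimension; this layout, the sample values, and the function name are assumptions for illustration.

```python
import numpy as np

def remove_mean(X):
    """Subtract, in each dimension (column), the average over all points (rows)."""
    return X - X.mean(axis=0, keepdims=True)

# Three hypothetical points in a 2-dimensional space.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [6.0, 30.0]])
X_centered = remove_mean(X)

# Singular value decomposition of the mean-removed matrix.
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
print(X_centered)
```

Removing the mean shifts every point by the same vector, so the pairwise distances between points are unaffected by this step.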
In the data encryption scheme according to the embodiments of this specification, the raw data is
disturbed based on a differential privacy parameter and a data validity parameter, which balances
the conflict between security and validity and provides a strict quantitative guarantee for both.
Meanwhile, projecting the data using a projection matrix based on the J-L lemma further guarantees
data validity.
Those of ordinary skill in the art should further appreciate that the exemplary units and algorithm
steps described in connection with the embodiments disclosed herein can be implemented in electronic
hardware, computer software, or a combination of the two. To clearly illustrate the
interchangeability of hardware and software, the composition and steps of each example have been
described above generally in terms of their functions. Whether these functions are performed in
hardware or in software depends on the specific application and the design constraints of the
technical solution. Those of ordinary skill in the art may use different methods to implement the
described functions for each specific application, but such implementations should not be considered
to exceed the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein
can be implemented in hardware, in a software module executed by a processor, or in a combination of
the two. The software module can be placed in random access memory (RAM), internal memory, read-only
memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a
hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical
field.
The specific embodiments described above further explain in detail the purpose, technical solutions,
and beneficial effects of the present invention. It should be understood that the foregoing are
merely specific embodiments of the present invention and are not intended to limit its scope of
protection. Any modification, equivalent substitution, improvement, and the like made within the
spirit and principles of the present invention shall be included within the scope of protection of
the present invention.