CN110443061A

CN110443061A - Data encryption method and device

Info

Publication number: CN110443061A
Application number: CN201810413622.5A
Authority: CN
Inventors: 李梁; 周俊; 李小龙
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2018-05-03
Filing date: 2018-05-03
Publication date: 2019-11-12
Anticipated expiration: 2038-05-03
Also published as: CN110443061B

Abstract

Embodiments of this specification disclose a data encryption method and device. The method comprises: obtaining a raw data matrix; obtaining a first parameter for defining a differential privacy algorithm and a second parameter for indicating the validity of the data after encryption; obtaining an intermediate data matrix, wherein the intermediate data matrix defines a plurality of points in a first dimensional space, the points defined by the intermediate data matrix are obtained by respectively perturbing the points defined by the raw data matrix, and the perturbation is an offset based on the first parameter and the second parameter; and multiplying the intermediate data matrix by a projection matrix to obtain an encrypted data matrix.

Description

Data encryption method and device

Technical field

Embodiments of this specification relate to the field of Internet technology, and more particularly to a data encryption method and device.

Background

Given the demand for big-data modeling and analysis on the Internet, protecting user privacy is a very important problem. Against this background, differential privacy techniques are increasingly applied. Differential privacy is a formal definition of data privacy security: it guarantees that no information about any individual record is revealed while modeling and analysis are carried out on the data as a whole. Differential privacy is the most reasonable guarantee of individual privacy under the demand for big-data modeling and analysis. For randomized encryption algorithms within differential privacy, the validity of the data is usually also considered. Data validity generally means that the performance of the encrypted data on a specific metric is approximately equal to the performance of the raw data on the same metric. However, there is a certain tension between data validity and privacy security: in general, the higher the data validity, the weaker the privacy security; conversely, the stronger the privacy security, the lower the data validity. A more effective data encryption scheme is therefore needed, one that balances data validity and privacy security more effectively.

Summary of the invention

Embodiments of this specification aim to provide a more effective data encryption method that addresses deficiencies in the prior art.

To achieve the above object, one aspect of this specification provides a data encryption method that implements a differential privacy algorithm, comprising: obtaining a raw data matrix, wherein the raw data matrix defines a plurality of points in a first dimensional space, the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector; obtaining a first parameter for defining the differential privacy algorithm and a second parameter for indicating the validity of the data after encryption; obtaining an intermediate data matrix, wherein the intermediate data matrix defines a plurality of points in the first dimensional space, the points defined by the intermediate data matrix are obtained by respectively perturbing the points defined by the raw data matrix, and the perturbation is an offset based on the first parameter and the second parameter; and multiplying the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, wherein the projection matrix is used to project the points defined by the intermediate data matrix to respectively corresponding points in a second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, and wherein the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.

In one embodiment, in the above data encryption method, obtaining the intermediate data matrix includes: performing singular value decomposition on the raw data matrix so as to express the raw data matrix as the product of three matrices, wherein the number of diagonal elements of the diagonal matrix in the middle of the product equals the number of dimensions of the second dimensional space; determining a perturbation parameter based on the first parameter and the second parameter; offsetting each diagonal element of the diagonal matrix based on the perturbation parameter; and calculating the product of the three matrices after the offset and using it as the intermediate data matrix.

In one embodiment, in the above data encryption method, performing singular value decomposition on the raw data matrix includes: performing a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, wherein the mean is the average of the values of the plurality of points in the same dimension; and performing singular value decomposition on the mean-removed raw data matrix.

In one embodiment, in the above data encryption method, the encrypted data matrix defines a plurality of points in the second dimensional space, the points defined by the encrypted data matrix correspond respectively to the points defined by the raw data matrix, and the difference between the distance between two points defined by the encrypted data matrix and the distance between the corresponding two points defined by the raw data matrix is related to the perturbation parameter.

In one embodiment, in the above data encryption method, the projection matrix is obtained randomly from a random matrix, each element of the random matrix is a random variable, and the random variables are mutually independent and identically distributed, wherein the random matrix satisfies: the expected value of the product of the transpose of the random matrix and the random matrix is the identity matrix.

In one embodiment, in the above data encryption method, the second dimensional space is an r-dimensional space, and the random variables follow a Gaussian distribution with expected value 0 and variance 1/r.

In one embodiment, in the above data encryption method, the second dimensional space is an r-dimensional space, and the random variables follow a uniform distribution on [interval missing].

In one embodiment, in the above data encryption method, the second dimensional space is an r-dimensional space, and the random variables follow a distribution that takes the values [values missing] with the probabilities [probabilities missing], respectively.

In one embodiment, in the above data encryption method, the differential privacy algorithm is an (ε, δ)-differential privacy algorithm, and the first parameter includes ε and δ.

In one embodiment, in the above data encryption method, the second parameter includes η and ν, wherein η denotes the maximum relative error that arises in the distance between pairs of points defined by the raw data matrix after processing by the method, and ν denotes the maximum probability that the method fails over multiple executions; and determining the number of dimensions of the second dimensional space and the value of the perturbation parameter based on the first parameter and the second parameter includes determining the number of dimensions of the second dimensional space based on η and ν.

In one embodiment, in the above data encryption method, determining the number of dimensions of the second dimensional space and the value of the perturbation parameter based on the first parameter and the second parameter includes determining the value of the perturbation parameter based on ε, δ, and the number of dimensions of the second dimensional space.

Another aspect of this specification provides a data encryption device that implements a differential privacy algorithm, comprising: a first obtaining unit configured to obtain a raw data matrix, wherein the raw data matrix defines a plurality of points in a first dimensional space, the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector; a second obtaining unit configured to obtain a first parameter for defining the differential privacy algorithm and a second parameter for indicating the validity of the data after encryption; a perturbation unit configured to obtain an intermediate data matrix, wherein the intermediate data matrix defines a plurality of points in the first dimensional space, the points defined by the intermediate data matrix are obtained by respectively perturbing the points defined by the raw data matrix, and the perturbation is an offset based on the first parameter and the second parameter; and a projection unit configured to multiply the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, wherein the projection matrix is used to project the points defined by the intermediate data matrix to respectively corresponding points in a second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, and wherein the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.

In one embodiment, in the above data encryption device, the perturbation unit further includes the following subunits: a decomposition subunit configured to perform singular value decomposition on the raw data matrix so as to express the raw data matrix as the product of three matrices, wherein the number of diagonal elements of the diagonal matrix in the middle of the product equals the number of dimensions of the second dimensional space; a determination subunit configured to determine a perturbation parameter based on the first parameter and the second parameter; an offset subunit configured to offset each diagonal element of the diagonal matrix based on the perturbation parameter; and a calculation subunit configured to calculate the product of the three matrices after the offset and use it as the intermediate data matrix.

In one embodiment, in the above data encryption device, the decomposition subunit is further configured to: perform a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, wherein the mean is the average of the values of the plurality of points in the same dimension; and perform singular value decomposition on the mean-removed raw data matrix.

In the data encryption scheme according to the embodiments of this specification, the raw data are perturbed based on differential privacy parameters and data validity parameters, which balances the tension between security and validity and provides a strict quantitative guarantee for both. In addition, the data are projected using a projection matrix based on the J-L lemma, which further ensures data validity.

Brief description of the drawings

The embodiments of this specification will become clearer when described with reference to the accompanying drawings:

Fig. 1 shows an application scenario of the data encryption method according to an embodiment of this specification;

Fig. 2 shows a flow chart of the data encryption method according to an embodiment of this specification;

Fig. 3 shows a flow chart of the method for obtaining the intermediate data matrix;

Fig. 4 shows the singular value decomposition of a matrix A;

Fig. 5 shows the reduction of the diagonal matrix in the singular value decomposition; and

Fig. 6 shows a data encryption device 600 according to an embodiment of this specification.

Detailed description

Embodiments of this specification are described below with reference to the accompanying drawings.

Fig. 1 shows an application scenario of the data encryption method according to an embodiment of this specification. The scenario includes a plurality of data providers 11 and a data processor 12. A data provider 11 is, for example, a shopping website or a social app; each data provider has its own user group and the feature data of that user group. The data processor 12 typically has big-data processing capability and is, for example, Ant Financial. Each of the data providers 11 supplies the encrypted feature data of its users to the data processor 12, so that the data processor 12 can model and analyze the encrypted data of each data provider 11 separately. In this scenario, differential privacy techniques protect user privacy while preserving data usability. Because the data processor 12 handles the data of the different data providers 11 separately, a data provider 11 can encrypt its raw data locally and then send the result to the data processor 12, without disclosing any part of the data during the encryption process. After receiving the encrypted data, the data processor 12 can model and analyze it but cannot obtain any information related to the raw data.

On the server of a data provider 11, a perturbation is first applied to the raw data matrix, based on the parameters of the differential privacy algorithm and the data validity parameters, to obtain an intermediate data matrix; the intermediate data matrix is then multiplied by a projection matrix to obtain an encrypted data matrix. The projection matrix is a randomly obtained projection matrix that satisfies the J-L lemma (Johnson-Lindenstrauss lemma). With this processing, the resulting encrypted data matrix satisfies both the differential privacy standard and the data validity standard, and thus effectively balances data validity and privacy security.

Fig. 2 shows a flow chart of the data encryption method according to an embodiment of this specification. In one embodiment, the method is executed on the server side of a data provider; however, it is not limited to this and may, for example, be executed on a third-party server or on the server side of the data processor. The method implements a differential privacy algorithm and includes the following steps: in step S21, obtaining a raw data matrix, wherein the raw data matrix defines a plurality of points in a first dimensional space, the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector; in step S22, obtaining a first parameter for defining the differential privacy algorithm and a second parameter for indicating the validity of the data after encryption; in step S23, obtaining an intermediate data matrix, wherein the intermediate data matrix defines a plurality of points in the first dimensional space, the points defined by the intermediate data matrix are obtained by respectively perturbing the points defined by the raw data matrix, and the perturbation is an offset based on the first parameter and the second parameter; and in step S24, multiplying the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, wherein the projection matrix is used to project the points defined by the intermediate data matrix to respectively corresponding points in a second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, and wherein the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.

First, in step S21, a raw data matrix is obtained. The raw data matrix defines a plurality of points in a first dimensional space, wherein the number of points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector. For example, a matrix A of n rows and d columns is stored in advance on the data provider's server, and this stored matrix A is obtained as the raw data matrix. In another embodiment, the data provider's server, in response to a query request from the data processor's server, randomly selects the data of d features of n users from the stored user data to form a matrix A of n rows and d columns, which serves as the raw data matrix.

In a raw data matrix of, for example, n rows and d columns, n denotes the number of users, which may be on the order of millions, tens of millions, and so on, and d denotes the number of features of each user. For example, when the data provider is a shopping website, the features of each user may include gender, age, address, types of goods purchased, prices of goods purchased, shopping times, and so on, and their number may be on the order of thousands or tens of thousands. Each element A_ij of the matrix A represents the value of the j-th feature of the i-th user, where 1 ≤ i ≤ n and 1 ≤ j ≤ d. The raw data matrix A can be understood as defining n points in a d-dimensional space: each row of A can be regarded as the feature vector of one user, and the number of dimensions of the d-dimensional space is the number of dimensions of the user feature vector, that is, the number of columns of A. It will be appreciated that the raw data matrix is not limited to a matrix of n rows and d columns; it may, for example, be a matrix of d rows and n columns.

In step S22, a first parameter for defining the differential privacy algorithm and a second parameter for indicating the validity of the data after encryption are obtained. In the embodiments of this specification, the differential privacy algorithm may be chosen from various differential privacy algorithms according to the needs of the specific scenario, for example an (ε, δ)-differential privacy algorithm, an ε-differential privacy algorithm, a random differential privacy algorithm, and so on. The parameters of the chosen differential privacy algorithm are obtained accordingly. For example, for an (ε, δ)-differential privacy algorithm A(X), for all inputs X and X′ that differ in only one user feature and for every possible set of outputs S, the condition shown in formula (1) holds:

$$\Pr[A(X) \in S] \le e^{\varepsilon} \cdot \Pr[A(X') \in S] + \delta \qquad (1)$$

Here, ε quantifies the maximum relative difference in the likelihood that a feature of a single data record is leaked, and δ quantifies the percentage of records, among all records, whose privacy may be leaked. The smaller the values of ε and δ, the higher the privacy security. The values of ε and δ can be determined according to the specific application scenario. For example, in a scenario in which the raw data matrix represents multiple feature values of n users, ε may be set to a value on the order of less than ln(10) and δ to a value on the order of less than 0.1.

The parameter indicating the validity of the encrypted data can be determined according to the specific application scenario. For example, in one embodiment, the parameter indicating the validity of the encrypted data includes η and ν, where η denotes the maximum relative error that arises in the distance between pairs of points defined by the raw data matrix after processing by the method, and ν denotes the maximum probability that the method fails over multiple executions. The smaller the values of η and ν, the higher the data validity. The values of η and ν can be chosen according to the specific application scenario; for example, in a scenario in which the raw data matrix represents multiple feature values of n users, both η and ν may be set to values on the order of less than 0.1. The parameter indicating the validity of the encrypted data is not limited to η and ν. For example, in a scenario in which the raw data matrix is encrypted by adding a Gaussian perturbation, the variance σ of the Gaussian perturbation can indicate the validity of the encrypted data: the smaller σ is, the better the validity of the data.

In step S23, an intermediate data matrix is obtained. The intermediate data matrix defines a plurality of points in the first dimensional space, and the points defined by the intermediate data matrix are obtained by respectively perturbing the points defined by the raw data matrix, where the perturbation is an offset based on the first parameter and the second parameter.

In one embodiment, the intermediate data matrix is obtained through the following steps, shown in Fig. 3:

In step S31, singular value decomposition is performed on the raw data matrix so as to express the raw data matrix as the product of three matrices, wherein the number of diagonal elements of the diagonal matrix in the middle of the product equals the number of dimensions of the second dimensional space.

As is known to those skilled in the art, and with reference to the singular value decomposition of a matrix A shown in Fig. 4, given any matrix $A \in \mathbb{R}^{m \times n}$ with m rows and n columns, the singular value decomposition of A can be expressed as the product of three matrices: A = UΣV^T, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices and Σ is an m × n diagonal matrix. The non-zero entries on the main diagonal of Σ are called the singular values of A. The singular values are arranged from largest to smallest along the main diagonal, which the figure indicates schematically by the size of the squares.
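The decomposition described above can be illustrated with a minimal sketch. It assumes NumPy is available; the 4 × 3 matrix below is a hypothetical toy example and is not taken from this specification.

```python
import numpy as np

# Hypothetical toy matrix: 4 rows (m = 4) and 3 columns (n = 3).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# full_matrices=True returns U (m x m) and Vt = V^T (n x n); s holds the
# singular values sorted from largest to smallest.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal matrix Sigma from the singular values.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# The product of the three matrices reproduces A up to rounding error.
A_rebuilt = U @ Sigma @ Vt
```

Here `U` and `Vt` are orthogonal, and `Sigma` is rectangular with the singular values on its main diagonal, matching the three-factor form A = UΣV^T.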

With reference to the reduction of the diagonal matrix in the singular value decomposition shown in Fig. 5, when a singular value on the diagonal is smaller than a predetermined threshold, its importance to the initial matrix A is small. Therefore, by taking the number of singular values to be r, that is, by discarding the smaller singular values after the r-th one, the singular value decomposition of A can be expressed, up to the discarded small singular values, as A = UΣV^T, where $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$, and $\Sigma \in \mathbb{R}^{r \times r}$ is a square matrix whose diagonal entries are the singular values. This reduction lowers the amount of computation while preserving data validity.
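A minimal sketch of this reduction, assuming NumPy, is given below. The helper name `truncated_svd` and the rank-2 test matrix are hypothetical and serve only to illustrate keeping the r largest singular values.

```python
import numpy as np

def truncated_svd(A, r):
    """Keep only the r largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

# Hypothetical example: a 6 x 4 matrix of rank 2 (a sum of two outer products),
# so every singular value after the second is zero up to rounding.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))

U_r, s_r, Vt_r = truncated_svd(A, r=2)

# Product of the reduced factors: (6 x 2) (2 x 2) (2 x 4).
A_approx = U_r @ np.diag(s_r) @ Vt_r
```

Because the discarded singular values are numerically zero in this example, the reduced product reproduces A; for general data the reduced product is only an approximation whose error depends on the discarded singular values.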

In this embodiment, the raw data matrix is decomposed into the product of three matrices by singular value decomposition, where the diagonal matrix in the middle is the reduced diagonal matrix and the number of its diagonal elements equals the number of dimensions of the second dimensional space. The second dimensional space is the space obtained when the projection matrix described below projects the intermediate data matrix. The number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter. In one embodiment, the second parameter includes η and ν, and the number of dimensions of the second dimensional space is determined based on η and ν. In one embodiment, with r denoting the number of dimensions of the second dimensional space, the value of r is obtained from formula (2) [formula missing].

The order of magnitude of r follows from formula (2) [expression missing].

In one embodiment, the value of r is specified more precisely by formula (3) [formula missing].

In one embodiment, performing singular value decomposition on the raw data matrix includes: performing a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, wherein the mean is the average of the values of the plurality of points in the same dimension; and performing singular value decomposition on the mean-removed raw data matrix. For example, when the raw data matrix is a matrix of n rows and d columns, where n denotes the number of users and d denotes the number of features per user, the mean-removal operation on the raw data matrix is the operation shown in formula (4), where A denotes the raw data matrix:

$$A \leftarrow A - \frac{1}{n}\,\mathbf{1}\mathbf{1}^{T} A \qquad (4)$$

The symbol $\mathbf{1}$ in formula (4) denotes the all-ones column vector.

The mean-removal operation of formula (4) is equivalent to moving the feature vectors of all users to the vicinity of the origin of the d-dimensional space, which simplifies the calculation. It will be appreciated that this mean-removal operation is not required; without mean removal, the singular value decomposition can be performed on the raw data matrix in the same way.
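The mean-removal operation can be sketched as follows, assuming NumPy and the convention that rows are users and columns are features. The function name `remove_column_means` and the toy matrix are hypothetical.

```python
import numpy as np

def remove_column_means(A):
    """Subtract from every entry the mean of its column (feature)."""
    n = A.shape[0]
    ones = np.ones((n, 1))              # all-ones column vector of length n
    # ones.T @ A sums each column; dividing by n gives the column means,
    # and the outer product with `ones` repeats them on every row.
    return A - ones @ (ones.T @ A) / n

# Hypothetical toy data: 3 users, 2 features.
A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [6.0, 30.0]])

A_centered = remove_column_means(A)
```

After this step every column of `A_centered` has zero mean, so the users' feature vectors are centered on the origin.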

In step S32, a perturbation parameter is determined based on the first parameter and the second parameter. The perturbation parameter determines the offset applied to the diagonal elements of the diagonal matrix Σ from step S31; by offsetting these diagonal elements, the raw data matrix (or the mean-removed raw data matrix) is perturbed. In one embodiment, for a mean-removed raw data matrix, with w denoting the perturbation parameter, w can be determined by formula (5) [formula missing].

Here, r is the number of dimensions of the second dimensional space determined above by formula (2) or (3).

The order of magnitude of w follows from formula (5) [expression missing].

In one embodiment, for a mean-removed raw data matrix, w can be determined more precisely by formula (6) [formula missing].

For a raw data matrix that has not undergone mean removal, the value of w can be obtained similarly, for example by choosing w based on ε, δ, η, and ν so that the processing of the raw data matrix satisfies differential privacy while also protecting data validity.

In step S33, each diagonal element of the diagonal matrix is offset based on the perturbation parameter. Specifically, the offset operation can be the operation shown in formula (7):

$$B = U\sqrt{\Sigma^{2} + w^{2} I_{n \times d}}\;V^{T} \qquad (7)$$

Here, I_{n×d} denotes the n × d identity matrix, and the square and the square root act on the diagonal entries. That is, the process in formula (7) first squares the non-zero entries of Σ, then adds the square of w, and then takes the square root of the sum.
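The offset of the singular values can be sketched as follows, assuming NumPy. The function name `perturb_singular_values` is hypothetical, the value w = 0.8 is an arbitrary illustrative number rather than one derived from formula (5) or (6), and for simplicity the sketch keeps all singular values instead of reducing them to r.

```python
import numpy as np

def perturb_singular_values(A, w):
    """Replace each singular value s_i of A by sqrt(s_i**2 + w**2)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shifted = np.sqrt(s**2 + w**2)
    # Product of the three matrices after the offset.
    return U @ np.diag(s_shifted) @ Vt

# Hypothetical data: 5 users, 3 features.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
w = 0.8  # illustrative perturbation parameter, not computed from (5) or (6)

B = perturb_singular_values(A, w)
```

One consequence of this form, when there are at least as many rows as columns, is that B^T B equals A^T A plus w² times the identity, which gives a simple way to check an implementation.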

In step S34, the product of the three matrices after the offset is calculated and used as the intermediate data matrix. Offsetting Σ by an amount related to the perturbation parameter and then computing the product in formula (7) applies an offset related to the perturbation parameter to every entry of the raw data matrix (or of the mean-removed raw data matrix), yielding the perturbed raw data matrix, that is, the intermediate data matrix. Because the intermediate data matrix is obtained merely by offsetting the matrix elements of the raw data matrix, it still defines a plurality of points in the first dimensional space; that is, when the raw data matrix is an n × d matrix, the intermediate data matrix is also an n × d matrix. Moreover, because every feature value of every user has undergone an offset related to the perturbation parameter, the vector sum, in the first dimensional space, of the offsets of all features of a user is the offset of the point corresponding to that user, and the offset of that point is likewise related to the perturbation parameter. In other words, the points defined by the intermediate data matrix are obtained by respectively perturbing the points defined by the raw data matrix, where the perturbation is an offset based on the first parameter and the second parameter.

It will be appreciated that, in the embodiments of this specification, the method for obtaining the intermediate data matrix is not limited to the singular value decomposition method described above. For example, the intermediate data matrix can also be obtained by applying a random Gaussian perturbation or a random Laplace perturbation to each element of the raw data matrix.
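This alternative can be sketched as follows, assuming NumPy. The function name `add_random_noise` and the `scale` argument are hypothetical; the sketch does not show how the noise scale would be calibrated to the privacy and validity parameters.

```python
import numpy as np

def add_random_noise(A, scale, kind="gaussian", rng=None):
    """Add independent zero-mean noise to every element of A."""
    rng = np.random.default_rng() if rng is None else rng
    if kind == "gaussian":
        noise = rng.normal(loc=0.0, scale=scale, size=A.shape)
    elif kind == "laplace":
        noise = rng.laplace(loc=0.0, scale=scale, size=A.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return A + noise

# Hypothetical data: 4 users, 3 features.
A = np.arange(12, dtype=float).reshape(4, 3)
B_gauss = add_random_noise(A, scale=0.5, kind="gaussian",
                           rng=np.random.default_rng(1))
B_laplace = add_random_noise(A, scale=0.5, kind="laplace",
                             rng=np.random.default_rng(1))
```

Either result has the same shape as the raw data matrix, so it still defines one point per user in the first dimensional space.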

Returning to Fig. 2, in step S24 the intermediate data matrix is multiplied by a projection matrix to obtain an encrypted data matrix. The projection matrix is used to project the points defined by the intermediate data matrix to respectively corresponding points in the second dimensional space such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space lies within a certain range, wherein the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.

For example, when the raw data matrix is, as described above, a matrix A of n rows and d columns, where n denotes the number of users and d denotes the number of features per user, the intermediate data matrix B is also a matrix of n rows and d columns. Accordingly, the projection matrix M applied to the intermediate data matrix is a matrix of d rows and r columns, where r is the number of dimensions of the second dimensional space to which M projects the intermediate data matrix. The value of r is obtained based on the first parameter and the second parameter and, in one embodiment, can be determined by formula (2) or (3) above.

Right-multiplying the intermediate data matrix B by the projection matrix M, that is, computing BM, yields an encrypted data matrix of n rows and r columns. This process can be understood as projecting n points of the d-dimensional space to n points of the r-dimensional space. It will be appreciated that the projection is not limited to right multiplication. For example, when the data matrix is a matrix of d rows and n columns, the projection matrix can be a matrix of r rows and d columns; left-multiplying the data matrix by this projection matrix yields an encrypted data matrix of r rows and n columns, which can likewise be understood as projecting n points of the d-dimensional space to n points of the r-dimensional space.
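Steps S23 and S24 can be combined in a short end-to-end sketch, assuming NumPy. The function name `encrypt_matrix` is hypothetical, and `w` and `r` are passed in as illustrative inputs rather than computed from formulas (2) to (6). The Gaussian projection with variance 1/r follows one of the embodiments described in the summary above.

```python
import numpy as np

def encrypt_matrix(A, w, r, rng):
    """Perturb an n x d matrix and project it to n x r (requires r <= min(n, d))."""
    n, d = A.shape
    # Optional mean removal, as in formula (4).
    A_centered = A - A.mean(axis=0, keepdims=True)
    # Reduced singular value decomposition keeping r singular values.
    U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
    U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]
    # Offset each retained singular value, then form the n x d intermediate matrix.
    B = U_r @ np.diag(np.sqrt(s_r**2 + w**2)) @ Vt_r
    # d x r projection matrix with independent N(0, 1/r) entries.
    M = rng.normal(loc=0.0, scale=np.sqrt(1.0 / r), size=(d, r))
    return B @ M

# Hypothetical data: 8 users, 5 features, projected to r = 3 dimensions.
A = np.random.default_rng(0).normal(size=(8, 5))
C = encrypt_matrix(A, w=0.5, r=3, rng=np.random.default_rng(1))
```

The result `C` has one row per user and r columns, matching the n × r encrypted data matrix described above.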

The projection matrix M satisfies the J-L lemma; that is, after M projects n points of the d-dimensional space to n points of the r-dimensional space, corresponding points in the two spaces satisfy formula (8):

$$(1 - \lambda_{JL})\,\lVert x - y \rVert^{2} \le \lVert xM - yM \rVert^{2} \le (1 + \lambda_{JL})\,\lVert x - y \rVert^{2} \qquad (8)$$

Here, λ_JL is a predetermined small real number, for example 0 < λ_JL < 0.1; x and y are two points of the d-dimensional space, and $\lVert x - y \rVert^{2}$ is the square of the Euclidean distance between the points x and y in the d-dimensional space. xM and yM are the two points in the r-dimensional space that the projection matrix produces from the points x and y of the d-dimensional space, and $\lVert xM - yM \rVert^{2}$ is the square of the Euclidean distance between the points xM and yM in the r-dimensional space. It follows from formula (8) that the squared Euclidean distance between xM and yM differs from that between x and y by a factor within 1 ± λ_JL, so the ratio of the Euclidean distance between x and y to the Euclidean distance between xM and yM lies within a certain range. Because λ_JL is small, for example λ_JL = 0.05, the Euclidean distance between xM and yM can be regarded as approximately unchanged compared with the Euclidean distance between x and y. By generating a projection matrix M that satisfies the J-L lemma and using it to project points of the d-dimensional space to points of the r-dimensional space, the points of the d-dimensional space are encrypted by M; at the same time, because the J-L lemma guarantees data validity, analysis results for the points of the d-dimensional space can be obtained by learning from the points of the r-dimensional space. In one embodiment, r < d, so that the projection matrix M also reduces the dimensionality of the points of the d-dimensional space and thereby lowers the computational complexity.
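The distance-preservation property in formula (8) can be checked empirically with a minimal sketch, assuming NumPy. The sizes n = 20, d = 500, and r = 400 and the Gaussian projection with variance 1/r are illustrative assumptions; with a smaller r the ratios spread further from 1.

```python
import numpy as np

def pairwise_sq_dists(P):
    """Squared Euclidean distance between every pair of rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sum(diff**2, axis=-1)

rng = np.random.default_rng(0)
n, d, r = 20, 500, 400

X = rng.normal(size=(n, d))                           # n points in d dimensions
M = rng.normal(0.0, np.sqrt(1.0 / r), size=(d, r))    # random d x r projection
Y = X @ M                                             # n points in r dimensions

# Ratio of squared distances after and before projection, one per point pair.
upper = np.triu_indices(n, k=1)
ratios = pairwise_sq_dists(Y)[upper] / pairwise_sq_dists(X)[upper]
print(ratios.min(), ratios.max())
```

The printed minimum and maximum bracket the factor that plays the role of 1 ± λ_JL for this particular sample; they concentrate around 1 as r grows.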

In one embodiment, the projection matrix M satisfying the J-L lemma may be a real matrix satisfying M^T * M = I, i.e., M is an orthogonal matrix, where I is the identity matrix. For example, when n = 3 and r = 3, i.e., M is a 3 × 3 matrix, M may be the orthogonal matrix shown below:

When this orthogonal matrix is used to project 3 points of the d-dimensional space (e.g., d = 5), such as points representing the feature vectors of 3 users, to 3 points of the r = 3 dimensional space, for example by right-multiplying the orthogonal matrix with the raw data matrix (a 3 × 5 matrix), it can be guaranteed that the distance between any two of the 3 points in the r-dimensional space is substantially unchanged compared with the distance between the corresponding two points in the d-dimensional space, i.e., the J-L lemma is satisfied. However, because a fixed real matrix is used as the projection matrix M, the projection matrix M itself has no randomness and therefore does not contribute to the security of the differential privacy algorithm. Here, the projection matrix is not limited to a square matrix; for example, M may also be a 3 × 2 matrix, as long as it satisfies M^T * M = I.
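The distance-preserving property of an orthogonal matrix can be illustrated with a minimal sketch. Since the specific matrix of the embodiment is not reproduced here, the sketch assumes a 3 × 3 rotation about the z-axis as a stand-in orthogonal matrix and, for simplicity, assumes d = r = 3; all names and sample points are hypothetical.

```python
import math


def rotation_z(theta):
    """3 x 3 rotation about the z-axis, used here as an example orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def is_orthogonal(M, tol=1e-12):
    """Return True when M^T * M equals the identity matrix within `tol`."""
    P = matmul(transpose(M), M)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(len(P)) for j in range(len(P)))


def dist(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


M = rotation_z(math.pi / 5)
# Hypothetical feature vectors of 3 users, one per row.
points = [[1.0, 2.0, 3.0], [4.0, 0.0, -1.0], [0.5, 0.5, 2.0]]
projected = matmul(points, M)
```

Comparing `dist(points[i], points[j])` with `dist(projected[i], projected[j])` for each pair shows the distances agree up to floating-point rounding, while the matrix itself contains no randomness.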

In one embodiment, the projection matrix M satisfying the J-L lemma may be obtained at random from a random matrix Q, each matrix element of which is a random variable, the random variables being mutually independent and identically distributed. The random matrix satisfies the following condition: the expected value of the product of the transpose of the random matrix and the random matrix is the identity matrix, i.e., E(Q^T * Q) = I. For example, the random matrix Q may be the following matrix composed of random variables f_ij(x) (i = 1, 2, 3; j = 1, 2, 3):

Q = [ f_11(x)  f_12(x)  f_13(x) ]
    [ f_21(x)  f_22(x)  f_23(x) ]
    [ f_31(x)  f_32(x)  f_33(x) ]

where each f_ij(x) is an independent and identically distributed random variable, and E(Q^T * Q) = I. When calculating the projection matrix M, a random value of x within a predetermined range, for example a random value in [0, 1], is obtained independently for each f_ij(x), and the value of f_ij(x) is then calculated from the function f_ij(x), thereby obtaining each matrix element of the projection matrix M. Here, the random matrix Q is not limited to a square matrix; for example, Q may also be a 3 × 2 matrix, as long as it satisfies E(Q^T * Q) = I. Because the projection matrix M is obtained at random from the random matrix Q, the randomness of the projection matrix M further improves the security of the differential privacy algorithm.

In one embodiment, as an example of the above random matrix Q, for an r-dimensional second dimensional space, f_ij(x) follows a Gaussian distribution with an expected value of 0 and a variance of 1/r, that is, f_ij(x) ~ N(0, 1/r). In other words, f_ij(x) is the inverse function of the Gaussian cumulative distribution function, where x takes values in [0, 1] and denotes the Gaussian cumulative probability of each value of f_ij(x) (the probability integral from -∞ to that value). When calculating the projection matrix M, a random value in [0, 1] is obtained independently as the value of x for each f_ij(x), and the value of f_ij(x) is then calculated from the expression of f_ij(x), thereby obtaining each matrix element of the projection matrix M.
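The inverse-CDF sampling described above can be sketched as follows, assuming a d × r layout for the projection matrix and using `statistics.NormalDist.inv_cdf` from the Python standard library as the inverse Gaussian cumulative distribution function. The function names, the fixed seed, and the sample sizes are hypothetical choices made for this illustration.

```python
import math
import random
from statistics import NormalDist


def gaussian_projection_matrix(d, r, rng):
    """Draw a d x r matrix whose entries are i.i.d. N(0, 1/r) via the inverse CDF."""
    dist = NormalDist(mu=0.0, sigma=math.sqrt(1.0 / r))
    matrix = []
    for _ in range(d):
        row = []
        for _ in range(r):
            x = rng.random()       # uniform random value in [0, 1)
            while x == 0.0:        # inv_cdf is only defined for 0 < x < 1
                x = rng.random()
            row.append(dist.inv_cdf(x))
        matrix.append(row)
    return matrix


def project(point, M):
    """Right-multiply a single row vector by the d x r matrix M."""
    return [sum(point[k] * M[k][j] for k in range(len(M)))
            for j in range(len(M[0]))]


rng = random.Random(0)             # fixed seed only so the sketch is repeatable
M = gaussian_projection_matrix(d=20, r=8, rng=rng)
x_point = [float(i) for i in range(20)]
x_projected = project(x_point, M)  # the point mapped into the r = 8 dimensional space
```

With variance 1/r, the squared length of a projected point equals the squared length of the original point in expectation, which is why larger r gives tighter distance preservation.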

In one embodiment, as an example of the above random matrix Q, for an r-dimensional second dimensional space, f_ij(x) follows a uniform distribution on the interval [formula not reproduced]. From the probability distribution of this variable, the inverse function of its cumulative distribution function, i.e., the expression of f_ij(x) in terms of x, can be obtained in the same way, where x takes values in [0, 1] and denotes the cumulative distribution probability of each value of f_ij(x). By independently obtaining a random value in [0, 1] as the value of x for each f_ij(x) and calculating the value of f_ij(x) from the expression of f_ij(x), each matrix element of the projection matrix M can likewise be obtained.
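The same inverse-CDF procedure can be sketched for a uniform distribution. Because the interval is not legible in this text, the sketch assumes a symmetric interval [-a, a] and leaves the half-width `a` as a hypothetical parameter to be supplied by the caller; the function names are likewise hypothetical.

```python
import random


def uniform_inverse_cdf(x, a):
    """Inverse CDF of the uniform distribution on [-a, a], for x in [0, 1]."""
    return a * (2.0 * x - 1.0)


def uniform_projection_matrix(d, r, a, rng):
    """Draw a d x r matrix with i.i.d. entries uniform on [-a, a] (a is assumed)."""
    return [[uniform_inverse_cdf(rng.random(), a) for _ in range(r)]
            for _ in range(d)]


rng = random.Random(0)   # fixed seed only so the sketch is repeatable
# The value a = 0.5 is an arbitrary placeholder, not a value from the specification.
M = uniform_projection_matrix(d=5, r=3, a=0.5, rng=rng)
```

In practice the half-width would be chosen so that the entries have the variance required by the condition on the random matrix Q.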

In one embodiment, as an example of the above random matrix Q, for an r-dimensional second dimensional space, f_ij(x) follows a distribution that takes the values [formula not reproduced], 0, and [formula not reproduced] with respective probabilities [formula not reproduced]. Here, f_ij(x) is a discrete random variable, and each matrix element of the projection matrix M can be obtained in the same way with reference to the above description.

After the encrypted data matrix is obtained, the data provider may send it to the data processor so that modeling analysis can be performed on the encrypted data matrix. The encrypted data matrix defines multiple points in the second dimensional space, and the points defined by the encrypted data matrix correspond respectively to the points defined by the raw data matrix. The difference between the distance between two points defined by the encrypted data matrix and the distance between the corresponding two points defined by the raw data matrix is related to the disturbance parameter. In one embodiment, in the case described above where singular value decomposition is performed on the de-meaned raw data matrix to obtain the intermediate data matrix, the difference between the distance between two points defined by the encrypted data matrix and the distance between the corresponding two points defined by the raw data matrix is approximately w², where the value of w is determined by formula (5) or (6) above.

Fig. 6 shows a data encryption device 600 according to an embodiment of this specification. The device implements a differential privacy algorithm and comprises:

a first acquisition unit 61, configured to obtain a raw data matrix, the raw data matrix defining multiple points of a first dimensional space, wherein the number of the points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector;

a second acquisition unit 62, configured to obtain a first parameter for defining the differential privacy algorithm and a second parameter for indicating the validity of the data after encryption;

a disturbance unit 63, configured to obtain an intermediate data matrix, wherein the intermediate data matrix defines multiple points in the first dimensional space, and the points defined by the intermediate data matrix are obtained by respectively disturbing the points defined by the raw data matrix, wherein the disturbance is an offset based on the first parameter and the second parameter; and

a projection unit 64, configured to multiply the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, wherein the projection matrix is used to project the points defined by the intermediate data matrix to respectively corresponding points of a second dimensional space, such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space is within a certain range, and wherein the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.

In one embodiment, in the above data encryption device, the disturbance unit 63 further includes the following subunits:

a decomposition subunit 631, configured to perform singular value decomposition on the raw data matrix so as to express the raw data matrix as the product of three matrices, wherein the number of diagonal elements of the diagonal matrix located in the middle of the product of the three matrices is equal to the number of dimensions of the second dimensional space;

a determination subunit 632, configured to determine a disturbance parameter based on the first parameter and the second parameter;

an offset subunit 633, configured to offset each diagonal element of the diagonal matrix based on the disturbance parameter; and

a calculation subunit 634, configured to calculate the product of the three matrices after the offset as the intermediate data matrix.

In one embodiment, in the above data encryption device, the decomposition subunit 631 is further configured to perform a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, wherein the mean is the average of the values of the multiple points in the same dimension, and to perform singular value decomposition on the raw data matrix after the mean-removal operation.
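The steps performed by the subunits (mean removal, singular value decomposition, offsetting the diagonal elements, and recomputing the product) can be sketched with NumPy as follows. Several points are assumptions made only for this illustration: the offset rule `sqrt(s**2 + w**2)` is a stand-in, since the actual offset rule is not given in this passage; the value of `w` is an arbitrary placeholder rather than one derived from formula (5) or (6); and the sketch keeps all singular values rather than restricting their number to the dimension of the second dimensional space.

```python
import numpy as np


def perturb_singular_values(A, w):
    """De-mean A, take its SVD, offset the singular values, and rebuild the matrix.

    The offset rule sqrt(s**2 + w**2) is an assumption made for this sketch.
    """
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)            # subtract the mean of each dimension
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    shifted = np.sqrt(s ** 2 + w ** 2)       # hypothetical offset of diagonal elements
    return U @ np.diag(shifted) @ Vt         # product of the three matrices


# Hypothetical raw data: n = 4 users (rows), d = 3 features (columns).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 7.0],
              [2.0, 0.0, 1.0],
              [3.0, 3.0, 3.0]])
B = perturb_singular_values(A, w=0.5)        # intermediate data matrix, same shape as A
```

The resulting matrix `B` would then be multiplied by the projection matrix to obtain the encrypted data matrix.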

Those of ordinary skill in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art may implement the described functions in different ways for each specific application, but such implementation should not be considered to exceed the scope of the present application.

The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely a description of specific embodiments of the present invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A data encryption method, the method implementing a differential privacy algorithm and comprising:

obtaining a raw data matrix, the raw data matrix defining multiple points of a first dimensional space, wherein the number of the points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector;

obtaining a first parameter for defining the differential privacy algorithm and a second parameter for indicating the validity of the data after encryption;

obtaining an intermediate data matrix, wherein the intermediate data matrix defines multiple points in the first dimensional space, and the points defined by the intermediate data matrix are obtained by respectively disturbing the points defined by the raw data matrix, wherein the disturbance is an offset based on the first parameter and the second parameter; and

multiplying the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, wherein the projection matrix is used to project the points defined by the intermediate data matrix to respectively corresponding points of a second dimensional space, such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space is within a certain range, and wherein the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.

2. The data encryption method according to claim 1, wherein obtaining the intermediate data matrix comprises:

performing singular value decomposition on the raw data matrix so as to express the raw data matrix as the product of three matrices, wherein the number of diagonal elements of the diagonal matrix located in the middle of the product of the three matrices is equal to the number of dimensions of the second dimensional space;

determining a disturbance parameter based on the first parameter and the second parameter;

offsetting each diagonal element of the diagonal matrix based on the disturbance parameter; and

calculating the product of the three matrices after the offset as the intermediate data matrix.

3. The data encryption method according to claim 2, wherein performing singular value decomposition on the raw data matrix comprises: performing a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, wherein the mean is the average of the values of the multiple points in the same dimension; and performing singular value decomposition on the raw data matrix after the mean-removal operation.

4. The data encryption method according to claim 2, wherein the encrypted data matrix defines multiple points in the second dimensional space, the points defined by the encrypted data matrix correspond respectively to the points defined by the raw data matrix, and the difference between the distance between two points defined by the encrypted data matrix and the distance between the corresponding two points defined by the raw data matrix is related to the disturbance parameter.

5. The data encryption method according to claim 1, wherein the projection matrix is obtained at random from a random matrix, each matrix element of the random matrix is a random variable, and the random variables are mutually independent and identically distributed, wherein the random matrix satisfies the following condition: the expected value of the product of the transpose of the random matrix and the random matrix is the identity matrix.

6. The data encryption method according to claim 5, wherein the second dimensional space is an r-dimensional space, and the random variable follows a Gaussian distribution with an expected value of 0 and a variance of 1/r.

7. The data processing method according to claim 5, wherein the second dimensional space is an r-dimensional space, and the random variable follows a uniform distribution on the interval [formula not reproduced].

8. The data encryption method according to claim 5, wherein the second dimensional space is an r-dimensional space, and the random variable follows a distribution that takes the values [formula not reproduced], 0, and [formula not reproduced] with respective probabilities [formula not reproduced].

9. The data encryption method according to claim 1, wherein the differential privacy algorithm is an (ε, δ)-differential privacy algorithm, and the first parameter includes ε and δ.

10. The data encryption method according to claim 9, wherein the second parameter includes η and ν, where η denotes the maximum relative error that occurs in the distance between a pair of points defined by the raw data matrix after processing by the method, and ν denotes the maximum probability that the method fails over multiple executions, and wherein determining the number of dimensions of the second dimensional space and the value of the disturbance parameter based on the first parameter and the second parameter comprises determining the number of dimensions of the second dimensional space based on η and ν.

11. The data encryption method according to claim 10, wherein determining the number of dimensions of the second dimensional space and the value of the disturbance parameter based on the first parameter and the second parameter comprises determining the value of the disturbance parameter based on ε, δ, and the number of dimensions of the second dimensional space.

12. A data encryption device, the device implementing a differential privacy algorithm and comprising:

a first acquisition unit, configured to obtain a raw data matrix, the raw data matrix defining multiple points of a first dimensional space, wherein the number of the points corresponds to the number of users, each point corresponds to the feature vector of a user, and the number of dimensions of the first dimensional space is the number of dimensions of the feature vector;

a second acquisition unit, configured to obtain a first parameter for defining the differential privacy algorithm and a second parameter for indicating the validity of the data after encryption;

a disturbance unit, configured to obtain an intermediate data matrix, wherein the intermediate data matrix defines multiple points in the first dimensional space, and the points defined by the intermediate data matrix are obtained by respectively disturbing the points defined by the raw data matrix, wherein the disturbance is an offset based on the first parameter and the second parameter; and

a projection unit, configured to multiply the intermediate data matrix by a projection matrix to obtain an encrypted data matrix, wherein the projection matrix is used to project the points defined by the intermediate data matrix to respectively corresponding points of a second dimensional space, such that the ratio of the Euclidean distance between any two points in the second dimensional space to the Euclidean distance between the corresponding two points in the first dimensional space is within a certain range, and wherein the number of dimensions of the second dimensional space is obtained based on the first parameter and the second parameter.

13. The data encryption device according to claim 12, wherein the disturbance unit further includes the following subunits:

a decomposition subunit, configured to perform singular value decomposition on the raw data matrix so as to express the raw data matrix as the product of three matrices, wherein the number of diagonal elements of the diagonal matrix located in the middle of the product of the three matrices is equal to the number of dimensions of the second dimensional space;

a determination subunit, configured to determine a disturbance parameter based on the first parameter and the second parameter;

an offset subunit, configured to offset each diagonal element of the diagonal matrix based on the disturbance parameter; and

a calculation subunit, configured to calculate the product of the three matrices after the offset as the intermediate data matrix.

14. The data encryption device according to claim 13, wherein the decomposition subunit is further configured to perform a mean-removal operation on the value of each dimension of each point defined by the raw data matrix, wherein the mean is the average of the values of the multiple points in the same dimension, and to perform singular value decomposition on the raw data matrix after the mean-removal operation.

15. The data encryption device according to claim 13, wherein the encrypted data matrix defines multiple points in the second dimensional space, the points defined by the encrypted data matrix correspond respectively to the points defined by the raw data matrix, and the difference between the distance between two points defined by the encrypted data matrix and the distance between the corresponding two points defined by the raw data matrix is related to the disturbance parameter.

16. The data encryption device according to claim 12, wherein the projection matrix is obtained at random from a random matrix, each matrix element of the random matrix is a random variable, and the random variables are mutually independent and identically distributed, wherein the random matrix satisfies the following condition: the expected value of the product of the transpose of the random matrix and the random matrix is the identity matrix.

17. The data encryption device according to claim 16, wherein the second dimensional space is an r-dimensional space, and the random variable follows a Gaussian distribution with an expected value of 0 and a variance of 1/r.

18. The data processing device according to claim 16, wherein the second dimensional space is an r-dimensional space, and the random variable follows a uniform distribution on the interval [formula not reproduced].

19. The data encryption device according to claim 16, wherein the second dimensional space is an r-dimensional space, and the random variable follows a distribution that takes the values [formula not reproduced], 0, and [formula not reproduced] with respective probabilities [formula not reproduced].

20. The data encryption device according to claim 12, wherein the differential privacy algorithm is an (ε, δ)-differential privacy algorithm, and the first parameter includes ε and δ.

21. The data encryption method according to claim 20, wherein the second parameter includes η and ν, where η denotes the maximum relative error that occurs in the distance between a pair of points defined by the raw data matrix after processing by the method, and ν denotes the maximum probability that the method fails over multiple executions, and wherein determining the number of dimensions of the second dimensional space and the value of the disturbance parameter based on the first parameter and the second parameter comprises determining the number of dimensions of the second dimensional space based on η and ν.

22. The data encryption method according to claim 21, wherein determining the number of dimensions of the second dimensional space and the value of the disturbance parameter based on the first parameter and the second parameter comprises determining the value of the disturbance parameter based on ε, δ, and the number of dimensions of the second dimensional space.