CN112231745A - Big data security and privacy protection method based on geometric deformation and storage medium

Info

Publication number: CN112231745A
Application number: CN202010914945.XA
Authority: CN
Inventors: 许杰; 石凯; 张锋军; 李庆华; 牛作元; 朱王小江
Original assignee: CETC 30 Research Institute
Current assignee: CETC 30 Research Institute
Priority date: 2020-09-03
Filing date: 2020-09-03
Publication date: 2021-01-15

Abstract

The invention relates to the field of big data processing and discloses a big data security and privacy protection method based on geometric deformation, and a storage medium. The method comprises the following steps. Establishing an attribute sensitive set: dividing the attributes of the data into four sets representing four degrees of sensitivity. Cleaning data: deleting incomplete data entries and serializing discrete data to obtain the data to be divided. Dividing data: selecting the data corresponding to all attributes in the same sensitive set and placing them in different columns of one matrix to form a same-sensitivity data set. Geometric deformation: applying the corresponding translation, scaling, rotation or similarity transformation to each same-sensitivity data set, and recording the transformation parameters for use in the subsequent inverse transformation. Obtaining a final data set: merging the four processed data sets into one data set. The invention provides a simple, efficient and hierarchical data privacy protection method for the release and transmission of mass data, and the data can be restored through the inverse geometric transformation.

Description

Big data security and privacy protection method based on geometric deformation and storage medium

Technical Field

The invention relates to the technical field of big data processing, in particular to a big data security and privacy protection method and a storage medium based on geometric deformation.

Background

When data are shared, exchanged, released and used in a big data environment, a method that protects data privacy in a graded manner can resist attacks based on data analysis techniques such as cluster analysis, and can address the privacy and security problems of big data at the data source. Current methods for privacy protection in big data environments mainly comprise techniques based on data distortion, techniques based on data encryption, and techniques based on restricted release (data anonymity). Their features and disadvantages are as follows.

(1) Method 1: techniques based on data distortion

This method distorts sensitive data by adding noise, introducing random factors, applying linear transformations to private vectors and the like, so as to disguise the original data. The processing is fast, but security is low, and it degrades data analysis results because it reduces the accuracy of the data; in general only approximate calculation results can be obtained.

(2) Method 2: techniques based on data encryption

This method encrypts sensitive data so that they are hidden during data mining; specific approaches include secure multi-party computation (SMC), distributed anonymization and the like. It resists distributed data mining well and offers high security. However, because every sensitive attribute is encrypted, data recovery and use are difficult and the computational overhead of privacy protection processing before release is large. In addition, a cryptographic algorithm must be selected and the protection of the key must be considered separately.

(3) Method 3: techniques based on data anonymity

This method releases the original data selectively, deleting or modifying explicit identifiers of personal information and highly sensitive data so that specific individuals cannot be identified, thereby protecting privacy. It includes measures such as generalization, clustering and suppression. The aim of anonymity-based techniques is to keep the risk of disclosing sensitive data and privacy within a tolerable range rather than to guarantee complete security, so they are all vulnerable to targeted attacks that lead to privacy disclosure.

Disclosure of Invention

To solve the above problems, the invention provides a big data security and privacy protection method based on geometric deformation, and a storage medium. The invention addresses the following problems: traditional methods cannot ensure both the efficiency of processing sensitive data with a privacy protection technique and the security of the data; privacy-protected data cannot be recovered and reused simply and efficiently as needed; privacy protection cannot be applied efficiently and uniformly across the three stages of data source, data frame and data analysis, so that protection from the local to the whole is not achieved; and traditional anonymity-based privacy protection only keeps privacy disclosure within a tolerable level and cannot provide complete privacy protection.

The invention relates to a big data security and privacy protection method based on geometric deformation, which comprises the following steps:

establishing an attribute sensitive set: dividing the attributes of the data into four sets which respectively represent four sensitivity degrees, including insensitive, low-sensitivity, medium-sensitivity and high-sensitivity attributes;

cleaning data: deleting the incomplete data entries, and serializing the discrete data to obtain data to be divided;

dividing data: selecting the data corresponding to all attributes in the same sensitive set and placing them in different columns of the same matrix to form a same-sensitivity data set;

geometric deformation: applying the corresponding translation, scaling, rotation or similarity transformation to the insensitive, low-sensitivity, medium-sensitivity and high-sensitivity same-sensitivity data sets respectively, and recording the transformation parameters for use in the subsequent inverse transformation;

obtaining a final data set: merging the four processed data sets into one data set, which is the data processed by big data privacy protection based on geometric deformation; if the original data are later needed for big data analysis, they are recovered by applying the inverse transformation.
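The dividing, deformation and merging steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the example attributes and their sensitivity levels are hypothetical, and `shift` and `stretch` are simple placeholder transforms standing in for the geometric transformations defined by the formulas in this document.

```python
def split_columns(header, level_of):
    """Dividing data: group column indices by sensitivity level."""
    groups = {}
    for idx, name in enumerate(header):
        groups.setdefault(level_of[name], []).append(idx)
    return groups


def apply_transforms(rows, header, level_of, transforms, inverse=False):
    """Transform each same-sensitivity matrix and merge the results.

    transforms maps a level to a (forward, inverse) pair of functions that
    take and return a list of rows. Keeping this dict plays the role of
    recording the transformation parameters for later restoration.
    """
    merged = [list(row) for row in rows]
    for level, cols in split_columns(header, level_of).items():
        # Columns of one sensitivity level form one matrix.
        matrix = [[row[c] for c in cols] for row in rows]
        func = transforms[level][1 if inverse else 0]
        for i, new_row in enumerate(func(matrix)):
            for j, c in enumerate(cols):
                merged[i][c] = new_row[j]
    return merged


def shift(t):
    """Placeholder transform pair: add t to every value, and its inverse."""
    return (lambda m: [[v + t for v in r] for r in m],
            lambda m: [[v - t for v in r] for r in m])


def stretch(s):
    """Placeholder transform pair: multiply every value by s, and its inverse."""
    return (lambda m: [[v * s for v in r] for r in m],
            lambda m: [[v / s for v in r] for r in m])


# Hypothetical attributes and sensitivity assignment, for illustration only.
header = ["height", "weight", "age", "savings", "income"]
level_of = {"height": "insensitive", "weight": "insensitive",
            "age": "low", "savings": "medium", "income": "high"}
transforms = {"insensitive": shift(10), "low": stretch(2),
              "medium": stretch(0.5), "high": shift(1000)}

rows = [[170, 60, 30, 2000, 5000], [180, 75, 45, 4000, 8000]]
protected = apply_transforms(rows, header, level_of, transforms)
restored = apply_transforms(protected, header, level_of, transforms, inverse=True)
```

Passing the same `transforms` dictionary with `inverse=True` is what makes restoration possible; if the recorded parameters are lost, the original values cannot be recovered from the released data set alone.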

Further, in the translation transformation process, a formula of translation disturbance is adopted as follows:

$$X_t = X_{t-1} + T, \quad T = [t_x, t_y]$$

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the coordinates of the data before transformation, $t_x$ is the amount of translation in the horizontal direction, and $t_y$ is the amount of translation in the vertical direction.
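A minimal sketch of the translation perturbation and its inverse, assuming each data point is a pair of attribute values `(x, y)`; the function names `translate` and `untranslate` are hypothetical.

```python
def translate(points, t):
    """Apply X_t = X_{t-1} + T to every (x, y) point."""
    tx, ty = t
    return [(x + tx, y + ty) for x, y in points]


def untranslate(points, t):
    """Inverse transformation: X_{t-1} = X_t - T."""
    tx, ty = t
    return [(x - tx, y - ty) for x, y in points]
```

The inverse needs only the recorded vector `T`, which is why the transformation parameters are stored when the data are perturbed.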

Further, the translation transformation process comprises the following sub-steps:

step S1, inputting a privacy attribute set $V$ and a noise set $T_{add}$, selecting two privacy attributes $A_j$ and $A_{j+k}$ from the privacy attribute set $V$, where $k$ is a predetermined value, and selecting an additive noise term $e_j$ from the noise set $T_{add}$;

step S2, forming a matrix from the privacy attribute pair $A_j$ and $A_{j+k}$ and the additive noise term $e_j$;

step S3, performing the geometric deformation calculation according to the translation disturbance formula: $V \leftarrow \mathrm{transform}(V, T_{add})$.
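Sub-steps S1 to S3 can be sketched for a single attribute pair as follows. The source does not specify the data layout, so this sketch assumes that $V$ is a dictionary of equally long columns, that `names` fixes the attribute order, and that each noise term $e_j$ is a pair `(t_x, t_y)` stored at index `j` of `T_add`; all identifiers are hypothetical.

```python
def translate_pair(V, names, T_add, j, k):
    """Perturb attributes A_j and A_{j+k} of V with noise term e_j."""
    # S1: select the attribute pair and the additive noise term.
    a, b = names[j], names[j + k]
    tx, ty = T_add[j]
    # S2: form a matrix whose rows are the (x, y) points of the pair.
    matrix = list(zip(V[a], V[b]))
    # S3: apply the translation disturbance formula to every row.
    moved = [(x + tx, y + ty) for x, y in matrix]
    out = dict(V)
    out[a] = [p[0] for p in moved]
    out[b] = [p[1] for p in moved]
    # Return the recorded parameters needed for the inverse transformation.
    return out, (a, b, tx, ty)
```

Attributes outside the selected pair are copied unchanged, and the returned tuple is the record of which columns were shifted and by how much.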

Further, in the scaling transformation process, the formula of scaling disturbance is adopted as follows:

$$X_t = s X_{t-1}$$

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the data coordinates before transformation, and the scalar $s$ denotes the uniform scaling amount.
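A minimal sketch of the scaling perturbation and its inverse, with hypothetical function names. The check that $s \neq 0$ is an added assumption: the source does not state it, but the inverse is undefined otherwise.

```python
def scale(points, s):
    """Apply X_t = s * X_{t-1} to every (x, y) point."""
    return [(s * x, s * y) for x, y in points]


def unscale(points, s):
    """Inverse transformation: X_{t-1} = X_t / s (requires s != 0)."""
    if s == 0:
        raise ValueError("scaling by zero cannot be inverted")
    return [(x / s, y / s) for x, y in points]
```

With floating-point data the restored values may differ from the originals by rounding error, so comparisons after restoration are best made with a tolerance.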

Further, in the rotation transformation processing, a formula of rotation disturbance is adopted as follows:

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the data coordinates before transformation, and $\theta$ is the angle of the rotation transformation.
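The rotation formula itself is not reproduced in this text. The sketch below assumes the standard counterclockwise 2D rotation $X_t = R(\theta) X_{t-1}$ with $R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$; this form and the function names are assumptions, not taken from the source.

```python
import math


def rotate(points, theta):
    """Rotate every (x, y) point counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def unrotate(points, theta):
    """Inverse transformation: rotate by -theta."""
    return rotate(points, -theta)
```

Because a rotation matrix is orthogonal, its inverse is simply the rotation by the opposite angle, so only $\theta$ has to be recorded.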

Further, in the similarity transformation processing, a formula of similarity disturbance is adopted as follows:

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the coordinates of the data before transformation, $\theta$ is the angle parameter of the rotation transformation, $t_x$ is the translation parameter in the horizontal direction, $t_y$ is the translation parameter in the vertical direction, and the scalar $s$ denotes the uniform scaling parameter; the similarity perturbation is a combination of rotation, translation and scaling perturbations.
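The similarity formula is likewise not reproduced in this text. The sketch below assumes the form commonly used in computer vision, $X_t = s R(\theta) X_{t-1} + T$ with $T = [t_x, t_y]$, that is, rotation and scaling followed by translation. The order of composition and the function names are assumptions.

```python
import math


def similarity(points, s, theta, t):
    """Apply X_t = s * R(theta) * X_{t-1} + T to every (x, y) point."""
    c, sn = math.cos(theta), math.sin(theta)
    tx, ty = t
    return [(s * (c * x - sn * y) + tx, s * (sn * x + c * y) + ty)
            for x, y in points]


def inverse_similarity(points, s, theta, t):
    """Inverse: X_{t-1} = R(-theta) * (X_t - T) / s (requires s != 0)."""
    if s == 0:
        raise ValueError("scaling by zero cannot be inverted")
    c, sn = math.cos(theta), math.sin(theta)
    tx, ty = t
    restored = []
    for x, y in points:
        u, v = (x - tx) / s, (y - ty) / s          # undo translation and scaling
        restored.append((c * u + sn * v, -sn * u + c * v))  # undo rotation
    return restored
```

Under this assumed form the other three perturbations are special cases: $s = 1, \theta = 0$ gives translation, $\theta = 0, T = 0$ gives scaling, and $s = 1, T = 0$ gives rotation.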

The invention relates to a storage medium, which stores a computer program, wherein the computer program realizes the steps of the big data security and privacy protection method based on geometric deformation when being executed by a processor.

The invention has the beneficial effects that:

(1) The invention perturbs data using methods from computer vision for computing geometric transformation relations between images, thereby protecting privacy. The data can be perturbed at the data source, the lowest layer of the system. In the analysis stage, cluster analysis of the perturbed data fails or yields erroneous results, so the overall data security and privacy of the big data system are effectively protected, from the local to the whole.

(2) The invention establishes same-sensitivity sets in a user-defined manner and, for different sensitivity requirements, processes the data with a corresponding geometric transformation such as translation, scaling or rotation. Processing by sensitivity class reduces computational overhead while ensuring targeted privacy protection.

(3) Because geometric transformations are used, the corresponding inverse transformation can be applied as needed to recover the data completely and without distortion, which ensures the authenticity of subsequent big data analysis.

In summary, the invention provides a simple, efficient and hierarchical data privacy protection method for the release and transmission of mass data. It distinguishes degrees of attribute sensitivity using a user-defined attribute sensitive set and applies geometric deformation techniques from the field of digital images by class, so that targeted privacy protection is achieved efficiently. The data can finally be restored through the inverse geometric transformation, which ensures the authenticity of big data analysis. The techniques and algorithms adopted are relatively mature and easy to implement, and the method is applicable to every stage of the big data life cycle that requires data privacy protection.

Drawings

FIG. 1 is a flow chart of a big data privacy protection method based on geometric deformation of the invention;

FIG. 2 is a data state before perturbation;

FIG. 3 is a data state after perturbation.

Detailed Description

In order to more clearly understand the technical features, objects, and effects of the present invention, specific embodiments of the present invention will now be described. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

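As shown in FIG. 1, the present invention provides a big data security and privacy protection method based on geometric deformation, which includes the steps of establishing an attribute sensitive set, cleaning data, dividing data, geometric deformation, and obtaining a final data set, as set out above.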

In an embodiment of the present invention, in the step of establishing the attribute-sensitive set, the insensitive attribute may specifically include "height", "weight", and the like, and the highly sensitive attribute may specifically include "income", "identification number", and the like.

In one embodiment of the present invention, in the step of cleaning data, incomplete data entries are deleted and discrete data are serialized; for example, the attribute "occupation" is rewritten as occupation {student, teacher, doctor, nurse}, so as to obtain the data to be divided.
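The cleaning step can be sketched as follows. The source does not say how the discrete values are serialized, so this sketch assumes that each category is replaced by its integer position in a fixed list; that encoding, the function name `clean` and the sample field names are all hypothetical.

```python
OCCUPATIONS = ["student", "teacher", "doctor", "nurse"]


def clean(records, categories):
    """Drop incomplete records and encode discrete attributes as integers.

    categories maps an attribute name to the ordered list of its values.
    A value missing from that list raises ValueError via list.index.
    """
    cleaned = []
    for rec in records:
        # Deleting incomplete data entries: skip records with empty fields.
        if any(v is None or v == "" for v in rec.values()):
            continue
        row = dict(rec)
        # Serializing discrete data: replace each category by its index.
        for attr, values in categories.items():
            row[attr] = values.index(rec[attr])
        cleaned.append(row)
    return cleaned
```

After this step every retained attribute is numeric, which is what the geometric transformations require.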

In one embodiment of the present invention, in the translation transformation process, a formula of translation disturbance is adopted as follows:

$$X_t = X_{t-1} + T, \quad T = [t_x, t_y]$$

Specifically, when age and income are perturbed with $T = [-3, 1000]$, the data states before and after the translation perturbation are as shown in FIG. 2 and FIG. 3.
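As a worked instance of this translation, consider a hypothetical record with age 30 and income 5000 (values chosen only for illustration, not taken from the figures):

```latex
X_t = \begin{bmatrix} 30 \\ 5000 \end{bmatrix}
    + \begin{bmatrix} -3 \\ 1000 \end{bmatrix}
    = \begin{bmatrix} 27 \\ 6000 \end{bmatrix},
\qquad
X_{t-1} = X_t - T
    = \begin{bmatrix} 27 \\ 6000 \end{bmatrix}
    - \begin{bmatrix} -3 \\ 1000 \end{bmatrix}
    = \begin{bmatrix} 30 \\ 5000 \end{bmatrix}
```

The released record shows age 27 and income 6000, and subtracting the recorded $T$ restores the original values exactly.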

More specifically, the translation transformation process comprises sub-steps S1 to S3 described above.

In one embodiment of the present invention, in the scaling transformation process, the formula of the scaling perturbation is adopted as follows:

$$X_t = s X_{t-1}$$

In one embodiment of the present invention, in the rotation transformation process, the formula of the rotation disturbance is adopted as follows:

In one embodiment of the present invention, in the similarity transformation process, the formula of the similarity perturbation is adopted as follows:

The invention also provides a storage medium storing a computer program, which when executed by a processor implements the steps of the above-mentioned big data security and privacy protection method based on geometric deformation.

The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

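1. A big data security and privacy protection method based on geometric deformation, characterized by comprising the following steps:

establishing an attribute sensitive set: dividing the attributes of the data into four sets which respectively represent four sensitivity degrees, including insensitive, low-sensitivity, medium-sensitivity and high-sensitivity attributes;

cleaning data: deleting the incomplete data entries, and serializing the discrete data to obtain data to be divided;

dividing data: selecting the data corresponding to all attributes in the same sensitive set and placing them in different columns of the same matrix to form a same-sensitivity data set;

geometric deformation: applying the corresponding translation, scaling, rotation or similarity transformation to the insensitive, low-sensitivity, medium-sensitivity and high-sensitivity same-sensitivity data sets respectively, and recording the transformation parameters for use in the subsequent inverse transformation;

obtaining a final data set: merging the four processed data sets into one data set, which is the data processed by big data privacy protection based on geometric deformation; if the original data are later needed for big data analysis, they are recovered by applying the inverse transformation.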

2. The big data security and privacy protection method based on geometric deformation according to claim 1, wherein in the translation transformation process, a formula of translation disturbance is adopted as follows:

$$X_t = X_{t-1} + T, \quad T = [t_x, t_y]$$

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the coordinates of the data before transformation, $t_x$ is the amount of translation in the horizontal direction, and $t_y$ is the amount of translation in the vertical direction.

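3. The big data security and privacy protection method based on geometric deformation as claimed in claim 2, wherein the translation transformation process includes the following sub-steps:

step S1, inputting a privacy attribute set $V$ and a noise set $T_{add}$, selecting two privacy attributes $A_j$ and $A_{j+k}$ from the privacy attribute set $V$, where $k$ is a predetermined value, and selecting an additive noise term $e_j$ from the noise set $T_{add}$;

step S2, forming a matrix from the privacy attribute pair $A_j$ and $A_{j+k}$ and the additive noise term $e_j$;

step S3, performing the geometric deformation calculation according to the translation disturbance formula: $V \leftarrow \mathrm{transform}(V, T_{add})$.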

4. The big data security and privacy protection method based on geometric deformation according to claim 1, wherein in the scaling transformation process, the formula of scaling disturbance is:

$$X_t = s X_{t-1}$$

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the data coordinates before transformation, and the scalar $s$ denotes the uniform scaling amount.

5. The big data security and privacy protection method based on geometric deformation according to claim 1, wherein in the rotation transformation process, a formula of rotation disturbance is adopted as follows:

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the data coordinates before transformation, and $\theta$ is the angle of the rotation transformation.

6. The big data security and privacy protection method based on geometric deformation according to claim 1, wherein in the similarity transformation process, a formula of similarity perturbation is adopted as follows:

where $X_t = [x_t, y_t]^T$ denotes the coordinates of the data corresponding to each attribute, $X_{t-1}$ denotes the coordinates of the data before transformation, $\theta$ is the angle parameter of the rotation transformation, $t_x$ is the translation parameter in the horizontal direction, $t_y$ is the translation parameter in the vertical direction, and the scalar $s$ denotes the uniform scaling parameter; the similarity perturbation is a combination of rotation, translation and scaling perturbations.

7. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of a geometric-deformation-based big data security and privacy protection method according to any one of claims 1 to 6.