CN113254988B - High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment - Google Patents


Info

Publication number
CN113254988B
Authority
CN
China
Prior art keywords
data
matrix
privacy
dimensional
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110446261.6A
Other languages
Chinese (zh)
Other versions
CN113254988A (en)
Inventor
赵兴文
洪意阳
李晖
朱辉
寇笑语
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110446261.6A
Publication of CN113254988A
Application granted
Publication of CN113254988B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Algebra (AREA)
  • Computer Security & Cryptography (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of secure data release and discloses a method, a system, a medium and equipment for privacy classified protection publishing of high-dimensional sensitive data. The method comprises the following steps: the publisher collects data and selects a total privacy protection parameter epsilon; after data acquisition is finished, the publisher preprocesses the data; the publisher evaluates the privacy protection level of each data attribute; the publisher perturbs the data with privacy noise; a k-dimensional low-dimensional matrix is obtained by a sparse matrix transformation method; the low-dimensional matrix is restored, and the matrix before mean-centering is then recovered; and a privacy-protected noisy matrix with m rows and n columns and the corresponding data table are formed as the privacy version of the data for external release. When the attributes of a massive high-dimensional sensitive data set require different degrees of privacy protection, the method greatly improves the computational efficiency of privacy-preserving release; at the same degree of privacy protection, it improves the usability of the data.

Description

High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment
Technical Field
The invention belongs to the technical field of information security data release, and particularly relates to a method, a system, a medium and equipment for privacy classified protection release of high-dimensional sensitive data.
Background
At present, with the rise of big data and cloud computing, high-dimensional data of all kinds is generated continuously. Authorities in fields such as medical care, civil affairs, finance and public security often need to release data for third parties to analyze and compile statistics on. However, the released information often contains extremely sensitive data, and if the original information is released directly, the sensitive data of individual samples may be exploited by a third party. When an attacker has strong background knowledge and malicious intent, the disclosure of such sensitive information can have unpredictable consequences.
Data release is an important channel of information disclosure, so securing the release of sensitive data is essential. Traditional sensitive data publishing can use the classical Laplace mechanism to add noise directly, so that information about individual samples is not revealed while statistical results remain approximately correct. However, the data to be published in fields such as medical care, civil affairs, finance and public security often has both high dimensionality and a large number of samples. If the traditional Laplace noise mechanism is applied, the large data volume leads to excessive noise and heavy distortion, so the released data has poor usability and public confidence in the data released by the authority declines. Moreover, a large data set to be released often contains attributes with different sensitivity levels: some attributes need strong confidentiality even at the cost of some usability, while others need no strict confidentiality and should be released with usability preserved and the error rate kept as low as possible. It is therefore practical to design different privacy protection schemes according to the privacy protection requirements of different attributes.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) The conventional privacy protection mechanism is usually a Laplace perturbation mechanism, which introduces excessive noise when applied to large volumes of high-dimensional data; how to control the resulting data distortion urgently needs to be solved.
(2) Traditional privacy protection algorithms for massive high-dimensional data run slowly on existing hardware; how to improve their processing efficiency under limited hardware resources needs to be solved.
(3) Traditional privacy protection methods for data sets do not consider that different attributes require different degrees of privacy protection, which leaves room for improvement.
(4) Existing privacy protection methods for massive high-dimensional data usually convert the original data matrix into a low-dimensional projection matrix and then add perturbation noise to the projection matrix, from which a privacy-protected data matrix of the same size as the original is recovered. Because most of these methods add the noise after the projection matrix has been computed, they leave considerable room for improvement in balancing privacy protection and usability.
The difficulty in solving the above problems and defects is: a scheme must both balance the privacy and usability of different attributes in a data set and improve the overall computational efficiency of privacy-preserving release for massive high-dimensional sensitive data sets.
The significance of solving the above problems and defects is: when the attributes of a massive high-dimensional sensitive data set require different degrees of privacy protection, the method greatly improves the computational efficiency of privacy-preserving release. Meanwhile, compared with traditional privacy protection methods, it improves the usability of the data at the same degree of privacy protection.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a high-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment, and particularly relates to a high-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment based on block sparse matrix transformation.
The invention is realized as a privacy classified protection publishing method for high-dimensional sensitive data, which comprises the following steps:
receiving the input original data, m data samples in total, each of dimension n, together with the privacy protection parameter epsilon to be set; and outputting the privacy-protected version of the data as m data samples, each of dimension n, the perturbed privacy-protected data being the data finally published externally.
The privacy classified protection publishing method for high-dimensional sensitive data comprises seven stages: a data acquisition stage, a data preprocessing stage, a data privacy protection level evaluation stage, a data perturbation stage, a data transformation stage, a data restoration stage and a data release stage.
Further, the privacy classified protection publishing method for high-dimensional sensitive data comprises the following steps:
Step one, data acquisition: the publisher collects data and selects a suitable total privacy protection parameter epsilon. Data collection is the basis for the subsequent steps, and the corresponding parameter is selected at the same time.
Step two, data preprocessing: after data acquisition is finished, the publisher preprocesses the data. Each sample is scanned, and if any of its n attribute values is null, that value is filled with 0 so that every one of the n attributes has a numerical value. All data are then arranged into a single matrix of m rows and n columns, that is, m samples of dimension n. This step turns the original data from step one into a data matrix.
Step three, evaluating the data privacy protection level: the data publisher evaluates the privacy protection level of the data attributes and rearranges the attributes accordingly; the data set is then divided into relatively low-dimensional block matrices, each of dimension p, and the sensitivity level of each whole block matrix is marked. Dividing the data matrix into blocks first avoids the long processing time of computing the covariance matrix of the whole original matrix in step four. Marking the sensitivity level of each block matrix prepares for the subsequent privacy budget allocation.
Step four, data perturbation: the publisher adds Wishart privacy noise to the covariance matrix of each block matrix; this privacy-protecting noise perturbation keeps the original information from being leaked.
Step five, data transformation: an eigenvector matrix and the corresponding diagonal eigenvalue matrix are obtained by a sparse matrix transformation method, and the eigenvectors corresponding to the k largest eigenvalues are taken; the transposed eigenvector matrix is multiplied by the mean-centered matrix to obtain a k-dimensional low-dimensional matrix, where k is less than or equal to p. Using the sparse matrix transformation method further improves the processing efficiency of the matrix dimension reduction.
Step six, data restoration: the low-dimensional matrix is restored, and the matrix before mean-centering is then recovered, returning the matrix to its original form.
Step seven, data release: the m-row, p-column block matrices restored from all the noise-perturbed low-dimensional matrices are spliced into an m-row, n-column privacy-protected noisy matrix, and the corresponding attribute names are added as the data table header; the attributes are put back in the arrangement order of the original data table to form a complete data table, which is the privacy version of the data for external release. Building on step six, this step performs the corresponding data processing, including recovering the data matrix before mean-centering and removing the zero-filled columns in the last block. Finally, the header names are added to form the complete data release table.
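Steps two and five can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `preprocess` and `reduce_block` are hypothetical, samples are assumed to be stored as rows, and an ordinary dense eigendecomposition (`numpy.linalg.eigh`) is swapped in for the sparse matrix transformation, which the text does not specify in detail. With samples as rows, multiplying the transposed eigenvector matrix by the mean-centered matrix corresponds to `centered @ top_vecs`.

```python
import numpy as np

def preprocess(rows, n):
    """Step two: fill null attribute values with 0 and stack samples into an m x n matrix."""
    matrix = np.zeros((len(rows), n), dtype=float)
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:          # null values keep the 0 already in place
                matrix[i, j] = float(value)
    return matrix

def reduce_block(block, noisy_cov, k):
    """Step five stand-in: project a mean-centered block onto the top-k eigenvectors.

    Assumption: a dense eigendecomposition replaces the sparse matrix transformation.
    """
    col_mean = block.mean(axis=0)
    centered = block - col_mean
    eigvals, eigvecs = np.linalg.eigh(noisy_cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]            # indices of the k largest eigenvalues
    top_vecs = eigvecs[:, top]                     # p x k
    low_dim = centered @ top_vecs                  # m x k low-dimensional matrix
    return low_dim, top_vecs, col_mean
```

The column means and the selected eigenvectors are returned alongside the low-dimensional matrix because step six needs both to restore the block.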
Further, the data acquisition stage comprises a data collection stage and a parameter selection stage.
in the data collection stage, the data of each individual sample is collected, the specific values of the n attributes in each sample are determined, and the total number m of individual samples in the data set is updated in real time;
in the parameter selection stage, the privacy protection parameter epsilon represents the privacy budget: the larger epsilon is, the weaker the privacy protection and the better the data usability; the smaller epsilon is, the stronger the privacy protection and the poorer the data usability. The value of epsilon is therefore adjusted repeatedly until a suitable total privacy protection parameter epsilon is determined according to the actual privacy protection requirement.
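The role of epsilon can be made concrete with the standard definition of epsilon-differential privacy. The definition is given here as background and is not quoted from the patent text. A randomized mechanism M satisfies epsilon-differential privacy if, for any two data sets D and D' that differ in a single sample and for any set S of possible outputs:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

As epsilon approaches 0, the two output distributions become nearly indistinguishable, which means stronger protection but more noise; a larger epsilon lets them differ more, which means weaker protection but better usability.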
Further, the data privacy protection level evaluation stage comprises an attribute sensitivity level evaluation stage, an attribute rearrangement stage, a data set division stage and a blocking sensitivity level marking stage.
in the attribute sensitivity level evaluation stage, the sensitivity level of each attribute column is evaluated, and each column is labeled as relatively high, medium or low;
in the attribute rearrangement stage, after the sensitivity levels are labeled, the matrix columns are arranged by the sensitivity level of each attribute: the higher the sensitivity level of an attribute, the nearer the front it is placed, that is, the lower its column index. After all attributes are arranged by sensitivity level, a new matrix with m rows and n columns is formed;
in the data set division stage, the new m-row, n-column matrix is partitioned into blocks using p as the dimension threshold; if the last block matrix has fewer than p dimensions, it is padded with columns of 0 to form a p-dimensional padded matrix, where p is less than or equal to 10;
in the block sensitivity level marking stage, each block matrix of m rows and p columns is given a level mark. Because the attributes were sorted and divided by high, medium and low sensitivity in the preceding attribute rearrangement and data set division stages, the overall privacy protection degree of each block matrix falls into one of three levels: high, medium or low.
At this point, the complete m-row, n-column matrix has been divided into a number of new block matrices X with m rows and p columns, and the block matrices are graded as high, medium or low according to the required degree of privacy protection.
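The rearrangement and division stages can be sketched as follows. This is an illustrative sketch only: the function name `rearrange_and_partition`, the dictionary layout, and the level strings are hypothetical. The text does not say how a block containing columns of two different levels is labeled, so the sketch assumes the block takes the most sensitive level among its real columns.

```python
import numpy as np

LEVEL_RANK = {"high": 0, "medium": 1, "low": 2}   # smaller rank = more sensitive

def rearrange_and_partition(matrix, levels, p):
    """Sort columns by sensitivity, then cut them into zero-padded blocks of p columns."""
    m, n = matrix.shape
    # sorted() is stable, so columns keep their original relative order within a level.
    order = sorted(range(n), key=lambda j: LEVEL_RANK[levels[j]])
    sorted_matrix = matrix[:, order]
    sorted_levels = [levels[j] for j in order]
    blocks = []
    for start in range(0, n, p):
        block = sorted_matrix[:, start:start + p]
        block_levels = sorted_levels[start:start + p]
        pad = p - block.shape[1]
        if pad > 0:                                # only the last block can be short
            block = np.hstack([block, np.zeros((m, pad))])
        # Assumption: a mixed block is labeled with its most sensitive level.
        label = min(block_levels, key=LEVEL_RANK.get)
        blocks.append({"data": block, "levels": block_levels, "label": label, "pad": pad})
    return blocks, order
```

Returning `order` and the per-block `pad` count keeps the information needed later to remove the padding and put the attributes back in their original positions.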
Further, the data perturbation stage comprises: a data mean-centering stage, a covariance matrix calculation stage, a privacy budget parameter allocation stage, a privacy noise extraction stage and a data privacy noise addition stage.
In the data mean-centering stage, the mean of the column in which it lies is subtracted from each value to form the mean-centered matrix X;
in the covariance matrix calculation stage, the covariance matrix A of the mean-centered matrix X is calculated for each p-dimensional block matrix,
Figure GDA0003256820500000051
in the privacy budget parameter allocation stage, different privacy budgets are allocated according to the strength of the privacy protection level required by the different sensitive attributes; in differential privacy, the smaller the privacy protection parameter, the higher the corresponding privacy protection level and the lower the data usability; the privacy budget is divided among the high-sensitivity, medium-sensitivity and low-sensitivity attribute columns in the ratio 1:9:90, which fixes the privacy budget of each attribute; the privacy budgets of all attribute columns in each block matrix are summed to obtain the privacy budget epsilon_i of that block matrix;
In the privacy noise extraction stage, for a block matrix with m rows and p columns, a positive definite matrix C with m rows and m columns is generated; the positive definite matrix C satisfies that m eigenvalues are all equal, and all eigenvalues are set as
Figure GDA0003256820500000052
Extracting a noise sample matrix W with m rows and p columns from the Wishart distribution W (m +1,C);
in the data privacy noise addition stage, the noise sample matrix W is added to the covariance matrix A to form the noise-perturbed covariance matrix A', namely A' = A + W.
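The budget allocation and noise addition stages can be sketched as below. Several points are assumptions made for illustration and are not taken from the source: the 1:9:90 ratio is applied as a per-column weight; the covariance is computed as the unnormalized product of the transposed centered block with itself; the noise is drawn with the same p x p shape as the matrix it is added to; and the Wishart scale (`scale`) and degrees of freedom (`df`) are left as caller-supplied parameters because their exact values are given by formulas not reproduced in this text. The names `allocate_budgets` and `perturb_covariance` are hypothetical.

```python
import numpy as np

WEIGHTS = {"high": 1.0, "medium": 9.0, "low": 90.0}   # the 1 : 9 : 90 ratio from the text

def allocate_budgets(levels, total_epsilon):
    """Split the total budget over attribute columns in proportion to their weights."""
    weights = np.array([WEIGHTS[level] for level in levels])
    return total_epsilon * weights / weights.sum()

def perturb_covariance(block, scale, df, rng):
    """Center a block by column, form A = X^T X, and return A' = A + W."""
    centered = block - block.mean(axis=0)
    cov = centered.T @ centered                        # p x p (assumed unnormalized)
    p = cov.shape[0]
    # A Wishart(df, scale * I) sample equals G^T G, where G has df rows
    # drawn independently from N(0, scale * I).
    gauss = rng.normal(0.0, np.sqrt(scale), size=(df, p))
    noise = gauss.T @ gauss                            # symmetric positive semidefinite
    return cov + noise
```

The budget epsilon_i of a block is then the sum of the entries of `allocate_budgets(...)` that belong to that block's columns. Highly sensitive columns receive the smallest share, which under differential privacy means the strongest protection.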
Further, the data restoration stage comprises a low-dimensional matrix restoration stage and a mean restoration stage.
In the low-dimensional matrix restoration stage, the eigenvector matrix is multiplied by the k-dimensional low-dimensional matrix to obtain a restored matrix, which is still mean-centered because the column means have not yet been added back;
in the mean restoration stage, the mean of the corresponding column of the original matrix is added to each element of the restored matrix to undo the mean-centering, and the attribute columns previously filled with 0 in the last block are removed.
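The restoration stage, together with the splicing and reordering performed at release time, can be sketched as follows. This is a minimal illustration assuming samples are stored as rows, so that multiplying the eigenvector matrix by the low-dimensional matrix becomes `low_dim @ top_vecs.T`. All names are hypothetical: `low_dim` is the m x k reduced block, `top_vecs` the p x k eigenvector matrix, `col_mean` the column means of the original block, `pad` the number of zero-filled columns, and `order` the list giving, for each sorted position, the original column index.

```python
import numpy as np

def restore_block(low_dim, top_vecs, col_mean, pad):
    """Map the k-dimensional matrix back to p columns, add the means, drop padding."""
    restored = low_dim @ top_vecs.T + col_mean     # (m x k)(k x p) plus column means
    if pad > 0:
        restored = restored[:, :-pad]              # remove the zero-filled columns
    return restored

def splice_and_reorder(restored_blocks, order):
    """Concatenate restored blocks and put columns back in the original attribute order."""
    sorted_matrix = np.hstack(restored_blocks)     # m x n, still in sensitivity order
    published = np.empty_like(sorted_matrix)
    published[:, order] = sorted_matrix            # sorted column i came from order[i]
    return published
```

When k equals p the restoration is exact up to floating-point error; for k smaller than p the restored block is a rank-k approximation, which is where the dimension reduction trades accuracy for speed.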
Another object of the present invention is to provide a privacy classified protection publishing system for high-dimensional sensitive data that applies the above method, the system comprising:
a data acquisition module, used by the publisher to collect data and select a suitable total privacy protection parameter epsilon;
a data preprocessing module, used to preprocess the data after the publisher finishes data acquisition: each sample is scanned, and if any of its n attribute values is null, that value is filled with 0 so that every one of the n attributes has a numerical value; all data are arranged into a single matrix of m rows and n columns, that is, m samples of dimension n;
a data privacy protection level evaluation module, used by the data publisher to evaluate the privacy protection level of the data attributes and rearrange the attributes accordingly; the data set is divided into relatively low-dimensional block matrices, each of dimension p, and the sensitivity level of each whole block matrix is marked;
a data perturbation module, used by the publisher to add Wishart privacy noise to the covariance matrix of each block matrix;
a data transformation module, used to obtain an eigenvector matrix and the corresponding diagonal eigenvalue matrix by a sparse matrix transformation method and to take the eigenvectors corresponding to the k largest eigenvalues; the transposed eigenvector matrix is multiplied by the mean-centered matrix to obtain a k-dimensional low-dimensional matrix, where k is less than or equal to p;
a data restoration module, used to restore the low-dimensional matrix and then recover the matrix before mean-centering;
a data publishing module, used to splice the m-row, p-column block matrices restored from all the noise-perturbed low-dimensional matrices into an m-row, n-column privacy-protected noisy matrix and to add the corresponding attribute names as the data table header; the attributes are put back in the arrangement order of the original data table to form a complete data table, which is the privacy version of the data for external release.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
data acquisition: the publisher collects data and selects a suitable total privacy protection parameter epsilon;
data preprocessing: after data acquisition is finished, the publisher preprocesses the data; each sample is scanned, and if any of its n attribute values is null, that value is filled with 0 so that every one of the n attributes has a numerical value; all data are arranged into a single matrix of m rows and n columns, that is, m samples of dimension n;
data privacy protection level evaluation: the data publisher evaluates the privacy protection level of the data attributes and rearranges the attributes accordingly; the data set is divided into relatively low-dimensional block matrices, each of dimension p, and the sensitivity level of each whole block matrix is marked;
data perturbation: the publisher adds Wishart privacy noise to the covariance matrix of each block matrix;
data transformation: an eigenvector matrix and the corresponding diagonal eigenvalue matrix are obtained by a sparse matrix transformation method, and the eigenvectors corresponding to the k largest eigenvalues are taken; the transposed eigenvector matrix is multiplied by the mean-centered matrix to obtain a k-dimensional low-dimensional matrix, where k is less than or equal to p;
data restoration: the low-dimensional matrix is restored, and the matrix before mean-centering is then recovered;
data release: the m-row, p-column block matrices restored from all the noise-perturbed low-dimensional matrices are spliced into an m-row, n-column privacy-protected noisy matrix, and the corresponding attribute names are added as the data table header; the attributes are put back in the arrangement order of the original data table to form a complete data table, which is the privacy version of the data for external release.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
data acquisition: the publisher collects data and selects a suitable total privacy protection parameter epsilon;
data preprocessing: after data acquisition is finished, the publisher preprocesses the data; each sample is scanned, and if any of its n attribute values is null, that value is filled with 0 so that every one of the n attributes has a numerical value; all data are arranged into a single matrix of m rows and n columns, that is, m samples of dimension n;
data privacy protection level evaluation: the data publisher evaluates the privacy protection level of the data attributes and rearranges the attributes accordingly; the data set is divided into relatively low-dimensional block matrices, each of dimension p, and the sensitivity level of each whole block matrix is marked;
data perturbation: the publisher adds Wishart privacy noise to the covariance matrix of each block matrix;
data transformation: an eigenvector matrix and the corresponding diagonal eigenvalue matrix are obtained by a sparse matrix transformation method, and the eigenvectors corresponding to the k largest eigenvalues are taken; the transposed eigenvector matrix is multiplied by the mean-centered matrix to obtain a k-dimensional low-dimensional matrix, where k is less than or equal to p;
data restoration: the low-dimensional matrix is restored, and the matrix before mean-centering is then recovered;
data release: the m-row, p-column block matrices restored from all the noise-perturbed low-dimensional matrices are spliced into an m-row, n-column privacy-protected noisy matrix, and the corresponding attribute names are added as the data table header; the attributes are put back in the arrangement order of the original data table to form a complete data table, which is the privacy version of the data for external release.
Another objective of the present invention is to provide an information data processing terminal, which is used for implementing the hierarchical protection and release system for privacy of high-dimensional sensitive data.
By combining all the technical schemes, the invention has the advantages and positive effects that: according to the high-dimensional sensitive data privacy classified protection publishing method based on the block sparse matrix transformation, the operation efficiency of the privacy protection publishing of the mass high-dimensional sensitive data is greatly improved under the condition that different privacy protection degree requirements of all attributes of the mass high-dimensional sensitive data set are considered. Meanwhile, compared with the traditional privacy protection processing method, the usability of the data is improved under the same privacy protection degree.
The invention provides a high-dimensional data privacy classification protection processing scheme based on the block sparse matrix transformation, which fully utilizes the high-speed dimension reduction characteristic of the block sparse matrix transformation and aims at the safety risk possibly suffered by direct data release and different privacy protection levels required by different types of attributes. Meanwhile, a differential privacy technology is introduced into the high-dimensional sensitive data privacy protection issuing method, so that the large-scale and high-dimensional sensitive data are safely issued in a grading manner, and strong background knowledge attack of an attacker can be resisted. Meanwhile, the block sparse matrix transformation method provided by the invention can greatly improve the privacy protection operation speed of mass data, so that the distortion degree of the data is reduced under the condition of providing the same privacy protection degree as that of the traditional method. The method can be suitable for meeting the privacy grading protection data release requirements of mass high-dimensional sensitive information data in the fields of medical treatment, civil life, finance, public security and the like.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a privacy-preserving and hierarchical-protecting publishing method for high-dimensional sensitive data according to an embodiment of the present invention.
Fig. 2 is a block diagram of a privacy classified protection publishing system for high-dimensional sensitive data according to an embodiment of the present invention;
in the figure: 1. a data acquisition module; 2. a data preprocessing module; 3. a data privacy protection level evaluation module; 4. a data perturbation module; 5. a data transformation module; 6. a data restoration module; 7. a data publishing module.
Fig. 3 is a schematic application scenario diagram of a high-dimensional sensitive data privacy classification protection publishing method according to an embodiment of the present invention.
Fig. 4 is a flowchart of example data processing of a privacy-preserving publishing method for high-dimensional sensitive data according to an embodiment of the present invention.
Fig. 5 is a schematic diagram comparing a conventional differential privacy Laplace noise mechanism with the high-dimensional sensitive data privacy classified protection publishing method according to the present invention.
FIG. 6 is a schematic diagram of a comparative experiment provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method, a system, a medium, and a device for privacy-based hierarchical protection and release of high-dimensional sensitive data, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the privacy-classified protection publishing method for high-dimensional sensitive data according to the embodiment of the present invention includes the following steps:
S101, data acquisition: the publisher collects data and selects a suitable total privacy protection parameter;
S102, data preprocessing: after the publisher finishes data acquisition, the data are preprocessed;
S103, data privacy protection level evaluation: the data publisher evaluates the privacy protection level of the data attributes;
S104, data disturbance: the publisher applies privacy noise disturbance to the data;
S105, data transformation: a k-dimensional low-dimensional matrix is obtained by the sparse matrix transformation method;
S106, data restoration: the low-dimensional matrix is restored, and the matrix before averaging is then restored;
S107, data release: the privacy-protected, noise-added matrix of m rows and n columns and the corresponding data table are formed, constituting the privacy version of the data released externally.
As shown in fig. 2, the privacy-preserving and hierarchical-protecting publishing system for high-dimensional sensitive data according to an embodiment of the present invention includes:
the data acquisition module 1 is used for acquiring data by a publisher and selecting a proper total privacy protection parameter epsilon;
the data preprocessing module 2 is used for preprocessing the data after the publisher finishes data acquisition; each sample is scanned in turn, and if any of the n attribute values of a sample is null, that attribute is filled with 0 so that every one of the n dimensions has a numerical value; all data are integrated and arranged into a single matrix of m rows and n columns, i.e., the matrix has m samples and dimension n;
the data privacy protection level evaluation module 3 is used for evaluating the privacy protection level of the data attribute by the data issuing party so as to rearrange the attribute; dividing the data set into relatively low-dimensional block matrixes, wherein after the data set is divided into blocks, the dimension of each block matrix is p, and the sensitivity level of the whole block matrix is marked;
the data disturbance module 4 is used for adding Wishart privacy noise disturbance to the covariance matrix of the block matrix by the publisher;
the data transformation module 5 is used for obtaining an eigenvector matrix and the corresponding diagonal eigenvalue matrix by the sparse matrix transformation method, and taking the eigenvectors corresponding to the k largest eigenvalues of the diagonal eigenvalue matrix; the eigenvector matrix is transposed and multiplied by the averaged matrix to obtain a k-dimensional low-dimensional matrix, wherein k is less than or equal to p;
the data recovery module 6 is used for recovering the low-dimensional matrix so as to recover the matrix before equalization;
the data release module 7 is configured to splice m rows and p columns of block matrices formed by restoring all low-dimensional matrices disturbed by the privacy noise to form m rows and n columns of matrices for privacy protection and noise addition, and add corresponding attribute names to a data table header; and adjusting and restoring the attribute arrangement sequence of the original data table to form a complete data table and form data of the external release privacy version.
An application scenario diagram of the high-dimensional sensitive data privacy classification protection publishing method provided by the embodiment of the invention is shown in fig. 3.
The technical solution of the present invention is further described below with reference to examples.
The invention aims to provide a high-dimensional sensitive data privacy protection publishing method based on block sparse matrix transformation that addresses the insufficient privacy protection and excessive running time of existing methods for releasing massive high-dimensional sensitive data, so as to achieve efficient processing when such data are released and a balance between privacy protection and data error.
As shown in FIG. 4, the invention relates to a high-dimensional sensitive data privacy protection publishing method based on block sparse matrix transformation. The input is the original data, m data samples in total, each of dimension n, together with the privacy protection parameter ε to be set. The output is the privacy-protected version of the data, also m data samples of dimension n each; this perturbed data is what is finally published externally. The whole process comprises seven major stages: a data acquisition stage, a data preprocessing stage, a data privacy protection level evaluation stage, a data disturbance stage, a data transformation stage, a data restoration stage and a data release stage. Each stage is as follows:
(1) The data acquisition stage comprises two sub-stages: a data collection stage and a parameter selection stage.
Data collection stage: the data of each individual sample are collected, the specific values of the n attributes in each sample are determined, and the total number m of individual samples in the data set is updated in real time.
Parameter selection stage: the privacy protection parameter ε represents the allocated privacy budget. The larger ε is, the weaker the privacy protection and the higher the data availability; the smaller ε is, the stronger the privacy protection and the lower the data availability. The value of ε is therefore adjusted repeatedly until a suitable total privacy protection parameter ε is determined according to the actual privacy protection requirement.
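For reference, the standard definition of ε-differential privacy makes this trade-off concrete. The symbols M (a randomized release mechanism), D and D' (two adjacent data sets) and S (any set of possible outputs) are notation introduced here for illustration and do not appear in the original text.

```latex
\Pr\bigl[\,M(D) \in S\,\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[\,M(D') \in S\,\bigr]
```

As ε approaches 0 the two output distributions are forced to be nearly identical (strong protection, more noise); a large ε allows them to differ widely (weak protection, less noise).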
(2) Data preprocessing stage: each sample is scanned in turn, and if any of its n attribute values is null, that attribute is filled with 0 so that every one of the n dimensions has a numerical value. All data are then arranged into a single matrix with m rows and n columns, i.e., the matrix has m samples and dimension n.
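The preprocessing step can be sketched as follows. This is a minimal illustration, assuming samples arrive as Python lists in which a null value is represented by None or NaN; the function name preprocess is hypothetical and not part of the original method.

```python
import numpy as np

def preprocess(samples, n):
    """Build an m x n float matrix; null (None/NaN/missing) attributes become 0."""
    m = len(samples)
    X = np.zeros((m, n), dtype=float)
    for r, sample in enumerate(samples):
        for c in range(n):
            value = sample[c] if c < len(sample) else None
            is_nan = isinstance(value, float) and np.isnan(value)
            if value is not None and not is_nan:
                X[r, c] = float(value)
    return X

# Two samples with n = 3 attributes, each with one null value.
raw = [[1.0, None, 3.0], [4.0, 5.0, float("nan")]]
X = preprocess(raw, n=3)
```

Filling with 0 follows the text; other imputation strategies are outside the scope of the described method.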
(3) The evaluation stage of the data privacy protection level comprises four sub-stages: the method comprises an attribute sensitivity level evaluation stage, an attribute rearrangement stage, a data set division stage and a blocking sensitivity level marking stage.
Attribute sensitivity level evaluation stage: the sensitivity level of each attribute column is evaluated, and each column is labeled relatively as high, medium or low.
Attribute rearrangement stage: after the sensitivity levels are labeled, the columns of the matrix are arranged by the sensitivity level of each attribute; the higher the sensitivity level of an attribute, the earlier its position (the lower its column index). After all attributes are arranged by sensitivity level, a new matrix with m rows and n columns is formed.
Data set division stage: the dimension n of the rearranged matrix is usually large, which severely reduces the efficiency of computing the covariance matrix in the subsequent data processing stage. For example, current census data tables have very many dimensions, commonly hundreds of attributes per class, which makes the covariance computation very slow. Dividing the n-dimensional matrix arranged by sensitivity level into several relatively low-dimensional matrices therefore matters greatly for the efficiency of the subsequent covariance processing. After partitioning, the dimension of each block matrix should be kept within 10. Accordingly, the new m-row, n-column matrix is partitioned using a dimension threshold p (p ≤ 10). If the last block has fewer than p dimensions, it is padded with the value 0 to form a p-dimensional padded matrix.
Blocking sensitivity level marking stage: each p-dimensional block matrix of m rows and p columns is given a level label. Because the two preceding stages (attribute rearrangement and data set division) sorted and divided the attributes by high, medium and low sensitivity, the block matrices at this stage can readily be divided into the same three levels; that is, the overall privacy protection degree of each block matrix is high, medium or low.
At this point, the complete m-row, n-column matrix has been divided into several new block matrices X of m rows and p columns, and the block matrices are graded high, medium or low according to their required degree of privacy protection.
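The rearrangement and partitioning described above can be sketched as follows. The level encoding LEVEL_ORDER and the function name are hypothetical. The text does not say how a block that straddles two sensitivity levels is labeled, so this sketch simply returns the per-column levels of each block and leaves the block label to the caller.

```python
import numpy as np

LEVEL_ORDER = {"high": 0, "medium": 1, "low": 2}  # hypothetical encoding

def rearrange_and_partition(X, levels, p=10):
    """Sort columns by sensitivity (high first), then cut into p-column blocks.

    Returns (blocks, order, block_levels); order[c] is the original index of
    sorted column c, and the last block is zero-padded to p columns.
    """
    m, n = X.shape
    # sorted() is stable, so columns of equal level keep their original order.
    order = sorted(range(n), key=lambda j: LEVEL_ORDER[levels[j]])
    X_sorted = X[:, order]
    sorted_levels = [levels[j] for j in order]
    blocks, block_levels = [], []
    for start in range(0, n, p):
        block = X_sorted[:, start:start + p]
        if block.shape[1] < p:  # pad the last block with zero columns
            block = np.hstack([block, np.zeros((m, p - block.shape[1]))])
        blocks.append(block)
        block_levels.append(sorted_levels[start:start + p])
    return blocks, order, block_levels

X = np.arange(12.0).reshape(2, 6)
levels = ["low", "high", "medium", "high", "low", "medium"]
blocks, order, block_levels = rearrange_and_partition(X, levels, p=4)
```

The returned order is kept so that the original attribute order can be restored at release time.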
(4) The data perturbation phase comprises five sub-phases: the method comprises a data averaging stage, a covariance matrix calculation stage, a privacy budget parameter allocation stage, a privacy noise extraction stage and a data privacy noise adding stage.
Data averaging stage: to avoid the effect of differing attribute scales, the mean of its column is subtracted from each value, forming the averaged (mean-centered) matrix X.
Covariance matrix calculation stage: for each p-dimensional block matrix, the covariance matrix A of the averaged X is calculated:
Figure GDA0003256820500000131
Privacy budget parameter allocation stage: different privacy budgets are allocated according to the privacy protection level of the different sensitive attributes. In differential privacy, the smaller the privacy protection parameter, the higher the corresponding privacy protection level and the lower the availability of the data. The allocation at this stage therefore assigns a privacy budget to each attribute in the ratio 1:9:90 for high-, medium- and low-sensitivity attribute columns. The privacy budgets of all attribute columns in each block matrix are then summed to obtain the privacy budget ε_i of that block matrix.
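One way to read this allocation is sketched below. The text gives the 1:9:90 ratio but not the normalization, so this sketch assumes the per-column budgets are proportional to the weights and sum to the total ε; the names WEIGHTS and allocate_budget are hypothetical.

```python
WEIGHTS = {"high": 1, "medium": 9, "low": 90}  # ratio 1:9:90 from the text

def allocate_budget(epsilon, block_levels):
    """Split epsilon over all columns by weight; return one budget per block.

    Assumption: column budgets are normalized so that they sum to epsilon.
    """
    total = sum(WEIGHTS[level] for cols in block_levels for level in cols)
    return [
        epsilon * sum(WEIGHTS[level] for level in cols) / total
        for cols in block_levels
    ]

# Two blocks of three attribute columns each, total budget epsilon = 1.0.
block_levels = [["high", "high", "medium"], ["medium", "low", "low"]]
budgets = allocate_budget(1.0, block_levels)
```

Under this reading, the block holding the most sensitive columns receives the smallest ε_i and therefore the strongest perturbation.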
Privacy noise extraction stage: since the covariance matrix is symmetric and positive semi-definite, the perturbation noise added to it must also satisfy these two properties. Specifically, for a block matrix with m rows and p columns, a positive definite matrix C with m rows and m columns is first generated; C is required to have m equal eigenvalues, and every eigenvalue is set to
Figure GDA0003256820500000132
A noise sample matrix W with m rows and p columns is then drawn from the Wishart distribution W(m+1, C). This ensures that the whole noise-perturbed block matrix satisfies (ε, 0)-differential privacy: for adjacent block matrices (two block matrices that differ in only one element), the outputs after noise perturbation are almost identical. Therefore, even an attacker who holds all information except one element cannot learn the specific value of that element, which gives strong privacy protection.
Data privacy noise adding stage: the noise sample matrix W of this block is added to the covariance matrix A to form the noise-perturbed covariance matrix A', i.e., A' = A + W.
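The perturbation stage can be sketched as follows, under several stated assumptions. The covariance formula and the eigenvalue of C are given only as images in the source, so this sketch assumes the 1/m normalization for A and takes the eigenvalue (scale_eig) and the degrees of freedom (df) as caller-supplied parameters. It also draws W as a p-by-p matrix so that A + W is defined for the p-by-p column covariance, which departs from the m-by-p wording above. The function name is hypothetical.

```python
import numpy as np

def perturb_covariance(block, scale_eig, df, rng):
    """Mean-center a block, form its covariance, and add Wishart noise.

    Assumptions (not fixed by the text): A = Xc^T Xc / m, the scale matrix is
    scale_eig * I of size p x p, and df is an integer number of degrees of
    freedom.
    """
    m, p = block.shape
    col_means = block.mean(axis=0)
    Xc = block - col_means          # averaged (mean-centered) matrix
    A = Xc.T @ Xc / m               # p x p covariance of the columns
    # A Wishart(df, c*I) sample is G^T G with G a df x p matrix of N(0, c).
    G = rng.normal(scale=np.sqrt(scale_eig), size=(df, p))
    W = G.T @ G                     # symmetric, positive semi-definite
    return A + W, Xc, col_means

rng = np.random.default_rng(0)
block = rng.normal(size=(50, 4))
A_noisy, Xc, col_means = perturb_covariance(block, scale_eig=0.01, df=5, rng=rng)
```

Because both A and W are symmetric and positive semi-definite, so is their sum, which is the property the text requires of the perturbed covariance matrix.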
(5) A data transformation stage:
A projection operator is constructed using a sparse matrix transformation algorithm based on eigenvector estimation. The orthogonal eigenvector matrix of the sparse matrix transformation can be expressed as the product of a finite sequence of Givens rotations. The sparse matrix transformation is implemented as follows:
for h = 1 to H:
(1) Select the coordinate pair (i_h, j_h) such that
Figure GDA0003256820500000141
reaches its maximum value.
(2) Calculate the angle
Figure GDA0003256820500000142
(3) In each round, the sparse matrix G_h is a Givens rotation at coordinate (i_h, j_h): G_h = I + Θ(i_h, j_h, θ_h), where I is the identity matrix.
(4) Calculate
Figure GDA0003256820500000143
(5) Calculate
Figure GDA0003256820500000144
The G_h of each step is obtained iteratively, and the number of iterations H can be determined by cross validation. The eigenvector matrix is then obtained by the sparse matrix transformation method:
Figure GDA0003256820500000145
and the following is calculated
Figure GDA0003256820500000146
to obtain the eigenvalues. The eigenvectors are then sorted in descending order of their corresponding eigenvalues, and the sum of the eigenvalues is computed:
Figure GDA0003256820500000147
The k largest eigenvalues whose cumulative contribution rate exceeds 95% are collected as Λ = {λ_1, λ_2, ..., λ_k}, and the eigenvector matrix E_k corresponding to these eigenvalues is obtained. In general, the earlier a principal component ranks, the more information it carries. The eigenvector matrix is then transposed and multiplied by the averaged matrix to obtain a k-dimensional low-dimensional matrix (k ≤ p).
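The transformation stage can be sketched as follows. The formulas for steps (1), (2), (4) and (5) are images in the source, so this sketch follows the commonly published sparse matrix transform formulation as an assumption: pick the pair with the largest normalized squared correlation S_ij^2 / (S_ii S_jj), rotate by θ = (1/2) atan2(-2 S_ij, S_ii - S_jj), and accumulate the rotations. The function names and the threshold parameter are hypothetical.

```python
import numpy as np

def sparse_matrix_transform(S, H):
    """Estimate eigenvectors E and eigenvalues of S with H Givens rotations."""
    p = S.shape[0]
    S = np.array(S, dtype=float)
    E = np.eye(p)
    for _ in range(H):
        d = np.diag(S)
        denom = np.outer(d, d)
        score = np.zeros_like(S)
        mask = denom > 0
        score[mask] = S[mask] ** 2 / denom[mask]   # normalized squared correlation
        np.fill_diagonal(score, 0.0)
        i, j = np.unravel_index(np.argmax(score), score.shape)
        if score[i, j] < 1e-15:                     # already (nearly) diagonal
            break
        theta = 0.5 * np.arctan2(-2.0 * S[i, j], S[i, i] - S[j, j])
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(p)                               # G = I + Theta(i, j, theta)
        G[i, i] = G[j, j] = c
        G[i, j], G[j, i] = s, -s
        S = G.T @ S @ G                             # zeroes S[i, j]
        E = E @ G
    return E, np.diag(S).copy()

def select_top_k(E, lam, threshold=0.95):
    """Keep the leading eigenvectors whose cumulative contribution reaches threshold."""
    order = np.argsort(lam)[::-1]
    cum = np.cumsum(lam[order]) / lam.sum()
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(lam))
    return E[:, order[:k]], lam[order[:k]]

rng = np.random.default_rng(1)
B = rng.normal(size=(30, 5))
S0 = B.T @ B / 30
E, lam = sparse_matrix_transform(S0, H=40)
E_k, lam_k = select_top_k(E, lam)
```

Each rotation touches only two coordinates, which is what keeps the transform sparse and cheap for small p; the choice H=40 here is arbitrary, whereas the text selects H by cross validation.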
(6) Data restoration stage: this stage comprises two sub-stages, a low-dimensional matrix restoration stage and an averaging restoration stage.
Low-dimensional matrix restoration stage: the eigenvector matrix is multiplied by the k-dimensional low-dimensional matrix to obtain a restored matrix. At this point the restored matrix has not yet had the column means added back.
Averaging restoration stage: the mean of the corresponding column of the original matrix is added to each element of the restored matrix to obtain the matrix with the averaging undone.
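The projection and the two restoration sub-stages can be sketched together as follows. The function name is hypothetical, and the example uses np.linalg.eigh only as a stand-in source of orthonormal eigenvectors for brevity; the method itself obtains E_k from the sparse matrix transformation applied to the perturbed covariance matrix.

```python
import numpy as np

def project_and_restore(Xc, col_means, E_k):
    """Project the centered block onto E_k, then map back and add the means."""
    Y = Xc @ E_k                      # m x k low-dimensional matrix, (E_k^T Xc^T)^T
    X_hat = Y @ E_k.T + col_means     # restore to m x p, then undo the averaging
    return Y, X_hat

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))
col_means = X.mean(axis=0)
Xc = X - col_means
# Stand-in eigenvectors (assumption for this example only).
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / 20)
E_k = eigvecs[:, ::-1][:, :2]         # two leading eigenvectors
Y, X_hat = project_and_restore(Xc, col_means, E_k)
```

With k = p the restoration is exact; with k < p the restored block is an approximation whose error depends on the discarded eigenvalues.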
(7) Data release stage: the m-row, p-column block matrices restored from all the privacy-noise-perturbed low-dimensional matrices are spliced into the privacy-protected, noise-added matrix of m rows and n columns; the corresponding attribute names are added to the header of the data table, and the columns are rearranged back into the attribute order of the original data table to form a complete data table. This yields the privacy version of the data for external release.
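The splicing and reordering can be sketched as follows, assuming order[c] holds the original column index of sorted column c and that padding columns sit at the end of the last block. The function and variable names are hypothetical.

```python
import numpy as np

def assemble_release(restored_blocks, order, n, names):
    """Splice blocks, drop zero-padding columns, and undo the sensitivity sort."""
    stacked = np.hstack(restored_blocks)[:, :n]   # keep the n real columns
    released = np.empty_like(stacked)
    released[:, order] = stacked                  # sorted column c -> original order[c]
    return names, released

# Round trip on unperturbed data: sort, block with p = 2, pad, then reassemble.
X = np.arange(6.0).reshape(2, 3)
order = [2, 0, 1]
X_sorted = X[:, order]
block0 = X_sorted[:, :2]
block1 = np.hstack([X_sorted[:, 2:], np.zeros((2, 1))])
header, out = assemble_release([block0, block1], order, 3, ["a", "b", "c"])
```

In the actual pipeline the blocks passed in are the restored, noise-perturbed blocks, so the output differs from the original data while keeping its shape and attribute order.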
Fig. 5 compares a conventional differential privacy Laplace noise mechanism with the block sparse matrix transformation-based high-dimensional sensitive data privacy protection publishing method provided by the present invention: with the privacy protection budget of the same attribute held equal in both methods, the approximate error rates are compared.
Fig. 6 shows a comparative experiment provided by an embodiment of the present invention, which examines the influence of blocking on the processing speed of the privacy computation. Specifically, all other variables are held constant, and only the variable "whether a partitioning scheme is adopted" is changed, so as to compare the operating efficiency of the two cases.
TABLE 1 Approximate error rate comparison
Privacy protection budget | Error rate of the conventional Laplace method | Error rate of the invention
0.1 | 1670% | 152%
0.5 | 580% | 98%
1 | 180% | 46%
10 | 35% | 5%
100 | 12% | 0.90%
TABLE 2 Run-time comparison of block matrices
Number of blocks | Run time of the blocking method of the invention (10^-4 seconds)
5 | 26
6 | 20
7 | 18
8 | 16
9 | 22
10 | 26
By contrast, the run time of the conventional non-partitioned matrix is 1893 × 10^-4 seconds, from which it can be concluded that the blocking scheme greatly improves the processing efficiency of the privacy protection computation.
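The size of the improvement can be read off directly from the reported figures. The short calculation below only divides the non-partitioned run time by each entry of Table 2; the variable names are illustrative.

```python
baseline = 1893  # non-partitioned run time, in units of 1e-4 s (from the text)
blocked = {5: 26, 6: 20, 7: 18, 8: 16, 9: 22, 10: 26}  # Table 2, same units

# Speedup factor of the blocking scheme for each number of blocks.
speedup = {blocks: baseline / t for blocks, t in blocked.items()}
best = max(speedup, key=speedup.get)
```

On these figures the blocked runs are roughly 73 to 118 times faster than the non-partitioned run, with the shortest run time at 8 blocks.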
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, they may take the form of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are wholly or partially generated. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
The above description merely illustrates the present invention and is not to be construed as limiting its scope, which is intended to cover all modifications, equivalents and improvements within the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A classified protection and release method for privacy of high-dimensional sensitive data is characterized by comprising the following steps:
receiving m data samples in total of input original data, wherein the dimension of each sample is n-dimension, and the privacy protection parameter needing to be set is epsilon; outputting data of the privacy protection version as m data samples, wherein the dimension of each sample is n-dimension, and taking the disturbance data of the privacy protection version as the data which is finally published to the outside;
the classified protection and release method for the privacy of the high-dimensional sensitive data comprises seven major stages: the method comprises a data acquisition stage, a data preprocessing stage, a data privacy protection level evaluation stage, a data disturbance stage, a data transformation stage, a data restoration stage and a data release stage;
the privacy classified protection publishing method of the high-dimensional sensitive data comprises the following steps:
step one, data acquisition: the issuing party performs data acquisition and selects a proper total privacy protection parameter epsilon;
step two, data preprocessing: after the publisher finishes data acquisition, data preprocessing is carried out; each sample is scanned in turn, and if any of the n attribute values of a sample is null, that attribute is filled with 0 so that every one of the n dimensions has a numerical value; all data are integrated and arranged into a single matrix of m rows and n columns, i.e., the matrix has m samples and dimension n;
step three, evaluating the data privacy protection level: the data publisher evaluates the privacy protection level of the data attributes, and then rearranges the attributes; dividing the data set into relatively low-dimensional block matrixes, wherein after the data set is divided into blocks, the dimension of each block matrix is p, and the sensitivity level of the whole block matrix is marked;
step four, data disturbance: adding Wishart privacy noise disturbance to a covariance matrix of a block matrix by a publisher;
step five, data transformation: obtaining a characteristic vector matrix and a corresponding characteristic value diagonal matrix by using a sparse matrix transformation method, and taking characteristic vectors corresponding to the first k maximum values of the characteristic value diagonal matrix; transposing the characteristic vector matrix and multiplying the transposed characteristic vector matrix and the averaged matrix to obtain a k-dimensional low-dimensional matrix, wherein k is less than or equal to p;
step six, data restoration: restoring the low-dimensional matrix to further restore the matrix before equalization;
step seven, data release: splicing m rows and p columns of block matrixes formed by restoring all privacy noise disturbed low-dimensional matrixes to form m rows and n columns of matrixes with privacy protection and noise, and adding corresponding attribute names to the head of a data table; and adjusting and restoring the attribute arrangement sequence of the original data table to form a complete data table and form data of the external release privacy version.
2. The privacy-based hierarchical protection and release method for high-dimensional sensitive data according to claim 1, wherein the data acquisition phase comprises a data collection phase and a parameter selection phase;
a data collection stage, namely collecting the data of each individual sample, determining specific values of n attributes in each sample, and updating and calculating the value m of the total number of the individual samples in the data set in real time;
in the parameter selection stage, a privacy protection parameter epsilon represents the budget of privacy allocation, and if epsilon is larger, the smaller the privacy protection degree is, and the stronger the data availability is; if epsilon is smaller, the greater the degree of privacy protection, the poorer the usability of data; therefore, the value of epsilon needs to be continuously adjusted according to the actual privacy protection requirement, and finally, a proper total privacy protection parameter epsilon is determined according to the actual privacy protection requirement.
3. The method for privacy-preserving and-protecting issuance of high-dimensional sensitive data according to claim 1, wherein the data privacy-preserving level evaluation phase comprises an attribute sensitivity level evaluation phase, an attribute rearrangement phase, a data set division phase, and a blocking sensitivity level labeling phase;
an attribute sensitivity level evaluation stage, wherein the sensitivity level of each column of attributes is evaluated, and each column is labeled relatively as high, medium or low;
in the attribute rearrangement stage, after the sensitivity levels are marked, the matrix is arranged according to the sensitivity level of each attribute, and if the sensitivity level of the attribute is higher, the arranged dimensionality is closer to the front, and the dimensionality is lower; after all attributes are arranged according to the sensitivity levels, a new matrix with m rows and n columns is formed again;
in the data set division stage, all attributes are arranged according to the sensitivity level to form a new matrix with m rows and n columns, and then blocking is carried out according to the dimension threshold value as p; filling the block matrix with the last block dimension less than p by using a numerical value 0 to form a p-dimension filling matrix, wherein p is less than or equal to 10;
in the blocking sensitivity level marking stage, level marking is carried out on a p-dimensional blocking matrix of m rows and p columns, and since the sensitivity levels of the attributes are sorted and divided according to high, medium and low in the attribute rearranging stage and the data set dividing stage in the last two stages, the integral privacy protection degree of the divided blocking matrix is divided into three levels of high, medium and low;
at this point, a complete m rows and n columns are divided to form a new block matrix X with a plurality of m rows and p columns, and the block matrix is divided into three levels of high, medium and low according to the requirements of different privacy protection degrees.
4. The privacy-preserving publication method for high-dimensional sensitive data according to claim 1, wherein the data disturbance stage comprises: a data averaging stage, a covariance matrix calculation stage, a privacy budget parameter allocation stage, a privacy noise extraction stage and a data privacy noise adding stage;
in the data averaging stage, the mean of the column of the block matrix X in which each value is located is subtracted from that value to form an averaged matrix X;
in the covariance matrix calculation stage, for each p-dimensional block matrix, the covariance matrix A of the averaged X is calculated,
Figure FDA0003820315310000031
in the privacy budget parameter allocation stage, different privacy protection budget parameters are allocated according to the privacy protection level of the different sensitive attributes; for the differential privacy protection technology, the smaller the privacy protection parameter, the higher the corresponding privacy protection level and the lower the data availability; the privacy budget of each attribute is assigned in the ratio 1:9:90 for high-, medium- and low-sensitivity attribute columns; the privacy budgets of all attribute columns in each block matrix are summed to obtain the privacy budget ε_i of the block matrix;
in the privacy noise extraction stage, for a block matrix with m rows and p columns, a positive definite matrix C with m rows and m columns is generated; the positive definite matrix C has m equal eigenvalues, and every eigenvalue is set to
Figure FDA0003820315310000032
a noise sample matrix W with m rows and p columns is drawn from the Wishart distribution W(m+1, C);
and in the data privacy noise adding stage, the noise sample matrix W is added to the covariance matrix A to form the noise-perturbed covariance matrix A', namely A' = A + W.
5. The privacy-preserving and publishing method for high-dimensional sensitive data according to claim 1, wherein the data recovery phase comprises a low-dimensional matrix recovery phase and an averaging recovery phase;
in the low-dimensional matrix restoration stage, the eigenvector matrix is multiplied by the k-dimensional low-dimensional matrix to obtain a restored matrix, which at this point has not yet had the column means added back;
and in the averaging restoration stage, the mean of the corresponding column of the original matrix is added to each element of the restored matrix to obtain the matrix with the averaging undone, and the columns previously filled with 0 in the last block are removed.
6. A privacy-preserving and-publishing system for high-dimensional sensitive data, which implements the privacy-preserving and-publishing method for high-dimensional sensitive data according to any one of claims 1 to 5, wherein the privacy-preserving and-publishing system for high-dimensional sensitive data comprises:
the data acquisition module is used for acquiring data by a publisher and selecting a proper total privacy protection parameter epsilon;
the data preprocessing module is used for preprocessing the data after the publisher finishes data acquisition; each sample is scanned in turn, and if any of the n attribute values of a sample is null, that attribute is filled with 0 so that every one of the n dimensions has a numerical value; all data are integrated and arranged into a single matrix of m rows and n columns, i.e., the matrix has m samples and dimension n;
the data privacy protection level evaluation module is used for evaluating the privacy protection level of the data attribute by the data publisher so as to rearrange the attribute; dividing the data set into relatively low-dimensional block matrixes, wherein after the data set is divided into blocks, the dimension of each block matrix is p, and the sensitivity level of the whole block matrix is marked;
the data disturbance module is used for adding Wishart privacy noise disturbance to the covariance matrix of the block matrix by the publisher;
the data transformation module is used for obtaining a characteristic vector matrix and a corresponding characteristic value diagonal matrix by using a sparse matrix transformation method, and taking characteristic vectors corresponding to the first k maximum values of the characteristic value diagonal matrix; transposing the characteristic vector matrix and multiplying the transposed characteristic vector matrix and the averaged matrix to obtain a k-dimensional low-dimensional matrix, wherein k is less than or equal to p;
the data recovery module is used for recovering the low-dimensional matrix so as to recover the matrix before equalization;
the data publishing module is used for splicing m rows and p columns of block matrixes formed by recovering all low-dimensional matrixes disturbed by privacy noise to form m rows and n columns of matrixes with privacy protection and noise, and adding corresponding attribute names to the head of a data table; and adjusting and restoring the attribute arrangement sequence of the original data table to form a complete data table and form data of the external release privacy version.
7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
data acquisition, namely acquiring data by a publisher, and selecting a proper total privacy protection parameter epsilon;
data preprocessing, namely preprocessing the data after the publisher finishes data acquisition; each sample is scanned in turn, and if any of the n attribute values of a sample is null, that attribute is filled with 0 so that every one of the n dimensions has a numerical value; all data are integrated and arranged into a single matrix of m rows and n columns, i.e., the matrix has m samples and dimension n;
evaluating the privacy protection level of the data, and evaluating the privacy protection level of the data attribute by a data publisher so as to rearrange the attribute; dividing the data set into relatively low-dimensional block matrixes, wherein after the data set is divided into blocks, the dimension of each block matrix is p, and the sensitivity level of the whole block matrix is marked;
data disturbance: adding Wishart privacy noise disturbance to a covariance matrix of a block matrix by a publisher;
data transformation, namely obtaining a characteristic vector matrix and a corresponding characteristic value diagonal matrix by using a sparse matrix transformation method, and taking characteristic vectors corresponding to the first k maximum values of the characteristic value diagonal matrix; transposing the characteristic vector matrix and multiplying the transposed characteristic vector matrix and the averaged matrix to obtain a k-dimensional low-dimensional matrix, wherein k is less than or equal to p;
data recovery, recovering the low-dimensional matrix, and further recovering the matrix before equalization;
data publishing, namely splicing m rows and p columns of block matrixes formed by restoring all privacy noise disturbed low-dimensional matrixes to form m rows and n columns of matrixes for privacy protection and noise addition, and adding corresponding attribute names to a data table header; and adjusting and restoring the attribute arrangement sequence of the original data table to form a complete data table and form data of the external release privacy version.
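By way of a non-limiting example, the preprocessing and level-evaluation steps recited above might be sketched in Python as follows. The function names, the use of `None` to represent a null value, the rule that a block is marked with the highest level among its attributes, and the requirement that n be a multiple of p are all assumptions made for this sketch; the claim itself does not fix them.

```python
import numpy as np


def preprocess(samples):
    """Build the m x n data matrix, filling null (None) attribute values with 0."""
    rows = [[0.0 if value is None else float(value) for value in sample]
            for sample in samples]
    return np.array(rows, dtype=float)


def split_into_blocks(matrix, levels, p):
    """Rearrange attributes by privacy level and cut the matrix into p-column blocks.

    levels[j] is the publisher-assigned privacy level of attribute j
    (a larger number meaning more sensitive, by assumption).
    """
    n = matrix.shape[1]
    levels = np.asarray(levels)
    if levels.shape[0] != n or n % p != 0:
        raise ValueError("need one level per attribute and n divisible by p")

    # Most sensitive attributes first; a stable sort keeps ties in original order.
    order = np.argsort(-levels, kind="stable")

    blocks = []
    for start in range(0, n, p):
        cols = order[start:start + p]
        blocks.append({
            "columns": cols,                    # original column indices
            "data": matrix[:, cols],            # m x p block matrix
            "level": int(levels[cols].max()),   # ASSUMPTION: block level = max level
        })
    return blocks
```

Keeping the original column indices with each block is a design choice of the sketch: it allows the attribute order of the original data table to be restored at the publishing step.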
8. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
data acquisition: the publisher acquires data and selects an appropriate total privacy protection parameter ε;
data preprocessing: after data acquisition is complete, the publisher preprocesses the data; each sample is scanned, and if any of the n attribute values of a sample is null, that value is filled with 0 so that every one of the n attributes has a numerical value; all the data are then arranged into a matrix of m rows and n columns, that is, the matrix has m samples and dimension n;
data privacy protection level evaluation: the data publisher evaluates the privacy protection level of each data attribute and rearranges the attributes accordingly; the data set is divided into relatively low-dimensional block matrices, each block matrix having dimension p after the division, and the sensitivity level of each block matrix as a whole is marked;
data perturbation: the publisher adds Wishart privacy noise perturbation to the covariance matrix of the block matrix;
data transformation: an eigenvector matrix and a corresponding diagonal eigenvalue matrix are obtained by using a sparse matrix transformation method, and the eigenvectors corresponding to the k largest eigenvalues of the diagonal eigenvalue matrix are taken; the eigenvector matrix is transposed and the transposed eigenvector matrix is multiplied by the mean-centered matrix to obtain a k-dimensional low-dimensional matrix, wherein k is less than or equal to p;
data recovery: the low-dimensional matrix is recovered so as to recover the matrix as it was before mean-centering;
data publishing: the m-row, p-column block matrices recovered from all the low-dimensional matrices perturbed by privacy noise are spliced into an m-row, n-column privacy-protected noisy matrix, and the corresponding attribute names are added to the header of the data table; the attribute order of the original data table is restored to form a complete data table, which is the privacy-protected version of the data released externally.
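As a hedged illustration of the data publishing step, the splicing of recovered blocks and the restoration of the original attribute order might look like the following Python sketch. The function name `publish` and its arguments are hypothetical; in particular, the sketch assumes that the original column indices of every block were recorded when the attributes were rearranged.

```python
import numpy as np


def publish(recovered_blocks, column_groups, attribute_names):
    """Splice m x p blocks into an m x n table in the original attribute order.

    recovered_blocks: list of m x p arrays (recovered, noise-perturbed blocks).
    column_groups:    list of index arrays; column_groups[i][j] is the original
                      column index of column j of block i (assumed bookkeeping).
    attribute_names:  the n attribute names in their original order.
    """
    spliced = np.hstack(recovered_blocks)       # m x n, rearranged order
    order = np.concatenate(column_groups)       # original index of each spliced column

    if sorted(order.tolist()) != list(range(len(attribute_names))):
        raise ValueError("column groups must cover every attribute exactly once")

    # Undo the rearrangement: write each spliced column back to its original slot.
    restored = np.empty_like(spliced)
    restored[:, order] = spliced

    header = list(attribute_names)              # table header for the released data
    return header, restored
```

The returned header and matrix together correspond to the complete data table that is released externally.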
9. An information data processing terminal, characterized in that the information data processing terminal is used for implementing the high-dimensional sensitive data privacy classified protection publishing system according to claim 6.
CN202110446261.6A 2021-04-25 2021-04-25 High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment Active CN113254988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110446261.6A CN113254988B (en) 2021-04-25 2021-04-25 High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110446261.6A CN113254988B (en) 2021-04-25 2021-04-25 High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment

Publications (2)

Publication Number Publication Date
CN113254988A CN113254988A (en) 2021-08-13
CN113254988B CN113254988B (en) 2022-10-14

Family

ID=77221455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110446261.6A Active CN113254988B (en) 2021-04-25 2021-04-25 High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment

Country Status (1)

Country Link
CN (1) CN113254988B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115630964B (en) * 2022-12-22 2023-03-10 南京邮电大学 Construction method of high-dimensional private data-oriented correlation data transaction framework
CN116761164B (en) * 2023-08-11 2023-11-14 北京科技大学 Privacy data transmission method and system based on matrix completion
CN117349896B (en) * 2023-12-05 2024-02-06 中国电子科技集团公司第十研究所 Data collection method, analysis method and analysis system based on sensitivity classification
CN117763620B (en) * 2024-02-22 2024-06-04 国网河南省电力公司经济技术研究院 Electric power big data dynamic desensitization method based on isomorphic encryption algorithm
CN118053596B (en) * 2024-03-04 2024-08-06 飞图云科技(山东)有限公司 Intelligent medical platform data management method and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107766745A (en) * 2017-11-14 2018-03-06 广西师范大学 Classification method for secret protection in hierarchical data issue
CN110348238A (en) * 2019-05-28 2019-10-18 北京邮电大学 A kind of application oriented secret protection stage division and device
CN110377605A (en) * 2019-07-24 2019-10-25 贵州大学 A kind of Sensitive Attributes identification of structural data and classification stage division

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US11238397B2 (en) * 2015-02-09 2022-02-01 Fedex Corporate Services, Inc. Methods, apparatus, and systems for generating a corrective pickup notification for a shipped item using a mobile master node
CN105354693A (en) * 2015-11-04 2016-02-24 中南大学 Logistics industry-oriented system and method for cascade protection of user privacy information
CN110020546B (en) * 2019-01-07 2020-12-04 南京邮电大学 Privacy data grading protection method

Non-Patent Citations (3)

Title
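A multi-sensitive-attribute privacy protection method based on attribute classification; 王茜 et al.; 《计算机工程》; 20130815 (No. 08); pp. 183-186 *
A differential privacy data publishing algorithm for principal component analysis based on the maximal information coefficient; 彭长根 et al.; 《信息网络安全》; 20200210 (No. 02); pp. 41-52 *
Research on selecting the differential privacy protection parameter ε; 何贤芒 et al.; 《通信学报》; 20151225 (No. 12); pp. 128-134 *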

Also Published As

Publication number Publication date
CN113254988A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN113254988B (en) High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment
CN109983467B (en) System and method for anonymizing data sets
CN110378148B (en) Multi-domain data privacy protection method facing cloud platform
WO2016138606A1 (en) Secure account management using tokens
CN109726758B (en) Data fusion issuing algorithm based on differential privacy
CN112632077A (en) Data storage method, device, equipment and storage medium based on redis
CN114328640A (en) Differential privacy protection and data mining method and system based on mobile user dynamic sensitive data
CN111835776A (en) Network traffic data privacy protection method and system
Qu et al. Privacy preserving in big data sets through multiple shuffle
Mungamuru et al. Privacy, Preservation and Performance: The 3 P's of Distributed Data Management
CN106469042A (en) The generation method of pseudo random number and device
CN112367396B (en) Method and device for determining sample characteristic quantile in distributed cluster
Hu et al. Decentralized Matrix Factorization with Heterogeneous Differential Privacy
Ling et al. An Improved K-means Algorithm Based on Multiple Clustering and Density
CN113343242A (en) Malicious Android application online detection method and device
Zdunek et al. Distributed geometric nonnegative matrix factorization and hierarchical alternating least squares–based nonnegative tensor factorization with the MapReduce paradigm
Memon et al. A parallel method for scalable anonymization of transaction data
Qian et al. Integer-granularity locality-sensitive bloom filter
Aggarwal Privacy and the dimensionality curse
Xu et al. SparseBF: Enhancing Scalability and Efficiency for Sparsely Filled Privacy-Preserving Record Linkage
Aggarwal On the analytical properties of high-dimensional randomization
CN114741726B (en) Data processing method and device based on privacy protection and electronic equipment
KR102648905B1 (en) Method and device for privacy-constrained data perturbation
CN114817977B (en) Anonymous protection method based on sensitive attribute value constraint
WO2024047735A1 (en) Fairness evaluation program, fairness evaluation method, and information processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant