CN110443068B - Privacy protection method and device - Google Patents

Privacy protection method and device

Info

Publication number
CN110443068B
Authority
CN
China
Prior art keywords
tables
merging
original data
attribute
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910709096.1A
Other languages
Chinese (zh)
Other versions
CN110443068A (en)
Inventor
喻民
黄伟庆
夏剑锋
刘超
姜建国
李敏
安韶华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201910709096.1A
Publication of CN110443068A
Application granted
Publication of CN110443068B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The embodiment of the invention provides a privacy protection method and device. The privacy protection method comprises the following steps: merging original data tables related to user sensitive information according to the key values associated between the original data tables to obtain a merged table; and grouping the data records in the merged table according to the key value names of the sensitive attributes in the merged table, and decomposing the merged table according to the grouping result to obtain a plurality of published data tables. In the privacy protection method and device provided by the embodiment of the invention, a plurality of data tables containing user-associated information in the same database are analyzed as a whole to obtain the merged table, and the merged table is decomposed using relevant database theory and a multidimensional sensitive attribute privacy protection method to obtain a plurality of published data tables. Privacy protection for multiple tables and multiple privacy attributes can thus be realized, which effectively improves the privacy protection effect, the data processing efficiency, and the data availability.

Description

Privacy protection method and device
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a privacy protection method and device.
Background
With the development of informatization, a high-level data collection and sharing mechanism provides great convenience for various data mining works, and meanwhile, the risk of personal privacy information disclosure is increased. In order to reduce the risk of personal privacy information leakage, the collected data needs to be published or shared after data privacy protection.
Research on data privacy protection has two main goals. First, after data is released or shared, an attacker should be prevented from obtaining sensitive attributes related to an individual or from revealing the identity of a specific individual. Second, the data privacy protection technology adopted should affect the original data as little as possible and preserve high data availability.
At present, most of privacy protection for relational data adopts an anonymous privacy protection technology, and the anonymous privacy protection technology is mainly divided into the following technologies: anonymity based on generalization, anonymity based on clustering, and anonymity based on data decomposition.
The principle of anonymous privacy protection based on the generalization technology is to generalize the quasi-identifier attributes, and uniformly replace each quasi-identifier attribute value in the original data record with the generalized value, so that the one-to-one corresponding relationship between the quasi-identifier attributes and the sensitive attributes in the original data table is changed into a one-to-many relationship, and the purpose of privacy protection is achieved.
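As a minimal illustration of the generalization idea described above, the following Python sketch replaces an exact age with a ten-year interval. The function name `generalize_age`, the interval width, and the sample records are assumptions made for this example and do not come from the patent.

```python
def generalize_age(age, width=10):
    """Replace an exact age with the interval of the given width that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical records: "age" is a quasi-identifier, "disease" is sensitive.
records = [
    {"age": 31, "disease": "flu"},
    {"age": 38, "disease": "asthma"},
]

# After generalization both records carry the same quasi-identifier value,
# so that value now corresponds to more than one sensitive value.
generalized = [
    {"age": generalize_age(r["age"]), "disease": r["disease"]} for r in records
]
```

Both sample records end up with the same generalized age, which is the one-to-many relationship the paragraph describes.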
Clustering-based anonymity clusters the records on the quasi-identifier attributes and replaces the original quasi-identifier attribute values with the centroid of each cluster, so that the relationship between the quasi-identifier attributes and the sensitive attributes changes from one-to-one to one-to-many. Clustering is essentially the same as generalization in that the quasi-identifier attribute values are replaced by a common value; only the technical means differ.
Anonymity based on data decomposition splits the quasi-identifier attributes and the sensitive attributes into two different tables for publication, cutting off the direct association between them in a single step to achieve privacy protection. Data decomposition relies on the fact that, after a data record table is vertically decomposed, if no functional dependency between the resulting tables is described, each data record can be connected to multiple possible counterparts, so that the link between the sensitive attributes and individual users is severed and privacy is protected.
Most existing privacy protection algorithms suffer from high computational complexity and poor data availability. Existing privacy protection technology mostly uses generalization-based anonymity, the core of which is the data grouping algorithm. A generalization-based technique must generalize the quasi-identifier attributes before grouping the data, so the search space of the algorithm is large and its complexity is high, and generalization also greatly reduces data availability. In practice, data is collected and stored continuously, so data sets are very large; high algorithmic complexity lowers processing efficiency, while low data availability reduces the value of the data and hinders its use.
In addition, existing privacy protection technology considers only the privacy risk that arises when multiple related data records of a user exist in a single data table in a database; it cannot address the privacy disclosure caused by related records of that user in other data tables in the database. Personal and user data is usually stored in a database as data records in data tables. In the same database, besides the personal information table directly related to users, there are other data tables indirectly related to individuals, and these tables are linked to one another through associated key values. In this case, association analysis across the tables puts user privacy at great risk of disclosure, so protecting data privacy in this setting is an urgent problem.
Disclosure of Invention
In view of the problems existing in the prior art, the embodiments of the present invention provide a privacy protecting method and apparatus that overcome the above problems or at least partially solve the above problems.
In a first aspect, an embodiment of the present invention provides a privacy protection method, including:
merging original data tables related to user sensitive information according to associated key values among the original data tables to obtain a merged table;
and grouping the data records in the merged table according to the key value names of the sensitive attributes in the merged table, and decomposing the merged table according to the grouping result to obtain a plurality of published data tables.
In a second aspect, an embodiment of the present invention provides a privacy protecting apparatus, including:
the table merging module is used for merging original data tables related to the user sensitive information according to the associated key values among the original data tables to obtain a merged table;
and the table decomposition module is used for grouping the data records in the merged table according to the key value names of the sensitive attributes in the merged table, and decomposing the merged table according to the grouping result to obtain a plurality of published data tables.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the privacy protecting method provided by any of the various possible implementations of the first aspect.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a privacy protection method according to any one of the various possible implementations of the first aspect.
In the privacy protection method and device provided by the embodiment of the invention, a plurality of data tables containing user-associated information in the same database are analyzed as a whole to obtain the merged table, and the merged table is decomposed using relevant database theory and a multidimensional sensitive attribute privacy protection method to obtain a plurality of published data tables. Privacy protection for multiple tables and multiple privacy attributes can thus be realized, which effectively improves the privacy protection effect for multiple tables in the database, the data processing efficiency, and the data availability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a privacy protection method according to an embodiment of the present invention;
Fig. 2 is a functional block diagram of a privacy protecting apparatus provided according to an embodiment of the present invention;
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In order to overcome the above problems in the prior art, embodiments of the present invention provide a privacy protection method and apparatus, and the inventive concept is to obtain multiple release data tables by combining and re-decomposing multiple data tables in order to solve the problem of privacy disclosure caused by the presence of user-related data records in multiple data tables in a database, so that it is difficult to obtain sensitive information of a user according to the release data tables.
Fig. 1 is a schematic flowchart of a privacy protection method according to an embodiment of the present invention. As shown in Fig. 1, the privacy protection method includes: step S101, merging original data tables related to user sensitive information according to the key values associated between the original data tables to obtain a merged table.
It should be noted that the original data table is an original data table in the database. Because the original data table related to the user sensitive information contains the user sensitive information, the privacy of the user can be leaked when the original data table is directly published, so that the privacy protection needs to be carried out on the original data table, a plurality of published data tables are obtained, and the plurality of published data tables are used for data publishing or sharing.
A data table is generally designed with four types of attributes. That is, each key value name of a data table belongs to one of the following four types: identifier attributes, quasi-identifier attributes, sensitive attributes, and other attributes. An identifier attribute is an attribute that identifies an individual, such as a user number. A quasi-identifier attribute is an attribute that can identify an individual when linked with an external data source, such as name, age, or city. A sensitive attribute is an attribute that contains private information about an individual and requires privacy protection, such as a mobile phone number, the contents of a parcel, a disease the person suffers from, or a home address. Other attributes are those that belong to none of the three types above.
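The four attribute types can be sketched in Python as an enumeration plus a lookup table. The column names and their assignments below are hypothetical, chosen only to mirror the examples in the text; in practice the mapping would have to be supplied by whoever knows the schema.

```python
from enum import Enum


class AttributeKind(Enum):
    IDENTIFIER = "identifier"
    QUASI_IDENTIFIER = "quasi-identifier"
    SENSITIVE = "sensitive"
    OTHER = "other"


# Hypothetical annotation of key value names (column names).
ATTRIBUTE_KINDS = {
    "user_id": AttributeKind.IDENTIFIER,
    "name": AttributeKind.QUASI_IDENTIFIER,
    "age": AttributeKind.QUASI_IDENTIFIER,
    "city": AttributeKind.QUASI_IDENTIFIER,
    "phone": AttributeKind.SENSITIVE,
    "disease": AttributeKind.SENSITIVE,
}


def classify(key_name):
    """Return the attribute type of a key value name; unlisted names are OTHER."""
    return ATTRIBUTE_KINDS.get(key_name, AttributeKind.OTHER)


def sensitive_keys(key_names):
    """Return the key value names in a table that are sensitive attributes."""
    return [k for k in key_names if classify(k) is AttributeKind.SENSITIVE]
```

A table whose `sensitive_keys` list is non-empty would count as an original data table related to user sensitive information in the sense used below.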
The original data table related to the user sensitive information refers to an original data table containing the user sensitive information. The key value name of the original data table related to the sensitive information of the user comprises the key value name of the sensitive attribute.
Specifically, a plurality of original data tables related to user sensitive information are merged to obtain a merged table.
It can be understood that, for any two original data tables, the association relationship between the two data records in the two tables can be determined according to the same key value name and the same key value in the two tables. The data records in the merged table can be obtained through the connection of the data records with the incidence relation in different original data tables, so that the merged table is obtained.
It should be noted that, for the original data table, the merge table, or the release data table, if the key name is a column attribute, the data record refers to a row in the data table.
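The connection of associated data records described above resembles an inner join on a shared key value name. The following Python sketch shows one way this could look, assuming each table is a list of dictionaries; the function `join_tables`, the table contents, and the column names are illustrative assumptions rather than the patent's implementation.

```python
def join_tables(left, right, key):
    """Inner-join two lists of dict records on a shared key value name."""
    # Index the right-hand table by key value for fast lookup.
    index = {}
    for record in right:
        index.setdefault(record[key], []).append(record)

    merged = []
    for left_record in left:
        for right_record in index.get(left_record[key], []):
            row = dict(left_record)
            # Copy every column of the matching record except the join key.
            row.update({k: v for k, v in right_record.items() if k != key})
            merged.append(row)
    return merged


# Hypothetical tables associated through the key value name "order_id".
users = [
    {"user_id": 1, "age": 30, "order_id": "A1"},
    {"user_id": 2, "age": 41, "order_id": "B7"},
]
parcels = [
    {"order_id": "A1", "contents": "medicine"},
    {"order_id": "B7", "contents": "books"},
]

merged = join_tables(users, parcels, "order_id")
```

Each row of `merged` corresponds to one data record of the merged table. The sketch assumes the two tables share no column names other than the join key.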
Step S102: grouping the data records in the merged table according to the key value names of the sensitive attributes in the merged table, and decomposing the merged table according to the grouping result to obtain a plurality of published data tables.
Specifically, by merging the plurality of original data tables related to user sensitive information into a merged table, privacy protection of the original data tables is converted into privacy protection of the merged table.
Because multiple original data tables related to user sensitive information have multiple sensitive attributes, multiple sensitive attributes also exist in the merged table, which is equivalent to converting the privacy protection problem of multiple tables into the privacy protection problem of multiple sensitive attributes.
Following a multi-sensitive-attribute privacy protection method, anonymization based on data decomposition is applied to the merged table: the merged table is decomposed into a plurality of published data tables such that every sensitive attribute is in a different published data table from the quasi-identifier attributes, and different sensitive attributes are in different published data tables, which cuts off the direct association between the sensitive attributes and the quasi-identifier attributes. The embodiment of the present invention does not specifically limit which multi-sensitive-attribute privacy protection method is adopted.
Before the merging table is decomposed, the data records in the merging table may be grouped based on a multi-sensitive attribute privacy protection method, and a grouping result is obtained, that is, a group to which each data record in the merging table belongs is determined, and the group to which the data record belongs may be represented by a group number.
When the merged table is decomposed, the group number of each data record may also be added to each published data table as a key value; that is, the published data tables can be associated with one another through the group number.
It should be noted that the number of the groups is much smaller than the number of the data records in the merge table, the group number and the data records are not in one-to-one correspondence, even if all or most of the release data tables are obtained, only one-to-many relationship between the sensitive attribute and the quasi-identifier attribute can be obtained according to the group number and the obtained release data tables, and the one-to-one relationship between the sensitive attribute and the quasi-identifier attribute in the original data table cannot be restored, so that privacy protection can be realized.
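The one-to-many relationship can be illustrated with a small Python sketch. The two published tables, the column name `group`, and the helper `candidates` are hypothetical; the point is only that joining on the group number yields a set of possible sensitive values rather than a single one.

```python
# Hypothetical published tables that share only a group number.
qi_table = [
    {"group": 1, "name": "Ann", "age": 30},
    {"group": 1, "name": "Bob", "age": 41},
    {"group": 2, "name": "Cai", "age": 25},
    {"group": 2, "name": "Dee", "age": 37},
]
sensitive_table = [
    {"group": 1, "disease": "flu"},
    {"group": 1, "disease": "asthma"},
    {"group": 2, "disease": "diabetes"},
    {"group": 2, "disease": "flu"},
]


def candidates(qi_row, sensitive_rows, attribute):
    """Return every sensitive value that could belong to the given record."""
    return sorted(
        {row[attribute] for row in sensitive_rows if row["group"] == qi_row["group"]}
    )


# An attacker holding both tables can narrow Ann down only to her group.
ann_candidates = candidates(qi_table[0], sensitive_table, "disease")
```

Here `ann_candidates` contains two values, so the original one-to-one link between Ann and her record cannot be recovered from the group number alone.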
In the embodiment of the invention, a plurality of data tables containing user-associated information in the same database are analyzed as a whole to obtain the merged table, and the merged table is decomposed using relevant database theory and a multidimensional sensitive attribute privacy protection method to obtain a plurality of published data tables. Privacy protection for multiple tables and multiple privacy attributes can thus be realized, which effectively improves the privacy protection effect for multiple tables in the database, the data processing efficiency, and the data availability.
Based on the content of the foregoing embodiments, before merging the original data tables related to the user sensitive information according to the key values associated between the original data tables, the method further includes: a table of raw data associated with user sensitive information is determined.
Specifically, for an original data table in the current database, if it is determined that the key value name of the original data table includes a key value name of a sensitive attribute, the original data table is determined to be an original data table related to user sensitive information.
By judging each original data table in the current database, all original data tables in the database related to the user sensitive information can be determined.
According to the embodiment of the invention, all the original data tables related to the user sensitive information in the database are determined, so that all the original data tables related to the user sensitive information can be analyzed as a whole, and privacy protection of multiple tables and multiple privacy attributes can be realized.
Based on the content of the foregoing embodiments, the specific steps of determining the original data tables related to user sensitive information include: determining, as the primary table, an original data table whose key value names include an identifier attribute, at least one quasi-identifier attribute, and at least one sensitive attribute; for each original data table other than the primary table, if its key value names include at least one key value name that is the same as a key value name in the primary table and include at least one sensitive attribute, determining that original data table as a secondary table; and determining the primary table and the secondary tables as the original data tables related to user sensitive information.
Specifically, for each original data table in the current database, if the key value names of the original data table include an identifier attribute, at least one quasi-identifier attribute, and at least one sensitive attribute, the original data table is determined to be the primary table.
The key name of the primary table includes a sensitive attribute, which may be a primary sensitive attribute of the user.
For each original data table other than the primary table, if the key value names of the original data table include a key value name that is a sensitive attribute, the data in that table contains sensitive attributes. If its key value names further include at least one key value name that is the same as a key value name in the primary table, this indicates that an association exists between the original data table and the primary table, and the original data table can be used as a secondary table. Because of this association, the primary table and the secondary table can be treated as a whole when considering the privacy protection problem.
It can be understood that, if the key value name of the original data table includes a key value name that is a sensitive attribute, but there is no key value name that is the same as the key value name included in the primary table, it means that although the data in the original data table includes a sensitive attribute, since there is no association between the original data table and the primary table, the original data table, the primary table and the secondary table cannot be considered as a whole for the privacy protection problem, and the original data table and other original data tables related to user sensitive information that have an association with the original data table may be considered as a whole for the privacy protection problem.
After the primary table and the secondary table are determined, the primary table and the secondary tables can be determined as original data tables related to user sensitive information, and the primary table and the secondary tables are subjected to overall analysis to achieve privacy protection.
According to the embodiment of the invention, the primary table and the secondary tables with the association relation are determined, the primary table and each secondary table are determined as the original data tables related to the user sensitive information, the original data tables related to the user sensitive information with the association relation can be taken as a whole to consider the privacy protection problem, and therefore the privacy protection of multiple tables and multiple privacy attributes can be realized.
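The rules for choosing the primary and secondary tables can be sketched in Python as follows. The function `find_primary_and_secondary`, the table names, the column names, and the string labels for attribute types are all assumptions made for illustration; the sketch also assumes the first table that qualifies is taken as the primary table, which the text does not specify.

```python
def find_primary_and_secondary(tables, kinds):
    """tables maps table name -> key value names; kinds maps key value name -> type."""

    def has(keys, kind):
        return any(kinds.get(k) == kind for k in keys)

    # Primary table: identifier + quasi-identifier + sensitive attribute.
    primary = None
    for name, keys in tables.items():
        if has(keys, "identifier") and has(keys, "quasi-identifier") and has(keys, "sensitive"):
            primary = name
            break
    if primary is None:
        return None, []

    # Secondary tables: share a key value name with the primary table
    # and contain at least one sensitive attribute.
    primary_keys = set(tables[primary])
    secondary = [
        name
        for name, keys in tables.items()
        if name != primary and has(keys, "sensitive") and primary_keys & set(keys)
    ]
    return primary, secondary


# Hypothetical schema.
tables = {
    "users": ["user_id", "name", "age", "order_id", "phone"],
    "parcels": ["order_id", "contents"],
    "clinic": ["visit_id", "disease"],
    "logs": ["order_id", "timestamp"],
}
kinds = {
    "user_id": "identifier",
    "name": "quasi-identifier",
    "age": "quasi-identifier",
    "phone": "sensitive",
    "contents": "sensitive",
    "disease": "sensitive",
}

primary, secondary = find_primary_and_secondary(tables, kinds)
```

In this example `clinic` holds a sensitive attribute but shares no key value name with `users`, and `logs` shares a key value name but holds no sensitive attribute, so neither becomes a secondary table.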
Based on the content of the foregoing embodiments, the specific steps of merging the original data tables related to user sensitive information according to the key values associated between the original data tables to obtain the merged table include: for each secondary table, taking the same key values included in both the secondary table and the primary table as the key values associated between the secondary table and the primary table; and connecting the data records that contain the same key values to obtain the merged table.
Specifically, consider a relational data set $D = \{T_1, T_2, \ldots, T_g\}$, which contains a plurality of original data tables.
The primary table $T^*$ can be expressed as
[Formula images BDA0002153105410000071 and BDA0002153105410000072 are not reproduced in this text.]
where $tid^*$ represents the number of a data record; user_id represents the user's identification number; $A_i$ ($1 \le i \le p$) are the quasi-identifier attributes; $C_i$ ($1 \le i \le f$) are the key value names that associate the primary table $T^*$ with the secondary tables $T_i$; the attributes given in formula image BDA0002153105410000073 are the sensitive attributes; and $|T^*| = n$ indicates that the data table $T^*$ comprises $n$ data records. Each data record is represented as in formula image BDA0002153105410000074, in which the symbol given in formula image BDA0002153105410000075 represents the attribute value of the i-th attribute in the i-th data record.
Here, the number of a record refers to the number of the data record; the identification number of the user refers to the user number; and the attribute value of the i-th attribute refers to the key value corresponding to the i-th key value name.
The i-th secondary table $T_i$ can be expressed as $\{tid, C_i, S_1, S_2, \ldots, S_j\}$, where $tid$ represents the number of a data record; $C_i$ ($1 \le i \le f$) represents the key value name associated with the primary table; and $S_i$ ($1 \le i \le j$) represents a secondary sensitive attribute associated with the primary table.
For the primary table and any one secondary table, the key value name associated with the primary table and the secondary table exists in the key value name of the primary table and the key value name of the secondary table. That is, the key name is included in both the primary table and the secondary table.
For each data record in the primary table, the same key values are determined according to the key value names that associate the primary table with each secondary table. If the key values under an associated key value name are the same in the primary table and a secondary table, the data records containing those key values are associated: they record information about the same user and can be connected and combined into one data record in the merged table. That is, the key values of each secondary table are added to the corresponding data record of the primary table.
Because the key value names in the merged table include a plurality of sensitive attributes, a multi-sensitive-attribute privacy protection method can be adopted to protect the privacy of the merged table.
According to the same key values included by the auxiliary table and the main table, the data records in which the same key values are located are connected to obtain the merging table, all original data tables related to user sensitive information can be analyzed as a whole, and privacy protection of multiple tables and multiple privacy attributes can be achieved.
Based on the content of the foregoing embodiments, the step of connecting the data records that contain the same key values to obtain the merged table further includes: deleting the same key values from the data records.
It should be noted that the primary table and the secondary tables are associated through the same key value names. If the merged table obtained by connecting the primary table and each secondary table were anonymized directly by data decomposition, the one-to-one relationship between the quasi-identifier attributes and the sensitive attributes in the merged table could be restored through those shared key value names, so a risk of privacy disclosure would remain.
Thus, the obtained merged table T can be expressed as
[Formula images BDA0002153105410000081 and BDA0002153105410000082 are not reproduced in this text.]
Deleting from the data records of the merged table the key values shared by the primary table and the secondary tables prevents the one-to-one relationship between the quasi-identifier attributes and the sensitive attributes from being recovered from the published data tables through those shared key values, so that privacy protection can be realized.
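A rough Python sketch of merging the primary table with each secondary table and then removing the association keys is given below. The function `merge_and_drop_keys`, the sample tables, and the column names are hypothetical; the sketch also assumes each association key value appears at most once in a secondary table and that the secondary tables carry no record-number column of their own.

```python
def merge_and_drop_keys(primary, secondaries):
    """Merge primary records with each (records, key_name) pair, then drop the keys."""
    merged = [dict(record) for record in primary]
    for records, key in secondaries:
        # Assumption: each key value appears at most once in a secondary table.
        index = {record[key]: record for record in records}
        joined = []
        for row in merged:
            match = index.get(row[key])
            if match is None:
                continue  # no associated record in this secondary table
            combined = dict(row)
            combined.update({k: v for k, v in match.items() if k != key})
            joined.append(combined)
        merged = joined

    # Delete the association keys so they cannot be used to re-link tables.
    association_keys = {key for _, key in secondaries}
    return [
        {k: v for k, v in row.items() if k not in association_keys} for row in merged
    ]


# Hypothetical primary table and two secondary tables.
primary = [
    {"tid": 1, "user_id": "u1", "age": 30, "order_id": "A1",
     "visit_id": "V9", "phone": "555-0101"},
]
parcels = [{"order_id": "A1", "contents": "medicine"}]
clinic = [{"visit_id": "V9", "disease": "flu"}]

merged_table = merge_and_drop_keys(
    primary, [(parcels, "order_id"), (clinic, "visit_id")]
)
```

The resulting record keeps the identifier, quasi-identifier, and sensitive columns, but no longer contains `order_id` or `visit_id`.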
Based on the content of the foregoing embodiments, the specific step of grouping the data records in the merge table according to the key value name of the sensitive attribute in the merge table includes: taking the key value name which is sensitive attribute in the merging table as one dimension of the multidimensional bucket space, and converting the merging table into the multidimensional bucket space; the data records in the merge table are grouped such that for each sub-bucket in the multidimensional bucket space, the sub-bucket includes data records that are in different groups.
Specifically, the data records in the merge table may be grouped based on multidimensional bucket grouping theory, and a grouping number of each data record may be determined.
Each sensitive attribute (i.e., the key value name of the sensitive attribute) can be used as one dimension of the multidimensional bucket space, and each data record in the merge table is mapped to one point in the multidimensional bucket space, so that the conversion (mapping) from the merge table to the multidimensional bucket space is realized.
The sensitive attribute of each dimension is divided into a plurality of intervals according to its key values, and each subspace formed by intersecting one interval from every dimension is determined as a sub-bucket, so that the multidimensional bucket space is divided into a plurality of sub-buckets.
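For numeric sensitive attributes, mapping a record to its sub-bucket can be sketched with Python's standard `bisect` module. The attribute names `salary` and `debt`, the cut points, and the function `sub_bucket` are assumptions for this example; the patent does not state how the intervals are chosen.

```python
from bisect import bisect_right


def sub_bucket(record, boundaries):
    """Return the sub-bucket coordinates of a record.

    boundaries maps each sensitive key value name to a sorted list of cut
    points; the coordinate in a dimension is the index of the interval
    that contains the record's value.
    """
    return tuple(bisect_right(cuts, record[key]) for key, cuts in boundaries.items())


# Hypothetical two-dimensional bucket space:
# salary intervals (-inf, 3000), [3000, 6000), [6000, +inf); debt intervals split at 1000.
boundaries = {"salary": [3000, 6000], "debt": [1000]}

coords = sub_bucket({"salary": 4500, "debt": 200}, boundaries)
```

Records with the same coordinates fall into the same sub-bucket. For categorical sensitive attributes, the attribute value itself could serve as the coordinate instead.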
The number of groupings is determined based on a maximum number of data records included in a sub-bucket in the multi-dimensional bucket space. When grouping data records, the following principles are followed: the data records included in the same sub-bucket are located in different packets to control the frequency of occurrence of sensitive attributes in each packet. Also, each group includes a plurality of data records, and the number of data records included in each group may not be equal.
The embodiment of the invention groups the data records in the merging table based on the multidimensional bucket grouping theory, and can control the occurrence frequency of the sensitive attribute in each group, so that the problem of privacy disclosure of a plurality of published data tables obtained by decomposing the merging table according to the grouping result can be avoided, and the privacy of a user can be effectively protected.
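The grouping principle described above can be sketched in Python. This is a minimal illustration and not the embodiment's actual algorithm: it assumes that every distinct key value forms its own interval, so that a sub-bucket is identified by the tuple of sensitive key values, and the function name `group_by_buckets`, the field names, and the greedy smallest-group-first assignment are hypothetical choices made for the example. The sample records follow the express delivery example given later in this description, with record t2 left out.

```python
from collections import defaultdict


def group_by_buckets(records, sensitive_names):
    """Return one group number (starting at 1) per record.

    Assumption: each distinct key value is its own interval, so the tuple of
    sensitive key values identifies the sub-bucket a record falls into.
    """
    if not records:
        return []

    buckets = defaultdict(list)
    for index, record in enumerate(records):
        coordinates = tuple(record[name] for name in sensitive_names)
        buckets[coordinates].append(index)

    # The number of groups equals the size of the fullest sub-bucket.
    group_count = max(len(members) for members in buckets.values())
    group_sizes = [0] * group_count
    group_of = {}

    # Fill from the fullest sub-bucket down; sorted() is stable, so ties keep input order.
    for members in sorted(buckets.values(), key=len, reverse=True):
        used = set()
        for index in members:
            # Pick the smallest group that holds no record from this sub-bucket yet.
            candidates = [g for g in range(group_count) if g not in used]
            chosen = min(candidates, key=lambda g: (group_sizes[g], g))
            used.add(chosen)
            group_sizes[chosen] += 1
            group_of[index] = chosen + 1
    return [group_of[i] for i in range(len(records))]


records = [
    {"record": "t1", "phone": "Num1", "contents": "Document"},
    {"record": "t3", "phone": "Num1", "contents": "Document"},
    {"record": "t4", "phone": "Num2", "contents": "Tea leaves"},
    {"record": "t5", "phone": "Num3", "contents": "Shampoo"},
]
groups = group_by_buckets(records, ["phone", "contents"])
```

For these four records the fullest sub-bucket, (Num1, Document), holds two records, so two groups are created and those two records are placed in different groups.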
Based on the content of the foregoing embodiments, the specific steps of decomposing the merged table according to the grouping result to obtain multiple published data tables include: decomposing the merged table according to the grouping result, so that each published data table includes either a key value name that is a sensitive attribute or the key value names that are the identifier attribute and the quasi-identifier attributes, and the published data tables are associated with one another through the grouping result.
Specifically, the merged table is decomposed so that the one-to-one relationship between the quasi-identifier attribute and the sensitive attribute cannot be obtained from the decomposition result. Therefore, when the merged table is decomposed according to the grouping result (group number), the key value names included in the merged table are divided into a plurality of parts, such that one part includes the key value names that are the identifier attribute and the quasi-identifier attributes, and each of the other parts includes one key value name that is a sensitive attribute.
To minimize the influence on the original data, the merged table can be decomposed with reference to the primary table and the secondary tables, so that the decomposition restores them as closely as possible. Specifically, the key value names of a published data table that includes a sensitive attribute may be determined from the key value names of the secondary table in which that sensitive attribute is located, so that the published data table includes the sensitive attribute together with the key value names of the other attributes of that secondary table. Likewise, the key value names of the published data table that includes the identifier attribute and the quasi-identifier attributes may be determined from the key value names of the primary table, so that this published data table includes the key value names of the identifier attribute, the quasi-identifier attributes and the other attributes of the primary table.
In order to maintain data availability and subsequent data analysis results as much as possible, the group number may be used as a key name for each published data table.
Before the merge table is decomposed, the group number may also be used as a key name of the merge table, and the group number of each data record may be added to the data record as a key.
After the key value names of each published data table are determined, the data records in the merged table may be decomposed. Each data record in the merged table is decomposed into one data record in each published data table; decomposing every data record in the merged table yields the multiple published data tables.
To reduce the hiding rate and the influence of privacy protection on the original data, data records and key values that do not satisfy the data privacy protection condition may be discarded. Such data records and key values may be discarded from the merged table before it is decomposed into the multiple published data tables; alternatively, the merged table may be decomposed first, and the data records and key values that do not satisfy the data privacy protection condition may then be discarded from the decomposition result to obtain the multiple published data tables. If the data privacy protection condition is not met and the corresponding key value name is an other attribute, privacy protection is not needed.
According to the embodiment of the invention, the merged table is decomposed according to the group number into multiple published data tables that closely restore the primary table and each secondary table, so that the hiding rate and the influence of privacy protection on the original data are reduced and the privacy protection effect is improved.
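The decomposition step can be illustrated with a short Python sketch. It is only an assumed, simplified rendering of the description above: the function `decompose`, its parameters and the field names are hypothetical, and the sample rows are taken from the worked example later in this description (weight and express type are omitted for brevity). One published table receives the identifier and quasi-identifier columns, each remaining table receives one sensitive attribute with its accompanying columns, and every table carries the group number.

```python
def decompose(merged_rows, identifying_names, sensitive_name_lists, group_name="group"):
    """Split grouped merged rows into published tables linked only by the group number.

    identifying_names: key value names of the identifier and quasi-identifier attributes.
    sensitive_name_lists: one list of key value names per sensitive attribute.
    """
    identifying_table = [
        {**{name: row[name] for name in identifying_names}, group_name: row[group_name]}
        for row in merged_rows
    ]
    sensitive_tables = [
        [{group_name: row[group_name], **{name: row[name] for name in names}}
         for row in merged_rows]
        for names in sensitive_name_lists
    ]
    return [identifying_table, *sensitive_tables]


merged_rows = [
    {"user": 1, "name": "A", "city": "City 1", "phone": "Num1",
     "order": 49321, "contents": "Document", "group": 1},
    {"user": 3, "name": "C", "city": "City 2", "phone": "Num2",
     "order": 49286, "contents": "Tea leaves", "group": 1},
    {"user": 2, "name": "B", "city": "City 1", "phone": "Num1",
     "order": 49296, "contents": "Document", "group": 2},
    {"user": 4, "name": "D", "city": "City 3", "phone": "Num3",
     "order": 49518, "contents": "Shampoo", "group": 2},
]
published = decompose(
    merged_rows,
    identifying_names=["user", "name", "city"],
    sensitive_name_lists=[["phone"], ["order", "contents"]],
)
```

Each merged row contributes exactly one row to every published table, and no published table holds both a quasi-identifier and a sensitive attribute.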
In order to facilitate understanding of the embodiments of the present invention, the privacy protection method provided by the embodiments of the present invention is described below by way of an example.
The original data tables in the database are shown in Tables 1 and 2.
Table 1 Recipient information table
Record | User number | Name | City | Telephone number | Express bill number
t1 | 1 | A | City 1 | Num1 | K5730
t2 | 1 | A | City 1 | Num1 | K1152
t3 | 2 | B | City 1 | Num1 | K7215
t4 | 3 | C | City 2 | Num2 | K2462
t5 | 4 | D | City 3 | Num3 | K6247
Table 2 Express delivery article information table
Order number | Express bill number | Contents of the article | Weight | Express type
49321 | K5730 | Document | 1 | Type 1
49548 | K1152 | Cold medicine | 1 | Type 2
49296 | K7215 | Document | 1 | Type 1
49286 | K2462 | Tea leaves | 1 | Type 3
49518 | K6247 | Shampoo | 1 | Type 2
Since the name and the city in Table 1 are quasi-identifier attributes and the telephone number is the main sensitive attribute, Table 1 is determined to be the primary table. Since the express bill number in Table 2 and the express bill number in Table 1 take the same key values (specific express bill numbers, e.g., K5730), and the contents of the article are an associated secondary sensitive attribute, Table 2 is determined to be a secondary table.
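The primary and secondary table determination can be sketched as a check on key value names. The Python below is an illustrative sketch only: the function `classify_tables` and the snake_case column names are hypothetical, and treating the user number as the identifier attribute is an assumption, since this passage names only the quasi-identifier and sensitive attributes.

```python
def classify_tables(tables, identifier, quasi_identifiers, sensitive):
    """Return (primary_table_name, secondary_table_names).

    tables maps a table name to the list of its key value names (columns).
    """
    primary = None
    for name, columns in tables.items():
        cols = set(columns)
        # Primary table: identifier, at least one quasi-identifier, at least one sensitive attribute.
        if identifier in cols and cols & quasi_identifiers and cols & sensitive:
            primary = name
            break
    if primary is None:
        return None, []

    primary_cols = set(tables[primary])
    secondary = [
        name
        for name, columns in tables.items()
        if name != primary
        and set(columns) & primary_cols  # shares a key value name with the primary table
        and set(columns) & sensitive     # and carries at least one sensitive attribute
    ]
    return primary, secondary


tables = {
    "table_1": ["user_number", "name", "city", "telephone_number", "express_bill_number"],
    "table_2": ["order_number", "express_bill_number", "contents", "weight", "express_type"],
}
primary, secondary = classify_tables(
    tables,
    identifier="user_number",  # assumption: the user number acts as the identifier
    quasi_identifiers={"name", "city"},
    sensitive={"telephone_number", "contents"},
)
```

With these inputs, Table 1 qualifies as the primary table and Table 2 as a secondary table because it shares the express bill number column and contains the contents column.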
Although privacy disclosure can be prevented to some extent by deleting the express bill number column from both Table 1 and Table 2, the two tables would then no longer be associated in any way, which would impair data availability and subsequent data analysis results.
Because Table 1 and Table 2 can be associated through the express bill number, they can be merged on the key values they share (specific express bill numbers, such as K5730), after which every key value of the express bill number is deleted to obtain Table 3. Table 3 is the merged table.
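The merge just described amounts to an equality join on the express bill number followed by removal of that column. The following Python is a minimal sketch under that reading; `merge_tables` and the short field names are hypothetical, and the rows are those of Tables 1 and 2.

```python
def merge_tables(primary_rows, secondary_rows, shared_name):
    """Join rows that carry the same key value under shared_name, then drop that key."""
    by_key = {}
    for row in secondary_rows:
        by_key.setdefault(row[shared_name], []).append(row)

    merged = []
    for primary_row in primary_rows:
        for secondary_row in by_key.get(primary_row[shared_name], []):
            joined = {**primary_row, **secondary_row}
            del joined[shared_name]  # delete the shared key value after joining
            merged.append(joined)
    return merged


table_1 = [
    {"record": "t1", "user": 1, "name": "A", "city": "City 1", "phone": "Num1", "bill": "K5730"},
    {"record": "t2", "user": 1, "name": "A", "city": "City 1", "phone": "Num1", "bill": "K1152"},
    {"record": "t3", "user": 2, "name": "B", "city": "City 1", "phone": "Num1", "bill": "K7215"},
    {"record": "t4", "user": 3, "name": "C", "city": "City 2", "phone": "Num2", "bill": "K2462"},
    {"record": "t5", "user": 4, "name": "D", "city": "City 3", "phone": "Num3", "bill": "K6247"},
]
table_2 = [
    {"order": 49321, "bill": "K5730", "contents": "Document", "weight": 1, "type": "Type 1"},
    {"order": 49548, "bill": "K1152", "contents": "Cold medicine", "weight": 1, "type": "Type 2"},
    {"order": 49296, "bill": "K7215", "contents": "Document", "weight": 1, "type": "Type 1"},
    {"order": 49286, "bill": "K2462", "contents": "Tea leaves", "weight": 1, "type": "Type 3"},
    {"order": 49518, "bill": "K6247", "contents": "Shampoo", "weight": 1, "type": "Type 2"},
]
merged = merge_tables(table_1, table_2, "bill")
```

Every recipient record is paired with the article record that has the same express bill number, and the bill number itself no longer appears in the merged rows.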
Table 3 recipient information table
Figure BDA0002153105410000111
As shown in Table 3, the merged table may explicitly label the attribute categories "identifier attribute", "quasi-identifier attribute" and "sensitive attribute", or may show only the key value names belonging to each category without showing the category labels.
After Table 3 is obtained, the data records in Table 3 are grouped according to the multidimensional bucket grouping theory, and the multidimensional bucket corresponding to Table 3 is obtained. Since Table 3 includes two sensitive attributes, the multidimensional bucket corresponding to Table 3 is a 2-dimensional bucket, which is shown in Table 4.
Table 4 2-dimensional bucket of the sensitive attributes
Figure BDA0002153105410000121
From Table 4, it can be determined that the data records in Table 3 are divided into two groups. The group number is added as a key value name of the merged table, and the data record t2, which does not satisfy the data privacy protection condition, is discarded to obtain Table 5. Table 5 is the merged table with the grouping result added.
Table 5 recipient information table
Figure BDA0002153105410000122
Table 5 is decomposed according to the group numbers to obtain three published data tables, shown in Table 6-1, Table 6-2 and Table 6-3.
Table 6-1 Published data table 1
User number | Name | City | Group number
1 | A | City 1 | 1
3 | C | City 2 | 1
2 | B | City 1 | 2
4 | D | City 3 | 2
Table 6-2 Published data table 2
Group number | Telephone number
1 | Num1
1 | Num2
2 | Num1
2 | Num3
Table 6-3 Published data table 3
Order number | Group number | Contents of the article | Weight | Express type
49321 | 1 | Document | 1 | Type 1
49286 | 1 | Tea leaves | 1 | Type 3
49296 | 2 | Document | 1 | Type 1
49518 | 2 | Shampoo | 1 | Type 2
It can be seen that, even though Table 6-1, Table 6-2 and Table 6-3 can be associated through the group number, only a one-to-many relationship between the sensitive attribute (such as the telephone number) and the quasi-identifier attribute (such as the name) can be obtained, and the one-to-one relationship between the sensitive attribute and the quasi-identifier attribute in the original data tables cannot be restored. Privacy protection for multiple tables with multiple sensitive attributes is thereby achieved.
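The one-to-many outcome can be checked directly on the published tables. The Python sketch below links Table 6-1 to Table 6-2 through the group number, as someone attempting to re-identify users might; the variable names are hypothetical and the tuples simply restate the rows of those two tables.

```python
from collections import defaultdict

# (user number, name, city, group number), as in Table 6-1
table_6_1 = [
    (1, "A", "City 1", 1),
    (3, "C", "City 2", 1),
    (2, "B", "City 1", 2),
    (4, "D", "City 3", 2),
]
# (group number, telephone number), as in Table 6-2
table_6_2 = [(1, "Num1"), (1, "Num2"), (2, "Num1"), (2, "Num3")]

# Collect every telephone number published for each group.
phones_in_group = defaultdict(set)
for group, phone in table_6_2:
    phones_in_group[group].add(phone)

# Linking on the group number yields a candidate set per name, not a single value.
candidates = {
    name: sorted(phones_in_group[group])
    for _user, name, _city, group in table_6_1
}
```

Each name is linked to two candidate telephone numbers, so the group number alone does not reveal which number belongs to which person.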
Fig. 2 is a functional block diagram of a privacy protecting apparatus according to an embodiment of the present invention. Based on the content of the foregoing embodiments, as shown in fig. 2, the privacy protecting apparatus includes a table merging module 201 and a table decomposing module 202, where:
a table merging module 201, configured to merge original data tables related to user sensitive information according to associated key values between the original data tables to obtain a merged table;
the table decomposition module 202 is configured to group the data records in the merge table according to the key value names of the sensitive attributes in the merge table, and decompose the merge table according to a grouping result to obtain multiple release data tables.
Specifically, for any two original data tables, the table merging module 201 may determine an association relationship between two data records in the two tables according to the same key value name and the same key value in the two tables; the data records of the merged table are obtained by joining the associated data records of the different original data tables, thereby obtaining the merged table.
The table decomposition module 202 may anonymize the merged table by data decomposition according to a multi-sensitive-attribute privacy protection method, decomposing the merged table into multiple published data tables such that any sensitive attribute and the quasi-identifier attributes are in different published data tables and different sensitive attributes are in different published data tables, thereby cutting off the direct association between the sensitive attributes and the quasi-identifier attributes.
The privacy protection apparatus provided in the embodiment of the present invention is configured to execute the privacy protection method provided in the embodiment of the present invention, and specific methods and processes for implementing corresponding functions by modules included in the privacy protection apparatus are described in the above embodiments of the privacy protection method, and are not described herein again.
The privacy protecting apparatus is used in the privacy protecting method of the foregoing embodiments. Therefore, the description and definition in the privacy protecting method in the foregoing embodiments may be used for understanding the execution modules in the embodiments of the present invention.
According to the embodiment of the invention, a plurality of data tables containing user associated information in the same database are analyzed as a whole to obtain the merged table, the merged table is decomposed by utilizing the correlation theory of the database and the multidimensional sensitive attribute privacy protection method to obtain a plurality of published data tables, the privacy protection of multiple tables and multiple privacy attributes can be realized, the privacy protection effect of the multiple tables of the database can be effectively improved, the data processing efficiency is improved, and the data availability is improved.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present invention. Based on the content of the above embodiments, as shown in Fig. 3, the electronic device may include a processor 301, a memory 302 and a bus 303, wherein the processor 301 and the memory 302 communicate with each other through the bus 303. The processor 301 is configured to invoke computer program instructions that are stored in the memory 302 and executable on the processor 301 to perform the methods provided by the above method embodiments, for example: merging original data tables related to user sensitive information according to associated key values among the original data tables to obtain a merged table; and grouping the data records in the merged table according to the key value names of the sensitive attributes in the merged table, and decomposing the merged table according to the grouping result to obtain multiple published data tables.
Another embodiment of the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above-mentioned method embodiments, for example, including: merging original data tables related to user sensitive information according to associated key values among the original data tables to obtain a merged table; and grouping the data records in the merging table according to the key value names of the sensitive attributes in the merging table, decomposing the merging table according to a grouping result, and obtaining a plurality of release data tables.
Furthermore, the logic instructions in the memory 302 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence, or the part thereof that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Another embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: merging original data tables related to user sensitive information according to associated key values among the original data tables to obtain a merged table; and grouping the data records in the merging table according to the key value names of the sensitive attributes in the merging table, decomposing the merging table according to a grouping result, and obtaining a plurality of release data tables.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. It is understood that the above-described technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the above-described embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A privacy preserving method, comprising:
merging original data tables related to user sensitive information according to associated key values among the original data tables to obtain a merged table;
grouping the data records in the merging table according to the key value names of the sensitive attributes in the merging table, decomposing the merging table according to a grouping result, and obtaining a plurality of release data tables, so that any sensitive attribute and the quasi-identifier attribute are in different release data tables, and different sensitive attributes are in different release data tables;
before merging the original data tables related to the user sensitive information according to the associated key values between the original data tables, the method further comprises the following steps:
determining the original data table related to the user sensitive information;
the specific step of determining the original data table related to the user sensitive information comprises:
determining, as a primary table, an original data table whose key value names include an identifier attribute, at least one quasi-identifier attribute and at least one sensitive attribute;
for each original data table other than the primary table, if it is determined that the key value names of the original data table include at least one key value name that is the same as a key value name included in the primary table and include at least one sensitive attribute, determining the original data table as a secondary table;
and determining the primary table and each secondary table as an original data table related to user sensitive information.
2. The privacy protection method according to claim 1, wherein the specific steps of merging the original data tables related to the user sensitive information according to the associated key values among the original data tables to obtain the merged table include:
for each secondary table, using the same key value included in the secondary table and the primary table as the key value associated between the secondary table and the primary table;
and connecting the data records where the same key values are located according to the same key values to obtain the merged table.
3. The privacy protection method according to claim 2, wherein connecting the data records where the same key values are located according to the same key values to obtain the merged table further comprises:
deleting the same key values from the data records.
4. The privacy protection method according to any one of claims 1 to 3, wherein the specific step of grouping the data records in the merge table according to the key-value name of the sensitive attribute in the merge table includes:
taking the key value name which is sensitive attribute in the merging table as one dimension of the multidimensional bucket space, and converting the merging table into the multidimensional bucket space;
grouping the data records in the merge table such that for each sub-bucket in the multi-dimensional bucket space, the sub-bucket includes data records that are in different groupings.
5. The privacy protection method according to claim 4, wherein the specific step of decomposing the merged table according to the grouping result to obtain a plurality of published data tables includes:
decomposing the merged table according to the grouping result, so that each published data table includes either a key value name that is a sensitive attribute or key value names that are the identifier attribute and the quasi-identifier attribute, and the published data tables are associated through the grouping result.
6. A privacy preserving apparatus, comprising:
the table merging module is used for merging original data tables related to the user sensitive information according to the associated key values among the original data tables to obtain a merged table;
the table decomposition module is used for grouping the data records in the merging table according to the key value names of the sensitive attributes in the merging table, decomposing the merging table according to a grouping result and obtaining a plurality of release data tables, so that any sensitive attribute and the quasi-identifier attribute are in different release data tables, and different sensitive attributes are in different release data tables;
the table merging module is further used for determining the original data table related to the user sensitive information;
the table merging module is further specifically configured to determine, as a primary table, an original data table whose key value names include an identifier attribute, at least one quasi-identifier attribute and at least one sensitive attribute;
for each original data table other than the primary table, if it is determined that the key value names of the original data table include at least one key value name that is the same as a key value name included in the primary table and include at least one sensitive attribute, determining the original data table as a secondary table;
and determining the primary table and each secondary table as an original data table related to user sensitive information.
7. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 5.
8. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 5.
CN201910709096.1A 2019-08-01 2019-08-01 Privacy protection method and device Active CN110443068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910709096.1A CN110443068B (en) 2019-08-01 2019-08-01 Privacy protection method and device

Publications (2)

Publication Number Publication Date
CN110443068A CN110443068A (en) 2019-11-12
CN110443068B true CN110443068B (en) 2022-03-22

Family

ID=68432822

Country Status (1)

Country Link
CN (1) CN110443068B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant