WO2020175306A1

WO2020175306A1 - Anonymizing device, anonymizing method, and program

Info

Publication number: WO2020175306A1
Application number: PCT/JP2020/006714
Authority: WO
Inventors: 禅石倉; 長谷川　聡; 高橋　誠治; 進角田
Original assignee: 日本電信電話株式会社
Priority date: 2019-02-26
Filing date: 2020-02-20
Publication date: 2020-09-03
Also published as: US20220019696A1; CN113474778B; CN113474778A; US11972021B2; EP3933635A4; JP7088405B2; EP3933635A1; JPWO2020175306A1; EP3933635B1

Abstract

The present invention provides a technique for anonymizing data without impairing the usefulness of the data. With M denoting the number of attributes, N the number of records, p the number of master attributes, and L the number of mutually different sets of values of the p master attributes, the invention includes: a deduplication unit that generates, from an M×N anonymization target table, an M×L partial table containing L records of the anonymization target table whose sets of values of the p master attributes are mutually different; an anonymization unit that generates, from the partial table, an M×L anonymized partial table in which the partial table is anonymized with respect to the p master attributes; and a duplication restoration unit that generates, from the anonymization target table and the anonymized partial table, an M×N anonymized table in which the anonymization target table is anonymized with respect to the p master attributes.

Description

Specification

Title of invention: Anonymization device, anonymization method, program

Technical field

The present invention relates to anonymization technology.

Background technology

[0002] In recent years, a technique called privacy-preserving data mining, which can obtain a result while protecting privacy in data mining, has attracted attention. Such techniques include k-anonymization described in Non-Patent Document 1 and Pk-anonymization described in Non-Patent Document 2.

The processing target of these anonymization techniques is a table as shown in FIG. 1. Here, a table is data that includes N records (N is an integer of 1 or more), each of which is a set of values for M attributes (M is an integer of 2 or more). A record is called a row, and the set of values for a certain attribute (for example, name) is called a column. For example, the first row of the table in Figure 1 is (Mr. A, Male, 30s, Convenience store, 150), and the first column is (A, C, E, A, B, D, E). The size of the table is expressed as M×N. For example, the table in Figure 1 is a 5×7 table (M=5, N=7). An attribute value included in the table is called an element of the table.

[0004] Attributes that are anonymized by the anonymization technology described in Non-Patent Document 1 or 2 are called master attributes, and the other attributes, that is, attributes that are not anonymized by the anonymization technology, are called history attributes. In addition, deleting the set of attribute values for a certain master attribute, that is, a column, is called attribute deletion. Attribute deletion is an example of anonymization technology.

Prior art documents

Non-patent literature

[0005] Non-Patent Document 1: Latanya Sweeney, “k-anonymity: a model for protecting privacy”, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 10, Issue 5, October 2002.

Non-Patent Document 2: Dai Igarashi, Koji Senda, Katsumi Takahashi, “Extension of k-anonymity to probabilistic index and its application example”, Computer Security Symposium 2010 (CSS2009), pp. 1-6, October 2009.

Summary of the invention

Problems to be Solved by the Invention

[0006] The anonymization techniques described in Non-Patent Document 1 and Non-Patent Document 2 take as their anonymization target a table composed of special records in which sets of master attribute values and sets of history attribute values have a one-to-one correspondence. Therefore, for example, if one attempts to anonymize the table in Fig. 1 by k-anonymization, with the three attributes name, gender, and age group as master attributes and the two attributes purchase store and purchase price as history attributes, the technique cannot be applied as it is. As a method to solve this problem, the method of dividing the table and anonymizing the divided tables is explained below.

[0007] Focusing on a certain history attribute, consider dividing the table according to its value. Focusing on the purchase store, the table in Figure 1 can be divided into the two tables shown in Figure 2, namely, the table showing the purchase history at the convenience store and the table showing the purchase history at the supermarket. The two tables in Figure 2 are equivalent to the table in Figure 1; there is no difference in the information they represent. Each of the two tables in Figure 2 obtained in this way has a one-to-one correspondence between sets of master attribute values and sets of history attribute values, so the anonymization techniques can be applied to each of them. If the name is removed by attribute deletion and k-anonymization with k=3 is performed on gender and age, neither table contains three or more records with the same set of master attribute values, so two tables with all records deleted are obtained, as shown in Figure 3. Here, the shaded parts represent the anonymized elements. As a result, the table obtained by anonymizing the table in Figure 1 is a table with all records deleted, as in the table in Figure 4.

[0008] In this way, when one history attribute is chosen, the table is divided according to its value, and each divided table is anonymized, each divided table contains fewer records than the original table. As a result, more records are deleted and the usefulness of the data (the table in Fig. 1) is impaired.

[0009] As another example, when the name is removed by attribute deletion and k-anonymization with k=2 is performed on gender and age, as shown in Fig. 5, a table with one deleted record is obtained for the table showing the purchase history at the convenience store, and a table with no deleted records is obtained for the table showing the purchase history at the supermarket. As a result, the table obtained by anonymizing the table in Figure 1 is a table with one deleted record, like the table in Figure 6. As the table in Figure 6 shows, for one customer the purchase history at the supermarket remains after anonymization while the purchase history at the convenience store is deleted, so the linkage between that customer's records is destroyed.

[0010] In this way, when one history attribute is chosen, the table is divided according to its value, and each divided table is anonymized, the linkage between data may also collapse, and as a result the usefulness of the data (the table in Fig. 1) is impaired.

[0011] Therefore, an object of the present invention is to provide a technique for anonymizing data without impairing its usefulness.

Means for solving the problem

[0012] In one aspect of the present invention, M is an integer of 2 or more representing the number of attributes, N is an integer of 1 or more representing the number of records, p is an integer of 1 or more and M or less representing the number of master attributes, and L is an integer of 1 or more and N or less representing the number of mutually different sets of values of the p master attributes. This aspect includes: a deduplication unit that generates, from an M×N anonymization target table, an M×L partial table containing L records of the anonymization target table whose sets of values of the p master attributes are mutually different; an anonymization unit that generates, from the partial table, an M×L anonymized partial table obtained by anonymizing the partial table with respect to the p master attributes; and a duplication restoration unit that generates, from the anonymization target table and the anonymized partial table, an M×N anonymized table obtained by anonymizing the anonymization target table with respect to the p master attributes.

Effect of the invention

[0013] According to the present invention, it is possible to anonymize data without impairing its usefulness.

Brief description of the drawings

[0014] [Fig. 1] Fig. 1 is a diagram showing an example of an anonymization target table.

[Fig. 2] Fig. 2 is a diagram illustrating a method of dividing a table to make it anonymous.

[Fig. 3] Fig. 3 is a diagram illustrating a method of dividing a table to make it anonymous.

[FIG. 4] FIG. 4 is a diagram illustrating a method of dividing a table to make it anonymous.

[FIG. 5] FIG. 5 is a diagram illustrating a method of anonymizing a table by dividing it.

FIG. 6 is a diagram illustrating a method of dividing a table to make it anonymous.

FIG. 7 is a block diagram showing an example of the configuration of the anonymization device 100.

FIG. 8 is a flowchart showing an example of the operation of the anonymization device 100.

FIG. 9 is a block diagram showing an example of the configuration of the deduplication unit 110.

FIG. 10 is a flowchart showing an example of the operation of the deduplication unit 110.

FIG. 11 is a diagram showing an example of an anonymization target table.

FIG. 12 is a diagram showing an example of a table obtained in the process of generating the encoded table.

FIG. 13 is a diagram showing an example of an encoded table.

FIG. 14 is a diagram showing an example of a table obtained in the process of generating the duplicate record number table.

FIG. 15 is a diagram showing an example of a table obtained in the process of generating the duplicate record number table.

FIG. 16 is a diagram showing an example of a table obtained in the process of generating the duplicate record number table.

FIG. 17 is a diagram showing an example of a duplicate record number table.

FIG. 18 is a diagram showing an example of a table obtained in the process of generating the partial table.

FIG. 19 is a diagram showing an example of a table obtained in the process of generating the partial table.

FIG. 20 is a diagram showing an example of a partial table.

FIG. 21 is a diagram showing an example of an anonymized partial table.

FIG. 22 is a diagram showing an example of an anonymized table.

MODE FOR CARRYING OUT THE INVENTION

[0015] Hereinafter, embodiments of the present invention will be described in detail. It should be noted that components having the same function are denoted by the same reference numeral, and redundant description will be omitted.

[0016] <First Embodiment>

Let M be an integer of 2 or more that represents the number of attributes, N an integer of 1 or more that represents the number of records, p an integer of 1 or more and M or less that represents the number of master attributes, and L an integer of 1 or more and N or less that represents the number of mutually different sets of values of the p master attributes. The anonymization device 100 generates, from an M×N table to be anonymized (anonymization target table), an M×N anonymized table obtained by anonymizing the anonymization target table with respect to the p master attributes.

[0017] Hereinafter, the anonymization device 100 will be described with reference to FIGS. 7 to 8. FIG. 7 is a block diagram showing the configuration of the anonymization device 100. FIG. 8 is a flow chart showing the operation of the anonymization device 100. As shown in FIG. 7, the anonymization device 100 includes a deduplication unit 110, anonymization unit 120, duplication restoration unit 130, and recording unit 190. The recording unit 190 is a component that appropriately records information necessary for the processing of the anonymization device 100. The recording unit 190 records, for example, a table generated in the process of processing by the anonymization device 100 such as an anonymization target table.

[0018] The operation of the anonymization device 100 will be described with reference to FIG. 8.

[0019] In S110, the deduplication unit 110 takes the M×N anonymization target table as input, generates from it an M×L partial table containing L records of the anonymization target table whose sets of values of the p master attributes are mutually different, and outputs the partial table. Hereinafter, the deduplication unit 110 will be described with reference to FIGS. 9 and 10. FIG. 9 is a block diagram showing the configuration of the deduplication unit 110. FIG. 10 is a flow chart showing the operation of the deduplication unit 110. As shown in Fig. 9, the deduplication unit 110 includes an encoded table generation unit 111, a duplicate record number table generation unit 113, and a partial table generation unit 115.

The operation of the deduplication unit 110 will be described with reference to FIG. 10.

[0021] In S111, the encoded table generation unit 111 generates, from the anonymization target table, a p×N master attribute table containing N records, each of which is a set of values of the p master attributes, and then generates a p×N encoded table in which the values of the master attribute table are encoded. Hereinafter, a specific example of each table will be described. First, the encoded table generation unit 111 generates the master attribute table of FIG. 12 from the anonymization target table of FIG. 11. Next, the encoded table generation unit 111 generates the encoded table of FIG. 13 from the master attribute table of FIG. 12. The encoded table is a table obtained by encoding according to the rule that elements of the master attribute table having the same value are assigned the same integer value.
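The encoding rule of S111 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_master_table`, the choice to keep one code book per attribute (column), codes starting at 1, and the sample data are all hypothetical and are not taken from FIGS. 12 and 13.

```python
from typing import Hashable, List, Sequence


def encode_master_table(master_rows: Sequence[Sequence[Hashable]]) -> List[List[int]]:
    """Encode a p x N master attribute table (given as N rows of p values).

    Assumption: equal values in the same column get the same integer code,
    and codes are handed out per column in order of first appearance.
    """
    if not master_rows:
        return []
    p = len(master_rows[0])
    codebooks = [dict() for _ in range(p)]  # one value -> code map per attribute
    encoded = []
    for row in master_rows:
        codes = []
        for j, value in enumerate(row):
            book = codebooks[j]
            if value not in book:
                book[value] = len(book) + 1  # next unused code in this column
            codes.append(book[value])
        encoded.append(codes)
    return encoded


# Hypothetical master attribute table with columns (name, address, age).
master = [
    ("A", "Tokyo", 31),
    ("B", "Osaka", 25),
    ("A", "Tokyo", 31),
    ("C", "Tokyo", 25),
]
encoded = encode_master_table(master)
print(encoded)  # [[1, 1, 1], [2, 2, 2], [1, 1, 1], [3, 1, 2]]
```

Records 1 and 3 share the same set of master attribute values and therefore receive identical code rows, which is what the later duplicate detection relies on.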

[0022] In S113, the duplicate record number table generation unit 113 generates, from the encoded table generated in S111, a 2×L duplicate record number table whose records are pairs of a set of record numbers of the records of the anonymization target table having the same set of values of the p master attributes and a key value that is an element of that set (for example, the minimum value of the set). Here, a record number is a number that identifies a record included in the anonymization target table. Hereinafter, a specific example of each table will be described. First, the duplicate record number table generation unit 113 generates the encoded table with record numbers of FIG. 14 from the encoded table of FIG. 13. The encoded table with record numbers is a (p+1)×N table obtained by adding a column of record numbers to the encoded table. Next, the duplicate record number table generation unit 113 generates the duplicate record number table of FIG. 17 from the encoded table with record numbers of FIG. 14. At that time, the duplicate record number table generation unit 113 generates, for example, the 2×N table using the map structure shown in FIG. 15 and the 2×L table using the map structure shown in FIG. 16, and then generates the duplicate record number table. Here, a map structure is a data structure in which, for one attribute, the values of multiple related attributes are stored together as one. For example, in FIG. 15, a plurality of values such as [1, 2, 3], [4, 5, 6] are stored as one element for the attribute called encoded data.
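As a rough sketch of S113, the grouping can be done with a dictionary keyed by the encoded master values, which plays the role of the map structure. The function name `duplicate_record_numbers`, the 1-based record numbering, and the sample data are assumptions made for this illustration; the minimum record number is used as the key value, following the example given in the text.

```python
from typing import Dict, List, Sequence, Tuple


def duplicate_record_numbers(
    encoded_rows: Sequence[Sequence[int]],
) -> List[Tuple[int, List[int]]]:
    """Build a duplicate record number table from an encoded table.

    Returns L pairs (key value, record numbers), one pair per distinct
    set of encoded master attribute values. Record numbers are 1-based
    (an assumption), and the key is the minimum record number of the group.
    """
    groups: Dict[Tuple[int, ...], List[int]] = {}
    for record_no, row in enumerate(encoded_rows, start=1):
        # The tuple of codes acts as the map key; record numbers accumulate as its value.
        groups.setdefault(tuple(row), []).append(record_no)
    return [(min(numbers), numbers) for numbers in groups.values()]


# Hypothetical encoded table (N = 4 records, p = 3 master attributes).
encoded = [[1, 1, 1], [2, 2, 2], [1, 1, 1], [3, 1, 2]]
dup_table = duplicate_record_numbers(encoded)
print(dup_table)  # [(1, [1, 3]), (2, [2]), (4, [4])]
```

Here L = 3: records 1 and 3 collapse into one group with key value 1, while records 2 and 4 each form their own group.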

[0023] In S115, the partial table generation unit 115 generates the partial table from the anonymization target table and the duplicate record number table generated in S113. Hereinafter, a specific example of each table will be described. First, the partial table generation unit 115 generates the anonymization target table with record numbers of Fig. 18 from the anonymization target table of Fig. 11. The anonymization target table with record numbers is an (M+1)×N table obtained by adding a column of record numbers to the anonymization target table. Next, the partial table generation unit 115 generates the partial table of FIG. 20 from the anonymization target table with record numbers of FIG. 18 and the duplicate record number table of FIG. 17. At that time, the partial table generation unit 115 generates, for example, the partial table with record numbers shown in FIG. 19 and then generates the partial table.
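The selection performed in S115 might look like the following sketch. It assumes 1-based record numbers, a duplicate record number table given as (key value, record numbers) pairs, and hypothetical names and data; the intermediate `numbered_partial` loosely corresponds to a partial table with record numbers.

```python
from typing import List, Sequence, Tuple

Record = Tuple  # one row of the anonymization target table


def build_partial_table(
    target_rows: Sequence[Record],
    dup_table: Sequence[Tuple[int, List[int]]],
) -> Tuple[List[Tuple[int, Record]], List[Record]]:
    """Pick one representative record per group of duplicate master values.

    Returns the partial table with record numbers and the M x L partial table.
    """
    # Attach record numbers (1-based, an assumption) to every record.
    by_number = dict(enumerate(target_rows, start=1))
    # Keep only the record whose number equals the key value of each group.
    numbered_partial = [(key, by_number[key]) for key, _ in dup_table]
    partial = [record for _, record in numbered_partial]
    return numbered_partial, partial


# Hypothetical target table: (name, address, age, purchase store, purchase price).
target = [
    ("A", "Tokyo", 31, "convenience store", 150),
    ("B", "Osaka", 25, "supermarket", 300),
    ("A", "Tokyo", 31, "supermarket", 500),
    ("C", "Tokyo", 25, "convenience store", 200),
]
dup_table = [(1, [1, 3]), (2, [2]), (4, [4])]
numbered_partial, partial = build_partial_table(target, dup_table)
```

With this input the partial table holds records 1, 2, and 4, so every set of master attribute values appears exactly once.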

[0024] In S120, the anonymization unit 120 takes the partial table generated in S110 as input, generates from it an M×L anonymized partial table obtained by anonymizing the partial table with respect to the p master attributes, and outputs the anonymized partial table. Hereinafter, a specific example of each table will be described. The anonymization unit 120 generates the anonymized partial table of FIG. 21 from the partial table of FIG. 20. The anonymized partial table in Fig. 21 is obtained by attribute deletion of the name, generalization of the address and age, and k-anonymization with k=2.
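One possible way to combine attribute deletion, generalization, and k-anonymization on a partial table is sketched below. It is not the procedure of Non-Patent Document 1; it is a simple suppression-based variant chosen for illustration. The `MASKED` marker, the decade-based age generalization, the `region_of` lookup, the column order (name, address, age, then history attributes), and the sample data are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

MASKED = None  # hypothetical marker for a deleted (anonymized) element


def generalize_age(age: int) -> str:
    """Generalize an exact age to its decade, e.g. 31 -> '30s'."""
    return f"{(age // 10) * 10}s"


def anonymize_partial(
    partial: Sequence[Tuple],
    region_of: Dict[str, str],
    k: int,
) -> List[Tuple]:
    """Anonymize the master attributes of an M x L partial table.

    Steps: delete the name, generalize address and age, then mask the
    master attributes of any record whose generalized (address, age)
    pair is shared by fewer than k records.
    """
    generalized = [
        (MASKED, region_of[address], generalize_age(age), *history)
        for _name, address, age, *history in partial
    ]
    counts = Counter((row[1], row[2]) for row in generalized)
    result = []
    for row in generalized:
        if counts[(row[1], row[2])] >= k:
            result.append(row)
        else:
            # Too few look-alike records: suppress the remaining master values.
            result.append((MASKED, MASKED, MASKED, *row[3:]))
    return result


# Hypothetical partial table and generalization map.
partial = [
    ("A", "Tokyo", 31, "convenience store", 150),
    ("B", "Osaka", 25, "supermarket", 300),
    ("C", "Chiba", 38, "convenience store", 200),
]
region_of = {"Tokyo": "Kanto", "Chiba": "Kanto", "Osaka": "Kansai"}
anonymized_partial = anonymize_partial(partial, region_of, k=2)
```

Because the partial table holds each set of master attribute values only once, the count used for the k test reflects distinct master value sets rather than the number of history records.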

[0025] Note that, for the anonymization, general anonymization techniques other than attribute deletion, generalization, and k-anonymization may be used. A record shuffle that changes the order (vertical position) of records may also be used. When performing anonymization that includes a process of changing the order of records, such as a record shuffle, the anonymization unit 120 generates a table representing the transition of record numbers.
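A record shuffle that also produces a table of record number transitions could be sketched as follows. The function name, the (old number, new number) layout of the transition table, the 1-based positions, and the optional `seed` parameter are assumptions for illustration only.

```python
import random
from typing import List, Optional, Sequence, Tuple


def shuffle_with_transition(
    rows: Sequence, seed: Optional[int] = None
) -> Tuple[List, List[Tuple[int, int]]]:
    """Shuffle records and record where each one went.

    Returns the shuffled rows and a transition table of
    (old position, new position) pairs, both 1-based.
    """
    rng = random.Random(seed)
    order = list(range(1, len(rows) + 1))  # old positions, about to be permuted
    rng.shuffle(order)
    shuffled = [rows[old - 1] for old in order]
    transition = [(old, new) for new, old in enumerate(order, start=1)]
    return shuffled, transition


rows = ["r1", "r2", "r3", "r4"]
shuffled, transition = shuffle_with_transition(rows, seed=0)
```

Keeping the transition table makes it possible to match each shuffled record back to its original record number in a later step.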

[0026] In S130, the duplication restoration unit 130 takes as input the anonymization target table, the anonymized partial table generated in S120, and the duplicate record number table generated in S110, generates the anonymized table from the anonymization target table and the anonymized partial table using the duplicate record number table, and outputs the anonymized table. Hereinafter, a specific example of each table will be described. The duplication restoration unit 130 generates the anonymized table of FIG. 22 from the anonymized partial table of FIG. 21.
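The restoration of S130 can be sketched as copying the anonymized master values of each representative record back to every record number in its group, while keeping each record's own history values. The sketch assumes that the master attributes occupy the first p columns, that the anonymized partial table is in the same order as the duplicate record number table (that is, no record shuffle was applied), and that record numbers are 1-based; all names and data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def restore_duplicates(
    target_rows: Sequence[Tuple],
    anonymized_partial: Sequence[Tuple],
    dup_table: Sequence[Tuple[int, List[int]]],
    p: int,
) -> List[Optional[Tuple]]:
    """Expand an M x L anonymized partial table back to an M x N table."""
    restored: List[Optional[Tuple]] = [None] * len(target_rows)
    for (_key, record_numbers), anonymized in zip(dup_table, anonymized_partial):
        for number in record_numbers:
            original = target_rows[number - 1]
            # Master columns come from the anonymized representative,
            # history columns from the original record itself.
            restored[number - 1] = tuple(anonymized[:p]) + tuple(original[p:])
    return restored


# Hypothetical data: (name, address, age, purchase store, purchase price), p = 3.
target = [
    ("A", "Tokyo", 31, "convenience store", 150),
    ("B", "Osaka", 25, "supermarket", 300),
    ("A", "Tokyo", 31, "supermarket", 500),
    ("C", "Tokyo", 25, "convenience store", 200),
]
dup_table = [(1, [1, 3]), (2, [2]), (4, [4])]
anonymized_partial = [
    (None, "Kanto", "30s", "convenience store", 150),
    (None, None, None, "supermarket", 300),
    (None, "Kanto", "20s", "convenience store", 200),
]
anonymized = restore_duplicates(target, anonymized_partial, dup_table, p=3)
```

In this example records 1 and 3 end up with identical anonymized master values, so both purchase histories of the same person are kept or suppressed together.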

[0027] When anonymization including a process of changing the order of records is performed in S120, the duplication restoration unit 130 takes as input the anonymization target table, the anonymized partial table generated in S120, the duplicate record number table generated in S110, and the table representing the transition of record numbers generated in S120, generates the anonymized table from the anonymization target table and the anonymized partial table using the duplicate record number table and the table representing the transition of record numbers, and outputs the anonymized table.

[0028] According to the embodiment of the present invention, it is possible to anonymize data without impairing its usefulness.

[0029] <Additional Notes>

The device of the present invention has, for example, as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (which may have cache memory, registers, etc.), RAM and ROM as memory, an external storage device that is a hard disk, and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. In addition, if necessary, the hardware entity may be provided with a device (drive) capable of reading from and writing to a recording medium. A general-purpose computer is an example of a physical entity that has such hardware resources.

[0030] The external storage device of the hardware entity stores the program required to realize the above-mentioned functions, the data necessary for the processing of this program, and the like (the program is not limited to the external storage device; for example, it may be stored in a ROM, which is a read-only storage device). Further, data and the like obtained by the processing of these programs are appropriately stored in the RAM, the external storage device, or the like.

[0031] In the hardware entity, each program stored in the external storage device (or ROM, etc.) and the data necessary for the processing of each program are read into the memory as necessary, and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes predetermined functions (each of the components described above as a unit, means, etc.).

The present invention is not limited to the above-mentioned embodiments, and can be modified as appropriate without departing from the spirit of the present invention. Further, the processing described in the above embodiments is not only executed in time series in the order described, but may also be executed in parallel or individually in accordance with the processing capability of the device that executes the processing or as needed.

As described above, when the processing functions of the hardware entity (the device of the present invention) described in the above embodiment are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. Then, by executing this program on the computer, the processing functions of the hardware entity are realized on the computer.

The program describing the processing contents can be recorded on a computer-readable recording medium. The computer-readable recording medium may be, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, a semiconductor memory, or the like. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electrically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.

[0035] Further, this program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Further, the program may be stored in a storage device of a server computer and transferred from the server computer to another computer via a network to distribute the program.

A computer that executes such a program, for example, first temporarily stores the program recorded on a portable recording medium or the program transferred from the server computer in its own storage device. When executing the process, the computer reads the program stored in its own storage device and executes the process according to the read program. As another execution form of this program, the computer may read the program directly from the portable recording medium and execute the processing according to the program. Furthermore, each time the program is transferred from the server computer to this computer, the computer may sequentially execute processing according to the received program. In addition, the above processing may be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only by execution instructions and result acquisition. It should be noted that the program in this embodiment includes information that is used for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).

Further, in this embodiment, the hardware entity is configured by executing a predetermined program on the computer, but at least a part of these processing contents may be realized by hardware.

[0038] The foregoing description of the embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings. The embodiments were chosen and described to provide the best illustration of the principles of the invention and its practical application, and to enable those of ordinary skill in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. All such modifications and variations are within the scope of the invention as determined by the appended claims when interpreted in accordance with the breadth to which they are fairly, legally, and equitably entitled.

Claims

[Claim 1] An anonymization device in which M is an integer of 2 or more representing the number of attributes, N is an integer of 1 or more representing the number of records, p is an integer of 1 or more and M or less representing the number of master attributes, and L is an integer of 1 or more and N or less representing the number of mutually different sets of values of the p master attributes, the anonymization device including:

a deduplication unit that generates, from an M×N anonymization target table, an M×L partial table containing L records of the anonymization target table whose sets of values of the p master attributes are mutually different;

an anonymization unit that generates, from the partial table, an M×L anonymized partial table obtained by anonymizing the partial table with respect to the p master attributes; and

a duplication restoration unit that generates, from the anonymization target table and the anonymized partial table, an M×N anonymized table obtained by anonymizing the anonymization target table with respect to the p master attributes.

[Claim 2] The anonymization device according to claim 1,

A record number is a number for identifying a record included in the anonymization target table,

The deduplication unit generates a 2×L duplicate record number table whose records are pairs of a set of record numbers of the records of the anonymization target table having the same set of values of the p master attributes and a key value that is an element of that set,

The duplication restoration unit generates the anonymized table using the duplication record number table.

An anonymization device characterized by the above.

[Claim 3] An anonymization method in which M is an integer of 2 or more indicating the number of attributes, N is an integer of 1 or more indicating the number of records, p is an integer of 1 or more and M or less indicating the number of master attributes, and L is an integer of 1 or more and N or less representing the number of mutually different sets of values of the p master attributes, the anonymization method including:

a deduplication step in which an anonymization device generates, from an M×N anonymization target table, an M×L partial table containing L records of the anonymization target table whose sets of values of the p master attributes are mutually different;

an anonymization step in which the anonymization device generates, from the partial table, an M×L anonymized partial table obtained by anonymizing the partial table with respect to the p master attributes; and

a duplication restoration step in which the anonymization device generates, from the anonymization target table and the anonymized partial table, an M×N anonymized table obtained by anonymizing the anonymization target table with respect to the p master attributes.

[Claim 4] A program for causing a computer to function as the anonymization device according to claim 1 or 2.