WO2013128879A1

WO2013128879A1 - Information processing device for implementing anonymization process, anonymization method, and program therefor

Info

Publication number: WO2013128879A1
Application number: PCT/JP2013/001073
Authority: WO
Inventors: 由起豊田
Original assignee: 日本電気株式会社
Priority date: 2012-03-01
Filing date: 2013-02-25
Publication date: 2013-09-06

Abstract

The present invention provides an information processing device for implementing anonymization in such a manner that the abstraction levels of attribute values to be focused on are preferentially locally lowered. The information processing device is provided with: a means for outputting user information codes that include attribute values to be focused on by generating grouped focus area anonymization group candidates, and determining the focus area anonymization group candidate with the smallest information loss as a focus area anonymization group; and a means for calculating the information loss in the focus area anonymization group candidates.

Description

Information processing apparatus for performing anonymization process, anonymization method, and program therefor

The present invention relates to an information processing apparatus that performs anonymization processing that abstracts user data and improves anonymity, an anonymization method, and a program therefor.

In recent years, various related technologies have been known for anonymizing privacy information.

Non-Patent Document 1 discloses a technique regarding k-anonymity. k-anonymity is an index that guarantees that, as a result of anonymizing quasi-identifiers (information that may identify an individual), there are k or more pieces of personal information that share the same combination of quasi-identifiers. Specifically, suppose that disclosed data consists of a plurality of records, each including attribute values of a plurality of attributes (quasi-identifiers). The disclosed data satisfies k-anonymity when, for every combination of attribute values that appears in it, there are at least k records having that combination. In other words, by abstracting the attribute values of attributes that can be used to identify individuals into common values, k-anonymity guarantees that there are k or more records having the same combination of quasi-identifiers. Hereinafter, a set of records having the same combination of quasi-identifiers is referred to as an anonymization group.
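The k-anonymity condition described above can be checked mechanically. The following is a minimal sketch, assuming that records are represented as dictionaries; the function name, the key names, and the sample data are hypothetical and do not come from Non-Patent Document 1.

```python
from collections import Counter


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that appears in the records is shared by at least k records."""
    counts = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return all(count >= k for count in counts.values())


# Hypothetical disclosed data: three records generalized to the same age range.
disclosed = [
    {"age": "20-21", "condition": "heart disease"},
    {"age": "20-21", "condition": "fracture"},
    {"age": "20-21", "condition": "infection"},
]

is_3_anonymous = satisfies_k_anonymity(disclosed, ["age"], 3)
is_4_anonymous = satisfies_k_anonymity(disclosed, ["age"], 4)
print(is_3_anonymous, is_4_anonymous)
```

Each group of records counted by this check corresponds to one anonymization group in the sense defined above.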

Non-Patent Document 2 discloses an anonymization method using local recoding. Local recoding replaces the attribute values of some records with more generalized ones. In the method of Non-Patent Document 2, an anonymization group G is merged with part of an anonymization group G′ that has a sufficient number of records, which raises the level of abstraction. The records to be merged are selected from the anonymization group G′ so that k of k-anonymity is minimized in the merged anonymization group. That is, the anonymization method of Non-Patent Document 2 minimizes the information loss caused by abstraction (anonymization) by minimizing the increase in the degree of abstraction of information in the disclosed data as a whole.

However, the anonymization by the related technology as described above has a problem that information required by the data user may be lost because all data is treated equally to satisfy k-anonymity. Patent Document 1 discloses a privacy protection device that solves such problems. This privacy protection device generalizes (abstracts) attribute values (quasi-identifiers) based on priorities that are set for each attribute name (attribute type) and indicate the importance for the data user. In other words, this privacy protection device abstracts an attribute value of an attribute having a lower priority order so that the original information is retained for an attribute having a higher priority order.

Patent Document 1: JP 2011-128862 A

However, in the technique described in Patent Document 1 described above, there is a problem that, when anonymizing data, the abstraction level of an attribute value to be focused on cannot be lowered locally.

The level of abstraction cannot be lowered locally because the technique described in Patent Document 1 generalizes attribute values based on priorities that are set per attribute type, not per attribute value.

An object of the present invention is to provide an information processing apparatus that executes anonymization processing that can solve the above-described problems, an anonymization method, and a program therefor.

The information processing apparatus according to the present invention includes: focus partial anonymization group creation means for acquiring a plurality of user information records including arbitrary attribute values, and creating focus partial anonymization group candidates, each of which groups a plurality of the user information records including at least a user information record that has an attention attribute value, which is a specific attribute value; and
information loss amount calculation means for calculating an information loss amount indicating how much of the information obtained from the user information records corresponding to a focus partial anonymization group candidate is lost in the information obtained from that candidate.
Among the created focus partial anonymization group candidates, the focus partial anonymization group creation means determines the candidate corresponding to the smallest information loss amount as the focus partial anonymization group and outputs it.

In the anonymization method of the present invention, a computer:
acquires a plurality of user information records including arbitrary attribute values;
creates focus partial anonymization group candidates, each of which groups a plurality of the user information records including at least a user information record that has an attention attribute value, which is a specific attribute value;
calculates an information loss amount indicating how much of the information obtained from the user information records corresponding to a focus partial anonymization group candidate is lost in the information obtained from that candidate; and
determines, among the created focus partial anonymization group candidates, the candidate corresponding to the smallest information loss amount as the focus partial anonymization group, and outputs it.

The non-volatile recording medium of the present invention records a program that causes a computer to execute processing of:
acquiring a plurality of user information records including arbitrary attribute values;
creating focus partial anonymization group candidates, each of which groups a plurality of the user information records including at least a user information record that has an attention attribute value, which is a specific attribute value;
calculating an information loss amount indicating how much of the information obtained from the user information records corresponding to a focus partial anonymization group candidate is lost in the information obtained from that candidate; and
determining, among the created focus partial anonymization group candidates, the candidate corresponding to the smallest information loss amount as the focus partial anonymization group, and outputting it.

The present invention has the effect that anonymized data can be obtained in which the abstraction level of an attribute value to be focused on is preferentially lowered locally.

FIG. 1 is a block diagram showing the configuration of the anonymization system according to the first embodiment.
FIG. 2 is a diagram illustrating an example of a user information record in the first embodiment.
FIG. 3 is a diagram illustrating an example of the anonymized user information record in the first embodiment.
FIG. 4 is a diagram illustrating an example of a property record in the first embodiment.
FIG. 5 is a diagram illustrating an example of a focus record in the first embodiment.
FIG. 6 is a block diagram illustrating a hardware configuration of a computer that implements the anonymization device according to the first embodiment.
FIG. 7 is a flowchart illustrating the operation of the anonymization device according to the first embodiment.
FIG. 8 is a diagram illustrating an example in which the focus partial anonymization group creation unit selects a user information record in the first embodiment.
FIG. 9 is a diagram illustrating an example of a focus partial anonymization group candidate in the first embodiment.
FIG. 10 is a diagram illustrating an example of a focus partial anonymization group candidate in the first embodiment.
FIG. 11 is a block diagram showing the configuration of the anonymization system according to the second embodiment.
FIG. 12 is a diagram illustrating an example of a user information record in the second embodiment.
FIG. 13 is a diagram illustrating an example of a division value record in the second embodiment.
FIG. 14 is a block diagram illustrating a configuration of an anonymization system according to the third embodiment.
FIG. 15 is a diagram illustrating an example of a focus record in the third embodiment.
FIG. 16 is a flowchart illustrating the operation of the anonymization device of the third embodiment.
FIG. 17 is a block diagram illustrating a configuration of the anonymization system according to the fourth embodiment.

Embodiments for carrying out the present invention will be described in detail with reference to the drawings. In the drawings and in each embodiment described in the specification, components having the same function are given the same reference numerals.

<< First Embodiment >>
FIG. 1 is a block diagram showing the configuration of the anonymization system according to the first embodiment of the present invention.

Referring to FIG. 1, an anonymization system (also referred to as an information processing system) according to this embodiment includes an anonymization device (also referred to as an information processing device) 100, a user information storage unit 510, and an anonymized user information storage unit 520.

The anonymization device 100, the user information storage unit 510, and the anonymized user information storage unit 520 are connected by a network (not shown). Note that the user information storage unit 510 may be included in the anonymization device 100. Further, the anonymized user information storage unit 520 may be included in the anonymization device 100.

The anonymization device 100 anonymizes the user information stored in the user information storage unit 510 and stores the anonymized user information in the anonymized user information storage unit 520.

FIG. 2 is a diagram illustrating an example of a user information record 511 stored in the user information storage unit 510. The user information storage unit 510 includes a plurality of user information records 511 as user information. As shown in FIG. 2, the user information storage unit 510 includes one or more user information records 511. The user information record 511 includes a number 519, an age 512, and a medical condition 513.

The age 512 is one of the quasi-identifiers. The medical condition 513 is one of the sensitive attributes. The quasi-identifier (age 512) and the sensitive attribute (medical condition 513) are also collectively called attributes. Quasi-identifiers are pieces of information that may make it possible to identify an individual when combined. Sensitive attributes are information that a person generally does not want others to know.

Note that the number 519 is a number for identifying the user information record 511. When the user information record 511 needs to be individually shown and described, for example, the user information record 511 having the number 519 of “1” is described as the user information record 511 (1).

User information is, for example, receipt information held by a government agency or a medical institution. The receipt information includes the date of birth, sex, illness, and the like.

In the user information record 511 shown in FIG. 2, the attribute value of the age attribute is the age 512, and the attribute value of the medical condition attribute is the medical condition 513. For example, the user information record 511 (1) indicates that the corresponding user is 20 years old and suffers from heart disease.

Note that the user information record 511 may be arbitrary information regardless of the above. For example, the user information record may include age 512, medical condition 513, and other types of information (for example, gender). The user information record may not include the medical condition 513, for example. Furthermore, each arbitrary attribute (quasi-identifier and sensitive attribute) may include a plurality of attribute values. For example, the medical condition 513 may include two attribute values “hay fever” and “tooth decay”.

FIG. 3 is a diagram illustrating an example of the anonymized user information record 521 stored in the anonymized user information storage unit 520. The anonymized user information storage unit 520 includes k or more anonymized user information records 521. As shown in FIG. 3, the anonymized user information record 521 includes a group number 529, an age 512, and a medical condition 513.

The group number 529 is a number for identifying the anonymized user information record 521. When an anonymized user information record 521 needs to be individually indicated, for example, the anonymized user information record 521 having the group number 529 of “1” is written as the anonymized user information record 521 (1). The anonymized user information record 521 need not include the group number 529. In this case, the anonymization device 100 may identify and process the anonymized user information record 521 using the age 512, for example.

The anonymized user information record 521 is anonymized user information. The user information is as described above. For example, the user corresponding to the anonymized user information record 521 (1) has an age of 20 to 21 and has suffered from a heart disease, a fracture, or an infection.

Next, each component of the anonymization device 100 in the first embodiment will be described. Note that the components shown in FIG. 1 are not hardware components but functional units.

As shown in FIG. 1, the anonymization device 100 includes a property storage unit 110, a focus value storage unit 120, an anonymization execution reception unit 130, a focus partial anonymization group creation unit 140, an information loss amount calculation unit 150, and an anonymization group creation unit 160.

=== Property Storage Unit 110 ===
The property storage unit 110 stores information that becomes an anonymization index.

FIG. 4 is a diagram illustrating an example of the property record 111 stored in the property storage unit 110. As shown in FIG. 4, the property storage unit 110 includes one or more property records 111. The property record 111 includes a parameter name 112 and a parameter value 113. In addition, at least one of the property records 111 stored in the property storage unit 110 is a set of a parameter name 112 and a parameter value 113 that specify k-anonymity k.

In FIG. 4, in the property record 111 that specifies k of k-anonymity, the parameter name 112 is “k” and the parameter value 113 is “3”. In the property record 111 indicating the quasi-identifier, the parameter name 112 is “quasi-identifier name” and the parameter value 113 is “age”. In the property record 111 indicating the sensitive attribute, the parameter name 112 is “sensitive attribute” and the parameter value 113 is “disease state”.
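As an illustration of how such name-value property records might be held and looked up, the following minimal sketch models each property record 111 as a dictionary. The dictionary keys and the helper name `get_parameter` are hypothetical; only the parameter names and values are taken from the description of FIG. 4, and the sensitive-attribute record is omitted for brevity.

```python
# Property records modeled as plain dicts; the key names are hypothetical.
property_records = [
    {"parameter_name": "k", "parameter_value": "3"},
    {"parameter_name": "quasi-identifier name", "parameter_value": "age"},
]


def get_parameter(records, name):
    """Return the parameter value of the first record whose name matches."""
    for record in records:
        if record["parameter_name"] == name:
            return record["parameter_value"]
    raise KeyError(f"no property record named {name!r}")


# The value is stored as text here, so k is converted before use.
k = int(get_parameter(property_records, "k"))
quasi_identifier = get_parameter(property_records, "quasi-identifier name")
print(k, quasi_identifier)
```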

=== Focus Value Storage Unit 120 ===
The focus value storage unit 120 holds information indicating an attribute value (attention attribute value) to be noted among attribute values included in the user information record 511.

FIG. 5 is a diagram illustrating an example of the focus record 121 stored in the focus value storage unit 120. As shown in FIG. 5, the focus record 121 includes a quasi-identifier name 122 and a focus value 123. The information included in the focus record 121 is information input to the anonymization device 100 in advance by a user of anonymized data (not shown). Note that the information included in the focus record 121 may be included in an anonymization process execution start instruction to be described later and input to the anonymization device 100.

In FIG. 5, the focus record 121 has a quasi-identifier name 122 of “age” and a focus value 123 of “21”. Therefore, the focus record 121 indicates that the attribute value to be focused on is, for example, the attribute value of the age 512 equal to “21” in the user information records 511 illustrated in FIG. 2.
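The way a focus record picks out the user information records of interest can be sketched as follows. This is an illustration only: the dictionary keys and the function name are hypothetical, and the sample records are not a reproduction of FIG. 2. Only the first record (age 20, heart disease) and the pairing of age 21 with an infection follow statements in the text.

```python
focus_record = {"quasi_identifier_name": "age", "focus_value": 21}

# Sample user information records; numbers 2 to 4 are hypothetical.
user_records = [
    {"number": 1, "age": 20, "condition": "heart disease"},
    {"number": 2, "age": 20, "condition": "fracture"},
    {"number": 3, "age": 21, "condition": "infection"},
    {"number": 4, "age": 22, "condition": "hay fever"},
]


def records_with_focus_value(records, focus):
    """Return the records whose focused attribute equals the focus value."""
    name = focus["quasi_identifier_name"]
    return [r for r in records if r.get(name) == focus["focus_value"]]


matches = records_with_focus_value(user_records, focus_record)
print([r["number"] for r in matches])
```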

=== Anonymization Execution Reception Unit 130 ===
The anonymization execution reception unit 130 receives an anonymization process execution start instruction from the outside, and outputs the received anonymization process execution start instruction.

=== Focus Partial Anonymization Group Creation Unit 140 ===
The focus partial anonymization group creation unit 140 creates a focus partial anonymization group from the user information records 511 stored in the user information storage unit 510, based on the focus record 121 stored in the focus value storage unit 120.

In the present embodiment, in response to an instruction from the anonymization group creation unit 160, the focus partial anonymization group creation unit 140 acquires the user information records 511 and the property record 111 via the anonymization group creation unit 160 and creates a focus partial anonymization group. The focus partial anonymization group creation unit 140 may instead create a focus partial anonymization group in response to an anonymization process execution start instruction received by the anonymization execution reception unit 130. In addition, the focus partial anonymization group creation unit 140 may acquire the user information records 511 directly from the user information storage unit 510. Further, the focus partial anonymization group creation unit 140 may acquire the property record 111 directly from the property storage unit 110.

Then, the focus partial anonymization group creation unit 140 outputs the created focus partial anonymization group to the anonymization group creation unit 160.

Specifically, the focus partial anonymization group creation unit 140 creates a focus partial anonymization group as follows.

First, the focus partial anonymization group creation unit 140 creates a focus partial anonymization group candidate by grouping, from among the user information records 511 shown in FIG. 2, a user information record 511 that includes at least the focus value 123 with other user information records 511.

At this time, the focus partial anonymization group creation unit 140 performs grouping based on the information of the property record 111 that specifies k-anonymity k of the property storage unit 110. For example, the focus partial anonymization group creation unit 140 creates a focus partial anonymization group candidate by grouping k user information records 511 including at least the user information record 511 including the focus value 123.

Secondly, the focus partial anonymization group creation unit 140 outputs the created focus partial anonymization group candidate to the information loss amount calculation unit 150. Then, the focus partial anonymization group creation unit 140 receives the information loss amount of the focus partial anonymization group candidate from the information loss amount calculation unit 150.

Thirdly, among the plurality of focus partial anonymization group candidates for which it has received an information loss amount, the focus partial anonymization group creation unit 140 determines the candidate with the smallest information loss amount as the focus partial anonymization group.

The focus partial anonymization group includes information corresponding to the user information record 511 including at least the focus value 123. The corresponding information is, for example, information “infectious disease” of the medical condition 513 in the user information record 511 including the age 512 having the same value as “21” which is the focus value 123.
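The third step, choosing the candidate with the smallest information loss amount, can be sketched as below. This is a minimal illustration, not the claimed implementation: candidates are represented simply as lists of ages, the function names are hypothetical, and the loss function passed in is a stand-in based on the range-based formula given for the information loss amount calculation unit 150. How ties are broken is not stated in the text; this sketch keeps the first candidate with the minimum loss.

```python
def choose_focus_group(candidates, loss_fn):
    """Return (candidate, loss) for the candidate with the smallest loss."""
    if not candidates:
        raise ValueError("no focus partial anonymization group candidates")
    scored = [(loss_fn(candidate), candidate) for candidate in candidates]
    best_loss, best = min(scored, key=lambda pair: pair[0])
    return best, best_loss


def range_loss(ages):
    """Stand-in loss: (max - min + 1) x number of records."""
    return (max(ages) - min(ages) + 1) * len(ages)


# Hypothetical candidates, each a list of ages containing the focus value 21.
candidates = [[20, 20, 21], [21, 22, 23]]
best, best_loss = choose_focus_group(candidates, range_loss)
print(best, best_loss)
```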

=== Information Loss Amount Calculation Unit 150 ===
The information loss amount calculation unit 150 calculates the information loss amount of the focus part anonymization group candidate. For example, the information loss amount calculation unit 150 calculates the information loss amount by the following equation.

Information loss amount = (maximum value of the specific attribute value included in the focus partial anonymization group candidate − minimum value of the specific attribute value included in the focus partial anonymization group candidate + 1) × number of records.

In the above formula, the “specific attribute value included in the focus partial anonymization group candidate” is the attribute value, in the user information records 511 corresponding to the focus partial anonymization group candidate, of the attribute specified by the quasi-identifier name 122 of the focus record 121 stored in the focus value storage unit 120. In the case of the focus record 121 shown in FIG. 5, the specific attribute value is the age 512. The difference between the maximum value and the minimum value of the specific attribute value, that is, the difference in the age 512, is the range of that attribute value over the user information records 511 corresponding to the focus partial anonymization group candidate.
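The formula above translates directly into code. The following is a minimal sketch that assumes each record is a dictionary with a numeric attribute; the function name, key names, and sample ages are hypothetical.

```python
def information_loss(candidate, attribute):
    """(max - min + 1) x number of records, for one numeric attribute."""
    values = [record[attribute] for record in candidate]
    return (max(values) - min(values) + 1) * len(values)


# Hypothetical candidate of three records with ages 20, 20 and 21.
candidate = [{"age": 20}, {"age": 20}, {"age": 21}]
loss = information_loss(candidate, "age")
print(loss)  # (21 - 20 + 1) * 3 = 6
```

Because of the “+ 1” term, a group whose records all share a single value still has a loss equal to its number of records, so the amount grows with both the width of the range and the size of the group.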

If the attribute value is a character string such as an address or gender, the information loss amount calculation unit 150 may calculate the information loss amount of a categorical value (character string) as shown in Non-Patent Document 2. For example, the information loss amount calculation unit 150 may calculate information loss amount = (number of original user information records) / (number of specific attribute values when grouped). For example, referring to FIG. 2 and FIG. 3, in the case of the anonymized user information record 521 whose group number 529 is “1”, the information loss amount calculation unit 150 calculates information loss amount = 3/1 = 3. Here, the original user information records are the three user information records 511 whose numbers 519 are “1”, “2”, and “3”. The number of specific attribute values when grouped is one, namely the single attribute value “20-21” of the age 512 in the group whose group number 529 is “1”.
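The ratio used in this example can be sketched as follows. This is one possible reading of the text, in which the “number of specific attribute values when grouped” is taken to be the number of distinct generalized values in the group; the function name and data layout are hypothetical and are not taken from Non-Patent Document 2.

```python
def categorical_information_loss(original_records, grouped_values):
    """Number of original records divided by the number of distinct
    attribute values remaining after grouping."""
    distinct = set(grouped_values)
    if not distinct:
        raise ValueError("group has no attribute values")
    return len(original_records) / len(distinct)


# Three original records collapse to the single grouped value "20-21".
originals = [{"number": 1}, {"number": 2}, {"number": 3}]
grouped_values = ["20-21", "20-21", "20-21"]
categorical_loss = categorical_information_loss(originals, grouped_values)
print(categorical_loss)  # 3 / 1 = 3.0
```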

=== Anonymization Group Creation Unit 160 ===
The anonymization group creation unit 160 creates the anonymized user information record 521 and stores the created anonymized user information record 521 in the anonymized user information storage unit 520. Note that the anonymization group creation unit 160 does not store anything in the anonymized user information storage unit 520 when it cannot create an anonymized user information record 521 that ensures anonymity.

The anonymization group creation unit 160 creates the anonymized user information record 521 when triggered by the anonymization process execution start instruction received by the anonymization execution reception unit 130.

Specifically, the anonymization group creation unit 160 creates the anonymized user information record 521 as follows.

First, the anonymization group creation unit 160 passes the user information record 511 and the property record 111 to the focus partial anonymization group creation unit 140 and instructs the creation of the focus partial anonymization group. Then, the anonymization group creation unit 160 receives the focus partial anonymization group from the focus partial anonymization group creation unit 140 as a response to the creation instruction.

Secondly, the anonymization group creation unit 160 creates one or more anonymization groups from the user information records 511 other than those corresponding to the focus partial anonymization group created by the focus partial anonymization group creation unit 140.

Thirdly, the anonymization group creation unit 160 creates anonymized user information records 521 corresponding to the focus partial anonymization group received from the focus partial anonymization group creation unit 140 and to the anonymization groups it created itself, and stores them in the anonymized user information storage unit 520. The anonymized user information record 521 corresponding to each of the focus partial anonymization group and the anonymization groups is obtained by collecting the user information records 511 included in that group and assigning a group number 529.
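Collecting the records of one group into an anonymized user information record might look like the sketch below. The output layout is an assumption modeled on the description of the anonymized user information record 521 (1), which has an age of 20 to 21 and the conditions heart disease, fracture, and infection. The function name, key names, and the “low-high” label format are hypothetical, and only the first sample record is stated in the text.

```python
def build_anonymized_record(group_number, group):
    """Collapse a group of user records into one anonymized record."""
    ages = [record["age"] for record in group]
    low, high = min(ages), max(ages)
    # A single shared age is kept as-is; otherwise the range is shown.
    age_label = str(low) if low == high else f"{low}-{high}"
    return {
        "group_number": group_number,
        "age": age_label,
        "condition": [record["condition"] for record in group],
    }


group = [
    {"number": 1, "age": 20, "condition": "heart disease"},
    {"number": 2, "age": 20, "condition": "fracture"},   # hypothetical
    {"number": 3, "age": 21, "condition": "infection"},  # hypothetical
]
anonymized = build_anonymized_record(1, group)
print(anonymized)
```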

This completes the description of each component of the functional unit of the anonymization device 100.

Next, the components of the anonymization device 100 in hardware units will be described.

FIG. 6 is a diagram illustrating a hardware configuration of a computer 700 that realizes the anonymization apparatus 100 according to the present embodiment.

As shown in FIG. 6, the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. Furthermore, the computer 700 includes a recording medium (or storage medium) 707 supplied from the outside. The recording medium 707 may be a non-volatile recording medium that stores information non-temporarily.

The CPU 701 controls the overall operation of the computer 700 by running an operating system (not shown). Further, the CPU 701 reads a program (for example, a program that causes the computer 700 to execute the operation of the flowchart shown in FIG. 7 described later) and data from the recording medium 707 mounted on the storage device 703, and writes the read program and data to the storage unit 702. In accordance with the read program and based on the read data, the CPU 701 executes various processes as the anonymization execution reception unit 130, the focus partial anonymization group creation unit 140, the information loss amount calculation unit 150, and the anonymization group creation unit 160 shown in FIG. 1.

Note that the CPU 701 may download a program or data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).

The storage unit 702 stores programs and data. The storage unit 702 may include a property storage unit 110 and a focus value storage unit 120. Furthermore, when the computer 700 (anonymization apparatus 100) includes the user information storage unit 510 and the anonymized user information storage unit 520, the storage unit 702 may include these.

The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes a recording medium 707. The storage device 703 records the program so that it can be read by a computer. Further, the storage device 703 may record data so as to be readable by a computer. The storage device 703 may include a property storage unit 110 and a focus value storage unit 120. Furthermore, when the computer 700 (anonymization device 100) includes the user information storage unit 510 and the anonymized user information storage unit 520, the storage device 703 may include these.

The input unit 704 is realized by, for example, a mouse, a keyboard, a built-in key button, and the like, and is used for an input operation. The input unit 704 is not limited to a mouse, a keyboard, and a built-in key button, and may be a touch panel, an accelerometer, a gyro sensor, a camera, or the like. The input unit 704 is included as part of the anonymization execution reception unit 130.

The output unit 705 is realized by a display, for example, and is used for confirming the output.

The communication unit 706 implements an interface with the user information storage unit 510 and the anonymized user information storage unit 520. The communication unit 706 is included as a part of the anonymization group creation unit 160.

As described above, the functional blocks of the anonymization device 100 shown in FIG. 1 are realized by the computer 700 having the hardware configuration shown in FIG. 6. However, the means for realizing each unit included in the computer 700 is not limited to the above. In other words, the computer 700 may be realized by one physically integrated device, or by two or more physically separated devices connected by wire or wirelessly.

Note that the recording medium 707 in which the above-described program code is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored in the recording medium 707. Alternatively, the CPU 701 may store the code of the program stored in the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of a recording medium 707 that stores a program (software) executed by the computer 700 (CPU 701) temporarily or non-temporarily.

This completes the description of each component of the computer 700 that implements the anonymization device 100 according to the present embodiment.

Next, the operation of this embodiment will be described in detail with reference to FIGS. 1 to 10.

FIG. 7 is a flowchart showing the operation of the anonymization device 100 of this embodiment. Note that the processing according to this flowchart may be executed under the above-described program control by the CPU. Each processing step is denoted by a symbol such as S601.

The anonymization execution reception unit 130 receives an anonymization process execution start instruction from a user of anonymization data (not shown) and outputs the instruction to the anonymization group creation unit 160 (S601).

Next, the anonymization group creation unit 160 obtains the user information record 511 from the user information storage unit 510 upon receiving the anonymization process execution start instruction (S602).

Next, the anonymization group creation unit 160 acquires the property record 111 having the parameter name 112 of “k” from the property storage unit 110 (S603).

Next, the anonymization group creation unit 160 outputs the user information records 511 acquired in S602 and the parameter value 113 (for example, “3”) of the property record 111 having the parameter name 112 of “k” acquired in S603 to the focus partial anonymization group creation unit 140, and instructs it to create the focus partial anonymization group (S604).

Next, the focus partial anonymization group creation unit 140 acquires the focus record 121 (for example, the quasi-identifier name 122 is “age” and the focus value 123 is “21”) from the focus value storage unit 120 (S605).

Next, the focus partial anonymization group creation unit 140 creates a focus partial anonymization group candidate based on the acquired focus record 121 (S606).

FIG. 8 is a diagram illustrating an example in which the focus partial anonymization group creation unit 140 selects the user information records 511 when creating focus partial anonymization group candidates. As shown in FIG. 8, the focus partial anonymization group creation unit 140 starts from the user information record 511 whose age 512 is “21”, treats three records taken in the direction of decreasing age as one group, and creates a focus partial anonymization group candidate from them. Likewise, the focus partial anonymization group creation unit 140 treats three records taken in the direction of increasing age as one group and creates another focus partial anonymization group candidate. The focus partial anonymization group creation unit 140 may further create a focus partial anonymization group candidate from three records centered on the user information record 511 whose age 512 is “21”. FIG. 9 is a diagram illustrating an example in which the focus partial anonymization group creation unit 140 creates one focus partial anonymization group candidate by collecting three records in the direction of decreasing age. FIG. 10 is a diagram illustrating an example in which the focus partial anonymization group creation unit 140 creates one focus partial anonymization group candidate by collecting three records in the direction of increasing age.
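The selection described for FIG. 8 can be sketched as windows of k records over the records sorted by age. This is a minimal illustration under several assumptions that are not in the text: records are dictionaries, the sample ages are hypothetical, the windows are anchored on the first and last record holding the focus value, and the centered window places k // 2 records below the focus record. The function name and the `include_centered` flag are likewise hypothetical.

```python
def directional_candidates(records, attribute, focus_value, k, include_centered=False):
    """Create candidates of k records around the focus value."""
    ordered = sorted(records, key=lambda r: r[attribute])
    positions = [i for i, r in enumerate(ordered) if r[attribute] == focus_value]
    if not positions:
        return []
    first, last = positions[0], positions[-1]
    candidates = []
    # Decreasing direction: the k records ending at the focus record.
    if first - (k - 1) >= 0:
        candidates.append(ordered[first - k + 1:first + 1])
    # Increasing direction: the k records starting at the focus record.
    if last + k <= len(ordered):
        candidates.append(ordered[last:last + k])
    # Optional window with the focus record in the middle.
    if include_centered:
        start = first - k // 2
        if start >= 0 and start + k <= len(ordered):
            candidates.append(ordered[start:start + k])
    return candidates


# Hypothetical records; only the ages matter for this sketch.
records = [{"number": n, "age": a} for n, a in enumerate([20, 20, 21, 22, 23], start=1)]
found = directional_candidates(records, "age", 21, 3, include_centered=True)
ages_per_candidate = [[r["age"] for r in c] for c in found]
print(ages_per_candidate)
```

With these sample ages, the decreasing-direction and increasing-direction candidates span the same age ranges as the ones used in the loss calculations for FIG. 9 and FIG. 10.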

Next, the focus partial anonymization group creation unit 140 transmits the created focus partial anonymization group candidates to the information loss amount calculation unit 150 (S607).

Next, the information loss amount calculation unit 150 calculates an information loss amount for each received focus partial anonymization group candidate (S608).

For example, the information loss amount calculation unit 150 calculates the information loss amounts of the focus partial anonymization group candidates shown in FIGS. 9 and 10 using the above-described information loss amount calculation formula as follows.

The information loss amount of the focus partial anonymization group candidate shown in FIG. 9 is (21 - 20 + 1) × 3 = 6.

The information loss amount of the focus partial anonymization group candidate shown in FIG. 10 is (23 - 21 + 1) × 3 = 9.

Next, based on the information loss amounts received from the information loss amount calculation unit 150, the focus partial anonymization group creation unit 140 determines the focus partial anonymization group candidate with the smallest information loss amount (for example, the candidate shown in FIG. 9, whose information loss amount is "6") as the focus partial anonymization group. Subsequently, the focus partial anonymization group creation unit 140 outputs the determined focus partial anonymization group to the anonymization group creation unit 160 (S609).
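Steps S608 and S609 can be sketched as follows. The loss function assumes the form suggested by the two worked values, namely the width of the attribute range (max - min + 1) multiplied by the number of records in the candidate. The function names and the concrete ages are hypothetical; only the ranges (20 to 21 and 21 to 23) and the record count of three come from the text. How ties are resolved is not stated in the source; here the first candidate wins.

```python
from typing import Dict, List


def information_loss(values: List[int]) -> int:
    """Width of the value range times the number of records."""
    return (max(values) - min(values) + 1) * len(values)


def pick_focus_group(candidates: Dict[str, List[int]]) -> str:
    """Return the name of the candidate with the smallest information loss."""
    return min(candidates, key=lambda name: information_loss(candidates[name]))


# Hypothetical ages chosen only to match the ranges of FIG. 9 and FIG. 10.
candidates = {
    "FIG. 9": [20, 21, 21],
    "FIG. 10": [21, 22, 23],
}
losses = {name: information_loss(ages) for name, ages in candidates.items()}
chosen = pick_focus_group(candidates)
```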

Next, the anonymization group creation unit 160 creates one or more anonymization groups from the user information records 511 other than those corresponding to the focus partial anonymization group created by the focus partial anonymization group creation unit 140 (S610).

Next, the anonymization group creation unit 160 creates anonymized user information records 521 corresponding to each of the focus partial anonymization group received from the focus partial anonymization group creation unit 140 and the anonymization groups it created itself, and stores them in the anonymized user information storage unit 520 (S611).
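As an illustration of S611, the sketch below turns groups of records into anonymized records by replacing the quasi-identifier of every record in a group with the group's value range and attaching a group number. The "low-high" range notation, the field names, and the sample records are assumptions made for illustration; the source does not specify how the abstracted value is represented.

```python
from typing import Dict, List

Record = Dict[str, object]


def to_anonymized_records(groups: List[List[Record]], attr: str) -> List[Record]:
    """Abstract attr to the group's range and tag each record with a group number."""
    anonymized = []
    for group_number, group in enumerate(groups, start=1):
        low = min(r[attr] for r in group)
        high = max(r[attr] for r in group)
        label = str(low) if low == high else f"{low}-{high}"
        for record in group:
            copy = dict(record)  # leave the original record untouched
            copy[attr] = label
            copy["group"] = group_number
            anonymized.append(copy)
    return anonymized


# Hypothetical groups: the focus partial anonymization group and one other group.
focus_group = [{"age": 20, "condition": "A"}, {"age": 21, "condition": "B"},
               {"age": 21, "condition": "C"}]
other_group = [{"age": 30, "condition": "D"}, {"age": 34, "condition": "E"},
               {"age": 38, "condition": "F"}]
result = to_anonymized_records([focus_group, other_group], "age")
```

In this example the focus group keeps the narrow range 20-21, while the other group is abstracted to the wider range 30-38.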

The above is the description of the operation of the present embodiment.

The anonymized user information records 521 created as described above have been anonymized with priority given to minimizing the abstraction level of the quasi-identifier (age 512) corresponding to the focus value 123 specified by the user of the anonymized data. That is, the anonymization device 100 according to the present embodiment can minimize the abstraction level of the quasi-identifier corresponding to the designated focus value 123.

By using an anonymized data set (a set of anonymized user information records 521) whose abstraction level has been lowered locally, it is possible to examine in detail the vicinity of a meaningful value, for example the postal code of a disaster area or a date of birth at which the curriculum guidelines changed significantly.

The effect of the present embodiment described above is that it is possible to obtain anonymized data so as to preferentially lower the abstraction level of the attribute value to be focused on locally.

The reason is that the following configuration is included. That is, first, the focus partial anonymization group creation unit 140 creates a focus partial anonymization group candidate. Secondly, the information loss amount calculation unit 150 calculates the information loss amount of each focus partial anonymization group candidate. Thirdly, the focus partial anonymization group creation unit 140 determines a focus partial anonymization group candidate corresponding to the smallest amount of information loss as a focus partial anonymization group.

<< Second Embodiment >>
Next, a second embodiment of the present invention will be described in detail with reference to the drawings. Hereinafter, the description overlapping with the above description is omitted as long as the description of the present embodiment is not obscured.

FIG. 11 is a block diagram showing the configuration of the anonymization system according to the second embodiment of the present invention.

Referring to FIG. 11, the anonymization system according to the present embodiment includes an anonymization device 200, a user information storage unit 510, and an anonymized user information storage unit 520.

The anonymization device 200, the user information storage unit 510, and the anonymized user information storage unit 520 are connected by a network (not shown). Note that the user information storage unit 510 may be included in the anonymization device 200. Further, the anonymized user information storage unit 520 may be included in the anonymization device 200.

The anonymization device 200 anonymizes the user information stored in the user information storage unit 510 and stores the anonymized user information in the anonymized user information storage unit 520.

Referring to FIG. 11, the anonymization device 200 according to the present embodiment further includes a division value storage unit 270 as compared with the anonymization device 100 according to the first embodiment. Moreover, the anonymization device 200 has a focus partial anonymization group creation unit 240 instead of the focus partial anonymization group creation unit 140 of the anonymization device 100 of the first embodiment.

FIG. 12 is a diagram showing an example of the user information record 511 in the present embodiment. As shown in FIG. 12, the user information record 511 in the present embodiment further includes a consultation date 514 as a quasi-identifier as compared to the user information record 511 in FIG.

=== Division Value Storage Unit 270 ===
The division value storage unit 270 holds information indicating attribute values for dividing user information.

FIG. 13 is a diagram illustrating an example of the division value record 271 stored in the division value storage unit 270. As illustrated in FIG. 13, the division value storage unit 270 includes one or more division value records 271. The division value record 271 includes a quasi-identifier name 272 and a division value 273. The information included in the division value record 271 is input to the anonymization device 200 in advance by a user of the anonymized data (not shown). Note that the information included in the division value record 271 may instead be included in the anonymization process execution start instruction and input to the anonymization device 200.

Referring to FIG. 13, in the division value record 271, the quasi-identifier name 272 is "visit date", and the division value 273 is "November 30, 2011, December 1, 2011". Therefore, the division value record 271 indicates that the user information is divided into user information records 511 dated on or before November 30, 2011 and user information records 511 dated on or after December 1, 2011.

=== Focus partial anonymization group creation unit 240 ===
The focus partial anonymization group creation unit 240 creates a focus partial anonymization group from the user information records 511 stored in the user information storage unit 510, based on the focus record 121 stored in the focus value storage unit 120 and the division value record 271 stored in the division value storage unit 270.

Specifically, the focus partial anonymization group creation unit 240 divides the user information records 511 shown in FIG. 12 based on the division value record 271 to create a plurality of division groups. Next, the focus partial anonymization group creation unit 240 executes steps S606 to S609 in FIG. 7 for each division group of the user information records 511, in the same manner as the focus partial anonymization group creation unit 140 of the first embodiment, and creates a focus partial anonymization group.

For example, the focus partial anonymization group creation unit 240 divides the user information records 511 in FIG. 12 into the user information records 511 whose numbers 519 are "1", "3", "4", and "6" and the user information records 511 whose numbers 519 are "2", "5", and "7".

Next, the focus partial anonymization group creation unit 240 executes steps S606 to S609, and outputs the focus partial anonymization group created without crossing the division value 273.
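The division step can be sketched as a simple partition of the records at the division value, after which candidate creation runs separately inside each part so that no group straddles the boundary. The function name `split_at`, the field names, and the visit dates below are hypothetical; the dates were chosen only so that records 1, 3, 4, and 6 fall on one side and records 2, 5, and 7 on the other, as in the example above.

```python
from datetime import date
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_at(records: List[Record], attr: str,
             later_side_start: date) -> Tuple[List[Record], List[Record]]:
    """Partition records into those before the boundary and those on or after it."""
    earlier = [r for r in records if r[attr] < later_side_start]
    later = [r for r in records if r[attr] >= later_side_start]
    return earlier, later


# Hypothetical visit dates for the seven records.
records = [
    {"no": 1, "visit": date(2011, 11, 2)},
    {"no": 2, "visit": date(2011, 12, 5)},
    {"no": 3, "visit": date(2011, 11, 15)},
    {"no": 4, "visit": date(2011, 11, 30)},
    {"no": 5, "visit": date(2011, 12, 1)},
    {"no": 6, "visit": date(2011, 10, 20)},
    {"no": 7, "visit": date(2011, 12, 24)},
]
earlier, later = split_at(records, "visit", date(2011, 12, 1))
earlier_numbers = [r["no"] for r in earlier]
later_numbers = [r["no"] for r in later]
```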

Then, the anonymization group creation unit 160 stores the anonymized user information records 521 corresponding to the focus partial anonymization group created without straddling the division value 273 in the anonymized user information storage unit 520.

With the above configuration, it becomes possible, for example, to create anonymized user information records 521 in which the user information is anonymized separately before and after a particular day, such as the day a certain guideline was established or the day use of a new drug was permitted.

The division value record 271 is not limited to the example described above, and may specify division at an arbitrary attribute value included in the user information record 511. There may also be a plurality of division value records 271.

The effect of the present embodiment described above is that, in addition to the effect of the first embodiment, anonymized data in which the abstraction level of the attribute value of interest is preferentially lowered locally can be obtained for user information records 511 restricted to a narrower range.

The reason is that the focus partial anonymization group creation unit 240 divides the user information records 511 based on the division value record 271.

<< Third Embodiment >>
Next, a third embodiment of the present invention will be described in detail with reference to the drawings. Hereinafter, the description overlapping with the above description is omitted as long as the description of the present embodiment is not obscured.

FIG. 14 is a block diagram showing the configuration of the anonymization system according to the third embodiment of the present invention.

Referring to FIG. 14, the anonymization system according to this embodiment includes an anonymization device 300, a user information storage unit 510, and an anonymized user information storage unit 520. 3 is a block diagram showing a configuration of an anonymization device 300. FIG.

The anonymization device 300, the user information storage unit 510, and the anonymized user information storage unit 520 are connected by a network (not shown). Note that the user information storage unit 510 may be included in the anonymization device 300. Further, the anonymized user information storage unit 520 may be included in the anonymization device 300.

The anonymization device 300 anonymizes the user information stored in the user information storage unit 510 and stores the anonymized user information in the anonymized user information storage unit 520.

Referring to FIG. 14, the anonymization device 300 in the present embodiment differs from the anonymization device 100 of the first embodiment in that it includes a focus partial anonymization group creation unit 340 instead of the focus partial anonymization group creation unit 140, and an anonymization group creation unit 360 instead of the anonymization group creation unit 160.

FIG. 15 is a diagram illustrating an example of the focus record 121 stored in the focus value storage unit 120 according to the present embodiment. As shown in FIG. 15, the focus value storage unit 120 of this embodiment includes a plurality of focus records 121. The focus record 121 of this embodiment further includes a priority 128 in addition to the quasi-identifier name 122 and the focus value 123.

The priority 128 is information indicating the order of the focus records 121 when a focus partial anonymization group is created. The priority 128 may be information indicating the weight of the focus record 121.

=== Focus partial anonymization group creation unit 340 ===
The focus partial anonymization group creation unit 340 has the following differences with respect to the focus partial anonymization group creation unit 140.

Upon receiving an instruction from the anonymization group creation unit 360, the focus partial anonymization group creation unit 340 creates a focus partial anonymization group using the focus records 121 in order of priority 128, adds completion information to the focus partial anonymization group, and outputs it to the anonymization group creation unit 360. Here, the completion information indicates whether focus partial anonymization group candidates including each of the focus records 121 have been created for all focus records 121 ("complete") or not ("incomplete").

=== Anonymization group creation unit 360 ===
The anonymization group creation unit 360 has the following differences with respect to the anonymization group creation unit 160.

The anonymization group creation unit 360 receives from the focus partial anonymization group creation unit 340 a focus partial anonymization group to which information indicating whether or not an unused focus record 121 remains is added. Then, the anonymization group creation unit 360 creates the anonymized user information record 521 in the same manner as the anonymization group creation unit 160.

Next, the anonymization group creation unit 360 checks the completion information. When the completion information is "incomplete", the anonymization group creation unit 360 passes the created anonymized user information record 521 to the focus partial anonymization group creation unit 340 and instructs it to create a focus partial anonymization group again. When the completion information is "complete", the anonymization group creation unit 360 stores the created anonymized user information record 521 in the anonymized user information storage unit 520.

That is, the anonymization device 300 according to the present embodiment allows a focus value 123 to be specified for each of a plurality of quasi-identifiers, whether their quasi-identifier names 122 differ or are the same.

Next, the operation of this embodiment will be described with reference to the drawings.

FIG. 16 is a flowchart showing the operation of the present embodiment.

S601 to S603 are the same operations as S601 to S603 in FIG.

Next, the anonymization group creation unit 360 outputs either the anonymized user information record 521 or the user information record 511, together with the parameter value 113, to the focus partial anonymization group creation unit 340, and instructs it to create a focus partial anonymization group (S634). Here, the record that is output is the anonymized user information record 521 created in S641 when the process has returned from S642, or the user information record 511 acquired in S602 on the first pass. The parameter value 113 is the parameter value 113 of the property record 111, acquired in S603, whose parameter name 112 is "k".

S605 to S608 are the same operations as S605 to S608 in FIG.

Next, based on the information loss amounts received from the information loss amount calculation unit 150, the focus partial anonymization group creation unit 340 determines the focus partial anonymization group candidate with the smallest information loss amount as the focus partial anonymization group. Subsequently, the focus partial anonymization group creation unit 340 adds completion information to the determined focus partial anonymization group and outputs it to the anonymization group creation unit 360 (S639).

S610 is the same operation as S610 in FIG.

Next, the anonymization group creation unit 360 creates an anonymized user information record 521 corresponding to the focus partial anonymization group received from the focus partial anonymization group creation unit 340 and the anonymization group created by itself. (S641).

Next, the anonymization group creation unit 360 confirms the completion information (S642). If the completion information is “incomplete” (NO in S642), the process returns to S634.

If the completion information is "complete" (YES in S642), the anonymization group creation unit 360 stores the anonymized user information record 521 created in S641 in the anonymized user information storage unit 520 (S643).
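The control flow of S634 to S643 can be sketched as a loop that consumes the focus records 121 one at a time in priority order and repeats until the completion information becomes "complete". The sketch below models only that loop; the group creation and anonymization steps are omitted, the function name is hypothetical, and treating a smaller priority number as higher priority is an assumption not stated in the source.

```python
from typing import Dict, List, Tuple

FocusRecord = Dict[str, object]


def process_focus_records(focus_records: List[FocusRecord]) -> List[Tuple[str, str]]:
    """Return (quasi-identifier name, completion information) for each pass."""
    # Assumption: a smaller priority number means the record is used earlier.
    pending = sorted(focus_records, key=lambda f: f["priority"])
    trace: List[Tuple[str, str]] = []
    if not pending:
        return trace
    completion = "incomplete"
    while completion == "incomplete":   # S642: "incomplete" returns to S634
        focus = pending.pop(0)          # S605: next focus record in priority order
        # S606 to S641 (candidate creation, loss calculation, anonymization)
        # would run here for this focus record; omitted in this sketch.
        completion = "incomplete" if pending else "complete"
        trace.append((str(focus["name"]), completion))
    return trace                        # S643 follows once "complete" is reached


# Hypothetical focus records with priorities.
focus_records = [
    {"name": "gender", "value": "female", "priority": 2},
    {"name": "age", "value": 21, "priority": 1},
]
trace = process_focus_records(focus_records)
```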

The above is the description of the operation of the present embodiment.

With the above configuration, for example, focus values 123 can be set for the quasi-identifiers of date of birth and gender so that attention is paid to female patients of the age at which use of a certain new drug is permitted, and the incidence of cervical cancer among them can be examined.

The focus record 121 need not include the priority 128. In this case, the focus partial anonymization group creation unit 340 may use the focus records 121 in ascending or descending order of their addresses in the focus value storage unit 120. Further, the focus partial anonymization group creation unit 340 may use the plurality of focus records 121 in a fixed order predetermined for the quasi-identifier names 122, or in an arbitrary order.

In addition to the effect of the first embodiment, the effect of the present embodiment described above is that anonymized data can be obtained in which the abstraction level of the attribute values of interest is preferentially lowered locally from a plurality of viewpoints.

The reason is that the following configuration is included. First, the focus partial anonymization group creation unit 340 creates focus partial anonymization groups by sequentially using a plurality of focus records 121. Second, the anonymization group creation unit 360 repeatedly instructs the focus partial anonymization group creation unit 340 to create a focus partial anonymization group for the created anonymized user information record 521 until the focus partial anonymization group creation unit 340 has used all the focus records 121.

<< Fourth Embodiment >>
Next, a fourth embodiment of the present invention will be described in detail with reference to the drawings. Hereinafter, the description overlapping with the above description is omitted as long as the description of the present embodiment is not obscured.

FIG. 17 is a block diagram showing a configuration of an anonymization apparatus 400 according to the fourth embodiment of the present invention.

Referring to FIG. 17, the anonymization device 400 includes a focus partial anonymization group creation unit 140 and an information loss amount calculation unit 150.

=== Focus Partial Anonymization Group Creation Unit 140 ===
The focus partial anonymization group creation unit 140 acquires a plurality of user information records 511 including arbitrary attribute values (for example, age 512 and medical condition 513 shown in FIG. 6, and consultation date 514 shown in FIG. 12) from user information storage means (not shown). The user information storage means may be included in the anonymization device 400, or may be a user information storage unit 510 as shown in FIG.

The focus partial anonymization group creation unit 140 creates a focus partial anonymization group candidate in which a plurality of user information records 511 are grouped. The plurality of user information records 511 include at least an attention attribute value. The attention attribute value is one of the above-described arbitrary attribute values, and is an attribute value specified by the quasi-identifier name 122 and the focus value 123 included in the focus record 121.

Also, the focus partial anonymization group creation unit 140 determines, among the created focus partial anonymization group candidates, the candidate corresponding to the smallest information loss amount as the focus partial anonymization group, and outputs it.

=== Information Loss Calculation Unit 150 ===
The information loss amount calculation unit 150 calculates and outputs the information loss amount of each focus partial anonymization group candidate created by the focus partial anonymization group creation unit 140. The information loss amount indicates how much of the information obtainable from the user information records 511 corresponding to a focus partial anonymization group candidate is lost (decreased) in the information obtainable from that focus partial anonymization group candidate.

Each component described in each of the above embodiments does not necessarily need to be an independent entity. For example, a plurality of components may be realized as one module. In addition, one component may be realized by a plurality of modules. A certain component may be a part of another component. A part of a certain component may overlap a part of another component.

In the embodiments described above, each component and the module that realizes each component may be realized by hardware if necessary. Moreover, each component and the module that realizes each component may be realized by a computer and a program. Each component and the module that realizes each component may also be realized by a mixture of hardware modules, computers, and programs.

The program is provided by being recorded on a non-volatile computer-readable recording medium such as a magnetic disk or a semiconductor memory, and is read by the computer when the computer is started up. The read program causes the computer to function as a component in each of the above-described embodiments by controlling the operation of the computer.

In each of the embodiments described above, a plurality of operations are described in order in the form of a flowchart. However, the order of description does not limit the order in which the plurality of operations are executed. For this reason, when each embodiment is implemented, the order of the plurality of operations can be changed within a range that does not hinder the contents.

Furthermore, in each embodiment described above, a plurality of operations are not limited to being executed at different timings. For example, another operation may occur during the execution of a certain operation, or the execution timing of a certain operation and another operation may partially or entirely overlap.

Furthermore, in each of the embodiments described above, a certain operation is described as triggering another operation, but that description does not limit all relationships between the certain operation and other operations. For this reason, when each embodiment is implemented, the relationship between the plurality of operations can be changed within a range that does not hinder the contents. The specific description of each operation of each component does not limit each operation of each component. For this reason, each specific operation of each component may be changed within a range that does not impair the functional, performance, and other characteristics in implementing each embodiment.

As mentioned above, although this invention was demonstrated with reference to each embodiment and an Example, this invention is not limited to the said embodiment and Example. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2012-045548 filed on March 1, 2012, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
100 Anonymization device
110 Property storage unit
111 Property record
112 Parameter name
113 Parameter value
120 Focus value storage unit
121 Focus record
122 Quasi-identifier name
123 Focus value
128 Priority
130 Anonymization execution reception unit
140 Focus partial anonymization group creation unit
150 Information loss amount calculation unit
160 Anonymization group creation unit
200 Anonymization device
240 Focus partial anonymization group creation unit
270 Division value storage unit
271 Division value record
272 Quasi-identifier name
273 Division value
300 Anonymization device
340 Focus partial anonymization group creation unit
360 Anonymization group creation unit
400 Anonymization device
510 User information storage unit
511 User information record
512 Age
513 Medical condition
514 Consultation date
519 Number
520 Anonymized user information storage unit
521 Anonymized user information record
529 Group number
700 Computer
701 CPU
702 Storage unit
703 Storage device
704 Input unit
705 Output unit
706 Communication unit
707 Recording medium

Claims

An information processing apparatus comprising:
focus partial anonymization group creation means for acquiring a plurality of user information records including arbitrary attribute values, and creating focus partial anonymization group candidates in each of which a plurality of the user information records, including at least an attention attribute value that is a specific one of the attribute values, are grouped; and
information loss amount calculation means for calculating an information loss amount indicating the amount of information obtained from a focus partial anonymization group candidate that is lost relative to the information obtained from the user information records corresponding to that focus partial anonymization group candidate,
wherein the focus partial anonymization group creation means determines, among the created focus partial anonymization group candidates, the candidate corresponding to the smallest information loss amount as a focus partial anonymization group, and outputs it.
The information processing apparatus according to claim 1, wherein the information loss amount calculation means calculates the information loss amount based on the range of the attribute values having the same attribute name as the attention attribute value in the user information records corresponding to each of the focus partial anonymization group candidates, and on the number of those user information records.
The information processing apparatus according to claim 1 or 2, wherein the focus partial anonymization group creation means creates the focus partial anonymization group candidates by grouping at least k user information records, k being the parameter of k-anonymity, including at least the user information record that includes the attention attribute value.
The information processing apparatus according to claim 1, wherein the focus partial anonymization group creation means
divides the plurality of user information records based on division information indicating an attribute value at which the plurality of user information records are to be divided, and
creates the focus partial anonymization group candidates by grouping a plurality of the user information records including at least the attention attribute value within the range of the divided user information records.
The information processing apparatus according to any one of claims 1 to 4, further comprising anonymization group creation means for instructing the focus partial anonymization group creation means to create the focus partial anonymization group, acquiring the focus partial anonymization group, creating anonymization groups from the user information records other than the user information records corresponding to the focus partial anonymization group, and creating anonymized user information records corresponding to each of the focus partial anonymization group and the anonymization groups,
wherein, each time it receives an instruction to create the focus partial anonymization group, the focus partial anonymization group creation means creates, sequentially for each of a plurality of attention attribute values, the focus partial anonymization group candidates including at least one of the plurality of attention attribute values, and
wherein the anonymization group creation means instructs the focus partial anonymization group creation means to create the focus partial anonymization group for the created anonymized user information records until the focus partial anonymization group creation means has created focus partial anonymization group candidates including the attention attribute value for all of the plurality of attention attribute values.
An anonymization method in which a computer:
acquires a plurality of user information records including arbitrary attribute values;
creates focus partial anonymization group candidates in each of which a plurality of the user information records, including at least an attention attribute value that is a specific one of the attribute values, are grouped;
calculates an information loss amount indicating the amount of information obtained from a focus partial anonymization group candidate that is lost relative to the information obtained from the user information records corresponding to that focus partial anonymization group candidate; and
determines, among the created focus partial anonymization group candidates, the candidate corresponding to the smallest information loss amount as a focus partial anonymization group, and outputs it.
A non-volatile recording medium on which is recorded a program for causing a computer to execute processing of:
acquiring a plurality of user information records including arbitrary attribute values;
creating focus partial anonymization group candidates in each of which a plurality of the user information records, including at least an attention attribute value that is a specific one of the attribute values, are grouped;
calculating an information loss amount indicating the amount of information obtained from a focus partial anonymization group candidate that is lost relative to the information obtained from the user information records corresponding to that focus partial anonymization group candidate; and
determining, among the created focus partial anonymization group candidates, the candidate corresponding to the smallest information loss amount as a focus partial anonymization group, and outputting it.
The information processing apparatus according to any one of claims 1 to 6,
User information storage means for storing the user information record;
An anonymized user information storage means for storing an anonymized user information record included in the focus partial anonymization group.