WO2013183250A1

WO2013183250A1 - Information processing device for anonymization and anonymization method

Info

Publication number: WO2013183250A1
Application number: PCT/JP2013/003347
Authority: WO
Inventors: 翼高橋
Original assignee: 日本電気株式会社
Priority date: 2012-06-04
Filing date: 2013-05-28
Publication date: 2013-12-12
Also published as: JPWO2013183250A1

Abstract

This invention provides an anonymization device for generating a dataset with reinforced anonymity such that individual specificity cannot be enhanced even by drawing parallels between quasi-identifiers of a group of records subsequent to k-anonymization. The anonymization device is provided with: a k-anonymization means for generating a k-anonymized dataset by converting intrinsic identification information of an anonymization-target dataset to false identification information and, in addition, converting a quasi-identifier to satisfy a determined k-anonymity; and an anonymity reinforcement means for generating and outputting an anonymity-reinforced dataset by converting quasi-identifiers that always have the same value corresponding to the intrinsic identification information for each same piece of false identification information of the k-anonymized dataset to quasi-identifiers for which instantiation is not possible by drawing parallels between these quasi-identifiers.

Description

Information processing apparatus and anonymization method for anonymization

The present invention relates to an information processing apparatus, anonymization method, and program for processing information and anonymizing it.

Various related technologies for anonymizing information are known.

For example, Non-Patent Document 1 proposes k-anonymity, a well-known anonymity index. A technique for making a data set to be anonymized satisfy a predetermined k-anonymity is called k-anonymization. The attribute information to be converted in the anonymization target data set is referred to as a quasi-identifier. A quasi-identifier is not unique identification information (for example, a name) that by itself identifies an individual; it is an attribute that may identify an individual when combined with other information that is likewise not unique identification information. In k-anonymization, the target quasi-identifiers are converted so that at least k records having the same quasi-identifier values exist in the data set to be anonymized. That is, the target quasi-identifiers are converted into information from which it is difficult to identify (or single out) an individual, so as to satisfy k-anonymity. Anonymization processes such as generalization and cutoff are known as this conversion. In generalization, the original detailed (specific) information of a quasi-identifier is converted into more abstract information.
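As an illustration of generalization, the following minimal Python sketch replaces an exact date of birth with a range of years. The function name `generalize_birth_date`, the `width` parameter, and the way the year buckets are aligned are assumptions made for this example; the related art does not prescribe any particular bucketing.

```python
from datetime import date


def generalize_birth_date(birth_date: date, width: int) -> tuple[int, int]:
    """Return the inclusive (first_year, last_year) bucket of `width` years containing birth_date."""
    if width < 1:
        raise ValueError("width must be a positive number of years")
    # Buckets are aligned so that each one ends on a multiple of `width`
    # (an assumption chosen only for illustration).
    first_year = (birth_date.year - 1) // width * width + 1
    return first_year, first_year + width - 1


wide = generalize_birth_date(date(1985, 2, 2), 5)
narrow = generalize_birth_date(date(1985, 2, 2), 2)
print(wide, narrow)
```

A larger `width` yields a more abstract value, which makes it easier to reach k matching records at the cost of losing more detail.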

For example, Patent Document 1 discloses a privacy protection device including data processing means for processing data until the above k-anonymity is satisfied.

JP 2011-180839 A

However, the techniques described in the above prior art documents have the problem that, when the quasi-identifiers of k-anonymized record groups are compared with each other, the individual specificity, which indicates the degree to which an individual can be identified, may increase.

The reason is that neither the k-anonymity described in Non-Patent Document 1 nor the privacy protection device of Patent Document 1 considers the case where the data set to be anonymized includes a plurality of records having the same unique identification information.

Specifically, privacy protection by k-anonymity may fail as follows. First, the data set to be anonymized includes a plurality of records having the same unique identification information. Second, the anonymization target data set is k-anonymized while a specific connection relationship between the plurality of records having the same unique identification information is maintained. Third, the quasi-identifiers of the record groups after k-anonymization that can be related through that connection relationship are compared. Fourth, the quasi-identifiers abstracted by the k-anonymization are made concrete (instantiated) by that comparison.

A data set including a plurality of records having the same unique identification information, as described above, is stored in a predetermined recording medium. Such a data set includes historical information accumulated by service providers, such as purchase information and medical information. Purchase information and medical information are generally stored in a recording medium as a set of a plurality of records for one individual (user). For example, purchase information associated with a credit card number is generated every time a user makes a purchase using the same credit card. Such purchase information is associated with the user and stored in a recording medium as a record. Similarly, medical information is generated every time medical treatment is received using the same insurance card. The medical information associated with the same insured person is then accumulated in the recording medium.

An example of the case where privacy protection by k-anonymity fails as described above will be described with specific data.

FIG. 2 is a diagram illustrating an example of an anonymization target data set. FIG. 3 is a diagram illustrating an example of a data set obtained by k-anonymizing the data set to be anonymized in FIG. 2.

The data set shown in FIG. 3 includes the attributes “gender”, “birth date”, and “care date” as quasi-identifiers. These quasi-identifiers are obtained by applying k = 2 anonymization to the anonymization target data set shown in FIG. 2. Also, the name (unique identification information) in each record of the anonymization target data set shown in FIG. 2 is converted to a fake ID (IDentifier) in the data set shown in FIG. 3. The fake ID is identification information local to the data set shown in FIG. 3; it shows only the relationship among the records of that data set and does not specify a particular individual.

The data set shown in FIG. 3 is designed so that no combination of knowledge about the “gender”, “birth date”, and “care date” of an individual can narrow that individual's records down to fewer than k. The data set shown in FIG. 3 is processed so that k-anonymity with k = 2 is maintained with respect to the quasi-identifiers of each record of the anonymization target data set shown in FIG. 2. That is, the data set shown in FIG. 3 is a k-anonymized data set with k = 2 in which two or more records can be associated with an individual regardless of what knowledge about “gender”, “birth date”, and “care date” is used.

The data set shown in FIG. 3 differs from the data sets handled in the background art as follows. Namely, the data set shown in FIG. 3 is obtained by anonymizing an anonymization target data set (FIG. 2) in which a plurality of records are stored for one individual. Specifically, the records of each individual in the anonymization target data set shown in FIG. 2 have a specific connection relationship, namely that they share the same name attribute (unique identification information). The difference is that, in the data set shown in FIG. 3, a plurality of records having the same unique identification information are stored in the recording medium linked by the anonymized fake ID. As described above, k-anonymization in the related technology does not consider that a plurality of records of one individual appear at the same time.

The data set shown in FIG. 3 does not truly satisfy k = 2 anonymity for each record. The reason is as follows.

The target record 822 shown in FIG. 2, which has the name “Alice” and the values “sex: female, date of birth: February 2, 1985, date of medical treatment: April 2010”, is processed into the anonymized record 832 shown in FIG. 3, which has the fake ID “2” and the values “sex: Any, date of birth: 1981-1985, date of medical treatment: April 2010”. Likewise, the target record 825 shown in FIG. 2, which has the name “Alice” and the values “sex: female, date of birth: February 2, 1985, date of medical treatment: May 2010”, is processed into the anonymized record 835 shown in FIG. 3, which has the fake ID “2” and the values “sex: female, date of birth: 1985-1986, date of medical treatment: May 2010”.

Suppose that a certain person x knows “sex: female, date of birth: February 2, 1985” as information about Alice. Even in that case, each of the anonymized records 832 and 835 having the fake ID “2” individually retains 2-anonymity with respect to “sex: female, date of birth: February 2, 1985”.

However, the attributes “sex” and “birth date” have attribute values that do not change for a given individual. It can therefore easily be inferred that anonymized records having the same fake ID had the same values for such invariant attributes in the target records before anonymization. Given this knowledge, the person x can combine the anonymized records based on the fake ID. In this case, the person x takes the intersection of the anonymized record 832 (“sex: Any, date of birth: 1981-1985”) and the anonymized record 835 (“sex: female, date of birth: 1985-1986”), both having the fake ID “2”, and obtains the instantiated information “sex: female, date of birth: 1985” for the fake ID “2”. As a result, the 2-anonymity of the fake ID “2” is broken.
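The comparison performed by the person x amounts to intersecting the generalized values of the records that share a fake ID. The following minimal Python sketch reproduces this for the anonymized records 832 and 835. Modeling each generalized value as a set of candidate values, modeling “Any” as the two values female and male, and the helper name `intersect_by_fake_id` are assumptions made for illustration.

```python
from functools import reduce

# Anonymized records 832 and 835 of FIG. 3, with each generalized value
# modeled as a set of candidate values (an assumption of this sketch).
records = [
    {"fake_id": 2, "sex": {"female", "male"}, "birth_year": set(range(1981, 1986))},  # 832
    {"fake_id": 2, "sex": {"female"}, "birth_year": {1985, 1986}},                    # 835
]


def intersect_by_fake_id(records, fake_id, attributes):
    """Intersect the candidate sets of the given attributes over all records with one fake ID."""
    same_id = [r for r in records if r["fake_id"] == fake_id]
    # Assumes at least one record carries the fake ID; reduce() fails on an empty group.
    return {a: reduce(set.intersection, (r[a] for r in same_id)) for a in attributes}


narrowed = intersect_by_fake_id(records, 2, ["sex", "birth_year"])
print(narrowed)
```

Each record on its own leaves several candidates, while the intersection leaves a single sex and a single birth year, which is the instantiation described above.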

An object of the present invention is to provide an information processing apparatus, anonymization method, and program for anonymization that can solve the above-described problems.

The information processing apparatus that performs anonymization according to the present invention includes a k-anonymization unit and an anonymity enhancing unit. For an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, the k-anonymization unit converts each piece of unique identification information into false identification information uniquely assigned to that piece of unique identification information, converts each target quasi-identifier into an anonymized quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converts the target records into anonymized records, and generates a k-anonymized data set including the anonymized records. For the k-anonymized data set, the anonymity enhancing unit converts the anonymized records having the same false identification information into enhanced records, and generates and outputs an anonymity-enhanced data set including the enhanced records. The anonymity enhancing unit converts the anonymized records into the enhanced records by converting each enhancement target quasi-identifier, which is an anonymized quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be instantiated by comparing the enhancement target quasi-identifiers with one another.

In the anonymization method of the present invention, for an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, each piece of unique identification information is converted into false identification information that contains no information for restoring the unique identification information and is uniquely assigned to that piece of unique identification information; each target quasi-identifier is converted into an anonymized quasi-identifier so that the anonymization target data set satisfies k-anonymity; the target records are thereby converted into anonymized records; and a k-anonymized data set including the anonymized records is generated. For the k-anonymized data set, the anonymized records having the same false identification information are converted into enhanced records, and an anonymity-enhanced data set including the enhanced records is generated and output. The anonymized records are converted into the enhanced records by converting each enhancement target quasi-identifier, which is an anonymized quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be instantiated by comparing the enhancement target quasi-identifiers with one another.

The program of the present invention, recorded on a non-volatile recording medium, causes a computer to execute the following processes. For an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, a first process converts each piece of unique identification information into false identification information uniquely assigned to that piece of unique identification information, converts each target quasi-identifier into an anonymized quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converts the target records into anonymized records, and generates a k-anonymized data set including the anonymized records. For the k-anonymized data set, a second process converts the anonymized records having the same false identification information into enhanced records and generates an anonymity-enhanced data set including the enhanced records. The process of converting the anonymized records into the enhanced records is a process of converting each enhancement target quasi-identifier, which is an anonymized quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be instantiated by comparing the enhancement target quasi-identifiers with one another.

The present invention has the effect that a data set with enhanced anonymity can be generated such that individual specificity cannot be increased even if the quasi-identifiers of record groups after k-anonymization are compared.

FIG. 1 is a block diagram showing the configuration of the anonymization apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating an example of the anonymization target data set in the first embodiment.
FIG. 3 is a diagram showing an example of a k-anonymization data set in the first embodiment.
FIG. 4 is a diagram illustrating an example of the anonymity enhancement data set in the first embodiment.
FIG. 5 is a block diagram illustrating a hardware configuration of a computer that implements the anonymization apparatus according to the first embodiment.
FIG. 6 is a flowchart showing the operation of the anonymization device according to the first embodiment.
FIG. 7 is a flowchart showing the operation of the anonymity enhancing unit in the first embodiment.
FIG. 8 is a flowchart showing the operation of the anonymity enhancing unit in the modification of the first embodiment.
FIG. 9 is a diagram illustrating an example of the anonymity enhancing data set according to the first embodiment.
FIG. 10 is a block diagram illustrating a configuration of the anonymization apparatus according to the second embodiment.
FIG. 11 is a diagram illustrating an example of the anonymization target data set in the second embodiment.
FIG. 12 is a diagram illustrating an image when the target records of the anonymization target data set are distributed to groups in the second embodiment.
FIG. 13 shows an example of the anonymization quasi-identifier when the anonymization target data set is k-anonymized in a combination of certain target records in the second embodiment.
FIG. 14 shows an example of the anonymization quasi-identifier when the anonymization target data set is k-anonymized in a combination of certain target records in the second embodiment.
FIG. 15 is information showing an example of the information loss amount corresponding to the combination of target records in the second embodiment.
FIG. 16 is a diagram illustrating an example of a k-anonymized data set according to the second embodiment.
FIG. 17 is a flowchart showing the operation of the anonymization apparatus according to the second embodiment.

Embodiments for carrying out the present invention will be described in detail with reference to the drawings. In each of the following embodiments and drawings, a general technique is adopted for a configuration not related to the essence of the present invention, and detailed description and illustration in this embodiment are omitted. In the following embodiments and drawings, the same reference numerals are given to components having similar functions.

<First Embodiment>
FIG. 1 is a block diagram showing a configuration of an anonymization apparatus (generally also called an information processing apparatus) 100 according to the first embodiment of the present invention.

Referring to FIG. 1, the anonymization device 100 according to the present embodiment includes a k-anonymization unit 110 and an anonymity enhancement unit 120.

The constituent elements shown in FIG. 1 may be constituent elements in hardware units or constituent elements divided into functional units of a computer. Here, the components shown in FIG. 1 will be described as components divided into functional units of a computer.

=== k-anonymization unit 110 ===
The k-anonymization unit 110 converts the anonymization target data set stored in a storage device (not shown) into a k-anonymized data set that satisfies k-anonymity for a given k (for example, k = 2). This conversion is a process for anonymizing data and is also called “processing”; here the term “conversion” is used throughout.

That is, the k-anonymization unit 110 generates a k-anonymized data set in which each record included in the anonymization target data set (hereinafter referred to as a target record) has been converted into an anonymized record.

The k-anonymization unit 110 converts the target record into an anonymization record as follows. First, the k-anonymization unit 110 converts the unique identification information included in each target record into false identification information that is uniquely assigned to the unique identification information. Here, the false identification information is identification information that does not include restoration information to the unique identification information.
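The first conversion step can be sketched as follows. This minimal Python example assigns one random fake ID per distinct piece of unique identification information, so that the fake ID itself carries no information from which the name could be restored. The function name `assign_fake_ids`, the use of random hexadecimal tokens, and the names other than “Alice” are assumptions made for illustration; the specification does not fix how fake IDs are generated.

```python
import secrets


def assign_fake_ids(unique_ids):
    """Map each distinct unique identifier to its own random fake ID."""
    mapping = {}
    used = set()
    for uid in unique_ids:
        if uid in mapping:
            continue  # the same individual always receives the same fake ID
        fake = secrets.token_hex(8)
        while fake in used:  # regenerate on the (very unlikely) collision
            fake = secrets.token_hex(8)
        used.add(fake)
        mapping[uid] = fake
    return mapping


# "Bob" and "Carol" are hypothetical names added for the example.
names = ["Alice", "Bob", "Alice", "Carol"]
mapping = assign_fake_ids(names)
fake_ids = [mapping[name] for name in names]
print(fake_ids)
```

In this sketch the `mapping` table is the only link back to the names, so it would have to be discarded or kept out of the released data set for the fake IDs to remain non-restorable.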

Second, the k-anonymization unit 110 converts the quasi-identifiers included in each target record (also referred to as target quasi-identifiers) into anonymized quasi-identifiers, assigned so that at least k quasi-identifiers having the same attribute value exist in the generated k-anonymized data set. Here, an anonymized quasi-identifier is a quasi-identifier determined so that the data set including the anonymized quasi-identifiers satisfies the predetermined k-anonymity.

=== Anonymization target data set 820 ===
FIG. 2 is a diagram illustrating an example of the anonymization target data set 820. The anonymization target data set 820 shown in FIG. 2 is stored in a storage device (not shown). This storage device may be included in the k-anonymization unit 110 or may be an external storage medium connected to the k-anonymization unit 110. The anonymization target data set 820 includes a plurality of target records (for example, the target record 822), each including the attributes name (unique identification information), gender, date of birth, date of medical treatment, and name of injury and illness. Here, an attribute consists of an attribute name (attribute element name) and an attribute value. For example, for the first attribute of the target record 822, the attribute name is “name” and the attribute value is “Alice”.

Here, the name is a kind of unique identification information and is information for identifying an individual.

Each of the attributes sex, date of birth, and date of medical care is a quasi-identifier (target quasi-identifier). A quasi-identifier that always has the same attribute value for each piece of unique identification information is called an invariant quasi-identifier. A quasi-identifier that may have different attribute values for the same piece of unique identification information is called a variable quasi-identifier. For example, the attributes “gender” and “birth date” are invariant quasi-identifiers. The attribute “medical care date” is a variable quasi-identifier.
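The distinction between invariant and variable quasi-identifiers can be illustrated with a small Python sketch that checks, for each attribute, whether every individual has only one value for it in the data. Note that the specification defines invariance as a property of the attribute itself (the value always stays the same), so a data-driven check like this one is only a heuristic. The function name `classify_quasi_identifiers` and the record for “Bob” are assumptions made for illustration.

```python
from collections import defaultdict


def classify_quasi_identifiers(records, id_key, quasi_identifiers):
    """Label each quasi-identifier 'invariant' or 'variable' based on the observed records."""
    observed = defaultdict(lambda: defaultdict(set))
    for record in records:
        for attr in quasi_identifiers:
            observed[attr][record[id_key]].add(record[attr])
    return {
        attr: "invariant" if all(len(v) == 1 for v in observed[attr].values()) else "variable"
        for attr in quasi_identifiers
    }


records = [
    # Target records 822 and 825 as described in the text.
    {"name": "Alice", "sex": "female", "birth_date": "1985-02-02", "care_date": "2010-04"},
    {"name": "Alice", "sex": "female", "birth_date": "1985-02-02", "care_date": "2010-05"},
    # Hypothetical extra record, not taken from FIG. 2.
    {"name": "Bob", "sex": "male", "birth_date": "1980-07-15", "care_date": "2010-04"},
]

kinds = classify_quasi_identifiers(records, "name", ["sex", "birth_date", "care_date"])
print(kinds)
```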

=== k-anonymized data set 830 ===
FIG. 3 is a diagram illustrating an example of the k-anonymization data set 830. The k-anonymization data set 830 shown in FIG. 3 is stored in a storage device (not shown). This storage device may be included in the k-anonymization unit 110 or may be an external storage medium connected to the k-anonymization unit 110. The k-anonymization data set 830 shown in FIG. 3 is an example of a k-anonymized data set obtained by k-anonymizing the anonymization target data set 820 shown in FIG. 2. That is, each anonymization quasi-identifier of the k-anonymization data set 830 is a target quasi-identifier of the anonymization target data set 820 converted into an attribute value that satisfies k-anonymity with k = 2.

As shown in FIG. 3, the k-anonymization data set 830 includes a plurality of anonymization records (for example, the anonymization record 832), each including the attributes fake ID (false identification information), gender, date of birth, date of medical treatment, and name of injury and illness. The fake IDs of the k-anonymization data set 830 correspond one-to-one to the names included in the anonymization target data set 820. The fake ID is identification information local to the k-anonymization data set 830 shown in FIG. 3; it indicates only the relationship among the anonymization records of that data set and does not specify a particular individual.

Further, the attributes sex, date of birth, and medical care date included in the k-anonymization data set 830 are quasi-identifiers (anonymization quasi-identifiers), as in the case of the anonymization target data set 820 described above.

The target records of the anonymization target data set 820 shown in FIG. 2 and the anonymization records of the k-anonymization data set 830 shown in FIG. 3 correspond in order of arrangement (for example, the target record 822 corresponds to the anonymization record 832, and the target record 825 corresponds to the anonymization record 835).

=== Anonymity Strengthening Unit 120 ===
The anonymity enhancing unit 120 generates and outputs an anonymity enhancing data set for the data set that has been k-anonymized by the k-anonymizing unit 110 (for example, the k-anonymized data set 830).

Specifically, the anonymity enhancement unit 120 executes processing for enhancing anonymity on the strengthening target quasi-identifiers included in the anonymization records that have the same fake ID in the k-anonymization data set. Here, a quasi-identifier to be strengthened (strengthening target quasi-identifier) is an invariant quasi-identifier among the anonymization quasi-identifiers included in the k-anonymization data set. The processing converts the strengthening target quasi-identifiers into data with enhanced anonymity such that, even when the strengthening target quasi-identifiers are compared with one another, they cannot be instantiated (that is, made specific enough to identify or single out an individual). Hereinafter, converting the strengthening target quasi-identifiers into data with enhanced anonymity is referred to as the strengthening process.

That is, the anonymity strengthening unit 120 applies the strengthening process to the strengthening target quasi-identifiers so as to prevent k-anonymity from being broken by comparing the quasi-identifiers included in a plurality of anonymized records 831 of the same user (the same fake ID).

For example, the anonymity enhancing unit 120 converts the strengthening target quasi-identifiers so that they take the same attribute value for each attribute name. For example, this common attribute value is, for each attribute name, the attribute value that covers all the strengthening target quasi-identifiers having the same fake ID and indicates the minimum such range. Alternatively, the common attribute value may be an attribute value indicating an arbitrary range that covers all the strengthening target quasi-identifiers having the same fake ID for each attribute name. Hereinafter, “all strengthening target quasi-identifiers having the same fake ID for each attribute name” are abbreviated as “same fake ID strengthening target quasi-identifiers”.
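The minimum-range variant of the strengthening process can be sketched in Python as follows. For every fake ID, each invariant attribute is replaced in all records of that fake ID by the smallest value that covers every generalized value in the group. Modeling year ranges as inclusive `(low, high)` tuples, modeling categorical values as sets (with “Any” as the two values female and male), and the names `cover` and `strengthen` are assumptions made for illustration. The sketch does not re-check k-anonymity after the conversion.

```python
from collections import defaultdict


def cover(values):
    """Smallest value covering all inputs: union for sets, enclosing interval for (low, high) tuples."""
    if isinstance(values[0], (set, frozenset)):
        return frozenset().union(*values)
    return (min(v[0] for v in values), max(v[1] for v in values))


def strengthen(records, invariant_attrs):
    """Give all records sharing a fake ID the same covering value for each invariant attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record["fake_id"]].append(record)
    enhanced_records = []
    for record in records:
        enhanced = dict(record)  # variable attributes are copied unchanged
        for attr in invariant_attrs:
            enhanced[attr] = cover([g[attr] for g in groups[record["fake_id"]]])
        enhanced_records.append(enhanced)
    return enhanced_records


# Anonymization records 832 and 835 as described in the text.
anonymized = [
    {"fake_id": 2, "sex": frozenset({"female", "male"}), "birth_year": (1981, 1985), "care": "2010-04"},
    {"fake_id": 2, "sex": frozenset({"female"}), "birth_year": (1985, 1986), "care": "2010-05"},
]
enhanced = strengthen(anonymized, ["sex", "birth_year"])
print(enhanced)
```

Because both records now carry identical values for sex and birth year, comparing them yields nothing narrower than either record alone.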

=== Anonymity enhancement data set 840 ===
FIG. 4 is a diagram illustrating an example of the anonymity enhancement data set 840. The anonymity enhancement data set 840 is information output from the anonymity enhancing unit 120 and stored in a storage device (not shown). As illustrated in FIG. 4, the anonymity enhancement data set 840 includes a plurality of enhancement records (for example, the enhancement record 842), each including the attributes fake ID, gender, date of birth, date of medical care, and name of injury and illness. The enhancement records are obtained by strengthening the sex and date of birth, which are the quasi-identifiers to be strengthened in the k-anonymization data set 830 shown in FIG. 3.

The target records of the anonymization target data set 820 shown in FIG. 2, the anonymization records of the k-anonymization data set 830 shown in FIG. 3, and the enhancement records of the anonymity enhancement data set 840 shown in FIG. 4 correspond in order of arrangement. For example, the target record 822, the anonymization record 832, and the enhancement record 842 correspond to one another, as do the target record 825, the anonymization record 835, and the enhancement record 845.

This completes the description of each component of the functional unit of the anonymization device 100.

Next, the components of the anonymization device 100 in hardware units will be described.

In the present embodiment, the anonymization device 100 can be realized by an information processing device such as a computer. Each component (functional block) in the anonymization apparatus 100 and the anonymization apparatus in other embodiments described later is realized by hardware resources included in the information processing apparatus. The information processing apparatus may include a CPU (Central Processing Unit) that executes a computer program (software program: hereinafter may be simply referred to as “program”) stored in a recording medium.

For example, the anonymization device 100 includes hardware such as a computer's CPU, main storage device, and auxiliary storage device, and is realized by the CPU operating in cooperation with this hardware based on a program loaded from the storage device or the like into the main storage device. However, the functions realized by the CPU are not limited to the block configuration shown in FIG. 1 (k-anonymization unit 110, anonymity enhancement unit 120), and various implementation forms that can be adopted by those skilled in the art can be applied (the same applies to the following embodiments).

Note that the anonymization device 100 and the anonymization device according to each embodiment to be described later may be realized by a dedicated device.

FIG. 5 is a diagram illustrating a hardware configuration of a computer 700 that realizes the anonymization apparatus 100 according to the present embodiment.

As shown in FIG. 5, the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. Furthermore, the computer 700 includes a recording medium (or storage medium) 707 supplied from the outside. The recording medium 707 may be a non-volatile recording medium that stores information non-temporarily.

The CPU 701 controls the overall operation of the computer 700 by operating an operating system (not shown). The CPU 701 reads a program and data from a recording medium 707 mounted on the storage device 703, for example, and writes the read program and data to the storage unit 702. Here, the program is, for example, a program that causes the computer 700 to execute an operation of a flowchart shown in FIG.

Then, the CPU 701 executes various processes as the k-anonymization unit 110 and the anonymity enhancement unit 120 shown in FIG. 1 according to the read program and based on the read data.

Note that the CPU 701 may download a program or data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).

The storage unit 702 stores programs and data. The storage unit 702 may store an anonymization target data set, a k-anonymization data set, and an anonymity enhancement data set.

The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes a recording medium 707. The storage device 703 records the program so that it can be read by a computer. Further, the storage device 703 may record data so as to be readable by a computer. The storage device 703 may store an anonymization target data set, a k-anonymization data set, and an anonymity enhancement data set.

The input unit 704 is realized by, for example, a mouse, a keyboard, a built-in key button, and the like, and is used for an input operation. The input unit 704 is not limited to a mouse, a keyboard, and a built-in key button, and may be a touch panel, an accelerometer, a gyro sensor, a camera, or the like.

The output unit 705 is realized by a display, for example, and is used for confirming the output.

The communication unit 706 implements an interface with the outside (for example, a data server that stores an anonymization target data set). The communication unit 706 is included as part of the k-anonymization unit 110, for example.

As described above, the functional unit blocks of the anonymization device 100 shown in FIG. 1 are realized by the computer 700 having the hardware configuration shown in FIG. 5. However, the means for realizing each unit included in the computer 700 is not limited to the above. In other words, the computer 700 may be realized by one physically integrated device, or by a plurality of devices, that is, two or more physically separate devices connected by wire or wirelessly.

Note that the recording medium 707 in which the above-described program code is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored in the recording medium 707. Alternatively, the CPU 701 may store the code of the program stored in the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of a recording medium 707 that stores a program (software) executed by the computer 700 (CPU 701) temporarily or non-temporarily.

This completes the description of each component of the computer 700 that implements the anonymization device 100 according to the present embodiment.

Next, the operation of this embodiment will be described in detail with reference to FIGS.

First, the operation of the anonymization device 100 will be described with reference to the flowchart shown in FIG. FIG. 6 is a flowchart showing the operation of the anonymization device 100 of this embodiment. Note that the processing according to this flowchart may be executed based on the above-described program control by the CPU. Further, the step name of the process is described by a symbol as in S601.

The k-anonymization unit 110 acquires the anonymization target data set 820 (S601). For example, the k-anonymization unit 110 reads the anonymization target data set held in the storage unit 702 or the storage device 703 illustrated in FIG. Note that the k-anonymization unit 110 may receive the anonymization target data set from the outside (not shown) via the communication unit 706. Further, the k-anonymization unit 110 may receive the anonymization target data set input via the input unit 704.

Next, the k-anonymization unit 110 k-anonymizes the anonymization target data set 820 to generate the k-anonymization data set 830, and outputs it to the storage unit 702 or the storage device 703 (S602). The k-anonymization unit 110 converts the target quasi-identifier of each target record of the anonymization target data set 820 into an anonymization quasi-identifier so that, in the k-anonymization data set 830, anonymization records having at least k different false IDs share the same combination of anonymization quasi-identifiers. Here, the method of converting the target quasi-identifier of each target record into the anonymization quasi-identifier is, for example, abstraction by generalizing the target quasi-identifier. The method for anonymizing the target quasi-identifier of each target record is not limited to a specific method, and various methods such as perturbation may be used.
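The condition stated for S602 can be checked mechanically. The following is a minimal sketch, not the implementation of the k-anonymization unit 110: the function name, the record layout, and the sample values are assumptions made only for illustration. It counts distinct false IDs, not records, per quasi-identifier combination, because one false ID may own several records.

```python
from collections import defaultdict

def satisfies_k_anonymity(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination is shared by
    records with at least k distinct false IDs."""
    ids_per_combination = defaultdict(set)
    for record in records:
        combination = tuple(record[name] for name in quasi_identifiers)
        ids_per_combination[combination].add(record["false_id"])
    return all(len(ids) >= k for ids in ids_per_combination.values())

# Hypothetical anonymization records (illustrative values only).
records = [
    {"false_id": 1, "sex": "Any", "birth": "1981-1985"},
    {"false_id": 2, "sex": "Any", "birth": "1981-1985"},
    {"false_id": 2, "sex": "female", "birth": "1985-1986"},
    {"false_id": 4, "sex": "female", "birth": "1985-1986"},
]

print(satisfies_k_anonymity(records, ["sex", "birth"], 2))  # True
print(satisfies_k_anonymity(records, ["sex", "birth"], 3))  # False
```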

Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers included in the k-anonymization data set 830 to generate an anonymity enhancing data set 840 in which the anonymization records have been converted into reinforced records (S603).

For example, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the records having the same false ID so that they have the same attribute value for each attribute name. In this way, the anonymity enhancing unit 120 converts all anonymization records having the same false ID into reinforced records. For this conversion process, any combination of the various processes generally used in k-anonymization, such as generalization and perturbation, can be used.

If the invariant quasi-identifier (reinforced quasi-identifier) for each attribute name is the same in all reinforced records having the same false ID, the invariant quasi-identifiers are never made more specific even if those of a plurality of reinforced records are compared. Therefore, the desired k-anonymity is not broken even under such a comparison.

Next, the anonymity enhancing unit 120 outputs the generated anonymity enhancing data set 840 (S604). For example, the anonymity enhancing unit 120 outputs the anonymity enhancing data set to the outside (not shown) via the communication unit 706. Note that the anonymity enhancing unit 120 may store the anonymity enhancing data set in the storage unit 702 or the storage device 703 illustrated in FIG. 5. Further, the anonymity enhancing unit 120 may output the anonymity enhancing data set to the output unit 705 shown in FIG. 5 and control it to be displayed on the display.

The above is the description of the operation of the anonymization device 100 according to the flowchart shown in FIG.

Next, S603 in the flowchart shown in FIG. 6 will be described.

FIG. 7 is a flowchart showing an operation (S603 shown in FIG. 6) in which the anonymity enhancing unit 120 generates the anonymity enhancing data set.

The anonymity enhancing unit 120 performs the processing from S611 to S614 for every group of anonymization records having the same false ID. For example, in the case of the k-anonymization data set 830 shown in FIG. 3, the anonymity enhancing unit 120 performs the processing from S611 to S614 on the anonymization records 832 and 835 with false ID: 2 and on the anonymization records 834 and 836 with false ID: 4.

The anonymity enhancing unit 120 selects a fake ID to be processed (S611). Here, when a plurality of anonymization records have the same fake ID, the fake ID is a fake ID to be processed. If there is no fake ID to be processed (YES in S612), the process ends.
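As a rough illustration of the selection in S611, the sketch below lists the false IDs that are shared by more than one anonymization record. The function name and the (record number, false ID) pair layout are assumptions; the record-to-ID assignments follow those given in the text for records 831 to 836.

```python
from collections import Counter

def false_ids_to_process(records):
    """Return the false IDs shared by more than one anonymization record."""
    counts = Counter(false_id for _, false_id in records)
    return [false_id for false_id, n in counts.items() if n > 1]

# (record number, false ID) pairs as described in the text.
records = [(831, 1), (832, 2), (833, 3), (834, 4), (835, 2), (836, 4)]

print(false_ids_to_process(records))  # [2, 4]
```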

Next, the anonymity enhancing unit 120 strengthens the reinforcement target quasi-identifiers of all anonymized records having the selected false ID into the same attribute value for each attribute name (S613).

Next, for each attribute name, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of each anonymization record to be processed into the same attribute value as a specific reinforced quasi-identifier (S614). Here, an anonymization record to be processed is an anonymization record that belongs to the same reinforcement target quasi-identifier group (described later) as an anonymization record whose reinforcement target quasi-identifier was reinforced in S613. The specific reinforced quasi-identifier is the reinforced quasi-identifier of that anonymization record reinforced in S613.

Note that when there is no anonymization record to be processed at the start of the processing in S614 (no in S614), the processing returns to S611.

Here, the quasi-identifier group to be strengthened will be described with reference to the k-anonymization data set 830 shown in FIG.

For example, the anonymization records having the same reinforcement target quasi-identifier as either the anonymization record 832 or the anonymization record 835 of false ID: 2 are the anonymization record 831 with false ID: 1 (sex: Any, date of birth: 1981-1985) and the anonymization record 836 with false ID: 4 (sex: female, date of birth: 1985-1986). A plurality of such anonymization records having the same reinforcement target quasi-identifier belong to the same reinforcement target quasi-identifier group.

That is, the anonymization record 832 having the false ID: 2 and the anonymization record 831 having the false ID: 1 belong to the same reinforcement target quasi-identifier group. Also, the anonymization record 835 having the false ID: 2 and the anonymization record 836 having the false ID: 4 belong to the same reinforcement target quasi-identifier group. Further, the anonymization records belonging to the same reinforcement target quasi-identifier group as the anonymization record having the false ID: 2 are the anonymization record 831 of the false ID: 1 and the anonymization record 836 of the false ID: 4.
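The grouping described above amounts to bucketing records by identical quasi-identifier values. The sketch below assumes a simple dictionary layout and a hypothetical function name. Because FIG. 3 is not reproduced here, the attribute values are those stated or implied in the text (record 832 is taken to share the values of record 831).

```python
from collections import defaultdict

# Records 831, 832, 835, and 836 with the values given in the text.
records = {
    831: {"false_id": 1, "sex": "Any", "birth": (1981, 1985)},
    832: {"false_id": 2, "sex": "Any", "birth": (1981, 1985)},
    835: {"false_id": 2, "sex": "female", "birth": (1985, 1986)},
    836: {"false_id": 4, "sex": "female", "birth": (1985, 1986)},
}

def quasi_identifier_groups(records, names):
    """Group record numbers by identical quasi-identifier values."""
    groups = defaultdict(list)
    for record_no, record in records.items():
        key = tuple(record[name] for name in names)
        groups[key].append(record_no)
    return dict(groups)

groups = quasi_identifier_groups(records, ["sex", "birth"])
for key, members in groups.items():
    print(key, members)
```

Records 831 and 832 land in one group and records 835 and 836 in another, which is why reinforcing the records of false ID: 2 affects the records of false ID: 1 and false ID: 4.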

Next, for each attribute name, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of each anonymization record to be processed into the same attribute value as the specific reinforcement target quasi-identifier reinforced in S614 (S615). Here, an anonymization record to be processed is an anonymization record having the same false ID as an anonymization record whose reinforcement target quasi-identifier was reinforced again in S614. The specific reinforcement target quasi-identifier is the reinforcement target quasi-identifier reinforced in S614.

Then, the process returns to S614.

Note that when there is no anonymization record having the same fake ID as the anonymization record obtained by strengthening the reinforcement target quasi-identifier again in S614 at the start of the process of S615 (No in S615), the process returns to S611.

As described above, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of anonymization records having the same false ID into reinforced quasi-identifiers having the same attribute value. Furthermore, the anonymity enhancing unit 120 applies the reinforcement process to the reinforcement target quasi-identifiers of the anonymization records that belong to the same reinforcement target quasi-identifier group as a record that has already been reinforced. The anonymity enhancing unit 120 thus reinforces the reinforcement target quasi-identifiers recursively.

That is, when the anonymity strengthening unit 120 reinforces the reinforcement target quasi-identifier of a certain anonymization record, the anonymity enhancement unit 120 also reinforces the reinforcement target quasi-identifier in the anonymization record having the same false ID. Furthermore, the anonymity enhancing unit 120 strengthens the reinforcement target quasi-identifier of the anonymization record belonging to the same reinforcement target quasi-identifier group as the anonymization record obtained by strengthening the reinforcement target quasi-identifier. Furthermore, the anonymity enhancing unit 120 recursively repeats the reinforcement process of the reinforcement target quasi-identifier of the anonymization record having the same false ID as the anonymization record obtained by strengthening the reinforcement target quasi-identifier again.
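The recursive propagation described above can be sketched as a fixed-point iteration. This is only an illustration under stated assumptions, not the flowchart of FIG. 7 itself: the helper names, the dictionary layout, and the rule that two different sex values generalize to “Any” are assumptions, and the record values are those stated or implied in the text. Two records are treated as linked when they have the same false ID or originally belonged to the same reinforcement target quasi-identifier group, and linked records are merged until nothing changes.

```python
def cover(a, b):
    """Smallest generalization that contains both quasi-identifier values."""
    sex = a["sex"] if a["sex"] == b["sex"] else "Any"
    birth = (min(a["birth"][0], b["birth"][0]), max(a["birth"][1], b["birth"][1]))
    return {"sex": sex, "birth": birth}

def reinforce_recursively(records):
    """Propagate generalization across linked records until a fixed point."""
    false_id = {no: r["false_id"] for no, r in records.items()}
    original = {no: {"sex": r["sex"], "birth": r["birth"]} for no, r in records.items()}
    current = {no: dict(q) for no, q in original.items()}
    changed = True
    while changed:
        changed = False
        for a in records:
            for b in records:
                # Linked: same false ID, or same original quasi-identifier group.
                linked = false_id[a] == false_id[b] or original[a] == original[b]
                if linked and current[a] != current[b]:
                    merged = cover(current[a], current[b])
                    current[a], current[b] = merged, dict(merged)
                    changed = True
    return current

records = {
    831: {"false_id": 1, "sex": "Any", "birth": (1981, 1985)},
    832: {"false_id": 2, "sex": "Any", "birth": (1981, 1985)},
    833: {"false_id": 3, "sex": "female", "birth": (1986, 1990)},
    834: {"false_id": 4, "sex": "female", "birth": (1986, 1990)},
    835: {"false_id": 2, "sex": "female", "birth": (1985, 1986)},
    836: {"false_id": 4, "sex": "female", "birth": (1985, 1986)},
}

for no, value in sorted(reinforce_recursively(records).items()):
    print(no, value)
```

The loop terminates because every merge strictly widens at least one record and the values cannot widen indefinitely. It also shows how a single reinforcement can spread through an entire chain of linked records.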

Next, an operation in which the anonymity enhancing unit 120 converts the k-anonymized data set 830 into the anonymity enhanced data set 840 will be described with specific values.

First, the anonymity enhancing unit 120 selects a fake ID: 2 as a fake ID to be processed (S611).

Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records 832 and 835 having the selected false ID: 2 by converting them into the same attribute value for each attribute name (S613).

Here, the same attribute value for the strengthened quasi-identifier whose attribute name is “sex” is “Any” including “Any” and “female”. In addition, the same attribute value for the reinforcement target quasi-identifier whose attribute name is “birth date” is “1981 to 1986” including “1981 to 1985” and “1985 to 1986”.

In this way, the anonymity enhancing unit 120 generalizes the reinforcement target quasi-identifiers of the anonymization records 832 and 835 having false ID: 2 into “sex: Any, date of birth: 1981-1986”.

Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records that belong to the same reinforcement target quasi-identifier groups as the anonymization records 832 and 835 of false ID: 2, whose reinforcement target quasi-identifiers were reinforced in S613. That is, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of the anonymization record 831 of false ID: 1, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 832 of false ID: 2, into “sex: Any, date of birth: 1981-1986”, the same attribute values as the reinforcement target quasi-identifier of the anonymization record 832. At the same time, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of the anonymization record 836 of false ID: 4, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 835 of false ID: 2, into “sex: Any, date of birth: 1981-1986”, the same attribute values as the reinforcement target quasi-identifier of the anonymization record 835 (S614).

Next, the anonymity enhancing unit 120 selects a fake ID: 4 as a fake ID to be processed (S611).

Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records 834 and 836 having the selected false ID: 4 by converting them into the same attribute value for each attribute name (S613). Here, the anonymization record 836 of false ID: 4, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 835 of false ID: 2, was already given the invariant quasi-identifier “sex: Any, date of birth: 1981-1986” by the processing of S614 when false ID: 2 was selected in S611. Therefore, the anonymity enhancing unit 120 generalizes the anonymization records 834 and 836 of false ID: 4 into “sex: Any, date of birth: 1981-1990”. Here, “sex: Any, date of birth: 1981-1990” is the reinforcement target quasi-identifier whose attribute value for each attribute name indicates the minimum range that includes both “sex: female, date of birth: 1986-1990” and “sex: Any, date of birth: 1981-1986”.

Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records that belong to the same reinforcement target quasi-identifier groups as the anonymization records 834 and 836 of false ID: 4, whose reinforcement target quasi-identifiers were reinforced in S613. In other words, for each attribute name, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of the anonymization record 833 of false ID: 3, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 834 of false ID: 4, into “sex: Any, date of birth: 1981-1990”, the same attribute values as the reinforcement target quasi-identifier of the anonymization record 834. In addition, for each attribute name, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of the anonymization record 835 of false ID: 2, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 836 of false ID: 4, into “sex: Any, date of birth: 1981-1990”, the same attribute values as the reinforcement target quasi-identifier of the anonymization record 836 (S614).

Next, following the re-reinforcement of the reinforcement target quasi-identifier of the anonymization record 835 of false ID: 2 in S614, the anonymity enhancing unit 120 reinforces, for each attribute name, the reinforcement target quasi-identifier of the anonymization record 832 of false ID: 2 into “sex: Any, date of birth: 1981-1990”, the same attribute values as the reinforcement target quasi-identifier of the anonymization record 835 (S615).

Next, following the reinforcement of the reinforcement target quasi-identifier of the anonymization record 832 of false ID: 2, the anonymity enhancing unit 120 reinforces, for each attribute name, the reinforcement target quasi-identifier of the anonymization record 831 of false ID: 1, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 832 of false ID: 2, into “sex: Any, date of birth: 1981-1990”, the same attribute values as the reinforcement target quasi-identifier of the anonymization record 832 (S614).

Next, since there is no fake ID to be processed (YES in S612), the process ends.

<Modification of First Embodiment>
As described above, the anonymity enhancing unit 120 of the present embodiment performs the reinforcement process on the reinforcement target quasi-identifiers (invariant quasi-identifiers), thereby preventing the breakdown of k-anonymity caused by comparing the anonymization quasi-identifiers of anonymization records having the same false ID. That is, the anonymity enhancing unit 120 of the present embodiment reinforces the reinforcement target quasi-identifiers so that k-anonymity is satisfied even if such a comparison is made.

However, in the above-described reinforcement process, when the reinforcement target quasi-identifier of one anonymization record is reinforced, the reinforcement spreads recursively to other anonymization records, and the reinforcement target quasi-identifiers (invariant quasi-identifiers) of many anonymization records are greatly abstracted.

This is because, in order for a strengthened record having a certain false ID to satisfy k-anonymity, at least k-1 strengthened records having another false ID including the same strengthened quasi-identifier are required. That is, k-anonymity has great privacy strength. Therefore, the loss of information in the anonymity enhancing data set that retains that k-anonymity is also significant.

Therefore, a modified example of the first embodiment in which the loss of information is suppressed to be relatively small will be described.

The k-anonymization data set 830 is k-anonymized by the k-anonymization unit 110. Therefore, k-anonymity is satisfied in units of anonymization records. Thus, when the anonymization quasi-identifier of an anonymization record satisfying k-anonymity is further generalized, k or more generalized reinforced records corresponding to the anonymization quasi-identifier of that anonymization record always exist. That is, as with the k-anonymized records, there are always k or more generalized reinforced records corresponding to the target quasi-identifier of a target record. Therefore, the anonymity enhancing data set obtained by further generalizing the anonymization quasi-identifiers can have the same privacy strength as the k-anonymity of the k-anonymization data set 830, even if it falls outside the strict definition of k-anonymity.

Therefore, the further conversion (reinforcement process) for eliminating the breakdown of k-anonymity caused by comparing the invariant quasi-identifiers of a plurality of anonymization records having the same false ID need not necessarily guarantee k-anonymity. That is, the further conversion only needs to be a further generalization that avoids an increase in individual specificity due to comparison.

Specifically, it is sufficient that, as a result of this further generalization, all the invariant quasi-identifiers of the anonymization records having the same false ID have the same attribute value for each attribute name. If all anonymization records having the same false ID have the same invariant quasi-identifier for each attribute name, the invariant quasi-identifier cannot be made more specific. Therefore, individual specificity does not increase.

This means that the anonymity enhancing unit 120 should generalize the invariant quasi-identifiers (reinforcement target quasi-identifiers) of the anonymization records having the same false ID into their superset. Here, generalizing into a superset means converting, for each attribute name, the reinforcement target quasi-identifiers of the same false ID into invariant quasi-identifiers having the same attribute value, namely an attribute value that includes the range of the attribute values of all those invariant quasi-identifiers. In other words, a superset is a set that represents a superordinate concept of a set. That is, for each attribute name, the attribute value of the invariant quasi-identifier is converted into a superset (or the union) that includes the attribute values of all the invariant quasi-identifiers, or all the values included in those attribute values. Here, the union is the smallest of such supersets. A superset may be expressed using a range or the like. Such a superset can maintain the same privacy strength as the k-anonymity guaranteed by the k-anonymization unit 110.

Next, the operation of the anonymity enhancing unit 120 in the modification of the first embodiment will be described.

FIG. 8 is a flowchart showing an operation (S603 shown in FIG. 6) in which the anonymity enhancing unit 120 generates the anonymity enhancing data set in the modification of the first embodiment.

S621, S622, and S623 in FIG. 8 are the same as S611, S612, and S613 in FIG. 7, respectively.

Next, the operation in which the anonymity enhancing unit 120 converts the k-anonymized data set 830 into the anonymity enhanced data set will be described with specific values.

FIG. 9 is a diagram illustrating an example of the anonymity enhancing data set 850 generated by the anonymity enhancing unit 120 by generalizing the k-anonymized data set 830.

First, the anonymity enhancing unit 120 selects fake ID: 2 as a fake ID to be processed (S621).

Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records 832 and 835 having the selected false ID: 2 into the same attribute value for each attribute name (S623).

Here, the same attribute value for the reinforcement target quasi-identifier whose attribute name is “sex” is, for example, “Any”, which includes “Any” and “female”. Also, the same attribute value for the reinforcement target quasi-identifier whose attribute name is “birth date” is, for example, “1981 to 1986”, the attribute value indicating the minimum range that includes “1981 to 1985” and “1985 to 1986”. Note that the attribute value does not necessarily have to indicate the minimum range including all of them. For example, an attribute value of an arbitrary range that includes all the anonymization quasi-identifiers whose attribute name is “birth date”, such as “1980 to 1989”, may be used.

Next, the anonymity enhancing unit 120 selects a fake ID: 4 as a fake ID to be processed (S621).

Here, the same attribute value for the reinforcement target quasi-identifier whose attribute name is “sex” is, for example, “female”, which includes “female” and “female”. Further, the same attribute value for the reinforcement target quasi-identifier whose attribute name is “birth date” is, for example, “1985 to 1990”, the attribute value indicating the minimum range that includes “1986 to 1990” and “1985 to 1986”.

Next, since there are no remaining false IDs to be subjected to reinforcement processing on the reinforcement target quasi-identifier (NO in S622), the process ends.
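In contrast to the recursive process, the modification only equalizes the quasi-identifiers within each false ID. The following is a minimal sketch under assumptions: the function name and dictionary layout are hypothetical, two different sex values are assumed to generalize to “Any”, the minimum covering range is used for the birth date (the text notes that a wider range is also allowed), and the record values are those stated or implied in the text for FIG. 3.

```python
from collections import defaultdict

def generalize_per_false_id(records):
    """Give all records of one false ID the same covering quasi-identifier."""
    by_false_id = defaultdict(list)
    for no, record in records.items():
        by_false_id[record["false_id"]].append(no)
    result = {}
    for numbers in by_false_id.values():
        sexes = {records[no]["sex"] for no in numbers}
        sex = next(iter(sexes)) if len(sexes) == 1 else "Any"
        low = min(records[no]["birth"][0] for no in numbers)
        high = max(records[no]["birth"][1] for no in numbers)
        for no in numbers:
            result[no] = {"sex": sex, "birth": (low, high)}
    return result

records = {
    831: {"false_id": 1, "sex": "Any", "birth": (1981, 1985)},
    832: {"false_id": 2, "sex": "Any", "birth": (1981, 1985)},
    833: {"false_id": 3, "sex": "female", "birth": (1986, 1990)},
    834: {"false_id": 4, "sex": "female", "birth": (1986, 1990)},
    835: {"false_id": 2, "sex": "female", "birth": (1985, 1986)},
    836: {"false_id": 4, "sex": "female", "birth": (1985, 1986)},
}

for no, value in sorted(generalize_per_false_id(records).items()):
    print(no, value)
```

Records whose false ID appears only once are left unchanged, which is where the reduction in information loss comes from.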

In order to satisfy k-anonymity even after invariant quasi-identifiers are compared, the anonymization records belonging to the same reinforcement target quasi-identifier group must also be reinforced. However, as described above, when it suffices merely to prevent the invariant quasi-identifiers from being made more specific, it is not necessary to reinforce the anonymization records belonging to the same reinforcement target quasi-identifier group.

In the anonymity enhancing data set 850 shown in FIG. 9, for each target record of the anonymization target data set 820 shown in FIG., there are reinforced records whose invariant quasi-identifiers (reinforced quasi-identifiers) include the invariant quasi-identifiers (target quasi-identifiers) of that target record, and those reinforced records have two or more different false IDs. Therefore, the anonymity enhancing data set 850 has a privacy strength comparable to 2-anonymity (k-anonymity with k = 2).

The first effect of the present embodiment described above is that it makes it possible to generate a data set in which individual specificity cannot be improved even if the invariant quasi-identifiers (anonymization quasi-identifiers) of the anonymization records after anonymization are compared.

The reason is that the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers included in the anonymization records of the k-anonymization data set generated by the k-anonymization unit 110 to generate the anonymity enhancing data set.

The second effect of the present embodiment described above is that it makes it possible to generate a data set in which individual specificity cannot be improved while strictly maintaining the k-anonymity of the k-anonymization data set generated by the k-anonymization unit 110.

This is because the anonymity strengthening unit 120 recursively executes the above-described strengthening process to generate an anonymity strengthening data set.

The third effect of the present embodiment described above is that it makes it possible to generate a data set with enhanced anonymity in which the loss of information is kept relatively small and individual specificity cannot be improved by comparing the quasi-identifiers of the k-anonymized records.

The reason is that the anonymity enhancing unit 120 reinforces only the reinforcement target quasi-identifier included in the anonymized record having the same fake ID to generate the anonymity enhanced data set.

<Second Embodiment>
Next, a second embodiment of the present invention will be described in detail with reference to the drawings. Hereinafter, the description overlapping with the above description is omitted as long as the description of the present embodiment is not obscured.

First, an outline of the anonymization device of the second embodiment will be described.

The anonymization apparatus of this embodiment calculates an information loss amount corresponding to the generalization (anonymity enhancement) by the anonymity enhancing unit 120 illustrated in FIG. Then, the anonymization apparatus of this embodiment determines a combination of target records in k-anonymization based on the calculated information loss amount so that, for example, the information loss amount is minimized. Then, the anonymization apparatus of this embodiment converts the target quasi-identifiers of the anonymization target data set based on the determined combination of target records so as to satisfy the desired anonymity, and generates a k-anonymization data set.

The anonymization device of this embodiment calculates the information loss amount with the unique identification information as a unit. The reason is to cope with the case where the anonymization target data set includes a plurality of target records for one unique identification information.

Specifically, in the anonymization apparatus of the present embodiment, the anonymity enhancing unit 120 performs the reinforcement process in order to prevent anonymity from being broken by comparison. However, when the information loss amount of each target record is calculated in units of records, that information loss amount does not include the loss of information caused by the reinforcement by the anonymity enhancing unit 120. This loss arises for an anonymization target data set that includes a plurality of target records for one piece of unique identification information.

As described in the first embodiment, the reinforcement process by the anonymity enhancing unit 120 further generalizes the anonymization records of the k-anonymization data set based on the false ID corresponding to each piece of unique identification information. Therefore, the information loss caused by that generalization is not taken into account merely by obtaining the information loss amount of each single target record.

For this reason, the anonymization apparatus of this embodiment calculates the information loss amount in units of the unique identification information corresponding to each false ID, that is, an information loss amount that includes the information loss caused by the reinforcement by the anonymity enhancing unit 120. Hereinafter, this “information loss amount corresponding to each piece of unique identification information associated with the reinforcement of the reinforcement target quasi-identifiers by the anonymity enhancing unit 120” will be referred to as the reinforced processing information loss amount. Thus, the anonymization apparatus of the present embodiment can handle the case where the anonymization target data set includes a plurality of target records for one piece of unique identification information, and generates an anonymity enhancing data set in which the loss of information due to generalization is further reduced.

The above is the description of the outline of the anonymization device of the second embodiment.

FIG. 10 is a block diagram showing the configuration of the anonymization apparatus 200 according to the second embodiment of the present invention.

Referring to FIG. 10, the anonymization device 200 according to this embodiment includes a k-anonymization unit 210 instead of the k-anonymization unit 110 as compared to the anonymization device 100 according to the first embodiment. Further, the anonymization device 200 further includes a combination determination unit 230 and an information loss calculation unit 240 as compared with the anonymization device 100. The combination determination unit 230 may include an information loss calculation unit 240.

=== Combination Determination Unit 230 ===
The combination determination unit 230 determines a combination of target records when the target records are distributed to one or more groups based on the amount of loss of reinforced processing information.

Specifically, the combination determination unit 230 generates one or more combination candidates. Here, a combination candidate is a candidate for a combination of target records when target records included in the anonymization target data set are distributed to one or more groups.

The combination determination unit 230 passes the combination candidates to the information loss calculation unit 240. Then, the combination determination unit 230 receives the reinforced processing information loss amount corresponding to each combination candidate from the information loss calculation unit 240.

The combination determination unit 230 determines, as the combination of target records, the combination candidate having the smallest information loss calculated based on the received reinforced processing information loss amounts. That is, the combination determination unit 230 determines the combination of target records so that the total sum of the information loss amounts after the reinforcement process by the anonymity enhancing unit 120 over all target records included in the anonymization target data set is minimized.
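The selection rule can be sketched as a minimum over candidates. In the sketch below, the function name, the candidate labels, and the loss values are all hypothetical; the callable `loss_per_unique_id` merely stands in for the exchange with the information loss calculation unit 240, whose actual interface is not specified here.

```python
def determine_combination(candidates, loss_per_unique_id):
    """Pick the candidate whose summed reinforced processing information
    loss amount over all unique IDs is smallest."""
    def total_loss(candidate):
        return sum(loss_per_unique_id(candidate).values())
    return min(candidates, key=total_loss)

# Hypothetical loss amounts per unique ID for two combination candidates.
losses = {
    "candidate_a": {"id_1": 0.2, "id_2": 0.5, "id_3": 0.1},
    "candidate_b": {"id_1": 0.3, "id_2": 0.2, "id_3": 0.1},
}

best = determine_combination(losses.keys(), losses.__getitem__)
print(best)  # candidate_b
```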

=== Anonymization target data set 860 ===
FIG. 11 is a diagram illustrating an example of the anonymization target data set 860. As shown in FIG. 11, the anonymization target data set 860 includes a plurality of target records (for example, target record 8601), each including the attributes of a patient ID (also referred to as unique identification information), a birth year, a medical treatment date, and a wound name. The attribute “birth year” is an invariant quasi-identifier. The attribute “medical treatment date” is a variable quasi-identifier.

FIG. 12 is a diagram showing an image when the target records of the anonymization target data set 860 are distributed to one or more groups. The dotted line in FIG. 12 shows an example of partitioning in which target records are combined and distributed to groups so that 3-anonymity can be guaranteed. Hereinafter, this group is referred to as an anonymous group.

The partition 401 and the partition 402 are partitions that divide the anonymization target data set by the attribute “birth year”. The partition 403 and the partition 404 are partitions that divide the anonymization target data set by the attribute “medical treatment date”.

Here, as shown in FIG. 12, each anonymous group includes target records having three or more different patient IDs. In FIG. 12, for the sake of convenience, the patient ID is shown using a false ID shown in FIG. 13 (corresponding to the patient ID shown in FIG. 12 in the order of arrangement). Therefore, each anonymous group is a group that is partitioned so that 3-anonymity can be guaranteed.

For example, the combination determination unit 230 determines which anonymous groups to adopt, those divided by the partition 403 or those divided by the partition 404, that is, which combination of target records to adopt. In this way, from among the combination candidates of target records that can satisfy the desired k-anonymity, the combination determination unit 230 selects the candidate with the smallest sum of the reinforced processing information loss amounts corresponding to each patient ID (unique identification information), and determines it as the combination of target records.

=== Information Loss Calculation Unit 240 ===
The information loss calculation unit 240 calculates the reinforced processing information loss amount.

Specifically, the information loss calculation unit 240 receives a combination candidate from the combination determination unit 230. Next, the information loss calculation unit 240 calculates the reinforced processing information loss amount based on the received combination candidate. Next, the information loss calculation unit 240 passes the calculated reinforced processing information loss amount to the combination determination unit 230.

Note that the information loss calculation unit 240 calculates the reinforced processing information loss amount by using a calculation method corresponding to the strengthening processing of the anonymity enhancement unit 120.

Next, a case will be described in which the strengthening process of the anonymity enhancement unit 120 generalizes the target quasi-identifier (the invariant quasi-identifier whose attribute is “birth year”) included in the target records of the anonymization target data set 860 to the set of attribute values covering the minimum range.

For example, the information loss calculation unit 240 calculates the reinforced processing information loss amount by NCP (Normalized Certainty Penalty). Various indexes for measuring the amount of information loss have been proposed. The information loss calculation unit 240 is not limited to NCP and may calculate the reinforced processing information loss amount by using any calculation method corresponding to the strengthening processing of the anonymity enhancement unit 120.

In general, the NCP of a single target record is given by NCP(r.a) = |r.a_max - r.a_min| / |a.max - a.min|, where NCP(r.a) is the NCP value for attribute a of a target record r. Here, r.a_max is the maximum value and r.a_min is the minimum value of attribute a of the target record r. In addition, a.max is the maximum value and a.min is the minimum value of attribute a over all target records in the anonymization target data set 860.

For example, for the anonymous groups divided by the partition 403, that is, for that combination of target records, the NCP of each target record having the patient ID: Alice is calculated as follows. The target quasi-identifier whose attribute is “birth year” in the target record 8601 is k-anonymized to 1981-1988 by the k-anonymization unit 210 in the anonymous group divided by the partition 403. Therefore, the NCP of the “birth year” target quasi-identifier of the target record 8601 is 0.78 (rounded to two decimal places; the same applies hereinafter). Similarly, the NCP of the “birth year” target quasi-identifier of the target record 8604 is 0.67, and that of the target record 8607 is 0.44.
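The per-record values above can be reproduced with the following Python sketch, given purely as an illustration. The function name `ncp` and the endpoints of the data-set-wide birth-year range are assumptions: this passage does not state a.max and a.min, and only their difference (9 years) can be inferred from the stated result 7 / 9 = 0.78. The ranges used for the target records 8604 and 8607 are those given later in the description of NCP*.

```python
def ncp(lower, upper, attr_min, attr_max):
    """NCP of one generalized value [lower, upper] for an attribute whose
    values span [attr_min, attr_max] over the whole data set."""
    return abs(upper - lower) / abs(attr_max - attr_min)


# Assumed data-set-wide birth-year range. Only its width (9 years) is
# inferred from the stated values; the endpoints are hypothetical.
ATTR_MIN, ATTR_MAX = 1981, 1990

# Generalized birth-year ranges of the target records of patient ID: Alice.
alice_ranges = {8601: (1981, 1988), 8604: (1983, 1989), 8607: (1981, 1985)}
alice_ncp = {
    record: round(ncp(lower, upper, ATTR_MIN, ATTR_MAX), 2)
    for record, (lower, upper) in alice_ranges.items()
}
print(alice_ncp)  # {8601: 0.78, 8604: 0.67, 8607: 0.44}
```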

The information loss calculation unit 240 of the present embodiment calculates an NCP for each patient ID as the reinforced processing information loss amount. The index obtained by extending the above-mentioned NCP from an information loss amount per target record to a reinforced processing information loss amount per patient ID is denoted NCP*. Letting NCP*(u.a) be the value of NCP* for attribute a of a patient ID u, NCP*(u.a) = |u.a_max - u.a_min| / |a.max - a.min|. Here, u.a_max is the maximum value and u.a_min is the minimum value among the values of attribute a of all target records having the patient ID u.

For example, the “birth year” target quasi-identifiers of the target record 8601, the target record 8604, and the target record 8607, all of which have the patient ID: Alice, are converted by the k-anonymization unit 210 to 1981-1988, 1983-1989, and 1981-1985, respectively, in the anonymous groups divided by the partition 403. In this case, the minimum value of the “birth year” attribute across the target record 8601, the target record 8604, and the target record 8607 is 1981, and the maximum value is 1989. Therefore, the NCP* for the patient ID: Alice, covering these three target records, is 0.89.
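The per-patient value above can be reproduced with the following Python sketch, offered only as an illustration. The function name `ncp_star` is hypothetical, and the endpoints of the data-set-wide birth-year range are assumed; only the 9-year width is implied by the stated values (8 / 9 = 0.89).

```python
def ncp_star(ranges, attr_min, attr_max):
    """NCP* for one patient ID: width of the smallest range covering all of
    that patient's generalized values, normalized by the data-set-wide width."""
    u_min = min(lower for lower, _ in ranges)
    u_max = max(upper for _, upper in ranges)
    return abs(u_max - u_min) / abs(attr_max - attr_min)


ATTR_MIN, ATTR_MAX = 1981, 1990  # assumed endpoints; only the width 9 is inferred

# Birth-year ranges of target records 8601, 8604, and 8607 (patient ID: Alice).
alice = [(1981, 1988), (1983, 1989), (1981, 1985)]
alice_ncp_star = round(ncp_star(alice, ATTR_MIN, ATTR_MAX), 2)
print(alice_ncp_star)  # 0.89
```

Because the covering range 1981-1989 is wider than any single record's range, NCP* is at least as large as the largest per-record NCP of the same patient.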

FIG. 13 illustrates the anonymization quasi-identifiers corresponding to the “birth year” target quasi-identifiers of the target records when the anonymization target data set 860 is divided by the partition 401, the partition 403, and the partition 404. FIG. 14 illustrates the anonymization quasi-identifiers corresponding to the “birth year” target quasi-identifiers of the target records when the anonymization target data set 860 is divided by the partition 402, the partition 403, and the partition 404. That is, FIG. 13 and FIG. 14 each show an example of the anonymization quasi-identifiers when the anonymization target data set 860 is k-anonymized with a particular combination of target records.

FIG. 15 is a diagram showing an example of the information loss amount corresponding to each combination of target records. Specifically, FIG. 15 shows the value of NCP* for each patient ID and the sum of NCP* over the entire anonymization target data set 860 when the partition 401 is adopted and when the partition 402 is adopted. FIG. 15 shows that the information loss due to anonymization is smaller when the partition 402 is adopted than when the partition 401 is adopted. In this case, the combination determination unit 230 adopts the partition 402.
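The comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the combination determination unit 230: the function names, the candidate names, and all sample values are hypothetical and do not reproduce FIG. 13 to FIG. 15.

```python
from collections import defaultdict


def total_ncp_star(candidate, attr_min, attr_max):
    """Sum NCP* over all patient IDs for one combination candidate.

    `candidate` is a list of (patient_id, (lower, upper)) pairs, one per
    target record, giving the range that k-anonymization would assign.
    """
    ranges_per_id = defaultdict(list)
    for patient_id, value_range in candidate:
        ranges_per_id[patient_id].append(value_range)
    width = abs(attr_max - attr_min)
    return sum(
        (max(hi for _, hi in rs) - min(lo for lo, _ in rs)) / width
        for rs in ranges_per_id.values()
    )


def choose_candidate(candidates, attr_min, attr_max):
    """Return the name of the candidate with the smallest total NCP*."""
    return min(
        candidates,
        key=lambda name: total_ncp_star(candidates[name], attr_min, attr_max),
    )


# Hypothetical candidates; these are not the values of FIG. 13 to FIG. 15.
candidates = {
    "partition_401": [("u1", (1981, 1988)), ("u1", (1983, 1989)), ("u2", (1981, 1988))],
    "partition_402": [("u1", (1981, 1985)), ("u1", (1981, 1985)), ("u2", (1981, 1985))],
}
best = choose_candidate(candidates, 1981, 1990)
print(best)  # partition_402
```

In this toy input the first candidate totals 15/9 and the second 8/9, so the second is adopted.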

=== k-anonymization unit 210 ===
The k-anonymization unit 210 converts the target quasi-identifiers included in the target records belonging to each anonymous group of the combination of target records determined by the combination determination unit 230 into anonymization quasi-identifiers, and thereby generates a k-anonymization data set. For example, the k-anonymization unit 210 converts the target quasi-identifiers included in the target records belonging to each anonymous group into the same attribute value for each attribute name.

That is, the k-anonymization unit 210 converts the target quasi-identifiers into anonymization quasi-identifiers so that the sum of the information loss amounts, after conversion by the anonymity enhancement unit 120, of all target records included in the anonymization target data set is minimized.
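As a non-limiting illustration, conversion into the same attribute value within each anonymous group can be sketched in Python as follows. Generalizing to the (minimum, maximum) range of the group is one possible choice assumed here; the function name, record layout, and sample values are hypothetical.

```python
def k_anonymize(groups, attribute):
    """Replace `attribute` in every record by the (min, max) range of its group."""
    anonymized = []
    for group in groups:
        values = [record[attribute] for record in group]
        shared_range = (min(values), max(values))
        for record in group:
            # Copy the record so the input data set is left unchanged.
            anonymized.append({**record, attribute: shared_range})
    return anonymized


# Hypothetical anonymous groups; the combination is assumed to be already determined.
groups = [
    [{"false_id": "u1", "birth_year": 1981},
     {"false_id": "u2", "birth_year": 1985},
     {"false_id": "u3", "birth_year": 1988}],
    [{"false_id": "u1", "birth_year": 1981},
     {"false_id": "u4", "birth_year": 1983},
     {"false_id": "u5", "birth_year": 1984}],
]
result = k_anonymize(groups, "birth_year")
print(result[0])  # {'false_id': 'u1', 'birth_year': (1981, 1988)}
print(result[3])  # {'false_id': 'u1', 'birth_year': (1981, 1984)}
```

In this toy output the same false ID u1 receives two different ranges, which is the situation that the strengthening processing of the anonymity enhancement unit 120 subsequently addresses.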

FIG. 16 is a diagram illustrating an example of a k-anonymization data set generated by the k-anonymization unit 210. This k-anonymization data set is the result of k-anonymization by the k-anonymization unit 210 when the combination determination unit 230 has determined the combination of target records to be the one divided by the partition 402, the partition 403, and the partition 404.

In addition, the component of the hardware unit of the anonymization apparatus 200 may be the configuration shown in FIG.

Next, the operation of this embodiment will be described in detail with reference to the drawings.

FIG. 17 is a flowchart showing the operation of the anonymization apparatus 200 according to this embodiment.

The combination determination unit 230 generates one or more combination candidates and passes the generated combination candidates to the information loss calculation unit 240 (S631).

Next, the information loss calculation unit 240 calculates the reinforced processing information loss amount based on the received combination candidate, and passes the calculated reinforced processing information loss amount to the combination determination unit 230 (S632).

Next, the combination determination unit 230 determines, as the combination of target records, the combination candidate with the smallest information loss calculated from the received reinforced processing information loss amounts (S633).

Next, the k-anonymization unit 210 generates a k-anonymization data set by converting, for the target records belonging to each anonymous group of the combination determined by the combination determination unit 230, the target quasi-identifiers included in those target records into anonymization quasi-identifiers, and outputs the generated data set to the storage unit 702 or the storage device 703 (S634).

Next, the anonymity enhancement unit 120 strengthens the reinforcement target quasi-identifiers included in the received k-anonymization data set, and generates an anonymity enhancement data set in which the anonymization records have been converted into enhancement records (S635).
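The flow of steps S631 to S635 can be sketched end to end in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of the anonymization apparatus 200: the combination candidates are supplied by hand instead of being generated, generalization uses the (minimum, maximum) range, the toy candidates guarantee only 2-anonymity for brevity, and all function names and sample values are hypothetical.

```python
def generalize(groups, attr):
    """S634: give every record in a group the same (min, max) range for `attr`."""
    out = []
    for group in groups:
        values = [r[attr] for r in group]
        shared = (min(values), max(values))
        out.extend({**r, attr: shared} for r in group)
    return out


def covering_ranges(records, attr):
    """Smallest range per false ID that covers all of that ID's ranges."""
    cover = {}
    for r in records:
        lo, hi = r[attr]
        old_lo, old_hi = cover.get(r["false_id"], (lo, hi))
        cover[r["false_id"]] = (min(lo, old_lo), max(hi, old_hi))
    return cover


def loss(groups, attr, attr_min, attr_max):
    """S632: sum of NCP* over all false IDs for one combination candidate."""
    cover = covering_ranges(generalize(groups, attr), attr)
    return sum((hi - lo) / (attr_max - attr_min) for lo, hi in cover.values())


def anonymize(candidates, attr, attr_min, attr_max):
    # S631 to S633: evaluate every candidate and keep the one with the smallest loss.
    best = min(candidates, key=lambda groups: loss(groups, attr, attr_min, attr_max))
    # S634: k-anonymize the chosen combination.
    k_anonymized = generalize(best, attr)
    # S635: strengthen by giving each false ID a single covering range.
    cover = covering_ranges(k_anonymized, attr)
    return [{**r, attr: cover[r["false_id"]]} for r in k_anonymized]


# Hypothetical target records (birth year is invariant per false ID).
r = [
    {"false_id": "u1", "birth_year": 1981},
    {"false_id": "u1", "birth_year": 1981},
    {"false_id": "u2", "birth_year": 1982},
    {"false_id": "u2", "birth_year": 1982},
    {"false_id": "u3", "birth_year": 1989},
    {"false_id": "u4", "birth_year": 1990},
]
candidate_a = [[r[0], r[2]], [r[1], r[4]], [r[3], r[5]]]
candidate_b = [[r[0], r[1], r[2], r[3]], [r[4], r[5]]]

enhanced = anonymize([candidate_a, candidate_b], "birth_year", 1981, 1990)
print([rec["birth_year"] for rec in enhanced])
# [(1981, 1982), (1981, 1982), (1981, 1982), (1981, 1982), (1989, 1990), (1989, 1990)]
```

In this toy input, candidate_b is adopted because its total NCP* (4/9) is smaller than that of candidate_a (33/9).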

It should be noted that the anonymization target data set may include a plurality of invariant quasi-identifiers. In this case, for example, the combination determination unit 230 totals NCP* of all the invariant quasi-identifiers within the same combination candidate for each attribute name and each false ID, and treats the sum of these totals as the information loss calculated based on the reinforced processing information loss amount. Note that the above totaling may be performed for any of the attribute names.
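As one possible reading of the totaling described above, the following Python sketch sums NCP* over every pair of false ID and attribute name. The second invariant quasi-identifier (`birth_month`), the function name, and all sample values are assumptions introduced only for this example.

```python
def ncp_star_total(records, attr_ranges):
    """Sum NCP* over every (false ID, attribute name) pair.

    `records` hold generalized values as (lower, upper) tuples;
    `attr_ranges` maps attribute name -> (min, max) over the whole data set.
    """
    cover = {}
    for record in records:
        for attr in attr_ranges:
            lo, hi = record[attr]
            key = (record["false_id"], attr)
            old_lo, old_hi = cover.get(key, (lo, hi))
            cover[key] = (min(lo, old_lo), max(hi, old_hi))
    return sum(
        (hi - lo) / (attr_ranges[attr][1] - attr_ranges[attr][0])
        for (_, attr), (lo, hi) in cover.items()
    )


# Hypothetical k-anonymized records with two invariant quasi-identifiers.
records = [
    {"false_id": "u1", "birth_year": (1981, 1988), "birth_month": (1, 6)},
    {"false_id": "u1", "birth_year": (1983, 1989), "birth_month": (4, 12)},
    {"false_id": "u2", "birth_year": (1981, 1985), "birth_month": (1, 6)},
]
attr_ranges = {"birth_year": (1981, 1990), "birth_month": (1, 12)}

total = ncp_star_total(records, attr_ranges)
print(round(total, 2))  # 2.79
```

Here the total is 8/9 + 11/11 for u1 plus 4/9 + 5/11 for u2, that is, 92/33.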

In addition to the effects of the first embodiment, a first effect of the present embodiment described above is that it makes it possible to generate an anonymity enhancement data set with less information loss than the anonymity enhancement data set generated by the anonymization device 100 of the first embodiment.

The reason is that the following configuration is included. First, the information loss calculation unit 240 calculates the reinforced processing information loss amount corresponding to each piece of unique identification information. Second, the combination determination unit 230 determines a combination of target records based on the reinforced processing information loss amount. Third, the k-anonymization unit 210 converts the target quasi-identifiers included in the target records of the determined combination into the same attribute value for each attribute name, thereby generating a k-anonymization data set.

The second effect of the present embodiment described above is that it is possible to generate an anonymity-enhanced data set with relatively small loss of information.

The reason is that the combination determination unit 230 determines, as the combination of target records, the combination candidate with the smallest information loss calculated from the received reinforced processing information loss amounts.

Alternatively, the reason is that the k-anonymization unit 210 converts the target quasi-identifiers included in the target records belonging to each anonymous group into the same attribute value for each attribute name.

Each component described in each of the above embodiments does not necessarily need to be an independent entity. For example, each component may be realized as a module with a plurality of components. In addition, each component may be realized by a plurality of modules. Each component may be configured such that a certain component is a part of another component. Each component may be configured such that a part of a certain component overlaps a part of another component.

In the embodiments described above, each component and the module that realizes each component may be realized by hardware where necessary. Each component and the module that realizes each component may also be realized by a computer and a program. Each component and the module that realizes each component may be realized by a mixture of hardware modules and a computer and a program.

The program is provided by being recorded on a non-volatile computer-readable recording medium such as a magnetic disk or a semiconductor memory, and is read by the computer when the computer is started up. The read program causes the computer to function as a component in each of the above-described embodiments by controlling the operation of the computer.

In each of the embodiments described above, a plurality of operations are described in order in the form of a flowchart. However, the order of description does not limit the order in which the plurality of operations are executed. For this reason, when each embodiment is implemented, the order of the plurality of operations can be changed within a range that does not hinder the contents.

Furthermore, in each embodiment described above, a plurality of operations are not limited to being executed at different timings. For example, another operation may occur during the execution of a certain operation, or the execution timing of a certain operation and another operation may partially or entirely overlap.

Furthermore, in each of the embodiments described above, a certain operation is described as triggering another operation, but that description does not limit all relationships between the certain operation and other operations. For this reason, when each embodiment is implemented, the relationship between the plurality of operations can be changed within a range that does not hinder the contents. The specific description of each operation of each component does not limit each operation of each component. For this reason, the specific operation of each component may be changed within a range that does not impair the functional, performance, and other characteristics when each embodiment is implemented.

Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2012-127257 filed on June 4, 2012, the entire disclosure of which is incorporated herein.

DESCRIPTION OF SYMBOLS
100 Anonymization apparatus
110 k-anonymization unit
120 Anonymity enhancement unit
200 Anonymization apparatus
210 k-anonymization unit
230 Combination determination unit
240 Information loss calculation unit
401 Partition
402 Partition
403 Partition
404 Partition
700 Computer
701 CPU
702 Storage unit
703 Storage device
704 Input unit
705 Output unit
706 Communication unit
707 Recording medium
820 Anonymization target data set
822 Target record
825 Target record
830 k-anonymization data set
830 Anonymization data set
831 Anonymization record
832 Anonymization record
833 Anonymization record
834 Anonymization record
835 Anonymization record
836 Anonymization record
840 Anonymity enhancement data set
842 Enhancement record
845 Enhancement record
850 Anonymity enhancement data set
860 Anonymization target data set
8601 Target record
8604 Target record
8607 Target record

Claims

1. An information processing apparatus comprising:
k-anonymization means for, with respect to an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that cannot be restored to the unique identification information and that is uniquely assigned to that piece of unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymization records, and generating a k-anonymization data set including the anonymization records; and
anonymity enhancing means for converting, for the k-anonymization data set, the anonymization records having the same false identification information into enhancement records, and generating and outputting an anonymity enhancement data set including the enhancement records,
wherein the anonymity enhancing means converts the anonymization records into the enhancement records by converting reinforcement target quasi-identifiers, each being the anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of the unique identification information in the anonymization target data set, into information that cannot be made more specific by comparing the reinforcement target quasi-identifiers with one another.

2. The information processing apparatus according to claim 1, wherein
each of the reinforcement target quasi-identifiers is an attribute consisting of an attribute name and an attribute value, and
the anonymity enhancing means generates the anonymity enhancement data set by converting the reinforcement target quasi-identifiers included in all the anonymization records having the same false identification information into the same attribute value for each attribute name.

3. The information processing apparatus according to claim 2, wherein the anonymity enhancing means converts the reinforcement target quasi-identifiers by using, as the same attribute value for each attribute name, an attribute value that indicates the minimum range containing the reinforcement target quasi-identifiers included in all the anonymization records having the same false identification information.

4. The information processing apparatus according to any one of claims 1 to 3, further comprising:
information loss calculating means for calculating a reinforced processing information loss amount that corresponds to each piece of the unique identification information and results from the conversion of the reinforcement target quasi-identifiers by the anonymity enhancing means; and
combination determination means for determining, based on the reinforced processing information loss amount calculated by the information loss calculating means, a combination of the target records for distributing the target records to one or more groups,
wherein the k-anonymization means generates the k-anonymization data set by converting the target quasi-identifiers included in the target records belonging to each of the groups into the same attribute value for each attribute name.

5. The information processing apparatus according to claim 4, wherein the combination determination means determines the combination of the target records so that the sum of the information loss amounts, after conversion by the anonymity enhancing means, of all the target records included in the anonymization target data set is minimized.

6. The information processing apparatus according to claim 4 or 5, wherein the k-anonymization means converts the target quasi-identifiers so that the sum of the information loss amounts, after conversion by the anonymity enhancing means, of all the target records included in the anonymization target data set is minimized.

7. An anonymization method comprising:
with respect to an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that cannot be restored to the unique identification information and that is uniquely assigned to that piece of unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymization records, and generating a k-anonymization data set including the anonymization records; and
converting, for the k-anonymization data set, the anonymization records having the same false identification information into enhancement records, and generating and outputting an anonymity enhancement data set including the enhancement records,
wherein the anonymization records are converted into the enhancement records by converting reinforcement target quasi-identifiers, each being the anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of the unique identification information in the anonymization target data set, into information that cannot be made more specific by comparing the reinforcement target quasi-identifiers with one another.

8. The anonymization method according to claim 7, further comprising:
calculating a reinforced processing information loss amount that corresponds to each piece of the unique identification information and results from the conversion of the reinforcement target quasi-identifiers;
determining, based on the calculated reinforced processing information loss amount, a combination of the target records for distributing the target records to one or more groups; and
generating the k-anonymization data set by converting the target quasi-identifiers included in the target records belonging to each of the groups into the same attribute value for each attribute name.

9. A non-volatile recording medium storing a program that causes a computer to execute:
a process of, with respect to an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that cannot be restored to the unique identification information and that is uniquely assigned to that piece of unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymization records, and generating a k-anonymization data set including the anonymization records; and
a process of converting, for the k-anonymization data set, the anonymization records having the same false identification information into enhancement records, and generating and outputting an anonymity enhancement data set including the enhancement records,
wherein the process of converting the anonymization records into the enhancement records is a process of converting reinforcement target quasi-identifiers, each being the anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of the unique identification information in the anonymization target data set, into information that cannot be made more specific by comparing the reinforcement target quasi-identifiers with one another.

10. The non-volatile recording medium according to claim 9, wherein the program further causes the computer to execute:
a process of calculating a reinforced processing information loss amount that corresponds to each piece of the unique identification information and results from the conversion of the reinforcement target quasi-identifiers, and determining, based on the calculated reinforced processing information loss amount, a combination of the target records for distributing the target records to one or more groups; and
a process of generating the k-anonymization data set by converting the target quasi-identifiers included in the target records belonging to each of the groups into the same attribute value for each attribute name.