WO2013183250A1 - Information processing device for anonymization and anonymization method - Google Patents

Information processing device for anonymization and anonymization method

Info

Publication number
WO2013183250A1
Authority
WO
WIPO (PCT)
Prior art keywords
anonymization
target
quasi
record
identifier
Prior art date
Application number
PCT/JP2013/003347
Other languages
French (fr)
Japanese (ja)
Inventor
翼 高橋
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to JP2014519824A (JPWO2013183250A1)
Publication of WO2013183250A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254: Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Definitions

  • the present invention relates to an information processing apparatus, anonymization method, and program for processing information and anonymizing it.
  • Non-Patent Document 1 proposes k-anonymity, which is a well-known anonymity index.
  • a technique for satisfying a predetermined k-anonymity in a data set to be anonymized is called k-anonymization.
  • the attribute information to be converted included in the anonymization target data set is referred to as a quasi-identifier.
  • the quasi-identifier is not unique identification information (for example, a name) that identifies an individual.
  • the quasi-identifier is information of an attribute that may identify an individual by combining with other information that is not unique identification information.
  • a process of converting the target quasi-identifier is performed so that at least k records having the same quasi-identifier exist in the data set to be anonymized.
  • Patent Document 1 discloses a privacy protection device including data processing means for processing data until the above k-anonymity is satisfied.
  • however, neither Non-Patent Document 1 nor the privacy protection device of Patent Document 1 considers the case where a plurality of records having the same unique identification information are included in the data set to be anonymized.
  • the data set to be anonymized includes a plurality of records having the same unique identification information.
  • the anonymization target data set is k-anonymized while maintaining a specific connection relationship between a plurality of records having the same unique identification information.
  • the quasi-identifiers of record groups after k-anonymization that can be related by the connection relationship are compared.
  • the quasi-identifier abstracted by the above-mentioned k-anonymization is embodied by the comparison.
  • a data set including a plurality of records having the same unique identification information as described above is stored in a predetermined recording medium.
  • the data set includes historical information accumulated by those service providers, such as purchase information and medical information.
  • purchase information and medical information are generally stored in a recording medium as a set of a plurality of records for one individual (user). For example, purchase information associated with a credit card number is generated every time a user performs a purchase action using the same credit card. Such purchase information is associated with the user and stored in a recording medium as a record. Similarly, medical information is generated every time a medical practice is received using the same insurance card. Then, medical information associated with the same insured person is accumulated in the recording medium.
  • FIG. 2 is a diagram illustrating an example of an anonymization target data set.
  • FIG. 3 is a diagram illustrating an example of a data set obtained by k-anonymizing the data set to be anonymized in FIG. 2.
  • the fake ID is local identification information for the data set shown in FIG. 3 that shows only the relationship between each record of the data set shown in FIG. 3 and does not specify a specific individual.
  • the data set shown in FIG. 3 is designed so that no combination of knowledge about “gender”, “birth date”, and “care date” of an individual can narrow that individual's records down to fewer than k.
  • the data set shown in FIG. 3 differs from the data sets handled in the background art in that it is obtained by anonymizing an anonymization target data set (FIG. 2) in which a plurality of records are stored for one individual. Specifically, the record group of each individual in the anonymization target data set shown in FIG. 2 has a specific connection relationship, namely that the name attribute (unique identification information) is common. In the data set shown in FIG. 3, the plurality of records having the same unique identification information are stored in the recording medium with the names replaced by fake IDs. As described above, the k-anonymization of the related technology does not take into consideration that a plurality of records of one individual appear at the same time.
  • the target record 822 of “sex: female, date of birth: February 2, 1985, date of medical treatment: April 2010” having the name “Alice” shown in FIG. 2 is processed into the anonymized record 832 of “sex: Any, date of birth: 1981-1985, date of medical treatment: April 2010” having the fake ID “2” shown in FIG. 3. Further, the target record 825 of “sex: female, date of birth: February 2, 1985, date of medical treatment: May 2010” having the name “Alice” shown in FIG. 2 is processed into the anonymized record 835 of “sex: female, date of birth: 1985-1986, date of medical treatment: May 2010” having the fake ID “2” shown in FIG. 3.
  • each of the anonymization record 832 and the anonymization record 835 having the fake ID “2” has 2-anonymity regarding “gender: female, date of birth: February 2, 1985”.
  • the attributes of “sex” and “birth date” have an invariant attribute value for a certain individual. Therefore, it can be easily estimated that anonymized records having the same fake ID have the same attribute value as such an invariant attribute value in the target record before anonymization.
  • the person x can combine the anonymization records based on the fake ID.
  • from the product (intersection) of the anonymized record 832 of “sex: Any, date of birth: 1981-1985” and the anonymized record 835 of “sex: female, date of birth: 1985-1986”, both having the fake ID “2”, the person x can obtain “sex: female, date of birth: 1985” as the instantiated information of the anonymized records of the fake ID “2”. As a result, the 2-anonymity of the fake ID “2” is broken.
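The comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original disclosure: the dictionary layout, the `ANY` marker, and the helper names `intersect_years`, `intersect_sex`, and `combine` are assumptions introduced here, and dates of birth are modelled as inclusive year ranges.

```python
ANY = "Any"  # assumed marker for a fully generalized categorical value


def intersect_years(a, b):
    """Intersect two inclusive (low, high) year ranges."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    if low > high:
        raise ValueError("ranges do not overlap")
    return (low, high)


def intersect_sex(a, b):
    """Intersect two generalized sex values, where ANY covers every value."""
    if a == ANY:
        return b
    if b == ANY:
        return a
    if a != b:
        raise ValueError("conflicting values")
    return a


def combine(r1, r2):
    """Narrow the invariant attributes of two records sharing a fake ID."""
    if r1["fake_id"] != r2["fake_id"]:
        raise ValueError("records belong to different fake IDs")
    return {
        "sex": intersect_sex(r1["sex"], r2["sex"]),
        "birth": intersect_years(r1["birth"], r2["birth"]),
    }


# Anonymized records 832 and 835 as described in the text.
record_832 = {"fake_id": 2, "sex": ANY, "birth": (1981, 1985)}
record_835 = {"fake_id": 2, "sex": "female", "birth": (1985, 1986)}

narrowed = combine(record_832, record_835)
print(narrowed)  # {'sex': 'female', 'birth': (1985, 1985)}
```

Because sex and date of birth do not change for an individual, anyone who can link records through the fake ID can take this intersection, which is exactly the narrowing to “sex: female, date of birth: 1985” described in the text.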
  • An object of the present invention is to provide an information processing apparatus, anonymization method, and program for anonymization that can solve the above-described problems.
  • the information processing apparatus that performs anonymization according to the present invention includes a k-anonymization unit and an anonymity enhancing unit. For an anonymization target data set including one or more target records, each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, the k-anonymization unit converts each piece of unique identification information into false identification information that is uniquely assigned to it and does not include restoration information to the unique identification information, converts each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converts the target records into anonymization records, and generates a k-anonymization data set including the anonymization records.
  • the anonymity enhancing unit converts the anonymization records having the same false identification information in the k-anonymization data set into enhanced records, and generates and outputs an anonymity enhancement data set including the enhanced records.
  • in doing so, the anonymity enhancing unit converts the enhancement target quasi-identifier, which is the anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be instantiated by comparing enhancement target quasi-identifiers, and thereby converts the anonymization record into the enhanced record.
  • in the anonymization method of the present invention, for an anonymization target data set including one or more target records, each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, each piece of unique identification information is converted into false identification information that is uniquely assigned to it and does not include restoration information to the unique identification information; each of the target quasi-identifiers is converted into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity; the target records are thereby converted into anonymization records; a k-anonymization data set including the anonymization records is generated; the anonymization records having the same false identification information in the k-anonymization data set are converted into enhanced records; and an anonymity enhancement data set including the enhanced records is generated and output.
  • in the method, the anonymization record is converted into the enhanced record by converting the enhancement target quasi-identifier, which is the anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of unique identification information, into information from which the enhancement target quasi-identifier cannot be instantiated by comparing enhancement target quasi-identifiers.
  • the program of the present invention, recorded on a non-volatile recording medium, causes a computer to execute the following processes for an anonymization target data set including one or more target records, each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information: a process of converting each piece of unique identification information into false identification information that is uniquely assigned to it and does not include restoration information to the unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymization records, and generating a k-anonymization data set including the anonymization records; and a process of converting the anonymization records having the same false identification information in the k-anonymization data set into enhanced records, and generating and outputting an anonymity enhancement data set including the enhanced records.
  • the process of converting the anonymization record into the enhanced record is a process of converting the enhancement target quasi-identifier, which is the anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be instantiated by comparing enhancement target quasi-identifiers.
  • the present invention has the effect that a data set with enhanced anonymity can be generated, so that an individual cannot be identified more precisely even if the quasi-identifiers of the record groups after k-anonymization are compared.
  • FIG. 1 is a block diagram showing the configuration of the anonymization apparatus according to the first embodiment.
  • FIG. 2 is a diagram illustrating an example of the anonymization target data set in the first embodiment.
  • FIG. 3 is a diagram showing an example of a k-anonymization data set in the first embodiment.
  • FIG. 4 is a diagram illustrating an example of the anonymity enhancement data set in the first embodiment.
  • FIG. 5 is a block diagram illustrating a hardware configuration of a computer that implements the anonymization apparatus according to the first embodiment.
  • FIG. 6 is a flowchart showing the operation of the anonymization device according to the first embodiment.
  • FIG. 7 is a flowchart showing the operation of the anonymity enhancing unit in the first embodiment.
  • FIG. 8 is a flowchart showing the operation of the anonymity enhancing unit in the modification of the first embodiment.
  • FIG. 9 is a diagram illustrating an example of the anonymity enhancing data set according to the first embodiment.
  • FIG. 10 is a block diagram illustrating a configuration of the anonymization apparatus according to the second embodiment.
  • FIG. 11 is a diagram illustrating an example of the anonymization target data set in the second embodiment.
  • FIG. 12 is a diagram illustrating an image when the target records of the anonymization target data set are distributed to groups in the second embodiment.
  • FIG. 13 shows an example of the anonymization quasi-identifier when the anonymization target data set is k-anonymized in a combination of certain target records in the second embodiment.
  • FIG. 14 shows an example of the anonymization quasi-identifier when the anonymization target data set is k-anonymized in a combination of certain target records in the second embodiment.
  • FIG. 15 is a diagram showing an example of the information loss amount corresponding to the combination of target records in the second embodiment.
  • FIG. 16 is a diagram illustrating an example of a k-anonymized data set according to the second embodiment.
  • FIG. 17 is a flowchart showing the operation of the anonymization apparatus according to the second embodiment.
  • FIG. 1 is a block diagram showing a configuration of an anonymization apparatus (generally also called an information processing apparatus) 100 according to the first embodiment of the present invention.
  • the anonymization device 100 includes a k-anonymization unit 110 and an anonymity enhancement unit 120.
  • the constituent elements shown in FIG. 1 may be constituent elements in hardware units or constituent elements divided into functional units of a computer.
  • the components shown in FIG. 1 will be described as components divided into functional units of a computer.
  • the k-anonymization unit 110 converts the anonymization target data set stored in a storage device (not shown) into a k-anonymization data set that satisfies k-anonymity for a given k (for example, k = 2).
  • the conversion is a process for anonymizing data, and is also referred to as “processing”, but here, it is unified with “conversion”.
  • the k-anonymization unit 110 generates the k-anonymization data set by converting the records included in the anonymization target data set (hereinafter referred to as target records) into anonymization records.
  • the k-anonymization unit 110 converts the target record into an anonymization record as follows. First, the k-anonymization unit 110 converts the unique identification information included in each target record into false identification information that is uniquely assigned to the unique identification information.
  • the false identification information is identification information that does not include restoration information to the unique identification information.
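The assignment of false identification information can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `assign_fake_ids` and the use of a running counter are choices made here for illustration, not something the text prescribes. Of the sample names, only “Alice” appears in the text; the others are hypothetical.

```python
import itertools


def assign_fake_ids(names):
    """Replace each name by a fake ID that is unique per distinct name.

    The IDs come from a plain counter, so they are not derived from the
    name and carry no restoration information. The name-to-ID mapping is
    local to this function and is discarded on return.
    """
    counter = itertools.count(1)
    mapping = {}
    fake_ids = []
    for name in names:
        if name not in mapping:
            mapping[name] = next(counter)
        fake_ids.append(mapping[name])
    return fake_ids


# "Bob" and "Carol" are hypothetical names used only for this example.
print(assign_fake_ids(["Bob", "Alice", "Carol", "Alice"]))  # [1, 2, 3, 2]
```

Records of the same person keep the same fake ID, which preserves the connection relationship between records. A counter assigns IDs in order of first appearance, so a real system might shuffle the records first; that step is omitted here.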
  • next, in generating the k-anonymization data set, the k-anonymization unit 110 converts the quasi-identifier (also referred to as a target quasi-identifier) included in each target record into an anonymization quasi-identifier so that at least k quasi-identifiers have the same attribute value.
  • the anonymization quasi-identifier is a quasi-identifier determined so that the anonymization target data set including the anonymization quasi-identifier satisfies predetermined k-anonymity.
  • FIG. 2 is a diagram illustrating an example of the anonymization target data set 820.
  • the anonymization target data set 820 shown in FIG. 2 is stored in a storage device (not shown). This storage device may be included in the k-anonymization unit 110 or may be an external storage medium connected to the k-anonymization unit 110.
  • the anonymization target data set 820 includes a plurality of target records (for example, one of them is the target record 822) including attributes of name (unique identification information), gender, date of birth, date of medical treatment, and name of injury and illness.
  • the attribute includes an attribute name (attribute element name) and a value of the attribute (attribute value). For example, regarding the first attribute of the target record 822, the element name is “name”, and “Alice” is the attribute value.
  • the name is a kind of unique identification information and is information for identifying an individual.
  • each of the attributes sex, date of birth, and date of medical treatment is a quasi-identifier (target quasi-identifier).
  • a quasi-identifier that always has the same attribute value for each piece of unique identification information is called an invariant quasi-identifier.
  • a quasi-identifier that may have different attribute values for the same unique identification information is called a variable quasi-identifier.
  • the attributes “gender” and “birth date” are invariant quasi-identifiers.
  • the attribute “medical care date” is a variable quasi-identifier.
  • FIG. 3 is a diagram illustrating an example of the k-anonymization data set 830.
  • the k-anonymization data set 830 shown in FIG. 3 is stored in a storage device (not shown). This storage device may be included in the k-anonymization unit 110 or may be an external storage medium connected to the k-anonymization unit 110.
  • the k-anonymization data set 830 includes a plurality of anonymization records (for example, the anonymization record 832) including attributes of fake ID (fake identification information), gender, date of birth, date of medical treatment, and name of injury and illness.
  • Each of the false IDs of the k-anonymization data set 830 corresponds to each of the names included in the anonymization target data set 820 on a one-to-one basis.
  • the fake ID indicates only the relationship between the anonymization records of the k-anonymization data set 830 shown in FIG. 3 and does not specify a specific individual; it is local identification information of the k-anonymization data set 830.
  • the attributes included in the k-anonymization data set 830 are quasi-identifiers (anonymization quasi-identifiers), as in the case of the anonymization target data set 820 described above.
  • the target records of the anonymization target data set 820 shown in FIG. 2 and the anonymization records of the k-anonymization data set 830 shown in FIG. 3 correspond in the order of arrangement (for example, the target record 822 corresponds to the anonymized record 832, and the target record 825 corresponds to the anonymized record 835).
  • the anonymity enhancement unit 120 executes processing for enhancing anonymity for the quasi-identifier to be strengthened included in the anonymization record having the same false ID included in the k-anonymization data set.
  • the quasi-identifier to be strengthened is an invariant quasi-identifier among the anonymization quasi-identifiers included in the k-anonymization data set.
  • the process for enhancing anonymity converts the quasi-identifiers to be strengthened into data with enhanced anonymity so that, when the quasi-identifiers to be strengthened are compared, they cannot be instantiated (that is, an individual cannot be identified or narrowed down).
  • converting this strengthening target quasi-identifier into data with enhanced anonymity is referred to as strengthening processing.
  • the anonymity enhancing unit 120 performs the strengthening process on the quasi-identifiers to be strengthened so as to prevent k-anonymity from being broken by comparison of the quasi-identifiers included in a plurality of anonymized records 831 of the same user (the same fake ID).
  • the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier to the same attribute value for each attribute name.
  • the same attribute value is an attribute value that includes all the reinforcement target quasi-identifiers for each attribute name having the same false ID, and indicates the minimum range.
  • the same attribute value may be an attribute value indicating an arbitrary range including all the reinforcement target quasi-identifiers having the same false ID for each attribute name.
  • all strengthening target quasi-identifiers having the same fake ID for each attribute name are abbreviated as “same fake ID strengthening target quasi-identifiers”.
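The rule stated above, in which every invariant quasi-identifier under one fake ID is replaced by the smallest value covering all of them, might look like the sketch below. The record layout, the `treated` field, and the names `cover_years`, `cover_category`, and `strengthen` are assumptions made for illustration. Only the two records with fake ID 2 described in the text are used.

```python
from collections import defaultdict

ANY = "Any"  # assumed marker for a fully generalized categorical value


def cover_years(ranges):
    """Smallest inclusive year range that contains every given range."""
    ranges = list(ranges)
    return (min(low for low, _ in ranges), max(high for _, high in ranges))


def cover_category(values):
    """Common value if all values agree, otherwise the generalized ANY."""
    distinct = set(values)
    return distinct.pop() if len(distinct) == 1 else ANY


def strengthen(records):
    """Give all records of one fake ID the same invariant quasi-identifiers."""
    by_fake_id = defaultdict(list)
    for rec in records:
        by_fake_id[rec["fake_id"]].append(rec)
    strengthened = []
    for rec in records:
        group = by_fake_id[rec["fake_id"]]
        strengthened.append({
            **rec,  # variable attributes such as the treatment date are kept
            "sex": cover_category(r["sex"] for r in group),
            "birth": cover_years(r["birth"] for r in group),
        })
    return strengthened


records = [
    {"fake_id": 2, "sex": ANY, "birth": (1981, 1985), "treated": "2010-04"},
    {"fake_id": 2, "sex": "female", "birth": (1985, 1986), "treated": "2010-05"},
]
for rec in strengthen(records):
    print(rec)
```

After this step both records carry “sex: Any, date of birth: 1981-1986”, so intersecting them yields nothing narrower than either record alone. This sketch handles only records sharing a fake ID and does not yet restore k-anonymity with respect to other fake IDs.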
  • FIG. 4 is a diagram illustrating an example of the anonymity enhancement data set 840.
  • the anonymity enhancement data set 840 is information output from the anonymity enhancing unit 120 and stored in a storage device (not shown).
  • the anonymity enhancement data set 840 includes a plurality of enhancement records (for example, enhancement records 842) including attributes of fake ID, gender, date of birth, date of medical care, and name of injury and illness.
  • the enhancement record is obtained by strengthening the sex and date of birth, which are the quasi-identifiers to be strengthened in the k-anonymization data set 830 shown in FIG. 3.
  • each target record of the anonymization target data set 820 shown in FIG. 2, each anonymization record of the k-anonymization data set 830 shown in FIG. 3, and each enhancement record of the anonymity enhancement data set 840 shown in FIG. 4 correspond in the order of arrangement.
  • for example, the target record 822, the anonymization record 832, and the strengthening record 842 correspond to each other, and the target record 825, the anonymization record 835, and the strengthening record 845 correspond to each other.
  • the anonymization device 100 can be realized by an information processing device such as a computer.
  • Each component (functional block) in the anonymization apparatus 100 and the anonymization apparatus in other embodiments described later is realized by hardware resources included in the information processing apparatus.
  • the information processing apparatus may include a CPU (Central Processing Unit) that executes a computer program (software program: hereinafter may be simply referred to as “program”) stored in a recording medium.
  • the anonymization device 100 includes hardware such as a CPU of a computer, a main storage device, and an auxiliary storage device, and is realized by the cooperation of the CPU based on a program loaded from the storage device or the like to the main storage device.
  • the functions realized by the CPU are not limited to the block configuration shown in FIG. 1 (k-anonymization unit 110, anonymity enhancement unit 120), and various implementation forms that can be adopted by those skilled in the art can be applied. (The same applies to the following embodiments).
  • the anonymization device 100 and the anonymization device according to each embodiment to be described later may be realized by a dedicated device.
  • FIG. 5 is a diagram illustrating a hardware configuration of a computer 700 that realizes the anonymization apparatus 100 according to the present embodiment.
  • the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. Furthermore, the computer 700 includes a recording medium (or storage medium) 707 supplied from the outside.
  • the recording medium 707 may be a non-volatile recording medium that stores information non-temporarily.
  • the CPU 701 controls the overall operation of the computer 700 by operating an operating system (not shown).
  • the CPU 701 reads a program and data from a recording medium 707 mounted on the storage device 703, for example, and writes the read program and data to the storage unit 702.
  • the program is, for example, a program that causes the computer 700 to execute the operation of the flowchart shown in FIG. 6.
  • the CPU 701 executes various processes as the k-anonymization unit 110 and the anonymity enhancement unit 120 shown in FIG. 1 according to the read program and based on the read data.
  • the CPU 701 may download a program or data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).
  • the storage unit 702 stores programs and data.
  • the storage unit 702 may store an anonymization target data set, a k-anonymization data set, and an anonymity enhancement data set.
  • the storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes a recording medium 707.
  • the storage device 703 records the program so that it can be read by a computer. Further, the storage device 703 may record data so as to be readable by a computer.
  • the storage device 703 may store an anonymization target data set, a k-anonymization data set, and an anonymity enhancement data set.
  • the input unit 704 is realized by, for example, a mouse, a keyboard, a built-in key button, and the like, and is used for an input operation.
  • the input unit 704 is not limited to a mouse, a keyboard, and a built-in key button, and may be a touch panel, an accelerometer, a gyro sensor, a camera, or the like.
  • the output unit 705 is realized by a display, for example, and is used for confirming the output.
  • the communication unit 706 implements an interface with the outside (for example, a data server that stores an anonymization target data set).
  • the communication unit 706 is included as part of the k-anonymization unit 110, for example.
  • the functional unit block of the anonymization device 100 shown in FIG. 1 is realized by the computer 700 having the hardware configuration shown in FIG.
  • the means for realizing each unit included in the computer 700 is not limited to the above.
  • the computer 700 may be realized by one physically coupled device, or may be realized by a plurality of devices, that is, two or more physically separated devices connected by wire or wirelessly.
  • the recording medium 707 in which the above-described program code is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored in the recording medium 707.
  • the CPU 701 may store the code of the program stored in the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of a recording medium 707 that stores a program (software) executed by the computer 700 (CPU 701) temporarily or non-temporarily.
  • FIG. 6 is a flowchart showing the operation of the anonymization device 100 of this embodiment. Note that the processing according to this flowchart may be executed based on the above-described program control by the CPU. Further, the step name of the process is described by a symbol as in S601.
  • the k-anonymization unit 110 acquires the anonymization target data set 820 (S601). For example, the k-anonymization unit 110 reads the anonymization target data set held in the storage unit 702 or the storage device 703 illustrated in FIG. 5. Note that the k-anonymization unit 110 may receive the anonymization target data set from the outside (not shown) via the communication unit 706. Further, the k-anonymization unit 110 may receive the anonymization target data set input via the input unit 704.
  • the k-anonymization unit 110 k-anonymizes the anonymization target data set 820 to generate the k-anonymization data set 830, and outputs it to the storage unit 702 or the storage device 703 (S602).
  • specifically, the k-anonymization unit 110 converts the target quasi-identifier of each target record in the anonymization target data set 820 into an anonymization quasi-identifier so that anonymization records having at least k different false IDs in the k-anonymization data set 830 have the same combination of anonymization quasi-identifiers.
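The condition in S602 counts distinct false IDs rather than records, which matters when one person contributes several records. A check of that condition might look like the following sketch. The function name `satisfies_k_anonymity` and the record layout are assumptions, and the sample rows reuse only the values the text gives for records 831, 832, 835, and 836.

```python
from collections import defaultdict


def satisfies_k_anonymity(records, quasi_attrs, k):
    """True if every quasi-identifier combination is shared by >= k fake IDs."""
    ids_per_combination = defaultdict(set)
    for rec in records:
        key = tuple(rec[attr] for attr in quasi_attrs)
        ids_per_combination[key].add(rec["fake_id"])
    return all(len(ids) >= k for ids in ids_per_combination.values())


records = [
    {"fake_id": 1, "sex": "Any", "birth": (1981, 1985)},     # record 831
    {"fake_id": 2, "sex": "Any", "birth": (1981, 1985)},     # record 832
    {"fake_id": 2, "sex": "female", "birth": (1985, 1986)},  # record 835
    {"fake_id": 4, "sex": "female", "birth": (1985, 1986)},  # record 836
]
print(satisfies_k_anonymity(records, ("sex", "birth"), k=2))  # True
print(satisfies_k_anonymity(records, ("sex", "birth"), k=3))  # False
```

Two records with the same fake ID count once, so a combination held only by one person's records fails the check even if it appears k times.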
  • the method of converting the target quasi-identifier of each target record into the anonymized quasi-identifier is, for example, abstraction by generalizing the target quasi-identifier.
  • the method for anonymizing the target quasi-identifier of each target record is not limited to a specific method, and various methods such as perturbation may be used.
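Generalization can be illustrated with a very small example. The fixed-width bucketing below is an assumption chosen for brevity, and the helper names `generalize_year` and `generalize_sex` are hypothetical; the text does not specify how ranges such as “1981-1985” are chosen.

```python
def generalize_year(year, width):
    """Map a year to an inclusive bucket of `width` years, e.g. 1981-1985."""
    low = ((year - 1) // width) * width + 1
    return (low, low + width - 1)


def generalize_sex(_value):
    """Abstract a sex value to the fully generalized value."""
    return "Any"


print(generalize_year(1985, 5))  # (1981, 1985)
print(generalize_sex("female"))  # Any
```

A wider bucket hides more individuals behind one value at the cost of precision, which is the trade-off k-anonymization has to balance.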
  • the anonymity enhancement unit 120 reinforces the reinforcement target quasi-identifier included in the k-anonymization data set 830 to generate an anonymity enhancement data set 840 obtained by converting the anonymization record into the enhancement record (S603).
  • the anonymity strengthening unit 120 reinforces the same false ID strengthening target quasi-identifier so as to be the same for each attribute name.
  • the anonymity enhancing unit 120 converts all anonymized records having the same fake ID into enhanced records.
  • for the strengthening process, any combination of various processes generally used in k-anonymization, such as generalization and perturbation, can be used.
  • when the invariant quasi-identifier for each attribute name is the same in all the strengthening records having the same fake ID, the invariant quasi-identifiers (strengthened quasi-identifiers) are never instantiated even if those of a plurality of strengthening records are compared. Therefore, even when such a comparison is made, the desired k-anonymity can be prevented from being broken.
  • the anonymity enhancing unit 120 outputs the generated anonymity enhancing data set 840 (S604).
  • the anonymity enhancing unit 120 outputs the anonymity enhancing data set to the outside (not shown) via the communication unit 706.
  • the anonymity enhancing unit 120 may store the anonymity enhancing data set in the storage unit 702 or the storage device 703 illustrated in FIG. 5. Further, the anonymity enhancing unit 120 may output the anonymity enhancing data set to the output unit 705 shown in FIG. 5 and control it to be displayed on the display.
  • FIG. 7 is a flowchart showing an operation (S603 shown in FIG. 6) in which the anonymity enhancing unit 120 generates the anonymity enhancing data set.
  • the anonymity enhancing unit 120 performs the processing from S611 to S614 for each group of anonymized records having the same false ID. For example, in the case of the k-anonymization data set 830 shown in FIG. 3, the anonymity enhancement unit 120 performs the processing from S611 to S614 on the anonymization records 832 and 835 with false ID: 2, and on the anonymization records 834 and 836 with false ID: 4.
  • the anonymity enhancing unit 120 selects a fake ID to be processed (S611).
  • a fake ID is a fake ID to be processed. If there is no fake ID to be processed (YES in S612), the process ends.
  • the anonymity enhancing unit 120 strengthens the reinforcement target quasi-identifiers of all anonymized records having the selected false ID into the same attribute value for each attribute name (S613).
  • the anonymity enhancing unit 120 strengthens the reinforcement target quasi-identifier of the anonymization record to be processed into the same attribute value as the specific reinforcement quasi-identifier for each attribute name (S614).
  • the processing target anonymization record is an anonymization record belonging to the same reinforcement target quasi-identifier group (described later) as “anonymization record obtained by strengthening the reinforcement quasi-identifier in S613”.
  • the specific reinforcement target quasi-identifier is the reinforcement quasi-identifier of “anonymization record obtained by strengthening the reinforcement target quasi-identifier in S613”.
  • the anonymization records having the same reinforcement target quasi-identifier as either the anonymization record 832 or the anonymization record 835 of fake ID: 2 are the anonymization record 831 with fake ID: 1 (sex: Any, date of birth: 1981-1985) and the anonymization record 836 with fake ID: 4 (sex: female, date of birth: 1985-1986).
  • a plurality of such anonymized records having the same reinforcement target quasi-identifier belong to the same reinforcement target quasi-identifier group.
  • the anonymization record 832 having the false ID: 2 and the anonymization record 831 having the false ID: 1 belong to the same reinforcement target quasi-identifier group.
  • the anonymization record 835 having the false ID: 2 and the anonymization record 836 having the false ID: 4 belong to the same reinforcement target quasi-identifier group.
  • the anonymization records belonging to the same reinforcement target quasi-identifier group as the anonymization record having the false ID: 2 are the anonymization record 831 of the false ID: 1 and the anonymization record 836 of the false ID: 4.
  • the anonymity enhancement unit 120 strengthens the reinforcement target quasi-identifier of the processing target anonymization record to “the same attribute value as the specific reinforcement target quasi-identifier strengthened in S614” for each attribute name (S615).
  • the anonymization record to be processed is an anonymization record having the same fake ID as the anonymization record obtained by strengthening the reinforcement target quasi-identifier again in S614.
  • the specific reinforcement target quasi-identifier is the reinforcement target quasi-identifier strengthened in S614.
  • in this way, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records having the same fake ID into reinforcement quasi-identifiers with the same attribute value. Furthermore, the anonymity enhancing unit 120 applies the reinforcement process to the reinforcement target quasi-identifiers of the anonymization records belonging to the same reinforcement target quasi-identifier group as a record to which the reinforcement process has been applied. The anonymity enhancing unit 120 recursively repeats this reinforcement of the reinforcement target quasi-identifiers.
  • that is, when the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of a certain anonymization record, it also reinforces the reinforcement target quasi-identifier of every anonymization record having the same fake ID. Furthermore, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of every anonymization record belonging to the same reinforcement target quasi-identifier group as an anonymization record whose reinforcement target quasi-identifier has been reinforced. The anonymity enhancing unit 120 then recursively repeats the reinforcement for the anonymization records having the same fake ID as an anonymization record whose reinforcement target quasi-identifier has been reinforced again.
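  • as a non-authoritative illustration, the recursive reinforcement described above can be sketched in Python. The sketch assumes that a single "birth year" range stands in for the reinforcement target quasi-identifier, that records are linked when they share a fake ID or share the same pre-reinforcement range (the same reinforcement target quasi-identifier group), and that every linked record receives the smallest covering range. The function name `strengthen`, the union-find formulation, the ranges assumed for records 833 and 834, and the extra record 837 are hypothetical and are not taken from the specification.

```python
from collections import defaultdict


def strengthen(records):
    """records: dict record_id -> (fake_id, (birth_from, birth_to)).

    Returns dict record_id -> reinforced (birth_from, birth_to).
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link records that share a fake ID, and records that share the same
    # pre-reinforcement range (the same reinforcement target group).
    by_fake_id, by_group = defaultdict(list), defaultdict(list)
    for rid, (fake_id, rng) in records.items():
        by_fake_id[fake_id].append(rid)
        by_group[rng].append(rid)
    for members in list(by_fake_id.values()) + list(by_group.values()):
        for other in members[1:]:
            union(members[0], other)

    # Every linked record receives the smallest range covering its component.
    cover = {}
    for rid, (_, (lo, hi)) in records.items():
        root = find(rid)
        c_lo, c_hi = cover.get(root, (lo, hi))
        cover[root] = (min(c_lo, lo), max(c_hi, hi))
    return {rid: cover[find(rid)] for rid in records}


# Toy data: the ranges for 833 and 834 and the whole record 837 are assumed.
records = {
    831: (1, (1981, 1985)),
    832: (2, (1981, 1985)),
    833: (3, (1986, 1990)),
    834: (4, (1986, 1990)),
    835: (2, (1985, 1986)),
    836: (4, (1985, 1986)),
    837: (5, (1991, 1995)),
}
result = strengthen(records)
```

  • computing connected components reaches the fixed point of the step-by-step recursion in one pass; a record that shares neither a fake ID nor a group with the others (837 in this toy data) keeps its original range.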
  • the anonymity enhancing unit 120 selects a fake ID: 2 as a fake ID to be processed (S611).
  • the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization record 832 and the anonymization record 835, which have the selected fake ID: 2, by converting them into the same attribute value for each attribute name (S613).
  • the same attribute value for the strengthened quasi-identifier whose attribute name is “sex” is “Any” including “Any” and “female”.
  • the same attribute value for the reinforcement target quasi-identifier whose attribute name is “birth date” is “1981 to 1986” including “1981 to 1985” and “1985 to 1986”.
  • the anonymity enhancement unit 120 therefore generalizes the reinforcement target quasi-identifiers of the anonymization record 832 and the anonymization record 835 having fake ID: 2 into “sex: Any, date of birth: 1981-1986”.
  • next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records belonging to the same reinforcement target quasi-identifier group as the anonymization record 832 and the anonymization record 835 of fake ID: 2, whose reinforcement target quasi-identifiers were reinforced in S613. That is, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of the anonymization record 831 of fake ID: 1, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 832 of fake ID: 2, to “sex: Any, date of birth: 1981-1986”, which is the same attribute value as the reinforcement target quasi-identifier of the anonymization record 832.
  • the anonymity enhancement unit 120 assigns the reinforcement target quasi-identifier of the anonymization record 836 of false ID: 4 belonging to the same reinforcement target quasi-identifier group as the anonymization record 835 of false ID: 2 to the false ID: 2 Strengthening is performed to “sex: Any, date of birth: 1981-1986”, which is the same attribute value as the quasi-identifier to be strengthened in the anonymization record 835 (S614).
  • the anonymity enhancing unit 120 selects a fake ID: 4 as a fake ID to be processed (S611).
  • the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization record 834 and the anonymization record 836, which have the selected fake ID: 4, by converting them into the same attribute value for each attribute name (S613).
  • here, the anonymization record 836 of fake ID: 4, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 835 of fake ID: 2, was already given the quasi-identifier “sex: Any, date of birth: 1981-1986” by the process of S614 performed when fake ID: 2 was selected in S611. Therefore, the anonymity enhancing unit 120 generalizes the anonymization record 834 and the anonymization record 836 of fake ID: 4 into “sex: Any, date of birth: 1981-1990”.
  • next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization records belonging to the same reinforcement target quasi-identifier group as the anonymization record 834 and the anonymization record 836 of fake ID: 4, whose reinforcement target quasi-identifiers were reinforced in S613.
  • that is, the anonymity enhancing unit 120 reinforces, for each attribute name, the reinforcement target quasi-identifier of the anonymization record 833 of fake ID: 3, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 834 of fake ID: 4, to “sex: Any, date of birth: 1981-1990”, which is the same attribute value as the reinforcement target quasi-identifier of the anonymization record 834.
  • similarly, the anonymity enhancing unit 120 reinforces, for each attribute name, the reinforcement target quasi-identifier of the anonymization record 835 of fake ID: 2, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 836 of fake ID: 4, to “sex: Any, date of birth: 1981-1990”, which is the same attribute value as the reinforcement target quasi-identifier of the anonymization record 836 (S614).
  • along with the re-reinforcement of the reinforcement target quasi-identifier of the anonymization record 835 of fake ID: 2 in S614, the anonymity enhancing unit 120 reinforces, for each attribute name, the reinforcement target quasi-identifier of the anonymization record 832 of fake ID: 2 to “sex: Any, date of birth: 1981-1990”, which is the same attribute value as the reinforcement target quasi-identifier of the anonymization record 835 (S615).
  • along with the reinforcement of the reinforcement target quasi-identifier of the anonymization record 832 of fake ID: 2, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of the anonymization record 831 of fake ID: 1, which belongs to the same reinforcement target quasi-identifier group as the anonymization record 832, to “sex: Any, date of birth: 1981-1990”, which is the same attribute value as the reinforcement target quasi-identifier of the anonymization record 832 of fake ID: 2 (S614).
  • the anonymity enhancing unit 120 of the present embodiment performs the enhancement process on the reinforcement target quasi-identifiers (invariant quasi-identifiers), thereby preventing k-anonymity from breaking down when the anonymization quasi-identifiers of anonymized records having the same fake ID are compared. That is, the anonymity enhancing unit 120 of the present embodiment reinforces the reinforcement target quasi-identifiers so that k-anonymity is satisfied even if this comparison is made.
  • the k-anonymization data set 830 is k-anonymized by the k-anonymization unit 110. Therefore, k-anonymity is satisfied in each anonymized record unit.
  • when the anonymization quasi-identifier of an anonymized record satisfying k-anonymity is further generalized, k or more strengthened records after generalization always exist that correspond to the anonymization quasi-identifier of that anonymized record. That is, as with the k-anonymized records, there are always k or more strengthened records after generalization corresponding to the target quasi-identifier of the target record.
  • the anonymity enhancement data set obtained by further generalizing the anonymization quasi-identifier is the same as the k-anonymity of the k-anonymization data set 830 even if it is outside the strict definition of k-anonymity. Can have privacy strength.
  • the anonymity strengthening unit 120 should generalize the anonymization record having the same false ID into a superset of invariant quasi-identifiers (strengthening target quasi-identifiers).
  • the superset is obtained by converting, for each attribute name, the reinforcement target quasi-identifiers having the same fake ID into invariant quasi-identifiers with the same attribute value (an attribute value that includes the range of the attribute values of all the invariant quasi-identifiers for that attribute name).
  • a super set is a set that represents a superordinate concept of a set.
  • the attribute value of each invariant quasi-identifier is converted, for each attribute name, into a superset (or the union) that includes the attribute values of all the invariant quasi-identifiers, or all of the values contained in those attribute values.
  • the union is the smallest of the supersets that include all the attribute values of the invariant quasi-identifiers, or all the values contained in those attribute values.
  • a superset may be expressed using a range or the like. Such a superset can maintain the same privacy strength as the k-anonymity guaranteed by the k-anonymization unit 110.
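  • as a hedged sketch of this superset generalization, the following Python example converts the invariant quasi-identifiers of all records sharing a fake ID into one covering value per attribute name. The record layout (keys `fake_id`, `sex`, `birth`), the rule that differing sex values generalize to "Any", and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def generalize_sex(values):
    # A single distinct value is kept; differing values generalize to "Any".
    distinct = set(values)
    return distinct.pop() if len(distinct) == 1 else "Any"


def generalize_by_fake_id(records):
    """records: list of dicts with keys 'fake_id', 'sex', 'birth' (from, to)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["fake_id"]].append(rec)

    out = []
    for rec in records:
        same = groups[rec["fake_id"]]
        out.append({
            "fake_id": rec["fake_id"],
            "sex": generalize_sex(r["sex"] for r in same),
            # Smallest range covering every record with this fake ID.
            "birth": (min(r["birth"][0] for r in same),
                      max(r["birth"][1] for r in same)),
        })
    return out


rows = [
    {"fake_id": 1, "sex": "Any", "birth": (1981, 1985)},
    {"fake_id": 2, "sex": "Any", "birth": (1981, 1985)},
    {"fake_id": 2, "sex": "female", "birth": (1985, 1986)},
]
generalized = generalize_by_fake_id(rows)
```

  • this sketch uses the minimum covering range (the union expressed as a range); any wider range that still contains every value would also be a valid superset.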
  • FIG. 8 is a flowchart showing an operation (S603 shown in FIG. 6) in which the anonymity enhancing unit 120 generates the anonymity enhancing data set in the modification of the first embodiment.
  • FIG. 9 is a diagram illustrating an example of the anonymity enhancing data set 850 generated by the anonymity enhancing unit 120 by generalizing the k-anonymized data set 830.
  • the anonymity enhancing unit 120 selects fake ID: 2 as a fake ID to be processed (S621).
  • the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization record 832 and the anonymization record 835, which have the selected fake ID: 2, into the same attribute value for each attribute name (S623).
  • the same attribute value for the strengthened quasi-identifier whose attribute name is “sex” is, for example, “Any” including “Any” and “female”.
  • the same attribute value for the reinforcement target quasi-identifier whose attribute name is “birth date” is, for example, “1981-1986”, an attribute value indicating the minimum range including “1981 to 1985” and “1985 to 1986”.
  • the attribute value does not necessarily have to indicate the minimum range including all of them. For example, an attribute value in an arbitrary range that includes all the anonymization quasi-identifiers whose attribute name is “birth date”, such as “1980 to 1989”, may be used.
  • the anonymity enhancing unit 120 selects a fake ID: 4 as a fake ID to be processed (S621).
  • the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of the anonymization record 834 and the anonymization record 836, which have the selected fake ID: 4, into the same attribute value for each attribute name (S623).
  • the same attribute value for the strengthened quasi-identifier whose attribute name is “sex” is, for example, “female” including “female” and “female”.
  • the same attribute value for the reinforcement target quasi-identifier whose attribute name is “birth date” is, for example, “1986-1990”, an attribute value indicating a minimum range including “1986 to 1990” and “1985 to 1986”.
  • anonymization records belonging to the same quasi-identifier group to be strengthened must be strengthened.
  • the first effect of the present embodiment described above is that it makes it possible to generate a data set in which individuals cannot be identified more specifically even if the invariant quasi-identifiers (anonymization quasi-identifiers) of the anonymized records are compared after anonymization.
  • the reason is that the anonymity enhancement unit 120 reinforces the reinforcement target quasi-identifiers included in the anonymization records of the k-anonymization data set generated by the k-anonymization unit 110, and thereby generates the anonymity enhancement data set.
  • the second effect of the present embodiment described above is that it makes it possible to generate a data set in which individuals cannot be identified more specifically, while strictly maintaining the k-anonymity of the k-anonymization data set generated by the k-anonymization unit 110.
  • the anonymity strengthening unit 120 recursively executes the above-described strengthening process to generate an anonymity strengthening data set.
  • the third effect of the present embodiment described above is that it makes it possible to generate a data set whose anonymity is enhanced so that individuals cannot be identified more specifically by comparing the quasi-identifiers of the k-anonymized records, while keeping the loss of information relatively small.
  • the reason is that the anonymity enhancing unit 120 reinforces only the reinforcement target quasi-identifier included in the anonymized record having the same fake ID to generate the anonymity enhanced data set.
  • the anonymization apparatus of this embodiment calculates an information loss amount corresponding to the generalization (anonymity enhancement) by the anonymity enhancement unit 120 illustrated in FIG. Then, the anonymization apparatus of this embodiment determines a combination of target records in k-anonymization based on the calculated information loss amount so that, for example, the information loss amount is minimized. Then, the anonymization device of the present embodiment converts the target quasi-identifiers of the anonymization target data set based on the determined combination of target records so as to satisfy the desired anonymity, and generates a k-anonymization data set.
  • the anonymization device of this embodiment calculates the information loss amount with the unique identification information as a unit. The reason is to cope with the case where the anonymization target data set includes a plurality of target records for one unique identification information.
  • the anonymity enhancing unit 120 performs reinforcement processing in order to prevent anonymity failure due to comparison.
  • when the information loss amount of each target record is calculated in record units for an anonymization target data set that includes a plurality of target records for one unique identification information, the information loss amount does not include the loss of information incurred by the reinforcement performed by the anonymity enhancing unit 120.
  • the enhancement processing by the anonymity enhancement unit 120 further generalizes the anonymization records of the k-anonymization data set based on the fake ID corresponding to each unique identification information. Therefore, the information loss in that case is not taken into account merely by obtaining the information loss amount of each single target record.
  • the anonymization apparatus of this embodiment therefore calculates an information loss amount in units of the unique identification information corresponding to each fake ID, that is, an information loss amount that includes the loss of information incurred by the reinforcement performed by the anonymity enhancement unit 120.
  • this “information loss amount corresponding to each unique identification information associated with the strengthening process of the quasi-identifier to be strengthened by the anonymity enhancing unit 120” will be referred to as a strengthened processing information loss amount.
  • the anonymization device of the present embodiment thus handles the case where the anonymization target data set includes a plurality of target records for one unique identification information, and generates an anonymity enhancement data set in which the loss of information due to generalization is further reduced.
  • FIG. 10 is a block diagram showing the configuration of the anonymization apparatus 200 according to the second embodiment of the present invention.
  • the anonymization device 200 includes a k-anonymization unit 210 instead of the k-anonymization unit 110 as compared to the anonymization device 100 according to the first embodiment. Further, the anonymization device 200 further includes a combination determination unit 230 and an information loss calculation unit 240 as compared with the anonymization device 100.
  • the combination determination unit 230 may include an information loss calculation unit 240.
  • the combination determination unit 230 generates one or more combination candidates.
  • a combination candidate is a candidate for a combination of target records when target records included in the anonymization target data set are distributed to one or more groups.
  • the combination determination unit 230 passes the combination candidates to the information loss calculation unit 240. Then, the combination determination unit 230 receives the reinforced processing information loss amount corresponding to each combination candidate from the information loss calculation unit 240.
  • the combination determination unit 230 determines the combination candidate having the smallest information loss, calculated based on the received reinforced processing information loss amounts, as the combination of target records. That is, the combination determination unit 230 determines the combination of the target records so that the total sum of the information loss amounts after the reinforcement processing by the anonymity enhancement unit 120, over all target records included in the anonymization target data set, is minimized.
  • FIG. 11 is a diagram illustrating an example of the anonymization target data set 860.
  • the anonymization target data set 860 includes a plurality of target records (for example, target record 8601), each including the attributes patient ID (also referred to as unique identification information), birth year, medical care date, and injury or illness name.
  • the attribute “birth year” is an invariant quasi-identifier.
  • the attribute “medical care date” is a variable quasi-identifier.
  • FIG. 12 is a diagram showing an image when the target records of the anonymization target data set 860 are distributed to one or more groups.
  • the dotted line in FIG. 12 shows an example of partitioning in which target records are combined and distributed to groups so that 3-anonymity can be guaranteed.
  • this group is referred to as an anonymous group.
  • the partition 401 and the partition 402 are partitions that divide the anonymization target data set by attribute: year of birth.
  • the partition 403 and the partition 404 are partitions which divide
  • each anonymous group includes target records having three or more different patient IDs.
  • the patient ID is shown using a false ID shown in FIG. 13 (corresponding to the patient ID shown in FIG. 12 in the order of arrangement). Therefore, each anonymous group is a group that is partitioned so that 3-anonymity can be guaranteed.
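  • the grouping condition stated above (each anonymous group contains target records with k or more different patient IDs) can be checked with a few lines of Python. This is only an illustrative sketch: the group contents, the patient names other than Alice, the record IDs other than 8601, 8604, and 8607, and the function name are hypothetical.

```python
def satisfies_k_anonymity(groups, k):
    """groups: iterable of lists of (patient_id, record_id) pairs.

    A group qualifies only if it holds at least k *distinct* patient IDs;
    several records of the same patient count once.
    """
    return all(len({pid for pid, _ in group}) >= k for group in groups)


# Hypothetical anonymous groups for illustration.
good_groups = [
    [("Alice", 8601), ("Bob", 8602), ("Carol", 8603), ("Alice", 8604)],
]
bad_groups = [
    [("Alice", 8601), ("Alice", 8607), ("Bob", 8602)],
]
ok = satisfies_k_anonymity(good_groups, 3)
not_ok = satisfies_k_anonymity(bad_groups, 3)
```

  • counting distinct patient IDs rather than records matters here because the anonymization target data set may contain several target records for one unique identification information.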
  • the combination determination unit 230 determines whether to adopt the anonymous groups divided by the partition 403 or those divided by the partition 404, that is, it determines the combination of target records. In this way, from among the combination candidates of the target records that can satisfy the desired k-anonymity, the combination determination unit 230 selects the candidate with the smallest sum of the reinforced processing information loss amounts corresponding to each patient ID (unique identification information), and determines it as the combination of the target records.
  • the information loss calculation unit 240 receives a combination candidate from the combination determination unit 230. Next, the information loss calculation unit 240 calculates the reinforced processing information loss amount based on the received combination candidate. Next, the information loss calculation unit 240 passes the calculated reinforced processing information loss amount to the combination determination unit 230.
  • the information loss calculation unit 240 calculates the reinforced processing information loss amount by using a calculation method corresponding to the strengthening processing of the anonymity strengthening unit 120.
  • the following describes the case where the strengthening process of the anonymity enhancement unit 120 generalizes the target quasi-identifier (the invariant quasi-identifier whose attribute is “birth year”) included in the target records of the anonymization target data set 860 to the superset of attribute values having the minimum range.
  • the information loss calculation unit 240 calculates the reinforced processing information loss amount by NCP (Normalized Certainty Penalty). Various indexes for measuring the amount of information loss have been proposed.
  • the information loss calculation unit 240 may calculate the reinforced processing information loss amount by using any calculation method corresponding to the strengthening processing of the anonymity strengthening unit 120 without being limited to the NCP.
  • NCP(r.a) is calculated by the following equation:
    $$\mathrm{NCP}(r.a) = \frac{r.a_{\max} - r.a_{\min}}{a_{\max} - a_{\min}}$$
  • here, NCP(r.a) is the NCP value related to attribute a of a target record r.
  • $r.a_{\max}$ is the maximum value of the attribute value of the attribute a of the target record r, and $r.a_{\min}$ is the minimum value of the attribute value of the attribute a of the target record r.
  • $a_{\max}$ is the maximum value of the attribute a in all target records in the anonymization target data set 860, and $a_{\min}$ is the minimum value of the attribute a in all target records in the anonymization target data set 860.
  • an NCP for each target record having a patient ID: Alice is calculated as follows.
  • the target quasi-identifier with the attribute “birth year” included in the target record 8601 is k-anonymized to 1981-1988 by the k-anonymization unit 210 in the anonymous group divided by the partition 403. Therefore, the NCP of the target quasi-identifier whose attribute is “birth year” in the target record 8601 is 0.78 (rounded to two decimal places; the same applies below).
  • the NCP of the target quasi-identifier whose attribute is “birth year” in the target record 8604 is 0.67.
  • the NCP of the target quasi-identifier whose attribute is “birth year” in the target record 8607 is 0.44.
  • the information loss calculation unit 240 of the present embodiment calculates an NCP for each patient ID as the reinforced processing information loss amount.
  • the target quasi-identifiers whose attribute is “birth year” in the target record 8601, the target record 8604, and the target record 8607 having the patient ID: Alice are converted by the k-anonymization unit 210, in the anonymous groups divided by the partition 403, to 1981-1988, 1983-1989, and 1981-1985, respectively.
  • the minimum value of the “year of birth” attribute included in the target record 8601, the target record 8604, and the target record 8607 of the patient ID: Alice is 1981, and the maximum value is 1989. Therefore, the NCP * of the target record 8601, the target record 8604, and the target record 8607 of the patient ID: Alice is 0.89.
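  • the figures above can be reproduced with a short Python sketch, assuming that NCP is the width of the generalized range divided by the width of the attribute's domain over the whole data set, and that NCP* applies the same ratio to the range covering all records of one patient ID. The domain 1981 to 1990 is an assumption chosen only because a nine-year width is consistent with the stated values; the actual domain of the data set 860 is not given here. The function names are hypothetical.

```python
# Assumed domain of "birth year" over the whole data set (width of 9 years).
DOMAIN = (1981, 1990)


def ncp(lo, hi, domain=DOMAIN):
    """Per-record loss: generalized range width over domain width."""
    return (hi - lo) / (domain[1] - domain[0])


def ncp_star(ranges, domain=DOMAIN):
    """Per-patient loss: NCP of the range covering all the patient's records."""
    lo = min(r[0] for r in ranges)
    hi = max(r[1] for r in ranges)
    return ncp(lo, hi, domain)


# Generalized "birth year" of records 8601, 8604, 8607 (patient ID: Alice).
alice = [(1981, 1988), (1983, 1989), (1981, 1985)]
per_record = [round(ncp(lo, hi), 2) for lo, hi in alice]
per_patient = round(ncp_star(alice), 2)
```

  • under these assumptions, `per_record` evaluates to 0.78, 0.67, and 0.44 and `per_patient` to 0.89, matching the values stated above; NCP* is never smaller than the largest per-record NCP because the covering range contains every individual range.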
  • FIG. 13 illustrates the anonymization quasi-identifier corresponding to the target quasi-identifier whose attribute is “birth year” in each target record when the anonymization target data set 860 is divided by the partition 401, the partition 403, and the partition 404.
  • FIG. 14 illustrates the anonymization quasi-identifier corresponding to the target quasi-identifier whose attribute is “birth year” in each target record when the anonymization target data set 860 is divided by the partition 402, the partition 403, and the partition 404. That is, FIG. 13 and FIG. 14 each show an example of the anonymization quasi-identifiers when the anonymization target data set 860 is k-anonymized with a certain combination of target records.
  • FIG. 15 is information indicating an example of the information loss amount corresponding to the combination of the target records. Specifically, FIG. 15 shows the value of NCP * for each patient ID and the sum of NCP * of the entire anonymization target data set 860 when each of the partition 401 and the partition 402 is adopted. FIG. 15 shows that the loss of information due to anonymization can be reduced when the partition 402 is adopted instead of the partition 401. In this case, the combination determination unit 230 employs the partition 402.
  • the k-anonymization unit 210 converts the target quasi-identifiers included in the target records belonging to each anonymous group of the combination of target records determined by the combination determination unit 230 into anonymization quasi-identifiers, and generates a k-anonymization data set. For example, the k-anonymization unit 210 converts the target quasi-identifiers included in the target records belonging to each anonymous group into the same attribute value for each attribute name.
  • that is, the k-anonymization unit 210 converts the target quasi-identifiers into anonymization quasi-identifiers so that the total amount of information loss after conversion by the anonymity enhancement unit 120, over all target records included in the anonymization target data set, is minimized.
  • FIG. 16 is a diagram illustrating an example of a k-anonymization data set generated by the k-anonymization unit 210.
  • this k-anonymization data set is the result of k-anonymization by the k-anonymization unit 210 when the combination determination unit 230 has determined that the target records are to be combined as divided by the partition 402, the partition 403, and the partition 404.
  • component of the hardware unit of the anonymization apparatus 200 may be the configuration shown in FIG.
  • FIG. 17 is a flowchart showing the operation of the anonymization apparatus 200 according to this embodiment.
  • the combination determination unit 230 generates one or more combination candidates and passes the generated combination candidates to the information loss calculation unit 240 (S631).
  • the information loss calculation unit 240 calculates the reinforced processing information loss amount based on the received combination candidate, and passes the calculated reinforced processing information loss amount to the combination determination unit 230 (S632).
  • the combination determination unit 230 determines a combination candidate with the smallest information loss calculated based on the received amount of strengthening processing information loss as a combination of target records (S633).
  • the k-anonymization unit 210 converts, for the target records belonging to each anonymous group, the target quasi-identifiers included in the target records of the combination determined by the combination determination unit 230 into anonymization quasi-identifiers, thereby generating the k-anonymization data set, and outputs it to the storage unit 702 or the storage device 703 (S634).
  • the anonymity enhancement unit 120 reinforces the reinforcement target quasi-identifiers included in the received k-anonymization data set, and generates an anonymity enhancement data set in which the anonymization records are converted into enhancement records (S635).
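  • the selection performed in S631 to S633 can be sketched as follows, assuming that each combination candidate is summarized as the generalized "birth year" ranges each patient ID would receive, and that the loss of a candidate is the sum over patient IDs of the covering-range width divided by the domain width. The candidate names, the patient "Bob", all range values, and the domain are hypothetical and are not the values of FIG. 15.

```python
def total_loss(candidate, domain):
    """candidate: dict patient_id -> list of generalized (from, to) ranges.

    Sums, over patient IDs, the normalized width of the range that covers
    all of that patient's records (the loss after reinforcement).
    """
    width = domain[1] - domain[0]
    total = 0.0
    for ranges in candidate.values():
        lo = min(r[0] for r in ranges)
        hi = max(r[1] for r in ranges)
        total += (hi - lo) / width
    return total


def choose_combination(candidates, domain):
    """candidates: dict name -> candidate. Returns the name with least loss."""
    return min(candidates, key=lambda name: total_loss(candidates[name], domain))


# Hypothetical candidates for illustration only.
domain = (1981, 1990)
candidates = {
    "partition_401": {"Alice": [(1981, 1988), (1983, 1989)], "Bob": [(1981, 1990)]},
    "partition_402": {"Alice": [(1981, 1985), (1981, 1985)], "Bob": [(1986, 1990)]},
}
chosen = choose_combination(candidates, domain)
```

  • evaluating the loss per patient ID rather than per record is what lets the selection account for the further generalization that the anonymity enhancement unit 120 applies afterwards.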
  • the anonymization target data set may include a plurality of invariant quasi-identifiers.
  • in this case, for each combination candidate, the combination determination unit 230 sums the NCP* of all the invariant quasi-identifiers per attribute name and per fake ID, and uses the total as the information loss calculated based on the reinforced processing information loss amount. Note that the above summation may be performed for any of the attribute names.
  • the first effect of the present embodiment described above is that it makes it possible to generate an anonymity enhancement data set with a smaller loss of information than the anonymity enhancement data set generated by the anonymization device 100 of the first embodiment.
  • the information loss calculation unit 240 calculates the reinforced processing information loss amount corresponding to each unique identification information.
  • the combination determining unit 230 determines a combination of target records based on the amount of reinforced processing information loss.
  • the k-anonymization unit 210 converts the target quasi-identifier included in the determined target record into the same attribute value for each attribute name, thereby generating a k-anonymization data set.
  • the second effect of the present embodiment described above is that it is possible to generate an anonymity-enhanced data set with relatively small loss of information.
  • the reason is that the combination determining unit 230 determines the combination candidate with the smallest information loss calculated based on the received amount of strengthening processing information loss as the combination of the target records.
  • a further reason is that the k-anonymization unit 210 converts the target quasi-identifiers included in the target records belonging to each anonymous group into the same attribute value for each attribute name.
  • each component described in each of the above embodiments does not necessarily need to be an independent entity.
  • a plurality of components may be realized as one module.
  • each component may be realized by a plurality of modules.
  • Each component may be configured such that a certain component is a part of another component.
  • Each component may be configured such that a part of a certain component overlaps a part of another component.
  • each component and a module that realizes each component may be realized by hardware if necessary. Moreover, each component and the module which implement
  • the program is provided by being recorded on a non-volatile computer-readable recording medium such as a magnetic disk or a semiconductor memory, and is read by the computer when the computer is started up.
  • the read program causes the computer to function as a component in each of the above-described embodiments by controlling the operation of the computer.
  • a plurality of operations are not limited to being executed at different timings. For example, another operation may occur during the execution of a certain operation, or the execution timing of a certain operation and another operation may partially or entirely overlap.
  • in each of the embodiments described above, it is described that a certain operation triggers another operation, but that description does not limit all relationships between the certain operation and other operations. For this reason, when each embodiment is implemented, the relationship between the plurality of operations can be changed within a range that does not hinder the contents.
  • the specific description of each operation of each component does not limit that operation. For this reason, each specific operation of each component may be changed within a range that does not impair the functional, performance, and other characteristics in implementing each embodiment.
  • Reference signs: 100 anonymization device; 110 k-anonymization unit; 120 anonymity enhancement unit; 200 anonymization device; 210 k-anonymization unit; 230 combination determination unit; 240 information loss calculation unit; 401 partition; 402 partition; 403 partition; 404 partition; 700 computer; 701 CPU; 702 storage unit; 703 storage device; 704 input unit; 705 output unit; 706 communication unit; 707 recording medium; 820 anonymization target data set; 822 target record; 825 target record; 830 k-anonymization data set; 831 anonymization record; 832 anonymization record; 833 anonymization record; 834 anonymization record; 835 anonymization record; 836 anonymization record; 840 anonymity enhancement data set; 842 enhancement record; 845 enhancement record; 850 anonymity enhancement data set; 860 anonymization target data set; 8601 target record; 8604 target record; 8607 target record
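The combination determination described in the list above (sum the loss values of each combination candidate per attribute name and per fake ID, total them, and pick the candidate with the smallest total) can be sketched as follows. This is only an illustration under stated assumptions: the per-entry loss values stand in for NCP*, whose definition is not reproduced here, and the names `candidate_loss`, `choose_combination`, and the sample numbers are hypothetical.

```python
from collections import defaultdict


def candidate_loss(entries):
    """Total loss of one combination candidate.

    `entries` is a list of (attribute_name, fake_id, loss) tuples, where
    `loss` is a placeholder for an NCP*-style value (assumption).
    """
    per_group = defaultdict(float)
    for attribute_name, fake_id, loss in entries:
        # Sum within the same attribute name and the same fake ID.
        per_group[(attribute_name, fake_id)] += loss
    # Total the per-group sums.
    return sum(per_group.values())


def choose_combination(candidates):
    """Return the name of the candidate with the smallest total loss."""
    return min(candidates, key=lambda name: candidate_loss(candidates[name]))


# Hypothetical candidates and loss values, for illustration only.
candidates = {
    "A": [("sex", 1, 0.5), ("date_of_birth", 1, 0.25)],
    "B": [("sex", 1, 0.0), ("date_of_birth", 1, 0.5)],
}
best = choose_combination(candidates)
```

With these made-up values, candidate "B" has the smaller total and would be selected as the combination of target records.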

Abstract

This invention provides an anonymization device for generating a dataset with reinforced anonymity such that individual specificity cannot be enhanced even by drawing parallels between quasi-identifiers of a group of records subsequent to k-anonymization. The anonymization device is provided with: a k-anonymization means for generating a k-anonymized dataset by converting intrinsic identification information of an anonymization-target dataset to false identification information and, in addition, converting a quasi-identifier to satisfy a determined k-anonymity; and an anonymity reinforcement means for generating and outputting an anonymity-reinforced dataset by converting quasi-identifiers that always have the same value corresponding to the intrinsic identification information for each same piece of false identification information of the k-anonymized dataset to quasi-identifiers for which instantiation is not possible by drawing parallels between these quasi-identifiers.

Description

Information processing device for anonymization and anonymization method
 The present invention relates to an information processing device that processes and anonymizes information, an anonymization method, and a program.
 Various related techniques for anonymizing information are known.
 For example, Non-Patent Document 1 proposes k-anonymity, a well-known anonymity index. A technique that makes a data set to be anonymized satisfy a predetermined k-anonymity is called k-anonymization. The attribute information to be converted in the data set to be anonymized is called a quasi-identifier. A quasi-identifier is not unique identification information (for example, a name) that identifies an individual; it is attribute information that may allow an individual to be identified when combined with other information that is not unique identification information. In k-anonymization, the target quasi-identifiers are converted so that at least k records having the same quasi-identifiers exist in the data set to be anonymized. That is, the target quasi-identifiers are converted into information from which an individual is difficult to identify, so that k-anonymity is satisfied. Known anonymization processes for this conversion include generalization and suppression (cutting off). In generalization, the original detailed (specific) information of a quasi-identifier is converted into more abstract information.
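The k-anonymity condition described above can be checked mechanically. The following is a minimal sketch, not part of the original disclosure; the function name, field names, and sample records are hypothetical.

```python
from collections import Counter


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` occurs in at least k records."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in counts.values())


# Hypothetical, already generalized records.
records = [
    {"sex": "Any", "birth": "1981-1985", "disease": "A"},
    {"sex": "Any", "birth": "1981-1985", "disease": "B"},
    {"sex": "F", "birth": "1985-1986", "disease": "C"},
    {"sex": "F", "birth": "1985-1986", "disease": "A"},
]
is_2_anonymous = satisfies_k_anonymity(records, ["sex", "birth"], 2)
is_3_anonymous = satisfies_k_anonymity(records, ["sex", "birth"], 3)
```

Here each quasi-identifier combination appears exactly twice, so the sample satisfies k-anonymity for k = 2 but not for k = 3.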
 For example, Patent Document 1 discloses a privacy protection device including data processing means that processes data until k-anonymity as described above is satisfied.
JP 2011-180839 A
 However, the techniques described in the above prior art documents have the following problem: when the quasi-identifiers of the records after k-anonymization are compared with one another, the individual specificity, which indicates the degree to which an individual can be identified, may increase.
 The reason is that neither the k-anonymity described in Non-Patent Document 1 nor the privacy protection device of Patent Document 1 considers the problem that arises when the data set to be anonymized includes a plurality of records having the same unique identification information.
 Specifically, privacy protection by k-anonymity may fail as follows. First, the data set to be anonymized includes a plurality of records having the same unique identification information. Second, that data set is k-anonymized while a specific connection relationship between the records having the same unique identification information is preserved. Third, the quasi-identifiers of the k-anonymized records that can be associated through that connection relationship are compared with one another. Fourth, through that comparison, the quasi-identifiers abstracted by the k-anonymization are made specific again.
 A data set including a plurality of records having the same unique identification information, as described above, is stored in a predetermined recording medium. Such a data set includes history information accumulated by service providers, such as purchase information and medical information. Purchase information and medical information are generally stored in a recording medium as a set of multiple records per individual (user). For example, purchase information associated with a credit card number is generated each time a user makes a purchase with the same credit card, and is accumulated in the recording medium as records associated with that user. Similarly, medical information is generated each time a person receives medical treatment using the same health insurance card, and the medical information associated with the same insured person is accumulated in the recording medium.
 An example in which privacy protection by k-anonymity fails as described above is explained below using specific data.
 FIG. 2 is a diagram illustrating an example of a data set to be anonymized. FIG. 3 is a diagram illustrating an example of a data set obtained by k-anonymizing the data set to be anonymized shown in FIG. 2.
 The data set shown in FIG. 3 includes the attributes “sex”, “date of birth”, and “date of medical treatment” as quasi-identifiers. These quasi-identifiers are the result of applying k-anonymization with k = 2 to the data set to be anonymized shown in FIG. 2. In addition, the name (unique identification information) present in each record of the data set shown in FIG. 2 has been converted into a fake ID (IDentifier) in the data set shown in FIG. 3. A fake ID is identification information local to the data set shown in FIG. 3: it indicates only the relationships between the records of that data set and does not identify a specific individual.
 The data set shown in FIG. 3 is designed so that no combination of knowledge about an individual's sex, date of birth, and date of medical treatment can narrow that individual's records down to fewer than k. The quasi-identifiers of each record of the data set shown in FIG. 2 have been processed so that k-anonymity with k = 2 is maintained. That is, the data set shown in FIG. 3 has been k-anonymized with k = 2 so that, whatever knowledge about sex, date of birth, and date of medical treatment is used, at least two records can be associated with a given individual.
 The data set shown in FIG. 3 differs from the data sets handled in the background art in the following respect: it is the anonymized form of a data set to be anonymized (FIG. 2) in which a plurality of records are stored for one individual. Specifically, the records of each individual in the data set shown in FIG. 2 share a specific connection relationship, namely a common value of the name attribute (unique identification information). The difference is that a plurality of records having the same unique identification information are stored in the recording medium, as the data set shown in FIG. 3, under anonymized fake IDs. In other words, as described above, the k-anonymization of the related art does not consider that a plurality of records for one individual may appear at the same time.
 The data set shown in FIG. 3 does not truly satisfy k-anonymity with k = 2 for each record. The reason is as follows.
 The target record 822 shown in FIG. 2, which has the name “Alice” and the values “sex: female, date of birth: February 2, 1985, date of medical treatment: April 2010”, has been processed into the anonymization record 832 shown in FIG. 3, which has the fake ID “2” and the values “sex: Any, date of birth: 1981 to 1985, date of medical treatment: April 2010”. Likewise, the target record 825 shown in FIG. 2, which has the name “Alice” and the values “sex: female, date of birth: February 2, 1985, date of medical treatment: May 2010”, has been processed into the anonymization record 835 shown in FIG. 3, which has the fake ID “2” and the values “sex: female, date of birth: 1985 to 1986, date of medical treatment: May 2010”.
 Suppose that a certain person x knows “sex: female, date of birth: February 2, 1985” as information about Alice. Even in that case, each of the anonymization record 832 and the anonymization record 835 with the fake ID “2”, taken individually, has 2-anonymity with respect to “sex: female, date of birth: February 2, 1985”.
 However, attributes such as sex and date of birth have attribute values that never change for a given individual. It can therefore be easily inferred that anonymization records having the same fake ID had the same values for such invariant attributes in the target records before anonymization. On the basis of this knowledge, person x can join anonymization records by fake ID. In this case, from the intersection of the anonymization record 832 (sex: Any, date of birth: 1981 to 1985) and the anonymization record 835 (sex: female, date of birth: 1985 to 1986), both with the fake ID “2”, person x can obtain “sex: female, date of birth: 1985” as more specific information about the anonymization records with the fake ID “2”. As a result, the 2-anonymity of the fake ID “2” is broken.
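The intersection performed by person x can be reproduced in a few lines. The sketch below is illustrative only: it encodes the anonymization records 832 and 835 with hypothetical field names, represents the birth-year ranges as inclusive (start, end) tuples, and treats “Any” as a wildcard.

```python
def intersect_sex(a, b):
    """'Any' matches everything; otherwise the two values must agree."""
    if a == "Any":
        return b
    if b == "Any" or a == b:
        return a
    return None  # contradictory values


def intersect_years(a, b):
    """Intersect two inclusive (start, end) year ranges."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


# Anonymization records 832 and 835 share the fake ID 2.
record_832 = {"fake_id": 2, "sex": "Any", "birth_years": (1981, 1985)}
record_835 = {"fake_id": 2, "sex": "F", "birth_years": (1985, 1986)}

# Join on the fake ID, then intersect the invariant attributes.
assert record_832["fake_id"] == record_835["fake_id"]
refined = {
    "sex": intersect_sex(record_832["sex"], record_835["sex"]),
    "birth_years": intersect_years(
        record_832["birth_years"], record_835["birth_years"]
    ),
}
print(refined)  # {'sex': 'F', 'birth_years': (1985, 1985)}
```

Neither record alone reveals more than its generalized values, but the joined result is narrower than either of them, which is exactly the failure described above.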
 An object of the present invention is to provide an information processing device for anonymization, an anonymization method, and a program that can solve the above-described problem.
 The information processing device for anonymization according to the present invention includes: a k-anonymization unit that, for an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converts each piece of the unique identification information into false identification information that contains no information for restoring the unique identification information and is uniquely assigned to that piece of unique identification information, converts each of the target quasi-identifiers into an anonymization quasi-identifier such that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymization records, and generates a k-anonymization data set including the anonymization records; and an anonymity enhancement unit that, for the k-anonymization data set, converts the anonymization records having the same false identification information into enhancement records, and generates and outputs an anonymity enhancement data set including the enhancement records. The anonymity enhancement unit converts the anonymization records into the enhancement records by converting each enhancement target quasi-identifier, which is an anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of the unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be made more specific by comparing the enhancement target quasi-identifiers with one another.
 The anonymization method of the present invention includes: for an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that contains no information for restoring the unique identification information and is uniquely assigned to that piece of unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier such that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymization records, and generating a k-anonymization data set including the anonymization records; and, for the k-anonymization data set, converting the anonymization records having the same false identification information into enhancement records, and generating and outputting an anonymity enhancement data set including the enhancement records. The anonymization records are converted into the enhancement records by converting each enhancement target quasi-identifier, which is an anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of the unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be made more specific by comparing the enhancement target quasi-identifiers with one another.
 The non-volatile recording medium of the present invention records a program that causes a computer to execute: processing of, for an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that contains no information for restoring the unique identification information and is uniquely assigned to that piece of unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier such that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymization records, and generating a k-anonymization data set including the anonymization records; and processing of, for the k-anonymization data set, converting the anonymization records having the same false identification information into enhancement records, and generating and outputting an anonymity enhancement data set including the enhancement records. The processing of converting the anonymization records into the enhancement records is processing of converting each enhancement target quasi-identifier, which is an anonymization quasi-identifier corresponding to a target quasi-identifier that always has the same attribute value for each piece of the unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifier cannot be made more specific by comparing the enhancement target quasi-identifiers with one another.
 The present invention has the effect of making it possible to generate a data set with enhanced anonymity such that individual specificity cannot be increased even by comparing the quasi-identifiers of the records after k-anonymization.
 FIG. 1 is a block diagram showing the configuration of the anonymization device according to the first embodiment.
 FIG. 2 is a diagram illustrating an example of the anonymization target data set in the first embodiment.
 FIG. 3 is a diagram illustrating an example of the k-anonymization data set in the first embodiment.
 FIG. 4 is a diagram illustrating an example of the anonymity enhancement data set in the first embodiment.
 FIG. 5 is a block diagram illustrating the hardware configuration of a computer that implements the anonymization device according to the first embodiment.
 FIG. 6 is a flowchart showing the operation of the anonymization device in the first embodiment.
 FIG. 7 is a flowchart showing the operation of the anonymity enhancement unit in the first embodiment.
 FIG. 8 is a flowchart showing the operation of the anonymity enhancement unit in a modification of the first embodiment.
 FIG. 9 is a diagram illustrating an example of the anonymity enhancement data set in the first embodiment.
 FIG. 10 is a block diagram illustrating the configuration of the anonymization device according to the second embodiment.
 FIG. 11 is a diagram illustrating an example of the anonymization target data set in the second embodiment.
 FIG. 12 is a diagram illustrating how the target records of the anonymization target data set are distributed into groups in the second embodiment.
 FIG. 13 shows an example of the anonymization quasi-identifiers obtained when the anonymization target data set is k-anonymized for a certain combination of target records in the second embodiment.
 FIG. 14 shows an example of the anonymization quasi-identifiers obtained when the anonymization target data set is k-anonymized for a certain combination of target records in the second embodiment.
 FIG. 15 shows an example of the information loss amounts corresponding to combinations of target records in the second embodiment.
 FIG. 16 is a diagram illustrating an example of the k-anonymization data set in the second embodiment.
 FIG. 17 is a flowchart showing the operation of the anonymization device in the second embodiment.
 Embodiments for carrying out the present invention will be described in detail with reference to the drawings. In the following embodiments and drawings, general techniques are adopted for configurations not related to the essence of the present invention, and their detailed description and illustration are omitted. In the following embodiments and drawings, components having the same function are given the same reference signs.
 <First Embodiment>
 FIG. 1 is a block diagram showing the configuration of an anonymization device (generally also called an information processing device) 100 according to the first embodiment of the present invention.
 Referring to FIG. 1, the anonymization device 100 according to the present embodiment includes a k-anonymization unit 110 and an anonymity enhancement unit 120.
 The components shown in FIG. 1 may be hardware-level components or components divided by computer function. Here, they are described as components divided by computer function.
 === k-anonymization unit 110 ===
 The k-anonymization unit 110 converts an anonymization target data set stored in a storage device (not shown) into a k-anonymization data set that satisfies k-anonymity for a predetermined k (for example, 2). This conversion is a process of anonymizing data and is also called “processing”, but the term “conversion” is used consistently here.
 That is, the k-anonymization unit 110 generates, for the anonymization target data set, a k-anonymization data set in which the records included in the anonymization target data set (hereinafter referred to as target records) have been converted into anonymization records.
 The k-anonymization unit 110 converts the target records into anonymization records as follows. First, the k-anonymization unit 110 converts the unique identification information included in each target record into false identification information that is uniquely assigned to that unique identification information. Here, false identification information is identification information that contains no information for restoring the unique identification information.
 Second, in generating the k-anonymization data set, the k-anonymization unit 110 converts the quasi-identifiers included in each target record (also called target quasi-identifiers) into anonymization quasi-identifiers, each of which is uniquely assigned to at least k quasi-identifiers of the same attribute. Here, an anonymization quasi-identifier is a quasi-identifier determined so that the anonymization target data set containing it satisfies the predetermined k-anonymity.
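The first of the two conversions above, replacing unique identification information with false identification information that carries no restoration information, can be sketched as follows. This is a hedged illustration rather than the method of the embodiment: the function name `assign_fake_ids`, the field names, and the choice of shuffled sequential integers are assumptions. The point of the sketch is that the fake ID is not derived from the name (for example, it is not a hash of it) and the name-to-ID mapping is discarded after use.

```python
import random


def assign_fake_ids(records, id_field="name", rng=None):
    """Replace `id_field` with a fake ID that is the same for all records
    of one individual and is not computed from the original value."""
    rng = rng or random.Random()
    unique_ids = sorted({record[id_field] for record in records})
    rng.shuffle(unique_ids)  # order of IDs reveals nothing about the names
    mapping = {uid: i + 1 for i, uid in enumerate(unique_ids)}
    converted = []
    for record in records:
        new_record = {k: v for k, v in record.items() if k != id_field}
        new_record["fake_id"] = mapping[record[id_field]]
        converted.append(new_record)
    return converted  # `mapping` is not returned, so it cannot be reused


# Hypothetical target records; Alice appears twice.
targets = [
    {"name": "Alice", "care_month": "2010-04"},
    {"name": "Bob", "care_month": "2010-04"},
    {"name": "Alice", "care_month": "2010-05"},
]
anonymized = assign_fake_ids(targets)
```

Both of Alice's records receive the same fake ID, which preserves the connection relationship between them while removing the name itself.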
 === Anonymization target data set 820 ===
 FIG. 2 is a diagram illustrating an example of the anonymization target data set 820. The anonymization target data set 820 shown in FIG. 2 is stored in a storage device (not shown). This storage device may be included in the k-anonymization unit 110 or may be an external storage medium connected to the k-anonymization unit 110. The anonymization target data set 820 includes a plurality of target records (one of which is, for example, the target record 822), each consisting of the attributes name (unique identification information), sex, date of birth, date of medical treatment, and name of injury or illness. Here, an attribute consists of an attribute name (the element name of the attribute) and the value of that attribute (attribute value). For example, the first attribute of the target record 822 has the element name “name” and the attribute value “Alice”.
 Here, the name is a kind of unique identification information, that is, information that identifies an individual.
 The attributes sex, date of birth, and date of medical treatment are each quasi-identifiers (target quasi-identifiers). A quasi-identifier that always has the same attribute value for each piece of unique identification information is called an invariant quasi-identifier. A quasi-identifier that may take different attribute values for the same piece of unique identification information is called a variable quasi-identifier. For example, the attributes “sex” and “date of birth” are invariant quasi-identifiers, and the attribute “date of medical treatment” is a variable quasi-identifier.
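The distinction between invariant and variable quasi-identifiers can be illustrated with a small data-driven check. Note the assumption: in the text, invariance is a property of the attribute itself (“always” the same value), whereas the sketch below only observes the given records, so an attribute could appear invariant by coincidence. The function and field names are hypothetical.

```python
from collections import defaultdict


def classify_quasi_identifiers(records, id_field, quasi_identifiers):
    """Label each quasi-identifier 'invariant' if every individual has a
    single value for it in `records`, and 'variable' otherwise."""
    seen = defaultdict(lambda: defaultdict(set))
    for record in records:
        for q in quasi_identifiers:
            seen[q][record[id_field]].add(record[q])
    return {
        q: "invariant"
        if all(len(values) == 1 for values in seen[q].values())
        else "variable"
        for q in quasi_identifiers
    }


# Hypothetical target records modeled on the Alice example.
targets = [
    {"name": "Alice", "sex": "F", "birth_date": "1985-02-02", "care_month": "2010-04"},
    {"name": "Alice", "sex": "F", "birth_date": "1985-02-02", "care_month": "2010-05"},
    {"name": "Bob", "sex": "M", "birth_date": "1981-07-10", "care_month": "2010-04"},
]
labels = classify_quasi_identifiers(
    targets, "name", ["sex", "birth_date", "care_month"]
)
```

For this sample, sex and date of birth come out as invariant and the date of medical treatment as variable, matching the classification given in the text.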
 === k-anonymization data set 830 ===
 FIG. 3 is a diagram illustrating an example of the k-anonymization data set 830. The k-anonymization data set 830 shown in FIG. 3 is stored in a storage device (not shown). This storage device may be included in the k-anonymization unit 110 or may be an external storage medium connected to the k-anonymization unit 110. The k-anonymization data set 830 shown in FIG. 3 is an example of a k-anonymization data set obtained by the k-anonymization unit 110 applying k-anonymization with k = 2 to the anonymization target data set 820 shown in FIG. 2. That is, the anonymization quasi-identifiers for sex, date of birth, and date of medical treatment in the k-anonymization data set 830 are the corresponding target quasi-identifiers of the anonymization target data set 820, converted into attribute values that satisfy k-anonymity with k = 2.
 図3に示すように、k-匿名化データセット830は、偽ID(偽識別情報)、性別、生年月日、診療年月及び傷病名の各属性からなる複数の匿名化レコード(例えば、匿名化レコード832)を含む。それらのk-匿名化データセット830の偽IDのそれぞれは、匿名化対象データセット820に含まれる名前のそれぞれに、1対1に対応する。その偽IDは、図3に示すk-匿名化データセット830の各匿名化レコード間の関係のみを示し、具体的な個人を特定しない、k-匿名化データセット830の局所的な識別情報である。 As shown in FIG. 3, the k-anonymization data set 830 includes a plurality of anonymization records (for example, anonymization records) including attributes of fake ID (fake identification information), gender, date of birth, date of medical treatment, and name of sickness. Record 832). Each of the false IDs of the k-anonymization data set 830 corresponds to each of the names included in the anonymization target data set 820 on a one-to-one basis. The fake ID indicates only the relationship between each anonymization record of the k-anonymization data set 830 shown in FIG. 3, and does not specify a specific individual, and is local identification information of the k-anonymization data set 830. is there.
Further, sex, date of birth, and year and month of medical treatment, which are attributes included in the k-anonymized data set 830, are quasi-identifiers (anonymized quasi-identifiers), as in the case of the anonymization target data set 820 described above.
Note that the target records of the anonymization target data set 820 shown in FIG. 2 and the anonymized records of the k-anonymized data set 830 shown in FIG. 3 correspond in order (for example, target record 822 corresponds to anonymized record 832, and target record 825 corresponds to anonymized record 835).
=== Anonymity enhancement unit 120 ===
The anonymity enhancement unit 120 generates and outputs an anonymity-enhanced data set for a data set that has been k-anonymized by the k-anonymization unit 110 (for example, the k-anonymized data set 830).
Specifically, the anonymity enhancement unit 120 executes processing that strengthens anonymity for the quasi-identifiers to be strengthened that are included in anonymized records having the same fake ID in the k-anonymized data set. Here, a quasi-identifier to be strengthened (strengthening target quasi-identifier) is an anonymized quasi-identifier in the k-anonymized data set that is an invariant quasi-identifier. The processing converts the strengthening target quasi-identifiers into data with enhanced anonymity, such that comparing them cannot make them more specific (that is, cannot turn them into attribute values from which an individual can be identified or specified). Hereinafter, this conversion of strengthening target quasi-identifiers into data with enhanced anonymity is called strengthening processing.
That is, the anonymity enhancement unit 120 applies strengthening processing to the strengthening target quasi-identifiers so as to prevent k-anonymity from breaking down when the quasi-identifiers included in a plurality of anonymized records 831 of the same user (the same fake ID) are compared.
For example, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers into the same attribute value for each attribute name. This same attribute value is, for example, the attribute value that covers all strengthening target quasi-identifiers of that attribute name having the same fake ID and that indicates the narrowest such range. Alternatively, the same attribute value may be an attribute value indicating any range that covers all strengthening target quasi-identifiers of that attribute name having the same fake ID. Hereinafter, "all strengthening target quasi-identifiers having the same fake ID, for each attribute name" is abbreviated as "same-fake-ID strengthening target quasi-identifiers".
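The narrowest covering attribute value described above can be sketched as follows. The function names, the inclusive year-range encoding of date of birth, and the use of "Any" as the only generalization of sex are assumptions made for illustration.

```python
from typing import Iterable, Tuple

YearRange = Tuple[int, int]  # inclusive (first_year, last_year), an assumed encoding


def cover_sex(values: Iterable[str]) -> str:
    """Return the shared value if all values agree, otherwise the generalized 'Any'."""
    distinct = set(values)
    return distinct.pop() if len(distinct) == 1 else "Any"


def cover_years(ranges: Iterable[YearRange]) -> YearRange:
    """Return the narrowest inclusive year range that contains every input range."""
    ranges = list(ranges)
    return (min(lo for lo, _ in ranges), max(hi for _, hi in ranges))


sex = cover_sex(["Any", "female"])
birth = cover_years([(1981, 1985), (1985, 1986)])
```

With these inputs, `sex` is `"Any"` and `birth` is `(1981, 1986)`. A wider range such as `(1980, 1990)` would also be a valid covering value under the alternative mentioned above, at the cost of more information loss.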
=== Anonymity-enhanced data set 840 ===
FIG. 4 is a diagram illustrating an example of the anonymity-enhanced data set 840. The anonymity-enhanced data set 840 is information that is output from the anonymity enhancement unit 120 and stored in a storage device (not shown). As shown in FIG. 4, the anonymity-enhanced data set 840 includes a plurality of strengthened records (for example, strengthened record 842), each consisting of the attributes fake ID, sex, date of birth, year and month of medical treatment, and name of injury or illness. A strengthened record is a record in which sex and date of birth, the strengthening target quasi-identifiers of the k-anonymized data set 830 shown in FIG. 3, have undergone strengthening processing by the anonymity enhancement unit 120.
Note that the target records of the anonymization target data set 820 shown in FIG. 2 and the anonymized records of the k-anonymized data set 830 shown in FIG. 3 correspond in order to the strengthened records of the anonymity-enhanced data set 840 shown in FIG. 4. For example, target record 822 and anonymized record 832 correspond to strengthened record 842, and target record 825 and anonymized record 835 correspond to strengthened record 845.
This completes the description of the functional components of the anonymization device 100.
Next, the hardware components of the anonymization device 100 will be described.
In the present embodiment, the anonymization device 100 can be realized by an information processing device such as a computer. Each component (functional block) of the anonymization device 100, and of the anonymization devices in the other embodiments described later, is realized by hardware resources of the information processing device. The information processing device may include a CPU (Central Processing Unit) that executes a computer program (software program; hereinafter sometimes simply called a "program") stored in a recording medium.
For example, the anonymization device 100 includes hardware such as the CPU, main storage device, and auxiliary storage device of a computer, and is realized by the CPU working in cooperation with this hardware on the basis of a program loaded from a storage device or the like into the main storage device. However, the functions realized by the CPU are not limited to the block configuration shown in FIG. 1 (the k-anonymization unit 110 and the anonymity enhancement unit 120), and various implementations that a person skilled in the art could adopt are applicable (the same applies to the embodiments below).
Note that the anonymization device 100 and the anonymization devices according to the embodiments described later may be realized by dedicated devices.
FIG. 5 is a diagram illustrating the hardware configuration of a computer 700 that realizes the anonymization device 100 according to the present embodiment.
As shown in FIG. 5, the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. The computer 700 further includes a recording medium (or storage medium) 707 supplied from the outside. The recording medium 707 may be a non-volatile recording medium that stores information non-transitorily.
The CPU 701 runs an operating system (not shown) and controls the overall operation of the computer 700. The CPU 701 also reads programs and data from, for example, the recording medium 707 mounted in the storage device 703, and writes the programs and data it has read into the storage unit 702. Here, the program is, for example, a program that causes the computer 700 to execute the operation of the flowchart shown in FIG. 7, described later.
The CPU 701 then executes various kinds of processing as the k-anonymization unit 110 and the anonymity enhancement unit 120 shown in FIG. 1, in accordance with the program it has read and on the basis of the data it has read.
Note that the CPU 701 may download programs and data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).
The storage unit 702 stores programs and data. The storage unit 702 may store the anonymization target data set, the k-anonymized data set, and the anonymity-enhanced data set.
The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes the recording medium 707. The storage device 703 records programs in a computer-readable manner. The storage device 703 may also record data in a computer-readable manner. The storage device 703 may store the anonymization target data set, the k-anonymized data set, and the anonymity-enhanced data set.
The input unit 704 is realized by, for example, a mouse, a keyboard, or built-in key buttons, and is used for input operations. The input unit 704 is not limited to a mouse, a keyboard, or built-in key buttons, and may be, for example, a touch panel, an accelerometer, a gyro sensor, or a camera.
The output unit 705 is realized by, for example, a display, and is used for checking output.
The communication unit 706 realizes an interface with the outside (for example, a data server that stores the anonymization target data set). The communication unit 706 is included, for example, as part of the k-anonymization unit 110.
As described above, the functional blocks of the anonymization device 100 shown in FIG. 1 are realized by the computer 700 having the hardware configuration shown in FIG. 5. However, the means for realizing each unit of the computer 700 is not limited to the above. That is, the computer 700 may be realized by a single physically integrated device, or by two or more physically separate devices connected by wire or wirelessly.
Note that the recording medium 707 on which the code of the above program is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored on the recording medium 707. Alternatively, the CPU 701 may store the program code stored on the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of the recording medium 707 that stores, transitorily or non-transitorily, the program (software) executed by the computer 700 (CPU 701).
This completes the description of the hardware components of the computer 700 that realizes the anonymization device 100 according to the present embodiment.
Next, the operation of the present embodiment will be described in detail with reference to FIGS. 1 to 9.
First, the operation of the anonymization device 100 will be described with reference to the flowchart shown in FIG. 6. FIG. 6 is a flowchart showing the operation of the anonymization device 100 of the present embodiment. The processing in this flowchart may be executed under the program control of the CPU described above. Processing steps are denoted by symbols such as S601.
The k-anonymization unit 110 acquires the anonymization target data set 820 (S601). For example, the k-anonymization unit 110 reads the anonymization target data set held in the storage unit 702 or the storage device 703 shown in FIG. 5. Alternatively, the k-anonymization unit 110 may receive the anonymization target data set from the outside (not shown) via the communication unit 706, or may accept an anonymization target data set entered via the input unit 704.
Next, the k-anonymization unit 110 k-anonymizes the anonymization target data set 820 to generate the k-anonymized data set 830, and outputs it to the storage unit 702 or the storage device 703 (S602). The k-anonymization unit 110 converts the target quasi-identifiers of each target record of the anonymization target data set 820 into anonymized quasi-identifiers such that, in the k-anonymized data set 830, anonymized records having at least k different fake IDs share the same combination of anonymized quasi-identifiers. One method of converting the target quasi-identifiers of each target record into anonymized quasi-identifiers is, for example, abstraction by generalizing the target quasi-identifiers. The method of k-anonymizing the target quasi-identifiers of each target record is not limited to any particular method, and various methods, such as perturbation, may be used.
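The condition stated for S602 (every combination of anonymized quasi-identifiers is shared by records with at least k different fake IDs) can be checked with a short sketch like the one below. The field names and the sample values for year and month of medical treatment are hypothetical; only the sex and date-of-birth values are taken from the worked example in this description.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]
QUASI_IDENTIFIERS = ("sex", "birth", "treated")  # hypothetical field names


def is_k_anonymous(records: List[Record], k: int) -> bool:
    """True if each quasi-identifier combination covers at least k distinct fake IDs."""
    ids_per_combo: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for rec in records:
        combo = tuple(rec[q] for q in QUASI_IDENTIFIERS)
        # Count distinct fake IDs, not records: one person may have several records.
        ids_per_combo[combo].add(rec["fake_id"])
    return all(len(ids) >= k for ids in ids_per_combo.values())


records = [
    {"fake_id": "1", "sex": "Any", "birth": "1981-1985", "treated": "2010/4"},
    {"fake_id": "2", "sex": "Any", "birth": "1981-1985", "treated": "2010/4"},
    {"fake_id": "2", "sex": "female", "birth": "1985-1986", "treated": "2010/5"},
    {"fake_id": "4", "sex": "female", "birth": "1985-1986", "treated": "2010/5"},
]
ok_for_2 = is_k_anonymous(records, 2)
ok_for_3 = is_k_anonymous(records, 3)
```

In this sample, `ok_for_2` is `True` and `ok_for_3` is `False`. The check looks at each record in isolation, which is exactly why it cannot detect the breakdown that arises when two records with the same fake ID are compared.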
Next, the anonymity enhancement unit 120 applies strengthening processing to the strengthening target quasi-identifiers included in the k-anonymized data set 830, and generates the anonymity-enhanced data set 840 in which the anonymized records have been converted into strengthened records (S603).
For example, the anonymity enhancement unit 120 strengthens the same-fake-ID strengthening target quasi-identifiers so that they become identical for each attribute name. In this way, the anonymity enhancement unit 120 converts all anonymized records having the same fake ID into strengthened records. For this conversion, any combination of the various kinds of processing generally used in k-anonymization, such as generalization and perturbation, can be used.
If, in all strengthened records having the same fake ID, the invariant quasi-identifiers (strengthened identifiers) are identical for each attribute name, then comparing the invariant quasi-identifiers (strengthened identifiers) of those strengthened records does not make them any more specific. Therefore, even when such a comparison is made, the desired k-anonymity does not break down.
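A small sketch can illustrate why identical values resist comparison. It intersects the date-of-birth ranges that the worked example later in this description gives for the two records with fake ID 2; the inclusive year-range encoding and the function name are assumptions.

```python
from typing import Optional, Tuple

YearRange = Tuple[int, int]  # inclusive (first_year, last_year), an assumed encoding


def intersect(a: YearRange, b: YearRange) -> Optional[YearRange]:
    """Return the years consistent with both ranges, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


# Before strengthening: the two records with fake ID 2 carry different ranges.
before = intersect((1981, 1985), (1985, 1986))
# After strengthening: both records carry the same covering range.
after = intersect((1981, 1986), (1981, 1986))
```

Here `before` is `(1985, 1985)`, so an observer who links the two records narrows the birth year to a single year, while `after` remains `(1981, 1986)` and reveals nothing beyond what each record already states.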
Next, the anonymity enhancement unit 120 outputs the generated anonymity-enhanced data set 840 (S604). For example, the anonymity enhancement unit 120 outputs the anonymity-enhanced data set to the outside (not shown) via the communication unit 706. Alternatively, the anonymity enhancement unit 120 may store the anonymity-enhanced data set in the storage unit 702 or the storage device 703 shown in FIG. 5, or may output it to the output unit 705 shown in FIG. 5 and cause it to be shown on the display.
This completes the description of the operation of the anonymization device 100 according to the flowchart shown in FIG. 6.
Next, S603 of the flowchart shown in FIG. 6 will be described.
FIG. 7 is a flowchart showing the operation in which the anonymity enhancement unit 120 generates the anonymity-enhanced data set (S603 shown in FIG. 6).
The anonymity enhancement unit 120 performs the processing of S611 to S614 on each group of anonymized records having the same fake ID. For example, for the k-anonymized data set 830 shown in FIG. 3, the anonymity enhancement unit 120 performs the processing of S611 to S614 on the anonymized records 832 and 835 with fake ID 2, and on the anonymized records 834 and 836 with fake ID 4.
The anonymity enhancement unit 120 selects a fake ID to be processed (S611). Here, when a plurality of anonymized records have the same fake ID, that fake ID is a fake ID to be processed. If there is no fake ID to be processed (YES in S612), the processing ends.
Next, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of all anonymized records having the selected fake ID into the same attribute value for each attribute name (S613).
Next, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized records to be processed into, for each attribute name, the same attribute value as a specific strengthening target quasi-identifier (S614). Here, an anonymized record to be processed is an anonymized record that belongs to the same strengthening target quasi-identifier group (described later) as an anonymized record whose strengthening target quasi-identifiers were strengthened in S613. The specific strengthening target quasi-identifier is the strengthening target quasi-identifier of that anonymized record strengthened in S613.
If there is no anonymized record to be processed at the start of S614 ("none" in S614), the processing returns to S611.
Here, strengthening target quasi-identifier groups will be described with reference to the k-anonymized data set 830 shown in FIG. 3.
For example, the anonymized records that have the same strengthening target quasi-identifiers as either of the anonymized records 832 and 835 with fake ID 2 are the anonymized record 831 with fake ID 1 (sex: Any, date of birth: 1981 to 1985) and the anonymized record 836 with fake ID 4 (sex: female, date of birth: 1985 to 1986). A plurality of anonymized records that have the same strengthening target quasi-identifiers in this way belong to the same strengthening target quasi-identifier group.
That is, the anonymized record 832 with fake ID 2 and the anonymized record 831 with fake ID 1 belong to the same strengthening target quasi-identifier group. Likewise, the anonymized record 835 with fake ID 2 and the anonymized record 836 with fake ID 4 belong to the same strengthening target quasi-identifier group. Accordingly, the anonymized records that belong to the same strengthening target quasi-identifier group as an anonymized record with fake ID 2 are the anonymized record 831 with fake ID 1 and the anonymized record 836 with fake ID 4.
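Grouping records into strengthening target quasi-identifier groups can be sketched as below. The sex and date-of-birth values are reconstructed from the worked example in this description rather than read from FIG. 3 itself, so the table is an assumption; the function name and the tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# record number -> (fake ID, sex, date-of-birth range); values reconstructed
# from the worked example in the text, not copied from FIG. 3.
RECORDS: Dict[int, Tuple[str, str, str]] = {
    831: ("1", "Any", "1981-1985"),
    832: ("2", "Any", "1981-1985"),
    833: ("3", "female", "1986-1990"),
    834: ("4", "female", "1986-1990"),
    835: ("2", "female", "1985-1986"),
    836: ("4", "female", "1985-1986"),
}


def quasi_identifier_groups(
    records: Dict[int, Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[int]]:
    """Group record numbers by their (sex, date of birth) values."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for number, (_fake_id, sex, birth) in records.items():
        groups[(sex, birth)].append(number)
    return dict(groups)


groups = quasi_identifier_groups(RECORDS)
```

The result places records 831 and 832 in one group, 833 and 834 in another, and 835 and 836 in a third, which matches the memberships stated above. Year and month of medical treatment is left out because only sex and date of birth are strengthening targets in this example.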
Next, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized records to be processed into, for each attribute name, the same attribute value as the specific strengthening target quasi-identifier strengthened in S614 (S615). Here, an anonymized record to be processed is an anonymized record that has the same fake ID as an anonymized record whose strengthening target quasi-identifiers were strengthened again in S614. The specific strengthening target quasi-identifier is the strengthening target quasi-identifier strengthened in S614.
The processing then returns to S614.
If, at the start of S615, there is no anonymized record that has the same fake ID as an anonymized record whose strengthening target quasi-identifiers were strengthened again in S614 ("none" in S615), the processing returns to S611.
In this way, the anonymity enhancement unit 120 applies strengthening processing to the strengthening target quasi-identifiers so that anonymized records having the same fake ID are given strengthening target quasi-identifiers (invariant quasi-identifiers) with the same attribute values. The anonymity enhancement unit 120 also applies strengthening processing to the strengthening target quasi-identifiers of anonymized records that belong to the same strengthening target quasi-identifier group as a strengthened record. The anonymity enhancement unit 120 performs this strengthening processing recursively.
That is, when the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of one anonymized record, it strengthens the strengthening target quasi-identifiers of the anonymized records having the same fake ID in the same way. It then strengthens the strengthening target quasi-identifiers of the anonymized records that belong to the same strengthening target quasi-identifier group as a record it has strengthened. It then moves on to the anonymized records that have the same fake ID as a record strengthened again in this way, and repeats the process recursively.
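The end state of this recursive propagation can be sketched with a union-find structure. Note that this is a substitute technique, not the step-by-step S611 to S615 procedure: it links records that share a fake ID or a quasi-identifier group and then gives every linked set the narrowest covering value. The record table is reconstructed from the worked example in this description, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# (fake ID, sex, inclusive birth-year range); reconstructed from the worked example.
Record = Tuple[str, str, Tuple[int, int]]

RECORDS: Dict[int, Record] = {
    831: ("1", "Any", (1981, 1985)),
    832: ("2", "Any", (1981, 1985)),
    833: ("3", "female", (1986, 1990)),
    834: ("4", "female", (1986, 1990)),
    835: ("2", "female", (1985, 1986)),
    836: ("4", "female", (1985, 1986)),
}


def strengthen(records: Dict[int, Record]) -> Dict[int, Record]:
    """Give every set of linked records the narrowest covering sex and birth range."""
    parent = {n: n for n in records}

    def find(n: int) -> int:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Link records that share a fake ID or share identical (sex, birth) values.
    first_seen: Dict[tuple, int] = {}
    for n, (fake_id, sex, years) in records.items():
        for key in (("id", fake_id), ("qi", sex, years)):
            if key in first_seen:
                parent[find(n)] = find(first_seen[key])
            else:
                first_seen[key] = n

    members: Dict[int, List[int]] = {}
    for n in records:
        members.setdefault(find(n), []).append(n)

    result: Dict[int, Record] = {}
    for group in members.values():
        sexes = {records[n][1] for n in group}
        sex = sexes.pop() if len(sexes) == 1 else "Any"
        lo = min(records[n][2][0] for n in group)
        hi = max(records[n][2][1] for n in group)
        for n in group:
            result[n] = (records[n][0], sex, (lo, hi))
    return result


strengthened = strengthen(RECORDS)
```

For this table all six records end up linked, so each one receives sex "Any" and the range (1981, 1990), which is the value the worked example below assigns to records 832 to 836. A record whose fake ID is unique and whose group contains no strengthened record keeps its original values, because its linked set already shares a single value.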
Next, the operation in which the anonymity enhancement unit 120 converts the k-anonymized data set 830 into the anonymity-enhanced data set 840 will be described using concrete values.
First, the anonymity enhancement unit 120 selects fake ID 2 as the fake ID to be processed (S611).
Next, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized records 832 and 835, which have the selected fake ID 2, by converting them into the same attribute value for each attribute name (S613).
Here, the same attribute value for the strengthening target quasi-identifier whose attribute name is "sex" is "Any", which covers "Any" and "female". The same attribute value for the strengthening target quasi-identifier whose attribute name is "date of birth" is "1981 to 1986", which covers "1981 to 1985" and "1985 to 1986".
In this way, the anonymity enhancement unit 120 generalizes the strengthening target quasi-identifiers of the anonymized records 832 and 835 with fake ID 2 to "sex: Any, date of birth: 1981 to 1986".
Next, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized records that belong to the same strengthening target quasi-identifier groups as the anonymized records 832 and 835 with fake ID 2, which were strengthened in S613. That is, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized record 831 with fake ID 1, which belongs to the same group as the anonymized record 832 with fake ID 2, into "sex: Any, date of birth: 1981 to 1986", the same attribute values as those of the anonymized record 832. At the same time, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized record 836 with fake ID 4, which belongs to the same group as the anonymized record 835 with fake ID 2, into "sex: Any, date of birth: 1981 to 1986", the same attribute values as those of the anonymized record 835 (S614).
Next, the anonymity enhancement unit 120 selects fake ID 4 as the fake ID to be processed (S611).
Next, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized records 834 and 836, which have the selected fake ID 4, by converting them into the same attribute value for each attribute name (S613). Here, the anonymized record 835 with fake ID 2 and the anonymized record 836 with fake ID 4, which belong to the same strengthening target quasi-identifier group, have already been given the invariant quasi-identifiers "sex: Any, date of birth: 1981 to 1986" by the processing of S614 performed when fake ID 2 was selected in S611. Therefore, the anonymity enhancement unit 120 generalizes the anonymized records 834 and 836 with fake ID 4 to "sex: Any, date of birth: 1981 to 1990". Here, "sex: Any, date of birth: 1981 to 1990" is the strengthening target quasi-identifier obtained by taking, for each attribute name, the attribute value that covers both "sex: female, date of birth: 1986 to 1990" and "sex: Any, date of birth: 1981 to 1986" and indicates the narrowest range.
Next, the anonymity enhancement unit 120 strengthens the strengthening target quasi-identifiers of the anonymized records that belong to the same strengthening target quasi-identifier groups as the anonymized records 834 and 836 with fake ID 4, which were strengthened in S613. That is, the anonymity enhancement unit 120 strengthens, for each attribute name, the strengthening target quasi-identifiers of the anonymized record 833 with fake ID 3, which belongs to the same group as the anonymized record 834 with fake ID 4, into "sex: Any, date of birth: 1981 to 1990", the same attribute values as those of the anonymized record 834. The anonymity enhancement unit 120 also strengthens, for each attribute name, the strengthening target quasi-identifiers of the anonymized record 835 with fake ID 2, which belongs to the same group as the anonymized record 836 with fake ID 4, into "sex: Any, date of birth: 1981 to 1990", the same attribute values as those of the anonymized record 836 (S614).
 次に、匿名性強化部120は、S614における偽ID:2の匿名化レコード835の強化対象準識別子に対する再びの強化加工に伴って、偽ID:2の匿名化レコード832を属性名毎に匿名化レコード835の強化対象準識別子と同一の属性値である「性別:Any、生年月日:1981~1990年」に強化加工する(S615)。 Next, the anonymity strengthening unit 120 anonymizes the anonymization record 832 of the false ID: 2 for each attribute name along with the re-strengthening process for the reinforcement target quasi-identifier of the anonymization record 835 of the false ID: 2 in S614. The data is strengthened to “sex: Any, date of birth: 1981-1990”, which is the same attribute value as the quasi-identifier to be strengthened of the record 835 (S615).
Next, because the reinforcement target quasi-identifier of anonymized record 832 with fake ID: 2 was reinforced again, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifier of anonymized record 831 with fake ID: 1, which belongs to the same reinforcement target quasi-identifier group as anonymized record 832 with fake ID: 2, for each attribute name, to "sex: Any, date of birth: 1981-1986", the same attribute values as the reinforcement target quasi-identifier of anonymized record 832 with fake ID: 2 (S614).
Next, since no fake ID remains to be processed (YES in S612), the processing ends.
<Modification of First Embodiment>
As described above, the anonymity enhancing unit 120 of the present embodiment applies reinforcement processing to the reinforcement target quasi-identifiers (invariant quasi-identifiers), and thereby prevents k-anonymity from breaking down when the anonymized quasi-identifiers of anonymized records with the same fake ID are compared. That is, the anonymity enhancing unit 120 of the present embodiment applies reinforcement processing to the reinforcement target quasi-identifiers so that k-anonymity remains satisfied even if such a comparison is made.
However, in the reinforcement processing described above, once the reinforcement target quasi-identifier of one anonymized record is reinforced, the reinforcement propagates recursively to other anonymized records, and the reinforcement target quasi-identifiers (invariant quasi-identifiers) of many anonymized records end up heavily abstracted.
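How far this recursive propagation reaches can be sketched by treating it as a connected-components problem: two records are linked if they share a fake ID or belong to the same reinforcement target quasi-identifier group. The fake IDs and group memberships below follow the worked example of the first embodiment; the group labels `g1` to `g3` and the union-find formulation are assumptions made for illustration, not the procedure of FIG. 7.

```python
from collections import defaultdict

# Record number -> (fake ID, group label). Group labels are hypothetical names.
records = {
    831: (1, "g1"), 832: (2, "g1"),
    833: (3, "g2"), 834: (4, "g2"),
    835: (2, "g3"), 836: (4, "g3"),
}

def propagation_components(records):
    """Group records that reinforcement would eventually force to share a value."""
    parent = {rec: rec for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link records that share a fake ID or a quasi-identifier group.
    by_key = defaultdict(list)
    for rec, (fake_id, group) in records.items():
        by_key[("id", fake_id)].append(rec)
        by_key[("group", group)].append(rec)
    for members in by_key.values():
        for other in members[1:]:
            union(members[0], other)

    components = defaultdict(set)
    for rec in records:
        components[find(rec)].add(rec)
    return sorted(sorted(c) for c in components.values())

components = propagation_components(records)
print(components)  # [[831, 832, 833, 834, 835, 836]]
```

In this example all six records fall into a single component, which is why the reinforcement reaches every record of the data set.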
This is because, for a reinforced record with a given fake ID to satisfy k-anonymity, at least k-1 reinforced records that have other fake IDs and contain the same reinforced quasi-identifier are required. In other words, k-anonymity provides strong privacy protection. For that very reason, the information loss of an anonymity-enhanced data set that preserves k-anonymity is also large.
Therefore, a modification of the first embodiment in which the information loss is kept relatively small is described below.
The k-anonymized data set 830 has been k-anonymized by the k-anonymization unit 110. Therefore, k-anonymity is satisfied at the level of each anonymized record. Consequently, when the anonymized quasi-identifier of an anonymized record that satisfies k-anonymity is generalized further, there are always k or more post-generalization reinforced records corresponding to that anonymized quasi-identifier. That is, as with the k-anonymized records, there are always k or more post-generalization reinforced records corresponding to the target quasi-identifier of a target record. Therefore, an anonymity-enhanced data set whose anonymized quasi-identifiers have been further generalized can have the same privacy strength as the k-anonymity of the k-anonymized data set 830, even where it departs from the strict definition of k-anonymity.
Hence, the further conversion (reinforcement processing) that resolves the breakdown of k-anonymity caused by comparing the invariant quasi-identifiers of multiple anonymized records with the same fake ID does not necessarily have to guarantee k-anonymity. That is, the further conversion only needs to be a further generalization that avoids the increase in individual identifiability caused by the comparison.
Specifically, it suffices that, as a result of this further generalization, all invariant quasi-identifiers of the anonymized records with the same fake ID have the same attribute value for each attribute name. If all anonymized records with the same fake ID have the same invariant quasi-identifier for each attribute name, the invariant quasi-identifier cannot be specialized (made more specific) any further. Therefore, individual identifiability does not increase.
In other words, the anonymity enhancing unit 120 only needs to generalize the anonymized records with the same fake ID to a superset of their invariant quasi-identifiers (reinforcement target quasi-identifiers). Here, the superset is the result of converting the reinforcement target quasi-identifiers of the same fake ID into invariant quasi-identifiers that have, for each attribute name, the same attribute value (an attribute value that covers the attribute value ranges of all invariant quasi-identifiers for that attribute name). Put differently, a superset is a set that represents a broader concept of a given set. Here, for each attribute name, the attribute values of the invariant quasi-identifiers are converted into a superset (or the union) that covers all attribute values of the invariant quasi-identifiers, or all values contained in those attribute values. Here, the union is the smallest of the supersets that cover all attribute values of the invariant quasi-identifiers, or all values contained in those attribute values. The superset may also be expressed using a range or the like. Such a superset can maintain privacy strength comparable to the k-anonymity guaranteed by the k-anonymization unit 110.
Next, the operation of the anonymity enhancing unit 120 in the modification of the first embodiment will be described.
FIG. 8 is a flowchart showing the operation (S603 shown in FIG. 6) by which the anonymity enhancing unit 120 generates the anonymity-enhanced data set in the modification of the first embodiment.
The operations of S621, S622, and S623 in FIG. 8 are the same as those of S611, S612, and S613 in FIG. 7, respectively.
Next, the operation by which the anonymity enhancing unit 120 converts the k-anonymized data set 830 into an anonymity-enhanced data set will be described using specific values.
FIG. 9 is a diagram showing an example of the anonymity-enhanced data set 850 that the anonymity enhancing unit 120 generates by generalizing the k-anonymized data set 830.
First, the anonymity enhancing unit 120 selects fake ID: 2 as the fake ID to be processed (S621).
Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of anonymized record 832 and anonymized record 835, which have the selected fake ID: 2, into the same attribute value for each attribute name (S623).
Here, the common attribute value for the reinforcement target quasi-identifier with the attribute name "sex" is, for example, "Any", which covers "Any" and "female". The common attribute value for the reinforcement target quasi-identifier with the attribute name "date of birth" is, for example, "1981-1986", the attribute value indicating the narrowest range that covers "1981-1985" and "1985-1986". Note that this attribute value does not have to indicate the narrowest covering range; it may be any range that covers all anonymized quasi-identifiers with the attribute name "date of birth", such as "1980-1989".
Next, the anonymity enhancing unit 120 selects fake ID: 4 as the fake ID to be processed (S621).
Next, the anonymity enhancing unit 120 reinforces the reinforcement target quasi-identifiers of anonymized record 834 and anonymized record 836, which have the selected fake ID: 4, into the same attribute value for each attribute name (S623).
Here, the common attribute value for the reinforcement target quasi-identifier with the attribute name "sex" is, for example, "female", which covers "female" and "female". The common attribute value for the reinforcement target quasi-identifier with the attribute name "date of birth" is, for example, "1986-1990", the attribute value indicating the narrowest range that covers "1986-1990" and "1985-1986".
Next, since no fake ID remains whose reinforcement target quasi-identifiers are to be reinforced (NO in S622), the processing ends.
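The modified procedure, which generalizes only within each fake ID and does not propagate to other records of the same group, can be sketched as follows. The record layout and the function name `generalize_per_fake_id` are assumptions for illustration. The values for fake ID: 2 are those quoted in the text; which of the two records holds which value, and the values given to record 831, are assumed for the sake of the example.

```python
from collections import defaultdict

def generalize_per_fake_id(records):
    """Give all records sharing a fake ID the same covering quasi-identifier.

    Records whose fake ID occurs only once are returned unchanged, and nothing
    is propagated to other members of a quasi-identifier group.
    """
    by_id = defaultdict(list)
    for rec in records:
        by_id[rec["fake_id"]].append(rec)

    result = []
    for rec in records:
        same_id = by_id[rec["fake_id"]]
        sexes = {r["sex"] for r in same_id}
        result.append({
            **rec,
            "sex": sexes.pop() if len(sexes) == 1 else "Any",
            "birth": (min(r["birth"][0] for r in same_id),
                      max(r["birth"][1] for r in same_id)),
        })
    return result

records = [
    {"no": 831, "fake_id": 1, "sex": "Any", "birth": (1981, 1985)},     # assumed values
    {"no": 832, "fake_id": 2, "sex": "Any", "birth": (1981, 1985)},
    {"no": 835, "fake_id": 2, "sex": "female", "birth": (1985, 1986)},
]
enhanced = generalize_per_fake_id(records)
for rec in enhanced:
    print(rec["no"], rec["sex"], rec["birth"])
```

Both records with fake ID: 2 end up with "Any" and the range 1981-1986, while record 831 keeps its original quasi-identifier even though it shares a group with record 832.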
To keep k-anonymity satisfied even after invariant quasi-identifiers are compared, reinforcement processing was also required for the anonymized records belonging to the same reinforcement target quasi-identifier group. However, as described above, when the goal is only to prevent the invariant quasi-identifiers from being made more specific, reinforcement processing of the anonymized records belonging to the same reinforcement target quasi-identifier group is unnecessary.
In the anonymity-enhanced data set 850 shown in FIG. 9, for each target record of the anonymization target data set 820 shown in FIG. 2, the reinforced records that contain an invariant quasi-identifier (reinforced quasi-identifier) covering that record's invariant quasi-identifier (target quasi-identifier) have two or more distinct fake IDs. Therefore, the anonymity-enhanced data set 850 has privacy strength comparable to k-anonymity with k = 2 (2-anonymity).
The first effect of the present embodiment described above is that it makes it possible to generate a data set in which individual identifiability cannot be increased even by comparing the invariant quasi-identifiers (anonymized quasi-identifiers) of the k-anonymized records.
This is because the anonymity enhancing unit 120 applies reinforcement processing to the reinforcement target quasi-identifiers contained in the anonymized records of the k-anonymized data set generated by the k-anonymization unit 110, and thereby generates the anonymity-enhanced data set.
The second effect of the present embodiment described above is that it makes it possible to generate a data set in which individual identifiability cannot be increased, while strictly preserving the k-anonymity of the k-anonymized data set generated by the k-anonymization unit 110.
This is because the anonymity enhancing unit 120 executes the reinforcement processing described above recursively to generate the anonymity-enhanced data set.
The third effect of the present embodiment described above is that it makes it possible to generate a data set with enhanced anonymity in which individual identifiability cannot be increased even by comparing the quasi-identifiers of the k-anonymized records, while keeping the information loss relatively small.
This is because the anonymity enhancing unit 120 reinforces only the reinforcement target quasi-identifiers contained in the anonymized records with the same fake ID to generate the anonymity-enhanced data set.
<Second Embodiment>
Next, a second embodiment of the present invention will be described in detail with reference to the drawings. In the following, content that overlaps with the preceding description is omitted to the extent that the description of the present embodiment does not become unclear.
First, an outline of the anonymization device of the second embodiment will be described.
The anonymization device of the present embodiment calculates the information loss amount corresponding to the generalization (anonymity enhancement) performed by the anonymity enhancing unit 120 shown in FIG. 1. Then, based on the calculated information loss amount, it determines the combination of target records for k-anonymization so that, for example, the information loss amount is minimized. Then, based on the determined combination of target records, it converts the target quasi-identifiers of the anonymization target data set so as to satisfy the desired anonymity, and generates a k-anonymized data set.
The anonymization device of the present embodiment calculates the information loss amount per piece of unique identification information. This is to handle the case where the anonymization target data set contains multiple target records for a single piece of unique identification information.
Specifically, in the anonymization device of the present embodiment, the anonymity enhancing unit 120 performs reinforcement processing in order to prevent anonymity from breaking down through comparison. However, if the information loss amount of each target record is calculated record by record, it does not include the information lost when the anonymity enhancing unit 120 performs the reinforcement processing. That loss arises for an anonymization target data set that contains multiple target records for a single piece of unique identification information.
As described in the first embodiment, the reinforcement processing by the anonymity enhancing unit 120 further generalizes the anonymized records of the k-anonymized data set based on the fake ID corresponding to each piece of unique identification information. Therefore, obtaining only the information loss amount of a single target record does not take that loss of information into account.
For this reason, the anonymization device of the present embodiment calculates the information loss amount per piece of unique identification information corresponding to each fake ID, that is, an information loss amount that includes the information lost through the reinforcement processing by the anonymity enhancing unit 120. Hereinafter, this "information loss amount corresponding to each piece of unique identification information, resulting from the reinforcement processing of the reinforcement target quasi-identifiers by the anonymity enhancing unit 120" is called the reinforcement processing information loss amount. In this way, the anonymization device of the present embodiment also handles the case where the anonymization target data set contains multiple target records for a single piece of unique identification information, and generates an anonymity-enhanced data set with smaller information loss due to generalization.
This concludes the outline of the anonymization device of the second embodiment.
FIG. 10 is a block diagram showing the configuration of the anonymization device 200 according to the second embodiment of the present invention.
Referring to FIG. 10, the anonymization device 200 of the present embodiment differs from the anonymization device 100 of the first embodiment in that it includes a k-anonymization unit 210 in place of the k-anonymization unit 110. The anonymization device 200 also further includes a combination determination unit 230 and an information loss calculation unit 240. Note that the combination determination unit 230 may include the information loss calculation unit 240.
=== Combination Determination Unit 230 ===
The combination determination unit 230 determines, based on the reinforcement processing information loss amount, the combination of target records to use when distributing the target records into one or more groups.
Specifically, the combination determination unit 230 generates one or more combination candidates. Here, a combination candidate is a candidate combination of target records for distributing the target records contained in the anonymization target data set into one or more groups.
The combination determination unit 230 passes the combination candidates to the information loss calculation unit 240. The combination determination unit 230 then receives, from the information loss calculation unit 240, the reinforcement processing information loss amount corresponding to each combination candidate.
The combination determination unit 230 determines, as the combination of target records, the combination candidate with the smallest information loss calculated from the received reinforcement processing information loss amounts. That is, the combination determination unit 230 determines the combination of target records so that the sum of the information loss amounts of all target records contained in the anonymization target data set, after the reinforcement processing by the anonymity enhancing unit 120, is minimized.
=== Anonymization target data set 820 ===
FIG. 11 is a diagram showing an example of the anonymization target data set 860. As shown in FIG. 11, the anonymization target data set 860 contains multiple target records (for example, target record 8601), each consisting of the attributes patient ID (also called unique identification information), birth year, year and month of treatment, and injury or disease name. The attribute "birth year" is an invariant quasi-identifier, and the attribute "year and month of treatment" is a variable quasi-identifier.
FIG. 12 is a diagram illustrating how the target records of the anonymization target data set 860 are distributed into one or more groups. The dotted lines in FIG. 12 show an example of partitioning in which target records are combined and distributed into groups so that 3-anonymity can be guaranteed. Hereinafter, such a group is called an anonymous group.
Partition 401 and partition 402 are partitions that divide the anonymization target data set by the attribute "birth year". Partition 403 and partition 404 are partitions that divide the anonymization target data set by the attribute "year and month of treatment".
Here, as shown in FIG. 12, each anonymous group contains target records with three or more different patient IDs. In FIG. 12, for convenience, the patient IDs are shown using the fake IDs shown in FIG. 13 (which correspond, in order of appearance, to the patient IDs shown in FIG. 12). Therefore, each anonymous group is a group partitioned so that 3-anonymity can be guaranteed.
For example, the combination determination unit 230 determines which anonymization groups, that is, which candidate combination of target records, to adopt: those produced by partition 403 or those produced by partition 404. In this way, from among the candidate combinations of target records, the combination determination unit 230 adopts the candidate that can satisfy the desired k-anonymity and has the smallest sum of the reinforcement processing information loss amounts corresponding to the patient IDs (unique identification information), and determines it as the combination of target records.
=== Information Loss Calculation Unit 240 ===
The information loss calculation unit 240 calculates the reinforcement processing information loss amount.
Specifically, the information loss calculation unit 240 receives combination candidates from the combination determination unit 230. Next, the information loss calculation unit 240 calculates the reinforcement processing information loss amount based on the received combination candidates. Next, the information loss calculation unit 240 passes the calculated reinforcement processing information loss amount to the combination determination unit 230.
Note that the information loss calculation unit 240 calculates the reinforcement processing information loss amount using a calculation method that corresponds to the reinforcement processing of the anonymity enhancing unit 120.
Next, the case is described in which the reinforcement processing by the anonymity enhancing unit 120 generalizes the target quasi-identifiers (the invariant quasi-identifiers with the attribute "birth year") contained in the target records of the anonymization target data set 860 to the superset whose attribute value indicates the narrowest range.
For example, the information loss calculation unit 240 calculates the reinforcement processing information loss amount using NCP (Normalized Certainty Penalty). Various metrics for measuring information loss have been proposed. The information loss calculation unit 240 is not limited to NCP and may calculate the reinforcement processing information loss amount using any calculation method that corresponds to the reinforcement processing of the anonymity enhancing unit 120.
The general, per-target-record NCP is obtained as follows. Let NCP(r.a) denote the NCP value for attribute a of a target record r. Then NCP(r.a) = |r.a_max - r.a_min| / |a.max - a.min|. Here, r.a_max is the maximum attribute value of attribute a of target record r, and r.a_min is the minimum attribute value of attribute a of target record r. Also, a.max is the maximum value of attribute a over all target records in the anonymization target data set 860, and a.min is the minimum value of attribute a over all target records in the anonymization target data set 860.
For example, for the anonymization groups divided by partition 403, that is, for that combination of target records, the per-record NCP of each target record with patient ID: Alice is calculated as follows. The target quasi-identifier with the attribute "birth year" contained in target record 8601 is k-anonymized by the k-anonymization unit 210 to 1981-1988 in the anonymization group divided by partition 403. Therefore, the NCP of the target quasi-identifier with the attribute "birth year" contained in target record 8601 is 0.78 (rounded to two decimal places; the same applies below). Similarly, the NCP of the target quasi-identifier with the attribute "birth year" contained in target record 8604 is 0.67, and that of the target quasi-identifier with the attribute "birth year" contained in target record 8607 is 0.44.
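The per-record NCP values quoted above can be reproduced with a short sketch. The function name `ncp` is an assumption for illustration, and so is the domain of "birth year": the quoted value 0.78 for the range 1981-1988 implies a domain width of 9 years, but the actual minimum and maximum in the anonymization target data set 860 are not stated here, so 1981 to 1990 is used as a hypothetical domain. Only the width affects the result.

```python
def ncp(value_range, domain_range):
    """Normalized Certainty Penalty of one generalized numeric attribute value."""
    low, high = value_range
    domain_low, domain_high = domain_range
    return abs(high - low) / abs(domain_high - domain_low)

# Hypothetical domain of "birth year" (assumption: width of 9 years).
BIRTH_YEAR_DOMAIN = (1981, 1990)

# Generalized "birth year" ranges of the records with patient ID: Alice.
generalized = {8601: (1981, 1988), 8604: (1983, 1989), 8607: (1981, 1985)}

per_record = {no: round(ncp(rng, BIRTH_YEAR_DOMAIN), 2) for no, rng in generalized.items()}
print(per_record)  # {8601: 0.78, 8604: 0.67, 8607: 0.44}
```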
The information loss calculation unit 240 of the present embodiment calculates a per-patient-ID NCP as the reinforcement processing information loss amount. The metric obtained by extending the NCP described above from a per-target-record information loss amount to a per-patient-ID reinforcement processing information loss amount is denoted NCP*. Let NCP*(u.a) denote the NCP* value for attribute a of a patient ID u. Then NCP*(u.a) = |u.a_max - u.a_min| / |a.max - a.min|. Here, u.a_max is the maximum of the values of attribute a over all target records with patient ID u, and u.a_min is the minimum of the values of attribute a over all target records with patient ID u.
For example, the target quasi-identifiers with the attribute "birth year" contained in target record 8601, target record 8604, and target record 8607, which have patient ID: Alice, are converted by the k-anonymization unit 210 to 1981-1988, 1983-1989, and 1981-1985, respectively, in the anonymization groups divided by partition 403. In this case, across target record 8601, target record 8604, and target record 8607 of patient ID: Alice, the minimum value of the attribute "birth year" is 1981 and the maximum value is 1989. Therefore, the NCP* of target record 8601, target record 8604, and target record 8607 of patient ID: Alice is 0.89.
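The per-patient-ID calculation can be sketched in the same way. The function name `ncp_star` and the hypothetical "birth year" domain of 1981 to 1990 (chosen only because the quoted values imply a domain width of 9 years) are assumptions for illustration.

```python
def ncp_star(ranges, domain_range):
    """NCP*: penalty of the range spanned by all generalized values of one patient ID."""
    low = min(r[0] for r in ranges)
    high = max(r[1] for r in ranges)
    return abs(high - low) / abs(domain_range[1] - domain_range[0])

# Hypothetical domain of "birth year" (assumption: width of 9 years).
BIRTH_YEAR_DOMAIN = (1981, 1990)

# Generalized "birth year" ranges of target records 8601, 8604, and 8607 (Alice).
alice_ranges = [(1981, 1988), (1983, 1989), (1981, 1985)]

alice_ncp_star = round(ncp_star(alice_ranges, BIRTH_YEAR_DOMAIN), 2)
print(alice_ncp_star)  # 0.89
```

The result is larger than any of the three per-record NCP values (0.78, 0.67, 0.44), which reflects the additional loss incurred when the records of one patient are later unified by the reinforcement processing.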
 図13は、匿名化対象データセット860がパーティション401、パーティション403及びパーティション404により分割された場合の、各対象レコードに含まれる属性が「生年」の対象準識別子に対応する、匿名化準識別子を示す図である。図14は、匿名化対象データセット860がパーティション402、パーティション403及びパーティション404により分割された場合の、各対象レコードに含まれる属性が「生年」の対象準識別子に対応する、匿名化準識別子を示す図である。即ち、図13及び図14は、ある対象レコードの組み合わせにおいて、匿名化対象データセット860がk-匿名化された場合の、匿名化準識別子の一例を示す。 FIG. 13 shows an anonymization quasi-identifier corresponding to a target quasi-identifier whose attribute included in each target record is “birth year” when the anonymization target data set 860 is divided by partition 401, partition 403, and partition 404. It illustrates. FIG. 14 shows an anonymization quasi-identifier corresponding to the target quasi-identifier whose attribute included in each target record is “birth year” when the anonymization target data set 860 is divided by partition 402, partition 403, and partition 404. It illustrates. That is, FIG. 13 and FIG. 14 show an example of the anonymization quasi-identifier when the anonymization target data set 860 is k-anonymized in a certain combination of target records.
FIG. 15 shows an example of the information loss amount corresponding to each combination of target records. Specifically, FIG. 15 shows, for the case where partition 401 is adopted and the case where partition 402 is adopted, the NCP* value for each patient ID and the sum of NCP* over the entire anonymization target data set 860. FIG. 15 shows that adopting partition 402 instead of partition 401 reduces the information loss caused by anonymization. In this case, the combination determination unit 230 adopts partition 402.
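The comparison described for FIG. 15 amounts to summing the per-patient NCP* values for each candidate and keeping the candidate with the smaller sum. The sketch below illustrates that selection only. The loss values and the patient IDs other than Alice are hypothetical placeholders, not the values of FIG. 15, and the function names are assumptions.

```python
def total_loss(loss_by_patient):
    """Sum of the per-patient NCP* values for one candidate."""
    return sum(loss_by_patient.values())


def choose_partition(candidates):
    """Return the name of the candidate with the smallest total loss."""
    return min(candidates, key=lambda name: total_loss(candidates[name]))


# HYPOTHETICAL per-patient NCP* values; FIG. 15 is not reproduced here.
candidates = {
    "partition_401": {"Alice": 1.0, "Bob": 0.5, "Carol": 0.5},
    "partition_402": {"Alice": 0.75, "Bob": 0.5, "Carol": 0.25},
}

chosen = choose_partition(candidates)
print(chosen, total_loss(candidates[chosen]))
```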
=== k-anonymization unit 210 ===
The k-anonymization unit 210 generates a k-anonymized data set by converting, for the combination of target records determined by the combination determination unit 230, the target quasi-identifiers contained in the target records of each anonymous group into anonymization quasi-identifiers. For example, the k-anonymization unit 210 converts the target quasi-identifiers contained in the target records of each anonymous group into the same attribute value for each attribute name.
That is, the k-anonymization unit 210 converts the target quasi-identifiers into anonymization quasi-identifiers so that the total information loss over all target records in the anonymization target data set, measured after conversion by the anonymity enhancement unit 120, is minimized.
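Converting the quasi-identifiers of one anonymous group into the same attribute value per attribute name can be sketched as below. The text does not fix the form of the shared value in this passage; using a (minimum, maximum) range is an assumption made for illustration, as are the function name and the record layout.

```python
def generalize_group(records, quasi_identifiers):
    """Give every record in one group the same value for each quasi-identifier.

    ASSUMPTION: the shared value is the (min, max) range of the group's values.
    The input records are left unmodified.
    """
    generalized = [dict(record) for record in records]
    for name in quasi_identifiers:
        values = [record[name] for record in records]
        shared = (min(values), max(values))
        for record in generalized:
            record[name] = shared
    return generalized


# HYPOTHETICAL group of three target records.
group = [
    {"fake_id": "u1", "birth_year": 1981},
    {"fake_id": "u2", "birth_year": 1985},
    {"fake_id": "u3", "birth_year": 1988},
]

anonymized = generalize_group(group, ["birth_year"])
for record in anonymized:
    print(record)
```

After this step the three records are indistinguishable on "birth year", which is what k-anonymity requires within a group of size k.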
FIG. 16 shows an example of a k-anonymized data set generated by the k-anonymization unit 210. This k-anonymized data set is the result of the k-anonymization unit 210 k-anonymizing the anonymization target data set 860 when the combination determination unit 230 has decided to divide the target records by partition 402, partition 403, and partition 404.
The hardware-level components of the anonymization device 200 may have the configuration shown in FIG. 5.
Next, the operation of this embodiment is described in detail with reference to the drawings.
FIG. 17 is a flowchart showing the operation of the anonymization device 200 according to this embodiment.
The combination determination unit 230 generates one or more combination candidates and passes them to the information loss calculation unit 240 (S631).
Next, the information loss calculation unit 240 calculates the enhancement processing information loss amount based on the received combination candidates and passes the calculated amount to the combination determination unit 230 (S632).
Next, the combination determination unit 230 determines, as the combination of target records, the combination candidate for which the information loss calculated from the received enhancement processing information loss amount is smallest (S633).
Next, for the combination of target records determined by the combination determination unit 230, the k-anonymization unit 210 converts the target quasi-identifiers contained in the target records of each anonymous group into anonymization quasi-identifiers, generates a k-anonymized data set, and outputs it to the storage unit 702 or the storage device 703 (S634).
Next, the anonymity enhancement unit 120 applies enhancement processing to the enhancement target quasi-identifiers contained in the received k-anonymized data set and generates an anonymity-enhanced data set in which the anonymized records have been converted into enhanced records (S635).
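Steps S632 to S635 can be sketched end to end as follows. This is an illustration under several assumptions, not the embodiment's implementation: candidates are supplied by the caller (S631 is not modeled), the conversion of IDs into fake IDs is omitted, the only quasi-identifier is a birth year, generalization uses (min, max) ranges, and loss is range width divided by a hypothetical domain width. All names and data are made up.

```python
from collections import defaultdict


def k_anonymize(records, grouping):
    """S634: give every record in a group the same (min, max) birth-year range."""
    out = {}
    for group in grouping:
        years = [records[i][1] for i in group]
        shared = (min(years), max(years))
        for i in group:
            out[i] = (records[i][0], shared)
    return [out[i] for i in range(len(records))]


def enhance(anonymized):
    """S635: per ID, replace every range with the smallest range covering them all."""
    by_id = defaultdict(list)
    for pid, rng in anonymized:
        by_id[pid].append(rng)
    cover = {pid: (min(lo for lo, _ in rs), max(hi for _, hi in rs))
             for pid, rs in by_id.items()}
    return [(pid, cover[pid]) for pid, _ in anonymized]


def enhancement_loss(records, grouping, domain_width):
    """S632: total loss after enhancement, one term per ID."""
    enhanced = enhance(k_anonymize(records, grouping))
    widths = {pid: hi - lo for pid, (lo, hi) in enhanced}
    return sum(widths.values()) / domain_width


def run(records, candidates, domain_width):
    """S633 then S634-S635: pick the lowest-loss candidate and apply it."""
    best = min(candidates, key=lambda g: enhancement_loss(records, g, domain_width))
    return best, enhance(k_anonymize(records, best))


# HYPOTHETICAL data: four people with two records each (ID, birth year).
records = [("A", 1981), ("B", 1983), ("C", 1988), ("D", 1989)] * 2
candidate_x = [[0, 1], [2, 3], [4, 5], [6, 7]]
candidate_y = [[0, 1], [2, 3], [4, 6], [5, 7]]

best, enhanced = run(records, [candidate_x, candidate_y], domain_width=10)
print(best)
print(enhanced)
```

In this toy data, candidate_x always groups the same people together, so the ranges attached to one ID never differ and enhancement adds no further widening. That is the effect the loss-based selection is meant to favor.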
Note that the anonymization target data set may include a plurality of invariant quasi-identifiers. In this case, the combination determination unit 230 may, for example, add up the NCP* values of the invariant quasi-identifiers within the same combination candidate, taken per attribute name and per fake ID, and use the resulting total as the "information loss calculated based on the enhancement processing information loss amount." The summation above may be applied to any given attribute name.
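One reading of this summation is sketched below: each (attribute name, fake ID) pair contributes one NCP* value, and the candidate's loss is the total over the selected attribute names. The attribute names, fake IDs, values, and the function name are hypothetical, and the optional attribute filter is an interpretation of the final sentence above.

```python
def candidate_loss(ncp_by_attr_and_id, attributes=None):
    """Total NCP* over (attribute name, fake ID) pairs for one candidate.

    If `attributes` is given, only those attribute names are included.
    """
    return sum(
        ncp
        for (attribute, _fake_id), ncp in ncp_by_attr_and_id.items()
        if attributes is None or attribute in attributes
    )


# HYPOTHETICAL NCP* values for two invariant quasi-identifiers and two fake IDs.
ncp = {
    ("birth_year", "u1"): 0.5,
    ("birth_year", "u2"): 0.25,
    ("sex", "u1"): 0.0,
    ("sex", "u2"): 1.0,
}

print(candidate_loss(ncp))                  # all attribute names
print(candidate_loss(ncp, {"birth_year"}))  # a single attribute name
```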
The first effect of this embodiment is that, in addition to the effects of the first embodiment, it can generate an anonymity-enhanced data set with less information loss than the anonymity-enhanced data set generated by the anonymization device 100 of the first embodiment.
This is because the embodiment includes the following configuration. First, the information loss calculation unit 240 calculates the enhancement processing information loss amount corresponding to each piece of unique identification information. Second, the combination determination unit 230 determines the combination of target records based on the enhancement processing information loss amount. Third, the k-anonymization unit 210 generates a k-anonymized data set by converting the target quasi-identifiers contained in the target records of the determined combination into the same attribute value for each attribute name.
The second effect of this embodiment is that it can generate an anonymity-enhanced data set with relatively smaller information loss.
This is because the combination determination unit 230 determines, as the combination of target records, the combination candidate for which the information loss calculated from the received enhancement processing information loss amount is smallest.
Alternatively, this is because the k-anonymization unit 210 converts the target quasi-identifiers contained in the target records of each anonymous group into the same attribute value for each attribute name.
The components described in the above embodiments do not necessarily have to be individually independent entities. For example, a plurality of components may be realized as a single module. A single component may be realized by a plurality of modules. One component may be a part of another component. A part of one component may overlap a part of another component.
The components in the embodiments described above, and the modules that realize them, may be realized in hardware as needed and where possible. The components and the modules that realize them may also be realized by a computer and a program. The components and the modules that realize them may also be realized by a mixture of hardware modules and a computer and program.
The program is provided recorded on a non-volatile computer-readable recording medium, such as a magnetic disk or a semiconductor memory, and is read by the computer, for example when the computer starts up. The program thus read controls the operation of the computer and thereby causes the computer to function as the components of the embodiments described above.
In the embodiments described above, a plurality of operations are described in order in the form of flowcharts, but the order of description does not limit the order in which the operations are executed. Therefore, when an embodiment is carried out, the order of the operations may be changed to the extent that doing so causes no problem in substance.
Furthermore, in the embodiments described above, the operations are not limited to being executed at individually different times. For example, another operation may occur while one operation is being executed, and the execution timing of one operation may overlap that of another partially or entirely.
Furthermore, in the embodiments described above, one operation is described as triggering another, but that description does not limit all relationships between the two operations. Therefore, when an embodiment is carried out, the relationships among the operations may be changed to the extent that doing so causes no problem in substance. The specific description of each operation of each component does not limit that operation. Therefore, the specific operations of each component may be changed to the extent that doing so does not impair the functional, performance, or other characteristics of the embodiment.
The present invention has been described above with reference to the embodiments and examples, but the present invention is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.
This application claims priority based on Japanese Patent Application No. 2012-127257, filed on June 4, 2012, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
100  Anonymization device
110  k-anonymization unit
120  Anonymity enhancement unit
200  Anonymization device
210  k-anonymization unit
230  Combination determination unit
240  Information loss calculation unit
401  Partition
402  Partition
403  Partition
404  Partition
700  Computer
701  CPU
702  Storage unit
703  Storage device
704  Input unit
705  Output unit
706  Communication unit
707  Recording medium
820  Anonymization target data set
822  Target record
825  Target record
830  k-anonymized data set
830  Anonymized data set
831  Anonymized record
832  Anonymized record
833  Anonymized record
834  Anonymized record
835  Anonymized record
836  Anonymized record
840  Anonymity-enhanced data set
842  Enhanced record
845  Enhanced record
850  Anonymity-enhanced data set
860  Anonymization target data set
8601  Target record
8604  Target record
8607  Target record

Claims (10)

  1.  An information processing apparatus comprising:
     k-anonymization means for, with respect to an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that contains no information for restoring the unique identification information and that is uniquely assigned to each piece of the unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymized records, and generating a k-anonymized data set including the anonymized records; and
     anonymity enhancement means for, with respect to the k-anonymized data set, converting the anonymized records having the same false identification information into enhanced records, and generating and outputting an anonymity-enhanced data set including the enhanced records,
     wherein the anonymity enhancement means converts the anonymized records into the enhanced records by converting enhancement target quasi-identifiers, which are the anonymization quasi-identifiers corresponding to the target quasi-identifiers that always have the same attribute value for each piece of the unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifiers cannot be made more specific by comparing the enhancement target quasi-identifiers with one another.
  2.  The information processing apparatus according to claim 1, wherein
     each of the enhancement target quasi-identifiers is an attribute consisting of an attribute name and an attribute value, and
     the anonymity enhancement means generates the anonymity-enhanced data set in which the enhancement target quasi-identifiers included in all of the anonymized records having the same false identification information have been converted into the same attribute value for each attribute name.
  3.  The information processing apparatus according to claim 2, wherein
     the anonymity enhancement means converts the enhancement target quasi-identifiers using, as the same attribute value, an attribute value indicating the smallest range that encompasses the enhancement target quasi-identifiers, for each attribute name, included in all of the anonymized records having the same false identification information.
  4.  The information processing apparatus according to any one of claims 1 to 3, further comprising:
     information loss calculation means for calculating an enhancement processing information loss amount that corresponds to each piece of the unique identification information and results from the conversion of the enhancement target quasi-identifiers by the anonymity enhancement means; and
     combination determination means for determining, based on the enhancement processing information loss amount calculated by the information loss calculation means, a combination of the target records to be used when the target records are distributed into one or more groups,
     wherein the k-anonymization means generates the k-anonymized data set by converting the target quasi-identifiers included in the target records belonging to each of the groups into the same attribute value for each attribute name.
  5.  The information processing apparatus according to claim 4, wherein
     the combination determination means determines the combination of the target records so that the total information loss over all of the target records included in the anonymization target data set, after conversion by the anonymity enhancement means, is minimized.
  6.  The information processing apparatus according to claim 4 or 5, wherein
     the k-anonymization means converts the target quasi-identifiers so that the total information loss over all of the target records included in the anonymization target data set, after conversion by the anonymity enhancement means, is minimized.
  7.  An anonymization method comprising:
     with respect to an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that contains no information for restoring the unique identification information and that is uniquely assigned to each piece of the unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymized records, and generating a k-anonymized data set including the anonymized records;
     with respect to the k-anonymized data set, converting the anonymized records having the same false identification information into enhanced records, and generating and outputting an anonymity-enhanced data set including the enhanced records; and
     converting the anonymized records into the enhanced records by converting enhancement target quasi-identifiers, which are the anonymization quasi-identifiers corresponding to the target quasi-identifiers that always have the same attribute value for each piece of the unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifiers cannot be made more specific by comparing the enhancement target quasi-identifiers with one another.
  8.  The anonymization method according to claim 7, further comprising:
     calculating an enhancement processing information loss amount that corresponds to each piece of the unique identification information and results from the conversion of the enhancement target quasi-identifiers, and determining, based on the calculated enhancement processing information loss amount, a combination of the target records to be used when the target records are distributed into one or more groups; and
     generating the k-anonymized data set by converting the target quasi-identifiers included in the target records belonging to each of the groups into the same attribute value for each attribute name.
  9.  A non-volatile recording medium on which is recorded a program that causes a computer to execute:
     a process of, with respect to an anonymization target data set including one or more target records each including unique identification information and one or more target quasi-identifiers corresponding to the unique identification information, converting each piece of the unique identification information into false identification information that contains no information for restoring the unique identification information and that is uniquely assigned to each piece of the unique identification information, converting each of the target quasi-identifiers into an anonymization quasi-identifier so that the anonymization target data set satisfies k-anonymity, thereby converting the target records into anonymized records, and generating a k-anonymized data set including the anonymized records; and
     a process of, with respect to the k-anonymized data set, converting the anonymized records having the same false identification information into enhanced records, and generating and outputting an anonymity-enhanced data set including the enhanced records,
     wherein the process of converting the anonymized records into the enhanced records is a process of converting enhancement target quasi-identifiers, which are the anonymization quasi-identifiers corresponding to the target quasi-identifiers that always have the same attribute value for each piece of the unique identification information in the anonymization target data set, into information from which the enhancement target quasi-identifiers cannot be made more specific by comparing the enhancement target quasi-identifiers with one another.
  10.  The non-volatile recording medium according to claim 9, on which is recorded the program that further causes the computer to execute:
     a process of calculating an enhancement processing information loss amount that corresponds to each piece of the unique identification information and results from the conversion of the enhancement target quasi-identifiers;
     a process of determining, based on the calculated enhancement processing information loss amount, a combination of the target records to be used when the target records are distributed into one or more groups; and
     a process of generating the k-anonymized data set by converting the target quasi-identifiers included in the target records belonging to each of the groups into the same attribute value for each attribute name.
PCT/JP2013/003347 2012-06-04 2013-05-28 Information processing device for anonymization and anonymization method WO2013183250A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014519824A JPWO2013183250A1 (en) 2012-06-04 2013-05-28 Information processing apparatus and anonymization method for anonymization

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012127257 2012-06-04
JP2012-127257 2012-06-04

Publications (1)

Publication Number Publication Date
WO2013183250A1 true WO2013183250A1 (en) 2013-12-12

Family

ID=49711658

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/003347 WO2013183250A1 (en) 2012-06-04 2013-05-28 Information processing device for anonymization and anonymization method

Country Status (2)

Country Link
JP (1) JPWO2013183250A1 (en)
WO (1) WO2013183250A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011145401A1 (en) * 2010-05-19 2011-11-24 株式会社日立製作所 Identity information de-identification device
WO2012067213A1 (en) * 2010-11-16 2012-05-24 日本電気株式会社 Information processing system and anonymizing method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101704702B1 (en) * 2016-04-18 2017-02-08 (주)케이사인 Tagging based personal data de-identification system and de-identification method of personal data
KR20200026559A (en) * 2018-09-03 2020-03-11 (주)아이알컴퍼니 Dataset De-identification Method and Apparatus Using K-anonymity Model
KR102126386B1 (en) * 2018-09-03 2020-06-24 (주)아이알컴퍼니 Dataset De-identification Method and Apparatus Using K-anonymity Model
JP7382902B2 (en) 2020-06-18 2023-11-17 株式会社日立製作所 Data provision server device and data provision method

Also Published As

Publication number Publication date
JPWO2013183250A1 (en) 2016-01-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13800297

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014519824

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13800297

Country of ref document: EP

Kind code of ref document: A1