WO2015008480A1 - Information processing device that performs anonymization, and anonymization method - Google Patents

Information processing device that performs anonymization, and anonymization method

Info

Publication number
WO2015008480A1
Authority
WO
WIPO (PCT)
Prior art keywords
view
anonymization
views
anonymized
unit
Prior art date
Application number
PCT/JP2014/003732
Other languages
French (fr)
Japanese (ja)
Inventor
隆夫 竹之内
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Publication of WO2015008480A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Definitions

  • the present invention relates to anonymization technology, and more particularly to privacy protection technology in secondary use of personal information.
  • Patent Document 1 discloses a technique for managing multiple types of medical information with different subscriber identification information.
  • the medical information management device disclosed in Patent Literature 1 determines a matching pattern for uniquely identifying a subscriber in the basic ledger. Then, the medical information management device manages various types of medical information related to the subscriber in association with each other according to the matching pattern. Further, the medical information management apparatus anonymizes and manages the symbols, numbers, kanji names, and kana names of health insurance subscribers.
  • Such a medical information management apparatus has a problem that the privacy protection of the subscriber is insufficient. This is because it is possible to identify an individual by combining information such as date of birth, sex, medical history, etc. that has not been anonymized in the medical information management apparatus. As a result, information that the individual does not want to be known to others (attribute value of the sensitive attribute) may be known to other individuals. Information such as date of birth, sex, medical history, and the like is called a quasi-identifier. That is, the quasi-identifier is information that makes it possible to identify an individual by being combined. In other words, the information contained in the basic ledger must be sufficiently protected for privacy by anonymization.
  • FIG. 26 is a diagram illustrating an example of general personal information 910 that needs privacy protection and is an anonymization target.
  • Personal information 910 has an identifier (ID) as an identifier 915, a postal code as a quasi-identifier 916, an age as a quasi-identifier 917, and a disease name as a sensitive attribute 918. Note that the postal code, age, disease name, and the like constituting the personal information 910 are also generally called attributes.
  • personal information 910 is represented by a table composed of a plurality of records.
  • Each of the records includes attribute values corresponding to each individual (each attribute, that is, the zip code, age, and disease name).
  • The sensitive attribute 918 is information whose disclosure in association with a specific individual is to be prevented. Note that attributes such as the postal code, age, and disease name are treated as either a quasi-identifier or a sensitive attribute depending on the situation.
  • FIG. 27 shows anonymized information 920 corresponding to personal information 910.
  • the anonymization information 920 does not include an ID.
  • the attribute values of the quasi-identifier 926 and the quasi-identifier 927 of the anonymization information 920 are obtained by generalizing the attribute values of the quasi-identifier 916 and the quasi-identifier 917 of the personal information 910, respectively.
  • the sensitive attribute 928 of the anonymized information 920 is the same as the sensitive attribute 918 of the personal information 910.
  • Patent Document 2 Patent Document 3
  • Non-Patent Document 1 Non-Patent Document 2
  • Patent Document 2 discloses a technique for protecting the privacy of public information. Specifically, Patent Document 2 discloses a technique for performing top-down processing and determining k-anonymity and l-diversity for the result. Patent Document 2 discloses a technique for performing bottom-up processing and determining k-anonymity and l-diversity for the result. Here, the top-down process and the bottom-up process are repeatedly executed. Further, Patent Document 2 discloses a technique for performing partial anonymization processing and determining k-anonymity and l-diversity for the result.
  • Patent Document 3 discloses a technique for anonymizing important information (sensitive attribute) in addition to the quasi-identifier.
  • Non-Patent Document 1 discloses an example of an anonymization method. Specifically, Non-Patent Document 1 discloses an anonymization method based on a top-down approach using a Mondrian method.
  • FIG. 28 shows an example of l-diversification using the Mondrian method.
  • QID1 and QID2 shown in FIG. 28 are quasi-identifiers.
  • SA shown in FIG. 28 is a sensitive attribute.
  • An original view 931 shown in FIG. 28 is a view of certain personal information (not shown) composed of original (unprocessed) attribute values.
  • the view is a view in the database technology, for example, a table obtained by extracting some attributes from the table as shown in FIG.
  • l-diversification is executed by the following procedure using, for example, a certain anonymizing device (not shown).
  • the anonymization device generates the most generalized view 932 by most generalizing the quasi-identifier included in the original view 931 (processing it into an ambiguous value).
  • In the generalized view 932, a single group of records having the same combination of quasi-identifier values (attribute values) is formed. Such a group of records having the same combination of quasi-identifier values is also called an equivalence class.
  • The anonymization device divides the equivalence class and refines the quasi-identifier in units of the divided equivalence classes. Specifically, the anonymization device splits the equivalence class by regrouping its records based on the attribute value of a specific quasi-identifier in the original view 931, with a specific value of that quasi-identifier as the boundary.
  • the anonymization device selects the specific quasi-identifier sequentially from the left column of the view to be divided.
  • the anonymization device sets an average value of attribute values of the specific quasi-identifier included in the equivalence class as the specific value.
  • the anonymization device selects QID1 as the specific quasi-identifier for the generalized view 932 and determines the specific value as “120 (rounds off the decimal point)”.
  • the anonymization device regroups the records included in the generalized view 932 based on the attribute value of QID1 in the original view 931, with this “120” as a boundary.
  • the anonymization device refines the quasi-identifier for each regrouped equivalence class and generates a split view 933.
  • the specific quasi-identifier may be the quasi-identifier having the largest value range.
  • the specific value may be a median value of the attribute values of the specific quasi-identifier.
  • the anonymization device repeats the second process while the divided view generated in the second process described above satisfies a predetermined anonymity.
  • FIG. 28 shows that a split view 934 is generated from the split view 933 and a split view 935 is further generated from the split view 934, with predetermined anonymity as 3-diversity.
  • A notation in which the “3” of 3-diversity is replaced with an arbitrary numerical value likewise means that the “l” of l-diversity is that numerical value.
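  • The top-down procedure described above can be sketched as follows. This is a minimal illustration, not the method of Non-Patent Document 1 itself: the record layout, the function names, the sample values, and the choice to place records equal to the boundary in the upper group are all assumptions made for the example.

```python
import math
from statistics import mean

def is_l_diverse(records, l):
    """True if the records carry at least l distinct sensitive values."""
    return len({r["SA"] for r in records}) >= l

def split_once(records, qid, l):
    """Split one equivalence class at the rounded mean of `qid`.

    Returns (low, high), or None when a half would be empty or not l-diverse.
    Assumption: records equal to the boundary go to the upper half.
    """
    boundary = math.floor(mean(r[qid] for r in records) + 0.5)  # round half up
    low = [r for r in records if r[qid] < boundary]
    high = [r for r in records if r[qid] >= boundary]
    if low and high and is_l_diverse(low, l) and is_l_diverse(high, l):
        return low, high
    return None

def generalize(records, qids):
    """Replace each quasi-identifier by the (min, max) range of its class."""
    ranges = {q: (min(r[q] for r in records), max(r[q] for r in records))
              for q in qids}
    return [{**r, **ranges} for r in records]

def mondrian(records, qids, l):
    """Top-down partitioning; quasi-identifiers are tried from left to right."""
    for qid in qids:
        halves = split_once(records, qid, l)
        if halves:
            low, high = halves
            return mondrian(low, qids, l) + mondrian(high, qids, l)
    return [generalize(records, qids)]

# Hypothetical records (not the values of FIG. 28).
records = [
    {"QID1": 100, "QID2": 1, "SA": "A"},
    {"QID1": 105, "QID2": 2, "SA": "B"},
    {"QID1": 110, "QID2": 3, "SA": "C"},
    {"QID1": 130, "QID2": 4, "SA": "A"},
    {"QID1": 135, "QID2": 5, "SA": "B"},
    {"QID1": 140, "QID2": 6, "SA": "C"},
]
classes = mondrian(records, ["QID1", "QID2"], l=3)
for eq_class in classes:
    print(eq_class)
```

  • In this sample the mean of QID1 is 120, so the first split uses 120 as the boundary; any further split would leave a group with fewer than three distinct sensitive values, so the recursion stops with two equivalence classes.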
  • Non-Patent Document 2 discloses an example of an l-diversity verification method. Specifically, Non-Patent Document 2 discloses a method for verifying l-diversity when a plurality of anonymized views are matched against each other. Hereinafter, this l-diversity is referred to as “l-diversity of multiple views”. A view that has been anonymized is referred to as an “anonymized view”.
  • FIG. 29 is a diagram for explaining the l-diversity of a plurality of views.
  • personal information 941 includes “ID” as an identifier, “QI_1” and “QI_2” as quasi-identifiers, and “SA” as a sensitive attribute.
  • the first view 942 is a view corresponding to the first pattern.
  • The first anonymized view 944 is a view obtained by anonymizing the first view 942 so that it satisfies 2-diversity.
  • The second view 943 is a view corresponding to the second pattern.
  • The second anonymized view 945 is a view obtained by anonymizing the second view 943 so that it satisfies 2-diversity.
  • both the first anonymized view 944 and the second anonymized view 945 satisfy 2-diversity.
  • the attacker can infer from the first anonymized view 944 that the value of “SA” of “user4” is “B” or “C” based on the premise knowledge. Further, the attacker can infer from the second anonymized view 945 that the value of “SA” of “user4” is “A” or “B” based on the premise knowledge. Therefore, the attacker can specify that the value of “SA” of “user4” is “B”. In other words, the l-diversity of the multiple views in this case is 1-diversity.
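  • The inference described above amounts to a set intersection. The short sketch below uses only the candidate values stated in the text; the variable names are illustrative.

```python
# Candidate values of "SA" for "user4" that can be inferred from each view.
candidates_view_944 = {"B", "C"}
candidates_view_945 = {"A", "B"}

# Matching the two views narrows the candidates to their intersection.
remaining = candidates_view_944 & candidates_view_945
effective_l = len(remaining)

print(remaining, effective_l)
```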
  • Non-Patent Document 2 describes the following algorithm as a method for verifying l-diversity of multiple views.
  • the verification method acquires a sensitive attribute corresponding to the target ID in each anonymized view.
  • For each ID, the verification method obtains the product set (intersection) of the acquired sensitive attributes.
  • the l-diversity in the target ID is determined based on whether or not the number of sensitive attributes in the product set is equal to or greater than l (l-diversity “l” to be verified).
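  • A minimal sketch of this verification is given below. It assumes that the verifier knows which equivalence class each ID belongs to in every anonymized view, and that every view covers the same IDs. The data layout (a view as a list of equivalence classes, each a list of (ID, sensitive value) pairs), the function names, and the sample values are assumptions and do not reproduce FIG. 29.

```python
from functools import reduce

def candidates_by_id(view):
    """Map each ID to the set of sensitive values in its equivalence class."""
    result = {}
    for eq_class in view:
        values = {sa for _, sa in eq_class}
        for record_id, _ in eq_class:
            result[record_id] = values
    return result

def multi_view_l(views):
    """For each ID, count the sensitive values left after intersecting all views."""
    per_view = [candidates_by_id(v) for v in views]
    return {
        record_id: len(reduce(set.intersection, (pv[record_id] for pv in per_view)))
        for record_id in per_view[0]
    }

def satisfies_multi_view_l_diversity(views, l):
    """True if every ID keeps at least l candidate sensitive values."""
    return all(count >= l for count in multi_view_l(views).values())

# Hypothetical anonymized views; each one is 2-diverse on its own.
view_a = [[("user1", "A"), ("user2", "B")], [("user3", "C"), ("user4", "B")]]
view_b = [[("user1", "A"), ("user4", "B")], [("user2", "B"), ("user3", "C")]]

per_id = multi_view_l([view_a, view_b])
print(per_id)
print(satisfies_multi_view_l_diversity([view_a, view_b], 2))
```

  • With this sample data, “user4” keeps the candidates {B, C} in one view and {A, B} in the other, so only one candidate remains after matching and the 2-diversity of multiple views is not satisfied.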
  • FIG. 30 is a diagram for explaining an example in which the l-diversity of the two anonymized views 944 and 945 shown in FIG. 29 is verified.
  • The attribute value 951 is an attribute value of the sensitive attribute of each individual that can be inferred from the first anonymized view 944.
  • the attribute value 952 is an attribute value of the sensitive attribute of each individual that can be inferred from the second anonymized view 945.
  • the product set 953 is a product set of the attribute value 951 and the attribute value 952.
  • the number of sensitive attributes included in each product set 953 is the value of “l” of l-diversity of multiple views in each target ID.
  • the technique described in the above-described prior art document has a problem in that the amount of information loss varies among a plurality of anonymized views. In other words, there is a problem that the information loss amount of a certain anonymized view is smaller than necessary, and the information loss amount of another anonymized view may be larger than a required limit.
  • The anonymization techniques disclosed in Patent Documents 2 and 3 in principle perform anonymization (referred to as preceding anonymization) so that the amount of information loss is minimized. Therefore, when anonymization from another viewpoint (that is, for the correspondence with a quasi-identifier different from the one used in the preceding anonymization; referred to as subsequent anonymization) is performed, the anonymity of the anonymized view generated by the preceding anonymization must also be considered. That is, the subsequent anonymization is constrained by the result of the preceding anonymization.
  • An object of the present invention is to provide an information processing apparatus, an anonymization method, and a computer-readable non-transitory recording medium recording the program for solving the above-described problems.
  • An information processing apparatus includes view acquisition means for acquiring a plurality of views of personal information including a plurality of attributes of a plurality of individuals, and anonymizing means for executing anonymization of the plurality of views in parallel and outputting the obtained anonymized views.
  • In an anonymization method, a computer acquires a plurality of views of personal information including a plurality of attributes of a plurality of individuals, executes anonymization of the plurality of views in parallel, and outputs the obtained anonymized views.
  • A computer-readable non-transitory recording medium records a program that causes a computer to execute a process of acquiring a plurality of views of personal information including a plurality of attributes of a plurality of individuals, executing anonymization of the plurality of views in parallel, and outputting the anonymized views obtained in this way.
  • the present invention has an effect that it is possible to reduce variation in the amount of information loss between a plurality of anonymized views.
  • FIG. 1 is a block diagram showing the configuration of the anonymization apparatus according to the first embodiment of the present invention.
  • FIG. 2 shows an example of personal information in the first embodiment.
  • FIG. 3 shows an example of a view in the first embodiment.
  • FIG. 4 shows an example of a view in the first embodiment.
  • FIG. 5 shows an example of a pattern in the first embodiment.
  • FIG. 6 is a block diagram illustrating a hardware configuration of a computer that implements the anonymization device according to the first embodiment.
  • FIG. 7 is a flowchart illustrating the operation of the anonymization device according to the first embodiment.
  • FIG. 8 is a diagram illustrating an example of maximum generalized view generation in the first embodiment.
  • FIG. 9 is a diagram for explaining an example of the refinement stage in the first embodiment.
  • FIG. 10 is a diagram for explaining an example of the refinement stage in the first embodiment.
  • FIG. 11 shows an example of a view in the related art.
  • FIG. 12 shows an example of a view in the related art.
  • FIG. 13 shows an example of anonymization information in the first embodiment.
  • FIG. 14 is a flowchart illustrating the operation of the anonymization device according to the first modification example of the first embodiment.
  • FIG. 15 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the first embodiment.
  • FIG. 16 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the first embodiment.
  • FIG. 17 is a block diagram showing a configuration of the anonymization device according to the second exemplary embodiment of the present invention.
  • FIG. 18 is a diagram for explaining division point candidates in the second embodiment.
  • FIG. 19 is a diagram for explaining division point candidates in the second embodiment.
  • FIG. 20 is a diagram for explaining the sensitive attribute of the equivalent class in the second embodiment.
  • FIG. 21 is a flowchart illustrating the operation of the anonymization device according to the second embodiment.
  • FIG. 22 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the second embodiment.
  • FIG. 23 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the second embodiment.
  • FIG. 24 shows an example of a view in the related art.
  • FIG. 25 is a block diagram showing a configuration of the anonymization device according to the third exemplary embodiment of the present invention.
  • FIG. 26 shows an example of personal information in the related art.
  • FIG. 27 shows an example of anonymization information in the related art.
  • FIG. 28 shows an example of l-diversification using the Mondrian method in the related art.
  • FIG. 29 is a diagram for explaining l-diversity of a plurality of views in the related art.
  • FIG. 30 is a diagram illustrating verification of l-diversity of multiple views in the related art.
  • FIG. 1 is a block diagram showing the configuration of the anonymization device 100 according to the first embodiment of the present invention.
  • the anonymization device 100 includes a view acquisition unit 110 and an anonymization unit 120.
  • the constituent elements shown in FIG. 1 may be constituent elements in hardware units or constituent elements divided into functional units of the computer apparatus.
  • the components shown in FIG. 1 will be described as components divided into functional units of the computer apparatus.
  • FIG. 2 is a diagram illustrating an example of the personal information 810.
  • the personal information 810 includes an identifier 815 and a plurality of attributes for each of a plurality of individuals.
  • the plurality of attributes are a quasi-identifier 816, a quasi-identifier 817, and a sensitive attribute 818. That is, one record of the personal information 810 includes an identifier of a certain individual and a plurality of attributes.
  • FIG. 3 and FIG. 4 each show an example of a view composed of a combination of arbitrary attributes in the personal information 810.
  • Hereinafter, the views acquired from arbitrary personal information (for example, a view 821 and a view 822 described later) are collectively referred to as acquired views.
  • FIG. 3 is a diagram showing a view 821 including a combination of the quasi-identifier 816 and the sensitive attribute 818 in the personal information 810.
  • FIG. 4 is a diagram showing a view 822 that is a combination of the quasi-identifier 817 and the sensitive attribute 818 in the personal information 810.
  • The pattern is information indicating which group of attributes is to be analyzed, for example for correlation.
  • the pattern is represented, for example, by information (for example, attribute name) that specifies a quasi-identifier or a sensitive attribute.
  • the quasi-identifier and its sensitive attribute may be plural.
  • FIG. 5 is a diagram illustrating an example of the pattern 804.
  • The pattern 804 includes a pair of attribute names “QI_1” and “SA” and a pair of attribute names “QI_2” and “SA”.
  • The pattern 804 illustrated in FIG. 5 is information instructing that a view 821 including the set of attributes “QI_1” and “SA” and a view 822 including the set of attributes “QI_2” and “SA” be acquired.
  • The pattern 804 shown in FIG. 5 is also expressed as “{QI_1, SA} and {QI_2, SA}”.
  • the view acquisition unit 110 acquires the view 821 shown in FIG. 3 and the view 822 shown in FIG. 4 from the personal information 810 based on the pattern 804 shown in FIG.
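  • View acquisition based on a pattern can be pictured as a projection of each record onto the named attributes. The following sketch is illustrative only: the function name `acquire_views`, the list-of-dictionaries layout, and the sample records are assumptions and are not the contents of FIG. 2.

```python
def acquire_views(personal_info, pattern):
    """Return one view per attribute group in the pattern.

    Each view keeps only the named attributes of every record, in the
    original record order.
    """
    return [
        [{attr: record[attr] for attr in attrs} for record in personal_info]
        for attrs in pattern
    ]

# Hypothetical personal information with an identifier, two quasi-identifiers
# and one sensitive attribute.
personal_info = [
    {"ID": "user1", "QI_1": 10, "QI_2": 100, "SA": "A"},
    {"ID": "user2", "QI_1": 20, "QI_2": 200, "SA": "B"},
]

# The pattern {QI_1, SA} and {QI_2, SA}.
pattern = [("QI_1", "SA"), ("QI_2", "SA")]

view_821, view_822 = acquire_views(personal_info, pattern)
print(view_821)
print(view_822)
```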
  • Input of personal information 810 and input of pattern 804 are performed as follows.
  • When inputting the personal information 810, the view acquisition unit 110 reads the personal information 810 stored in advance in a storage unit (not shown) of the anonymization device 100. Alternatively, the view acquisition unit 110 may receive the personal information 810 from the outside via a communication unit (not shown) of the anonymization device 100.
  • the view acquisition unit 110 accepts the input of the pattern 804 given by the operator via an input unit (not shown) of the anonymization device 100. Further, the view acquisition unit 110 may read a pattern 804 stored in advance in a storage unit (not shown) of the anonymization device 100.
  • The term “anonymized view” is used as a generic name for the anonymized views that are output.
  • the anonymization unit 120 outputs anonymized views corresponding to the views 821 and 822 obtained by executing the anonymization of the view 821 shown in FIG. 3 and the view 822 shown in FIG. 4 in parallel.
  • the anonymization unit 120 executes the anonymization using, for example, a Mondrian method.
  • The Mondrian method is a technique that first places the anonymization target information (for example, each of the view 821 and the view 822) in a single equivalence class (group) in a state of maximum generalization, and then repeatedly divides an equivalence class to generate new equivalence classes.
  • the “single equivalence class in the state of maximum generalization” is an equivalence class having anonymization target information (for example, each of the view 821 and the view 822) as one group.
  • the new equivalence class is an equivalence class in which the quasi-identifier is refined in units of new groups generated by the division.
  • a combination of one division and detailing is referred to as a detailing stage.
  • a state in which all records are included in a single equivalence class to form one group is a state in which all quasi-identifiers included in the anonymization information are most generalized.
  • “Refining the quasi-identifier in units of new groups” means processing the quasi-identifier values included in each new group into the range from the minimum value to the maximum value of the original values of those quasi-identifiers, so that the group becomes a new equivalence class.
  • The anonymization unit 120 executes the refinement stages of the plurality of acquired views in parallel. More specifically, the anonymization unit 120 executes the first refinement stage of the view 821, and then executes the first refinement stage of the view 822. Next, the anonymization unit 120 executes the second refinement stage of the view 821, and then executes the second refinement stage of the view 822. The anonymization unit 120 executes the third and subsequent refinement stages in the same manner. Likewise, when there are three or more acquired views, the anonymization unit 120 executes the first refinement stage of each acquired view and then the second refinement stage of each acquired view.
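  • The execution order described above, in which every acquired view completes a given refinement stage before any view starts the next one, can be sketched as a simple schedule. The generator name and the string labels are assumptions for illustration; the refinement itself is omitted.

```python
def interleaved_schedule(view_names, num_stages):
    """Yield (stage, view) pairs: stage 1 of every view, then stage 2, and so on."""
    for stage in range(1, num_stages + 1):
        for name in view_names:
            yield stage, name

order = list(interleaved_schedule(["view 821", "view 822"], 2))
for stage, name in order:
    print(f"refinement stage {stage} of {name}")
```

  • The same loop covers three or more acquired views, because the inner loop simply visits every view once per stage.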
  • FIG. 6 is a diagram illustrating a hardware configuration of a computer 700 that realizes the anonymization apparatus 100 according to the present embodiment.
  • the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. Furthermore, the computer 700 includes a recording medium (or storage medium) 707 supplied from the outside.
  • the recording medium 707 may be a non-volatile recording medium that stores information non-temporarily.
  • the CPU 701 controls the overall operation of the computer 700 by operating an operating system (not shown).
  • the CPU 701 reads a program and data from a recording medium 707 mounted on the storage device 703, for example, and writes the read program and data to the storage unit 702.
  • the program is, for example, a program that causes the computer 700 to execute an operation of a flowchart shown in FIG.
  • the CPU 701 executes various processes as the view acquisition unit 110 and the anonymization unit 120 shown in FIG. 1 according to the read program and based on the read data.
  • the CPU 701 may download a program or data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).
  • the storage unit 702 stores programs and data.
  • the storage unit 702 may store personal information 810, a pattern 804, each acquired view (for example, the view 821 and the view 822), and the like.
  • The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes a recording medium 707.
  • the storage device 703 (recording medium 707) stores the program in a computer-readable manner.
  • the storage device 703 may store data.
  • the storage device 703 may store personal information 810, patterns 804, acquired views, and the like.
  • the input unit 704 is realized by, for example, a mouse, a keyboard, a built-in key button, and the like, and is used for an input operation.
  • the input unit 704 is not limited to a mouse, a keyboard, and a built-in key button, and may be a touch panel, for example.
  • the anonymization device 100 may acquire the personal information 810 and the pattern 804 input via the input unit 704.
  • the output unit 705 is realized by a display, for example, and is used for confirming the output.
  • the anonymization device 100 may output the anonymized view via the output unit 705.
  • the communication unit 706 realizes an interface with the outside.
  • the communication unit 706 is included as part of the view acquisition unit 110 and the anonymization unit 120.
  • The anonymization device 100 may acquire the personal information 810 and the pattern 804 from the outside via the communication unit 706. Further, the anonymization device 100 may output the anonymized view to the outside via the communication unit 706.
  • the functional unit block of the anonymization device 100 shown in FIG. 1 is realized by the computer 700 having the hardware configuration shown in FIG.
  • the means for realizing each unit included in the computer 700 is not limited to the above.
  • The computer 700 may be realized by one physically integrated device, or by two or more physically separate devices connected to each other by wire or wirelessly.
  • the recording medium 707 in which the above-described program code is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored in the recording medium 707.
  • the CPU 701 may store the code of the program stored in the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of a recording medium 707 that stores a program (software) executed by the computer 700 (CPU 701) temporarily or non-temporarily.
  • a storage medium that stores information non-temporarily is also referred to as a non-volatile storage medium.
  • FIG. 7 is a flowchart showing the operation of this embodiment. Note that the processing according to this flowchart may be executed based on the program control by the CPU 701 described above. Further, the step name of the process is described by a symbol as in S601.
  • the view acquisition unit 110 acquires the personal information 810 and the pattern 804 (S601).
  • the view acquisition unit 110 acquires the view 821 and the view 822 from the personal information 810 based on each of the patterns 804 (S602).
  • The anonymization unit 120 generates a view 831 and a view 832 in which the quasi-identifiers (“QI_1” and “QI_2”) included in the view 821 and the view 822, respectively, are most generalized (S603).
  • the process for generating the view 831 and the view 832 may be a process for updating the view 821 and the view 822 to the view 831 and the view 832, respectively.
  • the process for generating each stage anonymized view in the following description may be a process for updating the stage anonymized view one stage before corresponding to each stage anonymized view.
  • the stage anonymized view is a general term for a view 831, a view 832, a view 841 (described later), a view 842 (described later), a view 851 (described later), and a view 852 (described later).
  • FIG. 8 shows an image in which the anonymization unit 120 generates the view 831 by most abstracting (maximum generalization) the quasi-identifier “QI_1” included in the view 821 shown in FIG.
  • FIG. 8 illustrates an image in which the anonymization unit 120 generates the view 832 by most abstracting the quasi-identifier “QI_2” included in the view 822 illustrated in FIG.
  • The anonymization unit 120 executes the refinement stages of the view 831 and the view 832 in parallel (S604).
  • FIG. 9 shows an image in which the anonymization unit 120 executes the first refinement stage. As illustrated in FIG. 9, the anonymization unit 120 executes the first refinement stage on the view 831 to generate the view 841. Further, as illustrated in FIG. 9, the anonymization unit 120 executes the first refinement stage on the view 832 to generate a view 842.
  • FIG. 10 shows an image in which the anonymization unit 120 executes the second refinement stage. As illustrated in FIG. 10, the anonymization unit 120 executes the second refinement stage on the view 841 to generate a view 851. Also, as illustrated in FIG. 10, the anonymization unit 120 executes the second refinement stage on the view 842 to generate a view 852.
  • the anonymization unit 120 outputs anonymized views (here, view 841 and view 842) that satisfy the required anonymity among the generated views (S605).
  • the view 851 and the view 852 are not output because they do not satisfy the 2-diversity of a plurality of views when they are matched.
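  • One way to picture the selection in S605 is to walk through the stages in order and keep the last set of views that still satisfies the required anonymity. The sketch below is an assumption about how such a selection could be written, not the procedure of the embodiment: the function name, the tuple labels, and the stand-in predicate are hypothetical, and the predicate only encodes the outcome stated in the text (the views 851 and 852 fail when matched).

```python
def select_output(stage_history, satisfies):
    """Return the views of the last stage that satisfies the required anonymity."""
    chosen = None
    for stage_views in stage_history:
        if not satisfies(stage_views):
            break
        chosen = stage_views
    return chosen

# Views generated at each stage, identified by their reference numerals.
stage_history = [("831", "832"), ("841", "842"), ("851", "852")]

def satisfies_required_anonymity(stage_views):
    # Stand-in for the real check of l-diversity of multiple views.
    return stage_views != ("851", "852")

chosen = select_output(stage_history, satisfies_required_anonymity)
print(chosen)
```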
  • the anonymization unit 120 may skip the division and refinement in the refinement stage.
  • The anonymization unit 120 may perform the division and refinement of the refinement stage only for a stage anonymized view that includes an equivalence class that can be split.
  • “When there is no equivalence class that can be split” refers to the case where, if a stage anonymized view were split and refined, that stage anonymized view could no longer satisfy the anonymity that it should satisfy on its own.
  • the anonymization unit 120 ends the process of S604. Note that the anonymization unit 120 may end the process of S604 after executing a predetermined number of refinement stages.
  • This related technique is a method that simply combines the Mondrian method disclosed in Non-Patent Document 1 with the “l-diversity verification method when matching multiple anonymized views” disclosed in Non-Patent Document 2.
  • the process of anonymizing two acquired views by this method is simply a process of first anonymizing the first view and then anonymizing the second view.
  • the anonymization apparatus 100 of this embodiment anonymizes a plurality of acquired views in parallel.
  • the anonymization target information is assumed to be a view 821 and a view 822.
  • The required anonymity is 2-anonymity and 2-diversity of multiple views.
  • the view 821 is first anonymized and the result is output.
  • the view 822 is anonymized and the result is output.
  • FIG. 11 is a diagram showing anonymization of the view 821 by the related technology. As shown in FIG. 11, in the related art, the view 821 is anonymized, and the view 851 having the smallest amount of information loss while satisfying 2-diversity is output.
  • FIG. 12 is a diagram showing anonymization of the view 822 by the related technology.
  • the view 822 is anonymized.
  • the view 842 and the view 852 do not satisfy the 2-diversity of the plurality of views for the record of “user4” when matched with the view 851. Therefore, the view 832 is output as the anonymization result of the view 822.
  • the view 851 and the view 832 are output.
  • the amount of information loss in this embodiment will be described.
  • the information loss amount of a given quasi-identifier is defined as the sum, over all records, of the generalization width of that quasi-identifier.
  • the total information loss amount is defined as the sum of the information loss amounts of the quasi-identifiers included in the anonymized information.
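The two definitions above can be written out as a short Python sketch. The generalization widths below are hypothetical illustration values (they are not taken from any figure), and the function names are assumptions introduced here.

```python
from typing import Dict, List


def information_loss(widths: List[int]) -> int:
    """Information loss of one quasi-identifier: sum of its per-record generalization widths."""
    return sum(widths)


def total_information_loss(widths_by_qi: Dict[str, List[int]]) -> int:
    """Total information loss: sum of the losses of all quasi-identifiers."""
    return sum(information_loss(w) for w in widths_by_qi.values())


# Hypothetical generalization widths for four records (illustration only).
example = {
    "ZIP code": [1, 1, 1, 1],
    "age": [2, 2, 3, 3],
}

per_qi = {name: information_loss(w) for name, w in example.items()}
total = total_information_loss(example)
print(per_qi, total)
```

Here the per-record widths are taken as given inputs; how a width is derived from a generalized value is outside the scope of this sketch.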
  • FIG. 13 shows anonymization information 921.
  • the total information loss amount of the anonymized information 921 is obtained as follows.
  • the information loss amount of the quasi-identifier “ZIP code” is “8”, which is the sum of the generalization widths of all records (all “1”).
  • the information loss amount of “age” of the quasi-identifier is the generalization width of each record (in order from the top row, “2”, “2”, “3”, “3”, “7”, “7”, “2” and “2”) are totaled to be “28”.
  • the total information loss amount of the anonymized information 921 is “68”, which is the total information loss amount of each quasi-identifier.
  • Each of the view 841 and the view 842 output by the present embodiment is divided once.
  • the total information loss amounts of the view 841 and the view 842 are “25” and “43”, respectively.
  • the view 851 is divided twice, whereas the view 832 is never divided.
  • the total information loss amounts of the view 851 and the view 832 are “17” and “91”, respectively.
  • the analyst can expect that the analysis using each of the view 841 and the view 842 output according to the present embodiment maintains a certain accuracy.
  • the analyst can expect high accuracy in the analysis using the view 851 output by the related technology, but cannot expect practical accuracy in the analysis using the view 832 that is also output. In other words, the analyst cannot obtain an analysis result having a certain accuracy with a plurality of anonymized views output by the related technology.
  • the first effect in the present embodiment described above is that it is possible to reduce the variation in the total information loss amount among a plurality of anonymized views.
  • the second effect of the present embodiment described above is that an analysis result having a certain accuracy can be obtained for each of a plurality of anonymized views.
  • the reason is that the view acquisition unit 110 acquires a plurality of acquisition views, and the anonymization unit 120 executes anonymization of the plurality of views in parallel.
  • the anonymization unit 120 executes the refinement stage on each of the first-stage anonymized views and thereby generates the second-stage anonymized views. It then determines whether those second-stage anonymized views satisfy the l-diversity of the plurality of views.
  • When the anonymization unit 120 determines that the l-diversity of the plurality of views is satisfied, it treats the second-stage anonymized views as the new first-stage anonymized views and executes the next refinement stage. When the anonymization unit 120 determines that the l-diversity of the plurality of views is not satisfied, it outputs the first-stage anonymized views, that is, the views as they were before that refinement stage.
  • By repeating the refinement stage in this way, the anonymization unit 120 outputs, as the anonymized views, the views that satisfy the l-diversity of the plurality of views and have the smallest total information loss amount.
  • FIG. 14 is a flowchart showing the operation of this modification.
  • the view acquisition unit 110 acquires the personal information 810 and the pattern 804 (S601).
  • the view acquisition unit 110 acquires an acquisition view (for example, the view 821 and the view 822) from the personal information 810 based on each of the patterns 804 (S602).
  • the anonymization unit 120 generates stage anonymized views (for example, the view 831 and the view 832) in which the quasi-identifiers included in each of the acquired views are generalized to the maximum extent (S603).
  • the anonymization unit 120 executes the refinement stage for each stage anonymization view (S606).
  • For example, the anonymization unit 120 updates the view 831 and the view 832 (the first-stage anonymized views) to the view 841 and the view 842 (the second-stage anonymized views, refined by one stage).
  • the anonymization unit 120 determines whether or not l-diversity of a plurality of views is satisfied (S607).
  • FIG. 15 is a diagram for explaining an example of the “detailing stage” in S606 for the first time and the “determination” in S607.
  • the anonymization unit 120 generates a view 841 and a view 842 obtained by refining each of the view 831 and the view 832 by one stage at the detailing stage. Further, the anonymization unit 120 extracts all specifiable sensitive attribute values for each record corresponding to each individual when the view 841 and the view 842 are matched.
  • the sensitive attribute value is an attribute value of “SA” of the sensitive attribute.
  • the confirmation content 843 shows the values of the sensitive attribute “SA” that can be inferred from the view 841, the values of “SA” that can be inferred from the view 842, and the intersection of the two.
  • the intersection of the values of the sensitive attribute “SA” that can be inferred from the view 841 and those that can be inferred from the view 842 is the set of “SA” values that can be inferred when the two views are matched.
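The intersection-based check described above can be sketched in Python as follows. This is a minimal illustration, assuming that each view is summarized as a mapping from an individual to the set of “SA” values inferable from that view, and that the criterion is "at least l distinct values remain after matching". The data, the type alias, and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Assumed representation: individual -> "SA" values inferable from one view.
View = Dict[str, Set[str]]


def inferable_after_matching(views: List[View], person: str) -> Set[str]:
    """Values of "SA" still possible for one person once all views are matched."""
    return set.intersection(*(view[person] for view in views))


def satisfies_multi_view_l_diversity(views: List[View], l: int) -> bool:
    """True if every person keeps at least l candidate values after matching."""
    people = views[0].keys()
    return all(len(inferable_after_matching(views, p)) >= l for p in people)


# Hypothetical data (not taken from the figures).
view_a: View = {"user1": {"A", "B"}, "user2": {"A", "B"}}
view_b: View = {"user1": {"A", "B", "C"}, "user2": {"B", "C"}}

print(inferable_after_matching([view_a, view_b], "user2"))
print(satisfies_multi_view_l_diversity([view_a, view_b], 2))
```

In this toy data, "user2" is left with a single candidate value after matching, so the pair of views fails the check for l = 2 even though each view on its own offers two candidates.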
  • If the l-diversity of the plurality of views is satisfied (YES in S607), the anonymization unit 120 determines whether or not a division point candidate exists in any stage anonymized view (S608). If a division point candidate exists (YES in S608), the process returns to S606.
  • FIG. 16 is a diagram for explaining an example of the second execution of the refinement stage in S606, after the process has returned to S606, and of the determination in S607 that follows it.
  • the anonymization unit 120 generates a view 851 and a view 852 by refining each of the view 841 and the view 842 by one stage in the refinement stage.
  • the anonymization unit 120 extracts all specifiable sensitive attribute values for each record corresponding to each individual when the view 851 and the view 852 are matched.
  • the confirmation content 853 shows the values of the sensitive attribute “SA” that can be inferred from the view 851, the values of “SA” that can be inferred from the view 852, and the intersection of the two.
  • the intersection of the values of the sensitive attribute “SA” that can be inferred from the view 851 and those that can be inferred from the view 852 is the set of “SA” values that can be inferred when the two views are matched.
  • If the l-diversity of the plurality of views is not satisfied (NO in S607), the anonymization unit 120 returns the second-stage anonymized views generated in the refinement stage to the first-stage anonymized views. Subsequently, the anonymization unit 120 outputs the restored first-stage anonymized views as the anonymized views (S609). Here, the anonymization unit 120 returns the view 851 and the view 852 to the view 841 and the view 842, respectively, and then outputs the view 841 and the view 842.
  • If no division point candidate exists (NO in S608), the anonymization unit 120 outputs the second-stage anonymized views generated in the refinement stage as the anonymized views (S610).
  • This modification can output an anonymized view that satisfies the required anonymity and has the least amount of overall information loss.
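The control flow of S606 to S610 can be sketched as a small loop. This is only an illustration under stated assumptions: the callables `refine`, `has_candidate`, and `satisfies` are hypothetical stand-ins for the refinement stage, the division point candidate check, and the l-diversity determination, and the toy model below represents each view merely by how many times it has been divided.

```python
from typing import Callable, List, TypeVar

V = TypeVar("V")


def refine_in_parallel(
    views: List[V],
    refine: Callable[[V], V],
    has_candidate: Callable[[V], bool],
    satisfies: Callable[[List[V]], bool],
) -> List[V]:
    """Refine all views one stage at a time and roll back the last stage if it fails."""
    current = list(views)
    while True:
        # S606: refine every view that still has a division point candidate.
        refined = [refine(v) if has_candidate(v) else v for v in current]
        # S607: check the required anonymity across all refined views.
        if not satisfies(refined):
            return current  # S609: discard this stage and output the previous views
        current = refined
        # S608: stop when no view can be divided any further.
        if not any(has_candidate(v) for v in current):
            return current  # S610


# Toy model (hypothetical): a view is just its number of divisions so far.
def toy_refine(level: int) -> int:
    return level + 1


def toy_has_candidate(level: int) -> bool:
    return level < 3


result = refine_in_parallel(
    [0, 0], toy_refine, toy_has_candidate, satisfies=lambda vs: sum(vs) <= 2
)
print(result)
```

In the toy run, the first stage passes the check, the second stage fails it, and the views from the first stage are output, which mirrors the sequence described for the views 831/832, 841/842, and 851/852.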
  • the anonymization unit 120 executes one detailing step for each of the plurality of step anonymized views, and then confirms “l-diversity of multiple views”. For example, the anonymization unit 120 executes the refinement stage on the view 831 and then executes the refinement stage on the view 832. Next, the anonymization unit 120 confirms “l-diversity of multiple views” for the view 841 and the view 842.
  • the anonymization unit 120 refines the view 831 and updates it to the view 841.
  • the anonymization unit 120 confirms “l-diversity of multiple views” for the view 841 and the view 832 that has not been detailed yet.
  • the anonymization unit 120 determines whether or not the required anonymity is satisfied following the refinement stage for one stage anonymization view.
  • This modified example can reduce the process of returning the second stage stage anonymized view generated by the refinement stage to the first stage stage anonymized view in S609 of FIG.
  • the anonymization unit 120 sequentially executes the refinement stages of the stage anonymization view based on the priority corresponding to each of the acquisition views (for example, the view 821 and the view 822) to be refined. For example, the anonymization unit 120 executes the refinement step by giving priority to the step anonymization view having a relatively high priority.
  • the anonymization unit 120 receives the priority input by the operator via the input unit 704 shown in FIG.
  • the anonymization unit 120 may read the priority stored in advance in the storage unit 702 or the storage device 703 shown in FIG. Alternatively, the anonymization unit 120 may receive the priority from a device (not shown) via the communication unit 706 shown in FIG. Further, the anonymization unit 120 may read out the priority recorded in the recording medium 707 via the storage device 703 shown in FIG.
  • the anonymization unit 120 executes the refinement stage in order, starting from the stage anonymized view corresponding to the acquired view with the relatively highest priority. Further, the anonymization unit 120 may calculate the priority based on the combination of the patterns 804. For example, based on the rate at which each quasi-identifier is included in the plurality of patterns 804, the anonymization unit 120 calculates a higher priority for a pattern 804 whose quasi-identifiers are included at a lower rate.
  • the anonymization unit 120 may determine that “QI_2” and “QI_3” are more important than “QI_1” because each of them is included in only one pattern 804.
  • the anonymization unit 120 may sequentially execute the refinement stage of the stage anonymized views based on both the priority and the total information loss amount. For example, the anonymization unit 120 may give precedence, when executing the refinement stage, to the stage anonymized view for which the product of its priority and its total information loss amount is larger.
  • For example, when the priority of the first acquired view is “2” and the priority of the second acquired view is “1”, the anonymization unit 120 calculates the value obtained by multiplying the information loss amount of the first acquired view by two and the value obtained by multiplying the information loss amount of the second acquired view by one. Then, the anonymization unit 120 executes the refinement stage with precedence given to the stage anonymized view corresponding to the acquired view having the larger value.
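The selection by priority multiplied by information loss can be illustrated as follows. The priorities “2” and “1” come from the example above; the information loss amounts and the function name are hypothetical values chosen only for illustration.

```python
from typing import Dict


def next_view_to_refine(priority: Dict[str, float], loss: Dict[str, float]) -> str:
    """Pick the view whose priority multiplied by information loss is largest."""
    return max(priority, key=lambda name: priority[name] * loss[name])


priority = {"first": 2, "second": 1}   # priorities from the example in the text
loss = {"first": 30, "second": 50}     # hypothetical information loss amounts

# Weighted values: first = 2 * 30 = 60, second = 1 * 50 = 50.
chosen = next_view_to_refine(priority, loss)
print(chosen)
```

Although the second view loses more information in absolute terms, the weighting by priority makes the first view the one that is refined first.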
  • this modification has an effect that the information loss amount of a specific view 850 can be relatively reduced based on the priority.
  • this modification calculates the priority based on the combination of the patterns 804, and therefore has the effect of making it possible to relatively reduce the information loss amount of the specific view 850 that corresponds to the quasi-identifiers included in the acquired view.
  • this modification has an effect that the information loss amount of the specific view 850 can be relatively reduced based on the value calculated from the priority and the information loss amount.
  • FIG. 17 is a block diagram showing a configuration of the anonymization apparatus 200 according to the second embodiment of the present invention.
  • the anonymization device 200 further includes a similarity calculation unit 230 as compared with the anonymization device 100 according to the first embodiment. Further, the anonymization device 200 includes an anonymization unit 220 instead of the anonymization unit 120.
  • the equivalence class is an equivalence class when it is divided at each of a plurality of division point candidates for the stage anonymized view before execution of a certain refinement stage.
  • the combination of the types of the extracted sensitive attribute values is referred to as an “SA combination”.
  • the similarity calculation unit 230 may extract the sensitive attribute value only for the stage anonymization view having the division point candidates.
  • the similarity calculation unit 230 calculates the similarity corresponding to each of the plurality of division point candidates based on the SA combination.
  • the degree of similarity is the degree of similarity between the respective stage anonymized views when the detailing stage is executed with the division point candidate.
  • the similarity calculation unit 230 calculates the similarity based on the edit distance between the SA combinations of the respective stage anonymized views for each record of the personal information 810.
  • the edit distance between SA combinations expresses the distance between two SA combinations, each containing an arbitrary number of sensitive attribute values, as the number of edits (deletions or additions) needed to turn one SA combination into the other.
  • the similarity calculation unit 230 extracts, as division point candidates of each stage anonymized view, the division points at which that stage anonymized view by itself still satisfies the required l-diversity even after the refinement stage is executed there.
  • FIG. 18 shows split point candidates of the view 831 extracted by the similarity calculation unit 230. Further, FIG. 19 shows the dividing point candidates of the view 832 extracted by the similarity calculation unit 230.
  • the similarity calculation unit 230 extracts the SA combination for each division point candidate of each stage anonymized view.
  • FIG. 20 shows the SA combinations corresponding to the division point candidates “a1”, “a2”, “a3”, and “a4” of the view 831 and to the division point candidates “b1”, “b2”, “b3”, and “b4” of the view 832.
  • the similarity calculation unit 230 calculates the similarity between the stage anonymized views (here, the view 831 and the view 832) for each combination of division point candidates taken from the respective stage anonymized views (hereinafter referred to as a “candidate combination”).
  • the SA combination corresponding to the view 831 is “{A, B, C}”, the SA combination corresponding to the view 832 is “{A, B}”, and the edit distance is “1”.
  • the SA combinations match, so the edit distance is “0”, and the sum is “1”, so the similarity is “−1”. Because the similarity is higher as the edit distance is smaller, the similarity is obtained by inverting the sign of the total edit distance.
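The edit distance and the sign-inverted similarity can be sketched in Python. Treating an SA combination as a set, the number of deletions and additions needed to turn one set into the other equals the size of their symmetric difference; that equivalence, the function names, and the use of only two records are assumptions made for this illustration.

```python
from typing import Iterable, Set, Tuple


def edit_distance(a: Set[str], b: Set[str]) -> int:
    """Number of deletions or additions needed to turn SA combination a into b."""
    return len(a ^ b)  # symmetric difference: values present in exactly one set


def similarity(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> int:
    """Negated sum of per-record edit distances, so a smaller distance scores higher."""
    return -sum(edit_distance(a, b) for a, b in pairs)


# One record whose SA combinations differ by one value, one record where they match.
pairs = [
    ({"A", "B", "C"}, {"A", "B"}),
    ({"A", "B"}, {"A", "B"}),
]

print(edit_distance({"A", "B", "C"}, {"A", "B"}))
print(similarity(pairs))
```

The first pair reproduces the edit distance of “1” from the example above, and the total over the two records gives a similarity of −1.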
  • the similarity calculation unit 230 calculates the similarity for all candidate combinations. In this case, each candidate combination of “a1” and “b1”, “a2” and “b2”, “a3” and “b3”, and “a4” and “b4” has the highest similarity.
  • Alternatively, the similarity calculation unit 230 may determine the similarity from the number of elements in the intersection (product set) of the SA combinations between the stage anonymized views for each record of the personal information 810, treating a larger number as a higher similarity.
  • the anonymization unit 220 determines the division point candidates included in the candidate combination having the highest similarity as the division points. When there are a plurality of candidate combinations having the highest similarity, the anonymization unit 220 selects, for example, a candidate combination including a division point candidate close to the average value. Alternatively, the anonymization unit 220 may select a candidate combination including a division point candidate close to the median.
  • the anonymization unit 220 selects (determines) “a3” and “b3”.
  • the anonymization unit 220 executes the refinement step at the determined division point.
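Choosing the division points from the candidate combination with the highest similarity can be sketched as an exhaustive search over all pairs of candidates. The sketch below handles two views only, uses hypothetical SA combinations (not those of FIG. 20), and does not implement the tie-breaking by average or median described above; ties are resolved by whichever combination is enumerated first.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Assumed representation: one SA combination per record, in a fixed record order.
SACombos = List[Set[str]]


def similarity(a: SACombos, b: SACombos) -> int:
    """Negated total edit distance between per-record SA combinations."""
    return -sum(len(x ^ y) for x, y in zip(a, b))


def best_candidate_combination(
    cands_a: Dict[str, SACombos], cands_b: Dict[str, SACombos]
) -> Tuple[str, str]:
    """Return the pair of division point candidates with the highest similarity."""
    return max(
        product(cands_a, cands_b),
        key=lambda pair: similarity(cands_a[pair[0]], cands_b[pair[1]]),
    )


# Hypothetical SA combinations for two records per candidate.
cands_a = {
    "a1": [{"A", "B"}, {"C", "D"}],
    "a2": [{"A", "B", "C"}, {"A", "B", "C"}],
}
cands_b = {
    "b1": [{"A", "B"}, {"C"}],
    "b2": [{"A", "B", "C"}, {"A", "B", "C"}],
}

best = best_candidate_combination(cands_a, cands_b)
print(best)
```

The number of candidate combinations grows with the product of the candidate counts of the views, which is why limiting the candidates considered can matter for the processing load.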
  • FIG. 21 is a flowchart showing the operation of the present embodiment.
  • the view acquisition unit 110 acquires the personal information 810 and the pattern 804 (S601).
  • the view acquisition unit 110 acquires the view 821 and the view 822 from the personal information 810 based on each of the patterns 804 (S602).
  • the anonymization unit 220 generates a view 831 and a view 832 that are the most generalized quasi-identifiers included in each of the view 821 and the view 822 (S603).
  • the similarity calculation unit 230 calculates the similarity of each of the stage anonymized views (view 831 and view 832) (S624).
  • the anonymization unit 220 determines a division point based on the similarity (S625). For example, the anonymization unit 220 determines the division point candidates “a3” and “b3” as the division points in the first process of S625.
  • the anonymization unit 220 executes the refinement step for each step anonymization view at the determined division point (S606).
  • the anonymization unit 220 determines whether or not the l-diversity of the plurality of views is satisfied when the stage anonymized views on which the refinement stage has been executed are matched and analyzed (S607).
  • FIG. 22 is a diagram for explaining the first operation of S606 and S607.
  • the anonymization unit 220 executes the refinement stage on the view 831 and the view 832 at the division points “a3” and “b3”, and generates the view 871 and the view 872.
  • the anonymization unit 220 extracts a guessable sensitive attribute value for each record corresponding to each person when the view 871 and the view 872 are matched.
  • the confirmation content 873 shows the values of the sensitive attribute “SA” that can be inferred from the view 871, the values of “SA” that can be inferred from the view 872, and the intersection of the two.
  • the intersection of the values of the sensitive attribute “SA” that can be inferred from the view 871 and those that can be inferred from the view 872 is the set of “SA” values that can be inferred when the two views are matched.
  • the anonymization unit 220 determines whether there is a division point candidate in any stage anonymized view (S608). If a division point candidate exists (YES in S608), the process returns to S624.
  • the similarity calculation unit 230 calculates the similarity of each of the stage anonymized views (view 871 and view 872) (S624).
  • the anonymization unit 220 determines a division point based on the similarity (S625).
  • the anonymization unit 220 executes the refinement step for each step anonymization view at the determined division point (S606).
  • the anonymization unit 220 determines whether or not the l-diversity of the plurality of views is satisfied when the stage anonymized views on which the refinement stage has been executed are matched and analyzed (S607).
  • FIG. 23 is a diagram for explaining the second operation of S606 and S607.
  • the anonymization unit 220 performs the refinement stage on the view 871 and the view 872 at the division points determined in the second execution of S625, and generates the view 881 and the view 882.
  • the anonymization unit 220 extracts a guessable sensitive attribute value for each record corresponding to each person when the view 881 and the view 882 are matched.
  • the confirmation content 883 shows the values of the sensitive attribute “SA” that can be inferred from the view 881, the values of “SA” that can be inferred from the view 882, and the intersection of the two.
  • the intersection of the values of the sensitive attribute “SA” that can be inferred from the view 881 and those that can be inferred from the view 882 is the set of “SA” values that can be inferred when the two views are matched.
  • the anonymization unit 220 determines whether there is a division point candidate in any stage anonymized view (S608).
  • If no division point candidate exists (NO in S608), the anonymization unit 220 outputs, as the anonymized views, the stage anonymized views (the second-stage anonymized views) on which the refinement stage has been executed (S610). Here, the anonymization unit 220 outputs the view 881 and the view 882.
  • If the l-diversity of the plurality of views is not satisfied (NO in S607), the anonymization unit 220 returns each stage anonymized view on which the refinement stage has been executed to the stage anonymized view before the execution of that refinement stage (the first-stage anonymized view).
  • the anonymization unit 220 then outputs those anonymized views (S609).
  • the anonymization view output from the anonymization device 200 according to the present embodiment based on the personal information 810 and the pattern 804 has a smaller amount of information loss than the anonymization view output from the anonymization device 100 according to the first embodiment.
  • each of the view 881 and the view 882 has one more division than the view 841 and the view 842.
  • SA combinations sensitive attribute value candidates
  • In the anonymization device 100, the division point used in the refinement stage is not necessarily the optimum division point, and it may be a division point at which the intersection (product set) of the SA combinations obtained when the stage anonymized views are matched becomes relatively small. Therefore, the anonymization device 100 can become unable to satisfy the l-diversity of the plurality of views partway through repeating the refinement stage. As a result, compared with the anonymization device 200, the anonymization device 100 performs fewer divisions and incurs relatively larger information loss.
  • the anonymization device 200 executes the refinement stage so that the SA combinations that can be inferred from each stage anonymized view are similar for each individual. Specifically, the anonymization device 200 performs the division at the division points where the SA combinations of the equivalence classes in the respective stage anonymized views are similar for each record corresponding to each individual. Therefore, the anonymization device 200 can execute more refinement stages. As a result, the anonymization device 200 can output anonymized views that have smaller information loss and satisfy the required anonymity.
  • FIG. 24 shows a view 893 that is anonymized by related technology with the entire personal information 810 as one acquired view. As shown in FIG. 24, the view 893 is obtained by dividing the personal information 810 into two parts. The information loss amounts of “QI_1” and “QI_2” in the view 893 are “25” and “49”, respectively.
  • the information loss amounts of “QI_1” of the view 841 and “QI_2” of the view 842, which are outputs of the present embodiment, are “17” and “35”, respectively. That is, the anonymization according to the present embodiment can reduce the amount of information loss compared with the anonymization according to the related technology.
  • the effect of this embodiment described above is that, in addition to the effect of the first embodiment, the amount of information loss of the anonymized view to be output can be further reduced.
  • the reason is that the anonymization unit 220 determines the division point based on the similarity of the SA combinations of the equivalence classes calculated by the similarity calculation unit 230.
  • the similarity calculation unit 230 calculates the similarity for the division point candidates adjacent to the average value of the quasi-identifier of the equivalence class to be divided. Alternatively, the similarity calculation unit 230 may calculate the similarity for the division point candidates adjacent to the median of the quasi-identifier of the equivalence class to be divided.
  • the number of adjacent division point candidates may be a preset number. Further, the number of adjacent division point candidates may be determined based on, for example, the set number of acquired views (the number of patterns 804).
  • the similarity calculation unit 230 sets the number of adjacent division point candidates to five. That is, since the number of acquired views is doubled, the similarity calculation unit 230 multiplies the number of division point candidates by 0.5 (the reciprocal of 2).
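Scaling the number of division point candidates by the reciprocal of the number of acquired views can be sketched as below. The base count of 10 is an assumption chosen so that two views yield the five candidates mentioned above; the function name, the integer division, and the lower bound of one candidate are likewise assumptions of this sketch.

```python
def candidate_count(base_count: int, num_views: int) -> int:
    """Scale the number of division point candidates by 1 / num_views (at least 1)."""
    return max(1, base_count // num_views)


BASE_COUNT = 10  # hypothetical preset number of candidates for a single view

print(candidate_count(BASE_COUNT, 2))
```

Because the number of candidate combinations is the product of the per-view candidate counts, shrinking each count as views are added keeps the search from growing as quickly.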
  • First, this modification has the effect that the processing load on the anonymization device 200 can be reduced.
  • this modification has an effect that the processing load can be controlled based on the number of acquired views.
  • the present modification has an effect of preventing the tendency that the amount of calculation increases as the number of acquired views increases.
  • the second embodiment may be modified in the same manner as the first to third modifications of the first embodiment.
  • FIG. 25 is a block diagram showing a configuration of the anonymization device 300 according to the third exemplary embodiment of the present invention.
  • the anonymization device 300 in the present embodiment includes a view acquisition unit 310 instead of the view acquisition unit 110, as compared to the anonymization device 100 of the first embodiment.
  • the personal information includes four attributes “QI_1”, “QI_2”, “QI_3”, and “SA”.
  • the correlation coefficient between “QI_1” and “SA” is “0.8”, the correlation coefficient between “QI_2” and “SA” is “0.1”, and the correlation coefficient between “QI_3” and “SA” is “0.7”.
  • the determination threshold is “0.5”.
  • the view acquisition unit 310 acquires an acquired view based on the strong correlation patterns of “QI_1, SA” and “QI_3, SA”.
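The threshold-based selection can be reproduced with a few lines of Python. The correlation coefficients and the determination threshold are the values given above; the function name, the use of the absolute value, and the choice of "greater than or equal to" as the comparison are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def strong_correlation_patterns(
    correlation_with_sa: Dict[str, float], threshold: float, sensitive: str = "SA"
) -> List[Tuple[str, str]]:
    """Keep each quasi-identifier whose correlation with the sensitive attribute reaches the threshold."""
    return [
        (qi, sensitive)
        for qi, coefficient in correlation_with_sa.items()
        if abs(coefficient) >= threshold
    ]


correlation_with_sa = {"QI_1": 0.8, "QI_2": 0.1, "QI_3": 0.7}
patterns = strong_correlation_patterns(correlation_with_sa, threshold=0.5)
print(patterns)
```

With these values, “QI_2” is dropped and the patterns for “QI_1” and “QI_3” remain, in line with the strong correlation patterns named above.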
  • the structure of the strong correlation pattern is the same as the structure of the pattern 804 shown in FIG.
  • the view acquisition unit 310 may determine a strong correlation pattern by using association rule mining.
  • Association rule mining can determine the strength of the correlation among a plurality of attributes.
  • the view acquisition unit 310 acquires an acquisition view using “QI_1, SA” and “QI_1, QI_3, SA” as strong correlation patterns.
  • the effect of the present embodiment described above is that, in addition to the effect of the first embodiment, it is not necessary for a person to determine the pattern 804 in advance.
  • the reason is that the view acquisition unit 310 acquires the acquired view based on the mutual relationship of attributes included in the personal information.
  • the view acquisition unit 310 acquires a new acquired view based on the learning result of the acquired acquisition view. For example, the view acquisition unit 310 stores a previously input combination of patterns 804 as illustrated in FIG. 5 and learns correlations between strong correlation patterns included in the plurality of combinations of patterns 804. Next, the view acquisition unit 310 acquires an acquired view with a strong correlation pattern having a strong correlation with the newly input pattern 804 based on the learned result.
  • “The acquired view is acquired with a strong correlation pattern having a strong correlation” will be described with a specific example.
  • the personal information includes six attributes “QI_1”, “QI_2”, “QI_3”, “QI_4”, “QI_5”, and “SA”.
  • Second patterns 804: {QI_1, SA}, {QI_2, SA}, {QI_3, SA}.
  • the view acquisition unit 310 learns a combination of these patterns 804 by association rule mining, and obtains the following learning result.
  • Support is the ratio of the patterns 804 that include {QI_1, SA} among all the stored patterns 804.
  • Confidence is the ratio of the patterns 804 that include {QI_2, SA} among the patterns 804 that include {QI_1, SA}.
  • When {QI_1, SA} is newly input as the fifth pattern 804, the view acquisition unit 310 also acquires the acquired view of {QI_2, SA} based on the learning result.
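Support and confidence as defined above can be computed directly from the stored combinations of patterns. In the sketch below, the stored history is hypothetical except for its second entry, which follows the second patterns 804 listed above; the function names and the confidence threshold of 0.8 are assumptions made for illustration.

```python
from typing import FrozenSet, List, Set

Pattern = FrozenSet[str]

P1: Pattern = frozenset({"QI_1", "SA"})
P2: Pattern = frozenset({"QI_2", "SA"})
P3: Pattern = frozenset({"QI_3", "SA"})
P4: Pattern = frozenset({"QI_4", "SA"})

# Hypothetical history of four stored pattern combinations.
history: List[Set[Pattern]] = [{P1, P2}, {P1, P2, P3}, {P1, P2}, {P3, P4}]


def support(history: List[Set[Pattern]], p: Pattern) -> float:
    """Share of stored combinations that include pattern p."""
    return sum(p in combo for combo in history) / len(history)


def confidence(history: List[Set[Pattern]], p: Pattern, q: Pattern) -> float:
    """Share of the combinations including p that also include q."""
    with_p = [combo for combo in history if p in combo]
    if not with_p:
        return 0.0
    return sum(q in combo for combo in with_p) / len(with_p)


def suggest(history: List[Set[Pattern]], p: Pattern, min_confidence: float) -> Set[Pattern]:
    """Patterns that tend to accompany p, judged by a confidence threshold."""
    seen = {q for combo in history for q in combo if q != p}
    return {q for q in seen if confidence(history, p, q) >= min_confidence}


print(support(history, P1), confidence(history, P1, P2))
print(suggest(history, P1, min_confidence=0.8))
```

In this toy history, {QI_2, SA} always accompanies {QI_1, SA}, so inputting {QI_1, SA} alone would lead to {QI_2, SA} being suggested as well.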
  • the view acquisition unit 310 may learn the stored pattern 804 by classification tree learning or the like.
  • the view acquisition unit 310 may acquire an acquired view based on the learning result without inputting a new pattern 804. In this case, the view acquisition unit 310 may determine a strong correlation pattern based on a determination threshold for support and confidence.
  • the anonymization device 300 can generate an acquisition view that can be easily used in combination with a specific acquisition view, for example, by learning a pattern 804 input by a plurality of people.
  • anonymization suitable for analysis by “pattern” that is often performed by many people is possible. The reason is that the acquired view is acquired based on the result of learning the stored pattern 804.
  • anonymization can be executed without a person determining a strong correlation pattern.
  • the reason is that the view acquisition unit 310 acquires the acquired view based on the learning result without inputting the new pattern 804.
  • the view acquisition unit 310 acquires new acquired views that add, to an already acquired view, attributes not included in that view. Specifically, the view acquisition unit 310 further acquires acquired views in which each attribute not included in an already acquired view is added to that view.
  • the personal information includes six attributes “QI_1”, “QI_2”, “QI_3”, “QI_4”, “QI_5”, and “SA”.
  • the view acquisition unit 310 acquires acquired views using {QI_1, SA} and {QI_2, SA}, which are the strong correlation patterns included in the combination of the patterns 804. Further, the view acquisition unit 310 acquires acquired views using the strong correlation patterns obtained by adding one attribute to each of the strong correlation patterns included in the combination of the patterns 804.
  • the strong correlation patterns obtained by adding one attribute are {QI_1, QI_2, SA}, {QI_1, QI_3, SA}, {QI_1, QI_4, SA}, {QI_1, QI_5, SA}, {QI_2, QI_3, SA}, {QI_2, QI_4, SA}, and {QI_2, QI_5, SA}.
  • the added attributes are all attributes that are not included in the strong correlation pattern included in the combination of the patterns 804.
  • the attribute to be added may be an arbitrary attribute that is not included in the strong correlation pattern included in the combination of the patterns 804.
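The extension of each strong correlation pattern by one attribute can be sketched as follows. The attribute names and input patterns are those of the example above; the function name and the choice to drop duplicates (so that {QI_1, QI_2, SA} appears once) are assumptions of this sketch.

```python
from typing import FrozenSet, List

Pattern = FrozenSet[str]


def extend_patterns(patterns: List[Pattern], attributes: List[str]) -> List[Pattern]:
    """Add each attribute missing from a pattern, one at a time, without duplicates."""
    extended: List[Pattern] = []
    for pattern in patterns:
        for attribute in attributes:
            if attribute in pattern:
                continue
            candidate = pattern | {attribute}
            if candidate not in extended and candidate not in patterns:
                extended.append(candidate)
    return extended


attributes = ["QI_1", "QI_2", "QI_3", "QI_4", "QI_5", "SA"]
patterns = [frozenset({"QI_1", "SA"}), frozenset({"QI_2", "SA"})]

extended = extend_patterns(patterns, attributes)
for pattern in extended:
    print(sorted(pattern))
```

For the two input patterns this yields seven extended patterns, the same number as in the list given above.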
  • the anonymization device 300 can increase the accuracy of analysis other than the input strong correlation pattern.
  • For analyses that involve attributes in addition to those included in the input strong correlation patterns, the analyst can also obtain anonymized views that satisfy the l-diversity of the plurality of views. The reason is that the view acquisition unit 310 generates acquired views in which an attribute is added to the input strong correlation pattern.
  • each component described in each of the above embodiments does not necessarily need to be an independent entity.
  • a plurality of components may be realized as a single module.
  • each component may be realized by a plurality of modules.
  • Each component may be configured such that a certain component is a part of another component.
  • Each component may be configured such that a part of a certain component overlaps a part of another component.
  • each component and a module that realizes each component may be realized by hardware if necessary. Moreover, each component and the module which implement
  • the program is provided by being recorded on a non-volatile computer-readable recording medium such as a magnetic disk or a semiconductor memory, and is read by the computer when the computer is started up.
  • the read program causes the computer to function as a component in each of the above-described embodiments by controlling the operation of the computer.
  • a plurality of operations are not limited to being executed at different timings. For example, another operation may occur during the execution of a certain operation, or the execution timing of a certain operation and another operation may partially or entirely overlap.
  • In each of the embodiments described above, a certain operation is described as triggering another operation, but that description does not limit all relationships between the two operations. For this reason, when each embodiment is implemented, the relationship between the plurality of operations can be changed within a range that does not impair the content.
  • The specific description of each operation of each component does not limit that operation. For this reason, each specific operation of each component may be changed, when each embodiment is implemented, within a range that does not impair the functional, performance, and other characteristics.
  • Anonymization apparatus 110 View acquisition part 120 Anonymization part 200 Anonymization apparatus 220 Anonymization part 230 Similarity calculation part 300 Anonymization apparatus 310 View acquisition part 700 Computer 701 CPU 702 Storage unit 703 Storage device 704 Input unit 705 Output unit 706 Communication unit 707 Recording medium 804 Pattern 810 Personal information 815 Identifier 816 Semi-identifier 817 Semi-identifier 818 Sensitive attribute 821 View 822 View 831 View 832 View 841 View 852 View 851 View 851 871 View 872 View 873 Confirmation Content 881 View 882 View 883 Confirmation Content 893 View 910 Personal Information 915 Identifier 916 Semi-identifier 918 Semi-identifier 918 Sensitive attribute 920 Anonymized information 926 Semi-identifier 927 Semi-identifier 928 Sensitive attribute 921 Anonymized information 931 932 Generalized view 933 Split view 934 Split view 35 split view 941 Personal information 942 first view 943 second view 944
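The pattern-extension step described above for the view acquisition unit 310 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `extend_patterns` and the use of `frozenset` to represent a pattern are assumptions made here; only the attribute names and the two input patterns come from the text.

```python
from typing import FrozenSet, Iterable, List, Sequence


def extend_patterns(
    patterns: Iterable[Sequence[str]], all_attributes: Sequence[str]
) -> List[FrozenSet[str]]:
    """Return each input pattern plus every pattern with one extra attribute.

    Hypothetical helper: duplicates such as {QI_1, QI_2, SA}, which can be
    reached from both input patterns, are emitted only once.
    """
    result: List[FrozenSet[str]] = []
    seen = set()
    for pattern in patterns:
        base = frozenset(pattern)
        # The pattern itself, then the pattern plus each attribute it lacks.
        candidates = [base] + [base | {a} for a in all_attributes if a not in base]
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                result.append(candidate)
    return result


ATTRIBUTES = ["QI_1", "QI_2", "QI_3", "QI_4", "QI_5", "SA"]
PATTERNS = [("QI_1", "SA"), ("QI_2", "SA")]
extended = extend_patterns(PATTERNS, ATTRIBUTES)
```

With the six attributes and two input patterns from the text, `extended` holds the two original patterns and the seven one-attribute extensions listed above.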

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is an information processing device which performs anonymization so as to decrease fluctuation of information loss between multiple anonymized views. This information processing device is provided with a view acquiring means which acquires multiple views of personal information including multiple attributes of multiple individuals, and an anonymizing means which outputs the anonymized views obtained by performing anonymization of said multiple views in parallel.

Description

Information processing device that performs anonymization, and anonymization method
The present invention relates to anonymization technology, and more particularly to technology for protecting privacy in the secondary use of personal information.
Techniques for using personal information, and various related techniques, are known.
For example, Patent Document 1 discloses a technique for managing multiple types of medical information in which subscribers are identified by different identification information. The medical information management device disclosed in Patent Document 1 determines a matching pattern for uniquely identifying a subscriber in a basic ledger. Using that matching pattern, the device manages the various types of medical information relating to the subscriber in association with one another. The device also anonymizes and manages the symbols, numbers, kanji names, and kana names of health insurance subscribers.
However, such a medical information management device does not sufficiently protect the subscriber's privacy. This is because an individual can be identified by combining information that the device does not anonymize, such as date of birth, sex, and medical history. As a result, information that the individual does not want others to know (the attribute value of a sensitive attribute) may become known to others. Information such as date of birth, sex, and medical history is called a quasi-identifier. That is, a quasi-identifier is information that makes it possible to identify an individual when combined with other such information. In other words, the information contained in the basic ledger must be anonymized so that privacy is sufficiently protected.
FIG. 26 is a diagram showing an example of general personal information 910 that requires privacy protection and is subject to anonymization.
As shown in FIG. 26, the personal information 910 has an ID (identifier) as an identifier 915, a postal code as a quasi-identifier 916, an age as a quasi-identifier 917, and a disease name as a sensitive attribute 918. The postal code, age, disease name, and so on that make up the personal information 910 are also generally called attributes.
As shown in FIG. 26, the personal information 910 is represented by a table consisting of a plurality of records. Each record contains the attribute values for one individual (the values of each attribute, that is, the postal code, age, and disease name). The sensitive attribute 918 is information whose disclosure in association with a specific individual is to be prevented. Depending on the situation, attributes such as postal code, age, and disease name may be treated as either quasi-identifiers or sensitive attributes.
FIG. 27 shows anonymized information 920 corresponding to the personal information 910. The anonymized information 920 does not include an ID. The attribute values of the quasi-identifier 926 and the quasi-identifier 927 of the anonymized information 920 are generalized versions of the attribute values of the quasi-identifier 916 and the quasi-identifier 917 of the personal information 910, respectively. The sensitive attribute 928 of the anonymized information 920 is the same as the sensitive attribute 918 of the personal information 910.
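The relationship between the personal information 910 and the anonymized information 920 can be illustrated with a small sketch. The generalization rules used here (masking the trailing digits of the postal code and bucketing age into ten-year bands), the field names, and the sample record are all assumptions for illustration; the text does not state which generalization FIG. 27 applies.

```python
def generalize_postal_code(postal_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest (assumed rule)."""
    return postal_code[:keep] + "*" * (len(postal_code) - keep)


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '30-39' (assumed rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize_record(record: dict) -> dict:
    """Drop the identifier, generalize quasi-identifiers, keep the sensitive attribute."""
    return {
        "postal_code": generalize_postal_code(record["postal_code"]),
        "age": generalize_age(record["age"]),
        "disease": record["disease"],  # sensitive attribute is left unchanged
    }


# Hypothetical record; the values are not taken from FIG. 26.
sample = {"id": "user1", "postal_code": "1230001", "age": 34, "disease": "flu"}
anonymized = anonymize_record(sample)
```

The output record has no `id` field, which mirrors the statement that the anonymized information 920 does not include an ID.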
Techniques for anonymizing personal information such as the personal information 910 are described in Patent Document 2, Patent Document 3, Non-Patent Document 1, and Non-Patent Document 2.
Patent Document 2 discloses a technique for protecting the privacy of published information. Specifically, Patent Document 2 discloses a technique that performs top-down processing and determines k-anonymity and l-diversity for the result. Patent Document 2 also discloses a technique that performs bottom-up processing and determines k-anonymity and l-diversity for the result. Here, the top-down processing and the bottom-up processing are executed repeatedly. Patent Document 2 further discloses a technique that performs partial anonymization processing and determines k-anonymity and l-diversity for the result.
Patent Document 3 discloses a technique for anonymizing important information (sensitive attributes) in addition to quasi-identifiers.
Non-Patent Document 1 discloses an example of an anonymization method. Specifically, Non-Patent Document 1 discloses an anonymization method based on a top-down approach using the Mondrian method.
FIG. 28 shows an example of l-diversification using the Mondrian method.
QID1 and QID2 shown in FIG. 28 are quasi-identifiers, and SA shown in FIG. 28 is a sensitive attribute. The original view 931 shown in FIG. 28 is a view of certain personal information (not shown) consisting of the original (unprocessed) attribute values. Here, a view is a view in the database sense, that is, a table obtained by extracting some of the attributes from a table such as the one shown in FIG. 26.
As shown in FIG. 28, l-diversification is executed, for example by a certain anonymization device (not shown), in the following procedure.
First, the anonymization device generalizes the quasi-identifiers included in the original view 931 to the greatest extent (processes them into ambiguous values) and generates a most generalized view 932. The most generalized view 932 contains a single group of records that share the same combination of quasi-identifier values (attribute values). A group of records sharing the same combination of quasi-identifier values is also called an equivalence class.
Second, the anonymization device divides the equivalence class and refines the quasi-identifiers in units of the divided equivalence classes. Specifically, the anonymization device divides the equivalence class by regrouping its records, using a specific value of a specific quasi-identifier as the boundary, based on the attribute values of that quasi-identifier in the original view 931.
For example, the anonymization device selects the specific quasi-identifier in order, starting from the leftmost column of the view to be divided. The anonymization device uses, as the specific value, the average of the attribute values of that quasi-identifier within the equivalence class.
Specifically, for the most generalized view 932, the anonymization device selects QID1 as the specific quasi-identifier and determines the specific value to be “120” (rounded to the nearest integer). Using this “120” as the boundary, the anonymization device regroups the records included in the most generalized view 932 based on the attribute values of QID1 in the original view 931. Next, the anonymization device refines the quasi-identifiers for each regrouped equivalence class and generates a split view 933.
The specific quasi-identifier may instead be the quasi-identifier with the widest value range, and the specific value may instead be the median of the attribute values of that quasi-identifier. A number of known techniques exist for selecting the specific quasi-identifier and for determining the specific value; their description is omitted here.
Third, the anonymization device repeats the second process for as long as the split view generated by the second process satisfies a predetermined anonymity.
Specifically, FIG. 28 shows that, with 3-diversity as the predetermined anonymity, a split view 934 is generated from the split view 933, and a split view 935 is then generated from the split view 934. Here, 3-diversity means that the “l” of l-diversity is “3”. In other words, 3-diversity denotes “l-diversity with l = 3”. Hereinafter, whenever the “3” in 3-diversity is replaced with another number, that number is likewise the “l” of l-diversity.
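The three steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure of Non-Patent Document 1 itself: the function names and sample records are hypothetical, records equal to the boundary are placed in the upper group, and quasi-identifiers are always tried from the leftmost one rather than cycled.

```python
import math
from statistics import mean


def distinct_sa(records):
    """Number of distinct sensitive-attribute values in a group of records."""
    return len({r["SA"] for r in records})


def mondrian_l_diversity(records, qids, l):
    """Recursively split one equivalence class while every part stays l-diverse."""
    for qid in qids:  # try quasi-identifiers starting from the leftmost column
        # Boundary = average of the attribute values, rounded half up.
        boundary = math.floor(mean(r[qid] for r in records) + 0.5)
        low = [r for r in records if r[qid] < boundary]
        high = [r for r in records if r[qid] >= boundary]
        if low and high and distinct_sa(low) >= l and distinct_sa(high) >= l:
            return mondrian_l_diversity(low, qids, l) + mondrian_l_diversity(high, qids, l)
    return [records]  # no admissible split: keep this equivalence class as is


# Hypothetical records; the values are not taken from FIG. 28.
RECORDS = [
    {"QID1": 100, "QID2": 20, "SA": "A"},
    {"QID1": 110, "QID2": 30, "SA": "B"},
    {"QID1": 115, "QID2": 40, "SA": "C"},
    {"QID1": 125, "QID2": 25, "SA": "A"},
    {"QID1": 130, "QID2": 35, "SA": "B"},
    {"QID1": 140, "QID2": 45, "SA": "C"},
]
classes = mondrian_l_diversity(RECORDS, ["QID1", "QID2"], l=3)
```

In this sample the first boundary on QID1 is 120, which yields two equivalence classes of three records each; any further split would leave a class with fewer than three distinct SA values, so the recursion stops there.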
Non-Patent Document 2 discloses an example of a method for verifying l-diversity. Specifically, Non-Patent Document 2 discloses a method for verifying l-diversity when a plurality of anonymized views are matched against one another. Hereinafter, this l-diversity is called “multi-view l-diversity”, and a view that has been anonymized is called an “anonymized view”.
FIG. 29 is a diagram for explaining multi-view l-diversity.
As shown in FIG. 29, the personal information 941 includes “ID” as an identifier, “QI_1” and “QI_2” as quasi-identifiers, and “SA” as a sensitive attribute.
Assume that, for the personal information 941, the relationship between “QI_1” and “SA” is analyzed as a first pattern and the relationship between “QI_2” and “SA” is analyzed as a second pattern. The required anonymity in this case is 2-diversity.
The first view 942 is the view corresponding to the first pattern. The first anonymized view 944 is the first view 942 anonymized so as to satisfy 2-diversity. The second view 943 is the view corresponding to the second pattern. The second anonymized view 945 is the second view 943 anonymized so as to satisfy 2-diversity.
As shown in FIG. 29, both the first anonymized view 944 and the second anonymized view 945 satisfy 2-diversity.
Now suppose that an attacker mounts an attack to determine a sensitive attribute. As background knowledge, the attacker knows that the “QI_1” of “user4” is “13” and that the “QI_2” of “user4” is “21”.
Based on that background knowledge, the attacker can infer from the first anonymized view 944 that the value of “SA” for “user4” is “B” or “C”. Likewise, the attacker can infer from the second anonymized view 945 that the value of “SA” for “user4” is “A” or “B”. The attacker can therefore determine that the value of “SA” for “user4” is “B”. In other words, the multi-view l-diversity in this case is 1-diversity.
Non-Patent Document 2 describes the following algorithm as a method for verifying multi-view l-diversity. First, for each anonymized view, the method obtains the sensitive attribute values corresponding to the target ID. Second, the method computes, for each ID, the intersection of the obtained sets of sensitive attribute values. Third, the method determines the l-diversity for the target ID according to whether the number of sensitive attribute values in the intersection is at least l (the “l” of the l-diversity being verified).
FIG. 30 is a diagram for explaining an example in which the l-diversity of the two anonymized views shown in FIG. 29, the first anonymized view 944 and the second anonymized view 945, is verified.
As shown in FIG. 30, the attribute values 951 are the sensitive attribute values of each individual that can be inferred from the first anonymized view 944. The attribute values 952 are the sensitive attribute values of each individual that can be inferred from the second anonymized view 945. The intersection 953 is the intersection of the attribute values 951 and the attribute values 952.
The number of sensitive attribute values in each intersection 953 is the value of “l” of the multi-view l-diversity for the corresponding target ID.
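The verification described above amounts to a set intersection per individual, which a short sketch makes concrete. The two views below are hypothetical stand-ins for the anonymized views 944 and 945, constructed only so that “user4” (QI_1 = 13, QI_2 = 21) yields the candidate sets stated in the text; representing a generalized value as a `(low, high)` range and locating the equivalence class from known quasi-identifier values are assumptions of this sketch.

```python
def candidate_values(view, qi_name, qi_value):
    """SA values of the equivalence class whose generalized range covers qi_value."""
    return {
        row["SA"]
        for row in view
        if row[qi_name][0] <= qi_value <= row[qi_name][1]
    }


def multi_view_candidates(known_qis, views):
    """Intersect the candidate SA sets obtained from each anonymized view."""
    sets = [
        candidate_values(views[name], name, value)
        for name, value in known_qis.items()
    ]
    return set.intersection(*sets)


# Hypothetical anonymized views; each equivalence class alone is 2-diverse.
VIEW_1 = [
    {"QI_1": (10, 13), "SA": "B"},
    {"QI_1": (10, 13), "SA": "C"},
    {"QI_1": (14, 20), "SA": "A"},
    {"QI_1": (14, 20), "SA": "B"},
]
VIEW_2 = [
    {"QI_2": (20, 21), "SA": "A"},
    {"QI_2": (20, 21), "SA": "B"},
    {"QI_2": (22, 30), "SA": "B"},
    {"QI_2": (22, 30), "SA": "C"},
]

views = {"QI_1": VIEW_1, "QI_2": VIEW_2}
user4 = multi_view_candidates({"QI_1": 13, "QI_2": 21}, views)
satisfies_2_diversity = len(user4) >= 2
```

For “user4” the intersection contains a single value, so the multi-view check fails for l = 2 even though each view passes on its own.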
JP 2012-098879 A
JP 2012-159982 A
JP 2013-084027 A
However, the techniques described in the prior art documents above have the problem that the amount of information loss varies widely among a plurality of anonymized views. In other words, the information loss of one anonymized view may be smaller than necessary while the information loss of another anonymized view exceeds the acceptable limit.
The reason is that none of the anonymization techniques disclosed in Patent Documents 2 and 3 takes into account that “a plurality of anonymized views”, as described in Non-Patent Document 2, will be generated. Consequently, when a second anonymized view is generated after a first anonymized view, the information loss of the second anonymized view may have to be sacrificed in order to satisfy “l-diversity when the plurality of anonymized views are matched against one another”.
Specifically, the anonymization techniques disclosed in Patent Documents 2 and 3 in principle perform anonymization so as to minimize the amount of information loss (called preceding anonymization). Therefore, when anonymization is then performed from another viewpoint (that of correspondence with a quasi-identifier different from the one used in the preceding anonymization; called subsequent anonymization), the anonymity of the anonymized view generated by the preceding anonymization must also be taken into account. That is, the subsequent anonymization is constrained by the result of the preceding anonymization.
An object of the present invention is to provide an information processing device and an anonymization method that solve the problem described above, and a program therefor or a computer-readable non-transitory recording medium on which the program is recorded.
An information processing device according to one aspect of the present invention includes view acquisition means for acquiring a plurality of views of personal information that includes a plurality of attributes of each of a plurality of individuals, and anonymization means for outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.
In an anonymization method according to one aspect of the present invention, a computer acquires a plurality of views of personal information that includes a plurality of attributes of each of a plurality of individuals, and outputs anonymized views obtained by executing anonymization of the plurality of views in parallel.
A computer-readable non-transitory recording medium according to one aspect of the present invention records a program that causes a computer to execute processing of acquiring a plurality of views of personal information that includes a plurality of attributes of each of a plurality of individuals, and outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.
The present invention has the effect of making it possible to reduce the variation in the amount of information loss among a plurality of anonymized views.
FIG. 1 is a block diagram showing the configuration of the anonymization device according to the first embodiment of the present invention.
FIG. 2 shows an example of personal information in the first embodiment.
FIG. 3 shows an example of a view in the first embodiment.
FIG. 4 shows an example of a view in the first embodiment.
FIG. 5 shows an example of a pattern in the first embodiment.
FIG. 6 is a block diagram showing the hardware configuration of a computer that implements the anonymization device according to the first embodiment.
FIG. 7 is a flowchart showing the operation of the anonymization device in the first embodiment.
FIG. 8 is a diagram for explaining an example of generating maximally generalized views in the first embodiment.
FIG. 9 is a diagram for explaining an example of the refinement stage in the first embodiment.
FIG. 10 is a diagram for explaining an example of the refinement stage in the first embodiment.
FIG. 11 shows an example of a view in the related art.
FIG. 12 shows an example of a view in the related art.
FIG. 13 shows an example of anonymized information in the first embodiment.
FIG. 14 is a flowchart showing the operation of the anonymization device in a first modification of the first embodiment.
FIG. 15 is a diagram for explaining an example of the refinement stage and the determination of multi-view l-diversity in the first embodiment.
FIG. 16 is a diagram for explaining an example of the refinement stage and the determination of multi-view l-diversity in the first embodiment.
FIG. 17 is a block diagram showing the configuration of the anonymization device according to the second embodiment of the present invention.
FIG. 18 is a diagram for explaining division point candidates in the second embodiment.
FIG. 19 is a diagram for explaining division point candidates in the second embodiment.
FIG. 20 is a diagram for explaining the sensitive attributes of an equivalence class in the second embodiment.
FIG. 21 is a flowchart showing the operation of the anonymization device in the second embodiment.
FIG. 22 is a diagram for explaining an example of the refinement stage and the determination of multi-view l-diversity in the second embodiment.
FIG. 23 is a diagram for explaining an example of the refinement stage and the determination of multi-view l-diversity in the second embodiment.
FIG. 24 shows an example of a view in the related art.
FIG. 25 is a block diagram showing the configuration of the anonymization device according to the third embodiment of the present invention.
FIG. 26 shows an example of personal information in the related art.
FIG. 27 shows an example of anonymized information in the related art.
FIG. 28 shows an example of l-diversification using the Mondrian method in the related art.
FIG. 29 is a diagram for explaining multi-view l-diversity in the related art.
FIG. 30 is a diagram for explaining the verification of multi-view l-diversity in the related art.
Embodiments for carrying out the present invention will be described in detail with reference to the drawings. In the drawings and in the embodiments described in this specification, like components are given like reference signs, and their description is omitted as appropriate.
<<< First Embodiment >>>
FIG. 1 is a block diagram showing the configuration of the anonymization device 100 according to the first embodiment of the present invention.
As shown in FIG. 1, the anonymization device 100 according to the present embodiment includes a view acquisition unit 110 and an anonymization unit 120. The components shown in FIG. 1 may be components in hardware units or components divided into functional units of a computer device. Here, the components shown in FIG. 1 are described as components divided into functional units of a computer device.
=== Personal information 810 ===
FIG. 2 is a diagram showing an example of the personal information 810. As shown in FIG. 2, the personal information 810 includes an identifier 815 and a plurality of attributes for each of a plurality of individuals. Here, the plurality of attributes are a quasi-identifier 816, a quasi-identifier 817, and a sensitive attribute 818. That is, one record of the personal information 810 includes the identifier and the plurality of attributes of one individual.
“ID”, “QI_1”, “QI_2”, and “SA” shown in the top row of FIG. 2 are attribute names.
=== View ===
FIG. 3 and FIG. 4 each show an example of a view consisting of a combination of arbitrary attributes of the personal information 810. Hereinafter, views acquired from arbitrary personal information (for example, the view 821 and the view 822 described below) are collectively referred to as acquired views.
FIG. 3 is a diagram showing a view 821 consisting of the combination of the quasi-identifier 816 and the sensitive attribute 818 of the personal information 810.
FIG. 4 is a diagram showing a view 822 consisting of the combination of the quasi-identifier 817 and the sensitive attribute 818 of the personal information 810.
=== Pattern 804 ===
A pattern is information indicating the set of attributes for which an analysis, such as a correlation analysis, is desired. A pattern is represented, for example, by information (such as attribute names) that specifies quasi-identifiers and sensitive attributes. A pattern may specify more than one quasi-identifier or sensitive attribute.
FIG. 5 is a diagram showing an example of the pattern 804. As shown in FIG. 5, the pattern 804 includes the set of attribute names “QI_1” and “SA” and the set of attribute names “QI_2” and “SA”. In other words, the pattern 804 shown in FIG. 5 is information indicating that the view 821, consisting of the attributes “QI_1” and “SA”, and the view 822, consisting of the attributes “QI_2” and “SA”, are to be acquired. The pattern 804 shown in FIG. 5 is also written as “{QI_1, SA} and {QI_2, SA}”.
Next, each component of the anonymization device 100 in functional units will be described.
=== View acquisition unit 110 ===
The view acquisition unit 110 acquires a plurality of acquired views from the personal information 810.
For example, based on the pattern 804 shown in FIG. 5, the view acquisition unit 110 acquires the view 821 shown in FIG. 3 and the view 822 shown in FIG. 4 from the personal information 810.
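Acquiring views from a pattern is, in database terms, a projection of the personal information onto each attribute set. The sketch below assumes that records are dictionaries keyed by attribute name and that a pattern is a list of attribute-name tuples; the function name `acquire_views` and the sample values are hypothetical and are not taken from FIG. 2.

```python
def acquire_views(personal_info, pattern):
    """Project the records onto each attribute set named in the pattern."""
    return [
        [{attr: record[attr] for attr in attrs} for record in personal_info]
        for attrs in pattern
    ]


# Hypothetical stand-in for the personal information 810.
PERSONAL_INFO = [
    {"ID": "user1", "QI_1": 10, "QI_2": 20, "SA": "A"},
    {"ID": "user2", "QI_1": 11, "QI_2": 25, "SA": "B"},
    {"ID": "user3", "QI_1": 12, "QI_2": 30, "SA": "C"},
]
PATTERN_804 = [("QI_1", "SA"), ("QI_2", "SA")]

view_821, view_822 = acquire_views(PERSONAL_INFO, PATTERN_804)
```

Each resulting view keeps one row per individual but drops the identifier and every attribute that the corresponding attribute set does not name.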
The personal information 810 and the pattern 804 are input as follows.
For the personal information 810, for example, the view acquisition unit 110 reads the personal information 810 stored in advance in a storage means (not shown) of the anonymization device 100. Alternatively, the view acquisition unit 110 may receive the personal information 810 from outside via a communication means (not shown) of the anonymization device 100.
For the pattern 804, for example, the view acquisition unit 110 accepts the pattern 804 entered by an operator via an input means (not shown) of the anonymization device 100. Alternatively, the view acquisition unit 110 may read the pattern 804 stored in advance in a storage means (not shown) of the anonymization device 100.
=== Anonymization Unit 120 ===
The anonymization unit 120 anonymizes, in parallel, the plurality of acquired views acquired by the view acquisition unit 110. It then outputs the anonymized views obtained by that anonymization. “Anonymized view” is the collective term for these output views.
For example, the anonymization unit 120 anonymizes the view 821 shown in FIG. 3 and the view 822 shown in FIG. 4 in parallel, and outputs the anonymized views corresponding to the view 821 and the view 822.
Here, the anonymization unit 120 performs the anonymization using, for example, the Mondrian method. The Mondrian method converts the information to be anonymized (for example, each of the view 821 and the view 822) into a single equivalence class (group) in the maximally generalized state, and then repeatedly divides an equivalence class to generate new equivalence classes. Here, the “single equivalence class in the maximally generalized state” is an equivalence class that treats the information to be anonymized (for example, each of the view 821 and the view 822) as one group. Each new equivalence class is an equivalence class whose quasi-identifiers have been refined for the new group produced by the division. Hereinafter, one such division together with its refinement is called a refinement stage.
Here, “the state in which all records are included in a single equivalence class and form one group” is the state in which every quasi-identifier contained in the anonymized information is maximally generalized.
“Refining the quasi-identifiers for each new group” means processing the quasi-identifier values contained in the new group into a value indicating the range from the minimum to the maximum of their original values, thereby forming a new equivalence class.
That is, the anonymization unit 120 executes the refinement stages for the plurality of acquired views in parallel. More specifically, the anonymization unit 120 executes the first refinement stage of the view 821 and then the first refinement stage of the view 822. Next, it executes the second refinement stage of the view 821 and then the second refinement stage of the view 822. The third and subsequent refinement stages are executed in the same way. Likewise, when there are three or more acquired views, the anonymization unit 120 executes the first refinement stage of every acquired view and only then the second refinement stage of every acquired view.
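The interleaved execution of refinement stages described above can be sketched as follows. This is a simplified sketch under stated assumptions, not the Mondrian method as published: each view has a single numeric quasi-identifier, one refinement stage splits one equivalence class at its median, and a split is allowed only if both halves keep at least k records. The names `refine_once`, `generalized`, and `refine_interleaved` and all values are hypothetical.

```python
from typing import List, Tuple

# Each inner list holds the original quasi-identifier values of one equivalence class.
Partition = List[List[int]]

def refine_once(classes: Partition, k: int) -> bool:
    """One refinement stage: split one class at its median if both halves keep >= k records."""
    for i, cls in enumerate(classes):
        ordered = sorted(cls)
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        if len(left) >= k and len(right) >= k and left[-1] < right[0]:
            classes[i:i + 1] = [left, right]
            return True
    return False  # no divisible class: the stage is skipped for this view

def generalized(classes: Partition) -> List[Tuple[int, int]]:
    """Each class is published as the (min, max) range of its original values."""
    return [(min(c), max(c)) for c in classes]

def refine_interleaved(views: List[Partition], k: int, stages: int) -> None:
    for _ in range(stages):
        for classes in views:  # stage n of every view before stage n+1 of any view
            refine_once(classes, k)

# Two hypothetical views, each starting as one maximally generalized class.
views = [[[21, 25, 33, 38, 42, 47, 51, 58]], [[150, 155, 160, 165]]]
refine_interleaved(views, k=2, stages=2)
print(generalized(views[0]))  # [(21, 25), (33, 38), (42, 58)]
print(generalized(views[1]))  # [(150, 155), (160, 165)]
```

In this toy run the second view has no divisible class at the second stage, so only the first view is refined further, which corresponds to skipping a stage for a view that cannot be divided.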
This completes the description of the functional components of the anonymization device 100.
Next, the hardware components of the anonymization device 100 will be described.
FIG. 6 is a diagram illustrating the hardware configuration of a computer 700 that implements the anonymization device 100 in this embodiment.
As shown in FIG. 6, the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. The computer 700 further includes a recording medium (or storage medium) 707 supplied from outside. The recording medium 707 may be a nonvolatile recording medium that stores information non-transitorily.
The CPU 701 runs an operating system (not shown) and controls the overall operation of the computer 700. The CPU 701 also reads programs and data from, for example, the recording medium 707 mounted in the storage device 703, and writes them to the storage unit 702. Here, the program is, for example, a program that causes the computer 700 to execute the operation of the flowchart shown in FIG. 7, described later.
The CPU 701 then executes various kinds of processing as the view acquisition unit 110 and the anonymization unit 120 shown in FIG. 1, in accordance with the program and based on the data that it has read.
The CPU 701 may also download programs and data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).
The storage unit 702 stores programs and data. The storage unit 702 may store the personal information 810, the pattern 804, the acquired views (for example, the view 821 and the view 822), and the like.
The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes the recording medium 707. The storage device 703 (recording medium 707) stores the program in a computer-readable manner. The storage device 703 may also store data, such as the personal information 810, the pattern 804, and the acquired views.
The input unit 704 is implemented by, for example, a mouse, a keyboard, or built-in key buttons, and is used for input operations. The input unit 704 is not limited to these and may be, for example, a touch panel. The anonymization device 100 may acquire the personal information 810 and the pattern 804 entered via the input unit 704.
The output unit 705 is implemented by, for example, a display and is used to check the output. The anonymization device 100 may output the anonymized views via the output unit 705.
The communication unit 706 provides an interface with the outside. The communication unit 706 is included as part of the view acquisition unit 110 and the anonymization unit 120. The anonymization device 100 may acquire the personal information 810 and the pattern 804 from outside via the communication unit 706, and may output the anonymized views to the outside via the communication unit 706.
As described above, the functional blocks of the anonymization device 100 shown in FIG. 1 are implemented by the computer 700 having the hardware configuration shown in FIG. 6. However, the means of implementing each unit of the computer 700 is not limited to the above. That is, the computer 700 may be implemented as one physically integrated device, or as two or more physically separate devices connected by wire or wirelessly.
The recording medium 707 on which the code of the above program is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored in the recording medium 707. Alternatively, the CPU 701 may store the program code stored in the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, this embodiment includes an embodiment of the recording medium 707 that stores, transitorily or non-transitorily, the program (software) executed by the computer 700 (CPU 701). A storage medium that stores information non-transitorily is also called a nonvolatile storage medium.
This completes the description of the hardware components of the computer 700 that implements the anonymization device 100 in this embodiment.
Next, the operation of this embodiment will be described in detail with reference to FIGS. 1 to 10.
FIG. 7 is a flowchart showing the operation of this embodiment. The processing in this flowchart may be executed under program control by the CPU 701 described above. Processing steps are denoted by symbols such as S601.
As a premise, the required anonymity is 2-anonymity (“k-anonymity” with k = 2) and 2-diversity of multiple views (“l-diversity of multiple views” with l = 2).
The view acquisition unit 110 acquires the personal information 810 and the pattern 804 (S601).
Next, the view acquisition unit 110 acquires the view 821 and the view 822 from the personal information 810 based on each of the patterns 804 (S602).
Next, the anonymization unit 120 generates a view 831 and a view 832 in which the quasi-identifiers (“QI_1” and “QI_2”) contained in the view 821 and the view 822, respectively, are maximally generalized (S603). The processing that generates the view 831 and the view 832 may be processing that updates the view 821 and the view 822 into the view 831 and the view 832, respectively. Similarly, in the following description, the processing that generates each stage anonymized view may be processing that updates the stage anonymized view of the preceding stage. Here, “stage anonymized view” is the collective term for the view 831, the view 832, the view 841 (described later), the view 842 (described later), the view 851 (described later), and the view 852 (described later).
FIG. 8 illustrates how the anonymization unit 120 maximally abstracts (maximally generalizes) the quasi-identifier “QI_1” contained in the view 821 shown in FIG. 3 to generate the view 831. FIG. 8 also illustrates how the anonymization unit 120 maximally abstracts (maximally generalizes) the quasi-identifier “QI_2” contained in the view 822 shown in FIG. 4 to generate the view 832.
Next, the anonymization unit 120 executes the refinement stages for the view 831 and the view 832 in parallel (S604).
FIG. 9 illustrates the anonymization unit 120 executing the first refinement stage. As shown in FIG. 9, the anonymization unit 120 executes the first refinement stage on the view 831 to generate the view 841, and executes the first refinement stage on the view 832 to generate the view 842.
FIG. 10 illustrates the anonymization unit 120 executing the second refinement stage. As shown in FIG. 10, the anonymization unit 120 executes the second refinement stage on the view 841 to generate the view 851, and executes the second refinement stage on the view 842 to generate the view 852.
Next, the anonymization unit 120 outputs, from among the generated views, the anonymized views that satisfy the required anonymity (here, the view 841 and the view 842) (S605). The view 851 and the view 852 are not output because, when matched against each other, they do not satisfy 2-diversity of multiple views.
If a stage anonymized view has no divisible equivalence class at some refinement stage, the anonymization unit 120 may skip the division and refinement for that view at that stage.
In other words, the anonymization unit 120 may execute the division and refinement of a refinement stage only for stage anonymized views that contain a divisible equivalence class. Here, “no divisible equivalence class” refers to the case in which, if division and refinement were executed on the stage anonymized view, that view would no longer satisfy the anonymity that it must satisfy on its own.
If none of the stage anonymized views has a divisible equivalence class, the anonymization unit 120 ends the processing of S604. The anonymization unit 120 may also end the processing of S604 once it has executed a predetermined number of refinement stages.
This completes the description of the operation of this embodiment.
Next, anonymization by a related technique will be described for comparison with the anonymization of this embodiment.
This related technique simply combines the Mondrian method disclosed in Non-Patent Document 1 with the “method of verifying l-diversity when a plurality of anonymized views are matched against each other” disclosed in Non-Patent Document 2. For example, anonymizing two acquired views with this technique simply means anonymizing the first view and then anonymizing the second view. In contrast, the anonymization device 100 of this embodiment anonymizes a plurality of acquired views in parallel.
Assume that the information to be anonymized is the view 821 and the view 822, and that the required anonymity is 2-anonymity and 2-diversity of multiple views.
In the related technique, for example, the view 821 is anonymized first and the result is output. Then the view 822 is anonymized and the result is output.
FIG. 11 is a diagram showing anonymization of the view 821 by the related technique. As shown in FIG. 11, the related technique anonymizes the view 821 and outputs the view 851, which has the smallest information loss while satisfying 2-diversity.
FIG. 12 is a diagram showing anonymization of the view 822 by the related technique. As shown in FIG. 12, the related technique then anonymizes the view 822. Here, when matched against the previously output view 851, neither the view 842 nor the view 852 satisfies 2-diversity of multiple views for the record of “user4”. The view 832 is therefore output as the anonymization result of the view 822.
As a result, the related technique outputs the view 851 and the view 832.
Here, information loss in this embodiment will be described. Several methods have been proposed for calculating the information loss of an entire piece of anonymized information (hereinafter, the total information loss). In this embodiment, the information loss of a quasi-identifier is the sum of the generalization widths of that quasi-identifier over all records, and the total information loss is the sum of the information losses of the quasi-identifiers contained in the anonymized information.
FIG. 13 shows anonymized information 921. The total information loss of the anonymized information 921 is obtained as follows.
The information loss of the quasi-identifier “ZIP code” is the sum of the generalization widths of the records (all “1”), which is “8”.
The information loss of the quasi-identifier “age” is the sum of the generalization widths of the records (from the top row: “2”, “2”, “3”, “3”, “7”, “7”, “2”, “2”), which is “28”.
The information loss of the quasi-identifier “nationality” is the sum of the generalization widths of the records (all “4”, assuming that “*” generalizes four countries), which is “32”.
The total information loss of the anonymized information 921 is therefore the sum of the information losses of the quasi-identifiers, which is “68”.
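The calculation above can be reproduced with a few lines of code. The per-record generalization widths are those stated in the text for the anonymized information 921; the function name `information_loss` and the dictionary layout are assumptions made for this sketch.

```python
from typing import Dict, List

def information_loss(widths_by_qi: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the per-record generalization widths of each quasi-identifier."""
    return {qi: sum(widths) for qi, widths in widths_by_qi.items()}

# Generalization widths of the eight records, as stated in the text.
widths_921 = {
    "ZIP code": [1] * 8,
    "age": [2, 2, 3, 3, 7, 7, 2, 2],
    "nationality": [4] * 8,  # assumes "*" generalizes four countries
}

per_qi = information_loss(widths_921)
total = sum(per_qi.values())
print(per_qi)  # {'ZIP code': 8, 'age': 28, 'nationality': 32}
print(total)   # 68
```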
This completes the description of information loss.
The view 841 and the view 842 output by this embodiment have each been divided once. Their total information losses are “25” and “43”, respectively.
In the related technique, by contrast, the view 851 has been divided twice, whereas the view 832 has not been divided at all. Their total information losses are “17” and “91”, respectively.
That is, an analyst can expect analyses using the view 841 and the view 842 output by this embodiment to each retain a certain level of accuracy. With the related technique, the analyst can expect high accuracy from an analysis using the view 851, but cannot expect practical accuracy from an analysis using the view 832, which is output together with it. In other words, the plurality of anonymized views output by the related technique do not all yield analysis results of a certain accuracy.
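To make the comparison concrete, the four total information losses stated above can be summarized as follows. The measure of variation used here (largest loss minus smallest loss) is an assumption chosen for illustration; the text itself does not define a specific measure.

```python
from typing import Dict

def summarize(losses: Dict[str, int]) -> Dict[str, int]:
    """Return the sum, the worst value, and the max-min spread of the losses."""
    values = list(losses.values())
    return {
        "total": sum(values),
        "worst": max(values),
        "spread": max(values) - min(values),
    }

this_embodiment = {"view 841": 25, "view 842": 43}
related_technique = {"view 851": 17, "view 832": 91}

print(summarize(this_embodiment))    # {'total': 68, 'worst': 43, 'spread': 18}
print(summarize(related_technique))  # {'total': 108, 'worst': 91, 'spread': 74}
```

Under this measure the views output by this embodiment differ by 18, whereas those output by the related technique differ by 74.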
The first effect of this embodiment is that it can reduce the variation in total information loss among a plurality of anonymized views.
The second effect of this embodiment is that an analysis result of a certain accuracy can be obtained for each of the plurality of anonymized views.
This is because the view acquisition unit 110 acquires a plurality of acquired views and the anonymization unit 120 anonymizes those views in parallel.
<<< First Modification of First Embodiment >>>
The anonymization unit 120 executes the refinement stage on each first-stage stage anonymized view, updating it into (generating) a second-stage stage anonymized view. It then determines whether the second-stage stage anonymized views satisfy l-diversity of multiple views.
If the anonymization unit 120 determines that l-diversity of multiple views is satisfied, it treats the second-stage stage anonymized views as the new first-stage stage anonymized views and executes the next refinement stage. If it determines that l-diversity of multiple views is not satisfied, it outputs the first-stage stage anonymized views, that is, the views as they stood before that refinement stage (S609, described later).
In other words, as long as l-diversity of multiple views is satisfied, the anonymization unit 120 repeats the refinement stage, and outputs as the anonymized views the stage anonymized views that satisfy l-diversity of multiple views and have the smallest total information loss.
Next, the operation of this modification will be described in detail with reference to the drawings.
FIG. 14 is a flowchart showing the operation of this modification.
The view acquisition unit 110 acquires the personal information 810 and the pattern 804 (S601).
Next, the view acquisition unit 110 acquires acquired views (for example, the view 821 and the view 822) from the personal information 810 based on each of the patterns 804 (S602).
Next, the anonymization unit 120 generates stage anonymized views (for example, the view 831 and the view 832) in which the quasi-identifiers contained in the respective acquired views are maximally generalized (S603).
Next, the anonymization unit 120 executes the refinement stage on each stage anonymized view (S606).
Specifically, in the first execution of S606, the anonymization unit 120 updates the view 831 and the view 832 (the first-stage stage anonymized views) into the view 841 and the view 842 (the second-stage stage anonymized views), which are refined by one stage.
Next, the anonymization unit 120 determines whether l-diversity of multiple views is satisfied when the view 841 and the view 842 are matched against each other and analyzed (S607).
FIG. 15 is a diagram illustrating an example of the first “refinement stage” of S606 and the subsequent “determination” of S607. In the example shown in FIG. 15, the anonymization unit 120 generates, by the refinement stage, the view 841 and the view 842, which refine the view 831 and the view 832 by one stage each. The anonymization unit 120 then extracts, for the record corresponding to each individual, all sensitive attribute values that can be inferred when the view 841 and the view 842 are matched against each other. Here, a sensitive attribute value is a value of the sensitive attribute “SA”.
Referring to FIG. 15, the confirmation content 843 shows, for each individual, the values of the sensitive attribute “SA” that can be inferred from the view 841, the values that can be inferred from the view 842, and the intersection of the two. The intersection is the set of “SA” values that can be inferred by matching the values inferable from the view 841 against those inferable from the view 842. In FIG. 15, the number of “SA” values in this intersection is two or more for every record. The anonymization unit 120 therefore determines that “l-diversity of multiple views” with l = 2 is satisfied.
Next, if l-diversity of multiple views is satisfied (YES in S607), the anonymization unit 120 determines whether any stage anonymized view has a division point candidate (S608). If a division point candidate exists (YES in S608), the processing returns to S606.
FIG. 16 is a diagram illustrating an example of the second “refinement stage” of S606, executed after the processing has returned to S606, and the subsequent “determination” of S607. In the example shown in FIG. 16, the anonymization unit 120 generates, by the refinement stage, the view 851 and the view 852, which refine the view 841 and the view 842 by one stage each. The anonymization unit 120 then extracts, for the record corresponding to each individual, all sensitive attribute values that can be inferred when the view 851 and the view 852 are matched against each other.
Referring to FIG. 16, the confirmation content 853 shows, for each individual, the values of the sensitive attribute “SA” that can be inferred from the view 851, the values that can be inferred from the view 852, and the intersection of the two. The intersection is the set of “SA” values that can be inferred by matching the values inferable from the view 851 against those inferable from the view 852. In FIG. 16, for the record whose ID is “user4”, the number of “SA” values in this intersection is one. The anonymization unit 120 therefore determines that “l-diversity of multiple views” with l = 2 is not satisfied.
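The determination of S607 described with FIG. 15 and FIG. 16 can be sketched as an intersection of inferable sensitive values per individual. The data layout (a view as a list of equivalence classes, each holding pairs of record ID and sensitive value), the function names, and the sample values are assumptions for this sketch and do not reproduce the figures.

```python
from typing import Dict, List, Set, Tuple

# A view is a list of equivalence classes; each class lists (record ID, sensitive value).
View = List[List[Tuple[str, str]]]

def inferable(view: View) -> Dict[str, Set[str]]:
    """For each individual, the sensitive values appearing in that individual's class."""
    result: Dict[str, Set[str]] = {}
    for cls in view:
        values = {sa for _, sa in cls}
        for record_id, _ in cls:
            result[record_id] = values
    return result

def satisfies_multiview_l_diversity(views: List[View], l: int) -> bool:
    """True if every individual keeps at least l candidate values after matching all views."""
    per_view = [inferable(v) for v in views]
    for record_id in per_view[0]:
        common = set.intersection(*(pv[record_id] for pv in per_view))
        if len(common) < l:
            return False
    return True

# Hypothetical views over four individuals.
view_a = [[("user1", "flu"), ("user2", "cold")], [("user3", "flu"), ("user4", "cancer")]]
view_b_coarse = [[("user1", "flu"), ("user2", "cold"), ("user3", "flu"), ("user4", "cancer")]]
view_b_fine = [[("user1", "flu"), ("user4", "cancer")], [("user2", "cold"), ("user3", "flu")]]

print(satisfies_multiview_l_diversity([view_a, view_b_coarse], 2))  # True
print(satisfies_multiview_l_diversity([view_a, view_b_fine], 2))    # False
```

In the second call each view contains two distinct sensitive values in every class on its own, yet matching the two views leaves a single candidate value for some individuals, which is the situation described for “user4”.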
Next, if l-diversity of multiple views is not satisfied (NO in S607), the anonymization unit 120 returns the second-stage stage anonymized views generated in that refinement stage to the first-stage stage anonymized views. It then outputs the restored first-stage stage anonymized views as the anonymized views (S609). Here, the anonymization unit 120 returns the view 851 and the view 852 to the view 841 and the view 842, respectively, and then outputs the view 841 and the view 842. The view 841 and the view 842 are the anonymized views that satisfy the required anonymity (“l-diversity of multiple views” with l = 2) and have the smallest total information loss.
If no division point candidate exists (NO in S608), the anonymization unit 120 outputs the second-stage stage anonymized views generated in that refinement stage as the anonymized views (S610).
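The control flow of S606 to S610 can be sketched as the loop below. The refinement, the diversity check, and the division point check are passed in as callables; these callables, the function name `anonymize_views`, and the toy representation of a view by its number of divisions are all assumptions made so that the sketch runs on its own.

```python
import copy
from typing import Callable, List

View = List[int]  # toy stand-in: a view is represented only by its number of divisions

def anonymize_views(
    views: List[View],
    refine_stage: Callable[[View], None],
    is_diverse: Callable[[List[View]], bool],
    has_split_candidate: Callable[[View], bool],
) -> List[View]:
    """Repeat S606 to S608 and output according to S609 or S610."""
    while True:
        previous = copy.deepcopy(views)      # first-stage views, kept for S609
        for view in views:
            refine_stage(view)               # S606: one refinement stage per view
        if not is_diverse(views):            # S607: NO
            return previous                  # S609: roll back, then output
        if not any(has_split_candidate(v) for v in views):  # S608: NO
            return views                     # S610: output the latest views

def deepen(view: View) -> None:
    view[0] += 1

# Toy conditions: diversity holds only while every view has been divided at most once.
result = anonymize_views(
    [[0], [0]],
    refine_stage=deepen,
    is_diverse=lambda vs: all(v[0] <= 1 for v in vs),
    has_split_candidate=lambda v: v[0] < 3,
)
print(result)  # [[1], [1]]
```

With these toy conditions the second refinement stage breaks diversity, so the views are rolled back to the state after one division each, mirroring the output of the view 841 and the view 842 in the example.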
 以上が本変形例の動作の説明である。 The above is the description of the operation of this modification.
 本変形例は、所要の匿名性を満足し、かつ最も全体情報損失量の少ない匿名化ビューを出力することができる。 This modification can output an anonymized view that satisfies the required anonymity and has the least amount of overall information loss.
<<< Second Modification of First Embodiment >>>
In the first embodiment described above, the anonymization unit 120 executes one refinement stage on each of the stage anonymized views and only then checks “l-diversity of multiple views”. For example, the anonymization unit 120 executes the refinement stage on the view 831 and then on the view 832. Next, it checks “l-diversity of multiple views” for the view 841 and the view 842.
In this second modification, the anonymization unit 120 refines the view 831 and updates it to the view 841. Next, the anonymization unit 120 checks “l-diversity of multiple views” for the view 841 and the view 832, which has not yet been refined.
In other words, in this modification the anonymization unit 120 determines whether the required anonymity is satisfied immediately after the refinement stage for each single stage anonymized view.
This modification makes it possible to reduce the processing in S609 of FIG. 14 that returns the second-stage stage anonymized views generated by the refinement stage to the first-stage stage anonymized views.
<<< Third Modification of First Embodiment >>>
The anonymization unit 120 executes the refinement stages of the stage anonymized views in sequence, based on a priority associated with each of the acquired views to be refined (for example, the view 821 and the view 822). For example, the anonymization unit 120 executes the refinement stage preferentially on stage anonymized views with a relatively high priority.
For example, the anonymization unit 120 accepts the priority entered by an operator via the input unit 704 shown in FIG. 6.
The anonymization unit 120 may instead read the priority stored in advance in the storage unit 702 or the storage device 703 shown in FIG. 6. The anonymization unit 120 may also receive the priority from a device (not shown) via the communication unit 706 shown in FIG. 6, or read the priority recorded on the recording medium 707 via the storage device 703 shown in FIG. 6.
Specifically, the anonymization unit 120 executes the refinement stage in order, starting from the stage anonymized view that corresponds to the acquired view with the relatively highest priority.
The anonymization unit 120 may also calculate the priority from the combination of the patterns 804. For example, based on the rate at which each quasi-identifier appears across the patterns 804, the anonymization unit 120 assigns a higher priority to a pattern 804 whose quasi-identifiers have a lower rate.
Specifically, suppose that three patterns 804 are input: {QI_1, SA}, {QI_1, QI_2, SA}, and {QI_3, SA}. Here, “QI_1” is included in two patterns 804 ({QI_1, SA} and {QI_1, QI_2, SA}), whereas “QI_2” is included in only one pattern 804 ({QI_1, QI_2, SA}) and “QI_3” in only one pattern 804 ({QI_3, SA}). Because “QI_2” and “QI_3” each appear in only one pattern 804, the anonymization unit 120 may judge them to be more important than “QI_1”. It may then set the priority of {QI_1, QI_2, SA} and {QI_3, SA}, which contain “QI_2” or “QI_3”, to the higher value “2” and the priority of {QI_1, SA} to the lower value “1”.
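The pattern-based priority in this example can be sketched as follows. The rule used here (a pattern gets priority 2 if it contains a quasi-identifier that appears in the fewest patterns, otherwise 1) is an assumption chosen to reproduce the example; the text does not fix a general formula, and the function name is hypothetical.

```python
from collections import Counter


def pattern_priorities(patterns, sensitive="SA"):
    """Assign priority 2 to patterns containing a rarest quasi-identifier,
    and priority 1 to the others (illustrative rule, not from the source)."""
    counts = Counter(
        attr for pattern in patterns for attr in pattern if attr != sensitive
    )
    rarest = min(counts.values())
    return [
        2 if any(counts[attr] == rarest for attr in pattern if attr != sensitive)
        else 1
        for pattern in patterns
    ]


patterns = [{"QI_1", "SA"}, {"QI_1", "QI_2", "SA"}, {"QI_3", "SA"}]
priorities = pattern_priorities(patterns)
print(priorities)  # [1, 2, 2]
```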
The anonymization unit 120 may also execute the refinement stages of the stage anonymized views in sequence based on both the priority and the overall information loss. For example, the anonymization unit 120 may execute the refinement stage preferentially on the stage anonymized view for which the product of its priority and its overall information loss is larger.
For example, when the priority of the first acquired view is “2” and the priority of the second acquired view is “1”, the anonymization unit 120 compares twice the information loss of the first acquired view with one times the information loss of the second acquired view. The anonymization unit 120 then executes the refinement stage preferentially on the stage anonymized view corresponding to the acquired view with the larger value.
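A minimal sketch of this weighted ordering is shown below. The dictionary layout and the loss values are hypothetical; only the priorities “2” and “1” and the rule of comparing priority times information loss come from the text.

```python
def refinement_order(views):
    """Sort views so that the largest priority * information loss comes first."""
    return sorted(views, key=lambda v: v["priority"] * v["loss"], reverse=True)


# Hypothetical information loss values for two acquired views.
views = [
    {"name": "second", "priority": 1, "loss": 15.0},
    {"name": "first", "priority": 2, "loss": 10.0},
]
order = [v["name"] for v in refinement_order(views)]
print(order)  # ['first', 'second'] because 2 * 10.0 > 1 * 15.0
```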
First, this modification has the effect that the information loss of a specific view 850 can be made relatively small based on the priority.
Second, this modification has the effect that, by calculating the priority from the combination of the patterns 804, the information loss of a specific view 850 can be made relatively small in accordance with the quasi-identifiers contained in the acquired views.
Third, this modification has the effect that the information loss of a specific view 850 can be made relatively small based on the value calculated from the priority and the information loss.
<<< Second Embodiment >>>
Next, a second embodiment of the present invention will be described in detail with reference to the drawings. In the following, content that overlaps with the preceding description is omitted to the extent that the description of the present embodiment does not become unclear.
FIG. 17 is a block diagram showing the configuration of the anonymization device 200 according to the second embodiment of the present invention.
As shown in FIG. 17, the anonymization device 200 of the present embodiment differs from the anonymization device 100 of the first embodiment in that it further includes a similarity calculation unit 230. The anonymization device 200 also includes an anonymization unit 220 in place of the anonymization unit 120.
=== Similarity Calculation Unit 230 ===
The similarity calculation unit 230 extracts the kinds of sensitive attribute values contained in each equivalence class. Here, the equivalence classes are those that result when the stage anonymized view before execution of a given refinement stage is divided at each of a plurality of division point candidates. Hereinafter, the extracted combination of kinds of sensitive attribute values is called an “SA combination”.
When any of the stage anonymized views has no division point candidate, the similarity calculation unit 230 may extract the sensitive attribute values only for the stage anonymized views that do have division point candidates.
Next, based on the SA combinations, the similarity calculation unit 230 calculates a similarity for each of the division point candidates. This similarity is the similarity between the stage anonymized views that would result if the refinement stage were executed at that division point candidate.
For example, the similarity calculation unit 230 calculates the similarity based on the edit distance between the SA combinations of the stage anonymized views for each record of the personal information 810.
Here, the edit distance between SA combinations is a measure that, for two SA combinations each containing an arbitrary number of sensitive attribute values, expresses the distance as the number of edits (deletions and additions) needed to turn one SA combination into the other.
For example, editing the SA combination “{A, B, C}” into the SA combination “{B, C, D}” requires deleting “A” and adding “D”, so the edit distance is “2”.
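Because only deletions and additions are counted, this edit distance equals the size of the symmetric difference of the two SA combinations when they are treated as sets. The following sketch assumes that set representation; the function name is hypothetical.

```python
def sa_edit_distance(a, b):
    """Number of deletions plus additions needed to turn SA combination a into b."""
    return len(set(a) ^ set(b))


distance = sa_edit_distance({"A", "B", "C"}, {"B", "C", "D"})
print(distance)  # 2: delete "A", add "D"
```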
Specifically, first, for each stage anonymized view, the similarity calculation unit 230 extracts as division point candidates those division points at which the stage anonymized view still satisfies the required l-diversity after the refinement stage is executed.
FIG. 18 shows the division point candidates of the view 831 extracted by the similarity calculation unit 230. FIG. 19 shows the division point candidates of the view 832 extracted by the similarity calculation unit 230.
Second, the similarity calculation unit 230 extracts the SA combination for each division point candidate of each stage anonymized view.
FIG. 20 shows the SA combinations corresponding to the division point candidates “a1”, “a2”, “a3”, and “a4” of the view 831 and the division point candidates “b1”, “b2”, “b3”, and “b4” of the view 832.
Third, the similarity calculation unit 230 calculates the similarity between the stage anonymized views (here, the view 831 and the view 832) for each combination of their division point candidates (hereinafter called a “candidate combination”).
For example, for the candidate combination of “a1” and “b1”, the SA combinations match in the record of every ID, so every edit distance is “0”, their sum is “0”, and the similarity is “0”.
For example, for the candidate combination of “a1” and “b2”, in the record whose ID is “user3” the SA combination corresponding to the view 831 is “{A, B, C}” and the SA combination corresponding to the view 832 is “{A, B}”, so the edit distance is “1”. In the records of the other IDs the SA combinations match, so the edit distance is “0”. The sum is “1”, and the similarity is therefore “-1”. Because a smaller edit distance means a greater similarity, the similarity is defined as the sum of the edit distances with its sign inverted.
In the same way, the similarity calculation unit 230 calculates the similarity for every candidate combination. In this case, the candidate combinations “a1” and “b1”, “a2” and “b2”, “a3” and “b3”, and “a4” and “b4” have the highest similarity.
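The similarity calculation over candidate combinations can be sketched as below. The data layout (candidate name, then record ID, then SA combination) and the sample records are assumptions for illustration; only the “user3” values for “a1” and “b2” are taken from the text, and the contents of FIG. 20 are not reproduced.

```python
from itertools import product


def similarity(sa_by_record_x, sa_by_record_y):
    """Negated sum of per-record edit distances between two SA combinations."""
    return -sum(
        len(sa_by_record_x[rec] ^ sa_by_record_y[rec]) for rec in sa_by_record_x
    )


# Hypothetical SA combinations per division point candidate and record.
view_831 = {"a1": {"user1": {"A", "B"}, "user3": {"A", "B", "C"}}}
view_832 = {
    "b1": {"user1": {"A", "B"}, "user3": {"A", "B", "C"}},
    "b2": {"user1": {"A", "B"}, "user3": {"A", "B"}},
}

scores = {
    (a, b): similarity(view_831[a], view_832[b])
    for a, b in product(view_831, view_832)
}
best = [combo for combo, s in scores.items() if s == max(scores.values())]
print(scores)  # {('a1', 'b1'): 0, ('a1', 'b2'): -1}
print(best)    # [('a1', 'b1')]
```

Collecting every combination that reaches the maximum keeps ties visible, which matters because the text describes a separate rule for choosing among tied combinations.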
The similarity calculation unit 230 may also base the similarity on the number of elements in the intersection of the SA combinations of the stage anonymized views for each record of the personal information 810, judging the similarity to be greater as that number is larger.
=== Anonymization Unit 220 ===
The anonymization unit 220 of the present embodiment determines the division points used to divide the equivalence classes in the refinement stage, based on the similarity calculated by the similarity calculation unit 230.
Specifically, the anonymization unit 220 determines the division point candidates contained in the candidate combination with the highest similarity as the division points. When several candidate combinations share the highest similarity, the anonymization unit 220 selects, for example, the candidate combination containing the division point candidates close to the average value. Alternatively, in that case the anonymization unit 220 may select the candidate combination containing the division point candidates close to the median.
For example, referring to the view 821 in FIG. 3, the average value of “QI_1” is “13”, so the division point candidates close to it are “a2” and “a3”. Referring to the view 822 in FIG. 4, the average value of “QI_2” is “22”, so the division point candidate close to it is “b3”. Here, therefore, the anonymization unit 220 selects (determines) “a3” and “b3”.
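One possible way to implement this tie-break is sketched below. The numeric positions of the division point candidates are hypothetical, since the text gives only the averages “13” and “22”, and summing the per-view distances to the average is an assumed way of combining them.

```python
def closest_to_mean(tied_combinations, positions, view_means):
    """Pick the tied candidate combination whose division points lie
    closest, in total, to the average of each view's quasi-identifier."""
    def total_distance(combo):
        return sum(
            abs(positions[candidate] - view_means[i])
            for i, candidate in enumerate(combo)
        )
    return min(tied_combinations, key=total_distance)


# Hypothetical positions of the division point candidates on each axis.
positions = {"a1": 5, "a2": 12, "a3": 14, "a4": 20,
             "b1": 10, "b2": 18, "b3": 22, "b4": 30}
tied = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]

chosen = closest_to_mean(tied, positions, view_means=(13, 22))
print(chosen)  # ('a3', 'b3')
```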
Next, the anonymization unit 220 executes the refinement stage at the determined division points.
Next, the operation of the present embodiment will be described in detail with reference to the drawings.
FIG. 21 is a flowchart showing the operation of the present embodiment.
The view acquisition unit 110 acquires the personal information 810 and the patterns 804 (S601).
Next, the view acquisition unit 110 acquires the view 821 and the view 822 from the personal information 810 based on each of the patterns 804 (S602).
Next, the anonymization unit 220 generates the view 831 and the view 832, in which the quasi-identifiers contained in the view 821 and the view 822, respectively, are generalized to the greatest extent (S603).
Next, the similarity calculation unit 230 calculates the similarities for the stage anonymized views (the view 831 and the view 832) (S624).
Next, the anonymization unit 220 determines the division points based on the similarities (S625). For example, in the first execution of S625, the anonymization unit 220 determines the division point candidates “a3” and “b3” as the division points.
Next, the anonymization unit 220 executes the refinement stage on each stage anonymized view at the determined division points (S606).
Next, the anonymization unit 220 determines whether the l-diversity of multiple views is satisfied when the stage anonymized views that have undergone the refinement stage are matched against each other and analyzed (S607).
FIG. 22 illustrates the first execution of S606 and S607. In the example shown in FIG. 22, in S606 the anonymization unit 220 executes the refinement stage on the view 831 and the view 832 at the division points “a3” and “b3” and generates the view 871 and the view 872. Next, the anonymization unit 220 extracts, for the record corresponding to each individual, the sensitive attribute values that can be inferred when the view 871 and the view 872 are matched.
Referring to FIG. 22, the confirmation content 873 shows, for each individual, the values of the sensitive attribute “SA” that can be inferred from the view 871, the values of “SA” that can be inferred from the view 872, and the intersection of the two. The intersection is the set of “SA” values that can be inferred when the values inferable from the view 871 are matched against the values inferable from the view 872. In FIG. 22, in every record the number of sensitive attribute values that can be inferred by this matching is two or more. The anonymization unit 220 therefore determines that “l-diversity of multiple views” with l = 2 is satisfied.
Next, when the l-diversity of multiple views is satisfied (YES in S607), the anonymization unit 220 determines whether a division point candidate exists in any of the stage anonymized views (S608). When a division point candidate exists (YES in S608), the processing returns to S624.
Next, the similarity calculation unit 230 calculates the similarities for the stage anonymized views (the view 841 and the view 842) (S624).
Next, the anonymization unit 220 determines the division points based on the similarities (S625).
Next, the anonymization unit 220 executes the refinement stage on each stage anonymized view at the determined division points (S606).
Next, the anonymization unit 220 determines whether the l-diversity of multiple views is satisfied when the stage anonymized views that have undergone the refinement stage are matched against each other and analyzed (S607).
FIG. 23 illustrates the second execution of S606 and S607. In the example shown in FIG. 23, in S606 the anonymization unit 220 executes the refinement stage on the view 871 and the view 872 at the division points determined in the second execution of S625 and generates the view 881 and the view 882. Next, the anonymization unit 220 extracts, for the record corresponding to each individual, the sensitive attribute values that can be inferred when the view 881 and the view 882 are matched.
Referring to FIG. 23, the confirmation content 883 shows, for each individual, the values of the sensitive attribute “SA” that can be inferred from the view 881, the values of “SA” that can be inferred from the view 882, and the intersection of the two. The intersection is the set of “SA” values that can be inferred when the values inferable from the view 881 are matched against the values inferable from the view 882. In FIG. 23, in every record the number of sensitive attribute values that can be inferred by this matching is two or more. The anonymization unit 220 therefore determines that “l-diversity of multiple views” with l = 2 is satisfied.
Next, when the l-diversity of multiple views is satisfied (YES in S607), the anonymization unit 220 determines whether a division point candidate exists in any of the stage anonymized views (S608).
Next, when no division point candidate exists (NO in S608), the anonymization unit 220 outputs the stage anonymized views that have undergone the refinement stage (the second-stage stage anonymized views) as the anonymized views (S610). Here, the anonymization unit 220 outputs the view 881 and the view 882.
When the l-diversity of multiple views is not satisfied (NO in S607), the anonymization unit 220 returns the stage anonymized views that have undergone the refinement stage to the stage anonymized views before that refinement stage (the first-stage stage anonymized views) and treats them as the anonymized views. The anonymization unit 220 then outputs the anonymized views (S609).
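The control flow of S624 through S610 can be summarized in a short sketch. The four callables are hypothetical stand-ins for the similarity calculation unit 230 and the anonymization unit 220; their names and signatures are assumptions, and the toy usage treats a "view set" as a simple refinement depth.

```python
def anonymize(views, pick_division_points, refine, is_l_diverse, has_candidates):
    """Repeat refinement until l-diversity breaks or no candidates remain."""
    current = views                                  # most generalized views (S603)
    while True:
        points = pick_division_points(current)       # S624, S625
        refined = refine(current, points)            # S606
        if not is_l_diverse(refined):                # NO in S607
            return current                           # revert and output (S609)
        if not has_candidates(refined):              # NO in S608
            return refined                           # output refined views (S610)
        current = refined                            # YES in S608: back to S624


# Toy usage: a view set is modelled as an integer refinement depth.
output = anonymize(
    views=0,
    pick_division_points=lambda v: None,
    refine=lambda v, _points: v + 1,
    is_l_diverse=lambda v: v <= 2,      # the third refinement breaks l-diversity
    has_candidates=lambda v: v < 5,
)
print(output)  # 2: the last depth that still satisfied l-diversity
```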
The anonymized views that the anonymization device 200 of the present embodiment outputs based on the personal information 810 and the patterns 804 have a smaller information loss than the anonymized views output by the anonymization device 100 of the first embodiment. Specifically, the view 881 and the view 882 have each been divided one more time than the view 841 and the view 842, respectively.
The reason for this difference is as follows.
Regarding the candidate sensitive attribute values (SA combinations) that can be inferred for a particular individual, the l-diversity of multiple views decreases as the intersection of the SA combinations obtained by matching the stage anonymized views becomes smaller.
In the anonymization device 100 of the first embodiment, the division point used in a refinement stage is not necessarily the optimal one, and may be a division point that makes the intersection of the SA combinations relatively small when the stage anonymized views are matched. As a result, the anonymization device 100 may fail to satisfy the l-diversity of multiple views partway through the repeated refinement stages. Consequently, the anonymization device 100 performs fewer divisions than the anonymization device 200 and incurs a relatively larger information loss.
The anonymization device 200 of the present embodiment executes the refinement stage so that, for each individual, the SA combinations that can be inferred from the stage anonymized views are similar. Specifically, the anonymization device 200 divides at division points for which, in the record corresponding to each individual, the SA combinations of the equivalence classes are similar across the stage anonymized views. The anonymization device 200 can therefore execute more refinement stages. As a result, the anonymization device 200 can output anonymized views that have a smaller information loss and satisfy the required anonymity.
FIG. 24 shows a view 893 obtained by anonymizing the entire personal information 810 as a single acquired view using the related technique. As shown in FIG. 24, the view 893 is the personal information 810 divided into two. The information losses of “QI_1” and “QI_2” in the view 893 are “25” and “49”, respectively.
In contrast, the information losses of “QI_1” of the view 841 and “QI_2” of the view 842, which are outputs of the present embodiment, are “17” and “35”, respectively. That is, anonymization according to the present embodiment can achieve a smaller information loss than anonymization according to the related technique.
In addition to the effects of the first embodiment, the present embodiment described above makes it possible to further reduce the information loss of the output anonymized views.
The reason is that the anonymization unit 220 determines the division points based on the similarity of the SA combinations of the equivalence classes calculated by the similarity calculation unit 230.
<<< Modification of Second Embodiment >>>
When calculating the similarity, the similarity calculation unit 230 calculates it only for the division point candidates adjacent to the point of the average value of the quasi-identifier of the equivalence class to be divided. Alternatively, the similarity calculation unit 230 may calculate the similarity for the division point candidates adjacent to the point of the median of the quasi-identifier of the equivalence class to be divided.
The number of adjacent division point candidates may be a preset number. It may also be determined based on, for example, the number of acquired views that have been set (the number of patterns 804).
For example, suppose the reference is that, when there are two acquired views, the number of adjacent division point candidates in each of the views 850 corresponding to those acquired views is 10. In this case, when there are four acquired views, the similarity calculation unit 230 sets the number of adjacent division point candidates to 5. That is, because the number of acquired views has doubled, the similarity calculation unit 230 multiplies the number of division point candidates by 0.5 (the reciprocal of 2).
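The inverse scaling in this example can be written as a small helper. The function name, the rounding, and the lower bound of one candidate are assumptions; the text specifies only the reference point (2 views, 10 candidates) and the result for 4 views.

```python
def adjacent_candidate_count(n_views, base_views=2, base_count=10):
    """Scale the number of adjacent division point candidates inversely
    with the number of acquired views, keeping at least one candidate."""
    return max(1, round(base_count * base_views / n_views))


count_for_four = adjacent_candidate_count(4)
print(count_for_four)  # 5
```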
First, this modification has the effect of reducing the processing load on the anonymization device 200.
Second, this modification has the effect that the processing load can be controlled based on the number of acquired views. In other words, this modification can counteract the tendency for the amount of computation to grow as the number of acquired views increases.
Modifications equivalent to the first to third modifications of the first embodiment may also be applied to the second embodiment.
<<< Third Embodiment >>>
Next, a third embodiment of the present invention will be described in detail with reference to the drawings. In the following, content that overlaps with the preceding description is omitted to the extent that the description of the present embodiment does not become unclear.
FIG. 25 is a block diagram showing the configuration of the anonymization device 300 according to the third embodiment of the present invention.
As shown in FIG. 25, the anonymization device 300 of the present embodiment differs from the anonymization device 100 of the first embodiment in that it includes a view acquisition unit 310 in place of the view acquisition unit 110.
=== View Acquisition Unit 310 ===
The view acquisition unit 310 acquires acquired views based on the mutual relationships among the attributes contained in given personal information (for example, the personal information 810 shown in FIG. 2). For example, the view acquisition unit 310 acquires an acquired view that contains attributes strongly correlated with each other.
The acquisition of an acquired view containing strongly correlated attributes is explained with a specific example.
Suppose the personal information includes four attributes: "QI_1", "QI_2", "QI_3", and "SA".
Suppose the correlations between these attributes are as follows: the correlation coefficient between "QI_1" and "SA" is 0.8, that between "QI_2" and "SA" is 0.1, and that between "QI_3" and "SA" is 0.7.
Suppose also that the decision threshold is 0.5.
In this case, the view acquisition unit 310 acquires acquired views using the strong correlation patterns "QI_1, SA" and "QI_3, SA". The structure of each strong correlation pattern is the same as that of the pattern 804 shown in FIG. 5.
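The threshold comparison in this example can be sketched in Python as follows. This is only an illustrative sketch: the function name `strong_correlation_patterns`, the dictionary layout, and the choice to compare the absolute value of the coefficient using a strict `>` are assumptions, since the text gives only the coefficients and the threshold.

```python
def strong_correlation_patterns(correlations, threshold):
    """Return the attribute pairs whose correlation exceeds the threshold.

    Assumption: the absolute value is compared with a strict '>'.
    The text does not say how negative or boundary values are handled.
    """
    return [pair for pair, r in correlations.items() if abs(r) > threshold]


# Correlation coefficients from the example in the text.
correlations = {
    ("QI_1", "SA"): 0.8,
    ("QI_2", "SA"): 0.1,
    ("QI_3", "SA"): 0.7,
}

patterns = strong_correlation_patterns(correlations, threshold=0.5)
print(patterns)
```

With these values the pairs for QI_1 and QI_3 pass the threshold and the pair for QI_2 does not, which matches the two strong correlation patterns named in the text.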
The view acquisition unit 310 may also determine strong correlation patterns using association rule mining. Association rule mining can assess the strength of the correlation among multiple attributes.
For example, suppose that searching for association rules with a support of 10% and a confidence of 80% or more yields two rules: "(QI_1 = a) → (SA = X)" and "(QI_1 = b, QI_3 = c) → (SA = Y)".
In this case, the view acquisition unit 310 acquires acquired views using "QI_1, SA" and "QI_1, QI_3, SA" as strong correlation patterns.
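How the two rules map onto strong correlation patterns can be illustrated with a short Python sketch. The rule representation (a pair of dictionaries for antecedent and consequent) and the helper name `pattern_from_rule` are hypothetical. The text does not specify how rules are stored, and the rule mining itself is not shown here.

```python
def pattern_from_rule(antecedent, consequent):
    """Collect the attribute names that appear on either side of a rule."""
    # frozenset(dict) takes the dictionary keys, i.e. the attribute names.
    return frozenset(antecedent) | frozenset(consequent)


# The two association rules from the example, as (antecedent, consequent).
rules = [
    ({"QI_1": "a"}, {"SA": "X"}),
    ({"QI_1": "b", "QI_3": "c"}, {"SA": "Y"}),
]

patterns = {pattern_from_rule(a, c) for a, c in rules}
for pattern in patterns:
    print(sorted(pattern))
```

Only the attribute names are kept. The attribute values in the rules (a, b, c, X, Y) do not appear in the resulting patterns.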
In addition to the effects of the first embodiment, this embodiment has the effect that a person does not need to determine the pattern 804 in advance.
This is because the view acquisition unit 310 acquires the acquired views based on the mutual relationships among the attributes included in the personal information.
<<< First Modification of Third Embodiment >>>
The view acquisition unit 310 acquires new acquired views based on the result of learning from previously acquired views. For example, the view acquisition unit 310 stores the combinations of patterns 804, such as the one shown in FIG. 5, that were input in the past, and learns the correlations between the strong correlation patterns contained in those combinations. Then, based on the learning result, the view acquisition unit 310 acquires an acquired view using a strong correlation pattern that is strongly correlated with a newly input pattern 804.
The acquisition of an acquired view using a strongly correlated strong correlation pattern is explained with a specific example.
Suppose the personal information includes six attributes: "QI_1", "QI_2", "QI_3", "QI_4", "QI_5", and "SA".
Suppose also that the following four combinations of patterns 804 are stored.
First combination of patterns 804: {QI_1, SA}, {QI_2, SA}.
Second combination of patterns 804: {QI_1, SA}, {QI_2, SA}, {QI_3, SA}.
Third combination of patterns 804: {QI_1, SA}, {QI_2, SA}, {QI_5, SA}.
Fourth combination of patterns 804: {QI_1, SA}, {QI_4, SA}.
The view acquisition unit 310 learns these combinations of patterns 804 by association rule mining and obtains the following learning results.
For "{QI_1, SA} → {QI_2, SA}", support = 100% and confidence = 75%. Here, "support" is the proportion of all stored combinations of patterns 804 that contain {QI_1, SA}, and "confidence" is the proportion of the combinations containing {QI_1, SA} that also contain {QI_2, SA}.
For "{QI_2, SA} → {QI_1, SA}", support = 75% and confidence = 100%.
When {QI_1, SA} is newly input as a fifth pattern 804, the view acquisition unit 310 also acquires the acquired view for {QI_2, SA}, based on the learning result.
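The support and confidence figures in this example can be reproduced with a small Python sketch. It follows the definitions given in the text, where support is the share of stored combinations that contain the antecedent. This differs from the more common definition of support as the share containing both sides of the rule. The names `rule_stats` and `suggest` and the confidence threshold of 0.7 are assumptions made for illustration.

```python
def p(*attrs):
    """Build a strong correlation pattern as an immutable set of attributes."""
    return frozenset(attrs)


def rule_stats(stored, antecedent, consequent):
    """Return (support, confidence) of antecedent -> consequent,
    using the definitions given in the text."""
    with_antecedent = [combo for combo in stored if antecedent in combo]
    if not with_antecedent:
        return 0.0, 0.0
    support = len(with_antecedent) / len(stored)
    hits = sum(consequent in combo for combo in with_antecedent)
    return support, hits / len(with_antecedent)


def suggest(stored, new_pattern, min_confidence):
    """Patterns to acquire together with new_pattern (threshold is assumed)."""
    candidates = set().union(*stored) - {new_pattern}
    return {c for c in candidates
            if rule_stats(stored, new_pattern, c)[1] >= min_confidence}


# The four stored combinations of patterns 804 from the example.
stored = [
    {p("QI_1", "SA"), p("QI_2", "SA")},
    {p("QI_1", "SA"), p("QI_2", "SA"), p("QI_3", "SA")},
    {p("QI_1", "SA"), p("QI_2", "SA"), p("QI_5", "SA")},
    {p("QI_1", "SA"), p("QI_4", "SA")},
]

forward = rule_stats(stored, p("QI_1", "SA"), p("QI_2", "SA"))
backward = rule_stats(stored, p("QI_2", "SA"), p("QI_1", "SA"))
extra = suggest(stored, p("QI_1", "SA"), min_confidence=0.7)
print(forward, backward, extra)
```

Under the assumed threshold, only {QI_2, SA} is suggested for the input {QI_1, SA}. The other stored patterns each appear in just one of the four combinations.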
The view acquisition unit 310 may also learn the stored patterns 804 by another method, such as classification tree learning.
The view acquisition unit 310 may also acquire acquired views based on the learning result without any new pattern 804 being input. In this case, the view acquisition unit 310 may determine the strong correlation patterns based on decision thresholds for support and confidence.
As described above, by learning the patterns 804 input by multiple people, for example, the anonymization device 300 can generate acquired views that are likely to be used together with a particular acquired view. In other words, it enables anonymization suited to analyses with the "patterns" that many people commonly use. This is because the acquired views are acquired based on the result of learning the stored patterns 804.
Anonymization can also be performed without a person determining the strong correlation patterns. This is because the view acquisition unit 310 acquires the acquired views based on the learning result, without a new pattern 804 being input.
<<< Second Modification of Third Embodiment >>>
The view acquisition unit 310 acquires new acquired views that additionally include attributes not contained in an already acquired view. Specifically, the view acquisition unit 310 further acquires acquired views in which each attribute not included in the acquired view is added, one at a time, to that acquired view.
The acquisition of acquired views with added attributes is explained with a specific example.
Suppose the personal information includes six attributes: "QI_1", "QI_2", "QI_3", "QI_4", "QI_5", and "SA".
Suppose that the pattern 804 shown in FIG. 5 is input.
The view acquisition unit 310 acquires acquired views using {QI_1, SA} and {QI_2, SA}, the strong correlation patterns contained in the combination of patterns 804. In addition, the view acquisition unit 310 acquires acquired views using strong correlation patterns obtained by adding one attribute to each of those strong correlation patterns. The resulting strong correlation patterns are {QI_1, QI_2, SA}, {QI_1, QI_3, SA}, {QI_1, QI_4, SA}, {QI_1, QI_5, SA}, {QI_2, QI_3, SA}, {QI_2, QI_4, SA}, and {QI_2, QI_5, SA}. Here, the added attributes are all of the attributes not included in the strong correlation pattern in question. Alternatively, the added attribute may be an arbitrary attribute not included in that strong correlation pattern.
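The one-attribute expansion in this example can be sketched in Python as follows. The function name `expand_patterns` and the use of frozensets are illustrative assumptions. The sketch covers only the case in which every missing attribute is added, not the variant with arbitrarily chosen attributes.

```python
def expand_patterns(patterns, all_attributes):
    """Add each attribute missing from a pattern to it, one at a time.

    Using a set of frozensets removes duplicates such as
    {QI_1, QI_2, SA}, which is reachable from both input patterns.
    """
    expanded = set()
    for pattern in patterns:
        for attr in all_attributes:
            if attr not in pattern:
                expanded.add(frozenset(pattern | {attr}))
    return expanded


all_attributes = ["QI_1", "QI_2", "QI_3", "QI_4", "QI_5", "SA"]
input_patterns = [frozenset({"QI_1", "SA"}), frozenset({"QI_2", "SA"})]

expanded = expand_patterns(input_patterns, all_attributes)
for pattern in sorted(sorted(p) for p in expanded):
    print(pattern)
```

Each input pattern yields four expansions, and one of them is shared, so the result contains the seven patterns listed in the text.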
As described above, the anonymization device 300 can raise the accuracy of analyses that use patterns other than the input strong correlation patterns. In other words, when an analyst may need to analyze with a strong correlation pattern other than those initially input, the analyst can obtain anonymized views that satisfy multi-view l-diversity for attributes beyond those included in the input strong correlation patterns. This is because the view acquisition unit 310 generates acquired views in which attributes are added to the input strong correlation patterns.
The components described in the above embodiments do not necessarily have to be independent of one another. For example, multiple components may be implemented as a single module, or a single component may be implemented by multiple modules. One component may be part of another component, and part of one component may overlap with part of another.
The components of the embodiments described above, and the modules that implement them, may be implemented in hardware where necessary and possible. They may also be implemented by a computer and a program, or by a combination of hardware modules and a computer and program.
The program is provided recorded on a non-volatile computer-readable recording medium, such as a magnetic disk or semiconductor memory, and is read by the computer, for example when the computer starts up. The program thus read controls the operation of the computer and thereby causes the computer to function as the components of the embodiments described above.
In the embodiments described above, multiple operations are described in sequence in the form of flowcharts, but the order of description does not limit the order in which the operations are executed. When implementing an embodiment, the order of the operations can therefore be changed to the extent that the change does not affect the substance.
Furthermore, in the embodiments described above, the operations are not limited to being executed at mutually different times. For example, another operation may occur while one operation is being executed, or the execution timing of one operation may partially or entirely overlap that of another.
Furthermore, although the embodiments described above state that one operation triggers another, that description does not limit all the relationships between the operations. When implementing an embodiment, the relationships among the operations can therefore be changed to the extent that the change does not affect the substance. Likewise, the specific description of each operation of each component does not limit that operation. The specific operations of each component may be changed to the extent that the change does not impair the functions, performance, or other characteristics in implementing the embodiment.
The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that a person skilled in the art can understand may be made to the configuration and details of the present invention within its scope.
This application claims priority based on Japanese Patent Application No. 2013-148137, filed on July 17, 2013, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
 100  Anonymization device
 110  View acquisition unit
 120  Anonymization unit
 200  Anonymization device
 220  Anonymization unit
 230  Similarity calculation unit
 300  Anonymization device
 310  View acquisition unit
 700  Computer
 701  CPU
 702  Storage unit
 703  Storage device
 704  Input unit
 705  Output unit
 706  Communication unit
 707  Recording medium
 804  Pattern
 810  Personal information
 815  Identifier
 816  Quasi-identifier
 817  Quasi-identifier
 818  Sensitive attribute
 821  View
 822  View
 831  View
 832  View
 841  View
 842  View
 851  View
 852  View
 871  View
 872  View
 873  Confirmation content
 881  View
 882  View
 883  Confirmation content
 893  View
 910  Personal information
 915  Identifier
 916  Quasi-identifier
 917  Quasi-identifier
 918  Sensitive attribute
 920  Anonymized information
 926  Quasi-identifier
 927  Quasi-identifier
 928  Sensitive attribute
 921  Anonymized information
 931  Original view
 932  Most generalized view
 933  Split view
 934  Split view
 935  Split view
 941  Personal information
 942  First view
 943  Second view
 944  First anonymized view
 945  Second anonymized view
 951  Attribute value
 952  Attribute value
 953  Intersection

Claims (13)

  1.  An information processing device comprising:
     view acquisition means for acquiring a plurality of views of personal information that includes a plurality of attributes of each of a plurality of individuals; and
     anonymization means for outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.
  2.  The information processing device according to claim 1, wherein the anonymization means executes the anonymization such that any combination of the anonymized views satisfies a required anonymity.
  3.  The information processing device according to claim 1 or 2, wherein, when refining the quasi-identifiers included in each of the views step by step from the state in which the quasi-identifiers are most generalized, the anonymization means sequentially executes one step of the refinement in each of the views.
  4.  The information processing device according to claim 3, wherein the anonymization means executes the next step when any combination of the views on which the step has been executed satisfies the required anonymity, and outputs the anonymized views when any combination of the views on which the step has been executed does not satisfy the required anonymity.
  5.  The information processing device according to claim 3 or 4, further comprising similarity calculation means for calculating a similarity between the views corresponding to each of the split point candidates of the views at the time of the refinement, based on the combination of attribute values of the sensitive attribute included in each equivalence class when the view is split at that split point candidate,
     wherein the anonymization means executes the refinement based on the similarity.
  6.  The information processing device according to claim 5, wherein, when calculating the similarity, the similarity calculation means calculates the similarity only for the split point candidates adjacent to either the mean or the median of the quasi-identifier of the equivalence class to be split.
  7.  The information processing device according to any one of claims 1 to 6, wherein the anonymization means sequentially executes one step of the refinement in each of the views based on a priority corresponding to each of the views.
  8.  The information processing device according to claim 7, wherein the anonymization means calculates the priority based on the combination of quasi-identifiers included in each of the views.
  9.  The information processing device according to any one of claims 1 to 8, wherein the view acquisition means acquires the views based on the mutual relationships among the attributes included in the personal information.
  10.  The information processing device according to any one of claims 1 to 9, wherein the view acquisition means acquires a new view based on a result of learning from the already acquired views.
  11.  The information processing device according to any one of claims 1 to 10, wherein the view acquisition means acquires a new view that additionally includes, in an acquired view, an attribute not included in that acquired view.
  12.  An anonymization method in which a computer:
     acquires a plurality of views of personal information that includes a plurality of attributes of each of a plurality of individuals; and
     outputs anonymized views obtained by executing anonymization of the plurality of views in parallel.
  13.  A non-transitory computer-readable recording medium storing a program that causes a computer to execute processing of:
     acquiring a plurality of views of personal information that includes a plurality of attributes of each of a plurality of individuals; and
     outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.
PCT/JP2014/003732 2013-07-17 2014-07-15 Information processing device that performs anonymization, and anonymization method WO2015008480A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013148137 2013-07-17
JP2013-148137 2013-07-17

Publications (1)

Publication Number Publication Date
WO2015008480A1 true WO2015008480A1 (en) 2015-01-22

Family

ID=52345956

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/003732 WO2015008480A1 (en) 2013-07-17 2014-07-15 Information processing device that performs anonymization, and anonymization method

Country Status (1)

Country Link
WO (1) WO2015008480A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000293421A (en) * 1998-10-02 2000-10-20 Ncr Internatl Inc Device and method for data management with improved privacy protecting function
US20030220927A1 (en) * 2002-05-22 2003-11-27 Iverson Dane Steven System and method of de-identifying data
JP2012159982A (en) * 2011-01-31 2012-08-23 Kddi Corp Device for protecting privacy of public information, method for protecting privacy of public information, and program

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHAO YAO ET AL.: "Checking for k-Anonymity Violation by Views", PROCEEDINGS OF THE 31ST VLDB CONFERENCE, 2005, pages 1 - 2, Retrieved from the Internet <URL:http://www.emba.uvm.edu/~xwang4/publications/yao_wang_jajodia_vldb05.pdf> *
JUN'ICHI SAKURADA: "Bunsan Database ni Okeru Tokumeika Kano Hantei no Tameno Anzen de Koritsuteki na Protocol", THE 4TH FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT RONBUNSHU (DAI 10 KAI THE DATABASE SOCIETY OF JAPAN NENJI TAIKAI, 30 August 2012 (2012-08-30), pages 2 - 3 *
TSUBASA TAKAHASHI: "Shugochi no Ippanka Kaiso nashi Sai Fugoka ni yoru Fukugo Data no k-Tokumeika", THE 5TH FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT (DAI 11 KAI THE DATABASE SOCIETY OF JAPAN NENJI TAIKAI, 31 May 2013 (2013-05-31), pages 3 - 5 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14825921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14825921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP