WO2015008480A1

WO2015008480A1 - Information processing device that performs anonymization, and anonymization method

Info

Publication number: WO2015008480A1
Application number: PCT/JP2014/003732
Authority: WO
Inventors: 隆夫竹之内
Original assignee: 日本電気株式会社
Priority date: 2013-07-17
Filing date: 2014-07-15
Publication date: 2015-01-22

Abstract

Provided is an information processing device which performs anonymization so as to decrease fluctuation of information loss between multiple anonymized views. This information processing device is provided with a view acquiring means which acquires multiple views of personal information including multiple attributes of multiple individuals, and an anonymizing means which outputs the anonymized views obtained by performing anonymization of said multiple views in parallel.

Description

Information processing apparatus for performing anonymization and anonymization method

The present invention relates to anonymization technology, and more particularly to privacy protection technology in secondary use of personal information.

Personal information utilization technology and various related technologies are known.

For example, Patent Document 1 discloses a technique for managing multiple types of medical information with different subscriber identification information. The medical information management device disclosed in Patent Literature 1 determines a matching pattern for uniquely identifying a subscriber in the basic ledger. Then, the medical information management device manages various types of medical information related to the subscriber in association with each other according to the matching pattern. Further, the medical information management apparatus anonymizes and manages the symbols, numbers, kanji names, and kana names of health insurance subscribers.

However, such a medical information management apparatus has a problem that the privacy protection of the subscriber is insufficient. This is because it is possible to identify an individual by combining information such as date of birth, sex, medical history, etc. that has not been anonymized in the medical information management apparatus. As a result, information that the individual does not want to be known to others (attribute value of the sensitive attribute) may be known to other individuals. Information such as date of birth, sex, medical history, and the like is called a quasi-identifier. That is, the quasi-identifier is information that makes it possible to identify an individual by being combined. In other words, the information contained in the basic ledger must be sufficiently protected for privacy by anonymization.

FIG. 26 is a diagram illustrating an example of general personal information 910 that needs privacy protection and is an anonymization target.

In FIG. 26, personal information 910 has an ID as an identifier 915, a postal code as a quasi-identifier 916, an age as a quasi-identifier 917, and a disease name as a sensitive attribute 918. Note that the postal code, age, disease name, and the like constituting the personal information 910 are also generally called attributes.

As shown in FIG. 26, the personal information 910 is represented by a table composed of a plurality of records. Each record contains the attribute values of the attributes (that is, the postal code, age, and disease name) for one individual. The sensitive attribute 918 is information that should not be disclosed in association with a specific individual. Note that attributes such as the postal code, age, and disease name are treated as either a quasi-identifier or a sensitive attribute depending on the situation.

FIG. 27 shows anonymized information 920 corresponding to personal information 910. The anonymization information 920 does not include an ID. The attribute values of the quasi-identifier 926 and the quasi-identifier 927 of the anonymization information 920 are obtained by generalizing the attribute values of the quasi-identifier 916 and the quasi-identifier 917 of the personal information 910, respectively. The sensitive attribute 928 of the anonymized information 920 is the same as the sensitive attribute 918 of the personal information 910.

Techniques regarding anonymization of the personal information 910 as described above are described in Patent Document 2, Patent Document 3, Non-Patent Document 1, and Non-Patent Document 2.

Patent Document 2 discloses a technique for protecting the privacy of public information. Specifically, Patent Document 2 discloses a technique for performing top-down processing and determining k-anonymity and l-diversity for the result. Patent Document 2 discloses a technique for performing bottom-up processing and determining k-anonymity and l-diversity for the result. Here, the top-down process and the bottom-up process are repeatedly executed. Further, Patent Document 2 discloses a technique for performing partial anonymization processing and determining k-anonymity and l-diversity for the result.

Patent Document 3 discloses a technique for anonymizing important information (sensitive attribute) in addition to the quasi-identifier.

Non-Patent Document 1 discloses an example of an anonymization method. Specifically, Non-Patent Document 1 discloses an anonymization method based on a top-down approach using a Mondrian method.

FIG. 28 shows an example of l-diversification using the Mondrian method.

QID1 and QID2 shown in FIG. 28 are quasi-identifiers. Also, SA shown in FIG. 28 is a sensitive attribute. An original view 931 shown in FIG. 28 is a view of certain personal information (not shown) composed of original (unprocessed) attribute values. Here, the view is a view in the database technology, for example, a table obtained by extracting some attributes from the table as shown in FIG.

As shown in FIG. 28, l-diversification is executed by the following procedure using, for example, a certain anonymizing device (not shown).

First, the anonymization device generates the most generalized view 932 by most generalizing the quasi-identifier included in the original view 931 (processing it into an ambiguous value). In the generalized view 932, one group of records having the same combination of quasi-identifier values (attribute values) can be created. This group of records having the same combination of quasi-identifier values is also called an equivalence class.

Second, the anonymization device divides the equivalence class and refines the quasi-identifier in units of the divided equivalence classes. Specifically, the anonymization device splits the equivalence class by regrouping its records based on the attribute value of a specific quasi-identifier in the original view 931, with a specific value of that quasi-identifier as the boundary.

For example, the anonymization device selects the specific quasi-identifier sequentially from the left column of the view to be divided. In addition, the anonymization device sets an average value of attribute values of the specific quasi-identifier included in the equivalence class as the specific value.

Specifically, the anonymization device selects QID1 as the specific quasi-identifier for the generalized view 932 and determines the specific value as “120 (rounds off the decimal point)”. The anonymization device regroups the records included in the generalized view 932 based on the attribute value of QID1 in the original view 931, with this “120” as a boundary. Next, the anonymization device refines the quasi-identifier for each regrouped equivalence class and generates a split view 933.

The specific quasi-identifier may be the quasi-identifier having the largest value range. The specific value may be a median value of the attribute values of the specific quasi-identifier. There are a plurality of known techniques regarding the method for selecting the specific quasi-identifier and the method for determining the specific value, but the description of these techniques is omitted.

Thirdly, the anonymization device repeats the second process while the divided view generated in the second process described above satisfies a predetermined anonymity.

Specifically, FIG. 28 shows that, with 3-diversity as the predetermined anonymity, a split view 934 is generated from the split view 933 and a split view 935 is further generated from the split view 934. Here, 3-diversity means l-diversity with l = 3. Hereinafter, any expression in which the “3” of 3-diversity is replaced with another numerical value likewise means l-diversity in which “l” is that numerical value.

Non-Patent Document 2 discloses an example of an l-diversity verification method. Specifically, Non-Patent Document 2 discloses a method for verifying l-diversity when a plurality of anonymized views are matched against each other. Hereinafter, this l-diversity is referred to as “l-diversity of multiple views”. A view that has undergone anonymization is referred to as an “anonymized view”.
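The top-down procedure described above can be sketched roughly as follows. This is a minimal illustration and not the method of Non-Patent Document 1 itself: the function names (`mondrian`, `split_at_mean`, `generalize`), the cyclic left-to-right choice of quasi-identifier, and the sample records are all assumptions made for the example. Only the mean-value boundary and the 3-diversity requirement follow the text.

```python
from statistics import mean

def distinct_sensitive(records, sa="SA"):
    """Number of distinct sensitive-attribute values in a group of records."""
    return len({r[sa] for r in records})

def split_at_mean(records, qid):
    """Regroup records using the mean of quasi-identifier `qid` as the boundary."""
    boundary = mean(r[qid] for r in records)
    low = [r for r in records if r[qid] < boundary]
    high = [r for r in records if r[qid] >= boundary]
    return low, high

def mondrian(records, qids, l, depth=0):
    """Split an equivalence class for as long as every part stays l-diverse."""
    # Assumption: quasi-identifiers are tried from left to right, cyclically.
    for i in range(len(qids)):
        qid = qids[(depth + i) % len(qids)]
        low, high = split_at_mean(records, qid)
        if (low and high
                and distinct_sensitive(low) >= l
                and distinct_sensitive(high) >= l):
            return (mondrian(low, qids, l, depth + 1)
                    + mondrian(high, qids, l, depth + 1))
    return [records]  # no admissible split: this is a final equivalence class

def generalize(group, qids):
    """Replace each quasi-identifier value with the (min, max) range of its group."""
    ranges = {q: (min(r[q] for r in group), max(r[q] for r in group)) for q in qids}
    return [{**r, **ranges} for r in group]

# Hypothetical original view (not the data of FIG. 28); the mean of QID1 is 120.
records = [
    {"QID1": 100, "QID2": 10, "SA": "A"},
    {"QID1": 105, "QID2": 20, "SA": "B"},
    {"QID1": 110, "QID2": 30, "SA": "C"},
    {"QID1": 130, "QID2": 10, "SA": "A"},
    {"QID1": 135, "QID2": 20, "SA": "B"},
    {"QID1": 140, "QID2": 30, "SA": "C"},
]
groups = mondrian(records, ["QID1", "QID2"], l=3)
anonymized = [generalize(g, ["QID1", "QID2"]) for g in groups]
```

With these sample records the first split at QID1 = 120 is accepted, and no further split keeps both halves 3-diverse, so two equivalence classes remain.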

FIG. 29 is a diagram for explaining the l-diversity of a plurality of views.

In FIG. 29, personal information 941 includes “ID” as an identifier, “QI_1” and “QI_2” as quasi-identifiers, and “SA” as a sensitive attribute.

Assume that the personal information 941 is analyzed for the relationship between “QI_1” and “SA” as the first pattern and the relationship between “QI_2” and “SA” as the second pattern. The required anonymity in this case is 2-diversity.

The first view 942 is a view corresponding to the first pattern. The first anonymized view 944 is an anonymized view so that the first view 942 satisfies 2-diversity. The second view 943 is a view corresponding to the second pattern. The second anonymized view 945 is an anonymized view so that the second view 943 satisfies 2-diversity.

As shown in FIG. 29, both the first anonymized view 944 and the second anonymized view 945 satisfy 2-diversity.

Here, an attacker performs an attack to identify sensitive attributes. The attacker knows that “QI_1” of “user4” is “13” and “QI_2” is “21” as prerequisite knowledge.

The attacker can infer from the first anonymized view 944 that the value of “SA” of “user4” is “B” or “C” based on the premise knowledge. Further, the attacker can infer from the second anonymized view 945 that the value of “SA” of “user4” is “A” or “B” based on the premise knowledge. Therefore, the attacker can specify that the value of “SA” of “user4” is “B”. In other words, the l-diversity of the multiple views in this case is 1-diversity.

Non-Patent Document 2 describes the following algorithm as a method for verifying l-diversity of multiple views. First, the verification method acquires, from each anonymized view, the sensitive attribute values corresponding to the target ID. Second, the verification method obtains, for each ID, the product set (intersection) of the acquired sensitive attribute values. Third, the l-diversity for the target ID is determined based on whether or not the number of sensitive attribute values in the product set is equal to or greater than l (the “l” of the l-diversity to be verified).

FIG. 30 is a diagram for explaining an example in which the l-diversity of the two anonymized views 944 and 945 shown in FIG. 29 is verified.

In FIG. 30, the attribute value 951 is the attribute value of the sensitive attribute of each individual that can be inferred from the first anonymized view 944. The attribute value 952 is the attribute value of the sensitive attribute of each individual that can be inferred from the second anonymized view 945. The product set 953 is the product set of the attribute value 951 and the attribute value 952.

The number of sensitive attributes included in each product set 953 is the value of “l” of l-diversity of multiple views in each target ID.
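The three verification steps can be illustrated with a short sketch. The function names and the dictionary representation are assumptions made for this example. The only data used is the “user4” case stated in the text, where “B” or “C” can be inferred from the first anonymized view 944 and “A” or “B” from the second anonymized view 945.

```python
def multi_view_diversity(candidates_per_view):
    """For each ID, count the sensitive values left after intersecting all views.

    `candidates_per_view` is a non-empty list with one dict per anonymized view,
    mapping an ID to the set of sensitive values inferable from that view.
    """
    ids = set.intersection(*(set(view) for view in candidates_per_view))
    result = {}
    for target in ids:
        # Product set (intersection) of the candidates from every view.
        product = set.intersection(*(set(view[target]) for view in candidates_per_view))
        result[target] = len(product)
    return result

def satisfies_l_diversity(candidates_per_view, l):
    """True if every ID keeps at least l candidate sensitive values."""
    return all(n >= l for n in multi_view_diversity(candidates_per_view).values())

# Candidates for "user4" as described for FIG. 29.
view_944 = {"user4": {"B", "C"}}
view_945 = {"user4": {"A", "B"}}

diversity = multi_view_diversity([view_944, view_945])  # {"user4": 1}
ok = satisfies_l_diversity([view_944, view_945], l=2)   # False
```

Each view alone leaves two candidates for “user4”, but the intersection leaves only “B”, which is why the multiple views together provide only 1-diversity.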

JP 2012-098879 A
JP 2012-159982 A
JP 2013-084027 A

However, the technique described in the above-described prior art document has a problem in that the amount of information loss varies among a plurality of anonymized views. In other words, there is a problem that the information loss amount of a certain anonymized view is smaller than necessary, and the information loss amount of another anonymized view may be larger than a required limit.

The reason is that none of the anonymization techniques disclosed in Patent Documents 2 and 3 takes into account that a plurality of anonymized views, as shown in Non-Patent Document 2, are generated. Therefore, when a second anonymized view is generated after a first anonymized view, the amount of information loss in the second anonymized view may have to be sacrificed in order to satisfy l-diversity when those anonymized views are matched.

Specifically, the anonymization techniques disclosed in Patent Documents 2 and 3 in principle perform anonymization (referred to as preceding anonymization) so that the amount of information loss is minimized. Therefore, when anonymization (referred to as subsequent anonymization) is performed from another point of view (that is, with respect to a quasi-identifier different from the one used in the preceding anonymization), the anonymity of the anonymized view generated by the preceding anonymization must also be considered. That is, the subsequent anonymization is constrained by the result of the preceding anonymization.

An object of the present invention is to provide an information processing apparatus, an anonymization method, and a computer-readable non-transitory recording medium recording the program for solving the above-described problems.

An information processing apparatus according to an embodiment of the present invention includes view acquisition means for acquiring a plurality of views of personal information including a plurality of attributes of a plurality of individuals, and anonymizing means for outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.

In the anonymization method according to one embodiment of the present invention, a computer acquires a plurality of views of personal information including a plurality of attributes of a plurality of individuals, and outputs anonymized views obtained by executing anonymization of the plurality of views in parallel.

A computer-readable non-transitory recording medium according to one embodiment of the present invention records a program that causes a computer to execute a process of acquiring a plurality of views of personal information including a plurality of attributes of a plurality of individuals, and outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.

The present invention has an effect that it is possible to reduce variation in the amount of information loss between a plurality of anonymized views.

FIG. 1 is a block diagram showing the configuration of the anonymization apparatus according to the first embodiment of the present invention.
FIG. 2 shows an example of personal information in the first embodiment.
FIG. 3 shows an example of a view in the first embodiment.
FIG. 4 shows an example of a view in the first embodiment.
FIG. 5 shows an example of a pattern in the first embodiment.
FIG. 6 is a block diagram illustrating a hardware configuration of a computer that implements the anonymization device according to the first embodiment.
FIG. 7 is a flowchart illustrating the operation of the anonymization device according to the first embodiment.
FIG. 8 is a diagram illustrating an example of maximum generalized view generation in the first embodiment.
FIG. 9 is a diagram for explaining an example of the refinement stage in the first embodiment.
FIG. 10 is a diagram for explaining an example of the refinement stage in the first embodiment.
FIG. 11 shows an example of a view in the related art.
FIG. 12 shows an example of a view in the related art.
FIG. 13 shows an example of anonymization information in the first embodiment.
FIG. 14 is a flowchart illustrating the operation of the anonymization device according to the first modification example of the first embodiment.
FIG. 15 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the first embodiment.
FIG. 16 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the first embodiment.
FIG. 17 is a block diagram showing a configuration of the anonymization device according to the second exemplary embodiment of the present invention.
FIG. 18 is a diagram for explaining division point candidates in the second embodiment.
FIG. 19 is a diagram for explaining division point candidates in the second embodiment.
FIG. 20 is a diagram for explaining the sensitive attribute of the equivalence class in the second embodiment.
FIG. 21 is a flowchart illustrating the operation of the anonymization device according to the second embodiment.
FIG. 22 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the second embodiment.
FIG. 23 is a diagram for explaining an example of the refinement stage and the determination of l-diversity of a plurality of views in the second embodiment.
FIG. 24 shows an example of a view in the related art.
FIG. 25 is a block diagram showing a configuration of the anonymization device according to the third exemplary embodiment of the present invention.
FIG. 26 shows an example of personal information in the related art.
FIG. 27 shows an example of anonymization information in the related art.
FIG. 28 shows an example of l-diversification using the Mondrian method in the related art.
FIG. 29 is a diagram for explaining l-diversity of a plurality of views in the related art.
FIG. 30 is a diagram illustrating verification of l-diversity of multiple views in the related art.

Embodiments for carrying out the present invention will be described in detail with reference to the drawings. In each embodiment described in each drawing and specification, the same reference numerals are given to the same components, and the description thereof is omitted as appropriate.

<<<< first embodiment >>>>
FIG. 1 is a block diagram showing the configuration of the anonymization device 100 according to the first embodiment of the present invention.

As shown in FIG. 1, the anonymization device 100 according to the present embodiment includes a view acquisition unit 110 and an anonymization unit 120. The constituent elements shown in FIG. 1 may be constituent elements in hardware units or constituent elements divided into functional units of the computer apparatus. Here, the components shown in FIG. 1 will be described as components divided into functional units of the computer apparatus.

=== Personal information 810 ===
FIG. 2 is a diagram illustrating an example of the personal information 810. As shown in FIG. 2, the personal information 810 includes an identifier 815 and a plurality of attributes for each of a plurality of individuals. Here, the plurality of attributes are a quasi-identifier 816, a quasi-identifier 817, and a sensitive attribute 818. That is, one record of the personal information 810 includes an identifier of a certain individual and a plurality of attributes.

“ID”, “QI_1”, “QI_2”, and “SA” shown in the top row of FIG. 2 are attribute names.

=== View ===
Each of FIG. 3 and FIG. 4 shows an example of a view composed of a combination of arbitrary attributes in the personal information 810. Hereinafter, the views acquired from arbitrary personal information (for example, a view 821 and a view 822 described later) are collectively referred to as an acquired view.

FIG. 3 is a diagram showing a view 821 including a combination of the quasi-identifier 816 and the sensitive attribute 818 in the personal information 810.

FIG. 4 is a diagram showing a view 822 that is a combination of the quasi-identifier 817 and the sensitive attribute 818 in the personal information 810.

=== Pattern 804 ===
The pattern is information indicating which group of attributes is to be analyzed, for example for correlation. The pattern is represented, for example, by information (for example, attribute names) that specifies quasi-identifiers and sensitive attributes. A pattern may specify a plurality of quasi-identifiers and a plurality of sensitive attributes.

FIG. 5 is a diagram illustrating an example of the pattern 804. As shown in FIG. 5, the pattern 804 includes a pair of attribute names “QI_1” and “SA” and a pair of attribute names “QI_2” and “SA”. In other words, the pattern 804 illustrated in FIG. 5 is information indicating that a view 821 including the set of attributes “QI_1” and “SA” and a view 822 including the set of attributes “QI_2” and “SA” are to be acquired. The pattern 804 shown in FIG. 5 is also expressed as “{QI_1, SA} and {QI_2, SA}”.

Next, each component of the functional unit of the anonymization device 100 will be described.

=== View Acquisition Unit 110 ===
The view acquisition unit 110 acquires a plurality of acquired views from the personal information 810.

For example, the view acquisition unit 110 acquires the view 821 shown in FIG. 3 and the view 822 shown in FIG. 4 from the personal information 810 based on the pattern 804 shown in FIG.
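View acquisition based on a pattern amounts to projecting the personal information onto each attribute set named in the pattern. The following is a minimal sketch under stated assumptions: the function name `acquire_views`, the list-of-dicts representation, and the sample values are hypothetical and are not taken from FIG. 2. Keeping the “ID” column in each view is also an assumption, made here only so that records of different views could later be matched per individual.

```python
def acquire_views(personal_info, pattern, id_name="ID"):
    """Return one view (list of records) per attribute set in the pattern."""
    views = []
    for attrs in pattern:
        # Project every record onto the ID plus the attributes of this set.
        view = [{name: rec[name] for name in (id_name, *attrs)}
                for rec in personal_info]
        views.append(view)
    return views

# Hypothetical personal information with the attribute names of FIG. 2.
personal_info = [
    {"ID": "user1", "QI_1": 10, "QI_2": 20, "SA": "A"},
    {"ID": "user2", "QI_1": 11, "QI_2": 25, "SA": "B"},
    {"ID": "user3", "QI_1": 15, "QI_2": 22, "SA": "C"},
]

# Pattern "{QI_1, SA} and {QI_2, SA}" as described for FIG. 5.
pattern = [("QI_1", "SA"), ("QI_2", "SA")]

view_a, view_b = acquire_views(personal_info, pattern)
```

Here `view_a` plays the role of the view 821 and `view_b` the role of the view 822.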

Input of personal information 810 and input of pattern 804 are performed as follows.

For example, when inputting the personal information 810, the view acquisition unit 110 reads the personal information 810 stored in advance in a storage unit (not shown) of the anonymization device 100. Further, the view acquisition unit 110 may receive the personal information 810 from the outside via a communication unit (not shown) of the anonymization device 100.

For example, in the input of the pattern 804, the view acquisition unit 110 accepts the input of the pattern 804 given by the operator via an input unit (not shown) of the anonymization device 100. Further, the view acquisition unit 110 may read a pattern 804 stored in advance in a storage unit (not shown) of the anonymization device 100.

=== Anonymizing unit 120 ===
The anonymization unit 120 performs anonymization of the plurality of acquired views acquired by the view acquisition unit 110 in parallel. Next, the anonymization unit 120 outputs the anonymized views obtained by the anonymization. Hereinafter, “anonymized view” is used as a generic name for the views that are output after anonymization.

For example, the anonymization unit 120 outputs anonymized views corresponding to the views 821 and 822 obtained by executing the anonymization of the view 821 shown in FIG. 3 and the view 822 shown in FIG. 4 in parallel.

Here, the anonymization unit 120 executes the anonymization using, for example, the Mondrian method. The Mondrian method first puts the anonymization target information (for example, each of the view 821 and the view 822) into a single equivalence class (group) in a state of maximum generalization, and then repeatedly divides an equivalence class to generate new equivalence classes. Here, the “single equivalence class in the state of maximum generalization” is an equivalence class having the anonymization target information (for example, each of the view 821 and the view 822) as one group. A new equivalence class is an equivalence class in which the quasi-identifier is refined in units of the new groups generated by the division. Hereinafter, a combination of one division and one refinement is referred to as a refinement stage.

Here, “a state in which all records are included in a single equivalence class to form one group” is a state in which all quasi-identifiers included in the anonymization information are most generalized.

In addition, “refining the quasi-identifier in units of new groups” means processing the quasi-identifier values included in a new group into the range from the minimum value to the maximum value of the original values of those quasi-identifiers, thereby forming a new equivalence class.

That is, the anonymization unit 120 executes the refinement stage for each of the plurality of acquired views in parallel. More specifically, the anonymization unit 120 executes the first refinement stage of the view 821, and then executes the first refinement stage of the view 822. Next, the anonymization unit 120 executes the second refinement stage of the view 821, and then executes the second refinement stage of the view 822. The anonymization unit 120 then executes the third and subsequent refinement stages in the same manner. When there are three or more acquired views, the anonymization unit 120 likewise executes the first refinement stage of each acquired view, and then executes the second refinement stage of each acquired view.
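The execution order described here is an interleaving of refinement stages across views, rather than simultaneous execution on separate processors. A small sketch of that ordering follows. The generator name `refinement_schedule` and the view labels are hypothetical and serve only as an illustration.

```python
def refinement_schedule(view_names, stages):
    """Yield (stage, view) pairs: stage 1 of every view, then stage 2, and so on."""
    for stage in range(1, stages + 1):
        for name in view_names:
            yield stage, name

order = list(refinement_schedule(["view 821", "view 822"], stages=2))
# order == [(1, "view 821"), (1, "view 822"), (2, "view 821"), (2, "view 822")]
```

The same generator covers three or more acquired views, since every view completes a given stage before any view starts the next one.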

This completes the description of each component of the functional unit of the anonymization device 100.

Next, the components of the anonymization device 100 in hardware units will be described.

FIG. 6 is a diagram illustrating a hardware configuration of a computer 700 that realizes the anonymization apparatus 100 according to the present embodiment.

As shown in FIG. 6, the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. Furthermore, the computer 700 includes a recording medium (or storage medium) 707 supplied from the outside. The recording medium 707 may be a non-volatile recording medium that stores information non-temporarily.

The CPU 701 controls the overall operation of the computer 700 by operating an operating system (not shown). The CPU 701 reads a program and data from a recording medium 707 mounted on the storage device 703, for example, and writes the read program and data to the storage unit 702. Here, the program is, for example, a program that causes the computer 700 to execute an operation of a flowchart shown in FIG.

Then, the CPU 701 executes various processes as the view acquisition unit 110 and the anonymization unit 120 shown in FIG. 1 according to the read program and based on the read data.

Note that the CPU 701 may download a program or data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).

The storage unit 702 stores programs and data. The storage unit 702 may store personal information 810, a pattern 804, each acquired view (for example, the view 821 and the view 822), and the like.

The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes a recording medium 707. The storage device 703 (recording medium 707) stores the program in a computer-readable manner. The storage device 703 may store data. The storage device 703 may store personal information 810, patterns 804, acquired views, and the like.

The input unit 704 is realized by, for example, a mouse, a keyboard, a built-in key button, and the like, and is used for an input operation. The input unit 704 is not limited to a mouse, a keyboard, and a built-in key button, and may be a touch panel, for example. The anonymization device 100 may acquire the personal information 810 and the pattern 804 input via the input unit 704.

The output unit 705 is realized by a display, for example, and is used for confirming the output. The anonymization device 100 may output the anonymized view via the output unit 705.

The communication unit 706 realizes an interface with the outside. The communication unit 706 is included as part of the view acquisition unit 110 and the anonymization unit 120. The anonymization device 100 may acquire the personal information 810 and the pattern 804 from the outside via the communication unit 706. Further, the anonymization device 100 may output the anonymized view to the outside via the communication unit 706.

As described above, the functional unit block of the anonymization device 100 shown in FIG. 1 is realized by the computer 700 having the hardware configuration shown in FIG. However, the means for realizing each unit included in the computer 700 is not limited to the above. In other words, the computer 700 may be realized by one physically integrated device, or by two or more physically separate devices connected to each other by wire or wirelessly.

Note that the recording medium 707 in which the above-described program code is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored in the recording medium 707. Alternatively, the CPU 701 may store the code of the program stored in the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of a recording medium 707 that stores a program (software) executed by the computer 700 (CPU 701) temporarily or non-temporarily. A storage medium that stores information non-temporarily is also referred to as a non-volatile storage medium.

This completes the description of each component of the computer 700 that implements the anonymization device 100 according to the present embodiment.

Next, the operation of this embodiment will be described in detail with reference to FIGS.

FIG. 7 is a flowchart showing the operation of this embodiment. Note that the processing according to this flowchart may be executed based on the program control by the CPU 701 described above. Further, the step name of the process is described by a symbol as in S601.

As a premise, the required anonymity is assumed to be 2-anonymity (“k-anonymity” with k = 2) and 2-diversity of multiple views (“l-diversity of multiple views” with l = 2).

The view acquisition unit 110 acquires the personal information 810 and the pattern 804 (S601).

Next, the view acquisition unit 110 acquires the view 821 and the view 822 from the personal information 810 based on each of the patterns 804 (S602).

Next, the anonymization unit 120 generates a view 831 and a view 832 in which the quasi-identifiers (“QI_1” and “QI_2”) included in the view 821 and the view 822, respectively, are most generalized (S603). The process for generating the view 831 and the view 832 may be a process for updating the view 821 and the view 822 to the view 831 and the view 832, respectively. Similarly, the process for generating each stage anonymized view in the following description may be a process for updating the stage anonymized view of the preceding stage that corresponds to that stage anonymized view. Here, the stage anonymized view is a general term for a view 831, a view 832, a view 841 (described later), a view 842 (described later), a view 851 (described later), and a view 852 (described later).

FIG. 8 shows an image in which the anonymization unit 120 generates the view 831 by most abstracting (maximum generalization) the quasi-identifier “QI_1” included in the view 821 shown in FIG. FIG. 8 also illustrates an image in which the anonymization unit 120 generates the view 832 by most abstracting the quasi-identifier “QI_2” included in the view 822 illustrated in FIG.

Next, the anonymization unit 120 executes the refinement stages of the view 831 and the view 832 in parallel (S604).

FIG. 9 shows an image in which the anonymization unit 120 executes the first refinement stage. As illustrated in FIG. 9, the anonymization unit 120 executes the first refinement stage for the view 831 to generate the view 841. Further, as illustrated in FIG. 9, the anonymization unit 120 executes the first refinement stage for the view 832 to generate a view 842.

FIG. 10 shows an image in which the anonymization unit 120 executes the second refinement stage. As illustrated in FIG. 10, the anonymization unit 120 performs the second refinement stage on the view 841 to generate a view 851. Also, as illustrated in FIG. 10, the anonymization unit 120 performs the second refinement stage on the view 842 to generate a view 852.

Next, the anonymization unit 120 outputs anonymized views (here, view 841 and view 842) that satisfy the required anonymity among the generated views (S605). The view 851 and the view 852 are not output because they do not satisfy the 2-diversity of a plurality of views when they are matched.

It should be noted that if there is no equivalence class that can be divided in the refinement stage of a certain stage anonymization view, the anonymization unit 120 may skip the division and refinement in the refinement stage.

In other words, the anonymization unit 120 may perform the division and refinement of the refinement stage only for the stage anonymized views that include an equivalence class that can be divided. Here, “there is no equivalence class that can be divided” refers to the case where, if the stage anonymized view were divided and refined, that stage anonymized view alone could no longer satisfy the anonymity it is required to satisfy.

Further, if there is no equivalence class that can be divided in any of the stage anonymized views, the anonymization unit 120 ends the process of S604. Note that the anonymization unit 120 may also end the process of S604 after executing a predetermined number of refinement stages.

The above is the description of the operation of this embodiment.

Next, as a comparison with anonymization according to the present embodiment, anonymization by related technology will be described.

This related technique is a method that simply combines the Mondrian method disclosed in Non-Patent Document 1 with the “l-diversity verification method when matching multiple anonymized views” disclosed in Non-Patent Document 2. For example, when two acquired views are anonymized by this method, the first view is simply anonymized first and the second view is anonymized afterward. In contrast, the anonymization apparatus 100 of this embodiment anonymizes the plurality of acquired views in parallel.

The anonymization target information is assumed to be the view 821 and the view 822. The required anonymity is 2-anonymity and 2-diversity of a plurality of views.

In the related technology, for example, the view 821 is first anonymized and the result is output. Next, the view 822 is anonymized and the result is output.

FIG. 11 is a diagram showing anonymization of the view 821 by the related technology. As shown in FIG. 11, in the related art, the view 821 is anonymized, and the view 851 having the smallest amount of information loss while satisfying 2-diversity is output.

FIG. 12 is a diagram showing anonymization of the view 822 by the related technology. As shown in FIG. 12, in the related art, the view 822 is anonymized. Here, neither the view 842 nor the view 852 satisfies the 2-diversity of the plurality of views for the record of “user4” when matched with the view 851. Therefore, the view 832 is output as the anonymization result of the view 822.

As a result, in the related technology, the view 851 and the view 832 are output.

Here, the amount of information loss in this embodiment will be described. Several methods for calculating the information loss amount of the entire anonymized information (hereinafter referred to as the total information loss amount) have been proposed. In this embodiment, the information loss amount of a given quasi-identifier is defined as the sum of the generalization widths of that quasi-identifier over all records. The total information loss amount is defined as the sum of the information loss amounts of the quasi-identifiers included in the anonymized information.

FIG. 13 shows anonymized information 921. The total information loss amount of the anonymized information 921 is obtained as follows.

The information loss amount of the quasi-identifier “ZIP code” is “8”, which is the sum of the generalization widths of all records (all “1”).

The information loss amount of the quasi-identifier “age” is “28”, which is the sum of the generalization widths of the records (in order from the top row, “2”, “2”, “3”, “3”, “7”, “7”, “2”, and “2”).

The information loss amount of the quasi-identifier “nationality” is “32”, which is the sum of the generalization widths of the records (all “4”), assuming that four countries are generalized to “*”.

Therefore, the total information loss amount of the anonymized information 921 is “68”, which is the total information loss amount of each quasi-identifier.
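The calculation above can be reproduced with a minimal sketch. The function names and the dictionary layout are assumptions made for illustration; only the generalization widths are taken from the description of the anonymized information 921.

```python
from typing import Dict, List


def quasi_identifier_loss(widths: List[int]) -> int:
    """Information loss of one quasi-identifier: sum of per-record generalization widths."""
    return sum(widths)


def total_information_loss(widths_by_qi: Dict[str, List[int]]) -> int:
    """Total information loss: sum of the losses of all quasi-identifiers."""
    return sum(quasi_identifier_loss(w) for w in widths_by_qi.values())


# Generalization widths of the eight records of the anonymized information 921.
widths_921 = {
    "ZIP code": [1] * 8,
    "age": [2, 2, 3, 3, 7, 7, 2, 2],
    "nationality": [4] * 8,
}

print(total_information_loss(widths_921))  # 8 + 28 + 32 = 68
```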

The above is an explanation of the amount of information loss.

Each of the view 841 and the view 842 output by the present embodiment is divided once. The total information loss amounts of the view 841 and the view 842 are “25” and “43”, respectively.

On the other hand, in the related technology, the view 851 is divided twice, whereas the view 832 is never divided. The total information loss amounts of the view 851 and the view 832 are “17” and “91”, respectively.

That is, the analyst can expect that the analysis using each of the view 841 and the view 842 output according to the present embodiment maintains a certain accuracy. However, the analyst can expect high accuracy in the analysis using the view 851 output by the related technology, but cannot expect practical accuracy in the analysis using the view 832 that is also output. In other words, the analyst cannot obtain an analysis result having a certain accuracy with a plurality of anonymized views output by the related technology.
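The difference can be made concrete with a small calculation. The following is a minimal sketch that uses only the totals stated above; measuring the variation as the gap between the largest and the smallest total is an assumption made here for illustration, and the function name `loss_spread` is hypothetical.

```python
from typing import List


def loss_spread(total_losses: List[int]) -> int:
    """Gap between the largest and smallest total information loss among the views."""
    return max(total_losses) - min(total_losses)


present_embodiment = [25, 43]  # views 841 and 842
related_technology = [17, 91]  # views 851 and 832

print(loss_spread(present_embodiment))  # 18
print(loss_spread(related_technology))  # 74
```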

The first effect in the present embodiment described above is that it is possible to reduce the variation in the total information loss amount among a plurality of anonymized views.

The second effect of the present embodiment described above is that an analysis result having a certain accuracy can be obtained for each of a plurality of anonymized views.

The reason is that the view acquisition unit 110 acquires a plurality of acquisition views, and the anonymization unit 120 executes anonymization of the plurality of views in parallel.

<<< First Modification of First Embodiment >>>
The anonymization unit 120 executes the refinement stage for each of the first-stage stage anonymized views and updates them to (generates) the second-stage stage anonymized views. The anonymization unit 120 then determines whether those second-stage stage anonymized views satisfy the l-diversity of the plurality of views.

If the anonymization unit 120 determines that the l-diversity of the plurality of views is satisfied, it treats the second-stage stage anonymized views as new first-stage stage anonymized views and executes the next refinement stage. If the anonymization unit 120 determines that the l-diversity of the plurality of views is not satisfied, it outputs the first-stage stage anonymized views, that is, the views as they were before that refinement stage.

In other words, the anonymization unit 120 repeats the refinement stage as long as the l-diversity of the plurality of views is satisfied, and outputs, as the anonymized views, the views that satisfy the l-diversity of the plurality of views and have the smallest total information loss amount.

Next, the operation of this modification will be described in detail with reference to the drawings.

FIG. 14 is a flowchart showing the operation of this modification.

Next, the view acquisition unit 110 acquires an acquisition view (for example, the view 821 and the view 822) from the personal information 810 based on each of the patterns 804 (S602).

Next, the anonymization unit 120 generates stage anonymized views (for example, the view 831 and the view 832) in which the quasi-identifiers included in each of the acquired views are maximally generalized (S603).

Next, the anonymization unit 120 executes the refinement stage for each stage anonymization view (S606).

Specifically, in the first execution of S606, the anonymization unit 120 updates the view 831 and the view 832 (the first-stage stage anonymized views) to the view 841 and the view 842, which are refined by one stage (the second-stage stage anonymized views).

Next, when the view 841 and the view 842 are matched and analyzed, the anonymization unit 120 determines whether or not l-diversity of a plurality of views is satisfied (S607).

FIG. 15 is a diagram for explaining an example of the first “refinement stage” in S606 and the first “determination” in S607. In the example illustrated in FIG. 15, the anonymization unit 120 generates, in the refinement stage, a view 841 and a view 842 obtained by refining the view 831 and the view 832, respectively, by one stage. Further, the anonymization unit 120 extracts, for each record corresponding to each individual, all sensitive attribute values that can be inferred when the view 841 and the view 842 are matched. Here, a sensitive attribute value is an attribute value of the sensitive attribute “SA”.

Referring to FIG. 15, the confirmation content 843 indicates the attribute values of the sensitive attribute “SA” that can be inferred in the view 841, the attribute values of the sensitive attribute “SA” that can be inferred in the view 842, and their intersection. Here, the intersection of the two is the set of attribute values of the sensitive attribute “SA” that can be inferred when the view 841 and the view 842 are matched. Referring to FIG. 15, in every record, the number of attribute values of the sensitive attribute “SA” that can be inferred by matching is two or more. Therefore, the anonymization unit 120 determines that the “l-diversity of multiple views” with l = 2 is satisfied.
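The determination in S607 can be sketched as follows. This is a minimal illustration, not the claimed implementation: each view is assumed to be represented as a mapping from a record ID to the set of sensitive attribute values inferable for that record in that view, and the sample values are hypothetical rather than taken from FIG. 15.

```python
from typing import Dict, List, Set

View = Dict[str, Set[str]]  # record ID -> SA values inferable in this view (assumed layout)


def inferable_after_matching(views: List[View]) -> Dict[str, Set[str]]:
    """For each record, intersect the SA values inferable in every view."""
    return {rid: set.intersection(*(v[rid] for v in views)) for rid in views[0]}


def satisfies_multi_view_l_diversity(views: List[View], l: int) -> bool:
    """True if every record still has at least l candidate SA values after matching."""
    return all(len(sa) >= l for sa in inferable_after_matching(views).values())


# Hypothetical views: matching leaves two or more candidates for every record.
view_a = {"user1": {"A", "B"}, "user2": {"A", "B"}, "user3": {"B", "C"}, "user4": {"B", "C"}}
view_b = {"user1": {"A", "B", "C"}, "user2": {"A", "B", "C"}, "user3": {"B", "C"}, "user4": {"B", "C"}}
print(satisfies_multi_view_l_diversity([view_a, view_b], l=2))  # True

# Hypothetical finer view: "user4" is narrowed to the single value "C" after matching.
view_c = {"user1": {"A", "B"}, "user2": {"A", "B"}, "user3": {"B", "C"}, "user4": {"C", "D"}}
print(satisfies_multi_view_l_diversity([view_a, view_c], l=2))  # False
```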

Next, when the l-diversity of the plurality of views is satisfied (YES in S607), the anonymization unit 120 determines whether or not a division point candidate exists in any stage anonymized view (S608). If a division point candidate exists (YES in S608), the process returns to S606.

FIG. 16 is a diagram for explaining an example of the second “refinement stage” in S606, executed after returning from S608, and the “determination” in S607 that follows it. In the example illustrated in FIG. 16, the anonymization unit 120 generates, in the refinement stage, a view 851 and a view 852 obtained by refining the view 841 and the view 842, respectively, by one stage. Further, the anonymization unit 120 extracts, for each record corresponding to each individual, all sensitive attribute values that can be inferred when the view 851 and the view 852 are matched.

Referring to FIG. 16, the confirmation content 853 indicates the attribute values of the sensitive attribute “SA” that can be inferred in the view 851, the attribute values of the sensitive attribute “SA” that can be inferred in the view 852, and their intersection. Here, the intersection of the two is the set of attribute values of the sensitive attribute “SA” that can be inferred when the view 851 and the view 852 are matched. Referring to FIG. 16, in the record whose ID is “user4”, the number of attribute values of the sensitive attribute “SA” that can be inferred by matching is one. Therefore, the anonymization unit 120 determines that the “l-diversity of multiple views” with l = 2 is not satisfied.

Next, when the l-diversity of the plurality of views is not satisfied (NO in S607), the anonymization unit 120 returns the second-stage stage anonymized views generated in the refinement stage to the first-stage stage anonymized views. Subsequently, the anonymization unit 120 outputs the restored first-stage stage anonymized views as the anonymized views (S609). Here, the anonymization unit 120 returns the view 851 and the view 852 to the view 841 and the view 842, respectively, and then outputs the view 841 and the view 842. The view 841 and the view 842 are anonymized views that satisfy the required anonymity (“l-diversity of multiple views” with l = 2) and have the smallest total information loss amount.

If no division point candidate exists (NO in S608), the anonymization unit 120 outputs the second-stage stage anonymized views generated in the refinement stage as the anonymized views (S610).
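The control flow of S606 to S610 can be sketched as a loop that refines all views by one stage, checks the required anonymity, and rolls back on failure. The sketch below is a simplified illustration under stated assumptions: the three callbacks and the function name `anonymize_in_parallel` are hypothetical, and each view is modeled in the usage example merely as the number of divisions applied so far.

```python
from typing import Callable, List, TypeVar

V = TypeVar("V")


def anonymize_in_parallel(
    views: List[V],
    refine_one_stage: Callable[[V], V],
    has_split_candidate: Callable[[V], bool],
    satisfies_anonymity: Callable[[List[V]], bool],
) -> List[V]:
    """Refine all stage anonymized views together until anonymity would be violated."""
    current = views  # first-stage stage anonymized views (output of S603)
    while True:
        # S606: refine every view that still has a division point candidate.
        refined = [refine_one_stage(v) if has_split_candidate(v) else v for v in current]
        # S607: check the anonymity required when the views are matched.
        if not satisfies_anonymity(refined):
            return current  # S609: discard this stage and output the previous views
        # S608: stop when no view can be divided any further.
        if not any(has_split_candidate(v) for v in refined):
            return refined  # S610
        current = refined


# Toy model: a view is its division count; anonymity holds up to one division per view.
result = anonymize_in_parallel(
    [0, 0],
    refine_one_stage=lambda d: d + 1,
    has_split_candidate=lambda d: d < 3,
    satisfies_anonymity=lambda ds: all(d <= 1 for d in ds),
)
print(result)  # [1, 1]
```

In this toy run, both views end up divided once, which mirrors the outcome in which the view 841 and the view 842 are output instead of the view 851 and the view 852.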

The above is the description of the operation of this modification.

This modification can output an anonymized view that satisfies the required anonymity and has the least amount of overall information loss.

<<< Second Modification of First Embodiment >>>
In the first embodiment described above, the anonymization unit 120 executes one refinement stage for each of the plurality of stage anonymized views, and then confirms the “l-diversity of multiple views”. For example, the anonymization unit 120 executes the refinement stage on the view 831 and then executes the refinement stage on the view 832. Next, the anonymization unit 120 confirms the “l-diversity of multiple views” for the view 841 and the view 842.

In the second modification, the anonymization unit 120 refines the view 831 and updates it to the view 841. Next, the anonymization unit 120 confirms “l-diversity of multiple views” for the view 841 and the view 832 that has not been detailed yet.

In other words, in the present modification, the anonymization unit 120 determines whether or not the required anonymity is satisfied following the refinement stage for one stage anonymization view.

This modified example can reduce the process of returning the second stage stage anonymized view generated by the refinement stage to the first stage stage anonymized view in S609 of FIG.

<<< Third Modification of First Embodiment >>>
The anonymization unit 120 sequentially executes the refinement stages of the stage anonymized views based on the priority corresponding to each of the acquisition views (for example, the view 821 and the view 822) to be refined. For example, the anonymization unit 120 executes the refinement stage by giving priority to the stage anonymized view having a relatively high priority.

For example, the anonymization unit 120 receives the priority input by the operator via the input unit 704 shown in FIG.

The anonymization unit 120 may read the priority stored in advance in the storage unit 702 or the storage device 703 shown in FIG. Alternatively, the anonymization unit 120 may receive the priority from a device (not shown) via the communication unit 706 shown in FIG. Further, the anonymization unit 120 may read out the priority recorded in the recording medium 707 via the storage device 703 shown in FIG.

Specifically, the anonymization unit 120 executes the refinement stages in order, starting from the stage anonymized view corresponding to the acquisition view that has a relatively high priority.
Further, the anonymization unit 120 may calculate the priority based on the combination of the patterns 804. For example, based on the rate at which each quasi-identifier is included in the plurality of patterns 804, the anonymization unit 120 calculates a higher priority for a pattern 804 that includes a quasi-identifier with a lower rate.

Specifically, assume that three patterns 804, {QI_1, SA}, {QI_1, QI_2, SA}, and {QI_3, SA}, are input. In this case, “QI_1” is included in two patterns 804 ({QI_1, SA} and {QI_1, QI_2, SA}), whereas “QI_2” is included in only one pattern 804 ({QI_1, QI_2, SA}) and “QI_3” is included in only one pattern 804 ({QI_3, SA}). The anonymization unit 120 may therefore determine that “QI_2” and “QI_3”, each of which is included in only one pattern 804, are more important than “QI_1”. The anonymization unit 120 may then set the priority of {QI_1, QI_2, SA} and {QI_3, SA}, each of which includes either “QI_2” or “QI_3”, high at “2”, and set the priority of {QI_1, SA} low at “1”.
Further, the anonymization unit 120 may sequentially execute the refinement stages of the stage anonymized views based on both the priority and the total information loss amount. For example, the anonymization unit 120 may execute the refinement stage by giving priority to the stage anonymized view for which the product of its priority and its total information loss amount is larger.

For example, when the priority of the first acquisition view is “2” and the priority of the second acquisition view is “1”, the anonymization unit 120 compares the value obtained by multiplying the information loss amount of the first acquisition view by two with the value obtained by multiplying the information loss amount of the second acquisition view by one. Then, the anonymization unit 120 executes the refinement stage by giving priority to the stage anonymized view corresponding to the acquisition view having the larger value.
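One possible reading of the priority calculation and the weighted ordering is sketched below. The two-level rule (priority “2” for a pattern that contains one of the least frequently used quasi-identifiers, “1” otherwise), the function names, and the loss values in the usage example are assumptions made for illustration; the text above fixes only the worked example with three patterns.

```python
from collections import Counter
from typing import List, Set


def pattern_priorities(patterns: List[Set[str]], sensitive: str = "SA") -> List[int]:
    """Assign priority 2 to patterns containing a least-used quasi-identifier, else 1."""
    counts = Counter(qi for p in patterns for qi in p if qi != sensitive)
    rarest = min(counts.values())
    return [
        2 if any(counts[qi] == rarest for qi in p if qi != sensitive) else 1
        for p in patterns
    ]


def next_view_to_refine(priorities: List[int], losses: List[int]) -> int:
    """Index of the view whose priority multiplied by its information loss is largest."""
    scores = [p * loss for p, loss in zip(priorities, losses)]
    return max(range(len(scores)), key=scores.__getitem__)


patterns = [{"QI_1", "SA"}, {"QI_1", "QI_2", "SA"}, {"QI_3", "SA"}]
print(pattern_priorities(patterns))  # [1, 2, 2]

# Hypothetical losses: 2 * 30 = 60 exceeds 1 * 50 = 50, so the first view is refined first.
print(next_view_to_refine([2, 1], [30, 50]))  # 0
```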

First, this modification has an effect that the information loss amount of a specific view 850 can be relatively reduced based on the priority.

Second, by calculating the priority based on the combination of the patterns 804, this modification has an effect that the information loss amount of a specific view 850 can be relatively reduced in accordance with the quasi-identifiers included in the acquired view.

Third, this modification has an effect that the information loss amount of the specific view 850 can be relatively reduced based on the value calculated from the priority and the information loss amount.

<<< Second Embodiment >>>
Next, a second embodiment of the present invention will be described in detail with reference to the drawings. Hereinafter, the description overlapping with the above description is omitted as long as the description of the present embodiment is not obscured.

FIG. 17 is a block diagram showing a configuration of the anonymization apparatus 200 according to the second embodiment of the present invention.

As shown in FIG. 17, the anonymization device 200 according to the present embodiment further includes a similarity calculation unit 230 as compared with the anonymization device 100 according to the first embodiment. Further, the anonymization device 200 includes an anonymization unit 220 instead of the anonymization unit 120.

=== Similarity Calculation Unit 230 ===
The similarity calculation unit 230 extracts the types of sensitive attribute values included in each equivalence class. Here, the equivalence classes are those obtained when a stage anonymized view, before execution of a given refinement stage, is divided at each of a plurality of division point candidates. Hereinafter, the combination of the extracted types of sensitive attribute values is referred to as an “SA combination”.

When some of the stage anonymized views have no division point candidate, the similarity calculation unit 230 may extract the sensitive attribute values only for the stage anonymized views that have division point candidates.

Next, the similarity calculation unit 230 calculates the similarity corresponding to each of the plurality of division point candidates based on the SA combinations. The similarity is the degree of similarity between the stage anonymized views that would result if the refinement stage were executed at those division point candidates.

For example, the similarity calculation unit 230 calculates the similarity based on the edit distance between the SA combinations of the respective stage anonymized views for each record of the personal information 810.

Here, the edit distance between SA combinations is a measure of the distance between two SA combinations, each including an arbitrary number of sensitive attribute values, expressed as the number of edits (deletions or additions) needed to turn one SA combination into the other.

For example, editing the SA combination “{A, B, C}” into the SA combination “{B, C, D}” requires one deletion (“A”) and one addition (“D”), so the edit distance is “2”.
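Because only deletions and additions are counted, this edit distance equals the number of elements that belong to exactly one of the two SA combinations. A minimal sketch, with the hypothetical function name `sa_edit_distance`:

```python
from typing import Set


def sa_edit_distance(a: Set[str], b: Set[str]) -> int:
    """Number of deletions and additions needed to turn SA combination a into b."""
    return len(a ^ b)  # size of the symmetric difference


print(sa_edit_distance({"A", "B", "C"}, {"B", "C", "D"}))  # 2: delete "A", add "D"
```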

Specifically, first, the similarity calculation unit 230 extracts, for each stage anonymized view, each division point at which that stage anonymized view alone still satisfies the required l-diversity even if the refinement stage is executed at that division point, as a division point candidate.

FIG. 18 shows split point candidates of the view 831 extracted by the similarity calculation unit 230. Further, FIG. 19 shows the dividing point candidates of the view 832 extracted by the similarity calculation unit 230.

Second, the similarity calculation unit 230 extracts the SA combination for each division point candidate of each stage anonymized view.

FIG. 20 shows the SA combinations corresponding to “a1”, “a2”, “a3”, and “a4”, which are the division point candidates for the view 831, and to “b1”, “b2”, “b3”, and “b4”, which are the division point candidates for the view 832.

Third, the similarity calculation unit 230 calculates the similarity between the stage anonymized views (here, the view 831 and the view 832) for each combination of division point candidates of the respective stage anonymized views (hereinafter referred to as a “candidate combination”).

For example, in the case of a candidate combination of “a1” and “b1”, since the SA combination is the same in any ID record, the edit distances are all “0” and the sum is also “0”. Therefore, the similarity is “0”.

For example, in the case of a candidate combination of “a1” and “b2”, in the record whose ID is “user3”, the SA combination corresponding to the view 831 is “{A, B, C}” and the SA combination corresponding to the view 832 is “{A, B}”, so the edit distance is “1”. In the records with the other IDs, the SA combinations match, so the edit distance is “0”. The sum is therefore “1”, and the similarity is “−1”. Because a smaller edit distance means a greater similarity, the similarity is obtained by inverting the sign of the total edit distance.

Similarly, the similarity calculation unit 230 calculates the similarity for all candidate combinations. In this case, each candidate combination of “a1” and “b1”, “a2” and “b2”, “a3” and “b3”, and “a4” and “b4” has the highest similarity.
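The search over candidate combinations can be sketched as follows. The data layout (candidate name mapped to the per-record SA combinations that would result from dividing at that candidate), the function names, and the sample values are assumptions made for illustration; the values are not those of FIG. 20.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

SAByRecord = Dict[str, Set[str]]  # record ID -> SA combination (assumed layout)


def similarity(view_1: SAByRecord, view_2: SAByRecord) -> int:
    """Negated sum, over all records, of the edit distance between SA combinations."""
    return -sum(len(view_1[rid] ^ view_2[rid]) for rid in view_1)


def most_similar_combinations(
    candidates_1: Dict[str, SAByRecord], candidates_2: Dict[str, SAByRecord]
) -> List[Tuple[str, str]]:
    """All candidate combinations that share the highest similarity."""
    scores = {
        (n1, n2): similarity(c1, c2)
        for (n1, c1), (n2, c2) in product(candidates_1.items(), candidates_2.items())
    }
    best = max(scores.values())
    return sorted(pair for pair, score in scores.items() if score == best)


# Hypothetical SA combinations for one candidate of one view and two of the other.
abc = {"A", "B", "C"}
cands_1 = {"a1": {"user1": abc, "user2": abc, "user3": abc}}
cands_2 = {
    "b1": {"user1": abc, "user2": abc, "user3": abc},
    "b2": {"user1": abc, "user2": abc, "user3": {"A", "B"}},
}

print(similarity(cands_1["a1"], cands_2["b1"]))  # 0
print(similarity(cands_1["a1"], cands_2["b2"]))  # -1
print(most_similar_combinations(cands_1, cands_2))  # [('a1', 'b1')]
```

Returning every combination that ties for the highest similarity leaves room for a separate tie-breaking rule.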

Alternatively, the similarity calculation unit 230 may determine the similarity based on the number of elements in the intersection (product set) of the SA combinations between the stage anonymized views for each record of the personal information 810, such that the larger the number, the greater the similarity.

=== Anonymizing unit 220 ===
The anonymization unit 220 according to the present embodiment determines a division point when performing the division of the equivalent class in the refinement stage based on the similarity calculated by the similarity calculation unit 230.

Specifically, the anonymization unit 220 determines the division point candidates included in the candidate combination having the highest similarity as the division points. When there are a plurality of candidate combinations having the highest similarity, the anonymization unit 220 selects, for example, the candidate combination including the division point candidates close to the average value. Alternatively, the anonymization unit 220 may select the candidate combination including the division point candidates close to the median.

For example, referring to the view 821 in FIG. 3, the average value of “QI_1” is “13”, and the candidate division points close to this are “a2” and “a3”. Further, referring to the view 822 in FIG. 4, since the average value of “QI_2” is “22”, a candidate for a dividing point close to this is “b3”. Accordingly, here, the anonymization unit 220 selects (determines) “a3” and “b3”.

Next, the anonymization unit 220 executes the refinement step at the determined division point.

Next, the operation of this embodiment will be described in detail with reference to the drawings.

FIG. 21 is a flowchart showing the operation of the present embodiment.

Next, the anonymization unit 220 generates a view 831 and a view 832 in which the quasi-identifiers included in the view 821 and the view 822, respectively, are maximally generalized (S603).

Next, the similarity calculation unit 230 calculates the similarity of each of the stage anonymized views (view 831 and view 832) (S624).

Next, the anonymization unit 220 determines a division point based on the similarity (S625). For example, the anonymization unit 220 determines the division point candidates “a3” and “b3” as the division points in the first process of S625.

Next, the anonymization unit 220 executes the refinement step for each step anonymization view at the determined division point (S606).

Next, the anonymization unit 220 determines whether or not the l-diversity of the plurality of views is satisfied when the stage anonymized views that have undergone the refinement stage are matched and analyzed (S607).

FIG. 22 is a diagram for explaining the first operation of S606 and S607. In the example illustrated in FIG. 22, in S606, the anonymization unit 220 executes the refinement stage on the view 831 and the view 832 at the division points “a3” and “b3”, and generates the view 871 and the view 872. Next, the anonymization unit 220 extracts a guessable sensitive attribute value for each record corresponding to each person when the view 871 and the view 872 are matched.

Referring to FIG. 22, the confirmation content 873 indicates the attribute values of the sensitive attribute “SA” that can be inferred in the view 871, the attribute values of the sensitive attribute “SA” that can be inferred in the view 872, and their intersection. Here, the intersection of the two is the set of attribute values of the sensitive attribute “SA” that can be inferred when the view 871 and the view 872 are matched. Referring to FIG. 22, in every record, the number of attribute values of the sensitive attribute that can be inferred by matching is two or more. Therefore, the anonymization unit 220 determines that the “l-diversity of multiple views” with l = 2 is satisfied.

Next, when the l-diversity of the plurality of views is satisfied (YES in S607), the anonymization unit 220 determines whether a division point candidate exists in any stage anonymized view (S608). If a division point candidate exists (YES in S608), the process returns to S624.

Next, the similarity calculation unit 230 calculates the similarity of each of the stage anonymized views (view 871 and view 872) (S624).

Next, the anonymization unit 220 determines a division point based on the similarity (S625).

FIG. 23 is a diagram for explaining the second operation of S606 and S607. In the example illustrated in FIG. 23, in S606, the anonymization unit 220 executes the refinement stage on the view 871 and the view 872 at the division points determined in the second execution of S625, and generates the view 881 and the view 882. Next, the anonymization unit 220 extracts, for each record corresponding to each individual, the sensitive attribute values that can be inferred when the view 881 and the view 882 are matched.

Referring to FIG. 23, the confirmation content 883 indicates the attribute values of the sensitive attribute “SA” that can be inferred in the view 881, the attribute values of the sensitive attribute “SA” that can be inferred in the view 882, and their intersection. Here, the intersection of the two is the set of attribute values of the sensitive attribute “SA” that can be inferred when the view 881 and the view 882 are matched. Referring to FIG. 23, in every record, the number of attribute values of the sensitive attribute that can be inferred by matching is two or more. Therefore, the anonymization unit 220 determines that the “l-diversity of multiple views” with l = 2 is satisfied.

Next, when the l-diversity of the plurality of views is satisfied (YES in S607), the anonymization unit 220 determines whether a division point candidate exists in any stage anonymized view (S608).

Next, when no division point candidate exists (NO in S608), the anonymization unit 220 outputs, as the anonymized views, the stage anonymized views for which the refinement stage has been executed (the second-stage stage anonymized views) (S610). Here, the anonymization unit 220 outputs the view 881 and the view 882.

If the l-diversity of the plurality of views is not satisfied (NO in S607), the anonymization unit 220 returns each stage anonymized view that has undergone the refinement stage to the stage anonymized view as it was before that refinement stage (the first-stage stage anonymized view). Next, the anonymization unit 220 outputs the restored views as the anonymized views (S609).

The anonymized views output from the anonymization device 200 according to the present embodiment based on the personal information 810 and the patterns 804 have a smaller amount of information loss than the anonymized views output from the anonymization device 100 according to the first embodiment. Specifically, each of the view 881 and the view 882 has one more division than the view 841 and the view 842.

The reason why this difference occurs is as follows.

For the sensitive attribute value candidates (SA combinations) that can be inferred for a specific individual, the l-diversity of multiple views decreases as the product set of SA combinations when matching each stage anonymized view decreases.

In the anonymization device 100 of the first embodiment, the division point used in the refinement stage is not necessarily the optimum division point; it may be a division point at which the intersection (product set) of the SA combinations obtained when the stage anonymized views are matched becomes relatively small. Therefore, the anonymization device 100 can become unable to satisfy the l-diversity of the plurality of views while repeating the refinement stage. As a result, compared with the anonymization device 200, the anonymization device 100 performs fewer divisions and incurs relatively larger information loss.

The anonymization apparatus 200 according to the present embodiment executes the refinement stage so that the SA combinations that can be inferred from the respective stage anonymized views are similar for each individual. Specifically, the anonymization apparatus 200 performs the division at the division points at which the SA combinations of the equivalence classes in the respective stage anonymized views are similar for each record corresponding to each individual. Therefore, the anonymization device 200 can execute more refinement stages. As a result, the anonymization device 200 can output anonymized views that have smaller information loss and still satisfy the required anonymity.

FIG. 24 shows a view 893 that is anonymized by related technology with the entire personal information 810 as one acquired view. As shown in FIG. 24, the view 893 is obtained by dividing the personal information 810 into two parts. The information loss amounts of “QI_1” and “QI_2” in the view 893 are “25” and “49”, respectively.

On the other hand, the information loss amounts of “QI_1” of the view 841 and “QI_2” of the view 842, which are outputs of the present embodiment, are “17” and “35”, respectively. That is, the anonymization according to the present embodiment can reduce the amount of information loss compared with the anonymization according to the related technology.

The effect of this embodiment described above is that, in addition to the effect of the first embodiment, the amount of information loss of the anonymized view to be output can be further reduced.

The reason is that the anonymization unit 220 determines the division point based on the similarity of the SA combination of the equivalent class calculated by the similarity calculation unit 230.

<<< Modification of Second Embodiment >>>
When calculating the similarity, the similarity calculation unit 230 may calculate the similarity only for the division point candidates adjacent to the average value of the quasi-identifier of the equivalence class to be divided. Alternatively, the similarity calculation unit 230 may calculate the similarity only for the division point candidates adjacent to the median of the quasi-identifier of the equivalence class to be divided.
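The restriction to candidates near the average value or the median can be illustrated with a minimal Python sketch. The function name `adjacent_candidates`, the sample values, and the assumption that division point candidates lie midway between consecutive distinct quasi-identifier values are all hypothetical and are not taken from the embodiment.

```python
from statistics import mean, median


def adjacent_candidates(qi_values, k, center="mean"):
    """Return the k division point candidates closest to the mean (or median)
    of the quasi-identifier values of one equivalence class."""
    distinct = sorted(set(qi_values))
    # Assumption: a candidate lies midway between two consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    pivot = mean(qi_values) if center == "mean" else median(qi_values)
    # Keep only the k candidates nearest to the pivot, then restore ascending order.
    nearest = sorted(candidates, key=lambda c: abs(c - pivot))[:k]
    return sorted(nearest)


ages = [20, 25, 30, 35, 40, 60]  # illustrative quasi-identifier values
print(adjacent_candidates(ages, 2))
print(adjacent_candidates(ages, 1, center="median"))
```

Only the returned candidates would then be passed to the similarity calculation, which is what reduces the number of similarity evaluations.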

The number of adjacent division point candidates may be a preset number. Further, the number of adjacent division point candidates may be determined based on, for example, the set number of acquired views (the number of patterns 804).

For example, assume that when the number of acquired views is two, the number of adjacent division point candidates in each of the views 850 corresponding to the acquired views is set to 10. In this case, when the number of acquired views is four, the similarity calculation unit 230 sets the number of adjacent division point candidates to five. That is, since the number of acquired views has doubled, the similarity calculation unit 230 multiplies the number of division point candidates by 0.5 (the reciprocal of 2).
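The inverse scaling in this example can be written as a short Python sketch. The function name `scaled_candidate_count`, the use of integer division, and the lower bound of one candidate are assumptions made for illustration; the baseline of 10 candidates for two acquired views comes from the example above.

```python
def scaled_candidate_count(num_views, base_views=2, base_candidates=10):
    """Scale the number of adjacent division point candidates by the
    reciprocal of the growth in the number of acquired views."""
    if num_views < 1:
        raise ValueError("num_views must be at least 1")
    # Assumption: round down, but always keep at least one candidate.
    return max(1, (base_candidates * base_views) // num_views)


print(scaled_candidate_count(2))  # baseline: two acquired views
print(scaled_candidate_count(4))  # twice as many views, half as many candidates
```

With this scaling, the total number of candidates examined across all views stays roughly constant as the number of acquired views grows.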

First, this modification has the effect that the processing load on the anonymization device 200 can be reduced.

Second, this modification has an effect that the processing load can be controlled based on the number of acquired views. In other words, the present modification has an effect of preventing the tendency that the amount of calculation increases as the number of acquired views increases.
The second embodiment may be modified in the same manner as the first to third modifications of the first embodiment.

<<< Third Embodiment >>>
Next, a third embodiment of the present invention will be described in detail with reference to the drawings. Hereinafter, the description overlapping with the above description is omitted as long as the description of the present embodiment is not obscured.

FIG. 25 is a block diagram showing a configuration of the anonymization device 300 according to the third exemplary embodiment of the present invention.

Referring to FIG. 25, the anonymization device 300 in the present embodiment differs from the anonymization device 100 of the first embodiment in that it includes a view acquisition unit 310 instead of the view acquisition unit 110.

=== View Acquisition Unit 310 ===
The view acquisition unit 310 acquires acquired views based on the correlation between attributes included in the personal information (for example, the personal information 810 illustrated in FIG. 2). For example, the view acquisition unit 310 acquires an acquired view consisting of attributes that are strongly correlated with each other.

“Acquiring an acquired view consisting of strongly correlated attributes” will be described with a specific example.

For example, the personal information includes four attributes “QI_1”, “QI_2”, “QI_3”, and “SA”.

Suppose the correlations between these attributes are as follows. The correlation coefficient between “QI_1” and “SA” is “0.8”, the correlation coefficient between “QI_2” and “SA” is “0.1”, and the correlation coefficient between “QI_3” and “SA” is “0.7”.

Also, it is assumed that the determination threshold is “0.5”.

In this case, the view acquisition unit 310 acquires an acquired view based on the strong correlation patterns of “QI_1, SA” and “QI_3, SA”. Here, the structure of the strong correlation pattern is the same as the structure of the pattern 804 shown in FIG.
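The threshold test in this example can be sketched in Python as follows. The function name `strong_correlation_patterns`, the use of the absolute value of the coefficient, and the choice of "greater than or equal to" for the comparison are assumptions; the coefficients and the threshold of 0.5 are those of the example.

```python
def strong_correlation_patterns(correlations, sensitive="SA", threshold=0.5):
    """Return one (quasi-identifier, sensitive attribute) pattern for every
    quasi-identifier whose correlation with the sensitive attribute is strong."""
    # Assumption: a strongly negative correlation also counts as strong.
    return [(qi, sensitive)
            for qi, coefficient in correlations.items()
            if abs(coefficient) >= threshold]


coefficients = {"QI_1": 0.8, "QI_2": 0.1, "QI_3": 0.7}
print(strong_correlation_patterns(coefficients))
```

Each returned pattern would correspond to one acquired view.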

Note that the view acquisition unit 310 may determine a strong correlation pattern by using association rule mining. Association rule mining can determine the strength of correlation between a plurality of attributes.

For example, suppose that detecting association rules having a support of 10% or more and a confidence of 80% or more yields “(QI_1 = a) → (SA = X)” and “(QI_1 = b, QI_3 = c) → (SA = Y)”.

In this case, the view acquisition unit 310 acquires an acquisition view using “QI_1, SA” and “QI_1, QI_3, SA” as strong correlation patterns.
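How detected rules map to strong correlation patterns can be shown with a minimal Python sketch: the pattern is the set of attribute names appearing on either side of a rule. The function name `patterns_from_rules` and the representation of a rule as a pair of dictionaries are hypothetical; the rule mining itself is assumed to have been done elsewhere.

```python
def patterns_from_rules(rules, sensitive="SA"):
    """Collect the attribute names used by each rule into a pattern,
    listing quasi-identifiers first and the sensitive attribute last."""
    patterns = []
    for antecedent, consequent in rules:
        attributes = set(antecedent) | set(consequent)
        quasi_identifiers = sorted(attributes - {sensitive})
        pattern = tuple(quasi_identifiers)
        if sensitive in attributes:
            pattern += (sensitive,)
        if pattern not in patterns:  # several rules may yield the same pattern
            patterns.append(pattern)
    return patterns


rules = [
    ({"QI_1": "a"}, {"SA": "X"}),
    ({"QI_1": "b", "QI_3": "c"}, {"SA": "Y"}),
]
print(patterns_from_rules(rules))
```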

The effect of the present embodiment described above is that, in addition to the effect of the first embodiment, it is not necessary for a person to determine the pattern 804 in advance.

The reason is that the view acquisition unit 310 acquires the acquired view based on the mutual relationship of attributes included in the personal information.

<<< First Modification of Third Embodiment >>>
The view acquisition unit 310 acquires a new acquired view based on the result of learning previously acquired views. For example, the view acquisition unit 310 stores previously input combinations of patterns 804 as illustrated in FIG. 5 and learns the correlations between the strong correlation patterns included in those combinations. Then, based on the learned result, the view acquisition unit 310 acquires an acquired view for a strong correlation pattern that is strongly correlated with a newly input pattern 804.

“Acquiring an acquired view for a strong correlation pattern that is strongly correlated with the input pattern” will be described with a specific example.

For example, the personal information includes six attributes “QI_1”, “QI_2”, “QI_3”, “QI_4”, “QI_5”, and “SA”.

In addition, suppose that the following four combinations of patterns 804 are stored.

First combination of patterns 804: {QI_1, SA}, {QI_2, SA}.

Second combination of patterns 804: {QI_1, SA}, {QI_2, SA}, {QI_3, SA}.

Third combination of patterns 804: {QI_1, SA}, {QI_2, SA}, {QI_5, SA}.

Fourth combination of patterns 804: {QI_1, SA}, {QI_4, SA}.

The view acquisition unit 310 learns a combination of these patterns 804 by association rule mining, and obtains the following learning result.

“{QI_1, SA} → {QI_2, SA}” has “support = 100%” and “confidence = 75%”. Here, “support” is the proportion of all stored combinations of patterns 804 that include {QI_1, SA}. “Confidence” is the proportion of the combinations of patterns 804 including {QI_1, SA} that also include {QI_2, SA}.

“{QI_2, SA} → {QI_1, SA}” is “support = 75%” and “confidence = 100%”.

When {QI_1, SA} is newly input as the fifth pattern 804, the view acquisition unit 310 also acquires the acquired view of {QI_2, SA} based on the learning result.
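The support and confidence values of this example can be reproduced with a short Python sketch. It follows the definitions given in this description, in which support is measured on the left-hand side of the rule only. The names `rule_stats` and `suggest` and the confidence threshold of 0.7 are hypothetical.

```python
# The four stored combinations of patterns 804 from the example.
STORED = [
    {("QI_1", "SA"), ("QI_2", "SA")},
    {("QI_1", "SA"), ("QI_2", "SA"), ("QI_3", "SA")},
    {("QI_1", "SA"), ("QI_2", "SA"), ("QI_5", "SA")},
    {("QI_1", "SA"), ("QI_4", "SA")},
]


def rule_stats(combinations, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    containing = [c for c in combinations if antecedent in c]
    if not containing:
        return 0.0, 0.0
    # Support as defined in this description: share of combinations with the antecedent.
    support = len(containing) / len(combinations)
    confidence = sum(1 for c in containing if consequent in c) / len(containing)
    return support, confidence


def suggest(combinations, new_pattern, min_confidence=0.7):
    """List the patterns that usually accompany a newly input pattern."""
    others = sorted({p for c in combinations for p in c} - {new_pattern})
    return [p for p in others
            if rule_stats(combinations, new_pattern, p)[1] >= min_confidence]


print(rule_stats(STORED, ("QI_1", "SA"), ("QI_2", "SA")))  # (1.0, 0.75)
print(rule_stats(STORED, ("QI_2", "SA"), ("QI_1", "SA")))  # (0.75, 1.0)
print(suggest(STORED, ("QI_1", "SA")))                     # [('QI_2', 'SA')]
```

Under the assumed threshold, only {QI_2, SA} is suggested for a new input of {QI_1, SA}, which matches the behavior described above.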

Note that the view acquisition unit 310 may learn the stored pattern 804 by classification tree learning or the like.

Further, the view acquisition unit 310 may acquire an acquired view based on the learning result without inputting a new pattern 804. In this case, the view acquisition unit 310 may determine a strong correlation pattern based on a determination threshold for support and confidence.

As described above, by learning the patterns 804 input by a plurality of people, for example, the anonymization device 300 can generate acquired views that are likely to be used in combination with a specific acquired view. In other words, anonymization suited to the analysis patterns that many people frequently use becomes possible. The reason is that the acquired view is acquired based on the result of learning the stored patterns 804.

Also, anonymization can be executed without a person determining a strong correlation pattern. The reason is that the view acquisition unit 310 acquires the acquired view based on the learning result without inputting the new pattern 804.

<<< Second Modification of Third Embodiment >>>
The view acquisition unit 310 acquires a new acquired view by adding, to an already acquired view, an attribute that is not included in it. Specifically, for each attribute not included in an already acquired view, the view acquisition unit 310 further acquires an acquired view in which that attribute is added to the already acquired view.

“Acquiring an acquired view with an attribute added” will be described with a specific example.

Suppose that the pattern 804 shown in FIG.

The view acquisition unit 310 acquires acquired views using {QI_1, SA} and {QI_2, SA}, which are the strong correlation patterns included in the combination of the patterns 804. Further, the view acquisition unit 310 acquires acquired views using the strong correlation patterns obtained by adding one attribute to each of the strong correlation patterns included in the combination of the patterns 804. The strong correlation patterns obtained by adding one attribute are {QI_1, QI_2, SA}, {QI_1, QI_3, SA}, {QI_1, QI_4, SA}, {QI_1, QI_5, SA}, {QI_2, QI_3, SA}, {QI_2, QI_4, SA}, and {QI_2, QI_5, SA}. In this example, every attribute that is not included in a given strong correlation pattern is added to it. Alternatively, the attribute to be added may be any single attribute that is not included in that strong correlation pattern.
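The enumeration of patterns with one added attribute can be sketched in Python as follows. The function name `extend_patterns` is hypothetical, and the attribute set QI_1 to QI_5 plus SA is an assumption inferred from the seven patterns listed above.

```python
def extend_patterns(patterns, all_attributes, sensitive="SA"):
    """Add each attribute missing from a pattern to that pattern, one at a
    time, and return the distinct extended patterns in order of creation."""
    extended = []
    for pattern in patterns:
        for attribute in all_attributes:
            if attribute in pattern or attribute == sensitive:
                continue
            quasi_identifiers = (set(pattern) - {sensitive}) | {attribute}
            candidate = tuple(sorted(quasi_identifiers)) + (sensitive,)
            if candidate not in extended:  # {QI_1, QI_2, SA} arises twice
                extended.append(candidate)
    return extended


attributes = ["QI_1", "QI_2", "QI_3", "QI_4", "QI_5", "SA"]
inputs = [("QI_1", "SA"), ("QI_2", "SA")]
for pattern in extend_patterns(inputs, attributes):
    print(pattern)
```

The duplicate check matters because adding QI_2 to {QI_1, SA} and adding QI_1 to {QI_2, SA} produce the same pattern, which is why seven patterns result rather than eight.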

As described above, the anonymization device 300 can increase the accuracy of analyses that use patterns other than the input strong correlation patterns. In other words, when an analysis with a strong correlation pattern other than the initially input ones may be required, the analyst can obtain anonymized views that satisfy multi-view l-diversity for attributes in addition to those included in the input strong correlation patterns. The reason is that the view acquisition unit 310 generates an acquired view in which an attribute is added to the input strong correlation pattern.

The components described in each of the above embodiments do not necessarily need to be independent entities. For example, a plurality of components may be realized as one module. In addition, one component may be realized by a plurality of modules. A certain component may be a part of another component, and a part of a certain component may overlap a part of another component.

In the embodiments described above, each component and the module that realizes each component may be realized by hardware if necessary. Moreover, each component and the module that realizes each component may be realized by a computer and a program. Each component and the module that realizes each component may also be realized by a mixture of hardware modules, computers, and programs.

The program is provided by being recorded on a non-volatile computer-readable recording medium such as a magnetic disk or a semiconductor memory, and is read by the computer when the computer is started up. The read program causes the computer to function as a component in each of the above-described embodiments by controlling the operation of the computer.

In each of the embodiments described above, a plurality of operations are described in order in the form of a flowchart. However, the order of description does not limit the order in which the plurality of operations are executed. For this reason, when each embodiment is implemented, the order of the plurality of operations can be changed as long as the substance of the embodiment is not impaired.

Furthermore, in each embodiment described above, a plurality of operations are not limited to being executed at different timings. For example, another operation may occur during the execution of a certain operation, or the execution timing of a certain operation and another operation may partially or entirely overlap.

Furthermore, in each of the embodiments described above, a certain operation is described as a trigger for another operation, but the description does not limit all relationships between the certain operation and other operations. For this reason, when each embodiment is implemented, the relationship between the plurality of operations can be changed as long as the substance of the embodiment is not impaired. The specific description of each operation of each component does not limit that operation. For this reason, each specific operation of each component may be changed within a range that does not impair the functional, performance, and other characteristics in implementing each embodiment.

Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2013-148137 filed on July 17, 2013, the entire disclosure of which is incorporated herein.

DESCRIPTION OF SYMBOLS
100 Anonymization device
110 View acquisition unit
120 Anonymization unit
200 Anonymization device
220 Anonymization unit
230 Similarity calculation unit
300 Anonymization device
310 View acquisition unit
700 Computer
701 CPU
702 Storage unit
703 Storage device
704 Input unit
705 Output unit
706 Communication unit
707 Recording medium
804 Pattern
810 Personal information
815 Identifier
816 Quasi-identifier
817 Quasi-identifier
818 Sensitive attribute
821 View
822 View
831 View
832 View
841 View
852 View
851 View
851
871 View
872 View
873 Confirmation content
881 View
882 View
883 Confirmation content
893 View
910 Personal information
915 Identifier
916 Quasi-identifier
918 Quasi-identifier
918 Sensitive attribute
920 Anonymized information
926 Quasi-identifier
927 Quasi-identifier
928 Sensitive attribute
921 Anonymized information
931
932 Generalized view
933 Split view
934 Split view
35 Split view
941 Personal information
942 First view
943 Second view
944 First anonymized view
945 Second anonymized view
951 Attribute value
952 Attribute value
953 Intersection

Claims

An information processing apparatus comprising: a view acquisition means for acquiring a plurality of views of personal information including a plurality of attributes of each of a plurality of individuals; and
an anonymization means for outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.
The information processing apparatus according to claim 1, wherein the anonymization means performs the anonymization so that an arbitrary combination of the anonymized views satisfies a required anonymity.
The information processing apparatus according to claim 1, wherein the anonymization means performs refinement of the quasi-identifier step by step, starting from a state in which the quasi-identifier included in each of the views is most generalized, and sequentially executes one step of the refinement in each of the views.
The information processing apparatus according to claim 3, wherein the anonymization means executes the next step when an arbitrary combination of the views subjected to the step satisfies the required anonymity, and outputs the anonymized views when an arbitrary combination of the views subjected to the step does not satisfy the required anonymity.
The information processing apparatus according to claim 3, further comprising a similarity calculation means for calculating, for each division point candidate of the views in the refinement, a similarity between the views that would be obtained by dividing at that division point candidate, based on combinations of attribute values of the sensitive attribute included in each equivalence class,
wherein the anonymization means performs the refinement based on the similarity.
The information processing apparatus according to claim 5, wherein the similarity calculation means, when calculating the similarity, calculates the similarity only for the division point candidates adjacent to either the average value or the median of the quasi-identifier of the equivalence class to be divided.
The anonymization means sequentially executes one step of the refinement in each of the views based on a priority corresponding to each of the views. The information processing apparatus according to item.
The information processing apparatus according to claim 7, wherein the anonymization means calculates the priority based on a combination of quasi-identifiers included in each of the views.
The information processing apparatus according to any one of claims 1 to 8, wherein the view acquisition unit acquires the view based on a mutual relationship between the attributes included in the personal information.
The information processing apparatus according to any one of claims 1 to 9, wherein the view acquisition unit acquires a new view based on a learning result of the acquired view.
The information processing apparatus according to any one of claims 1 to 10, wherein the view acquisition means acquires a new view in which an attribute not included in the acquired view is further added to the acquired view.
An anonymization method in which a computer
acquires a plurality of views of personal information including a plurality of attributes of each of a plurality of individuals, and
outputs anonymized views obtained by executing anonymization of the plurality of views in parallel.
A computer-readable non-transitory recording medium storing a program for causing a computer to execute a process of acquiring a plurality of views of personal information including a plurality of attributes of each of a plurality of individuals, and
a process of outputting anonymized views obtained by executing anonymization of the plurality of views in parallel.