WO2013121739A1 - Anonymization device and anonymization method - Google Patents
Anonymization device and anonymization method
- Publication number
- WO2013121739A1 (PCT/JP2013/000639)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- anonymization
- anonymity
- hospital
- provider
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09C—CIPHERING OR DECIPHERING APPARATUS FOR CRYPTOGRAPHIC OR OTHER PURPOSES INVOLVING THE NEED FOR SECRECY
- G09C1/00—Apparatus or methods whereby a given sequence of signs, e.g. an intelligible text, is transformed into an unintelligible sequence of signs by transposing the signs or groups of signs or by replacing them by others according to a predetermined system
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0407—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/42—Anonymization, e.g. involving pseudonyms
Definitions
- the present invention relates to anonymization technology.
- Statistical data derived from data sets that include personal information, such as age, gender, or address, is widely used.
- When such data is disclosed, a technique of anonymizing it by data abstraction is known, so that no individual can be identified from the published data.
- Anonymization is a technique for processing a collection of personal information so that it cannot be determined which individual each record belongs to.
- k-anonymity is a property of anonymization that ensures that the records matching any individual's data cannot be narrowed down to fewer than k records.
- A group of attributes whose combination can identify an individual is called a "quasi-identifier". Basically, k-anonymity guarantees anonymity by generalizing the attribute values included in this quasi-identifier so that k or more records share the same quasi-identifier values.
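As a concrete illustration (not part of the patent text), the k-anonymity condition can be checked by grouping records on their quasi-identifier values and verifying that every group contains at least k records; the record layout and field names below are hypothetical.

```python
from collections import Counter

def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values is shared
    by at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

records = [
    {"age": 20, "disease": "A"},
    {"age": 20, "disease": "B"},
    {"age": 21, "disease": "E"},  # a singleton group: 2-anonymity fails
]
print(satisfies_k_anonymity(records, ["age"], 2))  # False
```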
- Patent Document 1 discloses an information processing apparatus that can determine whether the collected data as a whole is anonymized, based on a comparison between the minimum group size for each individual item and a threshold value.
- The anonymization item storage unit stores an anonymization classification for each item.
- The anonymization processing unit designates an anonymization classification for each item of the data recorded in the first database, groups the data based on that classification, calculates the minimum number of records after grouping for every item, and anonymizes the data based on the calculation result. The anonymization processing unit then records the result of the anonymization process in the second database.
- The anonymization determination unit determines whether any item falls below a predetermined threshold in the result of the anonymization process recorded in the second database.
- With the technique of Patent Document 1, however, a provider may be able to identify personal information supplied by another provider by comparing the data it holds with the anonymized data. That is, the technique described in Patent Document 1 has the problem that anonymity cannot always be maintained.
- the reason is as follows.
- The data provider can identify, within the anonymized data, the data it provided itself. By excluding its own identified records, the data provider can therefore reduce the anonymity of the other providers' data below the predetermined index.
- One of the objects of the present invention is to provide an anonymization device and an anonymization method capable of maintaining anonymity of data for any provider who provides data.
- The anonymization device according to the present invention includes determination means for determining, with respect to data obtained by combining records acquired from a plurality of providers, whether the anonymity of the data is maintained for any provider that supplied a record forming part of the data, and anonymization means for anonymizing the data based on the determination result of the determination means.
- The anonymization method according to the present invention determines, with respect to data obtained by combining records acquired from a plurality of providers, whether the anonymity of the data is maintained for any provider that supplied a record forming part of the data, and anonymizes the data based on the determination result.
- The program according to the present invention causes a computer to execute a process of determining, with respect to data obtained by combining records acquired from a plurality of providers, whether the anonymity of the data is maintained for any provider that supplied a record forming part of the data, and a process of anonymizing the data based on the determination result.
- An example of the effect of the present invention is that the anonymity of data can be maintained for any provider that provided the data.
- FIG. 1 is a diagram for explaining the background of the present invention.
- FIG. 2 is a diagram illustrating data held by the hospital X.
- FIG. 3 is a diagram showing data held by the hospital Y.
- FIG. 4 is a diagram illustrating data held by the operator Z.
- FIG. 5 is a diagram showing a state where the data shown in FIG. 4 is divided into a plurality of groups based on the anonymization technique related to the present invention.
- FIG. 6 is a diagram illustrating data in which a part of the data illustrated in FIG. 5 is integrated.
- FIG. 7 is a diagram showing the anonymized combined data that is finally generated based on the anonymization technique related to the present invention.
- FIG. 8 is a block diagram illustrating a configuration of the anonymization device 10 according to the first embodiment.
- FIG. 9 is a flowchart showing the operation of the anonymization device 10 according to the first exemplary embodiment of the present invention.
- FIG. 10 is a diagram illustrating an example of the combined data stored in the storage unit 13.
- FIG. 11 is a diagram illustrating an example of combined data divided into a plurality of groups based on the value of the quasi-identifier.
- FIG. 12 is a diagram illustrating an example of data after the anonymization unit 12 is anonymized.
- FIG. 13 is a diagram illustrating an example of the anonymized combined data that is finally output by the anonymization device 10.
- FIG. 14 is a block diagram illustrating a configuration of the anonymization device 20 according to the second embodiment.
- FIG. 15 is a flowchart showing an operation of the anonymization device 20 according to the second exemplary embodiment of the present invention.
- FIG. 16 is a diagram illustrating an example of combined data to which three types of provider information “hospital X”, “hospital Y”, and “hospital W” are assigned.
- FIG. 17 is a diagram illustrating an example of a state of being divided into a plurality of groups based on the value of the data quasi-identifier illustrated in FIG.
- FIG. 18 is a diagram illustrating an example of a state in which the data illustrated in FIG. 17 is integrated.
- FIG. 19 is a diagram illustrating an example of the anonymized combined data that is finally output by the anonymization device 20.
- FIG. 20 is a diagram illustrating anonymized data when the collusion of a provider in another variation is considered.
- FIG. 21 is a block diagram illustrating a configuration of the anonymization device 20 according to the third embodiment.
- FIG. 22 is a flowchart showing the operation of the anonymization device 30 according to the third exemplary embodiment of the present invention.
- FIG. 23 is a diagram illustrating an example of combined data in which different anonymity level thresholds are set for each type of provider information.
- FIG. 24 is a diagram illustrating an example of a state in which the data illustrated in FIG. 23 is divided into a plurality of groups based on the value of the quasi-identifier.
- FIG. 25 is a diagram illustrating an example of a state in which the data illustrated in FIG. 24 is integrated.
- FIG. 26 is a diagram illustrating an example of a state in which the data illustrated in FIG. 25 is integrated.
- FIG. 27 is a diagram illustrating an example of the anonymized combined data that is finally output by the anonymization device 30.
- FIG. 28 is a block diagram illustrating a configuration of the anonymization device 40 according to the fourth embodiment.
- FIG. 29 is a flowchart showing an operation of the anonymization device 40 according to the fourth exemplary embodiment of the present invention.
- FIG. 30 is a block diagram illustrating an example of a hardware configuration of the anonymization device 10 according to the first embodiment.
- FIG. 1 is a diagram for explaining the background of the present invention.
- The business operator Z, an intermediary organization, receives data from hospitals X and Y, which are data providing organizations, combines the data, and uses it.
- The business operator Z, having received the two data sets, combines them and performs anonymization processing to ensure the anonymity of individuals in the combined data.
- the data to be subjected to anonymization processing generally includes an ID (Identification) for identifying a user, sensitive information, and a quasi-identifier.
- Sensitive information is information that an individual does not want others to know in association with him or her.
- A quasi-identifier is information that cannot identify an individual on its own, but may identify an individual when combined with other information.
- From the viewpoint of preventing identification of individuals, the quasi-identifier values should be abstracted uniformly across all records.
- From the viewpoint of using the combined data, however, the quasi-identifier values are preferably as specific as possible.
- The anonymization process therefore reconciles the two purposes of "preventing identification of individuals" and "use of the combined data".
- Anonymization processing includes top-down processing and bottom-up processing.
- The top-down anonymization process is a "data division process".
- The bottom-up anonymization process is a "data integration process".
- Provider Z collects personal information held by two different hospitals, Hospital X and Hospital Y, and combines both data while ensuring anonymity.
- the personal information held by the hospital X and the hospital Y is information including “No.”, “age”, and “disease code”.
- the “disease code” that enables identification of individual illnesses is sensitive information.
- Sensitive information is information whose values should not be changed by the abstraction process, because it is used for analyzing the published data.
- the abstraction process is a process for converting data attributes or attribute values into data with a wider range of attributes or attribute values.
- the attribute is, for example, a type such as age, sex, and address.
- An attribute value is a specific content or value of an attribute.
- For example, when the abstraction target is a specific numerical value, converting that value into numerical range data (ambiguated data) that includes it is an example of the abstraction process.
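A minimal sketch of this kind of abstraction, assuming that ages are generalized into fixed-width numeric ranges (the width and the string representation are illustrative choices, not from the patent):

```python
def abstract_age(age, width=2):
    """Generalize a specific age into a numeric range (ambiguated data)
    of the given width, e.g. 21 -> "20-21"."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

print(abstract_age(21))  # "20-21"
print(abstract_age(22))  # "22-23"
```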
- Personal information other than sensitive information shall be quasi-identifiers.
- “age” is a quasi-identifier.
- the anonymization technique related to the present invention determines whether or not anonymity is maintained based on whether or not a predetermined anonymity index is satisfied.
- k-anonymity is an index that requires k or more records having the same quasi-identifier values. In the following description, 2-anonymity is assumed to be required, and the anonymization process is assumed to use bottom-up processing.
- FIG. 2 is a diagram showing data held by hospital X. As shown in FIG. 2, the hospital X holds personal information of a total of seven users whose user IDs are user1 to user7.
- FIG. 3 is a diagram showing data held by the hospital Y. As shown in FIG. 3, the hospital Y holds the personal information of a total of six people whose user IDs are user8 to user13.
- FIG. 4 is a diagram showing data held by the operator Z. As shown in FIG. 4, the operator Z acquires the data shown in FIG. 2 from the hospital X and the data shown in FIG. 3 from the hospital Y, and combines and holds both data. The data shown in FIG. 4 is arranged in order of age.
- the anonymization technique related to the present invention divides the combined data shown in FIG. 4 into a plurality of groups based on “age” that is a quasi-identifier.
- FIG. 5 is a diagram showing a state where the data shown in FIG. 4 is divided into a plurality of groups based on the anonymization technique related to the present invention.
- The group whose "age" is "20" includes four users {user1, user2, user3, user8}, and therefore satisfies 2-anonymity.
- The groups whose "age" is "23" and "24" also satisfy 2-anonymity.
- The groups whose "age" is "21" and "22", however, contain only {user9} and {user4}, respectively. Therefore, the bottom-up anonymization technology related to the present invention integrates, for example, the groups whose "age" is "21" and "22".
- FIG. 6 is a diagram showing data obtained by integrating a part of the data shown in FIG. 5. As shown in FIG. 6, the groups whose "age" is "21" and "22" are integrated into a group whose "age" is "21-22". This integrated group satisfies 2-anonymity.
- FIG. 7 is a diagram showing the finally generated anonymized combined data based on the anonymization technique related to the present invention.
- the anonymization technology related to the present invention anonymizes the data held by the operator Z so that all groups satisfy 2 anonymity.
- However, the data provider may be able to identify personal information held by other providers. That is, the data shown in FIG. 7 may not always remain anonymous.
- The providers that supplied the data (hospital X and hospital Y) can identify their own data within the anonymized data. For this reason, a data provider can reduce the anonymity of the data below the determined index.
- For example, hospital X compares the data shown in FIG. 2, which it provided itself, with the anonymized combined data shown in FIG. 7. Based on the comparison, hospital X can determine that, among the data belonging to the group whose "age" is "21-22", the record whose "disease code" is "F" is the data it provided itself.
- Hospital Y can likewise identify its own data. Therefore, the group whose "age" is "21-22" in FIG. 7 does not satisfy 2-anonymity from the viewpoint of hospital X or hospital Y. For example, if hospital X knows the "No." (here "user9") of the user aged "21" included in hospital Y's data, hospital X can identify from the anonymized combined data that the "disease code" of "user9" is "E".
- In this way, the anonymization technique related to the present invention has the problem that the anonymity index cannot always be satisfied.
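The linkage problem described above can be reproduced in a few lines: when hospital X subtracts its own contribution from the published "21-22" group, a single record remains and its sensitive value is exposed. The dictionary layout is illustrative; the values follow the figures as described in the text.

```python
# Published group "age 21-22" after conventional 2-anonymization (cf. FIG. 7).
published_group = [
    {"age": "21-22", "disease": "E"},  # contributed by hospital Y (user9)
    {"age": "21-22", "disease": "F"},  # contributed by hospital X (user4)
]
# Hospital X knows exactly which records it provided ...
own_records = [{"age": "21-22", "disease": "F"}]
# ... so it can subtract them from the published data.
remaining = [r for r in published_group if r not in own_records]
# One record is left: if X also knows that user9 is 21 years old,
# X learns that user9's disease code is "E".
print(len(remaining), remaining[0]["disease"])  # 1 E
```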
- FIG. 8 is a block diagram showing an example of the configuration of the anonymization device 10 according to the first embodiment.
- The anonymization device 10 is, for example, a device held by the business operator Z in FIG. 1.
- the anonymization device 10 includes a determination unit 11, an anonymization unit 12, and a storage unit 13.
- As shown in FIG. 1, there are two sources of the information acquired by the anonymization device 10: for example, hospital X and hospital Y.
- The anonymization process executed by the anonymization unit 12 of the anonymization device 10 may be an existing method, either top-down or bottom-up. In the following description of this embodiment, the anonymization unit 12 is described, as an example, as performing bottom-up anonymization.
- the anonymization device 10 stores the combined data in the storage unit 13 in advance.
- the combined data is data obtained by combining data acquired by the anonymization device 10 from a plurality of providers.
- the combined data is a set of records in which user attribute information that is attribute information related to a user and provider information that is information indicating a provider of the user attribute information are associated with each other.
- the anonymization device 10 stores combined data, which is a combination of data acquired from the hospital X and the hospital Y, in the storage unit 13.
- the anonymization device 10 receives an instruction from the user of the anonymization device 10 and starts anonymization of the combined data.
- The anonymization device 10 may be configured so that the user instructs the determination unit 11 of the anonymization device 10 to start the anonymization process.
- When the determination unit 11 receives a start instruction from the user, it acquires the combined data from the storage unit 13.
- the determination unit 11 determines whether or not the anonymity of the data is maintained for any provider of the data regarding the combined data acquired from the storage unit 13.
- Here, "any provider" refers to hospital X and hospital Y. Specifically, the determination unit 11 determines whether anonymity is maintained even when hospital X or hospital Y compares the data it holds with the combined data. As described later, the determination unit 11 also determines whether the data output from the anonymization unit 12 remains anonymous when viewed from any source of the data.
- When the determination unit 11 determines that there is a group in which anonymity is not maintained (for example, k-anonymity is not satisfied), the determination unit 11 outputs the combined data to the anonymization unit 12.
- When the anonymization unit 12 receives the combined data from the determination unit 11, it anonymizes the groups in the received combined data in which anonymity is not maintained. Since the anonymization process of this embodiment is bottom-up, the anonymization unit 12 integrates the groups in which anonymity is not maintained.
- If a group whose anonymity is not maintained still remains in the combined data after the anonymization by the anonymization unit 12, the determination unit 11 outputs the combined data to the anonymization unit 12 again.
- The anonymization unit 12 then receives the combined data and anonymizes it. That is, the determination unit 11 and the anonymization unit 12 repeat the anonymization process of the anonymization unit 12 until the determination unit 11 determines that no group fails to maintain anonymity.
- When the determination unit 11 determines that anonymity is maintained for all groups of the combined data, it outputs the anonymized combined data to the outside.
- Here, the outside is, for example, the business operator V shown in FIG. 1. That is, the determination unit 11 outputs the anonymized combined data to, for example, the business operator V.
- FIG. 9 is a flowchart showing the operation of the anonymization device 10 according to the first embodiment.
- First, the determination unit 11 of the anonymization device 10 acquires the combined data, to which provider information is assigned, from the storage unit 13 (step S1).
- The storage unit 13 stores in advance the data acquired from a plurality of different providers (for example, hospital X and hospital Y) together with information indicating the provider of each record (information indicating whether the record was acquired from hospital X, hospital Y, and so on).
- the determination unit 11 divides the acquired combined data into a plurality of groups, with a plurality of records having the same quasi-identifier value as one group (step S2).
- Next, with respect to the combined data acquired from the storage unit 13, the determination unit 11 determines whether the anonymity of the data is maintained for any of the data providers (for example, "hospital X" and "hospital Y") (step S3).
- the determination unit 11 determines as follows.
- The determination unit 11 selects one group from the groups sharing the same quasi-identifier (for example, "age") value, and forms the group obtained by excluding the records that include one type of provider information (for example, "hospital X"). The determination unit 11 then determines whether the number of records included in that group is equal to or greater than the threshold value that serves as the anonymity index (for example, "2 or more" for "2-anonymity").
- The determination unit 11 performs the same determination for all groups.
- The determination unit 11 also performs the same determination for all types of provider information (for example, "hospital X" and "hospital Y").
- The determination unit 11 determines whether the anonymity of the combined data is maintained based on all of these determinations.
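Steps S2 and S3 can be sketched as follows, assuming each record carries its quasi-identifier value and its provider information: the combined data is grouped by the quasi-identifier, and each group must retain at least k records after the records of any single provider are excluded. All names are illustrative.

```python
from collections import defaultdict

def anonymity_holds_for_all_providers(records, quasi_id, provider_key, k):
    """Provider-aware k-anonymity: every quasi-identifier group must keep
    >= k records even after any single provider removes its own records."""
    groups = defaultdict(list)
    for r in records:
        groups[r[quasi_id]].append(r)
    providers = {r[provider_key] for r in records}
    for group in groups.values():
        for p in providers:
            if len([r for r in group if r[provider_key] != p]) < k:
                return False
    return True

data = [
    {"age": 20, "provider": "X"}, {"age": 20, "provider": "X"},
    {"age": 20, "provider": "X"}, {"age": 20, "provider": "Y"},
]
# Ordinary 2-anonymity holds (four records share age 20), but when X removes
# its own records only one record remains, so the provider-aware check fails.
print(anonymity_holds_for_all_providers(data, "age", "provider", 2))  # False
```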
- the determination unit 11 selects the next process based on the determination in step S3 (step S4).
- If anonymity is maintained (Yes in step S4), the determination unit 11 outputs the combined data that was the target of the determination process as anonymized combined data.
- Otherwise (No in step S4), the determination unit 11 instructs the anonymization unit 12 to integrate groups.
- the anonymization unit 12 integrates groups in which anonymity is not maintained (step S5).
- the group integration process of the anonymization unit 12 is not particularly limited.
- For example, the anonymization unit 12 may focus on an arbitrary quasi-identifier of the groups in which anonymity is not maintained, and abstract the data by integrating the groups whose centroids are closest in the data space.
- After step S5, the determination unit 11 determines, in the same way as in step S4, whether anonymity is maintained for any provider with respect to the groups integrated by the anonymization unit 12 (step S6). More specifically, for each type of provider information of an integrated group, the determination unit 11 determines whether the number of records remaining after subtracting that provider's records is equal to or greater than the threshold value that serves as the anonymity index.
- the determination unit 11 selects the next process based on the determination result (step S7).
- If anonymity is maintained (Yes in step S7), the determination unit 11 outputs the combined data subjected to the determination process as anonymized combined data.
- Otherwise (No in step S7), the determination unit 11 instructs the anonymization unit 12 to integrate groups again.
- the anonymization unit 12 again integrates groups in which anonymity is not maintained (step S5).
- The determination unit 11 and the anonymization unit 12 repeat steps S5 to S7 until the record counts of all groups are equal to or greater than the threshold value.
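The loop of steps S5 to S7 can be sketched as a simple bottom-up integration, assuming groups are ordered by their (numeric) quasi-identifier and a failing group is merged with an adjacent group; this adjacency rule is a simplification of the centroid-distance integration mentioned above.

```python
def merge_until_anonymous(groups, provider_key, k):
    """Bottom-up integration (steps S5-S7): merge adjacent groups until
    every group keeps >= k records after excluding any one provider."""
    providers = {r[provider_key] for g in groups for r in g}

    def group_ok(g):
        return all(len([r for r in g if r[provider_key] != p]) >= k
                   for p in providers)

    groups = [list(g) for g in groups]
    while len(groups) > 1 and not all(group_ok(g) for g in groups):
        i = next(idx for idx, g in enumerate(groups) if not group_ok(g))
        j = i + 1 if i + 1 < len(groups) else i - 1  # adjacent neighbour
        groups[min(i, j)] = groups[i] + groups[j]
        del groups[max(i, j)]
    return groups

groups = [
    [{"age": 21, "provider": "Y"}],  # fails on its own (cf. user9)
    [{"age": 22, "provider": "X"}],  # fails on its own (cf. user4)
    [{"age": 23, "provider": "X"}, {"age": 23, "provider": "Y"},
     {"age": 23, "provider": "X"}, {"age": 23, "provider": "Y"}],
]
merged = merge_until_anonymous(groups, "provider", 2)
print(len(merged), sum(len(g) for g in merged))  # 1 6
```

Note that merging the two singleton groups alone is not enough here: the merged pair still fails when either provider removes its own record, so the loop continues until all three groups are integrated.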
- In the following specific example, the anonymization device 10 is assumed to be owned by the business operator Z, and the data providers are hospital X and hospital Y (see FIG. 1). Furthermore, it is assumed that the business operator Z acquires the data shown in FIG. 2 from hospital X and the data shown in FIG. 3 from hospital Y. That is, the quasi-identifier is the "age" information, and the sensitive information is the "disease code" information. As for anonymity, the personal information table is required to satisfy 2-anonymity.
- In step S1 of FIG. 9, the determination unit 11 acquires the combined data from the storage unit 13.
- FIG. 10 is a diagram illustrating an example of combined data stored in the storage unit 13.
- the storage unit 13 stores personal information together with information (provider information) indicating the provider of the data.
- the determination unit 11 acquires combined data to which provider information is assigned.
- the determination unit 11 divides the acquired combined data into a plurality of groups, with a plurality of records having the same quasi-identifier value as one group.
- FIG. 11 is a diagram showing an example of combined data divided into a plurality of groups based on the value of the quasi-identifier.
- the combined data is divided into five groups whose “age” is “20”, “21”, “22”, “23”, and “24”, respectively.
- whether each group satisfies anonymity (OK) or not (NG) is displayed.
- First, the determination unit 11 removes the records containing one type of provider information from the records of a group sharing the same quasi-identifier value. For example, the determination unit 11 excludes the records of user1, user2, and user3, whose provider information is "hospital X", from the group whose "age" is "20". The determination unit 11 then determines the anonymity of the group whose "age" is "20" after the three records have been removed. The number of remaining records in that group is one (the record of user8). Therefore, the determination unit 11 determines that this group does not satisfy 2-anonymity (the number of records is not 2 or more). That is, the determination unit 11 determines that the group whose "age" is "20" does not maintain anonymity.
- the determination unit 11 determines for all types of provider information in all groups.
- the determination unit 11 determines that the groups whose “age” is “21”, “22”, and “23” do not maintain anonymity.
- the determination unit 11 determines that anonymity is maintained for any provider of the group whose “age” is “24”.
- When the determination unit 11 determines that there is a group whose number of records is not 2 or more (that is, a group that does not maintain anonymity) (No in step S4), the determination unit 11 instructs the anonymization unit 12 to integrate groups.
- The anonymization unit 12 integrates the groups that do not satisfy anonymity in response to the instruction from the determination unit 11. For example, based on the closeness of the distances in the data space, the anonymization unit 12 integrates the group whose "age" is "20" with the group whose "age" is "21", and the group whose "age" is "22" with the group whose "age" is "23". Note that the anonymization unit 12 may integrate the data in the storage unit 13. Alternatively, the anonymization unit 12 may receive the data of the groups whose "age" is "20" and "21" and of the groups whose "age" is "22" and "23" from the determination unit 11 and integrate these groups.
- FIG. 12 is a diagram illustrating an example of data after the anonymization process of the anonymization unit 12.
- the anonymization unit 12 abstracts the value of “age” and integrates each group.
- The data shown in FIG. 12 is the information that is determined again in step S6 of FIG. 9.
- In step S6 of FIG. 9, the determination unit 11 determines that the groups whose "age" is "20-21" and "22-23" satisfy 2-anonymity both when the records of "hospital X" are excluded and when the records of "hospital Y" are excluded. Therefore, the determination unit 11 outputs the combined data that is the current determination target as the anonymized combined data (Yes in step S7).
- FIG. 13 is a diagram illustrating an example of the anonymized combined data that is finally output by the anonymization device 10.
- As shown in FIG. 13, the determination unit 11 of the anonymization device 10 deletes the provider information and the user ID ("No.") from the combined data, so that the providers are not leaked to the outside and no individual is identified, and outputs the resulting anonymized combined data.
- the anonymization device 10 can maintain the anonymity of data for any data provider.
- This is because the determination unit 11 determines, for each provider, whether anonymity is satisfied by the data held by the other providers after excluding the data held by that provider, and, when anonymity is not satisfied, the anonymization unit 12 anonymizes the data until anonymity is satisfied.
- Although the anonymization process of the anonymization unit 12 has been described as a bottom-up method, the anonymization unit 12 may also anonymize using top-down processing.
- In that case, the anonymization unit 12 divides the data rather than integrating it.
- Specifically, the anonymization unit 12 first collects the data into one group, then determines a division point of the group, and divides the data into a plurality of groups.
- The determination unit 11 determines, for all types of provider information in all of the divided groups, whether the number of records remaining after excluding each provider's data is equal to or greater than the threshold value that serves as the anonymity index. When the threshold is met in all groups, the determination unit 11 requests the anonymization unit 12 to divide further.
- The anonymization unit 12 then performs top-down anonymization (data division). The determination unit 11 repeats this operation as long as all groups satisfy anonymity. When, after a division by the anonymization unit 12, even one group that does not satisfy anonymity exists, the determination unit 11 cancels that division.
- The anonymization unit 12 may use the median value of each group of the combined data as a division point, or may determine the division point by other methods. For example, the anonymization unit 12 may determine the division point in consideration of the amount of entropy. More specifically, based on entropy, the anonymization unit 12 may use as a division point a point with less bias of providers (for example, hospital X and hospital Y) in the data belonging to the groups after division.
- For example, the anonymization unit 12 may calculate the entropy of a group G after division by the following formula, where p_i is the proportion of the records in G supplied by provider i:

  H(G) = -Σ_i p_i · log p_i
- For example, the anonymization unit 12 calculates this entropy for each of the two groups produced by division at a candidate division point.
- The anonymization unit 12 may then determine as the division point the candidate point at which the sum of the entropies of the two groups is largest.
- A large entropy value means that the data within the two groups are well mixed (the mix of “hospital X” data and “hospital Y” data), that is, the providers are less biased.
- In other words, the anonymization unit 12 may use, as the division point, the division candidate point that yields the maximum entropy value among all the division candidate points.
- the method for determining the division point using entropy is not limited to the above-described method, and other methods may be used.
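The entropy-based choice of division point can be sketched as follows (an illustrative example; the record layout and function names are assumptions, and natural log is used where the patent leaves the base unspecified):

```python
import math
from collections import Counter

def provider_entropy(group):
    """Shannon entropy of the provider mix within a group:
    -sum P(Class) * log P(Class)."""
    counts = Counter(p for _, p in group)
    total = len(group)
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def best_split(group):
    """Among all candidate split points of the sorted group, pick the one
    that maximizes the summed entropy of the two resulting groups,
    i.e. the split with the least provider bias."""
    group = sorted(group)
    best, best_score = None, -1.0
    for i in range(1, len(group)):
        score = provider_entropy(group[:i]) + provider_entropy(group[i:])
        if score > best_score:
            best, best_score = i, score
    return best
```

For four records alternating between hospitals X and Y, the middle split gives two perfectly mixed halves and therefore the maximum summed entropy.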
- In the above description, the determination unit 11 determines anonymity using k-anonymity as the index.
- However, the determination unit 11 may determine not only k-anonymity but also other indices, for example, l-diversity.
- l-diversity is an index that requires l or more distinct values of sensitive information within a group.
- Specifically, the determination unit 11 may determine, for each type of provider information and in all the groups, whether the number of distinct values of sensitive information included in a group having identical quasi-identifier values, when records including one type of provider information are excluded from that group, is equal to or greater than a predetermined threshold serving as a diversity index.
- For example, the groups whose “age” is “20 to 21” and “22 to 23” contain five distinct values of “disease code” as sensitive information (A, B, C, D, E) and four distinct values (F, A, B, C), respectively. Therefore, the groups whose “age” is “20 to 21” and “22 to 23” satisfy 3-diversity.
- On the other hand, the group whose “age” is “24” contains only two distinct “disease codes” (C, D). Therefore, the group whose “age” is “24” does not satisfy 3-diversity.
- In this case, the determination unit 11 determines that 3-diversity is not satisfied and instructs the anonymization unit 12 to perform anonymization.
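The l-diversity variant with provider exclusion can be sketched as follows (a minimal illustration with an assumed record layout of `(quasi_identifier, provider, sensitive)` tuples; the function name is hypothetical):

```python
from collections import defaultdict

def satisfies_l_diversity(records, l):
    """records: list of (quasi_identifier, provider, sensitive) tuples.

    For each group sharing a quasi-identifier value and for each provider
    present in the group, the records of the OTHER providers must still
    contain at least l distinct sensitive values."""
    groups = defaultdict(list)
    for qi, provider, sensitive in records:
        groups[qi].append((provider, sensitive))
    for members in groups.values():
        for p in {q for q, _ in members}:
            # distinct sensitive values visible after excluding provider p
            kinds = {s for q, s in members if q != p}
            if len(kinds) < l:
                return False
    return True
```

A group whose foreign records carry two distinct disease codes satisfies 2-diversity but not 3-diversity, mirroring the “age 24” example above.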
- The anonymization unit 12 anonymizes the data based on the anonymity and diversity determination results of the determination unit 11 described above. Note that the anonymization unit 12 may repeat the anonymization process. Further, the determination unit 11 may determine whether other indices (for example, t-closeness) are satisfied.
- t-closeness is an index that requires the distance between the distribution of the sensitive data within a group and the distribution of that data over all records to be t or less.
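The patent does not fix the distance measure for t-closeness (the literature commonly uses Earth Mover's Distance); the sketch below uses total variation distance as one simple choice, with an assumed record layout of `(quasi_identifier, sensitive)` pairs.

```python
from collections import Counter, defaultdict

def satisfies_t_closeness(records, t):
    """records: list of (quasi_identifier, sensitive) pairs.

    Checks that the variation distance between each group's
    sensitive-value distribution and the whole-table distribution
    is at most t."""
    overall = Counter(s for _, s in records)
    n = len(records)
    groups = defaultdict(list)
    for qi, s in records:
        groups[qi].append(s)
    values = set(overall)
    for members in groups.values():
        local = Counter(members)
        m = len(members)
        dist = 0.5 * sum(abs(local[v] / m - overall[v] / n) for v in values)
        if dist > t:
            return False
    return True
```

A table where every group mirrors the overall sensitive distribution has distance 0, while a group concentrated on a single value quickly exceeds a small t.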
- In the above description, the case where each group includes both “hospital X” and “hospital Y” as provider information has been described.
- However, a group whose data all come from a single provider (for example, “hospital Y”) may be generated.
- For example, the anonymization device 10 may produce a group whose “age” is “22 to 23” in which the provider of every record is “hospital Y”.
- In this case, because the data of the group “22 to 23” are all records of the hospital Y, the other provider (hospital X) cannot reduce the number of records in the group even using its own data. Therefore, the other provider cannot identify individuals within the group, and the anonymity with respect to the hospital X does not decrease.
- The anonymization device 20 differs from the anonymization device 10 in that it operates so as to maintain anonymity even when a plurality of providers collude.
- FIG. 14 is a block diagram showing an example of the configuration of the anonymization device 20 according to the second embodiment.
- Compared with the anonymization device 10 of the first embodiment, the anonymization device 20 differs in that it includes a determination unit 21 instead of the determination unit 11 and a storage unit 23 instead of the storage unit 13.
- the storage unit 23 stores data associated with three or more types of provider information.
- For example, the anonymization device 20 receives data from the hospital W in addition to the hospital X and the hospital Y, and the storage unit 23 stores the combined data associated with the three types of provider information.
- In a group including three or more types of provider information, the determination unit 21 treats two or more types of provider information as a single type of provider, and determines anonymity for each type of provider information.
- FIG. 15 is a flowchart showing the operation of the anonymization device 20 according to the second embodiment of the present invention. As shown in FIG. 15, the anonymization device 20 is different from the anonymization device 10 in that step S8 is performed instead of step S3, and step S9 is performed instead of step S6. Since the other steps are the same, detailed description is omitted.
- The determination unit 21 basically operates in the same manner as the determination unit 11.
- However, in step S8, the determination unit 21 uses the information obtained by combining two or more types of provider information (for example, “hospital Y” and “hospital W”) as one type of provider information.
- That is, the determination unit 21 determines anonymity for each type of provider information (the type “hospital X” and the combined type of “hospital Y” and “hospital W”).
- In other words, the determination unit 21 determines whether anonymity is maintained even when the hospital Y and the hospital W collude and share the data each holds.
- In step S9, the determination unit 21 determines anonymity for the groups integrated by the anonymization unit 12 in step S5, treating two or more types of provider information as a single provider, as in step S8.
- the determination unit 21 acquires data from the storage unit 23.
- FIG. 16 is a diagram illustrating an example of combined data to which three types of provider information of “Hospital X”, “Hospital Y”, and “Hospital W” are assigned.
- For example, the storage unit 23 additionally stores user14 (“age” is “21”, “disease code” is “A”) and user15 (“age” is “22”, “disease code” is “B”) acquired from the hospital W.
- the determination unit 21 divides the data acquired from the storage unit 23 into a plurality of groups based on the quasi-identifier value.
- FIG. 17 is a diagram illustrating an example of a state in which the data illustrated in FIG. 16 is divided into a plurality of groups based on the value of the quasi-identifier.
- the combined data is divided into five groups whose “age” is “20”, “21”, “22”, “23” and “24”, respectively.
- In FIG. 17, whether each group satisfies anonymity (OK) or not (NG) when two or more hospitals collude is displayed.
- In the present embodiment, the determination unit 21 sets only groups including three or more types of provider information as determination targets for collusion. Further, it is assumed that the reliability of the hospital Y and the hospital W is low, and the determination unit 21 determines whether anonymity is satisfied when “hospital Y” and “hospital W” are treated as one type of provider.
- That is, the determination unit 21 determines anonymity when the two types of provider information (hospital Y and hospital W) are used as one type of provider; however, in this embodiment, only groups including three or more types of provider information are determination targets for collusion.
- As shown in FIG. 17, the provider information of each group is: “hospital X” and “hospital Y” (the group whose “age” is “20”), “hospital Y” and “hospital W” (the group whose “age” is “21”), “hospital X” and “hospital W” (the group whose “age” is “22”), “hospital X” and “hospital Y” (the group whose “age” is “23”), and “hospital X” and “hospital Y” (the group whose “age” is “24”).
- Since no group includes three or more types of provider information, the determination unit 21 does not perform the determination considering collusion. That is, the determination unit 21 determines based on one type of provider information at a time. As shown in FIG. 17, the determination result of the determination unit 21 includes groups that do not satisfy the threshold (No in step S4). Therefore, the anonymization device 20 proceeds to step S5.
- Next, in step S5, the anonymization unit 12 integrates the NG groups among the data shown in FIG. 17.
- FIG. 18 is a diagram illustrating an example of a state in which the data illustrated in FIG. 17 are integrated.
- In step S9 of FIG. 15, the determination unit 21 excludes the records of “hospital Y” and “hospital W”, treated as one type of provider, from the group whose “age” is “20 to 21” and the group whose “age” is “22 to 23”, and determines anonymity. In this case, three records of “hospital X” remain in the group whose “age” is “20 to 21”, and two records of “hospital X” remain in the group whose “age” is “22 to 23”. That is, both groups satisfy 2-anonymity. Therefore, the determination unit 21 determines that all groups satisfy anonymity and outputs the combined data under determination as anonymized combined data (Yes in step S7).
- FIG. 19 is a diagram illustrating an example of the anonymized combined data that is finally output by the anonymization device 20.
- Note that the determination unit 21 may determine that anonymity is maintained only when all combinations of the provider information satisfy anonymity. Specifically, for example, in the case of FIG. 18, for each of the groups whose “age” is “20 to 21” and “22 to 23”, the determination unit 21 may determine anonymity excluding records for the combination of “hospital X” and “hospital Y”, the combination of “hospital X” and “hospital W”, and the combination of “hospital Y” and “hospital W”.
- More generally, the provider information in the data to be anonymized may be of three or more types, and a plurality of types of provider information may be treated as one type of provider information.
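The "all combinations" collusion check can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the record layout, function name, and the fixed coalition size are assumptions (the embodiment above instead checks one specific low-trust pair).

```python
from collections import defaultdict
from itertools import combinations

def safe_against_collusion(records, k, coalition_size=2):
    """records: list of (quasi_identifier, provider) pairs.

    Treat every coalition of `coalition_size` providers as a single
    provider: excluding the coalition's records from a group containing
    three or more provider types must still leave at least k records."""
    groups = defaultdict(list)
    for qi, p in records:
        groups[qi].append(p)
    for members in groups.values():
        present = set(members)
        if len(present) < 3:
            continue  # collusion is only checked for groups with 3+ providers
        for coalition in combinations(sorted(present), coalition_size):
            remaining = sum(1 for m in members if m not in coalition)
            if remaining < k:
                return False
    return True
```

With three providers of two records each, any colluding pair still sees two foreign records, so 2-anonymity holds; drop one "hospital W" record and the pair {X, Y} can isolate it.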
- As described above, the anonymization device 20 can maintain the anonymity of data even when a plurality of providers providing the data collude.
- This is because the determination unit 21 determines whether anonymity is satisfied while treating a plurality of pieces of provider information as one type of provider information, and, when anonymity is not satisfied, the anonymization unit 12 anonymizes the data.
- the anonymization device 30 is different from the anonymization device 10 and the anonymization device 20 in that different anonymization levels are set depending on the provider.
- FIG. 21 is a block diagram illustrating an example of the configuration of the anonymization device 30 according to the third embodiment.
- The anonymization device 30 differs from the anonymization device 10 and the anonymization device 20 in that it includes a setting unit 34. Further, the anonymization device 30 differs in that it includes a determination unit 31 instead of the determination unit 11 and the determination unit 21. Since the storage unit 23 and the other components are the same as in the foregoing embodiments, detailed description thereof is omitted.
- the setting unit 34 sets a threshold of anonymity level that differs for each type of provider information for the combined data stored in the storage unit 23. For example, the setting unit 34 may set an anonymity level according to the reliability of the provider. The setting unit 34 outputs combined data in which different anonymity levels are set depending on the type of the provider information to the determination unit 31.
- the setting unit 34 may accept an anonymity level setting instruction according to the type of provider information from the user.
- the anonymization device 30 may start the anonymization process when the setting unit 34 receives a setting instruction.
- The determination unit 31 determines whether the number of records remaining when records having the same provider information are excluded is equal to or greater than a threshold (anonymity index) that differs depending on the type of provider information.
- FIG. 22 is a flowchart showing the operation of the anonymization device 30 according to the third embodiment of the present invention.
- the anonymization device 30 differs from the operation of the anonymization device 10 in that it includes step S10. Further, the operation of the anonymization device 30 is different from the operation of the anonymization device 10 in that step S11 is executed instead of step S3, and step S12 is executed instead of step S6.
- the setting unit 34 sets a threshold value of anonymity level for each type of provider information for the combined data stored in the storage unit 23.
- the setting unit 34 may set different anonymity levels for each type of provider information, or may set the same anonymity level threshold for a plurality of types of provider information.
- In steps S11 and S12, the determination unit 31 determines, for each group and for each type of provider information, whether the number of records remaining when that provider information is excluded is equal to or greater than the anonymity-level threshold set for that type of provider information.
- the storage unit 23 stores the combined data shown in FIG. 16 as in the second embodiment.
- First, the setting unit 34 acquires the combined data from the storage unit 23. Then, the setting unit 34 sets a threshold of anonymity level for each type of provider information with respect to the combined data.
- FIG. 23 is a diagram illustrating an example of combined data in which a threshold of anonymity level is set for each type of provider information.
- In the present embodiment, the setting unit 34 sets the anonymization level of the hospital X to “1” because its reliability is high, sets that of the hospital Y to “2” because its reliability is normal, and sets that of the hospital W to “3” because its reliability is low.
- the determination unit 31 divides the data acquired from the storage unit 23 into a plurality of groups based on the quasi-identifier value.
- FIG. 24 is a diagram illustrating an example of a state in which the data illustrated in FIG. 23 is divided into a plurality of groups based on the value of the quasi-identifier. As shown in FIG. 24, the combined data is divided into five groups whose “age” is “20”, “21”, “22”, “23”, and “24”, respectively.
- Next, the operation in which the determination unit 31 determines whether each group satisfies the anonymity level for each type of provider information will be described in detail.
- In step S11, the determination unit 31 determines whether the number of records remaining when records having the same provider information are excluded is equal to or greater than the threshold corresponding to that type of provider information.
- In FIG. 24, whether each group satisfies the anonymity level for each type of provider information (OK) or not (NG) is displayed.
- For example, the determination unit 31 determines that the group whose “age” is “20” satisfies anonymity: if the “hospital Y” records are excluded, three records of “hospital X” remain, and the “anonymity level” of “hospital Y” is “2”. Therefore, the determination unit 31 determines that anonymity is maintained for the group whose “age” is “20”.
- On the other hand, the groups whose “age” is “21” and “22” each include “hospital W”, whose reliability is low and whose “anonymity level” is “3”.
- Therefore, the determination unit 31 determines that neither the group whose “age” is “21” nor the group whose “age” is “22” satisfies anonymity.
- The determination unit 31 makes the same determination for all the groups.
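The per-provider anonymity-level check can be sketched as follows, using the example levels set above (hospital X: 1, hospital Y: 2, hospital W: 3); the record representation and function name are assumptions for the example.

```python
from collections import defaultdict

# example anonymity levels from the embodiment: higher = less trusted
LEVELS = {"X": 1, "Y": 2, "W": 3}

def group_meets_levels(providers, levels):
    """providers: list of provider labels of the records in one group.

    For each provider type present in the group, excluding its records
    must leave at least as many records as that provider's own
    anonymity level."""
    total = len(providers)
    counts = defaultdict(int)
    for p in providers:
        counts[p] += 1
    return all(total - n >= levels[p] for p, n in counts.items())
```

For instance, a group of three "X" records and one "Y" record passes (excluding Y leaves 3 ≥ 2; excluding X leaves 1 ≥ 1), whereas a group containing low-trust "W" records fails unless three foreign records remain.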
- the anonymization unit 12 integrates the NG group shown in FIG. 24.
- FIG. 25 is a diagram illustrating an example of a state in which the data illustrated in FIG. 24 is integrated.
- The anonymization unit 12 of the present embodiment first integrates the groups whose “age” is “21” and “22” among the NG groups.
- In the group “21 to 22” obtained by integrating the groups whose “age” is “21” and “22” shown in FIG. 25, two records remain when the “hospital W” records are excluded.
- However, the “anonymization level” of “hospital W” is “3”. Therefore, in step S12 of FIG. 22, the determination unit 31 determines that the group “21 to 22” does not yet satisfy anonymity (No in step S7).
- Returning to step S5 of FIG. 22, the anonymization unit 12 further integrates the group whose “age” is “21 to 22”, which was determined not to satisfy anonymity.
- Specifically, the anonymization unit 12 integrates the NG group whose “age” is “21 to 22” and the group whose “age” is “23”.
- FIG. 26 is a diagram illustrating an example of a state in which the data illustrated in FIG. 25 is integrated.
- step S12 of FIG. 22 the determination unit 31 determines that the group “21 to 23” satisfies anonymity (step S7, Yes).
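The repeated integration of NG groups shown in FIGS. 24 to 26 follows a simple bottom-up loop: merge each failing group with a neighbor and re-check until every group passes. The sketch below is illustrative; the group representation and the pass/fail predicate are hypothetical.

```python
def bottom_up_merge(groups, ok):
    """groups: list of groups ordered by quasi-identifier value.
    ok: predicate returning True when a group satisfies anonymity.

    Repeatedly merges each failing group into its neighbor until every
    remaining group passes (or only one group is left)."""
    groups = [list(g) for g in groups]
    changed = True
    while changed and len(groups) > 1:
        changed = False
        for i, g in enumerate(groups):
            if not ok(g):
                j = i + 1 if i + 1 < len(groups) else i - 1
                groups[j] = groups[j] + g  # merge NG group into neighbor
                del groups[i]
                changed = True
                break  # restart the scan over the new group list
    return groups
```

With a simple size predicate, a lone undersized middle group is absorbed into its neighbor, just as the group “21 to 22” was merged into “23” above.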
- FIG. 27 is a diagram illustrating an example of the anonymized combined data that is finally output by the anonymization device 30.
- As described above, the anonymization device 30 can maintain the anonymity of data in accordance with the reliability of the plurality of providers that provided the data.
- This is because the setting unit 34 sets a threshold of anonymity level for each type of provider information for the combined data stored in the storage unit 23, and the determination unit 31 instructs the anonymization unit 12 to anonymize groups that do not satisfy those thresholds.
- In the above description, the setting unit 34 sets the anonymity level in the data stored in the storage unit 23.
- However, the storage unit 23 may store combined data in which an anonymity level corresponding to the provider is set in advance.
- In this case, the setting unit 34 is not necessary.
- Alternatively, the determination unit 31 may set an anonymity level according to the provider before dividing the data into groups.
- When the anonymization unit 12 determines the division point using entropy, it may use a weighted entropy corresponding to the reliability.
- For example, the anonymization unit 12 may calculate the entropy within a group after division by the following formula:
- Entropy = −Σ_Class { W_Class × P(Class) × log(P(Class)) }
- Here, W_Class is a weighting coefficient corresponding to the reliability of each Class (for example, each of the hospital X, the hospital Y, and the hospital W).
- In the above example, when “Class” is “hospital X”, “W_Class” is “1”; when “Class” is “hospital Y”, “W_Class” is “2”; and when “Class” is “hospital W”, “W_Class” is “3”.
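The weighted entropy formula above can be sketched directly (natural log is assumed, since the patent leaves the base unspecified; the function name is hypothetical):

```python
import math
from collections import Counter

def weighted_entropy(providers, weights):
    """Entropy = -sum_Class W_Class * P(Class) * log(P(Class)),
    where P(Class) is the fraction of records in the group provided by
    Class and W_Class weights each provider by its reliability."""
    counts = Counter(providers)
    total = len(providers)
    return -sum(weights[c] * (n / total) * math.log(n / total)
                for c, n in counts.items())
```

Giving a less-trusted provider a larger W_Class makes splits that mix its records in evenly score higher, so the division point tends to keep low-trust data well blended.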
- the anonymization device 40 is different from the anonymization device 10, the anonymization device 20, and the anonymization device 30 in that data is directly input to the determination unit 41 from the outside.
- FIG. 28 is a block diagram illustrating an example of the configuration of the anonymization device 40 according to the fourth embodiment.
- the anonymization device 40 is different from the anonymization device 10, the anonymization device 20, and the anonymization device 30 in that it does not have a storage unit.
- The determination unit 41 determines, with respect to data obtained by combining a plurality of records acquired from a plurality of providers, whether the anonymity of the data is maintained when viewed from any provider having records that are part of the combined data.
- the anonymization unit 42 repeats the data anonymization process based on the determination result of the anonymity of the determination unit 41.
- When the determination unit 41 determines that anonymity is maintained for every provider with respect to the combined data, it outputs the combined data to the outside as anonymized combined data.
- FIG. 29 is a flowchart showing the operation of the anonymization device 40 according to the fourth exemplary embodiment of the present invention.
- the determination unit 41 of the anonymization device 40 receives data from the outside and generates combined data (step S11).
- the determination unit 41 receives, for example, the data shown in FIG. 2 from the hospital X and the data shown in FIG.
- the anonymization device 40 performs the same processing as the anonymization device 10 according to the first embodiment.
- the anonymization device 40 can maintain the anonymity of data for any provider that provided the data.
- This is because the determination unit 41 of the anonymization device 40 determines anonymity in the same manner as the anonymization device 10 of the first embodiment, and instructs the anonymization unit 42 to anonymize groups that do not satisfy the threshold.
- FIG. 30 is a block diagram illustrating an example of a hardware configuration of the anonymization device 10 according to the first embodiment.
- Each unit constituting the anonymization device 10 is realized by a computer device including a CPU (Central Processing Unit) 1, a communication IF 2 (communication interface 2) for network connection, a memory 3, a storage device 4, an input device 5, and an output device 6.
- the configuration of the anonymization device 10 is not limited to the computer device shown in FIG.
- The CPU 1 runs, for example, an operating system, and reads programs and data into the memory 3 from a recording medium (not shown) mounted in the storage device 4.
- the communication IF 2 connects the anonymization device 10 to another device (not shown) via a network.
- the anonymization device 10 may receive data of the hospital X and the hospital Y from an external device (not shown) via the communication IF 2 and store the data in the storage unit 13.
- The CPU 1 may download a computer program from an external computer (not shown) connected to a communication network via the communication IF 2 and execute it.
- The memory 3 is, for example, a DRAM (Dynamic Random Access Memory) and temporarily stores programs and data.
- The storage device 4 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and records a computer program so that it can be read by a computer.
- the storage unit 13 may be realized using the storage device 4.
- the input device 5 is, for example, a mouse or a keyboard, and receives input from the user.
- the output device 6 is a display device such as a display, for example.
- the anonymization devices 20, 30, and 40 according to the second to fourth embodiments may also be configured using a computer device that includes the CPU 1 and the storage device 4 that stores the program.
- the block diagrams (FIGS. 8, 14, 21, and 28) used in the embodiments described so far show functional unit blocks, not hardware unit configurations. These functional blocks are implemented using any combination of hardware and software.
- Note that the means for realizing the components of the anonymization device 10 is not particularly limited. That is, the anonymization device 10 may be realized as one physically integrated device, or may be realized by two or more physically separated devices connected by wire or wirelessly.
- the program of the present invention may be a program that causes a computer to execute the operations described in the above embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
Description
First, to facilitate understanding of the embodiments of the present invention, the background of the present invention will be described.
Here, when “Class” is “hospital X” or “hospital Y”, P(Class) is as follows.
P(hospital Y) = (number of “hospital Y” records in the group after division) / (total number of “hospital X” and “hospital Y” records in the group after division)
That is, the anonymization unit 12 calculates the entropy in the group after division as follows.
For example, the anonymization unit 12 calculates the above entropy for the two groups after division at each appropriate candidate division point. The anonymization unit 12 may determine the candidate division points by a predetermined rule (algorithm) or by a known method. Then, the anonymization unit 12 may determine, as the division point, the candidate division point at which the sum (S) of the entropies of the two groups is largest.
Next, the anonymization device 20 according to the second embodiment of the present invention will be described.
Next, the anonymization device 30 according to the third embodiment of the present invention will be described. The anonymization device 30 differs from the anonymization device 10 and the anonymization device 20 in that different anonymization levels are set depending on the provider.
Here, the function may be the same as that shown in the first embodiment except that W_Class is multiplied. The method of determining the division point based on the entropy value may also be the same as in the first embodiment. W_Class is a weighting coefficient corresponding to the reliability of each Class (for example, each of the hospital X, the hospital Y, and the hospital W). In the above example, when “Class” is “hospital X”, “W_Class” is “1”; when “Class” is “hospital Y”, “W_Class” is “2”; and when “Class” is “hospital W”, “W_Class” is “3”.
Next, the anonymization device 40 according to the fourth embodiment of the present invention will be described.
2 Communication IF
3 Memory
4 Storage device
5 Input device
6 Output device
10, 20, 30, 40 Anonymization device
11, 21, 31, 41 Determination unit
12, 42 Anonymization unit
13, 23 Storage unit
34 Setting unit
Claims (10)
- An anonymization device comprising:
determination means for determining, with respect to data obtained by combining records acquired from a plurality of providers, whether the anonymity of the data is maintained with respect to any provider that provided a record forming part of the data; and
anonymization means for anonymizing the data based on the anonymity determination result of the determination means. - The anonymization device according to claim 1, further comprising storage means for storing the data, the data being a combination of records in which user attribute information, which is attribute information about a user, is associated with provider information, which is information indicating the provider of the user attribute information,
wherein the determination means
determines, with respect to the data stored in the storage means, for each type of provider information and for all groups, whether the number of records included in a group whose quasi-identifier values among the user attribute information are identical, when records including one type of provider information are excluded from the group, is equal to or greater than a predetermined threshold serving as an anonymity index, and determines, based on that determination, whether the anonymity is maintained.
- The anonymization device according to claim 2, wherein the anonymization means
performs the anonymization using bottom-up processing until the determination means determines that the number of records is equal to or greater than the threshold serving as the anonymity index for all types of provider information in all the groups.
- The anonymization device according to claim 2, wherein the anonymization means
performs the anonymization using top-down processing as long as the determination means determines that the number of records is equal to or greater than the threshold serving as the anonymity index for all types of provider information in all the groups.
- The anonymization device according to any one of claims 2 to 4, wherein the determination means,
when the data stored in the storage means includes three or more types of provider information, performs the determination for each type of provider information in a group including three or more types of provider information, treating two or more types of provider information as one type of provider.
- The anonymization device according to any one of claims 2 to 5, wherein the determination means
determines, using a threshold for each type of provider information, whether the number of records is equal to or greater than the threshold serving as the anonymity index.
- The anonymization device according to any one of claims 2 to 6, wherein the determination means
determines, for each type of provider information and in all the groups, whether the number of types of sensitive information included in a group whose quasi-identifier values are identical, when records including one type of provider information are excluded from the group, is equal to or greater than a predetermined threshold serving as a diversity index, and
the anonymization means
anonymizes the data based on the diversity determination result of the determination means.
- The anonymization device according to any one of claims 1 to 7, further comprising output means for outputting anonymized data based on the determination result of the determination means.
- An anonymization method comprising: determining, with respect to data obtained by combining records acquired from a plurality of providers, whether the anonymity of the data is maintained with respect to any provider that provided a record forming part of the data; and
anonymizing the data based on the determination result.
- A program causing a computer to execute: a process of determining, with respect to data obtained by combining records acquired from a plurality of providers, whether the anonymity of the data is maintained with respect to any provider that provided a record forming part of the data; and
a process of anonymizing the data based on the determination result.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014500090A JP6007969B2 (ja) | 2012-02-17 | 2013-02-06 | 匿名化装置及び匿名化方法 |
US14/378,849 US20150033356A1 (en) | 2012-02-17 | 2013-02-06 | Anonymization device, anonymization method and computer readable medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-032992 | 2012-02-17 | ||
JP2012032992 | 2012-02-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013121739A1 true WO2013121739A1 (ja) | 2013-08-22 |
Family
ID=48983876
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/000639 WO2013121739A1 (ja) | 2012-02-17 | 2013-02-06 | 匿名化装置及び匿名化方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20150033356A1 (ja) |
JP (1) | JP6007969B2 (ja) |
WO (1) | WO2013121739A1 (ja) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015204111A (ja) * | 2014-04-10 | 2015-11-16 | マックス プランク ゲゼルシャフト ツール フォーデルング デル ヴィッセンシャフテン | 匿名化されたユーザリストカウントのためのシステム及び方法 |
JP2016053693A (ja) * | 2014-09-04 | 2016-04-14 | 株式会社東芝 | 匿名化システム |
JPWO2015118801A1 (ja) * | 2014-02-04 | 2017-03-23 | 日本電気株式会社 | 情報判定装置、情報判定方法及びプログラム |
KR20180060390A (ko) * | 2016-11-29 | 2018-06-07 | 주식회사 파수닷컴 | 목적에 따라 비식별화된 데이터를 최적화하는 방법 및 장치 |
WO2020235016A1 (ja) * | 2019-05-21 | 2020-11-26 | 日本電信電話株式会社 | 情報処理装置、情報処理方法及びプログラム |
US11003681B2 (en) | 2015-11-04 | 2021-05-11 | Kabushiki Kaisha Toshiba | Anonymization system |
JP2021099668A (ja) * | 2019-12-23 | 2021-07-01 | 日本電気株式会社 | 匿名性劣化情報出力防止装置、匿名性劣化情報出力防止方法および匿名性劣化情報出力防止プログラム |
JP2021111085A (ja) * | 2020-01-09 | 2021-08-02 | Kddi株式会社 | リスク評価装置、リスク評価方法及びリスク評価プログラム |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014229039A (ja) * | 2013-05-22 | 2014-12-08 | 株式会社日立製作所 | プライバシ保護型データ提供システム |
US20140380489A1 (en) * | 2013-06-20 | 2014-12-25 | Alcatel-Lucent Bell Labs France | Systems and methods for data anonymization |
CA2852253A1 (en) * | 2014-05-23 | 2015-11-23 | University Of Ottawa | System and method for shifting dates in the de-identification of datesets |
US11520930B2 (en) * | 2014-09-26 | 2022-12-06 | Alcatel Lucent | Privacy protection for third party data sharing |
US11120163B2 (en) * | 2014-11-14 | 2021-09-14 | Oracle International Corporation | Associating anonymous information with personally identifiable information in a non-identifiable manner |
US9870381B2 (en) * | 2015-05-22 | 2018-01-16 | International Business Machines Corporation | Detecting quasi-identifiers in datasets |
US10380381B2 (en) | 2015-07-15 | 2019-08-13 | Privacy Analytics Inc. | Re-identification risk prediction |
US10685138B2 (en) | 2015-07-15 | 2020-06-16 | Privacy Analytics Inc. | Re-identification risk measurement estimation of a dataset |
US10395059B2 (en) * | 2015-07-15 | 2019-08-27 | Privacy Analytics Inc. | System and method to reduce a risk of re-identification of text de-identification tools |
US10423803B2 (en) | 2015-07-15 | 2019-09-24 | Privacy Analytics Inc. | Smart suppression using re-identification risk measurement |
JP6192064B2 (ja) * | 2015-11-09 | 2017-09-06 | Keepdata株式会社 | 情報匿名化処理装置、及び匿名化情報の運用システム |
US10383367B2 (en) * | 2016-07-25 | 2019-08-20 | Fontem Holdings 1 B.V. | Electronic cigarette power supply portion |
EP3520014B1 (de) * | 2016-11-28 | 2020-04-15 | Siemens Aktiengesellschaft | Verfahren und system zum anonymisieren von datenbeständen |
CN109246175B (zh) * | 2017-07-11 | 2022-11-29 | 松下电器(美国)知识产权公司 | 电子投票系统和控制方法 |
US11048820B2 (en) * | 2017-07-21 | 2021-06-29 | Sap Se | Anonymized data storage and retrieval |
US10565399B2 (en) * | 2017-10-26 | 2020-02-18 | Sap Se | Bottom up data anonymization in an in-memory database |
US10915662B2 (en) * | 2017-12-15 | 2021-02-09 | International Business Machines Corporation | Data de-identification based on detection of allowable configurations for data de-identification processes |
WO2019189969A1 (ko) * | 2018-03-30 | 2019-10-03 | 주식회사 그리즐리 | 빅데이터 개인정보 익명화 및 익명 데이터 결합 방법 |
US11188678B2 (en) * | 2018-05-09 | 2021-11-30 | Fujitsu Limited | Detection and prevention of privacy violation due to database release |
JP7174377B2 (ja) * | 2018-11-26 | 2022-11-17 | 株式会社日立製作所 | データベース管理システム、および、匿名加工処理方法 |
JP7145743B2 (ja) * | 2018-12-05 | 2022-10-03 | 三菱電機株式会社 | 個人情報管理装置、個人情報管理システム、個人情報管理方法及びプログラム |
EP4115789B1 (en) | 2021-07-08 | 2023-12-20 | Ambu A/S | Endoscope image processing device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010086179A (ja) * | 2008-09-30 | 2010-04-15 | Oki Electric Ind Co Ltd | 情報処理装置、コンピュータプログラムおよび記録媒体 |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002003219A1 (en) * | 2000-06-30 | 2002-01-10 | Plurimus Corporation | Method and system for monitoring online computer network behavior and creating online behavior profiles |
US7797725B2 (en) * | 2004-12-02 | 2010-09-14 | Palo Alto Research Center Incorporated | Systems and methods for protecting privacy |
US8204213B2 (en) * | 2006-03-29 | 2012-06-19 | International Business Machines Corporation | System and method for performing a similarity measure of anonymized data |
EP1950684A1 (en) * | 2007-01-29 | 2008-07-30 | Accenture Global Services GmbH | Anonymity measuring device |
CA2679800A1 (en) * | 2008-09-22 | 2010-03-22 | University Of Ottawa | Re-identification risk in de-identified databases containing personal information |
US8661423B2 (en) * | 2009-05-01 | 2014-02-25 | Telcordia Technologies, Inc. | Automated determination of quasi-identifiers using program analysis |
WO2011013495A1 (ja) * | 2009-07-31 | 2011-02-03 | 日本電気株式会社 | 情報管理装置、情報管理方法、及び情報管理プログラム |
US20110178943A1 (en) * | 2009-12-17 | 2011-07-21 | New Jersey Institute Of Technology | Systems and Methods For Anonymity Protection |
JP5796574B2 (ja) * | 2010-05-10 | 2015-10-21 | 日本電気株式会社 | 情報処理装置、制御方法及びプログラム |
US8438650B2 (en) * | 2010-07-06 | 2013-05-07 | At&T Intellectual Property I, L.P. | Anonymization of data over multiple temporal releases |
US8682910B2 (en) * | 2010-08-03 | 2014-03-25 | Accenture Global Services Limited | Database anonymization for use in testing database-centric applications |
US10148623B2 (en) * | 2010-11-12 | 2018-12-04 | Time Warner Cable Enterprises Llc | Apparatus and methods ensuring data privacy in a content distribution network |
US8868654B2 (en) * | 2011-06-06 | 2014-10-21 | Microsoft Corporation | Privacy-preserving matching service |
US8943079B2 (en) * | 2012-02-01 | 2015-01-27 | Telefonaktiebolaget L M Ericsson (Publ) | Apparatus and methods for anonymizing a data set |
-
2013
- 2013-02-06 WO PCT/JP2013/000639 patent/WO2013121739A1/ja active Application Filing
- 2013-02-06 US US14/378,849 patent/US20150033356A1/en not_active Abandoned
- 2013-02-06 JP JP2014500090A patent/JP6007969B2/ja active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010086179A (ja) * | 2008-09-30 | 2010-04-15 | Oki Electric Ind Co Ltd | Information processing device, computer program, and recording medium |
Non-Patent Citations (2)
Title |
---|
JUN SAKUMA: "Privacy-Preserving Data Mining", JOURNAL OF JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, vol. 24, no. 2, 1 March 2009 (2009-03-01), pages 283 - 294, XP008171636 * |
YASUYUKI SHIRAI: "Data Tokumeika ni Kansuru Kento [A Study on Data Anonymization]", 2010 NENDO JAPAN SCIENCE AND TECHNOLOGY AGENCY ERATO MINATO RISAN KOZO SHORIKEI PROJECT KOKYUROKU, 9 July 2010 (2010-07-09), Retrieved from the Internet <URL:http://eprints2008.lib.hokudai.ac.jp/dspace/bitstream/2115/48479/1/06all.pdf> [retrieved on 20130422] * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2015118801A1 (ja) * | 2014-02-04 | 2017-03-23 | 日本電気株式会社 | Information determination device, information determination method, and program |
JP2015204111A (ja) * | 2014-04-10 | 2015-11-16 | マックス プランク ゲゼルシャフト ツール フォーデルング デル ヴィッセンシャフテン | System and method for anonymized user list counting |
JP2016053693A (ja) * | 2014-09-04 | 2016-04-14 | 株式会社東芝 | Anonymization system |
US11003681B2 (en) | 2015-11-04 | 2021-05-11 | Kabushiki Kaisha Toshiba | Anonymization system |
KR20180060390A (ko) * | 2016-11-29 | 2018-06-07 | 주식회사 파수닷컴 | Method and apparatus for optimizing de-identified data according to purpose |
KR101973949B1 (ko) | 2016-11-29 | 2019-04-30 | 주식회사 파수닷컴 | Method and apparatus for optimizing de-identified data according to purpose |
JPWO2020235016A1 (ja) * | 2019-05-21 | ||
WO2020235016A1 (ja) * | 2019-05-21 | 2020-11-26 | 日本電信電話株式会社 | Information processing device, information processing method, and program |
JP7231020B2 (ja) | 2019-05-21 | 2023-03-01 | 日本電信電話株式会社 | Information processing device, information processing method, and program |
JP2021099668A (ja) * | 2019-12-23 | 2021-07-01 | 日本電気株式会社 | Anonymity-degradation information output prevention device, anonymity-degradation information output prevention method, and anonymity-degradation information output prevention program |
JP7380183B2 (ja) | 2019-12-23 | 2023-11-15 | 日本電気株式会社 | Anonymity-degradation information output prevention device, anonymity-degradation information output prevention method, and anonymity-degradation information output prevention program |
JP2021111085A (ja) * | 2020-01-09 | 2021-08-02 | Kddi株式会社 | Risk evaluation device, risk evaluation method, and risk evaluation program |
JP7219726B2 (ja) | 2020-01-09 | 2023-02-08 | Kddi株式会社 | Risk evaluation device, risk evaluation method, and risk evaluation program |
Also Published As
Publication number | Publication date |
---|---|
JPWO2013121739A1 (ja) | 2015-05-11 |
US20150033356A1 (en) | 2015-01-29 |
JP6007969B2 (ja) | 2016-10-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6007969B2 (ja) | Anonymization device and anonymization method | |
JP6065833B2 (ja) | Distributed anonymization system, distributed anonymization device, and distributed anonymization method | |
JP6015658B2 (ja) | Anonymization device and anonymization method | |
US9230132B2 (en) | Anonymization for data having a relational part and sequential part | |
US10817621B2 (en) | Anonymization processing device, anonymization processing method, and program | |
WO2013088681A1 (ja) | Anonymization device, anonymization method, and computer program | |
US20170277907A1 (en) | Abstracted Graphs from Social Relationship Graph | |
JP5782637B2 (ja) | Attribute selection device, information anonymization device, attribute selection method, information anonymization method, attribute selection program, and information anonymization program | |
US9990515B2 (en) | Method of re-identification risk measurement and suppression on a longitudinal dataset | |
WO2017158452A1 (en) | Abstracted graphs from social relationship graph | |
EP3832559A1 (en) | Controlling access to de-identified data sets based on a risk of re-identification | |
JP5782636B2 (ja) | Information anonymization system, information loss determination method, and information loss determination program | |
JP6471699B2 (ja) | Information determination device, information determination method, and program | |
WO2014181541A1 (ja) | Information processing device for verifying anonymity and anonymity verification method | |
WO2013121738A1 (ja) | Distributed anonymization device and distributed anonymization method | |
JP2017228255A (ja) | Evaluation device, evaluation method, and program | |
JP2016018379A (ja) | Privacy protection device, method, and program | |
JP2017041048A (ja) | Privacy protection device, method, and program | |
US11620406B2 (en) | Information processing device, information processing method, and recording medium | |
JPWO2014006851A1 (ja) | Anonymization device, anonymization system, anonymization method, and anonymization program | |
WO2014030302A1 (ja) | Information processing device for executing anonymization and anonymization processing method | |
KR102640123B1 (ko) | De-identification processing method for big data | |
WO2016021039A1 (ja) | k-anonymization processing system and k-anonymization processing method | |
JP7219726B2 (ja) | Risk evaluation device, risk evaluation method, and risk evaluation program | |
JP5875535B2 (ja) | Anonymization device, anonymization method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13748469 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2014500090 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14378849 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 13748469 Country of ref document: EP Kind code of ref document: A1 |