WO2016067566A1 - Information processing device, information processing method, and recording medium - Google Patents

Information processing device, information processing method, and recording medium

Info

Publication number
WO2016067566A1
WO2016067566A1 (PCT Application No. PCT/JP2015/005289)
Authority
WO
WIPO (PCT)
Prior art keywords
information processing
specific
risk
processing apparatus
personal data
Prior art date
Application number
PCT/JP2015/005289
Other languages
French (fr)
Japanese (ja)
Inventor
Takao Takenouchi (竹之内 隆夫)
Original Assignee
NEC Corporation (日本電気株式会社)
Priority date
Filing date
Publication date
Application filed by NEC Corporation (日本電気株式会社)
Priority to JP2016556210A (published as JPWO2016067566A1)
Publication of WO2016067566A1

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules

Definitions

  • The present invention relates to the anonymity of data, and more particularly to an information processing apparatus, information processing method, and recording medium concerned with the possibility that an individual is specified on the basis of data.
  • In recent years, the use and application of data about individuals (hereinafter referred to as personal data) has been anticipated (see, for example, Non-Patent Document 1).
  • In order to promote the use of personal data, it has been proposed that “data whose individual identifiability has been reduced” may be transferred to a third party without the consent of the person concerned. For example, it is described that such data may be provided to a company other than the one that collected it.
  • Personal data generally includes one or more attribute values.
  • A database containing personal data holds the attribute values of the personal data as column values.
  • Non-Patent Document 2 explains the term “specification” in contrast with the term “identification”. According to Non-Patent Document 2, “identification” means knowing that certain information is the information of one person, whereas “specification” means knowing whose information it is. For example, suppose the personal data of a certain individual (one person) is stored in a certain record in a database. Here, “identification” means that a record is uniquely determined based on a certain attribute value (a value in a database column). “Specification”, on the other hand, means that the identified record can be tied to a particular person (the individual mentioned above). Naturally, a record cannot be “specified” unless it has been “identified”.
  • As techniques for preventing “identification”, k-anonymity and k-anonymization have been disclosed (see, for example, Non-Patent Document 3).
  • Since Non-Patent Document 1 and Non-Patent Document 2 do not disclose specific technical content for anonymization to prevent specification and identification, the following description mainly refers to the technique of Non-Patent Document 3, with reference to FIG. 16.
  • Each of the left table and the right table shown in FIG. 16 stores personal data for four persons.
  • each record represented by each row is personal data of each individual (each person).
  • Each table includes columns of record number (No.), zip code, age, and medical condition.
  • “Record number (No.)” is a number that uniquely identifies a record.
  • “Zip code”, “age”, and “medical condition” are attributes.
  • An attribute consists of an attribute name and an attribute value.
  • “Zip code”, “age”, and “medical condition” are attribute names.
  • “1230001”, “28”, and “heart disease” are attribute values.
  • For example, the attribute with the attribute name “zip code” has the attribute value “1230001”.
  • The tables shown in FIG. 16 are assumed to be hospital medical record data.
  • an identifier is an attribute such as a name that can uniquely identify an individual.
  • A quasi-identifier is an attribute that can identify an individual when combined with other attributes. A sensitive attribute is an attribute that the individual does not want others to know. Attributes other than these are classed as other attributes.
  • An attribute may be both a quasi-identifier and a sensitive attribute.
  • The table on the left side of FIG. 16 collects records from hospital medical record data from which identifiers such as names have been deleted.
  • The records in the left table do not include an identifier. It therefore appears that, based on the attributes of a record, one cannot tell whose record it is.
  • However, suppose that an analyst (hereinafter also referred to as an attacker) who obtains the left table knows, as attributes of user A, that user A's zip code is 1230001, that user A is 28 years old, and that user A attends this hospital. In this case, the attacker can tell that record No. 1 in the left table is user A's record. As a result, the attacker learns that user A's medical condition, which is a sensitive attribute, is heart disease. Thus, the attacker can specify a record based on the zip code and age in the left table.
  • In this example, the zip code and age act as quasi-identifiers.
  • Based on such quasi-identifiers, a record of personal data may be identified and specified.
  • As a result, a sensitive attribute that the user does not want known may become known to the attacker. A technique called k-anonymization is therefore used.
  • the table on the right side of FIG. 16 is a table including records obtained by processing quasi-identifiers using k-anonymization.
  • Specifically, the zip code and age attribute values have been processed so as to satisfy k-anonymity.
  • Here, 2 is used as the value of k. That is, in the table on the right side of FIG. 16, the number of records sharing any given combination of quasi-identifier values is 2 or more.
  • Non-Patent Document 3 defines that a table in which the number of records identified based on the quasi-identifier is k or more satisfies k-anonymity. Processing to satisfy k-anonymity is called k-anonymization.
  • A table satisfying k-anonymity can prevent “identification” down to fewer than k records.
  • Consequently, a table satisfying k-anonymity can also prevent “specification” among fewer than k records.
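  • As an illustration, the following is a minimal Python sketch (not part of the patent text) of the k-anonymity check described above: a table satisfies k-anonymity when every combination of quasi-identifier values appears in at least k records. The record values are illustrative stand-ins for the right-hand table of FIG. 16.

```python
from collections import Counter

def satisfies_k_anonymity(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears in >= k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in counts.values())

# Illustrative stand-in for the generalized (right-hand) table of FIG. 16.
table = [
    {"zip": "123****", "age": "20s", "condition": "heart disease"},
    {"zip": "123****", "age": "20s", "condition": "cold"},
    {"zip": "456****", "age": "30s", "condition": "influenza"},
    {"zip": "456****", "age": "30s", "condition": "cancer"},
]
print(satisfies_k_anonymity(table, ["zip", "age"], k=2))  # True
```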
  • The techniques of k-anonymity and k-anonymization described in Non-Patent Document 3 address the possibility of “identification” (hereinafter also referred to as the “identification risk”). However, the risks to personal data also include the “specification” risk described in Non-Patent Document 1.
  • Current anonymity technology addresses the identification risk and does not address the possibility of “specification” (hereinafter referred to as the “specific risk”). That is, the technique described in Non-Patent Document 3 cannot cope with the specific risk.
  • Anonymizing data while considering only the identification risk, and not the specific risk, may protect privacy more than necessary, that is, it may process the data more than necessary.
  • The assumption that the attacker knows the quasi-identifiers corresponds to considering the identification risk. In practice, however, the attacker may not know the quasi-identifiers. If the value of k is set on the assumption that the attacker knows the quasi-identifiers even when this is unlikely, k becomes larger than necessary. An anonymization apparatus relying only on the identification risk may therefore process the data more than necessary, and the usefulness of the anonymized data decreases more than necessary.
  • Thus, the technique of Non-Patent Document 3 has the problem of processing data more than necessary.
  • An object of the present invention is to provide an information processing apparatus, an information processing method, and a recording medium that can solve the above problems and calculate (evaluate) a specific risk.
  • An information processing apparatus according to the present invention includes identification risk calculation means for calculating an identification risk indicating the possibility that the data of a designated individual is determined to be the data of one person, and specific risk calculation means for calculating, based on the identification risk and a specific individual arrival rate indicating the possibility that the data is determined to be the data of the designated individual, a specific risk indicating the possibility that the data of the designated individual is determined to be the data of that designated individual.
  • An information processing system according to the present invention includes an information processing apparatus having the identification risk calculation means and the specific risk calculation means described above, personal information storage means for storing personal data as information on a plurality of individuals, and overall risk calculation means for calculating a risk corresponding to the personal data as a whole based on the specific risks, calculated by the information processing apparatus, corresponding to all individuals included in the personal data.
  • An information processing method according to the present invention calculates an identification risk indicating the possibility that the data of a designated individual is determined to be the data of one person, and calculates, based on the identification risk and a specific individual arrival rate indicating the possibility that the data is determined to be the data of the designated individual, a specific risk indicating the possibility that the data of the designated individual is determined to be the data of that designated individual.
  • A recording medium according to the present invention computer-readably records a program for causing a computer to execute a process of calculating an identification risk indicating the possibility that the data of a designated individual is determined to be the data of one person, and a process of calculating, based on the identification risk and a specific individual arrival rate indicating the possibility that the data is determined to be the data of the designated individual, a specific risk indicating the possibility that the data of the designated individual is determined to be the data of that designated individual.
  • FIG. 1 is a block diagram showing an example of the configuration of an information processing system including an information processing apparatus according to the first embodiment of the present invention.
  • FIG. 2 is a diagram illustrating an example of personal data included in the personal data storage unit according to the first embodiment.
  • FIG. 3 is a flowchart illustrating an example of the operation of the information processing apparatus according to the first embodiment.
  • FIG. 4 is a block diagram illustrating an example of another configuration of the information processing system including the information processing apparatus according to the first embodiment.
  • FIG. 5 is a flowchart illustrating an example of the operation of the information processing system according to the first embodiment.
  • FIG. 6 is a diagram illustrating a calculation result of a specific risk used for explaining the first embodiment.
  • FIG. 7 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the second embodiment.
  • FIG. 8 is a diagram illustrating an example of data stored by the storage unit according to the second embodiment.
  • FIG. 9 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the third embodiment.
  • FIG. 10 is a diagram illustrating an example of data stored by the storage unit according to the third embodiment.
  • FIG. 11 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the fourth embodiment.
  • FIG. 12 is a diagram illustrating an example of data stored by the storage unit according to the fourth embodiment.
  • FIG. 13 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the fifth embodiment.
  • FIG. 14 is a diagram illustrating an example of data stored by the storage unit according to the fifth embodiment.
  • FIG. 15 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the sixth embodiment.
  • FIG. 16 is a diagram for explaining k-anonymization and k-anonymity.
  • FIG. 17 is a block diagram illustrating an example of a configuration of a modification of the information processing apparatus according to the first embodiment.
  • FIG. 18 is a block diagram illustrating an example of a configuration of a modified example of the information processing apparatus according to the first embodiment.
  • “Identification” means that certain information (data) is known (determined) to be the information (data) of one person.
  • “Specification” means knowing (determining) whose information (data) certain information (data) is.
  • A “record” is the personal data of one individual.
  • A record includes a plurality of attributes.
  • An “attribute” is a type of data included in a record.
  • the attribute includes an attribute name and an attribute value.
  • A “quasi-identifier” is an attribute that can identify an individual when combined with other attributes.
  • “Sensitive attributes” are attributes (data) that individuals do not want to disclose.
  • The “specific individual arrival rate” is a value indicating the possibility that data (a record) is determined (specified) to be the data of a specific individual. More specifically, it is a value used, together with the calculated identification risk, to calculate the specific risk; in other words, it is used to correct an identification risk calculated from information for identifying a record. Concretely, the specific individual arrival rate is, for example, the possibility that an attacker acquires the information (for example, the values of the quasi-identifiers described above) needed to identify a record.
  • The specific individual arrival rate can differ from one quasi-identifier (or combination of quasi-identifiers) to another.
  • In general, the possibility of obtaining the value of a single quasi-identifier is higher than the possibility of obtaining the combination of values of a plurality of quasi-identifiers that includes it. Therefore, the specific individual arrival rate for a single quasi-identifier is higher than the specific individual arrival rate for a combination of quasi-identifiers that includes it.
  • A higher specific individual arrival rate means a higher specific risk.
  • Anonymization that takes the specific risk into account therefore anonymizes data more strongly where the specific risk is high.
  • Next, rare attribute values of quasi-identifiers are considered.
  • For a rare attribute, the ratio of the specific risk to the identification risk is higher than for a common attribute. Therefore, the specific individual arrival rate for a rare attribute is higher than the specific individual arrival rate for a non-rare attribute.
  • The value (r) of the specific individual arrival rate ranges between 0 and 1 (0 ≤ r ≤ 1).
  • FIG. 1 is a block diagram showing an example of the configuration of an information processing system 300 including an information processing apparatus 100 according to the first embodiment of the present invention.
  • the information processing system 300 includes an information processing apparatus 100 and a personal data storage unit 200.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • the personal data storage unit 200 stores personal data that is a target of specific risk evaluation processing in the information processing apparatus 100.
  • FIG. 2 is a diagram illustrating an example of personal data stored by the personal data storage unit 200.
  • the personal data storage unit 200 stores “user ID”, which is an identifier of a user (individual), in association with attributes related to the user as personal data.
  • the personal data shown in FIG. 2 includes age, sex, and disease name as attributes relating to the user. For example, “age” (attribute name) of “user1” is “20” (attribute value). Similarly, the “sex” (attribute name) of user1 is “male” (attribute value). The “disease name” (attribute name) of user1 is “cold” (attribute value).
  • In this description, the quasi-identifiers are age and sex. That is, age and sex are the attributes to be anonymized.
  • The sensitive attribute is the disease name.
  • The information processing apparatus 100 calculates (evaluates) the specific risk of a designated individual in the personal data stored by the personal data storage unit 200.
  • the information processing apparatus 100 includes a reception unit 110, an identification risk calculation unit 120, and a specific risk calculation unit 130.
  • the receiving unit 110 receives a specific individual arrival rate from a device (not shown).
  • the device that transmits the specific individual arrival rate is not particularly limited.
  • the receiving unit 110 may receive a specific individual arrival rate from a device operated by the user.
  • the receiving unit 110 may read the specific individual arrival rate from a storage device (not shown).
  • these are collectively referred to as “the receiving unit 110 receives a specific individual arrival rate”.
  • the identification risk calculation unit 120 calculates the identification risk.
  • the specific risk calculation unit 130 calculates the specific risk based on the identification risk and the specific individual arrival rate.
  • FIG. 3 is a flowchart showing an example of the operation of the information processing apparatus 100 according to the first embodiment. As illustrated in FIG. 3, the information processing apparatus 100 performs operations from steps S101 to S104 described below.
  • It is assumed that the individual (user) whose specific risk is to be calculated has been designated to the information processing apparatus 100 in advance. The receiving unit 110 then receives the specific individual arrival rate (r).
  • the identification risk calculation unit 120 acquires personal data from the personal data storage unit 200 (step S101). In this description, as already described, the identification risk calculation unit 120 acquires the personal data shown in FIG.
  • Based on the personal data, the identification risk calculation unit 120 counts how many records could be the record of the designated individual. It then calculates the identification risk of the designated individual based on that number (m) of records (step S102). The greater the value of m, the more difficult it is to identify the designated individual among those records.
  • the identification risk calculation unit 120 calculates the identification risk using a preset calculation method. There is no particular limitation on the method for calculating the identification risk. For example, when m records are identified, the identification risk calculation unit 120 may calculate “1 / m” as the identification risk. Then, the identification risk calculation unit 120 transmits the calculated identification risk to the specific risk calculation unit 130.
  • An example of the operation of the identification risk calculation unit 120 will be described with reference to the personal data shown in FIG. 2. The designated individual is “user1”.
  • The values of the two quasi-identifiers (age and sex) of “user1” are “20 (age)” and “male (sex)”, respectively. In the personal data of FIG. 2, the number of records having these quasi-identifier values is m = 2, so the identification risk is 1/2 = 0.5.
  • The calculation of the identification risk is not limited to division; the identification risk calculation unit 120 may use other operations, such as the other arithmetic operations or roots.
  • the identification risk calculation unit 120 may use a general identification risk calculation method.
  • the identification risk calculation unit 120 transmits 0.5 as the identification risk to the specific risk calculation unit 130.
  • the specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120.
  • the specific risk calculation unit 130 acquires the specific individual arrival rate (r) of the individual from the reception unit 110 (step S103).
  • the specific individual arrival rate (r) is 0.3.
  • the specific risk calculation unit 130 calculates the specific risk using a preset calculation method.
  • As described above, the identification risk (1/m) is 0.5.
  • In this description, “specific risk = (1/m) × r” is used as the calculation of the specific risk. The specific risk of user1 is therefore 0.5 × 0.3 = 0.15 (step S104).
  • However, the calculation of the specific risk is not limited to this formula.
  • the specific risk calculation unit 130 is not limited to multiplication or division as a calculation formula, and addition or subtraction may be used.
  • the specific risk calculation unit 130 is not limited to the four rules, and may use a calculation such as a power root or logarithm. Furthermore, the specific risk calculation unit 130 may combine these calculations.
  • the specific risk calculation unit 130 may transmit the specific risk to a predetermined device (for example, a user device).
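  • The following is a minimal Python sketch, not taken from the patent, of the calculation of steps S102 to S104 as described above: the identification risk is 1/m, where m is the number of records sharing the designated individual's quasi-identifier values, and the specific risk is (1/m) × r. Only user1's attribute values appear in the text; the other records are assumptions added so that m = 2.

```python
# Illustrative personal data; only user1's values are given in the text,
# the other records are assumed so that two records share (20, male).
personal_data = [
    {"user_id": "user1", "age": 20, "sex": "male",   "disease": "cold"},
    {"user_id": "user2", "age": 20, "sex": "male",   "disease": "influenza"},  # assumed
    {"user_id": "user3", "age": 35, "sex": "female", "disease": "asthma"},     # assumed
]
QUASI_IDENTIFIERS = ["age", "sex"]

def identification_risk(records, quasi_identifiers, target_id):
    """Identification risk 1/m, where m counts records matching the target's quasi-identifiers."""
    target = next(r for r in records if r["user_id"] == target_id)
    key = tuple(target[q] for q in quasi_identifiers)
    m = sum(1 for r in records if tuple(r[q] for q in quasi_identifiers) == key)
    return 1.0 / m

def specific_risk(identification_risk_value, arrival_rate):
    """Specific risk = identification risk (1/m) multiplied by the arrival rate r."""
    return identification_risk_value * arrival_rate

risk = identification_risk(personal_data, QUASI_IDENTIFIERS, "user1")  # 0.5 (m = 2)
print(specific_risk(risk, arrival_rate=0.3))                            # 0.15
```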
  • the information processing apparatus 100 calculates a specific risk for a certain individual.
  • the information processing apparatus 100 may calculate specific risks for a plurality of or all individuals stored in the personal data storage unit 200.
  • FIG. 4 is a block diagram illustrating an example of another configuration of an information processing system 310 that includes the information processing apparatus 100 according to the first embodiment.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • The information processing system 310 may include an overall risk calculation unit 240 in addition to the information processing apparatus 100. Furthermore, the information processing system 310 may include a specific risk calculation result storage unit 230.
  • the specific risk calculation result storage unit 230 stores a specific risk for each individual.
  • FIG. 6 is a diagram showing a calculation result of a specific risk for each user.
  • the formula shown on the right side of the table is a formula for calculating the specific risk of each individual.
  • The overall risk calculation unit 240 uses the information processing apparatus 100 to calculate the specific risk of every individual stored in the personal data storage unit 200, and stores the calculated specific risks in the specific risk calculation result storage unit 230. After the specific risks of all individuals have been calculated, the overall risk calculation unit 240 calculates the overall risk of the personal data stored in the personal data storage unit 200 using all of the specific risks stored in the specific risk calculation result storage unit 230.
  • The “overall risk” is a value calculated using a predetermined calculation formula based on the specific risks of the individuals.
  • the overall risk may be a total value, an arithmetic average value, a median value, or a mode value of specific risks of all individuals.
  • the overall risk may be the maximum value or the minimum value in the specific risk of all individuals.
  • The overall risk may be the sum or the average of a predetermined number of the highest specific risks among all individual specific risks.
  • The overall risk may be a value describing the distribution of the specific risks of all individuals, such as the standard deviation.
  • the overall risk calculation unit 240 may calculate not only one value but a plurality of values (for example, an average value and variance) as the overall risk.
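  • The following is a minimal Python sketch, under the assumptions above, of how the overall risk calculation unit 240 might aggregate the per-individual specific risks; which aggregate (or set of aggregates) is used is a design choice, and the input values are illustrative.

```python
import statistics

def overall_risk(specific_risks, top_n=2):
    """Return several candidate aggregates of the per-individual specific risks."""
    ordered = sorted(specific_risks, reverse=True)
    return {
        "max": ordered[0],
        "mean": statistics.mean(specific_risks),
        "median": statistics.median(specific_risks),
        "stdev": statistics.pstdev(specific_risks),
        "top_n_mean": statistics.mean(ordered[:top_n]),
    }

print(overall_risk([0.15, 0.15, 0.30, 0.05]))  # illustrative specific risks
```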
  • FIG. 5 is a flowchart showing an example of the operation of the information processing system 310.
  • In the following description, the overall risk calculation unit 240 controls the operation, but the controlling entity need not be limited to this.
  • For example, the information processing apparatus 100 may perform the control, including control of the overall risk calculation unit 240.
  • Alternatively, a control device (not shown) may control the components included in the information processing system 310.
  • the overall risk calculation unit 240 instructs the information processing apparatus 100 to acquire personal data.
  • the information processing apparatus 100 acquires personal data from the personal data storage unit 200 (step S201).
  • the overall risk calculation unit 240 instructs the information processing apparatus 100 to calculate a specific risk corresponding to each individual of the personal data (step S202).
  • the information processing apparatus 100 calculates a specific risk corresponding to the designated individual (step S203).
  • the overall risk calculation unit 240 stores the calculated specific risk in the specific risk calculation result storage unit 230 in association with the individual (step S204).
  • the overall risk calculation unit 240 calculates the overall risk based on the specific risks of all individuals (step S205).
  • As described above, the information processing apparatus 100 according to the present embodiment achieves the effect that the specific risk of a designated individual can be calculated.
  • The reason is as follows. The receiving unit 110 of the present embodiment receives the specific individual arrival rate, and the identification risk calculation unit 120 calculates the identification risk of the individual. The specific risk calculation unit 130 can therefore calculate the specific risk of the designated individual based on the identification risk and the specific individual arrival rate.
  • Furthermore, a system using the information processing apparatus 100 can determine appropriate anonymization of personal data using the specific risk calculated by the information processing apparatus 100.
  • In other words, the information processing apparatus 100 can achieve the effect of preventing unnecessary data processing.
  • This is because a system using the information processing apparatus 100 can use the specific risk, in addition to the identification risk, when determining the degree of anonymization (for example, the value of k for k-anonymity).
  • Furthermore, the information processing system 310 including the information processing apparatus 100 according to the present embodiment achieves the effect that the overall risk of the personal data as a whole can be calculated.
  • the reason is that the overall risk calculation unit 240 can calculate the overall risk of personal data based on the specific risk of all individuals.
  • each component of the information processing apparatus 100 may be configured with a hardware circuit.
  • each component may be configured using a plurality of apparatuses connected via a network.
  • FIG. 18 is a block diagram illustrating an example of the configuration of the information processing apparatus 106 according to the first modification of the present embodiment.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • the information processing apparatus 106 includes an identification risk calculation unit 120 and a specific risk calculation unit 130. Each configuration of the information processing apparatus 106 receives personal data and a specific individual arrival rate via a network (not shown) and operates in the same manner as each configuration of the information processing apparatus 100.
  • the information processing apparatus 106 configured in this manner can achieve the same effects as the information processing apparatus 100.
  • each configuration of the information processing apparatus 106 operates in the same manner as the configuration of the information processing apparatus 100 and can calculate a specific risk.
  • the information processing apparatus 106 is the minimum configuration in the embodiment of the present invention.
  • As Modification 2, a further modification of the information processing apparatus 100 and the information processing apparatus 106 will be described, using the information processing apparatus 100 as an example.
  • the plurality of components may be configured with a single piece of hardware.
  • the information processing apparatus 100 may be realized as a computer apparatus including a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory).
  • Alternatively, the information processing apparatus 100 may be realized as a computer apparatus that further includes an input/output connection circuit (IOC: Input/Output Circuit) and a network interface circuit (NIC: Network Interface Circuit).
  • FIG. 17 is a block diagram showing an example of the configuration of the information processing apparatus 600 according to this modification.
  • the information processing apparatus 600 includes a CPU 610, a ROM 620, a RAM 630, an internal storage device 640, an IOC 650, and a NIC 680, and constitutes a computer device.
  • the CPU 610 reads a program from ROM 620.
  • the CPU 610 controls the RAM 630, the internal storage device 640, the IOC 650, and the NIC 680 based on the read program.
  • the computer including the CPU 610 controls these configurations to realize the functions as the reception unit 110, the identification risk calculation unit 120, and the specific risk calculation unit 130 illustrated in FIG. Further, the computer including the CPU 610 may control these configurations to realize the function as the overall risk calculation unit 240 shown in FIG.
  • the CPU 610 may use the RAM 630 or the internal storage device 640 as a temporary storage medium for the program when realizing each function.
  • the CPU 610 may read a program included in the storage medium 700 storing the program so as to be readable by a computer by using a storage medium reading device (not shown). Alternatively, the CPU 610 may receive a program from an external device (not shown) via the NIC 680, store the program in the RAM 630, and operate based on the stored program.
  • ROM 620 stores programs executed by CPU 610 and fixed data.
  • the ROM 620 is, for example, a P-ROM (Programmable-ROM) or a flash ROM.
  • the RAM 630 temporarily stores programs executed by the CPU 610 and data.
  • the RAM 630 is, for example, a D-RAM (Dynamic-RAM).
  • the internal storage device 640 stores data and programs stored in the information processing device 600 for a long period of time. Further, the internal storage device 640 may operate as a temporary storage device for the CPU 610.
  • the internal storage device 640 is, for example, a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), or a disk array device.
  • the internal storage device 640 may operate as the personal data storage unit 200.
  • the ROM 620 and the internal storage device 640 are nonvolatile storage media.
  • the RAM 630 is a volatile storage medium.
  • the CPU 610 can operate based on a program stored in the ROM 620, the internal storage device 640, or the RAM 630. That is, the CPU 610 can operate using a nonvolatile storage medium or a volatile storage medium.
  • the IOC 650 mediates data between the CPU 610, the input device 660, and the display device 670.
  • the IOC 650 is, for example, an IO interface card or a USB (Universal Serial Bus) card.
  • the input device 660 is a device that receives an input instruction from an operator of the information processing apparatus 600.
  • the input device 660 is, for example, a keyboard, a mouse, or a touch panel.
  • the display device 670 is a device that displays information to the operator of the information processing apparatus 600.
  • the display device 670 is a liquid crystal display, for example.
  • the NIC 680 relays data exchange with an external device (not shown) via the network.
  • the NIC 680 is, for example, a LAN (Local Area Network) card.
  • the information processing apparatus 600 configured as described above can achieve the same effects as the information processing apparatus 100.
  • the information processing apparatus 101 according to the second embodiment is different from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is determined according to the attribute of the quasi-identifier.
  • This embodiment can cope with a case where a plurality of attackers have different possibilities of knowing the quasi-identifier.
  • FIG. 7 is a block diagram illustrating an example of the configuration of the information processing apparatus 101 according to the second embodiment.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • The information processing apparatus 101 includes an acquisition unit (first acquisition unit) 111 in place of the reception unit 110 of the information processing apparatus 100, which receives the specific individual arrival rate.
  • the information processing apparatus 101 includes a storage unit 211.
  • the information processing apparatus 101 may be configured using a computer shown in FIG. Further, the information processing apparatus 101 may use the storage unit 211 as an external apparatus connected via a network. Further, the acquisition unit 111 may receive information stored in the storage unit 211 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 101 may not include the storage unit 211.
  • the storage unit 211 stores the quasi-identifier or combination of quasi-identifiers and the corresponding specific individual arrival rate in association with each other.
  • FIG. 8 is a diagram illustrating an example of data stored by the storage unit 211.
  • In FIG. 8, the specific individual arrival rate corresponding to the combination of the quasi-identifiers age and sex is 0.3.
  • In contrast, the specific individual arrival rate for age alone is 0.6, higher than that value. This is because the possibility of acquiring the values of a combination of quasi-identifiers is lower than the possibility of acquiring the value of a single quasi-identifier included in that combination.
  • The following description of the present embodiment uses the data of FIG. 8.
  • the acquisition unit 111 acquires the specific individual arrival rate corresponding to the quasi-identifier or the combination of quasi-identifiers from the storage unit 211.
  • the acquiring unit 111 may use the product of the specific individual arrival rates of the quasi-identifiers included in the combination as the specific individual arrival rate of the combination of the quasi-identifiers.
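  • The following is a minimal Python sketch of the lookup performed by the storage unit 211 and the acquisition unit 111. The values 0.3 (age and sex) and 0.6 (age) follow the description of FIG. 8; the rate for sex alone and the product fallback for unstored combinations are assumptions for illustration.

```python
import math

# Specific individual arrival rates keyed by quasi-identifier combination.
ARRIVAL_RATES = {
    frozenset({"age", "sex"}): 0.3,  # from the description of FIG. 8
    frozenset({"age"}): 0.6,         # from the description of FIG. 8
    frozenset({"sex"}): 0.7,         # assumed value for illustration
}

def get_arrival_rate(quasi_identifiers):
    """Look up the rate for the combination; fall back to the product of single-identifier rates."""
    key = frozenset(quasi_identifiers)
    if key in ARRIVAL_RATES:
        return ARRIVAL_RATES[key]
    return math.prod(ARRIVAL_RATES[frozenset({q})] for q in quasi_identifiers)

print(get_arrival_rate(["age", "sex"]))  # 0.3
```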
  • the information processing apparatus 101 executes step S101 and step S102 as in the first embodiment.
  • Instead of step S103, the information processing apparatus 101 executes the following operation.
  • the specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. Then, the specific risk calculation unit 130 passes the information on the specified individual semi-identifier or combination of semi-identifiers to the acquisition unit 111 and requests acquisition of the specific individual arrival rate.
  • the acquisition unit 111 acquires the specific individual arrival rate corresponding to the requested quasi-identifier or combination of quasi-identifiers from the storage unit 211. Then, the acquisition unit 111 returns the specific individual arrival rate to the specific risk calculation unit 130.
  • In this description, the acquisition unit 111 acquires 0.3 as the specific individual arrival rate and transmits it to the specific risk calculation unit 130.
  • the specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
  • the information processing apparatus 101 can produce an effect that a specific risk can be calculated according to the attribute of the quasi-identifier. That is, the second embodiment can produce an effect that it can cope with a case where the quasi-identifiers that the attacker may know are different.
  • The reason is that the first acquisition unit 111 acquires the specific individual arrival rate corresponding to a quasi-identifier or a combination of quasi-identifiers, and the specific risk calculation unit 130 calculates the specific risk based on that specific individual arrival rate.
  • the information processing apparatus 102 according to the third embodiment differs from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is determined according to a combination of conditions for the attributes of the quasi-identifier.
  • FIG. 9 is a block diagram illustrating an example of the configuration of the information processing apparatus 102 according to the third embodiment.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • the information processing apparatus 102 includes an acquisition unit 112 instead of the reception unit 110 of the information processing apparatus 100.
  • the acquisition unit 112 may be referred to as a second acquisition unit.
  • the information processing apparatus 102 includes a storage unit 212.
  • the information processing apparatus 102 may be configured using a computer shown in FIG. Further, the information processing apparatus 102 may use the storage unit 212 as an external apparatus connected via a network. Further, the acquisition unit 112 may receive information stored in the storage unit 212 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 102 may not include the storage unit 212.
  • The storage unit 212 stores, in association with each other, a first attribute (first attribute name) used as a condition, a second attribute (second attribute name) for which the specific individual arrival rate is set, and a function (for example, a combination of a conditional expression and a calculation expression).
  • the first and second attributes may be a combination of a plurality of attributes.
  • FIG. 10 is a diagram illustrating an example of data stored by the storage unit 212.
  • In FIG. 10, the attribute used for the determination is the first attribute, and the attribute for which the specific individual arrival rate is set is the second attribute.
  • The specific individual arrival rate column in FIG. 10 holds a function for calculating the specific individual arrival rate to be set. According to the function shown in FIG. 10, for example, when the attribute name (sex) has the attribute value (male), the specific individual arrival rate for the attribute (age) is 0.2; when the attribute value is (female), the specific individual arrival rate for the attribute (age) is 0.1. The following description of the present embodiment uses the data of FIG. 10.
  • the acquisition unit 112 acquires a specific individual arrival rate corresponding to the specified attribute.
  • the acquisition unit 112 may acquire personal data from the personal data storage unit 200 as necessary when acquiring the specific individual arrival rate.
  • the acquiring unit 112 may use the product of the specific individual arrival rates of the quasi-identifiers included in the combination as the specific individual arrival rate of the combination of the quasi-identifiers.
  • Alternatively, the acquisition unit 112 may select, as the specific individual arrival rate of a combination of quasi-identifiers, the minimum value or the maximum value among the specific individual arrival rates of the quasi-identifiers included in the combination.
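  • The following is a minimal Python sketch of the conditional lookup performed by the storage unit 212 and the acquisition unit 112. The conditions and rate values are assumptions modeled on the description of FIG. 10; the actual mapping is defined by the figure.

```python
# Each rule: (attribute the rate is set for, condition on the record, arrival rate).
RULES = [
    ("age", lambda record: record["sex"] == "male",   0.2),  # assumed, following FIG. 10
    ("age", lambda record: record["sex"] == "female", 0.1),  # assumed, following FIG. 10
]

def get_arrival_rate(record, target_attribute):
    """Return the arrival rate of the first rule whose condition matches the record."""
    for attribute, condition, rate in RULES:
        if attribute == target_attribute and condition(record):
            return rate
    raise LookupError("no matching condition")

user1 = {"user_id": "user1", "age": 20, "sex": "male"}
print(get_arrival_rate(user1, "age"))  # 0.2
```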
  • the information processing apparatus 102 executes step S101 and step S102 as in the first embodiment.
  • Instead of step S103, the information processing apparatus 102 executes the following operation.
  • The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. The specific risk calculation unit 130 then passes information on the designated individual's quasi-identifier or combination of quasi-identifiers to the acquisition unit 112 and requests acquisition of the specific individual arrival rate.
  • The acquisition unit 112 refers to the storage unit 212 and obtains the specific individual arrival rate corresponding to the attribute name and attribute value of the designated individual's quasi-identifier. The acquisition unit 112 then returns the specific individual arrival rate to the specific risk calculation unit 130.
  • the specific risk calculation unit 130 may transmit the specified individual attribute name and attribute value to the acquisition unit 112.
  • the specific risk calculation unit 130 may transmit the specified individual identifier or the specified individual attribute name to the acquisition unit 112.
  • the acquisition unit 112 refers to the personal data storage unit 200 and acquires data necessary for the determination.
  • the acquisition unit 112 acquires the attribute value (20) of the attribute name (age) of user1 from the data (data shown in FIG. 2) of the personal data storage unit 200. Then, the acquisition unit 112 acquires 0.3 as the specific individual arrival rate based on the data stored in the storage unit 212 (data shown in FIG. 10).
  • the specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
  • the information processing apparatus 102 can achieve the effect that the specific individual arrival rate can be determined according to the attribute name and the attribute value. That is, the third embodiment can produce an effect that an appropriate specific risk corresponding to a finer condition can be calculated.
  • The reason is that the second acquisition unit 112 acquires the specific individual arrival rate based on the condition set for the attribute value, or combination of attribute values, of the quasi-identifiers, and the specific risk calculation unit 130 calculates the specific risk based on the specific individual arrival rate corresponding to that condition.
  • the information processing apparatus 103 according to the fourth embodiment is different from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is determined according to the individual identification risk.
  • FIG. 11 is a block diagram illustrating an example of the configuration of the information processing apparatus 103 according to the fourth embodiment.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • the information processing apparatus 103 includes an acquisition unit 113 instead of the reception unit 110 of the information processing apparatus 100.
  • the acquisition unit 113 may be referred to as a third acquisition unit in order to distinguish it from the acquisition unit 112 or the like.
  • the information processing apparatus 103 includes a storage unit 213.
  • the information processing apparatus 103 may be configured using a computer shown in FIG. Further, the information processing apparatus 103 may use the storage unit 213 as an external apparatus connected via a network. Further, the acquisition unit 113 may receive information stored in the storage unit 213 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 103 may not include the storage unit 213.
  • the storage unit 213 stores the identification risk and the specific individual arrival rate in association with each other.
  • FIG. 12 is a diagram illustrating an example of data stored by the storage unit 213.
  • Increasing the specific individual arrival rate means increasing the specific risk.
  • Anonymization that uses the specific risk anonymizes data so that anonymity becomes high, that is, so that records with a high specific risk become hard to specify. Therefore, in FIG. 12, the specific individual arrival rate is set higher as the identification risk (1/m) becomes larger. This is because, as described above, rare attributes (attributes with a high identification risk) have a high specific individual arrival rate.
  • The data in FIG. 12 is merely an example for this embodiment.
  • The present embodiment need not be limited to such data.
  • The following description of this embodiment uses the data of FIG. 12.
  • the acquisition unit 113 acquires the specific individual arrival rate based on the identification risk.
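  • The following is a minimal Python sketch of the lookup performed by the storage unit 213 and the acquisition unit 113, assuming a simple band table in which a higher identification risk maps to a higher arrival rate. Only the pair (identification risk 0.5, arrival rate 0.8) is given in the text; the other bands are assumptions.

```python
# (minimum identification risk, specific individual arrival rate), highest band first.
RISK_BANDS = [
    (1.0, 0.9),  # assumed
    (0.5, 0.8),  # matches the worked example in the text
    (0.2, 0.5),  # assumed
    (0.0, 0.3),  # assumed
]

def get_arrival_rate(identification_risk):
    """Return the arrival rate of the first band whose threshold the identification risk reaches."""
    for threshold, rate in RISK_BANDS:
        if identification_risk >= threshold:
            return rate
    return RISK_BANDS[-1][1]

print(get_arrival_rate(0.5))  # 0.8
```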
  • the information processing apparatus 103 executes step S101 and step S102 as in the first embodiment.
  • Instead of step S103, the information processing apparatus 103 performs the following operation.
  • The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. The specific risk calculation unit 130 then passes the designated individual's identification risk and quasi-identifier information to the acquisition unit 113 and requests acquisition of the specific individual arrival rate.
  • the acquisition unit 113 acquires the possibility of individual identification (identification risk) from the identification risk calculation unit 120. Thereafter, the acquisition unit 113 refers to the storage unit 213 and acquires the specific individual arrival rate.
  • the acquisition unit 113 refers to the storage unit 213 (data illustrated in FIG. 12) and acquires 0.8 as the specific individual arrival rate.
  • the acquisition unit 113 returns the specific individual arrival rate to the specific risk calculation unit 130.
  • the specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
  • the information processing apparatus 103 can produce an effect that an appropriate specific risk can be calculated according to the identification risk.
  • The reason is that the third acquisition unit 113 acquires the specific individual arrival rate in consideration of the identification risk, and the specific risk calculation unit 130 then calculates the specific risk based on that specific individual arrival rate.
  • For example, suppose the birthday is a quasi-identifier.
  • A person whose birthday is February 29 of a leap year is more easily identified based on the quasi-identifier (birthday) than a person whose birthday falls on another day. Such a person is therefore more likely than others to hide his or her birthday (quasi-identifier). That is, when the quasi-identifier is the birthday, the identification risk for a person born on February 29 is higher than the identification risk for other persons. Even in such a case, the present embodiment can acquire an appropriate specific individual arrival rate according to the identification risk and calculate the specific risk.
  • The information processing apparatus 104 according to the fifth embodiment differs from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is changed according to the organization to which the personal data is provided (an organization that may become an attacker). This is because the organization (partner) to which personal data is provided may become an attacker against the provided personal data, and different organizations pose different risks as attackers.
  • FIG. 13 is a block diagram illustrating an example of the configuration of the information processing apparatus 104 according to the fifth embodiment.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • the information processing device 104 includes an acquisition unit 114 instead of the reception unit 110 of the information processing device 100.
  • When the acquisition unit 114 is distinguished from the acquisition unit 112 and the like, it is referred to as a fourth acquisition unit.
  • the information processing apparatus 104 includes a storage unit 214.
  • the information processing apparatus 104 may be configured using a computer shown in FIG. Further, the information processing apparatus 104 may use the storage unit 214 as an external apparatus connected via a network. Further, the acquisition unit 114 may receive information stored in the storage unit 214 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 104 may not include the storage unit 214.
  • the storage unit 214 stores the information providing destination in association with the specific individual arrival rate corresponding to the providing destination.
  • FIG. 14 is a diagram illustrating an example of data stored by the storage unit 214.
  • Generally, an organization with many members is more likely to include a person who knows the target individual.
  • In this description, the number of members of organization B is larger than the number of members of organization A.
  • The specific individual arrival rate for organization B therefore needs to be larger than the specific individual arrival rate for organization A. Accordingly, in FIG. 14, the specific individual arrival rate for providing destination organization B is larger than the specific individual arrival rate for organization A.
  • The storage unit 214 may store a plurality of types of information about the providing destination (for example, the organization and the business type illustrated in FIG. 14). The following description of the present embodiment uses the data of FIG. 14.
  • the acquisition unit 114 acquires the specific individual arrival rate from the storage unit 214.
  • the acquiring unit 114 may use the product of the specific individual arrival rates of the quasi-identifiers included in the combination as the specific individual arrival rate of the combination of the quasi-identifiers.
  • The acquisition unit 114 may also acquire information about the providing destination (for example, the number of members) from the storage unit 214.
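  • The following is a minimal Python sketch of the lookup performed by the storage unit 214 and the acquisition unit 114. The numeric values are assumptions; the text states only that organization B, having more members, receives a larger rate than organization A, and that 0.5 is returned in the worked example.

```python
# Specific individual arrival rate per providing destination (potential attacker).
PROVIDER_ARRIVAL_RATES = {
    "organization A": 0.3,  # assumed
    "organization B": 0.5,  # matches the worked example in the text
}

def get_arrival_rate(providing_destination):
    """Look up the arrival rate for the organization the data will be provided to."""
    return PROVIDER_ARRIVAL_RATES[providing_destination]

print(get_arrival_rate("organization B"))  # 0.5
```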
  • the information processing apparatus 104 executes step S101 and step S102 as in the first embodiment.
  • Instead of step S103, the information processing apparatus 104 executes the following operation.
  • The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. The specific risk calculation unit 130 then requests the acquisition unit 114 to acquire the specific individual arrival rate. At this time, the specific risk calculation unit 130 transmits to the acquisition unit 114 information on the providing destination assumed as the counterpart (attacker) and on the attributes.
  • the acquisition unit 114 acquires the specific individual arrival rate corresponding to the received destination and attribute based on the data of the storage unit 214.
  • the acquisition unit 114 acquires 0.5 as the specific individual arrival rate.
  • the acquisition unit 114 returns information on the specific individual arrival rate to the specific risk calculation unit 130.
  • the specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
  • As described above, the information processing apparatus 104 achieves the effect that the specific individual arrival rate can be changed according to the organization to which the personal data is provided (an organization that may become an attacker).
  • The reason is that the fourth acquisition unit 114 acquires the specific individual arrival rate corresponding to the organization (partner) to which the personal data is provided, and the specific risk calculation unit 130 calculates the specific risk based on that specific individual arrival rate.
  • the information processing apparatus 105 according to the sixth embodiment is different from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is calculated using publicly available data (public information).
  • public information is data (public data) that is open to the public.
  • For example, public information may be the distribution of the members of the organization to which the data is provided (for example, information such as 10,000 members in their teens, 15,000 members in their 20s, and 10,000 members in their 30s).
  • Alternatively, the public information may be information published on the Internet, such as on Twitter (registered trademark) (for example, “user1 is a teenager and discloses location information”).
  • The disclosure range of public information need not be limited to information that is available without restriction, such as information on the Internet.
  • For example, the public information may be information that is disclosed only to members registered in a predetermined organization (for example, an Internet provider) and whose disclosure range is limited to some extent.
  • FIG. 15 is a block diagram illustrating an example of the configuration of the information processing apparatus 105 according to the sixth embodiment.
  • the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
  • the information processing device 105 includes a specific individual arrival rate calculation unit 115 instead of the reception unit 110 of the information processing device 100.
  • the information processing apparatus 105 includes a public distribution information storage unit 215.
  • the information processing apparatus 105 may be configured using a computer shown in FIG. Further, the information processing apparatus 105 may use the public distribution information storage unit 215 as an external apparatus connected via a network. Further, the specific individual arrival rate calculation unit 115 may receive information stored in the public distribution information storage unit 215 described below from an external device (not shown), as in the first embodiment. In this case, the information processing apparatus 105 may not include the public distribution information storage unit 215.
  • the public distribution information storage unit 215 stores public information.
  • the specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate based on the public information.
  • the information processing apparatus 105 executes step S101 and step S102 as in the first embodiment.
  • instead of step S103 of the first embodiment, the information processing apparatus 105 executes the following operation.
  • the specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. Then, the specific risk calculation unit 130 requests the specific individual arrival rate calculation unit 115 to calculate the specific individual arrival rate.
  • the specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate using the public information stored in the public distribution information storage unit 215 and the personal data stored in the personal data storage unit 200, and transmits the calculation result to the specific risk calculation unit 130.
  • the calculation of the specific individual arrival rate calculation unit 115 is not particularly limited and may be set according to the required risk.
  • the specific individual arrival rate calculation unit 115 may calculate the specific individual arrival rate using the distribution of personal data and the distribution of data in the organization to which the personal data is provided in the public information.
  • the first calculation example is a calculation example using the number of members of the target organization.
  • the public information is “Company A has 10 million minor members (10% of Japan's population)”.
  • the specific individual arrival rate calculation unit 115 can determine that Company A knows the quasi-identifier of a given individual with a probability of 10%. Therefore, the specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate as 0.1 based on this public information (the population ratio).
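A minimal sketch of this first calculation example follows; the population figure is an assumption chosen so that the ratio matches the 10% stated above, and the function name is illustrative.

```python
def arrival_rate_from_population_ratio(member_count, population):
    """Estimate the probability that the destination already holds the
    quasi-identifiers of a given individual as members / population."""
    return member_count / population


# "Company A has 10 million members (10% of Japan's population)": with a
# population of 100 million assumed for the stated 10% ratio, the rate is 0.1.
print(arrival_rate_from_population_ratio(10_000_000, 100_000_000))
```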
  • the second calculation example is a calculation example using the distribution of members of the target organization.
  • the public information is, for example: the number of members in their teens is 10,000, of whom 1,000 disclose their location information; the number of members in their twenties is 20,000, of whom 1,000 disclose their location information.
  • in this case, for an individual in their twenties, the specific individual arrival rate is calculated as 1,000 / 20,000 = 0.05.
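A minimal sketch of this second calculation example follows, under the reading given above that the rate for an individual in their twenties is the share of location-disclosing members in that age group; the data structure and names are illustrative.

```python
# Hypothetical member distribution of the destination organization:
# age group -> (total members, members who disclose location information).
MEMBER_DISTRIBUTION = {
    "10s": (10_000, 1_000),
    "20s": (20_000, 1_000),
}


def arrival_rate_from_distribution(age_group):
    """Share of members in the age group whose location information is public."""
    total, disclosing = MEMBER_DISTRIBUTION[age_group]
    return disclosing / total


print(arrival_rate_from_distribution("20s"))  # 1,000 / 20,000 = 0.05
```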
  • the specific individual arrival rate calculation unit 115 returns the specific individual arrival rate to the specific risk calculation unit 130.
  • the specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
  • the information processing apparatus 105 has the effect of reducing the operations of receiving the specific individual arrival rate and of storing it in advance.
  • the reason is that the specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate based on the public information.
  • the specific risk calculation unit 130 may calculate the specific risk using both the specific individual arrival rate corresponding to the quasi-identifier described in the second embodiment and the specific individual arrival rate corresponding to the identification risk described in the fourth embodiment.
  • the information processing system 310 may include any of the information processing apparatuses 101 to 105 according to the second to sixth embodiments, instead of the information processing apparatus 100 according to the first embodiment.
  • the present invention can be used as a tool for calculating a specific risk of an individual.
  • the present invention can also be used when processing data so that the personal data becomes “data with reduced individual specificity”.
  • Reference signs list: 100, 101, 102, 103, 104, 105, 106 Information processing apparatus; 110 Reception unit; 111, 112, 113, 114 Acquisition unit; 115 Specific individual arrival rate calculation unit; 120 Identification risk calculation unit; 130 Specific risk calculation unit; 200 Personal data storage unit; 211, 212, 213, 214 Storage unit; 215 Public distribution information storage unit; 230 Specific risk calculation result storage unit; 240 Overall risk calculation unit; 300, 310 Information processing system; 600 Information processing device; 610 CPU; 620 ROM; 630 RAM; 640 Internal storage device; 650 IOC; 660 Input device; 670 Display device; 680 NIC; 700 Storage medium

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In order to calculate a specific risk relating to personal data, this information processing device comprises: an identification risk calculating means which calculates an identification risk representing the possibility that data relating to a specified individual is determined to be the data of a single person; and a specific risk calculating means which calculates, on the basis of the identification risk and a specific individual arrival rate representing the possibility that the data of the specified individual is determined to be the data of that specified individual, a specific risk representing the possibility that the data of the specified individual is determined to be the data of that specified individual.

Description

Information processing apparatus, information processing method, and recording medium
 本発明は、データの匿名性に関し、特に、データを基に個人が特定される可能性に関連する情報処理装置、情報処理方法、及び記録媒体に関する。 The present invention relates to anonymity of data, and more particularly to an information processing apparatus, an information processing method, and a recording medium related to the possibility that an individual is specified based on data.
 個人に関するデータ(以下、パーソナルデータと呼ぶ)の利用及び活用が、期待されている(例えば、非特許文献1を参照)。非特許文献1には、パーソナルデータの利用及び活用を促進させるために、「個人の特定可能性を低減したデータ」であれば、本人の同意がなくても、そのデータを、第三者(例えば、そのデータを収集した企業以外の企業)へ提供することが、記載されている。 Use and utilization of personal data (hereinafter referred to as personal data) is expected (see, for example, Non-Patent Document 1). In Non-Patent Document 1, in order to promote the use and utilization of personal data, if it is “data with reduced individual identifiability”, the data can be transferred to a third party (without the consent of the person). For example, it is described that the data is provided to a company other than the company that collected the data.
 なお、パーソナルデータは、一般的に、1つ又は複数の属性値を含む。例えば、パーソナルデータを含むデータベースは、パーソナルデータの属性値を、そのデータのカラムの値として含む。 Note that personal data generally includes one or more attribute values. For example, a database including personal data includes personal data attribute values as column values of the data.
 ここで、以下で用いる「特定」及び「識別」という用語について説明する(非特許文献2を参照)。非特許文献2は、「特定」という用語を「識別」という用語と比較して、説明している。非特許文献2に基づくと、「識別」とは、「ある情報が、誰か一人の情報であることが分かること」を意味する。それに対し、「特定」とは、「ある情報が誰についての情報であるかが分かること」を意味する。例えば、データベースのあるレコードに、ある個人(一人の個人)のパーソナルデータが保存されているとする。このとき、「識別」とは、ある属性値(データベースのカラムの値)を基に、その属性値に対応するレコードが、一意に決定されることを意味する。一方、「特定」とは、その「識別」されたレコードが、誰(上記のある個人)のレコードであるかが分かることを意味する。なお、当然であるが、レコードは、「識別」されなければ、「特定」されない。 Here, the terms “specific” and “identification” used below will be described (see Non-Patent Document 2). Non-Patent Document 2 explains the term “specific” in comparison with the term “identification”. Based on Non-Patent Document 2, “identification” means “knowing that certain information is the information of one person”. On the other hand, “specific” means “knowing who the information is about”. For example, it is assumed that personal data of a certain individual (one individual) is stored in a certain record in the database. At this time, “identification” means that a record corresponding to the attribute value is uniquely determined based on a certain attribute value (a value in a database column). On the other hand, “specific” means that the record identified by “identified” can be known to whom (the above-mentioned individual) is a record. As a matter of course, a record is not “specified” unless it is “identified”.
 As techniques for preventing "identification", k-anonymity and k-anonymization have been disclosed (see, for example, Non-Patent Document 3). Since Non-Patent Document 1 and Non-Patent Document 2 do not disclose specific technical details of anonymization for preventing specification and identification, the following description mainly refers to the technique described in Non-Patent Document 3.
 図16を参照して、k-匿名性及びk-匿名化について説明する。 Referring to FIG. 16, k-anonymity and k-anonymization will be described.
 まず、図16に示す表(テーブル)について説明する。図16に示す左側のテーブル及び右側のテーブルは、それぞれ、4名のパーソナルデータを格納している。それぞれのテーブルにおいて、各行で表される各レコードが、それぞれ、各個人(一人一人)のパーソナルデータである。各テーブルは、レコード番号(No.)、郵便番号、年齢、及び病状というカラムを含む。「レコード番号(No.)」は、レコードを一意に識別するための番号である。「郵便番号」、「年齢」及び「病状」は、属性である。属性は、属性名と属性値とを含む。例えば、「郵便番号」「年齢」及び「病状」は、属性名である。また、「1230001」「28」及び「心臓病」は、属性値である。例えば、左側のテーブルの1行目のレコードにおいて、郵便番号という属性名で示された属性は、属性値として1230001を持つ。なお、図16に示す表は、病院のカルテデータを想定している。 First, the table shown in FIG. 16 will be described. Each of the left table and the right table shown in FIG. 16 stores personal data for four persons. In each table, each record represented by each row is personal data of each individual (each person). Each table includes columns of record number (No.), zip code, age, and medical condition. “Record number (No.)” is a number for uniquely identifying a record. “Zip code”, “age” and “disease state” are attributes. The attribute includes an attribute name and an attribute value. For example, “zip code”, “age”, and “disease state” are attribute names. “1230001” “28” and “heart disease” are attribute values. For example, in the record in the first row of the left table, the attribute indicated by the attribute name “zip code” has an attribute value of 1230001. The table shown in FIG. 16 assumes hospital chart data.
 k-匿名性及びk-匿名化の技術において、属性(Attribute)は、識別子(ID:Identifier)、準識別子(QI:Quasi-Identifier)、センシティブ属性(Sensitive Attribute)、及びその他の属性の4種類に分類される。識別子とは、氏名のように、単一で個人を識別できる属性である。準識別子とは、他の属性と組み合わせると、個人を識別できる属性である。センシティブ属性とは、他人に知られたくない属性である。そして、これら以外の属性は、その他属性である。なお、属性は、準識別子かつセンシティブ属性であってもよい。 In k-anonymity and k-anonymization techniques, there are four types of attributes: an identifier (ID), a quasi-identifier (QI), a sensitive attribute (Sensitive Attribute), and other attributes. are categorized. An identifier is an attribute such as a name that can uniquely identify an individual. The quasi-identifier is an attribute that can identify an individual when combined with other attributes. Sensitive attributes are attributes that you do not want others to know. And attributes other than these are other attributes. The attribute may be a quasi-identifier and a sensitive attribute.
 次に、k-匿名性を確保する動作(k-匿名化の動作)について説明する。図16の左側のテーブルは、病院のカルテデータのうち、氏名のような識別子を削除したレコードを集めたテーブルである。 Next, an operation for ensuring k-anonymity (k-anonymization operation) will be described. The table on the left side of FIG. 16 is a table that collects records from which identifiers such as names are deleted from hospital medical record data.
 The records in the left table do not include an identifier. Therefore, it appears that no one can tell whose record each record in the left table is based on its attributes alone. However, suppose that an analyst of the left table (hereinafter also referred to as an attacker) knows, as attributes of user A, that the postal code is 1230001 and the age is 28, and further knows that user A attends this hospital. In this case, the attacker can tell that record No. 1 in the left table is user A's record. As a result, the attacker learns that user A's medical condition, which is a sensitive attribute, is heart disease. In this way, the attacker can specify a record in the left table based on the postal code and the age. That is, the postal code and the age act as quasi-identifiers. Thus, even when identifiers have been deleted, a record may still be specified if quasi-identifiers remain in the personal data. As a result, a sensitive attribute that the user does not want to be known may become known to the attacker. Therefore, a technique called k-anonymization is used.
 図16の右側のテーブルは、k-匿名化を用いて準識別子を加工したレコードを含むテーブルである。右側のテーブルのレコードは、k-匿名性を用いて、郵便番号及び年齢の属性値が加工されている。なお、図16は、kの値として2を用いている。つまり、図16の右側のテーブルは、準識別子の属性値から識別できるレコード数が、2以上となっている。非特許文献3は、準識別子を基に識別されるレコード数がk以上となっているテーブルを、k-匿名性を満たすと定義している。また、k-匿名性を満たすように加工することは、k-匿名化と呼ばれる。例えば、右側のテーブルは、k=2のk-匿名性を満たすテーブルである。つまり、右側のテーブルは、2-匿名化されたテーブルである。なお、k-匿名性を満たすテーブルは、k未満の「識別」を防ぐことができる。そして、その結果として、k-匿名性を満たすテーブルは、k未満の「特定」を防ぐことができる。 The table on the right side of FIG. 16 is a table including records obtained by processing quasi-identifiers using k-anonymization. In the record in the right table, the postal code and age attribute values are processed using k-anonymity. In FIG. 16, 2 is used as the value of k. That is, in the table on the right side of FIG. 16, the number of records that can be identified from the attribute value of the semi-identifier is 2 or more. Non-Patent Document 3 defines that a table in which the number of records identified based on the quasi-identifier is k or more satisfies k-anonymity. Processing to satisfy k-anonymity is called k-anonymization. For example, the right table is a table satisfying k-anonymity of k = 2. That is, the table on the right is a 2-anonymized table. A table satisfying k-anonymity can prevent “identification” below k. As a result, a table satisfying k-anonymity can prevent “specification” less than k.
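To make the k-anonymity condition concrete, the following small Python check is added here as an illustration (it is not taken from Non-Patent Document 3): a table satisfies k-anonymity when every combination of quasi-identifier values is shared by at least k records. The sample rows are assumptions resembling the generalized table of FIG. 16.

```python
from collections import Counter


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values covers >= k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in counts.values())


# Illustrative rows resembling the generalized (right-hand) table of FIG. 16;
# the exact generalized values are assumptions, not taken from the figure.
table = [
    {"zip": "123****", "age": "20-29", "condition": "heart disease"},
    {"zip": "123****", "age": "20-29", "condition": "cold"},
    {"zip": "456****", "age": "30-39", "condition": "flu"},
    {"zip": "456****", "age": "30-39", "condition": "cancer"},
]
print(satisfies_k_anonymity(table, ["zip", "age"], k=2))  # True: 2-anonymity holds
```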
 非特許文献3に記載されているk-匿名性及びk-匿名化という技術は、「識別」の可能性(以降では、「識別リスク」とも呼ぶ)についての技術である。しかし、リスクには、非特許文献1に記載された「特定」のリスクがある。 The techniques of k-anonymity and k-anonymization described in Non-Patent Document 3 are techniques regarding the possibility of “identification” (hereinafter also referred to as “identification risk”). However, the risk includes a “specific” risk described in Non-Patent Document 1.
 つまり、現状の匿名性に関する技術は、識別リスク(識別)に対応した技術であり、「特定」の可能性(以降では、「特定リスク」と呼ぶ)に対応した技術ではない。つまり、非特許文献3に記載の技術は、特定リスクに対応できないという問題点があった。 In other words, the current technology related to anonymity is a technology corresponding to identification risk (identification), and is not a technology corresponding to the possibility of “specific” (hereinafter referred to as “specific risk”). That is, the technique described in Non-Patent Document 3 has a problem that it cannot cope with a specific risk.
 さらに、特定リスクを考慮せずに、識別リスクだけを考慮したデータの匿名化は、必要以上にプライバシーを保護する、つまり、必要以上にデータを加工する場合がある。 Furthermore, anonymization of data considering only identification risk without considering specific risk may protect privacy more than necessary, that is, process data more than necessary.
 攻撃者が準識別子を知っているとの前提は、識別リスクの考慮に相当する。しかし、攻撃者は、準識別子を知らない可能性もある。そして、攻撃者が準識別子を知っている可能性が低いにもかかわらず、準識別子を知っていることを前提としてk-匿名性のkの値を設定した場合、kの値は、必要以上に大きな値となってしまう。このように、匿名化を実行する装置は、識別リスクを用いた匿名化において、必要以上にデータを加工してしまう場合がある。その結果、匿名化後のデータの有用性は、必要以上に低下することになる。 The assumption that the attacker knows the quasi-identifier corresponds to consideration of identification risk. However, the attacker may not know the quasi-identifier. If the value of k-anonymity is set on the assumption that the attacker knows the quasi-identifier even though it is unlikely that the attacker knows the quasi-identifier, the value of k is more than necessary. It will be a big value. As described above, an anonymization apparatus may process data more than necessary in anonymization using an identification risk. As a result, the usefulness of the data after anonymization decreases more than necessary.
 このように、非特許文献3に記載の技術は、必要以上にデータを加工するという問題点があった。 Thus, the technique described in Non-Patent Document 3 has a problem of processing data more than necessary.
 本発明の目的は、上記の問題点を解決し、特定リスクを計算(評価)できる情報処理装置、情報処理方法、及び記録媒体を提供することにある。 An object of the present invention is to provide an information processing apparatus, an information processing method, and a recording medium that can solve the above problems and calculate (evaluate) a specific risk.
 An information processing apparatus according to one aspect of the present invention includes: identification risk calculation means for calculating an identification risk indicating the possibility that data concerning a specified individual is determined to be the data of a single person; and specific risk calculation means for calculating, based on the identification risk and a specific individual arrival rate indicating the possibility that the data of the specified individual is determined to be the data of that specified individual, a specific risk indicating the possibility that the data of the specified individual is determined to be the data of that specified individual.
 An information processing system according to one aspect of the present invention includes: an information processing apparatus that includes identification risk calculation means for calculating an identification risk indicating the possibility that data concerning a specified individual is determined to be the data of a single person, and specific risk calculation means for calculating, based on the identification risk and a specific individual arrival rate indicating the possibility that the data of the specified individual is determined to be the data of that specified individual, a specific risk indicating the possibility that the data of the specified individual is determined to be the data of that specified individual; personal information storage means for storing personal data, which is information on a plurality of individuals; and overall risk calculation means for calculating a risk corresponding to the personal data as a whole, based on the specific risks corresponding to the data of all individuals included in the personal data that are calculated by the information processing apparatus.
 An information processing method according to one aspect of the present invention calculates an identification risk indicating the possibility that data concerning a specified individual is determined to be the data of a single person, and calculates, based on the identification risk and a specific individual arrival rate indicating the possibility that the data of the specified individual is determined to be the data of that specified individual, a specific risk indicating the possibility that the data of the specified individual is determined to be the data of that specified individual.
 A recording medium according to one aspect of the present invention records, in a computer-readable manner, a program that causes a computer to execute: a process of calculating an identification risk indicating the possibility that data concerning a specified individual is determined to be the data of a single person; and a process of calculating, based on the identification risk and a specific individual arrival rate indicating the possibility that the data of the specified individual is determined to be the data of that specified individual, a specific risk indicating the possibility that the data of the specified individual is determined to be the data of that specified individual.
 本発明に基づけば、特定リスクを計算できるとの効果を奏することができる。 Based on the present invention, it is possible to obtain an effect that a specific risk can be calculated.
FIG. 1 is a block diagram showing an example of the configuration of an information processing system including an information processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of personal data included in the personal data storage unit according to the first embodiment.
FIG. 3 is a flowchart illustrating an example of the operation of the information processing apparatus according to the first embodiment.
FIG. 4 is a block diagram illustrating an example of another configuration of the information processing system including the information processing apparatus according to the first embodiment.
FIG. 5 is a flowchart illustrating an example of the operation of the information processing system according to the first embodiment.
FIG. 6 is a diagram illustrating a calculation result of a specific risk used for explaining the first embodiment.
FIG. 7 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the second embodiment.
FIG. 8 is a diagram illustrating an example of data stored by the storage unit according to the second embodiment.
FIG. 9 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the third embodiment.
FIG. 10 is a diagram illustrating an example of data stored by the storage unit according to the third embodiment.
FIG. 11 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the fourth embodiment.
FIG. 12 is a diagram illustrating an example of data stored by the storage unit according to the fourth embodiment.
FIG. 13 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the fifth embodiment.
FIG. 14 is a diagram illustrating an example of data stored by the storage unit according to the fifth embodiment.
FIG. 15 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the sixth embodiment.
FIG. 16 is a diagram for explaining k-anonymization and k-anonymity.
FIG. 17 is a block diagram illustrating an example of a configuration of a modification of the information processing apparatus according to the first embodiment.
FIG. 18 is a block diagram illustrating an example of a configuration of a modification of the information processing apparatus according to the first embodiment.
 次に、本発明の実施形態について図面を参照して説明する。 Next, an embodiment of the present invention will be described with reference to the drawings.
 なお、各図面は、本発明の実施形態を説明するものである。ただし、本発明は、各図面の記載に限られるわけではない。また、各図面の同様の構成には、同じ番号を付し、その繰り返しの説明を、省略する場合がある。また、以下の説明に用いる図面において、本発明の説明に関係しない部分の構成については、記載を省略し、図示しない場合もある。 Each drawing explains an embodiment of the present invention. However, the present invention is not limited to the description of each drawing. Moreover, the same number is attached | subjected to the same structure of each drawing, and the repeated description may be abbreviate | omitted. Further, in the drawings used for the following description, the description of the configuration of the part not related to the description of the present invention is omitted, and there are cases where it is not illustrated.
 まず、本実施形態の説明に用いる用語について、整理する。 First, the terms used in the description of this embodiment will be organized.
 「識別」とは、ある情報(データ)が誰か一人の情報(データ)であることが分かる(判断される)ことである。 “Identification” means that it is known (determined) that certain information (data) is information (data) of one person.
 「識別リスク」とは、識別される可能性を示す。 “Identification risk” indicates the possibility of identification.
 「特定」とは、ある情報(データ)が誰の情報(データ)であるかが分かる(判断される)ことである。 “Specific” means knowing (determining) who information (data) is certain information (data).
 「特定リスク」とは、特定される可能性を示す。 “Specified risk” indicates the possibility of being specified.
 「レコード」は、各個人のパーソナルデータである。レコードは、複数の属性を含む。 “Record” is personal data of each individual. The record includes a plurality of attributes.
 「属性」とは、レコードに含まれるデータの種別である。属性は、属性名と、属性値とを含む。 “Attribute” is the type of data included in the record. The attribute includes an attribute name and an attribute value.
 「識別子」とは、単独で個人を特定できる属性である。 “Identifier” is an attribute that can identify an individual alone.
 「準識別子」とは、他の属性との組合せを基に、個人を特定できる属性である。 “Semi-identifier” is an attribute that can identify an individual based on a combination with other attributes.
 「センシティブ属性」とは、個人が公開したくない属性(データ)である。 “Sensitive attributes” are attributes (data) that individuals do not want to disclose.
 「特定個人到達率」とは、データ(レコード)が、ある特定の個人のデータであると判断される(特定される)可能性を示す値である。より詳細には、特定個人到達率とは、識別リスクが算出された場合における特定リスクを算出するために用いられる値である。あるいは、特定個人到達率は、レコードにおけるデータを識別するための情報を基に算出された識別リスクを補正して、特定リスクを算出するときに用いられる値である。より具体的には、特定個人到達率は、例えば、攻撃者が、レコードにおけるデータを識別するための情報(例えば、上記の準識別子の値)を取得する可能性である。例えば、同じ識別リスクとなっている準識別子が二つある場合で、その一の準識別子に対する取得可能性が、他の準識別子に対する取得可能性より高い場合、一の準識別子に対する特定個人到達率は、他の準識別子に対する特定個人到達率より高くなる。あるいは、ある準識別子の値を取得する可能性は、一般的に、その準識別子を含む複数の準識別子の値の組合せを取得する可能性より高くなる。そのため、ある準識別子に対する特定個人到達率は、その準識別子を含む準識別子の組合せに対する特定個人到達率より高くなる。また、特定個人到達率を高くすることは、特定リスクを高くすることである。そして、特定リスクに対応した匿名化は、特定リスクが高いほど、匿名性を高くする。例えば、稀な属性(準識別子)は、特定されやすい。つまり、稀な属性における識別リスクに対する特定リスクの比率は、一般的な属性のおける識別リスクに対する特定リスクの比率より高くすることが望ましい。そのため、稀な属性に対する特定個人到達率は、稀でない属性に対する特定個人到達率より高くなる。なお、特定個人到達率の値(r)の範囲は、0から1の間(0≦r≦1)である。 The “specific individual arrival rate” is a value indicating the possibility that the data (record) is determined (specified) as data of a specific individual. More specifically, the specific individual arrival rate is a value used to calculate the specific risk when the identification risk is calculated. Alternatively, the specific individual arrival rate is a value used when calculating a specific risk by correcting an identification risk calculated based on information for identifying data in a record. More specifically, the specific individual arrival rate is, for example, the possibility that an attacker acquires information (for example, the value of the quasi-identifier described above) for identifying data in a record. For example, when there are two quasi-identifiers that have the same identification risk, and the probability of acquisition for one quasi-identifier is higher than the probability of acquisition for other quasi-identifiers, the specific individual arrival rate for one quasi-identifier Is higher than the specific individual arrival rate for other quasi-identifiers. Alternatively, the possibility of obtaining a value of a certain semi-identifier is generally higher than the possibility of obtaining a combination of values of a plurality of semi-identifiers including the semi-identifier. Therefore, the specific individual arrival rate for a certain quasi-identifier is higher than the specific individual arrival rate for a combination of quasi-identifiers including the quasi-identifier. In addition, increasing the specific individual arrival rate means increasing the specific risk. And the anonymization corresponding to a specific risk makes anonymity high, so that a specific risk is high. For example, rare attributes (quasi-identifiers) are easily specified. That is, it is desirable that the ratio of the specific risk to the identification risk in the rare attribute is higher than the ratio of the specific risk to the identification risk in the general attribute. Therefore, the specific person arrival rate for rare attributes is higher than the specific individual arrival rate for non-rare attributes. In addition, the range of the value (r) of the specific individual arrival rate is between 0 and 1 (0 ≦ r ≦ 1).
<First Embodiment>
A first embodiment of the present invention will be described with reference to the drawings.
[Description of configuration]
FIG. 1 is a block diagram showing an example of the configuration of an information processing system 300 including an information processing apparatus 100 according to the first embodiment of the present invention. As illustrated in FIG. 1, the information processing system 300 includes an information processing apparatus 100 and a personal data storage unit 200. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
 パーソナルデータ保存部200は、情報処理装置100における特定リスクの評価の処理の対象であるパーソナルデータを保存する。 The personal data storage unit 200 stores personal data that is a target of specific risk evaluation processing in the information processing apparatus 100.
 図2は、パーソナルデータ保存部200が保存するパーソナルデータの一例を示す図である。 FIG. 2 is a diagram illustrating an example of personal data stored by the personal data storage unit 200.
 パーソナルデータ保存部200は、図2に示すように、パーソナルデータとして、ユーザ(個人)の識別子である「ユーザID」と、そのユーザに関する属性とを関連付けて、保存する。図2に示すパーソナルデータは、ユーザに関する属性として、年齢、性別、及び病名を含む。例えば、「user1」の「年齢」(属性名)は、「20」(属性値)である。同様に、user1の「性別」(属性名)は、「男」(属性値)である。また、user1の「病名」(属性名)は、「かぜ」(属性値)である。 As shown in FIG. 2, the personal data storage unit 200 stores “user ID”, which is an identifier of a user (individual), in association with attributes related to the user as personal data. The personal data shown in FIG. 2 includes age, sex, and disease name as attributes relating to the user. For example, “age” (attribute name) of “user1” is “20” (attribute value). Similarly, the “sex” (attribute name) of user1 is “male” (attribute value). The “disease name” (attribute name) of user1 is “cold” (attribute value).
 なお、以下における各実施形態の動作の説明は、パーソナルデータの一例として、図2に示すパーソナルデータを用いる。また、準識別子は、年齢及び性別とする。つまり、年齢と性別が、匿名化対象の属性とする。そして、センシティブ属性は、病名とする。 In the following description of the operation of each embodiment, personal data shown in FIG. 2 is used as an example of personal data. The quasi-identifier shall be age and gender. That is, age and sex are attributes to be anonymized. The sensitive attribute is a disease name.
 情報処理装置100は、パーソナルデータ保存部200が保存するパーソナルデータにおける指定された個人の特定リスクを計算(評価)する。 The information processing apparatus 100 calculates (evaluates) the specified individual specific risk in the personal data stored by the personal data storage unit 200.
 そのため、情報処理装置100は、受信部110と、識別リスク計算部120と、特定リスク計算部130とを含む。 Therefore, the information processing apparatus 100 includes a reception unit 110, an identification risk calculation unit 120, and a specific risk calculation unit 130.
 受信部110は、図示しない装置から特定個人到達率を受信する。本実施形態において、特定個人到達率を送信する装置は、特に制限はない。例えば、受信部110は、ユーザが操作する装置から、特定個人到達率を受信してもよい。あるいは、受信部110は、図示しない記憶装置から特定個人到達率を読み出してもよい。以下、これらをまとめて、「受信部110が、特定個人到達率を受信する」と呼ぶ。 The receiving unit 110 receives a specific individual arrival rate from a device (not shown). In the present embodiment, the device that transmits the specific individual arrival rate is not particularly limited. For example, the receiving unit 110 may receive a specific individual arrival rate from a device operated by the user. Alternatively, the receiving unit 110 may read the specific individual arrival rate from a storage device (not shown). Hereinafter, these are collectively referred to as “the receiving unit 110 receives a specific individual arrival rate”.
 識別リスク計算部120は、識別リスクを計算する。 The identification risk calculation unit 120 calculates the identification risk.
 特定リスク計算部130は、識別リスクと特定個人到達率とを基に、特定リスクを計算する。 The specific risk calculation unit 130 calculates the specific risk based on the identification risk and the specific individual arrival rate.
 識別リスク計算部120及び特定リスク計算部130における計算の詳細は、後ほど説明する。 Details of calculation in the identification risk calculation unit 120 and the specific risk calculation unit 130 will be described later.
[Description of operation]
Next, the operation of the information processing apparatus 100 will be described with reference to the drawings.
 図3は、第1の実施形態に係る情報処理装置100の動作の一例を示すフローチャートである。図3に示すように、情報処理装置100は、以下で説明するステップS101からS104までの動作を実行する。 FIG. 3 is a flowchart showing an example of the operation of the information processing apparatus 100 according to the first embodiment. As illustrated in FIG. 3, the information processing apparatus 100 performs operations from steps S101 to S104 described below.
 情報処理装置100は、予め、特定リスクを計算する対象の個人(ユーザ)が指定されているとする。そして、受信部110は、特定個人到達率(r)を受信する。 It is assumed that the information processing apparatus 100 has previously designated an individual (user) who is to calculate a specific risk. Then, the receiving unit 110 receives the specific individual arrival rate (r).
 そして、識別リスク計算部120は、パーソナルデータ保存部200から、パーソナルデータを取得する(ステップS101)。本説明においては、既に説明したとおり、識別リスク計算部120は、図2に示したパーソナルデータを取得する。 Then, the identification risk calculation unit 120 acquires personal data from the personal data storage unit 200 (step S101). In this description, as already described, the identification risk calculation unit 120 acquires the personal data shown in FIG.
 次に、識別リスク計算部120は、パーソナルデータを基に、指定された個人に関して、何人の(何個の)レコードを識別できるかを計算する。そして、識別リスク計算部120は、識別できるレコードの数(m)を基に、指定された個人の識別リスクを、計算する(ステップS102)。mの値が大きい程、識別できたレコードから、指定された個人を特定することが、より困難になる。 Next, the identification risk calculation unit 120 calculates how many (how many) records can be identified for the specified individual based on the personal data. Then, the identification risk calculation unit 120 calculates the identification risk of the designated individual based on the number (m) of records that can be identified (step S102). The greater the value of m, the more difficult it is to identify the specified individual from the identified record.
 なお、識別リスク計算部120は、予め設定されている計算手法を用いて、識別リスクを計算する。識別リスクの計算手法は、特に制限はない。例えば、mレコードが識別される場合、識別リスク計算部120は、識別リスクとして「1/m」を計算してもよい。そして、識別リスク計算部120は、特定リスク計算部130に計算した識別リスクを送信する。 Note that the identification risk calculation unit 120 calculates the identification risk using a preset calculation method. There is no particular limitation on the method for calculating the identification risk. For example, when m records are identified, the identification risk calculation unit 120 may calculate “1 / m” as the identification risk. Then, the identification risk calculation unit 120 transmits the calculated identification risk to the specific risk calculation unit 130.
 識別リスク計算部120の動作の一例を、図2に示すパーソナルデータを参照して説明する。なお、指定された個人は、「user1」とする。 An example of the operation of the identification risk calculation unit 120 will be described with reference to personal data shown in FIG. The designated individual is “user1”.
 図2を参照すると「user1」の2つの準識別子(年齢及び性別)の値は、それぞれ「20(年齢)」及び「男(性別)」である。この2つの準識別子の値は、「user1」及び「user2」において、同じである。つまり、年齢が「20」であり、性別が「男」であるレコードとして、2名(m=2)のレコードが識別される。よって、識別リスク計算部120は、識別リスクとして、0.5(=1/2)を計算する。なお、識別リスク計算部120は、識別リスクの計算として、除算に限らず他の四則演算、又は、累乗根など他の演算を用いてもよい。また、識別リスク計算部120は、一般的な識別リスクの計算手法を用いてもよい。 Referring to FIG. 2, the values of the two quasi-identifiers (age and sex) of “user1” are “20 (age)” and “male (sex)”, respectively. The values of the two quasi-identifiers are the same in “user1” and “user2”. That is, two records (m = 2) are identified as records having an age of “20” and a gender of “male”. Therefore, the identification risk calculation unit 120 calculates 0.5 (= 1/2) as the identification risk. In addition, the identification risk calculation unit 120 may use other arithmetic operations such as other four arithmetic operations or a power root as the calculation of the identification risk, not limited to division. The identification risk calculation unit 120 may use a general identification risk calculation method.
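The record-counting step just described can be sketched in Python as follows. This is an illustrative reconstruction: only user1's row is spelled out in the text, so the other rows and the helper names are assumptions.

```python
# Personal data in the style of FIG. 2; only user1's row is given in the text,
# so the remaining rows are illustrative assumptions.
PERSONAL_DATA = {
    "user1": {"age": 20, "gender": "male", "disease": "cold"},
    "user2": {"age": 20, "gender": "male", "disease": "flu"},
    "user3": {"age": 30, "gender": "female", "disease": "cold"},
}
QUASI_IDENTIFIERS = ("age", "gender")


def identification_risk(user_id):
    """1 / m, where m is the number of records sharing the user's quasi-identifier values."""
    target = tuple(PERSONAL_DATA[user_id][q] for q in QUASI_IDENTIFIERS)
    m = sum(
        1
        for record in PERSONAL_DATA.values()
        if tuple(record[q] for q in QUASI_IDENTIFIERS) == target
    )
    return 1 / m


print(identification_risk("user1"))  # 0.5, because user1 and user2 share (20, male)
```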
 そして、識別リスク計算部120は、識別リスクとして、0.5を特定リスク計算部130に送信する。特定リスク計算部130は、識別リスク計算部120から識別リスクを受信する。 Then, the identification risk calculation unit 120 transmits 0.5 as the identification risk to the specific risk calculation unit 130. The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120.
 そして、特定リスク計算部130は、個人の特定個人到達率(r)を、受信部110から取得する(ステップS103)。ここでは、特定個人到達率(r)は、0.3とする。 Then, the specific risk calculation unit 130 acquires the specific individual arrival rate (r) of the individual from the reception unit 110 (step S103). Here, the specific individual arrival rate (r) is 0.3.
 そして、特定リスク計算部130は、識別リスク(0.5)と特定個人到達率(r=0.3)とを基に、指定された個人の特定リスクを計算する(ステップS104)。特定リスク計算部130は、予め設定されている計算手法を用いて、特定リスクを計算する。なお、特定リスクを計算する手法は、特に制限はない。以下の説明では、一例として、特定リスク計算部130は、特定リスクとして、「個人の特定リスク=(1/m)×r」を用いて計算するとする。 Then, the specific risk calculation unit 130 calculates the specific risk of the specified individual based on the identification risk (0.5) and the specific individual arrival rate (r = 0.3) (step S104). The specific risk calculation unit 130 calculates the specific risk using a preset calculation method. The method for calculating the specific risk is not particularly limited. In the following description, as an example, it is assumed that the specific risk calculation unit 130 calculates using “individual specific risk = (1 / m) × r” as the specific risk.
 ここでは、識別リスク(1/m)は、0.5である。また、特定個人到達率(r)は、0.3である。そのため、特定リスク計算部130は、特定リスクとして、「0.5×0.3=0.15」を計算する。 Here, the identification risk (1 / m) is 0.5. The specific individual arrival rate (r) is 0.3. Therefore, the specific risk calculation unit 130 calculates “0.5 × 0.3 = 0.15” as the specific risk.
 なお、上記において、個人の特定リスクの計算として、「特定リスク=(1/m)×r」を用いたが、特定リスクの計算は、これに限る必要ない。例えば、特定リスク計算部130は、特定リスクの計算として、「特定リスク=(1/m)×r×r」を用いてもよい。また、特定リスク計算部130は、計算式として乗算又は除算に限らず、加算又は減算を用いてもよい。あるいは、特定リスク計算部130は、四則に限らず、累乗根又は対数のような計算を用いてもよい。さらに、特定リスク計算部130は、これらの計算を組み合わせてもよい。 In the above, “specific risk = (1 / m) × r” is used as the calculation of the individual specific risk. However, the calculation of the specific risk is not limited to this. For example, the specific risk calculation unit 130 may use “specific risk = (1 / m) × r × r” as the calculation of the specific risk. The specific risk calculation unit 130 is not limited to multiplication or division as a calculation formula, and addition or subtraction may be used. Alternatively, the specific risk calculation unit 130 is not limited to the four rules, and may use a calculation such as a power root or logarithm. Furthermore, the specific risk calculation unit 130 may combine these calculations.
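Putting the worked numbers together, a minimal sketch of the step S104 calculation using the example formula (1/m) x r is shown below; the alternative form (1/m) x r x r mentioned above is included only for comparison, and the function names are illustrative.

```python
def specific_risk(identification_risk, arrival_rate):
    """Example formula from the text: specific risk = (1 / m) x r."""
    return identification_risk * arrival_rate


def specific_risk_squared(identification_risk, arrival_rate):
    """One of the alternative forms mentioned above: (1 / m) x r x r."""
    return identification_risk * arrival_rate ** 2


# Identification risk 1/m = 0.5 (m = 2) and specific individual arrival rate r = 0.3.
print(specific_risk(0.5, 0.3))          # 0.15
print(specific_risk_squared(0.5, 0.3))  # 0.045
```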
 なお、特定リスク計算部130は、特定リスクを所定の装置(例えば、ユーザーの装置)に送信してもよい。 It should be noted that the specific risk calculation unit 130 may transmit the specific risk to a predetermined device (for example, a user device).
 なお、ここまでの説明として、情報処理装置100は、ある個人に対する特定リスクを計算した。しかし、情報処理装置100は、パーソナルデータ保存部200に保存された複数又は全ての個人に対する特定リスクを計算してもよい。 As an explanation so far, the information processing apparatus 100 calculates a specific risk for a certain individual. However, the information processing apparatus 100 may calculate specific risks for a plurality of or all individuals stored in the personal data storage unit 200.
 図4は、第1の実施形態に係る情報処理装置100を含む別の構成の情報処理システム310の構成の一例を示す図である。なお、図面中の矢印の方向は、一例を示すものであり、信号の向きを限定するものではない。 FIG. 4 is a diagram illustrating an example of a configuration of an information processing system 310 having another configuration including the information processing apparatus 100 according to the first embodiment. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
 図4に示す情報処理システム310は、情報処理システム300の構成に加え、特定リスク計算結果保存部230と、全体リスク計算部240とを含む。なお、情報処理装置100が、全体リスク計算部240を含んでもよい。さらに、情報処理装置100は、特定リスク計算結果保存部230を含んでもよい。 4 includes, in addition to the configuration of the information processing system 300, a specific risk calculation result storage unit 230 and an overall risk calculation unit 240. Note that the information processing apparatus 100 may include an overall risk calculation unit 240. Furthermore, the information processing apparatus 100 may include a specific risk calculation result storage unit 230.
 特定リスク計算結果保存部230は、各個人についての、特定リスクを保存する。 The specific risk calculation result storage unit 230 stores a specific risk for each individual.
 図6は、各ユーザに対する特定リスクの計算結果を示す図である。図6において、表の右側に示す式は、各個人の特定リスクを算出した式である。 FIG. 6 is a diagram showing a calculation result of a specific risk for each user. In FIG. 6, the formula shown on the right side of the table is a formula for calculating the specific risk of each individual.
 The overall risk calculation unit 240 uses the information processing apparatus 100 to calculate the specific risks of all individuals stored in the personal data storage unit 200, and stores the calculated specific risks in the specific risk calculation result storage unit 230. After the calculation of the specific risks for all individuals is completed, the overall risk calculation unit 240 calculates the overall risk of the personal data stored in the personal data storage unit 200, using all the individual specific risks stored in the specific risk calculation result storage unit 230. Here, the "overall risk" is a value calculated from the specific risk of each individual using a predetermined formula. For example, the overall risk may be the total, the arithmetic mean, the median, or the mode of the specific risks of all individuals. Alternatively, the overall risk may be the maximum or the minimum of the specific risks of all individuals, or the total or the average of a predetermined number of the highest specific risks. Furthermore, the overall risk may be a value related to the shape of the distribution of the specific risks of all individuals, such as the variance or the standard deviation.
 全体リスク計算部240は、全体リスクとして、1つの値に限らず、複数の値(例えば、平均値と分散)を計算してもよい。 The overall risk calculation unit 240 may calculate not only one value but a plurality of values (for example, an average value and variance) as the overall risk.
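As an illustration of how the overall risk calculation unit 240 might aggregate the per-individual specific risks, the following sketch computes several of the statistics listed above; the input values and the function name are assumptions.

```python
from statistics import mean, median, pvariance


def overall_risk(specific_risks):
    """Aggregate per-individual specific risks into several overall-risk statistics."""
    return {
        "sum": sum(specific_risks),
        "mean": mean(specific_risks),
        "median": median(specific_risks),
        "max": max(specific_risks),
        "variance": pvariance(specific_risks),
    }


# Illustrative per-user specific risks such as those kept in the storage unit 230.
print(overall_risk([0.15, 0.15, 0.3, 0.6]))
```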
 情報処理システム310の動作について、図面を参照して説明する。 The operation of the information processing system 310 will be described with reference to the drawings.
 図5は、情報処理システム310の動作の一例を示すフローチャートである。 FIG. 5 is a flowchart showing an example of the operation of the information processing system 310.
 なお、以下の動作の説明においては、全体リスク計算部240が、動作を制御するとして説明する。ただし、動作の制御主体は、これに限る必要はない。例えば、情報処理装置100が、全体リスク計算部240を含め制御してもよい。あるいは、図示しない制御装置が、情報処理システム310に含まれる構成を制御してもよい。 In the following description of the operation, it is assumed that the overall risk calculation unit 240 controls the operation. However, the operation control entity need not be limited to this. For example, the information processing apparatus 100 may control including the overall risk calculation unit 240. Alternatively, a control device (not shown) may control the configuration included in the information processing system 310.
 まず、全体リスク計算部240は、情報処理装置100にパーソナルデータの取得を指示する。情報処理装置100は、パーソナルデータ保存部200から、パーソナルデータを取得する(ステップS201)。 First, the overall risk calculation unit 240 instructs the information processing apparatus 100 to acquire personal data. The information processing apparatus 100 acquires personal data from the personal data storage unit 200 (step S201).
 次に、全体リスク計算部240は、パーソナルデータの各個人に対応する特定リスクの計算を情報処理装置100に指示する(ステップS202)。 Next, the overall risk calculation unit 240 instructs the information processing apparatus 100 to calculate a specific risk corresponding to each individual of the personal data (step S202).
 情報処理装置100は、指定された個人に対応する特定リスクを計算する(ステップS203)。 The information processing apparatus 100 calculates a specific risk corresponding to the designated individual (step S203).
 全体リスク計算部240は、算出された特定リスクを、その個人と関連付けて、特定リスク計算結果保存部230に保存する(ステップS204)。 The overall risk calculation unit 240 stores the calculated specific risk in the specific risk calculation result storage unit 230 in association with the individual (step S204).
 全ての個人についての特定リスクを計算後、全体リスク計算部240は、全ての個人の特定リスクを基に、全体リスクを計算する(ステップS205)。 After calculating the specific risk for all individuals, the overall risk calculation unit 240 calculates the overall risk based on the specific risks of all individuals (step S205).
[Description of effects]
Next, the effect of the first embodiment will be described.
 第1の実施形態に係る情報処理装置100は、所定の個人の特定リスクを計算できるとの効果を奏することができる。 The information processing apparatus 100 according to the first embodiment can achieve an effect that a specific risk of a predetermined individual can be calculated.
 その理由は、次のとおりである。 The reason is as follows.
 本実施形態の受信部110が、特定個人到達率を受信する。そして、識別リスク計算部120が、個人の識別リスクを計算する。そして、特定リスク計算部130が、識別リスクと特定個人到達率とを基に、指定された個人の特定リスクを計算できるためである。 The receiving unit 110 of the present embodiment receives the specific individual arrival rate. Then, the identification risk calculation unit 120 calculates an individual identification risk. This is because the specific risk calculation unit 130 can calculate the specific risk of the designated individual based on the identification risk and the specific individual arrival rate.
 そのため、情報処理装置100を用いるシステムは、情報処理装置100が算出した特定リスクを用いて、適切なパーソナルデータの匿名化を決定できる。 Therefore, the system using the information processing apparatus 100 can determine anonymization of appropriate personal data using the specific risk calculated by the information processing apparatus 100.
 また、情報処理装置100は、必要以上のデータの加工を防ぐという効果を奏することができる。 In addition, the information processing apparatus 100 can achieve an effect of preventing unnecessary data processing.
 その理由は、情報処理装置100が、特定リスクを算出するため、情報処理装置100を用いるシステムは、匿名化の程度(例えば、k-匿名性のkの値)を決定する場合に、識別リスクに加え、特定リスクを用いることができるためである。 The reason is that since the information processing apparatus 100 calculates a specific risk, the system using the information processing apparatus 100 determines the degree of anonymization (for example, k-value of anonymity) when identifying risk. This is because a specific risk can be used.
 また、本実施形態に係る情報処理装置100を含む情報処理システム310は、パーソナルデータの全体に対する全体リスクを計算できるとの効果を奏することができる。 In addition, the information processing system 310 including the information processing apparatus 100 according to the present embodiment can produce an effect that it is possible to calculate the overall risk for the entire personal data.
 その理由は、全体リスク計算部240が、全ての個人の特定リスクを基に、パーソナルデータの全体リスクを計算できるためである。 The reason is that the overall risk calculation unit 240 can calculate the overall risk of personal data based on the specific risk of all individuals.
[Modification 1]
The information processing apparatus 100 described above is configured as follows.
 例えば、情報処理装置100の各構成部は、ハードウェア回路で構成されてもよい。 For example, each component of the information processing apparatus 100 may be configured with a hardware circuit.
 また、情報処理装置100において、各構成部は、ネットワークを介して接続した複数の装置を用いて、構成されてもよい。 Moreover, in the information processing apparatus 100, each component may be configured using a plurality of apparatuses connected via a network.
 図18は、本実施形態の変形例1に係る情報処理装置106の構成の一例を示すブロック図である。なお、図面中の矢印の方向は、一例を示すものであり、信号の向きを限定するものではない。 FIG. 18 is a block diagram illustrating an example of the configuration of the information processing apparatus 106 according to the first modification of the present embodiment. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal.
 情報処理装置106は、識別リスク計算部120と、特定リスク計算部130とを含む。情報処理装置106の各構成は、図示しないネットワークなどを介して、パーソナルデータ及び特定個人到達率を受信し、情報処理装置100の各構成と同様に動作する。 The information processing apparatus 106 includes an identification risk calculation unit 120 and a specific risk calculation unit 130. Each configuration of the information processing apparatus 106 receives personal data and a specific individual arrival rate via a network (not shown) and operates in the same manner as each configuration of the information processing apparatus 100.
 このように構成された情報処理装置106は、情報処理装置100と同様の効果を奏することができる。 The information processing apparatus 106 configured in this manner can achieve the same effects as the information processing apparatus 100.
 その理由は、上記のとおり、情報処理装置106の各構成が、情報処理装置100の構成と同様に動作し、特定リスクを計算できるためである。 The reason is that, as described above, each configuration of the information processing apparatus 106 operates in the same manner as the configuration of the information processing apparatus 100 and can calculate a specific risk.
 なお、情報処理装置106は、本発明の実施形態における最小構成である。 Note that the information processing apparatus 106 is the minimum configuration in the embodiment of the present invention.
[Modification 2]
Furthermore, modified examples of the information processing apparatus 100 and the information processing apparatus 106 will be described using the information processing apparatus 100. In the information processing apparatus 100 and the information processing apparatus, the plurality of components may be configured with a single piece of hardware.
 また、情報処理装置100は、CPU(Central Processing Unit)と、ROM(Read Only Memory)と、RAM(Random Access Memory)とを含むコンピュータ装置として実現されてもよい。情報処理装置100は、上記構成に加え、さらに、入出力接続回路(IOC:Input / Output Circuit)と、ネットワークインターフェース回路(NIC:Network Interface Circuit)とを含むコンピュータ装置として実現されてもよい。 Further, the information processing apparatus 100 may be realized as a computer apparatus including a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). In addition to the above configuration, the information processing apparatus 100 may be realized as a computer apparatus that further includes an input / output connection circuit (IOC: Input : / Output Circuit) and a network interface circuit (NIC: Network Interface Circuit).
 図17は、本変形例に係る情報処理装置600の構成の一例を示すブロック図である。 FIG. 17 is a block diagram showing an example of the configuration of the information processing apparatus 600 according to this modification.
 情報処理装置600は、CPU610と、ROM620と、RAM630と、内部記憶装置640と、IOC650と、NIC680とを含み、コンピュータ装置を構成している。 The information processing apparatus 600 includes a CPU 610, a ROM 620, a RAM 630, an internal storage device 640, an IOC 650, and a NIC 680, and constitutes a computer device.
 CPU610は、ROM620からプログラムを読み込む。そして、CPU610は、読み込んだプログラムに基づいて、RAM630と、内部記憶装置640と、IOC650と、NIC680とを制御する。そして、CPU610を含むコンピュータは、これらの構成を制御して、図1に示す、受信部110と、識別リスク計算部120と、特定リスク計算部130としての各機能を実現する。さらに、CPU610を含むコンピュータは、これらの構成を制御して、図4に示す全体リスク計算部240としての機能を実現してもよい。 CPU 610 reads a program from ROM 620. The CPU 610 controls the RAM 630, the internal storage device 640, the IOC 650, and the NIC 680 based on the read program. The computer including the CPU 610 controls these configurations to realize the functions as the reception unit 110, the identification risk calculation unit 120, and the specific risk calculation unit 130 illustrated in FIG. Further, the computer including the CPU 610 may control these configurations to realize the function as the overall risk calculation unit 240 shown in FIG.
 CPU610は、各機能を実現する際に、RAM630又は内部記憶装置640を、プログラムの一時記憶媒体として使用してもよい。 The CPU 610 may use the RAM 630 or the internal storage device 640 as a temporary storage medium for the program when realizing each function.
 また、CPU610は、コンピュータで読み取り可能にプログラムを記憶した記憶媒体700が含むプログラムを、図示しない記憶媒体読み取り装置を用いて読み込んでもよい。あるいは、CPU610は、NIC680を介して、図示しない外部の装置からプログラムを受け取り、RAM630に保存して、保存したプログラムを基に動作してもよい。 Further, the CPU 610 may read a program included in the storage medium 700 storing the program so as to be readable by a computer by using a storage medium reading device (not shown). Alternatively, the CPU 610 may receive a program from an external device (not shown) via the NIC 680, store the program in the RAM 630, and operate based on the stored program.
 ROM620は、CPU610が実行するプログラム及び固定的なデータを記憶する。ROM620は、例えば、P-ROM(Programmable-ROM)又はフラッシュROMである。 ROM 620 stores programs executed by CPU 610 and fixed data. The ROM 620 is, for example, a P-ROM (Programmable-ROM) or a flash ROM.
 RAM630は、CPU610が実行するプログラム及びデータを一時的に記憶する。RAM630は、例えば、D-RAM(Dynamic-RAM)である。 The RAM 630 temporarily stores programs executed by the CPU 610 and data. The RAM 630 is, for example, a D-RAM (Dynamic-RAM).
 内部記憶装置640は、情報処理装置600が長期的に保存するデータ及びプログラムを記憶する。また、内部記憶装置640は、CPU610の一時記憶装置として動作してもよい。内部記憶装置640は、例えば、ハードディスク装置、光磁気ディスク装置、SSD(Solid State Drive)又はディスクアレイ装置である。内部記憶装置640は、パーソナルデータ保存部200として動作してもよい。 The internal storage device 640 stores data and programs stored in the information processing device 600 for a long period of time. Further, the internal storage device 640 may operate as a temporary storage device for the CPU 610. The internal storage device 640 is, for example, a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), or a disk array device. The internal storage device 640 may operate as the personal data storage unit 200.
 ここで、ROM620と内部記憶装置640は、不揮発性の記憶媒体である。一方、RAM630は、揮発性の記憶媒体である。そして、CPU610は、ROM620、内部記憶装置640、又は、RAM630に記憶されているプログラムを基に動作可能である。つまり、CPU610は、不揮発性記憶媒体又は揮発性記憶媒体を用いて動作可能である。 Here, the ROM 620 and the internal storage device 640 are nonvolatile storage media. On the other hand, the RAM 630 is a volatile storage medium. The CPU 610 can operate based on a program stored in the ROM 620, the internal storage device 640, or the RAM 630. That is, the CPU 610 can operate using a nonvolatile storage medium or a volatile storage medium.
 IOC650は、CPU610と、入力機器660及び表示機器670とのデータを仲介する。IOC650は、例えば、IOインターフェースカード又はUSB(Universal Serial Bus)カードである。 The IOC 650 mediates data between the CPU 610, the input device 660, and the display device 670. The IOC 650 is, for example, an IO interface card or a USB (Universal Serial Bus) card.
 入力機器660は、情報処理装置600の操作者からの入力指示を受け取る機器である。入力機器660は、例えば、キーボード、マウス又はタッチパネルである。 The input device 660 is a device that receives an input instruction from an operator of the information processing apparatus 600. The input device 660 is, for example, a keyboard, a mouse, or a touch panel.
 表示機器670は、情報処理装置600の操作者に情報を表示する機器である。表示機器670は、例えば、液晶ディスプレイである。 The display device 670 is a device that displays information to the operator of the information processing apparatus 600. The display device 670 is a liquid crystal display, for example.
 NIC680は、ネットワークを介した図示しない外部の装置とのデータのやり取りを中継する。NIC680は、例えば、LAN(Local Area Network)カードである。 The NIC 680 relays data exchange with an external device (not shown) via the network. The NIC 680 is, for example, a LAN (Local Area Network) card.
 このように構成された情報処理装置600は、情報処理装置100と同様の効果を奏することができる。 The information processing apparatus 600 configured as described above can achieve the same effects as the information processing apparatus 100.
 その理由は、情報処理装置600のCPU610が、プログラムに基づいて情報処理装置100と同様の機能を実現できるためである。 This is because the CPU 610 of the information processing apparatus 600 can realize the same function as the information processing apparatus 100 based on the program.
 <第2の実施の形態>
 第2の実施形態について図面を参照して説明する。
<Second Embodiment>
A second embodiment will be described with reference to the drawings.
 第2の実施形態に係る情報処理装置101は、準識別子の属性に応じて特定個人到達率を決定する点が、第1の実施形態の情報処理装置100と異なる。本実施形態は、複数の攻撃者において、準識別子を知っている可能性が異なる場合に、対応できる。 The information processing apparatus 101 according to the second embodiment is different from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is determined according to the attribute of the quasi-identifier. This embodiment can cope with a case where a plurality of attackers have different possibilities of knowing the quasi-identifier.
[構成の説明]
 図面を参照して、第2の実施形態に係る情報処理装置101の構成について説明する。
[Description of configuration]
The configuration of the information processing apparatus 101 according to the second embodiment will be described with reference to the drawings.
 図7は、第2の実施形態に係る情報処理装置101の構成の一例を示すブロック図である。なお、図面中の矢印の方向は、一例を示すものであり、信号の向きを限定するものではない。図7に示すように、情報処理装置101は、情報処理装置100の特定個人到達率を受信する受信部110に換えて、取得部(第1の取得部)111を含む。さらに、情報処理装置101は、保存部211を含む。なお、情報処理装置101は、図17に示すコンピュータを用いて構成されてもよい。また、情報処理装置101は、保存部211を、ネットワークを介して接続する外部の装置としてもよい。また、取得部111は、以下で説明する保存部211が保存する情報を、第1の実施形態と同様に、図示しない外部の装置から受信してもよい。この場合、情報処理装置101は、保存部211を含まなくてもよい。 FIG. 7 is a block diagram illustrating an example of the configuration of the information processing apparatus 101 according to the second embodiment. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal. As illustrated in FIG. 7, the information processing apparatus 101 includes an acquisition unit (first acquisition unit) 111 instead of the reception unit 110 that receives the specific individual arrival rate of the information processing apparatus 100. Further, the information processing apparatus 101 includes a storage unit 211. The information processing apparatus 101 may be configured using a computer shown in FIG. Further, the information processing apparatus 101 may use the storage unit 211 as an external apparatus connected via a network. Further, the acquisition unit 111 may receive information stored in the storage unit 211 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 101 may not include the storage unit 211.
 保存部211は、準識別子又は準識別子の組合せと、それに対応する特定個人到達率とを、関連付けて保存する。 The storage unit 211 stores the quasi-identifier or combination of quasi-identifiers and the corresponding specific individual arrival rate in association with each other.
 図8は、保存部211が保存するデータの一例を示す図である。図8に示すデータにおいて、例えば、準識別子である年齢と性別との組合せに対応する特定個人到達率は、0.3である。一方、年齢に対する特定個人到達率は、その値より高い0.6である。これは、準識別子の組合せの値を取得できる可能性が、組合せに含まれる一つの準識別子の値を取得できる可能性より低いためである。以下の本実施形態の説明は、図8のデータを用いて説明する。 FIG. 8 is a diagram illustrating an example of data stored by the storage unit 211. In the data shown in FIG. 8, for example, the specific individual arrival rate corresponding to the combination of age and sex, which are quasi-identifiers, is 0.3. On the other hand, the specific individual arrival rate with respect to age is 0.6, which is higher than that value. This is because the possibility that the value of the combination of quasi-identifiers can be acquired is lower than the possibility that the value of one quasi-identifier included in the combination can be acquired. The following description of the present embodiment will be described using the data of FIG.
 取得部111は、保存部211から、準識別子又は準識別子の組合せに対応する特定個人到達率を取得する。なお、準識別子の組合せの特定個人到達率を取得する場合、取得部111は、組合せに含まれる準識別子の特定個人到達率の積を、その準識別子の組合せの特定個人到達率としてもよい。 The acquisition unit 111 acquires the specific individual arrival rate corresponding to the quasi-identifier or the combination of quasi-identifiers from the storage unit 211. When acquiring the specific individual arrival rate of the combination of quasi-identifiers, the acquiring unit 111 may use the product of the specific individual arrival rates of the quasi-identifiers included in the combination as the specific individual arrival rate of the combination of the quasi-identifiers.
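For illustration only, the following Python sketch shows one way the lookup described for the acquisition unit 111 could be organized. The table values mirror the example of FIG. 8 (0.3 for the combination of age and gender, 0.6 for age alone); the rate for gender alone, the function names, and the data structure are assumptions introduced here, not part of the embodiment's definition.

```python
# Sketch of the lookup performed by the acquisition unit 111 (illustrative only).
REACH_RATE_TABLE = {
    ("age",): 0.6,               # value taken from the FIG. 8 example
    ("gender",): 0.6,            # assumed value, not given in the text
    ("age", "gender"): 0.3,      # stored combinations have lower rates
}

def reach_rate(quasi_identifiers):
    """Return the specific individual arrival rate for a quasi-identifier
    or a combination of quasi-identifiers."""
    key = tuple(sorted(quasi_identifiers))
    if key in REACH_RATE_TABLE:
        return REACH_RATE_TABLE[key]
    # Fallback mentioned in the text: use the product of the rates of the
    # individual quasi-identifiers contained in the combination.
    rate = 1.0
    for qi in key:
        rate *= REACH_RATE_TABLE[(qi,)]
    return rate

print(reach_rate(["age", "gender"]))   # -> 0.3 (stored combination)
print(reach_rate(["age"]))             # -> 0.6
```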
[動作の説明]
 次に、情報処理装置101の動作について、図3を用いて説明した第1の実施形態の動作と異なる動作を中心に説明する。
[Description of operation]
Next, an operation of the information processing apparatus 101 will be described focusing on an operation different from the operation of the first embodiment described with reference to FIG.
　情報処理装置101は、第1の実施形態と同様に、ステップS101及びステップS102を実行する。 The information processing apparatus 101 executes step S101 and step S102 as in the first embodiment.
 そして、情報処理装置101は、ステップS103に換えて、下記の動作を実行する。 Then, the information processing apparatus 101 executes the following operation instead of step S103.
 特定リスク計算部130は、識別リスク計算部120から識別リスクを受信する。そして、特定リスク計算部130は、取得部111に対して、指定された個人の準識別子又は準識別子の組合せの情報を渡し、特定個人到達率の取得を依頼する。 The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. Then, the specific risk calculation unit 130 passes the information on the specified individual semi-identifier or combination of semi-identifiers to the acquisition unit 111 and requests acquisition of the specific individual arrival rate.
 取得部111は、保存部211から、依頼された準識別子又は準識別子の組合せに対応する特定個人到達率を取得する。そして、取得部111は、特定個人到達率を、特定リスク計算部130に返信する。 The acquisition unit 111 acquires the specific individual arrival rate corresponding to the requested quasi-identifier or combination of quasi-identifiers from the storage unit 211. Then, the acquisition unit 111 returns the specific individual arrival rate to the specific risk calculation unit 130.
 例えば、匿名化対象の準識別子が、年齢及び性別の場合、取得部111は、特定個人到達率として、0.3を取得し、特定リスク計算部130に送信する。 For example, when the anonymization target quasi-identifier is age and gender, the acquisition unit 111 acquires 0.3 as the specific individual reach and transmits it to the specific risk calculation unit 130.
 特定リスク計算部130は、第1の実施形態のステップS104と同様に動作して、特定リスクを計算する。 The specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
[効果の説明]
 次に、第2の実施形態の効果について説明する。
[Description of effects]
Next, effects of the second embodiment will be described.
 第2の実施形態に係る情報処理装置101は、第1の実施形態の効果に加え、準識別子の属性に応じて特定リスクを計算できるとの効果を奏することができる。すなわち、第2の実施形態は、攻撃者が知っている可能性がある準識別子が異なる場合に対応できるとの効果を奏することができる。 In addition to the effect of the first embodiment, the information processing apparatus 101 according to the second embodiment can produce an effect that a specific risk can be calculated according to the attribute of the quasi-identifier. That is, the second embodiment can produce an effect that it can cope with a case where the quasi-identifiers that the attacker may know are different.
 その理由は、次のとおりである。 The reason is as follows.
 第1の取得部111が、準識別子又は準識別子の組合せに対応した特定個人到達率を取得する。そして、特定リスク計算部130が、準識別子又は準識別子の組合せに対応した特定個人到達率を基に、特定リスクを計算するためである。 The first acquisition unit 111 acquires a specific individual arrival rate corresponding to a quasi-identifier or a combination of quasi-identifiers. This is because the specific risk calculation unit 130 calculates the specific risk based on the specific individual arrival rate corresponding to the quasi-identifier or the combination of quasi-identifiers.
 <第3の実施の形態>
 第3の実施形態について図面を参照して説明する。
<Third Embodiment>
A third embodiment will be described with reference to the drawings.
 第3の実施形態に係る情報処理装置102は、準識別子の属性に対する条件の組合せに応じて特定個人到達率を決定する点が、第1の実施形態の情報処理装置100と異なる。 The information processing apparatus 102 according to the third embodiment differs from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is determined according to a combination of conditions for the attributes of the quasi-identifier.
[構成の説明]
 図面を参照して、第3の実施形態に係る情報処理装置102の構成について説明する。
[Description of configuration]
The configuration of the information processing apparatus 102 according to the third embodiment will be described with reference to the drawings.
 図9は、第3の実施形態に係る情報処理装置102の構成の一例を示すブロック図である。なお、図面中の矢印の方向は、一例を示すものであり、信号の向きを限定するものではない。図9に示すように、情報処理装置102は、情報処理装置100の受信部110に換えて取得部112を含む。以下、取得部112を、受信部110と区別するため、第2の取得部と呼ぶ場合もある。さらに、情報処理装置102は、保存部212を含む。なお、情報処理装置102は、図17に示すコンピュータを用いて構成されてもよい。また、情報処理装置102は、保存部212を、ネットワークを介して接続する外部の装置としてもよい。また、取得部112は、以下で説明する保存部212が保存する情報を、第1の実施形態と同様に、図示しない外部の装置から受信してもよい。この場合、情報処理装置102は、保存部212を含まなくてもよい。 FIG. 9 is a block diagram illustrating an example of the configuration of the information processing apparatus 102 according to the third embodiment. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal. As illustrated in FIG. 9, the information processing apparatus 102 includes an acquisition unit 112 instead of the reception unit 110 of the information processing apparatus 100. Hereinafter, in order to distinguish the acquisition unit 112 from the reception unit 110, the acquisition unit 112 may be referred to as a second acquisition unit. Further, the information processing apparatus 102 includes a storage unit 212. The information processing apparatus 102 may be configured using a computer shown in FIG. Further, the information processing apparatus 102 may use the storage unit 212 as an external apparatus connected via a network. Further, the acquisition unit 112 may receive information stored in the storage unit 212 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 102 may not include the storage unit 212.
　保存部212は、条件となる第1の属性(第1の属性名)と、特定個人到達率を設定する第2の属性(属性名)と、設定される特定個人到達率を算出する関数(例えば、条件式及び計算式の組合せ)とを関連付けて保存する。なお、第1及び第2の属性は、複数の属性の組合せでもよい。 The storage unit 212 stores, in association with one another, a first attribute (first attribute name) serving as a condition, a second attribute (attribute name) for which the specific individual arrival rate is set, and a function (for example, a combination of a conditional expression and a calculation formula) for calculating the specific individual arrival rate to be set. The first and second attributes may each be a combination of a plurality of attributes.
 図10は、保存部212が保存するデータの一例を示す図である。図10に示すデータにおいて、判断に用いる属性が、上記の第1の属性であり、特定個人到達率を設定する属性が、第2の属性である。また、図10において設定される特定個人到達率が、設定される特定個人到達率を算出する関数である。図10に示す関数によれば、例えば、属性名(性別)の属性値(男性)の場合、属性(年齢)に対する特定個人到達率は、0.2である。また、その関数によれば、属性名(性別)の属性値(女性)の場合、属性(年齢)に対する特定個人到達率は、0.1である。以下の本実施形態の説明は、図10のデータを用いて説明する。 FIG. 10 is a diagram illustrating an example of data stored by the storage unit 212. In the data shown in FIG. 10, the attribute used for the determination is the first attribute, and the attribute for setting the specific individual arrival rate is the second attribute. Further, the specific individual arrival rate set in FIG. 10 is a function for calculating the set specific individual arrival rate. According to the function shown in FIG. 10, for example, in the case of the attribute value (male) of the attribute name (gender), the specific individual arrival rate for the attribute (age) is 0.2. Further, according to the function, in the case of the attribute value (female) of the attribute name (gender), the specific individual arrival rate for the attribute (age) is 0.1. The following description of the present embodiment will be described using the data of FIG.
　取得部112は、指定された属性に対応した特定個人到達率を取得する。取得部112は、特定個人到達率を取得する際に、必要に応じて、パーソナルデータ保存部200からパーソナルデータを取得してもよい。なお、準識別子の組合せの特定個人到達率を取得する場合、取得部112は、組合せに含まれる準識別子の特定個人到達率の積を、その準識別子の組合せの特定個人到達率としてもよい。あるいは、準識別子の組合せの特定個人到達率を取得する場合、取得部112は、準識別子の組合せの特定個人到達率として、組合せに含まれる準識別子の特定個人到達率の中で、最小値又は最大値となっている特定個人到達率を選択してもよい。 The acquisition unit 112 acquires the specific individual arrival rate corresponding to the specified attribute. When doing so, the acquisition unit 112 may acquire personal data from the personal data storage unit 200 as necessary. When acquiring the specific individual arrival rate of a combination of quasi-identifiers, the acquisition unit 112 may use the product of the specific individual arrival rates of the quasi-identifiers included in the combination as the specific individual arrival rate of the combination. Alternatively, the acquisition unit 112 may select, as the specific individual arrival rate of the combination, the minimum or maximum value among the specific individual arrival rates of the quasi-identifiers included in the combination.
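As an informal illustration, the conditional lookup and the combining rules described for the acquisition unit 112 could be sketched as follows. The rule table reflects the example of FIG. 10 (gender "male": 0.2 for age; gender "female": 0.1 for age); the default rate, the record format, and the function names are assumptions made only for this sketch.

```python
# Sketch of the conditional lookup performed by the acquisition unit 112
# (illustrative only; rules follow the FIG. 10 example described in the text).
RULES = [
    # (condition attribute, condition value, target attribute, reach rate)
    ("gender", "male",   "age", 0.2),
    ("gender", "female", "age", 0.1),
]

DEFAULT_RATE = 0.5  # assumed fallback when no rule matches

def conditional_reach_rate(record, target_attribute):
    """Return the reach rate of `target_attribute` for one person's record
    (a dict of attribute name -> attribute value)."""
    for cond_attr, cond_value, target, rate in RULES:
        if target == target_attribute and record.get(cond_attr) == cond_value:
            return rate
    return DEFAULT_RATE

def combined_reach_rate(record, target_attributes, mode="product"):
    """Combine per-attribute rates for a combination of quasi-identifiers.
    The text allows the product, the minimum or the maximum."""
    rates = [conditional_reach_rate(record, a) for a in target_attributes]
    if mode == "min":
        return min(rates)
    if mode == "max":
        return max(rates)
    result = 1.0
    for r in rates:
        result *= r
    return result
```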
[動作の説明]
　次に、情報処理装置102の動作について、図3を用いて説明した第1の実施形態の動作と異なる動作を中心に説明する。
[Description of operation]
Next, an operation of the information processing apparatus 102 will be described focusing on an operation different from the operation of the first embodiment described with reference to FIG.
　情報処理装置102は、第1の実施形態と同様に、ステップS101及びステップS102を実行する。 The information processing apparatus 102 executes step S101 and step S102 as in the first embodiment.
 そして、情報処理装置102は、ステップS103に換えて、下記の動作を実行する。 Then, the information processing apparatus 102 executes the following operation instead of step S103.
 特定リスク計算部130は、識別リスク計算部120から識別リスクを受信する。そして、特定リスク計算部130は、取得部112に対して、指定された個人の準識別子又は準識別子の組合せの情報を渡し、特定個人到達率の取得を依頼する。 The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. The specific risk calculation unit 130 then passes the information on the specified individual semi-identifier or combination of semi-identifiers to the acquisition unit 112 and requests acquisition of the specific individual reach.
 取得部112は、保存部212を参照し、指定された個人の準識別子の属性名と属性値に対応する特定個人到達率を取得する。そして、取得部112は、特定個人到達率を、特定リスク計算部130に返信する。 The obtaining unit 112 refers to the storage unit 212 and obtains the specific individual arrival rate corresponding to the attribute name and attribute value of the designated individual semi-identifier. Then, the acquisition unit 112 returns the specific individual arrival rate to the specific risk calculation unit 130.
 なお、特定リスク計算部130は、取得部112に、指定された個人の属性名と属性値とを送信してもよい。あるいは、特定リスク計算部130は、取得部112に、指定された個人の識別子、又は、指定された個人の属性名を送信してもよい。この場合、取得部112は、パーソナルデータ保存部200を参照して、判断に必要なデータを取得する。 Note that the specific risk calculation unit 130 may transmit the specified individual attribute name and attribute value to the acquisition unit 112. Alternatively, the specific risk calculation unit 130 may transmit the specified individual identifier or the specified individual attribute name to the acquisition unit 112. In this case, the acquisition unit 112 refers to the personal data storage unit 200 and acquires data necessary for the determination.
　例えば、特定リスク計算部130が、user1と属性名(年齢)とを送信したとする。この場合、取得部112は、パーソナルデータ保存部200のデータ(図2に示すデータ)から、user1の属性名(年齢)の属性値(20)を取得する。そして、取得部112は、保存部212に保存されているデータ(図10に示すデータ)を基に、特定個人到達率として、0.3を取得する。 For example, it is assumed that the specific risk calculation unit 130 transmits user1 and the attribute name (age). In this case, the acquisition unit 112 acquires the attribute value (20) of the attribute name (age) of user1 from the data in the personal data storage unit 200 (the data shown in FIG. 2). Then, the acquisition unit 112 acquires 0.3 as the specific individual arrival rate based on the data stored in the storage unit 212 (the data shown in FIG. 10).
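A possible usage of the sketch above, following the flow of the user1 example: the attribute value is first retrieved from the personal data storage unit 200, then the matching rule is applied. The stored record is an assumption made here; the value 0.3 quoted in the text presumably comes from a rule of FIG. 10 that is not reproduced in the sketch.

```python
# Illustrative flow for the user1 example (record contents are assumed).
personal_data_store = {"user1": {"age": 20, "gender": "female"}}

record = personal_data_store["user1"]
print(conditional_reach_rate(record, "age"))  # -> 0.1 with the rules sketched above
```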
 特定リスク計算部130は、第1の実施形態のステップS104と同様に動作して、特定リスクを計算する。 The specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
[効果の説明]
 次に、第3の実施形態の効果について説明する。
[Description of effects]
Next, effects of the third embodiment will be described.
 第3の実施形態に係る情報処理装置102は、第1の実施形態の効果に加え、属性名及び属性値に応じて特定個人到達率を決定できるとの効果を奏することができる。すなわち、第3の実施形態は、より細かな条件に対応した適切な特定リスクを計算できるとの効果を奏することができる。 In addition to the effects of the first embodiment, the information processing apparatus 102 according to the third embodiment can achieve the effect that the specific individual arrival rate can be determined according to the attribute name and the attribute value. That is, the third embodiment can produce an effect that an appropriate specific risk corresponding to a finer condition can be calculated.
 その理由は、次のとおりである。 The reason is as follows.
 第2の取得部112が、準識別子である属性値又は属性値の組合せに対応して設定された条件を基に特定個人到達率を取得する。そして、特定リスク計算部130が、条件に対応した特定個人到達率を基に、特定リスクを計算するためである。 The second acquisition unit 112 acquires the specific individual arrival rate based on the condition set corresponding to the attribute value or the combination of attribute values that are quasi-identifiers. This is because the specific risk calculation unit 130 calculates the specific risk based on the specific individual arrival rate corresponding to the condition.
 <第4の実施の形態>
 第4の実施形態について図面を参照して説明する。
<Fourth embodiment>
A fourth embodiment will be described with reference to the drawings.
 第4の実施形態に係る情報処理装置103は、個人識別リスクに応じて特定個人到達率を決定する点が、第1の実施形態に係る情報処理装置100と異なる。 The information processing apparatus 103 according to the fourth embodiment is different from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is determined according to the individual identification risk.
[構成の説明]
 図面を参照して、第4の実施形態に係る情報処理装置103の構成について説明する。
[Description of configuration]
The configuration of the information processing apparatus 103 according to the fourth embodiment will be described with reference to the drawings.
 図11は、第4の実施形態に係る情報処理装置103の構成の一例を示すブロック図である。なお、図面中の矢印の方向は、一例を示すものであり、信号の向きを限定するものではない。図11に示すように、情報処理装置103は、情報処理装置100の受信部110に換えて取得部113を含む。以下、取得部113を、取得部112などと区別するため、第3の取得部と呼ぶ場合もある。さらに、情報処理装置103は、保存部213を含む。なお、情報処理装置103は、図17に示すコンピュータを用いて構成されてもよい。また、情報処理装置103は、保存部213を、ネットワークを介して接続する外部の装置としてもよい。また、取得部113は、以下で説明する保存部213が保存する情報を、第1の実施形態と同様に、図示しない外部の装置から受信してもよい。この場合、情報処理装置103は、保存部213を含まなくてもよい。 FIG. 11 is a block diagram illustrating an example of the configuration of the information processing apparatus 103 according to the fourth embodiment. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal. As illustrated in FIG. 11, the information processing apparatus 103 includes an acquisition unit 113 instead of the reception unit 110 of the information processing apparatus 100. Hereinafter, the acquisition unit 113 may be referred to as a third acquisition unit in order to distinguish it from the acquisition unit 112 or the like. Further, the information processing apparatus 103 includes a storage unit 213. The information processing apparatus 103 may be configured using a computer shown in FIG. Further, the information processing apparatus 103 may use the storage unit 213 as an external apparatus connected via a network. Further, the acquisition unit 113 may receive information stored in the storage unit 213 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 103 may not include the storage unit 213.
 保存部213は、識別リスクと特定個人到達率とを関連付けて保存する。 The storage unit 213 stores the identification risk and the specific individual arrival rate in association with each other.
　図12は、保存部213が保存するデータの一例を示す図である。特定個人到達率を高くすることは、特定リスクを高くすることである。そして、特定リスクを用いた匿名化処理は、特定リスクが高いほど、高い匿名性となるように、つまり、特定されにくいように匿名化を実行する。そのため、図12に示す特定個人到達率は、識別リスク(1/m)が大きいほど、特定個人到達率が高くなっている。これは、上記のとおり、稀な属性(識別リスクが高い属性)ほど、特定されにくく(特定個人到達率を高く)するためである。ただし、これは、本実施形態の一例である。本実施形態は、このようなデータに限る必要はない。なお、以下の本実施形態の説明は、図12のデータを用いて説明する。 FIG. 12 is a diagram illustrating an example of data stored by the storage unit 213. Increasing the specific individual arrival rate means increasing the specific risk. Anonymization processing using the specific risk performs anonymization so that the higher the specific risk is, the higher the resulting anonymity becomes, that is, the harder the individual is to specify. Therefore, in the data shown in FIG. 12, the larger the identification risk (1/m) is, the higher the specific individual arrival rate is. This is because, as described above, a rarer attribute (an attribute with a higher identification risk) should be made harder to specify, which is achieved by setting a higher specific individual arrival rate. However, this is merely an example of this embodiment, and this embodiment is not limited to such data. The following description of this embodiment uses the data of FIG. 12.
 取得部113は、識別リスクを基に、特定個人到達率を取得する。 The acquisition unit 113 acquires the specific individual arrival rate based on the identification risk.
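The mapping from identification risk to specific individual arrival rate performed by the acquisition unit 113 could, for illustration, be held as a simple threshold table. Only the monotonic tendency of FIG. 12 and the example given later in the operation description (an identification risk of 0.6 yielding a rate of 0.8) come from the text; the remaining thresholds and the function name are assumptions.

```python
# Sketch of the lookup performed by the acquisition unit 113 (illustrative only).
RISK_TO_REACH_RATE = [
    # (identification risk is at least ..., reach rate)
    (0.8, 0.9),
    (0.6, 0.8),   # consistent with the example in the text
    (0.4, 0.6),
    (0.0, 0.4),
]

def reach_rate_from_identification_risk(identification_risk):
    """Return the specific individual arrival rate associated with an
    identification risk (1/m); higher risk maps to a higher rate."""
    for threshold, rate in RISK_TO_REACH_RATE:
        if identification_risk >= threshold:
            return rate
    return RISK_TO_REACH_RATE[-1][1]

print(reach_rate_from_identification_risk(0.6))  # -> 0.8, as in the text's example
```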
[動作の説明]
 次に、情報処理装置103の動作について、図3を用いて説明した第1の実施形態の動作と異なる動作を中心に説明する。
[Description of operation]
Next, the operation of the information processing apparatus 103 will be described focusing on the operation different from the operation of the first embodiment described with reference to FIG.
　情報処理装置103は、第1の実施形態と同様に、ステップS101及びステップS102を実行する。 The information processing apparatus 103 executes step S101 and step S102 as in the first embodiment.
 そして、情報処理装置103は、ステップS103に換えて、下記の動作を実行する。 Then, the information processing apparatus 103 performs the following operation instead of step S103.
 特定リスク計算部130は、識別リスク計算部120から識別リスクを受信する。そして、特定リスク計算部130は、取得部113に対して、指定された個人の識別リスク条件の準識別子の情報を渡し、特定個人到達率の取得を依頼する。 The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. Then, the specific risk calculation unit 130 passes the quasi-identifier information of the specified individual identification risk condition to the acquisition unit 113 and requests acquisition of the specific individual arrival rate.
 取得部113は、識別リスク計算部120から個人の識別の可能性(識別リスク)を取得する。その後、取得部113は、保存部213を参照し、特定個人到達率を取得する。 The acquisition unit 113 acquires the possibility of individual identification (identification risk) from the identification risk calculation unit 120. Thereafter, the acquisition unit 113 refers to the storage unit 213 and acquires the specific individual arrival rate.
 例えば、識別リスクが0.6の場合、取得部113は、保存部213(図12に示すデータ)を参照して、特定個人到達率として0.8を取得する。 For example, when the identification risk is 0.6, the acquisition unit 113 refers to the storage unit 213 (data illustrated in FIG. 12) and acquires 0.8 as the specific individual arrival rate.
 そして、取得部113は、特定個人到達率を、特定リスク計算部130に返信する。 Then, the acquisition unit 113 returns the specific individual arrival rate to the specific risk calculation unit 130.
 特定リスク計算部130は、第1の実施形態のステップS104と同様に動作して、特定リスクを計算する。 The specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
[効果の説明]
 次に、第4の実施形態の効果について説明する。
[Description of effects]
Next, the effect of the fourth embodiment will be described.
 第4の実施形態に係る情報処理装置103は、第1の実施形態の効果に加え、識別リスクに応じて適切な特定リスクを計算できるとの効果を奏することができる。 In addition to the effect of the first embodiment, the information processing apparatus 103 according to the fourth embodiment can produce an effect that an appropriate specific risk can be calculated according to the identification risk.
 その理由は、次の通りである。 The reason is as follows.
 第3の取得部113が、識別リスクを考慮して、特定個人到達率を取得する。そして、特定リスク計算部130が、その特定個人到達率を基に、特定リスクを計算するためである。 The third acquisition unit 113 acquires the specific individual arrival rate in consideration of the identification risk. Then, the specific risk calculation unit 130 calculates the specific risk based on the specific individual arrival rate.
　例えば、誕生日が準識別子であるとする。この場合、うるう年の2月29日が誕生日の人は、他の日が誕生日の人に比べ、準識別子(誕生日)を基に個人を識別されやすい。そのため、2月29日が誕生日の人は、他の人に比べ誕生日(準識別子)を隠す必要性が高い。つまり、準識別子が誕生日の場合、2月29日が誕生日の人における識別リスクは、他の人における識別リスクより高い。そのような場合でも、本実施形態は、識別リスクに応じて適切な特定個人到達率を取得し、特定リスクを計算できる。 For example, assume that the birthday is a quasi-identifier. In this case, a person whose birthday is February 29 of a leap year is more easily identified based on the quasi-identifier (birthday) than a person whose birthday falls on another day. Therefore, a person whose birthday is February 29 has a greater need to hide the birthday (quasi-identifier) than other people. That is, when the quasi-identifier is the birthday, the identification risk for a person whose birthday is February 29 is higher than the identification risk for other people. Even in such a case, this embodiment can acquire an appropriate specific individual arrival rate according to the identification risk and calculate the specific risk.
 <第5の実施の形態>
 第5の実施形態について図面を参照して説明する。
<Fifth embodiment>
A fifth embodiment will be described with reference to the drawings.
　第5の実施形態に係る情報処理装置104は、パーソナルデータの提供先の組織(又は、攻撃者となる可能性のある組織)に応じて特定個人到達率を変える点が、第1の実施形態の情報処理装置100と異なる。これは、パーソナルデータの提供先の組織(相手)は、提供したパーソナルデータに対する攻撃者となる可能性があるためである。そして、それぞれの組織は、攻撃者として異なるリスクを備えるためである。 The information processing apparatus 104 according to the fifth embodiment differs from the information processing apparatus 100 of the first embodiment in that the specific individual arrival rate is changed according to the organization to which the personal data is provided (an organization that may become an attacker). This is because the organization (partner) to which the personal data is provided may become an attacker against the provided personal data, and each such organization carries a different risk as an attacker.
[構成の説明]
 図面を参照して、第5の実施形態に係る情報処理装置104の構成について説明する。
[Description of configuration]
The configuration of the information processing apparatus 104 according to the fifth embodiment will be described with reference to the drawings.
 図13は、第5の実施形態に係る情報処理装置104の構成の一例を示すブロック図である。なお、図面中の矢印の方向は、一例を示すものであり、信号の向きを限定するものではない。図13に示すように、情報処理装置104は、情報処理装置100の受信部110に換えて、取得部114を含む。以下、取得部114を、取得部112などと区別する場合、第4の取得部と呼ぶ。さらに、情報処理装置104は、保存部214を含む。なお、情報処理装置104は、図17に示すコンピュータを用いて構成されてもよい。また、情報処理装置104は、保存部214を、ネットワークを介して接続する外部の装置としてもよい。また、取得部114は、以下で説明する保存部214が保存する情報を、第1の実施形態と同様に、図示しない外部の装置から受信してもよい。この場合、情報処理装置104は、保存部214を含まなくてもよい。 FIG. 13 is a block diagram illustrating an example of the configuration of the information processing apparatus 104 according to the fifth embodiment. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal. As illustrated in FIG. 13, the information processing device 104 includes an acquisition unit 114 instead of the reception unit 110 of the information processing device 100. Hereinafter, when the acquisition unit 114 is distinguished from the acquisition unit 112 or the like, it is referred to as a fourth acquisition unit. Further, the information processing apparatus 104 includes a storage unit 214. The information processing apparatus 104 may be configured using a computer shown in FIG. Further, the information processing apparatus 104 may use the storage unit 214 as an external apparatus connected via a network. Further, the acquisition unit 114 may receive information stored in the storage unit 214 described below from an external device (not shown) as in the first embodiment. In this case, the information processing apparatus 104 may not include the storage unit 214.
 保存部214は、情報の提供先と、提供先に応じた特定個人到達率とを関連付けて保存する。 The storage unit 214 stores the information providing destination in association with the specific individual arrival rate corresponding to the providing destination.
 図14は、保存部214が保存するデータの一例を示す図である。例えば、人数が多い組織は、対象となる個人を知っている者が含まれる可能性が高い。ここで、例えば、組織Bの会員数が、組織Aの会員数より多いとする。その場合、組織Bにおける特定個人到達率は、組織Aにおける特定個人到達率より大きいことが必要となる。そこで、図14において、提供先の組織Bにおける特定個人到達率は、組織Aにおける特定個人到達率より大きな値となっている。なお、保存部214は、図14に示すように、提供先として保存する組織として複数の種類(例えば、図14に示す組織と業種)を含んでもよい。以下の本実施形態の説明は、図14のデータを用いて説明する。 FIG. 14 is a diagram illustrating an example of data stored by the storage unit 214. For example, an organization with a large number of people is likely to include a person who knows the target individual. Here, for example, it is assumed that the number of members of the organization B is larger than the number of members of the organization A. In that case, the specific individual arrival rate in the organization B needs to be larger than the specific individual arrival rate in the organization A. Therefore, in FIG. 14, the specific individual arrival rate in the organization B of the provision destination is a value larger than the specific individual arrival rate in the organization A. As illustrated in FIG. 14, the storage unit 214 may include a plurality of types (for example, the organization and the business type illustrated in FIG. 14) as the organization to be stored as a providing destination. The following description of the present embodiment will be described using the data of FIG.
 取得部114は、保存部214から、特定個人到達率を取得する。なお、準識別子の組合せの特定個人到達率を取得する場合、取得部114は、組合せに含まれる準識別子の特定個人到達率の積を、その準識別子の組合せの特定個人到達率としてもよい。あるいは、取得部114は、保存部214から、提供先に関する情報(例えば、会員数)を取得し、その情報を基に提供先に応じた特定個人到達率を計算してもよい。 The acquisition unit 114 acquires the specific individual arrival rate from the storage unit 214. When acquiring the specific individual arrival rate of the combination of quasi-identifiers, the acquiring unit 114 may use the product of the specific individual arrival rates of the quasi-identifiers included in the combination as the specific individual arrival rate of the combination of the quasi-identifiers. Or the acquisition part 114 may acquire the information (for example, the number of members) regarding a provision destination from the preservation | save part 214, and may calculate the specific individual arrival rate according to a provision destination based on the information.
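A minimal sketch of the destination-dependent lookup described for the acquisition unit 114. The value 0.5 for organization B and the gender attribute follows the example given later in the operation description; the value for organization A, the member-count fallback, and the function name are assumptions made only for this sketch.

```python
# Sketch of the lookup performed by the acquisition unit 114 (illustrative only).
DESTINATION_REACH_RATE = {
    ("organization A", "gender"): 0.3,   # assumed value
    ("organization B", "gender"): 0.5,   # value from the text's example
}

def destination_reach_rate(destination, attribute, member_count=None, population=None):
    """Return the reach rate for a providing destination and attribute. If no
    entry is stored, optionally derive a rate from the destination's member
    count, as the text suggests (e.g. members / population)."""
    key = (destination, attribute)
    if key in DESTINATION_REACH_RATE:
        return DESTINATION_REACH_RATE[key]
    if member_count is not None and population:
        return member_count / population
    raise KeyError(f"no reach rate known for {key}")

print(destination_reach_rate("organization B", "gender"))  # -> 0.5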
[動作の説明]
 次に、情報処理装置104の動作について、図3を用いて説明した第1の実施形態の動作と異なる動作を中心に説明する。
[Description of operation]
Next, operations of the information processing apparatus 104 will be described focusing on operations different from the operations of the first embodiment described with reference to FIG.
　情報処理装置104は、第1の実施形態と同様に、ステップS101及びステップS102を実行する。 The information processing apparatus 104 executes step S101 and step S102 as in the first embodiment.
 そして、情報処理装置104は、ステップS103に換えて、下記の動作を実行する。 Then, the information processing apparatus 104 executes the following operation instead of step S103.
 特定リスク計算部130は、識別リスク計算部120から識別リスクを受信する。そして、特定リスク計算部130は、取得部114に、特定個人到達率の取得を依頼する。このとき、特定リスク計算部130は、取得部114に、相手(攻撃者)として設定されている提供先と属性との情報を送信する。 The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. Then, the specific risk calculation unit 130 requests the acquisition unit 114 to acquire the specific individual arrival rate. At this time, the specific risk calculation unit 130 transmits to the acquisition unit 114 information on the provision destination and the attribute set as the opponent (attacker).
 取得部114は、保存部214のデータを基に、受信した提供先及び属性に対応する特定個人到達率を取得する。 The acquisition unit 114 acquires the specific individual arrival rate corresponding to the received destination and attribute based on the data of the storage unit 214.
 例えば、提供先が組織Bであり、属性が性別である場合、取得部114は、特定個人到達率として0.5を取得する。 For example, when the providing destination is the organization B and the attribute is gender, the acquisition unit 114 acquires 0.5 as the specific individual arrival rate.
 そして、取得部114は、特定リスク計算部130に特定個人到達率の情報を返信する。 Then, the acquisition unit 114 returns information on the specific individual arrival rate to the specific risk calculation unit 130.
 特定リスク計算部130は、第1の実施形態のステップS104と同様に動作して、特定リスクを計算する。 The specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
[効果の説明]
 次に、第5の実施形態の効果について説明する。
[Description of effects]
Next, effects of the fifth exemplary embodiment will be described.
　第5の実施形態に係る情報処理装置104は、第1の実施形態の効果に加え、パーソナルデータを提供する組織(攻撃者となる可能性のある組織)に応じて特定個人到達率を変えることができるとの効果を奏することができる。 In addition to the effects of the first embodiment, the information processing apparatus 104 according to the fifth embodiment can achieve the effect of being able to change the specific individual arrival rate according to the organization to which the personal data is provided (an organization that may become an attacker).
 その理由は、次のとおりである。 The reason is as follows.
　第4の取得部114が、パーソナルデータを提供する組織(相手)に対応した特定個人到達率を取得する。そして、特定リスク計算部130が、組織(相手)に対応した特定個人到達率を基に、特定リスクを計算するためである。 The fourth acquisition unit 114 acquires the specific individual arrival rate corresponding to the organization (partner) to which the personal data is provided. Then, the specific risk calculation unit 130 calculates the specific risk based on the specific individual arrival rate corresponding to that organization (partner).
 <第6の実施の形態>
 第6の実施形態について図面を参照して説明する。
<Sixth Embodiment>
A sixth embodiment will be described with reference to the drawings.
 第6の実施形態に係る情報処理装置105は、公開されているデータ(公開情報)を用いて特定個人到達率を計算するという点が、第1の実施形態の情報処理装置100と異なる。 The information processing apparatus 105 according to the sixth embodiment is different from the information processing apparatus 100 according to the first embodiment in that the specific individual arrival rate is calculated using publicly available data (public information).
　ここで、公開情報とは、一般に公開されているデータ(公知のデータ)である。例えば、公開情報とは、データの提供先の会員の分布(例えば、10代の会員が1万人、20代の会員が1.5万人、30代の会員が1万人といった情報)である。あるいは、公開情報とは、twitter(登録商標)などのインターネット上に公開されている情報(例えば、「user1は、10代で、位置情報を公開している」)である。ただし、公開情報の公開範囲は、インターネットの情報のように、公開範囲に制限がない情報に限る必要はない。例えば、公開情報は、所定の組織(例えば、インターネットプロバイダ)に登録されている会員に公開されているような、公開範囲がある程度に限られる情報でもよい。 Here, public information is data that is open to the public (publicly known data). For example, public information is the distribution of the members of the organization to which the data is provided (for example, information such as 10,000 members in their teens, 15,000 members in their twenties, and 10,000 members in their thirties). Alternatively, public information is information published on the Internet, such as on twitter (registered trademark) (for example, "user1 is in his/her teens and discloses location information"). However, the disclosure range of public information need not be limited to information with an unrestricted disclosure range, such as information on the Internet. For example, public information may be information whose disclosure range is limited to some extent, such as information disclosed only to members registered with a predetermined organization (for example, an Internet provider).
[構成の説明]
 図面を参照して、第6の実施形態に係る情報処理装置105の構成ついて説明する。
[Description of configuration]
The configuration of the information processing apparatus 105 according to the sixth embodiment will be described with reference to the drawings.
 図15は、第6の実施形態に係る情報処理装置105の構成の一例を示すブロック図である。なお、図面中の矢印の方向は、一例を示すものであり、信号の向きを限定するものではない。図15に示すように、情報処理装置105は、情報処理装置100の受信部110に換えて、特定個人到達率計算部115を含む。さらに、情報処理装置105は、公開分布情報保存部215を含む。なお、情報処理装置105は、図17に示すコンピュータを用いて構成されてもよい。また、情報処理装置105は、公開分布情報保存部215を、ネットワークを介して接続する外部の装置としてもよい。また、特定個人到達率計算部115は、以下で説明する公開分布情報保存部215が保存する情報を、第1の実施形態と同様に、図示しない外部の装置から受信してもよい。この場合、情報処理装置105は、公開分布情報保存部215を含まなくてもよい。 FIG. 15 is a block diagram illustrating an example of the configuration of the information processing apparatus 105 according to the sixth embodiment. In addition, the direction of the arrow in a drawing shows an example and does not limit the direction of a signal. As illustrated in FIG. 15, the information processing device 105 includes a specific individual arrival rate calculation unit 115 instead of the reception unit 110 of the information processing device 100. Further, the information processing apparatus 105 includes a public distribution information storage unit 215. The information processing apparatus 105 may be configured using a computer shown in FIG. Further, the information processing apparatus 105 may use the public distribution information storage unit 215 as an external apparatus connected via a network. Further, the specific individual arrival rate calculation unit 115 may receive information stored in the public distribution information storage unit 215 described below from an external device (not shown), as in the first embodiment. In this case, the information processing apparatus 105 may not include the public distribution information storage unit 215.
 公開分布情報保存部215は、公開情報を保存する。 The public distribution information storage unit 215 stores public information.
 特定個人到達率計算部115は、公開情報を基に、特定個人到達率を計算する。 The specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate based on the public information.
[動作の説明]
 次に、情報処理装置105の動作について、図3を用いて説明した第1の実施形態の動作と異なる動作を中心に説明する。
[Description of operation]
Next, the operation of the information processing apparatus 105 will be described focusing on the operation different from the operation of the first embodiment described with reference to FIG.
　情報処理装置105は、第1の実施形態と同様に、ステップS101及びステップS102を実行する。 The information processing apparatus 105 executes step S101 and step S102 as in the first embodiment.
 そして、情報処理装置105は、ステップS103に換えて、下記の動作を実行する。 Then, the information processing apparatus 105 executes the following operation instead of step S103.
 特定リスク計算部130は、識別リスク計算部120から識別リスクを受信する。そして、特定リスク計算部130は、特定個人到達率計算部115に特定個人到達率の計算を依頼する。 The specific risk calculation unit 130 receives the identification risk from the identification risk calculation unit 120. Then, the specific risk calculation unit 130 requests the specific individual arrival rate calculation unit 115 to calculate the specific individual arrival rate.
　特定個人到達率計算部115は、公開分布情報保存部215に保存された公開情報と、パーソナルデータ保存部200に保存されているパーソナルデータとを用いて、特定個人到達率を計算し、計算結果を特定リスク計算部130へ返信する。 The specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate using the public information stored in the public distribution information storage unit 215 and the personal data stored in the personal data storage unit 200, and returns the calculation result to the specific risk calculation unit 130.
 特定個人到達率計算部115の計算は、特に制限はなく、必要とされるリスクに合わせて設定されればよい。例えば、特定個人到達率計算部115は、パーソナルデータの分布と、公開情報におけるパーソナルデータの提供先の組織におけるデータの分布とを用いて、特定個人到達率を計算してもよい。 The calculation of the specific individual arrival rate calculation unit 115 is not particularly limited and may be set according to the required risk. For example, the specific individual arrival rate calculation unit 115 may calculate the specific individual arrival rate using the distribution of personal data and the distribution of data in the organization to which the personal data is provided in the public information.
 次に、より具体的な計算例として、2つの計算例を説明する。 Next, two calculation examples will be described as more specific calculation examples.
 第1の計算例は、対象となる組織の会員数を用いる計算例である。ここで、公開情報は、「A社の未成年会員数は、1000万人(日本の人口の10%)である。」とする。特定個人到達率計算部115は、この公開情報に基づいて、A社が、10%の確率で準識別子を知っていることが分かる。そこで、特定個人到達率計算部115は、公開情報(人口比率)を基に、特定個人到達率を0.1と計算する。 The first calculation example is a calculation example using the number of members of the target organization. Here, the public information is “Company A has 10 million minor members (10% of Japan's population)”. Based on this public information, the specific individual arrival rate calculation unit 115 knows that Company A knows the quasi-identifier with a probability of 10%. Therefore, the specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate as 0.1 based on the public information (population ratio).
　第2の計算例は、対象となる組織の会員の分布を用いる計算例である。ここで、公開情報は、「10代の会員数が10,000人で、10代の位置情報公開会員数が1,000人で、20代の会員数が20,000人で、20代の位置情報公開会員数1,000人である。」とする。この場合、10代の会員における位置情報の公開率は、0.1(=1000/10000)である。同様に、20代の会員における位置情報の公開率は、0.05(=1000/20000)である。そこで、特定個人到達率計算部115は、「位置情報の公開率=特定個人到達率」との想定を基に、10代の会員における特定個人到達率を0.1と計算し、20代の会員における特定個人到達率を0.05と計算する。 The second calculation example uses the distribution of the members of the target organization. Here, the public information is: "The number of members in their teens is 10,000, of whom 1,000 disclose their location information; the number of members in their twenties is 20,000, of whom 1,000 disclose their location information." In this case, the disclosure rate of location information among members in their teens is 0.1 (= 1000/10000). Similarly, the disclosure rate of location information among members in their twenties is 0.05 (= 1000/20000). Therefore, based on the assumption that "the disclosure rate of location information = the specific individual arrival rate", the specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate as 0.1 for members in their teens and as 0.05 for members in their twenties.
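For illustration, the two calculation examples could be expressed in Python as follows. The numeric inputs come from the text (10 million members stated to be 10% of the population, and the member and location-disclosure counts above); the function names and data structures are assumptions made for this sketch.

```python
# Sketch of the two calculations described for the specific individual
# arrival rate calculation unit 115 (illustrative only).

# First example: ratio of the destination's members to the population.
def reach_rate_from_membership(member_count, population):
    return member_count / population

# 10 million members; the text states this is 10% of the population.
print(reach_rate_from_membership(10_000_000, 100_000_000))  # -> 0.1

# Second example: per-age-group disclosure rate of location information,
# treated as the specific individual arrival rate for that group.
member_distribution = {
    "10s": {"members": 10_000, "location_public": 1_000},
    "20s": {"members": 20_000, "location_public": 1_000},
}

def reach_rate_by_group(distribution):
    return {group: d["location_public"] / d["members"]
            for group, d in distribution.items()}

print(reach_rate_by_group(member_distribution))  # -> {'10s': 0.1, '20s': 0.05}
```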
 そして、特定個人到達率計算部115は、特定個人到達率を、特定リスク計算部130に返信する。 Then, the specific individual arrival rate calculation unit 115 returns the specific individual arrival rate to the specific risk calculation unit 130.
 特定リスク計算部130は、第1の実施形態のステップS104と同様に動作して、特定リスクを計算する。 The specific risk calculation unit 130 operates in the same manner as step S104 of the first embodiment, and calculates a specific risk.
[効果の説明]
 次に、第6の実施形態の効果について説明する。
[Description of effects]
Next, the effect of the sixth embodiment will be described.
　第6の実施形態に係る情報処理装置105は、第1の実施形態の効果に加え、特定個人到達率の受信及び保存の動作を削減するという効果を奏する。 In addition to the effects of the first embodiment, the information processing apparatus 105 according to the sixth embodiment has the effect of reducing the operations of receiving and storing the specific individual arrival rate.
 その理由は、特定個人到達率計算部115が、公開情報を基に、特定個人到達率を計算するためである。 The reason is that the specific individual arrival rate calculation unit 115 calculates the specific individual arrival rate based on the public information.
 <その他の実施形態>
 上記の第1ないし第6の実施形態は、組み合わせてもよい。例えば、本発明の実施形態に係る特定リスク計算部130は、第2の実施形態において説明した準識別子に対応した特定個人到達率と、第4の実施形態において説明した識別リスクに対応した特定個人到達率とを用いて、特定リスクを計算してもよい。
<Other embodiments>
The above first to sixth embodiments may be combined. For example, the specific risk calculation unit 130 according to the embodiment of the present invention uses the specific individual arrival rate corresponding to the quasi-identifier described in the second embodiment and the specific individual corresponding to the identification risk described in the fourth embodiment. The specific risk may be calculated using the arrival rate.
　あるいは、情報処理システム310は、第1の実施形態に係る情報処理装置100に換えて、第2の実施形態に係る情報処理装置101ないし第6の実施形態に係る情報処理装置105を含んでもよい。 Alternatively, the information processing system 310 may include, instead of the information processing apparatus 100 according to the first embodiment, any one of the information processing apparatuses 101 to 105 according to the second to sixth embodiments.
 以上、実施形態を参照して本願発明を説明したが、本願発明は上記実施形態に限定されるものではない。本願発明の構成及び詳細には、本願発明のスコープ内で当業者が理解し得る様々な変更をすることができる。 The present invention has been described above with reference to the embodiments, but the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.
 この出願は、2014年10月29日に出願された日本出願特願2014-219808を基礎とする優先権を主張し、その開示の全てをここに取り込む。 This application claims priority based on Japanese Patent Application No. 2014-219808 filed on October 29, 2014, the entire disclosure of which is incorporated herein.
 本発明は、個人の特定リスクを計算するツールに利用可能である。また、本発明は、パーソナルデータを「個人の特定性を低減したデータ」となるように、データを加工する際に利用可能である。 The present invention can be used as a tool for calculating a specific risk of an individual. The present invention can also be used when processing data so that the personal data becomes “data with reduced individual specificity”.
 100 情報処理装置
 101 情報処理装置
 102 情報処理装置
 103 情報処理装置
 104 情報処理装置
 105 情報処理装置
 106 情報処理装置
 110 受信部
 111 取得部
 112 取得部
 113 取得部
 114 取得部
 115 特定個人到達率計算部
 120 識別リスク計算部
 130 特定リスク計算部
 200 パーソナルデータ保存部
 211 保存部
 212 保存部
 213 保存部
 214 保存部
 215 公開分布情報保存部
 230 特定リスク計算結果保存部
 240 全体リスク計算部
 300 情報処理システム
 310 情報処理システム
 600 情報処理装置
 610 CPU
 620 ROM
 630 RAM
 640 内部記憶装置
 650 IOC
 660 入力機器
 670 表示機器
 680 NIC
 700 記憶媒体
DESCRIPTION OF SYMBOLS
100 Information processing apparatus
101 Information processing apparatus
102 Information processing apparatus
103 Information processing apparatus
104 Information processing apparatus
105 Information processing apparatus
106 Information processing apparatus
110 Reception unit
111 Acquisition unit
112 Acquisition unit
113 Acquisition unit
114 Acquisition unit
115 Specific individual arrival rate calculation unit
120 Identification risk calculation unit
130 Specific risk calculation unit
200 Personal data storage unit
211 Storage unit
212 Storage unit
213 Storage unit
214 Storage unit
215 Public distribution information storage unit
230 Specific risk calculation result storage unit
240 Overall risk calculation unit
300 Information processing system
310 Information processing system
600 Information processing apparatus
610 CPU
620 ROM
630 RAM
640 Internal storage device
650 IOC
660 Input device
670 Display device
680 NIC
700 Storage medium

Claims (10)

  1. 指定された個人に関するデータが誰か一人のデータであると判断される可能性を示す識別リスクを計算する識別リスク計算手段と、
     前記指定された個人のデータが指定された個人のデータであると判断される可能性を示す特定個人到達率と、前記識別リスクとを基に、前記指定された個人のデータが前記指定された個人のデータであると判断される可能性を示す特定リスクを計算する特定リスク計算手段と
     を含む情報処理装置。
An information processing apparatus comprising: identification risk calculation means for calculating an identification risk indicating a possibility that data on a designated individual is determined to be data of a single person; and
specific risk calculation means for calculating, based on the identification risk and a specific individual arrival rate indicating a possibility that the data of the designated individual is determined to be the data of the designated individual, a specific risk indicating a possibility that the data of the designated individual is determined to be the data of the designated individual.
  2. 前記特定個人到達率を受信する受信手段
     を含む請求項1に記載の情報処理装置。
    The information processing apparatus according to claim 1, further comprising receiving means for receiving the specific individual arrival rate.
  3. 前記指定された個人に関するデータにおける準識別子の属性に対応した前記特定個人到達率を取得する第1の取得手段
     を含む請求項1又は2に記載の情報処理装置。
The information processing apparatus according to claim 1 or 2, further comprising: a first acquisition unit configured to acquire the specific individual arrival rate corresponding to the attribute of the quasi-identifier in the data regarding the specified individual.
  4. 前記指定された個人に関するデータにおける属性に対する条件の組合せに対応した前記特定個人到達率を取得する第2の取得手段
     を含む請求項1ないし3のいずれか1項に記載の情報処理装置。
The information processing apparatus according to any one of claims 1 to 3, further comprising: a second acquisition unit configured to acquire the specific individual arrival rate corresponding to a combination of conditions for attributes in the data related to the designated individual.
  5. 前記識別リスクに対応した前記特定個人到達率を取得する第3の取得手段
     を含む請求項1ないし4のいずれか1項に記載の情報処理装置。
The information processing apparatus according to any one of claims 1 to 4, further comprising: a third acquisition unit that acquires the specific individual arrival rate corresponding to the identification risk.
  6. 前記パーソナルデータの提供先に対応して前記特定個人到達率を取得する第4の取得手段
     を含む請求項1ないし5のいずれか1項に記載の情報処理装置。
The information processing apparatus according to any one of claims 1 to 5, further comprising: a fourth acquisition unit configured to acquire the specific individual arrival rate corresponding to the providing destination of the personal data.
  7. 公開情報とパーソナルデータとを基に前記特定個人到達率を計算する特定個人到達率計算手段
     を含む請求項1ないし6のいずれか1項に記載の情報処理装置。
    The information processing apparatus according to any one of claims 1 to 6, further comprising: a specific individual arrival rate calculating unit that calculates the specific individual arrival rate based on public information and personal data.
  8. 請求項1ないし7のいずれか1項に記載の情報処理装置と、
     複数の個人に関する情報であるパーソナルデータを保存するパーソナル情報保存手段と、
　　　前記情報処理装置が計算した、前記パーソナルデータに含まれる全ての個人に関するデータに対応する特定リスクを基に、前記パーソナルデータの全体に対応するリスクを計算する全体リスク計算手段と
     を含む情報処理システム。
An information processing system comprising: the information processing apparatus according to any one of claims 1 to 7;
personal information storage means for storing personal data that is information on a plurality of individuals; and
overall risk calculation means for calculating a risk corresponding to the entire personal data, based on the specific risks calculated by the information processing apparatus for the data on all individuals included in the personal data.
  9. 指定された個人に関するデータが誰か一人のデータであると判断される可能性を示す識別リスクを計算し、
     前記指定された個人のデータが指定された個人のデータであると判断される可能性を示す特定個人到達率と、前記識別リスクとを基に、前記指定された個人のデータが前記指定された個人のデータであると判断される可能性を示す特定リスクを計算する
     情報処理方法。
An information processing method comprising: calculating an identification risk indicating a possibility that data on a designated individual is determined to be data of a single person; and
calculating, based on the identification risk and a specific individual arrival rate indicating a possibility that the data of the designated individual is determined to be the data of the designated individual, a specific risk indicating a possibility that the data of the designated individual is determined to be the data of the designated individual.
  10. 指定された個人に関するデータが誰か一人のデータであると判断される可能性を示す識別リスクを計算する処理と、
     前記指定された個人のデータが指定された個人のデータであると判断される可能性を示す特定個人到達率と、前記識別リスクとを基に、前記指定された個人のデータが前記指定された個人のデータであると判断される可能性を示す特定リスクを計算する処理と
     をコンピュータに実行させるプログラムをコンピュータに読み取り可能に記録する不揮発性記録媒体。
A non-volatile recording medium recording, in a computer-readable manner, a program that causes a computer to execute: a process of calculating an identification risk indicating a possibility that data on a designated individual is determined to be data of a single person; and
a process of calculating, based on the identification risk and a specific individual arrival rate indicating a possibility that the data of the designated individual is determined to be the data of the designated individual, a specific risk indicating a possibility that the data of the designated individual is determined to be the data of the designated individual.
PCT/JP2015/005289 2014-10-29 2015-10-20 Information processing device, information processing method, and recording medium WO2016067566A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016556210A JPWO2016067566A1 (en) 2014-10-29 2015-10-20 Information processing apparatus, information processing method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014-219808 2014-10-29
JP2014219808 2014-10-29

Publications (1)

Publication Number Publication Date
WO2016067566A1 true WO2016067566A1 (en) 2016-05-06

Family

ID=55856933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/005289 WO2016067566A1 (en) 2014-10-29 2015-10-20 Information processing device, information processing method, and recording medium

Country Status (2)

Country Link
JP (1) JPWO2016067566A1 (en)
WO (1) WO2016067566A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004287846A (en) * 2003-03-20 2004-10-14 Ntt Data Corp Individual specification preventing device, individual specification preventing method and program
US20110178943A1 (en) * 2009-12-17 2011-07-21 New Jersey Institute Of Technology Systems and Methods For Anonymity Protection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RYOKO SUZUKI ET AL.: "On the Security of Anonymised databases", CSS2012 COMPUTER SECURITY SYMPOSIUM 2012 RONBUNSHU GODO KAISAI ANTI MALWARE ENGINEERING WORKSHOP 2012, IPSJ SYMPOSIUM SERIES, vol. 2012, no. 3, 23 October 2012 (2012-10-23), pages 517 - 524 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017228255A (en) * 2016-06-24 2017-12-28 Necソリューションイノベータ株式会社 Evaluation device, evaluation method and program

Also Published As

Publication number Publication date
JPWO2016067566A1 (en) 2017-08-17

Similar Documents

Publication Publication Date Title
Bazemore et al. “Community vital signs”: incorporating geocoded social determinants into electronic records to promote patient and population health
Bannay et al. The best use of the Charlson comorbidity index with electronic health care database to predict mortality
Malin et al. Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule
US9230132B2 (en) Anonymization for data having a relational part and sequential part
Gros et al. Containment efficiency and control strategies for the corona pandemic costs
Galvin et al. Developments in privacy and data ownership in mobile health technologies, 2016-2019
WO2013121739A1 (en) Anonymization device, and anonymization method
US10600506B2 (en) System and method for creation of persistent patient identification
Jean et al. Temporal trends in prevalence, incidence, and mortality for rheumatoid arthritis in Quebec, Canada: a population-based study
US9990515B2 (en) Method of re-identification risk measurement and suppression on a longitudinal dataset
US20160306999A1 (en) Systems, methods, and computer-readable media for de-identifying information
Bauer et al. Addressing disparities in the health of American Indian and Alaska Native people: the importance of improved public health data
JP5782636B2 (en) Information anonymization system, information loss determination method, and information loss determination program
Quinn et al. The validity of the Short-Term Assessment of Risk and Treatability (START) in a UK medium secure forensic mental health service
JP2017228255A (en) Evaluation device, evaluation method and program
Dybov On regular solutions of the Dirichlet problem for the Beltrami equations
AU2019293106A1 (en) Personal information analysis system and personal information analysis method
Buchanich et al. Underascertainment of deaths using social security records: a recommended solution to a little-known problem
WO2016067566A1 (en) Information processing device, information processing method, and recording medium
Hrostowski et al. The unchecked HIV/AIDS crisis in Mississippi
JP6127774B2 (en) Information processing apparatus and data processing method
WO2016203752A1 (en) Information processing device, information processing method, and storage medium
Aboumrad et al. Development and validation of a clinical risk score to predict hospitalization within 30 days of coronavirus disease 2019 diagnosis
Frimpong et al. Effect of the Ghana National Health Insurance Scheme on exit time from catastrophic healthcare expenditure
Brener et al. Association between in‐hospital supportive visits by primary care physicians and patient outcomes: A population‐based cohort study

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15853993

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016556210

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15853993

Country of ref document: EP

Kind code of ref document: A1