WO2014030302A1 - Information processing device for executing anonymization and anonymization processing method - Google Patents
- Publication number
- WO2014030302A1 (PCT/JP2013/004624)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attribute
- information loss
- loss amount
- personal data
- value
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Definitions
- the present invention relates to an information processing apparatus, anonymization processing method, and program for anonymizing personal data.
- the digitization of medical information is progressing, and medical information is being accumulated in medical institutions and health insurance associations. Wide use of such medical information is considered to lead to the development of epidemiological research, medical technology and new drug development. Therefore, it is expected that the medical information can be used in research institutions while ensuring the privacy of the stored medical information.
- Anonymization is one of the methods for ensuring privacy in the use of information.
- This anonymization is a technique for performing processing for preventing identification of individuals on data including information that is not desired to be known to others, such as the above-described medical information.
- a batch of data to be processed as described above is referred to as a data set.
- a lump of data corresponding to each individual constituting the data set is called a personal data record.
- the minimum unit information such as the age of the individual and the name of the disease affected by the individual constituting the personal data record is called an attribute.
- Non-Patent Document 1 discloses k-anonymization, which is one of representative techniques for anonymization.
- In k-anonymization, each personal data record included in the data set is processed so that the probability of identifying an individual is 1/k or less (where k is the k of k-anonymization), which guarantees a certain level of anonymity.
- Processing in k-anonymization is, for example, processing that makes the value of a specific attribute ambiguous (also called generalization) across a plurality of personal data records constituting a data set.
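As an illustration of the 1/k condition, the following minimal sketch checks whether a set of already generalized records is k-anonymous with respect to chosen quasi-identifiers. The function name `is_k_anonymous`, the field names, and the sample records are assumptions made for this example and do not come from the cited documents.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records (identification probability <= 1/k)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized records (illustration only).
records = [
    {"birth_year": "1943-1956", "disease": "DiseaseA"},
    {"birth_year": "1943-1956", "disease": "DiseaseB"},
    {"birth_year": "1961-1977", "disease": "DiseaseA"},
    {"birth_year": "1961-1977", "disease": "DiseaseC"},
]

print(is_k_anonymous(records, ["birth_year"], k=2))
print(is_k_anonymous(records, ["birth_year"], k=3))
```

Each generalized birth-year range here is shared by two records, so the check passes for k = 2 and fails for k = 3.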
- Generalization k-anonymization has a top-down approach and a bottom-up approach.
- The top-down approach is a method of refining (making more specific) the attribute values contained in the most generalized personal data records, within a range where k-anonymity does not break down.
- the bottom-up approach is a method of generalizing the original values of unprocessed personal data records so as to ensure k-anonymity.
- Non-Patent Document 2 shows one of the typical top-down approaches.
- the method disclosed in Non-Patent Document 2 is a method for anonymizing a personal data record of a data set by processing it as follows in order to satisfy k-anonymity in a data set.
- The initial state is a state in which, for every attribute to be anonymized, the values of that attribute are generalized to the same value across all personal data records in the data set.
- In the first step, one attribute is selected from the attributes to be anonymized.
- In the second step, the median of the values that the attribute selected in the first step takes across all the personal data records is obtained.
- In the third step, the personal data records are divided into two groups according to their attribute values, with the calculated median as the boundary.
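The median-based division can be sketched as follows. This is only a rough illustration: the text describes the steps up to a single division, so the recursive repetition and the stopping rule (stop when either half would hold fewer than k records) are assumptions, as are the function names and the sample birth years.

```python
from statistics import median


def split_by_median(records, attribute):
    """Divide records into two groups with the median of one attribute as the boundary."""
    boundary = median(r[attribute] for r in records)
    lower = [r for r in records if r[attribute] <= boundary]
    upper = [r for r in records if r[attribute] > boundary]
    return boundary, lower, upper


def top_down_partition(records, attribute, k):
    """Repeat the median split while both halves keep at least k records (assumed rule, k >= 1)."""
    _, lower, upper = split_by_median(records, attribute)
    if len(lower) < k or len(upper) < k:
        return [records]  # splitting further would break k-anonymity
    return (top_down_partition(lower, attribute, k)
            + top_down_partition(upper, attribute, k))


# Hypothetical birth years (illustration only).
data = [{"birth_year": y} for y in (1943, 1949, 1950, 1956, 1961, 1965, 1970, 1977)]
groups = top_down_partition(data, "birth_year", k=2)
for g in groups:
    years = [r["birth_year"] for r in g]
    print(f"{min(years)}-{max(years)}: {len(g)} records")
```

Each resulting group can then be generalized to its own value range, which is narrower than the range of the fully generalized initial state.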
- Non-Patent Document 3 shows one of the representative methods of the bottom-up approach.
- the technique disclosed in Non-Patent Document 3 is an anonymization technique in which a certain attribute value is generalized from an original value so that personal data satisfies k-anonymity in a certain data set.
- Patent Document 1 discloses a data anonymization device incorporating k-anonymization.
- The data anonymization device of Patent Document 1 generates a complete graph connecting all personal data records constituting a data set with edges, divides the complete graph into clusters, and generalizes attributes in units of the divided clusters.
- the data anonymization device realizes k-anonymization by a top-down approach.
- The reason for this is that the anonymization technologies disclosed in the above-mentioned patent document and non-patent documents select and process the attributes to be processed in an order unrelated to the purpose of use, without considering the purpose of use. [Object of the invention]
- the objective of this invention is providing the information processing apparatus, the anonymization processing method, and program which solve the problem mentioned above.
- The information processing apparatus of the present invention includes: information loss amount calculation means for calculating and outputting an information loss amount corresponding to each attribute included in first personal data to be anonymized; and anonymization processing means for determining the attribute to be processed based on a priority corresponding to each of the attributes and the information loss amount, generating second personal data obtained by processing the attribute value of the determined attribute of the first personal data, and outputting the second personal data.
- In the anonymization processing method of the present invention, a computer calculates and outputs an information loss amount corresponding to each of the attributes included in the first personal data to be anonymized, determines the attribute to be processed based on the priority corresponding to each of the attributes and the information loss amount, and generates and outputs second personal data obtained by processing the attribute value of the determined attribute of the first personal data.
- The non-volatile recording medium of the present invention records a program for causing a computer to execute: processing for calculating and outputting the information loss amount corresponding to each of the attributes included in the first personal data to be anonymized; processing for determining the attribute to be processed based on the priority corresponding to each of the attributes and the information loss amount; and processing for generating and outputting second personal data obtained by processing the attribute value of the determined attribute of the first personal data.
- the present invention has the effect that the data set can be anonymized so as to match the purpose of use.
- FIG. 1 is a block diagram illustrating a configuration of the anonymization device according to the first embodiment.
- FIG. 2 is a block diagram illustrating a configuration of a system including the anonymization device according to the first to third embodiments.
- FIG. 3 is a diagram showing an example of personal data in the first and second embodiments.
- FIG. 4 is a diagram illustrating an example of anonymized personal data in the first and second embodiments.
- FIG. 5 is a block diagram illustrating a hardware configuration of a computer that realizes the anonymization apparatus according to the first to third embodiments.
- FIG. 6 is a flowchart illustrating the operation of the anonymization device according to the first embodiment.
- FIG. 7 is a block diagram illustrating a configuration of the anonymization device according to the second embodiment.
- FIG. 8 is a diagram illustrating an example of priority determination information according to the second embodiment.
- FIG. 9A is a sequence diagram illustrating an operation of the anonymization device according to the second exemplary embodiment.
- FIG. 9B is a sequence diagram illustrating an operation of the anonymization device according to the second exemplary embodiment.
- FIG. 10A is a sequence diagram illustrating an operation of the anonymization device according to the second exemplary embodiment.
- FIG. 10B is a sequence diagram illustrating an operation of the anonymization device according to the second exemplary embodiment.
- FIG. 11 is a diagram illustrating an example of a data set in a generalized state according to the second embodiment.
- FIG. 12 is a diagram illustrating an example of division value candidates in the second embodiment.
- FIG. 13 is a diagram illustrating an example of division value candidates in the second embodiment.
- FIG. 14 is a diagram illustrating an image in which a data set is divided.
- FIG. 15 is a block diagram illustrating a configuration of the anonymization device according to the third embodiment.
- FIG. 16 is a diagram illustrating an example of priority determination information according to the third embodiment.
- FIG. 17 is a diagram illustrating an example of personal data according to the third embodiment.
- FIG. 18A is a sequence diagram illustrating an operation of the anonymization device according to the third exemplary embodiment.
- FIG. 18B is a sequence diagram illustrating an operation of the anonymization device according to the third exemplary embodiment.
- FIG. 18C is a sequence diagram illustrating an operation of the anonymization device according to the third exemplary embodiment.
- FIG. 19 is a diagram illustrating an example of personal data in an intermediate stage of anonymization processing by the anonymization device according to the third exemplary embodiment.
- FIG. 20 is a diagram illustrating an example of personal data in the middle of anonymization processing by the anonymization device according to the third exemplary embodiment.
- FIG. 21 is a diagram illustrating an example of personal data at an intermediate stage of anonymization processing by the anonymization device according to the third exemplary embodiment.
- FIG. 1 is a block diagram showing a configuration of an anonymization device (also called an information processing device) 310 according to the first embodiment of the present invention.
- the anonymization device 310 of this embodiment includes an information loss amount calculation unit 312 and an anonymization processing unit 313.
- the constituent elements shown in FIG. 1 may be constituent elements in hardware units or constituent elements divided into functional units of a computer device.
- the components shown in FIG. 1 will be described as components divided into functional units of the computer apparatus.
- FIG. 2 is a block diagram showing a configuration of a system including the anonymization device 310 according to the first embodiment of the present invention.
- the system includes a personal data storage device 100, an anonymized personal data storage device 200, and an anonymization device 310.
- the personal data storage device 100 stores a data set (hereinafter referred to as a data set sp) that is personal data to be anonymized (first personal data).
- the data set sp is a set of data records (hereinafter referred to as data records rp).
- the data record rp includes attribute values of a plurality of attributes corresponding to a specific individual.
- Personal data is, for example, medical information held by medical institutions.
- the attribute values included in the data record rp are attribute values of attributes such as name, date of birth, date of medical care, and disease name.
- FIG. 3 is a diagram illustrating an example of a data set sp110 that is personal data stored in the personal data storage device 100.
- the data set sp110 includes a plurality of data records rp111.
- The data record rp111 includes attribute values of “name”, “birth year”, “medical care date”, and “disease name”.
- the attribute of “name” is an identifier.
- the attributes of “birth year” and “medical care date” are quasi-identifiers that, when combined, may identify an individual.
- the attribute of “disease name” is sensitive information that is not desired to be known to others. These attributes used as quasi-identifiers or sensitive information are examples. That is, in the anonymization apparatus 310, it is arbitrary which attribute among the attributes included in the data set sp110 is treated as a quasi-identifier or sensitive information.
- For example, the data record rp111 whose “name” attribute value is “patientA” contains “1949” as the attribute value of “birth year”, “201006” as the attribute value of “medical care date”, and “DiseaseA” as the attribute value of “disease name”.
- the personal data set sp110 shown in FIG. 3 is an example, and the data record rp111 may include attribute values of arbitrary attributes as quasi-identifiers and sensitive information, respectively.
- the anonymized personal data storage device 200 stores an anonymized data set (hereinafter referred to as anonymized data set sa) that is anonymized personal data (second personal data).
- the anonymized data set sa is a set of anonymized data records (hereinafter referred to as anonymized data records ra) in which the data record rp111 is anonymized.
- FIG. 4 is a diagram illustrating an example of the anonymized data set sa210 that is anonymized personal data stored in the anonymized personal data storage device 200.
- the anonymized data set sa210 is a data set after the attribute value of the data set sp110 is processed (for example, generalized) by the anonymization device 310 to be anonymized. That is, the anonymized data set sa210 includes an anonymized data record ra211 obtained by processing the data record rp111 instead of the data record rp111.
- the “birth year” and “medical care date” are processed (generalized), and the attribute value of the quasi-identifier is obscured compared to the data set sp110.
- the information loss amount calculation unit 312 calculates and outputs an information loss amount (hereinafter referred to as an information loss amount ILA) corresponding to each attribute in the data set sp110.
- the information loss amount ILA is the amount of attribute information abstraction (hereinafter referred to as information abstraction ia) that increases when any attribute included in the data set sp110 is processed.
- the information abstraction ia is the abstraction of attribute information, that is, the attribute value of the attribute.
- the information loss amount calculation unit 312 may calculate the information loss amount ILA using various methods described below, as necessary.
- For example, for one attribute, the information loss amount calculation unit 312 divides the range of attribute values after generalization of the attribute by the range of attribute values before generalization of the same attribute, and thereby calculates the amount of information loss in one data record (hereinafter referred to as information loss amount ILR).
- The information loss amount calculation unit 312 then sums the information loss amount ILR over all the data records to calculate the information loss amount ILA.
- the information loss amount calculation unit 312 calculates the information loss amount ILA of each attribute to be anonymized when the data set sp110 illustrated in FIG. .
- the information loss amount calculation unit 312 always calculates the information loss amount ILR for one attribute value as “1”.
- the information loss amount calculation unit 312 adds the information loss amount ILR corresponding to the number (20) of the data records rp111, and calculates “20” as the information loss amount ILA. In this way, the information loss amount calculation unit 312 calculates “20” as the information loss amount ILA for any attribute of the data set sp110.
- When the data records rp111 of the data set sp110 illustrated in FIG. 3 are divided, by the attribute value of the “birth year” attribute, into those with “1956” or less and those with “1961” or more and are then generalized, the information loss amount calculation unit 312 calculates the information loss amount ILA of each attribute to be anonymized as follows.
- The attribute value range of the “birth year” attribute before generalization has a minimum value of “1943” and a maximum value of “1977”.
- The attribute value range of the “birth year” attribute after generalization, for the data records rp111 whose “birth year” attribute value is “1956” or less, has a minimum value of “1943” and a maximum value of “1956”.
- The information loss amount calculation unit 312 therefore calculates the information loss amount ILR-birth-ul 1956 of the “birth year” attribute for these data records by dividing the range after generalization (1956 - 1943 = 13) by the range before generalization (1977 - 1943 = 34), which gives approximately “0.382”.
- The information loss amount calculation unit 312 sums the information loss amount ILR-birth-ul 1956 over the nine data records rp111 whose “birth year” attribute value is “1956” or less, and calculates “3.438” as the information loss amount ILA-birth-ul 1956 of those data records.
- The attribute value range of the “birth year” attribute after generalization, for the data records rp111 whose “birth year” attribute value is “1961” or more, has a minimum value of “1961” and a maximum value of “1977”, so the information loss amount ILR-birth-ov1961 is approximately “0.471” (16 / 34).
- The information loss amount calculation unit 312 sums the information loss amount ILR-birth-ov1961 over the 11 data records rp111 whose “birth year” attribute value is “1961” or more, and calculates “5.181” as the information loss amount ILA-birth-ov1961 of those data records.
- The information loss amount calculation unit 312 adds the information loss amount ILA-birth-ul 1956 and the information loss amount ILA-birth-ov1961, and calculates “8.619” as the information loss amount ILA-birth of the “birth year” attribute.
- the attribute value range of the “medical care date” attribute before generalization has a minimum value of “200512” and a maximum value of “201107”.
- The attribute value range of the “medical care date” attribute after generalization, for the data records rp111 whose “birth year” attribute value is “1956” or less, has a minimum value of “200512” and a maximum value of “201107”. Therefore, the information loss amount calculation unit 312 calculates “1” as the information loss amount ILR-mc-ul 1956 of the “medical care date” attribute of those data records.
- The information loss amount calculation unit 312 sums the information loss amount ILR-mc-ul 1956 over the nine data records rp111 whose “birth year” attribute value is “1956” or less, and calculates “9” as the information loss amount ILA-mc-ul 1956 of the “medical care date” attribute of those data records.
- The attribute value range of the “medical care date” attribute after generalization, for the data records rp111 whose “birth year” attribute value is “1961” or more, has a minimum value of “200612” and a maximum value of “201107”. Therefore, the information loss amount calculation unit 312 calculates “0.832” as the information loss amount ILR-mc-ov1961 of the “medical care date” attribute of those data records.
- The information loss amount calculation unit 312 sums the information loss amount ILR-mc-ov1961 over the 11 data records rp111 whose “birth year” attribute value is “1961” or more, and calculates “9.152” as the information loss amount ILA-mc-ov1961 of the “medical care date” attribute of those data records.
- The information loss amount calculation unit 312 adds the information loss amount ILA-mc-ul 1956 and the information loss amount ILA-mc-ov1961, and calculates “18.152” as the information loss amount ILA-mc of the “medical care date” attribute.
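The arithmetic of this worked example can be reproduced with a short sketch. The function names `ilr` and `ila` are hypothetical. Two points are assumptions inferred from the quoted figures rather than stated in the text: the per-record value ILR is rounded to three decimals before being summed, and the medical care dates are treated as plain integers when the range is computed. The maximum of "1977" for the group of "1961" or more follows from the overall maximum before generalization.

```python
def ilr(after_min, after_max, before_min, before_max):
    """Per-record information loss: generalized range / original range,
    rounded to three decimals (assumed, to match the quoted figures)."""
    return round((after_max - after_min) / (before_max - before_min), 3)


def ila(groups, before_min, before_max):
    """Sum the per-record loss over all records of every group.
    groups: list of (after_min, after_max, record_count)."""
    total = sum(ilr(lo, hi, before_min, before_max) * n for lo, hi, n in groups)
    return round(total, 3)


# "birth year": 9 records generalized to 1943-1956, 11 records to 1961-1977.
ila_birth = ila([(1943, 1956, 9), (1961, 1977, 11)], 1943, 1977)

# "medical care date": yyyymm values handled as integers (assumption).
ila_mc = ila([(200512, 201107, 9), (200612, 201107, 11)], 200512, 201107)

print(ila_birth, ila_mc)
```

Under these assumptions the sketch yields 8.619 for the birth year attribute and 18.152 for the medical care date attribute, the same totals as in the text.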
- the information loss amount calculation unit 312 may calculate the information loss amount ILA as follows. First, the information loss amount calculation unit 312 calculates the ratio of the number of attribute value types of the attribute after generalization and before generalization as the information loss amount ILR of one data record. Next, the information loss amount calculation unit 312 adds the information loss amount ILR by the number of data records to calculate the information loss amount ILA.
- The anonymization processing unit 313 determines the priority of each attribute (hereinafter referred to as priority p). Further, the anonymization processing unit 313 determines an attribute to be processed based on the priority p and the information loss amount ILA calculated by the information loss amount calculation unit 312. In other words, the anonymization processing unit 313 determines the attribute to be processed so that the purpose of use is taken into account through the priority p and the loss of information in the entire anonymized data set sa210 is reduced through the information loss amount ILA.
- the priority determination information is information for determining the priority p.
- the priority p is information indicating the degree of preventing the information abstraction ia possessed by each attribute included in the data set sp110 (data record rp111) from increasing (preventing loss of information preferentially). That is, the priority p indicates the priority of anonymization so that the increase in the information abstraction ia for the data set sp110 in the anonymized data set sa210 is made smaller for any of a plurality of attributes.
- the anonymization processing unit 313 calculates an evaluation value obtained by calculating (for example, multiplying) the priority p and the information loss amount ILA for each attribute.
- the anonymization processing unit 313 may acquire an evaluation value corresponding to a combination of the specific priority p and the specific information loss amount ILA from a unit (not shown).
- the calculation for calculating the evaluation value is a calculation for calculating the evaluation value larger as the priority p is higher if the information loss amount ILA is constant.
- the calculation for calculating the evaluation value is a calculation for calculating the evaluation value larger as the information loss amount ILA is larger if the priority p is constant. The same applies to the case where an evaluation value corresponding to a combination of a specific priority p and a specific information loss amount ILA is acquired.
- the anonymization processing unit 313 determines an attribute to be generalized so that an attribute with a smaller evaluation value is generalized so that an attribute with a larger evaluation value is not generalized.
- the anonymization processing unit 313 may determine an attribute to be generalized so that an attribute with a larger evaluation value is generalized so that an attribute with a smaller evaluation value is not generalized.
- In this case, the calculation for calculating the evaluation value yields a smaller evaluation value as the priority p is higher if the information loss amount ILA is constant, and a smaller evaluation value as the information loss amount ILA is larger if the priority p is constant.
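A minimal sketch of the first convention (evaluation value = priority p multiplied by information loss amount ILA, with the attribute of the smallest evaluation value chosen for generalization) might look like this. The function names and the priority values are hypothetical; the two ILA figures are reused from the worked example above purely as sample inputs.

```python
def evaluation_value(priority, ila):
    """One calculation named in the text: multiply priority p by ILA."""
    return priority * ila


def choose_attribute(ila_by_attr, priority_by_attr):
    """Select the attribute with the smallest evaluation value for generalization,
    so that attributes with larger evaluation values are left unprocessed."""
    scores = {
        attr: evaluation_value(priority_by_attr[attr], ila_by_attr[attr])
        for attr in ila_by_attr
    }
    return min(scores, key=scores.get), scores


ila_by_attr = {"birth year": 8.619, "medical care date": 18.152}
priority_by_attr = {"birth year": 4, "medical care date": 1}  # hypothetical priorities

chosen, scores = choose_attribute(ila_by_attr, priority_by_attr)
print(chosen, scores)
```

With these sample priorities the birth year is protected: although generalizing the medical care date loses more information, its lower priority gives it the smaller evaluation value, so it is the attribute selected for generalization.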
- the anonymization processing unit 313 generates and outputs an anonymized data set sa210 obtained by processing the determined attribute of the data set sp110. Note that the anonymization processing unit 313 may generate and output information on the difference of the anonymized data set sa210 with respect to the data set sp110.
- the anonymization processing unit 313 may evaluate anonymity of the processed data set.
- The processed data set is either a part or the whole of the data set in the state where those attributes have been processed. Subsequently, when the result of evaluating the anonymity has predetermined content, the anonymization processing unit 313 treats the processed data set as a part or the whole of the anonymized data set.
- The anonymization processing unit 313 may record it in the anonymized personal data storage device 200.
- FIG. 5 is a diagram illustrating a hardware configuration of a computer 700 that realizes the anonymization apparatus 310 according to the present embodiment.
- the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. Furthermore, the computer 700 includes a recording medium (or storage medium) 707 supplied from the outside.
- the recording medium 707 may be a non-volatile recording medium that stores information non-temporarily.
- the CPU 701 controls the overall operation of the computer 700 by operating an operating system (not shown).
- the CPU 701 reads a program and data from a recording medium 707 mounted on the storage device 703, for example, and writes the read program and data to the storage unit 702.
- the program is, for example, a program that causes the computer 700 to execute an operation of a flowchart shown in FIG.
- the CPU 701 executes various processes as the information loss amount calculation unit 312 and the anonymization processing unit 313 shown in FIG. 1 according to the read program and based on the read data.
- the CPU 701 may download a program or data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).
- the storage unit 702 stores programs and data.
- the storage unit 702 may include the personal data storage device 100 and the anonymized personal data storage device 200.
- The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes a recording medium 707.
- the storage device 703 records the program so that it can be read by a computer. Further, the storage device 703 may record data so as to be readable by a computer.
- the storage device 703 may include a personal data storage device 100 and an anonymized personal data storage device 200.
- the input unit 704 is realized by, for example, a mouse, a keyboard, a built-in key button, and the like, and is used for an input operation.
- the input unit 704 is not limited to a mouse, a keyboard, and a built-in key button, and may be a touch panel, an accelerometer, a gyro sensor, a camera, or the like.
- the output unit 705 is realized by a display, for example, and is used for confirming the output.
- the communication unit 706 implements an interface with the personal data storage device 100, the anonymized personal data storage device 200, and other external devices (not shown).
- the communication unit 706 is included as part of the anonymization processing unit 313.
- the functional unit block of the anonymization device 310 shown in FIG. 1 is realized by the computer 700 having the hardware configuration shown in FIG.
- the means for realizing each unit included in the computer 700 is not limited to the above.
- The computer 700 may be realized by one physically coupled device, or by two or more physically separated devices connected to each other by wire or wirelessly.
- the recording medium 707 in which the above-described program code is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored in the recording medium 707.
- the CPU 701 may store the code of the program stored in the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of a recording medium 707 that stores a program (software) executed by the computer 700 (CPU 701) temporarily or non-temporarily.
- FIG. 6 is a flowchart showing the operation of the anonymization device 310 in this embodiment.
- the information loss amount calculation unit 312 calculates the information loss amount ILA for each anonymization target attribute of the data set sp110 (step S601).
- the anonymization processing unit 313 determines the priority p of each attribute based on the information for determining the priority p (step S602).
- the anonymization processing unit 313 determines an attribute to be processed based on the information loss amount ILA and the priority p (step S603).
- the anonymization processing unit 313 processes the determined attribute of the data record rp111 (step S604).
- the anonymization processing unit 313 outputs the data record rp111 in which the attribute is processed (step S605).
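Steps S601 to S605 can be pictured with the following rough sketch. It is not the embodiment's implementation: it assumes the simplest case in which each attribute would be generalized to its full value range (so the per-record loss is 1), it uses the product of priority and ILA as the evaluation value, and the function name `run_flow`, the field names, the priorities, and all records other than the first are hypothetical.

```python
def run_flow(records, target_attrs, priority_info):
    # S601: ILA per attribute, assuming full generalization (ILR = 1 per record).
    ila = {a: 1.0 * len(records) for a in target_attrs}
    # S602: priority p per attribute from the priority determination information.
    p = {a: priority_info[a] for a in target_attrs}
    # S603: the attribute with the smallest evaluation value p * ILA is processed.
    chosen = min(target_attrs, key=lambda a: p[a] * ila[a])
    # S604: generalize the chosen attribute to its overall "min-max" range.
    values = [r[chosen] for r in records]
    general = f"{min(values)}-{max(values)}"
    processed = [{**r, chosen: general} for r in records]
    # S605: output the processed records.
    return chosen, processed


records = [
    {"birth_year": 1949, "care_date": 201006},
    {"birth_year": 1943, "care_date": 200512},  # hypothetical record
    {"birth_year": 1977, "care_date": 201107},  # hypothetical record
]
chosen, processed = run_flow(
    records, ["birth_year", "care_date"], {"birth_year": 16, "care_date": 1}
)
print(chosen, processed[0])
```

Because the information loss amount is the same for every attribute in this simplified case, the priority alone decides which attribute is generalized.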
- the first effect of the present embodiment described above is that the data set can be anonymized by controlling to match the purpose of use.
- the information loss amount calculation unit 312 calculates and outputs an information loss amount ILA corresponding to each attribute.
- the anonymization processing unit 313 determines an attribute to be processed based on the priority p and the information loss amount ILA, and processes the determined attribute.
- the second effect of the present embodiment described above is that it is possible to reduce the loss of information in the anonymized data set in addition to the first effect.
- That is, the second effect is that it is possible both to anonymize the data set under control that matches the purpose of use and to reduce the loss of information in the anonymized data set. If anonymization were performed considering only the match with the purpose of use, the generalization of attributes other than the attribute whose processing is suppressed would advance and the loss of information as a whole would increase; the present embodiment makes it possible to prevent this.
- the anonymization processing unit 313 determines the attribute to be processed based on both the priority p and the information loss amount ILA.
- FIG. 7 is a block diagram showing a configuration of the anonymization device 320 according to the present embodiment.
- the anonymization apparatus 320 of this embodiment performs anonymization by a top-down approach.
- the anonymization device 320 includes a priority determination information storage unit 321, an information loss amount calculation unit 322, and an anonymization processing unit 323.
- the anonymization device 320 may be included in the system shown in FIG.
- the priority determination information storage unit 321 stores information for determining the priority p.
- Information for determining the priority p is preset by the user of the system. Further, the information for determining the priority p may be received in advance from an external system by the division attribute determining unit 3233 via the communication unit 706 shown in FIG.
- FIG. 8 is a diagram illustrating an example of the priority determination information 3210 stored in the priority determination information storage unit 321.
- the priority determination information 3210 includes a set of an index and a weight (also referred to as priority).
- the index is a value that uniquely determines the weight.
- The weight corresponds to each of the indexes and is a number indicating the importance of the attribute. In FIG. 8, for example, the weight corresponding to the index “5” is “16”.
- The number of indexes is not limited to five, and may be any number of two or more. Further, the index is not limited to numerals, and may be written in alphabetic characters or the like, or may be the name of an attribute (hereinafter also referred to as an attribute name).
- the weight may be an arbitrary numerical value that can be used for calculation of an evaluation value described later.
- the information loss amount calculation unit 322 calculates and outputs the information loss amount ILA of each attribute in the data set sp110.
- the anonymization processing unit 323 includes a division attribute determination unit 3233, a division value determination unit 3234, an anonymity evaluation unit 3235, and a generalization execution unit 3236.
- The division attribute determination unit 3233 generates a weight for each attribute by using the priority determination information 3210 stored in the priority determination information storage unit 321, for example, based on the index of each attribute input from the input unit 704 illustrated in FIG. 5.
- The division attribute determination unit 3233 determines the attribute to serve as the division axis (also referred to as an attribute to be processed; hereinafter referred to as a division attribute) based on the generated weight and the information loss amount ILA.
- The division attribute is the attribute whose attribute values serve as the basis for division when a data set (for example, the data set sp110) is divided.
- dividing a data set means grouping data records included in the data set. That is, when dividing the data set (for example, the data set sp110), the division attribute determination unit 3233 performs the division based on the attribute value range of the division attribute.
- The range is, for example, the values larger than a certain value or the values smaller than it. Alternatively, the range may be a geographic region, a type of thing, or relevance to an event.
- the division value determination unit 3234 determines the division value of the division attribute so as to satisfy the necessary anonymity. For example, when the attribute value is indicated by a numerical value, the division value is a numerical value within a possible range of the attribute value.
- the range may be a set of identification information (for example, prefecture names) indicating the area when the attribute value is a geographical area.
- the range may be identification information (for example, what is performed outdoors) that classifies the type when the attribute value is a type of thing (for example, hobby).
- the range may be the presence or absence of relevance when the attribute value is relevance with an event.
- The anonymity evaluation unit 3235 determines whether each divided data set satisfies the required anonymity when a certain data set is divided. Specifically, for example, when a certain data set is to be divided into two groups, the anonymity evaluation unit 3235 determines whether the data set can be divided so that each of the two groups includes at least k data records rp111.
- Here, “k” is the “k” of k-anonymity (k-anonymization). The same applies to every subsequent k.
- the generalization execution unit 3236 generalizes (processes) the attribute value of the determined attribute based on the determined division value, and outputs it.
- the anonymization device 320 described above may be realized by the computer 700 shown in FIG. 5 similarly to the anonymization device 310 shown in FIG.
- FIG. 9A, FIG. 9B, FIG. 10A and FIG. 10B are sequence diagrams showing the operation of this embodiment.
- the division attribute determination unit 3233 receives, for example, an input of a division attribute determination request by a system user from the input unit 704 shown in FIG. 5 (step S801).
- the split attribute determination request includes, for example, k-anonymity k value “5” and attribute name and corresponding index “birth year: 4, medical year: 1”.
- the user who uses the anonymized data set specifies a larger index value for an attribute whose degree of generalization (processing) is desired to be suppressed.
- The division attribute determination unit 3233 stores, for example, in the storage unit 702 illustrated in FIG. 5, the value “5” of k included in the received division attribute determination request, together with the attribute names and corresponding indexes “birth year: 4, medical year: 1” (step S802).
- the division attribute determination unit 3233 uses the priority determination information 3210 to generate a weight based on the attribute name and the corresponding index “birth year: 4, medical year: 1” (step S803).
- the division attribute determination unit 3233 calculates the weight corresponding to the attribute “birth year” as “8” and the weight corresponding to the attribute “medical year” as “1”.
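The weight generation of step S803 can be sketched as a table lookup. This is a minimal illustration, not the implementation of the embodiment: the function name `generate_weights` and the dictionary layout are assumptions, and the table lists only the index and weight pairs quoted in this text (1 → 1, 4 → 8, 5 → 16), because the remaining rows of FIG. 8 are not reproduced here.

```python
# Hypothetical excerpt of the priority determination information 3210 (index -> weight).
# Only the pairs quoted in the text are listed; other rows of FIG. 8 are not reproduced.
PRIORITY_DETERMINATION_INFO = {1: 1, 4: 8, 5: 16}


def generate_weights(attribute_indexes, table=PRIORITY_DETERMINATION_INFO):
    """Map each attribute name to the weight registered for its index."""
    weights = {}
    for name, index in attribute_indexes.items():
        if index not in table:
            raise KeyError(f"no weight registered for index {index}")
        weights[name] = table[index]
    return weights


# Division attribute determination request as described for step S801.
request = {"k": 5, "indexes": {"birth year": 4, "medical year": 1}}
weights = generate_weights(request["indexes"])
print(weights)  # {'birth year': 8, 'medical year': 1}
```

An index that has no registered weight raises an error in this sketch; how the device handles such a case is not stated in the text.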
- the division attribute determination unit 3233 transmits an information loss amount ILA calculation request to the information loss amount calculation unit 322 (step S804).
- The information loss amount calculation unit 322 that has received the calculation request for the information loss amount ILA transmits a request for acquiring the data set sp110 (hereinafter also referred to as a personal data acquisition request) to the personal data storage device 100 (step S805).
- the information loss amount calculation unit 322 that has received the data set sp110 calculates the information loss amount ILA, and transmits the calculated information loss amount ILA to the division attribute determination unit 3233 (step S806).
- the information loss amount calculation unit 322 calculates the information loss amount ILR of one data record rp111 using, for example, the following formula 1.
- pta-max is the maximum attribute value after generalization.
- pta-min is the minimum attribute value after generalization.
- ptb-max is the maximum attribute value before generalization.
- ptb-min is the minimum attribute value before generalization.
- Because this embodiment performs anonymization using a top-down approach, it is assumed that the attribute values of the attributes to be anonymized in the data set sp110 are first generalized so that they all have the same value.
- FIG. 11 is a diagram showing the data set st120 when the attribute values of the attributes to be anonymized in the data set sp110 shown in FIG. 3 are generalized to the same value. That is, the data set st120 shown in FIG. 11 is a data set in which the data set sp110 is generalized to the maximum.
- pta-max is, for example, “1977” (the “1977” of “1943 to 1977”), which is the maximum attribute value of the attribute whose attribute name is “birth year” in the data set st120 shown in FIG.
- pta-min is, for example, “1943” (“1943” of “1943 to 1977”) that is the minimum value of the attribute whose attribute name is “birth year” in the data set st120.
- ptb-max is, for example, “1977” which is the maximum attribute value of the attribute whose attribute name is “birth year” in the data set sp110 shown in FIG.
- ptb-min is, for example, “1943” which is the minimum attribute value of the attribute whose name is “birth year” in the data set sp110.
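Formula 1 itself is not reproduced in this text. The sketch below shows one form that is consistent with the four definitions above: the information loss amount ILR of a record is taken as the width of the generalized range divided by the width of the original range, and the information loss amount ILA of an attribute is taken as the sum of ILR over all records. Both the ratio and the summation are assumptions, as are the function names. They were chosen because they give an ILA of 20 for 20 fully generalized records, which matches the value this description reports for the data set sp110.

```python
def information_loss_record(pta_min, pta_max, ptb_min, ptb_max):
    """ILR of one record: generalized range width over original range width (assumed form)."""
    if ptb_max == ptb_min:
        # Assumption: an attribute with a single original value loses nothing.
        return 0.0
    return (pta_max - pta_min) / (ptb_max - ptb_min)


def information_loss_attribute(generalized_ranges, ptb_min, ptb_max):
    """ILA of one attribute: sum of ILR over all records (assumed aggregation)."""
    return sum(
        information_loss_record(low, high, ptb_min, ptb_max)
        for low, high in generalized_ranges
    )


# 20 records whose "birth year" is generalized to "1943 to 1977", as in the data set st120.
fully_generalized = [(1943, 1977)] * 20
ila_birth = information_loss_attribute(fully_generalized, 1943, 1977)
print(ila_birth)  # 20.0
```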
- Alternatively, the information loss amount calculation unit 322 may calculate, as the information loss amount ILR of one data record rp111, the ratio between the number of attribute value types of the attribute after generalization and the number before generalization.
- Steps S803, S804, S805, and S806 may be performed in any order. That is, their order may be changed, or they may be performed simultaneously.
- the division attribute determination unit 3233 determines a division attribute (step S807).
- the division attribute determination unit 3233 calculates an evaluation value using an evaluation formula including the weight and the information loss amount ILA, and determines a division attribute.
- Formula 2 shown below is an example of an evaluation formula.
- Evaluation value = weight × information loss amount ILA (Formula 2)
- the evaluation value of the attribute whose attribute name is “birth year” in the data set sp110 is “160” because the weight is “8” and the information loss amount ILA-birth is “20”.
- the evaluation value of the attribute having the attribute name “medical care date” is “20” because the weight is “1” and the information loss amount ILA-mc is “20”.
- The division attribute determination unit 3233 determines the attribute having the largest calculated evaluation value as the division attribute. For example, in the case of the data set sp110, since the evaluation value of the attribute whose attribute name is “birth year” is larger than the evaluation value of the attribute whose attribute name is “medical care month”, the division attribute determination unit 3233 determines “birth year” as the division attribute.
- The formula for calculating the evaluation value is not limited to Formula 2. Any evaluation formula may be used as long as its result becomes larger as the priority p becomes higher (for example, as a value such as the “weight” in Formula 2, which is larger for a higher priority, becomes larger) and as the information loss amount ILA becomes larger.
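The selection of the division attribute in step S807 can be sketched as follows, using Formula 2 and the weights and information loss amounts quoted above. The function names are illustrative assumptions. The text does not say how ties are resolved, and this sketch simply keeps the first attribute with the largest value.

```python
def evaluation_value(weight, ila):
    """Formula 2: evaluation value = weight x information loss amount ILA."""
    return weight * ila


def choose_division_attribute(weights, ilas):
    """Return the attribute with the largest evaluation value, plus all scores."""
    scores = {name: evaluation_value(weights[name], ilas[name]) for name in weights}
    best = max(scores, key=scores.get)  # on a tie, the first attribute wins (assumption)
    return best, scores


weights = {"birth year": 8, "medical year": 1}
ilas = {"birth year": 20, "medical year": 20}
best, scores = choose_division_attribute(weights, ilas)
print(best, scores)  # birth year {'birth year': 160, 'medical year': 20}
```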
- the division attribute determination unit 3233 transmits a division value determination request to the division value determination unit 3234 (step S808).
- the division value determination request includes the “birth year” of the attribute name of the division attribute determined by the division attribute determination unit 3233.
- The division value determination unit 3234 that has received the division value determination request transmits a personal data acquisition request to the personal data storage device 100 (step S809).
- the division value determining unit 3234 that has received the data set sp110 determines a division value (step S810).
- the division value is a threshold value when dividing the data set with the specified attribute as the division axis. For example, the division value “birth year: 1956” indicates that the data set sp110 is divided into the data record rp111 whose attribute of “birth year” is “1956” or less and the data record rp111 exceeding “1956”.
- FIG. 12 is a diagram illustrating an example of the division value candidates 1101 to 1111 of the data set sp110.
- The division value determination unit 3234 arranges the data records rp111 of the data set sp110 in ascending order of the attribute value of the determined division attribute.
- the division value determination unit 3234 extracts division value candidates 1101 to 1111.
- The division value candidates 1101 to 1111 extracted by the division value determination unit 3234 are the candidate division values for which the number of data records rp111 is k or more in each of the first half (also called third personal data) and the second half (also called fourth personal data) of the divided data set sp110.
- For example, the first half includes five data records rp111 whose attribute value is 1951 or less, and the second half includes 15 data records rp111 whose attribute value is 1952 or more. In this case, each of the first half and the second half contains five or more data records.
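The extraction of division value candidates can be sketched as below. The sketch treats a candidate as a value v such that at least k records have an attribute value of v or less and at least k records have a larger value. The function name and the sample birth years are hypothetical and are not the contents of the data set sp110.

```python
from bisect import bisect_right


def division_value_candidates(values, k):
    """Return split values v where both {x <= v} and {x > v} hold at least k records."""
    ordered = sorted(values)
    candidates = []
    for v in sorted(set(ordered)):
        n_first = bisect_right(ordered, v)   # records with value <= v
        n_second = len(ordered) - n_first    # records with value > v
        if n_first >= k and n_second >= k:
            candidates.append(v)
    return candidates


# Hypothetical birth years (12 records), not the actual data set sp110.
birth_years = [1943, 1945, 1947, 1949, 1951, 1953, 1955, 1956, 1956, 1961, 1962, 1970]
candidates = division_value_candidates(birth_years, k=5)
print(candidates)  # [1951, 1953, 1955]
```

Splitting only at distinct attribute values keeps records with equal values in the same half, which a positional split of the sorted list would not guarantee.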
- the division value determination unit 3234 calculates an information loss amount ILA corresponding to each of the division value candidates 1101 to 1111. For example, the division value determining unit 3234 calculates the information loss amount ILA using Equation 1. Note that the division value determination unit 3234 may calculate the information loss amount ILA not only using the equation 1 but also using another calculation equation.
- the division value determination unit 3234 calculates the information loss amount ILA as follows.
- the division value determination unit 3234 sorts the data set sp110 in ascending order according to the attribute value of the division attribute “birth year”.
- The information loss amounts ILA of the attribute “birth year” when divided by the division value candidates 1101 to 1104 are “11.76”, “12.47”, “10.67”, and “10.23”, respectively.
- The information loss amounts ILA of the attribute “birth year” when divided by the division value candidates 1106 to 1111, calculated in the same manner, are “10.00”, “10.05”, “9.88”, “10.14”, “10.70”, and “10.73”, respectively.
- the division value determining unit 3234 that has calculated the information loss amount ILA for each of the division value candidates 1101 to 1111 determines “birth year: 1956” of the division value candidate 1105 having the smallest information loss amount ILA as the division value.
- the division value determination unit 3234 that has determined the division value transmits the determined division value “birth year: 1956” to the anonymity evaluation unit 3235 (step S811). In other words, the divided value determination unit 3234 requests the anonymity evaluation unit 3235 to evaluate anonymity.
- the anonymity evaluation unit 3235 that has received the division value “birth year: 1956” performs anonymity evaluation (step S812).
- Anonymity evaluation means evaluating whether or not an anonymity index is satisfied.
- The anonymity evaluation unit 3235 evaluates, for each of the first half (third personal data) and the second half (fourth personal data) of the data set sp110, whether the parts obtained by dividing it further would satisfy the anonymity index. That is, it evaluates whether the number of data records rp111 is 2k or more for each of the first half and the second half.
- The anonymity evaluation unit 3235 counts the number of data records rp111 in each of the first half and the second half divided by the received division value. For example, when dividing by the division value “birth year: 1956”, the anonymity evaluation unit 3235 counts the number of data records rp111 in the first half as nine and the number of data records rp111 in the second half as eleven.
- The anonymization processing unit 323 executes the processing up to step S815 for a portion that the anonymity evaluation unit 3235 has evaluated as not satisfying the anonymity index (for example, the first half divided by the division value “birth year: 1956”). The anonymization processing unit 323 executes the processing from step S821 onward for a portion that the anonymity evaluation unit 3235 has evaluated as satisfying the anonymity index (for example, the second half divided by the division value “birth year: 1956”).
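The anonymity evaluation of step S812 can be sketched as a count followed by the 2k test. The labels "generalize" and "subdivide" are shorthand introduced here for the two branches (processing up to step S815, and processing from step S821 onward). The birth years are hypothetical values chosen only so that the counts are nine and eleven, as in the example above.

```python
def evaluate_parts(values, division_value, k):
    """Count both parts of a split and decide whether each can be divided again."""
    first = [v for v in values if v <= division_value]
    second = [v for v in values if v > division_value]

    def next_action(part):
        # A part can be split again into two groups of at least k records
        # only if it holds 2k records or more.
        return "subdivide" if len(part) >= 2 * k else "generalize"

    return {
        "first": (len(first), next_action(first)),
        "second": (len(second), next_action(second)),
    }


# Hypothetical birth years: nine records up to 1956 and eleven records after it.
birth_years = [1943, 1945, 1947, 1949, 1951, 1953, 1954, 1955, 1956,
               1961, 1961, 1962, 1962, 1963, 1963, 1964, 1966, 1970, 1975, 1977]
result = evaluate_parts(birth_years, division_value=1956, k=5)
print(result)  # {'first': (9, 'generalize'), 'second': (11, 'subdivide')}
```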
- the anonymity evaluation unit 3235 transmits a generalization execution request including “birth year: 1943 to 1956” to the generalization execution unit 3236 (step S813).
- Upon receiving the generalization execution request, the generalization execution unit 3236 generalizes the data records rp111 whose “birth year” attribute value is from “1943” to “1956” (step S814).
- The generalization execution unit 3236 rewrites the attribute value of the “birth year” attribute to “1943 to 1956” in each data record rp111 whose “birth year” attribute value is from “1943” to “1956”.
- the attribute value of the attribute “medical treatment date” is rewritten to “200512 to 201107”.
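The rewriting performed in step S814 can be sketched as replacing each quasi-identifier value in a group with the range from the group minimum to the group maximum. The function name, the record layout, and the sample records (including the placeholder disease names) are hypothetical. Only the resulting range strings correspond to values quoted above.

```python
def generalize(records, attributes):
    """Return copies of the records with each listed attribute replaced by 'min to max'."""
    result = [dict(record) for record in records]
    for attribute in attributes:
        low = min(record[attribute] for record in records)
        high = max(record[attribute] for record in records)
        label = f"{low} to {high}" if low != high else str(low)
        for record in result:
            record[attribute] = label
    return result


# Hypothetical group of records. Year-month values are written as YYYYMM integers.
group = [
    {"birth year": 1943, "medical year": 200512, "disease": "A"},
    {"birth year": 1950, "medical year": 201001, "disease": "B"},
    {"birth year": 1956, "medical year": 201107, "disease": "C"},
]
generalized = generalize(group, ["birth year", "medical year"])
print(generalized[0])
# {'birth year': '1943 to 1956', 'medical year': '200512 to 201107', 'disease': 'A'}
```

The input records are copied rather than modified, so the original data set stays available for later information loss calculations.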
- the generalization execution unit 3236 records the rewritten data record rp111 in the anonymized personal data storage device 200 (step S815). In other words, the generalization execution unit 3236 registers the anonymized personal data in the anonymized personal data storage device 200.
- The number of data records rp111 in the second half divided by the division value “birth year: 1956” is 2k or more. Therefore, the anonymization device 320 sets the divided second half (fourth personal data) of the data set sp110 as a new data set sp (new first personal data), and performs the processing from step S821 onward (the second anonymization).
- the anonymity evaluation unit 3235 transmits a subdivision request including “birth year: 1961 to 1977” to the division attribute determination unit 3233 (step S821).
- the division attribute determination unit 3233 that has received the subdivision request uses the priority determination information 3210 to generate a weight based on the attribute name and the corresponding index “birth year: 4, medical year: 1”.
- the division attribute determination unit 3233 calculates the weight corresponding to the attribute “birth year” as “8” and the weight corresponding to the attribute “medical year” as “1”.
- the division attribute determination unit 3233 requests the information loss amount calculation unit 322 to calculate the information loss amount ILA (step S823).
- The information loss amount calculation unit 322 that has received the calculation request for the information loss amount ILA transmits, to the personal data storage device 100, an acquisition request for the second half of the data set sp110, that is, the data records rp111 whose “birth year” attribute value is from “1961” to “1977” (step S824).
- In other words, the information loss amount calculation unit 322 requests the personal data storage device 100 to provide the personal data.
- the information loss amount calculation unit 322 that has received the latter half of the data set sp110 calculates the information loss amount ILA, and transmits the calculated information loss amount ILA to the division attribute determination unit 3233 (step S825).
- the information loss amount calculation unit 322 calculates the information loss amount ILA-birth-ov1961 of the attribute “birth year” for the latter half of the data set sp110 as follows.
- The information loss amount calculation unit 322 calculates the information loss amount ILA-mc-ov1961 of the attribute “medical care date” as follows.
- The division attribute determination unit 3233 determines a division attribute (step S826). For example, using Formula 2, the division attribute determination unit 3233 calculates “41.448” as the evaluation value of the “birth year” attribute, because the weight of the “birth year” attribute is “8” and the information loss amount ILA-birth-ov1961 is “5.181”. Similarly, it calculates “9.152” as the evaluation value of the “medical care date” attribute, because the weight is “1” and the information loss amount ILA-mc-ov1961 is “9.152”.
- the division attribute determination unit 3233 determines the attribute whose attribute name is “birth year” as the division attribute.
- the division attribute determination unit 3233 transmits a division value determination request including the attribute name “birth year” to the division value determination unit 3234 (step S827).
- the division value determination unit 3234 that has received the division value determination request transmits a personal data acquisition request to the personal data storage device 100 (step S828).
- In other words, the division value determination unit 3234 requests acquisition of the data records rp111 that are the target of the second anonymization (for example, the second half of the data set sp110).
- the division value determining unit 3234 that has received the target data record rp111 determines a division value (step S829).
- FIG. 13 is a diagram illustrating an example of a divided value candidate 1121 and a divided value candidate 1122 of the data set sp130 (new first personal data) that is a divided second half of the data set sp110.
- the division value determination unit 3234 arranges the data records rp111 of the data set sp130 in the order of the attribute values of the attributes determined by the division attribute determination unit 3233.
- the division value determination unit 3234 extracts division value candidates.
- the division value determining unit 3234 extracts the division value candidate 1121 and the division value candidate 1122 as division value candidates.
- the division value determining unit 3234 calculates an information loss amount ILA-birth for each of the division value candidate 1121 and the division value candidate 1122.
- The division value determination unit 3234 calculates the information loss amounts ILA-birth for division at the division value candidate 1121 and at the division value candidate 1122 as “5.565” and “4.820”, respectively.
- The division value determination unit 3234 determines “birth year: 1963” of the division value candidate 1122, which has the smallest information loss amount ILA-birth, as the division value.
- the division value determination unit 3234 that has determined the division value transmits the determined division value “birth year: 1963” to the anonymity evaluation unit 3235 (step S830). In other words, the divided value determination unit 3234 requests the anonymity evaluation unit 3235 to evaluate anonymity.
- The anonymity evaluation unit 3235 that has received the division value “birth year: 1963” performs anonymity evaluation (step S831).
- the anonymity evaluation unit 3235 counts the number of each data record rp111 divided by the division value “birth year: 1963”.
- FIG. 14 is a diagram illustrating how the data set sp130 illustrated in FIG. 13 is divided when the division value determination unit 3234 determines the division value candidate 1122 “birth year: 1963” as the division value.
- The anonymity evaluation unit 3235 counts six data records rp111 in the data set sp140, which is the first half after the division, and five data records rp111 in the data set sp150, which is the second half after the division.
- the number of data records rp111 in each of the data set sp140 and the data set sp150 is less than 2k. Therefore, the anonymity evaluation unit 3235 transmits a generalization execution request including “birth year: 1961 to 1963” and a generalization execution request including “birth year: 1964 to 1977” to the generalization execution unit 3236 (step S813).
- Upon receiving the generalization execution requests, the generalization execution unit 3236 generalizes the data records rp111 whose “birth year” attribute value is from “1961” to “1963” and the data records rp111 whose “birth year” attribute value is from “1964” to “1977” (step S814). As shown in FIG. 14, in the data set sp140, the attribute value of the “birth year” attribute is generalized to “1961 to 1963” and the attribute value of the “medical year” attribute is generalized to “20062 to 201105”. In the data set sp150, the attribute value of the “birth year” attribute is generalized to “1964-1977” and the attribute value of the “medical care month” attribute is generalized to “200706-201104”.
- the generalization execution unit 3236 records the generalized data record rp111 in the anonymized personal data storage device 200 (step S815).
- The effect of the above-described embodiment is that the data set can be anonymized in a manner controlled to match the purpose of use while the loss of information in the anonymized data set is reduced.
- FIG. 15 is a block diagram showing a configuration of the anonymization device 330 according to the present embodiment.
- the anonymization apparatus 330 of this embodiment performs anonymization by a bottom-up approach.
- the anonymization device 330 includes a priority determination information storage unit 321, an information loss amount calculation unit 332, and an anonymization processing unit 333.
- the anonymization device 330 may be included in the system shown in FIG. 2 instead of the anonymization device 310.
- the priority determination information storage unit 321 stores information for determining the priority p.
- Information for determining the priority p is preset by the user of the system. Further, the information for determining the priority p may be received in advance from an external system by the division attribute determining unit 3233 via the communication unit 706 shown in FIG.
- FIG. 16 is a diagram illustrating an example of priority determination information 3310 stored in the priority determination information storage unit 321.
- the priority determination information 3310 includes one or more sets of priority order, attribute name, and threshold value.
- The priority order indicates, for example, the order in which the attribute specified by the corresponding attribute name is generalized. The threshold indicates, for example, the value above which the lower priority attribute is generalized first: when the value obtained by subtracting the information loss amount ILA of the higher priority attribute from the information loss amount ILA of the lower priority attribute exceeds the threshold, the lower priority attribute is generalized first.
- the priority indicates that the smaller the number, the higher the priority. That is, in FIG. 16, the higher priority attribute is an attribute whose attribute name is “age”, and the lower priority attribute is an attribute whose attribute name is “2011 medical care month”.
- the priority determination information may include a set of priority and attribute name.
- the anonymization processing unit 333 may hold the threshold value in an internal storage unit (not shown), for example.
- the information loss amount calculation unit 332 calculates and outputs an information loss amount ILA due to generalization. For example, the information loss amount calculation unit 332 counts the number of different attribute values included in the attribute of the personal data, and sets this as the information loss amount ILA.
- the anonymization processing unit 333 includes a generalization attribute determination unit 3333, a generalization execution unit 3336, and an anonymity evaluation unit 3335.
- the generalization attribute determination unit 3333 determines an attribute to be generalized. For example, the generalization attribute determination unit 3333 determines an attribute to be generalized as follows. First, the generalized attribute determination unit 3333 calculates the information loss amount difference by subtracting the information loss amount ILA of the attribute having the higher priority from the information loss amount ILA of the attribute having the lower priority. Next, the generalization attribute determination unit 3333 compares the information loss amount difference with the threshold value of the attribute with the higher priority. Then, when the information loss amount difference is equal to or greater than the threshold, the generalization attribute determination unit 3333 determines to generalize the lower priority attribute. In addition, when the information loss amount difference is less than the threshold, the generalization attribute determination unit 3333 determines to generalize the attribute with the higher priority.
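The decision rule of the generalization attribute determination unit 3333 can be sketched as follows. The function name is an assumption, and the threshold of 3 is a hypothetical value because the contents of FIG. 16 are not reproduced in this text. The information loss amounts 12 and 10 are the values this description reports for the “age” and “2011 medical care month” attributes of the data set sp160.

```python
def choose_generalization_attribute(higher, lower, ila, threshold):
    """Pick the attribute to generalize from a higher and a lower priority attribute."""
    # Information loss amount difference: lower priority ILA minus higher priority ILA.
    difference = ila[lower] - ila[higher]
    # Generalize the lower priority attribute only when the difference reaches the threshold.
    return lower if difference >= threshold else higher


ila = {"age": 12, "2011 medical care month": 10}
chosen = choose_generalization_attribute(
    higher="age",
    lower="2011 medical care month",
    ila=ila,
    threshold=3,  # hypothetical threshold for the higher priority attribute
)
print(chosen)  # age
```

With these values the difference is -2, which is below the threshold, so the higher priority attribute “age” is selected.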
- the generalization execution unit 3336 generalizes the attribute determined by the generalization attribute determination unit 3333.
- the anonymity evaluation unit 3335 determines whether the data set generalized by the generalization execution unit 3336 satisfies the anonymity index.
- FIG. 17 is a diagram showing an example of a data set sp160 stored in the personal data storage device 100 of the present embodiment.
- Each of the data records rp161 of the data set sp160 illustrated in FIG. 17 includes attribute values of attributes of “name”, “age”, “2011 medical care month”, and “disease name”.
- “age” and “2011 medical care month” are set as the attributes to be anonymized (quasi-identifiers).
- the anonymization device 330 described above may be realized by the computer 700 shown in FIG. 5 similarly to the anonymization device 310 shown in FIG.
- FIGS. 18A, 18B, and 18C are sequence diagrams illustrating the operation of the anonymization device 330 according to the present embodiment.
- the generalization attribute determination unit 3333 receives an input of an anonymization execution request by a system user from the input unit 704 shown in FIG. 5, for example (step S841).
- the anonymization execution request includes, for example, the value of k-anonymization k (for example, “3”).
- the generalization attribute determination unit 3333 that has received the anonymization execution request transmits a priority determination information acquisition request to the priority determination information storage unit 321 (step S842).
- The generalization attribute determination unit 3333 that has received the priority determination information 3310 as a response to the priority determination information acquisition request transmits an information loss amount calculation request to the information loss amount calculation unit 332 (step S843).
- The information loss amount calculation unit 332 that has received the information loss amount calculation request transmits a personal data acquisition request to the personal data storage device 100 (step S844).
- The information loss amount calculation unit 332 that has received the data set sp160 as a response to the personal data acquisition request calculates the information loss amount ILA, and transmits the calculated information loss amount ILA to the generalization attribute determination unit 3333 (step S845).
- the information loss amount calculation unit 332 calculates the information loss amount ILA by the number of types of attribute values. That is, the information loss amount calculation unit 332 calculates the information loss amount ILAbirth of the “age” attribute as “12” because the attribute value of the “age” attribute has 12 types. Further, the information loss amount calculation unit 332 calculates the information loss amount ILAmc2011 of the attribute of “2011 medical care month” as “10” because there are ten types of attribute values of the attribute of “2011 medical care month”.
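Counting the types of attribute values, as the information loss amount calculation unit 332 does here, can be sketched with a set. The function name and the sample records are hypothetical and do not reproduce the data set sp160.

```python
def information_loss_by_value_types(records, attribute):
    """ILA in the bottom-up embodiment: the number of distinct values of the attribute."""
    return len({record[attribute] for record in records})


# Hypothetical records, not the contents of FIG. 17.
records = [
    {"age": 21, "2011 medical care month": 3},
    {"age": 24, "2011 medical care month": 3},
    {"age": 31, "2011 medical care month": 5},
    {"age": 31, "2011 medical care month": 7},
]
ila = {
    name: information_loss_by_value_types(records, name)
    for name in ("age", "2011 medical care month")
}
print(ila)  # {'age': 3, '2011 medical care month': 3}
```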
- the generalization attribute determination unit 3333 that has received the information loss amount ILA determines an attribute to be generalized (step S846).
- the generalization attribute determination unit 3333 uses the received priority determination information 3310 to determine the attribute to be generalized based on the received information loss amount ILA.
- generalization attribute determination unit 3333 may determine the attribute to be generalized using the method described in the first embodiment.
- The generalization attribute determination unit 3333 transmits a generalization execution request including the attribute name of the attribute determined to be generalized (in this case, “age”) to the generalization execution unit 3336 (step S847).
- The generalization execution unit 3336, having received the generalization execution request, generalizes the data set sp160 shown in FIG. 17 into the data set sp162 shown in FIG. 19 (step S848).
- FIG. 19 is a diagram illustrating an example of a data set in the middle of anonymization processing (partially generalized) by the anonymization device 330 of the present embodiment.
- The generalization execution unit 3336 transmits an anonymity evaluation request including the data set sp162 to the anonymity evaluation unit 3335 (step S849).
- The generalization execution unit 3336 may instead store the data set sp162 in the storage unit 702 illustrated in FIG. 5 and transmit an anonymity evaluation request including the address at which it is stored to the anonymity evaluation unit 3335. The same applies to the anonymity evaluation requests below.
- The anonymity evaluation unit 3335, having received the anonymity evaluation request, evaluates the anonymity of the data set sp162. In the case of the data set sp162 of FIG. 19, the anonymity evaluation unit 3335 determines that the k value of k-anonymity (“3”) is not satisfied for the attribute “medical care month” (step S850).
- The anonymity evaluation unit 3335 transmits a generalization attribute determination request to the generalization attribute determination unit 3333 (step S851).
- The generalization attribute determination unit 3333, having received the generalization attribute determination request, transmits an information loss amount calculation request to the information loss amount calculation unit 332 (step S852).
- The information loss amount calculation unit 332, having received the information loss amount calculation request, calculates the information loss amount ILA and transmits it to the generalization attribute determination unit 3333 (step S853).
- The “age” attribute has four types of attribute values: “21 to 24”, “31 to 40”, “41 to 51”, and “52 to 58”.
- The information loss amount calculation unit 332 calculates the information loss amount ILA-birth and the information loss amount ILA-mc2011, corresponding to the “age” and “2011 medical care month” attributes, as “4” and “10”, respectively.
- The generalization attribute determination unit 3333, having received the information loss amount ILA, determines the attribute to be generalized (step S854).
- The information loss amount ILA-birth of the “age” attribute, which has the priority “1”, is “4”.
- The information loss amount ILA-mc2011 of the “2011 medical care month” attribute, which has the priority “2”, is “10”. The difference in information loss amount is therefore 10 - 4 = 6.
- The generalization attribute determination unit 3333 compares the difference in information loss amount (“6”) with the threshold value (“3”) of “age”, the attribute having the priority “1”. In this case, since 6 > 3, the generalization attribute determination unit 3333 determines to generalize “2011 medical care month”, the attribute having the priority “2”.
- The generalization attribute determination unit 3333 transmits a generalization execution request including the attribute name of the attribute determined to be generalized (in this case, “2011 medical care month”) to the generalization execution unit 3336 (step S855).
- The generalization execution unit 3336, having received the generalization execution request, generalizes the data set sp162 shown in FIG. 19 into the data set sp163 shown in FIG. 20 (step S856).
- FIG. 20 is a diagram illustrating an example of a data set in the middle of anonymization processing (partially generalized) by the anonymization device 330 of the present embodiment.
- The generalization execution unit 3336 transmits an anonymity evaluation request including the data set sp163 to the anonymity evaluation unit 3335 (step S857).
- The anonymity evaluation unit 3335, having received the anonymity evaluation request, evaluates the anonymity of the data set sp163.
- The anonymity evaluation unit 3335 determines that the k value of k-anonymity (“3”) is not satisfied for the combination of the “medical care month” attribute and the “2011 medical care month” attribute (step S858).
- The anonymity evaluation unit 3335 transmits a generalization attribute determination request to the generalization attribute determination unit 3333 (step S859).
- The generalization attribute determination unit 3333, having received the generalization attribute determination request, transmits an information loss amount calculation request to the information loss amount calculation unit 332 (step S860).
- The information loss amount calculation unit 332, having received the information loss amount calculation request, calculates the information loss amount ILA and transmits it to the generalization attribute determination unit 3333 (step S861).
- The information loss amount calculation unit 332 calculates both the information loss amount ILA-birth and the information loss amount ILA-mc2011, corresponding to the “age” and “2011 medical care month” attributes, as “4”.
- The generalization attribute determination unit 3333, having received the information loss amount ILA, determines the attribute to be generalized (step S862).
- The information loss amount ILA-birth of the “age” attribute, which has the priority “1”, is “4”.
- The information loss amount ILA-mc2011 of the “2011 medical care month” attribute, which has the priority “2”, is “4”. The difference in information loss amount is therefore 4 - 4 = 0.
- The generalization attribute determination unit 3333 compares the difference in information loss amount (“0”) with the threshold value (“3”) of “age”, the attribute having the priority “1”. In this case, since 0 ≤ 3, the generalization attribute determination unit 3333 determines to generalize “age”, the attribute having the priority “1”.
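The two decisions walked through above (6 > 3 and 0 ≤ 3) can be sketched as a single comparison. This is a hedged illustration only: the function name `choose_attribute` and its data layout are hypothetical, it handles exactly two attributes, and it assumes the difference is the priority-2 amount minus the priority-1 amount, which matches the two worked cases but is not stated as a general formula in this passage.

```python
from typing import Dict, List


def choose_attribute(ila: Dict[str, float], priority: List[str], thresholds: Dict[str, float]) -> str:
    """Pick the attribute to generalize from two attributes listed highest priority first.

    Assumption: the difference is ILA(priority 2) - ILA(priority 1), compared with
    the threshold attached to the priority-1 attribute.
    """
    first, second = priority
    difference = ila[second] - ila[first]
    if difference > thresholds[first]:
        return second  # the lower-priority attribute still holds far more value types
    return first


priority = ["age", "2011 medical care month"]
thresholds = {"age": 3}
first_round = choose_attribute({"age": 4, "2011 medical care month": 10}, priority, thresholds)
second_round = choose_attribute({"age": 4, "2011 medical care month": 4}, priority, thresholds)
print(first_round, "/", second_round)
```

With the figures from the text, the first call selects “2011 medical care month” and the second selects “age”, mirroring steps S854 and S862.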
- The generalization attribute determination unit 3333 transmits a generalization execution request including the attribute name of the attribute determined to be generalized (in this case, “age”) to the generalization execution unit 3336 (step S863).
- The generalization execution unit 3336, having received the generalization execution request, generalizes the data set sp163 shown in FIG. 20 into the data set sp164 shown in FIG. 21 (step S864).
- FIG. 21 is a diagram illustrating an example of a data set that has been anonymized by the anonymization device 330 of the present embodiment.
- The generalization execution unit 3336 transmits an anonymity evaluation request including the data set sp164 to the anonymity evaluation unit 3335 (step S865).
- The anonymity evaluation unit 3335, having received the anonymity evaluation request, evaluates the anonymity of the data set sp164.
- The anonymity evaluation unit 3335 determines that the data set sp164 satisfies k-anonymity (step S866).
- The anonymity evaluation unit 3335 transmits the data set sp164, which satisfies the anonymity, to the anonymized personal data storage device 200 (step S867).
- The anonymized personal data storage unit 2a, having received the data set sp164, stores it as an anonymized data set st120 (anonymized personal data).
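The overall flow of steps S846 to S867 (choose an attribute, generalize it, evaluate anonymity, repeat until the evaluation passes) can be sketched as a loop. Everything named here is hypothetical: `anonymize`, the three callbacks, the `max_rounds` guard, and the toy decade-based generalization are assumptions for illustration and are not taken from the patent.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, int]


def anonymize(
    records: List[Record],
    is_anonymous: Callable[[List[Record]], bool],
    choose_attribute: Callable[[List[Record]], str],
    generalize: Callable[[List[Record], str], List[Record]],
    max_rounds: int = 100,
) -> List[Record]:
    """Repeat: choose an attribute, generalize it, then evaluate anonymity."""
    for _ in range(max_rounds):
        attribute = choose_attribute(records)
        records = generalize(records, attribute)
        if is_anonymous(records):
            return records
    raise RuntimeError("anonymity was not reached within max_rounds")


def toy_is_anonymous(records: List[Record]) -> bool:
    # Toy check: every "age" value must be shared by at least 3 records.
    counts = Counter(record["age"] for record in records)
    return all(count >= 3 for count in counts.values())


def toy_generalize(records: List[Record], attribute: str) -> List[Record]:
    # Toy generalization: replace a number with the start of its decade.
    return [{**record, attribute: (record[attribute] // 10) * 10} for record in records]


toy_result = anonymize(
    [{"age": 21}, {"age": 24}, {"age": 28}],
    toy_is_anonymous,
    lambda records: "age",
    toy_generalize,
)
print(toy_result)
```

Passing the three roles in as callbacks loosely mirrors the separation between the generalization attribute determination unit, the generalization execution unit, and the anonymity evaluation unit, without implying how those units are actually implemented.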
- The effect of the above-described embodiment is that anonymizing the data set in a way that matches the purpose of use can be made compatible with reducing the loss of information in the anonymized data set.
- The reason is that the generalization attribute determination unit 3333 generates an evaluation value based on the priority order, the threshold value, and the information loss amount ILA, and determines the attribute to be generalized based on the generated evaluation value.
- Each component described in each of the above embodiments does not necessarily need to be an independent entity.
- A plurality of components may be realized as a single module.
- Each component may be realized by a plurality of modules.
- Each component may be configured such that a certain component is a part of another component.
- Each component may be configured such that a part of a certain component overlaps a part of another component.
- Each component and a module that realizes each component may be realized by hardware if necessary. Moreover, each component and the module which implement
- The program is provided by being recorded on a non-volatile computer-readable recording medium such as a magnetic disk or a semiconductor memory, and is read by the computer when the computer is started up.
- The read program causes the computer to function as a component in each of the above-described embodiments by controlling the operation of the computer.
- A plurality of operations are not limited to being executed at different timings. For example, another operation may occur during the execution of a certain operation, or the execution timings of a certain operation and another operation may partially or entirely overlap.
- In each of the embodiments described above, a certain operation is described as triggering another operation, but that description does not limit all relationships between the certain operation and other operations. For this reason, when each embodiment is implemented, the relationships among the plurality of operations can be changed within a range that does not impair the content.
- The specific description of each operation of each component does not limit each operation of each component. For this reason, the specific operation of each component may be changed within a range that does not impair the functional, performance, and other characteristics when each embodiment is implemented.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Bioethics (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- Economics (AREA)
- Primary Health Care (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
This invention provides an information processing device for anonymizing a dataset so as to conform to a utilization purpose. The information processing device calculates the data loss amount corresponding to each attribute included in a first piece of personal data to be anonymized, decides on the attribute to be manipulated on the basis of priorities corresponding to the respective attributes and the data loss amount, and generates and outputs a second piece of personal data in which the attribute value of the decided attribute of the first piece of personal data has been manipulated.
Description
The present invention relates to an information processing apparatus, an anonymization processing method, and a program for anonymizing personal data.
As medical information is increasingly digitized, it is being accumulated at medical institutions, health insurance associations, and the like. Wide use of this medical information is expected to advance epidemiological research, medical technology, and new drug development. It is therefore hoped that the accumulated medical information can be made available to research institutions and the like while its privacy is ensured.
Anonymization is one technique for ensuring privacy when information is used. Anonymization processes data containing information that a person does not want others to know, such as the medical information described above, so that individuals cannot be identified. Hereinafter, a collection of such data to be processed is called a data set. The block of data within a data set that corresponds to one individual is called a personal data record. The smallest units of information that make up a personal data record, such as the individual's age and the name of a disease the individual has contracted, are called attributes.
Non-Patent Document 1 discloses k-anonymization, one of the representative anonymization techniques. k-anonymization processes each personal data record in a data set so that the probability of identifying an individual is 1/k or less (where k is the “k” of k-anonymization), thereby guaranteeing a certain level of anonymity. Processing in k-anonymization is, for example, making the value of a specific attribute ambiguous (also called generalization) so that the value is the same across a plurality of personal data records in the data set.
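As a rough illustration of the 1/k guarantee, the check below groups records by their quasi-identifier values and requires every group to contain at least k records. It is a sketch of the standard textbook reading of k-anonymity, not code from the cited documents; the function name, attribute keys, and toy records are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def satisfies_k_anonymity(records: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every combination of quasi-identifier values is shared by at least k records."""
    group_sizes = Counter(tuple(record[name] for name in quasi_identifiers) for record in records)
    return all(size >= k for size in group_sizes.values())


# Toy records with an already generalized birth year (made up for illustration).
toy_records = [
    {"birth_year": "1943-1956", "disease_name": "DiseaseA"},
    {"birth_year": "1943-1956", "disease_name": "DiseaseB"},
    {"birth_year": "1961-1977", "disease_name": "DiseaseA"},
]
print(satisfies_k_anonymity(toy_records, ["birth_year"], 2))
```

Here the group “1961-1977” holds a single record, so the toy data would not satisfy k = 2 until that value is generalized further or merged with another group.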
k-anonymization by generalization can take a top-down approach or a bottom-up approach. The top-down approach makes the attribute values in personal data records that are in the most generalized state progressively more specific, to the extent that k-anonymity is not broken. The bottom-up approach generalizes the original values of unprocessed personal data records until k-anonymity is ensured.
Non-Patent Document 2 shows one representative top-down method. To satisfy k-anonymity in a data set, the method anonymizes the personal data records of that data set by processing them as follows.
In the top-down approach, the initial state is one in which, across all personal data records in the data set to be anonymized, the values of every attribute to be anonymized are generalized to the same value for each attribute.
The first step selects one attribute from among the attributes to be anonymized.
The second step obtains the median of the values of the attribute selected in the first step across all the personal data records.
The third step divides the personal data records into two groups according to their values of that attribute, using the obtained median as the boundary.
The processing from the first step to the third step is repeated, and it ends when the number of personal data records in each group no longer satisfies k (the “k” of k-anonymization or k-anonymity; the same applies hereinafter). Grouping attribute values in this way, starting from the most generalized initial state and using a certain value as a boundary, is called division. The groups output as the result are the groups immediately before the number of personal data records in each group stops satisfying k.
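The three steps and the stopping rule can be sketched as a recursive median split. This is only an illustration of the idea, not the method of Non-Patent Document 2: the function name, the choice to try attributes in list order, the use of the low median as the boundary, and the toy birth years are all assumptions.

```python
from statistics import median_low
from typing import Dict, List

Record = Dict[str, int]


def split_top_down(records: List[Record], attributes: List[str], k: int) -> List[List[Record]]:
    """Split a non-empty record list at the median while both halves keep at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    for attribute in attributes:  # step 1: select an attribute (here: simply in list order)
        boundary = median_low([record[attribute] for record in records])  # step 2: median
        lower = [record for record in records if record[attribute] <= boundary]  # step 3: divide
        upper = [record for record in records if record[attribute] > boundary]
        if len(lower) >= k and len(upper) >= k:
            return split_top_down(lower, attributes, k) + split_top_down(upper, attributes, k)
    # No split keeps both halves at k or more records: output the group as it is.
    return [records]


# Toy birth years (made up for illustration).
toy_records = [{"birth_year": year} for year in (1943, 1950, 1956, 1961, 1970, 1977)]
toy_groups = split_top_down(toy_records, ["birth_year"], 2)
print([[record["birth_year"] for record in group] for group in toy_groups])
```

Returning the group unchanged when no split is possible corresponds to outputting the groups from just before the record count would fall below k.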
Non-Patent Document 3 shows one representative bottom-up method. The method anonymizes a data set by generalizing the value of an attribute from its original value until the personal data satisfies k-anonymity.
Patent Document 1 discloses a data anonymization device that incorporates k-anonymization. The device generates a complete graph in which all personal data records in a data set are connected by edges, divides the complete graph into clusters, and generalizes attributes cluster by cluster. In this way, the data anonymization device achieves k-anonymization by a top-down approach.
However, the techniques described in the above patent and non-patent documents have the problem that a data set cannot be anonymized in a way that matches its purpose of use.
The reason is that, in the anonymization techniques disclosed in the above patent and non-patent documents, the attributes to be processed are selected and processed in an order unrelated to the purpose of use, without that purpose being considered.
[Object of the invention]
An object of the present invention is to provide an information processing apparatus, an anonymization processing method, and a program that solve the problem described above.
The information processing apparatus of the present invention includes: information loss amount calculation means for calculating and outputting an information loss amount corresponding to each attribute included in first personal data to be anonymized; and anonymization processing means for determining the attribute to be processed based on a priority corresponding to each attribute and on the information loss amount, and for generating and outputting second personal data in which the attribute value of the determined attribute in the first personal data has been processed.
In the anonymization processing method of the present invention, a computer calculates and outputs an information loss amount corresponding to each attribute included in first personal data to be anonymized, determines the attribute to be processed based on a priority corresponding to each attribute and on the information loss amount, and generates and outputs second personal data in which the attribute value of the determined attribute in the first personal data has been processed.
The non-volatile recording medium of the present invention records a program that causes a computer to execute: a process of calculating and outputting an information loss amount corresponding to each attribute included in first personal data to be anonymized; a process of determining the attribute to be processed based on a priority corresponding to each attribute and on the information loss amount; and a process of generating and outputting second personal data in which the attribute value of the determined attribute in the first personal data has been processed.
The present invention has the effect that a data set can be anonymized in a way that matches its purpose of use.
Embodiments of the present invention will be described in detail with reference to the drawings. In the drawings and in the embodiments described in the specification, components having the same function are given the same reference numerals.
<<<First embodiment>>>
FIG. 1 is a block diagram showing the configuration of an anonymization device (also called an information processing device) 310 according to the first embodiment of the present invention.
As shown in FIG. 1, the anonymization device 310 of this embodiment includes an information loss amount calculation unit 312 and an anonymization processing unit 313. The components shown in FIG. 1 may be hardware units or functional units of a computer device. Here, they are described as functional units of a computer device.
FIG. 2 is a block diagram showing the configuration of a system including the anonymization device 310 according to the first embodiment of the present invention.
As shown in FIG. 2, the system includes a personal data storage device 100, an anonymized personal data storage device 200, and the anonymization device 310.
The personal data storage device 100 stores a data set (hereinafter, data set sp), which is the personal data to be anonymized (first personal data). The data set sp is a set of data records (hereinafter, data records rp). Each data record rp contains the attribute values of a plurality of attributes corresponding to a specific individual.
The personal data is, for example, medical information held by a medical institution or the like. In this case, the attribute values in a data record rp are values of attributes such as name, birth year, medical care date, and disease name.
FIG. 3 is a diagram illustrating an example of a data set sp110, which is personal data stored in the personal data storage device 100. The data set sp110 includes a plurality of data records rp111.
Each data record rp111 contains the attribute values of the “name”, “birth year”, “medical care date”, and “disease name” attributes. Here, the “name” attribute is an identifier. The “birth year” and “medical care date” attributes are quasi-identifiers that may identify an individual when combined. The “disease name” attribute is sensitive information that the individual does not want others to know. The designation of these attributes as quasi-identifiers or sensitive information is only an example. That is, the anonymization device 310 may treat any of the attributes in the data set sp110 as a quasi-identifier or as sensitive information.
As shown in FIG. 3, for example, the data record rp111 whose name attribute is “patientA” contains the attribute values “1949” for the birth year attribute, “201006” for the medical care date attribute, and “DiseaseA” for the disease name attribute. The personal data set sp110 shown in FIG. 3 is an example; a data record rp111 may contain attribute values of any attributes as quasi-identifiers and as sensitive information.
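One way to picture the roles described above is a mapping from attribute name to role, kept separate from the records themselves so that the designation can be changed freely. The sketch below is purely illustrative: the dictionary layout, the key names, and the helper `attributes_with_role` are assumptions, and only the sample values for “patientA” come from the text.

```python
from typing import Dict, List

# Hypothetical role assignment; the text notes that this designation is arbitrary.
ATTRIBUTE_ROLES: Dict[str, str] = {
    "name": "identifier",
    "birth_year": "quasi_identifier",
    "medical_care_date": "quasi_identifier",
    "disease_name": "sensitive",
}


def attributes_with_role(roles: Dict[str, str], role: str) -> List[str]:
    """List the attribute names assigned to the given role, in declaration order."""
    return [name for name, assigned in roles.items() if assigned == role]


# The example record quoted in the text for FIG. 3.
record = {
    "name": "patientA",
    "birth_year": 1949,
    "medical_care_date": 201006,
    "disease_name": "DiseaseA",
}
quasi_identifiers = attributes_with_role(ATTRIBUTE_ROLES, "quasi_identifier")
print(quasi_identifiers, [record[name] for name in quasi_identifiers])
```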
The anonymized personal data storage device 200 stores an anonymized data set (hereinafter, anonymized data set sa), which is anonymized personal data (second personal data). The anonymized data set sa is a set of anonymized data records (hereinafter, anonymized data records ra) obtained by anonymizing the data records rp111.
FIG. 4 is a diagram illustrating an example of an anonymized data set sa210, which is anonymized personal data stored in the anonymized personal data storage device 200. The anonymized data set sa210 is the data set obtained after the anonymization device 310 has processed (for example, generalized) the attribute values of the data set sp110 to anonymize it. That is, the anonymized data set sa210 contains, in place of the data records rp111, the anonymized data records ra211 obtained by processing them.
In the anonymized data set sa210 shown in FIG. 4, “birth year” and “medical care date” have been processed (generalized), so the attribute values of the quasi-identifiers are more ambiguous than in the data set sp110.
The information loss amount calculation unit 312 calculates and outputs an information loss amount (hereinafter, information loss amount ILA) corresponding to each attribute in the data set sp110.
Here, the information loss amount ILA is the amount by which the abstraction of an attribute's information (hereinafter, information abstraction ia) increases when any attribute in the data set sp110 is processed. The information abstraction ia is the abstraction of the attribute's information, that is, of the attribute values of that attribute.
The information loss amount calculation unit 312 may calculate the information loss amount ILA using any of the various methods described below, as necessary.
For example, as a first method, the information loss amount calculation unit 312 calculates, for a given attribute, the information loss amount of one data record (hereinafter, information loss amount ILR) by dividing the range of the attribute's values after generalization by the range of the same attribute's values before generalization. The information loss amount calculation unit 312 then adds up the information loss amount ILR over the number of data records to obtain the information loss amount ILA.
Specifically, when the data set sp110 shown in FIG. 3 is assumed to be generalized to the maximum extent, the information loss amount calculation unit 312 calculates the information loss amount ILA of each attribute to be anonymized as follows.
In this case, the range of attribute values of each attribute to be anonymized is the same before and after generalization. The information loss amount calculation unit 312 therefore always calculates the information loss amount ILR for one attribute value as “1”.
Next, the information loss amount calculation unit 312 adds up the information loss amount ILR for the number of data records rp111 (20) and calculates “20” as the information loss amount ILA. In this way, the information loss amount calculation unit 312 calculates “20” as the information loss amount ILA for every attribute of the data set sp110.
The information loss amount calculation unit 312 also calculates the information loss amount ILA of each attribute to be anonymized as follows when the data set sp110 shown in FIG. 3 is assumed to be divided into the data records rp111 whose “birth year” attribute value is “1956” or less and those whose value is “1961” or more, and then generalized.
In this case, the range of the “birth year” attribute values before generalization has a minimum of “1943” and a maximum of “1977”. After generalization, the range of the “birth year” attribute values for the data records rp111 whose “birth year” attribute value is “1956” or less has a minimum of “1943” and a maximum of “1956”.
Therefore, the information loss amount calculation unit 312 calculates the information loss amount ILR-birth-ul1956 of the “birth year” attribute, for the data records whose “birth year” attribute value is “1956” or less, as follows.
(1956 - 1943) ÷ (1977 - 1943) = 0.382
Next, the information loss amount calculation unit 312 adds up the information loss amount ILR-birth-ul1956 for the number (9) of data records rp111 whose “birth year” attribute value is “1956” or less, and calculates “3.438” as the information loss amount ILA-birth-ul1956 of those data records.
After generalization, for the data records rp111 whose “birth year” attribute value is “1961” or more, the “birth year” attribute values range from a minimum of “1961” to a maximum of “1977”. Accordingly, the information loss amount calculation unit 312 calculates the information loss amount ILR-birth-ov1961 of the “birth year” attribute, for the data records whose “birth year” attribute value is “1961” or more, as “(1977-1961) ÷ (1977-1943) = 0.471”.
Next, the information loss amount calculation unit 312 adds up the information loss amount ILR-birth-ov1961 for the number (11) of data records rp111 whose “birth year” attribute value is “1961” or more, and calculates “5.181” as the information loss amount ILA-birth-ov1961 of those data records.
Next, the information loss amount calculation unit 312 adds the information loss amount ILA-birth-ul1956 and the information loss amount ILA-birth-ov1961, and calculates “8.619” as the information loss amount ILA-birth of the “birth year” attribute.
Similarly, the attribute values of the “medical care date” attribute before generalization range from a minimum of “200512” to a maximum of “201107”. After generalization, for the data records rp111 whose “birth year” attribute value is “1956” or less, the “medical care date” attribute values range from a minimum of “200512” to a maximum of “201107”. Accordingly, the information loss amount calculation unit 312 calculates the information loss amount ILR-mc-ul1956 of the “medical care date” attribute, for the data records rp111 whose “birth year” attribute value is “1956” or less, as “1”.
Next, the information loss amount calculation unit 312 adds up the information loss amount ILR-mc-ul1956 for the number (nine) of data records rp111 whose “birth year” attribute value is “1956” or less, and calculates “9” as the information loss amount ILA-mc-ul1956 of the “medical care date” attribute of those data records.
After generalization, for the data records rp111 whose “birth year” attribute value is “1961” or more, the “medical care date” attribute values range from a minimum of “200612” to a maximum of “201107”. Accordingly, the information loss amount calculation unit 312 calculates the information loss amount ILR-mc-ov1961 of the “medical care date” attribute, for the data records rp111 whose “birth year” attribute value is “1961” or more, as “0.832”.
Next, the information loss amount calculation unit 312 adds up the information loss amount ILR-mc-ov1961 for the number (11) of data records rp111 whose “birth year” attribute value is “1961” or more, and calculates “9.152” as the information loss amount ILA-mc-ov1961 of the “medical care date” attribute of those data records.
Next, the information loss amount calculation unit 312 adds the information loss amount ILA-mc-ul1956 and the information loss amount ILA-mc-ov1961, and calculates “18.152” as the information loss amount ILA-mc of the “medical care date” attribute.
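
The arithmetic of this worked example can be checked with a short sketch. It assumes that each per-record loss ILR is rounded to three decimal places before being multiplied by the record count, since that is what reproduces the quoted figures, and that the “medical care date” values are treated as plain integers. The function names are illustrative and do not come from the text.

```python
def record_loss(lo_after, hi_after, lo_before, hi_before, digits=3):
    """Per-record loss ILR: value range after generalization / range before."""
    return round((hi_after - lo_after) / (hi_before - lo_before), digits)


def attribute_loss(groups, lo_before, hi_before, digits=3):
    """Attribute loss ILA: ILR of each group times its record count, summed."""
    total = sum(record_loss(lo, hi, lo_before, hi_before, digits) * count
                for lo, hi, count in groups)
    return round(total, digits)


# (minimum, maximum, record count) of each group after the split on birth year
birth_groups = [(1943, 1956, 9), (1961, 1977, 11)]
mc_groups = [(200512, 201107, 9), (200612, 201107, 11)]

ila_birth = attribute_loss(birth_groups, 1943, 1977)
ila_mc = attribute_loss(mc_groups, 200512, 201107)
print(ila_birth, ila_mc)  # 8.619 18.152
```

Without the intermediate rounding the totals differ slightly in the third decimal place, so the rounding step is only there to match the figures as stated.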
The above describes the first method.
As a second method, the information loss amount calculation unit 312 may calculate the information loss amount ILA as follows. First, the information loss amount calculation unit 312 calculates, as the information loss amount ILR of one data record, the ratio between the number of distinct attribute values of the attribute after generalization and the number before generalization. Next, the information loss amount calculation unit 312 adds up the information loss amount ILR for the number of data records to obtain the information loss amount ILA.
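
A minimal sketch of this second method follows. The text does not spell out the direction of the ratio or what counts as a value “after generalization”; the sketch assumes, by analogy with the first method, that ILR is the number of distinct values within a generalized group divided by the number of distinct values in the whole data set. The sample birth years are made up for illustration.

```python
def distinct_ratio_loss(group_values, all_values):
    """Per-record loss ILR: distinct values in the group / distinct values overall."""
    return len(set(group_values)) / len(set(all_values))


def attribute_loss_by_kinds(groups, all_values):
    """Attribute loss ILA: ILR of each group times its record count, summed."""
    return sum(distinct_ratio_loss(group, all_values) * len(group)
               for group in groups)


# Hypothetical birth years (not taken from FIG. 3), split into two groups
all_years = [1943, 1950, 1950, 1956, 1961, 1970, 1977, 1977]
groups = [[1943, 1950, 1950, 1956], [1961, 1970, 1977, 1977]]

ila = attribute_loss_by_kinds(groups, all_years)
print(ila)  # 4.0
```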
The anonymization processing unit 313 determines the priority of each attribute (hereinafter referred to as priority p) based on priority determination information stored in a means not shown (for example, a storage means, not shown, inside the anonymization processing unit 313). The anonymization processing unit 313 also determines the attribute to be processed based on the priority p and the information loss amount ILA calculated by the information loss amount calculation unit 312. That is, the anonymization processing unit 313 determines the attribute to be processed so that the purpose of use is taken into account through the priority p, and the loss of information in the anonymized data set sa210 as a whole is reduced through the information loss amount ILA.
Here, the priority determination information is information for determining the priority p. The priority p is information indicating the degree to which an increase in the information abstraction ia of each attribute included in the data set sp110 (data record rp111) is to be avoided, that is, the degree to which loss of information is to be preferentially prevented. In other words, the priority p indicates for which of the plurality of attributes the anonymization should keep the increase in information abstraction ia of the anonymized data set sa210, relative to the data set sp110, smaller.
For example, the anonymization processing unit 313 calculates, for each attribute, an evaluation value by applying an operation (for example, multiplication) to the priority p and the information loss amount ILA. The anonymization processing unit 313 may instead acquire, from a means not shown, an evaluation value corresponding to a combination of a specific priority p and a specific information loss amount ILA. The operation for calculating the evaluation value yields a larger evaluation value as the priority p becomes higher when the information loss amount ILA is constant, and yields a larger evaluation value as the information loss amount ILA becomes larger when the priority p is constant. The same applies when an evaluation value corresponding to a combination of a specific priority p and a specific information loss amount ILA is acquired.
Subsequently, the anonymization processing unit 313 determines the attribute to be generalized so that, for example, an attribute with a larger evaluation value is less likely to be generalized and an attribute with a smaller evaluation value is more likely to be generalized.
Alternatively, the anonymization processing unit 313 may determine the attribute to be generalized so that an attribute with a smaller evaluation value is less likely to be generalized and an attribute with a larger evaluation value is more likely to be generalized. In this case, the operation for calculating the evaluation value yields a smaller evaluation value as the priority p becomes higher when the information loss amount ILA is constant, and as the information loss amount ILA becomes larger when the priority p is constant. The same applies to the operation used when an evaluation value corresponding to a combination of a specific priority p and a specific information loss amount ILA is acquired.
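
As a rough illustration of the first variant (evaluation value by multiplication, with the attribute of smallest evaluation value chosen for generalization), consider the sketch below. The priorities 8 and 1 are hypothetical; the information loss amounts are the ILA-birth and ILA-mc figures from the worked example of the first method. The function and key names are illustrative only.

```python
def evaluation_value(priority, ila):
    """Evaluation value as the product of priority p and loss ILA (one possible operation)."""
    return priority * ila


def choose_attribute_to_generalize(priorities, losses):
    """Return the attribute with the smallest evaluation value, plus all scores."""
    scores = {name: evaluation_value(priorities[name], losses[name])
              for name in losses}
    return min(scores, key=scores.get), scores


priorities = {"birth_year": 8, "medical_care_date": 1}        # hypothetical priorities
losses = {"birth_year": 8.619, "medical_care_date": 18.152}   # ILA values from the example

chosen, scores = choose_attribute_to_generalize(priorities, losses)
print(chosen)  # medical_care_date
```

With these inputs the higher priority of “birth year” outweighs its smaller information loss amount, so “medical care date” is the attribute selected for generalization.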
Next, the anonymization processing unit 313 generates and outputs the anonymized data set sa210, in which the determined attribute of the data set sp110 has been processed. The anonymization processing unit 313 may instead generate and output information on the difference between the anonymized data set sa210 and the data set sp110.
The anonymization processing unit 313 may also evaluate the anonymity of a processed data set. Here, a processed data set is any part or the whole of the data set after its attributes have been processed. When the result of evaluating the anonymity satisfies a predetermined condition, the anonymization processing unit 313 may then record the processed data set in the anonymized personal data storage device 200 as a part or the whole of the anonymized data set.
This concludes the description of the components of the anonymization device 310 divided into functional units of the computer device.
Next, the components of the anonymization device 310 in hardware units will be described.
FIG. 5 is a diagram showing the hardware configuration of a computer 700 that implements the anonymization device 310 in the present embodiment.
As shown in FIG. 5, the computer 700 includes a CPU (Central Processing Unit) 701, a storage unit 702, a storage device 703, an input unit 704, an output unit 705, and a communication unit 706. The computer 700 further includes a recording medium (or storage medium) 707 supplied from the outside. The recording medium 707 may be a non-volatile recording medium that stores information non-temporarily.
The CPU 701 runs an operating system (not shown) to control the overall operation of the computer 700. The CPU 701 also reads programs and data from, for example, the recording medium 707 mounted in the storage device 703, and writes the read programs and data to the storage unit 702. Here, the program is, for example, a program that causes the computer 700 to execute the operation of the flowchart shown in FIG. 6, described later.
The CPU 701 then executes various processes as the information loss amount calculation unit 312 and the anonymization processing unit 313 shown in FIG. 1, in accordance with the read program and based on the read data.
The CPU 701 may download programs and data to the storage unit 702 from an external computer (not shown) connected to a communication network (not shown).
The storage unit 702 stores programs and data. The storage unit 702 may include the personal data storage device 100 and the anonymized personal data storage device 200.
The storage device 703 is, for example, an optical disk, a flexible disk, a magneto-optical disk, an external hard disk, or a semiconductor memory, and includes the recording medium 707. The storage device 703 records the program in a computer-readable manner. The storage device 703 may also record data in a computer-readable manner. The storage device 703 may include the personal data storage device 100 and the anonymized personal data storage device 200.
The input unit 704 is implemented by, for example, a mouse, a keyboard, or built-in key buttons, and is used for input operations. The input unit 704 is not limited to a mouse, a keyboard, or built-in key buttons, and may be, for example, a touch panel, an accelerometer, a gyro sensor, or a camera.
The output unit 705 is implemented by, for example, a display, and is used to check output.
The communication unit 706 provides an interface with the personal data storage device 100, the anonymized personal data storage device 200, and other external devices (not shown). The communication unit 706 is included as part of the anonymization processing unit 313.
As described above, the functional blocks of the anonymization device 310 shown in FIG. 1 are implemented by the computer 700 having the hardware configuration shown in FIG. 5. However, the means for implementing each unit of the computer 700 is not limited to the above. That is, the computer 700 may be implemented by a single physically integrated device, or by two or more physically separate devices connected by wire or wirelessly.
The recording medium 707 on which the code of the above program is recorded may be supplied to the computer 700, and the CPU 701 may read and execute the program code stored on the recording medium 707. Alternatively, the CPU 701 may store the program code stored on the recording medium 707 in the storage unit 702, the storage device 703, or both. That is, the present embodiment includes an embodiment of the recording medium 707 that stores, temporarily or non-temporarily, the program (software) executed by the computer 700 (CPU 701).
This concludes the description of the hardware components of the computer 700 that implements the anonymization device 310 in the present embodiment.
Next, the operation of the present embodiment will be described in detail with reference to FIGS. 1 to 6.
FIG. 6 is a flowchart showing the operation of the anonymization device 310 in the present embodiment.
The information loss amount calculation unit 312 calculates the information loss amount ILA for each attribute to be anonymized in the data set sp110 (step S601).
Next, the anonymization processing unit 313 determines the priority p of each attribute based on the information for determining the priority p (step S602).
Next, the anonymization processing unit 313 determines the attribute to be processed based on the information loss amount ILA and the priority p (step S603).
Next, the anonymization processing unit 313 processes the determined attribute of the data records rp111 (step S604).
Next, the anonymization processing unit 313 outputs the data records rp111 whose attribute has been processed (step S605).
The first effect of the present embodiment described above is that a data set can be anonymized under control that matches the purpose of use.
This is because the embodiment includes the following configuration. First, the information loss amount calculation unit 312 calculates and outputs the information loss amount ILA corresponding to each attribute. Second, the anonymization processing unit 313 determines the attribute to be processed based on the priority p and the information loss amount ILA, and processes the determined attribute.
The second effect of the present embodiment described above is that, in addition to the first effect, the loss of information in the anonymized data set can be reduced. That is, the second effect makes it possible both to anonymize a data set under control that matches the purpose of use and to reduce the loss of information in the anonymized data set. In other words, it prevents the situation in which anonymization that considers only the purpose of use excessively generalizes the attributes other than those whose processing is suppressed, so that a large amount of information is lost from the data as a whole.
The reason is the same as for the first effect. That is, the anonymization processing unit 313 determines the attribute to be processed based on both the priority p and the information loss amount ILA.
<<< Second Embodiment >>>
Next, a second embodiment of the present invention will be described in detail with reference to the drawings. In the following, content that overlaps with the preceding description is omitted to the extent that the description of the present embodiment does not become unclear.
FIG. 7 is a block diagram showing the configuration of an anonymization device 320 according to the present embodiment. The anonymization device 320 of the present embodiment performs anonymization using a top-down approach.
As shown in FIG. 7, the anonymization device 320 includes a priority determination information storage unit 321, an information loss amount calculation unit 322, and an anonymization processing unit 323.
The anonymization device 320 may be included in the system shown in FIG. 2 in place of the anonymization device 310.
The priority determination information storage unit 321 stores information for determining the priority p. The information for determining the priority p is set in advance by a user of the system. Alternatively, the division attribute determination unit 3233 may receive the information for determining the priority p in advance from an external system via the communication unit 706 shown in FIG. 5.
FIG. 8 is a diagram showing an example of the priority determination information 3210 stored in the priority determination information storage unit 321. As shown in FIG. 8, the priority determination information 3210 includes pairs of an index and a weight (also called a priority). Here, the index is a value that uniquely determines the weight. The weight corresponds to each index and is a number indicating the importance of an attribute. In FIG. 8, for example, the weight corresponding to the index “5” is “16”.
Regardless of the example in FIG. 8, the number of index values is not limited to five and may be any number of two or more. The index is also not limited to a number; it may be written with letters of the alphabet or the like, or may be the name of an attribute (hereinafter also referred to as an attribute name).
The weight may be any numerical value that can be used in calculating the evaluation value described later.
The priority determination information storage unit 321 may also store, as the priority determination information, a formula that calculates the weight from an input index (for example, “weight = 2 × (index - 1)”).
The information loss amount calculation unit 322 calculates and outputs the information loss amount ILA of each attribute in the data set sp110.
The anonymization processing unit 323 includes a division attribute determination unit 3233, a division value determination unit 3234, an anonymity evaluation unit 3235, and a generalization execution unit 3236.
The division attribute determination unit 3233 uses the priority determination information 3210 stored in the priority determination information storage unit 321 to generate the weight of each attribute based on, for example, the index of each attribute input from the input unit 704 shown in FIG. 5.
Next, the division attribute determination unit 3233 determines the attribute of the division axis (also called the attribute to be processed; hereinafter referred to as the division attribute) based on the generated weights and the information loss amount ILA.
The division attribute is the attribute whose attribute values serve as the basis for dividing a data set (for example, the data set sp110). Here, dividing a data set means grouping the data records included in that data set. That is, when a data set (for example, the data set sp110) is divided, the division attribute determination unit 3233 performs the division based on ranges of the attribute values of the division attribute. The ranges are, for example, values larger than a certain value and values smaller than it. Alternatively, the ranges may be geographical areas, kinds of things, or relevance to an event.
The division value determination unit 3234 determines the division value of the division attribute so that the required anonymity is satisfied. When the attribute values are numerical, the division value is, for example, a numerical value within the range that the attribute values can take. Alternatively, when the attribute values are geographical areas, the range may be a set of identification information indicating those areas (for example, prefecture names). When the attribute values are kinds of things (for example, hobbies), the range may be identification information that classifies those kinds (for example, activities done outdoors). When the attribute values express relevance to an event, the range may be the presence or absence of that relevance.
The anonymity evaluation unit 3235 determines whether, when a data set is divided, each of the resulting data sets satisfies the required anonymity. Specifically, when a data set is to be divided into two groups, for example, the anonymity evaluation unit 3235 determines whether the data set can be divided so that each of the two groups contains at least k data records rp111. Here, “k” is the “k” of k-anonymity or k-anonymization. The same applies to later references to k records.
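
The check described here can be sketched as follows. This is an illustrative reading, not the patent's implementation: the split rule (values at or below the division value versus values above it), the record layout, and the sample birth years are all assumptions.

```python
def split_by_value(records, attribute, split_value):
    """Group records into those at or below the split value and those above it."""
    lower = [r for r in records if r[attribute] <= split_value]
    upper = [r for r in records if r[attribute] > split_value]
    return lower, upper


def satisfies_k(groups, k):
    """True if every group contains at least k records."""
    return all(len(group) >= k for group in groups)


# Hypothetical records; only the birth year matters for this check.
records = [{"birth_year": y} for y in
           (1943, 1948, 1950, 1953, 1956, 1961, 1965, 1970, 1972, 1977)]

ok_split = satisfies_k(split_by_value(records, "birth_year", 1956), k=5)
bad_split = satisfies_k(split_by_value(records, "birth_year", 1950), k=5)
print(ok_split, bad_split)  # True False
```

A division value of 1956 leaves five records on each side, so both groups meet k = 5, whereas a division value of 1950 leaves only three records in the lower group.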
The generalization execution unit 3236 generalizes (processes) the attribute values of the determined attribute based on the determined division value, and outputs the result.
Like the anonymization device 310 shown in FIG. 1, the anonymization device 320 described above may be implemented by the computer 700 shown in FIG. 5.
Next, the operation of the present embodiment will be described in detail with reference to the drawings.
FIG. 9A, FIG. 9B, FIG. 10A, and FIG. 10B are sequence diagrams showing the operation of the present embodiment.
In FIG. 9A, the division attribute determination unit 3233 accepts a division attribute determination request entered by a user of the system, for example from the input unit 704 shown in FIG. 5 (step S801).
Here, the division attribute determination request includes, for example, the value “5” of k for k-anonymity, and the attribute names with their corresponding indexes, “birth year: 4, medical care date: 1”.
The user of the anonymized data set specifies a larger index value for an attribute whose degree of generalization (processing) the user wants to suppress more.
Next, the division attribute determination unit 3233 stores, for example in the storage unit 702 shown in FIG. 5, the value “5” of k and the attribute names with their corresponding indexes, “birth year: 4, medical care date: 1”, included in the accepted division attribute determination request (step S802).
Next, the division attribute determination unit 3233 uses the priority determination information 3210 to generate weights based on the attribute names and their corresponding indexes, “birth year: 4, medical care date: 1” (step S803).
Here, the division attribute determination unit 3233 calculates the weight corresponding to the “birth year” attribute as “8” and the weight corresponding to the “medical care date” attribute as “1”.
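
The index-to-weight pairs quoted in the text (index 5 gives 16 in FIG. 8, and indexes 4 and 1 give 8 and 1 in this step) are all consistent with a power-of-two mapping. The sketch below assumes that mapping, weight = 2^(index - 1); this is an inference from those three pairs, not a formula confirmed by FIG. 8, and the function name is hypothetical.

```python
def weight_from_index(index):
    """Assumed mapping: the weight doubles with each step of the index."""
    return 2 ** (index - 1)


# Indexes from the example division attribute determination request
indexes = {"birth year": 4, "medical care date": 1}
weights = {name: weight_from_index(i) for name, i in indexes.items()}
print(weights)  # {'birth year': 8, 'medical care date': 1}
```

A lookup table keyed by index would serve equally well if the stored priority determination information is a table rather than a formula.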
Next, the division attribute determination unit 3233 transmits a calculation request for the information loss amount ILA to the information loss amount calculation unit 322 (step S804).
Next, the information loss amount calculation unit 322, having received the calculation request for the information loss amount ILA, transmits an acquisition request for the data set sp110 (hereinafter also referred to as a personal data acquisition request) to the personal data storage device 100 (step S805).
Next, the information loss amount calculation unit 322, having received the data set sp110, calculates the information loss amount ILA and transmits the calculated information loss amount ILA to the division attribute determination unit 3233 (step S806).
Here, the operation by which the information loss amount calculation unit 322 calculates the information loss amount ILA will be described in detail.
The information loss amount calculation unit 322 calculates the information loss amount ILR of one data record rp111 using, for example, Formula 1 shown below.
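
Formula 1 itself is not reproduced in this text. Judging from the symbol definitions that follow and from the range-ratio calculations in the worked examples, it presumably has the form below; this is a reconstruction under that assumption, not the original rendering.

```latex
\[
\mathrm{ILR} = \frac{pt_{a\text{-}max} - pt_{a\text{-}min}}{pt_{b\text{-}max} - pt_{b\text{-}min}}
\]
```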
Here, pta-max is the maximum attribute value after generalization, and pta-min is the minimum attribute value after generalization. Likewise, ptb-max is the maximum attribute value before generalization, and ptb-min is the minimum attribute value before generalization.
Because the present embodiment performs anonymization using a top-down approach, the attribute values of each attribute to be anonymized in the data set sp110 are assumed to be generalized so that they all become the same value.
図11は、図3に示すデータセットsp110の匿名化の対象である属性のそれぞれの属性値が同一の値に汎化された場合のデータセットst120を示す図である。即ち、図11に示すデータセットst120は、データセットsp110が最大に汎化された状態のデータセットである。
FIG. 11 is a diagram showing the data set st120 when the attribute values of the attributes to be anonymized in the data set sp110 shown in FIG. 3 are generalized to the same value. That is, the data set st120 shown in FIG. 11 is a data set in which the data set sp110 is generalized to the maximum.
この場合、pta-maxは、例えば、図11に示すデータセットst120において、属性名が「生年」の属性の属性値の最大値である「1977」(「1943~1977」の「1977」)である。また、pta-minは、例えば、データセットst120において、属性名が「生年」の属性の、属性値の最小値である「1943」(「1943~1977」の「1943」)である。また、ptb-maxは、例えば、図3に示すデータセットsp110において、属性名が「生年」の属性の、属性値の最大値である「1977」である。また、ptb-minは、例えば、データセットsp110において、属性名が「生年」の属性の、属性値の最小値である「1943」である。
In this case, pta-max is, for example, “1977” (the “1977” of “1943 to 1977”), which is the maximum attribute value of the attribute whose attribute name is “birth year” in the data set st120 shown in FIG. 11. Also, pta-min is, for example, “1943” (the “1943” of “1943 to 1977”), which is the minimum attribute value of the attribute whose attribute name is “birth year” in the data set st120. Also, ptb-max is, for example, “1977”, which is the maximum attribute value of the attribute whose attribute name is “birth year” in the data set sp110 shown in FIG. 3. Also, ptb-min is, for example, “1943”, which is the minimum attribute value of the attribute whose attribute name is “birth year” in the data set sp110.
従って、属性名が「生年」の属性の、1つのデータレコードrp111の情報損失量ILR-birthは、以下のように「1」が算出される。
Therefore, the information loss amount ILR-birth of one data record rp111 for the attribute whose attribute name is “birth year” is calculated as “1”, as follows.
「情報損失量ILR-birth」=(1977-1943)÷(1977-1943)=1
“Information loss amount ILR-birth” = (1977 - 1943) ÷ (1977 - 1943) = 1
また、データセットsp110に含まれるデータレコードrp111の数は、20である。従って、属性名が「生年」の属性の情報損失量ILA-birthは、以下のように「20」が算出される。
In addition, the number of data records rp111 included in the data set sp110 is 20. Accordingly, the information loss amount ILA-birth of the attribute whose attribute name is “birth year” is calculated as “20”, as follows.
(「情報損失量ILR-birth」)×(データレコードrp111の数)=1×20=20
(“Information loss amount ILR-birth”) × (number of data records rp111) = 1 × 20 = 20
同様に、診療年月の全体の情報損失量ILA-mcは、「20」が算出される。
Similarly, the overall information loss amount ILA-mc of the “medical care date” attribute is calculated as “20”.
尚、情報損失量計算部322は、汎化後と汎化前とのそれぞれのその属性の属性値の種類の数の比を、1つのデータレコードrp111の情報損失量ILRとして、算出するようにしてもよい。
Note that the information loss amount calculation unit 322 may instead calculate, as the information loss amount ILR of one data record rp111, the ratio between the number of distinct attribute values of the attribute after generalization and the number before generalization.
以上が、情報損失量計算部322の情報損失量ILAの計算の動作についての詳細な説明である。
The above is the detailed description of the operation of calculating the information loss amount ILA of the information loss amount calculation unit 322.
図9Aの説明に戻る。尚、上述の説明において、ステップS803と、ステップS804、ステップS805及びステップS806との処理順序は、任意の順序であってよい。即ち、その順序は、逆であってもよいし、同時であってもよい。
Returning to the description of FIG. 9A. In the above description, the processing order of step S803, step S804, step S805, and step S806 may be any order. That is, the order may be reversed or simultaneous.
次に、分割属性決定部3233は、分割属性を決定する(ステップS807)。
Next, the division attribute determination unit 3233 determines a division attribute (step S807).
ここで、分割属性決定部3233による、分割属性の決定の動作について、詳細に説明する。
Here, the operation of determining the division attribute by the division attribute determination unit 3233 will be described in detail.
分割属性決定部3233は、重みと情報損失量ILAとを含んだ評価式を用いて評価値を算出し、分割属性を決定する。以下に示す式2は、評価式の一例である。
The division attribute determination unit 3233 calculates an evaluation value using an evaluation formula including the weight and the information loss amount ILA, and determines a division attribute. Formula 2 shown below is an example of an evaluation formula.
[数2]
評価値=重み×情報損失量ILA ・・・ (式2)
[Equation 2]
Evaluation value = weight × information loss amount ILA ... (Formula 2)
例えば、データセットsp110の属性名が「生年」の属性の評価値は、重みが「8」、情報損失量ILA-birthが「20」なので、「160」である。同様に、属性名が「診療年月」の属性の評価値は、重みが「1」、情報損失量ILA-mcが「20」なので、「20」である。
For example, the evaluation value of the attribute whose attribute name is “birth year” in the data set sp110 is “160” because the weight is “8” and the information loss amount ILA-birth is “20”. Similarly, the evaluation value of the attribute whose attribute name is “medical care date” is “20” because the weight is “1” and the information loss amount ILA-mc is “20”.
次に、分割属性決定部3233は、算出した評価値が最大の属性を分割属性として決定する。例えばデータセットsp110の場合、分割属性決定部3233は、属性名が「生年」の属性の評価値が、属性名が「診療年月」の属性の評価値よりも大きいので、属性名が「生年」の属性をその分割属性として決定する。
Next, the division attribute determination unit 3233 determines the attribute having the largest calculated evaluation value as the division attribute. For example, in the case of the data set sp110, the evaluation value of the attribute whose attribute name is “birth year” is larger than that of the attribute whose attribute name is “medical care date”, so the division attribute determination unit 3233 determines the attribute whose attribute name is “birth year” as the division attribute.
尚、評価値を算出する式は、式2に限らず、優先度p(例えば、式2の「重み」のように、大きいほど優先度が高いことを示す値)が高いほど及び情報損失量ILAが大きいほど演算結果が大きくなるような、任意の評価式でよい。
The formula for calculating the evaluation value is not limited to Formula 2. Any evaluation formula may be used whose result becomes larger as the priority p (for example, a value such as the “weight” in Formula 2, where a larger value indicates a higher priority) becomes higher and as the information loss amount ILA becomes larger.
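The selection of the division attribute with Formula 2 can be sketched in Python as follows. This is a minimal illustration using the weights and information loss amounts of the example above; the function name and the dictionary layout are assumptions, and the text does not say how ties are resolved (here the first attribute with the largest value wins).

```python
def choose_division_attribute(weights, losses):
    """Return (attribute, scores); the attribute has the largest weight x ILA."""
    scores = {name: weights[name] * losses[name] for name in weights}
    best = max(scores, key=scores.get)
    return best, scores


# Values from the first round of the example
weights = {"birth year": 8, "medical care date": 1}
losses = {"birth year": 20, "medical care date": 20}
best, scores = choose_division_attribute(weights, losses)
```

Any other scoring function that grows with both the priority and the information loss amount could replace the product inside the dictionary comprehension.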
以上が、分割属性決定部3233による分割属性の決定の動作についての説明である。
The above is the description of the operation of determining the division attribute by the division attribute determination unit 3233.
次に、図9Bにおいて、分割属性決定部3233は、分割値決定部3234へ分割値決定要求を送信する(ステップS808)。その分割値決定要求は、分割属性決定部3233により決定された分割属性の属性名の「生年」を含む。
Next, in FIG. 9B, the division attribute determination unit 3233 transmits a division value determination request to the division value determination unit 3234 (step S808). The division value determination request includes “birth year”, the attribute name of the division attribute determined by the division attribute determination unit 3233.
その分割値決定要求を受信した分割値決定部3234は、個人データ記憶装置100へ個人データ取得要求を送信する(ステップS809)。
The division value determination unit 3234 that has received the division value determination request transmits a personal data acquisition request to the personal data storage device 100 (step S809).
データセットsp110を受信した分割値決定部3234は、分割値を決定する(ステップS810)。
The division value determination unit 3234 that has received the data set sp110 determines a division value (step S810).
ここで、分割値決定部3234による分割値の決定の動作について、詳細に説明する。
Here, the operation of determining the division value by the division value determination unit 3234 will be described in detail.
その分割値は、指定された属性を分割軸としてデータセットを分割する時の、閾値である。例えば、分割値「生年:1956」は、「生年」の属性が「1956」以下のデータレコードrp111と「1956」を超えるデータレコードrp111とに、データセットsp110を分割することを示す。
The division value is a threshold value when dividing the data set with the specified attribute as the division axis. For example, the division value “birth year: 1956” indicates that the data set sp110 is divided into the data record rp111 whose attribute of “birth year” is “1956” or less and the data record rp111 exceeding “1956”.
図12は、データセットsp110の分割値候補1101~1111の例を示す図である。
FIG. 12 is a diagram illustrating an example of the division value candidates 1101 to 1111 of the data set sp110.
まず、分割値決定部3234は、図12に示すように、データセットsp110のデータレコードrp111を、分割属性決定部3233が決定した属性を、属性値が小さい順番に並べる。
First, as illustrated in FIG. 12, the division value determination unit 3234 arranges the data records rp111 of the data set sp110 in ascending order of the attribute value of the attribute determined by the division attribute determination unit 3233.
次に、分割値決定部3234は、分割値候補1101~1111を抽出する。分割値決定部3234が抽出する分割値候補1101~1111は、分割されたデータセットsp110の前半部分(第3の個人データとも呼ばれる)と後半部分(第4の個人データとも呼ばれる)とのそれぞれのデータレコードrp111の数がk個以上になる分割値の候補である。例えば、「生年」の属性において属性値の「1951」を分割値とすると、その前半部分は、その属性値が1951以下の5個のデータレコードrp111を含む。また、その後半部分は、1952以上の15個のデータレコードrp111を含む。この場合、その前半部分とその後半部分とのそれぞれは、いずれも5個以上である。
Next, the division value determination unit 3234 extracts division value candidates 1101 to 1111. The division value candidates 1101 to 1111 extracted by the division value determination unit 3234 are candidates for a division value such that the first half (also called third personal data) and the second half (also called fourth personal data) of the divided data set sp110 each contain k or more data records rp111. For example, if the attribute value “1951” of the “birth year” attribute is used as the division value, the first half contains the five data records rp111 whose attribute value is 1951 or less, and the second half contains the fifteen data records rp111 whose attribute value is 1952 or more. In this case, the first half and the second half each contain five or more data records.
図12に示すデータセットsp110において、分割値決定部3234は、分割値候補1101~1111を抽出する。
In the data set sp110 shown in FIG. 12, the division value determining unit 3234 extracts division value candidates 1101 to 1111.
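The candidate extraction can be sketched in Python as follows, assuming that a division value v sends records with a value of v or less to the first half and the rest to the second half, as in the “birth year: 1956” example. The birth years below are hypothetical and are not the data of FIG. 12; the function name is likewise illustrative.

```python
def division_value_candidates(values, k):
    """Return thresholds v such that {x <= v} and {x > v} both hold at least k records."""
    ordered = sorted(values)
    candidates = []
    for v in sorted(set(ordered)):
        first = sum(1 for x in ordered if x <= v)  # size of the first half
        second = len(ordered) - first              # size of the second half
        if first >= k and second >= k:
            candidates.append(v)
    return candidates


# Hypothetical birth years (not the data of FIG. 12), with k = 2
years = [1943, 1947, 1951, 1956, 1961, 1963, 1970, 1977]
candidates = division_value_candidates(years, k=2)
```

Values near either end of the sorted list are excluded because one side of the split would hold fewer than k records.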
次に、分割値決定部3234は、各分割値候補1101~1111に対応する情報損失量ILAを計算する。例えば、分割値決定部3234は、式1を用いて情報損失量ILAを算出する。尚、分割値決定部3234は、式1に限らず、他の算出式を用いて情報損失量ILAを算出してもよい。
Next, the division value determination unit 3234 calculates the information loss amount ILA corresponding to each of the division value candidates 1101 to 1111. For example, the division value determination unit 3234 calculates the information loss amount ILA using Formula 1. Note that the division value determination unit 3234 is not limited to Formula 1 and may calculate the information loss amount ILA using another formula.
具体的には、例えば分割値候補1105でデータセットsp110を分割する場合、分割値決定部3234は、以下の様に情報損失量ILAを計算する。
Specifically, for example, when the data set sp110 is divided by the division value candidate 1105, the division value determination unit 3234 calculates the information loss amount ILA as follows.
分割値決定部3234は、図12に示すように、分割属性の「生年」の属性値により、昇順でデータセットsp110をソートする。分割値候補1105の分割値でデータセットsp110を分割した場合、その分割された前半部分のデータレコードrp111の一つの情報損失量ILRは、(1956-1943)÷(1977-1943)=0.382である。
As shown in FIG. 12, the division value determination unit 3234 sorts the data set sp110 in ascending order of the attribute value of the division attribute “birth year”. When the data set sp110 is divided at the division value of the division value candidate 1105, the information loss amount ILR of one data record rp111 in the first half is (1956 - 1943) ÷ (1977 - 1943) = 0.382.
従って、その前半部分の情報損失量ILRの合計は、データレコードrp111の数が9個であるので、0.382×9=3.438である。
Therefore, the total of the information loss amount ILR in the first half is 0.382 × 9 = 3.438 because the number of data records rp111 is nine.
また、その分割された後半部分のデータレコードrp111の1つの情報損失量ILRは、(1977-1961)÷(1977-1943)=0.471である。
In addition, one information loss amount ILR of the divided second half data record rp111 is (1977-1961) ÷ (1977-1943) = 0.471.
従って、その後半部分の情報損失量ILRの合計は、データレコードrp111の数が11個であるので、0.471×11=5.181である。
Therefore, the total of the information loss amount ILR in the latter half is 0.471 × 11 = 5.181 because the number of data records rp111 is 11.
従って、分割値候補1105で分割された場合の合計の情報損失量ILAは、3.438+5.181=8.619である。
Therefore, the total information loss amount ILA when divided by the division value candidate 1105 is 3.438 + 5.181 = 8.619.
同様にして算出される、分割値候補1101~1104で分割された場合の、「生年」の属性のそれぞれの情報損失量ILAは、「11.76」、「12.47」、「10.67」及び「10.23」である。また、同様にして算出される、分割値候補1106~1111で分割された場合の、「生年」の属性のそれぞれの情報損失量ILAは、「10.00」、「10.05」、「9.88」、「10.14」、「10.70」及び「10.73」である。
The information loss amounts ILA of the “birth year” attribute, calculated in the same way for division at the division value candidates 1101 to 1104, are “11.76”, “12.47”, “10.67”, and “10.23”, respectively. Likewise, those for division at the division value candidates 1106 to 1111 are “10.00”, “10.05”, “9.88”, “10.14”, “10.70”, and “10.73”, respectively.
分割値候補1101~1111のそれぞれに対する情報損失量ILAを計算した分割値決定部3234は、その情報損失量ILAが最小である分割値候補1105の「生年:1956」を分割値として決定する。
The division value determining unit 3234 that has calculated the information loss amount ILA for each of the division value candidates 1101 to 1111 determines “birth year: 1956” of the division value candidate 1105 having the smallest information loss amount ILA as the division value.
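The choice of the division value can be sketched in Python as follows. The sketch assumes that each half is generalized to its own minimum-to-maximum range and that Formula 1 is the ratio of that range to the range of the whole data set. The 20 birth years are hypothetical; they only reproduce the counts and ranges of the example (9 records in 1943 to 1956, 11 records in 1961 to 1977).

```python
def split_loss(values, v):
    """Total ILA when values are split at v and each part is generalized to its own range."""
    total_width = max(values) - min(values)
    loss = 0.0
    for part in ([x for x in values if x <= v], [x for x in values if x > v]):
        loss += (max(part) - min(part)) / total_width * len(part)
    return loss


def choose_division_value(values, candidates):
    """Return the candidate with the smallest total information loss."""
    return min(candidates, key=lambda v: split_loss(values, v))


# Hypothetical data with the same ranges and counts as the example
years = [1943, 1945, 1947, 1949, 1951, 1952, 1953, 1955, 1956,
         1961, 1962, 1963, 1964, 1966, 1968, 1970, 1972, 1974, 1976, 1977]
loss_1956 = split_loss(years, 1956)  # 13/34 * 9 + 16/34 * 11
best = choose_division_value(years, [1951, 1956, 1963])
```

Without intermediate rounding, `loss_1956` is about 8.618. The text arrives at 8.619 because it rounds each ratio to three decimal places (0.382 and 0.471) before multiplying.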
以上が、分割値決定部3234による分割値の決定の動作についての説明である。
The above is the description of the operation of determining the division value by the division value determination unit 3234.
図9Bの説明に戻る。次に、その分割値を決定した分割値決定部3234は、その決定した分割値「生年:1956」を匿名性評価部3235へ送信する(ステップS811)。換言すると、分割値決定部3234は、匿名性評価部3235に、匿名性の評価を要求する。
Returning to the description of FIG. 9B. Next, the division value determination unit 3234 that has determined the division value transmits the determined division value “birth year: 1956” to the anonymity evaluation unit 3235 (step S811). In other words, the divided value determination unit 3234 requests the anonymity evaluation unit 3235 to evaluate anonymity.
その分割値「生年:1956」を受信した匿名性評価部3235は、匿名性評価を行う(ステップS812)。
The anonymity evaluation unit 3235 that has received the division value “birth year: 1956” performs anonymity evaluation (step S812).
ここで、匿名性評価部3235による匿名性評価の動作について、詳細に説明する。
Here, the operation of anonymity evaluation by the anonymity evaluation unit 3235 will be described in detail.
匿名性評価とは、匿名性の指標を満たすか否かを評価することである。匿名性評価部3235は、データセットsp110の分割されたその前半部分(第3の個人データ)とその後半部分(第4の個人データ)とについて更に分割を行った場合に、その更に分割された部分が匿名性の指標を満たすか否かを評価する。即ち、その前半部分及びその後半部分のそれぞれについて、データレコードrp111の数が2k個以上か否かを評価する。
Anonymity evaluation means evaluating whether or not an anonymity index is satisfied. The anonymity evaluation unit 3235 evaluates, for each of the first half (third personal data) and the second half (fourth personal data) into which the data set sp110 has been divided, whether the parts resulting from a further division would satisfy the anonymity index. That is, for each of the first half and the second half, it evaluates whether or not the number of data records rp111 is 2k or more.
匿名性評価部3235は、受信した分割値で分割されたその前半部分及びその後半部分のそれぞれのデータレコードrp111の数を計数する。例えば、その分割値「生年:1956」で分割した場合、匿名性評価部3235は、その前半部分のデータレコードrp111の数を9個、その後半部分のデータレコードrp111の数を11個と計数する。
The anonymity evaluation unit 3235 counts the number of data records rp111 in each of the first half and the second half divided by the received division value. For example, when dividing by the division value “birth year: 1956”, the anonymity evaluation unit 3235 counts nine data records rp111 in the first half and eleven data records rp111 in the second half.
以上が、匿名性評価部3235による匿名性評価の動作についての説明である。
The above is the description of the anonymity evaluation operation by the anonymity evaluation unit 3235.
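The 2k test can be sketched in Python as follows. The value k = 5 is an assumption inferred from the example (nine records are treated as fewer than 2k and eleven as 2k or more); the function names and the result labels are illustrative.

```python
def can_split_further(n_records, k):
    """A part can be divided again only if it holds at least 2k records."""
    return n_records >= 2 * k


def route_parts(part_sizes, k):
    """Map each part name to 'divide again' or 'generalize'."""
    return {
        name: "divide again" if can_split_further(n, k) else "generalize"
        for name, n in part_sizes.items()
    }


# Counts from the split at "birth year: 1956"; k = 5 is assumed from the example
decision = route_parts({"first half": 9, "second half": 11}, k=5)
```

A part with fewer than 2k records cannot be cut into two parts that each keep k records, which is why it is generalized as it is.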
図9Bの説明に戻る。次に、匿名化処理部323は、匿名性の指標を満たさないと匿名性評価部3235が評価した部分(例えば、その分割値「生年:1956」で分割された前半部分)について、ステップS813からステップS815の処理を実行する。また、匿名化処理部323は、匿名性の指標を満たすと匿名性評価部3235が評価した部分(例えば、その分割値「生年:1956」で分割された後半部分)について、ステップS821以降の処理を実行する。
Returning to the description of FIG. 9B. Next, the anonymization processing unit 323 executes the processing of steps S813 to S815 on a part that the anonymity evaluation unit 3235 has evaluated as not satisfying the anonymity index (for example, the first half divided by the division value “birth year: 1956”). The anonymization processing unit 323 also executes the processing from step S821 onward on a part that the anonymity evaluation unit 3235 has evaluated as satisfying the anonymity index (for example, the second half divided by the division value “birth year: 1956”).
その分割値「生年:1956」で分割された前半部分のデータレコードrp111の数は2k個未満であった。そこで、匿名性評価部3235は、「生年:1943~1956」を含む汎化実行要求を汎化実行部3236へ送信する(ステップS813)。
The number of data records rp111 in the first half divided by the division value “birth year: 1956” was less than 2k. Therefore, the anonymity evaluation unit 3235 transmits a generalization execution request including “birth year: 1943 to 1956” to the generalization execution unit 3236 (step S813).
その汎化実行要求を受信した汎化実行部3236は、「生年」の属性の属性値が「1943」~「1956」のデータレコードrp111を汎化する(ステップS814)。
Upon receiving the generalization execution request, the generalization execution unit 3236 generalizes the data records rp111 whose “birth year” attribute value is “1943” to “1956” (step S814).
具体的には、汎化実行部3236は、「生年」の属性の属性値が「1943」~「1956」に該当するデータレコードrp111の、「生年」の属性の属性値を「1943~1956」に、「診療年月」の属性の属性値を「200512~201107」に書き換える。
Specifically, for the data records rp111 whose “birth year” attribute value falls within “1943” to “1956”, the generalization execution unit 3236 rewrites the attribute value of the “birth year” attribute to “1943 to 1956” and the attribute value of the “medical care date” attribute to “200512 to 201107”.
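The rewriting step can be sketched in Python as follows, assuming that each quasi-identifier of a group is replaced by the minimum-to-maximum range observed in that group. The records, the field names, and the `~` separator are hypothetical choices for illustration.

```python
def generalize(records, attributes):
    """Replace each listed attribute with the 'min~max' range over the given records."""
    ranges = {}
    for attr in attributes:
        values = [r[attr] for r in records]
        lo, hi = min(values), max(values)
        ranges[attr] = str(lo) if lo == hi else f"{lo}~{hi}"
    # Other fields are copied unchanged; the input records are not modified
    return [{**r, **ranges} for r in records]


# Hypothetical records; field names are illustrative
records = [
    {"birth_year": 1943, "care_ym": 200512, "diagnosis": "A"},
    {"birth_year": 1956, "care_ym": 201107, "diagnosis": "B"},
    {"birth_year": 1951, "care_ym": 200903, "diagnosis": "C"},
]
generalized = generalize(records, ["birth_year", "care_ym"])
```

After this step every record of the group carries the same values for the generalized attributes, so the records cannot be told apart by those attributes.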
次に、汎化実行部3236は、書き換えたデータレコードrp111を匿名化済個人データ記憶装置200へ記録する(ステップS815)。換言すると、汎化実行部3236は、匿名化済個人データ記憶装置200へ匿名化済み個人データを登録する。
Next, the generalization execution unit 3236 records the rewritten data record rp111 in the anonymized personal data storage device 200 (step S815). In other words, the generalization execution unit 3236 registers the anonymized personal data in the anonymized personal data storage device 200.
その分割値の「生年:1956」で分割された後半部分のデータレコードrp111の数は2k個以上であった。そこで、匿名化装置320は、そのデータセットsp110の分割された後半部分(第4の個人データ)を新たなデータセットsp(新たな第1の個人データ)として、ステップS821以降の処理(2回目の匿名化)を実行する。
The number of data records rp111 in the second half divided by the division value “birth year: 1956” was 2k or more. Therefore, the anonymization device 320 treats the divided second half (fourth personal data) of the data set sp110 as a new data set sp (new first personal data) and executes the processing from step S821 onward (the second round of anonymization).
図10Aにおいて、匿名性評価部3235は、「生年:1961~1977」を含む再分割要求を分割属性決定部3233へ送信する(ステップS821)。
In FIG. 10A, the anonymity evaluation unit 3235 transmits a subdivision request including “birth year: 1961 to 1977” to the division attribute determination unit 3233 (step S821).
次に、その再分割要求を受信した分割属性決定部3233は、優先度決定情報3210を利用し、属性名及び対応するインデックス「生年:4、診療年月:1」に基づいて重みを生成する(ステップS822)。
Next, the division attribute determination unit 3233 that has received the subdivision request uses the priority determination information 3210 to generate weights based on the attribute names and the corresponding indexes “birth year: 4, medical care date: 1” (step S822).
ここでは、分割属性決定部3233は、「生年」の属性に対応する重みを「8」、「診療年月」の属性に対応する重みを「1」と算出する。
Here, the division attribute determination unit 3233 calculates the weight corresponding to the “birth year” attribute as “8” and the weight corresponding to the “medical care date” attribute as “1”.
次に、分割属性決定部3233は、情報損失量計算部322へ情報損失量ILAの計算要求を行う(ステップS823)。
Next, the division attribute determination unit 3233 requests the information loss amount calculation unit 322 to calculate the information loss amount ILA (step S823).
次に、その情報損失量ILAの計算要求を受信した情報損失量計算部322は、個人データ記憶装置100へ「生年」の属性の属性値が「1961」~「1977」であるデータレコードrp111(そのデータセットsp110の後半部分)の取得要求を送信する(ステップS824)。換言すると、情報損失量計算部322は、個人データ記憶装置100に個人データの取得を要求する。
Next, the information loss amount calculation unit 322 that has received the calculation request for the information loss amount ILA transmits, to the personal data storage device 100, a request for acquiring the data records rp111 whose “birth year” attribute value is “1961” to “1977” (the second half of the data set sp110) (step S824). In other words, the information loss amount calculation unit 322 requests the personal data from the personal data storage device 100.
次に、そのデータセットsp110の後半部分を受信した情報損失量計算部322は、情報損失量ILAを計算し、計算した情報損失量ILAを分割属性決定部3233へ送信する(ステップS825)。
Next, the information loss amount calculation unit 322 that has received the latter half of the data set sp110 calculates the information loss amount ILA, and transmits the calculated information loss amount ILA to the division attribute determination unit 3233 (step S825).
ここでは、情報損失量計算部322は、そのデータセットsp110の後半部分について、「生年」の属性の情報損失量ILA-birth-ov1961を以下のように算出する。
Here, the information loss amount calculation unit 322 calculates the information loss amount ILA-birth-ov1961 of the attribute “birth year” for the latter half of the data set sp110 as follows.
(1977-1961)÷(1977-1943)×11=5.181
また、情報損失量計算部322は、「診療年月」の属性の情報損失量ILA-mc-ov1961を以下のように算出する。
Further, the information loss amount calculation unit 322 calculates the information loss amount ILA-mc-ov1961 of the “medical care date” attribute as follows.
(201107-200612)÷(201107-200512)×11=9.152
次に、分割属性決定部3233は、分割属性を決定する(ステップS826)。
Next, the division attribute determination unit 3233 determines a division attribute (step S826).
例えば、分割属性決定部3233は、式2を用いて、「生年」の属性について、重みが「8」、情報損失量ILA-birth-ov1961が「5.181」なので、「生年」属性の評価値として「41.448」を算出する。同様に、分割属性決定部3233は、「診療年月」の属性について、重みが「1」、情報損失量ILA-mc-ov1961が「9.152」なので、「診療年月」属性の評価値として「9.152」を算出する。
For example, using Formula 2, the division attribute determination unit 3233 calculates “41.448” as the evaluation value of the “birth year” attribute because the weight is “8” and the information loss amount ILA-birth-ov1961 is “5.181”. Similarly, the division attribute determination unit 3233 calculates “9.152” as the evaluation value of the “medical care date” attribute because the weight is “1” and the information loss amount ILA-mc-ov1961 is “9.152”.
次に、分割属性決定部3233は、「生年」の属性の評価値が「診療年月」の属性の評価値よりも大きいので、属性名が「生年」の属性を分割属性として決定する。
Next, since the evaluation value of the attribute of “birth year” is larger than the evaluation value of the attribute of “medical care date”, the division attribute determination unit 3233 determines the attribute whose attribute name is “birth year” as the division attribute.
次に、図10Bにおいて、分割属性決定部3233は、属性名の「生年」を含む分割値決定要求を分割値決定部3234へ送信する(ステップS827)。
Next, in FIG. 10B, the division attribute determination unit 3233 transmits a division value determination request including the attribute name “birth year” to the division value determination unit 3234 (step S827).
分割値決定要求を受信した分割値決定部3234は、個人データ記憶装置100へ個人データ取得要求を送信する(ステップS828)。
The division value determination unit 3234 that has received the division value determination request transmits a personal data acquisition request to the personal data storage device 100 (step S828).
ここでは、分割値決定部3234は、2回目の匿名化の対象となるデータレコードrp111(例えば、そのデータセットsp110の後半部分)を、取得することを要求する。
Here, the division value determination unit 3234 requests acquisition of the data records rp111 that are the target of the second round of anonymization (for example, the second half of the data set sp110).
対象となるデータレコードrp111を受信した分割値決定部3234は、分割値を決定する(ステップS829)。
The division value determining unit 3234 that has received the target data record rp111 determines a division value (step S829).
図13は、そのデータセットsp110の分割された後半部分であるデータセットsp130(新たな第1の個人データ)の分割値候補1121及び分割値候補1122の例を示す図である。
FIG. 13 is a diagram illustrating an example of a divided value candidate 1121 and a divided value candidate 1122 of the data set sp130 (new first personal data) that is a divided second half of the data set sp110.
まず、分割値決定部3234は、図13に示すように、データセットsp130のデータレコードrp111を、分割属性決定部3233が決定した属性の属性値の小さい順番に並べる。
First, as illustrated in FIG. 13, the division value determination unit 3234 arranges the data records rp111 of the data set sp130 in ascending order of the attribute value of the attribute determined by the division attribute determination unit 3233.
次に、分割値決定部3234は、分割値候補を抽出する。図13に示すデータセットsp130において、分割値決定部3234は、分割値候補1121及び分割値候補1122を分割値候補として抽出する。
Next, the division value determination unit 3234 extracts division value candidates. In the data set sp130 illustrated in FIG. 13, the division value determining unit 3234 extracts the division value candidate 1121 and the division value candidate 1122 as division value candidates.
次に、分割値決定部3234は、分割値候補1121及び分割値候補1122のそれぞれに対する情報損失量ILA-birthを計算する。図13に示すデータセットsp130の場合、分割値決定部3234は、分割値候補1121及び分割値候補1122で分割した場合のそれぞれの情報損失量ILA-birthは、「5.565」及び「4.820」である。続けて、分割値決定部3234は、その情報損失量ILA-birthが最小である分割値候補1122の「生年:1963」を分割値として決定する。
Next, the division value determination unit 3234 calculates the information loss amount ILA-birth for each of the division value candidate 1121 and the division value candidate 1122. In the case of the data set sp130 shown in FIG. 13, the information loss amounts ILA-birth for division at the division value candidate 1121 and at the division value candidate 1122 are “5.565” and “4.820”, respectively. The division value determination unit 3234 then determines “birth year: 1963” of the division value candidate 1122, which has the smallest information loss amount ILA-birth, as the division value.
次に、分割値を決定した分割値決定部3234は、決定した分割値「生年:1963」を匿名性評価部3235へ送信する(ステップS830)。換言すると、分割値決定部3234は、匿名性評価部3235に、匿名性の評価を要求する。
Next, the division value determination unit 3234 that has determined the division value transmits the determined division value “birth year: 1963” to the anonymity evaluation unit 3235 (step S830). In other words, the divided value determination unit 3234 requests the anonymity evaluation unit 3235 to evaluate anonymity.
分割値「生年:1963」を受信した匿名性評価部3235は、匿名性評価を行う(ステップS831)。
The anonymity evaluation part 3235 which received division value "birth year: 1963" performs anonymity evaluation (step S831).
匿名性評価部3235は、分割値の「生年:1963」で分割された、それぞれのデータレコードrp111の数を計数する。図14は、分割値決定部3234が分割値を分割値候補1122「生年:1963」に決定した場合の、図13に示すデータセットsp130が分割されるイメージを示す図である。図14に示す例では、匿名性評価部3235は、分割した後のその前半部分のデータセットsp140のデータレコードrp111の数を6個、分割した後のその後半部分のデータセットsp150のデータレコードrp111の数を5個と計数する。
The anonymity evaluation unit 3235 counts the number of data records rp111 in each part divided by the division value “birth year: 1963”. FIG. 14 is a diagram illustrating how the data set sp130 illustrated in FIG. 13 is divided when the division value determination unit 3234 has determined the division value to be the division value candidate 1122 “birth year: 1963”. In the example illustrated in FIG. 14, the anonymity evaluation unit 3235 counts six data records rp111 in the data set sp140, which is the first half after the division, and five data records rp111 in the data set sp150, which is the second half after the division.
データセットsp140及びデータセットsp150のそれぞれのデータレコードrp111の数が2k個未満である。そこで、匿名性評価部3235は、「生年:1961~1963」を含む汎化実行要求及び「生年:1964~1977」を含む汎化実行要求を汎化実行部3236へ送信する(ステップS813)。
The number of data records rp111 in each of the data set sp140 and the data set sp150 is less than 2k. Therefore, the anonymity evaluation unit 3235 transmits a generalization execution request including “birth year: 1961 to 1963” and a generalization execution request including “birth year: 1964 to 1977” to the generalization execution unit 3236 (step S813).
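The two rounds described so far follow one recursive pattern, which can be sketched in Python as follows. This is a simplified illustration and not the device itself: it assumes Formula 1 is the range ratio relative to the initial data set, it stops when the chosen attribute offers no candidate, and the 20 records are hypothetical data built to have the same group sizes as the example (9, 6, and 5).

```python
def ila(values, full_width):
    """Range width of `values` relative to the full range, times the record count."""
    if full_width == 0:
        return 0.0
    return (max(values) - min(values)) / full_width * len(values)


def split_top_down(records, weights, k, widths=None):
    """Return the groups of records that would each be generalized together."""
    if widths is None:  # value range of each attribute in the initial data set
        widths = {a: max(r[a] for r in records) - min(r[a] for r in records)
                  for a in weights}
    if len(records) < 2 * k:  # too small to yield two parts of k or more
        return [records]
    # Division attribute: largest weight x ILA (Formula 2)
    attr = max(weights,
               key=lambda a: weights[a] * ila([r[a] for r in records], widths[a]))
    values = [r[attr] for r in records]
    # Division value candidates: both sides keep at least k records
    candidates = [v for v in sorted(set(values))
                  if sum(x <= v for x in values) >= k
                  and sum(x > v for x in values) >= k]
    if not candidates:
        return [records]

    def loss(v):
        low = [x for x in values if x <= v]
        high = [x for x in values if x > v]
        return ila(low, widths[attr]) + ila(high, widths[attr])

    v = min(candidates, key=loss)  # division value: smallest total ILA
    low = [r for r in records if r[attr] <= v]
    high = [r for r in records if r[attr] > v]
    return (split_top_down(low, weights, k, widths)
            + split_top_down(high, weights, k, widths))


# Hypothetical 20-record data set (not the data of FIG. 3)
years = [1943, 1945, 1947, 1949, 1951, 1952, 1953, 1955, 1956,
         1961, 1962, 1963, 1964, 1966, 1968, 1970, 1972, 1974, 1976, 1977]
records = [{"birth_year": y, "care_ym": 200601 + (i * 7) % 12}
           for i, y in enumerate(years)]
groups = split_top_down(records, {"birth_year": 8, "care_ym": 1}, k=5)
sizes = [len(g) for g in groups]
```

Each returned group would then be rewritten to its own value ranges and stored as anonymized personal data.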
汎化の実行要求を受信した汎化実行部3236は、「生年」の属性の属性値が「1961」~「1963」のデータレコードrp111の汎化及び「生年」の属性の属性値が「1964」~「1977」のデータレコードrp111の汎化を実行する(ステップS814)。
Upon receiving the generalization execution requests, the generalization execution unit 3236 generalizes the data records rp111 whose “birth year” attribute value is “1961” to “1963” and the data records rp111 whose “birth year” attribute value is “1964” to “1977” (step S814).
図14に示すように、データセットsp140は、「生年」の属性の属性値が「1961~1963」、「診療年月」の属性の属性値が「200612~201105」に汎化される。また、データセットsp150は、「生年」の属性の属性値が「1964~1977」、「診療年月」の属性の属性値が「200706~201104」に汎化される。
As shown in FIG. 14, in the data set sp140, the attribute value of the “birth year” attribute is generalized to “1961 to 1963” and the attribute value of the “medical care date” attribute is generalized to “200612 to 201105”. In the data set sp150, the attribute value of the “birth year” attribute is generalized to “1964 to 1977” and the attribute value of the “medical care date” attribute is generalized to “200706 to 201104”.
次に、汎化実行部3236は、汎化したデータレコードrp111を匿名化済個人データ記憶装置200へ記録する(ステップS815)。
Next, the generalization execution unit 3236 records the generalized data records rp111 in the anonymized personal data storage device 200 (step S815).
上述した本実施形態における効果は、第1の実施形態の効果と同様に、利用目的に合致するように制御してデータセットを匿名化することと、匿名化されたデータセットにおける情報の損失を低減することとを両立させることができる点である。
As with the first embodiment, the effect of the present embodiment described above is that it is possible both to anonymize a data set under control that matches the purpose of use and to reduce the loss of information in the anonymized data set.
その理由は、分割属性決定部3233が優先度pと情報損失量ILAとに基づいて評価値を生成し、生成した評価値に基づいて汎化する属性を決定するようにしたからである。
The reason is that the division attribute determination unit 3233 generates an evaluation value based on the priority p and the information loss amount ILA, and determines the attribute to be generalized based on the generated evaluation value.
<<<第3の実施形態>>>
<<< Third Embodiment >>>
次に、本発明の第3の実施形態について図面を参照して詳細に説明する。以下、本実施形態の説明が不明確にならない範囲で、前述の説明と重複する内容については説明を省略する。
Next, a third embodiment of the present invention will be described in detail with reference to the drawings. In the following, content that overlaps with the preceding description is omitted to the extent that the description of the present embodiment does not become unclear.
FIG. 15 is a block diagram showing the configuration of the anonymization device 330 according to the present embodiment. The anonymization device 330 of the present embodiment performs anonymization by a bottom-up approach.
As shown in FIG. 15, the anonymization device 330 includes a priority determination information storage unit 321, an information loss amount calculation unit 332, and an anonymization processing unit 333.
Note that the anonymization device 330 may be included in the system shown in FIG. 2 in place of the anonymization device 310.
The priority determination information storage unit 321 stores information for determining the priority p. The information for determining the priority p is set in advance by a user of the system. Alternatively, the division attribute determination unit 3233 may receive the information for determining the priority p in advance from an external system via the communication unit 706 shown in FIG. 5.
FIG. 16 is a diagram showing an example of the priority determination information 3310 stored in the priority determination information storage unit 321. As shown in FIG. 16, the priority determination information 3310 of the present embodiment includes one or more sets of a priority order, an attribute name, and a threshold. The priority order indicates, for example, the order in which the attributes specified by the corresponding attribute names are generalized. The threshold is, for example, the value used to decide that the lower-priority attribute is generalized first: this happens when the value obtained by subtracting the information loss amount ILA of the higher-priority attribute from the information loss amount ILA of the lower-priority attribute exceeds the threshold.
In FIG. 16, a smaller number indicates a higher priority order. That is, in FIG. 16, the higher-priority attribute is the attribute named “age”, and the lower-priority attribute is the attribute named “2011 medical care month”.
Note that the priority determination information may instead include sets of only a priority order and an attribute name. In this case, the anonymization processing unit 333 may, for example, hold the threshold in internal storage means (not shown).
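The sets of priority order, attribute name, and threshold described above can be pictured as simple records. The following Python sketch is only an illustration: the class name `PriorityEntry`, its field names, and the list `PRIORITY_INFO` are hypothetical and do not appear in the specification. The threshold “3” for “age” is the value used in the operation example of this embodiment; the threshold of the second entry is not stated in this section and is therefore left unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriorityEntry:
    """One set of the priority determination information (hypothetical layout)."""
    priority_order: int        # a smaller number means a higher priority order
    attribute_name: str
    threshold: Optional[int]   # None when no threshold is stored for the entry


# The "age" threshold follows the operation example in the text.
# The threshold of the second entry is not given in this section (assumption: unset).
PRIORITY_INFO = [
    PriorityEntry(1, "age", 3),
    PriorityEntry(2, "2011 medical care month", None),
]

# Sorting by priority order yields the higher-priority attribute first.
higher, lower = sorted(PRIORITY_INFO, key=lambda entry: entry.priority_order)
```

When the threshold is held by the anonymization processing unit instead, the `threshold` field would simply be omitted from each entry.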
The information loss amount calculation unit 332 calculates and outputs the information loss amount ILA due to generalization. For example, the information loss amount calculation unit 332 counts the number of distinct attribute values that an attribute of the personal data contains, and uses this count as the information loss amount ILA.
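The counting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each data record is represented as a dictionary keyed by attribute name; the function name `information_loss_amount` and the sample records are hypothetical.

```python
def information_loss_amount(records, attribute):
    """Return the number of distinct values that `attribute` takes in `records`."""
    return len({record[attribute] for record in records})


# Hypothetical sample records (not taken from FIG. 17).
records = [
    {"age": 21, "month": 4},
    {"age": 24, "month": 4},
    {"age": 21, "month": 7},
]

ila_age = information_loss_amount(records, "age")      # distinct ages: 21, 24
ila_month = information_loss_amount(records, "month")  # distinct months: 4, 7
```

Because generalization merges values into ranges, this count decreases as an attribute is generalized further.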
The anonymization processing unit 333 includes a generalization attribute determination unit 3333, a generalization execution unit 3336, and an anonymity evaluation unit 3335.
The generalization attribute determination unit 3333 determines the attribute to be generalized, for example, as follows. First, the generalization attribute determination unit 3333 calculates an information loss amount difference by subtracting the information loss amount ILA of the higher-priority attribute from the information loss amount ILA of the lower-priority attribute. Next, it compares the information loss amount difference with the threshold of the higher-priority attribute. If the information loss amount difference is equal to or greater than the threshold, it decides to generalize the lower-priority attribute. If the information loss amount difference is less than the threshold, it decides to generalize the higher-priority attribute.
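The decision rule above amounts to one subtraction and one comparison. The following Python sketch follows the “equal to or greater than” wording of this paragraph; the function and parameter names are hypothetical.

```python
def choose_attribute_to_generalize(ila_higher, ila_lower, threshold,
                                   higher_name, lower_name):
    """Pick the attribute to generalize from the information loss amount difference."""
    # Difference = ILA of the lower-priority attribute minus ILA of the higher-priority one.
    difference = ila_lower - ila_higher
    if difference >= threshold:
        return lower_name   # difference is equal to or greater than the threshold
    return higher_name      # difference is less than the threshold


choice = choose_attribute_to_generalize(
    ila_higher=12, ila_lower=10, threshold=3,
    higher_name="age", lower_name="2011 medical care month",
)
```

With the values used here (12, 10, and a threshold of 3), the difference is -2, so the higher-priority attribute is chosen.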
Note that the generalization attribute determination unit 3333 may instead determine the attribute to be generalized as follows. In this method, the priority p of the higher-priority attribute is that attribute's threshold, and the priority p of the lower-priority attribute is “0”. For each of these attributes, the generalization attribute determination unit 3333 calculates an evaluation value using the evaluation formula: evaluation value = information loss amount ILA + priority p. Next, it determines the attribute with the larger evaluation value as the attribute to be generalized. If the higher-priority attribute and the lower-priority attribute have the same evaluation value, the generalization attribute determination unit 3333 may, for example, determine the higher-priority attribute as the attribute to be generalized.
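This evaluation-value variant can be sketched as follows in Python. The function and parameter names are hypothetical; the tie-breaking in favor of the higher-priority attribute follows the example given in the paragraph above.

```python
def choose_by_evaluation_value(ila_higher, ila_lower, threshold,
                               higher_name, lower_name):
    """Pick the attribute whose evaluation value (ILA + priority p) is larger."""
    score_higher = ila_higher + threshold  # priority p of the higher-priority attribute
    score_lower = ila_lower + 0            # priority p of the lower-priority attribute is 0
    if score_lower > score_higher:
        return lower_name
    # Larger evaluation value for the higher-priority attribute, or a tie.
    return higher_name


choice = choose_by_evaluation_value(
    ila_higher=4, ila_lower=10, threshold=3,
    higher_name="age", lower_name="2011 medical care month",
)
```

When the information loss amount difference is exactly equal to the threshold, the two evaluation values tie, so this variant selects the higher-priority attribute.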
The generalization execution unit 3336 generalizes the attribute determined by the generalization attribute determination unit 3333.
The anonymity evaluation unit 3335 determines whether the data set generalized by the generalization execution unit 3336 satisfies the anonymity index.
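As an illustration of such a determination, the following Python sketch assumes that the anonymity index is k-anonymity, as in the operation example of this embodiment: every combination of quasi-identifier values must be shared by at least k data records. The function name and the sample records are hypothetical.

```python
from collections import Counter


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[attribute] for attribute in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical generalized records: one group of three and one group of two.
sample = (
    [{"age": "21 to 24", "month": "4 to 6"}] * 3
    + [{"age": "31 to 40", "month": "4 to 6"}] * 2
)

ok_for_2 = satisfies_k_anonymity(sample, ["age", "month"], 2)
ok_for_3 = satisfies_k_anonymity(sample, ["age", "month"], 3)
```

The group of two records is what prevents the sample from satisfying the index when k is 3.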
FIG. 17 is a diagram showing an example of the data set sp160 stored in the personal data storage device 100 of the present embodiment. Each data record rp161 of the data set sp160 shown in FIG. 17 includes attribute values for the attributes “name”, “age”, “2011 medical care month”, and “disease name”. In the present embodiment, “age” and “2011 medical care month” are the attributes to be anonymized (that is, the quasi-identifiers).
The anonymization device 330 described above may be realized by the computer 700 shown in FIG. 5, in the same manner as the anonymization device 310 shown in FIG. 1.
FIGS. 18A, 18B, and 18C are sequence diagrams showing the operation of the anonymization device 330 in the present embodiment.
In FIG. 18A, the generalization attribute determination unit 3333 accepts an anonymization execution request entered by a user of the system, for example from the input unit 704 shown in FIG. 5 (step S841).
Here, the anonymization execution request includes, for example, the value of k for k-anonymization (for example, “3”).
Next, the generalization attribute determination unit 3333 that has received the anonymization execution request transmits a priority determination information acquisition request to the priority determination information storage unit 321 (step S842).
Next, having received the priority determination information 3310 in response to the priority determination information acquisition request, the generalization attribute determination unit 3333 transmits an information loss amount calculation request to the information loss amount calculation unit 332 (step S843).
Next, the information loss amount calculation unit 332 that has received the information loss amount calculation request transmits a personal data acquisition request to the personal data storage device 100 (step S844).
Next, having received the data set sp160 in response to the personal data acquisition request, the information loss amount calculation unit 332 calculates the information loss amount ILA and transmits the calculated information loss amount ILA to the generalization attribute determination unit 3333 (step S845).
Here, the information loss amount calculation unit 332 calculates the information loss amount ILA as the number of kinds of attribute values. That is, because the “age” attribute has 12 kinds of attribute values, the information loss amount calculation unit 332 calculates the information loss amount ILA-birth of the “age” attribute as “12”. Likewise, because the “2011 medical care month” attribute has 10 kinds of attribute values, the information loss amount calculation unit 332 calculates the information loss amount ILA-mc2011 of the “2011 medical care month” attribute as “10”.
Next, the generalization attribute determination unit 3333 that has received the information loss amount ILA determines the attribute to be generalized (step S846).
For example, the generalization attribute determination unit 3333 uses the received priority determination information 3310 to determine the attribute to be generalized based on the received information loss amount ILA.
For example, the generalization attribute determination unit 3333 calculates the information loss amount difference by subtracting the information loss amount ILA of “age”, the attribute with priority order “1”, from the information loss amount ILA of “2011 medical care month”, the attribute with priority order “2”. That is, it calculates 10 - 12 = -2. Next, the generalization attribute determination unit 3333 compares this information loss amount difference with the threshold (“3”) of “age”, the attribute with priority order “1”. In this case, -2 < 3, so the generalization attribute determination unit 3333 decides to generalize “age”, the attribute with priority order “1”.
Note that the generalization attribute determination unit 3333 may determine the attribute to be generalized using the method described in the first embodiment.
Next, the generalization attribute determination unit 3333 transmits a generalization execution request including the attribute name of the attribute it has decided to generalize (in this case, “age”) to the generalization execution unit 3336 (step S847).
Next, the generalization execution unit 3336 that has received the generalization execution request generalizes the data set sp160 shown in FIG. 17 into the data set sp162 shown in FIG. 19 (step S848).
FIG. 19 is a diagram showing an example of a data set at an intermediate stage of the anonymization processing by the anonymization device 330 of the present embodiment (that is, a partially generalized data set).
Next, the generalization execution unit 3336 transmits an anonymity evaluation request including the data set sp162 to the anonymity evaluation unit 3335 (step S849).
Note that the generalization execution unit 3336 may instead store the data set sp162 in the storage unit 702 shown in FIG. 5 and transmit an anonymity evaluation request including the address at which it is stored to the anonymity evaluation unit 3335. The same applies to the anonymity evaluation requests described below.
Next, the anonymity evaluation unit 3335 that has received the anonymity evaluation request evaluates the anonymity of the data set sp162. In the case of the data set sp162 in FIG. 19, the anonymity evaluation unit 3335 determines that the “medical care month” attribute does not satisfy the k value (“3”) of k-anonymity (step S850).
Next, in FIG. 18B, the anonymity evaluation unit 3335 transmits a generalization attribute determination request to the generalization attribute determination unit 3333 (step S851).
Next, the generalization attribute determination unit 3333 that has received the generalization attribute determination request transmits an information loss amount calculation request to the information loss amount calculation unit 332 (step S852).
Next, the information loss amount calculation unit 332 that has received the information loss amount calculation request calculates the information loss amount ILA and transmits the calculated information loss amount ILA to the generalization attribute determination unit 3333 (step S853).
Here, in the case of the data set sp162 shown in FIG. 19, the “age” attribute has four kinds of attribute values: “21 to 24”, “31 to 40”, “41 to 51”, and “52 to 58”. The “2011 medical care month” attribute has 10 kinds of attribute values. Therefore, the information loss amount calculation unit 332 calculates the information loss amount ILA-birth and the information loss amount ILA-mc2011, which correspond to the “age” and “2011 medical care month” attributes, as “4” and “10”, respectively.
Next, the generalization attribute determination unit 3333 that has received the information loss amount ILA determines the attribute to be generalized (step S854).
The information loss amount ILA-birth of the “age” attribute, which has priority order “1”, is “4”, and the information loss amount ILA-mc2011 of the “2011 medical care month” attribute, which has priority order “2”, is “10”, so the information loss amount difference is as follows.
10 - 4 = 6
The generalization attribute determination unit 3333 compares this information loss amount difference (“6”) with the threshold (“3”) of “age”, the attribute with priority order “1”. In this case, 6 > 3, so the generalization attribute determination unit 3333 decides to generalize “2011 medical care month”, the attribute with priority order “2”.
Next, the generalization attribute determination unit 3333 transmits a generalization execution request including the name of the attribute it has decided to generalize (in this case, “2011 medical care month”) to the generalization execution unit 3336 (step S855).
Next, the generalization execution unit 3336 that has received the generalization execution request generalizes the data set sp162 shown in FIG. 19 into the data set sp163 shown in FIG. 20 (step S856).
FIG. 20 is a diagram showing an example of a data set at an intermediate stage of the anonymization processing by the anonymization device 330 of the present embodiment (that is, a partially generalized data set).
Next, the generalization execution unit 3336 transmits an anonymity evaluation request including the data set sp163 to the anonymity evaluation unit 3335 (step S857).
Next, the anonymity evaluation unit 3335 that has received the anonymity evaluation request evaluates the anonymity of the data set sp163. In the case of the data set sp163 shown in FIG. 20, the anonymity evaluation unit 3335 determines that the combination of the “medical care month” attribute and the “2011 medical care month” attribute does not satisfy the k value (“3”) of k-anonymity (step S858).
Next, the anonymity evaluation unit 3335 transmits a generalization attribute determination request to the generalization attribute determination unit 3333 (step S859).
Next, in FIG. 18C, the generalization attribute determination unit 3333 that has received the generalization attribute determination request transmits an information loss amount calculation request to the information loss amount calculation unit 332 (step S860).
Next, the information loss amount calculation unit 332 that has received the information loss amount calculation request calculates the information loss amount ILA and transmits the calculated information loss amount ILA to the generalization attribute determination unit 3333 (step S861).
Here, in the case of the data set sp163 shown in FIG. 20, the “age” attribute has four kinds of attribute values and the “2011 medical care month” attribute also has four kinds of attribute values. Therefore, the information loss amount calculation unit 332 calculates both the information loss amount ILA-birth and the information loss amount ILA-mc2011, which correspond to the “age” and “2011 medical care month” attributes, as “4”.
Next, the generalization attribute determination unit 3333 that has received the information loss amount ILA determines the attribute to be generalized (step S862).
The information loss amount ILA-birth of the “age” attribute, which has priority order “1”, is “4”, and the information loss amount ILA-mc2011 of the “2011 medical care month” attribute, which has priority order “2”, is also “4”, so the information loss amount difference is as follows.
4 - 4 = 0
The generalization attribute determination unit 3333 compares this information loss amount difference (“0”) with the threshold (“3”) of “age”, the attribute with priority order “1”. In this case, 0 < 3, so the generalization attribute determination unit 3333 decides to generalize “age”, the attribute with priority order “1”.
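The three determinations made in steps S846, S854, and S862 can be reproduced from the information loss amounts stated in the text. The following Python sketch is only a check of that arithmetic; the names `decide`, `rounds`, and `THRESHOLD_AGE` are hypothetical.

```python
THRESHOLD_AGE = 3  # threshold of "age", the attribute with priority order 1

# (ILA-birth, ILA-mc2011) as stated for the data sets sp160, sp162, and sp163.
rounds = [(12, 10), (4, 10), (4, 4)]


def decide(ila_age, ila_month, threshold=THRESHOLD_AGE):
    """Apply the difference-versus-threshold rule to one round."""
    difference = ila_month - ila_age
    return "2011 medical care month" if difference >= threshold else "age"


differences = [ila_month - ila_age for ila_age, ila_month in rounds]
decisions = [decide(ila_age, ila_month) for ila_age, ila_month in rounds]
# differences -> [-2, 6, 0]
# decisions   -> ["age", "2011 medical care month", "age"]
```

The higher-priority attribute is generalized first, but once its information loss amount has fallen far enough below that of the lower-priority attribute, the lower-priority attribute takes its turn.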
Next, the generalization attribute determination unit 3333 transmits a generalization execution request including the attribute name of the attribute it has decided to generalize (in this case, “age”) to the generalization execution unit 3336 (step S863).
Next, the generalization execution unit 3336 that has received the generalization execution request generalizes the data set sp163 shown in FIG. 20 into the data set sp164 shown in FIG. 21 (step S864).
FIG. 21 is a diagram showing an example of a data set that has been anonymized by the anonymization device 330 of the present embodiment.
Next, the generalization execution unit 3336 transmits an anonymity evaluation request including the data set sp164 to the anonymity evaluation unit 3335 (step S865).
Next, the anonymity evaluation unit 3335 that has received the anonymity evaluation request evaluates the anonymity of the data set sp164. In the case of the data set sp164 shown in FIG. 21, the anonymity evaluation unit 3335 determines that the data set sp164 satisfies k-anonymity (step S866).
Next, the anonymity evaluation unit 3335 transmits the data set sp164, which satisfies the anonymity, to the anonymized personal data storage device 200 (step S867).
The anonymized personal data storage unit 2a that has received the data set sp164 stores the data set sp164 as an anonymized data set st120 (anonymized personal data) (step S868).
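The overall flow of FIGS. 18A to 18C (calculate the information loss amounts, determine the attribute, generalize it, evaluate anonymity, and repeat until the index is satisfied) can be sketched as a loop. The following Python code is a simplified illustration under several assumptions: data records are dictionaries, the anonymity index is k-anonymity, and the generalizer `widen` is a toy stand-in that doubles the width of a numeric range each time. The specification does not define generalization in this way, and all function names are hypothetical.

```python
from collections import Counter


def distinct_count(records, attribute):
    """Information loss amount: number of distinct values of the attribute."""
    return len({r[attribute] for r in records})


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(n >= k for n in groups.values())


def widen(records, attribute):
    """Toy generalizer (assumption): replace each value by an aligned range twice as wide."""
    widened = []
    for r in records:
        value = r[attribute]
        low, high = value if isinstance(value, tuple) else (value, value)
        width = 2 * (high - low + 1)
        new_low = (low // width) * width
        widened.append({**r, attribute: (new_low, new_low + width - 1)})
    return widened


def anonymize_bottom_up(records, higher, lower, threshold, k,
                        generalize=widen, max_rounds=32):
    """Repeat: compute ILA, pick an attribute, generalize it, evaluate anonymity."""
    for _ in range(max_rounds):
        ila = {a: distinct_count(records, a) for a in (higher, lower)}
        target = lower if ila[lower] - ila[higher] >= threshold else higher
        records = generalize(records, target)
        if satisfies_k_anonymity(records, (higher, lower), k):
            return records
    raise RuntimeError("k-anonymity not reached within max_rounds")


# Hypothetical input: four records that differ only in age.
data = [{"age": a, "month": 4} for a in (20, 21, 22, 23)]
result = anonymize_bottom_up(data, higher="age", lower="month", threshold=3, k=2)
```

The `max_rounds` guard is an addition of this sketch; the sequence diagrams themselves simply repeat until the anonymity evaluation succeeds.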
As with the first embodiment, the effect of the present embodiment described above is that it both anonymizes the data set in a controlled manner that matches the purpose of use and reduces the loss of information in the anonymized data set.
The reason is that the generalization attribute determination unit 3333 generates an evaluation value based on the priority order, the threshold, and the information loss amount ILA, and determines the attribute to be generalized based on the generated evaluation value.
The components described in the above embodiments do not necessarily need to exist independently of one another. For example, a plurality of components may be realized as a single module. One component may be realized by a plurality of modules. A component may be a part of another component. A part of one component may overlap a part of another component.
The components in the embodiments described above, and the modules that realize them, may be realized in hardware as necessary and where possible. The components and the modules that realize them may also be realized by a computer and a program. The components and the modules that realize them may also be realized by a mixture of hardware modules and a computer and a program.
The program is provided, for example, recorded on a non-volatile computer-readable recording medium such as a magnetic disk or a semiconductor memory, and is read by the computer, for example, when the computer starts up. The program thus read controls the operation of the computer and thereby causes the computer to function as the components in the embodiments described above.
In the embodiments described above, a plurality of operations are described in order in the form of flowcharts, but the order of description does not limit the order in which the operations are executed. Therefore, when each embodiment is implemented, the order of the operations can be changed as long as the change does not interfere with their content.
Furthermore, in the embodiments described above, the operations are not limited to being executed at mutually different timings. For example, another operation may occur during the execution of one operation, or the execution timings of one operation and another operation may overlap partially or entirely.
Furthermore, in the embodiments described above, one operation is described as triggering another operation, but that description does not limit all relationships between the one operation and the other operation. Therefore, when each embodiment is implemented, the relationships among the operations can be changed as long as the change does not interfere with their content. The specific description of each operation of each component does not limit that operation. Therefore, each specific operation of each component may be changed within a range that does not impair the functional, performance, or other characteristics when each embodiment is implemented.
Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the scope of the present invention.
Although the present invention has been described above with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2012-181684 filed on August 20, 2012, the entire disclosure of which is incorporated herein.
100 Personal data storage device
110 Data set sp
111 Data record rp
130 Data set sp
140 Data set sp
150 Data set sp
160 Data set sp
161 Data record rp
162 Data set sp
163 Data set sp
164 Data set sp
200 Anonymized personal data storage device
210 Anonymized data set sa
211 Anonymized data record
310 Anonymization device
312 Information loss amount calculation unit
313 Anonymization processing unit
320 Anonymization device
321 Priority determination information storage unit
322 Information loss amount calculation unit
323 Anonymization processing unit
700 Computer
701 CPU
702 Storage unit
703 Storage device
704 Input unit
705 Output unit
706 Communication unit
707 Recording medium
1105 Division value candidate
1121 Division value candidate
1122 Division value candidate
3210 Priority determination information
3233 Division attribute determination unit
3234 Division value determination unit
3235 Anonymity evaluation unit
3236 Generalization execution unit
1101 to 1104 Division value candidates
1101 to 1111 Division value candidates
1106 to 1111 Division value candidates
Claims (12)
- An information processing apparatus comprising:
  information loss amount calculation means for calculating and outputting an information loss amount corresponding to each of the attributes included in first personal data to be anonymized; and
  anonymization processing means for determining the attribute to be processed based on a priority corresponding to each of the attributes and on the information loss amount, and for generating and outputting second personal data in which the attribute value of the determined attribute of the first personal data has been processed.
- The information processing apparatus according to claim 1, wherein the priority indicates for which of the attributes the loss of information in the processed second personal data is to be made smaller.
- The information processing apparatus according to claim 1 or 2, further comprising priority determination information storage means for storing information for determining the priority,
  wherein the anonymization processing means determines the priority based on the information for determining the priority.
- The information processing apparatus according to any one of claims 1 to 3, wherein the anonymization processing means calculates an evaluation value using an evaluation formula whose result becomes larger as the priority becomes higher when the information loss amount is constant, and larger as the information loss amount becomes larger when the priority is constant, and determines the attribute having the largest calculated evaluation value as the attribute to be processed.
- The information processing apparatus according to claim 4, wherein the evaluation formula includes an operation of multiplying the information loss amount by the priority.
- The information processing apparatus according to claim 4, wherein the evaluation formula includes an operation of adding the information loss amount and the priority.
- The information processing apparatus according to any one of claims 1 to 6, wherein the anonymization processing means includes:
  division attribute determination means for calculating the information loss amount for the case where the attribute values of each of the attributes to be anonymized in the first personal data are generalized to the same value, and for determining, based on the calculated information loss amount, a division attribute to be generalized;
  division value determination means for determining a division value of the division attribute such that the information loss amount is minimized when the first personal data is divided along the determined division attribute and the attribute values of the division attribute are generalized;
  anonymity evaluation means for determining, for each of third personal data and fourth personal data generated by dividing the first personal data at the determined division value, whether further division is possible; and
  generalization execution means for generalizing and outputting the attribute values of the division attribute of the third personal data and the fourth personal data for which the anonymity evaluation means has determined that further division is not possible,
  wherein the division attribute determination means and the division value determination means process, as new first personal data, the third personal data and the fourth personal data for which the anonymity evaluation means has determined that further division is possible.
- The information processing apparatus according to any one of claims 1 to 6, wherein the anonymization processing means includes:
  generalization attribute determination means for calculating the information loss amount corresponding to each of the attributes to be anonymized in the first personal data, and for determining a generalization attribute to be generalized based on the calculated information loss amount and the priority;
  generalization execution means for generalizing the attribute value of the determined generalization attribute included in the first personal data to generate fifth personal data; and
  anonymity evaluation means for determining whether the fifth personal data has a predetermined anonymity and, when it determines that the fifth personal data has the predetermined anonymity, outputting the fifth personal data as the second personal data,
  wherein the generalization attribute determination means and the generalization execution means process the fifth personal data as new first personal data when the anonymity evaluation means determines that the fifth personal data does not have the predetermined anonymity.
- An anonymization processing method in which a computer:
  calculates and outputs an information loss amount corresponding to each of the attributes included in first personal data to be anonymized;
  determines the attribute to be processed based on a priority corresponding to each of the attributes and on the information loss amount; and
  generates and outputs second personal data in which the attribute value of the determined attribute of the first personal data has been processed.
- The anonymization processing method according to claim 9, wherein the computer:
  calculates an evaluation value using an evaluation formula whose result becomes larger as the priority becomes higher when the information loss amount is constant, and larger as the information loss amount becomes larger when the priority is constant; and
  determines the attribute having the largest calculated evaluation value as the attribute to be processed.
- A non-volatile recording medium storing a program that causes a computer to execute:
  a process of calculating and outputting an information loss amount corresponding to each of the attributes included in first personal data to be anonymized;
  a process of determining the attribute to be processed based on a priority corresponding to each of the attributes and on the information loss amount; and
  a process of generating and outputting second personal data in which the attribute value of the determined attribute of the first personal data has been processed.
- The non-volatile recording medium storing the program according to claim 11, wherein the process of determining the attribute includes:
  a process of calculating an evaluation value using an evaluation formula whose result becomes larger as the priority becomes higher when the information loss amount is constant, and larger as the information loss amount becomes larger when the priority is constant; and
  a process of determining the attribute having the largest calculated evaluation value as the attribute to be processed.
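As an illustration only, the attribute selection of claims 4 to 6 and the repeat-until-anonymous loop of claim 8 might be sketched in Python as follows. This is a minimal sketch, not the applicant's implementation: the names `evaluation_value`, `select_attribute`, and `anonymize`, the `mode` parameter, and the `max_rounds` safety limit are all hypothetical. How the information loss amount is measured, how an attribute is generalized, and what counts as "predetermined anonymity" are left to caller-supplied functions, because the claims do not fix them. The product form is assumed to receive positive loss and priority values, so that it grows in both arguments.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def evaluation_value(loss: float, priority: float, mode: str = "product") -> float:
    """Hypothetical evaluation formula that grows with both loss and priority.

    "product" mirrors claim 5 (multiplication), "sum" mirrors claim 6 (addition).
    The product form is only monotone when both inputs are positive.
    """
    if mode == "product":
        return loss * priority
    if mode == "sum":
        return loss + priority
    raise ValueError(f"unknown mode: {mode}")


def select_attribute(losses: Dict[str, float],
                     priorities: Dict[str, float],
                     mode: str = "product") -> str:
    """Return the attribute with the largest evaluation value (claim 4)."""
    return max(losses, key=lambda a: evaluation_value(losses[a], priorities[a], mode))


def anonymize(records: List[Record],
              priorities: Dict[str, float],
              loss_of: Callable[[List[Record], str], float],
              generalize: Callable[[List[Record], str], List[Record]],
              is_anonymous: Callable[[List[Record]], bool],
              max_rounds: int = 100) -> List[Record]:
    """Loop in the spirit of claim 8: pick an attribute, generalize it, re-check.

    loss_of, generalize and is_anonymous are caller-supplied assumptions.
    max_rounds is a safety limit added for this sketch, not part of the claims.
    """
    current = records
    for _ in range(max_rounds):
        # Information loss amount per attribute for the current data.
        losses = {a: loss_of(current, a) for a in priorities}
        target = select_attribute(losses, priorities)
        # The generalized data plays the role of the "fifth personal data".
        current = generalize(current, target)
        if is_anonymous(current):
            return current
    raise RuntimeError("anonymity not reached within max_rounds")
```

With losses `{"age": 0.1, "zip": 0.5}` and priorities `{"age": 3.0, "zip": 1.0}`, the product form selects `zip` (0.5 against roughly 0.3) while the sum form selects `age` (3.1 against 1.5), which shows that the two formulas of claims 5 and 6 can rank the same attributes differently.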
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014531490A JPWO2014030302A1 (en) | 2012-08-20 | 2013-07-31 | Information processing apparatus and anonymization processing method for performing anonymization |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012181684 | 2012-08-20 | ||
JP2012-181684 | 2012-08-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014030302A1 (en) | 2014-02-27 |
Family
ID=50149634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/004624 WO2014030302A1 (en) | 2012-08-20 | 2013-07-31 | Information processing device for executing anonymization and anonymization processing method |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2014030302A1 (en) |
WO (1) | WO2014030302A1 (en) |
- 2013-07-31 WO PCT/JP2013/004624 patent/WO2014030302A1/en active Application Filing
- 2013-07-31 JP JP2014531490A patent/JPWO2014030302A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011145401A1 (en) * | 2010-05-19 | 2011-11-24 | 株式会社日立製作所 | Identity information de-identification device |
JP2012003440A (en) * | 2010-06-16 | 2012-01-05 | Kddi Corp | Apparatus, method and program for protecting privacy of public information |
JP2012022315A (en) * | 2010-07-02 | 2012-02-02 | Nec (China) Co Ltd | Method and device for anonymizing data |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017156878A (en) * | 2016-02-29 | 2017-09-07 | 富士通株式会社 | Leakage risk distribution device, leakage risk distribution method and leakage risk distribution program |
EP3522056A1 (en) * | 2018-02-06 | 2019-08-07 | Nokia Technologies Oy | Distributed computing system for anonymized computation |
JP2020181487A (en) * | 2019-04-26 | 2020-11-05 | 株式会社日立製作所 | Anonymous processing system, anonymous processing program, and anonymous processing method |
JP7242407B2 (en) | 2019-04-26 | 2023-03-20 | 株式会社日立製作所 | Anonymous Processing System, Anonymous Processing Program and Anonymous Processing Method |
US12125054B2 (en) | 2019-09-25 | 2024-10-22 | Valideck International Corporation | System, devices, and methods for acquiring and verifying online information |
JP2021082043A (en) * | 2019-11-20 | 2021-05-27 | 株式会社日立製作所 | Anonymous processing system, anonymous processing program, and anonymous processing method |
JP7257938B2 (en) | 2019-11-20 | 2023-04-14 | 株式会社日立製作所 | Anonymous Processing System, Anonymous Processing Program and Anonymous Processing Method |
US20230418977A1 (en) * | 2022-06-28 | 2023-12-28 | Here Global B.V. | Method, apparatus, and computer program product for estimating the privacy risk of anonymized trajectory data |
Also Published As
Publication number | Publication date |
---|---|
JPWO2014030302A1 (en) | 2016-07-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13831515; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2014531490; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 13831515; Country of ref document: EP; Kind code of ref document: A1 |