WO2020235008A1 - Anonymization technique derivation device, anonymization technique derivation method, anonymization technique derivation program, and anonymization technique derivation system - Google Patents


Info

Publication number
WO2020235008A1
WO2020235008A1 PCT/JP2019/020137 JP2019020137W WO2020235008A1 WO 2020235008 A1 WO2020235008 A1 WO 2020235008A1 JP 2019020137 W JP2019020137 W JP 2019020137W WO 2020235008 A1 WO2020235008 A1 WO 2020235008A1
Authority
WO
WIPO (PCT)
Prior art keywords
analysis
unit
personal information
anonymization method
anonymization
Prior art date
Application number
PCT/JP2019/020137
Other languages
French (fr)
Japanese (ja)
Inventor
充洋 服部
貴人 平野
りな 清水
泰興 飯田
史生 大松
Original Assignee
三菱電機株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱電機株式会社
Priority to PCT/JP2019/020137 (WO2020235008A1)
Priority to JP2019550273A (JP6695511B1)
Publication of WO2020235008A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules

Definitions

  • The present invention relates to an anonymization method derivation device, an anonymization method derivation method, an anonymization method derivation program, and an anonymization method derivation system.
  • Anonymization, which converts personal information into anonymously processed information, is known as a technology for achieving both the protection and the utilization of personal information.
  • Providing personal information from a business operator that holds it (hereinafter referred to as the provider, which corresponds to the provider 1 in the embodiments) to a business operator that does not hold it (hereinafter referred to as the recipient, which corresponds to the provider 2 in the embodiments) may infringe on the rights and interests of the individual.
  • If the personal information is anonymized before it is provided, the recipient can utilize it while the rights and interests of the individual are protected.
  • When anonymizing personal information, the provider needs to determine the anonymization method to be applied and its parameters.
  • The appropriate anonymization method and parameters depend on both the content of the personal information held by the provider and the content of the data analysis to be performed by the recipient, so neither the provider nor the recipient can determine them independently.
  • The information processing system disclosed in Patent Document 1 is composed of a provider device, a user device, and an information processing device.
  • The provider device provides the information processing device with data for policy calculation.
  • The information processing device determines the policy based on the policy calculation data and notifies the provider device of the policy.
  • However, the policy calculation data is itself anonymously processed information, that is, data in which personal information has been obscured. The provider therefore still has to determine, on its own, the anonymization method and parameters used to produce the policy calculation data.
  • An object of the present invention is to provide a device capable of generating highly safe and useful anonymously processed information by anonymizing personal information with an anonymization method derived from the personal information of the provider 1 and the analysis command of the provider 2.
  • The anonymization method derivation device of the present invention includes:
  • a personal information storage unit that stores personal information;
  • an analysis command storage unit that stores an analysis command for analyzing the personal information stored in the personal information storage unit;
  • an anonymization method derivation unit that derives an anonymization method for anonymizing the personal information based on the analysis command stored in the analysis command storage unit; and
  • an anonymization processing unit that generates anonymously processed information, in which the personal information is anonymized, by using the anonymization method derived by the anonymization method derivation unit.
  • According to the anonymization method derivation device of the present invention, highly safe and useful anonymously processed information can be generated by anonymizing the personal information with an anonymization method derived from the personal information of the provider 1 and the analysis command of the provider 2.
  • FIG. 2 is a hardware configuration diagram of the anonymization method derivation device 100 according to Embodiment 1.
  • FIG. 3 is a flowchart showing the operation of the registration phase 51 according to Embodiment 1.
  • FIG. 4 is a flowchart showing the operation of the analysis phase 52 according to Embodiment 1.
  • FIG. 5 is an example of input/output of the analysis phase 52 when Python is used.
  • FIG. 6 is a flowchart showing the operation of the derivation phase 53 according to Embodiment 1.
  • the flowchart which shows a part of the operation of the analysis phase 52 which concerns on Embodiment 1.
  • the flowchart which shows a part of the operation of the analysis phase 52 which concerns on Embodiment 1.
  • FIG. 1 The block diagram of the anonymization method derivation apparatus 100 which concerns on Embodiment 2 and the synthetic data generation apparatus 200 which concerns on Embodiment 2.
  • FIG. 1 The flowchart which shows a part of the operation of the registration phase 51 which concerns on Embodiment 2.
  • the anonymization method derivation device 100 determines an appropriate anonymization method and its parameters based on the personal information possessed by the provider 1 and the analysis command of the provider 2.
  • An analysis command is a part or all of a command that causes a computer or the like to analyze personal information. Specific examples of the analysis command are a character string written in an interpreted language and an executable file.
  • the analysis command is also a command for analyzing the personal information stored in the personal information storage unit 111.
  • analysis of personal information includes analysis of information in which personal information is anonymized.
  • FIG. 1 is a diagram showing a configuration example of an anonymization method derivation device 100 according to the present embodiment.
  • the arrow in the figure indicates that data can flow in the direction of the arrow while the anonymization method derivation device 100 or the anonymization method derivation system is being executed.
  • The provider 1 is a device that provides personal information and the like. It may instead be a business operator or the like that provides personal information.
  • The provider 1 may use any means to provide the personal information to the anonymization method derivation device 100.
  • The provider 2 is a device that receives anonymously processed information, that is, personal information that has been anonymized. It may instead be a business operator or the like that analyzes personal information. The provider 2 may use any means to receive the anonymously processed information from the anonymization method derivation device 100.
  • The anonymization method derivation device 100 is a device that determines an appropriate anonymization method for converting personal information (personal data) into anonymously processed information.
  • The derived anonymization method typically consists of an anonymization scheme and its parameters, and may be the anonymization scheme alone.
  • Specific examples of the anonymization scheme are k-anonymization and ε-differential privacy.
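  • As an illustration of what one such scheme and its parameter look like, the following minimal sketch checks k-anonymity after generalizing an age column. The record layout, the choice of quasi-identifiers, and the helper names `generalize_age` and `is_k_anonymous` are assumptions made for this example and do not come from the patent.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Coarsen an exact age into a range such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical records; "age" and "zip" are treated as quasi-identifiers.
raw = [
    {"age": 23, "zip": "100", "purchase": 1200},
    {"age": 27, "zip": "100", "purchase": 800},
    {"age": 25, "zip": "100", "purchase": 450},
    {"age": 31, "zip": "100", "purchase": 990},
    {"age": 38, "zip": "100", "purchase": 300},
]

# Generalization is the anonymization step; k is its parameter.
generalized = [{**r, "age": generalize_age(r["age"])} for r in raw]

print(is_k_anonymous(raw, ["age", "zip"], k=2))
print(is_k_anonymous(generalized, ["age", "zip"], k=2))
```

  • Here the scheme is generalization toward k-anonymity and the parameters are `k` and the range width, which is the kind of pair the device is described as deriving.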
  • The anonymization method derivation device 100 is composed of the components shown in FIG. 1.
  • The personal information input unit 101 accepts the input of personal information from the provider 1 and stores the input personal information in the personal information storage unit 111.
  • The provider 1 may use any means to input personal information to the anonymization method derivation device 100.
  • The synthetic data generation unit 102 generates synthetic data 61 from the input personal information and stores the synthetic data 61 in the synthetic data storage unit 103.
  • Synthetic data is data generated from personal information so that its statistical properties are equivalent to those of the source personal information.
  • That is, the synthetic data generation unit 102 generates the synthetic data by processing the personal information stored in the personal information storage unit 111.
  • The synthetic data storage unit 103 holds the synthetic data 61.
  • The analysis command input unit 104 accepts the input of an analysis command for analyzing personal information from the provider 2 and stores the input analysis command in the analysis command storage unit 112.
  • The provider 2 may use any means to input the analysis command to the anonymization method derivation device 100.
  • The analysis command execution unit 105 executes the analysis command stored in the analysis command storage unit 112.
  • The analysis command execution unit 105 may execute the analysis command using the synthetic data 61 generated by the synthetic data generation unit 102.
  • The execution result output unit 106 outputs the execution result of the analysis command execution unit 105, information on the anonymization method derived by the anonymization method derivation unit 108, and the anonymously processed information generated by the anonymization processing unit 109.
  • The analysis content analysis unit 107 analyzes the content of the data analysis that the provider 2 intends to carry out and outputs analysis information as the result. That is, the analysis content analysis unit 107 analyzes the analysis command and outputs the analysis information. As the analysis information, the analysis content analysis unit 107 may output the personal information used when the analysis command is executed and the operation performed on that personal information when the analysis command is executed. The analysis content analysis unit 107 may also analyze the analysis command based on execution information obtained when the analysis command execution unit 105 executes the analysis command.
  • the analysis content includes personal information used when executing an analysis command, operations performed on personal information when executing an analysis command, and the like.
  • the operation performed on the personal information is an operation performed by using at least a part of the personal information or the processed information of the personal information. Execution information is information related to command execution.
  • The anonymization method derivation unit 108 derives an appropriate anonymization method and its parameters based on the analysis information output by the analysis content analysis unit 107.
  • The anonymization method derivation unit 108 may derive the anonymization method for anonymizing personal information based on the analysis command stored in the analysis command storage unit 112.
  • The anonymization method derivation unit 108 may derive the anonymization method based on the anonymization methods stored in the anonymization method storage unit 110.
  • The anonymization method derivation unit 108 may derive the anonymization method based on the analysis information.
  • The anonymization processing unit 109 generates anonymously processed information based on the personal information stored in the personal information storage unit 111 and the anonymization method derived by the anonymization method derivation unit 108. That is, the anonymization processing unit 109 uses the anonymization method derived by the anonymization method derivation unit 108 to generate anonymously processed information in which the personal information is anonymized. In addition, the anonymization processing unit 109 may evaluate the safety and usefulness of the anonymously processed information.
  • The anonymization method storage unit 110 stores a database that summarizes various anonymization methods and examples of their parameters.
  • An anonymization method is typically stored as a program that implements it. The anonymization method storage unit 110 therefore stores programs that realize anonymization methods for anonymizing personal information.
  • the personal information storage unit 111 can hold personal information.
  • the personal information storage unit 111 stores the personal information.
  • the analysis command storage unit 112 can hold the analysis command.
  • the analysis command storage unit 112 stores the analysis command for analyzing the personal information stored in the personal information storage unit 111.
  • FIG. 2 is a diagram showing a hardware configuration example of the anonymization method derivation device 100 according to the present embodiment.
  • The anonymization method derivation device 100 is composed of a general computer.
  • The display 21, the keyboard 22, and the mouse 23 are used by the provider 1 to operate the anonymization method derivation device 100.
  • The display 24, the keyboard 25, and the mouse 26 are used by the provider 2 to operate the anonymization method derivation device 100.
  • The synthetic data generation unit 102, the analysis command execution unit 105, the analysis content analysis unit 107, and the anonymization processing unit 109 are composed of the processor 11 and the memory 12.
  • The synthetic data storage unit 103, the personal information storage unit 111, and the analysis command storage unit 112 are composed of the memory 12.
  • The personal information input unit 101, the analysis command input unit 104, the execution result output unit 106, and the anonymization method derivation unit 108 are composed of the processor 11, the memory 12, and the port 14.
  • The anonymization method storage unit 110 is composed of the storage device 13.
  • the processor 11 is connected to other hardware via the data bus 15 (signal line) and controls these other hardware.
  • the storage device 13 stores the anonymization method derivation program.
  • the processor 11 is a processing device that executes a program, an OS (Operating System), and the like.
  • The processing device is sometimes called an IC (Integrated Circuit). Specific examples of the processor 11 are a CPU (Central Processing Unit), a DSP (Digital Signal Processor), and a GPU (Graphics Processing Unit).
  • the processor 11 reads and executes the program stored in the memory 12.
  • the computer 10 in this figure includes only one processor 11, but the computer 10 may include a plurality of processors that replace the processor 11. These plurality of processors share the execution of programs and the like.
  • the memory 12 is a storage device that temporarily stores data, and functions as a main memory used as a work area of the processor 11.
  • the memory 12 is a RAM (Random Access Memory) such as a SRAM (Static Random Access Memory) or a DRAM (Dynamic Random Access Memory).
  • the memory 12 holds the calculation result of the processor 11.
  • the storage device 13 is a storage device that stores data in a non-volatile manner, and stores the OS, each program executed by the processor 11, data used when executing each program, and the like. Specific examples of the storage device 13 are an HDD (Hard Disk Drive) and an SSD (Solid State Drive).
  • The storage device 13 may be a portable recording medium such as a memory card, an SD (Secure Digital, registered trademark) memory card, CF (Compact Flash), NAND flash, a flexible disk, an optical disk, a compact disk, a Blu-ray (registered trademark) disk, or a DVD (Digital Versatile Disk).
  • the port 14 is an interface for communicating with an external device or the like.
  • A specific example of the port 14 is an Ethernet (registered trademark) port or a USB (Universal Serial Bus) port.
  • the port 14 may be a plurality of ports.
  • The personal information input unit 101 stores in the memory 12 the personal information that the provider 1 inputs to the anonymization method derivation device 100 using one or more of the display 21, the keyboard 22, and the mouse 23.
  • The synthetic data generation unit 102 uses the processor 11 to generate the synthetic data 61 from the personal information stored in the memory 12, and stores the synthetic data 61 in the memory 12.
  • The analysis command input unit 104 stores in the memory 12 an analysis command that the provider 2 inputs using one or more of the display 24, the keyboard 25, and the mouse 26.
  • The analysis command execution unit 105 reads the synthetic data 61 from the memory 12, executes the analysis command using the processor 11, and stores the execution result in the memory 12.
  • The execution result output unit 106 outputs the execution result stored in the memory 12 to the outside.
  • The analysis content analysis unit 107 uses the processor 11 to analyze the analysis content from the content of the analysis command stored in the memory 12, and stores the analysis result in the memory 12.
  • The anonymization method derivation unit 108 uses the processor 11 to derive the anonymization method from the analysis result stored in the memory 12, and stores the derivation result in the memory 12. As necessary, the anonymization method derivation unit 108 may also read the anonymization method and its parameters from the storage device 13 and store the derivation result in the storage device 13.
  • The hardware configuration shown in FIG. 2 is the most basic example, and the anonymization method derivation device 100 may have another hardware configuration. As a specific example, the configuration shown in FIG. 2 may be constructed virtually on a general computer. Further, the provider 1 and/or the provider 2 may be a computer separate from the anonymization method derivation device 100 and may operate the anonymization method derivation device 100 through a remote connection.
  • the operation procedure of the anonymization method derivation device 100 corresponds to the anonymization method derivation method. Further, the program that realizes the operation of the anonymization method derivation device 100 corresponds to the anonymization method derivation program.
  • FIG. 3 is an example of a flowchart showing the operation of the registration phase 51.
  • the order of processing shown in this flowchart may be changed as appropriate.
  • the registration phase 51 corresponds to a process from the time when the provider 1 inputs personal information to the anonymization method derivation device 100 until the synthetic data generation unit 102 stores the synthetic data 61 in the synthetic data storage unit 103.
  • Step S301 Input reception process
  • The personal information input unit 101 accepts the input of personal information from the provider 1 and stores the received personal information in the personal information storage unit 111.
  • The input method may be any method by which the personal information input unit 101 can recognize the input information, such as input using the keyboard 22, input from a medium, or input via a network.
  • Step S302 Synthetic data generation process
  • The synthetic data generation unit 102 generates the synthetic data 61 from the input personal information and stores the synthetic data 61 in the synthetic data storage unit 103.
  • The synthetic data 61 may be generated by any method that produces anonymous data while preserving the statistical properties of the original personal information.
  • A specific example of a method for generating the synthetic data 61 is given in Reference 1.
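  • To make "preserving the statistical properties" concrete, the following minimal sketch draws synthetic values from a normal distribution fitted to a single numeric column. This is not the method of Reference 1; the function name `generate_synthetic`, the sample column, and the normality assumption are all hypothetical, and resampling from a fitted distribution does not by itself establish that the result is safe to release.

```python
import random
import statistics


def generate_synthetic(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `values`."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mean, std) for _ in range(n)]


original = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical numeric column
synthetic = generate_synthetic(original, n=10_000)

# The synthetic column has roughly the same mean and standard deviation,
# but none of its rows corresponds to a row of the original column.
print(statistics.mean(original), statistics.mean(synthetic))
print(statistics.stdev(original), statistics.stdev(synthetic))
```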
  • FIG. 4 is an example of a flowchart showing the operation of the analysis phase 52.
  • the order of processing shown in this flowchart may be changed as appropriate.
  • the analysis phase 52 corresponds to the process from the input of the analysis command to the anonymization method derivation device 100 by the provider 2 to the output of the execution result by the execution result output unit 106.
  • Step S401 Synthetic data reading process
  • The analysis command execution unit 105 reads the synthetic data 61 from the synthetic data storage unit 103.
  • Step S402 Analysis command reception process
  • The analysis command input unit 104 accepts the input of an analysis command from the provider 2 and stores the received analysis command in the analysis command storage unit 112.
  • The input method may be any method that the analysis command input unit 104 can recognize, such as input using the keyboard 25, input from a medium, or input via a network.
  • Step S403 Analysis command execution process
  • The analysis command execution unit 105 executes the analysis command stored in the analysis command storage unit 112 on the synthetic data 61.
  • Step S404 Execution result output processing
  • The execution result output unit 106 outputs the execution result of the analysis command execution unit 105. However, when the analysis command execution unit 105 executes an analysis command that does not request output of the execution result, the execution result output unit 106 does not output the execution result.
  • The output method may be any method that the provider 2 can recognize, such as output to the display 24 or output via a network.
  • Step S405 Analysis command confirmation process
  • The analysis command input unit 104 confirms whether the provider 2 has input a new analysis command.
  • If the provider 2 has input a new analysis command, the anonymization method derivation device 100 returns to step S402. Otherwise, the analysis phase 52 ends.
  • FIG. 5 shows an example of input/output in the analysis phase 52 when the provider 2 inputs analysis commands in the programming language Python.
  • Here, input/output means the input of an analysis command and the output of its execution result.
  • Lines 501 to 503 are analysis commands input by the provider 2.
  • The analysis command execution unit 105 executes these analysis commands and holds the execution results. However, since these analysis commands do not request output, the execution result output unit 106 does not output the execution results.
  • When an analysis command does request output, the execution result output unit 106 outputs the execution result, as in line 505.
  • In the analysis phase 52, the anonymization method derivation device 100 repeats the process shown in FIG. 5.
  • The analysis command may be written in any programming language.
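  • The behavior described for the analysis phase 52 can be sketched as a small loop that executes each command against the synthetic data and captures output only when a command produces it. The function name `run_analysis_commands`, the variable name `data`, and the four sample commands are assumptions for illustration and are not the contents of FIG. 5. A real device would also need to sandbox the commands, which this sketch does not attempt.

```python
import contextlib
import io


def run_analysis_commands(commands, synthetic_data):
    """Execute each command against synthetic data and collect any output."""
    # The commands only ever see the synthetic data, never the personal information.
    namespace = {"data": synthetic_data}
    outputs = []
    for command in commands:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # "single" mode echoes the value of a bare expression,
            # as an interactive Python prompt does.
            exec(compile(command, "<analysis command>", "single"), namespace)
        outputs.append(buffer.getvalue())
    return outputs


# Hypothetical commands: three assignments (no output requested)
# followed by a bare expression (output requested).
commands = [
    "total = sum(data)",
    "count = len(data)",
    "average = total / count",
    "average",
]
outputs = run_analysis_commands(commands, [1, 2, 3, 6])
print(outputs)
```

  • The first three entries of `outputs` are empty strings and only the last carries a result, which mirrors the distinction the text draws between commands that do and do not request output.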
  • FIG. 6 is an example of a flowchart showing the operation of the derivation phase 53.
  • the order of processing shown in this flowchart may be changed as appropriate.
  • the derivation phase 53 corresponds to the process from the analysis content analysis unit 107 analyzing the analysis content in the analysis command execution unit 105 to the anonymization method derivation unit 108 deriving the anonymization method.
  • Step S601 Analysis command read process
  • the analysis content analysis unit 107 reads a sequence of analysis commands from the analysis command storage unit 112.
  • a series of analysis commands is a group of analysis commands that have some meaning.
  • Step S602 Analysis content estimation process
  • the analysis content analysis unit 107 analyzes the analysis content from the series of analysis commands and outputs the analysis result.
  • the analysis content analysis unit 107 estimates the analysis content by analyzing the analysis content. This estimation method will be described later. Since the analysis result of the analysis content analysis unit 107 is an estimation of the analysis content, it may be partially different from the actual analysis content.
  • Step S603 Derivation process
  • the anonymization method derivation unit 108 derives the anonymization method based on the analysis result of step S602, the anonymization method stored in the anonymization method storage unit 110, and its parameters. This derivation method will be described later.
  • Step S604 Output processing
  • the execution result output unit 106 outputs the anonymization method derived in step S603.
  • FIG. 7 is an example of a series of analysis commands written in Python; it shows three commands extracted from the series.
  • The 21st analysis command means the following: calculate the number of days elapsed since 2010/1/1 for each element of the InvoiceDate column of the data frame stored in the variable B, calculate the average of all the calculated values, and store the average in the date_ave column of the variable B.
  • The 22nd analysis command means the following: calculate the number of days elapsed since 2010/1/1 for each element of the InvoiceDate column of the variable B, calculate the standard deviation of all the calculated values, and store the standard deviation in the date_std column of the variable B.
  • The 23rd analysis command means the following: for a histogram of 10 bins built from all the elements of the date column of the variable B, calculate the boundaries and the frequency of each bin, and store the result in the hist column of the variable B.
  • Interpreting the meaning of an analysis command is what the processor 11 does when it executes the command; this is a common technique in interpreted programming languages.
  • The analysis content analysis unit 107 uses this processing of the processor 11 when estimating the analysis content from the sequence of analysis commands. That is, by analyzing the "calculation target" and the "calculation content" of what the processor 11 has interpreted, over the entire series of analysis commands, the analysis content analysis unit 107 estimates what the provider 2 is trying to calculate and what result it is trying to obtain. For the 23rd analysis command, the calculation target is all the elements of the date column of the variable B, and the calculation content is the calculation of the boundaries and the frequency of each bin of the histogram.
  • Taking FIG. 7 as a specific example, it is estimated that the provider 2 is trying to calculate the frequency distribution of the number of days elapsed since 2010/1/1 in the InvoiceDate column.
  • In general, a sequence of analysis commands becomes long when data analysis is performed.
  • By analyzing the entire sequence of analysis commands with the above method, the analysis content analysis unit 107 can estimate what the provider 2 is trying to calculate and what kind of result it is trying to obtain.
  • FIG. 8 is an example of a flowchart showing an operation in which the analysis content analysis unit 107 estimates the analysis content from a series of analysis commands. The order of processing shown in this flowchart may be changed as appropriate.
  • Step S801 Analysis command interpretation process
  • The analysis content analysis unit 107 interprets the analysis command.
  • To interpret analysis commands, the analysis content analysis unit 107 may use a technique commonly used in interpreted programming languages.
  • The analysis content analysis unit 107 continues this step until the interpretation of the analysis commands is completed, and then proceeds to step S802.
  • Step S802 Analysis command analysis process
  • The analysis content analysis unit 107 analyzes the "calculation target" and the "calculation content" of the interpreted contents over the entire series of analysis commands.
  • The specific method of analysis depends on the programming language.
  • In the example of FIG. 7, the analysis content analysis unit 107 determines from the interpretation result that the "calculation target" is a specific column of the data frame stored in the variable B, and that the "calculation content" is calculation using the mean, standard deviation, and frequency distribution functions of the numerical calculation library numpy. From these results, it estimates that the provider 2 is trying to calculate the frequency distribution of the number of days elapsed since 2010/1/1 in the InvoiceDate column.
  • Step S803 Analysis information transmission process
  • The analysis content analysis unit 107 transmits the analyzed "calculation target" and "calculation content" to the anonymization method derivation unit 108.
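  • One way to picture steps S801 to S803 for Python commands is to parse each command into a syntax tree and collect the columns that are read (the "calculation target") and the functions that are called (the "calculation content"). The sketch below uses Python's standard `ast` module and assumes Python 3.9 or later; the function name `analyze_commands` and the three command strings are hypothetical and are not the commands shown in FIG. 7. The patent does not state that the analysis is done this way.

```python
import ast


def analyze_commands(commands):
    """Collect (variable, column) pairs that are read and the names of called functions."""
    targets, contents = set(), set()
    for command in commands:
        tree = ast.parse(command)  # S801: interpret the command (it is not executed)
        for node in ast.walk(tree):  # S802: analyze the interpreted structure
            if (isinstance(node, ast.Subscript)
                    and isinstance(node.ctx, ast.Load)
                    and isinstance(node.value, ast.Name)
                    and isinstance(node.slice, ast.Constant)
                    and isinstance(node.slice.value, str)):
                # A read such as B['date'] is a calculation target.
                targets.add((node.value.id, node.slice.value))
            elif isinstance(node, ast.Call):
                func = node.func
                if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
                    contents.add(f"{func.value.id}.{func.attr}")
                elif isinstance(func, ast.Name):
                    contents.add(func.id)
    return targets, contents  # S803: hand both to the derivation step


# Hypothetical analysis commands in the spirit of the description above.
commands = [
    "B['date_ave'] = np.mean(B['date'])",
    "B['date_std'] = np.std(B['date'])",
    "B['hist'] = np.histogram(B['date'], bins=10)",
]
targets, contents = analyze_commands(commands)
print(sorted(targets))
print(sorted(contents))
```

  • Because the commands are only parsed, numpy does not need to be installed to run this sketch. Columns that are written, such as `date_ave`, are deliberately excluded from the targets.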
  • FIG. 9 is an example of a flowchart showing a procedure for deriving an anonymization method from the estimation result of the analysis content. The order of processing shown in this flowchart may be changed as appropriate. Using this figure, a method of deriving an anonymization method from the estimation result of the analysis content will be described.
  • Step S901 Analysis result reception processing
  • The anonymization method derivation unit 108 receives the "calculation target" and the "calculation content" from the analysis content analysis unit 107.
  • Step S902 Derivation process
  • The anonymization method derivation unit 108 derives the anonymization method by reading, from the anonymization method storage unit 110, the anonymization method and parameters that correspond to the received "calculation target" and "calculation content".
  • The anonymization method storage unit 110 holds the anonymization method and its parameters corresponding to each "calculation target" and "calculation content".
  • Step S903 Personal information reading process
  • The anonymous processing unit 109 reads out the personal information from the personal information storage unit 111.
  • Step S904 Anonymous processing
  • The anonymous processing unit 109 generates anonymously processed information by applying the anonymization method derived in step S902 to the personal information read in step S903.
  • Step S905 Safety evaluation process
  • The anonymous processing unit 109 evaluates whether the safety and the usefulness of the generated anonymously processed information both meet the criteria.
  • If both meet the criteria, the anonymization method derivation device 100 proceeds to step S906. Otherwise, it returns to step S902.
  • The anonymous processing unit 109 may use any method to evaluate whether the safety and the usefulness meet the criteria.
  • The evaluation method for safety and usefulness may depend on the anonymization method or may be independent of it.
  • When the evaluation method depends on the anonymization method, the anonymous processing unit 109 may read out the evaluation method for safety and usefulness together with the anonymization method and its parameters from the anonymization method storage unit 110. When the evaluation method is independent of the anonymization method, the anonymous processing unit 109 may refer to an external database.
  • The anonymization method storage unit 110 may store an evaluation method for safety and usefulness for each combination of an anonymization method and its parameters.
  • Step S906 Output processing
  • The execution result output unit 106 outputs the anonymization method and the anonymously processed information.
  • The output may take any form that the provider 1 can recognize, such as output to the display 21 or output via a network.
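The loop formed by steps S902 to S905 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the embodiment: the table `METHOD_DB` is a hypothetical stand-in for the anonymization method storage unit 110, the only stored method is a simple generalization of a numeric value into bands, and the criteria `is_safe` and `is_useful` are illustrative choices, since the text allows any evaluation method. The text also does not say how a different method is chosen when the flow returns to step S902; here the stored candidates are simply tried in order.

```python
from collections import Counter

# Hypothetical stand-in for the anonymization method storage unit 110:
# ("calculation target", "calculation content") -> candidate (method, parameters).
METHOD_DB = {
    ("age", "mean"): [
        ("generalize", {"width": 5}),
        ("generalize", {"width": 10}),
        ("generalize", {"width": 20}),
    ],
}


def generalize(rows, target, width):
    """Replace each value of `target` by the lower bound of its band of size `width`."""
    return [{**row, target: (row[target] // width) * width} for row in rows]


def is_safe(rows, target, k):
    """Assumed safety criterion: each generalized value is shared by at least k rows."""
    return all(n >= k for n in Counter(row[target] for row in rows).values())


def is_useful(original, anonymized, target, tolerance):
    """Assumed usefulness criterion: the mean of `target` shifts by at most `tolerance`."""
    def mean(rows):
        return sum(row[target] for row in rows) / len(rows)
    return abs(mean(original) - mean(anonymized)) <= tolerance


def derive_and_anonymize(target, content, personal_rows, k=3, tolerance=10):
    """Try the stored candidates in order until one passes both evaluations."""
    for method, params in METHOD_DB[(target, content)]:            # step S902
        anonymized = generalize(personal_rows, target, **params)   # steps S903, S904
        safe = is_safe(anonymized, target, k)                      # step S905
        useful = is_useful(personal_rows, anonymized, target, tolerance)
        if safe and useful:
            return method, params, anonymized                      # step S906
    raise LookupError("no stored method satisfies both criteria")


# Made-up personal information for illustration.
personal_rows = [{"age": a} for a in (31, 33, 34, 36, 38, 39, 42, 44, 47)]
method, params, anonymized = derive_and_anonymize("age", "mean", personal_rows)
```

With the nine sample ages, a band width of 5 leaves groups smaller than k = 3, so the loop moves on and returns the candidate with a band width of 10.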
  • *** Explanation of the effect of Embodiment 1 *** As described above, according to the present embodiment, an appropriate anonymization method and its parameters can be determined from the synthetic data obtained by processing the personal information of the provider 1 and from the analysis content of the analysis command of the provider 2. Further, because the present embodiment can generate anonymously processed information that meets the criteria for safety and usefulness, anonymously processed information that protects the rights and interests of individuals and is suitable for analysis can be provided to the provider 2.
  • The provider 1 and the anonymization method derivation device 100 may be integrated.
  • In that case, the personal information input unit 101 is composed of the processor 11 and the memory 12.
  • The anonymization method derivation device 100 does not have to include the anonymization method storage unit 110.
  • In that case, the anonymization method derivation unit 108 derives the anonymization method by referring to an external database or the like.
  • The database stored in the anonymization method storage unit 110 may be prepared in advance by the provider 1 or the provider 2.
  • In that case, the anonymization method derivation device 100 stores the database prepared by the provider 1 or the provider 2 in the anonymization method storage unit 110 before generating the anonymously processed information.
  • At least one of the synthetic data storage unit 103, the personal information storage unit 111, and the analysis command storage unit 112 may be composed of the memory 12 and the storage device 13.
  • The anonymization method derivation device 100 does not have to include the analysis command execution unit 105.
  • In that case, the analysis content analysis unit 107 analyzes the analysis command based on the data of the analysis command itself.
  • The anonymization method derivation unit 108 does not have to output the anonymously processed information and the anonymization method to the provider 1.
  • The anonymous processing unit 109 may output the evaluation result regarding safety and usefulness to the provider 1.
  • The analysis command execution unit 105 may execute the analysis command using data other than the synthetic data in addition to the synthetic data.
  • In that case, the anonymization method derivation device 100 can determine an appropriate anonymization method in an environment closer to the actual data analysis use case.
  • The execution result output unit 106 does not have to output the information on the anonymization method.
  • The anonymization method derivation device 100 may include an electronic circuit (processing circuit) instead of the processor 11.
  • The anonymization method derivation device 100 may include an electronic circuit instead of the processor 11 and the memory 12.
  • The electronic circuit is a dedicated electronic circuit that realizes each of the above functions (and the memory 12).
  • The electronic circuit is assumed to be a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, a logic IC, a GA (Gate Array), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array).
  • Each of the above functions may be realized by one electronic circuit, or the functions may be distributed over a plurality of electronic circuits.
  • The above-mentioned processor 11, memory 12, and electronic circuit are collectively referred to as "processing circuitry". That is, each of the above functions is realized by the processing circuitry.
  • Embodiment 2. The anonymization method derivation device 100 according to the present embodiment determines an appropriate anonymization method based on the personal information possessed by the provider 1 and the personal information possessed by the provider 2.
  • FIG. 10 is a diagram showing an example of an anonymization method derivation system including the anonymization method derivation device 100 and the synthetic data generation device 200 according to the present embodiment.
  • The anonymization method derivation device 100 includes a synthetic data receiving unit 121.
  • The synthetic data receiving unit 121 receives the destination synthetic data transmitted by the synthetic data transmission unit 203.
  • The destination synthetic data is synonymous with the synthetic data 62.
  • The synthetic data generation device 200 is a device that generates the synthetic data 62 from the personal information of the provider 2. It is composed of a personal information input unit 201, a destination synthetic data generation unit 202, a synthetic data transmission unit 203 that transmits the synthetic data 62 to the anonymization method derivation device 100, and a destination storage unit 204.
  • The personal information input unit 201 is the same as the personal information input unit 101.
  • The destination synthetic data generation unit 202 is the same as the synthetic data generation unit 102 and generates the destination synthetic data by processing the personal information stored in the destination storage unit 204.
  • The synthetic data transmission unit 203 transmits the destination synthetic data.
  • The destination storage unit 204, like the personal information storage unit 111, can hold personal information.
  • The destination storage unit 204 stores the personal information possessed by the destination of the anonymously processed information.
  • The synthetic data receiving unit 121 is composed of the processor 11, the memory 12, and the port 14.
  • FIG. 11 shows a hardware configuration example of the synthetic data generation device 200.
  • The synthetic data generation device 200 is composed of a general computer 10.
  • The personal information input unit 201 and the synthetic data transmission unit 203 are composed of the processor 11, the memory 12, and the port 14.
  • The destination synthetic data generation unit 202 is composed of the processor 11 and the memory 12.
  • The destination storage unit 204 is composed of the memory 12.
  • The operation in the present embodiment consists of three phases: a registration phase 51, an analysis phase 52, and a derivation phase 53.
  • These operations are described in order below. Description is omitted where the operation is the same as in the first embodiment.
  • FIG. 12 is an example of a flowchart showing the procedure relating to the synthetic data generation device 200 in the registration phase 51, that is, the operation from when the provider 2 inputs personal information to the synthetic data generation device 200 until the synthetic data generation device 200 transmits the synthetic data 62 to the anonymization method derivation device 100. The order of the processing shown in this flowchart may be changed as appropriate.
  • Step S311 Input reception process
  • The personal information input unit 201 accepts the input of personal information from the provider 2 and stores the received personal information in the destination storage unit 204.
  • Step S312 Synthetic data generation process
  • The destination synthetic data generation unit 202 generates the synthetic data 62 based on the personal information stored in the destination storage unit 204.
  • Step S313 Synthetic data transmission process
  • The synthetic data transmission unit 203 transmits the synthetic data 62 to the anonymization method derivation device 100.
  • FIG. 13 is an example of a flowchart showing the procedure relating to the anonymization method derivation device 100 in the registration phase 51, that is, the procedure in which, after the provider 1 inputs personal information to the anonymization method derivation device 100, the synthetic data generation unit 102 generates the synthetic data 61 and stores it in the synthetic data storage unit 103, and the synthetic data receiving unit 121 stores the synthetic data 62 in the synthetic data storage unit 103. The order of the processing shown in this flowchart may be changed as appropriate.
  • Since steps S301 and S302 are the same as those in the first embodiment, their description is omitted.
  • Step S323 Synthetic data reception process
  • The synthetic data receiving unit 121 receives the synthetic data 62 from the synthetic data generation device 200.
  • Step S324 Synthetic data storage process
  • The synthetic data generation unit 102 stores the synthetic data 61 in the synthetic data storage unit 103.
  • The synthetic data receiving unit 121 stores the synthetic data 62 in the synthetic data storage unit 103.
  • The analysis command execution unit 105 executes the analysis command by using the destination synthetic data received by the synthetic data receiving unit 121.
  • According to the present embodiment, the synthetic data generation device 200 transmits the synthetic data obtained by processing the personal information of the provider 2 to the anonymization method derivation device 100. Because the anonymization method derivation device 100 can select an anonymization method by using both the synthetic data obtained by processing the personal information of the provider 1 and the synthetic data obtained by processing the personal information of the provider 2, an appropriate anonymization method and its parameters can be determined in an environment closer to the actual data analysis use case. Further, as in the first embodiment, the present embodiment can generate anonymously processed information that satisfies the criteria for safety and usefulness.
  • The provider 2 and the synthetic data generation device 200 may be integrated.
  • In that case, the personal information input unit 201 is composed of the processor 11 and the memory 12.
  • The anonymization method derivation device 100 and the synthetic data generation device 200 may be integrated.
  • In that case, the synthetic data receiving unit 121 and the synthetic data transmission unit 203 are composed of the processor 11 and the memory 12.
  • The destination storage unit 204 may be composed of the memory 12 and the storage device 13.
  • The embodiments are not limited to those shown in the first and second embodiments, and various changes can be made as needed.
  • Although the embodiments use Python, which is an interpreted programming language, as an example, the analysis command does not have to be written in an interpreted programming language.
  • The analysis content analysis unit 107 can analyze the "calculation target" and the "calculation content" of the analysis command even when the analysis command is written in a compiled programming language.
  • The anonymization method derivation device 100 can also interpret the analysis content when the provider 2 analyzes personal information without a programming language, by other means such as dedicated data analysis software. In this case, the anonymization method derivation device 100 can realize the analysis phase 52 by combining techniques known in the field of data analysis.

Abstract

An anonymization technique derivation device (100) is provided with: a personal information storage unit (111) which stores personal information; an analysis command storage unit (112) which stores an analysis command for analyzing the personal information stored in the personal information storage unit (111); an anonymization technique derivation unit (108) which derives an anonymization technique for anonymizing the personal information, on the basis of the analysis command stored in the analysis command storage unit (112); and an anonymity processing unit (109) which generates anonymity processing information by anonymizing the personal information using the anonymization technique derived by the anonymization technique derivation unit (108).

Description

Anonymization method derivation device, anonymization method derivation method, anonymization method derivation program, and anonymization method derivation system
The present invention relates to an anonymization method derivation device, an anonymization method derivation method, an anonymization method derivation program, and an anonymization method derivation system.
Anonymization technology, which converts personal information into anonymously processed information, is known as a technology for achieving both the protection and the utilization of personal information. If a business operator that holds certain personal information (hereinafter, the provider) provides that personal information to a business operator that does not hold it (hereinafter, the recipient, which appears below as the provider 2), the rights and interests of individuals may be infringed. If the provider instead converts the personal information into anonymously processed information before providing it, the recipient can utilize the personal information while the rights and interests of individuals remain protected.
The provider needs to determine the anonymization scheme to be applied and its parameters when anonymizing personal information.
As a specific example, when anonymizing an employee personnel information table, the provider needs to decide to use the k-anonymization scheme and to set its parameter k to 3.
As another specific example, when anonymizing a customer questionnaire information table, the provider needs to decide to use the ε-differential privacy scheme and to set its parameter ε to 0.1.
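To make the two parameters concrete, the following Python sketch checks the property that k-anonymization aims for (every combination of quasi-identifier values is shared by at least k records) and answers a counting query with Laplace noise, which is one standard way to obtain ε-differential privacy for counts. It is an illustration only: the tables, the column names, and the function names `is_k_anonymous` and `dp_count` are made up here and do not come from the specification.

```python
import random
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(n >= k for n in groups.values())


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(rows, predicate, epsilon, rng):
    """Counting query (sensitivity 1) perturbed with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up personnel table; "dept" and "age_band" are treated as quasi-identifiers.
personnel = [
    {"dept": "sales", "age_band": "30-39"},
    {"dept": "sales", "age_band": "30-39"},
    {"dept": "sales", "age_band": "30-39"},
    {"dept": "dev", "age_band": "40-49"},
    {"dept": "dev", "age_band": "40-49"},
]

# Made-up questionnaire answers: 60 satisfied, 40 not satisfied.
answers = [{"satisfied": True}] * 60 + [{"satisfied": False}] * 40

rng = random.Random(0)
noisy = dp_count(answers, lambda row: row["satisfied"], epsilon=0.1, rng=rng)
```

The personnel table above is 2-anonymous but not 3-anonymous, because the "dev" group contains only two records. With ε = 0.1 the noise scale is 10, so a single noisy count can differ noticeably from the true count of 60.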
An appropriate anonymization scheme and its parameters should in principle be determined from both the content of the personal information held by the provider and the content of the data analysis that the recipient intends to perform. Neither the provider nor the recipient can determine them alone.
Therefore, a technique for determining an appropriate anonymization scheme and its parameters has been proposed (for example, Patent Document 1).
The information processing system disclosed in Patent Document 1 is composed of a provider device, a user device, and an information processing device.
The provider device provides the information processing device with policy calculation data, and the information processing device determines a policy based on the policy calculation data and notifies the provider device of it.
Patent Document 1: Japanese Unexamined Patent Publication No. 2014-170369
However, the policy calculation data is itself anonymously processed information, that is, data obtained by obscuring personal information. The provider therefore still has to determine alone the anonymization scheme used to produce the policy calculation data, and its parameters.
An object of the present invention is to provide a device that can generate anonymously processed information with high safety and usefulness by anonymizing personal information using an anonymization method derived from the personal information of the provider and the analysis command of the recipient.
The anonymization method derivation device of the present invention includes:
a personal information storage unit that stores personal information;
an analysis command storage unit that stores an analysis command for analyzing the personal information stored in the personal information storage unit;
an anonymization method derivation unit that derives, based on the analysis command stored in the analysis command storage unit, an anonymization method for anonymizing the personal information; and
an anonymous processing unit that generates anonymously processed information by anonymizing the personal information using the anonymization method derived by the anonymization method derivation unit.
According to the anonymization method derivation device of the present invention, anonymously processed information with high safety and usefulness can be generated by anonymizing personal information using an anonymization method derived from the personal information of the provider and the analysis command of the recipient.
FIG. 1 is a configuration diagram of the anonymization method derivation device 100 according to Embodiment 1.
FIG. 2 is a hardware configuration diagram of the anonymization method derivation device 100 according to Embodiment 1.
FIG. 3 is a flowchart showing the operation of the registration phase 51 according to Embodiment 1.
FIG. 4 is a flowchart showing the operation of the analysis phase 52 according to Embodiment 1.
FIG. 5 is an example of the input and output of the analysis phase 52 when Python is used.
FIG. 6 is a flowchart showing the operation of the derivation phase 53 according to Embodiment 1.
FIG. 7 is an example of an analysis command when Python is used.
FIG. 8 is a flowchart showing a part of the operation of the analysis phase 52 according to Embodiment 1.
FIG. 9 is a flowchart showing a part of the operation of the analysis phase 52 according to Embodiment 1.
FIG. 10 is a configuration diagram of the anonymization method derivation device 100 and the synthetic data generation device 200 according to Embodiment 2.
FIG. 11 is a hardware configuration diagram of the synthetic data generation device 200 according to Embodiment 2.
FIG. 12 is a flowchart showing a part of the operation of the registration phase 51 according to Embodiment 2.
FIG. 13 is a flowchart showing a part of the operation of the registration phase 51 according to Embodiment 2.
Embodiment 1.
Hereinafter, the present embodiment will be described in detail with reference to the drawings.
The anonymization method derivation device 100 according to the present embodiment determines an appropriate anonymization scheme and its parameters based on the personal information held by the provider 1 and the analysis command of the provider 2.
An analysis command is part or all of an instruction that causes a computer or the like to analyze personal information. Specific examples of an analysis command are a character string written in an interpreted language and an executable file.
The analysis command is also a command for analyzing the personal information stored in the personal information storage unit 111.
Note that analyzing personal information includes analyzing information obtained by anonymizing the personal information.
*** Explanation of configuration ***
FIG. 1 is a diagram showing a configuration example of the anonymization method derivation device 100 according to the present embodiment.
The arrows in the figure indicate the directions in which data can flow while the anonymization method derivation device 100 or the anonymization method derivation system is running.
The provider 1 is a device or the like that provides personal information, and may be a business operator or the like that provides personal information.
The provider 1 may provide personal information to the anonymization method derivation device 100 by any means.
The provider 2 is the recipient side: a device or the like that receives anonymously processed information, which is information obtained by anonymizing personal information. It may be a business operator or the like that analyzes personal information.
The provider 2 may receive personal information from the anonymization method derivation device 100 by any means.
The anonymization method derivation device 100 is a device that determines an appropriate anonymization method for converting personal information (personal data) into anonymously processed information. An anonymization method typically consists of an anonymization scheme and its parameters, and may be the anonymization scheme alone. Specific examples of the anonymization scheme are the k-anonymization scheme and the ε-differential privacy scheme.
The anonymization method derivation device 100 is composed of the components shown in FIG. 1.
The personal information input unit 101 accepts the input of personal information from the provider 1 and stores the input personal information in the personal information storage unit 111.
The provider 1 may input personal information to the anonymization method derivation device 100 by any means.
The synthetic data generation unit 102 generates synthetic data 61 from the input personal information and stores the synthetic data 61 in the synthetic data storage unit 103.
Synthetic data is data generated from personal information so that its statistical properties are equivalent to those of the source personal information.
The synthetic data generation unit 102 generates the synthetic data by processing the personal information stored in the personal information storage unit 111.
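The specification does not fix how the synthetic data is generated. As one deliberately simple possibility, the Python sketch below fits a mean and a standard deviation to each numeric column and samples new rows from normal distributions with those parameters. The function name `generate_synthetic`, the column name, and the sample values are assumptions made for illustration, and a practical generator would also have to preserve relationships between columns.

```python
import random
import statistics


def generate_synthetic(rows, numeric_columns, n, seed=0):
    """Sample n synthetic rows whose per-column mean and standard deviation
    approximate those of `rows`. Columns are sampled independently, so
    correlations between columns are NOT preserved."""
    rng = random.Random(seed)
    params = {
        col: (statistics.mean(row[col] for row in rows),
              statistics.pstdev(row[col] for row in rows))
        for col in numeric_columns
    }
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in params.items()}
        for _ in range(n)
    ]


# Made-up personal information for illustration.
personal_rows = [{"age": a} for a in (25, 32, 41, 38, 29, 45, 36, 50)]
synthetic_61 = generate_synthetic(personal_rows, ["age"], n=5000)
```

No synthetic row corresponds to a particular source record, while the mean age of the synthetic rows stays close to the source mean of 37.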
The synthetic data storage unit 103 can hold the synthetic data 61.
The analysis command input unit 104 accepts, from the provider 2, the input of an analysis command for analyzing personal information and stores the input analysis command in the analysis command storage unit 112.
The provider 2 may input the analysis command to the anonymization method derivation device 100 by any means.
The analysis command execution unit 105 executes the analysis command stored in the analysis command storage unit 112.
The analysis command execution unit 105 may execute the analysis command using the synthetic data 61 generated by the synthetic data generation unit 102.
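Executing the analysis command against synthetic data rather than the personal information itself can be pictured with the following Python sketch. It assumes, purely for illustration, that the analysis command is a single Python expression and that the synthetic rows are exposed to it under the name `data`; the function name `execute_analysis_command` is likewise hypothetical.

```python
def execute_analysis_command(command, synthetic_rows):
    """Evaluate a Python expression with the synthetic rows bound to `data`.

    Emptying __builtins__ only narrows what the expression can name; it is
    not a security sandbox."""
    namespace = {"__builtins__": {}, "data": synthetic_rows, "sum": sum, "len": len}
    return eval(command, namespace)


# Made-up synthetic data 61 and a made-up analysis command from the provider 2.
synthetic_61 = [{"age": 30}, {"age": 40}, {"age": 50}]
command = 'sum(row["age"] for row in data) / len(data)'
result = execute_analysis_command(command, synthetic_61)
```

Because only `synthetic_61` is placed in the namespace, the command runs without the stored personal information being passed to it.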
The execution result output unit 106 outputs the execution result of the analysis command execution unit 105, information on the anonymization method derived by the anonymization method derivation unit 108, and the anonymously processed information generated by the anonymous processing unit 109.
The analysis content analysis unit 107 analyzes the content of the data analysis that the provider 2 intends to perform and outputs the result as analysis information.
That is, the analysis content analysis unit 107 analyzes the analysis command and outputs analysis information.
The analysis content analysis unit 107 may output, as the analysis information, the personal information used when the analysis command is executed and the operation performed on that personal information when the analysis command is executed. It may also analyze the analysis command based on execution information obtained when the analysis command execution unit 105 executes the analysis command.
The analysis content refers to, for example, the personal information used when the analysis command is executed and the operation performed on that personal information.
An operation performed on personal information is, for example, a calculation that uses at least part of the personal information or of information obtained by processing the personal information.
Execution information is information related to the execution of a command.
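For an analysis command written in Python, one way to obtain this kind of analysis information without executing the command is static inspection with the standard `ast` module, sketched below. The sketch assumes a dataframe-style command in which a string subscript names the personal information used and a method call names the operation. The function name `analyze_command`, the variable `df`, and the column names are hypothetical, and the text equally allows analysis based on execution information instead.

```python
import ast


def analyze_command(command):
    """Return (calculation targets, calculation contents) found in `command`.

    Assumes Python 3.9+, where a subscript key is stored directly in
    `node.slice`. String subscripts are read as column names and the names
    of called methods as operations."""
    targets, contents = set(), set()
    for node in ast.walk(ast.parse(command)):
        if isinstance(node, ast.Subscript):
            key = node.slice
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                targets.add(key.value)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            contents.add(node.func.attr)
    return sorted(targets), sorted(contents)


# The command is only parsed, never run, so `df` does not need to exist.
analysis_info = analyze_command('df["age"].mean()')
```

For the sample command, the extracted target is the column `age` and the extracted operation is `mean`, which matches the pairing of a "calculation target" with a "calculation content".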
The anonymization method derivation unit 108 derives an appropriate anonymization scheme and its parameters based on the analysis information output by the analysis content analysis unit 107.
The anonymization method derivation unit 108 may derive the anonymization method for anonymizing the personal information based on any of the following: the analysis command stored in the analysis command storage unit 112, the anonymization schemes stored in the anonymization method storage unit 110, and the analysis information.
The anonymous processing unit 109 generates anonymously processed information based on the personal information stored in the personal information storage unit 111 and the anonymization method derived by the anonymization method derivation unit 108.
That is, the anonymous processing unit 109 generates anonymously processed information by anonymizing the personal information using the anonymization method derived by the anonymization method derivation unit 108.
The anonymous processing unit 109 may also evaluate the safety and the usefulness of the anonymously processed information.
The anonymization method storage unit 110 stores a database that collects various anonymization schemes and examples of their parameters. Here, an anonymization scheme typically means a program that implements the scheme.
The anonymization method storage unit 110 therefore stores programs that implement anonymization schemes for anonymizing personal information.
The personal information storage unit 111 can hold personal information.
Once the personal information input unit 101 has stored personal information in the personal information storage unit 111, the personal information storage unit 111 stores that personal information.
The analysis command storage unit 112 can hold an analysis command.
Once the analysis command input unit 104 has stored an analysis command in the analysis command storage unit 112, the analysis command storage unit 112 stores the analysis command for analyzing the personal information stored in the personal information storage unit 111.
FIG. 2 is a diagram showing a hardware configuration example of the anonymization method derivation device 100 according to the present embodiment.
As shown in this figure, the anonymization method derivation device 100 is composed of a general computer.
The display 21, the keyboard 22, and the mouse 23 are used by the provider 1 to operate the anonymization method derivation device 100.
The display 24, the keyboard 25, and the mouse 26 are used by the provider 2 to operate the anonymization method derivation device 100.
The synthetic data generation unit 102, the analysis command execution unit 105, the analysis content analysis unit 107, and the anonymous processing unit 109 are composed of the processor 11 and the memory 12.
The synthetic data storage unit 103, the personal information storage unit 111, and the analysis command storage unit 112 are composed of the memory 12.
The personal information input unit 101, the analysis command input unit 104, the execution result output unit 106, and the anonymization method derivation unit 108 are composed of the processor 11, the memory 12, and the port 14.
The anonymization method storage unit 110 is composed of the storage device 13.
The processor 11 is connected to other hardware via the data bus 15 (signal line) and controls that hardware.
The storage device 13 stores the anonymization method derivation program.
The processor 11 is a processing device that executes programs, an OS (Operating System), and the like. A processing device is sometimes called an IC (Integrated Circuit). Specific examples of the processor 11 are a CPU (Central Processing Unit), a DSP (Digital Signal Processor), and a GPU (Graphics Processing Unit). The processor 11 reads and executes the programs stored in the memory 12.
The computer 10 in the figure includes only one processor 11, but the computer 10 may instead include a plurality of processors in place of the processor 11. These processors share the execution of programs and the like.
The memory 12 is a storage device that stores data temporarily and functions as the main memory used as the work area of the processor 11. Specific examples of the memory 12 are RAM (Random Access Memory) such as SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory). The memory 12 holds the calculation results of the processor 11.
The storage device 13 is a storage device that keeps data in a non-volatile manner, and stores the OS, each program executed by the processor 11, data used when each program is executed, and the like. Specific examples of the storage device 13 are an HDD (Hard Disk Drive) and an SSD (Solid State Drive). The storage device 13 may also be a portable recording medium such as a memory card, an SD (Secure Digital, registered trademark) memory card, a CF (Compact Flash) card, NAND flash, a flexible disk, an optical disk, a compact disk, a Blu-ray (registered trademark) disk, or a DVD (Digital Versatile Disk).
The port 14 is an interface for communicating with external devices and the like.
A specific example of the port 14 is an Ethernet (registered trademark) port or a USB (Universal Serial Bus) port.
The port 14 may consist of a plurality of ports.
Here, the correspondence between the functional configuration diagram of FIG. 1 and the hardware configuration diagram of FIG. 2 is described further.
The personal information input unit 101 stores in the memory 12 the personal information that the provider 1 has input to the anonymization method derivation device 100 using one or more of the display 21, the keyboard 22, and the mouse 23.
The synthetic data generation unit 102 uses the processor 11 to generate the synthetic data 61 from the personal information stored in the memory 12, and stores the synthetic data 61 in the memory 12.
The analysis command input unit 104 stores in the memory 12 the analysis commands that the recipient 2 has input using one or more of the display 24, the keyboard 25, and the mouse 26.
The analysis command execution unit 105 retrieves the synthetic data 61 from the memory 12, executes the analysis commands using the processor 11, and stores the execution results in the memory 12.
The execution result output unit 106 outputs the execution results stored in the memory 12 to the outside.
The analysis content analysis unit 107 uses the processor 11 to analyze the analysis content from the analysis commands stored in the memory 12, and stores the analysis result in the memory 12.
The anonymization method derivation unit 108 uses the processor 11 to derive an anonymization method from the analysis result stored in the memory 12, and stores the derivation result in the memory 12.
In addition, as necessary, the anonymization method derivation unit 108 reads the anonymization method and its parameters from the storage device 13 and stores the derivation result in the storage device 13.
The hardware configuration shown in FIG. 2 is the most basic example, and the anonymization method derivation device 100 may have a different hardware configuration.
As a specific example, the configuration of FIG. 2 may be built virtually on a general-purpose computer. The provider 1 and/or the recipient 2 may also be a computer separate from the anonymization method derivation device 100, in which case the provider 1 and/or the recipient 2 operates the anonymization method derivation device 100 through a remote connection.
*** Explanation of operation ***
The operation in the present embodiment is divided into three phases: a registration phase 51, an analysis phase 52, and a derivation phase 53. These phases are described in order.
The operation procedure of the anonymization method derivation device 100 corresponds to the anonymization method derivation method. The program that realizes the operation of the anonymization method derivation device 100 corresponds to the anonymization method derivation program.
*** Explanation of the operation of registration phase 51 ***
FIG. 3 is an example of a flowchart showing the operation of the registration phase 51.
The order of the processing shown in this flowchart may be changed as appropriate.
The registration phase 51 covers the processing from when the provider 1 inputs personal information to the anonymization method derivation device 100 until the synthetic data generation unit 102 stores the synthetic data 61 in the synthetic data storage unit 103.
(Step S301: Input reception process)
The personal information input unit 101 accepts the input of personal information from the provider 1 and stores the accepted personal information in the personal information storage unit 111.
The input method may be any method by which the personal information input unit 101 can recognize the input, such as input using the keyboard 22, input from a medium, or input via a network.
(Step S302: Synthetic data generation process)
The synthetic data generation unit 102 generates the synthetic data 61 from the input personal information and stores the synthetic data 61 in the synthetic data storage unit 103.
The synthetic data 61 may be generated by any method that produces anonymous data while preserving the statistical properties of the original personal information. Specific examples of such generation methods are given in Reference 1.
[Reference 1]
Aggarwal, Charu C., and S. Yu Philip, eds. Privacy-preserving data mining: models and algorithms. Springer Science & Business Media, 2008.
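As a minimal sketch of step S302, assume that every column is numeric and that each column is modeled by an independent normal distribution fitted to the input. This is only one simple choice and is not taken from the present description; the function name generate_synthetic and the sample records are hypothetical. Independent sampling preserves per-column means and standard deviations but discards correlations between columns.

```python
import random
import statistics


def generate_synthetic(records, n_rows, seed=0):
    """Draw synthetic rows from per-column normal distributions fitted to the input."""
    rng = random.Random(seed)
    columns = list(records[0].keys())
    # Fit a mean and a (population) standard deviation to each numeric column.
    params = {
        c: (statistics.mean(r[c] for r in records),
            statistics.pstdev(r[c] for r in records))
        for c in columns
    }
    # Sample every column independently; no original row is copied.
    return [
        {c: rng.gauss(mu, sigma) for c, (mu, sigma) in params.items()}
        for _ in range(n_rows)
    ]


# Hypothetical personal information entered by the provider 1.
personal_info = [
    {"age": 34, "purchase": 120.0},
    {"age": 45, "purchase": 80.5},
    {"age": 29, "purchase": 200.0},
    {"age": 52, "purchase": 150.25},
    {"age": 41, "purchase": 99.9},
]
synthetic_61 = generate_synthetic(personal_info, n_rows=1000)
```

A practical generator would need to handle categorical columns and inter-column dependence, for which the methods in Reference 1 are more suitable.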
*** Explanation of the operation of analysis phase 52 ***
FIG. 4 is an example of a flowchart showing the operation of the analysis phase 52.
The order of the processing shown in this flowchart may be changed as appropriate.
The analysis phase 52 covers the processing from when the recipient 2 inputs an analysis command to the anonymization method derivation device 100 until the execution result output unit 106 outputs the execution result.
(Step S401: Synthetic data reading process)
The analysis command execution unit 105 reads the synthetic data 61 from the synthetic data storage unit 103.
(Step S402: Analysis command reception process)
The analysis command input unit 104 accepts the input of an analysis command from the recipient 2 and stores the accepted analysis command in the analysis command storage unit 112.
The input method may be any method that the analysis command input unit 104 can recognize, such as input using the keyboard 25, input from a medium, or input via a network.
(Step S403: Analysis command execution process)
The analysis command execution unit 105 executes, on the synthetic data 61, the analysis command stored in the analysis command storage unit 112.
(Step S404: Execution result output process)
The execution result output unit 106 outputs the execution result of the analysis command execution unit 105. However, when the analysis command execution unit 105 has executed an analysis command that does not request output, the execution result output unit 106 outputs nothing.
The output method may be any method that the recipient 2 can recognize, such as output to the display 24 or output via a network.
(Step S405: Analysis command confirmation process)
The analysis command input unit 104 checks whether the recipient 2 has input a new analysis command.
If the recipient 2 has input a new analysis command, the anonymization method derivation device 100 returns to step S402. Otherwise, it ends the processing of the analysis phase 52.
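Steps S401 to S405 can be sketched as a simple command loop. The sketch below assumes that analysis commands are Python source strings and that the synthetic data is exposed to them under the hypothetical variable name A; the function run_analysis_phase and the list standing in for the analysis command storage unit 112 are likewise illustrative only.

```python
import contextlib
import io


def run_analysis_phase(synthetic_data, commands):
    """Execute each command on the synthetic data; return stored commands and outputs."""
    namespace = {"A": synthetic_data}      # S401: read the synthetic data
    command_store = []                     # stands in for the analysis command storage unit 112
    outputs = []
    for command in commands:               # S405: repeat while new commands arrive
        command_store.append(command)      # S402: accept and store the command
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(command, namespace)       # S403: execute the command on the synthetic data
        printed = buffer.getvalue()
        if printed:                        # S404: output only when the command produced any
            outputs.append(printed.strip())
    return command_store, outputs


stored, shown = run_analysis_phase(
    [3, 5, 8],
    ["total = sum(A)", "n = len(A)", "print(total / n)"],
)
```

Only the last command prints, so only it yields an output, while all three commands are retained for the derivation phase. Because exec runs arbitrary code, an actual device would have to execute the commands in a restricted environment.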
FIG. 5 shows an example of the input and output of the analysis phase 52 when the recipient 2 inputs analysis commands written in the programming language Python.
Here, input and output mean the input of analysis commands and the output of execution results.
Lines 501 to 503 are analysis commands input by the recipient 2.
The analysis command execution unit 105 executes these analysis commands and holds the execution results.
However, because these analysis commands do not request output, the execution result output unit 106 outputs nothing.
When the analysis command is line 504, the execution result output unit 106 outputs the execution result shown in line 505.
In the analysis phase 52, the anonymization method derivation device 100 repeats processing such as that shown in FIG. 5.
Although the example uses analysis commands written in Python, the analysis commands may be written in any programming language.
*** Explanation of the operation of derivation phase 53 ***
FIG. 6 is an example of a flowchart showing the operation of the derivation phase 53.
The order of the processing shown in this flowchart may be changed as appropriate.
The derivation phase 53 covers the processing from when the analysis content analysis unit 107 analyzes the analysis content handled by the analysis command execution unit 105 until the anonymization method derivation unit 108 derives the anonymization method.
(Step S601: Analysis command reading process)
The analysis content analysis unit 107 reads a sequence of analysis commands from the analysis command storage unit 112.
A sequence of analysis commands is a group of analysis commands that together carry some meaning.
(Step S602: Analysis content estimation process)
The analysis content analysis unit 107 analyzes the analysis content from the sequence of analysis commands and outputs the analysis result.
By analyzing the sequence in this way, the analysis content analysis unit 107 estimates the analysis content. The estimation method is described later.
Because the analysis result of the analysis content analysis unit 107 is an estimate, it may differ in part from the actual analysis content.
(Step S603: Derivation process)
The anonymization method derivation unit 108 derives the anonymization method based on the analysis result of step S602 and on the anonymization scheme and its parameters stored in the anonymization method storage unit 110. The derivation method is described later.
(Step S604: Output process)
The execution result output unit 106 outputs the anonymization method derived in step S603.
*** Explanation of the estimation method for analysis phase 52 ***
The method by which the analysis content analysis unit 107 estimates the analysis content from a sequence of analysis commands is described with reference to FIGS. 7 and 8.
FIG. 7 is an example of a sequence of analysis commands written in Python, showing three commands excerpted from the sequence.
The 21st analysis command calculates, for each element of the InvoiceDate column of the data frame stored in the variable B, the number of days elapsed since 2010/1/1, calculates the mean of all the calculated values, and stores the mean in the date_ave column of the variable B.
The 22nd analysis command calculates, for each element of the InvoiceDate column of the variable B, the number of days elapsed since 2010/1/1, calculates the standard deviation of all the calculated values, and stores the standard deviation in the date_std column of the variable B.
The 23rd analysis command calculates the bin boundaries and the frequency of each bin for a histogram of 10 intervals built from all the elements of the date column of the variable B, and stores the calculation result in the hist column of the variable B.
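The three commands described above might look as follows. This is a hypothetical reconstruction, not the text of FIG. 7: it uses only the Python standard library in place of the numpy functions that FIG. 7 relies on, models the data frame B as a plain dictionary, invents the sample dates, and assumes that the date column holds the elapsed days.

```python
from datetime import date
import statistics

# Hypothetical stand-in for the data frame held in variable B (column name -> values).
B = {
    "InvoiceDate": [date(2010, 12, 1), date(2011, 1, 15), date(2011, 3, 9),
                    date(2011, 6, 30), date(2011, 12, 9)],
}
origin = date(2010, 1, 1)

# Days elapsed since 2010/1/1 per InvoiceDate element (assumed content of the "date" column).
B["date"] = [(d - origin).days for d in B["InvoiceDate"]]

# Command 21: mean of the elapsed days.
B["date_ave"] = statistics.mean(B["date"])

# Command 22: standard deviation of the elapsed days (population form).
B["date_std"] = statistics.pstdev(B["date"])


def histogram(values, bins=10):
    """Return (counts, edges) for equal-width bins; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin, which is closed on the right.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts, edges


# Command 23: bin boundaries and frequencies of a 10-bin histogram.
B["hist"] = histogram(B["date"], bins=10)
```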
Interpreting the meaning of an analysis command is what the processor 11 does when it executes the command, and it is a common technique in the programming languages known as interpreted languages.
When estimating the analysis content from the sequence of analysis commands, the analysis content analysis unit 107 makes use of this processing by the processor 11. That is, from the content interpreted by the processor 11, the analysis content analysis unit 107 analyzes the "calculation target" and the "calculation content" across the whole sequence of analysis commands, and thereby estimates what the recipient 2 is calculating on and what result the recipient 2 is trying to obtain.
For the 23rd analysis command, the calculation target is all the elements of the date column of the variable B, and the calculation content is the calculation of the bin boundaries and the frequency of each bin of the histogram.
In the specific example of FIG. 7, it is estimated that the recipient 2 is trying to calculate the frequency distribution of the number of days elapsed since 2010/1/1 in the InvoiceDate column.
Although FIG. 7 shows only three excerpted analysis commands, a data analysis generally involves a long series of such commands.
By analyzing the whole sequence of analysis commands with the method described above, the analysis content analysis unit 107 can estimate what the recipient 2 is using as the calculation target and what kind of result the recipient 2 is trying to obtain.
FIG. 8 is an example of a flowchart showing the operation by which the analysis content analysis unit 107 estimates the analysis content from a sequence of analysis commands.
The order of the processing shown in this flowchart may be changed as appropriate.
(Step S801: Analysis command interpretation process)
The analysis content analysis unit 107 interprets the analysis commands.
To interpret the analysis commands, the analysis content analysis unit 107 may use techniques commonly used in interpreted programming languages.
The analysis content analysis unit 107 continues this step until the interpretation of the analysis commands is complete, and then proceeds to step S802.
(Step S802: Analysis command analysis process)
From the interpreted content, the analysis content analysis unit 107 analyzes the "calculation target" and the "calculation content" across the whole sequence of analysis commands. The specific analysis method depends on the programming language.
In the example shown in FIG. 7, the analysis content analysis unit 107 determines from the interpretation of the analysis commands that the "calculation target" is a specific column of the data frame stored in the variable B, and that the "calculation content" is a calculation using the mean function, the standard deviation function, and the frequency distribution function of the numerical calculation library numpy. From these results, it estimates that the recipient 2 is trying to calculate the frequency distribution of the number of days elapsed since 2010/1/1 in the InvoiceDate column.
(Step S803: Analysis information transmission process)
The analysis content analysis unit 107 transmits the analyzed "calculation target" and "calculation content" to the anonymization method derivation unit 108.
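For Python commands, steps S801 to S803 can be sketched with the standard ast module, which parses source code without executing it. The sketch treats each called function as the "calculation content" and each subscripted expression among its positional arguments as the "calculation target". The function name analyze_commands and the three sample commands are hypothetical and are not the commands of FIG. 7; ast.unparse requires Python 3.9 or later.

```python
import ast


def analyze_commands(commands):
    """Collect (calculation targets, calculation content) pairs from Python commands."""
    findings = []
    for command in commands:
        tree = ast.parse(command)                     # S801: interpret the command
        for node in ast.walk(tree):                   # S802: analyze the syntax tree
            if not isinstance(node, ast.Call):
                continue
            content = ast.unparse(node.func)          # called function, e.g. "np.mean"
            targets = sorted({
                ast.unparse(sub)
                for arg in node.args
                for sub in ast.walk(arg)
                if isinstance(sub, ast.Subscript)     # column access, e.g. B['date']
            })
            if targets:
                findings.append((targets, content))
    return findings                                   # S803: passed on to unit 108


# Hypothetical commands in the style of the 21st to 23rd analysis commands.
commands = [
    "B['date_ave'] = np.mean(B['date'])",
    "B['date_std'] = np.std(B['date'])",
    "B['hist'] = np.histogram(B['date'], bins=10)",
]
findings = analyze_commands(commands)
```

Because the commands are only parsed, numpy does not need to be installed for this analysis, which fits the variant in which the device has no analysis command execution unit.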
FIG. 9 is an example of a flowchart showing the procedure for deriving an anonymization method from the estimation result of the analysis content.
The order of the processing shown in this flowchart may be changed as appropriate.
The method of deriving an anonymization method from the estimation result of the analysis content is described with reference to this figure.
(Step S901: Analysis result reception process)
The anonymization method derivation unit 108 receives the "calculation target" and the "calculation content" from the analysis content analysis unit 107.
(Step S902: Derivation process)
The anonymization method derivation unit 108 derives the anonymization method by reading, from the anonymization method storage unit 110, the anonymization scheme and parameters that correspond to the received "calculation target" and "calculation content".
A specific example of an anonymization scheme and parameters corresponding to the "calculation target" and the "calculation content" is a combination that preserves the "calculation content".
(Step S903: Personal information reading process)
The anonymous processing unit 109 reads the personal information from the personal information storage unit 111.
(Step S904: Anonymous processing)
The anonymous processing unit 109 generates anonymously processed information by applying the anonymization method to the personal information read in step S903.
(Step S905: Safety evaluation process)
The anonymous processing unit 109 evaluates whether the safety and the usefulness of the generated anonymously processed information both meet their criteria.
If both criteria are met, the anonymization method derivation device 100 proceeds to step S906. Otherwise, it returns to step S902.
The anonymous processing unit 109 may use any method to evaluate whether the safety and the usefulness meet their criteria.
The evaluation method for safety and usefulness may depend on the anonymization method or may be independent of it.
When the evaluation method depends on the anonymization method, the anonymous processing unit 109 may read the evaluation method for safety and usefulness when the anonymization scheme and its parameters are read from the anonymization method storage unit 110. When the evaluation method is independent of the anonymization method, the anonymous processing unit 109 may refer to an external database.
The anonymization method storage unit 110 may store an evaluation method for safety and usefulness for each combination of an anonymization scheme and its parameters.
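The loop through steps S902 to S905 can be sketched as follows. The sketch assumes a single numeric attribute, generalization into fixed-width intervals as the anonymization scheme, a k-anonymity style count as the safety criterion, and mean absolute distortion as the usefulness criterion. None of these choices is prescribed by the present description, and all function names, candidate values, and thresholds are hypothetical.

```python
from collections import Counter


def generalize(values, width):
    """Replace each value by the midpoint of its width-sized interval."""
    return [(v // width) * width + width / 2 for v in values]


def is_safe(values, k):
    """Safety check: every published value occurs at least k times."""
    return min(Counter(values).values()) >= k


def is_useful(original, processed, max_error):
    """Usefulness check: the mean absolute distortion stays within max_error."""
    errors = [abs(o - p) for o, p in zip(original, processed)]
    return sum(errors) / len(errors) <= max_error


def derive_method(personal_info, candidates, k, max_error):
    """Try candidate (scheme, parameter) pairs until both criteria are met."""
    for scheme, width in candidates:                          # S902: pick a candidate
        processed = generalize(personal_info, width)          # S903 and S904: apply it
        if is_safe(processed, k) and is_useful(personal_info, processed, max_error):  # S905
            return (scheme, width), processed                 # proceed to S906
    return None, None                                         # no candidate met both criteria


ages = [21, 23, 24, 28, 31, 33, 36, 38]
candidates = [("generalization", 2), ("generalization", 5), ("generalization", 10)]
method, anonymized = derive_method(ages, candidates, k=2, max_error=4.0)
```

With these sample values, widths 2 and 5 leave at least one value that occurs only once and so fail the k = 2 check, and width 10 is selected. Ordering the candidates from fine to coarse makes the loop return the least distorting candidate that is still safe.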
(Step S906: Output process)
The execution result output unit 106 outputs the anonymization method and the anonymously processed information.
The output method may be any method that the provider 1 can recognize, such as output to the display 21 or output via a network.
*** Explanation of the effect of Embodiment 1 ***
As described above, according to the present embodiment, an appropriate anonymization scheme and its parameters can be determined according to the synthetic data obtained by processing the personal information of the provider 1 and according to the analysis content of the analysis commands of the recipient 2.
Further, according to the present embodiment, anonymously processed information that meets the criteria for safety and usefulness can be generated, so the recipient 2 can be provided with anonymously processed information that protects the rights and interests of individuals and is suitable for analysis.
<Modification 1>
At least one of the display 21, the keyboard 22, the mouse 23, the display 24, the keyboard 25, and the mouse 26 does not have to be connected to the anonymization method derivation device 100.
<Modification 2>
The provider 1 and the anonymization method derivation device 100 may be integrated.
In this modification, the personal information input unit 101 is implemented by the processor 11 and the memory 12.
<Modification 3>
The anonymization method derivation device 100 does not have to include the anonymization method storage unit 110.
In this modification, the anonymization method derivation unit 108 derives the anonymization method by referring to an external database or the like.
<Modification 4>
The database stored in the anonymization method storage unit 110 may be one prepared in advance by the provider 1 or the recipient 2.
In this modification, the anonymization method derivation device 100 stores the database prepared by the provider 1 or the recipient 2 in the anonymization method storage unit 110 before the anonymously processed information is generated.
<Modification 5>
At least one of the synthetic data storage unit 103, the personal information storage unit 111, and the analysis command storage unit 112 may be implemented by the memory 12 and the storage device 13.
<Modification 6>
The anonymization method derivation device 100 does not have to include the analysis command execution unit 105.
In this modification, the analysis content analysis unit 107 analyzes the analysis commands based on the data of the analysis commands themselves.
<Modification 7>
The anonymization method derivation unit 108 does not have to output the anonymously processed information and the anonymization method to the provider 1.
<Modification 8>
The anonymous processing unit 109 may output the evaluation results for safety and usefulness to the provider 1.
<Modification 9>
The analysis command execution unit 105 may execute the analysis commands using data other than the synthetic data in addition to the synthetic data.
In this modification, the anonymization method derivation device 100 can determine an appropriate anonymization method based on an environment that is closer to an actual data analysis use case.
<Modification 10>
The execution result output unit 106 does not have to output information on the anonymization method.
<Modification 11>
The present embodiment has described the case in which each function of the anonymization method derivation device 100 is realized by software. As a modification, however, each function may be realized by hardware.
When each function is realized by hardware, the anonymization method derivation device 100 includes an electronic circuit (processing circuit) in place of the processor 11. Alternatively, the anonymization method derivation device 100 includes an electronic circuit in place of the processor 11 and the memory 12. The electronic circuit is a dedicated electronic circuit that realizes each function (and the memory 12).
Examples of the electronic circuit include a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, a logic IC, a GA (Gate Array), an ASIC (Application Specific Integrated Circuit), and an FPGA (Field-Programmable Gate Array).
The functions may be realized by one electronic circuit or may be distributed over a plurality of electronic circuits.
Alternatively, some of the functions may be realized by hardware and the others by software.
The processor 11, the memory 12, and the electronic circuit described above are collectively referred to as "processing circuitry". That is, each of the functions is realized by processing circuitry.
Embodiment 2.
Hereinafter, points that differ from the above-described embodiment will be described with reference to the drawings.
The anonymization method derivation device 100 according to the present embodiment determines an appropriate anonymization method based on the personal information held by the provider 1 and the personal information held by the recipient 2.
*** Explanation of configuration ***
FIG. 10 is a diagram showing an example of an anonymization method derivation system that includes the anonymization method derivation device 100 according to the present embodiment and the synthetic data generation device 200 according to the present embodiment.
The anonymization method derivation device 100 includes a synthetic data receiving unit 121.
The synthetic data receiving unit 121 receives the recipient synthetic data transmitted by the synthetic data transmission unit 203.
The recipient synthetic data is synonymous with the synthetic data 62.
The synthetic data generation device 200 is a device that generates the synthetic data 62 from the personal information held by the recipient 2. It is composed of a personal information input unit 201, a recipient synthetic data generation unit 202, a synthetic data transmission unit 203 that transmits the synthetic data 62 to the anonymization method derivation device 100, and a recipient storage unit 204.
The personal information input unit 201 is the same as the personal information input unit 101.
The recipient synthetic data generation unit 202 is the same as the synthetic data generation unit 102, and generates recipient synthetic data by processing the personal information stored in the recipient storage unit 204.
The synthetic data transmission unit 203 transmits the recipient synthetic data.
The recipient storage unit 204 is the same as the personal information storage unit 111 and can hold personal information.
When the personal information input unit 201 has stored personal information in the recipient storage unit 204, the recipient storage unit 204 stores the personal information held by the recipient of the anonymously processed information.
Although not shown in the figure, the synthetic data receiving unit 121 is composed of the processor 11, the memory 12, and the port 14.
FIG. 11 shows a hardware configuration example of the synthetic data generation device 200.
As shown in this figure, the synthetic data generation device 200 is composed of a general computer 10.
The personal information input unit 201 and the synthetic data transmission unit 203 are each composed of the processor 11, the memory 12, and the port 14.
The recipient synthetic data generation unit 202 is composed of the processor 11 and the memory 12.
The recipient storage unit 204 is composed of the memory 12.
*** Explanation of operation ***
The operation in the present embodiment consists of three phases: a registration phase 51, an analysis phase 52, and a derivation phase 53.
These operations are described in order below. Description is omitted where the operation is the same as in the first embodiment.
*** Explanation of the operation of registration phase 51 ***
FIG. 12 is an example of a flowchart showing the part of the registration phase 51 that concerns the synthetic data generation device 200, that is, the operation from when the recipient 2 inputs personal information into the synthetic data generation device 200 until the synthetic data generation device 200 transmits the synthetic data 62 to the anonymization method derivation device 100.
The order of the processing shown in this flowchart may be changed as appropriate.
(Step S311: Input reception process)
The personal information input unit 201 accepts input of personal information from the recipient 2 and stores the accepted personal information in the recipient storage unit 204.
(Step S312: Synthetic data generation process)
The recipient synthetic data generation unit 202 generates the synthetic data 62 based on the personal information stored in the recipient storage unit 204.
(Step S313: Synthetic data transmission process)
The synthetic data transmission unit 203 transmits the synthetic data 62 to the anonymization method derivation device 100.
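Steps S311 to S313 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment. The class name `SyntheticDataGenerationDevice`, the JSON encoding, the `send` callback standing in for the port 14, and the per-column resampling used to produce the synthetic data 62 are all hypothetical. The text here says only that the recipient synthetic data generation unit 202 behaves like the synthetic data generation unit 102.

```python
import json
import random


class SyntheticDataGenerationDevice:
    """Hypothetical model of the synthetic data generation device 200."""

    def __init__(self, send, seed=0):
        self._storage = []               # stands in for the recipient storage unit 204
        self._send = send                # assumed transport for the transmission unit 203
        self._rng = random.Random(seed)  # seeded so the sketch is reproducible

    def accept_input(self, records):
        """Step S311: accept personal information and store a copy of it."""
        self._storage = [dict(r) for r in records]

    def generate(self):
        """Step S312: build synthetic data 62.

        Assumed technique: each column of each synthetic row is drawn
        independently from a randomly chosen stored record.
        """
        columns = list(self._storage[0].keys())
        return [
            {c: self._rng.choice(self._storage)[c] for c in columns}
            for _ in range(len(self._storage))
        ]

    def transmit(self, synthetic):
        """Step S313: serialize the synthetic data and hand it to the transport."""
        self._send(json.dumps(synthetic))


# Usage: a list stands in for the anonymization method derivation device 100.
received = []
device = SyntheticDataGenerationDevice(send=received.append)
device.accept_input([
    {"age": 34, "zip": "100-0001"},
    {"age": 51, "zip": "150-0002"},
    {"age": 27, "zip": "160-0003"},
])
synthetic_62 = device.generate()
device.transmit(synthetic_62)
```

Only the synthetic rows leave the device in this sketch; the stored personal information is never passed to `send`.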
FIG. 13 is an example of a flowchart showing the part of the registration phase 51 that concerns the anonymization method derivation device 100, that is, the procedure from when the provider 1 inputs personal information into the anonymization method derivation device 100 until the synthetic data generation unit 102 stores the synthetic data 61 in the synthetic data storage unit 103 and the synthetic data receiving unit 121 stores the synthetic data 62 in the synthetic data storage unit 103.
The order of the processing shown in this flowchart may be changed as appropriate.
Steps S301 and S302 are the same as those in the first embodiment, so their description is omitted.
(Step S323: Synthetic data reception process)
The synthetic data receiving unit 121 receives the synthetic data 62 from the synthetic data generation device 200.
(Step S324: Synthetic data storage process)
The synthetic data generation unit 102 stores the synthetic data 61 in the synthetic data storage unit 103, and the synthetic data receiving unit 121 stores the synthetic data 62 in the synthetic data storage unit 103.
The operations of the analysis phase 52 and the derivation phase 53 are the same as those described in the first embodiment, with "the synthetic data 61" read as "the synthetic data 61 and the synthetic data 62", so their description is omitted.
In the present embodiment, the analysis command execution unit 105 executes the analysis command using the recipient synthetic data received by the synthetic data receiving unit 121.
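As an illustration of executing an analysis command against both sets of synthetic data, the following Python sketch runs a command supplied as source text in a namespace that exposes the two data sets. The function name `execute_analysis_command`, the variable names `provider_data` and `recipient_data`, and the use of `exec` are assumptions made for this example. The description does not state how the analysis command execution unit 105 is implemented.

```python
def execute_analysis_command(command_source, synthetic_61, synthetic_62):
    """Run an analysis command on copies of both synthetic data sets.

    Returns the namespace in which the command ran, so that any
    variables it assigned can be inspected afterwards.
    """
    namespace = {
        "provider_data": [dict(r) for r in synthetic_61],   # synthetic data 61
        "recipient_data": [dict(r) for r in synthetic_62],  # synthetic data 62
    }
    exec(command_source, namespace)
    return namespace


# Hypothetical analysis command written by the recipient 2:
# the mean age over both data sets.
analysis_command = """
ages = [r["age"] for r in provider_data] + [r["age"] for r in recipient_data]
result = sum(ages) / len(ages)
"""

synthetic_61 = [{"age": 30}, {"age": 40}]
synthetic_62 = [{"age": 50}, {"age": 60}]
outcome = execute_analysis_command(analysis_command, synthetic_61, synthetic_62)
print(outcome["result"])  # 45.0
```

Because `exec` runs arbitrary code, a real deployment would need to isolate the command, for example in a separate restricted process. The sketch omits this.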
*** Explanation of the effect of Embodiment 2 ***
As described above, according to the present embodiment, the synthetic data generation device 200 transmits synthetic data obtained by processing the personal information of the recipient 2 to the anonymization method derivation device 100. The anonymization method derivation device 100 can then select an appropriate anonymization method based on both the synthetic data obtained by processing the personal information of the provider 1 and the synthetic data obtained by processing the personal information of the recipient 2. An appropriate anonymization scheme and its parameters can therefore be determined in an environment closer to the actual data analysis use case.
Further, according to the present embodiment, as in the first embodiment, it is possible to generate anonymously processed information that satisfies the criteria for both safety and usefulness.
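The idea of determining a scheme parameter that meets both a safety criterion and a usefulness criterion can be illustrated with the following Python sketch. Everything specific in it is an assumption made for illustration and is not taken from this description: the scheme (generalizing ages into bins), the safety measure (smallest group size, in the style of k-anonymity), the usefulness measure (error of the mean), and all function names and thresholds.

```python
from collections import Counter


def generalize(ages, width):
    """Replace each age with the midpoint of its width-sized bin."""
    return [(a // width) * width + width / 2 for a in ages]


def safety(values):
    """Size of the smallest group of identical values (larger is safer)."""
    return min(Counter(values).values())


def usefulness_error(original, anonymized):
    """Absolute error of the mean, the statistic the analysis is assumed to use."""
    def mean(xs):
        return sum(xs) / len(xs)
    return abs(mean(original) - mean(anonymized))


def choose_width(ages, widths, k, tolerance):
    """Return the smallest bin width that meets both criteria, or None."""
    for width in sorted(widths):
        anonymized = generalize(ages, width)
        if safety(anonymized) >= k and usefulness_error(ages, anonymized) <= tolerance:
            return width
    return None


ages = [21, 23, 24, 31, 35, 38, 42, 47]
chosen = choose_width(ages, widths=[1, 5, 10, 20], k=2, tolerance=3.0)
print(chosen)  # 10
```

Widths 1 and 5 leave at least one age in a group of its own, so they fail the safety check. Width 10 gives groups of sizes 3, 3, and 2 and shifts the mean by 1.125, so it is the first width to pass both checks.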
<Modification 12>
The recipient 2 and the synthetic data generation device 200 may be integrated.
In this modification, the personal information input unit 201 is composed of the processor 11 and the memory 12.
<Modification 13>
The anonymization method derivation device 100 and the synthetic data generation device 200 may be integrated.
In this modification, the synthetic data receiving unit 121 and the synthetic data transmission unit 203 are each composed of the processor 11 and the memory 12.
<Modification 14>
The recipient storage unit 204 may be composed of the memory 12 and the storage device 13.
*** Other embodiments ***
The above-described embodiments may be freely combined, any component of each embodiment may be modified, and any component of each embodiment may be omitted.
Further, the embodiments are not limited to those shown in the first and second embodiments, and various changes can be made as needed.
Further, although Python, an interpreted programming language, was used in describing the analysis command of the recipient 2, the analysis command need not be written in an interpreted programming language.
As a specific example, even when the analysis command is written in a compiled programming language, the analysis content analysis unit 107 can analyze the "calculation target" and the "calculation content" of the analysis command.
Further, the anonymization method derivation device 100 can interpret the analysis content even when the recipient 2 analyzes personal information by means other than a programming language, such as dedicated data analysis software. In this case, the anonymization method derivation device 100 can realize the analysis phase 52 by combining techniques known in the field of data analysis.
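For an analysis command written in Python, one way to extract a "calculation target" and a "calculation content" is static inspection with the standard `ast` module, sketched below. The mapping used here is an assumption for illustration: string subscript keys are treated as calculation targets and called method names as calculation content. The function name `analyze_command` and the sample command are hypothetical, and the sketch assumes Python 3.9 or later, where a subscript key appears directly as `node.slice`. The analysis content analysis unit 107 described above may work differently, for example from execution information.

```python
import ast


def analyze_command(source):
    """Return (targets, operations) found in a Python analysis command.

    targets:    string keys used in subscripts, e.g. data["age"] -> "age"
    operations: names of methods that are called, e.g. .mean() -> "mean"
    """
    targets, operations = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Subscript):
            key = node.slice
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                targets.add(key.value)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            operations.add(node.func.attr)
    return sorted(targets), sorted(operations)


command = 'result = data["age"].mean() + data["income"].max()'
targets, operations = analyze_command(command)
print(targets)     # ['age', 'income']
print(operations)  # ['max', 'mean']
```

The command is only parsed, never executed, so this kind of inspection does not touch any personal information.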
1 provider, 2 recipient, 10 computer, 11 processor, 12 memory, 13 storage device, 14 port, 15 data bus, 21 display, 22 keyboard, 23 mouse, 24 display, 25 keyboard, 26 mouse, 51 registration phase, 52 analysis phase, 53 derivation phase, 61 synthetic data, 62 synthetic data, 100 anonymization method derivation device, 101 personal information input unit, 102 synthetic data generation unit, 103 synthetic data storage unit, 104 analysis command input unit, 105 analysis command execution unit, 106 execution result output unit, 107 analysis content analysis unit, 108 anonymization method derivation unit, 109 anonymous processing unit, 110 anonymization scheme storage unit, 111 personal information storage unit, 112 analysis command storage unit, 121 synthetic data receiving unit, 200 synthetic data generation device, 201 personal information input unit, 202 recipient synthetic data generation unit, 203 synthetic data transmission unit, 204 recipient storage unit, 501 line, 502 line, 503 line, 504 line, 505 line.

Claims (9)

  1.  An anonymization method derivation device comprising:
     a personal information storage unit that stores personal information;
     an analysis command storage unit that stores an analysis command for analyzing the personal information stored in the personal information storage unit;
     an anonymization method derivation unit that derives, based on the analysis command stored in the analysis command storage unit, an anonymization method for anonymizing the personal information; and
     an anonymous processing unit that generates anonymously processed information, in which the personal information is anonymized, by using the anonymization method derived by the anonymization method derivation unit.
  2.  The anonymization method derivation device according to claim 1, wherein the anonymous processing unit evaluates the safety and the usefulness of the anonymously processed information.
  3.  The anonymization method derivation device according to claim 1 or 2, further comprising an anonymization scheme storage unit that stores a program realizing an anonymization scheme for anonymizing the personal information,
     wherein the anonymization method derivation unit derives the anonymization method based on the anonymization scheme stored in the anonymization scheme storage unit.
  4.  The anonymization method derivation device according to any one of claims 1 to 3, further comprising an analysis content analysis unit that analyzes the analysis command and outputs analysis information,
     wherein the anonymization method derivation unit derives the anonymization method based on the analysis information.
  5.  The anonymization method derivation device according to claim 4, wherein the analysis content analysis unit outputs, as the analysis information, the personal information used when the analysis command is executed and the operation performed on the personal information when the analysis command is executed.
  6.  The anonymization method derivation device according to claim 4 or 5, further comprising:
     a synthetic data generation unit that generates synthetic data by processing the personal information stored in the personal information storage unit; and
     an analysis command execution unit that executes the analysis command using the synthetic data generated by the synthetic data generation unit,
     wherein the analysis content analysis unit analyzes the analysis command based on execution information obtained when the analysis command execution unit executes the analysis command.
  7.  An anonymization method derivation system comprising:
     a synthetic data generation device that includes a recipient storage unit that stores personal information held by a recipient of the anonymously processed information, a recipient synthetic data generation unit that generates recipient synthetic data by processing the personal information stored in the recipient storage unit, and a synthetic data transmission unit that transmits the recipient synthetic data; and
     the anonymization method derivation device according to claim 6, further including a synthetic data receiving unit that receives the recipient synthetic data transmitted by the synthetic data transmission unit,
     wherein the analysis command execution unit executes the analysis command using the recipient synthetic data received by the synthetic data receiving unit.
  8.  An anonymization method derivation method, wherein:
     a personal information storage unit stores personal information;
     an analysis command storage unit stores an analysis command for analyzing the personal information stored in the personal information storage unit;
     an anonymization method derivation unit derives, based on the analysis command stored in the analysis command storage unit, an anonymization method for anonymizing the personal information; and
     an anonymous processing unit generates anonymously processed information, in which the personal information is anonymized, by using the anonymization method derived by the anonymization method derivation unit.
  9.  An anonymization method derivation program that causes a computer to:
     store personal information;
     store an analysis command for analyzing the stored personal information;
     derive, based on the stored analysis command, an anonymization method for anonymizing the personal information; and
     generate anonymously processed information, in which the personal information is anonymized, by using the derived anonymization method.
PCT/JP2019/020137 2019-05-21 2019-05-21 Anonymization technique derivation device, anonymization technique derivation method, anonymization technique derivation program, and anonymization technique derivation system WO2020235008A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2019/020137 WO2020235008A1 (en) 2019-05-21 2019-05-21 Anonymization technique derivation device, anonymization technique derivation method, anonymization technique derivation program, and anonymization technique derivation system
JP2019550273A JP6695511B1 (en) 2019-05-21 2019-05-21 Anonymization method derivation device, anonymization method derivation method, anonymization method derivation program, and anonymization method derivation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2019/020137 WO2020235008A1 (en) 2019-05-21 2019-05-21 Anonymization technique derivation device, anonymization technique derivation method, anonymization technique derivation program, and anonymization technique derivation system

Publications (1)

Publication Number Publication Date
WO2020235008A1 true WO2020235008A1 (en) 2020-11-26

Family

ID=70682351

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/020137 WO2020235008A1 (en) 2019-05-21 2019-05-21 Anonymization technique derivation device, anonymization technique derivation method, anonymization technique derivation program, and anonymization technique derivation system

Country Status (2)

Country Link
JP (1) JP6695511B1 (en)
WO (1) WO2020235008A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013027785A1 (en) * 2011-08-25 2013-02-28 日本電気株式会社 Anonymization device, anonymization method, and recording medium recoding program therefor
JP2014086037A (en) * 2012-10-26 2014-05-12 Toshiba Corp Anonymized data modification system
JP2014191431A (en) * 2013-03-26 2014-10-06 Nippon Telegr & Teleph Corp <Ntt> Anonymity system, possession device, anonymity device, user device, anonymity method and program
WO2014185043A1 (en) * 2013-05-15 2014-11-20 日本電気株式会社 Information processing device, information anonymization method, and recording medium

Also Published As

Publication number Publication date
JP6695511B1 (en) 2020-05-20
JPWO2020235008A1 (en) 2021-11-25

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019550273

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929841

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19929841

Country of ref document: EP

Kind code of ref document: A1