WO2021220404A1 - Anonymized database generation device, anonymized database generation method, and program - Google Patents

Anonymized database generation device, anonymized database generation method, and program

Info

Publication number
WO2021220404A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
database
type
anonymized
identifier
Application number
PCT/JP2020/018127
Other languages
French (fr)
Japanese (ja)
Inventor
聡 長谷川
Original Assignee
日本電信電話株式会社
Application filed by 日本電信電話株式会社
Priority to PCT/JP2020/018127
Priority to JP2022518490A (patent JP7405248B2)
Publication of WO2021220404A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules

Definitions

  • The present invention relates to a technique for anonymizing a database.
  • Anonymously processed information is information obtained by processing personal information so that a specific individual cannot be identified and the personal information cannot be restored.
  • The requirements for anonymously processed information are stipulated by the laws of each country (for example, the Personal Information Protection Law in Japan), and personal information must be processed so as to meet those requirements by using processing methods (for example, deletion or replacement) such as those described in Non-Patent Document 1 and Non-Patent Document 2.
  • An object of the present invention is to provide a technique capable of generating anonymously processed information without knowledge of the law or of processing methods.
  • One aspect of the present invention includes an attribute type classification unit that assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, or other as the type of the attribute, and an anonymized database generation unit that anonymizes the value of each attribute constituting the database by using a method according to the type of the attribute and generates an anonymized database.
  • Another aspect of the present invention includes an attribute type classification unit that assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, or other as the type of the attribute; an attribute type correction unit that, when the user determines that the type of an attribute constituting the database is not appropriate, corrects the type of the attribute to the type "no processing"; and an anonymized database generation unit that anonymizes the value of each attribute constituting the database by using a method according to the type of the attribute and generates an anonymized database.
  • ^ (caret) represents a superscript. For example, x^y^z means that y^z is a superscript of x, and x_y^z means that y^z is a subscript of x.
  • _ (underscore) represents a subscript. For example, x^y_z means that y_z is a superscript of x, and x_y_z means that y_z is a subscript of x.
  • The target for generating anonymously processed information is a database, and an anonymized database in which the data of the database is anonymized is generated.
  • The procedure for generating the anonymized database in each embodiment is described below.
  • First, the attributes of the database are classified into direct identifiers, quasi-identifiers, and others.
  • Direct identifiers, quasi-identifiers, and others are called types.
  • A direct identifier is an attribute that can identify a specific individual by itself.
  • A quasi-identifier is an attribute that can identify a specific individual in combination with other attributes. "Other" refers to attributes that are neither direct identifiers nor quasi-identifiers.
  • Next, the attribute data is anonymized by using an appropriate processing method according to the type.
  • The anonymized database generation device 100 takes the database as an input, generates an anonymized database, and outputs the anonymized database.
  • FIG. 1 is a block diagram showing the configuration of the anonymized database generation device 100.
  • FIG. 2 is a flowchart showing the operation of the anonymized database generation device 100.
  • The anonymized database generation device 100 includes an attribute type classification unit 110, an anonymized database generation unit 120, and a recording unit 190.
  • The recording unit 190 is a component that records, as appropriate, the information necessary for the processing of the anonymized database generation device 100. For example, the database to be anonymized is recorded in the recording unit 190.
  • The attribute type classification unit 110 takes the database as an input, assigns to each of the attributes constituting the database one of the types direct identifier, quasi-identifier, or other as the type of the attribute, and outputs the result as the classification result.
  • Examples of direct identifiers include name, email address, My Number, basic pension number, resident's card code, telephone number, passport number, and credit card number.
  • Examples of quasi-identifiers include age, address, gender, and date of birth.
  • An attribute that correlates with a certain quasi-identifier is also treated as a quasi-identifier.
  • The name, which is a direct identifier, can be identified by pattern matching that uses a list of names as the predetermined data set.
  • The address and gender, which are quasi-identifiers, can likewise be identified by pattern matching that uses a list of addresses or a list of genders as the predetermined data set.
  • The regular expression method determines whether an attribute is a direct identifier or a quasi-identifier by checking whether the data of the attribute to be classified matches a predetermined regular expression.
  • The email address, telephone number, and passport number, which are direct identifiers, can be identified by whether they match a predetermined regular expression.
  • The check digit generation algorithm method determines whether an attribute is a direct identifier by checking whether the data of the attribute to be classified was generated by using a predetermined check digit generation algorithm.
  • The My Number and resident's card code, which are direct identifiers, can be identified by whether they are data generated using a check digit generation algorithm called Modulus11Weight234567. Further, the credit card number, which is a direct identifier, can be identified by whether it is data generated using a check digit generation algorithm called the Luhn algorithm.
  • The range check method determines whether an attribute is a quasi-identifier by checking whether the data of the attribute to be classified falls within a predetermined data range.
  • The age, which is a quasi-identifier, can be identified by whether its data falls within the predetermined data range {0, 1, ..., 119, 120}.
  • The correlation method determines whether an attribute to be classified is a quasi-identifier by checking whether it correlates with a certain quasi-identifier.
  • The Pearson correlation coefficient is used when both the attribute to be classified and the quasi-identifier used for the judgment are quantitative attributes. If one of them is a quantitative attribute and the other is a qualitative attribute, the correlation ratio is used.
  • If both the attribute to be classified and the quasi-identifier used for the judgment are qualitative attributes, Cramér's coefficient of association (Cramér's V) is used.
  • A qualitative attribute is an attribute that takes a non-numerical value as its attribute value, such as gender.
  • A quantitative attribute is an attribute that takes a numerical value as its attribute value, such as age.
  • As the quasi-identifier used for the judgment, age, address, gender, and date of birth can be used. A quasi-identifier whose data is uniformly distributed may be excluded from the correlation judgment. Doing so reduces errors in determining whether the attribute to be classified is a quasi-identifier.
  • The attribute type classification unit 110 can be configured to assign types by using some of the pattern matching method, the regular expression method, the check digit generation algorithm method, the range check method, and the correlation method. That is, the attribute type classification unit 110 determines the type of each attribute constituting the database by sequentially applying to it one or more methods selected from these five, and generates and outputs a classification result consisting of pairs of an attribute and the type assigned to it.
  • The anonymized database generation unit 120 takes the database and the classification result output in S110 as inputs, anonymizes the value of each attribute constituting the database by using a method according to the type of the attribute, and generates and outputs an anonymized database.
  • Item deletion anonymizes by deleting all the values of the attribute to be anonymized (that is, deleting the attribute item itself).
  • Temporary ID conversion anonymizes by converting the value of the attribute to be anonymized into an ID using a hash function or the like.
  • Deletion anonymizes by deleting some or all of the values of the attribute to be anonymized.
  • Generalization anonymizes by replacing the value of the attribute to be anonymized with a higher-level concept.
  • Rounding, used when the attribute to be anonymized is a quantitative attribute, anonymizes by replacing the value of the attribute with a value obtained by rounding it off or rounding it down.
  • Swapping anonymizes by (probabilistically) exchanging the values of the attribute to be anonymized between records.
  • Noise addition, used when the attribute to be anonymized is a quantitative attribute, anonymizes by adding to the value of the attribute a random value generated according to a certain (probability) distribution.
  • Microaggregation anonymizes by grouping the values of the attribute to be anonymized and replacing the values in each group with a representative value.
  • Top coding, used when the attribute to be anonymized is a quantitative attribute, anonymizes by grouping together particularly large values of the attribute.
  • Bottom coding, used when the attribute to be anonymized is a quantitative attribute, anonymizes by grouping together particularly small values of the attribute.
  • Outlier processing anonymizes by processing peculiar values (outliers) contained in the attribute to be anonymized, for example by deleting them or applying top coding or bottom coding.
  • Randomization anonymizes by (probabilistically) replacing the value of the attribute to be anonymized with another value.
  • The anonymized database generation unit 120 can be configured so that, if the type of an attribute constituting the database is a direct identifier, it anonymizes the attribute by using one of item deletion and temporary ID conversion; if the type is a quasi-identifier, it anonymizes the attribute by using a method that satisfies k-anonymity; and if the type is other, it anonymizes the attribute by using some of deletion, generalization, rounding, swapping, noise addition, microaggregation, top coding, bottom coding, outlier processing, and randomization.
  • The anonymized database generation device 200 takes the database as an input, generates an anonymized database, and outputs the anonymized database.
  • FIG. 3 is a block diagram showing the configuration of the anonymized database generation device 200.
  • FIG. 4 is a flowchart showing the operation of the anonymized database generation device 200.
  • The anonymized database generation device 200 includes an attribute type classification unit 110, an attribute type correction unit 210, an anonymized database generation unit 120, and a recording unit 190.
  • The recording unit 190 is a component that records, as appropriate, the information necessary for the processing of the anonymized database generation device 200. For example, the database to be anonymized is recorded in the recording unit 190.
  • The attribute type classification unit 110 takes the database as an input, assigns to each of the attributes constituting the database one of the types direct identifier, quasi-identifier, or other as the type of the attribute, and outputs the result as the classification result.
  • The attribute type correction unit 210 takes the classification result output in S110 as an input and, for each of the attributes constituting the database, when the user determines that the type of the attribute is not appropriate, corrects the type of the attribute to the type "no processing" and outputs the classification result reflecting the correction.
  • The user inputs a correction instruction to the attribute type correction unit 210 using, for example, an input unit (not shown).
  • The anonymized database generation unit 120 takes the database and the classification result output in S210 as inputs, anonymizes the value of each attribute constituting the database by using a method according to the type of the attribute, and generates and outputs an anonymized database. If the type of an attribute is "no processing", the anonymization process is not executed for that attribute.
  • The processing in each component is described using the database shown in FIG. 5 as an example.
  • The database has seven attributes, identified as (a), (b), (c), (d), (e), (f), and (g).
  • (1) Processing by the attribute type classification unit 110 and the attribute type correction unit 210: the attribute type classification unit 110 determines whether each of the seven attributes corresponds to a direct identifier, a quasi-identifier, or other, and generates a classification result.
  • The attribute type correction unit 210 generates a classification result corrected based on the correction instruction by the user.
  • For attribute (a), pattern matching using a list of names determines that the attribute is a name, and direct identifier is assigned as the type of attribute (a).
  • For attribute (b), because the data follows the check digit generation algorithm called Modulus11Weight234567, the attribute is determined to be My Number, and direct identifier is assigned as the type of attribute (b).
  • For attribute (c), pattern matching using a list of genders determines that the attribute is gender, and quasi-identifier is assigned as the type of attribute (c).
  • For attribute (d), pattern matching using a list of addresses determines that the attribute is an address, and quasi-identifier is assigned as the type of attribute (d).
  • For attribute (e), a range check with {0, 1, ..., 119, 120} as the data range determines that the attribute is age, and quasi-identifier is assigned as the type of attribute (e).
  • For attribute (g), the user determines that the type obtained by the processing in the attribute type classification unit 110 is inappropriate, and corrects the type of attribute (g) to "no processing".
  • The anonymized database generation unit 120 executes the anonymization processing by using the method according to the type obtained in the processing of (1).
  • Attribute (a) and attribute (b) are direct identifiers, so anonymization processing is executed using item deletion.
  • FIG. 6 is a diagram showing an example of the functional configuration of a computer that realizes each of the above-mentioned devices (that is, each node).
  • The processing in each of the above-mentioned devices can be carried out by causing the recording unit 2020 to read a program for causing the computer to function as each of the above-mentioned devices, and operating the control unit 2010, the input unit 2030, the output unit 2040, and the like.
  • The device of the present invention has, for example, as a single hardware entity, an input unit to which a keyboard or the like can be connected; an output unit to which a liquid crystal display or the like can be connected; a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected; a CPU (Central Processing Unit, which may include cache memory, registers, and the like); RAM and ROM as memory; an external storage device such as a hard disk; and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them.
  • A device (drive) or the like capable of reading and writing a recording medium such as a CD-ROM may also be provided in the hardware entity.
  • A physical entity equipped with such hardware resources is, for example, a general-purpose computer.
  • The external storage device of the hardware entity stores the program required to realize the above-mentioned functions and the data required for processing this program (the program is not limited to the external storage device and may, for example, be stored in a ROM, which is a read-only storage device). Further, the data obtained by the processing of these programs is stored, as appropriate, in the RAM, the external storage device, or the like.
  • Each program stored in the external storage device (or ROM, etc.) and the data necessary for processing each program are read into memory as needed, and are interpreted, executed, and processed by the CPU as appropriate.
  • As a result, the CPU realizes predetermined functions (the components referred to above as "... unit", "... means", and the like).
  • The present invention is not limited to the above-described embodiment, and can be modified as appropriate without departing from the spirit of the present invention. Further, the processes described in the above-described embodiment are not only executed in chronological order according to the order described, but may also be executed in parallel or individually, depending on the processing capacity of the device that executes the processes or as required.
  • When the processing functions of the hardware entity (the device of the present invention) described in the above embodiment are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. By executing this program on the computer, the processing functions of the hardware entity are realized on the computer.
  • The program that describes this processing content can be recorded on a computer-readable recording medium.
  • The computer-readable recording medium may be, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, a semiconductor memory, or the like.
  • Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable) / RW (ReWritable), or the like as the optical disk; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.
  • The distribution of this program is carried out, for example, by selling, transferring, or renting a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Further, the program may be stored in the storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.
  • A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or the program transferred from a server computer in its own storage device. Then, when executing the process, the computer reads the program stored in its own storage device and executes the process according to the read program. As another execution form of this program, the computer may read the program directly from a portable recording medium and execute the process according to the program, or it may execute the process according to the received program each time the program is transferred from the server computer to this computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to this computer.
  • The program in this embodiment includes information that is used for processing by a computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).
  • The hardware entity is configured by executing a predetermined program on the computer, but at least a part of the processing content may be realized in hardware.
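
The check digit and range check methods named in the definitions above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the function names `luhn_valid` and `in_age_range` are hypothetical, the patent text does not give an implementation, and the Modulus11Weight234567 algorithm is not shown here.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check.

    Non-digit characters (spaces, hyphens) are ignored.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def in_age_range(values, low=0, high=120):
    """Range check: every value is an integer in {low, ..., high}."""
    return bool(values) and all(
        isinstance(v, int) and low <= v <= high for v in values
    )
```

A column whose values all pass `luhn_valid` would be a candidate for the direct identifier type (credit card number), and a column passing `in_age_range` a candidate for the quasi-identifier type (age). How many values must pass before a type is assigned is a design choice the text leaves open.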

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a technique that enables generation of anonymous processed information without legal knowledge or knowledge of processing methods. The present invention includes: an attribute type classification unit that imparts, to an attribute forming a database as a type of the attribute, a type which is a direct identifier, a quasi-identifier, or the like; and an anonymized database generation unit that generates an anonymized database by anonymizing, for the attribute forming the database, the value of the attribute by using a method corresponding to the type of the attribute.

Description

Anonymized database generation device, anonymized database generation method, and program
The present invention relates to a technique for anonymizing a database.
Anonymously processed information is information obtained by processing personal information so that a specific individual cannot be identified and the personal information cannot be restored. The requirements for anonymously processed information are stipulated by the laws of each country (for example, the Personal Information Protection Law in Japan), and personal information must be processed so as to meet those requirements by using processing methods (for example, deletion or replacement) such as those described in Non-Patent Document 1 and Non-Patent Document 2.
As described above, creating anonymously processed information properly requires strictly interpreting the laws that stipulate the requirements and manually selecting an appropriate processing method. This is extremely difficult for those who do not have sufficient knowledge of the law and of processing methods.
Therefore, an object of the present invention is to provide a technique capable of generating anonymously processed information without knowledge of the law or of processing methods.
One aspect of the present invention includes an attribute type classification unit that assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, or other as the type of the attribute, and an anonymized database generation unit that anonymizes the value of each attribute constituting the database by using a method according to the type of the attribute and generates an anonymized database.
Another aspect of the present invention includes an attribute type classification unit that assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, or other as the type of the attribute; an attribute type correction unit that, when the user determines that the type of an attribute constituting the database is not appropriate, corrects the type of the attribute to the type "no processing"; and an anonymized database generation unit that anonymizes the value of each attribute constituting the database by using a method according to the type of the attribute and generates an anonymized database.
According to the present invention, anonymously processed information can be generated without knowledge of the law or of processing methods.
FIG. 1 is a block diagram showing the configuration of the anonymized database generation device 100.
FIG. 2 is a flowchart showing the operation of the anonymized database generation device 100.
FIG. 3 is a block diagram showing the configuration of the anonymized database generation device 200.
FIG. 4 is a flowchart showing the operation of the anonymized database generation device 200.
FIG. 5 is a diagram showing an example of a database.
FIG. 6 is a diagram showing an example of the functional configuration of a computer that realizes each device in the embodiments of the present invention.
Hereinafter, embodiments of the present invention will be described in detail. Components having the same function are given the same reference number, and duplicate explanations are omitted.
Prior to the description of each embodiment, the notation used in this specification is described.
^ (caret) represents a superscript. For example, x^y^z means that y^z is a superscript of x, and x_y^z means that y^z is a subscript of x. In addition, _ (underscore) represents a subscript. For example, x^y_z means that y_z is a superscript of x, and x_y_z means that y_z is a subscript of x.
The superscripts "^" and "~", as in ^x and ~x for a character x, should properly be written directly above "x", but are written as ^x and ~x because of notational constraints of the specification.
<Technical background>
In each embodiment of the present invention, the target for generating anonymously processed information is a database, and an anonymized database in which the data of the database is anonymized is generated.
The procedure for generating the anonymized database in each embodiment is described below.
(1) First, the attributes of the database are classified into direct identifiers, quasi-identifiers, and others. Direct identifiers, quasi-identifiers, and others are called types. A direct identifier is an attribute that can identify a specific individual by itself. A quasi-identifier is an attribute that can identify a specific individual in combination with other attributes. "Other" refers to attributes that are neither direct identifiers nor quasi-identifiers.
(2) Next, the attribute data is anonymized by using an appropriate processing method according to the type.
Before process (2) is executed, the user may be allowed to correct the type of an attribute, in consideration of possible misclassification. In this case, a new type called "no processing" is provided, and an attribute for which this type is specified is excluded from process (2).
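
The two-step procedure above (assign a type to each attribute, then process each attribute according to its type) can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the names `AttrType` and `anonymize` are hypothetical, item deletion is used for direct identifiers, and the 10-year banding of integer quasi-identifiers is an assumed stand-in for generalization that does not by itself guarantee k-anonymity.

```python
from enum import Enum


class AttrType(Enum):
    DIRECT = "direct identifier"
    QUASI = "quasi-identifier"
    OTHER = "other"
    NONE = "no processing"  # type set by the user to skip step (2)


def anonymize(rows, types):
    """Apply a type-dependent method to every attribute of every record.

    rows:  list of dicts mapping attribute name to value
    types: dict mapping attribute name to AttrType (the classification result)
    """
    result = []
    for row in rows:
        new_row = {}
        for attr, value in row.items():
            attr_type = types[attr]
            if attr_type is AttrType.DIRECT:
                continue  # item deletion: drop the attribute entirely
            if attr_type is AttrType.QUASI and isinstance(value, int):
                # Assumed generalization: replace a number by its 10-wide band.
                low = value // 10 * 10
                new_row[attr] = f"{low}-{low + 9}"
            else:
                # OTHER and NONE (and non-integer quasi-identifiers) are left
                # unchanged in this sketch; a real system would process OTHER.
                new_row[attr] = value
        result.append(new_row)
    return result
```

For example, with `types = {"name": AttrType.DIRECT, "age": AttrType.QUASI, "memo": AttrType.OTHER}`, a record `{"name": "Taro", "age": 34, "memo": "x"}` becomes `{"age": "30-39", "memo": "x"}`. Changing the type of `name` to `AttrType.NONE` leaves that attribute untouched.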
<First Embodiment>
The anonymized database generation device 100 takes the database as an input, generates an anonymized database, and outputs the anonymized database.
Hereinafter, the anonymized database generation device 100 will be described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the configuration of the anonymized database generation device 100. FIG. 2 is a flowchart showing the operation of the anonymized database generation device 100. As shown in FIG. 1, the anonymized database generation device 100 includes an attribute type classification unit 110, an anonymized database generation unit 120, and a recording unit 190. The recording unit 190 is a component that records, as appropriate, the information necessary for the processing of the anonymized database generation device 100. For example, the database to be anonymized is recorded in the recording unit 190.
The operation of the anonymized database generation device 100 is described with reference to FIG. 2.
In S110, the attribute type classification unit 110 takes the database as an input, assigns to each of the attributes constituting the database one of the types direct identifier, quasi-identifier, or other as the type of the attribute, and outputs the result as the classification result.
Hereinafter, examples of direct identifiers and quasi-identifiers, and examples of classification methods, will be described.
(Examples of direct identifiers and quasi-identifiers)
Examples of direct identifiers include name, e-mail address, My Number, basic pension number, resident record code, telephone number, passport number, and credit card number.
Examples of quasi-identifiers include age, address, gender, and date of birth. In addition, an attribute that is correlated with a quasi-identifier is also treated as a quasi-identifier.
(Examples of classification methods)
The following classification methods are available.
(1) Method by pattern matching
The method by pattern matching determines whether an attribute is a direct identifier or a quasi-identifier by pattern matching the data of the attribute to be classified (that is, the attribute to which a type is to be assigned) against the data in a predetermined set of data.
A name, which is a direct identifier, can be identified by pattern matching that uses a list of names as the predetermined set of data. Likewise, an address and a gender, which are quasi-identifiers, can be identified by pattern matching that uses a list of addresses and a list of genders, respectively, as the predetermined set of data.
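As an illustration of the method by pattern matching, the following is a minimal Python sketch. The function name, the match threshold of 0.9, and the tiny reference list are assumptions introduced for this example; the description above does not specify how many values of an attribute must match the predetermined set.

```python
def matches_reference_list(values, reference, threshold=0.9):
    """Return True if at least `threshold` of the non-empty values appear in `reference`.

    The threshold is an assumption of this sketch, not part of the description.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return False
    hits = sum(1 for v in cleaned if v in reference)
    return hits / len(cleaned) >= threshold


# Hypothetical reference list; a real list of names or addresses would be far larger.
GENDER_LIST = {"male", "female"}

print(matches_reference_list(["male", "female", "female"], GENDER_LIST))
print(matches_reference_list(["Tokyo", "Osaka", "male"], GENDER_LIST))
```

The first call reports a match because every value is in the list; the second does not, because only one of three values is.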
(2) Method by regular expression
The method by regular expression determines whether an attribute is a direct identifier or a quasi-identifier by determining whether the data of the attribute to be classified matches a predetermined regular expression.
An e-mail address, a telephone number, and a passport number, which are direct identifiers, can be identified by whether they match a predetermined regular expression.
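The method by regular expression might be sketched in Python as follows. The two patterns are deliberately simplified assumptions (a loose e-mail shape and a hyphenated Japanese-style telephone number); the description does not give the actual expressions, and production patterns would need to be stricter.

```python
import re

# Simplified, hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"0\d{1,4}-\d{1,4}-\d{4}")


def column_matches(values, pattern):
    """Return True if the column is non-empty and every value fully matches `pattern`."""
    values = [v for v in values if v]
    return bool(values) and all(pattern.fullmatch(v) is not None for v in values)


print(column_matches(["a@example.com", "b.c@example.org"], EMAIL_RE))
print(column_matches(["03-1234-5678"], PHONE_RE))
print(column_matches(["hello"], EMAIL_RE))
```

Using `fullmatch` rather than `search` avoids classifying free text that merely contains an address-like substring.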
(3) Method by check digit generation algorithm
The method by check digit generation algorithm determines whether an attribute is a direct identifier by determining whether the data of the attribute to be classified was generated using a predetermined check digit generation algorithm.
A My Number and a resident record code, which are direct identifiers, can be identified by whether the data was generated using the check digit generation algorithm called Modulus11Weight234567. Likewise, a credit card number, which is a direct identifier, can be identified by whether the data was generated using the check digit generation algorithm called the Luhn algorithm.
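For the credit card case, the Luhn check can be sketched in Python as below. This is an illustrative implementation of the standard Luhn algorithm, not code from the description; the function name and the choice to ignore spaces and hyphens are assumptions. The Modulus11Weight234567 check is not shown here.

```python
def luhn_valid(number: str) -> bool:
    """Return True if `number` passes the Luhn check (spaces and hyphens are ignored)."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

An attribute whose values all pass this check is a candidate for the direct identifier type; a random 16-digit value passes only about one time in ten, so the check is most informative when applied across a whole column.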
(4) Method by range check
The method by range check determines whether an attribute is a quasi-identifier by determining whether the data of the attribute to be classified falls within a predetermined data range.
An age, which is a quasi-identifier, can be identified by whether the data falls within the predetermined data range {0, 1, …, 119, 120}.
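A range check for age could look like the following Python sketch. The function name and the requirement that every value be an integer (or an integer string) in the range are assumptions made for this example.

```python
def is_age_like(values, low=0, high=120):
    """Return True if the column is non-empty and every value is an integer in [low, high]."""
    seen = False
    for v in values:
        try:
            n = int(v)
        except (TypeError, ValueError):
            return False
        if not low <= n <= high:
            return False
        seen = True
    return seen


print(is_age_like(["0", "35", "120"]))
print(is_age_like([34, 121]))
```

Because any column of small integers passes such a check, a range check on its own can misclassify attributes, which is one reason a later correction by the user can be useful.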
(5) Method by correlation
The method by correlation determines whether an attribute to be classified is a quasi-identifier by determining whether it is correlated with some quasi-identifier. When both the attribute to be classified and the quasi-identifier used for the determination are quantitative attributes, the Pearson correlation is used. When one of them is a quantitative attribute and the other is a qualitative attribute, the correlation ratio is used. When both are qualitative attributes, Cramér's coefficient of association is used. Here, a qualitative attribute is an attribute, such as gender, whose values are not numeric, and a quantitative attribute is an attribute, such as age, whose values are numeric. Age, address, gender, and date of birth can be used as the quasi-identifiers for the determination. A quasi-identifier whose data is uniformly distributed may be excluded from the determination of correlation. Doing so reduces errors in determining whether the attribute to be classified is a quasi-identifier.
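The three measures named above can be sketched in plain Python as follows. The function names are hypothetical, the correlation ratio is returned as eta squared (between-group variation divided by total variation), and constant columns, which would cause a division by zero, are assumed to have been excluded beforehand. The threshold at which a correlation counts as "correlated" is not given in the description and is left to the caller.

```python
import math
from collections import Counter, defaultdict


def pearson(xs, ys):
    """Pearson correlation between two quantitative attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def correlation_ratio(categories, values):
    """Correlation ratio (eta squared) between a qualitative and a quantitative attribute."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    mean = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    total = sum((v - mean) ** 2 for v in values)
    return between / total


def cramers_v(a, b):
    """Cramér's coefficient of association between two qualitative attributes."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    chi2 = 0.0
    for x in ca:
        for y in cb:
            expected = ca[x] * cb[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(ca), len(cb)) - 1
    return math.sqrt(chi2 / (n * k))
```

All three measures are close to 1 for strongly related attributes and close to 0 for unrelated ones, so a single cut-off can be applied regardless of which measure was selected.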
Accordingly, the attribute type classification unit 110 can be configured to assign types by using one or more of the method by pattern matching, the method by regular expression, the method by check digit generation algorithm, the method by range check, and the method by correlation. That is, the attribute type classification unit 110 determines the type of each attribute by sequentially applying, to the attributes constituting the database, one or more methods selected from these five methods, and generates and outputs a classification result that includes pairs of an attribute and the type assigned to that attribute.
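The sequential application of classification methods could be organized as in the Python sketch below. The detector list, the type labels, and the rule that the first matching detector wins are assumptions of this example; the description only states that the selected methods are applied in sequence. The database is modeled as a dictionary from attribute name to a list of string values.

```python
import re

DIRECT, QUASI, OTHER = "direct_identifier", "quasi_identifier", "other"


def all_match(values, predicate):
    """True if the column is non-empty and every value satisfies `predicate`."""
    return bool(values) and all(predicate(v) for v in values)


# Hypothetical, heavily simplified detectors, tried in order.
DETECTORS = [
    # Regular expression: loose e-mail shape.
    (lambda vs: all_match(vs, lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None), DIRECT),
    # Pattern matching against a tiny gender list.
    (lambda vs: all_match(vs, lambda v: v in {"male", "female"}), QUASI),
    # Range check for age.
    (lambda vs: all_match(vs, lambda v: v.isdigit() and 0 <= int(v) <= 120), QUASI),
]


def classify(database):
    """Return {attribute name: type}; attributes that match no detector get OTHER."""
    result = {}
    for name, values in database.items():
        result[name] = OTHER
        for detector, attr_type in DETECTORS:
            if detector(values):
                result[name] = attr_type
                break
    return result


db = {"mail": ["a@b.co"], "sex": ["male", "female"], "age": ["20", "41"], "memo": ["hello", "x"]}
print(classify(db))
```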
In S120, the anonymized database generation unit 120 takes as input the database and the classification result output in S110, anonymizes the values of each attribute constituting the database by using a method according to the type of that attribute, and generates and outputs an anonymized database.
Examples of anonymization methods are described below.
(When the attribute type is direct identifier)
(1) Item deletion
Item deletion is a method of anonymizing by deleting all values of the attribute to be anonymized (that is, deleting the attribute item itself).
(2) Temporary ID conversion
Temporary ID conversion is a method of anonymizing by converting the values of the attribute to be anonymized into IDs by using a hash function or the like.
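Temporary ID conversion might be sketched in Python as follows. The description only says "a hash function or the like"; the use of a keyed hash (HMAC-SHA-256), the secret key, and the truncation to 16 hexadecimal characters are assumptions of this example, chosen because an unkeyed hash of a name or number can be reversed by hashing candidate values.

```python
import hashlib
import hmac


def temporary_id(value: str, secret_key: bytes, length: int = 16) -> str:
    """Map a value to a stable pseudonymous ID using a keyed hash (an assumed design)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


key = b"example-secret-key"  # hypothetical key; keep real keys out of source code
print(temporary_id("Taro Yamada", key))
print(temporary_id("Taro Yamada", key) == temporary_id("Taro Yamada", key))
```

The same input always yields the same ID under a given key, so records belonging to one person remain linkable within the anonymized database.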
(When the attribute type is quasi-identifier)
For example, a method that satisfies k-anonymity, described in Reference Non-Patent Document 1, can be used.
(Reference Non-Patent Document 1: Khaled El Emam, Fida Kamal Dankar, Romeo Issa, Elizabeth Jonker, Daniel Amyot, Elise Cogo, Jean-Pierre Corriveau, Mark Walker, Sadrul Chowdhury, Regis Vaillancourt, et al., “A globally optimal k-anonymity method for the de-identification of health data,” Journal of the American Medical Informatics Association, Vol.16, No.5, pp.670-682, 2009.)
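To make the k-anonymity condition concrete, the Python sketch below checks whether every combination of quasi-identifier values occurs in at least k records, and shows one generalization step (age into 10-year bands). It is not the globally optimal method of Reference Non-Patent Document 1; the function names and the band width are assumptions of this example.

```python
from collections import Counter


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if each combination of quasi-identifier values appears in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_age(age, width=10):
    """Replace an exact age with a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


records = [
    {"age": 23, "sex": "F"},
    {"age": 27, "sex": "F"},
    {"age": 31, "sex": "M"},
    {"age": 38, "sex": "M"},
]
generalized = [{**r, "age": generalize_age(r["age"])} for r in records]

print(satisfies_k_anonymity(records, ["age", "sex"], 2))
print(satisfies_k_anonymity(generalized, ["age", "sex"], 2))
```

The raw records fail the check for k = 2 because every (age, sex) pair is unique; after generalization each pair occurs twice, so the check passes.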
(When the attribute type is other)
(1) Deletion
Deletion is a method of anonymizing by deleting some or all of the values of the attribute to be anonymized.
(2) Generalization
Generalization is a method of anonymizing by replacing the values of the attribute to be anonymized with a higher-level concept.
(3) Rounding
Rounding is a method of anonymizing, when the attribute to be anonymized is a quantitative attribute, by replacing each value of the attribute with the value obtained by fraction processing such as rounding to the nearest unit or truncation.
(4) Swapping
Swapping is a method of anonymizing by (probabilistically) exchanging the values of the attribute to be anonymized between records.
(5) Noise addition
Noise addition is a method of anonymizing, when the attribute to be anonymized is a quantitative attribute, by adding to each value of the attribute a random value generated according to a given (probability) distribution.
(6) Microaggregation
Microaggregation is a method of anonymizing by grouping the values of the attribute to be anonymized and replacing the values in each group with a representative value.
(7) Top coding
Top coding is a method of anonymizing, when the attribute to be anonymized is a quantitative attribute, by grouping together particularly large values of the attribute.
(8) Bottom coding
Bottom coding is a method of anonymizing, when the attribute to be anonymized is a quantitative attribute, by grouping together particularly small values of the attribute.
(9) Outlier processing
Outlier processing is a method of anonymizing by applying processing such as deletion, top coding, or bottom coding to unusual values (outliers) contained in the attribute to be anonymized.
(10) Randomization
Randomization is a method of anonymizing by (probabilistically) replacing the values of the attribute to be anonymized with other values.
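Several of the methods listed above for quantitative attributes reduce to a few lines each. The Python sketch below illustrates rounding, top coding, bottom coding, noise addition, and swapping; the function names, the Gaussian noise distribution, and the use of a full shuffle for swapping are assumptions of this example rather than details from the description.

```python
import math
import random


def round_half_up(value, step=10):
    """Rounding: nearest multiple of `step`, with halves rounded up."""
    return math.floor(value / step + 0.5) * step


def top_code(value, ceiling):
    """Top coding: values above `ceiling` are all reported as `ceiling`."""
    return min(value, ceiling)


def bottom_code(value, floor):
    """Bottom coding: values below `floor` are all reported as `floor`."""
    return max(value, floor)


def add_noise(value, scale, rng):
    """Noise addition: add a zero-mean Gaussian value (distribution is an assumption)."""
    return value + rng.gauss(0.0, scale)


def swap_column(values, rng):
    """Swapping: permute one attribute's values across records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
print(round_half_up(24), round_half_up(25))
print(top_code(250, 100), bottom_code(-5, 0))
print(sorted(swap_column([3, 1, 2], rng)))
```

`math.floor(value / step + 0.5)` is used instead of the built-in `round` because `round` in Python rounds halves to the nearest even number.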
Accordingly, the anonymized database generation unit 120 can be configured as follows. When the type of an attribute constituting the database is direct identifier, the unit anonymizes the attribute by using one or more of item deletion and temporary ID conversion. When the type is quasi-identifier, the unit anonymizes the attribute by using a method that satisfies k-anonymity. When the type is other, the unit anonymizes the attribute by using one or more of deletion, generalization, rounding, swapping, noise addition, microaggregation, top coding, bottom coding, outlier processing, and randomization.
According to the embodiment of the present invention, anonymously processed information can be generated without knowledge of the law or of processing methods. In particular, even a user without specialized knowledge can have anonymously processed information generated automatically by an appropriate processing method.
<Second Embodiment>
The anonymized database generation device 200 takes a database as input, generates an anonymized database, and outputs it.
Hereinafter, the anonymized database generation device 200 will be described with reference to FIGS. 3 and 4. FIG. 3 is a block diagram showing the configuration of the anonymized database generation device 200. FIG. 4 is a flowchart showing the operation of the anonymized database generation device 200. As shown in FIG. 3, the anonymized database generation device 200 includes an attribute type classification unit 110, an attribute type correction unit 210, an anonymized database generation unit 120, and a recording unit 190. The recording unit 190 is a component that records, as appropriate, information necessary for the processing of the anonymized database generation device 200. For example, the recording unit 190 records the database to be anonymized.
The operation of the anonymized database generation device 200 will be described with reference to FIG. 4.
In S110, the attribute type classification unit 110 takes the database as input, assigns to each attribute constituting the database one of three types (direct identifier, quasi-identifier, or other) as the type of that attribute, and outputs the result as a classification result.
In S210, the attribute type correction unit 210 takes as input the classification result output in S110 and, for each attribute constituting the database, corrects the type of the attribute to the type "no processing" when the user judges that the assigned type is not appropriate, and outputs a classification result reflecting the correction. When the user judges that the type of an attribute is not appropriate, the user inputs a correction instruction to the attribute type correction unit 210 by using, for example, an input unit (not shown).
In S120, the anonymized database generation unit 120 takes as input the database and the classification result output in S210, anonymizes the values of each attribute constituting the database by using a method according to the type of that attribute, and generates and outputs an anonymized database. Here, when the type of an attribute is no processing, the anonymization processing is not executed for that attribute.
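The flow from S210 to S120 can be sketched in Python as below. The type labels, the handler table, and the convention that a handler returning `None` means the attribute item is deleted are all assumptions of this example. Treating each attribute independently is also a simplification: a method that satisfies k-anonymity operates on the quasi-identifier attributes jointly.

```python
NO_PROCESSING = "no_processing"


def apply_corrections(classification, flagged):
    """S210: return a copy in which user-flagged attributes get the no-processing type."""
    corrected = dict(classification)
    for name in flagged:
        if name in corrected:
            corrected[name] = NO_PROCESSING
    return corrected


def anonymize(database, classification, handlers):
    """S120: apply the handler for each attribute's type; leave no-processing attributes as is."""
    out = {}
    for name, values in database.items():
        attr_type = classification[name]
        if attr_type == NO_PROCESSING:
            out[name] = list(values)
            continue
        result = handlers[attr_type](values)
        if result is not None:  # None means the attribute item is deleted
            out[name] = result
    return out


# Hypothetical handlers standing in for the methods chosen per type.
handlers = {
    "direct_identifier": lambda vs: None,                          # item deletion
    "quasi_identifier": lambda vs: [v // 10 * 10 for v in vs],     # crude generalization
    "other": lambda vs: ["*"] * len(vs),                           # deletion of values
}

db = {"name": ["A", "B"], "age": [23, 37], "note": ["x", "y"]}
classification = {"name": "direct_identifier", "age": "quasi_identifier", "note": "other"}
corrected = apply_corrections(classification, ["note"])
print(anonymize(db, corrected, handlers))
```

In this run the `name` attribute is removed, `age` is generalized, and `note` passes through unchanged because the user flagged its type as inappropriate.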
According to the embodiment of the present invention, anonymously processed information can be generated without knowledge of the law or of processing methods. In particular, even a user without specialized knowledge can have anonymously processed information generated automatically by an appropriate processing method.
<Application example>
Here, the processing in each component is described using the database shown in FIG. 5 as an example. The database has seven attributes, identified as (a), (b), (c), (d), (e), (f), and (g).
(1) Processing by the attribute type classification unit 110 and the attribute type correction unit 210
The attribute type classification unit 110 determines whether each of the seven attributes is a direct identifier, a quasi-identifier, or other, and generates a classification result. The attribute type correction unit 210 generates a classification result corrected based on the correction instruction from the user.
For attribute (a), pattern matching using a list of names determines that the attribute is a name, and direct identifier is assigned as the type of attribute (a).
For attribute (b), the values conform to the check digit generation algorithm called Modulus11Weight234567, so the attribute is determined to be a My Number, and direct identifier is assigned as the type of attribute (b).
For attribute (c), pattern matching using a list of genders determines that the attribute is a gender, and quasi-identifier is assigned as the type of attribute (c).
For attribute (d), pattern matching using a list of addresses determines that the attribute is an address, and quasi-identifier is assigned as the type of attribute (d).
For attribute (e), a range check with {0, 1, …, 119, 120} as the data range determines that the attribute is an age, and quasi-identifier is assigned as the type of attribute (e).
For attribute (f), the attribute is determined to be highly correlated with attribute (e), and quasi-identifier is assigned as the type of attribute (f).
For attribute (g), the user judges that the type obtained by the processing in the attribute type classification unit 110 is inappropriate, and the type of attribute (g) is corrected to no processing.
(2) Processing by the anonymized database generation unit 120
The anonymized database generation unit 120 executes the anonymization processing by using methods according to the types obtained in the processing of (1).
Attributes (a) and (b) are direct identifiers, so they are anonymized by item deletion.
Attributes (c), (d), (e), and (f) are anonymized by k-anonymization applied to these four attributes.
Attribute (g) has the type no processing, so the anonymization processing is not executed for it.
<Supplement>
FIG. 6 is a diagram showing an example of the functional configuration of a computer that realizes each of the above-described devices (that is, each node). The processing in each of the above-described devices can be carried out by having the recording unit 2020 read a program for causing the computer to function as that device, and by operating the control unit 2010, the input unit 2030, the output unit 2040, and the like.
The device of the present invention has, for example, as a single hardware entity: an input unit to which a keyboard or the like can be connected; an output unit to which a liquid crystal display or the like can be connected; a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected; a CPU (Central Processing Unit, which may include a cache memory, registers, and the like); RAM and ROM, which are memories; an external storage device, which is a hard disk; and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the hardware entity may also be provided with a device (drive) capable of reading from and writing to a recording medium such as a CD-ROM. A general-purpose computer is an example of a physical entity having such hardware resources.
The external storage device of the hardware entity stores the program required to realize the above-described functions and the data required for the processing of this program (the program is not limited to the external storage device and may, for example, be stored in a ROM, which is a read-only storage device). Data obtained by the processing of these programs is stored, as appropriate, in the RAM, the external storage device, or the like.
In the hardware entity, each program stored in the external storage device (or the ROM or the like) and the data required for the processing of each program are read into memory as necessary, and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes predetermined functions (the components referred to above as ... unit, ... means, and the like).
The present invention is not limited to the above-described embodiments and can be modified as appropriate without departing from the spirit of the present invention. The processes described in the above embodiments need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capacity of the device that executes them, or as necessary.
As described above, when the processing functions of the hardware entity (the device of the present invention) described in the above embodiments are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. By executing this program on the computer, the processing functions of the hardware entity are realized on the computer.
The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.
The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.
A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own storage device and executes the processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute the processing according to the program, or it may execute the processing according to the received program each time a program is transferred to it from the server computer. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only by execution instructions and result acquisition. The program in this embodiment includes information that is used for processing by a computer and that is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).
In this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least part of the processing content may be realized in hardware.
The above description of the embodiments of the present invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to provide the best illustration of the principles of the present invention and to enable those skilled in the art to use the present invention in various embodiments and with various modifications suited to the particular use contemplated. All such modifications and variations are within the scope of the present invention as determined by the appended claims, interpreted in accordance with the breadth to which they are fairly, legally, and equitably entitled.

Claims (8)

  1. An anonymized database generation device comprising:
     an attribute type classification unit that assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, and other as the type of the attribute; and
     an anonymized database generation unit that anonymizes the values of each attribute constituting the database by using a method according to the type of the attribute, and generates an anonymized database.
  2. An anonymized database generation device comprising:
     an attribute type classification unit that assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, and other as the type of the attribute;
     an attribute type correction unit that, for each attribute constituting the database, corrects the type of the attribute to the type no processing when a user judges that the type of the attribute is not appropriate; and
     an anonymized database generation unit that anonymizes the values of each attribute constituting the database by using a method according to the type of the attribute, and generates an anonymized database.
  3. The anonymized database generation device according to claim 1 or 2, wherein
     the attribute type classification unit assigns the type by using one or more of a method by pattern matching, a method by regular expression, a method by check digit generation algorithm, a method by range check, and a method by correlation.
  4. The anonymized database generation device according to claim 1 or 2, wherein
     the anonymized database generation unit
     anonymizes an attribute constituting the database by using one or more of item deletion and temporary ID conversion when the type of the attribute is direct identifier, and
     anonymizes an attribute constituting the database by using a method that satisfies k-anonymity when the type of the attribute is quasi-identifier.
  5.  請求項4に記載の匿名化データベース生成装置であって、
     前記匿名化データベース生成部は、前記データベースを構成する属性の種別がその他である場合は、削除、一般化、丸め、スワッピング、ノイズ付加、ミクロアグリゲーション、トップコーディング、ボトムコーディング、外れ値加工、ランダム化のうち、いくつかの方法を用いて匿名化する
     ことを特徴とする匿名化データベース生成装置。
    The anonymized database generation device according to claim 4,
    wherein, when the type of an attribute constituting the database is other, the anonymized database generation unit anonymizes the attribute using some of the following methods: deletion, generalization, rounding, swapping, noise addition, microaggregation, top coding, bottom coding, outlier processing, and randomization.
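
Four of the methods listed in claim 5 are simple enough to show in a few lines. The sketch below assumes integer-valued attributes, and the function names and parameters (`upper`, `lower`, `unit`, `width`) are hypothetical.

```python
def top_code(values, upper):
    """Top coding: replace every value above `upper` with `upper`."""
    return [min(v, upper) for v in values]


def bottom_code(values, lower):
    """Bottom coding: replace every value below `lower` with `lower`."""
    return [max(v, lower) for v in values]


def round_to(values, unit):
    """Rounding: map each integer to the nearest multiple of `unit`.

    Ties are rounded up; integer arithmetic avoids floating-point surprises.
    """
    return [((v + unit // 2) // unit) * unit for v in values]


def generalize_age(age, width=10):
    """Generalization: replace an exact age with its `width`-year band."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

For example, `top_code([20, 95, 130], 90)` caps the two largest values at 90, which hides rare extreme values that could otherwise single out an individual.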
  6.  匿名化データベース生成装置が、データベースを構成する属性に対して、直接識別子、準識別子、その他のいずれかの種別を当該属性の種別として付与する属性種別分類ステップと、
     前記匿名化データベース生成装置が、前記データベースを構成する属性に対して、当該属性の種別に応じた方法を用いて当該属性の値を匿名化し、匿名化データベースを生成する匿名化データベース生成ステップと
     を含む匿名化データベース生成方法。
    An anonymized database generation method comprising:
    an attribute type classification step in which an anonymized database generation device assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, or other as the type of that attribute; and
    an anonymized database generation step in which the anonymized database generation device anonymizes the values of each attribute constituting the database using a method according to the type of that attribute, thereby generating an anonymized database.
  7.  匿名化データベース生成装置が、データベースを構成する属性に対して、直接識別子、準識別子、その他のいずれかの種別を当該属性の種別として付与する属性種別分類ステップと、
     前記匿名化データベース生成装置が、前記データベースを構成する属性に対して、ユーザが当該属性の種別が適切でないと判断する場合、当該属性の種別を無加工という種別に修正する属性種別修正ステップと、
     前記匿名化データベース生成装置が、前記データベースを構成する属性に対して、当該属性の種別に応じた方法を用いて当該属性の値を匿名化し、匿名化データベースを生成する匿名化データベース生成ステップと
     を含む匿名化データベース生成方法。
    An anonymized database generation method comprising:
    an attribute type classification step in which an anonymized database generation device assigns, to each attribute constituting a database, one of the types direct identifier, quasi-identifier, or other as the type of that attribute;
    an attribute type correction step in which, when a user judges that the type assigned to an attribute constituting the database is not appropriate, the anonymized database generation device corrects the type of that attribute to the type "unprocessed"; and
    an anonymized database generation step in which the anonymized database generation device anonymizes the values of each attribute constituting the database using a method according to the type of that attribute, thereby generating an anonymized database.
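
The three steps of claim 7 (classify, let the user correct, anonymize by type) could be wired together roughly as below. This is a sketch only: the column-oriented table layout, the type labels, the `handlers` dispatch table, and the convention that a handler returns `None` to delete a column are assumptions introduced for the example.

```python
def apply_user_corrections(types, rejected):
    """Return a copy of `types` in which every attribute the user judged
    to be wrongly classified is set to 'unprocessed'."""
    return {
        attr: ("unprocessed" if attr in rejected else t)
        for attr, t in types.items()
    }


def anonymize(table, types, handlers):
    """Apply the handler registered for each attribute's type.

    `table` maps attribute names to lists of values. Columns typed
    'unprocessed' are copied unchanged. A handler that returns None
    means the column is deleted from the output.
    """
    result = {}
    for attr, column in table.items():
        attr_type = types[attr]
        if attr_type == "unprocessed":
            result[attr] = list(column)
            continue
        new_column = handlers[attr_type](column)
        if new_column is not None:
            result[attr] = new_column
    return result
```

A possible call, with handlers that are deliberately simplistic (the quasi-identifier handler here only generalizes ages into decades and does not by itself guarantee k-anonymity):

```python
table = {"name": ["Sato", "Suzuki"], "age": [34, 58], "memo": ["x", "y"]}
types = {"name": "direct_identifier", "age": "quasi_identifier", "memo": "other"}
types = apply_user_corrections(types, rejected={"memo"})
handlers = {
    "direct_identifier": lambda col: None,  # item deletion
    "quasi_identifier": lambda col: [f"{(a // 10) * 10}s" for a in col],
    "other": lambda col: ["*"] * len(col),
}
anonymized = anonymize(table, types, handlers)
```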
  8.  請求項1ないし5のいずれか1項に記載の匿名化データベース生成装置としてコンピュータを機能させるためのプログラム。
    A program for causing a computer to function as the anonymized database generation device according to any one of claims 1 to 5.
PCT/JP2020/018127 2020-04-28 2020-04-28 Anonymized database generation device, anonymized database generation method, and program WO2021220404A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/018127 WO2021220404A1 (en) 2020-04-28 2020-04-28 Anonymized database generation device, anonymized database generation method, and program
JP2022518490A JP7405248B2 (en) 2020-04-28 2020-04-28 Anonymized database generation device, anonymized database generation method, program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/018127 WO2021220404A1 (en) 2020-04-28 2020-04-28 Anonymized database generation device, anonymized database generation method, and program

Publications (1)

Publication Number Publication Date
WO2021220404A1 (en) 2021-11-04

Family

ID=78332317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/018127 WO2021220404A1 (en) 2020-04-28 2020-04-28 Anonymized database generation device, anonymized database generation method, and program

Country Status (2)

Country Link
JP (1) JP7405248B2 (en)
WO (1) WO2021220404A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010127216A2 (en) * 2009-05-01 2010-11-04 Telcodia Technologies, Inc. Automated determination of quasi-identifiers using program analysis
JP2014013458A (en) * 2012-07-03 2014-01-23 Hitachi Systems Ltd Service providing method and service providing system
JP2015114871A (en) * 2013-12-12 2015-06-22 Kddi株式会社 Device for privacy protection of public information, and method and program for privacy protection of public information

Also Published As

Publication number Publication date
JP7405248B2 (en) 2023-12-26
JPWO2021220404A1 (en) 2021-11-04

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20934147; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022518490; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20934147; Country of ref document: EP; Kind code of ref document: A1)