WO2021220402A1 - Quasi-identifier determination device, quasi-identifier determination method, and program - Google Patents
- Publication number
- WO2021220402A1 (international application PCT/JP2020/018125)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attribute
- quasi
- identifier
- attributes
- attribute set
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
Definitions
- The present invention relates to a technique for concealing database data.
- Techniques for concealing database data include deterministic methods, such as those of Non-Patent Documents 1 and 2, and probabilistic methods, such as that of Non-Patent Document 3.
- As shown in FIG. 1, the target database contains N records (N is an integer of 1 or more), each composed of M attribute values (M is an integer of 2 or more). All of these methods conceal data using a combination of attributes, called a quasi-identifier, that can uniquely identify a record.
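A quasi-identifier, as described above, is a combination of attributes whose values can single out a record. The minimal sketch below illustrates that notion by counting how many records are unique on a chosen attribute combination. It assumes the database is held as a list of Python dicts, one per record; the function name `unique_record_count` and the sample rows are hypothetical illustrations, not part of the patent.

```python
from collections import Counter


def unique_record_count(records, attributes):
    """Count records whose values on `attributes` occur exactly once.

    records: list of dicts, one dict per record (hypothetical layout).
    attributes: attribute names forming the candidate combination.
    """
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1)


# Hypothetical toy rows, not taken from the patent's figures.
rows = [
    {"gender": "F", "age": 34, "address": "Tokyo"},
    {"gender": "F", "age": 34, "address": "Osaka"},
    {"gender": "M", "age": 34, "address": "Tokyo"},
    {"gender": "M", "age": 51, "address": "Tokyo"},
]
print(unique_record_count(rows, ["gender"]))                    # 0
print(unique_record_count(rows, ["gender", "address"]))         # 2
print(unique_record_count(rows, ["gender", "age", "address"]))  # 4
```

Adding attributes to the combination can only split groups further, so the number of uniquely identified records never decreases as the combination grows.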
- An object of the present invention is to provide a technique for determining whether an attribute of a database is a quasi-identifier.
- Let X be the set of attributes of database T that are clearly quasi-identifiers (hereinafter referred to as the first attribute set), and let Y be the set of attributes of database T that are candidates for quasi-identifiers (hereinafter referred to as the second attribute set).
- One aspect of the present invention includes a quasi-identifier set generation unit that, for each pair of an attribute in the first attribute set X and an attribute in the second attribute set Y, calculates the degree of relationship between the two attributes. If the calculated degree indicates a strong relationship, the unit determines the attribute from Y to be a quasi-identifier, and it generates, as a quasi-identifier set, the subset of Y whose elements are the attributes determined to be quasi-identifiers.
- First attribute set X: the set of attributes of database T that are clearly quasi-identifiers.
- Second attribute set Y: the set of attributes of database T that are candidates for quasi-identifiers.
- Another aspect of the present invention includes a third attribute set generation unit that determines, for each attribute in the first attribute set X, whether its distribution is uniform, and generates, as a third attribute set X', the subset of X whose elements are the attributes determined to be non-uniform.
- This aspect further includes a quasi-identifier set generation unit that, for each pair of an attribute in the third attribute set X' and an attribute in the second attribute set Y, calculates the degree of relationship between the two attributes. If the calculated degree indicates a strong relationship, the unit determines the attribute from Y to be a quasi-identifier, and it generates, as the quasi-identifier set, the subset of Y whose elements are the attributes determined to be quasi-identifiers.
- ^ (caret) denotes a superscript. For example, x^y_z means that y_z is a superscript of x, and x_y^z means that y^z is a subscript of x.
- _ (underscore) denotes a subscript. For example, x^y_z means that y_z is a superscript of x, and x_y_z means that y_z is a subscript of x.
- Each embodiment of the present invention determines, for a database, whether an attribute other than those clearly known to be quasi-identifiers is itself a quasi-identifier.
- An attribute that is clearly a quasi-identifier is, for example, an attribute commonly known to be a quasi-identifier, such as age, address, or gender, or an attribute designated as a quasi-identifier by the user.
- A set X of attributes that are clearly quasi-identifiers and a set Y of the remaining attributes that are candidates for quasi-identifiers are prepared.
- The degree of relationship, which indicates the strength of the relationship between an attribute x ∈ X that is clearly a quasi-identifier and a candidate attribute y ∈ Y, is calculated. If y is strongly related to x (that is, the degree of relationship is greater than or equal to, or greater than, a predetermined threshold), y is determined to be a quasi-identifier.
- A correlation coefficient, for example, can be used as the degree of relationship.
- The uniformity of each attribute x ∈ X may be determined in advance: if the distribution of x is uniform, x may be excluded from the relationship-degree calculation, and if it is not uniform, x may be included in it.
- An attribute whose distribution is not uniform may be referred to as a non-uniform attribute.
- A statistical hypothesis test, for example, can be used to determine uniformity.
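As a minimal sketch of the basic judgement for two quantitative attributes, the code below computes Pearson's correlation coefficient between a column of x ∈ X and a candidate column of y ∈ Y and compares its absolute value with a threshold. The function names, the default threshold of 0.7 (one of the example values this description gives for the user-specified threshold), the choice of `>=` rather than `>`, and treating a constant column as unrelated are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column is treated as unrelated
    return sxy / math.sqrt(sxx * syy)


def is_quasi_identifier(x_values, y_values, threshold=0.7):
    """Judge candidate y a quasi-identifier if |r(x, y)| >= threshold (hypothetical helper)."""
    return abs(pearson(x_values, y_values)) >= threshold


# Hypothetical ages and annual incomes, not the patent's FIG. 4 data.
print(is_quasi_identifier([20, 30, 40, 50], [300, 420, 500, 650]))  # True
```

Taking the absolute value means a strong negative correlation counts as a strong relationship, just like a strong positive one.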
- Let T be a database of N records (N is an integer of 1 or more), each composed of M attribute values (M is an integer of 2 or more).
- Let X be the set of attributes of database T that are clearly quasi-identifiers (hereinafter referred to as the first attribute set).
- Let Y be the set of attributes of database T that are candidates for quasi-identifiers (hereinafter referred to as the second attribute set).
- The quasi-identifier determination device 100 takes the first attribute set X and the second attribute set Y as inputs, and generates and outputs a quasi-identifier set, which is the subset of Y whose elements are the attributes determined to be quasi-identifiers.
- FIG. 2 is a block diagram showing the configuration of the quasi-identifier determination device 100.
- FIG. 3 is a flowchart showing the operation of the quasi-identifier determination device 100.
- The quasi-identifier determination device 100 includes a quasi-identifier set generation unit 120 and a recording unit 190.
- The recording unit 190 records, as appropriate, information needed for the processing of the quasi-identifier determination device 100; for example, it records the first attribute set X and the second attribute set Y.
- The operation of the quasi-identifier determination device 100 is described with reference to FIG. 3.
- The database shown in FIG. 4 is used as an example.
- The database has five attributes: gender, age, address, annual income (in units of 10,000), and blood type.
- The quasi-identifier set generation unit 120 takes the first attribute set X and the second attribute set Y as inputs. For each pair of an attribute in X and an attribute in Y, it calculates the degree of relationship between the two attributes. If the calculated degree indicates a strong relationship, it determines that the attribute from Y is a quasi-identifier; otherwise, it determines that the attribute is not a quasi-identifier. It then generates and outputs, as the quasi-identifier set, the subset of Y whose elements are the attributes determined to be quasi-identifiers.
- "The calculated degree indicates a strong relationship" means that the calculated degree of relationship is greater than or equal to, or greater than, a predetermined threshold.
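The behaviour of the quasi-identifier set generation unit 120 can be sketched as a loop over the pairs (x, y). In this hypothetical sketch the database is a dict mapping attribute names to column lists, `relation` is any caller-supplied degree-of-relationship function, and the `inclusive` flag selects between the "greater than or equal to" and "greater than" readings of a large degree. Keeping y as soon as one x is strongly related to it is an interpretation of the per-pair rule, not wording from the patent.

```python
from typing import Callable, Dict, Sequence, Set

Column = Sequence[object]


def generate_quasi_identifier_set(
    table: Dict[str, Column],
    first_set: Set[str],
    second_set: Set[str],
    relation: Callable[[Column, Column], float],
    threshold: float,
    inclusive: bool = True,
) -> Set[str]:
    """Return the subset of `second_set` judged to be quasi-identifiers.

    y in `second_set` is kept if, for at least one x in `first_set`,
    |relation(x, y)| >= threshold (inclusive=True) or > threshold (inclusive=False).
    """
    result = set()
    for y in second_set:
        for x in first_set:
            degree = abs(relation(table[x], table[y]))
            large = degree >= threshold if inclusive else degree > threshold
            if large:
                result.add(y)
                break  # one strongly related x is enough
    return result


# Hypothetical toy relation: 1.0 for identical columns, else 0.0.
def same(a, b):
    return 1.0 if list(a) == list(b) else 0.0


toy = {"age": [1, 2, 3], "income": [1, 2, 3], "blood": [4, 4, 5]}
print(generate_quasi_identifier_set(toy, {"age"}, {"income", "blood"}, same, 0.5))  # {'income'}
```

Because X is just a parameter, the same routine also fits the second embodiment, where the third attribute set X' is passed in place of X.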
- A correlation coefficient can be used as the degree of relationship between the two attributes.
- The correlation coefficient should be chosen according to the types (qualitative or quantitative) of the two attributes for which it is calculated.
- A qualitative attribute takes non-numerical values, such as gender; a quantitative attribute takes numerical values, such as age.
- For example, the following correlation coefficients are used according to the types of the attribute in the first attribute set X and the attribute in the second attribute set Y.
- When the attribute in X is gender and the attribute in Y is blood type (both qualitative), the degree of relationship is calculated using Cramér's coefficient of association (Cramér's V).
- When the attribute in X is age and the attribute in Y is annual income (both quantitative), the Pearson correlation coefficient is used.
- When the attribute in X is address and the attribute in Y is annual income (one qualitative, one quantitative), the degree of relationship is calculated using the correlation ratio.
- Because a correlation coefficient can take values in [-1, 1], its absolute value is calculated, and an attribute of Y whose absolute value is greater than (or greater than or equal to) the predetermined threshold is determined to be a quasi-identifier.
- The predetermined threshold is a criterion specified by the user, for example 0.7 or 0.9.
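The two coefficients other than Pearson's can be sketched with the standard library alone. `cramers_v` computes Cramér's V from the chi-square statistic of the contingency table of two qualitative columns (for example gender and blood type), and `correlation_ratio` computes the correlation ratio η = sqrt(SS_between / SS_total) of a quantitative column grouped by a qualitative one (for example annual income grouped by address). Both return values in [0, 1]. The function names, the absence of any bias correction, and returning 0.0 for degenerate (constant) columns are assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict


def cramers_v(a, b):
    """Cramér's V between two equal-length qualitative columns (no bias correction)."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    chi2 = 0.0
    for u, nu in count_a.items():
        for v, nv in count_b.items():
            expected = nu * nv / n  # expected cell count under independence
            chi2 += (joint[(u, v)] - expected) ** 2 / expected
    k = min(len(count_a), len(count_b))
    if k < 2:
        return 0.0  # assumption: undefined for a constant column, treated as unrelated
    return math.sqrt(chi2 / (n * (k - 1)))


def correlation_ratio(categories, values):
    """Correlation ratio eta of a quantitative column grouped by a qualitative one."""
    n = len(values)
    mean = sum(values) / n
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    ss_between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    ss_total = sum((v - mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # assumption: a constant quantitative column is treated as unrelated
    return math.sqrt(ss_between / ss_total)


# Hypothetical columns, not taken from the patent's FIG. 4.
gender = ["F", "F", "M", "M"]
blood = ["A", "A", "B", "B"]
address = ["Tokyo", "Tokyo", "Osaka", "Osaka"]
income = [300, 320, 500, 520]
print(cramers_v(gender, blood))                      # 1.0
print(round(correlation_ratio(address, income), 3))  # 0.995
```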
- This embodiment makes it possible to determine whether an attribute of the database is a quasi-identifier.
- In particular, it can determine whether an attribute other than those commonly known to be quasi-identifiers, such as age, address, and gender, is a quasi-identifier.
- The quasi-identifier determination device 200 takes the first attribute set X and the second attribute set Y as inputs, and generates and outputs a quasi-identifier set, which is the subset of Y whose elements are the attributes determined to be quasi-identifiers.
- FIG. 5 is a block diagram showing the configuration of the quasi-identifier determination device 200.
- FIG. 6 is a flowchart showing the operation of the quasi-identifier determination device 200.
- The quasi-identifier determination device 200 includes a third attribute set generation unit 210, a quasi-identifier set generation unit 120, and a recording unit 190.
- The recording unit 190 records, as appropriate, information needed for the processing of the quasi-identifier determination device 200.
- The operation of the quasi-identifier determination device 200 is described with reference to FIG. 6.
- The third attribute set generation unit 210 takes the first attribute set X as input, determines the uniformity of each attribute in X, and generates and outputs, as the third attribute set X', the subset of X whose elements are the attributes determined to be non-uniform.
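The third attribute set generation unit 210 amounts to a filter over X. The hypothetical sketch below assumes a dict-of-columns database and a caller-supplied uniformity judge `is_uniform` (for example, a statistical hypothesis test as described for this unit); the toy judge `all_counts_equal` is only an illustration, not a statistical test.

```python
from typing import Callable, Dict, Sequence, Set


def third_attribute_set(
    table: Dict[str, Sequence[object]],
    first_set: Set[str],
    is_uniform: Callable[[Sequence[object]], bool],
) -> Set[str]:
    """Return X': the attributes of `first_set` whose distribution is judged non-uniform."""
    return {x for x in first_set if not is_uniform(table[x])}


def all_counts_equal(column):
    """Hypothetical toy judge: 'uniform' if every value occurs equally often."""
    counts = {}
    for v in column:
        counts[v] = counts.get(v, 0) + 1
    return len(set(counts.values())) <= 1


table = {
    "gender": ["F", "M", "F", "M"],
    "address": ["Tokyo", "Tokyo", "Tokyo", "Osaka"],
}
print(third_attribute_set(table, {"gender", "address"}, all_counts_equal))  # {'address'}
```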
- A statistical hypothesis test can be used to determine uniformity.
- For example, the null hypothesis that the distribution of the attribute under test matches the uniform distribution is set up, and the probability of the observed data under this hypothesis (the p-value) is calculated. If the p-value falls below the specified significance level (for example, 0.05 or 0.01), the null hypothesis is rejected and the distribution of the attribute is judged not to be uniform.
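For a qualitative attribute, the hypothesis test above can be sketched as a chi-square goodness-of-fit test against equal category frequencies. To stay within the standard library, `chi2_sf` evaluates the chi-square upper-tail probability for integer degrees of freedom through the standard recurrence that steps df up by two from the closed forms for df = 1 and df = 2. The names, the default `alpha = 0.05` (one of the example significance levels above), and treating a single-category attribute as uniform are assumptions; with SciPy available, `scipy.stats.chisquare`, whose expected frequencies default to uniform, computes the same test.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, j = math.exp(-x / 2), 2  # closed form for df = 2
    else:
        q, j = math.erfc(math.sqrt(x / 2)), 1  # closed form for df = 1
    # Step up two degrees of freedom at a time:
    # Q(x; j + 2) = Q(x; j) + (x/2)^(j/2) * exp(-x/2) / Gamma(j/2 + 1)
    while j < df:
        q += math.exp((j / 2) * math.log(x / 2) - x / 2 - math.lgamma(j / 2 + 1))
        j += 2
    return min(q, 1.0)


def is_uniform_chi2(values, categories=None, alpha=0.05):
    """Chi-square goodness-of-fit test of `values` against a uniform distribution.

    Null hypothesis: every category is equally likely. Returns False
    (judged non-uniform) when the p-value is below `alpha`.
    `categories` may list the full domain so that unseen values count as 0.
    """
    counts = Counter(values)
    domain = list(categories) if categories is not None else list(counts)
    k = len(domain)
    if k < 2:
        return True  # assumption: a single-category attribute is treated as uniform
    expected = len(values) / k
    stat = sum((counts[c] - expected) ** 2 / expected for c in domain)
    return chi2_sf(stat, k - 1) >= alpha


print(is_uniform_chi2(["F"] * 50 + ["M"] * 50))  # True
print(is_uniform_chi2(["F"] * 90 + ["M"] * 10))  # False
```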
- The statistical hypothesis test should be chosen according to the type of the attribute under test.
- For example, the following tests are used to judge uniformity, according to the type of the attribute in the first attribute set X.
- For a qualitative attribute, uniformity is determined using the chi-square test or Fisher's exact test.
- For a quantitative attribute, uniformity is determined using the Kolmogorov-Smirnov test.
- For example, the chi-square test may be used when the attribute in X is a qualitative attribute such as gender, and the Kolmogorov-Smirnov test when it is age.
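For a quantitative attribute such as age, a one-sample Kolmogorov-Smirnov test against a uniform distribution can be sketched as follows. It assumes the uniform reference range [lo, hi] is supplied by the caller (for example a plausible age range); estimating the range from the same data would make the p-value only approximate. The p-value uses the asymptotic Kolmogorov series with Stephens' small-sample correction, a widely used approximation; the names and the default `alpha` are hypothetical. `scipy.stats.kstest(values, "uniform", args=(lo, hi - lo))` is a library alternative, although it may compute the p-value by a different method.

```python
import math


def ks_uniform_statistic(values, lo, hi):
    """Kolmogorov-Smirnov statistic D of `values` against Uniform(lo, hi)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = min(max((x - lo) / (hi - lo), 0.0), 1.0)  # uniform CDF at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_sf(lam):
    """Asymptotic P(K > lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the true value is ~1
    total = sum(
        2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101)
    )
    return min(max(total, 0.0), 1.0)


def is_uniform_ks(values, lo, hi, alpha=0.05):
    """Return False (judged non-uniform) when the KS p-value is below `alpha`."""
    n = len(values)
    d = ks_uniform_statistic(values, lo, hi)
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d  # Stephens' small-sample correction
    return kolmogorov_sf(lam) >= alpha


# Hypothetical ages: evenly spread over [20, 80) versus bunched near 20.
spread = [20.5 + i for i in range(60)]
bunched = [20 + 0.1 * i for i in range(60)]
print(is_uniform_ks(spread, 20, 80))   # True
print(is_uniform_ks(bunched, 20, 80))  # False
```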
- The quasi-identifier set generation unit 120 takes the third attribute set X' and the second attribute set Y as inputs. For each pair of an attribute in X' and an attribute in Y, it calculates the degree of relationship between the two attributes. If the calculated degree indicates a strong relationship, it determines that the attribute from Y is a quasi-identifier; otherwise, it determines that the attribute is not a quasi-identifier. It then generates and outputs, as the quasi-identifier set, the subset of Y whose elements are the attributes determined to be quasi-identifiers.
- For a pair of a qualitative and a quantitative attribute, the correlation ratio is used to calculate the degree of relationship.
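- As in the first embodiment, this makes it possible to determine whether an attribute of the database, including an attribute other than those commonly known to be quasi-identifiers, such as age, address, and gender, is a quasi-identifier.
- Here, the degree of relationship is calculated only against the third attribute set X', which is a subset of the first attribute set X.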
- FIG. 7 is a diagram showing an example of a functional configuration of a computer that realizes each of the above-mentioned devices (that is, each node).
- The processing in each of the above devices can be carried out by having the recording unit 2020 read a program that causes the computer to function as that device, and by operating the control unit 2010, the input unit 2030, the output unit 2040, and so on.
- The device of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device (for example, a communication cable) capable of communicating outside the hardware entity can be connected, a CPU (Central Processing Unit, which may include cache memory, registers, etc.), RAM and ROM as memory, an external storage device such as a hard disk, and a bus connecting the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that they can exchange data.
- If necessary, the hardware entity may also be provided with a device (drive) capable of reading from and writing to a recording medium such as a CD-ROM.
- Examples of a physical entity equipped with such hardware resources include a general-purpose computer.
- The external storage device of the hardware entity stores the program required to realize the functions described above and the data required to process this program (the program need not be stored in the external storage device; it may, for example, be stored in a ROM, which is a read-only storage device). Data obtained by processing these programs is stored as appropriate in the RAM, the external storage device, or the like.
- Each program stored in the external storage device (or ROM, etc.) and the data needed to process it are read into memory as needed, and are interpreted, executed, and processed by the CPU as appropriate.
- As a result, the CPU realizes predetermined functions (each of the components referred to above as ... unit, ... means, etc.).
- The present invention is not limited to the embodiments described above and can be modified as appropriate without departing from its spirit. The processes described in the embodiments need not be executed in chronological order as described; they may also be executed in parallel or individually, according to the processing capacity of the executing device or as needed.
- When the processing functions of the hardware entity (the device of the present invention) described in the above embodiments are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program.
- By executing this program on the computer, the processing functions of the above hardware entity are realized on the computer.
- The program describing this processing content can be recorded on a computer-readable recording medium.
- The computer-readable recording medium may be, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory.
- Specifically, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like as the optical disk; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electrically Erasable and Programmable Read Only Memory) or the like as the semiconductor memory.
- The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. The program may also be distributed by storing it in the storage device of a server computer and transferring it from the server computer to other computers over a network.
- A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own storage device and executes processing according to it. As other execution forms, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program sequentially each time the program is transferred to it from the server computer. The processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to this computer.
- The program in this embodiment includes information that is used for processing by a computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).
- Although the hardware entity is configured in this embodiment by executing a predetermined program on a computer, at least part of the processing may be realized by hardware.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a technique for determining whether attributes in a database are quasi-identifiers. A quasi-identifier determination device includes a quasi-identifier set generation unit. Let X be the set of attributes of a database T that are clearly quasi-identifiers (hereinafter referred to as the first attribute set), and let Y be the set of attributes of T that are quasi-identifier candidates (hereinafter referred to as the second attribute set). For each tuple of an attribute that is an element of X and an attribute that is an element of Y, the unit calculates the degree of relationship between the two attributes of the tuple; when the calculated degree is high, it determines the attribute that is the element of Y to be a quasi-identifier, and it generates, as a quasi-identifier set, the subset of Y whose elements are the attributes determined to be quasi-identifiers.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/018125 WO2021220402A1 (fr) | 2020-04-28 | 2020-04-28 | Quasi-identifier determination device, quasi-identifier determination method, and program |
JP2022518488A JP7380856B2 (ja) | 2020-04-28 | 2020-04-28 | Quasi-identifier determination device, quasi-identifier determination method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/018125 WO2021220402A1 (fr) | 2020-04-28 | 2020-04-28 | Quasi-identifier determination device, quasi-identifier determination method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021220402A1 (fr) | 2021-11-04 |
Family ID: 78373461
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/018125 WO2021220402A1 (fr) | Quasi-identifier determination device, quasi-identifier determination method, and program | 2020-04-28 | 2020-04-28 |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7380856B2 (fr) |
WO (1) | WO2021220402A1 (fr) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010127216A2 (fr) * | 2009-05-01 | 2010-11-04 | Telcordia Technologies, Inc. | Automated determination of quasi-identifiers using program analysis |
JP2017027137A (ja) * | 2015-07-16 | 2017-02-02 | NEC Corporation | Information processing device, information processing method, and program |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8682910B2 (en) * | 2010-08-03 | 2014-03-25 | Accenture Global Services Limited | Database anonymization for use in testing database-centric applications |
2020
- 2020-04-28 WO PCT/JP2020/018125, WO2021220402A1 (fr), Application Filing (active)
- 2020-04-28 JP JP2022518488A, JP7380856B2 (ja), Active
Also Published As
Publication number | Publication date |
---|---|
JP7380856B2 (ja) | 2023-11-15 |
JPWO2021220402A1 (fr) | 2021-11-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US12056583B2 (en) | Target variable distribution-based acceptance of machine learning test data sets | |
US8095770B2 (en) | Method and system for mapping data to a process | |
US8448217B2 (en) | Computer program, method, and system for access control | |
US20120131387A1 (en) | Managing automated and manual application testing | |
US10824460B2 (en) | Information processing apparatus, information processing method for reducing network traffic, and storage medium | |
CN114329367B (zh) | Network disk file tracing method and apparatus, network disk, and storage medium | |
CN112416710A (zh) | User operation recording method and apparatus, electronic device, and storage medium | |
EP3264254B1 (fr) | System and method for simulating a block storage system on an object storage system | |
WO2021220402A1 (fr) | Quasi-identifier determination device, quasi-identifier determination method, and program | |
US10303882B2 (en) | Implementing locale management on PaaS: locale replacement risk analysis | |
US9519592B2 (en) | Stale pointer detection with overlapping versioned memory | |
KR102416336B1 (ko) | Apparatus, method, system, and computer-readable storage medium for managing a blockchain | |
US20140181445A1 (en) | Systems and methods for processing instructions while repairing and providing access to a copied volume of data | |
JP2007133632A (ja) | Security policy setting method and program | |
US9298390B2 (en) | Systems and methods for copying data maintained in a dynamic storage volume and verifying the copied data | |
EP3933635B1 (fr) | Anonymization device, anonymization method, and program | |
WO2021220404A1 (fr) | Anonymized database generation device, anonymized database generation method, and program | |
WO2021065004A1 (fr) | Identification estimation risk evaluation device, identification estimation risk evaluation method, and program | |
US12026393B2 (en) | Apparatus and method for selecting storage location based on data usage | |
US20220004544A1 (en) | Anonymity evaluation apparatus, anonymity evaluation method, and program | |
JP7057564B2 (ja) | Classifier generation device, hypothesis testing device, classifier generation method, hypothesis testing method, and program | |
US11533315B2 (en) | Data transfer discovery and analysis systems and related methods | |
WO2021220403A1 (fr) | Attribute estimation device, attribute estimation method, and program | |
WO2023058151A1 (fr) | Subgraph matching device, subgraph matching method, and program | |
US20240037653A1 (en) | Secure Decentralized System and Method |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20933081; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2022518488; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 20933081; Country of ref document: EP; Kind code of ref document: A1 |