WO2021220402A1 - Quasi-identifier determination device, quasi-identifier determination method, and program - Google Patents

Quasi-identifier determination device, quasi-identifier determination method, and program

Info

Publication number
WO2021220402A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
quasi
identifier
attributes
attribute set
Prior art date
Application number
PCT/JP2020/018125
Other languages
English (en)
Japanese (ja)
Inventor
聡 長谷川
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2020/018125 (WO2021220402A1)
Priority to JP2022518488A (JP7380856B2)
Publication of WO2021220402A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules

Definitions

  • the present invention relates to a technique for concealing database data.
  • as techniques for concealing database data, there are deterministic methods such as those of Non-Patent Document 1 and Non-Patent Document 2, and probabilistic methods such as that of Non-Patent Document 3.
  • as shown in FIG. 1, the target database contains N records (N is an integer of 1 or more), where a record is a data set composed of M attribute values (M is an integer of 2 or more). All of these methods are techniques for concealing data using a combination of attributes, called quasi-identifiers, that can uniquely identify a record.
  • an object of the present invention is to provide a technique for determining whether or not a database attribute is a quasi-identifier.
  • X is a set of attributes of database T that are clearly defined as quasi-identifiers (hereinafter referred to as first attribute set), and Y is a set of attributes of database T that are candidates for quasi-identifiers.
  • for each pair of an attribute that is an element of the first attribute set X and an attribute that is an element of the second attribute set Y, the degree of relationship between the two attributes of the pair is calculated. If the calculated degree of relationship has a value indicating that it is large, the attribute that is an element of the second attribute set Y is determined to be a quasi-identifier. The device includes a quasi-identifier set generation unit that generates, as a quasi-identifier set, the subset of the second attribute set Y whose elements are the attributes determined to be quasi-identifiers.
  • in another aspect, the device includes a third attribute set generation unit that determines the uniformity of each attribute that is an element of the first attribute set X and generates, as a third attribute set X', the subset of the first attribute set X whose elements are the attributes determined to be non-uniform.
  • it also includes a quasi-identifier set generation unit that, for each pair of an attribute that is an element of the third attribute set X' and an attribute that is an element of the second attribute set Y, calculates the degree of relationship between the two attributes of the pair; if the calculated degree of relationship has a value indicating that it is large, determines that the attribute that is an element of the second attribute set Y is a quasi-identifier; and generates, as the quasi-identifier set, the subset of the second attribute set Y whose elements are the attributes determined to be quasi-identifiers.
  • ^ (caret) represents a superscript. For example, x^{y_z} means that y_z is a superscript for x, and x_{y^z} means that y^z is a subscript for x.
  • _ (underscore) represents a subscript. For example, x^{y_z} means that y_z is a superscript for x, and x_{y_z} means that y_z is a subscript for x.
  • Each embodiment of the present invention determines whether or not an attribute other than an attribute that is clear as a quasi-identifier is an attribute that becomes a quasi-identifier with respect to the database.
  • the attribute that is clear as a quasi-identifier means, for example, an attribute known as a quasi-identifier such as age, address, and gender, or an attribute designated as a quasi-identifier by the user.
  • a set X of attributes that is clear as a quasi-identifier and a set Y of attributes that are other than the clear attributes as quasi-identifiers and are candidates for quasi-identifiers are prepared.
  • the degree of relationship indicating the strength of the relationship between an attribute x ∈ X that is clear as a quasi-identifier and an attribute y ∈ Y that is a candidate for a quasi-identifier is calculated, and if the attribute y has a strong relationship with the attribute x (that is, the degree of relationship is greater than, or greater than or equal to, a predetermined threshold), it is determined that the attribute y is a quasi-identifier.
  • a correlation coefficient can be used.
  • the uniformity of the attribute x ∈ X that is clear as a quasi-identifier may be determined in advance. If the distribution of the attribute x is uniform, the attribute x may be excluded from the calculation of the degree of relationship; if the distribution of the attribute x is not uniform, the attribute x may be included in the calculation.
  • an attribute whose distribution is not uniform may also be referred to as an attribute whose distribution is skewed.
  • a statistical hypothesis test can be used to determine uniformity.
  • let T be a database containing N records (N is an integer of 1 or more), each composed of M attribute values (M is an integer of 2 or more).
  • let X be the set of attributes of the database T that are clear as quasi-identifiers (hereinafter referred to as the first attribute set), and let Y be the set of attributes of the database T that are candidates for quasi-identifiers (hereinafter referred to as the second attribute set).
  • the quasi-identifier determination device 100 takes the first attribute set X and the second attribute set Y as inputs, and generates and outputs a quasi-identifier set, which is the subset of the second attribute set Y whose elements are the attributes determined to be quasi-identifiers.
  • FIG. 2 is a block diagram showing the configuration of the quasi-identifier determination device 100.
  • FIG. 3 is a flowchart showing the operation of the quasi-identifier determination device 100.
  • the quasi-identifier determination device 100 includes a quasi-identifier set generation unit 120 and a recording unit 190.
  • the recording unit 190 is a component unit that appropriately records information necessary for processing of the quasi-identifier determination device 100. For example, the first attribute set X and the second attribute set Y are recorded in the recording unit 190.
  • the operation of the quasi-identifier determination device 100 will be described with reference to FIG.
  • the database shown in FIG. 4 will be described as an example.
  • the database has five attributes: gender, age, address, annual income (unit is 10,000), and blood type.
  • the quasi-identifier set generation unit 120 takes the first attribute set X and the second attribute set Y as inputs. For each pair of an attribute that is an element of the first attribute set X and an attribute that is an element of the second attribute set Y, it calculates the degree of relationship between the two attributes of the pair. If the calculated degree of relationship has a value indicating that it is large, the attribute that is an element of the second attribute set Y is determined to be a quasi-identifier; otherwise, it is determined not to be a quasi-identifier. The unit then generates and outputs, as the quasi-identifier set, the subset of the second attribute set Y whose elements are the attributes determined to be quasi-identifiers.
  • a value indicating that the calculated degree of relationship is large means that the calculated degree of relationship is greater than a predetermined threshold value, or greater than or equal to it.
  • a correlation coefficient can be used for the degree of relationship between the two attributes.
  • an appropriate correlation coefficient shall be used according to the type of the two attributes for which the correlation coefficient is calculated.
  • a qualitative attribute is an attribute that takes a non-numerical value as an attribute value, such as gender, and a quantitative attribute is an attribute that takes a numerical value as an attribute value, such as age.
  • the following correlation coefficient is used according to the types of the attribute that is an element of the first attribute set X and the attribute that is an element of the second attribute set Y.
  • when both attributes are qualitative, for example when the attribute that is an element of the first attribute set X is gender and the attribute that is an element of the second attribute set Y is blood type, Cramér's coefficient of association (Cramér's V) is used.
  • when both attributes are quantitative, the Pearson correlation coefficient is used.
  • when one attribute is qualitative and the other is quantitative, for example when the attribute that is an element of the first attribute set X is address, the degree of relationship is calculated using the correlation ratio.
  • since a correlation coefficient can take a value in [-1, 1], the absolute value of the correlation coefficient is calculated, and an attribute that is an element of the second attribute set Y for which this absolute value is larger than (or at least) a predetermined threshold value is determined to be a quasi-identifier.
  • the predetermined threshold value is a standard specified by the user, and may be, for example, 0.7, 0.9, or the like.
  • according to this embodiment, it is possible to determine whether or not an attribute of the database is a quasi-identifier.
  • in particular, it is possible to determine whether or not an attribute other than the attributes known as quasi-identifiers, such as age, address, and gender, is a quasi-identifier.
  • the quasi-identifier determination device 200 takes the first attribute set X and the second attribute set Y as inputs, and generates and outputs a quasi-identifier set, which is the subset of the second attribute set Y whose elements are the attributes determined to be quasi-identifiers.
  • FIG. 5 is a block diagram showing the configuration of the quasi-identifier determination device 200.
  • FIG. 6 is a flowchart showing the operation of the quasi-identifier determination device 200.
  • the quasi-identifier determination device 200 includes a third attribute set generation unit 210, a quasi-identifier set generation unit 120, and a recording unit 190.
  • the recording unit 190 is a component unit that appropriately records information necessary for processing of the quasi-identifier determination device 200.
  • the operation of the quasi-identifier determination device 200 will be described with reference to FIG.
  • the third attribute set generation unit 210 takes the first attribute set X as an input, determines the uniformity of each attribute that is an element of the first attribute set X, and generates and outputs, as the third attribute set X', the subset of the first attribute set X whose elements are the attributes determined to be non-uniform.
  • a statistical hypothesis test can be used to determine uniformity.
  • specifically, a null hypothesis that "the distribution of the attribute to be judged follows the uniform distribution" is set up, and its probability of occurrence (p-value) is calculated. If this probability falls below the specified significance level (for example, 0.05 or 0.01), the null hypothesis is rejected, and the distribution of the attribute to be judged is determined not to be uniform.
  • An appropriate statistical hypothesis test shall be used according to the type of attribute to be judged.
  • the following statistical hypothesis test is used to judge uniformity according to the type of attribute that is an element of the first attribute set X.
  • when the attribute is a qualitative attribute, the uniformity is determined by using the chi-square test or Fisher's exact test.
  • when the attribute is a quantitative attribute, the uniformity is determined by using the Kolmogorov-Smirnov test.
  • for example, if the attribute that is an element of the first attribute set X is a qualitative attribute, the chi-square test may be used, and if it is age, the Kolmogorov-Smirnov test may be used to judge uniformity.
  • the quasi-identifier set generation unit 120 takes the third attribute set X' and the second attribute set Y as inputs. For each pair of an attribute that is an element of the third attribute set X' and an attribute that is an element of the second attribute set Y, it calculates the degree of relationship between the two attributes of the pair. If the calculated degree of relationship has a value indicating that it is large, the attribute that is an element of the second attribute set Y is determined to be a quasi-identifier; otherwise, it is determined not to be a quasi-identifier. The unit then generates and outputs, as the quasi-identifier set, the subset of the second attribute set Y whose elements are the attributes determined to be quasi-identifiers.
  • when one attribute is qualitative and the other is quantitative, the correlation ratio is used to calculate the degree of relationship.
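  • according to this embodiment, it is possible to determine whether or not an attribute of the database is a quasi-identifier.
  • in particular, it is possible to determine whether or not an attribute other than the attributes known as quasi-identifiers, such as age, address, and gender, is a quasi-identifier.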
  • the third attribute which is a subset of the first attribute set X
  • FIG. 7 is a diagram showing an example of a functional configuration of a computer that realizes each of the above-mentioned devices (that is, each node).
  • the processing in each of the above-mentioned devices can be carried out by causing the recording unit 2020 to read a program for causing the computer to function as each of the above-mentioned devices, and operating the control unit 2010, the input unit 2030, the output unit 2040, and the like.
  • the device of the present invention has, for example, as a single hardware entity: an input unit to which a keyboard or the like can be connected; an output unit to which a liquid crystal display or the like can be connected; a communication unit to which a communication device (for example, a communication cable) capable of communicating outside the hardware entity can be connected; a CPU (Central Processing Unit, which may include cache memory, registers, etc.); RAM and ROM as memory; an external storage device such as a hard disk; and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them.
  • a device (drive) or the like capable of reading and writing a recording medium such as a CD-ROM may be provided in the hardware entity.
  • a physical entity equipped with such hardware resources includes a general-purpose computer and the like.
  • the external storage device of the hardware entity stores the program required to realize the above-mentioned functions and the data required for processing this program (the program is not limited to the external storage device; for example, it may be stored in a ROM, which is a read-only storage device). Further, the data obtained by the processing of these programs is appropriately stored in a RAM, an external storage device, or the like.
  • each program stored in the external storage device (or ROM, etc.) and the data necessary for processing each program are read into the memory as needed, and are appropriately interpreted, executed, and processed by the CPU.
  • the CPU realizes a predetermined function (each constituent unit represented as the above-mentioned ... unit, ... means, etc.).
  • the present invention is not limited to the above-described embodiment, and can be appropriately modified without departing from the spirit of the present invention. Further, the processes described in the above-described embodiment need not be executed only in chronological order as described; they may also be executed in parallel or individually, depending on the processing capacity of the device that executes them or as required.
  • when the processing function in the hardware entity (device of the present invention) described in the above embodiment is realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. By executing this program on the computer, the processing function in the above hardware entity is realized on the computer.
  • the program that describes this processing content can be recorded on a computer-readable recording medium.
  • the computer-readable recording medium may be, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, a semiconductor memory, or the like.
  • specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable) / RW (ReWritable), or the like as the optical disk; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electrically Erasable and Programmable Read Only Memory) or the like as the semiconductor memory.
  • the distribution of this program is carried out, for example, by selling, transferring, renting, etc., a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Further, the program may be stored in the storage device of the server computer, and the program may be distributed by transferring the program from the server computer to another computer via the network.
  • a computer that executes such a program first stores, for example, a program recorded on a portable recording medium or a program transferred from a server computer in its own storage device. Then, when the process is executed, the computer reads the program stored in its own storage device and executes the process according to the read program. As another execution form of this program, the computer may read the program directly from a portable recording medium and execute processing according to the program, or it may execute processing according to the received program each time the program is transferred from the server computer to this computer. In addition, the above processing may be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing function only through execution instructions and result acquisition, without transferring the program from the server computer to this computer.
  • the program in this embodiment includes information to be used for processing by a computer and equivalent to the program (data that is not a direct command to the computer but has a property of defining the processing of the computer, etc.).
  • the hardware entity is configured by executing a predetermined program on the computer, but at least a part of these processing contents may be realized in terms of hardware.
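
The uniformity screening attributed above to the third attribute set generation unit 210 can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the implementation of the application. The function names, the toy table, the use of observed categories only for the chi-square test, the use of the observed minimum and maximum as the range of the reference uniform distribution, and the asymptotic p-value used for the Kolmogorov-Smirnov test are all assumptions introduced here. The significance level 0.05 is one of the examples given in the text.

```python
import math
from collections import Counter

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square statistic with df degrees of freedom."""
    t = stat / 2.0
    # Closed-form starting value: Q(1, t) for even df, Q(1/2, t) for odd df.
    a, q = (1.0, math.exp(-t)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(t)))
    while a < df / 2.0:
        # Recurrence Q(a + 1, t) = Q(a, t) + t^a * e^(-t) / Gamma(a + 1).
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi_square_uniform_pvalue(values):
    """Chi-square goodness-of-fit p-value against equal counts per observed category."""
    counts = Counter(values)
    k, n = len(counts), len(values)
    if k < 2:
        return 1.0  # assumption: a constant attribute is treated as uniform
    expected = n / k
    stat = sum((c - expected) ** 2 / expected for c in counts.values())
    return chi2_sf(stat, k - 1)

def ks_uniform_pvalue(values):
    """Approximate Kolmogorov-Smirnov p-value against a uniform on [min, max]."""
    xs = sorted(values)
    n, lo, hi = len(xs), xs[0], xs[-1]
    if hi == lo:
        return 1.0  # assumption: a constant attribute is treated as uniform
    cdf = [(x - lo) / (hi - lo) for x in xs]
    d = max(max((i + 1) / n - f, f - i / n) for i, f in enumerate(cdf))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        return 1.0  # the Kolmogorov tail probability is essentially 1 here
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return min(max(p, 0.0), 1.0)

def third_attribute_set(table, X, alpha=0.05):
    """Return X': the attributes of X whose distribution is judged non-uniform."""
    kept = []
    for x in X:
        values = table[x]
        numeric = all(isinstance(v, (int, float)) for v in values)
        p = ks_uniform_pvalue(values) if numeric else chi_square_uniform_pvalue(values)
        if p < alpha:  # null hypothesis "uniform" rejected, so keep the attribute
            kept.append(x)
    return kept

# Hypothetical toy data, not taken from the figures of the application.
table = {
    "gender": ["M"] * 5 + ["F"] * 5,
    "age": [20, 21, 21, 22, 22, 23, 23, 24, 25, 60],
    "address": ["Tokyo"] * 8 + ["Osaka", "Nagoya"],
}
print(third_attribute_set(table, ["gender", "age", "address"]))
```

In this toy table the gender attribute is perfectly balanced, so it is dropped, while the skewed age and address attributes remain in X'. Because the range of the uniform distribution is estimated from the data, the Kolmogorov-Smirnov p-value here is only a rough guide.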

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a technique for determining whether or not attributes in a database are quasi-identifiers. A quasi-identifier determination device comprises a quasi-identifier set generation unit which, where X denotes a set of attributes that are clearly quasi-identifiers (hereinafter referred to as the first attribute set) among the attributes of a database T, and Y denotes a set of attributes that are quasi-identifier candidates (hereinafter referred to as the second attribute set) among the attributes of the database T, calculates, for a pair consisting of an attribute that is an element of the first attribute set X and an attribute that is an element of the second attribute set Y, a degree of relationship between the two attributes of the pair; determines, when the calculated degree of relationship has a large value, that the attribute that is the element of the second attribute set Y is a quasi-identifier; and generates, as a quasi-identifier set, a subset of the second attribute set Y whose elements are the attributes determined to be quasi-identifiers.
PCT/JP2020/018125 2020-04-28 2020-04-28 Quasi-identifier determination device, quasi-identifier determination method, and program WO2021220402A1 (fr)
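
The determination summarized in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application. The function names and the toy table are hypothetical. Attribute types are inferred from the Python value types, the correlation ratio is returned as eta (the square root of between-group variation over total variation), and a candidate attribute is accepted as soon as one pair reaches the threshold; all three choices are assumptions made here. The threshold 0.7 is one of the example values mentioned in the description.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient for two quantitative attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def cramers_v(xs, ys):
    """Cramér's coefficient of association for two qualitative attributes."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0
    chi2 = sum((cells[(r, c)] - rows[r] * cols[c] / n) ** 2 / (rows[r] * cols[c] / n)
               for r in rows for c in cols)
    return math.sqrt(chi2 / (n * k))

def correlation_ratio(cats, vals):
    """Correlation ratio (eta) between a qualitative and a quantitative attribute."""
    mean = sum(vals) / len(vals)
    total = sum((v - mean) ** 2 for v in vals)
    if total == 0:
        return 0.0
    groups = {}
    for c, v in zip(cats, vals):
        groups.setdefault(c, []).append(v)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    return math.sqrt(between / total)

def is_quantitative(values):
    # Assumption: numeric Python values mark a quantitative attribute.
    return all(isinstance(v, (int, float)) for v in values)

def degree_of_relationship(xs, ys):
    """Pick the measure by attribute type and return a value in [0, 1]."""
    qx, qy = is_quantitative(xs), is_quantitative(ys)
    if qx and qy:
        return abs(pearson(xs, ys))  # the absolute value is compared with the threshold
    if not qx and not qy:
        return cramers_v(xs, ys)
    return correlation_ratio(ys, xs) if qx else correlation_ratio(xs, ys)

def quasi_identifier_set(table, X, Y, threshold=0.7):
    """Subset of Y strongly related to at least one attribute of X."""
    return [y for y in Y
            if any(degree_of_relationship(table[x], table[y]) >= threshold for x in X)]

# Hypothetical toy data, not taken from the figures of the application.
table = {
    "gender": ["M", "F", "M", "F", "M", "F"],
    "age": [25, 32, 41, 38, 56, 47],
    "address": ["Tokyo", "Osaka", "Tokyo", "Osaka", "Tokyo", "Nagoya"],
    "annual_income": [300, 420, 560, 500, 780, 640],
    "blood_type": ["A", "B", "O", "A", "B", "O"],
}
X = ["gender", "age", "address"]
Y = ["annual_income", "blood_type"]
print(quasi_identifier_set(table, X, Y))
```

In this toy table annual income tracks age closely, so it is returned as a quasi-identifier, while blood type stays below the threshold against every attribute of X. With so few records the association measures are unstable, so the numbers only illustrate the flow of the computation.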

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/018125 WO2021220402A1 (fr) 2020-04-28 2020-04-28 Quasi-identifier determination device, quasi-identifier determination method, and program
JP2022518488A JP7380856B2 (ja) 2020-04-28 2020-04-28 Quasi-identifier determination device, quasi-identifier determination method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/018125 WO2021220402A1 (fr) 2020-04-28 2020-04-28 Quasi-identifier determination device, quasi-identifier determination method, and program

Publications (1)

Publication Number Publication Date
WO2021220402A1 (fr) 2021-11-04

Family

ID=78373461

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/018125 WO2021220402A1 (fr) 2020-04-28 2020-04-28 Quasi-identifier determination device, quasi-identifier determination method, and program

Country Status (2)

Country Link
JP (1) JP7380856B2 (fr)
WO (1) WO2021220402A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010127216A2 (fr) * 2009-05-01 2010-11-04 Telcordia Technologies, Inc. Automated determination of quasi-identifiers using program analysis
JP2017027137A (ja) * 2015-07-16 2017-02-02 日本電気株式会社 Information processing device, information processing method, and program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8682910B2 (en) * 2010-08-03 2014-03-25 Accenture Global Services Limited Database anonymization for use in testing database-centric applications

Also Published As

Publication number Publication date
JP7380856B2 (ja) 2023-11-15
JPWO2021220402A1 (fr) 2021-11-04

Similar Documents

Publication Publication Date Title
US12056583B2 (en) Target variable distribution-based acceptance of machine learning test data sets
US8095770B2 (en) Method and system for mapping data to a process
US8448217B2 (en) Computer program, method, and system for access control
US20120131387A1 (en) Managing automated and manual application testing
US10824460B2 (en) Information processing apparatus, information processing method for reducing network traffic, and storage medium
CN114329367B (zh) Network disk file tracing method and apparatus, network disk, and storage medium
CN112416710A (zh) User operation recording method and apparatus, electronic device, and storage medium
EP3264254B1 (fr) System and method for simulating a block storage system on an object storage system
WO2021220402A1 (fr) Quasi-identifier determination device, quasi-identifier determination method, and program
US10303882B2 (en) Implementing locale management on PaaS: locale replacement risk analysis
US9519592B2 (en) Stale pointer detection with overlapping versioned memory
KR102416336B1 (ko) Apparatus, method, system, and computer-readable storage medium for managing a blockchain
US20140181445A1 (en) Systems and methods for processing instructions while repairing and providing access to a copied volume of data
JP2007133632A (ja) Security policy setting method and program
US9298390B2 (en) Systems and methods for copying data maintained in a dynamic storage volume and verifying the copied data
EP3933635B1 (fr) Anonymization device, anonymization method, and program
WO2021220404A1 (fr) Anonymized database generation device, anonymized database generation method, and program
WO2021065004A1 (fr) Identification estimation risk evaluation device, identification estimation risk evaluation method, and program
US12026393B2 (en) Apparatus and method for selecting storage location based on data usage
US20220004544A1 (en) Anonymity evaluation apparatus, anonymity evaluation method, and program
JP7057564B2 (ja) Classifier generation device, hypothesis testing device, classifier generation method, hypothesis testing method, and program
US11533315B2 (en) Data transfer discovery and analysis systems and related methods
WO2021220403A1 (fr) Attribute estimation device, attribute estimation method, and program
WO2023058151A1 (fr) Subgraph matching device, subgraph matching method, and program
US20240037653A1 (en) Secure Decentralized System and Method

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20933081; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2022518488; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 20933081; Country of ref document: EP; Kind code of ref document: A1