WO2011145401A1 - Personal information anonymization device (個人情報匿名化装置) - Google Patents
- Publication number
- WO2011145401A1 (PCT/JP2011/058590)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hierarchical tree
- personal information
- node
- attribute
- tree
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6263—Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2145—Inheriting rights or properties, e.g., propagation of permissions or restrictions within a hierarchy
Definitions
- The present invention relates to the anonymization of personal information.
- The simplest way to anonymize personal information is to remove the information that can identify an individual, or to make that information ambiguous.
- An example of the former is removing the name and address. Examples of the latter are converting an address into prefecture units or converting an age into 10-year increments.
- A tree structure that arranges attribute values according to their degree of ambiguity is called a generalized hierarchical tree.
- Even so, an individual may be identified by combining a plurality of attributes related to the individual. For example, an individual can be identified if the combination of a prefecture-level address and a 10-year age band is extremely rare. Anonymization should therefore remove the possibility of identification more reliably.
- One technique for removing identifiability is anonymization that sets a threshold and generates anonymous data guaranteeing that every combination of attribute values included in the personal information data occurs in the data at least the threshold number of times.
- The present invention also belongs to this kind of anonymization technology. This type of anonymization technique is described in Non-Patent Document 1.
- Non-Patent Document 1 states that, by obfuscating attribute values in personal information data using a generalized hierarchical tree, every combination of attribute values appearing in the disclosed data is guaranteed to appear at least a threshold number of times.
- Non-Patent Document 1 requires that a generalized hierarchical tree defining the degree of ambiguity be defined separately for each attribute. Moreover, because all candidates that achieve the threshold are output, the anonymous data to be used must be selected, and the means of judging the relative usefulness of the anonymous data cannot be automated.
- The present invention has been made in consideration of these circumstances, and protects personal information appropriately while reducing the operating cost of personal information anonymization.
- An example disclosed is a personal information input means for inputting one or more pieces of personal information consisting of attribute values for each attribute, and one attribute is selected, and the attribute value for each attribute value that appears in the selected attribute.
- This is a personal information anonymization device having means for converting to personal information.
- Example 1: a figure showing a configuration example of the computer.
- Example 1: a figure showing an example of the personal information table.
- Example 1: a figure showing an example of the minimum equivalence number information.
- Example 1: a figure showing an example of the attribute type information.
- Example 1: a figure showing an example of the generalized hierarchical tree table.
- Example 1: a figure showing an example of the generalized hierarchical tree table.
- Example 1: a figure showing an example of the generalized hierarchical tree table.
- Example 1: a figure showing an example of the generalized hierarchical tree table.
- Example 1: a figure showing an example of the anonymous information table.
- FIG. 6 is a diagram illustrating an operation example in the first embodiment.
- FIG. 6 is a diagram illustrating an operation example in the first embodiment.
- FIG. 6 is a diagram illustrating an operation example in the first embodiment.
- FIG. 6 is a diagram illustrating an operation example in the first embodiment.
- FIG. 6 is a diagram illustrating an operation example in the first embodiment.
- Example 2: a figure showing a configuration example of the computer.
- Example 2: a figure showing an example of the generation
- Example 2: a figure showing an operation example.
- Example 2: a figure showing an operation example.
- Example 3: a figure showing a configuration example of the computer.
- Example 3: a figure showing an example of the user-defined hierarchical tree table.
- Example 3: a figure showing an example of a user-defined hierarchical tree and a generalized hierarchical tree based on the user-defined hierarchical tree.
- Example 3: a figure showing an example of a user-defined hierarchical tree and a generalized hierarchical tree based on the user-defined hierarchical tree.
- Example 3: a figure showing an example of a user-defined hierarchical tree and a generalized hierarchical tree based on the user-defined hierarchical tree.
- Example 3: a figure showing an operation example.
- Example 3: a figure showing an operation example.
- Example 3: a figure showing an operation example.
- Example 3: a figure showing an operation example.
- Example 3: a figure showing an operation example.
- Example 3: a figure showing an operation example.
- Example 3: a figure showing an operation example.
- Personal information in the present embodiment refers to information about an individual that can identify a specific individual by name, date of birth, or other description. It also includes information that can easily be collated with other information and thereby identify a specific individual.
- Personal information anonymization refers to the process of converting personal information so that the information subject cannot be identified. Re-encoding means replacing an attribute value that describes an attribute of an individual with a more ambiguous concept.
- FIG. 1 shows an example of configuring the device on a computer.
- The computer 100 is an arbitrary information processing apparatus such as a PC (Personal Computer), a server, or a workstation.
- The computer 100 includes a CPU (Central Processing Unit) 101, a memory 102, a storage 103, an input device 104, an output device 105, a communication device 106, and the like. These are connected to each other by an internal communication line 107 such as a bus.
- The storage 103 is, for example, a storage medium such as a CD-R (Compact Disc Recordable), a DVD-RAM (Digital Versatile Disk Random Access Memory), or a silicon disk, a drive device for such a storage medium, an HDD (Hard Disk Drive), or the like.
- The storage 103 stores a personal information table 131, an anonymous information table 132, minimum equivalence number information 133, attribute type information 134, a program 151, and the like.
- The personal information table 131 stores personal information regarding a plurality of individuals. In this embodiment, each piece of personal information consists of item values for a plurality of items.
- The anonymous information table 132 stores the result of anonymizing the personal information table 131 according to the present invention.
- The minimum equivalence number information 133 stores a threshold value.
- The attribute type information 134 stores the information type of each attribute of the personal information table 131.
- The program 151 realizes the functions described later.
- The input device 104 is, for example, a keyboard, a mouse, a scanner, a microphone, or the like.
- The output device 105 is a display, a printer, a speaker, or the like.
- The communication device 106 is, for example, a LAN (Local Area Network) board or the like, and is connected to a communication network (not shown).
- The CPU 101 implements the generalized hierarchical tree automatic generation unit 121 and the re-encoding unit 122 by loading the program 151 into the memory 102 and executing it.
- The re-encoding unit 122 implements a loss information amount measuring unit 123 as internal processing as necessary.
- The generalized hierarchical tree automatic generation unit 121 receives the personal information table 131 and the attribute type information 134 as input and acquires the frequency of every attribute value of each attribute of the personal information table 131. From the acquired frequency information and the attribute type obtained from the attribute type information 134, it creates a Huffman code tree, a Shannon-Fano code tree, or a Hu-Tucker code tree, and stores the created tree in the generalized hierarchical tree table 135 as a generalized hierarchical tree.
- The re-encoding unit 122 receives the personal information table 131, the minimum equivalence number information 133, and the generalized hierarchical tree table 135 as input. It re-encodes the attribute values according to the generalized hierarchical tree of each attribute obtained from the generalized hierarchical tree table 135, so that every record present in the table occurs at least as many times as the value stored in the minimum equivalence number information 133, and outputs the result to the anonymous information table 132.
- The result may also be output to the output device 105.
- The loss information amount measuring unit 123 quantitatively evaluates the amount of information lost through re-encoding of attribute values, and is called from the re-encoding unit 122 as necessary.
- The personal information table 131 has a plurality of records.
- One record represents information related to one person.
- Each record is represented as a tuple of the attribute values of attribute 201, attribute 202, and attribute 203.
- The first line of the table shown in FIG. 2 gives the attribute names.
- The attribute 201, the attribute 202, and the attribute 203 represent an individual's address, age, and country of origin or region, respectively.
- The items of personal information are not limited to those shown in FIG. 2, and may be arbitrary.
- The total number of individuals (total number of records) and the number of attributes may be arbitrary.
- The computer 100 anonymizes information that can identify an individual.
- Information that can identify an individual is not limited to information that directly identifies an individual, such as a name. For example, an individual can sometimes be identified by a combination of gender, age, and address.
- The attributes to be anonymized are determined in advance by the system operator. In the example of FIG. 2, the system operator has determined that a combination of address, age, and country of origin or region leads to identification of an individual, and anonymizes these three attributes. That is, the attributes of the personal information as a whole need not be limited to address, age, and country of origin or region.
- In the example of FIG. 3, the minimum equivalence number 301 is five.
- The minimum equivalence number 301 is the value for which, if the number of records having the same attribute value tuple is equal to or greater than it, identifying an individual can be regarded as difficult even if the data is disclosed.
- The value of the minimum equivalence number 301 is not limited to five, and may be arbitrary.
- The attribute type information 134 defines the information type of each attribute to be anonymized, in order to specify how the generalized hierarchical tree of that attribute is constructed.
- In this example, the generalized hierarchical tree of the attribute “address” 401 is of the character string processing type, that of the attribute “age” 402 is of the order preservation type, and that of the attribute “country of origin or region” 403 is of the other type.
- For the character string processing type, the kind of character string processing is also designated.
- The order preservation type refers to the case in which the order of the leaves of the constructed generalized hierarchical tree is determined in advance, and the other type refers to attributes that are neither the character string processing type nor the order preservation type.
- An example of the generalized hierarchical tree table 135 is shown with reference to FIGS. 5a, 5b, and 5c.
- The generalized hierarchical tree table 135 is generated by the generalized hierarchical tree automatic generation unit 121 with reference to the personal information table 131 and the attribute type information 134.
- A conceptual diagram of the generalized hierarchical tree 135-a1 created for the attribute “address” 201 is shown with reference to FIG. 5A-1, a method of saving the generalized hierarchical tree 135-a1 on the storage is described with reference to FIG. 5A-2, and a method of managing the generalized hierarchical tree 135-a1 on the memory is described with reference to FIG. 5 (a-3).
- The generalized hierarchical tree 135-a1 for the attribute “address” 201 is represented as a tree structure composed of a plurality of nodes and branches.
- A branch represents a parent-child relationship between nodes.
- A branch is drawn as an arrow; the node at the tail of the arrow is the parent, and the node at the head of the arrow is the child.
- For example, the node 501 is a parent and the node 502 is its child.
- A node that has no parent is called a root, and a node that has no children is called a leaf.
- The node 501 is the root, and the nodes 503 and 504 are leaves.
- A node that can be reached by following parents is called an ancestor, and a node that can be reached by following children is called a descendant.
- Nodes that are not leaves are called internal nodes.
- The node 501 and the node 502 are internal nodes.
- A label 5031 and a frequency 5032 are associated with each node.
- For a leaf, the original attribute value is the label, and the number of occurrences of that attribute value in the personal information table is the frequency.
- For example, the leaf 503 is labeled “Bunkyo-ku, Tokyo”, and its number of appearances, 35, is associated as the frequency.
- The label of an internal node is an abstract concept that can represent all of its children, and its frequency is the sum of the frequencies of all of its children.
- According to the attribute type information 134, the attribute “address” 201 is of the forward-matching character string processing type, so the node 503 “Bunkyo-ku, Tokyo” and the node 504 “Toshima-ku, Tokyo” share the same parent node 502.
- “Tokyo” is assigned as the label of the node 502.
- The frequency of the node 502 is the sum of the frequencies of all of its children.
- The generalized hierarchical tree 135-a1 is obtained by applying forward-matching character string processing to all attribute values and outputting the resulting generalization hierarchy as a tree structure.
- FIG. 5A-2 shows an example of a method for saving the generalized hierarchical tree 135-a1 on the storage. The tree is stored using a relational database or the like.
- The table 135-a2 is an example of saving the generalized hierarchical tree 135-a1 as a table in a relational database.
- The first row 511 of the table 135-a2 gives the label of each column, and each record from the second row onward corresponds to one node. The left column holds the label of the node, the center column holds the label of the node's parent, and the right column holds the frequency of the node.
- The record 512 corresponds to the node 501. Because the node 501 is the root, it has no parent. In this case, the value NULL is stored in the center column, and the frequency 205 of the node 501 is stored in the right column. Similarly, the record 513 corresponds to the node 502.
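The three-column layout described above (node label, parent label, frequency) can be sketched with an in-memory SQLite table. This is only an illustration under stated assumptions: the table name `tree`, the column names, the root label "*", and every value other than the frequency 35 of "Bunkyo-ku, Tokyo" and the root frequency 205 are hypothetical.

```python
import sqlite3

# Hypothetical node rows: (label, parent label, frequency).
# Only the leaf "Bunkyo-ku, Tokyo" = 35 and the root frequency 205 come from
# the text; the root label "*" and every other value are made up.
ROWS = [
    ("*", None, 205),                      # root: parent is NULL
    ("Tokyo", "*", 85),
    ("Bunkyo-ku, Tokyo", "Tokyo", 35),
    ("Toshima-ku, Tokyo", "Tokyo", 50),
    ("Yokohama City, Kanagawa Prefecture", "*", 120),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (label TEXT PRIMARY KEY, parent TEXT, frequency INTEGER)"
)
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", ROWS)


def root_of(conn):
    """Return (label, frequency) of the node whose parent column is NULL."""
    return conn.execute(
        "SELECT label, frequency FROM tree WHERE parent IS NULL"
    ).fetchone()


def children_of(conn, label):
    """Return the labels of all nodes whose parent column equals `label`."""
    cur = conn.execute(
        "SELECT label FROM tree WHERE parent = ? ORDER BY label", (label,)
    )
    return [row[0] for row in cur]
```

Storing the parent label rather than a child list keeps one row per node, which matches the description of the table 135-a2; the children of a node are then recovered with a query on the parent column.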
- FIG. 5 (a-3) shows a method of managing the generalized hierarchical tree 135-a1 on the memory.
- The data structure 521 is an example of a data structure for managing the node 501.
- The data structure includes a pointer 5211, a pointer 5212 pointing to the parent, a child pointer list 5213, a label 5214 for the node, and a frequency 5215 for the node. Because the data structure 521 representing the node 501 corresponds to the root, its parent pointer is NULL. Similarly, because the node 503 is a leaf, the child pointer list of the data structure representing the node 503 is empty.
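A node record of this kind can be sketched as a small Python class. The class name `Node`, its field names, and the helper methods are assumptions chosen for illustration; object references stand in for the pointers of the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity comparison; repr=False on `parent` avoids an
# infinite parent -> children -> parent loop when a node is printed.
@dataclass(eq=False)
class Node:
    label: str                                   # cf. label 5214
    frequency: int = 0                           # cf. frequency 5215
    parent: Optional["Node"] = field(default=None, repr=False)  # cf. pointer 5212
    children: List["Node"] = field(default_factory=list)        # cf. list 5213

    def add_child(self, child: "Node") -> "Node":
        """Attach `child` under this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def is_root(self) -> bool:
        return self.parent is None       # a root has no parent

    def is_leaf(self) -> bool:
        return not self.children         # a leaf has an empty child list


# Hypothetical usage; the root label "*" and the frequency 85 are made up.
root = Node("*", 205)
tokyo = root.add_child(Node("Tokyo", 85))
bunkyo = tokyo.add_child(Node("Bunkyo-ku, Tokyo", 35))
```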
- The attribute “age” 202 is of the order preservation type.
- The order preservation type is an information type that preserves the order of the leaves. In the generalized hierarchical tree 135-b1, all leaves keep their order of magnitude from left to right. The order is not limited to order of magnitude; an arbitrary order such as lexicographic order or a manually given order is also applicable.
- The frequency information of the attribute values of the attribute is used to construct a generalized hierarchical tree of the order preservation type.
- The table 135-b2 shows the frequency of each attribute value of the attribute “age” 202: 50 records take the value “20”, 35 records take “25”, 25 records take “27”, 40 records take “33”, and 55 records take “38”, and no record takes any other attribute value.
- Here the attribute values are limited to five kinds, but this is not a requirement.
- The labels of the internal nodes can be given in the form of a range. For example, the label “20-27” can be given to the node 531. In this case, the ranges represented by the labels of two nodes that are not in an ancestor-descendant relationship do not overlap.
- FIG. 6 shows the result of automatically generating a generalized hierarchical tree when the attribute type of the attribute “age” is “other”, described later.
- Labels are assigned to the internal nodes in the form of ranges.
- The labels of the nodes 601 and 602 are “25-38” and “33”, respectively; although these nodes are not in an ancestor-descendant relationship, “33” is included in the range “25-38”.
- A generalized hierarchical tree for the attribute “country of origin or region” 203 is constructed using FIG.
- According to the attribute type information 134, the attribute type of the attribute “country of origin or region” 203 is “other”, that is, neither the character string processing type nor the order preservation type.
- In this case, a generalized hierarchical tree is constructed using only the frequency information of all attribute values of the attribute.
- The tree 135-c1 is an example of a generalized hierarchical tree constructed using the frequency information 135-c2.
- The label assigned to an internal node is a list of the labels of the leaves below that internal node.
- For example, the node 541 is assigned the label {China, France, Germany, USA, UK}, which is interpreted as “China, France, Germany, USA, or UK”.
- Each attribute value is re-encoded to a node label of the generalized hierarchical tree for that attribute.
- The re-encoding destination is limited to an ancestor of the leaf corresponding to the original attribute value.
- Every attribute value tuple over all attributes that appears in the anonymous information table 132 must occur in the anonymous information table as a whole at least as many times as the number given by the minimum equivalence number information 133. For example, there must be five or more records having the data tuple 701 (Yokohama City, Kanagawa Prefecture, 33-38, Japan), five being the value indicated in the minimum equivalence number information 133.
- In this example, node labels that are in a descendant-ancestor relationship do not coexist in the anonymous information table 132, but this is not a limitation. For example, an age cell re-encoded as “20-27” and an age cell re-encoded as “25-27” may coexist.
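The restriction that a value may only be re-encoded to an ancestor of its own leaf can be sketched as follows. The parent map below is a hypothetical age tree assembled from the range labels mentioned in the text ("20-27", "25-27", "33-38"); the root label "20-38" and the function names are assumptions.

```python
# Hypothetical child -> parent map for an "age" tree; the shape is assumed.
PARENT = {
    "20": "20-27",
    "25": "25-27",
    "27": "25-27",
    "25-27": "20-27",
    "33": "33-38",
    "38": "33-38",
    "20-27": "20-38",
    "33-38": "20-38",
}


def ancestors(label):
    """Return the labels on the path from `label` up to the root, nearest first."""
    chain = []
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain


def re_encode(value, target):
    """Replace `value` by `target`, which must be `value` itself or an ancestor."""
    if target != value and target not in ancestors(value):
        raise ValueError(f"{target!r} is not an ancestor of {value!r}")
    return target
```

For instance, `re_encode("27", "25-27")` and `re_encode("20", "20-27")` are both allowed, so cells labeled "25-27" and "20-27" can coexist in one column, whereas `re_encode("33", "20-27")` is rejected.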
- The generalized hierarchical tree automatic generation unit 121 refers to the personal information table 131 and the attribute type information 134, automatically generates a generalized hierarchical tree, and stores the result in the generalized hierarchical tree table 135 (S801).
- The re-encoding unit 122 refers to the personal information table 131, the minimum equivalence number information 133, and the generalized hierarchical tree table 135, re-encodes the data so that every record occurs at least as many times as the minimum equivalence number 301 (five in this example), and stores the result in the anonymous information table 132 (S802).
- Here S801 and S802 are executed consecutively, but as the above description shows, these steps can be separated. That is, the timing at which the generalized hierarchical tree automatic generation unit 121 performs S801 and the timing at which the re-encoding unit 122 performs S802 can differ. However, S801 must be executed before S802. Shifting the execution timings of S801 and S802 gives the following advantages.
- The user of the computer 100 can browse the generalized hierarchical tree automatically generated in S801 and make corrections to it.
- For example, the tree can be revised so that only the internal nodes that are candidates for re-encoding destinations remain, which can increase the speed of S802.
- Modifications other than the above are also possible, and the user can even replace the generalized hierarchical tree with a structure unrelated to the automatically generated tree.
- Let m be the total number of attributes (number of columns) in the personal information table 131.
- The columns of the personal information table 131 are called the 0th column, the 1st column, ..., the (m-1)th column in order from the left.
- The personal information table 131 is read into the memory 102 (S901), and the parameter j is initialized to 0 (S902).
- The attribute type of the jth attribute is acquired from the attribute type information 134 (S904), and processing branches according to the result (S905).
- If the attribute type of the attribute is “character string processing type” in S905, all attribute values of the jth attribute appearing in the personal information table 131 are enumerated without omission (S911). Specifically, while scanning all records, it is determined whether the attribute value of the jth attribute has already been enumerated.
- A data structure such as the set provided by the standard library of the C++ programming language may be used to determine whether a value has already been enumerated.
- The designated character string processing is applied to the enumerated attribute values, the inclusion relations are extracted, and a tree is constructed based on those inclusion relations (S912).
- The method of extracting the inclusion relations depends on which of the various known character string processing methods is used. For example, with forward-matching character string processing as in the “address” example, the matching portions are all cut out so that longer matches lie closer to the leaves and shorter matches lie closer to the root.
- Two attribute values whose character strings share a matching portion become leaves of a subtree rooted at that matching portion, and the matching character string can be assigned as the label of the root of the subtree. For character string processing types other than the forward-matching type, labels are likewise assigned to all nodes in an appropriate manner.
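A minimal sketch of forward-matching tree construction follows, assuming that attribute values are split into tokens on a separator and that the largest administrative unit comes first (as in Japanese address order). The function names, the nested-dict node format, the root label "*", and the sample values are all hypothetical; the text does not specify this procedure.

```python
def common_prefix_len(items):
    """Length of the longest token prefix shared by every tuple in `items`."""
    n = min(len(t) for t in items)
    i = 0
    while i < n and all(t[i] == items[0][i] for t in items):
        i += 1
    return i


def _build(items, depth, label, sep):
    # Every tuple in `items` shares its first `depth` tokens.
    node = {"label": label, "children": []}
    groups = {}
    for t in items:
        key = t[depth] if len(t) > depth else None   # None: value equals the prefix
        groups.setdefault(key, []).append(t)
    for group in groups.values():
        if len(group) == 1:
            # A value with no further shared prefix becomes a leaf.
            node["children"].append({"label": sep.join(group[0]), "children": []})
        else:
            # Longer matches end up deeper: recurse below the shared prefix.
            k = common_prefix_len(group)
            node["children"].append(_build(group, k, sep.join(group[0][:k]), sep))
    return node


def build_prefix_tree(values, sep=" "):
    """Build a tree whose internal labels are shared token prefixes."""
    tokens = sorted({tuple(v.split(sep)) for v in values})
    return _build(tokens, 0, "*", sep)


# Hypothetical values written prefix-first.
tree = build_prefix_tree(
    ["Tokyo Bunkyo-ku", "Tokyo Toshima-ku", "Kanagawa Yokohama-shi"]
)
```

Working on tokens rather than single characters is a design choice of this sketch: it avoids spurious internal nodes such as "To" for "Tokyo" and "Toyama".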
- If the attribute type of the attribute is “order preservation type” in S905, frequency information of all attribute values of the jth attribute is acquired (S921). Specifically, while scanning all records, it is determined whether the attribute value of the jth attribute of the record currently being scanned has already been enumerated. If it has, the counter that counts the frequency of that attribute value is incremented; if it has not, the counter for that attribute value is set to 1.
- For this purpose, a map provided by the C++ standard library can be used. A map associates a value with each element of the set described above. The set element is called the key, and the associated data is called the value.
- The frequency of each attribute value is stored in the map.
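The counting procedure of S921 can be sketched as below, with a Python `dict` standing in for the C++ map. The function name and the sample records are hypothetical.

```python
def attribute_frequencies(records, j):
    """Count how often each value of the j-th attribute occurs in `records`."""
    freq = {}                      # key: attribute value, value: occurrence count
    for record in records:
        value = record[j]
        if value in freq:          # already enumerated: increment its counter
            freq[value] += 1
        else:                      # first occurrence: set its counter to 1
            freq[value] = 1
    return freq


# Hypothetical records: (address, age, country of origin or region).
records = [
    ("Bunkyo-ku, Tokyo", "25", "Japan"),
    ("Toshima-ku, Tokyo", "25", "Japan"),
    ("Bunkyo-ku, Tokyo", "33", "USA"),
]
```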
- A Hu-Tucker code tree is constructed using the frequency information of the jth attribute acquired above, and it is used as the generalized hierarchical tree of the attribute (S922).
- Labels are assigned to the nodes by an appropriate method. In the case of the “order preservation type”, as described above, the label of an internal node can be assigned as a simple range of attribute values. After the process of S922 finishes, processing moves to S941, described later.
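The following sketch builds an order-preserving binary tree with range labels. Note the substitution: it is not the Hu-Tucker algorithm itself but a simple O(n^3) dynamic program over contiguous leaf intervals, which minimizes the same quantity (the sum of frequency times leaf depth). The function name and the node format are assumptions; the input is the age frequency data quoted in the text.

```python
def alphabetic_tree(values, freqs):
    """Return (tree, cost) for a non-empty list of ordered values.

    cost is the minimum of sum(frequency * leaf depth) over all binary trees
    that keep the leaves in the given left-to-right order.
    """
    n = len(values)
    prefix = [0]
    for f in freqs:
        prefix.append(prefix[-1] + f)
    cost = [[0] * n for _ in range(n)]       # cost[i][j]: best cost for leaves i..j
    split = [[None] * n for _ in range(n)]   # split[i][j]: last leaf of left subtree
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best, best_r = min(
                (cost[i][r] + cost[r + 1][j], r) for r in range(i, j)
            )
            # Joining two subtrees pushes every leaf in i..j one level deeper.
            cost[i][j] = best + (prefix[j + 1] - prefix[i])
            split[i][j] = best_r

    def build(i, j):
        if i == j:
            return {"label": str(values[i]), "frequency": freqs[i], "children": []}
        r = split[i][j]
        return {
            "label": f"{values[i]}-{values[j]}",      # range label of an internal node
            "frequency": prefix[j + 1] - prefix[i],   # sum of the leaf frequencies
            "children": [build(i, r), build(r + 1, j)],
        }

    return build(0, n - 1), cost[0][n - 1]


# Age values and frequencies as given for the table 135-b2.
age_tree, age_cost = alphabetic_tree([20, 25, 27, 33, 38], [50, 35, 25, 40, 55])
```

On this input the sketch yields internal nodes labeled "20-38", "20-27", "25-27", and "33-38", so the range labels of nodes that are not in an ancestor-descendant relationship do not overlap.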
- A Huffman code tree or a Shannon-Fano code tree is constructed using the frequency information of the jth attribute acquired above, and it is used as the generalized hierarchical tree of the attribute (S932). Which code tree is used is determined in advance by the designer of the computer 100.
- The Huffman code tree construction method is described in documents such as “T. S. Han and K. Kobayashi, “Mathematics of Information and Coding,” American Mathematical Society, pp. 99-105, 2002”.
- The tree construction method described in the document “T. S. Han and K. Kobayashi, “Mathematics of Information and Coding,” American Mathematical Society, pp. 95-96, 2002” is used. After the process of S932 finishes, processing moves to S941, described later.
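As an illustration of the Huffman option, the sketch below builds a Huffman code tree with the standard-library `heapq` module and labels each internal node with the list of leaf labels beneath it. The function names, the node format, and the country frequencies are hypothetical; the frequency information 135-c2 is not reproduced in the text.

```python
import heapq
import itertools


def huffman_tree(freq):
    """Build a Huffman tree from a non-empty {value: frequency} mapping."""
    tie = itertools.count()   # tie-breaker so the heap never compares dict nodes
    heap = [
        (f, next(tie), {"label": [v], "frequency": f, "children": []})
        for v, f in sorted(freq.items())
    ]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, b = heapq.heappop(heap)
        node = {
            "label": sorted(a["label"] + b["label"]),   # list of leaf labels below
            "frequency": f1 + f2,
            "children": [a, b],
        }
        heapq.heappush(heap, (f1 + f2, next(tie), node))
    return heap[0][2]


def depth_of(node, value, depth=0):
    """Depth of the leaf labeled [value], or None if it is absent."""
    if not node["children"]:
        return depth if node["label"] == [value] else None
    for child in node["children"]:
        d = depth_of(child, value, depth + 1)
        if d is not None:
            return d
    return None


# Hypothetical frequencies for "country of origin or region".
country_tree = huffman_tree(
    {"Japan": 100, "USA": 40, "China": 30, "UK": 20, "France": 10, "Germany": 5}
)
```

With these made-up frequencies the rarest values (France, Germany) sit deepest and the most frequent value (Japan) sits directly under the root.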
- Because j increases monotonically and necessarily becomes greater than or equal to m, the generalized hierarchical trees for all attributes can be constructed.
- An example of the method by which the generalized hierarchical tree automatic generation unit 121 updates the node frequency information of the generalized hierarchical tree in step S941 will be described with reference to FIG. That is, the generalized hierarchical tree automatic generation unit 121 performs each process of FIG.
- FIG. 10A shows the overall flow of updating the node frequencies of the generalized hierarchical tree, and the routine of FIG. 10B is used recursively inside it.
- Step S1001 is exactly the same as S921.
- The acquired frequency information is assigned to the corresponding leaves of the generalized hierarchical tree of the jth attribute (S1002). Specifically, the frequency acquired in S1001 is assigned to the frequency 5215 of the corresponding leaf data structure. This is done for all leaves.
- The routine in FIG. 10B is executed with the root of the jth generalized hierarchical tree as its argument (S1003).
- The routine in FIG. 10B will now be described.
- The routine of FIG. 10B takes a node as its argument.
- All the children of the argument node are acquired, and their total number is set to p (S1004).
- The p child nodes are referred to as the 0th, 1st, ..., (p-1)th children.
- Specifically, the child pointer list 5213 of the node data structure is acquired. The total number of elements stored in this list is p.
- In this way, the frequency can be set for all nodes.
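The recursive routine can be sketched as follows, assuming the rule stated earlier that an internal node's frequency is the sum of its children's frequencies and that leaf frequencies have already been assigned. The function name and the nested-dict node format are hypothetical, as are all values other than 35 and 205.

```python
def update_frequency(node):
    """Set each internal node's frequency to the sum over its children.

    Leaf frequencies are assumed to be filled in beforehand (cf. S1002).
    Returns the frequency of `node`.
    """
    children = node["children"]          # cf. child pointer list 5213
    if not children:                     # a leaf keeps its assigned frequency
        return node["frequency"]
    node["frequency"] = sum(update_frequency(child) for child in children)
    return node["frequency"]


# Hypothetical tree with stale internal frequencies (0).
address_tree = {
    "label": "*", "frequency": 0, "children": [
        {"label": "Tokyo", "frequency": 0, "children": [
            {"label": "Bunkyo-ku, Tokyo", "frequency": 35, "children": []},
            {"label": "Toshima-ku, Tokyo", "frequency": 50, "children": []},
        ]},
        {"label": "Yokohama City, Kanagawa Prefecture", "frequency": 120, "children": []},
    ],
}
update_frequency(address_tree)
```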
- The minimum equivalence number 301 defined by the minimum equivalence number information 133 is denoted by k.
- The personal information table 131 and the generalized hierarchical tree table 135 are loaded into the memory (S1101). Specifically, the generalized hierarchical tree table 135 is managed in memory using the data structure 521 described above. As described above, when the generalized hierarchical tree automatic generation S801 and the re-encoding S802 are performed at different timings and the generalized hierarchical tree has been modified, the generalized hierarchical tree automatic generation unit 121 needs to update the frequency information of the generalized hierarchical tree in this step, using the node frequency update method described above.
- An empty list v for storing nodes is prepared (S1102), and 0 is assigned to j (S1103).
- Nodes are stored in the list v prepared in S1102. Each stored element e indicates a candidate for re-encoding the labels of the children of e into the label of e, and the list changes dynamically during the process of S802.
- If it is determined in step S1104 that j is greater than or equal to m, it is determined whether every all-attribute data tuple appearing in the personal information table on the memory occurs k or more times (S1107). Specifically, a data structure such as a map is prepared. When the all-attribute data tuple of a record already exists in the key set of the map, the count stored as its value is incremented by one. If the tuple does not exist in the key set, it is added as a key with the value 1. This is done for all records, after which it suffices to determine whether all the values stored in the map are k or more.
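The check of S1107 can be sketched with a `dict` in place of the map. The function name and the sample records are hypothetical; the tuple (Yokohama City, Kanagawa Prefecture, 33-38, Japan) is the example tuple from the text.

```python
def occurs_at_least_k_times(records, k):
    """True if every all-attribute tuple in `records` occurs k or more times."""
    counts = {}
    for record in records:
        key = tuple(record)              # the all-attribute data tuple is the key
        if key in counts:
            counts[key] += 1             # already in the key set: increment
        else:
            counts[key] = 1              # first occurrence: store 1
    return all(count >= k for count in counts.values())


# Hypothetical re-encoded table: five identical rows and one odd row out.
rows = [("Yokohama City, Kanagawa Prefecture", "33-38", "Japan")] * 5
rows_with_outlier = rows + [("Tokyo", "20-27", "Japan")]
```

With k = 5, `rows` passes the check, while `rows_with_outlier` fails because its last tuple occurs only once.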
- the loop in S1108 is processed. The loop is performed for all elements w in v.
- the loss information amount measuring unit 123 calculates the loss information amount incurred when the attribute values of all the records having the node labels of the children of w as data are re-encoded into the label of w (S1109). A method for calculating the loss information amount will be described later.
- the re-encoding result in memory is written to the anonymous information table 132 (S1113), and the process ends.
- the variable I, which holds the loss information amount to be finally obtained, is initialized to 0 (S1201).
- the loop S1202 is a loop for all children c of the node w.
- the loss information amount i is calculated when one record having the c label as data is re-encoded into the w label (S1203). A method for calculating the loss information amount will be described later.
- count (c) * i is added to I (S1204).
- count (c) means the total number of records having the label c as data in the personal information table on the memory, and the operation “*” means multiplication on a real number. Specifically, count (c) can be obtained by referring to the node frequency 5215.
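The accumulation in S1201 to S1204 can be sketched as follows. The per-record loss used here, -log(frequency of c / frequency of w), is the measure stated in the claims; the base-2 logarithm, the function names, and the dictionary of child counts are assumptions for illustration, and the frequency of w is taken to be the sum of its children's frequencies.

```python
import math
from typing import Dict


def record_loss(count_c: int, count_w: int) -> float:
    """Loss i for re-encoding one record from label c to label w (base 2 assumed)."""
    return -math.log2(count_c / count_w)


def total_loss(child_counts: Dict[str, int]) -> float:
    """Loss I for re-encoding all records under the children of w into w."""
    count_w = sum(child_counts.values())   # frequency of w
    total = 0.0                            # S1201: I = 0
    for count_c in child_counts.values():  # S1202: loop over children c of w
        if count_c == 0:
            continue                       # no records carry this label
        i = record_loss(count_c, count_w)  # S1203
        total += count_c * i               # S1204: I += count(c) * i
    return total
```

With child frequencies 3 and 1, the result is 3 * log2(4/3) + 1 * log2(4), roughly 3.245.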
- the loss information amount measuring unit 123 performs each process of FIG. The method described here is not necessarily used.
- count (c) means the total number of records having the label c as data in the personal information table on the memory, as described above.
- c and w do not necessarily have a parent-child relationship.
- the feature of the computer 100 is that it includes a method for automatically forming a generalized hierarchical tree and a method for measuring a loss information amount.
- the Hu-Tucker code tree, the Huffman code tree, and the Shannon-Fano code tree are, as described above, trees in which attribute values with a low frequency are placed deeper and attribute values with a high frequency are placed shallower. Because this raises the likelihood that low-frequency values are re-encoded to the same label, highly useful anonymous data can be generated while avoiding excessive re-encoding. Further, if one of the above code trees is used as the generalized hierarchical tree, the loss information amount at the time of re-encoding can be reduced.
- <Example 2> Next, Example 2 will be described.
- Example 2 improves the usefulness of data.
- the same reference numerals are given to the same components as those in the first embodiment, and the description thereof is omitted.
- the operation of the second embodiment is almost the same as that of the first embodiment.
- the same reference numerals are given to the operations of this part, and the description is omitted.
- the storage 103 of the computer 100 has a program 1331 instead of the program 151.
- the program 1331 is loaded onto the memory, and the CPU 101 implements a pseudo personal information generation unit 1321 in addition to the units 121, 122, and 123 of the first embodiment.
- a generation information table 1332 is stored on the storage as a storage destination of the processing result of the program 1331.
- the generation information table 1332 is almost the same as the anonymous information table 132 as shown in FIG. The difference is that in the generation information table 1332, each attribute takes a value corresponding to a leaf of the generalized hierarchical tree of that attribute. More specifically, each value is re-encoded as a leaf attribute value that is a descendant of the node of the generalized hierarchical tree corresponding to the label stored in the anonymous information table 132.
- step S801 in which the generalized hierarchical tree automatic generation unit 121 automatically generates a generalized hierarchical tree and step S802 in which the reencoding unit 122 performs reencoding are completely equivalent to those in the first embodiment.
- the pseudo personal information generation unit 1321 executes the pseudo personal information generation step S1501. Note that, as with the relationship between S801 and S802 described in the first embodiment, S1501 need not be executed immediately after S802, and the processing may take place at a different time.
- the anonymous information table 132 and the generalized hierarchical tree table 135 are loaded into memory (S1601). After that, the following steps are performed inside a loop over all records r (S1602) and, as an inner loop, a loop over all attributes of record r (S1603). The attribute currently being processed is referred to as the j-th attribute.
- the node of the generalized hierarchical tree to which the attribute value of the j-th attribute of record r corresponds is found and set to w (S1604).
- all the nodes that are descendants of w are enumerated and set as c1, c2, ..., cn (S1605).
- a search method such as breadth-first search may be used from w, or after searching once, this search result may be stored in association with a node and reused.
- the j-th attribute of record r holds the label of w, and this is replaced with the label of one leaf of the generalized hierarchical tree by the following method (S1606).
- one of c1, c2, c3, ..., cn is selected at random with equal probability, and the attribute value is replaced with the label of the selected node.
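Steps S1604 to S1606 can be sketched as follows. The tree is represented here as a hypothetical dictionary from a label to the list of its child labels, and only leaf descendants are collected, since the replacement value must be a leaf label. The optional `weights` argument is an assumption added to show the frequency-proportional variant mentioned in the claims; without it, the choice is uniform as in the text above.

```python
import random
from collections import deque
from typing import Dict, List, Optional


def leaf_descendants(tree: Dict[str, List[str]], w: str) -> List[str]:
    """Breadth-first search from w, collecting the leaves below it (S1605)."""
    leaves, queue = [], deque([w])
    while queue:
        node = queue.popleft()
        children = tree.get(node, [])
        if children:
            queue.extend(children)
        else:
            leaves.append(node)  # w itself is returned if it is already a leaf
    return leaves


def pseudo_value(tree: Dict[str, List[str]], w: str, rng: random.Random,
                 weights: Optional[Dict[str, int]] = None) -> str:
    """Replace the label of w with the label of one of its leaves (S1606)."""
    leaves = leaf_descendants(tree, w)
    if weights is None:
        return rng.choice(leaves)  # equal probability
    # Frequency-proportional choice: P(c) = frequency of c / frequency of w.
    return rng.choices(leaves, weights=[weights[c] for c in leaves], k=1)[0]
```

As the text notes, the search result for each node could be cached and reused instead of being recomputed per record.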
- the feature of the computer 100 configured in the second embodiment is that it does not restrict the applications that use the data, because the attribute values of the generation information table 1332 are drawn from the same set of values as the attribute values of the original personal information table 131.
- if this data is re-encoded as “10-19 years old”, it can no longer be expressed as an integer type and cannot be used by applications that expect an integer.
- instead, this is replaced with a specific age within “10-19 years old” using the frequency information, for example “14 years old”. It can therefore be expressed as an integer type and used by any application that could use the original personal information.
- it can be expected that the distribution of each attribute of the generation information table 1332 is close to the distribution of the original personal information table 131.
- the description above includes a step of constructing the anonymous information table 132.
- alternatively, the anonymous information table 132 may be constructed in advance, with only the pseudo personal information generation unit 1321 configured later.
- the system can then be built from only the anonymous information table 132, the generalized hierarchical tree table 135, and the pseudo personal information generation unit 1321. This makes it possible to build a system that can be used by depositing only the anonymous information and the generalized hierarchical tree outside the company; it is highly anonymous in the sense that the personal information itself need not be deposited.
- <Example 3> Next, Example 3 will be described.
- Example 3 improves the usefulness of data by using the classification of attribute values desired by the user.
- classifications in various fields such as international disease classification, book classification, and patent classification.
- classifications such as teens and 20s in terms of age.
- the third embodiment automatically generates a generalized hierarchical tree that reflects the classification desired by the user, by having the user define in advance only the part of the hierarchy of the generalized hierarchical tree that the user desires. For example, defining the age classification in advance as “20 to 24 years old” and “25 to 29 years old” prevents the data from being re-encoded into a form the user does not want, such as “24 to 27 years old”.
- when a generalized hierarchical tree is constructed, nodes may be added as long as they do not deviate from the user-defined hierarchical tree. For example, when the user has defined the classification “20 to 24 years old”, a node such as “20 to 22 years old” may be created as a child of the node “20 to 24 years old”. In addition, when the user has defined “*”, which includes all attribute values, as the parent of “20 to 24 years old”, a new node “20 to 29 years old” may be added as the parent of “20 to 24 years old”. Allowing hierarchy levels to be added in a way that does not deviate from the user-defined hierarchical tree makes it possible to output more detailed anonymized data while still using the classification desired by the user.
- the storage 103 of the computer 100 stores a personal information table 131, an anonymous information table 132, minimum equivalence number information 133, attribute type information 134, a generalized hierarchical tree table 135, a program 1731, and a user-defined hierarchical tree table 1732.
- the CPU 101 loads the program 1731 on the memory 102 and realizes a generalized hierarchical tree automatic generation unit 1721 and a re-encoding unit 122 based on the user-defined hierarchical tree.
- the re-encoding unit 122 realizes a loss information amount measuring unit 123 as internal processing as necessary.
- the user-defined hierarchical tree table 1732 stores what defines the classification that the user desires for an arbitrary attribute.
- the user does not need to define a user-defined hierarchical tree for all attributes to be anonymized, but only defines an attribute for which classification is desired. Further, as described above, the user only needs to define the desired classification in each attribute, and it is not necessary to define all the hierarchies.
- overlapping classifications must not be defined, such as “25 to 38 years old” together with “20 to 33 years old”, or “{Yokohama City, Kanagawa Prefecture, Kawasaki City, Kanagawa Prefecture}” together with “{Yokohama City, Kanagawa Prefecture, Fujisawa City, Kanagawa Prefecture}”.
- FIG. 18 shows an example of the user-defined hierarchical tree table 1732.
- FIG. 18A shows a conceptual diagram of a user-defined hierarchical tree for the attribute “age”, and FIG. 18B shows a method for storing a user-defined hierarchical tree in the storage.
- a method of managing the generalized hierarchical tree in memory will be described with reference to FIG. 18C.
- FIG. 18A shows an example of a user-defined hierarchical tree for the attribute “age”.
- the user-defined hierarchical tree is represented as a tree structure composed of a plurality of nodes and branches. Each node is associated with a label representing the classification desired by the user.
- FIG. 18B shows an example of a method for saving the user-defined hierarchical tree on the storage. It is stored on the storage using a relational database or the like.
- An example of storing the user-defined hierarchical tree as a table on the relational database is a table 1732-b.
- the first line 1811 of the table shows the label of each column, and each record in the second and subsequent lines corresponds to one node.
- FIG. 18C shows a method of managing the user-defined hierarchical tree 1732-a on the memory.
- a data structure 1821 is an example of a data structure for managing the node 1801.
- the data structure includes a pointer 18211, a pointer 18212 pointing to the parent, a child pointer list 18213, a label 18214 of the node, and frequency information 18215.
- FIG. 19 (a-1) is an example of a user-defined hierarchical tree for the character string processing type attribute “address”.
- FIG. 19 (a-2) is an example in which a generalized hierarchical tree based on the user-defined hierarchical tree is constructed using the data illustrated in FIG. 5 (a-2).
- the user can define classifications other than those extracted from the character string. For example, “Kanagawa Prefecture” can be divided into “{Yokohama City, Kanagawa Prefecture, Kawasaki City, Kanagawa Prefecture}”, “Kanagawa Prefecture Others”, and so on.
- the label of the node “{Yokohama City, Kanagawa Prefecture, Kawasaki City, Kanagawa Prefecture}” enumerates the labels of its children, so the node has “Yokohama City, Kanagawa Prefecture” and “Kawasaki City, Kanagawa Prefecture” as children, and these need not be defined explicitly. By contrast, it is not clear what nodes “Kanagawa Prefecture Others” has as children, so the children of “Kanagawa Prefecture Others” must be defined.
- FIG. 19 (b-1) shows an example of a user-defined hierarchical tree for the order preservation type attribute “age”.
- FIG. 19 (b-2) shows an example in which a generalized hierarchical tree based on the user-defined hierarchical tree is constructed using the data illustrated in FIG. 5 (b-2).
- the node label indicates a range of values, and thus it is not necessary to define a child for the node.
- FIG. 19 (c-1) shows an example of a user-defined hierarchical tree for the “other” type attribute “nationality”.
- FIG. 19 (c-2) shows an example in which a generalized hierarchical tree based on the user-defined hierarchical tree is constructed using the data illustrated in FIG. 5 (c-2). For attributes whose attribute type is “other”, as for “character string processing type” attributes, children need not be defined when the label of a node enumerates its child nodes. In the case of an abstract name such as “”, it is necessary to define what nodes the node has as children.
- the portion surrounded by a dotted line indicates a node that does not need to be re-encoded.
- a node with a frequency of 0, that is, a node of the user-defined hierarchical tree under which none of the attribute values in the personal information data is classified, is not necessary for the re-encoding process. Therefore, a node with a frequency of 0 may be deleted from the generalized hierarchical tree.
- a node whose frequency does not differ from the frequency of its child, that is, a node having only one child whose frequency is not 0, is also not necessary for the re-encoding process. Therefore, such a node may be deleted from the generalized hierarchical tree, and its child may be connected directly to its parent.
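The two pruning rules above can be sketched as one recursive pass. The `Node` class here is a hypothetical minimal structure (label, frequency, children), and the frequencies are assumed to have been set already; the source states only that such nodes may be deleted, not how.

```python
from typing import List, Optional


class Node:
    """Hypothetical tree node with a precomputed frequency."""

    def __init__(self, label: str, frequency: int,
                 children: Optional[List["Node"]] = None) -> None:
        self.label = label
        self.frequency = frequency
        self.children = children or []


def prune(node: Node) -> Optional[Node]:
    """Return the pruned subtree rooted at `node`, or None if it is removed."""
    if node.frequency == 0:
        return None  # rule 1: no record falls under this node
    kept = [c for c in (prune(child) for child in node.children)
            if c is not None]
    if len(kept) == 1:
        # Rule 2: a single non-zero child has the same frequency as this
        # node, so the child takes this node's place under the parent.
        return kept[0]
    node.children = kept
    return node
```

Note that this sketch also collapses the root when it has a single non-zero child; an implementation that wants to keep “*” as the root would need to special-case it.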
- the generalized hierarchical tree automatic generation unit 1721 based on the user-defined hierarchical tree refers to the personal information table 131, the attribute type information 134, and the user-defined hierarchical tree table 1732, automatically generates the generalized hierarchical tree, and stores it in the generalized hierarchical tree table 135 (S2001).
- the re-encoding unit 122 re-encodes the data and stores the result in the anonymous information table 132 (S802).
- Step S802 is equivalent to that in the first embodiment. Similar to the relationship between S801 and S802 described in the first embodiment, S2001 and S802 do not need to be executed continuously, and the timing of processing may be different.
- the personal information table 131 and the user-defined hierarchical tree table 1732 are read into the memory 102 (S2101). At this time, it is confirmed whether there is any overlap in the classification defined by the user-defined hierarchical tree. Specifically, it is checked that the descendants of each node do not overlap at a plurality of nodes that do not have an ancestor-descendant relationship among the nodes constituting the user-defined hierarchical tree. If there is an overlap, the process ends.
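The overlap check performed at load time can be sketched for range-labelled nodes as follows. The two dictionaries (`parent_of`, mapping each label to its parent label, and `range_of`, mapping each label to an inclusive integer range) are hypothetical representations chosen for illustration; for nodes whose labels enumerate values, set intersection would replace the range comparison.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


def find_overlaps(parent_of: Dict[str, Optional[str]],
                  range_of: Dict[str, Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Return pairs of unrelated nodes whose inclusive ranges overlap."""

    def ancestors(label: str) -> Set[str]:
        found = set()
        while parent_of[label] is not None:
            label = parent_of[label]
            found.add(label)
        return found

    overlaps = []
    for a, b in combinations(range_of, 2):
        if a in ancestors(b) or b in ancestors(a):
            continue  # ancestor-descendant pairs are allowed to overlap
        (a_lo, a_hi), (b_lo, b_hi) = range_of[a], range_of[b]
        if a_lo <= b_hi and b_lo <= a_hi:
            overlaps.append((a, b))
    return overlaps
```

An empty result means the user-defined hierarchical tree passes the check; a non-empty result corresponds to the case in which the process ends.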
- S902 and S903 are equivalent to those of the first embodiment.
- in S2102, it is determined whether a user-defined hierarchical tree exists for the j-th attribute. If it does not exist, the process proceeds to S2103; if it exists, the process proceeds to S2104. Details of S2103 and S2104 will be described later. After the process of S2103 or S2104 is completed, the process proceeds to S943.
- the process of S2103 will be described with reference to FIG. S2103 is the processing performed when a generalized hierarchical tree is constructed using only the information in the personal information table, without using a user-defined hierarchical tree. That is, all the processes in S2103 are equivalent to the processes described in the first embodiment.
- the processing of S904 and S905 is as described above.
- if the attribute type of the attribute is “character string processing type”, the process proceeds to S2311; if it is “order preservation type”, the process proceeds to S2321; and if it is “other”, the process proceeds to S2331. Details of S2311, S2321, and S2331 will be described later.
- after the process of S2311, S2321, or S2331 is completed, the process proceeds to S942.
- y is the number of the deepest layer of the user-defined hierarchical tree 1732. The node “*”, which includes all attribute values, is defined as layer 0, and the layers below it are called layer 1, layer 2, and so on.
- S911 is equivalent to that of the first embodiment.
- in step S2401, the parameter x is initialized to y.
- a list z listing all nodes of the hierarchy x is prepared using the user-defined hierarchical tree of the jth attribute.
- one node is selected from the list z, and the selected node is deleted from the list z.
- nodes that are descendants of the node selected in S2405 are listed. Specifically, when a node having no children is selected in the user-defined hierarchical tree 1732 in S2405, attribute values that are descendants of the node are listed using the attribute value information acquired in S911. For example, when the node “Kawasaki City, Kanagawa” is selected, attribute values including the character string “Kawasaki City, Kanagawa” are listed. If nodes having children are selected in the user-defined hierarchy tree 1732 in S2405, nodes defined as children of the nodes in the user-defined hierarchy 1732 are listed.
- the designated character string processing is applied to the nodes listed in S2411, the inclusion relations are extracted, and a tree rooted at the node selected in S2405 is constructed based on those inclusion relations.
- the tree construction method depends on various known character string processing techniques as in the first embodiment.
- the constructed tree becomes a part of the generalized hierarchical tree based on the user-defined hierarchical tree.
- the user-defined hierarchical tree is updated using the constructed tree.
- the generalized hierarchical tree based on the user-defined hierarchy is configured for the attribute whose attribute type is “character string processing type”.
- frequency information of a node that becomes a leaf of a subtree to be updated is acquired.
- the frequency information update target subtree refers to the tree configured in S2412, and the nodes that are the leaves of the subtree indicate all the nodes listed in S2411.
- the frequency information acquired in S2501 is assigned to the corresponding leaf.
- frequency information of the node that is a descendant of the node selected in S2405 is acquired. Specifically, when a node having no children is selected in the user-defined hierarchical tree 1732 in S2405, the attribute value frequency information that is a descendant of the node is acquired using the attribute value information acquired in S921. If a node having a child is selected in the user-defined hierarchy tree 1732 in S2405, node frequency information defined as a child of the node in the user-defined hierarchy 1732 is acquired. For example, when the node “20 to 24 years old” is selected in the user-defined hierarchical tree 1732, the attribute values are “20 years old”, “21 years old”, “22 years old”, “23 years old”, and “24 years old”. Get frequency information.
- a Hu-Tucker code tree rooted at the node selected in S2405 is constructed using the frequency information acquired in S2421.
- the user-defined hierarchical tree is updated using the constructed tree.
- the generalized hierarchical tree based on the user-defined hierarchy is configured for the attribute whose attribute type is “order preservation type”.
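To illustrate what an order-preserving code tree looks like for one user-defined node, the sketch below builds a minimum-cost binary tree whose leaves stay in their original order. It does not implement the Hu-Tucker algorithm itself; as a stand-in it uses a straightforward O(n^3) dynamic program, which also produces an optimal order-preserving tree (possibly a different one of equal cost). The function name and the nested-tuple tree representation are assumptions, and at least one label is assumed.

```python
from functools import lru_cache
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def alphabetic_tree(labels: List[str], freqs: List[int]) -> Tuple[int, Tree]:
    """Return (weighted path length, tree) keeping the leaves in order."""
    prefix = [0]
    for f in freqs:
        prefix.append(prefix[-1] + f)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[int, Tree]:
        # Optimal subtree over the contiguous run labels[i..j], inclusive.
        if i == j:
            return 0, labels[i]
        weight = prefix[j + 1] - prefix[i]  # every leaf here sinks one level
        options = []
        for k in range(i, j):               # split point between k and k+1
            left_cost, left = best(i, k)
            right_cost, right = best(k + 1, j)
            options.append((left_cost + right_cost + weight, k, left, right))
        cost, _, left, right = min(options, key=lambda o: (o[0], o[1]))
        return cost, (left, right)

    return best(0, len(labels) - 1)
```

For the ages under “20 to 24 years old”, the leaves would be “20 years old” through “24 years old” with their frequencies, and the returned tree would replace that node's subtree.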
- the frequency information of the node attribute value that is a descendant of the node selected in S2405 is acquired. Specifically, when a node having no children is selected in the user-defined hierarchical tree 1732 in S2405, the attribute value frequency information that is a descendant of the node is acquired using the attribute value information acquired in S931. If a node having a child is selected in the user-defined hierarchy tree 1732 in S2405, node frequency information defined as a child of the node in the user-defined hierarchy 1732 is acquired. For example, when “Europe” is selected in the user-defined hierarchical tree 1732, frequency information of “UK”, “France”, and “Germany” is acquired.
- a Huffman code tree or a Shannon-Fano code tree is constructed using the frequency information acquired in S2431.
- which code tree is used is determined in advance by the designer of the computer 100.
- the user-defined hierarchical tree is updated using the constructed tree.
- the generalized hierarchical tree based on the user-defined hierarchy is configured for the attribute whose attribute type is “other”.
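For the “other” attribute type, the Huffman construction named above can be sketched with a priority queue. The function names and the nested-tuple tree representation are assumptions for illustration; labels are assumed to be strings and at least one label is assumed. The `depths` helper shows the property the description relies on: low-frequency values end up deeper than high-frequency ones.

```python
import heapq
from itertools import count
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def huffman_tree(freqs: Dict[str, int]) -> Tree:
    """Build a Huffman code tree; leaves are labels, internal nodes are pairs."""
    tie = count()  # tie-breaker so that subtrees are never compared directly
    heap = [(f, next(tie), label) for label, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    return heap[0][2]


def depths(tree: Tree, depth: int = 0) -> Dict[str, int]:
    """Map each leaf label to its depth in the tree."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    result: Dict[str, int] = {}
    for subtree in tree:
        result.update(depths(subtree, depth + 1))
    return result
```

With frequencies such as UK 5, France 2, Germany 1 under “Europe”, France and Germany are merged first, so re-encoding generalizes the two rare values together before it touches the frequent one.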
- a feature of the computer 100 configured in the third embodiment is that, by defining as a user-defined hierarchical tree only some of the attributes and part of the hierarchy for which the user desires a particular classification, a generalized hierarchical tree that reflects the classification desired by the user is automatically generated. In addition, because the generalized hierarchical tree is generated automatically using frequency information, the data is anonymized with a small loss information amount.
- 100: Computer, 101: CPU, 102: Memory, 121: Generalized hierarchical tree automatic generation unit, 122: Re-encoding unit, 123: Loss information amount measurement unit, 103: Storage, 131: Personal information table, 132: Anonymous information table, 133: Minimum equivalence number information, 134: Attribute type information, 135: Generalized hierarchical tree table, 151: Program, 104: Input device, 105: Output device, 106: Communication device, 107: Internal communication line, 1321: Pseudo personal information generation unit, 1331: Program, 1332: Generation information table, 1721: Generalized hierarchical tree generation unit based on user-defined hierarchical tree, 1731: Program, 1732: User-defined hierarchical tree table.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
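<Example 1>
With reference to FIG. 1, a configuration example of an apparatus that implements the technique of Example 1 will be described.
n Mathematical Society, pp.95-96, 2002” or a similar method is used. After the processing of S932 is completed, the process proceeds to S941, described later.
<Example 2>
Next, Example 2 will be described.
<Example 3>
Next, Example 3 will be described.
After the processing of S2311, S2321, or S2331 is completed, the process proceeds to S942.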
Claims (19)
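- A personal information anonymization device comprising:
personal information storage means for storing one or more items of personal information, each consisting of an attribute value for each attribute;
generalized hierarchical tree automatic generation means for automatically constructing, for each of the attributes, a generalized hierarchical tree that represents the superordinate concepts of each attribute value appearing in the input personal information as a tree structure according to their degree of ambiguity, using frequency acquisition means that selects one attribute and counts, for each attribute value appearing in the selected attribute, the number of items of the input personal information having that attribute value; and
means for re-encoding the input personal information using the generalized hierarchical tree generated for each of the attributes by the generalized hierarchical tree automatic generation means.
- The personal information anonymization device according to claim 1,
wherein the re-encoding means comprises loss information amount measuring means for measuring the amount of information lost when one attribute value of one item of personal information is generalized using the automatically generated generalized hierarchical tree.
- The personal information anonymization device according to claim 2,
wherein the loss information amount measuring means comprises node frequency acquisition means that counts the occurrence frequency of each node of the automatically generated generalized hierarchical tree, for a leaf, as the number of items of personal information in which the attribute value indicated by the leaf appears and, for an internal node, as the sum of the frequencies of the nodes that are descendants of that node and are leaves, and measures the loss information amount incurred when an attribute value of one item of the input personal information is generalized from the node a corresponding to that attribute value to a node b that is an ancestor of a as -log(frequency of a / frequency of b).
- The personal information anonymization device according to claim 3,
further comprising means for outputting, for each attribute value of each item of anonymous information generated by the re-encoding means, using the node frequencies acquired by the node frequency acquisition means, a result in which, when the attribute value is a node a of the generalized hierarchical tree, the attribute value is replaced with the attribute value of a leaf c, among the one or more leaves that are descendants of the attribute value, with probability (frequency of c / frequency of a).
- The personal information anonymization device according to claim 1 or 2,
further comprising means for outputting, using node frequency acquisition means that counts the occurrence frequency of each node of the generalized hierarchical tree, for a leaf, as the number of items of the input personal information in which the attribute value indicated by the leaf appears and, for an internal node, as the sum of the frequencies of the nodes that are descendants of that node and are leaves, for each attribute value of each item of anonymous information generated by the re-encoding means, a result in which, when the attribute value is a node a of the generalized hierarchical tree, the attribute value is replaced with the attribute value of a leaf c, among the one or more leaves that are descendants of the attribute value, with probability (frequency of c / frequency of a).
- The personal information anonymization device according to any one of claims 1 to 5,
wherein the generalized hierarchical tree automatic generation means operates by generating a Huffman code tree using the frequencies acquired by the frequency acquisition means.
- The personal information anonymization device according to any one of claims 1 to 5,
wherein the generalized hierarchical tree automatic generation means operates by generating a Shannon-Fano code tree using the frequencies acquired by the frequency acquisition means.
- The personal information anonymization device according to any one of claims 1 to 5,
wherein the generalized hierarchical tree automatic generation means operates by generating a Hu-Tucker code tree using the frequencies acquired by the frequency acquisition means and order information defined in advance for the attribute values of the attribute.
- The personal information anonymization device according to any one of claims 1 to 5,
further comprising means for storing the automatically constructed generalized hierarchical tree.
- A personal information anonymization device that takes as input one or more items of personal information, each consisting of an attribute value for each attribute, and, for each of the attributes, a generalized hierarchical tree that represents the superordinate concepts of each attribute value appearing in the one or more items of personal information as a tree structure according to their degree of ambiguity, the device comprising:
loss information amount measuring means for measuring the amount of information lost when one attribute value of one item of personal information is generalized using the automatically generated generalized hierarchical tree; and
means for re-encoding the input personal information by generalizing, using the loss information amount measuring means and the generalized hierarchical tree, each attribute value of the input personal information to a node that is an ancestor of the node indicated by that attribute value.
- A personal information anonymization device that takes as input:
a generalized hierarchical tree that represents, for each attribute, the superordinate concepts of attribute values as a tree structure according to their degree of ambiguity;
anonymous information obtained by anonymizing one or more items of personal information using the generalized hierarchical tree; and
for each attribute value of each attribute, the number of items of the personal information in which that attribute value appears,
wherein the device, using node frequency acquisition means that counts the occurrence frequency of each node of the generalized hierarchical tree, for a leaf, as the number of items of the original personal information in which the attribute value indicated by the leaf appears and,
for an internal node, as the sum of the frequencies of the nodes that are descendants of that node and are leaves,
outputs, for each attribute value of each attribute of each item of the input anonymous information, a result in which, when the attribute value is a node a of the generalized hierarchical tree, the attribute value is replaced with the attribute value of a leaf c, among the one or more leaves that are descendants of the attribute value, with probability (frequency of c / frequency of a).
- The personal information anonymization device according to claims 1 to 5 and claim 9,
further comprising user-defined hierarchical tree storage means for storing a user-defined hierarchical tree that defines some of the nodes of the generalized hierarchical tree of an attribute, and generalized hierarchical tree automatic generation means based on the user-defined hierarchical tree for automatically generating a generalized hierarchical tree using the user-defined hierarchical tree and the frequencies acquired by the frequency acquisition means.
- The personal information anonymization device according to claim 12,
wherein the generalized hierarchical tree automatic generation means based on the user-defined hierarchical tree operates by generating a Huffman code tree using the user-defined hierarchical tree and the frequencies acquired by the frequency acquisition means.
- The personal information anonymization device according to claim 12,
wherein the generalized hierarchical tree automatic generation means based on the user-defined hierarchical tree operates by generating a Shannon-Fano code tree using the user-defined hierarchical tree and the frequencies acquired by the frequency acquisition means.
- The personal information anonymization device according to claim 12,
wherein the generalized hierarchical tree automatic generation means based on the user-defined hierarchical tree operates by generating a Hu-Tucker code tree using the user-defined hierarchical tree, the frequencies acquired by the frequency acquisition means, and order information defined in advance for the attribute values of the attribute.
- The personal information anonymization device according to claim 12,
wherein the generalized hierarchical tree generation means based on the user-defined hierarchical tree checks that, among the nodes constituting the user-defined hierarchical tree, the descendants of two or more nodes that are not in an ancestor-descendant relationship do not overlap.
- The personal information anonymization device according to claim 12,
wherein a node of the user-defined hierarchical tree has a label that enumerates the labels of all children of that node.
- The personal information anonymization device according to claim 12,
wherein the user-defined hierarchical tree consists of a node having a label with an abstract name for which the nodes that can be its children are not self-evident, and nodes whose parent is the node having the label with the abstract name.
- The personal information anonymization device according to claim 12,
wherein a node of the user-defined hierarchical tree has a label representing the range of attribute values that are descendants of the node, and the range does not overlap the range of any node that is not in an ancestor or descendant relationship with the node.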
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012515790A JP5492296B2 (ja) | 2010-05-19 | 2011-04-05 | Personal information anonymization device |
US13/697,904 US20130138698A1 (en) | 2010-05-19 | 2011-04-05 | Identity information de-identification device |
DK11783338.4T DK2573699T3 (en) | 2010-05-19 | 2011-04-05 | Device for de-identification of identity information |
EP11783338.4A EP2573699B1 (en) | 2010-05-19 | 2011-04-05 | Identity information de-identification device |
CN201180024517.9A CN102893553B (zh) | 2010-05-19 | 2011-04-05 | Personal information anonymization device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010114885 | 2010-05-19 | ||
JP2010-114885 | 2010-05-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011145401A1 true WO2011145401A1 (ja) | 2011-11-24 |
Family
ID=44991517
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/058590 WO2011145401A1 (ja) | 2010-05-19 | 2011-04-05 | Personal information anonymization device |
Country Status (6)
Country | Link |
---|---|
US (1) | US20130138698A1 (ja) |
EP (1) | EP2573699B1 (ja) |
JP (1) | JP5492296B2 (ja) |
CN (1) | CN102893553B (ja) |
DK (1) | DK2573699T3 (ja) |
WO (1) | WO2011145401A1 (ja) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
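JP2013190838A (ja) * | 2012-03-12 | 2013-09-26 | Nippon Telegraph & Telephone West Corp | Information anonymization system, information loss determination method, and information loss determination program |
JP2013200659A (ja) * | 2012-03-23 | 2013-10-03 | Nippon Telegraph & Telephone West Corp | Attribute selection device, information anonymization device, attribute selection method, information anonymization method, attribute selection program, and information anonymization program |
WO2013183250A1 (ja) * | 2012-06-04 | 2013-12-12 | 日本電気株式会社 | Information processing device that performs anonymization, and anonymization method |
WO2014006851A1 (ja) * | 2012-07-02 | 2014-01-09 | 日本電気株式会社 | Anonymization device, anonymization system, anonymization method, and program recording medium |
WO2014030302A1 (ja) * | 2012-08-20 | 2014-02-27 | 日本電気株式会社 | Information processing device that executes anonymization, and anonymization processing method |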
EP2728508A1 (en) * | 2012-10-31 | 2014-05-07 | Tata Consultancy Services Limited | Dynamic data masking |
WO2014136422A1 (ja) * | 2013-03-06 | 2014-09-12 | 日本電気株式会社 | Information processing device that performs anonymization processing, and anonymization method |
WO2014150250A2 (en) * | 2013-03-15 | 2014-09-25 | PathAR, LLC | Method and apparatus for substitution scheme for anonymizing personally identifiable information |
JP2015046030A (ja) * | 2013-08-28 | 2015-03-12 | 株式会社日立ソリューションズ | Personal information anonymization system |
JP2015049648A (ja) * | 2013-08-30 | 2015-03-16 | ニフティ株式会社 | Anonymization system |
JP2015125646A (ja) * | 2013-12-26 | 2015-07-06 | ニフティ株式会社 | Anonymization system |
JP2016009338A (ja) * | 2014-06-24 | 2016-01-18 | 株式会社日立ソリューションズ | Personal information management system and personal information anonymization device |
JP2016031567A (ja) * | 2014-07-28 | 2016-03-07 | 株式会社日立ソリューションズ | Personal information anonymization device |
JP2016224772A (ja) * | 2015-06-01 | 2016-12-28 | ニフティ株式会社 | Detection method, detection device, and detection program |
JP2017041212A (ja) * | 2015-08-21 | 2017-02-23 | ニフティ株式会社 | Anonymization processing device and anonymization processing method |
US9600673B2 (en) | 2012-12-27 | 2017-03-21 | Industrial Technology Research Institute | Method and device for risk evaluation |
WO2017103970A1 (ja) * | 2015-12-14 | 2017-06-22 | 株式会社日立製作所 | Data processing system and data processing method |
WO2018004236A1 (ko) * | 2016-06-30 | 2018-01-04 | 주식회사 파수닷컴 | Method and device for de-identification of personal information |
JP2019512128A (ja) * | 2016-02-22 | 2019-05-09 | タタ コンサルタンシー サービシズ リミテッドTATA Consultancy Services Limited | System and method for computing the trade-off between data privacy and utility |
WO2020179606A1 (ja) * | 2019-03-05 | 2020-09-10 | 日本電信電話株式会社 | Generalization hierarchy set generation device, generalization hierarchy set generation method, and program |
JP2021503648A (ja) * | 2017-11-17 | 2021-02-12 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | Computer-implemented method, computer program product, and system for data anonymization |
US20220300716A1 (en) * | 2021-03-18 | 2022-09-22 | Hcl Technologies Limited | System and method for designing artificial intelligence (ai) based hierarchical multi-conversation system |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9047488B2 (en) * | 2013-03-15 | 2015-06-02 | International Business Machines Corporation | Anonymizing sensitive identifying information based on relational context across a group |
JP2014229039A (ja) | 2013-05-22 | 2014-12-08 | 株式会社日立製作所 | Privacy-preserving data provision system |
US8978153B1 (en) * | 2014-08-01 | 2015-03-10 | Datalogix, Inc. | Apparatus and method for data matching and anonymization |
US10095883B2 (en) * | 2016-07-22 | 2018-10-09 | International Business Machines Corporation | Method/system for the online identification and blocking of privacy vulnerabilities in data streams |
US10430609B2 (en) * | 2016-09-23 | 2019-10-01 | International Business Machines Corporation | Low privacy risk and high clarity social media support system |
US10762139B1 (en) * | 2016-09-29 | 2020-09-01 | EMC IP Holding Company LLC | Method and system for managing a document search index |
US10628384B2 (en) * | 2016-12-09 | 2020-04-21 | Salesforce.Com, Inc. | Optimized match keys for fields with prefix structure |
JP6828181B2 (ja) * | 2017-10-11 | 2021-02-10 | 日本電信電話株式会社 | k-anonymization device, method, and program |
US10528761B2 (en) * | 2017-10-26 | 2020-01-07 | Sap Se | Data anonymization in an in-memory database |
FR3077894B1 (fr) | 2018-02-13 | 2021-10-29 | Digital & Ethics | Automatic processing method for anonymizing a digital data set |
US10810324B2 (en) * | 2018-04-20 | 2020-10-20 | At&T Intellectual Property I, L.P. | Methods, systems and algorithms for providing anonymization |
US10715394B2 (en) | 2018-10-29 | 2020-07-14 | Sap Portals Israel Ltd. | Data aggregation based on a heirarchical tree |
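WO2020175305A1 (ja) * | 2019-02-26 | 2020-09-03 | Nippon Telegraph and Telephone Corporation | Anonymity evaluation device, anonymity evaluation method, and program |
WO2020235016A1 (ja) * | 2019-05-21 | 2020-11-26 | Nippon Telegraph and Telephone Corporation | Information processing device, information processing method, and program |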
US11360990B2 (en) | 2019-06-21 | 2022-06-14 | Salesforce.Com, Inc. | Method and a system for fuzzy matching of entities in a database system based on machine learning |
JP7377664B2 (ja) * | 2019-10-01 | 2023-11-10 | Hitachi, Ltd. | Database management system and database processing method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002092367A (ja) * | 2000-09-11 | 2002-03-29 | Fujitsu Ltd | Inquiry method using a computer network |
JP2007172404A (ja) * | 2005-12-22 | 2007-07-05 | Internatl Business Mach Corp <Ibm> | Character string processing method, device, and program |
JP2008217425A (ja) * | 2007-03-05 | 2008-09-18 | Hitachi Ltd | Information output device, information output method, and information output program |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7024409B2 (en) * | 2002-04-16 | 2006-04-04 | International Business Machines Corporation | System and method for transforming data to preserve privacy where the data transform module suppresses the subset of the collection of data according to the privacy constraint |
US8200775B2 (en) * | 2005-02-01 | 2012-06-12 | Newsilike Media Group, Inc | Enhanced syndication |
US7365658B2 (en) * | 2006-02-28 | 2008-04-29 | The Board Of Trustees Of The University Of Arkansas | Method and apparatus for lossless run-length data encoding |
US20090006399A1 (en) * | 2007-06-29 | 2009-01-01 | International Business Machines Corporation | Compression method for relational tables based on combined column and row coding |
US20090303237A1 (en) * | 2008-06-06 | 2009-12-10 | International Business Machines Corporation | Algorithms for identity anonymization on graphs |
CN101350033B (zh) * | 2008-09-05 | 2011-10-26 | Beijing University of Posts and Telecommunications | Method and device for converting OWL information to a relational database |
CA2679800A1 (en) * | 2008-09-22 | 2010-03-22 | University Of Ottawa | Re-identification risk in de-identified databases containing personal information |
US8112422B2 (en) * | 2008-10-27 | 2012-02-07 | At&T Intellectual Property I, L.P. | Computer systems, methods and computer program products for data anonymization for aggregate query answering |
US8209342B2 (en) * | 2008-10-31 | 2012-06-26 | At&T Intellectual Property I, Lp | Systems and associated computer program products that disguise partitioned data structures using transformations having targeted distributions |
CA2690788C (en) * | 2009-06-25 | 2018-04-24 | University Of Ottawa | System and method for optimizing the de-identification of datasets |
2011
Date | Country | Application number | Publication | Status |
---|---|---|---|---|
2011-04-05 | US | US13/697,904 | US20130138698A1 | Abandoned (not active) |
2011-04-05 | EP | EP11783338.4A | EP2573699B1 | Active |
2011-04-05 | CN | CN201180024517.9A | CN102893553B | Active |
2011-04-05 | WO | PCT/JP2011/058590 | WO2011145401A1 | Application filing (active) |
2011-04-05 | DK | DK11783338.4T | DK2573699T3 | Active |
2011-04-05 | JP | JP2012515790A | JP5492296B2 | Active |
Non-Patent Citations (4)
Title |
---|
D. E. KNUTH: "The Art of Computer Programming: Volume 3 Sorting and Searching", vol. 3, 1973, ADDISON-WESLEY, pages: 439 - 444 |
K. LEFEVRE; D. J. DEWITT; R. RAMAKRISHNAN: "Incognito: Efficient Full-Domain K-Anonymity", 2005 ACM SIGMOD INTERNATIONAL CONF. MANAGEMENT OF DATA, 2005, pages 49 - 60, XP002734215 |
T. S. HAN; K. KOBAYASHI: "Mathematics of Information and Coding", AMERICAN MATHEMATICAL SOCIETY, 2002, pages 95 - 96 |
T. S. HAN; K. KOBAYASHI: "Mathematics of Information and Coding", AMERICAN MATHEMATICAL SOCIETY, 2002, pages 99 - 105 |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
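JP2013190838A (ja) * | 2012-03-12 | 2013-09-26 | Nippon Telegraph & Telephone West Corp | Information anonymization system, information loss determination method, and information loss determination program |
JP2013200659A (ja) * | 2012-03-23 | 2013-10-03 | Nippon Telegraph & Telephone West Corp | Attribute selection device, information anonymization device, attribute selection method, information anonymization method, attribute selection program, and information anonymization program |
WO2013183250A1 (ja) * | 2012-06-04 | 2013-12-12 | NEC Corporation | Information processing device that performs anonymization, and anonymization method |
WO2014006851A1 (ja) * | 2012-07-02 | 2014-01-09 | NEC Corporation | Anonymization device, anonymization system, anonymization method, and program recording medium |
WO2014030302A1 (ja) * | 2012-08-20 | 2014-02-27 | NEC Corporation | Information processing device that executes anonymization, and anonymization processing method |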
EP2728508A1 (en) * | 2012-10-31 | 2014-05-07 | Tata Consultancy Services Limited | Dynamic data masking |
US9600673B2 (en) | 2012-12-27 | 2017-03-21 | Industrial Technology Research Institute | Method and device for risk evaluation |
WO2014136422A1 (ja) * | 2013-03-06 | 2014-09-12 | NEC Corporation | Information processing device that performs anonymization processing, and anonymization method |
WO2014150250A3 (en) * | 2013-03-15 | 2014-10-23 | PathAR, LLC | Method and apparatus for substitution scheme for anonymizing personally identifiable information |
WO2014150250A2 (en) * | 2013-03-15 | 2014-09-25 | PathAR, LLC | Method and apparatus for substitution scheme for anonymizing personally identifiable information |
AU2014237406B2 (en) * | 2013-03-15 | 2018-02-15 | Babel Street, Inc | Method and apparatus for substitution scheme for anonymizing personally identifiable information |
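JP2015046030A (ja) * | 2013-08-28 | 2015-03-12 | Hitachi Solutions, Ltd. | Personal information anonymization system |
JP2015049648A (ja) * | 2013-08-30 | 2015-03-16 | Nifty Corporation | Anonymization system |
JP2015125646A (ja) * | 2013-12-26 | 2015-07-06 | Nifty Corporation | Anonymization system |
JP2016009338A (ja) * | 2014-06-24 | 2016-01-18 | Hitachi Solutions, Ltd. | Personal information management system and personal information anonymization device |
JP2016031567A (ja) * | 2014-07-28 | 2016-03-07 | Hitachi Solutions, Ltd. | Personal information anonymization device |
JP2016224772A (ja) * | 2015-06-01 | 2016-12-28 | Nifty Corporation | Detection method, detection device, and detection program |
JP2017041212A (ja) * | 2015-08-21 | 2017-02-23 | Nifty Corporation | Anonymization processing device and anonymization processing method |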
US11295635B2 (en) | 2015-12-14 | 2022-04-05 | Hitachi, Ltd. | Data processing system and data processing method |
WO2017103970A1 (ja) * | 2015-12-14 | 2017-06-22 | Hitachi, Ltd. | Data processing system and data processing method |
JPWO2017103970A1 (ja) * | 2015-12-14 | 2018-08-02 | Hitachi, Ltd. | Data processing system and data processing method |
JP2019512128A (ja) * | 2016-02-22 | 2019-05-09 | TATA Consultancy Services Limited | System and method for computing the trade-off between data privacy and utility |
JP2019527409A (ja) * | 2016-06-30 | 2019-09-26 | Fasoo.com Co., Ltd. | Method and device for de-identification of personal information |
WO2018004236A1 (ko) * | 2016-06-30 | 2018-01-04 | Fasoo.com Co., Ltd. | Method and device for de-identification of personal information |
US11354436B2 (en) | 2016-06-30 | 2022-06-07 | Fasoo.Com Co., Ltd. | Method and apparatus for de-identification of personal information |
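JP7121460B2 (ja) | 2017-11-17 | 2022-08-18 | International Business Machines Corporation | Computer-implemented method, computer program product, and system for data anonymization |
JP2021503648A (ja) * | 2017-11-17 | 2021-02-12 | International Business Machines Corporation | Computer-implemented method, computer program product, and system for data anonymization |
JPWO2020179606A1 (ja) * | 2019-03-05 | 2021-12-09 | Nippon Telegraph and Telephone Corporation | Generalization hierarchy set generation device, generalization hierarchy set generation method, and program |
WO2020179606A1 (ja) * | 2019-03-05 | 2020-09-10 | Nippon Telegraph and Telephone Corporation | Generalization hierarchy set generation device, generalization hierarchy set generation method, and program |
JP7425040B2 (ja) | 2019-03-05 | 2024-01-30 | Nippon Telegraph and Telephone Corporation | Generalization hierarchy set generation device, generalization hierarchy set generation method, and program |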
US20220300716A1 (en) * | 2021-03-18 | 2022-09-22 | Hcl Technologies Limited | System and method for designing artificial intelligence (ai) based hierarchical multi-conversation system |
US11875125B2 (en) * | 2021-03-18 | 2024-01-16 | Hcl Technologies Limited | System and method for designing artificial intelligence (AI) based hierarchical multi-conversation system |
Also Published As
Publication number | Publication date |
---|---|
EP2573699B1 (en) | 2017-06-07 |
EP2573699A4 (en) | 2015-06-03 |
US20130138698A1 (en) | 2013-05-30 |
JP5492296B2 (ja) | 2014-05-14 |
CN102893553A (zh) | 2013-01-23 |
CN102893553B (zh) | 2015-11-25 |
DK2573699T3 (en) | 2017-07-31 |
EP2573699A1 (en) | 2013-03-27 |
JPWO2011145401A1 (ja) | 2013-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5492296B2 (ja) | Personal information anonymization device | |
US9652467B2 (en) | Inline tree data structure for high-speed searching and filtering of large datasets | |
US7519582B2 (en) | System and method for performing a high-level multi-dimensional query on a multi-structural database | |
US9767127B2 (en) | Method for record linkage from multiple sources | |
Lal | Neo4j graph data modeling | |
US9519636B2 (en) | Deduction of analytic context based on text and semantic layer | |
CN102144227B (zh) | Method and apparatus related to document processing based on document type | |
US10579678B2 (en) | Dynamic hierarchy generation based on graph data | |
JP2013080375A (ja) | Personal information anonymization device and method | |
US20110153611A1 (en) | Extracting data from a report document | |
US9697250B1 (en) | Systems and methods for high-speed searching and filtering of large datasets | |
JP6173848B2 (ja) | Document classification device | |
Iglesias-Molina et al. | An ontological approach for representing declarative mapping languages | |
Staar et al. | Corpus processing service: a knowledge graph platform to perform deep data exploration on corpora | |
Bernard et al. | Theseus: A framework for managing knowledge graphs about geographical divisions and their evolution | |
US8423523B2 (en) | Apparatus and method for utilizing context to resolve ambiguous queries | |
JP7423859B2 (ja) | Data clustering method and system, data storage method and system, and storage medium | |
Meyer et al. | Integration of data mining results into multi-dimensional data models | |
US20200342139A1 (en) | High-dimensional data anonymization for in- memory applications | |
Wu et al. | Efficient in-memory evaluation of reachability graph pattern queries on data graphs | |
Huang | Visualising category recoding and numeric redistributions | |
Delanaux | Privacy-Preserving Linked Data Integration | |
US20230195752A1 (en) | Virtual foreign keys | |
Abramowicz et al. | Business Information Systems: 22nd International Conference, BIS 2019, Seville, Spain, June 26–28, 2019, Proceedings, Part I | |
Zhao | Schema Profiling for Document Databases: System Development and Case Studies |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 201180024517.9; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11783338; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2012515790; Country of ref document: JP |
REEP | Request for entry into the European phase | Ref document number: 2011783338; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 2011783338; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 13697904; Country of ref document: US |