WO2016059787A1 - Information processing apparatus, information processing method, and recording medium - Google Patents
- Publication number
- WO2016059787A1 (PCT/JP2015/005148)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- grouping
- group
- node
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Definitions
- the present invention relates to information retrieval, and more particularly, to an information processing apparatus, an information processing method, and a recording medium for grouping similar data.
- Data such as images or documents are evaluated based on the similarity of their features rather than on exact matching of the data. Further, when classifying or summarizing such data, it is effective to collect data whose mutual similarity is equal to or greater than a predetermined value. Such processing is generally called grouping based on similarity (see, for example, Patent Document 1).
- The information search device described in Patent Document 1 provides a function of grouping the results of a similarity search requested by a user (the similarity search from the user) into groups of similar items.
- When grouping the similarity search results obtained for the user, the information search device further performs a similarity search on each of those results (a similarity search with respect to the search results). The information search device then collects into one group the data whose similarity to the search result is equal to or higher than a predetermined threshold. The information search device performs grouping based on this operation. At this time, the information search device executes the similarity searches starting from the search results with the highest similarity among the similarity search results from the user, and groups the search results whose similarity is equal to or greater than the predetermined threshold. However, the information search device does not perform grouping on search results for which a similarity search has already been performed.
- Non-Patent Document 1 constructs a tree structure of data (hereinafter, simply referred to as “tree structure”) in consideration of the hierarchy of similarity between data.
- the technique of Non-Patent Document 1 uses such a tree structure to realize high-speed similarity search.
- the tree structure described in Non-Patent Document 1 is generally configured as follows. That is, the nodes constituting the tree structure store data.
- When a node comes to include more data than its capacity, the technique described in Non-Patent Document 1 selects representative data from the data included in the node and places it in the parent node of the node. At the same time, the technique associates the upper limit of the similarity between the representative data and the data in the node with the edge between the parent node and the node. The technique maintains the tree structure so that, across the entire tree, the similarity value associated with an edge increases from the root node toward the leaf nodes. In this way, the technique provides a tree-shaped data structure that focuses on the hierarchy of similarity. However, Non-Patent Document 1 does not disclose a grouping method.
- Patent Document 1 discloses a technique related to grouping the similarity search results obtained for a user. That is, the technique described in Patent Document 1 does not assume general data, and therefore cannot realize grouping of general data. For example, the technique cannot execute a query such as "group the images in an image database by similarity to show what types of images the database contains". As described above, Patent Document 1 has a problem in that the data to be grouped is limited.
- Non-Patent Document 1 does not disclose a grouping method.
- Even if Non-Patent Document 1 is combined with Patent Document 1, the limitation on the data to be grouped cannot be resolved.
- The technique described in Patent Document 1 takes a long time to calculate the grouping of similar data. The reason is that the technique performs a similarity search on each search result, and the grouping calculation therefore takes time on the order of O(N^2).
- O () is a notation that gives an approximate evaluation of a change in value, and is generally called “Landau symbol” or “O-symbol”.
- N indicates the number of data items.
- Patent Document 1 has a problem that it takes time for processing.
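To make this order of growth concrete, the following minimal sketch counts the similarity evaluations needed when one similarity search is run for every one of N items and each search compares the item against all the others. The function name `count_similarity_evaluations` is hypothetical, and a real search may use an index that avoids some of these comparisons.

```python
def count_similarity_evaluations(num_items: int) -> int:
    """Count pairwise evaluations when every item is searched against all others."""
    evaluations = 0
    for query in range(num_items):          # one similarity search per item
        for candidate in range(num_items):  # each search scans the other items
            if candidate != query:
                evaluations += 1
    return evaluations


# Doubling N roughly quadruples the work, which is the O(N^2) behavior.
small = count_similarity_evaluations(100)
large = count_similarity_evaluations(200)
```

The count is N(N-1), so the cost is dominated by the N^2 term as N grows.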
- An object of the present invention is to solve the above-described problems and to provide an information processing apparatus, an information processing method, and a recording medium that reduce the calculation time and realize grouping based on data similarity without limiting the data to be processed.
- An information processing apparatus includes: search means for searching tree-structured data whose elements are nodes including data; grouping determination means for determining, based on a predetermined threshold and the similarity associated with the edge between data included in the node being searched by the search means and the lower node of that data, whether to group the data together with the lower node; subtree grouping means for creating a group by grouping the data and the lower node that were determined to be grouped; leaf node grouping means for, when the node being searched is a leaf node, grouping the leaf node being searched to create one or more groups; data merging means for, when the group to which the data belongs has not been determined at the time the search by the search means returns to the data in the backtrack to the higher-level node, merging the data into one of the groups of the lower nodes of the data; and group merging means for merging at least some of the groups.
- A data processing method includes: searching tree-structured data whose elements are nodes including data; determining, based on a predetermined threshold and the similarity associated with the edge between data included in the node being searched and the lower node of that data, whether to group the data together with the lower node; creating a group by grouping the data and the lower node that were determined to be grouped; when the node being searched is a leaf node, grouping the leaf node being searched to create one or more groups; when the group to which the data belongs has not been determined at the time the search returns to the data in the backtrack to the higher-level node, merging the data into one of the groups of the lower nodes of the data; and merging at least some of the groups.
- A recording medium records, in a computer-readable manner, a program that causes a computer to execute: a process of searching tree-structured data whose elements are nodes including data; a process of determining, based on a predetermined threshold and the similarity associated with the edge between data included in the node being searched and the lower node of that data, whether to group the data together with the lower node; a process of creating a group by grouping the data and the lower node that were determined to be grouped; a process of, when the node being searched is a leaf node, grouping the leaf node being searched to create one or more groups; a process of, when the group to which the data belongs has not been determined at the time the search returns to the data in the backtrack to the higher-level node, merging the data into one of the groups of the lower nodes of the data; and a process of merging at least some of the groups.
- FIG. 1 is a block diagram showing an example of the configuration of the information processing apparatus according to the first embodiment of the present invention.
- FIG. 2 is a flowchart illustrating an example of the operation of the information processing apparatus according to the first embodiment.
- FIG. 3 is a diagram illustrating a tree structure used for explaining the operation of the first embodiment.
- FIG. 4 is a diagram illustrating an example of grouping of subtrees.
- FIG. 5 is a diagram illustrating an example of grouping leaf nodes.
- FIG. 6 is a diagram illustrating an example of merging representative data.
- FIG. 7 is a diagram illustrating an example of group merging.
- FIG. 8 is a block diagram illustrating an example of the configuration of a modification of the information processing apparatus according to the first embodiment.
- FIG. 9 is a block diagram illustrating an exemplary configuration of a modification of the information processing apparatus according to the first embodiment.
- FIG. 1 is a block diagram showing an example of the configuration of the information processing apparatus 10 according to the first embodiment of the present invention.
- the direction of the arrow in the drawing shows an example, and does not limit the direction of the signal between the blocks.
- the information processing apparatus 10 includes a data processing unit 100, a tree structure holding unit 110, a similarity holding unit 120, and an intermediate result holding unit 130.
- Here, "holding" means storing or saving.
- the tree structure holding unit 110 holds data (hereinafter referred to as a tree structure) constructed as a tree structure that is a processing target of the information processing apparatus 10.
- the tree structure includes nodes (eg, root nodes and leaf nodes) and edges.
- Each node includes data (for example, representative data).
- Representative data is the data that represents a node among the data included in that node.
- The similarity holding unit 120 holds the similarity (for example, a similarity radius) associated with (given to) each edge.
- the tree structure holding unit 110 and the similarity holding unit 120 may hold the data described above in advance before the operation in the data processing unit 100 described later.
- the user of the information processing apparatus 10 may store the data in each holding unit prior to processing.
- the intermediate result holding unit 130 holds a result (hereinafter referred to as “intermediate result”) generated based on grouping processing of each unit of the data processing unit 100 described later.
- The data processing unit 100 searches the tree structure held by the tree structure holding unit 110. The data processing unit 100 then executes grouping of the tree structure using the tree structure held by the tree structure holding unit 110, the received grouping threshold 140 (described later), and the similarity held by the similarity holding unit 120.
- the data processing unit 100 causes the intermediate result holding unit 130 to hold the grouping processing result (intermediate result). Then, the data processing unit 100 outputs the intermediate result held by the intermediate result holding unit 130 as the grouping result 150 after the grouping process ends.
- The data processing unit 100 includes a tree search unit 102, a grouping determination unit 103, a subtree grouping unit 104, a leaf node grouping unit 105, an inter-group merging unit 106, and a representative data merging unit 107. Further, the data processing unit 100 includes a grouping threshold receiving unit 101 and a grouping result output unit 108. Each component included in the data processing unit 100 uses the data held in the tree structure holding unit 110, the similarity holding unit 120, and the intermediate result holding unit 130 as necessary, and causes the intermediate result holding unit 130 to hold data as necessary. In the following description, for convenience, the operations by which each component stores data in and reads data from the holding units may be omitted.
- the grouping threshold receiving unit 101 receives the grouping threshold 140 from an external device (not shown).
- The grouping threshold receiving unit 101 may receive the grouping threshold 140 from a device operated by the user.
- The grouping threshold receiving unit 101 passes the grouping threshold 140 to the grouping determination unit 103.
- The grouping threshold receiving unit 101 may transmit the grouping threshold 140 in response to a request from the grouping determination unit 103.
- the grouping threshold receiving unit 101 may store the grouping threshold 140 in a storage unit (not shown).
- the grouping determination unit 103 may read the grouping threshold value 140 from the storage unit.
- The tree search unit 102 traverses the tree structure (tree) held in the tree structure holding unit 110 according to its structure. The tree search unit 102 then requests each unit described later to perform processing based on the node, data, or edge currently being traversed, that is, being searched (hereinafter collectively referred to as the "search target").
- The grouping determination unit 103 compares the similarity (for example, the similarity radius) associated with the edge of the tree structure that the tree search unit 102 is currently traversing (the search target edge) with the grouping threshold 140. Based on the comparison result, the grouping determination unit 103 determines whether the nodes associated with the search target edge can be grouped. That is, the grouping threshold 140 is the threshold used to determine whether grouping is possible.
- the subtree grouping unit 104 groups subtrees that can be grouped. That is, the subtree grouping unit 104 creates a group using a subtree that can be grouped.
- A subtree that can be grouped is a subtree associated with an edge that the grouping determination unit 103 has determined to have a similarity equal to or greater than the grouping threshold 140.
- The leaf node grouping unit 105 groups the data in a leaf node. That is, the leaf node grouping unit 105 creates groups using the data of the leaf node.
- the inter-group merging unit 106 merges the groups that can be merged into one group. That is, the inter-group merging unit 106 edits a group based on the created group.
- the representative data merging unit 107 merges the representative data into the belonging group. That is, the representative data merging unit 107 edits the group based on the created group and the representative data.
- the grouping result output unit 108 reads the intermediate result held in the intermediate result holding unit 130 and outputs it as a grouping result 150. For example, the grouping result output unit 108 transmits the grouping result 150 to a device operated by the user. Alternatively, the grouping result output unit 108 may display the grouping result 150 on a display device (not shown).
- FIG. 2 is a flowchart showing an example of the operation of the information processing apparatus 10 according to the present embodiment.
- the grouping threshold receiving unit 101 receives the grouping threshold 140.
- the tree search unit 102 starts searching from the root node in the tree structure held in the tree structure holding unit 110 (step A201).
- the tree search unit 102 determines whether or not the current node is a leaf node (step A202).
- the tree search unit 102 determines whether there is uninspected representative data in the node (Step A203).
- the tree search unit 102 selects one representative data from the uninspected representative data. Then, the tree search unit 102 requests the grouping determination unit 103 to perform processing.
- The grouping determination unit 103 refers to the similarity held by the similarity holding unit 120 and determines (inspects) whether or not the similarity associated with the edge of the selected representative data is greater than or equal to the grouping threshold 140 (step A204). The grouping determination unit 103 returns the determination result to the tree search unit 102.
- When the similarity is equal to or greater than the grouping threshold 140 (Yes in step A204), the tree search unit 102 requests the subtree grouping unit 104 to perform processing.
- the subtree grouping unit 104 groups representative data and data of nodes below the representative data (creates a group). Then, the subtree grouping unit 104 outputs the result (group information) to the intermediate result holding unit 130 (step A205). Then, the operation of the information processing apparatus 10 returns to Step A203.
- a subtree including representative data and data of nodes below the representative data is referred to as “subtree below the representative data”.
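- When the similarity is less than the grouping threshold 140 (No in step A204), the tree search unit 102 moves the search target to the child node beyond the edge (step A206). Then, the tree search unit 102 returns to step A202 and repeats the same operation.
- When the current node is a leaf node (Yes in step A202), the tree search unit 102 requests the leaf node grouping unit 105 to perform processing.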
- the leaf node grouping unit 105 groups (creates a group) data in the leaf node.
- the leaf node grouping unit 105 holds the result (group information) in the intermediate result holding unit 130 (step A207).
- the leaf node grouping unit 105 groups the leaf nodes into one or a plurality of groups. That is, the leaf node grouping unit 105 creates one or a plurality of groups based on the grouping of leaf nodes.
- The tree search unit 102 determines whether or not the current node (in this case, the leaf node) is the root node (step A208).
- If it is not the root node (No in step A208), the search by the tree search unit 102 backtracks to the upper node (parent node) (step A209).
- the tree search unit 102 requests the representative data merging unit 107 to perform processing.
- The representative data merging unit 107 merges the representative data into the most suitable group among the groups created from the nodes (subtree) below the representative data (step A210). Details of this operation will be described later. If there is no group to which the representative data belongs, the representative data merging unit 107 creates a new group based on the representative data.
- the representative data merging unit 107 holds the processing result (group information) in the intermediate result holding unit 130. Then, the information processing apparatus 10 returns to step A203.
- the tree search unit 102 requests the intergroup merging unit 106 to perform processing.
- The inter-group merging unit 106 merges, within the grouping result created so far, the groups that can be merged into one group (step A211). Details of this operation will be described later.
- the inter-group merging unit 106 causes the intermediate result holding unit 130 to hold the processing result.
- Then, the information processing apparatus 10 proceeds to step A208 and executes the operation already described.
- the information processing apparatus 10 repeats the operation described above until the search target of the tree search unit 102 returns to the root node. If the tree search unit 102 determines that the node being searched is a root node (Yes in step A208), the tree search unit 102 requests the grouping result output unit 108 to perform processing. The grouping result output unit 108 outputs the grouping result 150 (step A211).
- the information processing apparatus 10 ends the grouping operation.
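The flow of steps A201 to A211 can be sketched as a single depth-first pass. This is a simplified illustration under stated assumptions, not the embodiment itself: data are plain numbers, the `similarity` function is a toy, a leaf node is a list of values, an internal node is a list of `(representative, edge similarity, child)` tuples, and the first member of a group stands in for the group's representative. All function names are hypothetical, and the mapping of lines to step numbers is an interpretation of the flowchart description.

```python
def similarity(a, b):
    """Toy similarity in (0, 1]; identical values give 1.0 (an assumption)."""
    return 1.0 / (1.0 + abs(a - b))


def is_leaf(node):
    # Internal nodes hold (representative, edge similarity, child) tuples.
    return all(not isinstance(entry, tuple) for entry in node)


def collect(node):
    """Return every data value in the subtree rooted at `node`."""
    if is_leaf(node):
        return list(node)
    values = []
    for rep, _edge_sim, child in node:
        values.append(rep)
        values.extend(collect(child))
    return values


def group_leaf(data, threshold):
    """Greedy stand-in for leaf grouping (step A207)."""
    groups = []
    for value in data:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


def merge_representative(rep, groups, height, threshold):
    """Merge `rep` into the best group created below it (step A210)."""
    candidates = groups[height:]
    best = max(candidates, key=lambda g: similarity(rep, g[0]), default=None)
    if best is not None and similarity(rep, best[0]) >= threshold:
        best.append(rep)
    else:
        groups.append([rep])  # no group to belong to: create a new one


def merge_groups(groups, height, threshold):
    """Merge groups above `height` whose first members are similar (step A211)."""
    i = height
    while i < len(groups):
        j = i + 1
        while j < len(groups):
            if similarity(groups[i][0], groups[j][0]) >= threshold:
                groups[i].extend(groups.pop(j))
            else:
                j += 1
        i += 1


def group_tree(node, threshold, groups=None):
    """Single depth-first pass; `groups` grows like a stack of intermediate results."""
    if groups is None:
        groups = []
    if is_leaf(node):                                   # step A202
        groups.extend(group_leaf(node, threshold))      # step A207
        return groups
    node_height = len(groups)
    for rep, edge_sim, child in node:                   # step A203
        height = len(groups)
        if edge_sim >= threshold:                       # step A204
            groups.append([rep] + collect(child))       # step A205
        else:
            group_tree(child, threshold, groups)        # steps A206 and A209
            merge_representative(rep, groups, height, threshold)
    merge_groups(groups, node_height, threshold)        # step A211
    return groups


leaf_d = [1.0, 1.1]
leaf_g = [5.0, 5.1, 9.0]
node_c = [(1.05, 0.9, leaf_d), (5.05, 0.2, leaf_g)]
root = [(3.0, 0.1, node_c)]
result = group_tree(root, threshold=0.8)
```

In this example the subtree below 1.05 is grouped without being descended into, the leaf `leaf_g` is split into two groups, and the representative 5.05 joins the group of 5.0 and 5.1 on the way back up.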
- the information processing apparatus 10 can obtain an effect that data grouping can be realized without restricting target data.
- the information processing apparatus 10 does not require restrictions on data to be processed in its operation.
- the information processing apparatus 10 can obtain an effect that the calculation time can be reduced and the grouping calculation can be executed.
- the grouping determination unit 103 determines grouping, and each unit executes grouping processing as described below.
- the subtree grouping unit 104 groups subtrees (representative data and data of nodes lower than the representative data) below the representative data whose similarity associated with the edge is the grouping threshold 140 or more.
- the information processing apparatus 10 can eliminate processing at a node (child node) below the edge. This is the first reason.
- the leaf node grouping unit 105 groups the leaf nodes.
- the representative data merging unit 107 merges the representative data into an appropriate group.
- The inter-group merging unit 106 merges the groups that can be merged in the grouping result.
- a general information processing apparatus that searches a tree structure searches a sub-tree structure of at least a part of the tree structure a plurality of times.
- the information processing apparatus 10 can realize the grouping operation based on a single tree structure search in the tree search unit 102. This is the second reason.
- the calculation of grouping in the leaf node grouping unit 105 occurs when the search of the tree search unit 102 proceeds to the leaf node.
- The leaf nodes subject to this calculation are only some of the leaf nodes.
- the calculation at the leaf node is calculation using data included in the leaf node. Therefore, the calculation amount for the leaf node in this embodiment is not a large calculation amount.
- FIG. 3 is a diagram showing a tree structure of data used for explaining the operation of the first embodiment.
- the tree structure shown in FIG. 3 is a hierarchical structure in which data is created based on similarity using, for example, the technique described in Non-Patent Document 1.
- a rectangle shown in FIG. 3 indicates a node.
- Each node (node A to node M) includes n items of representative data (for example, the representative data of node A are A_1 to A_n).
- In FIG. 3, all the nodes include n items of representative data, but this is only an example. In the present embodiment, the number of data items included in a node may differ from node to node.
- the edge is indicated by an arrow from the data of the parent node to the child node.
- the tree structure holding unit 110 holds the tree structure shown in FIG. 3 in advance.
- the similarity holding unit 120 holds the similarity shown in FIG. 3 in advance in association with the tree structure.
- the tree structure holding method in the tree structure holding unit 110 of this embodiment is not particularly limited.
- For example, a method can be assumed in which a node ID (for example, A to M shown in FIG. 3), the data belonging to the node, and information indicating which subtree each item of data represents are held in a tabular format or in an object format.
- The method for holding the similarity (for example, the similarity radius) in the similarity holding unit 120 of the present embodiment is not particularly limited. Since the similarity holding method depends on the tree structure holding method, it may be determined based on the tree structure. However, the holding method must allow each edge to be associated with the similarity given to that edge. Possible methods include one in which the node that refers to a subtree holds the similarity in association with the representative data, and one in which the root node of the subtree holds the similarity in association with the edge. Alternatively, an edge object that connects the representative data and the root node of the subtree may hold the similarity internally, or the similarity may be held in association with an edge ID.
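One of the holding methods above, in which the referring node holds the similarity in association with the representative data, might look like the following object-format sketch. The class names `Entry` and `Node`, their fields, and the sample values are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """One item of data in a node (hypothetical name).

    For representative data, `child` is the root node of the subtree it
    represents and `similarity` is the value associated with that edge.
    """
    value: str
    child: Optional["Node"] = None
    similarity: Optional[float] = None


@dataclass
class Node:
    node_id: str
    entries: List[Entry] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf node holds only data that has no edge to a lower node.
        return all(entry.child is None for entry in self.entries)


# Node A holds representative data A1; its edge to node B carries 0.4.
node_b = Node("B", [Entry("B1"), Entry("B2")])
node_a = Node("A", [Entry("A1", child=node_b, similarity=0.4)])
```

Keeping the similarity next to the representative data means the tree and the similarities are read together; holding them by edge ID in a separate table would instead keep the two holding units independent.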
- the method for holding intermediate results in this embodiment is not particularly limited.
- the intermediate result holding method is preferably a data structure like a stack. This is for the following reason.
- a data structure such as a stack is a data structure in which the last input data is output first (LIFO: Last In, First Out).
- a data structure like a stack is hereinafter simply referred to as a “stack”. The amount of data stored in the stack is called the stack height.
- In extracting the group candidates to which representative data may belong, and in extracting the groups to be merged in inter-group merging, the information processing apparatus 10 needs to extract the intermediate results (group information) generated for the nodes below the target node (current node).
- When the intermediate result holding unit 130 holds the intermediate results in a stack (LIFO) data structure, the information processing apparatus 10 can realize this extraction simply by checking the data stacked on the stack as the search of the tree structure proceeds.
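The benefit of a stack can be illustrated with a minimal sketch: if the stack height is recorded when a node is first visited, every group pushed afterwards must come from the nodes below it. The class `IntermediateResults` and its method names are hypothetical, and the group contents are placeholder labels.

```python
from typing import List


class IntermediateResults:
    """Hypothetical LIFO holder for group information."""

    def __init__(self) -> None:
        self._stack: List[List[str]] = []

    def push(self, group: List[str]) -> None:
        self._stack.append(group)

    def height(self) -> int:
        return len(self._stack)

    def above(self, height: int) -> List[List[str]]:
        """Return the groups pushed after the stack had the given height."""
        return self._stack[height:]


results = IntermediateResults()
results.push(["C1", "D1", "D2"])          # group created before descending
height_on_first_visit = results.height()  # recorded on the way down
results.push(["G1", "G2"])                # groups created in lower nodes
results.push(["G3"])
lower_groups = results.above(height_on_first_visit)
```

No search over all intermediate results is needed; a single slice by height yields the groups generated for the lower nodes.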
- The information processing apparatus 10 receives the threshold δ_q as the grouping threshold 140. That is, the information processing apparatus 10 uses the threshold δ_q as the reference for grouping data, and creates (extracts) groups consisting of data whose similarity is equal to or greater than the threshold δ_q.
- the operation of grouping data based on the search of the tree structure in the information processing apparatus 10 according to the present embodiment is generally the operation described below.
- The tree search unit 102 starts searching the tree structure from the root node (step A201).
- The root node is node A.
- Node A includes n items of representative data, A_1 to A_n.
- The tree search unit 102 determines whether or not node A is a leaf node (step A202).
- The tree search unit 102 determines whether there is uninspected representative data (step A203).
- The tree search unit 102 selects the representative data A_1 (Yes in step A203).
- The grouping determination unit 103 compares the threshold δ_q with the similarity δ_1 associated with the edge connecting the representative data A_1 and node B, which includes the representative data B_1 to B_n (step A204).
- When the similarity is less than the threshold (No in step A204), the tree search unit 102 moves the search target to the child node beyond the edge (in this case, node B) (step A206). Here, the similarities are assumed to satisfy δ_1 < δ_q. Therefore, the tree search unit 102 sets node B as the search target. By repeating this operation, the tree search unit 102 searches for an edge associated with a similarity of δ_q or more.
- the subtree grouping unit 104 creates a subtree group (step A205).
- FIG. 4 is a diagram showing an example of subtree grouping.
- The similarity δ_3 associated with the edge between the data C_1 of node C and node D is greater than or equal to the threshold δ_q (δ_3 ≥ δ_q).
- The grouping determination unit 103 determines that the similarity associated with this edge is equal to or greater than the threshold δ_q (Yes in step A204). That is, in this case, the data below the data C_1 has a similarity of the threshold δ_q or more with respect to the data C_1.
- The subtree grouping unit 104 groups the subtree below the representative data C_1 (group 1 in FIG. 4). Then, the subtree grouping unit 104 outputs the created group to the intermediate result holding unit 130 as an intermediate result (step A205).
- The nodes included in the subtree below the representative data C_1 are grouped as group 1. Therefore, the information processing apparatus 10 does not need to search from the representative data C_1 down to the leaf nodes (or the data included in those nodes). The tree search unit 102 then selects the next representative data, C_2, ..., C_n, in turn. For each, the grouping determination unit 103 similarly determines whether its edge has a similarity equal to or greater than the threshold δ_q.
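The subtree grouping of step A205 can be sketched as follows: when the edge similarity reaches the threshold, the representative data and everything below it are collected into one group without any further similarity checks. The dictionaries `children` and `edge_similarity`, the data labels, and the numeric values are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical tree: each representative maps to the data of its child node.
children: Dict[str, List[str]] = {
    "C1": ["D1", "D2"],
    "D1": ["E1", "E2"],
    "D2": ["F1"],
}
# Hypothetical similarity associated with the edge below each representative.
edge_similarity: Dict[str, float] = {"C1": 0.9, "D1": 0.95, "D2": 0.92}


def subtree_data(representative: str) -> List[str]:
    """Return the representative and all data in the nodes below it."""
    result = [representative]
    for item in children.get(representative, []):
        result.extend(subtree_data(item))
    return result


def group_if_similar(representative: str, threshold: float) -> Optional[List[str]]:
    """Group the whole subtree when the edge similarity reaches the threshold."""
    if edge_similarity[representative] >= threshold:
        return subtree_data(representative)
    return None  # the search has to descend into the child node instead


group_1 = group_if_similar("C1", threshold=0.8)
```

With a stricter threshold such as 0.95 the same call returns `None`, which corresponds to the branch in which the search moves on to the child node.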
- When the search reaches a leaf node, the leaf node grouping unit 105 groups the data in the leaf node (step A207).
- FIG. 5 is a diagram illustrating an example of grouping leaf nodes.
- For example, the leaf node grouping unit 105 groups the data of the leaf node (node G) into two groups (group 2 and group 3) so that each group satisfies the threshold δ_q.
- the processing of the leaf node grouping unit 105 is not particularly limited. However, in order to group the leaf nodes, the leaf node grouping unit 105 needs to examine the data of the leaf nodes in detail. Thus, for example, the leaf node grouping unit 105 may group leaf nodes using the method described in Patent Document 1.
- The data to be grouped here is a sufficiently small number of items, bounded by the capacity of a node. Therefore, even when the calculation by the leaf node grouping unit 105 requires on the order of O(n^2) for n items, the total calculation amount of the information processing apparatus 10 does not become large and remains one that can be processed at sufficiently high speed.
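Since the embodiment does not fix the leaf grouping procedure, the following is only one possible stand-in: a greedy pass that places each item in the first group all of whose members are similar enough, which costs at most O(n^2) similarity evaluations for the n items of one leaf node. The `similarity` function and the sample values are assumptions made for illustration.

```python
from typing import Callable, List


def similarity(a: float, b: float) -> float:
    """Toy similarity in (0, 1]; identical values give 1.0 (an assumption)."""
    return 1.0 / (1.0 + abs(a - b))


def group_leaf(data: List[float], threshold: float,
               sim: Callable[[float, float], float]) -> List[List[float]]:
    """Greedily split the data of one leaf node into groups."""
    groups: List[List[float]] = []
    for item in data:
        for group in groups:
            # Join the first group whose every member is similar enough.
            if all(sim(item, member) >= threshold for member in group):
                group.append(item)
                break
        else:
            groups.append([item])  # no suitable group: start a new one
    return groups


leaf_groups = group_leaf([5.0, 5.1, 9.0, 9.2], 0.8, similarity)
```

Here the four items of the leaf end up in two groups, comparable to group 2 and group 3 in FIG. 5.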
- the grouping process in the leaf node of the leaf node grouping unit 105 is not particularly limited. Various variations can be assumed for the grouping process of leaf nodes according to the present embodiment. For example, when the predetermined number of data is not included, the leaf node grouping unit 105 may not generate a group in the leaf node. Alternatively, the leaf node grouping unit 105 may exclude the leaf node from the grouping target when the leaf node does not include a predetermined number of data.
- After grouping a subtree whose edge has a similarity equal to or greater than the threshold δ_q, or after grouping a leaf node, the tree search unit 102 backtracks to a higher node in the tree structure (step A209).
- The representative data merging unit 107 merges representative data whose group has not yet been determined (step A210). Further, the inter-group merging unit 106 performs merging between groups (step A211).
- FIG. 6 is a diagram illustrating an example of merging representative data. The merging of the representative data C_n is described with reference to FIG. 6.
- The representative data C_n represents the data in the nodes below the representative data C_n. Therefore, the representative data merging unit 107 checks which, if any, of the groups generated from the nodes below the representative data C_n the representative data C_n belongs to.
- The representative data merging unit 107 can narrow down the group candidates by examining the stack height. That is, it is sufficient for the representative data merging unit 107 to select, as inspection candidates, the groups stacked above the stack height recorded when the representative data was first visited in pre-order (on the way down) in the search by the tree search unit 102.
- the representative data merging unit 107 compares the similarity between the representative data of the candidate group and the representative data (C n ) to be grouped.
- the representative data merging unit 107 may merge the group having the highest similarity and the representative data (C n ). However, the representative data merging unit 107 merges the representative data (C n ) into groups so that the similarity is equal to or greater than the threshold value ⁇ q .
- the group that already exists when the tree search unit 102 visits in the order of travel is group 1.
- the groups existing when visiting in the order of return are groups 1, 2, and 3. That is, the representative data merging unit 107, a group 2 composed of its difference, or group 3, it can be determined that the group generated by the lower node from the representative data C n. For this reason, the representative data merging unit 107 only needs to inspect which of the group 2 or the group 3 is more appropriate for the representative data C n .
- FIG. 6 shows that the representative data C n has been merged into the group 3.
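The candidate narrowing and merging of representative data can be sketched as follows. This is a sketch under stated assumptions, not the implementation of the representative data merging unit 107: the `Group` class, the function name `merge_representative`, and the parameter `height_on_entry` (the stack height recorded at the forward visit) are hypothetical, and the intermediate results are assumed to be held in a Python list used as a stack.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    representative: str
    members: List[str] = field(default_factory=list)


def merge_representative(stack: List[Group],
                         height_on_entry: int,
                         rep: str,
                         similarity: Callable[[str, str], float],
                         threshold: float) -> Group:
    """Attach `rep` to the most similar group created below it."""
    # Groups pushed after the forward visit were generated in lower nodes.
    candidates = stack[height_on_entry:]
    best = None
    best_sim = threshold  # a candidate must reach the threshold to qualify
    for group in candidates:
        s = similarity(group.representative, rep)
        if s >= best_sim:
            best, best_sim = group, s
    if best is None:
        # Variation recited in the claims: when no group qualifies,
        # the representative data forms a group of its own.
        best = Group(representative=rep)
        stack.append(best)
    best.members.append(rep)
    return best
```

With a stack holding groups 1, 2, and 3 and `height_on_entry` equal to 1, only groups 2 and 3 are inspected, which corresponds to the situation in FIG. 6.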
- The representative data merging unit 107 is not limited to using the data stacked on the stack.
- The representative data merging unit 107 may use data held in a structure other than a stack.
- As with the merging of representative data, the inter-group merging unit 106 can narrow down the group candidates that it examines for merging. That is, the inter-group merging unit 106 may select the groups within the stack height recorded when the node was visited in forward (pre-order) order during the tree search. The inter-group merging unit 106 may choose a merging method that suits the nature of the data to be merged. For example, the inter-group merging unit 106 may simply merge two groups when the similarity between their representative data is equal to or greater than a similarity that takes into account the similarity within each group.
- FIG. 7 is a diagram showing an example of group merging.
- The inter-group merging unit 106 compares these groups and checks whether they can be merged.
- The example shown in FIG. 7 indicates that group 1 and group 2 are merged to generate group 1-2.
- The inter-group merging unit 106 is not limited to using the data stacked on the stack.
- The inter-group merging unit 106 may use data held in a structure other than a stack.
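Merging between groups can be sketched in the same style. The text leaves the merging method open, so this sketch assumes the simplest rule it mentions: two groups are merged when the similarity between their representative data reaches a threshold. It also assumes that the candidates are the groups stacked at or above a recorded height. `Group`, `merge_groups`, and `height_on_entry` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    representative: str
    members: List[str] = field(default_factory=list)


def merge_groups(stack: List[Group],
                 height_on_entry: int,
                 similarity: Callable[[str, str], float],
                 threshold: float) -> None:
    """Merge, in place, candidate groups whose representatives are similar."""
    i = height_on_entry  # only groups at or above this index are examined
    while i < len(stack):
        j = i + 1
        while j < len(stack):
            if similarity(stack[i].representative,
                          stack[j].representative) >= threshold:
                stack[i].members.extend(stack[j].members)  # absorb group j
                del stack[j]  # index j now refers to the next group
            else:
                j += 1
        i += 1
```

The merged group keeps the representative of the earlier group. A real implementation might recompute the representative, which the text does not specify.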
- The operation timing of the inter-group merging unit 106 is not limited as long as it precedes the operation of the grouping result output unit 108.
- The inter-group merging unit 106 may merge the groups after the tree search unit 102 returns to the root node.
- The tree search unit 102 searches the tree structure until no unprocessed representative data remains in the root node.
- The grouping result output unit 108 outputs the grouping result 150 (step A211).
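The forward and return visits of the search can be sketched as a depth-first traversal that records the stack height on entry to a node and, on return, reads off the groups created below that node. This is a simplified illustration only: it assumes that every leaf node yields exactly one group and it omits the grouping determination, so `Node`, `candidates_per_node`, and the `group@...` labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def candidates_per_node(root: Node) -> List[Tuple[str, List[str]]]:
    """For each internal node, list the groups created in the nodes below it."""
    stack: List[str] = []  # intermediate results: one label per group
    report: List[Tuple[str, List[str]]] = []

    def visit(node: Node) -> None:
        height_on_entry = len(stack)  # forward (pre-order) visit
        if not node.children:
            stack.append(f"group@{node.name}")  # assumption: one group per leaf
            return
        for child in node.children:
            visit(child)
        # Return (post-order) visit: the groups above the recorded height
        # are exactly those generated below this node.
        report.append((node.name, stack[height_on_entry:]))

    visit(root)
    return report
```

For a root with a leaf `L1` and an internal node `inner` whose leaves are `L2` and `L3`, the groups reported for `inner` are those of `L2` and `L3` only, because the group of `L1` was already on the stack at the forward visit.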
- The information processing apparatus 10 described above is configured as follows.
- Each component of the information processing apparatus 10 may be configured with a hardware circuit.
- The information processing apparatus 10 may be configured using a plurality of apparatuses whose components are connected via a network or a bus (hereinafter collectively referred to as “network or the like”).
- FIG. 8 is a block diagram illustrating an example of the configuration of the information processing apparatus 11 according to a modification of the present embodiment.
- The direction of the arrows in the drawing is an example and does not limit the direction of signals between the blocks.
- The information processing apparatus 11 includes a tree search unit 102, a grouping determination unit 103, a subtree grouping unit 104, a leaf node grouping unit 105, an inter-group merging unit 106, and a representative data merging unit 107.
- Each component of the information processing apparatus 11 is connected to the grouping threshold receiving unit 101, the tree structure holding unit 110, the similarity holding unit 120, and the intermediate result holding unit 130, none of which is shown.
- Each component of the information processing apparatus 11 operates in the same manner as the corresponding component of the information processing apparatus 10.
- The grouping result output unit 108 may extract the grouping result 150 from the intermediate result holding unit 130 after the information processing apparatus 11 operates.
- The information processing apparatus 11 configured in this way can obtain the same effects as the information processing apparatus 10.
- Each component of the information processing apparatus 11 operates in the same manner as the corresponding component of the information processing apparatus 10 and can therefore perform grouping.
- The information processing apparatus 11 is the minimum configuration of the embodiment of the present invention.
- The information processing apparatus 10 may also implement a plurality of its components with a single piece of hardware.
- The information processing apparatus 10 may be realized as a computer device including a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory).
- The information processing apparatus 10 may be realized as a computer device that further includes an input/output connection circuit (IOC: Input/Output Circuit) and a network interface circuit (NIC: Network Interface Circuit).
- FIG. 9 is a block diagram illustrating an example of a configuration of an information processing apparatus 600 that is a second modification of the information processing apparatus 10.
- The information processing apparatus 600 includes a CPU 610, a ROM 620, a RAM 630, an internal storage device 640, an IOC 650, and a NIC 680, and constitutes a computer device.
- The CPU 610 reads a program from the ROM 620.
- The CPU 610 controls the RAM 630, the internal storage device 640, the IOC 650, and the NIC 680 based on the read program.
- The computer including the CPU 610 controls these components and realizes the functions of the units shown in FIG. 1. These units are the grouping threshold receiving unit 101, the tree search unit 102, the grouping determination unit 103, the subtree grouping unit 104, the leaf node grouping unit 105, the inter-group merging unit 106, the representative data merging unit 107, and the grouping result output unit 108.
- The CPU 610 may use the RAM 630 or the internal storage device 640 as temporary storage for the program when realizing each function.
- The CPU 610 may use a storage medium reading device (not shown) to read the program from the storage medium 700, which stores the program in a computer-readable manner. Alternatively, the CPU 610 may receive a program from an external device (not shown) via the NIC 680, store the program in the RAM 630, and operate based on the stored program.
- The ROM 620 stores programs executed by the CPU 610 and fixed data.
- The ROM 620 is, for example, a P-ROM (Programmable-ROM) or a flash ROM.
- The RAM 630 temporarily stores programs executed by the CPU 610 and data.
- The RAM 630 is, for example, a D-RAM (Dynamic-RAM).
- The internal storage device 640 stores data and programs that the information processing apparatus 600 retains over the long term. Further, the internal storage device 640 may operate as a temporary storage device for the CPU 610.
- The internal storage device 640 is, for example, a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), or a disk array device.
- The internal storage device 640 operates as the tree structure holding unit 110, the similarity holding unit 120, and the intermediate result holding unit 130.
- The ROM 620 and the internal storage device 640 are nonvolatile storage media.
- The RAM 630 is a volatile storage medium.
- The CPU 610 can operate based on a program stored in the ROM 620, the internal storage device 640, or the RAM 630. That is, the CPU 610 can operate using a nonvolatile storage medium or a volatile storage medium.
- The IOC 650 mediates data between the CPU 610, the input device 660, and the display device 670.
- The IOC 650 is, for example, an IO interface card or a USB (Universal Serial Bus) card.
- The input device 660 is a device that receives input instructions from an operator of the information processing apparatus 600.
- The input device 660 is, for example, a keyboard, a mouse, or a touch panel.
- The input device 660 may operate as the grouping threshold receiving unit 101.
- The display device 670 is a device that displays information to the operator of the information processing apparatus 600.
- The display device 670 is, for example, a liquid crystal display.
- The display device 670 may operate as the grouping result output unit 108.
- The NIC 680 relays data exchanged with an external device (not shown) via the network.
- The NIC 680 is, for example, a LAN (Local Area Network) card.
- The NIC 680 may operate as the grouping threshold receiving unit 101 or the grouping result output unit 108.
- The information processing apparatus 600 configured in this way can obtain the same effects as the information processing apparatus 10.
- The present invention can be applied to the summarization of data such as images, videos, and documents.
- 10 Information processing apparatus
- 11 Information processing apparatus
- 100 Data processing unit
- 101 Grouping threshold receiving unit
- 102 Tree search unit
- 103 Grouping determination unit
- 104 Subtree grouping unit
- 105 Leaf node grouping unit
- 106 Inter-group merging unit
- 107 Representative data merging unit
- 108 Grouping result output unit
- 110 Tree structure holding unit
- 120 Similarity holding unit
- 130 Intermediate result holding unit
- 140 Grouping threshold
- 150 Grouping result
- 600 Information processing apparatus
- 610 CPU
- 620 ROM
- 630 RAM
- 640 Internal storage device
- 650 IOC
- 660 Input device
- 670 Display device
- 680 NIC
- 700 Storage medium
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
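First, the configuration of the information processing apparatus 10 according to the first embodiment will be described.
FIG. 1 is a block diagram illustrating an example of the configuration of the information processing apparatus 10 according to the first embodiment of the present invention. The direction of the arrows in the drawing is an example and does not limit the direction of signals between the blocks.
Next, the operation of the present embodiment will be described with reference to the drawings.
Next, the effects of the present embodiment will be described.
Next, the detailed operation of the present embodiment will be described using a specific tree structure.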
Claims (7)
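- An information processing apparatus comprising:
search means for searching tree-structured data whose elements are nodes containing data;
grouping determination means for determining, based on a predetermined threshold and on a similarity associated with an edge between data contained in a node being searched by the search means and a lower node of that data, whether to group the data and the lower node;
subtree grouping means for grouping the data and the lower node that were determined to be grouped as a result of the determination, thereby creating a group;
leaf node grouping means for grouping, when the node being searched is a leaf node, the leaf node being searched, thereby creating one or more groups;
data merging means for merging, when the group to which data reached by backtracking to a higher node in the search by the search means belongs has not been determined, that data into one of the groups of the lower nodes of that data; and
group merging means for merging at least some of the groups.
- The information processing apparatus according to claim 1, further comprising:
similarity means for holding the similarity associated with the tree-structured data;
tree structure means for holding tree-structured data constructed so that the range of the similarity takes larger values toward the lower levels of the tree structure;
grouping threshold receiving means for receiving the threshold used in the determination by the grouping determination means;
intermediate result holding means for holding the created or merged groups as intermediate results; and
grouping result output means for outputting, as a grouping result, the groups held by the intermediate result holding means.
- The information processing apparatus according to claim 2, wherein the intermediate result holding means
holds the intermediate results using a stack structure.
- The information processing apparatus according to claim 3, wherein the representative data merging means or the group merging means
performs processing using the data stacked in the stack structure.
- The information processing apparatus according to any one of claims 1 to 4, wherein the representative data merging means
creates a group using the representative data when there is no group to which the representative data belongs.
- An information processing method comprising:
searching tree-structured data whose elements are nodes containing data;
determining, based on a predetermined threshold and on a similarity associated with an edge between data contained in a node being searched and a lower node of that data, whether to group the data and the lower node;
grouping the data and the lower node that were determined to be grouped as a result of the determination, thereby creating a group;
grouping, when the node being searched is a leaf node, the leaf node being searched, thereby creating one or more groups;
merging, when the group to which data reached by backtracking to a higher node in the search belongs has not been determined, that data into one of the groups of the lower nodes of that data; and
merging at least some of the groups.
- A computer-readable recording medium recording a program that causes a computer to execute:
a process of searching tree-structured data whose elements are nodes containing data;
a process of determining, based on a predetermined threshold and on a similarity associated with an edge between data contained in a node being searched and a lower node of that data, whether to group the data and the lower node;
a process of grouping the data and the lower node that were determined to be grouped as a result of the determination, thereby creating a group;
a process of grouping, when the node being searched is a leaf node, the leaf node being searched, thereby creating one or more groups;
a process of merging, when the group to which data reached by backtracking to a higher node in the search belongs has not been determined, that data into one of the groups of the lower nodes of that data; and
a process of merging at least some of the groups.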
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/517,735 US10482075B2 (en) | 2014-10-14 | 2015-10-09 | Information processing device, information processing method, and recording medium |
JP2016553969A JP6624062B2 (ja) | 2014-10-14 | 2015-10-09 | Information processing device, information processing method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014209936 | 2014-10-14 | ||
JP2014-209936 | 2014-10-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016059787A1 (ja) | 2016-04-21 |
Family
ID=55746347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/005148 WO2016059787A1 (ja) | 2014-10-14 | 2015-10-09 | Information processing device, information processing method, and recording medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US10482075B2 (ja) |
JP (1) | JP6624062B2 (ja) |
WO (1) | WO2016059787A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114281802A (zh) * | 2021-12-27 | 2022-04-05 | 中国建设银行股份有限公司 | Data processing method and apparatus |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10148499B2 (en) * | 2016-11-09 | 2018-12-04 | Seagate Technology Llc | Verifying distributed computing results via nodes configured according to a tree structure |
CN112463857B (zh) * | 2020-03-27 | 2023-07-25 | 谭凌 | Data processing method and system supporting backtracking data queries based on a relational database |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010245983A (ja) * | 2009-04-09 | 2010-10-28 | Nippon Telegr & Teleph Corp <Ntt> | Video structuring device, video structuring method, and video structuring program |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000112988A (ja) | 1998-10-05 | 2000-04-21 | Canon Inc | Information retrieval apparatus, information retrieval method, and storage medium |
US6741983B1 (en) * | 1999-09-28 | 2004-05-25 | John D. Birdwell | Method of indexed storage and retrieval of multidimensional information |
US7340674B2 (en) * | 2002-12-16 | 2008-03-04 | Xerox Corporation | Method and apparatus for normalizing quoting styles in electronic mail messages |
FI119160B (fi) * | 2005-10-10 | 2008-08-15 | Medicel Oy | Database management system |
US7752233B2 (en) * | 2006-03-29 | 2010-07-06 | Massachusetts Institute Of Technology | Techniques for clustering a set of objects |
US20080228699A1 (en) * | 2007-03-16 | 2008-09-18 | Expanse Networks, Inc. | Creation of Attribute Combination Databases |
US20100106713A1 (en) * | 2008-10-28 | 2010-04-29 | Andrea Esuli | Method for performing efficient similarity search |
JP4839424B2 (ja) * | 2008-12-15 | 2011-12-21 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Method for supporting program analysis, and computer program and computer system therefor |
CN101576932B (zh) * | 2009-06-16 | 2012-07-04 | 阿里巴巴集团控股有限公司 | Computer search method and apparatus for near-duplicate images |
US8438184B1 (en) * | 2012-07-30 | 2013-05-07 | Adelphic, Inc. | Uniquely identifying a network-connected entity |
US9536065B2 (en) * | 2013-08-23 | 2017-01-03 | Morphotrust Usa, Llc | System and method for identity management |
US9407620B2 (en) * | 2013-08-23 | 2016-08-02 | Morphotrust Usa, Llc | System and method for identity management |
2015
- 2015-10-09 WO PCT/JP2015/005148 patent/WO2016059787A1/ja active Application Filing
- 2015-10-09 US US15/517,735 patent/US10482075B2/en active Active
- 2015-10-09 JP JP2016553969A patent/JP6624062B2/ja active Active
Non-Patent Citations (1)
Title |
---|
JIANQUAN LIU ET AL.: "Ruijido no Kaiso Kankei ni Motozuku Kikozo Sakuin o Mochiita Koritsuteki na Ruiji Kensaku", DAI 5 KAI FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT RONBUNSHU, 31 May 2013 (2013-05-31), pages 1 - 8 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114281802A (zh) * | 2021-12-27 | 2022-04-05 | 中国建设银行股份有限公司 | Data processing method and apparatus |
CN114281802B (zh) * | 2021-12-27 | 2024-05-28 | 中国建设银行股份有限公司 | Data processing method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
US20170329809A1 (en) | 2017-11-16 |
JP6624062B2 (ja) | 2019-12-25 |
JPWO2016059787A1 (ja) | 2017-07-27 |
US10482075B2 (en) | 2019-11-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15850959; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2016553969; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 15517735; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 15850959; Country of ref document: EP; Kind code of ref document: A1 |