WO2011001806A1 - Graph similarity calculation system, method and program - Google Patents
- Publication number
- WO2011001806A1 (application PCT/JP2010/059795, JP2010059795W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- value
- label
- graph
- graphs
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/196—Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
- G06V30/1983—Syntactic or structural pattern recognition, e.g. symbolic string recognition
- G06V30/1988—Graph matching
Definitions
- The present invention relates to a technique for calculating or evaluating, on a computer, the similarity of objects whose data structure is represented as a graph.
- A graph is a mathematical object consisting of vertices (also called nodes), which carry labels that distinguish them from one another, and edges (also called branches or links) that connect the vertices. Many real-world objects, such as road maps and chemical formulas, can be represented as graphs.
- In a road map, an intersection can be regarded as a node and a road as an edge.
- In a chemical formula, elements can be regarded as nodes and the bonds between elements as edges. Seen this way, graphs apply to a very wide range of fields, including genes, protein structures, electrical circuits, geography, and architecture.
- The state of an SNS at a given moment can be expressed as a graph by regarding each user of the SNS as a node and each friendship between users as an edge.
- The WWW link structure can also be represented by a graph.
- Once real objects are expressed as graphs, it is natural to want to evaluate whether two graphs match or are similar. For example, if the chemical-formula graph of one chemical is judged similar to that of another, it can be inferred that the two chemicals have similar efficacy.
- Japanese Patent Application Laid-Open No. 7-334366 describes a hash table that stores the hash values of all subgraphs of a graph S, together with storing the set of reduction-destination subgraphs for subgraphs that have been reached before.
- Because this technique assigns hash values recursively, it can be applied to directed acyclic graphs but not to more general graphs that contain loops.
- US Pat. No. 6,473,881 discloses a technique in which a transistor-level design automation tool performs circuit design pattern matching through timing analysis, electrical rule checking, noise analysis, and the like.
- This technique relies on circuit-specific properties such as key nodes and is difficult to extend to general graph comparison.
- An object of the present invention is to provide a graph comparison technique that can obtain, in a reasonable computation time, the similarity between graphs with an extremely large number of nodes, such as SNS networks and WWW link structures.
- The graph data to be compared is expressed in a known graph data structure, such as a matrix or list representation, and stored on a storage device such as a computer's hard disk.
- Each node in the graph has its own label, and labels are assumed to take discrete values.
- For example, the labels may be of four types: adenine, thymine, guanine, and cytosine (the bases of DNA).
- In other cases the labels are amino acids such as glycine, tryptophan, and isoleucine.
- For chemical elements, the labels are hydrogen, helium, lithium, beryllium, boron, carbon, nitrogen, oxygen, and so on, about 100 types at most.
- A unique value is assigned to each label of the nodes of the graph.
- This value is a fixed-length bit string.
- The length of the bit string is chosen to be sufficiently larger than the number of bits needed to distinguish the label types, which reduces the likelihood of the hash collisions described later.
- For each graph, the system of the present invention visits the nodes in sequence using an existing graph search technique such as depth-first search or breadth-first search.
- At each node it visits, the system computes a bit-string value from the bit-string label values of all nodes adjacent to that node.
- The system then performs a hash calculation on this computed value and the bit-string label value the node originally had, obtaining another bit-string label value, which becomes the new label value of the node.
- Once this has been done for both graphs, the similarity can be obtained, for example, as the ratio of the number of label values that match those of the other graph to the total number of nodes.
- A slightly more elaborate similarity calculation is described later.
- Any of several methods can be used to calculate a node's own label value from the label values of its adjacent nodes.
- One method XORs the label values of all adjacent nodes, XORs that result with the bit-rotated label value of the node itself, and uses the outcome as the node's new label value.
- Another method sorts the label values of all adjacent nodes. Wherever the same label value appears consecutively, the run is represented by a single label value, and the number of consecutive appearances (the count value) is added to it.
- Each resulting label value is then bit-rotated by its count value, all of the rotated values are XORed together, and that result is XORed with the bit-rotated label value of the node itself to give the node's new label value.
- More generally, the present invention can use various methods for calculating a node's new label value from the label values of all adjacent nodes and the label value of the node itself.
- The similarity between graphs is calculated from the label values obtained by using, as each node's new label value, the hash computed from that node's label value and the label values of its adjacent nodes.
- With N denoting the number of nodes in the graph, the graph similarity can be calculated quickly, with a computational cost on the order of $O(N^2)$ or less.
- Other known graph similarity techniques are of exponential order, or at least on the order of $O(N^3)$, so the present invention gives a large speed-up, particularly when N is large.
- FIG. 6 is a flowchart of the processing for calculating the label values of the nodes of a graph according to the present invention. Further figures show how the label values of the nodes of a graph change with each calculation step, and a conceptual flowchart of the process that calculates a new label value for a node from the node's own label and the label set of its adjacent nodes.
- FIG. 1 is a block diagram of the computer hardware for realizing the system configuration and processing according to an embodiment of the present invention.
- A CPU 104, a main memory (RAM) 106, a hard disk drive (HDD) 108, a keyboard 110, a mouse 112, and a display 114 are connected to the system bus 102.
- The CPU 104 is preferably based on a 32-bit or 64-bit architecture. For example, Intel Pentium (trademark) 4, Core (trademark) 2 Duo, Xeon (trademark), or AMD Athlon (trademark) can be used.
- The main memory 106 preferably has a capacity of 2 GB or more.
- The hard disk drive 108 preferably has a capacity of, for example, 320 GB or more so that a large amount of graph data can be stored.
- Although not individually illustrated, an operating system is stored on the hard disk drive 108 in advance.
- The operating system may be any one compatible with the CPU 104, such as Linux (trademark), Microsoft Windows XP (trademark), Windows (trademark) 2000, or Apple Computer Mac OS (trademark).
- The hard disk drive 108 also stores programming language processors for languages such as C, C++, C#, and Java (trademark). These are used to create and maintain the modules or tools for processing graph data, which are described later.
- The hard disk drive 108 may further hold a text editor for writing the source code to be compiled by the programming language processor, and a development environment such as Eclipse (trademark).
- The keyboard 110 and the mouse 112 are used to start programs (not shown) that have been loaded into the main memory 106 from the operating system or the hard disk drive 108 and are displayed on the display 114, and to enter characters.
- The display 114 is preferably a liquid crystal display, and an arbitrary resolution such as XGA (1024 × 768) or UXGA (1600 × 1200) can be used. Although not shown, the display 114 is used to display the graph data to be processed and the graph similarity.
- FIG. 2 is a functional block diagram of the processing modules according to the present invention. These modules are written in an existing programming language such as C, C++, C#, or Java (trademark) and stored on the hard disk drive 108 in an executable binary format. In response to operation of the mouse 112 or the keyboard 110, they are loaded into the main memory 106 and executed under the control of the operating system (not shown).
- the graph data creation module 202 converts a given graph into a computer readable data structure.
- The following data structure is used for a graph g with n nodes and an average of d adjacent nodes per node.
- g.nodelist: a list of length n representing the node list
- g.labellist: a list of length n representing the node label list
- g.labellistx: a list of length n that has the same data structure as g.labellist and is used as a buffer for writing labels
- g.adjacencymatrix: the adjacency matrix of the graph. Matrix element (i, j) is 1 if there is a link between node i and node j, and 0 otherwise. Its size is n × n, but with a sparse-matrix data structure that omits zero elements, n × d suffices.
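As a rough illustration of the data structure listed above, the following Python sketch mirrors the four fields. The class name `Graph`, the helper methods, and the choice of a dictionary of neighbor sets as the sparse adjacency representation are assumptions made for this example and are not taken from the patent text. Edges are treated as undirected, which is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical container mirroring g.nodelist, g.labellist,
    g.labellistx and a sparse form of g.adjacencymatrix."""
    nodelist: list                      # node identifiers, length n
    labellist: list                     # current label value per node, length n
    labellistx: list = field(default_factory=list)   # write buffer for new labels
    adjacency: dict = field(default_factory=dict)    # node index -> set of neighbor indices

    def __post_init__(self):
        # The buffer has the same shape as labellist but is a separate list.
        if not self.labellistx:
            self.labellistx = list(self.labellist)

    def add_edge(self, i, j):
        # Only existing links are stored, so memory is about n * d entries
        # instead of the n * n entries of a dense matrix.
        self.adjacency.setdefault(i, set()).add(j)
        self.adjacency.setdefault(j, set()).add(i)

    def neighbors(self, i):
        # Indices of the nodes directly connected to node i.
        return sorted(self.adjacency.get(i, ()))


g = Graph(nodelist=["a", "b", "c"], labellist=[0b1000, 0b1110, 0b1100])
g.add_edge(0, 1)
g.add_edge(0, 2)
```

Storing neighbor sets rather than a dense matrix is one common way to obtain the n × d footprint mentioned above; a compressed sparse row matrix would serve equally well.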
- The label values assigned to the labels are common to the two graphs.
- Notation such as #0101 represents a binary number.
- The label value preferably has a fixed number of bits because, as described later, this is convenient for operations such as bit rotation, XOR, and radix sort.
- the configured graph data is loaded on the main memory 106 or stored in the hard disk 108. If the graph data is very large, the graph data may be first placed on the hard disk 108 and only the data necessary for the calculation may be loaded into the main memory.
- The graph search module 206 traverses the graph in sequence, visiting every node of one graph. At each node it refers to the node's adjacent nodes, calls the module 208 that performs the hash calculation with the adjacent nodes, and updates the node's label value.
- FIG. 3 is a diagram showing a flowchart of processing of the graph search module 206.
- In step 302, the graph search module 206 determines whether all nodes of the graph have been visited. In practice, this decision is based on whether the end of g.nodelist has been reached.
- If it is determined in step 302 that not all nodes of the graph have been visited, the graph search module 206 visits the next node according to g.nodelist in step 304. At the very start of the graph search, step 304 visits the head node.
- In step 306, the graph search module 206 calls the module 208 to calculate a label value by hashing, using information on the nodes adjacent to the node currently being visited.
- the adjacent node is a node directly connected to the node via an edge.
- Such an adjacency relationship can be examined by referring to a value recorded in g.adjacencymatrix.
- the label value of the node and the label value of the adjacent node are used. These label values are obtained by referring to g.labellist. The calculation of the label value will be described in more detail later with reference to the flowcharts of FIGS.
- In step 308, the graph search module 206 updates the label value of the node with the calculated label value.
- g.labellist may be overwritten directly, but it is preferable to write the updated label to g.labellistx instead, because overwriting g.labellist directly makes the result depend on the order in which the nodes are visited.
- Steps 304, 306, and 308 are repeated until all nodes have been visited.
- the label value rewriting process by visiting such a graph is preferably performed a plurality of times as shown in FIG.
- Although repeating the process generally increases the accuracy of graph comparison, accuracy does not necessarily keep increasing with the number of repetitions, and there is an optimum number.
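The traversal and buffered update described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `relabel_round` and `relabel`, the list-of-lists neighbor representation, and the toy hash used in the usage lines are all assumptions for the example. The point it shows is that every new label is computed from the labels of the previous round, so the visiting order does not affect the result.

```python
def relabel_round(labels, neighbors, hash_fn):
    """One pass over all nodes; new labels go to a separate buffer
    (the role played by g.labellistx)."""
    buffer = list(labels)
    for node in range(len(labels)):
        # Always read neighbor labels from the old list, never from the buffer.
        neighbor_labels = [labels[j] for j in neighbors[node]]
        buffer[node] = hash_fn(labels[node], neighbor_labels)
    return buffer


def relabel(labels, neighbors, hash_fn, rounds):
    """Repeat the pass a fixed number of times."""
    for _ in range(rounds):
        labels = relabel_round(labels, neighbors, hash_fn)
    return labels


def toy_hash(this_label, neighbor_labels):
    # Stand-in hash for demonstration only: sum modulo 16.
    return (this_label + sum(neighbor_labels)) % 16


path_neighbors = [[1], [0, 2], [1]]      # path graph 0 - 1 - 2
start = [1, 2, 3]
after_one = relabel_round(start, path_neighbors, toy_hash)
after_two = relabel(start, path_neighbors, toy_hash, rounds=2)
```

With in-place overwriting, node 1 in this example would already see the updated label of node 0 and produce a different value, which is the order dependence the buffer avoids.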
- the graph similarity calculation module 210 calculates the similarity of two graphs based on the rewritten label value.
- The simplest method for calculating the similarity is to calculate the proportion of rewritten label values that coincide between the two graphs. A more elaborate graph similarity calculation is also described later.
- FIG. 5 is a schematic flowchart for explaining the processing of the hash calculation module 208 with the adjacent node in more detail.
- The own-node label 502 is the label value of the node currently being visited and is obtained from g.labellist. For convenience, it is written ThisNodeLabel.
- The set of labels 504 of the nodes adjacent to the node currently being visited is obtained from g.labellist by referring to the values recorded in g.adjacencymatrix. Since there are generally several of them, the set is written NeighboringNodeLabels[].
- The new label is then calculated as NewLabel = Hash(ThisNodeLabel, NeighboringNodeLabels[]).
- FIG. 6 is a diagram illustrating an embodiment of this hash calculation process. In the process of FIG. 6, to generate a new label 608 from the label 602 of the own node and the label set 604 of the adjacent nodes, the hash calculation block 606 comprises a block 610 that rotates the label 602 of the own node by 1 bit, a block 612 that XORs the label set 604 of the adjacent nodes, and a block 614 that XORs the output of block 610 with the output of block 612 to produce the new label 608.
- FIG. 8 shows a specific calculation example of the processing of FIG. 6.
- Suppose the label of the own node is #1000 and the labels of the adjacent nodes are #1110 and #1100.
- The output of block 612 is #0010, the XOR of #1110 and #1100.
- The output of block 610 is #0001, the 1-bit rotation of #1000.
- The output of block 614, which XORs these two values, is #0011. This becomes the new label of the own node.
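The rotate-and-XOR calculation above can be reproduced with a short Python sketch. The function names and the default 4-bit width are assumptions chosen to match the worked example; the rotation is taken to be a left rotation, since that is what turns #1000 into #0001.

```python
def rot_left(value, shift, width=4):
    """Rotate a width-bit value left by shift bits."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask


def simple_neighborhood_hash(this_label, neighbor_labels, width=4):
    """1-bit rotation of the own label, XORed with the XOR of all
    neighbor labels (blocks 610, 612 and 614 in the description)."""
    acc = 0
    for label in neighbor_labels:
        acc ^= label                      # block 612: XOR over neighbors
    return rot_left(this_label, 1, width) ^ acc   # blocks 610 and 614


new_label = simple_neighborhood_hash(0b1000, [0b1110, 0b1100])
print(format(new_label, "04b"))  # prints 0011
```

Because XOR is commutative, the result does not depend on the order in which the adjacent nodes are listed.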
- FIG. 7 is a diagram showing another embodiment of the same process. In the process of FIG. 7, to generate a new label 708 from the label 702 of the own node and the label set 704 of the adjacent nodes, the hash calculation block 706 comprises a block 710 that rotates the label 702 of the own node by 1 bit, a block 712 that sorts the label set 704 of the adjacent nodes, a block 714 that counts duplicates in the sorted output, a block 716 that adds the count value, a block 718 that rotates the bits by the count, a block 720 that XORs the bit-rotated values, and a block 722 that XORs the output of block 710 with the output of block 720 to produce the new label 708.
- Because the labels are fixed-width bit strings, the sort at block 712 can advantageously use a radix sort.
- FIG. 9 shows a specific calculation example of the processing of FIG. 7.
- Suppose the label of the own node is #1000 and the labels of the adjacent nodes are #0101, #1100, and #0101.
- The sorted output of block 712 is #0101, #0101, and #1100.
- The count output of block 714 is 2 for #0101 and 1 for #1100.
- Block 716 adds the count to the original label value: #0101 is incremented by 2 to become #0111, and #1100 is incremented by 1 to become #1101.
- Block 718 rotates each value by its count: #0111 rotated by 2 bits becomes #1101, and #1101 rotated by 1 bit becomes #1011.
- Block 720 calculates the XOR of the bit-rotated values #1101 and #1011 and outputs #0110.
- Block 710 outputs #0001, the 1-bit rotation of the own node's label #1000. Block 722 then calculates the XOR of #0110 from block 720 and #0001 from block 710, and the result #0111 becomes the new label of the own node.
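The sort, count, add, rotate, and XOR sequence of this second embodiment can be sketched in Python as below. The function names are assumptions, Python's built-in `sorted` stands in for the radix sort, and wrapping the addition to the label width is an assumption, since the text does not say what happens on overflow.

```python
from itertools import groupby


def rot_left(value, shift, width):
    """Rotate a width-bit value left by shift bits."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask


def count_sensitive_hash(this_label, neighbor_labels, width=4):
    mask = (1 << width) - 1
    acc = 0
    # Block 712: sort. Block 714: count each run of equal labels.
    for label, run in groupby(sorted(neighbor_labels)):
        count = len(list(run))
        bumped = (label + count) & mask          # block 716 (wrap-around assumed)
        acc ^= rot_left(bumped, count, width)    # blocks 718 and 720
    # Blocks 710 and 722: combine with the rotated own label.
    return rot_left(this_label, 1, width) ^ acc


new_label = count_sensitive_hash(0b1000, [0b0101, 0b1100, 0b0101])
print(format(new_label, "04b"))  # prints 0111
```

Unlike the plain XOR variant, two identical neighbor labels do not cancel each other out here, because the count changes both the added value and the rotation amount.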
- Alternatively, NeighboringNodeLabels[] may be sorted and concatenated into a single number, and the remainder obtained by dividing that number by a suitable prime P1 may be used as NewLabel.
- the graph data is stored.
- a binary label value having a predetermined number of bits corresponding to the label is given to the nodes of the graph by the method described above.
- $r_{\max}$ is the number of repetitions of the hash calculation. Depending on the case, $r_{\max}$ is chosen to be about 3 to 5.
- $I$ is an h × h unit matrix.
- $G_i^r = \mathrm{NH}(G_i^{r-1})$
- Here $G_i^r$ denotes the graph holding the label values after the r-th hash calculation, not $G_i$ raised to the power r.
- NH() is a function or subroutine that performs the processing of the flowchart of FIG. 3.
- The algorithm for the hash calculation with the adjacent nodes is not limited to any particular one, but here it is assumed to be, for example, the algorithm shown in FIG. 7.
- $V_i^r$ is the node list of $G_i^r$.
- In step 1016, $V_i^{\mathrm{SORT}} = \mathrm{RADIX\_SORT}(V_i^r)$ stores the components of $V_i^r$ in $V_i^{\mathrm{SORT}}$ in radix-sorted order based on the label values.
- In step 1018, i is incremented by 1, and the flow returns to step 1012. That is, steps 1014, 1016, and 1018 are repeated until i reaches h.
- If it is determined in step 1012 that i exceeds h, the process goes to step 1020, where $G^{r-1}$ is discarded.
- Here $G^{r-1}$ is a collective symbol for $G_1^{r-1}, \ldots, G_h^{r-1}$.
- In step 1022, i is set to 1, which starts the loop over i.
- In step 1032, $K_{ij}^r = K_{ji}^r = \mathrm{COMPARE\_LABELS}(G_i^r, G_j^r)$ is calculated. COMPARE_LABELS() is a function that compares the labels of the two graphs given as arguments and returns the comparison result as a real number. Its detailed processing is described later with reference to the flowchart of FIG. 11. The calculation uses $V_i^{\mathrm{SORT}}$ and $V_j^{\mathrm{SORT}}$ computed in step 1016.
- In step 1034, j is incremented by 1 and the process returns to step 1028. That is, steps 1030, 1032, and 1034 are repeated until j reaches h.
- When it is determined in step 1028 that j exceeds h, i is incremented by 1 in step 1036 and the process goes to step 1024. If it is determined in step 1024 that i exceeds h, r is incremented by 1 in step 1038 and the process returns to step 1006.
- the similarity matrix K is calculated by the following formula, and the process is completed.
- The ij component of K represents the similarity between graph $G_i^0$ and graph $G_j^0$.
- In step 1102, let $V_a^{\mathrm{SORT}}$ and $V_b^{\mathrm{SORT}}$ be the sorted node lists of the two graphs, and let $n_a$ and $n_b$ be the orders of $V_a^{\mathrm{SORT}}$ and $V_b^{\mathrm{SORT}}$, respectively.
- If it is determined in step 1106 that $i > n_a$ or $j > n_b$, the process goes to step 1120, where the similarity k is calculated by the following equation.
- In step 1122, the value of k thus calculated is returned. In practice, this value is used in step 1032, which is the caller of COMPARE_LABELS().
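The comparison of two sorted label lists can be sketched as a two-pointer merge, which is one natural reading of the indices i and j running up to $n_a$ and $n_b$. The equation for k is not reproduced in this text, so the normalization used below, matches divided by $n_a + n_b - c$, is purely an assumption for illustration, as are the function names and the return value for two empty lists.

```python
def count_matching_labels(sorted_a, sorted_b):
    """Count labels common to two sorted lists, one linear pass."""
    i = j = c = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] == sorted_b[j]:
            c += 1          # one matching pair consumed from each list
            i += 1
            j += 1
        elif sorted_a[i] < sorted_b[j]:
            i += 1          # advance the list holding the smaller label
        else:
            j += 1
    return c


def compare_labels(labels_a, labels_b):
    """Assumed normalization: c / (n_a + n_b - c)."""
    a, b = sorted(labels_a), sorted(labels_b)
    c = count_matching_labels(a, b)
    union = len(a) + len(b) - c
    return c / union if union else 1.0


k = compare_labels([3, 6, 5], [5, 3, 9])
```

Since each step advances at least one index, the comparison itself takes time linear in $n_a + n_b$ once the lists are sorted.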
- The present invention can also be used to calculate the similarity between two nodes, as follows. Let the target nodes be A and B.
- The proportion of matching labels between the updated labels obtained for A and those obtained for B can be calculated and used as the similarity of A and B.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Algebra (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
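g.labellist: a list of length n representing the node label list
g.labellistx: a list of length n that has the same data structure as g.labellist and is used as a buffer for writing labels
g.adjacencymatrix: the adjacency matrix of the graph. Matrix element (i, j) is 1 if there is a link between node i and node j, and 0 otherwise. Its size is n × n, but with a sparse-matrix data structure that omits zero elements, n × d suffices.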
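Each label L_i (i = 1, ..., p) can be given a distinct m-bit label value.
for ( i = 1; i <= p; i++ ) {
    LH[i] = (P2 * i) % P1;  /* LH[i] is the label value for label L_i */
}
Here, % is the operator that computes the remainder of a division.
Alternatively, any other random number generation routine can be used.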
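The new label is calculated as NewLabel = Hash(ThisNodeLabel, NeighboringNodeLabels[]).
That is, with NeighboringNodeLabels[] denoting the set of labels adjacent to the node and ThisNodeLabel denoting the label of the node itself, Hash is a function that takes arguments in the form NewLabel = Hash(ThisNodeLabel, NeighboringNodeLabels[]).
Since NeighboringNodeLabels[] consists of #0101, #1100, and #0101, sorting and concatenating them gives #010101011100.
Hence NewLabel = #010101011100 mod P1 is calculated.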
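The sort, concatenate, and take-the-remainder variant can be sketched in Python as follows. The function name `concat_mod_hash` and the prime values used in the usage lines are hypothetical; the text only says that P1 is a suitable prime and does not give its value.

```python
def concat_mod_hash(neighbor_labels, width, prime):
    """Sort the neighbor labels, concatenate their width-bit patterns
    into one integer, and reduce it modulo a prime."""
    number = 0
    for label in sorted(neighbor_labels):
        number = (number << width) | label   # append width bits
    return number % prime


labels = [0b0101, 0b1100, 0b0101]
# With a prime larger than the concatenated number, the value is unchanged,
# which exposes the intermediate result #010101011100.
concatenated = concat_mod_hash(labels, width=4, prime=4099)
# A small hypothetical prime, for illustration only.
reduced = concat_mod_hash(labels, width=4, prime=13)
```

Sorting first makes the result independent of the order in which neighbors are listed, and the modulo step brings the value back to a bounded range regardless of how many neighbors a node has.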
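$G_i^r = \mathrm{NH}(G_i^{r-1})$
Here $G_i^r$ denotes the graph holding the label values after the r-th hash calculation, not $G_i$ raised to the power r. NH() is a function or subroutine that performs the processing of the flowchart of FIG. 3. The algorithm for the hash calculation with the adjacent nodes is not limited to any particular one, but here it is assumed to be, for example, the algorithm shown in FIG. 7.
In step 1016, $V_i^{\mathrm{SORT}} = \mathrm{RADIX\_SORT}(V_i^r)$ stores the components of $V_i^r$ in $V_i^{\mathrm{SORT}}$ in radix-sorted order based on the label values. In step 1018, i is incremented by 1 and the flow returns to step 1012. That is, steps 1014, 1016, and 1018 are repeated until i reaches h.
The calculation $K_{ij}^r = K_{ji}^r = \mathrm{COMPARE\_LABELS}(G_i^r, G_j^r)$ is performed. COMPARE_LABELS() is a function that compares the labels of the two graphs given as arguments and returns the comparison result as a real number. Its detailed processing is described later with reference to the flowchart of FIG. 11. The calculation uses $V_i^{\mathrm{SORT}}$ and $V_j^{\mathrm{SORT}}$ computed in step 1016.
204 ... Graph data
206 ... Graph search module
208 ... Module for hash calculation with adjacent nodes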
Claims (18)
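- A method for calculating, by computer processing, the similarity between two graphs in which a discrete label is assigned to each node, the method comprising:
a step of assigning, in each of the two graphs, label values to a given node and its adjacent nodes such that different values correspond to different discrete labels;
a step of sequentially traversing the nodes in the two graphs;
a step of, while traversing the nodes, calculating a new label value by a hash calculation on the label value of the node being visited and the label values of the nodes adjacent to the node being visited, and updating the label value of the node being visited with the new label value; and
a step of calculating the similarity between the two graphs based on the number of matching label sequences assigned to the nodes of the two graphs.
- The method according to claim 1, wherein the label value is a fixed-width bit string.
- The method according to claim 2, wherein the hash calculation is performed by XORing a bit-shifted value of the label value of the node being visited with the XOR of the label values of the adjacent nodes.
- The method according to claim 2, wherein the hash calculation is performed by XORing a bit-rotated value of the label value of the node being visited with a value obtained by sorting the label values of the adjacent nodes, counting duplicates, adding the count values, bit-rotating by the count values, and XORing the results.
- The method according to claim 4, wherein the sort is a radix sort.
- The method according to claim 1, wherein one of the two graphs is a first subgraph containing a first node of a given graph, the other of the two graphs is a second subgraph containing a second node of the given graph,
and the calculated similarity between the two graphs is regarded as the similarity between the first node and the second node.
- A system for calculating, by computer processing, the similarity between two graphs in which a discrete label is assigned to each node, the system comprising:
means for assigning, in each of the two graphs, label values to a given node and its adjacent nodes such that different values correspond to different discrete labels;
means for sequentially traversing the nodes in the two graphs;
means for, while traversing the nodes, calculating a new label value by a hash calculation on the label value of the node being visited and the label values of the nodes adjacent to the node being visited, and updating the label value of the node being visited with the new label value; and
means for calculating the similarity between the two graphs based on the number of matching label sequences assigned to the nodes of the two graphs.
- The system according to claim 7, wherein the label value is a fixed-width bit string.
- The system according to claim 8, wherein the hash calculation is performed by XORing a bit-shifted value of the label value of the node being visited with the XOR of the label values of the adjacent nodes.
- The system according to claim 8, wherein the hash calculation is performed by XORing a bit-rotated value of the label value of the node being visited with a value obtained by sorting the label values of the adjacent nodes, counting duplicates, adding the count values, bit-rotating by the count values, and XORing the results.
- The system according to claim 10, wherein the sort is a radix sort.
- The system according to claim 7, wherein one of the two graphs is a first subgraph containing a first node of a given graph, the other of the two graphs is a second subgraph containing a second node of the given graph,
and the calculated similarity between the two graphs is regarded as the similarity between the first node and the second node.
- A program for calculating, by computer processing, the similarity between two graphs in which a discrete label is assigned to each node,
the program causing the computer to execute:
a step of assigning, in each of the two graphs, label values to a given node and its adjacent nodes such that different values correspond to different discrete labels;
a step of sequentially traversing the nodes in the two graphs;
a step of, while traversing the nodes, calculating a new label value by a hash calculation on the label value of the node being visited and the label values of the nodes adjacent to the node being visited, and updating the label value of the node being visited with the new label value; and
a step of calculating the similarity between the two graphs based on the number of matching label sequences assigned to the nodes of the two graphs.
- The program according to claim 13, wherein the label value is a fixed-width bit string.
- The program according to claim 14, wherein the hash calculation is performed by XORing a bit-shifted value of the label value of the node being visited with the XOR of the label values of the adjacent nodes.
- The program according to claim 14, wherein the hash calculation is performed by XORing a bit-rotated value of the label value of the node being visited with a value obtained by sorting the label values of the adjacent nodes, counting duplicates, adding the count values, bit-rotating by the count values, and XORing the results.
- The program according to claim 16, wherein the sort is a radix sort.
- The program according to claim 13, wherein one of the two graphs is a first subgraph containing a first node of a given graph, the other of the two graphs is a second subgraph containing a second node of the given graph,
and the calculated similarity between the two graphs is regarded as the similarity between the first node and the second node.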
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201080010259.4A CN102341802B (zh) | 2009-06-30 | 2010-06-09 | 图的相似度计算系统和方法 |
CA2757461A CA2757461C (en) | 2009-06-30 | 2010-06-09 | Graph similarity calculation system, method and program |
US13/377,445 US8588531B2 (en) | 2009-06-30 | 2010-06-09 | Graph similarity calculation system, method and program |
EP10793976.1A EP2442239A4 (en) | 2009-06-30 | 2010-06-09 | SYSTEM, METHOD AND PROGRAM FOR CALCULATING DIAGRAM MOLECULARITIES |
JP2011520851A JP5306461B2 (ja) | 2009-06-30 | 2010-06-09 | グラフの類似度計算システム、方法及びプログラム |
US14/039,805 US9122771B2 (en) | 2009-06-30 | 2013-09-27 | Graph similarity calculation system, method and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009-155060 | 2009-06-30 | ||
JP2009155060 | 2009-06-30 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/377,445 A-371-Of-International US8588531B2 (en) | 2009-06-30 | 2010-06-09 | Graph similarity calculation system, method and program |
US14/039,805 Continuation US9122771B2 (en) | 2009-06-30 | 2013-09-27 | Graph similarity calculation system, method and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011001806A1 true WO2011001806A1 (ja) | 2011-01-06 |
Family
ID=43410885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/059795 WO2011001806A1 (ja) | 2009-06-30 | 2010-06-09 | グラフの類似度計算システム、方法及びプログラム |
Country Status (6)
Country | Link |
---|---|
US (2) | US8588531B2 (ja) |
EP (1) | EP2442239A4 (ja) |
JP (1) | JP5306461B2 (ja) |
CN (1) | CN102341802B (ja) |
CA (1) | CA2757461C (ja) |
WO (1) | WO2011001806A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9892532B2 (en) | 2014-08-25 | 2018-02-13 | Fujitsu Limited | Apparatus and method for generating a shortest-path tree in a graph |
JP2019144939A (ja) * | 2018-02-22 | 2019-08-29 | Kddi株式会社 | 情報処理装置、情報処理方法、及びプログラム |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8588531B2 (en) * | 2009-06-30 | 2013-11-19 | International Business Machines Corporation | Graph similarity calculation system, method and program |
US9805307B2 (en) | 2012-04-24 | 2017-10-31 | International Business Machines Corporation | Determining soft graph correspondence between node sets based on feature representations |
CN102750263B (zh) * | 2012-05-31 | 2014-10-22 | 常州工学院 | 互联网超链接网络图数据的简化方法 |
BR102012024729B1 (pt) | 2012-09-27 | 2020-05-19 | Mahle Int Gmbh | anel de controle de óleo de três peças para motores de combustão interna, elemento expansor e elemento anelar |
CN106598970B (zh) * | 2015-10-14 | 2020-04-24 | 阿里巴巴集团控股有限公司 | 一种标签确定方法、设备和系统 |
US10803053B2 (en) | 2015-12-03 | 2020-10-13 | Hewlett Packard Enterprise Development Lp | Automatic selection of neighbor lists to be incrementally updated |
US10410113B2 (en) * | 2016-01-14 | 2019-09-10 | Preferred Networks, Inc. | Time series data adaptation and sensor fusion systems, methods, and apparatus |
CN108073949A (zh) * | 2017-12-28 | 2018-05-25 | 合肥学院 | 一种绘画相似度比对系统 |
US11853713B2 (en) | 2018-04-17 | 2023-12-26 | International Business Machines Corporation | Graph similarity analytics |
US11809986B2 (en) | 2020-05-15 | 2023-11-07 | International Business Machines Corporation | Computing graph similarity via graph matching |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07334366A (ja) | 1994-06-07 | 1995-12-22 | Fujitsu Ltd | グラフリダクション機構の最適化方法および装置 |
US6473881B1 (en) | 2000-10-31 | 2002-10-29 | International Business Machines Corporation | Pattern-matching for transistor level netlists |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7376643B2 (en) * | 2004-05-14 | 2008-05-20 | Microsoft Corporation | Method and system for determining similarity of objects based on heterogeneous relationships |
US7472121B2 (en) * | 2005-12-15 | 2008-12-30 | International Business Machines Corporation | Document comparison using multiple similarity measures |
WO2008083447A1 (en) | 2007-01-12 | 2008-07-17 | Synetek Systems Pty Ltd | Method and system of obtaining related information |
US7788254B2 (en) * | 2007-05-04 | 2010-08-31 | Microsoft Corporation | Web page analysis using multiple graphs |
US8588531B2 (en) * | 2009-06-30 | 2013-11-19 | International Business Machines Corporation | Graph similarity calculation system, method and program |
-
2010
- 2010-06-09 US US13/377,445 patent/US8588531B2/en active Active
- 2010-06-09 CN CN201080010259.4A patent/CN102341802B/zh active Active
- 2010-06-09 EP EP10793976.1A patent/EP2442239A4/en not_active Withdrawn
- 2010-06-09 CA CA2757461A patent/CA2757461C/en active Active
- 2010-06-09 WO PCT/JP2010/059795 patent/WO2011001806A1/ja active Application Filing
- 2010-06-09 JP JP2011520851A patent/JP5306461B2/ja not_active Expired - Fee Related
-
2013
- 2013-09-27 US US14/039,805 patent/US9122771B2/en not_active Expired - Fee Related
Non-Patent Citations (5)
Title |
---|
"Proc. of the Nineth IEEE International Conference on Data Mining (ICDM2009), [online], Edited by W. Wang et al. IEEE, December 6-9, 2009", 5 July 2010, article HIDO, SHOHEI ET AL.: "A Linear-Time Graph Kernel", pages: 179 - 188, XP031585332 * |
See also references of EP2442239A4 |
SHIGEO ABE: "Introduction of Support Vector Machines for Pattern Classification-VI : Current Topics", SYSTEM/SEIGYO/JOHO, vol. 53, no. 5, 15 May 2009 (2009-05-15), pages 41 - 46, XP008150377 * |
SHOHEI HIDO ET AL.: "A Fast Graph Kernel Using Neighborhood Hash", DAI 12 KAI INFORMATION- BASED INDUCTION SCIENCES (IBIS) WORKSHOP (IBIS 2009), 19 October 2009 (2009-10-19), XP008150380, Retrieved from the Internet <URL:http://ibis-workshop.org/2009/pdf-ippan/82.pdf> [retrieved on 20100705] * |
TAKAHISA WADA ET AL.: "Bubun Kozo ni Motozuku Kozo Ruijisei o Mochiita Tokucho Chushutsu System to Sono Oyo", JOURNAL OF THE DBSJ, vol. 7, no. 1, 27 June 2008 (2008-06-27), pages 187 - 192, XP008150379 * |
Also Published As
Publication number | Publication date |
---|---|
CN102341802A (zh) | 2012-02-01 |
EP2442239A4 (en) | 2015-06-03 |
CA2757461A1 (en) | 2011-01-06 |
JPWO2011001806A1 (ja) | 2012-12-13 |
JP5306461B2 (ja) | 2013-10-02 |
US20140032490A1 (en) | 2014-01-30 |
US20120093417A1 (en) | 2012-04-19 |
US9122771B2 (en) | 2015-09-01 |
CN102341802B (zh) | 2014-05-28 |
US8588531B2 (en) | 2013-11-19 |
CA2757461C (en) | 2023-05-16 |
EP2442239A1 (en) | 2012-04-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080010259.4 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10793976 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 7036/CHENP/2011 Country of ref document: IN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2757461 Country of ref document: CA |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2011520851 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13377445 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2010793976 Country of ref document: EP |