JP2014134901A

JP2014134901A - Updating method, update program and collation processing device

Info

Publication number: JP2014134901A
Application number: JP2013001412A
Authority: JP
Inventors: Tatsuya Asai; 達哉浅井; Shinichiro Tako; 真一郎多湖; Hiroaki Morikawa; 裕章森川; Takashi Kato; 孝河東; Hiroya Inakoshi; 宏弥稲越
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-01-08
Filing date: 2013-01-08
Publication date: 2014-07-24
Anticipated expiration: 2033-01-08
Also published as: JP5998942B2

Abstract

PROBLEM TO BE SOLVED: To speed up collation processing of a query using an automaton.

SOLUTION: While updating an automaton A step by step from an initial automaton A0, one portion at a time as required for collation, a collation processing device 100 counts the cumulative number of detections each time a path satisfying the condition of a query Q is detected. When a path satisfying the condition of the query Q is detected, the collation processing device 100 newly adds to the automaton A a path collation state specifying the detected path. When there is no free space in the automaton A, the collation processing device 100 deletes from the automaton A the path collation state specifying the path with the minimum number of detections, and then newly adds the path collation state to the automaton A.

Description

The present invention relates to an update method, an update program, and a collation processing device.

Conventionally, there is a technique that uses an automaton to match a query containing a path and a keyword against an input stream. The automaton is updated step by step from an initial automaton, one portion at a time as needed for matching. During such updates, states defined in the automaton may be deleted in order to limit the amount of storage used to hold the automaton.

As a related technique, for example, the absolute path and the scope path of the target path and of each predicate are extracted for each of a plurality of branching XPath expressions, and, treating each path as a node, partially equal nodes or links between nodes are shared as shareable parts. There is also, for example, a technique that shares nodes or links between nodes to construct a single shared index from a plurality of branching XPath expressions, and that registers and deletes branching XPath expressions based on a reference counter set for each node of the shared index. There is also, for example, a technique that derives, from the structure information of a structured document, a condition under which the element specified by a search expression can no longer appear, deletes state transitions of the search automaton when this interruption condition is satisfied, and ends the analysis when no valid state transitions remain.

JP 2007-249724 A
International Publication No. 2006/080469

However, in the conventional techniques described above, when a transition is made to a state identical to one that was deleted from the automaton, the deleted state must be re-created and added back to the automaton. The amount of this update processing increases, and the processing time required for query matching increases accordingly.

In one aspect, an object of the present invention is to speed up query matching that uses an automaton.

According to one aspect of the present invention, an update method, an update program, and a collation processing device are proposed in which a computer that uses an automaton to match a query, which includes a keyword and a path condition corresponding to the keyword, against an input stream hierarchized by tags counts the number of detections for each path satisfying the condition every time such a path is detected in the input stream. The automaton defines an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol, and also defines a plurality of path matching states, each indicating a path that satisfies the condition. When a new path matching state indicating a path that satisfies the condition is to be added to the automaton, the computer updates the automaton based on the counted number of detections for each path satisfying the condition.

According to one aspect of the present invention, query matching that uses an automaton can be sped up.

FIG. 1 is an explanatory diagram showing an automaton update example according to the present embodiment.
FIG. 2 is an explanatory diagram showing an example of matching the input stream S against the query Q.
FIG. 3 is an explanatory diagram showing a conversion example of the input stream S.
FIG. 4 is an explanatory diagram showing the paths of the input stream S.
FIG. 5 is an explanatory diagram showing an example of the path ID management table.
FIG. 6 is an explanatory diagram showing an example of the frequency management table.
FIG. 7 is an explanatory diagram showing a construction example of the initial automaton A0.
FIG. 8 is an explanatory diagram showing an example of the node structure.
FIG. 9 is an explanatory diagram showing an example of the node structure N of each state in the initial automaton A0.
FIGS. 10 to 30 are explanatory diagrams (parts 1 to 21) showing a first operation example of the collation processing device 100.
FIGS. 31 to 36 are explanatory diagrams (parts 1 to 6) showing a second operation example of the collation processing device 100.
FIGS. 37 to 41 are explanatory diagrams (parts 1 to 5) showing a third operation example of the collation processing device 100.
FIG. 42 is a block diagram showing a hardware configuration example of the collation processing device 100.
FIG. 43 is a block diagram showing a functional configuration example of the collation processing device 100.
FIG. 44 is a flowchart showing an example of the procedure of the update processing of the collation processing device 100.
FIG. 45 is a flowchart showing an example of the procedure of the initial automaton construction process (step S4403) shown in FIG. 44.
FIG. 46 is a flowchart showing an example of the procedure of the first scanning process (step S4409) shown in FIG. 44.
FIG. 47 is a flowchart showing an example of the procedure of the first update process (step S4606) shown in FIG. 46.
FIG. 48 is a flowchart showing an example of the procedure of the deletion process (step S4702) shown in FIG. 47.
FIG. 49 is a flowchart showing an example of the procedure of the second update process (step S4607) shown in FIG. 46.
FIG. 50 is a flowchart showing an example of the procedure of the cumulative count update process (step S4609) shown in FIG. 46.
FIG. 51 is a flowchart showing an example of the procedure of the second scanning process (step S4410) shown in FIG. 44.
FIG. 52 is a flowchart showing an example of the procedure of the third scanning process (step S4411) shown in FIG. 44.

Embodiments of an update method, an update program, and a collation processing device according to the present invention are described in detail below with reference to the accompanying drawings. In the present embodiment, "A" is used as a reference sign that refers to automata collectively, while the signs "A0", "A1", "A2", ..., "An" may be used to identify a particular update stage. "A0" denotes the initial automaton, "A1" the automaton after the first update, "A2" the automaton after the second update, and "An" the automaton after the n-th update.

<Automaton update example>
FIG. 1 is an explanatory diagram showing an automaton update example according to the present embodiment. As shown in FIG. 1, the collation processing device 100 is a computer that matches a query Q against an input stream S using an automaton A. When matching the query Q, if the states needed for matching the query Q are not yet defined in the automaton A, the collation processing device 100 updates the automaton A by newly adding those states to it.

Here, the input stream S is a data sequence that is hierarchized by tags and is received via the network 101 from a computer that is its source G. The input stream S is, for example, XML (Extensible Markup Language) data or HTML (HyperText Markup Language) data. The following description uses XML data as an example.

A tag is information indicating the position at which a set of elements exists, for example a start tag or an end tag. A position is, for example, a level of the hierarchy in the input stream S. A start tag is information indicating the start position of a set of elements; in the example of FIG. 1, the start tags include <news> and <name>. An end tag is information indicating the end position of a set of elements; in the example of FIG. 1, the end tags include </news> and </name>. An element is a character string, a start tag, or an end tag contained between a start tag and an end tag.

The query Q is, for example, information that includes a keyword and a path condition corresponding to the keyword. In the example of FIG. 1, the query Q is stored in the query DB 102. The keyword is a character string to be matched against character strings in the input stream S; in the example of FIG. 1, it is the character string "Bob". A path is, for example, the route from the top level of the input stream S to the level indicated by a given tag. In the following description, a path is written by joining with "/" the in-tag character strings of the tags on the route from the top level to the level indicated by the given tag. In the example of FIG. 1, the path of the start tag <news> is "/news", obtained by joining with "/" the in-tag character strings on the route from level 0, indicated by the root at the top of the hierarchy, to level 1, indicated by the start tag <news>. Similarly, the path of the start tag <date> is "/news/date", obtained by joining with "/" the in-tag character strings on the route from level 0 (the root) through level 1 (the start tag <news>) to level 2 (the start tag <date>).

The path condition is a condition on the path that indicates the level at which the keyword exists; in the example of FIG. 1, it is "/news//name". "/news//name" denotes a path from level 0 through the level-1 tag "news" to a tag "name" at any level. "//" indicates that any levels may lie on the route between the level-1 tag "news" and the tag "name". Therefore, for example, the paths "/news/name", "/news/write/name", and "/news/name/write/name" all satisfy the path condition in the query Q.
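The meaning of "//" in the path condition can be illustrated with a minimal sketch. It assumes that "//" permits zero or more intermediate levels, as the three example paths above suggest. The helper names `condition_to_regex` and `satisfies` are hypothetical, and a regular expression is used only for illustration; the device described here checks the condition with automaton states.

```python
import re


def condition_to_regex(condition: str) -> "re.Pattern[str]":
    """Translate a path condition such as '/news//name' into a regex.

    '/'  joins two adjacent levels.
    '//' allows zero or more intermediate levels (assumed semantics).
    """
    pieces = []
    # Splitting with a capturing group keeps the separators in the result,
    # e.g. '/news//name' -> ['', '/', 'news', '//', 'name'].
    for part in re.split(r"(//|/)", condition):
        if part == "//":
            pieces.append(r"/(?:[^/]+/)*")
        elif part == "/":
            pieces.append("/")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def satisfies(path: str, condition: str) -> bool:
    """Return True if the concrete path satisfies the path condition."""
    return condition_to_regex(condition).fullmatch(path) is not None


if __name__ == "__main__":
    for p in ["/news/name", "/news/write/name", "/news/date"]:
        print(p, satisfies(p, "/news//name"))
```

With the condition "/news//name", the three paths listed above are accepted, while a path such as "/news/date" is rejected.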

The automaton A is, for example, information that defines at least an initial state, a start state indicating the start tag symbol, and an end state indicating the end tag symbol. The start tag symbol is "<>", that is, a start tag with its in-tag character string removed. The end tag symbol is "</>", that is, an end tag with its in-tag character string removed.

The automaton A may further define the states used for matching the query Q. These are, for example, a path matching state indicating a path that satisfies the condition in the query Q, and the keyword matching intermediate states and the keyword matching completed state that are reached in sequence from the path matching state by reading the keyword in the query Q. Because the storage area for the automaton A is limited, the number of states that can be defined in the automaton A is also assumed to be limited.

In the following description, automaton update example 1 first shows the case where the collation processing device 100 newly adds the states used for matching the query Q to the automaton A until the automaton A has no free space left. Automaton update example 2 then shows the case where the collation processing device 100 updates the automaton A by newly adding the states used for matching the query Q when the automaton A has no free space.

(Automaton update example 1)
In automaton update example 1, the collation processing device 100 reads the input stream S from its beginning and detects paths that satisfy the condition of the query Q. Each time it detects such a path, the collation processing device 100 counts the number of detections for each path satisfying the condition of the query Q and generates a path matching state used for matching the keyword of the query Q.

For example, the collation processing device 100 detects the path "/news/write/name", which satisfies the condition of the query Q, and generates a path matching state indicating the detected path. The path matching state is assigned a path ID unique to the path "/news/write/name"; in the example of FIG. 1, "4" is assigned.

The collation processing device 100 also changes the transition destination from the start state. Specifically, the collation processing device 100 generates the transition from the start state to the path matching state, the transitions from the path matching state to the start state and to the end state, and the transition from the path matching state on the keyword in the query Q together with its destination state.

So that the keyword of the query Q can be matched when "Bob" is read while the scan of the input stream S is in the path matching state, the collation processing device 100 also generates the destination states reachable from the path matching state. Specifically, it generates a first keyword matching intermediate state and the transition "B" from the path matching state to that state. It also generates a second keyword matching intermediate state, which is the destination from the first keyword matching intermediate state, and the transition "o" between the two. Finally, it generates a keyword matching completed state, which is the destination from the second keyword matching intermediate state, and the transition "b" from the second keyword matching intermediate state to the keyword matching completed state.
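The chain of states generated for the keyword can be sketched as follows. This is an illustrative sketch only: the transition table is modeled as a plain dictionary, the state labels and the functions `add_keyword_chain` and `run` are hypothetical, and what happens on a mismatching character is not covered by this passage, so the sketch simply reports that no transition exists.

```python
def add_keyword_chain(transitions, path_state, keyword):
    """Add one state per keyword character, starting at path_state.

    transitions maps (state, character) -> next state.
    Returns the keyword matching completed state.
    """
    current = path_state
    for i, ch in enumerate(keyword, start=1):
        is_last = i == len(keyword)
        # Hypothetical labels: intermediate states are numbered, and the
        # last one is the keyword matching completed state.
        nxt = (path_state, "completed" if is_last else f"intermediate{i}")
        transitions[(current, ch)] = nxt
        current = nxt
    return current


def run(transitions, state, text):
    """Follow transitions for each character; None means no transition."""
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return None
    return state


if __name__ == "__main__":
    table = {}
    completed = add_keyword_chain(table, "path4", "Bob")
    print(run(table, "path4", "Bob") == completed)
```

For the keyword "Bob", three transitions ("B", "o", "b") and three states are added: two intermediate states and one completed state, matching the description above.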

The collation processing device 100 also counts the number of detections of the path "/news/write/name". Here, it is assumed that the count for the path "/news/write/name" is "1".

Similarly, the collation processing device 100 detects the path "/news/edit/name", generates a path matching state to which the path ID "7" of the path "/news/edit/name" is assigned, and defines it in the automaton A. The collation processing device 100 also generates the keyword matching intermediate states and the keyword matching completed state and defines them in the automaton A. In addition, the collation processing device 100 counts "1" as the number of detections of the path "/news/edit/name".

Similarly, the collation processing device 100 detects the path "/news/prev/name", generates a path matching state to which the path ID "9" of the path "/news/prev/name" is assigned, and defines it in the automaton A. The collation processing device 100 also generates the keyword matching intermediate states and the keyword matching completed state and defines them in the automaton A. In addition, the collation processing device 100 counts "1" as the number of detections of the path "/news/prev/name". At this point, it is assumed that the automaton A has no free space left.

Thereafter, the collation processing device 100 continues to read and process the input stream S sequentially. It is assumed that the count for the path "/news/write/name" reaches "3" and the count for the path "/news/edit/name" reaches "3", while the count for the path "/news/prev/name" remains "1". This completes automaton update example 1.

(Automaton update example 2)
In automaton update example 2, the collation processing device 100 detects the path "/news/ghost/name", which satisfies the condition of the query Q. Because the automaton A has no free space, the path matching state to which the path ID "10" of the detected path "/news/ghost/name" is assigned cannot be newly defined in the automaton A as it stands.

The collation processing device 100 therefore deletes one of the path matching states defined in the automaton A so that the path matching state to which the path ID "10" is assigned can be newly defined in the automaton A. Specifically, based on the number of detections counted for each path, the collation processing device 100 deletes the path matching state assigned the path ID of a path with a small number of detections. The collation processing device 100 then generates the path matching state to which the path ID "10" is assigned and newly defines it in the automaton A. It also generates the keyword matching intermediate states and the keyword matching completed state and defines them in the automaton A. This completes automaton update example 2.
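The replacement policy of update examples 1 and 2 can be sketched as a small table with a fixed capacity. This is a simplified illustration under several assumptions: capacity is counted in path matching states only (the automaton's limit applies to all states), the state with the minimum detection count is the one deleted (as the abstract states), and counts are kept for deleted paths. The class `PathStateTable` and its method names are hypothetical.

```python
from collections import Counter


class PathStateTable:
    """Tracks which paths currently have a path matching state."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()  # cumulative detections per path
        self.resident = set()    # paths with a state in the automaton

    def detect(self, path):
        """Record one detection of path.

        Returns the path whose state was deleted to make room,
        or None if nothing had to be deleted.
        """
        self.counts[path] += 1
        if path in self.resident:
            return None
        evicted = None
        if len(self.resident) >= self.capacity:
            # Delete the resident state with the fewest detections.
            evicted = min(self.resident, key=lambda p: self.counts[p])
            self.resident.remove(evicted)
        self.resident.add(path)
        return evicted


if __name__ == "__main__":
    table = PathStateTable(capacity=3)
    stream = ["/news/write/name", "/news/edit/name", "/news/prev/name",
              "/news/write/name", "/news/write/name",
              "/news/edit/name", "/news/edit/name"]
    for p in stream:
        table.detect(p)
    print(table.detect("/news/ghost/name"))
```

Replaying the counts from update example 1 (3, 3, and 1) and then detecting "/news/ghost/name" deletes the state for "/news/prev/name", the path detected only once.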

In this way, when updating the automaton A, the collation processing device 100 retains, among the path matching states defined in the automaton A, those for paths that have been detected relatively often as satisfying the condition of the query Q. The collation processing device 100 can therefore retain the path matching states that are relatively likely to be visited again later. In other words, it avoids deleting a path matching state that is relatively likely to be visited again and then having to generate the same path matching state again. As a result, the collation processing device 100 can reduce the load of automaton update processing, limit the processing delay caused by generating path matching states in the automaton A, and speed up matching of the query Q using the automaton A.

<Example of matching the input stream S against the query Q>
FIG. 2 is an explanatory diagram showing an example of matching the input stream S against the query Q. When, in the input stream S, the keyword "Bob" is contained at a level indicated by a path that satisfies the path condition in the query Q, the collation processing device 100 outputs the keyword "Bob". The collation processing device 100 may output, together with the character string "Bob", information indicating the position of the character string "Bob". This information may be, for example, the path, or the number of the line on which the character string "Bob" appears.
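The expected result of this matching can be reproduced with a simple reference matcher. The sketch below is not the automaton-based method of this document: it uses Python's standard `xml.etree.ElementTree.iterparse` and a regular expression, purely to show which (keyword, path) pairs should be reported. The sample XML, the function name `find_keyword`, and the choice to test the keyword as a substring of the element text are assumptions.

```python
import io
import re
import xml.etree.ElementTree as ET


def find_keyword(xml_text, condition_regex, keyword):
    """Return (keyword, path) for each element whose path matches the
    condition and whose text contains the keyword."""
    stack, hits = [], []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # At the end event the element text is fully available.
            path = "/" + "/".join(stack)
            if re.fullmatch(condition_regex, path) and keyword in (elem.text or ""):
                hits.append((keyword, path))
            stack.pop()
    return hits


if __name__ == "__main__":
    sample = ("<news><date>2013-01-08</date>"
              "<write><name>Bob</name></write>"
              "<title>Bob</title></news>")
    # Regex equivalent of the condition /news//name (assumed semantics).
    print(find_keyword(sample, r"/news/(?:[^/]+/)*name", "Bob"))
```

In the sample, "Bob" under "/news/write/name" is reported together with its path, while "Bob" under "/news/title" is ignored because that path does not satisfy the condition.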

FIG. 3 is an explanatory diagram showing a conversion example of the input stream S. The collation processing device 100 reads the input stream S sequentially from the leading <news> to the trailing </news> and converts the tags. Because this compresses the tags, the automaton A can be made smaller.

In the example of FIG. 3, the start tag symbol "<>" is converted to "[" and the end tag symbol "</>" is converted to "]". Each in-tag character string is converted to a unique number; in this example, the numbers "1", "2", ... are assigned in reading order. For example, "news" is converted to "1", "date" to "2", "write" to "3", and "name" to "4". In the following description, the converted input stream S may be written as "converted stream s".
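The conversion can be sketched as a small tokenizer. The exact layout of the converted stream s is shown only in FIG. 3, so the token format used here ("[1" for a start tag and "]1" for the matching end tag) is an assumption, as are the sample input and the function name `convert_stream`. Attributes and self-closing tags are not handled.

```python
import re

# Matches either a start/end tag or a run of text between tags.
TOKEN = re.compile(r"<(/?)([^<>/]+)>|([^<>]+)")


def convert_stream(stream):
    """Convert tags to bracket symbols plus a per-name number.

    Returns (tokens, ids), where ids maps each in-tag character string
    to the number assigned in reading order.
    """
    ids = {}
    tokens = []
    for closing, name, text in TOKEN.findall(stream):
        if name:
            # Numbers are assigned on first appearance: 1, 2, 3, ...
            number = ids.setdefault(name, len(ids) + 1)
            tokens.append(("]" if closing else "[") + str(number))
        elif text.strip():
            tokens.append(text.strip())
    return tokens, ids


if __name__ == "__main__":
    s = "<news><date>2013</date><write><name>Bob</name></write></news>"
    tokens, ids = convert_stream(s)
    print(tokens)
    print(ids)
```

For this sample, the assigned numbers agree with the example above: "news" becomes 1, "date" 2, "write" 3, and "name" 4.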

図４は、入力ストリームＳのパスを示す説明図である。パスとは、入力ストリームＳの最上階層から各タグが示す階層までの経路を示す情報である。図４の例では、理解の容易のため、便宜上、入力ストリームＳについてのパススキーマを用いてパスを説明するが、入力ストリームＳがどのように構成されているかは、入力ストリームＳを読み込んで自ら解析するまで照合処理装置１００は認識していない。 FIG. 4 is an explanatory diagram showing the paths of the input stream S. A path is information indicating the route from the top hierarchy of the input stream S to the hierarchy indicated by each tag. For ease of understanding, the example of FIG. 4 explains the paths using a path schema for the input stream S. Note, however, that the collation processing device 100 does not know how the input stream S is structured until it has read and analyzed the input stream S itself.

パススキーマは、入力ストリームＳの階層構造を示すツリー構造体である。入力ストリームＳでは、「ｎｅｗｓ」が第１階層、「ｄａｔｅ」および「ｗｒｉｔｅ」が第２階層、「ｎａｍｅ」が第３階層となる。第０階層のルートから第１階層の「ｎｅｗｓ」に至るまでの経路がパスｐ１、ルートから「ｄａｔｅ」に至るまでの経路がパスｐ２、ルートから「ｗｒｉｔｅ」に至るまでの経路がパスｐ３、ルートから「ｎａｍｅ」に至るまでの経路がパスｐ４である。 The path schema is a tree structure indicating the hierarchical structure of the input stream S. In the input stream S, “news” is in the first hierarchy, “date” and “write” are in the second hierarchy, and “name” is in the third hierarchy. The route from the root in the 0th hierarchy to “news” in the first hierarchy is path p1, the route from the root to “date” is path p2, the route from the root to “write” is path p3, and the route from the root to “name” is path p4.

各パスｐ１〜ｐ４に割り当てた数字は、図３で変換した固有な番号に対応する。具体的には、パスｐ＃（＃は数字）は、＃に変換されたタグ文字列のタグが示す階層に到達するパスである。例えば、パスｐ２は、「２」に変換されたタグ内文字列「ｄａｔｅ」に到達するパスである。また、パスｐ４は、「４」に変換されたタグ内文字列「ｎａｍｅ」に到達するパスである。＃は、後述するパスＩＤとなる。照合処理装置１００は、入力ストリームＳの読み込み中に、現在読み込んでいるタグを末尾とするパスを検出することになる。 The numbers assigned to the paths p1 to p4 correspond to the unique numbers assigned by the conversion in FIG. 3. Specifically, path p# (where # is a number) is the path that reaches the hierarchy indicated by the tag whose in-tag character string was converted to #. For example, path p2 is the path that reaches the in-tag character string “date”, which was converted to “2”, and path p4 is the path that reaches the in-tag character string “name”, which was converted to “4”. The number # serves as the path ID described later. While reading the input stream S, the collation processing device 100 detects the path that ends with the tag currently being read.

図５は、パスＩＤ管理テーブルの一例を示す説明図である。パスＩＤ管理テーブルＴは、パスＩＤを管理するテーブルである。パスＩＤ管理テーブルＴは、パスＩＤ項目と、パス項目と、累計項目と、前回項目と、を有し、パスごとに、各項目の値がレコードとして格納される。 FIG. 5 is an explanatory diagram showing an example of the path ID management table. The path ID management table T is a table for managing path IDs. The path ID management table T includes a path ID item, a path item, a cumulative item, and a previous item, and the value of each item is stored as a record for each path.

パスＩＤ管理テーブルＴは、デフォルトではレコードがない空の状態である。パスＩＤ項目には、パスＩＤが格納される。パスＩＤとは、パスを一意に特定する識別情報であり、上述した＃となる。パス項目には、パスが格納される。例えば、パスＩＤ「４」のパスｐ４の場合、「／ｎｅｗｓ／ｗｒｉｔｅ／ｎａｍｅ」が格納される。 The path ID management table T is empty by default, with no records. The path ID item stores a path ID. A path ID is identification information that uniquely identifies a path, and is the number # described above. The path item stores a path. For example, for path p4 with path ID “4”, “/news/write/name” is stored.

累計項目には、パスがクエリＱの条件に一致した累計回数が格納される。累計回数とは、例えば、そのパスにより到達したタグがクエリＱの条件に一致した回数であって、そのパスに対応するパス照合状態の生成や削除に関わらず、累計される。 The cumulative item stores the cumulative number of times the path has matched the condition of the query Q. The cumulative count is, for example, the number of times the tag reached by the path has matched the condition of the query Q, and it keeps accumulating regardless of whether the path matching state corresponding to that path is generated or deleted.

前回項目には、フラグが格納される。フラグとは、例えば、そのパスにより到達したタグが開始タグとして出現したか終了タグとして出現したかを示す識別子である。フラグは、未出現の場合が「（なし）」であり、開始タグとして出現すれば「開始」に設定され、開始タグとして出現した後に終了タグとして出現すれば「終了」に設定される。 The previous item stores a flag. The flag is, for example, an identifier indicating whether the tag reached by the path appeared as a start tag or as an end tag. The flag is “(none)” while the tag has not appeared, is set to “start” when the tag appears as a start tag, and is set to “end” when the tag appears as an end tag after having appeared as a start tag.

例えば、入力ストリームＳの読み込み中に開始タグ＜ｎａｍｅ＞が出現すると、照合処理装置１００は、パスｐ４を検出して、パスＩＤ管理テーブルＴの新規レコードに、パスＩＤ「４」と、パスｐ４（／ｎｅｗｓ／ｗｒｉｔｅ／ｎａｍｅ）、累計回数「１」、フラグ「開始」を登録する。その後、終了タグ＜／ｎａｍｅ＞が出現すると、照合処理装置１００は、パスｐ４のフラグを「開始」から「終了」に変更する。 For example, when the start tag <name> appears while the input stream S is being read, the collation processing device 100 detects path p4 and registers, in a new record of the path ID management table T, the path ID “4”, the path p4 (/news/write/name), the cumulative count “1”, and the flag “start”. When the end tag </name> subsequently appears, the collation processing device 100 changes the flag of path p4 from “start” to “end”.

このようにして、パスＩＤ管理テーブルＴには、パスが検出される都度レコードが追加され、その終了タグが検出されると、フラグが更新される。これにより、パスの累計回数を検出することができる。また、終了タグの直後に読み込まれた開始タグは、その終了タグのパスよりも１階層上の階層から辿ることにより、パスを検出できることになる。例えば、パスｐ２の終了タグ＜／ｄａｔｅ＞の後に開始タグ＜ｗｒｉｔｅ＞が出現するが、１つ上の階層の＜ｎｅｗｓ＞に戻ってから＜ｗｒｉｔｅ＞に辿ることにより、パスｐ３を検出することができる。 In this way, a record is added to the path ID management table T each time a path is detected, and the flag is updated when the corresponding end tag is detected. This makes it possible to obtain the cumulative count for each path. In addition, the path of a start tag read immediately after an end tag can be detected by tracing from the hierarchy one level above the path of that end tag. For example, the start tag <write> appears after the end tag </date> of path p2; by returning to <news>, one hierarchy above, and then tracing down to <write>, path p3 can be detected.
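The bookkeeping described for the path ID management table T can be sketched in Python as follows. This is an illustrative sketch only: `PathRecord`, `scan_tags`, and the `matches_query` predicate are hypothetical names, the stack stands in for the current path, and path IDs are simply assigned in order of first detection, which coincides with the numbering of the example.

```python
from dataclasses import dataclass

@dataclass
class PathRecord:
    path_id: int
    path: str
    count: int = 0          # cumulative number of matches against the query
    flag: str = "(none)"    # "(none)", "start" or "end"

def scan_tags(tags, matches_query):
    """Build the path table from a well-formed sequence of start/end tags."""
    table = {}   # path -> PathRecord
    stack = []   # components of the current path
    for tag in tags:
        if tag.startswith("</"):
            path = "/" + "/".join(stack)
            table[path].flag = "end"     # end tag seen for this path
            stack.pop()                  # go back up one hierarchy
        else:
            stack.append(tag[1:-1])      # go down one hierarchy
            path = "/" + "/".join(stack)
            record = table.setdefault(path, PathRecord(len(table) + 1, path))
            record.flag = "start"
            if matches_query(path):
                record.count += 1
    return table

tags = ["<news>", "<date>", "</date>", "<write>", "<name>", "</name>", "</write>", "</news>"]
# Assumed query condition for illustration: the path ends in /name.
table = scan_tags(tags, lambda path: path.endswith("/name"))
for record in table.values():
    print(record)
```

Popping the stack on an end tag is what lets the next start tag be resolved from the hierarchy one level up, as in the </date> followed by <write> example.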

図６は、頻度管理テーブルの一例を示す説明図である。頻度管理テーブルＦは、パスＩＤの出現頻度を管理するテーブルである。頻度管理テーブルＦは、パスＩＤ項目を有し、パスが出現するごとに、パスＩＤ項目の値がレコードとして格納される。 FIG. 6 is an explanatory diagram showing an example of the frequency management table. The frequency management table F is a table for managing the appearance frequency of the path ID. The frequency management table F has a path ID item, and each time a path appears, the value of the path ID item is stored as a record.

頻度管理テーブルＦは、デフォルトではレコードがない空の状態である。パスＩＤ項目には、パスＩＤが格納される。パスＩＤとは、上述したように、パスを一意に特定する識別情報であり、＃となる。 The frequency management table F is empty by default with no records. The path ID item stores a path ID. As described above, the path ID is identification information that uniquely identifies a path, and is #.

例えば、入力ストリームＳの読み込み中に開始タグ＜ｎａｍｅ＞が出現すると、照合処理装置１００は、パスｐ４を検出して、頻度管理テーブルＦの新規レコードに、パスＩＤ「４」を登録する。 For example, when the start tag <name> appears while the input stream S is being read, the verification processing apparatus 100 detects the path p4 and registers the path ID “4” in the new record of the frequency management table F.

頻度管理テーブルＦは、レコード数の上限が決定されたテーブルであって、レコード数が上限に達した場合に新たにレコードを格納する時は、最古のレコードを削除して新たにレコードを格納する。このようにして、頻度管理テーブルＦには、パスが検出される都度レコードが追加される。 The frequency management table F has a fixed upper limit on its number of records. When a new record is to be stored after the number of records has reached the upper limit, the oldest record is deleted and the new record is stored. In this way, a record is added to the frequency management table F each time a path is detected.
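A bounded table that drops its oldest record behaves like a fixed-length queue. The sketch below uses Python's `collections.deque` with `maxlen`; the class name `FrequencyTable` and the `recent_counts` helper are assumptions added for illustration and are not named in the specification.

```python
from collections import Counter, deque

class FrequencyTable:
    """Keeps only the most recent path IDs, up to a fixed number of records."""

    def __init__(self, max_records):
        # A deque with maxlen discards the oldest entry when a new one is appended.
        self.records = deque(maxlen=max_records)

    def add(self, path_id):
        self.records.append(path_id)

    def recent_counts(self):
        # Hypothetical helper: how often each path ID occurs in the current window.
        return Counter(self.records)

table_f = FrequencyTable(max_records=3)
for path_id in (4, 2, 4, 3):
    table_f.add(path_id)
print(list(table_f.records))
print(table_f.recent_counts())
```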

図７は、初期オートマトンＡ０の構築例を示す説明図である。図１に示したように、照合処理装置１００は、初期オートマトンＡ０を記憶する。初期オートマトンＡ０は、どの入力ストリームＳにも適用可能であるため、或るタイミングで構築して照合処理装置１００内に保存しておけばよい。初期オートマトンＡ０が未構築の場合、照合処理装置１００は、（Ａ）において、初期状態Ｎ０と、開始タグ記号「＜＞」から変換された開始タグ記号「［」を示す開始状態Ｎ１を生成する。また、照合処理装置１００は、終了タグ記号「＜／＞」から変換された終了タグ記号「］」を示す終了状態Ｎ２を生成する。 FIG. 7 is an explanatory diagram showing a construction example of the initial automaton A0. As shown in FIG. 1, the verification processing apparatus 100 stores an initial automaton A0. Since the initial automaton A0 can be applied to any input stream S, it may be constructed at a certain timing and stored in the verification processing apparatus 100. When the initial automaton A0 is not yet constructed, the verification processing apparatus 100 generates an initial state N0 and a start state N1 indicating the start tag symbol “[” converted from the start tag symbol “<>” in (A). . In addition, the verification processing device 100 generates an end state N2 indicating the end tag symbol “]” converted from the end tag symbol “</>”.

そして、照合処理装置１００は、初期状態Ｎ０自身のループとなる遷移ｔ００と、開始状態Ｎ１から初期状態Ｎ０への遷移ｔ１０と、終了状態Ｎ２から初期状態Ｎ０への遷移ｔ２０を生成する。「Σ」は、すべての記号を示す。なお、以下の図面中の遷移の符号ｔｘｙにおいて、ｘは遷移元の状態の状態ＩＤを示し、ｙは遷移先の状態の状態ＩＤを示す。 Then, the verification processing device 100 generates a transition t00 that is a loop of the initial state N0 itself, a transition t10 from the start state N1 to the initial state N0, and a transition t20 from the end state N2 to the initial state N0. “Σ” indicates all symbols. Note that, in a transition code txy in the following drawings, x indicates a state ID of a transition source state, and y indicates a state ID of a transition destination state.

照合処理装置１００は、（Ｂ）において、初期状態Ｎ０自身のループとなる遷移ｔ００から「［」および「］」を削除し、初期状態Ｎ０から開始状態Ｎ１への遷移ｔ０１と、初期状態Ｎ０から終了状態Ｎ２への遷移ｔ０２を生成する。これにより、初期オートマトンＡ０が構築される。構築された初期オートマトンＡ０は、照合処理装置１００内の記憶領域に格納され、入力ストリームＳの受信が開始される都度、読み出される。 In (B), the verification processing device 100 deletes “[” and “]” from the transition t00 that is the loop of the initial state N0 itself, and the transition t01 from the initial state N0 to the start state N1 and the initial state N0. A transition t02 to the end state N2 is generated. Thereby, the initial automaton A0 is constructed. The constructed initial automaton A0 is stored in a storage area in the verification processing apparatus 100, and is read each time reception of the input stream S is started.
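Steps (A) and (B) can be mirrored in a small Python sketch that stores, for each state, one entry per symbol. The function names and the explicit alphabet argument are assumptions for illustration; state IDs 0, 1, and 2 stand for N0, N1, and N2.

```python
def build_initial_automaton(alphabet):
    """Build the transition table of the initial automaton over the given symbols."""
    symbols = set(alphabet) | {"[", "]"}
    # (A): states N0, N1, N2 with transitions t00, t10, t20 on every symbol.
    table = {state: {symbol: 0 for symbol in symbols} for state in (0, 1, 2)}
    # (B): take "[" and "]" out of the loop t00 and add t01 and t02 instead.
    table[0]["["] = 1
    table[0]["]"] = 2
    return table

def scan(table, symbols, state=0):
    """Return the sequence of states visited while reading the symbols."""
    trace = [state]
    for symbol in symbols:
        state = table[state][symbol]
        trace.append(state)
    return trace

a0 = build_initial_automaton("0123456789-")
# "[1" followed by "[2", two characters of data, then "]2".
print(scan(a0, ["[", "1", "[", "2", "2", "0", "]", "2"]))
```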

図８は、ノード構造体の一例を示す説明図である。ノード構造体Ｎとは、オートマトンＡの状態の特徴を記憶するデータ構造体である。具体的には、ノード構造体Ｎは、状態ごとに、状態ＩＤと、状態の種類と、ノードカウンタと、リストＬとを記憶する。状態ＩＤとは、状態を一意に特定する識別情報である。各状態には、状態ＩＤとして固有の識別情報が割り当てられる。状態の種類とは、その状態が、どのような種類なのかを特定する属性情報である。例えば、上述した「初期状態」、「開始状態」、「終了状態」がある。このほか、「パス照合状態」、「キーワード照合途中状態」、「キーワード照合完了状態」がある。「パス照合状態」、「キーワード照合途中状態」、「キーワード照合完了状態」については、後述する。 FIG. 8 is an explanatory diagram illustrating an example of a node structure. The node structure N is a data structure that stores the characteristics of the state of the automaton A. Specifically, the node structure N stores a state ID, a state type, a node counter, and a list L for each state. The state ID is identification information that uniquely identifies the state. Each state is assigned unique identification information as a state ID. The type of state is attribute information that identifies what type the state is. For example, there are “initial state”, “start state”, and “end state” described above. In addition, there are a “path verification state”, a “keyword verification in progress”, and a “keyword verification completion status”. The “path verification status”, “keyword verification in progress”, and “keyword verification completion status” will be described later.

ノードカウンタとは、開始状態からノードに遷移した遷移回数である。リストＬとは、状態の遷移先状態を特定する遷移を保持するデータ構造である。具体的には、リストＬには、記号ごとに領域が用意されており、その領域に遷移先状態の状態ＩＤが格納される。以下、図９に、初期オートマトンＡ０に規定された各状態のノード構造体Ｎを示す。 The node counter is the number of transitions from the start state to the node. The list L is a data structure that holds a transition that specifies a state transition destination state. Specifically, an area is prepared for each symbol in the list L, and the state ID of the transition destination state is stored in the area. FIG. 9 shows the node structure N in each state defined in the initial automaton A0.

図９は、初期オートマトンＡ０における各状態のノード構造体Ｎの一例を示す説明図である。図９において、（Ａ）に初期状態Ｎ０のノード構造体Ｎ、（Ｂ）に開始状態Ｎ１のノード構造体Ｎ、（Ｃ）に終了状態Ｎ２のノード構造体Ｎを示す。 FIG. 9 is an explanatory diagram showing an example of the node structure N of each state in the initial automaton A0. In FIG. 9, (A) shows the node structure N of the initial state N0, (B) shows the node structure N of the start state N1, and (C) shows the node structure N of the end state N2.

（Ａ）において、初期状態Ｎ０の状態ＩＤは「０」である。また、初期状態Ｎ０であるため、状態の種類は「初期」である。また、初期状態Ｎ０において、「［」が出現すると開始状態Ｎ１に遷移することになるため、リストＬの記号「［」の領域には、開始状態Ｎ１の状態ＩＤ「１」が格納される。同様に、初期状態Ｎ０において、「］」が出現すると終了状態Ｎ２に遷移することになるため、リストＬの記号「］」の領域には、終了状態Ｎ２の状態ＩＤ「２」が格納される。また、全記号Σのうち「［」および「］」を除いた「Σ＼｛［，］｝」の各々については、初期状態Ｎ０自身にループするため、「Σ＼｛［，］｝」の各記号の領域には、初期状態Ｎ０の状態ＩＤ「０」が格納される。 In (A), the state ID of the initial state N0 is “0”. Further, since the state is the initial state N0, the state type is “initial”. In addition, when “[” appears in the initial state N0, the state transitions to the start state N1, and thus the state ID “1” of the start state N1 is stored in the area of the symbol “[” in the list L. Similarly, when “]” appears in the initial state N0, the state transitions to the end state N2, and thus the state ID “2” of the end state N2 is stored in the area of the symbol “]” of the list L. . Further, each of “Σ \ {[,]}” excluding “[” and “]” among all symbols Σ loops to the initial state N0 itself, and therefore “Σ \ {[,]}” In each symbol area, the state ID “0” of the initial state N0 is stored.

（Ｂ）において、開始状態Ｎ１の状態ＩＤは「１」である。また、開始状態Ｎ１であるため、状態の種類は「開始」である。また、開始状態Ｎ１において、全記号Σのいずれの記号が出現しても初期状態Ｎ０に遷移することになるため、リストＬの各記号の領域には、初期状態Ｎ０の状態ＩＤ「０」が格納される。 In (B), the state ID of the start state N1 is “1”. Because the state is the start state N1, the state type is “start”. In the start state N1, whichever symbol of the full symbol set Σ appears, the automaton transitions to the initial state N0; the area for every symbol in the list L therefore stores the state ID “0” of the initial state N0.

（Ｃ）において、終了状態Ｎ２の状態ＩＤは「２」である。また、終了状態Ｎ２であるため、状態の種類は「終了」である。また、終了状態Ｎ２において、全記号Σのいずれの記号が出現しても初期状態Ｎ０に遷移することになるため、リストＬの各記号の領域には、初期状態Ｎ０の状態ＩＤ「０」が格納される。初期オートマトンＡ０に含まれていないパス照合状態、キーワード照合途中状態、およびキーワード照合完了状態も、（Ａ）〜（Ｃ）と同様のノード構造体Ｎである。 In (C), the state ID of the end state N2 is “2”. Because the state is the end state N2, the state type is “end”. In the end state N2, whichever symbol of the full symbol set Σ appears, the automaton transitions to the initial state N0; the area for every symbol in the list L therefore stores the state ID “0” of the initial state N0. The path matching state, the keyword matching intermediate state, and the keyword matching completion state, none of which is included in the initial automaton A0, use the same node structure N as in (A) to (C).
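As a rough illustration of the node structure N, the Python sketch below holds the state ID, state type, node counter, and list L in one record. The field names are assumptions, and instead of one area per symbol the list L is modeled as a sparse mapping plus a `default` target, which is a simplification chosen here rather than the layout described in the specification.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    state_id: int
    kind: str                 # "initial", "start", "end", ...
    counter: int = 0          # node counter
    transitions: dict = field(default_factory=dict)  # list L: symbol -> state ID
    default: int = 0          # target for every symbol without an explicit entry

    def next_state(self, symbol):
        return self.transitions.get(symbol, self.default)

# The three nodes of the initial automaton A0, as in (A) to (C).
a0 = {
    0: Node(0, "initial", transitions={"[": 1, "]": 2}),
    1: Node(1, "start"),
    2: Node(2, "end"),
}
print(a0[0].next_state("["), a0[0].next_state("x"), a0[1].next_state("4"))
```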

＜照合処理装置１００の動作例＞ <Operation Example of Collation Processing Device 100>
次に、照合処理装置１００の動作例について具体的に説明する。以下の図において、図中、旗印は、オートマトンＡの現在位置を示し、太矢印は走査を示す。 Next, an operation example of the collation processing device 100 will be described in detail. In the following figures, the flag mark indicates the current position in the automaton A, and the thick arrows indicate scanning.

照合処理装置１００は、例えば、パスＩＤ管理テーブルＴの累計回数を用いてオートマトンＡを更新する動作を行うことができる。以下の説明では、照合処理装置１００によって行われる、パスＩＤ管理テーブルＴの累計回数を用いてオートマトンＡを更新する動作を、「第１の動作例」と表記する場合がある。 The verification processing apparatus 100 can perform an operation of updating the automaton A using, for example, the cumulative number of times of the path ID management table T. In the following description, the operation of updating the automaton A using the cumulative number of the path ID management table T performed by the matching processing apparatus 100 may be referred to as “first operation example”.
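The abstract states that, when the automaton A has no free space, the path matching state of the path with the fewest detections is removed before a new one is added. A minimal Python sketch of that selection follows; the function name `register_match`, the `capacity` parameter, and the list of resident path IDs are assumptions made for illustration.

```python
def register_match(counts, resident, path_id, capacity):
    """Count a detection of path_id and return the evicted path ID, if any."""
    # The cumulative count is kept even for paths whose state was removed earlier.
    counts[path_id] = counts.get(path_id, 0) + 1
    evicted = None
    if path_id not in resident:
        if len(resident) >= capacity:
            # No space left: drop the resident path with the smallest count.
            evicted = min(resident, key=lambda pid: counts[pid])
            resident.remove(evicted)
        resident.append(path_id)
    return evicted

counts, resident = {}, []
for pid in (4, 4, 5, 6):
    print(pid, register_match(counts, resident, pid, capacity=2), resident)
```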

また、照合処理装置１００は、例えば、パスＩＤ管理テーブルＴの累計回数を頻度管理テーブルＦにより修正し、修正した累計回数を用いてオートマトンＡを更新する動作を行うことができる。以下の説明では、修正した累計回数を用いてオートマトンＡを更新する動作を、「第２の動作例」と表記する場合がある。 The collation processing device 100 can also, for example, correct the cumulative counts in the path ID management table T by means of the frequency management table F and update the automaton A using the corrected cumulative counts. In the following description, the operation of updating the automaton A using the corrected cumulative counts may be referred to as the “second operation example”.

また、照合処理装置１００は、例えば、パスＩＤ管理テーブルＴの累計回数とフラグとを用いてオートマトンＡを更新する動作を行うことができる。以下の説明では、パスＩＤ管理テーブルＴの累計回数とフラグとを用いてオートマトンＡを更新する動作を、「第３の動作例」と表記する場合がある。 In addition, the verification processing apparatus 100 can perform an operation of updating the automaton A by using, for example, the cumulative number of times in the path ID management table T and the flag. In the following description, the operation of updating the automaton A using the cumulative number of times in the path ID management table T and the flag may be referred to as “third operation example”.

（照合処理装置１００の第１の動作例） (First operation example of collation processing device 100)
まず、図１０〜図３０を用いて、照合処理装置１００の第１の動作例について説明する。図１０〜図３０は、照合処理装置１００の第１の動作例を示す説明図である。第１の動作例においては、パスＩＤ管理テーブルＴの前回項目がなくてもよいし、頻度管理テーブルＦがなくてもよい。したがって、図１０〜図３０では、パスＩＤ管理テーブルＴの前回項目と、頻度管理テーブルＦと、の表記を省略する。 First, a first operation example of the collation processing device 100 will be described with reference to FIGS. 10 to 30. FIGS. 10 to 30 are explanatory diagrams illustrating the first operation example of the collation processing device 100. In the first operation example, neither the previous item of the path ID management table T nor the frequency management table F is required. FIGS. 10 to 30 therefore omit the previous item of the path ID management table T and the frequency management table F.

図１０は、初期オートマトンＡ０の走査開始前の状態を示している。図１０では、クエリＱと初期オートマトンＡ０が用意されている。また、入力ストリームＳの各入力データが格納される第１バッファｂ１と、入力データから変換された変換入力データが格納される第２バッファｂ２と、現在のパスが登録される第３バッファｂ３と、が用意されている。図１０は走査前であるため、第１バッファｂ１と第２バッファｂ２と第３バッファｂ３とは空である。また、パスＩＤ管理テーブルＴにもレコードは存在しない。 FIG. 10 shows the state before scanning of the initial automaton A0 begins. In FIG. 10, the query Q and the initial automaton A0 have been prepared. Also prepared are a first buffer b1 that stores each piece of input data of the input stream S, a second buffer b2 that stores the converted input data obtained from the input data, and a third buffer b3 in which the current path is registered. Because FIG. 10 shows the state before scanning, the first buffer b1, the second buffer b2, and the third buffer b3 are empty, and the path ID management table T contains no records.

次に、図１１の説明に移行する。図１１は、図１０の状態から入力ストリームＳのレコード１の先頭データである開始タグ＜ｎｅｗｓ＞が受信された場合の処理を示している。 Next, the description moves on to FIG. 11. FIG. 11 shows the processing performed when, in the state of FIG. 10, the start tag <news>, which is the first data of record 1 of the input stream S, is received.

図１１において、入力ストリームＳの先頭データである開始タグ＜ｎｅｗｓ＞が第１バッファｂ１に書き込まれると、照合処理装置１００は、＜ｎｅｗｓ＞を読み出して「［１」に変換し、第２バッファｂ２に書き込む。また、開始タグ＜ｎｅｗｓ＞が受信されたため、照合処理装置１００は、現在のパスｐとして、「／ｎｅｗｓ」を検出して第３バッファｂ３に書き込む。 In FIG. 11, when the start tag <news>, which is the first data of the input stream S, is written to the first buffer b1, the collation processing device 100 reads <news>, converts it to “[1”, and writes it to the second buffer b2. Because the start tag <news> has been received, the collation processing device 100 also detects “/news” as the current path p and writes it to the third buffer b3.

そして、照合処理装置１００は、パスＩＤ管理テーブルＴに、パスＩＤ「１」、第３バッファｂ３内のパス「／ｎｅｗｓ」、累計回数「（なし）」のレコードを登録する。 Then, the verification processing apparatus 100 registers a record of the path ID “1”, the path “/ news” in the third buffer b3, and the cumulative number “(none)” in the path ID management table T.

第３バッファｂ３のパスｐ１はクエリＱの条件に一致しないため、照合処理装置１００は、第２バッファｂ２の「［」により、走査位置である初期状態Ｎ０から開始状態Ｎ１に走査し、第２バッファｂ２の「１」により、開始状態Ｎ１から初期状態Ｎ０に走査する。 Since the path p1 of the third buffer b3 does not match the condition of the query Q, the collation processing device 100 scans from the initial state N0 that is the scanning position to the start state N1 by “[” of the second buffer b2, and the second Scanning from the start state N1 to the initial state N0 is performed by "1" in the buffer b2.

次に、図１２の説明に移行する。図１２は、図１１の状態から入力ストリームＳのレコード１の開始タグ＜ｄａｔｅ＞が受信された場合の処理を示している。 Next, the description moves on to FIG. 12. FIG. 12 shows the processing performed when, in the state of FIG. 11, the start tag <date> of record 1 of the input stream S is received.

図１２において、入力ストリームＳの開始タグ＜ｄａｔｅ＞が第１バッファｂ１に書き込まれると、照合処理装置１００は、＜ｄａｔｅ＞を読み出して「［２」に変換し、第２バッファｂ２に書き込む。また、開始タグ＜ｄａｔｅ＞が受信されたため、照合処理装置１００は、第３バッファｂ３の「／ｎｅｗｓ」の末尾に「／ｄａｔｅ」を書き込む。 In FIG. 12, when the start tag <date> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <date>, converts it to “[2”, and writes it to the second buffer b2. Because the start tag <date> has been received, the collation processing device 100 also appends “/date” to the end of “/news” in the third buffer b3.

そして、照合処理装置１００は、パスＩＤ管理テーブルＴに、パスＩＤ「２」、第３バッファｂ３内のパス「／ｎｅｗｓ／ｄａｔｅ」、累計回数「（なし）」のレコードを登録する。 Then, the collation processing device 100 registers, in the path ID management table T, a record containing the path ID “2”, the path “/news/date” held in the third buffer b3, and the cumulative count “(none)”.

また、第３バッファｂ３のパスｐ２はクエリＱの条件に一致しない。このため、照合処理装置１００は、第２バッファｂ２の「［」により、走査位置である初期状態Ｎ０から開始状態Ｎ１に走査し、第２バッファｂ２の「２」により、開始状態Ｎ１から初期状態Ｎ０に走査する。 Further, the path p2 of the third buffer b3 does not match the condition of the query Q. Therefore, the collation processing device 100 scans from the initial state N0, which is the scanning position, to the start state N1 by “[” of the second buffer b2, and from the start state N1 to the initial state by “2” of the second buffer b2. Scan to N0.

次に、図１３の説明に移行する。図１３は、図１２の状態から入力ストリームＳのレコード１の文字列「２０１１−１２−０１」が受信された場合の処理を示している。 Next, the description moves on to FIG. 13. FIG. 13 shows the processing performed when, in the state of FIG. 12, the character string “2011-12-01” of record 1 of the input stream S is received.

図１３において、入力ストリームＳの文字列「２０１１−１２−０１」が第１バッファｂ１に書き込まれると、照合処理装置１００は、文字列「２０１１−１２−０１」を読み出して変換せずに第２バッファｂ２に書き込む。また、文字列が受信された場合、パスＩＤ管理テーブルＴへの登録は実行されない。 In FIG. 13, when the character string “2011-12-01” of the input stream S is written to the first buffer b1, the collation processing device 100 reads the character string “2011-12-01” and converts the character string “2011-12-01” without conversion. 2 Write to buffer b2. When a character string is received, registration in the path ID management table T is not executed.

そして、照合処理装置１００は、第２バッファｂ２の文字列「２０１１−１２−０１」のうちの先頭文字「２」により、走査位置である初期状態Ｎ０から初期状態Ｎ０に走査する。また、照合処理装置１００は、第２バッファの文字列「２０１１−１２−０１」の先頭以降の文字により、同様に初期状態Ｎ０から初期状態Ｎ０に走査する。 Then, the collation processing device 100 scans from the initial state N0 that is the scanning position to the initial state N0 by the first character “2” of the character string “2011-12-01” in the second buffer b2. The collation processing apparatus 100 similarly scans from the initial state N0 to the initial state N0 by the characters after the head of the character string “2011-12-01” in the second buffer.

次に、図１４の説明に移行する。図１４は、図１３の状態から入力ストリームＳのレコード１の終了タグ＜／ｄａｔｅ＞が受信された場合の処理を示している。 Next, the description moves on to FIG. 14. FIG. 14 shows the processing performed when, in the state of FIG. 13, the end tag </date> of record 1 of the input stream S is received.

図１４において、入力ストリームＳの終了タグ＜／ｄａｔｅ＞が第１バッファｂ１に書き込まれると、照合処理装置１００は、＜／ｄａｔｅ＞を読み出して「］２」に変換し、第２バッファｂ２に書き込む。また、照合処理装置１００は、第３バッファｂ３のパスｐ２：「／ｎｅｗｓ／ｄａｔｅ」から「／」および終了タグ＜／ｄａｔｅ＞のタグ内文字列「ｄａｔｅ」を削除して、パスｐ１：「／ｎｅｗｓ」に戻す。 In FIG. 14, when the end tag </date> of the input stream S is written to the first buffer b1, the collation processing device 100 reads </date>, converts it to “]2”, and writes it to the second buffer b2. The collation processing device 100 also deletes “/” and the in-tag character string “date” of the end tag </date> from path p2 “/news/date” in the third buffer b3, returning it to path p1 “/news”.

また、第３バッファｂ３のパスｐ１はクエリＱの条件に一致しない。このため、照合処理装置１００は、第２バッファｂ２の「］」により、走査位置である初期状態Ｎ０から終了状態Ｎ２に走査し、第２バッファｂ２の「２」により、終了状態Ｎ２から初期状態Ｎ０に走査する。 Further, the path p1 of the third buffer b3 does not match the query Q condition. For this reason, the collation processing device 100 scans from the initial state N0, which is the scanning position, to the end state N2 by “]” of the second buffer b2, and from the end state N2 to the initial state by “2” of the second buffer b2. Scan to N0.

次に、図１５の説明に移行する。図１５は、図１４の状態から入力ストリームＳのレコード１の開始タグ＜ｗｒｉｔｅ＞が受信された場合の処理を示している。 Next, the description moves on to FIG. 15. FIG. 15 shows the processing performed when, in the state of FIG. 14, the start tag <write> of record 1 of the input stream S is received.

図１５において、入力ストリームＳの開始タグ＜ｗｒｉｔｅ＞が第１バッファｂ１に書き込まれると、照合処理装置１００は、＜ｗｒｉｔｅ＞を読み出して「［３」に変換し、第２バッファｂ２に書き込む。また、開始タグ＜ｗｒｉｔｅ＞が受信されたため、照合処理装置１００は、第３バッファｂ３の「／ｎｅｗｓ」の末尾に「／」およびタグ内文字列「ｗｒｉｔｅ」を書き込む。 In FIG. 15, when the start tag <write> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <write>, converts it to “[3”, and writes it to the second buffer b2. Because the start tag <write> has been received, the collation processing device 100 also appends “/” and the in-tag character string “write” to the end of “/news” in the third buffer b3.

そして、照合処理装置１００は、パスＩＤ管理テーブルＴに、パスＩＤ「３」、第３バッファｂ３内のパス「／ｎｅｗｓ／ｗｒｉｔｅ」、累計回数「（なし）」のレコードを登録する。 Then, the collation processing device 100 registers, in the path ID management table T, a record containing the path ID “3”, the path “/news/write” held in the third buffer b3, and the cumulative count “(none)”.

また、第３バッファｂ３のパスｐ３はクエリＱの条件に一致しない。このため、照合処理装置１００は、第２バッファｂ２の「［」により、走査位置である初期状態Ｎ０から開始状態Ｎ１に走査し、第２バッファｂ２の「３」により、開始状態Ｎ１から初期状態Ｎ０に走査する。 Further, the path p3 of the third buffer b3 does not match the query Q condition. Therefore, the collation processing device 100 scans from the initial state N0, which is the scanning position, to the start state N1 by “[” of the second buffer b2, and from the start state N1 to the initial state by “3” of the second buffer b2. Scan to N0.

次に、図１６の説明に移行する。図１６は、図１５の状態から入力ストリームＳのレコード１の開始タグ＜ｎａｍｅ＞が受信された場合の処理を示している。 Next, the description moves on to FIG. 16. FIG. 16 shows the processing performed when, in the state of FIG. 15, the start tag <name> of record 1 of the input stream S is received.

図１６において、入力ストリームＳの開始タグ＜ｎａｍｅ＞が第１バッファｂ１に書き込まれると、照合処理装置１００は、＜ｎａｍｅ＞を読み出して「［４」に変換し、第２バッファｂ２に書き込む。また、開始タグ＜ｎａｍｅ＞が受信されたため、照合処理装置１００は、第３バッファｂ３の「／ｎｅｗｓ／ｗｒｉｔｅ」の末尾に「／」およびタグ内文字列「ｎａｍｅ」を書き込む。 In FIG. 16, when the start tag <name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <name>, converts it to “[4”, and writes it to the second buffer b2. Because the start tag <name> has been received, the collation processing device 100 also appends “/” and the in-tag character string “name” to the end of “/news/write” in the third buffer b3.

そして、照合処理装置１００は、パスＩＤ管理テーブルＴに、パスＩＤ「４」、第３バッファｂ３内のパス「／ｎｅｗｓ／ｗｒｉｔｅ／ｎａｍｅ」、累計回数「（なし）」のレコードを登録する。ここで、第３バッファｂ３のパスｐ４はクエリＱの条件に一致する。このため、照合処理装置１００は、レコードの累計回数を「（なし）」から「１」に更新する。 Then, the collation processing device 100 registers, in the path ID management table T, a record containing the path ID “4”, the path “/news/write/name” held in the third buffer b3, and the cumulative count “(none)”. Here, path p4 in the third buffer b3 matches the condition of the query Q, so the collation processing device 100 updates the cumulative count of the record from “(none)” to “1”.

また、第３バッファｂ３のパスｐ４はクエリＱの条件に一致する。このため、照合処理装置１００は、オートマトンＡの１回目の更新を実行する。具体的には、照合処理装置１００は、まず、新規状態を生成する。新規状態は、パスＩＤ「４」を示す状態である。この状態はクエリＱの条件を満たすパスｐ４を特定する状態であるため、以下の説明では、この状態を「パス照合状態」と表記する場合がある。 The path p4 of the third buffer b3 matches the query Q condition. For this reason, the collation processing apparatus 100 executes the first update of the automaton A. Specifically, the verification processing device 100 first generates a new state. The new state is a state indicating the path ID “4”. Since this state is a state that specifies the path p4 that satisfies the condition of the query Q, in the following description, this state may be referred to as a “path verification state”.

また、第２バッファｂ２には、「［４」が書き込まれたため、パス照合状態Ｎ３は、開始状態Ｎ１からの遷移先状態となる。このとき、具体的には、開始状態Ｎ１のリストＬの記号「４」の領域に、パス照合状態Ｎ３の状態ＩＤが格納される。以下の説明では、簡単のため、各状態のリストＬの更新についての説明は省略する。したがって、記号ΣのうちパスＩＤ「４」を除いた記号「Σ＼｛４｝」で遷移される場合は、開始状態Ｎ１の遷移先は初期状態Ｎ０となる。一方で、パスＩＤ「４」で遷移される場合は、開始状態Ｎ１の遷移先はパス照合状態Ｎ３となる。 Further, since “[4” is written in the second buffer b2, the path verification state N3 becomes the transition destination state from the start state N1. At this time, specifically, the state ID of the path matching state N3 is stored in the area of the symbol “4” in the list L of the start state N1. In the following description, for the sake of simplicity, description of updating the list L in each state is omitted. Accordingly, when transition is made with the symbol “Σ \ {4}” excluding the path ID “4” in the symbol Σ, the transition destination of the start state N1 is the initial state N0. On the other hand, when the transition is made with the path ID “4”, the transition destination of the start state N1 is the path verification state N3.

また、パス照合状態Ｎ３が生成されると、照合処理装置１００は、クエリＱのキーワードを構成する文字によりパス照合状態Ｎ３から順次遷移される遷移先状態を生成する。以下の説明では、キーワードを構成する文字により遷移される遷移先状態のうち、末尾文字以外の文字により遷移される遷移先状態を、「キーワード照合途中状態」と表記する場合がある。以下の説明では、末尾文字により遷移される遷移先状態を、「キーワード照合完了状態」と表記する場合がある。 In addition, when the path matching state N3 is generated, the matching processing device 100 generates a transition destination state that is sequentially shifted from the path matching state N3 by characters constituting the keyword of the query Q. In the following description, a transition destination state that is transitioned by a character other than the last character among transition destination states that are transitioned by characters constituting the keyword may be referred to as a “keyword matching intermediate state”. In the following description, the transition destination state that is transitioned by the end character may be referred to as a “keyword matching completion state”.

図１６の例では、先頭文字「Ｂ」および２番目の文字「ｏ」により遷移される遷移先状態は、キーワード照合途中状態Ｎ４，Ｎ５となり、末尾文字「ｂ」により遷移される遷移先状態は、キーワード照合完了状態Ｎ６となる。 In the example of FIG. 16, the transition destination states that are transitioned by the first character “B” and the second character “o” are the keyword matching intermediate states N4 and N5, and the transition destination state that is transitioned by the last character “b” is The keyword collation completion state N6 is entered.

また、照合処理装置１００は、パス照合状態Ｎ３から開始状態Ｎ１への遷移（不図示）と終了状態Ｎ２への遷移（不図示）を生成する。また、照合処理装置１００は、パス照合状態Ｎ３が自身にループする遷移（不図示）も生成する。この遷移は、全記号Σから開始タグ記号「［」、終了タグ記号「］」およびキーワードの先頭文字「Ｂ」を除いた記号「Σ＼｛［，］，Ｂ｝」となる。これにより、１回目のオートマトンＡの更新が完了する。 Further, the matching processing device 100 generates a transition (not shown) from the path matching state N3 to the start state N1 and a transition (not shown) to the end state N2. The matching processing device 100 also generates a transition (not shown) in which the path matching state N3 loops to itself. This transition is a symbol “Σ \ {[,], B}” obtained by removing the start tag symbol “[”, the end tag symbol “]” and the first character “B” of the keyword from all symbols Σ. Thereby, the first update of the automaton A is completed.

照合処理装置１００は、更新後のオートマトンＡにより走査を開始する。照合処理装置１００は、第２バッファｂ２の「［４」の「［」により、走査位置である初期状態Ｎ０から開始状態Ｎ１に走査し、第２バッファｂ２の「４」により、開始状態Ｎ１からパス照合状態Ｎ３に走査する。走査位置は、パス照合状態Ｎ３となる。照合処理装置１００は、パス照合状態Ｎ３に走査すると、パス照合状態Ｎ３のノードカウンタを「０」から「１」に更新する。図１６〜図４２において、パス照合状態の右上に示した数字は当該パス照合状態のノードカウンタの値を示す。 The collation processing device 100 starts scanning with the updated automaton A. With the “[” of “[4” in the second buffer b2, it scans from the initial state N0, which is the current scanning position, to the start state N1; with the “4” in the second buffer b2, it scans from the start state N1 to the path matching state N3. The scanning position is now the path matching state N3. On scanning to the path matching state N3, the collation processing device 100 updates the node counter of the path matching state N3 from “0” to “1”. In FIGS. 16 to 42, the number shown at the upper right of a path matching state is the value of that state's node counter.
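The first update of the automaton A and the scan that follows can be sketched in Python as below. This is a sketch under stated assumptions rather than the device's actual implementation: the dictionary layout and all function names are hypothetical, and the behavior of the keyword states on a mismatch (retrying the symbol from the path matching state) is an assumption that does not cover every overlapping keyword.

```python
def new_state(kind, default):
    return {"kind": kind, "next": {}, "default": default, "counter": 0}

def initial_automaton():
    a = {0: new_state("initial", 0), 1: new_state("start", 0), 2: new_state("end", 0)}
    a[0]["next"] = {"[": 1, "]": 2}
    return a

def add_path_state(a, path_id, keyword):
    """Add a path matching state and one state per keyword character."""
    path_state = len(a)
    a[path_state] = new_state("path", path_state)    # loops to itself by default
    a[path_state]["next"] = {"[": 1, "]": 2}
    a[1]["next"][str(path_id)] = path_state          # start state -> path state
    prev = path_state
    for i, ch in enumerate(keyword):
        sid = len(a)
        kind = "keyword-done" if i == len(keyword) - 1 else "keyword-partial"
        a[sid] = new_state(kind, path_state)         # assumed fallback target
        a[sid]["next"] = {"[": 1, "]": 2}
        a[prev]["next"][ch] = sid
        prev = sid
    return path_state

def step(a, state, symbol):
    node = a[state]
    if symbol in node["next"]:
        nxt = node["next"][symbol]
    elif node["kind"].startswith("keyword"):
        # Assumption: on a mismatch, retry the symbol from the path matching state.
        nxt = a[node["default"]]["next"].get(symbol, node["default"])
    else:
        nxt = node["default"]
    if state == 1 and a[nxt]["kind"] == "path":
        a[nxt]["counter"] += 1                       # node counter
    return nxt

a = initial_automaton()
n3 = add_path_state(a, 4, "Bob")
state = 0
for symbol in ["[", "4"] + list("Alice") + ["]", "4"]:
    state = step(a, state, symbol)
print(n3, state, a[n3]["counter"])
```

With path ID 4 and the keyword “Bob”, the new states receive the IDs 3 to 6, which lines up with N3 to N6 in the description.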

次に、図１７の説明に移行する。図１７は、図１６の状態から入力ストリームＳのレコード１の文字列「Ａｌｉｃｅ」が受信された場合の処理を示している。 Next, the description moves on to FIG. 17. FIG. 17 shows the processing performed when, in the state of FIG. 16, the character string “Alice” of record 1 of the input stream S is received.

図１７において、入力ストリームＳの文字列「Ａｌｉｃｅ」が第１バッファｂ１に書き込まれると、照合処理装置１００は、文字列「Ａｌｉｃｅ」を読み出して変換せずに第２バッファｂ２に書き込む。また、文字列が受信された場合、パスＩＤ管理テーブルＴへの登録は実行されない。 In FIG. 17, when the character string “Alice” of the input stream S is written into the first buffer b1, the verification processing apparatus 100 reads the character string “Alice” and writes it into the second buffer b2 without conversion. When a character string is received, registration in the path ID management table T is not executed.

また、第２バッファｂ２の文字列「Ａｌｉｃｅ」のうち先頭文字「Ａ」はキーワードの先頭文字「Ｂ」と異なるため、照合処理装置１００は、文字「Ａ」により、走査位置であるパス照合状態Ｎ３からパス照合状態Ｎ３に走査する。また、照合処理装置１００は、第２バッファの文字列「Ａｌｉｃｅ」の先頭以降の文字により、同様にパス照合状態Ｎ３からパス照合状態Ｎ３に走査する。 In addition, since the first character “A” in the character string “Alice” in the second buffer b2 is different from the first character “B” of the keyword, the collation processing device 100 uses the character “A” to determine the path collation state that is the scanning position. Scan from N3 to path verification state N3. Further, the matching processing device 100 similarly scans from the path matching state N3 to the path matching state N3 by the characters after the head of the character string “Alice” in the second buffer.

Next, FIG. 18 is described. FIG. 18 shows the processing performed when the end tag </name> of record 1 of the input stream S is received in the state of FIG. 17.

In FIG. 18, when the end tag </name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads </name>, converts it to "]4", and writes the result to the second buffer b2. The collation processing device 100 also deletes "/" and the tag name "name" of the end tag </name> from path p4 "/news/write/name" in the third buffer b3, restoring path p3 "/news/write".
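The tag conversion and path bookkeeping described for the second and third buffers can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the class `PathTracker`, its method names, and the regular expression are hypothetical, and the assumption that path ID "2" belongs to "/news/date" is inferred from the record layout rather than stated here. The sketch does not check that an end tag matches the innermost open tag.

```python
import re

TAG = re.compile(r"<(/?)([^<>/]+)>")

class PathTracker:
    """Hypothetical helper: converts tags to '[id' / ']id' and tracks the current path."""

    def __init__(self):
        self.path_ids = {}  # path -> path ID (stand-in for path ID management table T)
        self.stack = []     # element names of the current path (stand-in for third buffer b3)

    def current_path(self):
        return "".join("/" + name for name in self.stack)

    def convert(self, token):
        m = TAG.fullmatch(token)
        if m is None:
            return token  # character strings are copied unchanged
        closing, name = m.groups()
        if closing:
            symbol = "]" + str(self.path_ids[self.current_path()])
            self.stack.pop()  # remove "/" + name from the current path
            return symbol
        self.stack.append(name)  # append "/" + name to the current path
        path = self.current_path()
        # assign the next free ID the first time a path is seen
        pid = self.path_ids.setdefault(path, len(self.path_ids) + 1)
        return "[" + str(pid)

tracker = PathTracker()
tokens = ["<news>", "<date>", "</date>", "<write>", "<name>", "Alice",
          "</name>", "</write>", "<edit>", "<name>"]
symbols = [tracker.convert(t) for t in tokens]
```

With these tokens, `<name>` under `<write>` becomes "[4" and its end tag becomes "]4", in line with the walkthrough.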

Path p3 in the third buffer b3 does not satisfy the condition of the query Q. The collation processing device 100 therefore transitions on the "]" in the second buffer b2 from the path collation state N3, the current scan position, to the end state N2, and on the "4" in the second buffer b2 from the end state N2 to the initial state N0.

Next, FIG. 19 is described. FIG. 19 shows the processing performed when the end tag </write> of record 1 of the input stream S is received in the state of FIG. 18.

In FIG. 19, when the end tag </write> of the input stream S is written to the first buffer b1, the collation processing device 100 reads </write>, converts it to "]3", and writes the result to the second buffer b2. The collation processing device 100 also deletes "/" and the tag name "write" of the end tag </write> from path p3 "/news/write" in the third buffer b3, restoring path p1 "/news".

Path p1 in the third buffer b3 does not satisfy the condition of the query Q. The collation processing device 100 therefore transitions on the "]" in the second buffer b2 from the initial state N0, the current scan position, to the end state N2, and on the "3" in the second buffer b2 from the end state N2 to the initial state N0.

Next, FIG. 20 is described. FIG. 20 shows the processing performed when the start tag <edit> of record 1 of the input stream S is received in the state of FIG. 19.

In FIG. 20, when the start tag <edit> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <edit>, converts it to "[5", and writes the result to the second buffer b2. Because a start tag has been received, the collation processing device 100 also appends "/" and the tag name "edit" to "/news" in the third buffer b3.

The collation processing device 100 then registers, in the path ID management table T, a record with path ID "5", the path "/news/edit" held in the third buffer b3, and a cumulative count of "(none)".

Path p5 in the third buffer b3 does not satisfy the condition of the query Q. The collation processing device 100 therefore transitions on the "[" in the second buffer b2 from the initial state N0, the current scan position, to the start state N1, and on the "5" in the second buffer b2 from the start state N1 to the initial state N0.

Next, FIG. 21 is described. FIG. 21 shows the processing performed when the start tag <name> of record 1 of the input stream S is received in the state of FIG. 20.

In FIG. 21, when the start tag <name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <name>, converts it to "[6", and writes the result to the second buffer b2. Because a start tag has been received, the collation processing device 100 also appends "/" and the tag name "name" to "/news/edit" in the third buffer b3.

The collation processing device 100 then registers, in the path ID management table T, a record with path ID "6", the path "/news/edit/name" held in the third buffer b3, and a cumulative count of "(none)". Because path p6 in the third buffer b3 satisfies the condition of the query Q, the collation processing device 100 updates the cumulative count of that record from "(none)" to "1".
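The registration step can be illustrated with a small sketch. It is a hedged illustration, not the device's actual table layout: the functions `register` and `matches_query` are hypothetical, `None` stands in for the "(none)" cumulative count, and the query condition is assumed to accept any path whose last element is "name", which is consistent with the paths matched in this walkthrough but is not defined in this passage.

```python
def matches_query(path):
    # Assumption: Q's path condition accepts any path whose last element is "name".
    return path.endswith("/name")

def register(table, path_id, path):
    """Register a path once; bump its cumulative count each time it satisfies Q."""
    record = table.setdefault(path_id, {"path": path, "count": None})
    if matches_query(path):
        record["count"] = (record["count"] or 0) + 1
    return record

table = {}  # stand-in for the path ID management table T
register(table, 5, "/news/edit")        # count stays None, i.e. "(none)"
register(table, 6, "/news/edit/name")   # count becomes 1
```

Calling `register(table, 6, "/news/edit/name")` again would raise the count to 2, mirroring the update described later for record 2.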

Because path p6 satisfies the condition of the query Q, the collation processing device 100 also performs the second update of the automaton A. Specifically, it generates a path collation state N7 that identifies path p6. Once the path collation state N7 has been generated, the collation processing device 100 generates the keyword collation intermediate states N8 and N9 and the keyword collation completion state N10 as the states reached in sequence from the path collation state N7 on the successive characters of the keyword of the query Q.

The collation processing device 100 also generates a transition (not shown) from the path collation state N7 to the start state N1 and a transition (not shown) from N7 to the end state N2. In addition, it generates a transition (not shown) on which the path collation state N7 loops to itself. This self-loop is taken on the symbols "Σ\{[, ], B}", that is, all symbols Σ except the start tag symbol "[", the end tag symbol "]", and the first character "B" of the keyword. This completes the second update of the automaton A.
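The states and transitions generated in this update can be sketched as a transition table. This is a rough sketch under stated assumptions: the function `add_path_state` and the dictionary representation are hypothetical, a path ID is treated as a single symbol, and state numbers are assumed to be consecutive (N7 followed by N8 to N10), as in this example.

```python
START, END = 1, 2  # start state N1 and end state N2

def add_path_state(trans, path_id, keyword, state_no):
    """Create path collation state `state_no` plus one state per keyword character."""
    trans[(START, str(path_id))] = state_no           # N1 --path ID--> path collation state
    for i, ch in enumerate(keyword):
        trans[(state_no + i, ch)] = state_no + i + 1  # keyword chain, ending in completion state
    trans[(state_no, "[")] = START
    trans[(state_no, "]")] = END
    # Symbols excluded from the self-loop of the path collation state;
    # every other symbol keeps the scan position where it is.
    return {"[", "]", keyword[0]}

trans = {}
excluded = add_path_state(trans, 6, "Bob", 7)
```

Returning the excluded set instead of enumerating Σ keeps the self-loop implicit: any symbol without an entry in `trans` and outside `excluded` leaves the scan position at the path collation state.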

The collation processing device 100 resumes scanning with the updated automaton A. On the "[" of "[6" in the second buffer b2, it transitions from the initial state N0, the current scan position, to the start state N1. On the "6", it transitions from the start state N1 to the path collation state N7, which becomes the new scan position. On entering the path collation state N7, the collation processing device 100 updates the node counter of that state from "0" to "1".

Next, FIG. 22 is described. FIG. 22 shows the processing performed when the character string "Bob" of record 1 of the input stream S is received in the state of FIG. 21.

In FIG. 22, when the character string "Bob" of the input stream S is written to the first buffer b1, the collation processing device 100 reads it and writes it to the second buffer b2 without conversion. When a character string is received, nothing is registered in the path ID management table T.

Because the first character "B" of "Bob" in the second buffer b2 matches the first character "B" of the keyword, the collation processing device 100 transitions on "B" from the path collation state N7, the current scan position, to the keyword collation intermediate state N8. On the second character "o", it then transitions from the keyword collation intermediate state N8 to the next keyword collation intermediate state N9.

Finally, on the last character "b" of "Bob", the collation processing device 100 transitions from the keyword collation intermediate state N9 to the keyword collation completion state N10. Having reached the keyword collation completion state N10, the collation processing device 100 outputs the keyword "Bob" of the query Q.
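The character-by-character scan for "Alice" and "Bob" can be sketched as below. This is only an illustration: `build_chain` and `scan_text` are hypothetical names, state numbers are assumed to be consecutive, and the fallback to the path collation state on a mismatch inside the keyword chain is an assumption, since this passage only describes the fully matching and fully non-matching cases.

```python
def build_chain(path_state, keyword):
    """Transitions from the path collation state through the keyword chain."""
    return {(path_state + i, ch): path_state + i + 1 for i, ch in enumerate(keyword)}

def scan_text(path_state, keyword, text):
    """Scan `text` starting at the path collation state; report whether the keyword was hit."""
    trans = build_chain(path_state, keyword)
    done = path_state + len(keyword)  # keyword collation completion state
    state, matched = path_state, False
    for ch in text:
        nxt = trans.get((state, ch))
        if nxt is None:
            # Assumed fallback: retry the character from the path collation state,
            # which loops to itself on anything but the keyword's first character.
            nxt = trans.get((path_state, ch), path_state)
        state = nxt
        if state == done:
            matched = True  # the device would output the keyword here
    return state, matched

print(scan_text(3, "Bob", "Alice"))  # (3, False): stays in N3
print(scan_text(7, "Bob", "Bob"))    # (10, True): N7 -> N8 -> N9 -> N10
```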

Next, FIG. 23 is described. FIG. 23 shows the processing performed when the end tag </name> of record 1 of the input stream S is received in the state of FIG. 22.

In FIG. 23, when the end tag </name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads </name>, converts it to "]6", and writes the result to the second buffer b2. The collation processing device 100 also deletes "/" and the tag name "name" of the end tag </name> from path p6 "/news/edit/name" in the third buffer b3, restoring path p5 "/news/edit".

Path p5 in the third buffer b3 does not satisfy the condition of the query Q. The collation processing device 100 therefore transitions on the "]" in the second buffer b2 from the keyword collation completion state N10, the current scan position, to the end state N2, and on the "6" in the second buffer b2 from the end state N2 to the initial state N0.

Next, FIG. 24 is described. FIG. 24 shows the result of processing when the remaining data of record 1 of the input stream S are received in the state of FIG. 23 and processed by the collation processing device 100 in the same manner as in FIGS. 10 to 23.

In FIG. 24, when the start tag <prev> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3 with the current path. A record for path ID "7" is registered in the path ID management table T, and the automaton A is scanned.

Next, when the start tag <write> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3. A record for path ID "8" is registered in the path ID management table T, and the automaton A is scanned.

Then, when the start tag <name> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3 with the current path. A record for path ID "9" is registered in the path ID management table T, and the automaton A is scanned. In addition, the path collation state N11, the keyword collation intermediate states N12 and N13, and the keyword collation completion state N14 are generated in the automaton A, and the node counter of the path collation state N11 is updated from "0" to "1".

Next, the character string "Carol", the end tag </name>, the end tag </write>, the end tag </prev>, and the end tag </news> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A. The scan position ends at the initial state N0.

Next, FIG. 25 is described. FIG. 25 shows the result of processing when the data of record 2 of the input stream S are received in the state of FIG. 24 and processed by the collation processing device 100 in the same manner as in FIGS. 10 to 24.

In FIG. 25, the start tag <news>, the start tag <date>, the character string "2011-12-02", the end tag </date>, and the start tag <write> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "4" in the path ID management table T is updated from "1" to "2", and the automaton A is scanned. The node counter of the path collation state N3, which identifies path ID "4", is updated from "1" to "2".

Next, the character string "Carol", the end tag </name>, the end tag </write>, and the start tag <edit> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3. The cumulative count for path ID "6" in the path ID management table T is updated from "1" to "2", and the automaton A is scanned. The node counter of the path collation state N7, which identifies path ID "6", is updated from "1" to "2".

Next, the character string "Dick", the end tag </name>, the end tag </edit>, and the end tag </news> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A. The scan position ends at the initial state N0.

Next, FIG. 26 is described. FIG. 26 shows the result of processing when the data of record 3 of the input stream S, up to partway through the record, are received in the state of FIG. 25 and processed by the collation processing device 100 in the same manner as in FIGS. 10 to 24.

In FIG. 26, the start tag <news>, the start tag <date>, the character string "2011-12-03", the end tag </date>, and the start tag <write> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3. The cumulative count for path ID "4" in the path ID management table T is updated from "2" to "3", and the automaton A is scanned. The node counter of the path collation state N3, which identifies path ID "4", is updated from "2" to "3".

Next, the character string "Alice", the end tag </name>, the end tag </write>, and the start tag <edit> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3. The cumulative count for path ID "6" in the path ID management table T is updated from "2" to "3", and the automaton A is scanned. The node counter of the path collation state N7, which identifies path ID "6", is updated from "2" to "3".

Next, the character string "Dick", the end tag </name>, the end tag </edit>, the start tag <prev>, and the start tag <write> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "9" in the path ID management table T is updated from "1" to "2", and the automaton A is scanned. The node counter of the path collation state N11, which identifies path ID "9", is updated from "1" to "2".

Next, the character string "Carol", the end tag </name>, and the end tag </write> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <edit> is received, the collation processing device 100 writes the converted start tag to the second buffer b2 and updates the third buffer b3. A record for path ID "10" is registered in the path ID management table T, and the automaton A is scanned.

Next, FIG. 27 is described. FIG. 27 shows the processing performed when the start tag <name> of record 3 of the input stream S is received in the state of FIG. 26.

In FIG. 27, when the start tag <name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <name>, converts it to "[11", and writes the result to the second buffer b2. Because a start tag has been received, the collation processing device 100 also appends "/" and the tag name "name" to "/news/prev/edit" in the third buffer b3.

The collation processing device 100 then registers, in the path ID management table T, a record with path ID "11", the path "/news/prev/edit/name" held in the third buffer b3, and a cumulative count of "(none)". Because path p11 in the third buffer b3 satisfies the condition of the query Q, the collation processing device 100 updates the cumulative count of that record from "(none)" to "1".

Because path p11 satisfies the condition of the query Q, the collation processing device 100 also performs the third update of the automaton A, in which it generates a path collation state N15 that identifies path p11. At this point the automaton A already holds the maximum permitted number of path collation states, so the collation processing device 100 deletes one of the existing path collation states N3, N7, and N11 before generating the path collation state N15.

For example, the collation processing device 100 identifies path ID "9" as the path ID with the smallest cumulative count in the path ID management table T among the paths that currently have a path collation state. It then deletes the path collation state N11, which identifies path ID "9". Along with N11, it deletes the keyword collation intermediate states N12 and N13 and the keyword collation completion state N14, which are reached in sequence from N11. Next, the collation processing device 100 generates the path collation state N15. Once N15 has been generated, it generates the keyword collation intermediate states N16 and N17 and the keyword collation completion state N18 as the states reached in sequence from N15 on the characters that make up the keyword of the query Q.
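The choice of which path collation state to delete can be sketched as a minimum search over cumulative counts. The function `pick_victim` and the dictionaries are hypothetical. The counts are those reached at this point in the walkthrough (path IDs 4 and 6 at 3, path ID 9 at 2, path ID 11 at 1). How ties are broken is not described in this passage, so the sketch simply takes whichever minimum Python's `min` encounters first.

```python
def pick_victim(counts, resident):
    """Return the resident path ID with the smallest cumulative count."""
    return min(resident, key=lambda pid: counts[pid])

counts = {4: 3, 6: 3, 9: 2, 11: 1}        # cumulative counts in table T
resident = {4: "N3", 6: "N7", 9: "N11"}   # path ID -> existing path collation state

victim = pick_victim(counts, resident)    # only resident states are candidates
resident.pop(victim)                      # delete N11 (and, in the device, N12 to N14)
resident[11] = "N15"                      # generate the new path collation state
```

Note that the search ranges over the resident states only: the newly detected path ID 11 has the smallest count overall, yet it is the one being added.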

The collation processing device 100 also generates a transition (not shown) from the path collation state N15 to the start state N1 and a transition (not shown) from N15 to the end state N2. In addition, it generates a transition (not shown) on which the path collation state N15 loops to itself. This self-loop is taken on the symbols "Σ\{[, ], B}", that is, all symbols Σ except the start tag symbol "[", the end tag symbol "]", and the first character "B" of the keyword. This completes the third update of the automaton A.

The collation processing device 100 resumes scanning with the updated automaton A. On the "[" of "[11" in the second buffer b2, it transitions from the initial state N0, the current scan position, to the start state N1. On the "11", it transitions from the start state N1 to the path collation state N15, which becomes the new scan position. On entering the path collation state N15, the collation processing device 100 updates the node counter of that state from "0" to "1".

In this example the collation processing device 100 deletes the path collation state N11 and then generates the path collation state N15, but the order is not limited to this. For example, the collation processing device 100 may generate the path collation state N15 first and then delete the path collation state N11. Alternatively, it may overwrite the path collation state N11 with the path collation state N15, thereby deleting N11 and generating N15 in a single step. In that case, the keyword collation intermediate states N12 and N13 and the keyword collation completion state N14 may be reused as the keyword collation intermediate states N16 and N17 and the keyword collation completion state N18.

Next, FIG. 28 is described. FIG. 28 shows the result of processing when the remaining data of record 3 of the input stream S are received in the state of FIG. 27 and processed by the collation processing device 100 in the same manner as in FIGS. 10 to 24.

In FIG. 28, the character string "Bob", the end tag </name>, the end tag </edit>, the end tag </prev>, and the end tag </news> are received in that order. Each time one of these is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A. In FIG. 28, neither the path ID management table T nor the automaton A is updated. The scan position ends at the initial state N0.

次に、図２９の説明に移行する。図２９は、図２８の状態から、入力ストリームＳのレコード４の各データが受信され、照合処理装置１００によって図１０〜図２４と同様にして処理された場合の処理結果を示している。 Next, the description proceeds to FIG. FIG. 29 shows a processing result when each data of the record 4 of the input stream S is received from the state of FIG. 28 and processed in the same way as in FIGS.

図２９において、開始タグ＜ｎｅｗｓ＞が受信され、開始タグ＜ｄａｔｅ＞が受信され、文字列「２０１１−１２−０４」が受信され、終了タグ＜／ｄａｔｅ＞が受信され、開始タグ＜ｗｒｉｔｅ＞が受信される。ここで、各データが受信されるごとに、照合処理装置１００によって、各データを変換したデータが第２バッファｂ２に書き込まれ、現在のパスにより第３バッファｂ３が更新され、オートマトンＡが走査される。 In FIG. 29, the start tag <news> is received, the start tag <date> is received, the character string “2011-12-04” is received, the end tag </ date> is received, and the start tag <write> is received. Is received. Here, each time each data is received, the data obtained by converting each data is written into the second buffer b2 by the verification processing device 100, the third buffer b3 is updated by the current pass, and the automaton A is scanned. The

次に、開始タグ＜ｎａｍｅ＞が受信されると、照合処理装置１００によって、開始タグ＜ｎａｍｅ＞を変換したデータが第２バッファｂ２に書き込まれ、現在のパスにより第３バッファｂ３が更新される。また、パスＩＤ管理テーブルＴのパスＩＤ「４」に関する累計回数が「３」から「４」に更新され、オートマトンＡが走査される。また、パスＩＤ「４」を特定するパス照合状態Ｎ３のノードカウンタが「３」から「４」に更新される。 Next, when the start tag <name> is received, the collation processing device 100 writes the data obtained by converting the start tag <name> into the second buffer b2, and the third buffer b3 is updated by the current path. . In addition, the cumulative number related to the path ID “4” in the path ID management table T is updated from “3” to “4”, and the automaton A is scanned. In addition, the node counter in the path verification state N3 that specifies the path ID “4” is updated from “3” to “4”.

Next, the character string "Carol", the end tag </name>, the end tag </write>, the start tag <prev>, and the start tag <write> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "9" in the path ID management table T is updated from "2" to "3", and the automaton A is scanned. Based on the cumulative counts in the path ID management table T, the path matching state N15, which identifies the path of path ID "11", is deleted, and the path matching state N11, which identifies the path of path ID "9", is recreated. The node counter of the path matching state N11 is then updated from "0" to "1".
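The replacement step described here can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `admit_path_state`, the capacity of three path matching states, and the count assumed for path ID 6 are hypothetical; the counts for path IDs 4, 9, and 11 follow the walkthrough.

```python
def admit_path_state(states, counts, new_id, capacity):
    """Give new_id a path matching state, evicting the least-counted one if full.

    states   : set of path IDs that currently have a path matching state
    counts   : dict of path ID -> cumulative count (path ID management table T)
    capacity : assumed maximum number of path matching states in the automaton
    Returns the evicted path ID, or None when nothing had to be evicted.
    """
    if new_id in states:
        return None
    evicted = None
    if len(states) >= capacity:
        # Evict the state whose path has the smallest cumulative count.
        evicted = min(states, key=lambda pid: counts[pid])
        states.remove(evicted)
    states.add(new_id)
    return evicted


# Counts for path IDs 4, 9 and 11 follow the text; the count for 6 is assumed.
counts = {4: 4, 6: 2, 9: 3, 11: 1}
states = {4, 6, 11}
evicted = admit_path_state(states, counts, new_id=9, capacity=3)
```

With these values the state for path ID 11 is dropped and the state for path ID 9 takes its place, mirroring the deletion of N15 and the recreation of N11.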

Next, the character string "Alice", the end tag </name>, the end tag </write>, and the start tag <write> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "9" in the path ID management table T is updated from "3" to "4", and the automaton A is scanned. The node counter of the path matching state N11, which identifies path ID "9", is also updated from "1" to "2".

Next, the character string "Bob", the end tag </name>, the end tag </write>, the end tag </prev>, and the end tag </news> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A. The scan position returns to the initial state N0.

Next, FIG. 30 is described. FIG. 30 shows the processing result when, starting from the state of FIG. 29, each item of data of record 5 of the input stream S is received and processed by the collation processing device 100 in the same manner as in FIGS. 10 to 24.

In FIG. 30, the start tag <news>, the start tag <date>, the character string "2011-12-05", the end tag </date>, the start tag <prev>, and the start tag <edit> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, the start tag <name> is received, and the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "11" in the path ID management table T is updated from "1" to "2". Based on the cumulative counts in the path ID management table T, the path matching state N7, which identifies the path of path ID "6", is deleted, and the path matching state N15, which identifies the path of path ID "11", is recreated. The node counter of the path matching state N15 is then updated from "0" to "1".

Next, the character string "Dick", the end tag </name>, the end tag </edit>, the end tag </prev>, and the end tag </news> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A. The scan position returns to the initial state N0.

(Second operation example of the collation processing device 100)
Next, a second operation example of the collation processing device 100 is described with reference to FIGS. 31 to 36. FIGS. 31 to 36 are explanatory diagrams illustrating the second operation example of the collation processing device 100. In the second operation example, the previous item of the path ID management table T may be omitted. Accordingly, the previous item of the path ID management table T is not shown in FIGS. 31 to 36.

FIG. 31 shows the state before scanning of the initial automaton A starts in the second operation example. In FIG. 31, in addition to the query Q, the initial automaton A, the first buffer b1, the second buffer b2, and the third buffer b3 shown in FIG. 10, a frequency management table F is provided that stores the path IDs of the five most recent matches against the condition of the query Q.

Because FIG. 31 shows the state before scanning, the first buffer b1, the second buffer b2, and the third buffer b3 are empty. The path ID management table T and the frequency management table F also contain no records. After this, the collation processing device 100 processes each item of data of record 1 of the input stream S in the same manner as in FIGS. 11 to 15.

Next, FIG. 32 is described. Like FIG. 16, FIG. 32 shows the processing performed when the start tag <name> of record 1 of the input stream S is received.

In FIG. 32, when the start tag <name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <name>, converts it to "[4", and writes it to the second buffer b2. Because the start tag <name> has been received, the collation processing device 100 also appends "/" and the tag name "name" to the end of "/news/write" in the third buffer b3.

The collation processing device 100 then registers in the path ID management table T a record consisting of path ID "4", the path "/news/write/name" held in the third buffer b3, and the cumulative count "(none)". Here, the path p4 in the third buffer b3 matches the condition of the query Q. The collation processing device 100 therefore updates the cumulative count of the record from "(none)" to "1" and, for the same reason, registers path ID "4" in the frequency management table F.
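The bookkeeping for the third buffer b3 and the path ID management table T can be sketched as below. This is a minimal sketch under stated assumptions: the class `PathTracker`, the sequential assignment of path IDs on first sight, and the predicate `ends_with_name` standing in for the condition of the query Q are all hypothetical; the text only shows that `/news/write/name` receives path ID 4 and is encoded as "[4".

```python
class PathTracker:
    """Tracks the current path (third buffer b3) and the path ID table T."""

    def __init__(self):
        self.stack = []   # tag names that make up the current path
        self.table = {}   # path string -> {"id": int, "count": int or None}

    def start_tag(self, name, matches_query):
        """Extend the current path, register it if new, and return its token."""
        self.stack.append(name)
        path = "/" + "/".join(self.stack)
        if path not in self.table:
            # Assumption: IDs are handed out sequentially as paths first appear.
            self.table[path] = {"id": len(self.table) + 1, "count": None}
        record = self.table[path]
        if matches_query(path):
            # "(none)" becomes 1 on the first match, then increments.
            record["count"] = (record["count"] or 0) + 1
        return "[" + str(record["id"])

    def end_tag(self):
        """Drop the last path element when an end tag arrives."""
        self.stack.pop()


# Hypothetical stand-in for the query Q condition: paths ending in /name.
ends_with_name = lambda path: path.endswith("/name")

tracker = PathTracker()
tracker.start_tag("news", ends_with_name)
tracker.start_tag("date", ends_with_name)
tracker.end_tag()
tracker.start_tag("write", ends_with_name)
token = tracker.start_tag("name", ends_with_name)
```

Replaying the start of record 1 this way yields the token "[4" for `/news/write/name` with a cumulative count of 1, while the non-matching paths keep a count of `None`.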

Because the path p4 in the third buffer b3 matches the condition of the query Q, the collation processing device 100 also updates the automaton A. Specifically, the collation processing device 100 first generates a path matching state N3 indicating path ID "4". Once the path matching state N3 has been generated, the collation processing device 100 generates the keyword matching intermediate states N4 and N5 and the keyword matching completed state N6 as the states reached in sequence from the path matching state N3 by the characters that make up the keyword of the query Q.

The collation processing device 100 starts scanning with the updated automaton A. On the "[" of "[4" in the second buffer b2, it moves from the initial state N0, the current scan position, to the start state N1; on the "4" in the second buffer b2, it moves from the start state N1 to the path matching state N3. The scan position is now the path matching state N3.

On reaching the path matching state N3, the collation processing device 100 updates the node counter of the path matching state N3 from "0" to "1". After this, the collation processing device 100 processes each item of data of record 1 of the input stream S in the same manner as in FIGS. 17 to 20.
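The automaton update and scan described above can be illustrated with a small transition table. This is a hedged sketch: the class `Automaton`, the state numbering helper, the treatment of a path ID as a single input symbol, and the three-character keyword "Bob" are assumptions made for illustration; only the state names N0, N1, N3 to N6 and the counter update come from the text.

```python
class Automaton:
    """Toy transition table: N0 is the initial state, N1 the start state."""

    def __init__(self):
        self.trans = {("N0", "["): "N1"}
        self.node_counter = {}   # path matching state -> visit count
        self.next_index = 3      # N2 is left for the end state

    def _new_state(self):
        name = f"N{self.next_index}"
        self.next_index += 1
        return name

    def add_path_state(self, path_id, keyword):
        """Create a path matching state plus one state per keyword character."""
        path_state = self._new_state()
        self.trans[("N1", str(path_id))] = path_state
        self.node_counter[path_state] = 0
        prev = path_state
        for ch in keyword:
            nxt = self._new_state()
            self.trans[(prev, ch)] = nxt
            prev = nxt
        return path_state

    def scan(self, state, symbols):
        """Follow transitions; an unknown symbol falls back to N0."""
        for symbol in symbols:
            state = self.trans.get((state, symbol), "N0")
            if state in self.node_counter:
                self.node_counter[state] += 1
        return state


automaton = Automaton()
path_state = automaton.add_path_state(4, "Bob")   # keyword is an assumption
position = automaton.scan("N0", ["[", "4"])
```

Under these assumptions the call creates N3 followed by N4, N5, and N6, and scanning "[4" from N0 ends in N3 with its node counter at 1. A path ID with no path matching state, such as "9" before its state exists, sends the scan back to N0.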

Next, FIG. 33 is described. Like FIG. 21, FIG. 33 shows the processing performed when the start tag <name> of record 1 of the input stream S is received.

In FIG. 33, when the start tag <name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <name>, converts it to "[6", and writes it to the second buffer b2. Because the start tag <name> has been received, the collation processing device 100 also appends "/" and the tag name "name" to the end of "/news/edit" in the third buffer b3.

The collation processing device 100 then registers in the path ID management table T a record consisting of path ID "6", the path "/news/edit/name" held in the third buffer b3, and the cumulative count "(none)". Here, the path p6 in the third buffer b3 matches the condition of the query Q. The collation processing device 100 therefore updates the cumulative count of the record from "(none)" to "1" and, for the same reason, registers path ID "6" in the frequency management table F.

Because the path p6 in the third buffer b3 matches the condition of the query Q, the collation processing device 100 also updates the automaton A. Specifically, the collation processing device 100 generates a path matching state N7 that identifies the path p6 satisfying the condition of the query Q. Once the path matching state N7 has been generated, the collation processing device 100 generates the keyword matching intermediate states N8 and N9 and the keyword matching completed state N10 as the states reached in sequence from the path matching state N7 by the characters that make up the keyword of the query Q.

The collation processing device 100 starts scanning with the updated automaton A. On the "[" of "[6" in the second buffer b2, it moves from the initial state N0, the current scan position, to the start state N1; on the "6" in the second buffer b2, it moves from the start state N1 to the path matching state N7. The scan position is now the path matching state N7. On reaching the path matching state N7, the collation processing device 100 updates the node counter of the path matching state N7 from "0" to "1". After this, the collation processing device 100 processes each item of data of record 1 of the input stream S in the same manner as in FIGS. 22 and 23.

Next, FIG. 34 is described. Like FIG. 24, FIG. 34 shows the processing result when the remaining items of data of record 1 of the input stream S are received and processed by the collation processing device 100.

In FIG. 34, when the start tag <prev> is received, the collation processing device 100 writes the data converted from the start tag <prev> to the second buffer b2 and updates the third buffer b3 with the current path. A record for path ID "7" is registered in the path ID management table T, and the automaton A is scanned.

Next, when the start tag <write> is received, the collation processing device 100 writes the data converted from the start tag <write> to the second buffer b2 and updates the third buffer b3 with the current path. A record for path ID "8" is registered in the path ID management table T, and the automaton A is scanned.

Next, when the start tag <name> is received, the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. A record for path ID "9" is registered in the path ID management table T, and the automaton A is scanned. Path ID "9" is also registered in the frequency management table F. In the automaton A, the path matching state N11, the keyword matching intermediate states N12 and N13, and the keyword matching completed state N14 are generated. The node counter of the path matching state N11 has been updated from "0" to "1".

Next, FIG. 35 is described. Like FIG. 25, FIG. 35 shows the processing result when, starting from the state of FIG. 34, each item of data of record 2 of the input stream S is received and processed by the collation processing device 100.

In FIG. 35, the start tag <news>, the start tag <date>, the character string "2011-12-02", the end tag </date>, and the start tag <write> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "4" in the path ID management table T is updated from "1" to "2", and the automaton A is scanned. Path ID "4" is also registered in the frequency management table F. The node counter of the path matching state N3, which identifies path ID "4", is updated from "1" to "2".

Next, when the start tag <name> is received, the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "6" in the path ID management table T is updated from "1" to "2", and the automaton A is scanned. Path ID "6" is also registered in the frequency management table F. At this point the number of records in the frequency management table F reaches its upper limit. The node counter of the path matching state N7, which identifies path ID "6", is updated from "1" to "2".

Next, FIG. 36 is described. FIG. 36 shows the processing result when, starting from the state of FIG. 35, the items of data up to partway through record 3 of the input stream S are received and processed by the collation processing device 100.

In FIG. 36, the start tag <news>, the start tag <date>, the character string "2011-12-03", the end tag </date>, and the start tag <write> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "4" in the path ID management table T is updated from "2" to "3", and the automaton A is scanned.

Because the frequency management table F has reached its maximum number of records, its oldest entry, path ID "4", is deleted, and the newly matched path ID "4" is registered in the frequency management table F. The cumulative count in the path ID management table T for the path ID "4" that was deleted from the frequency management table F is then updated from "3" to "2". The collation processing device 100 processes the subsequent character strings of the input stream S in the same way; their description is omitted.
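The interaction between the frequency management table F and the cumulative counts can be sketched with a fixed-size queue. This is an illustrative sketch rather than the device's actual data layout: the class `WindowedCounts` and its method names are hypothetical, while the window size of five and the sequence of matched path IDs (4, 6, 9, 4, 6, 4) follow the walkthrough of FIGS. 32 to 36.

```python
from collections import deque


class WindowedCounts:
    """Cumulative counts restricted to the most recent `window` matches."""

    def __init__(self, window=5):
        self.window = window
        self.recent = deque()   # frequency management table F (oldest first)
        self.counts = {}        # cumulative counts in table T

    def record_match(self, path_id):
        # Count the new match first, as in the walkthrough (2 -> 3).
        self.counts[path_id] = self.counts.get(path_id, 0) + 1
        if len(self.recent) >= self.window:
            # F is full: drop the oldest entry and take back its count (3 -> 2).
            oldest = self.recent.popleft()
            self.counts[oldest] -= 1
        self.recent.append(path_id)


windowed = WindowedCounts(window=5)
for matched_id in [4, 6, 9, 4, 6, 4]:
    windowed.record_match(matched_id)
```

After the sixth match the oldest entry for path ID 4 has left the window, so its count settles at 2 even though it has matched three times overall. Counts therefore reflect recent frequency rather than lifetime totals.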

The second operation example above does not use the flag in the previous item of the path ID management table T when updating the automaton A, but this is not a limitation. For example, in the second operation example, the previous item of the path ID management table T may be updated by the same operation as in the third operation example. Then, as described above, the cumulative counts in the path ID management table T may be corrected using the frequency management table F, and the automaton A may be updated using the corrected cumulative counts together with the flag.

In this way, the collation processing device 100 counts, for each path ID, the cumulative number of times the condition of the query Q has been satisfied while taking frequency into account. For example, it counts, for each path ID, how many of a predetermined number of past detections that satisfied the condition of the query Q belong to that path ID. The collation processing device 100 can therefore retain, among the path matching states defined in the automaton A, those that are relatively likely to be scanned again soon. In other words, it avoids deleting a path matching state that is relatively likely to be scanned again and then having to generate the same path matching state once more. As a result, the collation processing device 100 reduces the load of automaton update processing, suppresses the processing delay caused by generating path matching states in the automaton A, and speeds up matching of the query Q using the automaton A.

(Third operation example of the collation processing device 100)
Next, a third operation example of the collation processing device 100 is described with reference to FIGS. 37 to 41. FIGS. 37 to 41 are explanatory diagrams illustrating the third operation example of the collation processing device 100. In the third operation example, the frequency management table F may be omitted. Accordingly, the frequency management table F is not shown in FIGS. 37 to 41.

FIG. 37 shows an example of the state of the automaton A in the third operation example. In FIG. 37, the second buffer b2 and the third buffer b3 are empty, and the path ID management table T is in the state illustrated. The following describes the processing result when record 6 of the input stream S, shown in the first buffer b1, is received and each item of data of record 6 is processed by the collation processing device 100.

Next, FIG. 38 is described. FIG. 38 shows the processing result when, starting from the state of FIG. 37, the items of data up to partway through record 6 of the input stream S are processed by the collation processing device 100.

In FIG. 38, the start tag <news>, the start tag <date>, the character string "2011-12-06", the end tag </date>, and the start tag <write> are received in turn. Each time an item of data is received, the collation processing device 100 writes the converted data to the second buffer b2, updates the third buffer b3 with the current path, and scans the automaton A.

Next, when the start tag <name> is received, the collation processing device 100 writes the data converted from the start tag <name> to the second buffer b2 and updates the third buffer b3 with the current path. The cumulative count for path ID "4" in the path ID management table T is updated from "6" to "7", and the automaton A is scanned. The node counter of the path matching state N3, which identifies path ID "4", is also updated from "6" to "7".

Next, FIG. 39 is described. FIG. 39 shows the processing performed when, starting from the state of FIG. 38, the start tag <ghost> of record 6 of the input stream S is received.

In FIG. 39, when the start tag <ghost> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <ghost>, converts it to "[9", and writes it to the second buffer b2. Because the start tag <ghost> has been received, the collation processing device 100 also appends "/" and the tag name "ghost" to the end of "/news/write/name" in the third buffer b3.

The collation processing device 100 then registers in the path ID management table T a record consisting of path ID "9", the path "/news/write/name/ghost" held in the third buffer b3, and the cumulative count "(none)". Here, the path matching state N3, which indicates the path p4 that is the parent path of the path p9, is defined in the automaton A. The collation processing device 100 therefore generates a transition from the end state N2 to the path matching state N3 and sets "start" in the previous item of the path ID management table T for path ID "4".

Here, the path p9 in the third buffer b3 does not match the condition of the query Q. The collation processing device 100 therefore moves, on the "[" in the second buffer b2, from the initial state N0, the current scan position, to the start state N1, and, on the "9" in the second buffer b2, from the start state N1 back to the initial state N0.

Next, FIG. 40 is described. FIG. 40 shows the processing performed when, starting from the state of FIG. 39, the start tag <name> of record 6 of the input stream S is received.

In FIG. 40, when the start tag <name> of the input stream S is written to the first buffer b1, the collation processing device 100 reads <name>, converts it to "[10", and writes it to the second buffer b2. Because the start tag <name> has been received, the collation processing device 100 also appends "/" and the tag name "name" to the end of "/news/write/name/ghost" in the third buffer b3.

The collation processing device 100 then registers in the path ID management table T a record consisting of path ID "10", the path "/news/write/name/ghost/name" held in the third buffer b3, and the cumulative count "(none)". Here, the path p10 in the third buffer b3 matches the condition of the query Q. The collation processing device 100 therefore updates the cumulative count of the record from "(none)" to "1".

Because the path p10 in the third buffer b3 matches the condition of the query Q, the collation processing device 100 also updates the automaton A. Specifically, the collation processing device 100 generates a path matching state N15 that identifies the path p10 satisfying the condition of the query Q. At this point the automaton A already defines the maximum permitted number of path matching states, so the collation processing device 100 deletes one of the path matching states defined in the automaton A before generating the path matching state N15 that identifies the path p10.

例えば、照合処理装置１００は、パスＩＤ管理テーブルＴの累計回数が最小になるパスＩＤ「４」を特定する。しかしながら、パスＩＤ「４」に対応する前回項目に「開始」が設定されているため、パスＩＤ「４」のパスを特定するパス照合状態Ｎ３は、近々、走査されるパス照合状態である。したがって、照合処理装置１００は、パスＩＤ「４」のパスを特定するパス照合状態Ｎ３を削除しない。 For example, the verification processing apparatus 100 identifies the path ID “4” that minimizes the cumulative number of times in the path ID management table T. However, since “start” is set in the previous item corresponding to the path ID “4”, the path verification state N3 for specifying the path with the path ID “4” is a path verification state to be scanned soon. Therefore, the verification processing apparatus 100 does not delete the path verification state N3 that specifies the path with the path ID “4”.

そこで、照合処理装置１００は、パスＩＤ管理テーブルＴの累積回数が２番目に小さいパスＩＤ「８」を特定する。次に、照合処理装置１００は、パスＩＤ「８」に対応する前回項目に「開始」が設定されていないため、特定したパスＩＤ「８」を特定するパス照合状態Ｎ１１を削除する。 Therefore, the verification processing apparatus 100 identifies the path ID “8” with the second smallest cumulative number in the path ID management table T. Next, since “start” is not set in the previous item corresponding to the path ID “8”, the verification processing apparatus 100 deletes the path verification state N11 that specifies the specified path ID “8”.
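The flag-aware choice of which path collation state to delete can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `PathRecord`, `choose_victim`, and the count values are hypothetical, and only the path IDs “4” and “8” and the “start” flag are taken from the walkthrough above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathRecord:
    """One row of a (hypothetical) path ID management table."""
    path_id: int
    count: int              # cumulative number of detections
    flag: Optional[str]     # "start", "end", or None for "(none)"


def choose_victim(records: List[PathRecord]) -> Optional[PathRecord]:
    """Pick the record whose path collation state should be deleted.

    Records flagged "start" are skipped because their end tag has not
    been read yet, so their state is expected to be scanned again soon.
    Among the rest, the record with the smallest count is chosen.
    """
    candidates = [r for r in records if r.flag != "start"]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.count)


# Illustrative table: the counts are made up for this sketch.
table = [
    PathRecord(path_id=2, count=5, flag="end"),
    PathRecord(path_id=4, count=1, flag="start"),  # smallest count, but protected
    PathRecord(path_id=8, count=2, flag="end"),    # second smallest, deletable
]
victim = choose_victim(table)
```

With this table the record for path ID “4” is skipped despite having the smallest count, and the record for path ID “8” is selected, which mirrors the order of decisions in the walkthrough.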

そして、照合処理装置１００は、パスＩＤ「８」を特定するパス照合状態Ｎ１１の削除に伴って、パスＩＤ「８」を特定するパス照合状態から順次遷移されるキーワード照合途中状態Ｎ１２，Ｎ１３とキーワード照合完了状態Ｎ１４とを削除する。次に、照合処理装置１００は、パスｐ１０を特定するパス照合状態Ｎ１５を生成する。また、パスｐ１０を特定するパス照合状態Ｎ１５が生成されると、照合処理装置１００は、クエリＱのキーワードを構成する文字によりパスｐ１０を特定するパス照合状態Ｎ１５から順次遷移される遷移先状態を生成する。遷移先状態は、キーワード照合途中状態Ｎ１６，Ｎ１７とキーワード照合完了状態Ｎ１８とである。 Along with deleting the path collation state N11 that identifies the path ID “8”, the collation processing device 100 deletes the keyword collation intermediate states N12 and N13 and the keyword collation completion state N14, which are reached by successive transitions from that path collation state. Next, the collation processing device 100 generates the path collation state N15 that identifies the path p10. Once the path collation state N15 has been generated, the collation processing device 100 generates the transition destination states that are reached by successive transitions from N15 on the characters that make up the keyword of the query Q. These transition destination states are the keyword collation intermediate states N16 and N17 and the keyword collation completion state N18.

また、照合処理装置１００は、パスｐ１０を特定するパス照合状態Ｎ１５から開始状態Ｎ１への遷移（不図示）と終了状態Ｎ２への遷移（不図示）を生成する。また、照合処理装置１００は、パスｐ１０を特定するパス照合状態Ｎ１５が自身にループする遷移（不図示）も生成する。この遷移は、全記号Σから開始タグ記号「［」、終了タグ記号「］」およびキーワードの先頭文字「Ｂ」を除いた記号「Σ＼｛［，］，Ｂ｝」となる。これにより、１回目のオートマトンＡの更新が完了する。 In addition, the matching processing device 100 generates a transition (not shown) from the path matching state N15 that specifies the path p10 to the start state N1 and a transition (not shown) to the end state N2. The matching processing device 100 also generates a transition (not shown) in which the path matching state N15 that specifies the path p10 loops on itself. This transition is a symbol “Σ \ {[,], B}” obtained by removing the start tag symbol “[”, the end tag symbol “]” and the first character “B” of the keyword from all symbols Σ. Thereby, the first update of the automaton A is completed.
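The states and transitions generated for the path p10 can be sketched as a small transition table. The sketch assumes the keyword “Bob” from FIG. 2 and a dictionary-based automaton; the names `add_path_state`, `step`, and the `OTHER` marker (standing for Σ \ {[, ], B}) are hypothetical. How the keyword collation intermediate states behave on a non-matching character is not covered here, so `step` simply returns `None` in that case.

```python
# Hypothetical marker for "every symbol except '[', ']' and the first
# keyword character", i.e. the self-loop label of the path collation state.
OTHER = object()


def add_path_state(automaton, path_state, chain, keyword, start="N1", end="N2"):
    """Add a path collation state and the keyword chain that follows it.

    `chain` lists the states reached by consuming the keyword one
    character at a time; the last one is the completion state.
    """
    if len(chain) != len(keyword):
        raise ValueError("one chain state is needed per keyword character")
    states = [path_state, *chain]
    for name in states:
        automaton.setdefault(name, {})
    automaton[path_state]["["] = start         # transition to the start state
    automaton[path_state]["]"] = end           # transition to the end state
    automaton[path_state][OTHER] = path_state  # self-loop on all other symbols
    for i, ch in enumerate(keyword):
        automaton[states[i]][ch] = states[i + 1]
    return automaton


def step(automaton, state, symbol):
    """Follow one transition; fall back to the OTHER label if present."""
    transitions = automaton[state]
    if symbol in transitions:
        return transitions[symbol]
    return transitions.get(OTHER)


automaton = add_path_state({}, "N15", ["N16", "N17", "N18"], "Bob")
```

Because the explicit entries for “[”, “]”, and “B” are looked up before the `OTHER` fallback, the self-loop applies only to the remaining symbols, as in the description.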

照合処理装置１００は、更新後のオートマトンＡにより走査を開始する。照合処理装置１００は、第２バッファｂ２の「［１０」の「［」により、走査位置である初期状態Ｎ０から開始状態Ｎ１に走査し、第２バッファｂ２の「１０」により、開始状態Ｎ１からパスｐ１０を特定するパス照合状態Ｎ１５に走査する。走査位置は、パスｐ１０を特定するパス照合状態Ｎ１５となる。 The verification processing device 100 starts scanning with the updated automaton A. The collation processing device 100 scans from the initial state N0 that is the scanning position to the start state N1 by “[” of “[10” of the second buffer b2, and from the start state N1 by “10” of the second buffer b2. Scan to the path verification state N15 that identifies the path p10. The scanning position is in a path collation state N15 that specifies the path p10.

照合処理装置１００は、パスｐ１０を特定するパス照合状態Ｎ１５に走査すると、パスｐ１０を特定するパス照合状態Ｎ１５のノードカウンタを「０」から「１」に更新する。この後、文字列「Ｅｍｉｌｙ」が受信され、終了タグ＜／ｎａｍｅ＞が受信される。 When the verification processing device 100 scans the path verification state N15 that specifies the path p10, the node counter of the path verification state N15 that specifies the path p10 is updated from “0” to “1”. Thereafter, the character string “Emily” is received, and the end tag </ name> is received.

次に、図４１の説明に移行する。図４１は、図４０の状態から、入力ストリームＳのレコード６の終了タグ＜／ｇｈｏｓｔ＞が受信された場合の処理を示している。 Next, the description moves on to FIG. 41. FIG. 41 shows the processing performed when, from the state of FIG. 40, the end tag </ghost> of record 6 of the input stream S is received.

図４１において、入力ストリームＳの終了タグ＜／ｇｈｏｓｔ＞が第１バッファｂ１に書き込まれると、照合処理装置１００は、＜／ｇｈｏｓｔ＞を読み出して「］９」に変換し、第２バッファｂ２に書き込む。また、照合処理装置１００は、第３バッファｂ３のパスｐ９：「／ｎｅｗｓ／ｗｒｉｔｅ／ｎａｍｅ／ｇｈｏｓｔ」から「／」および終了タグ＜／ｇｈｏｓｔ＞のタグ内文字列「ｇｈｏｓｔ」を削除して、パスｐ４：「／ｎｅｗｓ／ｗｒｉｔｅ／ｎａｍｅ」に戻す。ここで、パスｐ４を示すパス照合状態Ｎ３がオートマトンＡに規定されているため、パスＩＤ「４」に関するパスＩＤ管理テーブルＴの前回項目を「開始」から「終了」に更新する。 In FIG. 41, when the end tag </ghost> of the input stream S is written to the first buffer b1, the collation processing device 100 reads </ghost>, converts it to “]9”, and writes it to the second buffer b2. The collation processing device 100 also deletes “/” and the in-tag character string “ghost” of the end tag </ghost> from the path p9 “/news/write/name/ghost” in the third buffer b3, returning it to the path p4 “/news/write/name”. Because the path collation state N3 indicating the path p4 is defined in the automaton A, the collation processing device 100 updates the previous item of the path ID management table T for the path ID “4” from “start” to “end”.

ここで、第３バッファｂ３のパスｐ４はクエリＱの条件に一致する。このため、照合処理装置１００は、第２バッファｂ２の「］」により、走査位置である初期状態Ｎ０から終了状態Ｎ２に走査し、第２バッファｂ２の「９」により、終了状態Ｎ２からパスｐ４を特定するパス照合状態Ｎ３に走査する。 Here, the path p4 in the third buffer b3 matches the condition of the query Q. The collation processing device 100 therefore scans from the initial state N0, which is the scanning position, to the end state N2 on “]” in the second buffer b2, and from the end state N2 to the path collation state N3 that identifies the path p4 on “9” in the second buffer b2.

上述した第３の動作例においては、頻度管理テーブルＦをオートマトンＡの更新に使用しない例を示したが、これに限らない。例えば、第３の動作例においては、第２の動作例と同様の動作を行ってパスＩＤ管理テーブルＴの累計回数を頻度管理テーブルＦにより修正し、修正した累計回数とフラグとを用いてオートマトンＡを更新してもよい。 The third operation example described above does not use the frequency management table F to update the automaton A, but the operation is not limited to this. For example, in the third operation example, the same operation as in the second operation example may be performed to correct the cumulative counts in the path ID management table T using the frequency management table F, and the automaton A may then be updated using the corrected cumulative counts and the flags.

このように、開始状態Ｎ１から走査された後であって、かつ、終了状態Ｎ２から走査される前であるパス照合状態があると、照合処理装置１００は、当該パス照合状態を、近々、終了状態Ｎ２から走査されるとして、残しておく。したがって、照合処理装置１００は、オートマトンＡに規定されたパス照合状態のうち、近々、再び走査する可能性が相対的に高いパス照合状態を残すことができる。換言すれば、照合処理装置１００は、後に再び走査する可能性が相対的に高いパス照合状態を削除してしまって、後に再び同一のパス照合状態を生成することになってしまうことを防止することができる。結果として、照合処理装置１００は、オートマトン更新処理の負荷を低減し、オートマトンＡにパス照合状態を生成することによって生じる処理遅延時間を抑制し、オートマトンＡを用いたクエリＱの照合処理を高速化することができる。 In this way, when there is a path collation state that has been scanned from the start state N1 but has not yet been scanned from the end state N2, the collation processing device 100 keeps that path collation state on the assumption that it will soon be scanned from the end state N2. The collation processing device 100 can therefore retain, among the path collation states defined in the automaton A, those that are relatively likely to be scanned again soon. In other words, the collation processing device 100 avoids deleting a path collation state that is relatively likely to be scanned again and then having to generate the same path collation state again later. As a result, the collation processing device 100 can reduce the load of the automaton update processing, suppress the processing delay caused by generating path collation states in the automaton A, and speed up the collation of the query Q using the automaton A.

＜照合処理装置１００のハードウェア構成例＞ <Hardware Configuration Example of Collation Processing Device 100>
図４２は、照合処理装置１００のハードウェア構成例を示すブロック図である。図４２において、照合処理装置１００は、プロセッサ４２０１、記憶装置４２０２、入力装置４２０３、出力装置４２０４、および通信装置４２０５が、バス４２０６に接続されて構成されるコンピュータである。 FIG. 42 is a block diagram illustrating a hardware configuration example of the collation processing device 100. In FIG. 42, the collation processing apparatus 100 is a computer configured by connecting a processor 4201, a storage device 4202, an input device 4203, an output device 4204, and a communication device 4205 to a bus 4206.

プロセッサ４２０１は、コンピュータの全体の制御を司る。また、プロセッサ４２０１は、記憶装置４２０２に記憶されている各種プログラムを実行することにより、記憶装置４２０２内のデータを読み出したり、実行結果となるデータを記憶装置４２０２に書き込んだりする。各種プログラムには、例えば、ＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）や本実施の形態の更新プログラムがある。 The processor 4201 controls the entire computer. In addition, the processor 4201 executes various programs stored in the storage device 4202 to read data in the storage device 4202 and write data as an execution result to the storage device 4202. Examples of the various programs include an OS (Operating System) and an update program according to the present embodiment.

記憶装置４２０２は、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）、フラッシュメモリ、磁気ディスクドライブなどで構成され、プロセッサ４２０１のワークエリアになったり、各種プログラムや各種プログラムの実行により得られたデータを含む各種データを記憶したりする。 The storage device 4202 includes a ROM (Read Only Memory), a RAM (Random Access Memory), a flash memory, a magnetic disk drive, and the like. It serves as a work area for the processor 4201 and stores various programs and various data, including data obtained by executing those programs.

入力装置４２０３は、キーボード、マウス、タッチパネルなどユーザの操作により、各種データの入力を行うインターフェースである。出力装置４２０４は、プロセッサ４２０１の指示により、データを出力するインターフェースである。出力装置４２０４には、例えば、ディスプレイやプリンタが挙げられる。通信装置４２０５は、ネットワーク１０１を介して外部からデータを受信したり、外部にデータを送信したりするインターフェースである。 The input device 4203 is an interface for inputting various data by a user operation such as a keyboard, a mouse, and a touch panel. The output device 4204 is an interface that outputs data in accordance with an instruction from the processor 4201. Examples of the output device 4204 include a display and a printer. The communication device 4205 is an interface that receives data from the outside via the network 101 and transmits data to the outside.

＜照合処理装置１００の機能的構成例＞ <Example of Functional Configuration of Collation Processing Device 100>
図４３は、照合処理装置１００の機能的構成例を示すブロック図である。照合処理装置１００は、読込部４３０１と、特定部４３０２と、検出部４３０３と、第１の計数部４３０４と、第２の計数部４３０５と、更新部４３０６と、を有する。 FIG. 43 is a block diagram illustrating a functional configuration example of the collation processing device 100. The verification processing apparatus 100 includes a reading unit 4301, a specifying unit 4302, a detecting unit 4303, a first counting unit 4304, a second counting unit 4305, and an updating unit 4306.

照合処理装置１００は、オートマトンＡを用いて、入力ストリームＳに対して、クエリＱの照合を行うコンピュータである。ここで、入力ストリームＳとは、タグにより階層化されたデータ列であって、発生源Ｇとなるコンピュータからネットワーク１０１を介して受信されるデータ列である。入力ストリームＳとは、例えば、図２に示したようなＸＭＬデータである。入力ストリームＳとは、例えば、ＨＴＭＬデータであってもよい。 The collation processing apparatus 100 is a computer that collates the query Q against the input stream S using the automaton A. Here, the input stream S is a data string that is hierarchized by tags, and is a data string that is received from the computer that is the source G via the network 101. The input stream S is, for example, XML data as shown in FIG. The input stream S may be, for example, HTML data.

タグとは、要素の集合が存在する位置を示す情報であって、例えば、開始タグや終了タグである。位置とは、例えば、入力ストリームＳ内の階層である。開始タグとは、要素の集合の開始位置を示す情報であって、例えば、図２に示した開始タグ＜ｎｅｗｓ＞や開始タグ＜ｎａｍｅ＞などである。終了タグとは、要素の集合の終了位置を示す情報であって、例えば、図２に示した終了タグ＜／ｎｅｗｓ＞や終了タグ＜／ｎａｍｅ＞などである。要素とは、開始タグと終了タグとの間に含まれる文字列や別の開始タグや別の終了タグである。 A tag is information indicating a position where a set of elements exists, and is, for example, a start tag or an end tag. The position is, for example, a hierarchy in the input stream S. The start tag is information indicating the start position of the set of elements, and is, for example, the start tag <news> or the start tag <name> shown in FIG. The end tag is information indicating the end position of the set of elements, and is, for example, the end tag </ news> or the end tag </ name> shown in FIG. The element is a character string included between the start tag and the end tag, another start tag, or another end tag.

クエリＱとは、キーワードとキーワードに対応するパスの条件とを含む情報である。キーワードとは、入力ストリームＳの文字列と照合される文字列であって、例えば、図２に示した文字列「Ｂｏｂ」である。パスとは、任意のタグの位置を示す情報であって、入力ストリームＳの最上階層から任意のタグが示す階層までの経路である。例えば、図２に示した開始タグ＜ｎｅｗｓ＞のパスは、最上階層である第０階層を示すルートから開始タグ＜ｎｅｗｓ＞が示す第１階層までの経路「／ｎｅｗｓ」である。パスの条件とは、キーワードが存在する階層を示すパスの条件であって、例えば、図２に示した「／ｎｅｗｓ／／ｎａｍｅ」である。 The query Q is information including a keyword and a path condition corresponding to the keyword. The keyword is a character string that is collated with the character string of the input stream S, and is, for example, the character string “Bob” illustrated in FIG. The path is information indicating the position of an arbitrary tag, and is a path from the top layer of the input stream S to the layer indicated by the arbitrary tag. For example, the path of the start tag <news> shown in FIG. 2 is a path “/ news” from the root indicating the 0th hierarchy, which is the highest hierarchy, to the first hierarchy indicated by the start tag <news>. The path condition is a path condition indicating a hierarchy in which the keyword exists, and is, for example, “/ news // name” illustrated in FIG.

「／ｎｅｗｓ／／ｎａｍｅ」は、第０階層のルート⇒第１階層「ｎｅｗｓ」⇒任意の階層「ｎａｍｅ」のパスを示す。したがって、例えば、パス「／ｎｅｗｓ／ｎａｍｅ」やパス「／ｎｅｗｓ／ｗｒｉｔｅ／ｎａｍｅ」やパス「／ｎｅｗｓ／ｎａｍｅ／ｗｒｉｔｅ／ｎａｍｅ」などが、クエリＱ内のキーワードに対応するパスの条件を満たす。 “/ News // name” indicates the path of the root of the 0th layer → the first layer “news” → the arbitrary layer “name”. Therefore, for example, the path “/ news / name”, the path “/ news / write / name”, the path “/ news / name / write / name”, and the like satisfy the conditions of the path corresponding to the keyword in the query Q.
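The meaning of a path condition such as “/news//name” can be illustrated by translating it into a regular expression. This is a sketch under a simplifying assumption: only the child separator “/” and the any-depth separator “//” are handled, and `compile_condition` and `satisfies` are hypothetical names, not part of the embodiment.

```python
import re


def compile_condition(condition: str) -> "re.Pattern[str]":
    """Translate a path condition into a regular expression.

    "/x" means x is the next level; "//x" means x may appear at any
    depth below the previous step (zero or more levels in between).
    """
    steps = re.findall(r"(//|/)([^/]+)", condition)
    pattern = ""
    for separator, name in steps:
        if separator == "//":
            pattern += r"(?:/[^/]+)*"   # any number of intermediate levels
        pattern += "/" + re.escape(name)
    return re.compile(pattern + r"\Z")


def satisfies(path: str, condition: str) -> bool:
    """Return True if the concrete path satisfies the path condition."""
    return compile_condition(condition).match(path) is not None


condition = "/news//name"
results = {
    path: satisfies(path, condition)
    for path in ("/news", "/news/name", "/news/write/name",
                 "/news/name/write/name", "/news/write/name/ghost")
}
```

Under this reading, the three example paths listed above satisfy the condition, while “/news” and any path that does not end in “name” do not.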

オートマトンＡとは、例えば、少なくとも、初期状態と、開始タグ記号を示す開始状態と、終了タグ記号を示す終了状態とが規定された情報である。オートマトンＡには、さらに、条件を満たすパスを示すパス照合状態が複数規定されていてもよいし、パス照合状態からの遷移先状態としてキーワード照合途中状態とキーワード照合完了状態とが規定されていてもよい。 The automaton A is information that defines, for example, at least an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol. The automaton A may further define a plurality of path collation states indicating paths that satisfy the condition, and may define keyword collation intermediate states and a keyword collation completion state as transition destination states from a path collation state.

読込部４３０１は、入力ストリームＳを先頭から読み込む。読込部４３０１は、例えば、図１１に示したように、入力ストリームＳのレコード１の先頭データである開始タグ＜ｎｅｗｓ＞が受信され、第１バッファｂ１に書き込まれると、＜ｎｅｗｓ＞を読み込む。そして、読込部４３０１は、読み込んだ開始タグ＜ｎｅｗｓ＞を、「［１」に変換して、第２バッファｂ２に書き込む。 The reading unit 4301 reads the input stream S from the beginning. For example, as illustrated in FIG. 11, when the start tag <news>, which is the first data of record 1 of the input stream S, is received and written to the first buffer b1, the reading unit 4301 reads <news>. The reading unit 4301 then converts the start tag <news> it has read into “[1” and writes it to the second buffer b2.

読込部４３０１は、例えば、図４２に示した記憶装置４２０２に記憶されたプログラムをプロセッサ４２０１に実行させることにより、その機能を実現する。これにより、読込部４３０１は、特定部４３０２に処理させるデータを読み込むことができる。 The reading unit 4301 realizes its function by causing the processor 4201 to execute a program stored in the storage device 4202 illustrated in FIG. 42, for example. Thereby, the reading unit 4301 can read data to be processed by the specifying unit 4302.

特定部４３０２は、読込部４３０１によって入力ストリームＳ内の開始タグが読み込まれると、入力ストリームＳの最上階層から開始タグが示す階層までのパスを特定する。特定部４３０２は、例えば、開始タグ＜ｎｅｗｓ＞が受信され、読込部４３０１によって第２バッファに書き込まれると、現在のパス「／ｎｅｗｓ」を特定して、第３バッファｂ３に書き込む。また、特定部４３０２は、特定したパスが、パスＩＤ管理テーブルＴに登録されていないパスである場合は、パスＩＤ管理テーブルＴに、パスＩＤ「１」、第３バッファｂ３内のパス「／ｎｅｗｓ」、累計回数「（なし）」、フラグ「（なし）」のレコードを登録する。 When the reading unit 4301 reads a start tag in the input stream S, the specifying unit 4302 specifies the path from the highest layer of the input stream S to the layer indicated by the start tag. For example, when the start tag <news> is received and written to the second buffer by the reading unit 4301, the specifying unit 4302 specifies the current path “/news” and writes it to the third buffer b3. If the specified path is not registered in the path ID management table T, the specifying unit 4302 registers in the path ID management table T a record with the path ID “1”, the path “/news” in the third buffer b3, the cumulative count “(none)”, and the flag “(none)”.

また、特定部４３０２は、読込部４３０１によって入力ストリームＳ内の終了タグが読み込まれると、現在のパスが示す階層の上位階層を示すパスを特定する。特定部４３０２は、例えば、終了タグ＜／ｎｅｗｓ＞が受信され、読込部４３０１によって第２バッファに書き込まれると、第３バッファに書き込まれた現在のパス「／ｎｅｗｓ」を取得する。次に、特定部４３０２は、取得した現在のパス「／ｎｅｗｓ」から、「／」と終了タグ＜／ｎｅｗｓ＞のタグ内文字列「ｎｅｗｓ」とを削除して、上位階層である第０階層を示すパス「（なし）」を特定して、第３バッファｂ３に書き込む。 Further, when the reading unit 4301 reads the end tag in the input stream S, the specifying unit 4302 specifies a path indicating an upper layer of the layer indicated by the current path. For example, when the end tag </ news> is received and written to the second buffer by the reading unit 4301, the specifying unit 4302 obtains the current path “/ news” written to the third buffer. Next, the identifying unit 4302 deletes “/” and the in-tag character string “news” of the end tag </ news> from the acquired current path “/ news”, and the 0th hierarchy that is the upper hierarchy. The path “(None)” indicating is written to the third buffer b3.
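The way the current path in the third buffer grows on a start tag and shrinks on an end tag can be sketched with two small functions. The names `on_start_tag` and `on_end_tag` are hypothetical, the empty string stands in for the path “(none)” of the 0th layer, and the mismatch check is an added safeguard that the description does not mention.

```python
def on_start_tag(current_path: str, tag_name: str) -> str:
    """Append "/" and the in-tag character string to the current path."""
    return current_path + "/" + tag_name


def on_end_tag(current_path: str, tag_name: str) -> str:
    """Remove "/" and the in-tag character string from the end of the path."""
    suffix = "/" + tag_name
    if not current_path.endswith(suffix):
        # Assumption of this sketch: badly nested input is rejected.
        raise ValueError(f"end tag </{tag_name}> does not close {current_path!r}")
    return current_path[: -len(suffix)]


# Replay the tag sequence from the walkthrough of FIG. 40 and FIG. 41.
path = ""
for name in ("news", "write", "name", "ghost", "name"):
    path = on_start_tag(path, name)
deepest = path                      # path p10
path = on_end_tag(path, "name")     # back to path p9
path = on_end_tag(path, "ghost")    # back to path p4
```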

特定部４３０２は、例えば、図４２に示した記憶装置４２０２に記憶されたプログラムをプロセッサ４２０１に実行させることにより、その機能を実現する。これにより、検出部４３０３は、特定部４３０２によって特定されたパスを取得して、クエリＱのパスの条件を満たすか否かを判定することができる。 The specifying unit 4302 realizes its function by causing the processor 4201 to execute a program stored in the storage device 4202 illustrated in FIG. 42, for example. Thus, the detection unit 4303 can acquire the path specified by the specifying unit 4302 and determine whether or not the path condition of the query Q is satisfied.

検出部４３０３は、特定部４３０２によって特定されたパスが条件を満たすか否かを判定することにより、条件を満たすパスを検出する。検出部４３０３は、例えば、特定部４３０２によって特定されたパスが「／ｎｅｗｓ」である場合には、パスの条件「／ｎｅｗｓ／／ｎａｍｅ」を満たさないと判定して、検出しない。一方で、検出部４３０３は、例えば、特定部４３０２によって特定されたパスが「／ｎｅｗｓ／ｎａｍｅ」である場合には、パスの条件「／ｎｅｗｓ／／ｎａｍｅ」を満たすと判定して、パス「／ｎｅｗｓ／ｎａｍｅ」を検出する。 The detection unit 4303 detects a path that satisfies the condition by determining whether the path specified by the specifying unit 4302 satisfies the condition. For example, when the path specified by the specifying unit 4302 is “/news”, the detection unit 4303 determines that the path condition “/news//name” is not satisfied and does not detect the path. On the other hand, when the path specified by the specifying unit 4302 is “/news/name”, the detection unit 4303 determines that the path condition “/news//name” is satisfied and detects the path “/news/name”.

検出部４３０３は、例えば、図４２に示した記憶装置４２０２に記憶されたプログラムをプロセッサ４２０１に実行させることにより、その機能を実現する。これにより、第１の計数部４３０４は、検出部４３０３によってパスが検出された累積回数を計数することができる。 The detection unit 4303 realizes its function by causing the processor 4201 to execute a program stored in the storage device 4202 illustrated in FIG. 42, for example. Accordingly, the first counting unit 4304 can count the cumulative number of times that the path is detected by the detecting unit 4303.

第１の計数部４３０４は、検出部４３０３によって入力ストリームＳから条件を満たすパスが検出されるたびに、条件を満たすパスごとの検出回数を計数する。第１の計数部４３０４は、例えば、検出部４３０３によって開始タグが示すパス「／ｎｅｗｓ／ｎａｍｅ」が検出された場合には、パスＩＤ管理テーブルＴのパス「／ｎｅｗｓ／ｎａｍｅ」に対応する累積回数をインクリメントする。一方で、第１の計数部４３０４は、検出部４３０３によって終了タグが示すパス「／ｎｅｗｓ／ｎａｍｅ」が検出された場合には、パスＩＤ管理テーブルＴのパス「／ｎｅｗｓ／ｎａｍｅ」に対応する累積回数をインクリメントしなくてもよい。 The first counting unit 4304 counts the number of detections for each path that satisfies the condition every time the detection unit 4303 detects a path that satisfies the condition from the input stream S. For example, if the detection unit 4303 detects the path “/ news / name” indicated by the start tag, the first counting unit 4304 accumulates corresponding to the path “/ news / name” in the path ID management table T. Increment the number of times. On the other hand, when the detection unit 4303 detects the path “/ news / name” indicated by the end tag, the first counting unit 4304 corresponds to the path “/ news / name” in the path ID management table T. It is not necessary to increment the cumulative number.

また、第１の計数部４３０４は、検出部４３０３によって開始タグが示すパス「／ｎｅｗｓ／ｎａｍｅ」が検出された場合には、パスＩＤ管理テーブルＴのパス「／ｎｅｗｓ／ｎａｍｅ」に対応するフラグに「開始」を設定する。第１の計数部４３０４は、例えば、検出部４３０３によって終了タグが示すパス「／ｎｅｗｓ／ｎａｍｅ」が検出された場合には、パスＩＤ管理テーブルＴのパス「／ｎｅｗｓ／ｎａｍｅ」に対応するフラグを「終了」に設定する。また、第１の計数部４３０４は、最新の所定回数分の条件を満たすパスの検出回数のうちの条件を満たすパスごとの検出回数を計数してもよい。第１の計数部４３０４は、例えば、上述したようにパスＩＤ管理テーブルＴの累計回数を頻度管理テーブルＦにより修正する。 When the detection unit 4303 detects the path “/news/name” indicated by a start tag, the first counting unit 4304 also sets the flag corresponding to the path “/news/name” in the path ID management table T to “start”. When, for example, the detection unit 4303 detects the path “/news/name” indicated by an end tag, the first counting unit 4304 sets the flag corresponding to the path “/news/name” in the path ID management table T to “end”. The first counting unit 4304 may also count, for each path that satisfies the condition, the number of detections within the most recent predetermined number of detections of paths that satisfy the condition. For example, the first counting unit 4304 corrects the cumulative counts in the path ID management table T using the frequency management table F, as described above.
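The bookkeeping performed by the first counting unit can be sketched as a small table class. The class name `PathTable` and its methods are hypothetical, and the sketch follows the variant in which only detections by a start tag increment the cumulative count, while an end tag only changes the flag. The correction by the frequency management table F is not modelled.

```python
class PathTable:
    """Hypothetical stand-in for the count and flag columns of table T."""

    def __init__(self):
        self.rows = {}  # path -> {"id": int, "count": int or None, "flag": str or None}

    def register(self, path):
        """Return the row for `path`, creating it with "(none)" values if new."""
        if path not in self.rows:
            self.rows[path] = {"id": len(self.rows) + 1, "count": None, "flag": None}
        return self.rows[path]

    def detected_by_start_tag(self, path):
        """A matching path was reached by a start tag: count it, flag "start"."""
        row = self.register(path)
        row["count"] = (row["count"] or 0) + 1
        row["flag"] = "start"

    def detected_by_end_tag(self, path):
        """A matching path was reached by an end tag: flag "end", count unchanged."""
        row = self.register(path)
        row["flag"] = "end"


table = PathTable()
table.detected_by_start_tag("/news/name")
after_start = dict(table.rows["/news/name"])
table.detected_by_end_tag("/news/name")
after_end = dict(table.rows["/news/name"])
```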

第１の計数部４３０４は、例えば、図４２に示した記憶装置４２０２に記憶されたプログラムをプロセッサ４２０１に実行させることにより、その機能を実現する。これにより、更新部４３０６は、第１の計数部４３０４によって計数された検出回数に基づいてオートマトンＡを更新することができる。 The first counting unit 4304 realizes its function by causing the processor 4201 to execute a program stored in the storage device 4202 illustrated in FIG. 42, for example. Accordingly, the update unit 4306 can update the automaton A based on the number of detections counted by the first counting unit 4304.

第２の計数部４３０５は、オートマトンＡに規定されたパス照合状態ごとに開始状態から当該パス照合状態に遷移が行われた遷移回数を計数する。第２の計数部４３０５は、照合処理装置１００によってオートマトンＡが走査され、オートマトンＡに規定されたパス照合状態に遷移が行われた場合に、遷移が行われたパス照合状態のノードカウンタをインクリメントする。 The second counting unit 4305 counts, for each path collation state defined in the automaton A, the number of transitions made from the start state to that path collation state. When the collation processing device 100 scans the automaton A and a transition is made to a path collation state defined in the automaton A, the second counting unit 4305 increments the node counter of that path collation state.

第２の計数部４３０５は、例えば、図４２に示した記憶装置４２０２に記憶されたプログラムをプロセッサ４２０１に実行させることにより、その機能を実現する。これにより、更新部４３０６は、第２の計数部４３０５によって計数された遷移回数に基づいてオートマトンＡを更新することができる。 For example, the second counting unit 4305 realizes its function by causing the processor 4201 to execute a program stored in the storage device 4202 illustrated in FIG. 42. Accordingly, the update unit 4306 can update the automaton A based on the number of transitions counted by the second counting unit 4305.

更新部４３０６は、オートマトンＡに、条件を満たすパスを示す新たなパス照合状態を追加する場合、計数した検出回数に基づいて、オートマトンＡを更新する。更新部４３０６は、例えば、オートマトンＡに規定されたパス照合状態が所定数以上ある場合には、オートマトンＡを更新する。更新部４３０６は、具体的には、オートマトンＡに規定されたパス照合状態のうち、計数した検出回数が相対的に少ない条件を満たすパスを示すパス照合状態を削除する The update unit 4306 updates the automaton A based on the counted number of detections when adding a new path matching state indicating a path that satisfies the condition to the automaton A. The update unit 4306 updates the automaton A when, for example, there are a predetermined number or more of path collation states defined in the automaton A. Specifically, the update unit 4306 deletes a path matching state indicating a path satisfying a condition with a relatively small number of detected counts among the path matching states defined in the automaton A.

また、更新部４３０６は、条件を満たすパスを検出した場合であって、オートマトンＡに規定されたパス照合状態が所定数より少ない場合には、検出した条件を満たすパスを示すパス照合状態をオートマトンＡに追加することにより、オートマトンＡを更新する。 When the update unit 4306 detects a path that satisfies the condition and the number of path collation states defined in the automaton A is smaller than the predetermined number, it updates the automaton A by adding a path collation state indicating the detected path to the automaton A.

上述したパス照合状態をオートマトンＡから削除する処理と、上述したパス照合状態をオートマトンＡに追加する処理と、を実行する順序は、削除する処理が先であってもよいし、追加する処理が先であってもよい。更新部４３０６は、例えば、図４２に示した記憶装置４２０２に記憶されたプログラムをプロセッサ４２０１に実行させることにより、その機能を実現する。これにより、更新部４３０６は、照合処理装置１００の第１の動作例を実現することができる。 The process of deleting a path collation state from the automaton A and the process of adding a path collation state to the automaton A, both described above, may be executed in either order: the deletion may come first, or the addition may come first. The update unit 4306 realizes its function by causing the processor 4201 to execute a program stored in the storage device 4202 illustrated in FIG. 42, for example. As a result, the update unit 4306 can realize the first operation example of the collation processing device 100.

また、更新部４３０６は、計数した遷移回数に基づいて、オートマトンＡに規定されたパス照合状態のうちで、計数した検出回数が同一の第１および第２のパス照合状態のいずれかを削除することにより、オートマトンＡを更新してもよい。これにより、更新部４３０６は、照合処理装置１００の第２の動作例を実現することができる。 Further, the update unit 4306 deletes one of the first and second path matching states having the same counted number of detections among the path matching states defined in the automaton A based on the counted number of transitions. Accordingly, the automaton A may be updated. Thereby, the update unit 4306 can realize the second operation example of the verification processing apparatus 100.

更新部４３０６は、例えば、オートマトンＡに規定されたパス照合状態が所定数以上ある場合には、オートマトンＡを更新する。更新部４３０６は、具体的には、オートマトンＡに規定されたパス照合状態であって、読み込まれた開始タグに対応する終了タグが読み込まれたパス照合状態のうち、計数した検出回数が相対的に少ないパスを示すパス照合状態を削除する。これにより、更新部４３０６は、照合処理装置１００の第３の動作例を実現することができる。 The update unit 4306 updates the automaton A when, for example, there are a predetermined number or more of path collation states defined in the automaton A. Specifically, the update unit 4306 is in a path matching state defined in the automaton A, and among the path matching states in which the end tag corresponding to the read start tag is read, the counted number of detections is relative. Delete path verification states that indicate fewer paths. Thereby, the update unit 4306 can realize the third operation example of the verification processing apparatus 100.

＜更新処理手順＞ <Update procedure>
図４４は、照合処理装置１００の更新処理の処理手順の一例を示すフローチャートである。 FIG. 44 is a flowchart illustrating an example of a processing procedure of update processing of the verification processing device 100.

まず、照合処理装置１００は、初期オートマトンＡ０があるか否かを判断する（ステップＳ４４０１）。初期オートマトンＡ０がない場合（ステップＳ４４０１：Ｎｏ）、照合処理装置１００は、初期化を実行する（ステップＳ４４０２）。初期化では、照合処理装置１００は、記憶装置内のパスＩＤ管理テーブルＴを空にする。 First, the verification processing apparatus 100 determines whether or not there is an initial automaton A0 (step S4401). When there is no initial automaton A0 (step S4401: No), the collation processing apparatus 100 executes initialization (step S4402). In initialization, the verification processing device 100 empties the path ID management table T in the storage device.

次に、照合処理装置１００は、初期オートマトン構築処理を実行して（ステップＳ４４０３）、ステップＳ４４０５に移行する。初期オートマトン構築処理（ステップＳ４４０３）では、初期オートマトンＡ０が構築される。初期オートマトン構築処理（ステップＳ４４０３）の処理については図４５を用いて後述する。 Next, the collation processing device 100 executes an initial automaton construction process (step S4403) and proceeds to step S4405. In the initial automaton construction process (step S4403), the initial automaton A0 is constructed. The initial automaton construction process (step S4403) will be described later with reference to FIG. 45.

一方で、初期オートマトンＡ０がある場合（ステップＳ４４０１：Ｙｅｓ）、照合処理装置１００は、記憶装置から初期オートマトンＡ０を取得して（ステップＳ４４０４）、ステップＳ４４０５に移行する。ステップＳ４４０５において、照合処理装置１００は、初期オートマトンＡ０を走査対象のオートマトンＡに決定する（ステップＳ４４０５）。 On the other hand, if there is an initial automaton A0 (step S4401: YES), the verification processing apparatus 100 acquires the initial automaton A0 from the storage device (step S4404), and proceeds to step S4405. In step S4405, the collation processing apparatus 100 determines the initial automaton A0 as the automaton A to be scanned (step S4405).

このあと、照合処理装置１００は、入力ストリームＳを待ち受ける（ステップＳ４４０６：Ｎｏ）。入力ストリームＳが受信された場合（ステップＳ４４０６：Ｙｅｓ）、照合処理装置１００は、入力ストリームＳの先頭位置を現在の読込位置Ｓｃｕｒに設定する（ステップＳ４４０７）。 Thereafter, the verification processing apparatus 100 waits for the input stream S (step S4406: No). When the input stream S is received (step S4406: YES), the collation processing apparatus 100 sets the head position of the input stream S to the current reading position Scur (step S4407).

そして、照合処理装置１００は、現在の読込位置Ｓｃｕｒのデータが、開始タグか文字か終了タグかを判断する（ステップＳ４４０８）。開始タグである場合（ステップＳ４４０８：開始タグ）、照合処理装置１００は、第１の走査処理を実行して（ステップＳ４４０９）ステップＳ４４０８に戻る。第１の走査処理（ステップＳ４４０９）の詳細については図４６を用いて後述する。 Then, the collation processing device 100 determines whether the data at the current reading position Scur is a start tag, a character, or an end tag (step S4408). If it is a start tag (step S4408: start tag), the collation processing device 100 executes the first scanning process (step S4409) and returns to step S4408. Details of the first scanning process (step S4409) will be described later with reference to FIG. 46.

文字である場合（ステップＳ４４０８：文字）、照合処理装置１００は、第２の走査処理を実行して（ステップＳ４４１０）ステップＳ４４０８に戻る。第２の走査処理（ステップＳ４４１０）の詳細については図５１を用いて後述する。 If it is a character (step S4408: character), the collation processing device 100 executes the second scanning process (step S4410) and returns to step S4408. Details of the second scanning process (step S4410) will be described later with reference to FIG. 51.

終了タグである場合（ステップＳ４４０８：終了タグ）、照合処理装置１００は、第３の走査処理を実行して（ステップＳ４４１１）ステップＳ４４０８に戻る。第３の走査処理（ステップＳ４４１１）の詳細については図５２を用いて後述する。 If it is an end tag (step S4408: end tag), the collation processing device 100 executes the third scanning process (step S4411) and returns to step S4408. Details of the third scanning process (step S4411) will be described later with reference to FIG. 52.

一方で、ステップＳ４４０８において、現在の読込位置Ｓｃｕｒがない場合（ステップＳ４４０８：なし）、照合処理装置１００は、現在の読込位置がない状態から一定時間が経過したか否かを判断する（ステップＳ４４１２）。一定時間経過していない場合（ステップＳ４４１２：Ｎｏ）、ステップＳ４４０８に戻る。これにより、入力ストリームＳのデータ受信を待ち受けることになる。一方で、一定時間経過した場合（ステップＳ４４１２：Ｙｅｓ）、照合処理装置１００は、更新処理を終了する。 On the other hand, in step S4408, when there is no current reading position Scur (step S4408: none), the collation processing device 100 determines whether or not a certain time has elapsed since there is no current reading position (step S4412). ). If the predetermined time has not elapsed (step S4412: NO), the process returns to step S4408. As a result, data reception of the input stream S is awaited. On the other hand, when the fixed time has elapsed (step S4412: Yes), the matching processing device 100 ends the update process.
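The three-way branch of step S4408 can be sketched as a tokenizer that labels each item of the stream as a start tag, a character, or an end tag. This is a simplified illustration: tags are assumed to have no attributes, the timeout of step S4412 is not modelled, and `classify` is a hypothetical name.

```python
import re

# One token is an end tag, a start tag, or a single character.
TOKEN = re.compile(r"</([^<>]+)>|<([^<>/]+)>|([^<>])", re.S)


def classify(stream: str):
    """Yield ("start", name), ("char", c) or ("end", name) in stream order."""
    for match in TOKEN.finditer(stream):
        end_name, start_name, char = match.groups()
        if end_name is not None:
            yield ("end", end_name)      # would go to the third scanning process
        elif start_name is not None:
            yield ("start", start_name)  # would go to the first scanning process
        else:
            yield ("char", char)         # would go to the second scanning process


tokens = list(classify("<name>Bo</name>"))
```

A caller would loop over these tokens and dispatch on the first element of each tuple, which corresponds to returning to step S4408 after each scanning process.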

＜初期オートマトン構築処理＞ <Initial automaton construction process>
図４５は、図４４に示した初期オートマトン構築処理（ステップＳ４４０３）の処理手順の一例を示すフローチャートである。 FIG. 45 is a flowchart showing an example of the processing procedure of the initial automaton construction process (step S4403) shown in FIG.

照合処理装置１００は、初期状態Ｎ０、開始状態Ｎ１および終了状態Ｎ２を作成し（ステップＳ４５０１）、作成した各状態の遷移先の初期値を初期状態Ｎ０に設定する（ステップＳ４５０２）。これにより、図７に示したように、初期オートマトンＡ０が構築される。構築された初期オートマトンＡ０は、記憶装置に記憶される。 The verification processing device 100 creates an initial state N0, a start state N1, and an end state N2 (step S4501), and sets the initial value of the created transition destination of each state to the initial state N0 (step S4502). As a result, the initial automaton A0 is constructed as shown in FIG. The constructed initial automaton A0 is stored in the storage device.
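Steps S4501 and S4502 can be sketched as follows. The dictionary layout and the name `build_initial_automaton` are hypothetical. The two labelled transitions out of N0 are an assumption of this sketch, inferred from the walkthrough in which “[” leads from N0 to N1 and “]” leads from N0 to N2; FIG. 7 itself is not reproduced here.

```python
def build_initial_automaton():
    """Create N0, N1 and N2, each defaulting back to the initial state N0."""
    automaton = {
        name: {"default": "N0", "transitions": {}}
        for name in ("N0", "N1", "N2")
    }
    # Assumed from the walkthrough, not from this step's description:
    automaton["N0"]["transitions"]["["] = "N1"  # start tag symbol
    automaton["N0"]["transitions"]["]"] = "N2"  # end tag symbol
    return automaton


def next_state(automaton, state, symbol):
    """Follow a labelled transition if one exists, else the default."""
    node = automaton[state]
    return node["transitions"].get(symbol, node["default"])


a0 = build_initial_automaton()
```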

＜第１の走査処理＞ <First scanning process>
図４６は、図４４に示した第１の走査処理（ステップＳ４４０９）の処理手順の一例を示すフローチャートである。 FIG. 46 is a flowchart illustrating an example of a processing procedure of the first scanning process (step S4409) illustrated in FIG.

照合処理装置１００は、読み込まれた開始タグをバイナリ変換する（ステップＳ４６０１）。例えば、図３に示したように、開始タグ＜ｎｅｗｓ＞が読み込まれた場合、「［１」に変換する。 The verification processing apparatus 100 performs binary conversion on the read start tag (step S4601). For example, as shown in FIG. 3, when the start tag <news> is read, it is converted to "[1".

照合処理装置１００は、現在のパスｐに、「／ｔ」を追加する（ステップＳ４６０２）。「／」は階層の境界を示す記号である。ｔはタグ内文字列である。例えば、開始タグ＜ｎｅｗｓ＞が読み込まれた場合、ｐは、ｐ＝／ｎｅｗｓとなる。また、現在のパスが「／ｎｅｗｓ／ｗｒｉｔｅ」である場合に、開始タグ＜ｎａｍｅ＞が読み込まれた場合、ｐは、ｐ＝／ｎｅｗｓ／ｗｒｉｔｅ／ｎａｍｅとなる。 The verification processing apparatus 100 adds “/ t” to the current path p (step S4602). “/” Is a symbol indicating a hierarchy boundary. t is a character string in the tag. For example, when the start tag <news> is read, p becomes p = / news. When the current path is “/ news / write” and the start tag <name> is read, p becomes p = / news / write / name.

The collation processing device 100 then refers to the path ID management table T and determines whether the current path p exists in the path ID management table T (step S4603). If the current path p exists in the path ID management table T (step S4603: Yes), the process proceeds to step S4608.

On the other hand, if the current path p does not exist in the path ID management table T (step S4603: No), the collation processing device 100 adds the current path p to the path ID management table T and assigns a new path ID to the path p (step S4604). For example, when the start tag <news> is read while the path ID management table T is empty, the path ID "1" and the path p = /news are added.
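For illustration only, steps S4602 to S4604 could be sketched in Python as below. The function names `push_tag` and `lookup_or_assign`, the use of a dictionary for the path ID management table T, and the sequential numbering of new path IDs are assumptions made for this sketch; the description only states that the first path receives the path ID "1".

```python
def push_tag(path, tag):
    # Step S4602: append "/t" to the current path p.
    return f"{path}/{tag}"


def lookup_or_assign(table, path):
    """Return (path_id, is_new) for the given path.

    `table` stands in for the path ID management table T (path -> path ID).
    """
    # Step S4603: is the current path already registered?
    if path in table:
        return table[path], False
    # Step S4604: register the path and assign a new ID
    # (assumed here to be the next unused integer, starting at 1).
    new_id = len(table) + 1
    table[path] = new_id
    return new_id, True
```

When `is_new` is false the process would go straight to step S4608; otherwise the query condition is checked in step S4605.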

The collation processing device 100 then determines whether the current path p matches the condition of the query Q (step S4605). If it matches (step S4605: Yes), the collation processing device 100 executes the first update process (step S4606) and proceeds to step S4608. Details of the first update process (step S4606) will be described later with reference to FIG. 47.

On the other hand, if it does not match (step S4605: No), the collation processing device 100 executes the second update process (step S4607) and proceeds to step S4608. Details of the second update process (step S4607) will be described later with reference to FIG. 49.

Next, the collation processing device 100 makes a transition on "[i" from the current reading position Acur of the automaton A being scanned, and sets the transition destination as the new reading position Acur (step S4608). Here, i is the path ID corresponding to p. The collation processing device 100 then executes the cumulative count update process (step S4609). Details of the cumulative count update process will be described later with reference to FIG. 50.

Next, the collation processing device 100 sets "start" in the previous item corresponding to the current path ID in the path ID management table T (step S4610). The collation processing device 100 then advances the current reading position Scur of the input stream S by the character string length of the start tag (step S4611) and ends the first scanning process.

<First update process>
FIG. 47 is a flowchart showing an example of the processing procedure of the first update process (step S4606) shown in FIG. 46.

The collation processing device 100 determines whether a new state can be added to the automaton A (step S4701). If a new state cannot be added (step S4701: No), the collation processing device 100 executes the deletion process (step S4702) and proceeds to step S4703. Details of the deletion process will be described later with reference to FIG. 48. If a new state can be added (step S4701: Yes), the collation processing device 100 proceeds directly to step S4703.

In step S4703, the collation processing device 100 creates a path collation state V corresponding to the path ID "i". Next, the collation processing device 100 sets the transition destination for "[" from the created path collation state V to the start state N1, and sets the transition destination for "]" to the end state N2. For all other symbols, the collation processing device 100 sets the path collation state V itself as the transition destination (step S4704).

Next, the collation processing device 100 changes the transition destination for the path ID "i" in the start state N1 from the initial state N0 to the path collation state V (step S4705). The collation processing device 100 also creates keyword states for the keyword k in the query Q and sets the transitions between the keyword states (step S4706). A keyword state is a transition destination state reached when each character of the keyword is treated as a transition; examples are the keyword collation intermediate states N4 and N5 and the keyword collation completion state N6.

Next, the collation processing device 100 sets a transition from the path collation state V to the state corresponding to the first character of the keyword k (step S4707). The collation processing device 100 then sets the state corresponding to the last character of the keyword k as the keyword collation completion state N6 (step S4708). As a result, as shown in FIG. 16, the initial automaton A0 is updated to the automaton A1.
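The following Python sketch is one possible reading of steps S4703 to S4708, offered only as an illustration. The `State` class and the function `add_path_state` are hypothetical. In particular, the fallback of every keyword state to the path collation state V on a non-matching character is an assumption of this sketch; the description above does not specify that behavior.

```python
class State:
    """Hypothetical automaton state with explicit transitions and a default."""

    def __init__(self, name):
        self.name = name
        self.default = None
        self.transitions = {}
        self.accepting = False   # True for the keyword collation completion state

    def step(self, symbol):
        return self.transitions.get(symbol, self.default)


def add_path_state(n1, n2, path_id, keyword):
    v = State(f"V{path_id}")                 # S4703: path collation state V
    v.transitions["["] = n1                  # S4704: "[" leads to start state N1
    v.transitions["]"] = n2                  #        "]" leads to end state N2
    v.default = v                            #        anything else stays in V
    n1.transitions[path_id] = v              # S4705: N1 --i--> V instead of N0
    prev = v
    for pos, ch in enumerate(keyword):       # S4706/S4707: chain of keyword states
        s = State(f"{v.name}:{keyword[:pos + 1]}")
        s.transitions["["] = n1
        s.transitions["]"] = n2
        s.default = v                        # assumption: restart from V on mismatch
        prev.transitions[ch] = s
        prev = s
    prev.accepting = True                    # S4708: last state marks completion
    return v
```

Under these assumptions, reading the keyword character by character from V reaches the state flagged as accepting, which plays the role of the keyword collation completion state N6.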

<Deletion process>
FIG. 48 is a flowchart showing an example of the processing procedure of the deletion process (step S4702) shown in FIG. 47.

The collation processing device 100 acquires the path collation states defined in the current automaton A and acquires the set S of path IDs identified by those path collation states (step S4801). Next, the collation processing device 100 identifies, within the set S, the path ID whose cumulative count in the path ID management table T is the smallest (step S4802).

The collation processing device 100 then determines whether two or more path IDs have been identified (step S4803). If only one path ID has been identified (step S4803: No), the collation processing device 100 proceeds to step S4806.

On the other hand, if two or more path IDs have been identified (step S4803: Yes), the collation processing device 100 identifies, among those path IDs, the path ID whose node counter is the smallest (step S4804). Next, the collation processing device 100 randomly selects one path ID from the path IDs identified in this way (step S4805). The collation processing device 100 then proceeds to step S4806.

In step S4806, the collation processing device 100 deletes the path collation state that identifies the selected path ID. The collation processing device 100 then ends the deletion process.
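The selection performed in steps S4802 to S4805 can be illustrated by the following Python sketch. The function name `choose_victim` and the representation of the cumulative counts and node counters as dictionaries keyed by path ID are assumptions introduced for this example; the set of path IDs is assumed to be non-empty.

```python
import random


def choose_victim(path_ids, cumulative, node_counter, rng=random):
    """Pick the path ID whose path collation state should be deleted."""
    # S4802: keep the path IDs with the smallest cumulative count.
    least = min(cumulative[i] for i in path_ids)
    candidates = [i for i in path_ids if cumulative[i] == least]
    # S4803/S4804: on a tie, keep those with the smallest node counter.
    if len(candidates) > 1:
        fewest = min(node_counter[i] for i in candidates)
        candidates = [i for i in candidates if node_counter[i] == fewest]
    # S4805: if several candidates still remain, pick one at random.
    return rng.choice(candidates)
```

The returned path ID identifies the path collation state that would be removed in step S4806.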

<Second update process>
FIG. 49 is a flowchart showing an example of the processing procedure of the second update process (step S4607) shown in FIG. 46.

The collation processing device 100 takes the path obtained by removing the trailing tag "/t" from the current path p as the path par, and takes the path ID corresponding to the path par as j (step S4901).

Next, the collation processing device 100 determines whether a path collation state identifying the path ID "j" is defined in the automaton A (step S4902). If no such state is defined (step S4902: No), the collation processing device 100 ends the second update process.

On the other hand, if such a state is defined (step S4902: Yes), the collation processing device 100 takes the path collation state N3 corresponding to the path ID "j" as w (step S4903), and changes the transition destination for i in the end state N2 from the initial state N0 to the path collation state w (step S4904). As a result, as shown in FIG. 20, the automaton A is updated from the automaton A1 to the automaton A2.
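As an illustration only, steps S4901 to S4904 might be sketched in Python as follows. The function `second_update` and the three dictionaries (path to path ID, path ID to path collation state, and the per-ID transitions of the end state N2) are hypothetical stand-ins for the structures described above.

```python
def second_update(path, path_ids, path_states, n2_transitions):
    """Redirect N2's transition for the current path ID to the parent's state.

    Returns True if the automaton was updated, False otherwise.
    """
    # S4901: strip the trailing "/t" to obtain the parent path par and its ID j.
    parent = path.rsplit("/", 1)[0]
    j = path_ids.get(parent)
    # S4902: is a path collation state for j defined in the automaton?
    w = path_states.get(j)
    if w is None:
        return False
    # S4903/S4904: N2 --i--> w instead of N2 --i--> N0.
    i = path_ids[path]
    n2_transitions[i] = w
    return True
```

With this transition in place, reading the end tag of the non-matching child path returns the scan to the parent's path collation state rather than to the initial state.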

<Cumulative count update process>
FIG. 50 is a flowchart showing an example of the processing procedure of the cumulative count update process (step S4609) shown in FIG. 46.

The collation processing device 100 determines whether Acur is a path ID collation state (step S5001). If it is not a path ID collation state (step S5001: No), the collation processing device 100 ends the cumulative count update process.

On the other hand, if it is a path ID collation state (step S5001: Yes), the collation processing device 100 updates the node counter of Acur (step S5002). Next, the collation processing device 100 updates the cumulative count corresponding to the path ID of Acur in the path ID management table T (step S5003).

The collation processing device 100 then determines whether the frequency management table F has a free area (step S5004). If there is no free area (step S5004: No), the collation processing device 100 acquires the head data of the frequency management table F (step S5005). Next, the collation processing device 100 updates the cumulative count corresponding to the path ID indicated by the head data in the path ID management table T (step S5006). The collation processing device 100 then proceeds to step S5007.

On the other hand, if there is a free area (step S5004: Yes), the collation processing device 100 proceeds directly to step S5007. In step S5007, the collation processing device 100 adds the path ID of Acur to the frequency management table F as the tail data. If the number of records in the frequency management table F has reached its upper limit, the head data is deleted when the path ID is added as the tail data. The collation processing device 100 then ends the cumulative count update process.
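A minimal Python sketch of steps S5003 to S5007 is given below for illustration. The class `WindowedCounts` is hypothetical, and the directions of the two "updates" are assumptions: the count for the path ID of Acur is incremented in step S5003, and the count for the path ID at the head of the frequency management table F is decremented in step S5006, so that the cumulative count reflects only the most recent detections. The capacity is assumed to be at least 1.

```python
from collections import Counter, deque


class WindowedCounts:
    """Cumulative counts restricted to the most recent `capacity` detections."""

    def __init__(self, capacity):
        self.window = deque(maxlen=capacity)   # stands in for frequency table F
        self.cumulative = Counter()            # cumulative-count column of table T

    def record(self, path_id):
        self.cumulative[path_id] += 1                  # S5003 (assumed increment)
        if len(self.window) == self.window.maxlen:     # S5004: no free area
            oldest = self.window[0]                    # S5005: head data of F
            self.cumulative[oldest] -= 1               # S5006 (assumed decrement)
        # S5007: append as tail data; a full deque drops its head automatically.
        self.window.append(path_id)
```

For example, with a capacity of 3 and the detections 1, 1, 2, 1, 2 recorded in that order, the table F holds 2, 1, 2 and the cumulative counts are 1 for path ID 1 and 2 for path ID 2.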

<Second scanning process>
FIG. 51 is a flowchart showing an example of the processing procedure of the second scanning process (step S4410) shown in FIG. 44.

The collation processing device 100 makes a transition on the character just read from the current reading position Acur of the automaton A, and sets the transition destination as the new Acur (step S5101). In the following description, the character just read may be referred to as "character c". The collation processing device 100 then determines whether the state indicated by the current reading position Acur is the keyword collation completion state N6 (step S5102).

If it is the keyword collation completion state N6 (step S5102: Yes), the collation processing device 100 outputs the keyword k of the query Q as the collation result Ans for the query Q (step S5103) and proceeds to step S5104.

On the other hand, if it is not the keyword collation completion state N6 (step S5102: No), the process proceeds directly to step S5104. In step S5104, the collation processing device 100 updates the current reading position Scur of the input stream S to Scur = Scur + 1. That is, the collation processing device 100 advances the reading position Scur by one character. The second scanning process (step S4410) then ends.

<Third scanning process>
FIG. 52 is a flowchart showing an example of the processing procedure of the third scanning process (step S4411) shown in FIG. 44.

The collation processing device 100 performs binary conversion on the end tag that has been read (step S5201). For example, as shown in FIG. 3, when the end tag </news> is read, it is converted to "]1".

The collation processing device 100 makes a transition on "]i" from the current reading position Acur of the automaton A, and sets the transition destination as the new reading position Acur (step S5202). Here, i is the path ID corresponding to p. If the previous item for the current path ID is "start", the collation processing device 100 updates it to "end" (step S5203).

Next, the collation processing device 100 advances the current reading position Scur of the input stream S by the character string length of the end tag (step S5204). The collation processing device 100 then takes the path obtained by deleting "/t" from the end of the current path p as the new path p (step S5205), and ends the third scanning process.
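The bookkeeping in steps S5203 to S5205 can be illustrated with the short Python sketch below. The function `handle_end_tag` and its dictionary arguments are hypothetical, and the automaton transition of step S5202 is deliberately omitted from the sketch.

```python
def handle_end_tag(tag_text, scur, path, path_ids, previous):
    """Process an end tag such as "</name>" and return the new (Scur, p).

    `path_ids` maps paths to path IDs and `previous` maps path IDs to the
    "previous item" value of the path ID management table T.
    """
    i = path_ids[path]
    # S5203: flip the previous item from "start" to "end".
    if previous.get(i) == "start":
        previous[i] = "end"
    # S5204: advance the reading position by the length of the end tag.
    scur += len(tag_text)
    # S5205: drop the trailing "/t" from the current path.
    path = path.rsplit("/", 1)[0]
    return scur, path
```

For instance, closing <name> at position 10 while the current path is /news/write/name yields position 17 and the path /news/write.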

As described above, the collation processing device 100 counts the number of detections for each path satisfying the condition of the query Q every time such a path is detected in the collation process, and can update the automaton A based on the number of detections. The collation processing device 100 can thereby identify, among the path collation states defined in the automaton A, those that are likely to be scanned later, and update the automaton A accordingly.

For example, the collation processing device 100 can delete the path collation state indicating the path with the smallest number of detections. The collation processing device 100 can thereby retain, among the path collation states defined in the automaton A, those that are relatively likely to be scanned again later. In other words, the collation processing device 100 avoids deleting a path collation state that is relatively likely to be scanned again and then having to generate the same path collation state once more. As a result, the collation processing device 100 can reduce the load of the automaton update process, suppress the processing delay caused by generating path collation states in the automaton A, and speed up the collation of the query Q using the automaton A.

Further, for example, even when there are first and second path collation states indicating first and second paths with the same number of detections, the collation processing device 100 can delete the path collation state with the smaller node counter. The collation processing device 100 can thereby retain, among the path collation states defined in the automaton A, those that are relatively likely to be scanned again later. In other words, the collation processing device 100 avoids deleting a path collation state that is relatively likely to be scanned again and then having to generate the same path collation state once more. As a result, the collation processing device 100 can reduce the load of the automaton update process, suppress the processing delay caused by generating path collation states in the automaton A, and speed up the collation of the query Q using the automaton A.

Further, for example, the collation processing device 100 can delete the path collation state with the smallest number of detections from among the path collation states indicating paths whose previous item flag is "end". The collation processing device 100 thereby keeps in the automaton A the path collation states that will soon be scanned from the end state N2. Accordingly, the collation processing device 100 can retain, among the path collation states defined in the automaton A, those that are relatively likely to be scanned again soon. In other words, the collation processing device 100 avoids deleting a path collation state that is relatively likely to be scanned again and then having to generate the same path collation state once more. As a result, the collation processing device 100 can reduce the load of the automaton update process, suppress the processing delay caused by generating path collation states in the automaton A, and speed up the collation of the query Q using the automaton A.

Further, for example, the collation processing device 100 can correct the cumulative count in the path ID management table T by means of the frequency management table F. The collation processing device 100 can thereby count, for each path ID, the cumulative number of times the condition of the query Q was satisfied while taking frequency into account. For example, the collation processing device 100 can count, for each path ID, how many of the most recent predetermined number of detections of paths satisfying the condition of the query Q belong to that path ID.

The following appendices are further disclosed with respect to the embodiment described above.

(Appendix 1) An update method characterized in that a computer, which uses an automaton to collate a query including a keyword and a path condition corresponding to the keyword against an input stream hierarchized by tags, executes a process of:
each time a path satisfying the condition is detected from the input stream, counting the number of detections for each path satisfying the condition; and
when adding a new path collation state indicating a path satisfying the condition to an automaton in which an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol are defined and in which a plurality of path collation states each indicating a path satisfying the condition are defined, updating the automaton based on the counted number of detections for each path satisfying the condition.

(Appendix 2) The update method according to appendix 1, wherein, when the number of path collation states defined in the automaton is equal to or greater than a predetermined number, the updating process updates the automaton by deleting, from among the path collation states defined in the automaton, a path collation state indicating a path satisfying the condition whose counted number of detections is relatively small.

(Appendix 3) The update method according to appendix 1 or 2, wherein the computer executes a process of:
reading the input stream from the beginning;
upon reading a start tag in the input stream, identifying the path from the top layer of the input stream to the layer indicated by the start tag; and
detecting a path satisfying the condition by determining whether the identified path satisfies the condition,
wherein the counting process, when a path satisfying the condition is detected, counts the number of detections of the detected path satisfying the condition, and
wherein the updating process, when the number of path collation states defined in the automaton is equal to or greater than a predetermined number, updates the automaton by deleting, from among those path collation states defined in the automaton for which the end tag corresponding to the read start tag has been read, a path collation state indicating a path whose counted number of detections is relatively small.

(Appendix 4) The update method according to any one of appendices 1 to 3, wherein, when a path satisfying the condition is detected and the number of path collation states defined in the automaton is less than a predetermined number, the updating process updates the automaton by adding to the automaton a path collation state indicating the detected path satisfying the condition.

(Appendix 5) The update method according to any one of appendices 1 to 4, wherein the computer executes a process of counting, for each path collation state defined in the automaton, the number of transitions made from the start state to that path collation state, and
wherein the updating process updates the automaton by deleting, based on the counted number of transitions, one of a first path collation state and a second path collation state that have the same counted number of detections among the path collation states defined in the automaton.

(Appendix 6) The update method according to any one of appendices 1 to 5, wherein, each time a path satisfying the condition is detected, the process of counting the number of detections counts, for each path satisfying the condition, the number of detections among the most recent predetermined number of detections of paths satisfying the condition.

(Appendix 7) An update program characterized by causing a computer, which uses an automaton to collate a query including a keyword and a path condition corresponding to the keyword against an input stream hierarchized by tags, to execute a process of:
each time a path satisfying the condition is detected from the input stream, counting the number of detections for each path satisfying the condition; and
when adding a new path collation state indicating a path satisfying the condition to an automaton in which an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol are defined and in which a plurality of path collation states each indicating a path satisfying the condition are defined, updating the automaton based on the counted number of detections for each path satisfying the condition.

(Appendix 8) A collation processing device that uses an automaton to collate a query including a keyword and a path condition corresponding to the keyword against an input stream hierarchized by tags, the collation processing device comprising:
a counting unit that, each time a path satisfying the condition is detected from the input stream, counts the number of detections for each path satisfying the condition; and
an update unit that, when adding a new path collation state indicating a path satisfying the condition to an automaton in which an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol are defined and in which a plurality of path collation states each indicating a path satisfying the condition are defined, updates the automaton based on the number of detections for each path satisfying the condition counted by the counting unit.

DESCRIPTION OF SYMBOLS
100 Collation processing device
4301 Reading unit
4302 Identification unit
4303 Detection unit
4304 First counting unit
4305 Second counting unit
4306 Update unit
A Automaton
T Path ID management table

Claims

An update method characterized in that a computer, which uses an automaton to collate a query including a keyword and a path condition corresponding to the keyword against an input stream hierarchized by tags, executes a process of:
each time a path satisfying the condition is detected from the input stream, counting the number of detections for each path satisfying the condition; and
when adding a new path collation state indicating a path satisfying the condition to an automaton in which an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol are defined and in which a plurality of path collation states each indicating a path satisfying the condition are defined, updating the automaton based on the counted number of detections for each path satisfying the condition.

The update method according to claim 1, wherein, when the number of path collation states defined in the automaton is equal to or greater than a predetermined number, the updating process updates the automaton by deleting, from among the path collation states defined in the automaton, a path collation state indicating a path satisfying the condition whose counted number of detections is relatively small.

The update method according to claim 1 or 2, wherein the computer executes a process of:
reading the input stream from the beginning;
upon reading a start tag in the input stream, identifying the path from the top layer of the input stream to the layer indicated by the start tag; and
detecting a path satisfying the condition by determining whether the identified path satisfies the condition,
wherein the counting process, when a path satisfying the condition is detected, counts the number of detections of the detected path satisfying the condition, and
wherein the updating process, when the number of path collation states defined in the automaton is equal to or greater than a predetermined number, updates the automaton by deleting, from among those path collation states defined in the automaton for which the end tag corresponding to the read start tag has been read, a path collation state indicating a path whose counted number of detections is relatively small.

The update method according to claim 1, wherein the computer executes a process of counting, for each path collation state defined in the automaton, the number of transitions made from the start state to that path collation state, and
wherein the updating process updates the automaton by deleting, based on the counted number of transitions, one of a first path collation state and a second path collation state that have the same counted number of detections among the path collation states defined in the automaton.

An update program characterized by causing a computer, which uses an automaton to collate a query including a keyword and a path condition corresponding to the keyword against an input stream hierarchized by tags, to execute a process of:
each time a path satisfying the condition is detected from the input stream, counting the number of detections for each path satisfying the condition; and
when adding a new path collation state indicating a path satisfying the condition to an automaton in which an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol are defined and in which a plurality of path collation states each indicating a path satisfying the condition are defined, updating the automaton based on the counted number of detections for each path satisfying the condition.

A collation processing device that uses an automaton to collate a query including a keyword and a path condition corresponding to the keyword against an input stream hierarchized by tags, the collation processing device comprising:
a counting unit that, each time a path satisfying the condition is detected from the input stream, counts the number of detections for each path satisfying the condition; and
an update unit that, when adding a new path collation state indicating a path satisfying the condition to an automaton in which an initial state, a start state indicating a start tag symbol, and an end state indicating an end tag symbol are defined and in which a plurality of path collation states each indicating a path satisfying the condition are defined, updates the automaton based on the number of detections for each path satisfying the condition counted by the counting unit.