JP5320697B2

JP5320697B2 - Collation processing program and collation processing apparatus

Info

Publication number: JP5320697B2
Application number: JP2007195081A
Authority: JP
Inventors: Tatsuya Asai; Seishi Okamoto
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-07-26
Filing date: 2007-07-26
Publication date: 2013-10-23
Anticipated expiration: 2027-07-26
Also published as: US20090030887A1; JP2009032025A

Abstract

A collation processing device has a document storage unit, an axis transforming unit, an automaton creating unit, and a collation processing unit. The document storage unit stores document data having a hierarchical structure in which elements are delimited by element identifiers. When a search expression is obtained, the axis transforming unit performs axis transformation on it, converting it into a search expression composed of child axes. The automaton creating unit identifies the types of element identifiers contained in the transformed search expression and creates an automaton corresponding to that search expression. The collation processing unit collates the data contained in the document data against the automaton and outputs the data matching the search expression.

Description

The present invention relates to a collation processing program for retrieving data that matches a search expression from document data having a hierarchical structure in which elements are delimited by element identifiers, and in particular to a collation processing program and the like that can retrieve matching data from the document data regardless of the structure of the search expression.

In recent years, XML (Extensible Markup Language) and similar formats have come into use for document data processed by computers. XML has a hierarchical structure built with element identifiers such as "<" and "/>", referred to as tags, and can carry more information than plain text, so it is used more and more widely. (Hereinafter, document data with a hierarchical structure described in XML is referred to as XML data.)

To search hierarchical XML data efficiently, a generally known method uses a search expression such as a query (an XPath expression) and retrieves the document data and nodes that match the query (see, for example, Patent Document 1).

Patent Document 1: JP 2004-126933 A

However, as XML data grows ever larger, there is a demand to retrieve the documents and nodes matching a query by stream processing, without placing a heavy load on the computer. When the query contains a reverse axis or the like, however, it is difficult to search XML data by stream processing.

FIG. 34 illustrates this problem of the prior art. Searching XML data by stream processing is difficult because stream-oriented processing cannot reread data it has already read, whereas a query containing a reverse axis requires access to data earlier in the stream (D1 to Dn-1 in FIG. 34) than the current data position (Dn in FIG. 34).

Thus, retrieving the document data and the like that match a query from XML data quickly and efficiently, even when the query contains branches or similar constructs, is a very important issue.

The present invention has been made to solve the above problems of the prior art, and its object is to provide a collation processing program and a collation processing apparatus that can quickly and efficiently retrieve the document data and the like matching a query from XML data, regardless of the structure of the query.

To solve the problems described above and achieve the object, the present invention causes a computer to execute: a document storage procedure of storing, in a storage device, document data having a hierarchical structure in which elements are delimited by element identifiers; an axis conversion procedure of, when a search expression for retrieving data contained in the document data stored in the storage device is acquired, performing axis conversion on the acquired search expression to convert it into a search expression composed of child axes; an automaton creation procedure of identifying the types of element identifiers contained in the search expression converted by the axis conversion procedure and creating an automaton corresponding to that search expression; and a collation processing procedure of sequentially collating the data contained in the document data against the automaton and outputting the data matching the search expression.

Further, in the above invention, the axis conversion procedure determines whether the search expression contains a sibling axis and, if it does, converts the sibling axis into a parent axis and a child axis.

In the above invention, the axis conversion procedure determines whether the search expression contains a parent axis and, if it does, converts the parent axis into a child axis. That is, "converting a parent axis into a child axis" is one example of axis conversion; one way to implement it is described in Reference 1 (D. Olteanu et al., "XPath: Looking Forward", Proc. XMLDM'02, 2002).

Further, in the above invention, the search expression has a predicate part that serves as a constraint, and the axis conversion procedure changes the position of the predicate part in the search expression while keeping the search expressions before and after conversion equivalent.

Further, in the above invention, the collation processing procedure sequentially stores, in a temporary storage table, the data detected while collating the data contained in the document data against the automaton, and outputs the data stored in the temporary storage table when collation is complete.

According to the present invention, document data having a hierarchical structure in which elements are delimited by element identifiers is stored in a storage device; when a search expression for retrieving data contained in the stored document data is acquired, axis conversion is performed on it to convert it into a search expression composed of child axes; the types of element identifiers contained in the converted search expression are identified to create a corresponding automaton; and the data contained in the document data is sequentially collated against the automaton to output the data matching the search expression. Data matching the search expression can therefore be retrieved from the document data quickly and efficiently, regardless of the structure of the search expression.

Further, according to the present invention, it is determined whether the search expression contains a sibling axis, and if it does, the sibling axis is converted into a parent axis and a child axis, so data matching the search expression can be retrieved efficiently by stream processing.

Further, according to the present invention, it is determined whether the search expression contains a parent axis, and if it does, the parent axis is converted into a child axis, so data matching the search expression can be retrieved efficiently by stream processing. "Converting a parent axis into a child axis" is one example of axis conversion; one way to implement it is described in Reference 1 (D. Olteanu et al., "XPath: Looking Forward", Proc. XMLDM'02, 2002).

Further, according to the present invention, the position of the predicate part in the search expression is changed while keeping the search expressions before and after conversion equivalent, so data matching the search expression can be retrieved efficiently.

Further, according to the present invention, the data detected while sequentially collating the data contained in the document data against the automaton is stored one item at a time in a temporary storage table, and the data stored in that table is output when collation is complete, so data matching a search expression that contains a branch can be output efficiently.

Preferred embodiments of the collation processing program and the collation processing apparatus according to the present invention are described in detail below with reference to the accompanying drawings.

First, XML data and reverse axes in queries are described. FIG. 1 shows a tree representation and a stream representation of XML data, and FIG. 2 shows an example of a query (search expression) containing a reverse axis.

As shown in FIG. 1, in the tree representation the XML data has the elements papers 10, paper 11 and 12, author 13 and 16, and title 14 and 15, which are connected to one another.

Specifically, papers 10 is connected to paper 11 and 12; paper 11 is connected to author 13 and title 14; and paper 12 is connected to title 15 and author 16. Author 13 and 16 are each connected to the text "asai", title 14 to the text "XML", and title 15 to the text "Data Stream".

Here, papers 10 and paper 11 and 12 are defined as parent and children. Paper 11 and paper 12 are defined as siblings, with paper 11 the elder sibling and paper 12 the younger sibling. Likewise, paper 11 and author 13 and title 14 are parent and children, and author 13 and title 14 are siblings, with author 13 the elder and title 14 the younger.

Likewise, paper 12 and title 15 and author 16 are parent and children, and title 15 and author 16 are siblings, with title 15 the elder and author 16 the younger. The elements below a given element are defined as its descendants; for example, the descendants of papers 10 are paper 11 and 12, author 13 and 16, and title 14 and 15.

In the stream representation of XML data, the elements are arranged in document order, that is, depth-first starting from the leftmost branch of the tree representation. Searching XML data in this stream representation with a query has the advantages of low memory usage and easy handling of very large data, but data that has already been read cannot be read again. For example, in the stream-represented XML data, once (open, title) (text, "XML") has been read, the earlier (open, author) (text, "asai") can no longer be read.
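To make the stream representation concrete, the sketch below flattens a parsed document into (open, tag), (text, value), and (close, tag) events in document order, mirroring the (open, title) (text, "XML") notation above. The markup is a reconstruction of FIG. 1 from the description, and the event tuples are an assumed format; mixed content (tail text) is ignored.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Reconstruction of the FIG. 1 document from the description (assumed markup).
SAMPLE = (
    "<papers>"
    "<paper><author>asai</author><title>XML</title></paper>"
    "<paper><title>Data Stream</title><author>asai</author></paper>"
    "</papers>"
)

Event = Tuple[str, str]


def stream_events(elem: ET.Element) -> Iterator[Event]:
    """Yield (open, tag), (text, value), (close, tag) events in document order."""
    yield ("open", elem.tag)
    if elem.text and elem.text.strip():
        yield ("text", elem.text.strip())
    for child in elem:
        yield from stream_events(child)  # depth-first, left to right
    yield ("close", elem.tag)


if __name__ == "__main__":
    for event in stream_events(ET.fromstring(SAMPLE)):
        print(event)
```

A stream processor sees these events exactly once, front to back, which is why a step that needs an earlier event is a problem.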

Turning to FIG. 2, the query shown there searches for title elements directly under paper elements directly under papers, subject to the constraint [../author="asai"]. In FIG. 2, the constraint [../author="asai"] requires that an author element with the text "asai" exist directly under the parent of the title element (here, paper).

The query of FIG. 2 retrieves title 14 and title 15 of FIG. 1, and the search result shown in FIG. 3 is displayed. FIG. 3 shows the search result obtained when the XML data of FIG. 1 is searched with the query of FIG. 2.
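For reference, the query semantics can be reproduced on an in-memory tree, where jumping back to a parent is free. This minimal sketch assumes the FIG. 2 query reads /papers/paper/title[../author="asai"] and uses a reconstruction of the FIG. 1 data; the function name `fig2_query` is hypothetical. It is a random-access evaluation, not stream processing.

```python
import xml.etree.ElementTree as ET
from typing import List

# Reconstruction of the FIG. 1 document (assumed markup).
SAMPLE = (
    "<papers>"
    "<paper><author>asai</author><title>XML</title></paper>"
    "<paper><title>Data Stream</title><author>asai</author></paper>"
    "</papers>"
)


def fig2_query(root: ET.Element, author: str = "asai") -> List[str]:
    """Evaluate /papers/paper/title[../author=author] on an in-memory tree."""
    if root.tag != "papers":
        return []
    # ElementTree keeps no parent links, so build a child -> parent map
    # to follow the reverse (parent) axis '..'.
    parent = {child: p for p in root.iter() for child in p}
    hits = []
    for title in root.findall("paper/title"):
        up = parent[title]  # '..': the parent was opened earlier in the stream
        if any(a.text == author for a in up.findall("author")):
            hits.append(title.text)
    return hits


if __name__ == "__main__":
    print(fig2_query(ET.fromstring(SAMPLE)))  # ['XML', 'Data Stream']
```

Both titles are returned, consistent with the FIG. 3 result described above.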

However, because the query in FIG. 2 must, after reaching title, refer back to title's parent, paper (that is, it contains a reverse axis), it is difficult to search the XML data of FIG. 1 directly by stream processing. In the example of FIG. 2, "../" denotes the reverse axis.

Next, the collation processing apparatus according to the present embodiment is described. The apparatus applies axis conversion, based on an axis conversion algorithm, to a query containing reverse axes, branches, and the like as described above, and then searches the XML data by stream processing using the converted query. Because the query is axis-converted before the stream-processing search, the document data and the like matching the query can be retrieved from the XML data quickly and efficiently, regardless of the structure of the query.

FIG. 4 is a functional block diagram showing the configuration of the collation processing apparatus according to the present embodiment. As shown, the collation processing apparatus 100 includes an input unit 110, an output unit 120, a storage unit 130, a pre-processing unit 140, and a post-processing unit 150.

The input unit 110 is an input means for entering various kinds of information, consisting of a keyboard, mouse, microphone, data reader, and the like; it is used, for example, to enter the XML data and queries described above. The output unit 120 is a means for outputting various kinds of information (for example, data matching a query) and consists of a monitor (or display, touch panel) or the like.

The storage unit 130 stores the data and programs required for the various processes performed by the pre-processing unit 140 and the post-processing unit 150. The items most closely related to the present invention are XML data 131, a path trie 132, a BIN file 133, query data 134, a sibling correspondence table 135, automaton data 136, a hit table 137, and a stack 138.

The XML data 131 is document data with a hierarchical structure built using element identifiers such as "<" and "/>", referred to as tags. FIG. 5 shows an example of the data structure of XML data, and FIG. 6 shows the XML data of FIG. 5 in tree representation. FIG. 6 is read in the same way as the tree representation of FIG. 1, so its description is omitted.

The path trie 132 is data in which duplicate paths in the XML data are merged and each element of the XML data is assigned a unique ID. FIG. 7 shows an example of the data structure of the path trie 132. As shown, the path trie 132 contains a plurality of tags (papers, paper, author, title), each assigned a unique ID.

In the example of FIG. 7, tag ID (1) is assigned to the tag "papers", tag ID (2) to "paper", tag ID (3) to "author", and tag ID (4) to "title".

In the XML data (tree representation) of FIG. 6, the path from paper to author and the path from paper to title each occur twice, so the path trie 132 merges each pair of duplicate paths into a single path.

FIG. 8 shows an example of the data structure of each tag shown in FIG. 7. As shown, each tag consists of a tag name, a tag ID, and pointers to its child nodes. Taking the "papers" tag of FIG. 7 as an example, "papers" is registered as the tag name, tag ID (1) as the tag ID, and a pointer to the child node "paper" as the child-node pointer.
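A path trie with the FIG. 8 node layout (tag name, tag ID, child-node pointers) can be sketched as follows. This is an illustration under stated assumptions, not the apparatus's implementation: IDs are assumed to be assigned in first-encounter (preorder) order, which reproduces the FIG. 7 numbering for the reconstructed FIG. 1 data, and children are keyed by tag name.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict

# Reconstruction of the FIG. 1 document (assumed markup).
SAMPLE = (
    "<papers>"
    "<paper><author>asai</author><title>XML</title></paper>"
    "<paper><title>Data Stream</title><author>asai</author></paper>"
    "</papers>"
)


@dataclass
class TrieNode:
    tag_name: str
    tag_id: int
    children: Dict[str, "TrieNode"] = field(default_factory=dict)  # child-node pointers


def build_path_trie(root: ET.Element) -> TrieNode:
    """Merge duplicate root-to-element paths; number nodes in first-seen order."""
    ids = itertools.count(1)
    trie = TrieNode(root.tag, next(ids))

    def walk(elem: ET.Element, node: TrieNode) -> None:
        for child in elem:
            sub = node.children.get(child.tag)
            if sub is None:  # first occurrence of this path: create a node
                sub = TrieNode(child.tag, next(ids))
                node.children[child.tag] = sub
            walk(child, sub)  # a duplicate path reuses the existing node

    walk(root, trie)
    return trie


if __name__ == "__main__":
    print(build_path_trie(ET.fromstring(SAMPLE)))
```

Because both paper elements share the path papers/paper, their author and title children collapse onto the same two trie nodes, as described for FIG. 7.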

The BIN file 133 is data in which each element of the XML data 131 (see FIG. 5) is replaced with the ID of the corresponding tag in the path trie 132 (see FIG. 7). FIG. 9 shows an example of the data structure of the BIN file 133. As shown, the BIN file 133 consists of identification numbers 1001 to 1010, which identify the position of each element, and the elements replaced with tag IDs.

Specifically, comparing FIG. 5 with FIG. 9, <papers> is converted to [(1), <paper> to [(2), <author> to [(3), and <title> to [(4). Likewise, </papers> is converted to /(1), </paper> to /(2), </author> to /(3), and </title> to /(4).
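The tag substitution above can be sketched as a single recursive pass. Assumptions: the tag IDs are those of FIG. 7, the lookup is by tag name (which suffices here because each name occurs on exactly one path), text values are kept as plain tokens, and the FIG. 9 position numbers are left out. `to_bin_tokens` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Reconstruction of the FIG. 1 document (assumed markup).
SAMPLE = (
    "<papers>"
    "<paper><author>asai</author><title>XML</title></paper>"
    "<paper><title>Data Stream</title><author>asai</author></paper>"
    "</papers>"
)

TAG_IDS: Dict[str, int] = {"papers": 1, "paper": 2, "author": 3, "title": 4}  # FIG. 7


def to_bin_tokens(elem: ET.Element, tag_ids: Dict[str, int]) -> List[str]:
    """Replace <name> with [(id) and </name> with /(id), in document order."""
    tag_id = tag_ids[elem.tag]
    tokens = [f"[({tag_id})"]  # start tag
    if elem.text and elem.text.strip():
        tokens.append(elem.text.strip())  # text kept as-is (assumption)
    for child in elem:
        tokens.extend(to_bin_tokens(child, tag_ids))
    tokens.append(f"/({tag_id})")  # end tag
    return tokens


if __name__ == "__main__":
    print(to_bin_tokens(ET.fromstring(SAMPLE), TAG_IDS))
```

Position numbers such as those in FIG. 9 could be attached afterwards, for example with `enumerate(tokens, start=1001)`.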

The query data 134 stores a query entered from the input unit 110. FIG. 10 shows an example of a query stored as the query data 134. The query of FIG. 10 has the same meaning as the query described with reference to FIG. 2, so its description is omitted.

The sibling correspondence table 135 stores, when axis conversion is performed on a query, the sibling relationships among the elements after conversion. FIG. 11 shows an example of its data structure. For example, FIG. 11 records "2<3", meaning that, of the elements identified by numbers 2 and 3, element 2 is the elder sibling and element 3 is the younger sibling.

The automaton data 136 stores the automaton generated from the axis-converted query. The automaton data 136 is described in detail later.

The hit table 137 is used when searching for the search target with the BIN file 133 and the automaton data 136. FIG. 12 shows an example of its data structure. As shown, the hit table 137 has a plurality of fields, each storing the position in the BIN file 133 at which a context node detection event C occurred and the position at which a predicate acceptance event (Am) occurred. The context node detection event and the predicate acceptance event are described later.

The stack 138 temporarily holds data to be stored in the hit table 137. FIG. 13 shows an example of the data structure of the stack. As shown, the stack 138 has one field for storing the position in the BIN file 133 at which a context node detection event C occurred and the position at which a predicate acceptance event (Am) occurred.

Returning to FIG. 4, the pre-processing unit 140 generates the path trie 132 and the BIN file 133 from the XML data 131, and includes a path trie creation unit 141 and a BIN file creation unit 142. When the pre-processing unit 140 acquires XML data from the input unit 110, it stores that XML data in the storage unit 130.

The path trie creation unit 141 creates the path trie 132 (see FIG. 7) from the XML data 131 (see FIG. 5). Specifically, it analyzes the XML data 131 and detects duplicate paths. If duplicate paths exist, it keeps only one of each set of duplicates, creates a tag for each element of the XML data 131, and connects the tags according to the parent-child relationships of the XML data 131 to form the path trie 132 (see FIG. 7). The path trie creation unit 141 also assigns a unique tag ID to each tag.

The BIN file creation unit 142 creates the BIN file 133 (see FIG. 9) from the XML data 131 (see FIG. 5) and the path trie 132 (see FIG. 7). Specifically, the BIN file creation unit 142 compares each element of the XML data 131 with the tag names in the path trie 132, assigns to each element the tag ID of the tag whose name matches the element's name, and thereby creates the BIN file 133.

The post-processing unit 150 performs collation processing to detect the data matching the query data 134, and includes an axis conversion processing unit 151, an automaton creation unit 152, and a collation processing unit 153. When the post-processing unit 150 acquires query data from the input unit 110, it stores it in the storage unit 130 as the query data 134. The post-processing unit 150 also outputs the detected data to the output unit 120.

The axis conversion processing unit 151 performs axis conversion on the query data 134. FIG. 14 outlines its processing. As shown, the axis conversion processing unit 151 performs axis conversion on a query (which may contain reverse axes) to generate a query composed only of child axes. It then compares each element name in the query with the tag names in the path trie 132 and replaces each element with the tag ID of the tag whose name matches.
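The final step of FIG. 14, replacing element names with tag IDs, might look like the hypothetical helper below. It assumes the converted query contains only names, "/", bracketed predicates, and quoted string literals; it uses the FIG. 7 IDs, leaves literals untouched, and raises KeyError for a name absent from the path trie (such a name cannot occur in the document). The "(n)" output notation follows FIG. 9 and is otherwise an assumption.

```python
import re
from typing import Dict

TAG_IDS: Dict[str, int] = {"papers": 1, "paper": 2, "author": 3, "title": 4}  # FIG. 7

# Quoted literals are matched first so that names inside them are not rewritten.
_TOKEN = re.compile(r"""'[^']*'|"[^"]*"|[A-Za-z_][\w.-]*""")


def encode_query(query: str, tag_ids: Dict[str, int]) -> str:
    """Replace each element name in a child-axis-only query with (tag ID)."""
    def repl(m: "re.Match[str]") -> str:
        tok = m.group(0)
        if tok[0] in "'\"":
            return tok  # string literal: keep as-is
        return f"({tag_ids[tok]})"  # KeyError if the name is not in the trie

    return _TOKEN.sub(repl, query)


if __name__ == "__main__":
    print(encode_query("/papers/paper[author='asai']/title", TAG_IDS))
    # /(1)/(2)[(3)='asai']/(4)
```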

The processing of the axis conversion processing unit 151 is described concretely below. In axis conversion, the axis conversion processing unit 151 first performs sibling axis conversion on the query data 134 and then performs parent axis conversion. Sibling axis conversion is described first.

(Sibling axis conversion process)
In the sibling axis conversion process, the axis conversion processing unit 151 detects sibling axes in the query data 134. Sibling axes appear in a query as, for example, "following-sibling" and "preceding-sibling". When it detects a sibling axis, the axis conversion processing unit 151 converts it into a parent axis and a child axis using the sibling axis conversion rules, and registers the sibling relationship in the sibling correspondence table 135.

The sibling axis conversion rules are:
"/a/following-sibling::b ⇒ /a/../b"
"/a/preceding-sibling::b ⇒ /a/../b"

FIG. 15 supplements the explanation of the sibling axis conversion process. In the figure, the query for retrieving node "c" of the XML data is "/a/b/following-sibling::c", which contains the sibling axis "following-sibling". Applying the sibling axis conversion rule above converts this query to "/a/b/../c", so the sibling axis is expressed using only a parent axis and a child axis.

When the axis conversion processing unit 151 converts a sibling axis into a parent axis and a child axis, it also registers the sibling relationship in the sibling correspondence table 135. In the example of FIG. 15, b (identified by number 2) is the elder sibling and c (identified by number 3) is the younger sibling, so the information registered in the sibling correspondence table 135 is "2<3". This registration is needed because both conversion rules yield the same form, /a/../b, so the converted query by itself no longer records which sibling comes first.
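The two sibling rules and the table registration can be sketched together. This is a simplified illustration, not the unit's actual algorithm: it assumes an absolute path made only of plain name steps and sibling steps (no predicates, no existing ".."), numbers the named steps 1, 2, 3, ... as in FIG. 15, and returns the table entries as "elder<younger" strings. `convert_sibling_axes` is a hypothetical name.

```python
from typing import List, Tuple

FOLLOWING = "following-sibling::"
PRECEDING = "preceding-sibling::"


def convert_sibling_axes(query: str) -> Tuple[str, List[str]]:
    """Rewrite sibling steps as '../name' and record 'elder<younger' entries.

    Assumes an absolute path of plain name steps and sibling steps only
    (no predicates, no '..'), with named steps numbered 1, 2, 3, ...
    """
    out: List[str] = []
    table: List[str] = []  # stands in for the sibling correspondence table 135
    for step in query.strip("/").split("/"):
        n = sum(1 for s in out if s != "..") + 1  # number of the step being added
        if step.startswith((FOLLOWING, PRECEDING)):
            if n == 1:
                raise ValueError("a sibling step needs a context step before it")
            out += ["..", step.split("::", 1)[1]]
            if step.startswith(FOLLOWING):
                table.append(f"{n - 1}<{n}")  # the context step is the elder sibling
            else:
                table.append(f"{n}<{n - 1}")  # the new step is the elder sibling
        elif step and step != ".." and "[" not in step and "::" not in step:
            out.append(step)
        else:
            raise ValueError(f"unsupported step: {step!r}")
    return "/" + "/".join(out), table


if __name__ == "__main__":
    print(convert_sibling_axes("/a/b/following-sibling::c"))  # ('/a/b/../c', ['2<3'])
```

For "/a/b/preceding-sibling::c" the path is the same but the entry becomes "3<2", which is exactly the information the rewritten path loses.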

After converting sibling axes into parent and child axes, the axis conversion processing unit 151 applies the equivalence rules to the converted query to remove nesting of predicate parts (the [] parts of the query; a predicate part that contains a further [] inside it is called nested). In a run of consecutive predicate parts, it also applies the equivalence rules to reorder them so that predicate parts containing a parent axis come first. For example, applying the equivalence rules to "π[a][../b][c/d]" reorders it to "π[../b][a][c/d]".
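The reordering step can be illustrated with a small string-level sketch. Assumptions: the input is a base step followed only by a run of predicates, brackets inside string literals are not handled, and "contains a parent axis" is approximated by the substring "..". Nest removal is not covered. The helper names are hypothetical.

```python
from typing import List, Tuple


def split_predicates(expr: str) -> Tuple[str, List[str]]:
    """Split 'base[p1][p2]...' into base and its top-level predicate bodies."""
    start = expr.find("[")
    if start == -1:
        return expr, []
    base, preds, depth, cur = expr[:start], [], 0, ""
    for ch in expr[start:]:
        if depth == 0 and ch != "[":
            raise ValueError("only a run of predicates may follow the base step")
        if ch == "[":
            depth += 1
            if depth == 1:  # opening a top-level predicate
                cur = ""
                continue
        elif ch == "]":
            depth -= 1
            if depth == 0:  # closing a top-level predicate
                preds.append(cur)
                continue
        cur += ch  # nested brackets stay inside the predicate body
    if depth != 0:
        raise ValueError("unbalanced brackets")
    return base, preds


def parent_first(expr: str) -> str:
    """Move predicates that contain a parent axis '..' to the front (rule 7)."""
    base, preds = split_predicates(expr)
    # sorted() is stable: False (has '..') sorts first, others keep their order.
    preds = sorted(preds, key=lambda p: ".." not in p)
    return base + "".join(f"[{p}]" for p in preds)


if __name__ == "__main__":
    print(parent_first("π[a][../b][c/d]"))  # π[../b][a][c/d]
```

Only swaps of adjacent predicates are performed, which equivalence rule 7 permits; a stable sort is one convenient way to compose such swaps.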

The equivalence rules consist of equivalence rules 1 to 7 below. Here, π, π1, and π2 denote arbitrary query path expressions. When S[π1](x) = S[π2](x) holds for every node x ∈ N (S[π](x) being the set of nodes selected by π from context node x), π1 and π2 are said to be equivalent, written "π1 ≡ π2".
Equivalence rule 1: π1/π ≡ π2/π (applies only when π1 ≡ π2)
Equivalence rule 2: π/π1 ≡ π/π2 (applies only when π1 ≡ π2)
Equivalence rule 3: π[π1] ≡ π[π2] (applies only when π1 ≡ π2)
Equivalence rule 4: π1[π] ≡ π2[π] (applies only when π1 ≡ π2)
Equivalence rule 5: π[π1[π2]] ≡ π[π1/π2]
Equivalence rule 6: π[[π1]π2] ≡ π[π1][π2]
Equivalence rule 7: π[π1][π2] ≡ π[π2][π1]

(Parent axis conversion process)
In the parent axis conversion process, the axis conversion processing unit 151 detects parent axes in the query data. It then applies the parent axis conversion rules to convert each detected parent axis into a child axis.
As the method for converting a parent axis into a child axis, for example, the method disclosed in Reference 1 (D. Olteanu et al., "XPath: Looking Forward", Proc. XMLDM'02, 2002) can be used.
The parent axis conversion rules are as follows:
Parent axis conversion rule 1: π/a/../ ≡ π[a]
Parent axis conversion rule 2: a/../ ≡ ./[π]/a

After converting the parent axis into a child axis, the axis conversion processing unit 151 applies the equivalence rules to the converted query to eliminate nesting in the predicate parts. Where predicate parts are consecutive, it also applies the equivalence rules to reorder the query so that the predicate part containing the parent axis comes first. The equivalence rules are the same as equivalence rules 1 to 7 described above, so their description is omitted.

Here, a specific example is described of converting the path of a query that contains parent axes by applying the parent axis conversion rules and the equivalence rules. Let the path of the query to be converted be
π = /b1/b2[b3/b4/../../../b8]
This path π contains three parent axes "../" to be converted.

Let π_1 be the result of applying parent axis conversion rule 1 to the leftmost parent axis of π:
π_1 = /b1/b2[b3[b4]../../b8]
Next, let π_2 be the result of applying parent axis conversion rule 1 to the leftmost parent axis of π_1:
π_2 = /b1/b2[[b3[b4]]../b8]

Applying equivalence rule 5 to π_2 then gives
π_2 = /b1/b2[[b3/b4]../b8]
and applying equivalence rule 6 to this result gives
π_2 = /b1/b2[b3/b4][../b8]

Applying equivalence rule 7 to this result gives
π_2 = /b1/b2[../b8][b3/b4]
Finally, letting π_3 be the result of applying parent axis conversion rule 2 to π_2 after equivalence rules 5 to 7 have been applied,
π_3 = /b1[b8]b2[b3/b4]

When performing the parent axis (or ancestor axis) conversion process on a query, the axis conversion processing unit 151 first expands any descendant axes using the path trie and then converts the parent axes. For example, to execute the parent axis conversion process on
π = /a//../d
the unit first expands it to
π = /a/b/../d, /a/b/c/d/../d
and then performs the parent axis conversion, obtaining
π = /a[b]d, /a/b/c[d]d

The axis conversion processing unit 151 executes the sibling axis conversion process and the parent axis conversion process on the query stored in the query data 134, and registers the axis-converted query in the query data 134 (the query before axis conversion is overwritten with the query after axis conversion). The axis conversion processing unit 151 then compares each element name of the converted query with the tag names in the path trie 132 and converts each element name of the query into a tag ID. A query whose element names have been converted into tag IDs is referred to as a conversion query.
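The renaming step can be illustrated with the query of FIG. 16. In this Python sketch the path trie 132 is modeled as a dictionary from root-to-node label paths to tag IDs; the dictionary contents, the input format, and the function name are hypothetical, chosen only so that the output matches the conversion query Q' = (2)[(5):e1](3)[(6):e2] of FIG. 16.

```python
def to_conversion_query(
    steps: list[tuple[str, list[tuple[str, str]]]], trie: dict[str, int]
) -> str:
    """Replace element names with tag IDs (hypothetical helper).

    steps: (relative path segment, [(predicate child name, keyword label)]).
    trie:  absolute label path -> tag ID.
    """
    parts = []
    prefix = ""
    for segment, predicates in steps:
        prefix = f"{prefix}/{segment}"
        parts.append(f"({trie[prefix]})")
        for child, keyword in predicates:
            parts.append(f"[({trie[prefix + '/' + child]}):{keyword}]")
    return "".join(parts)


trie = {
    "/Syain/ACT": 2,
    "/Syain/ACT/chara": 3,
    "/Syain/ACT/cast": 5,
    "/Syain/ACT/chara/name": 6,
}
steps = [("Syain/ACT", [("cast", "e1")]), ("chara", [("name", "e2")])]
print(to_conversion_query(steps, trie))  # (2)[(5):e1](3)[(6):e2]
```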

Returning to the description of FIG. 4, the automaton creation unit 152 is a means for creating automaton data corresponding to the conversion query created by the axis conversion processing unit 151. The automaton data created by the automaton creation unit 152 is stored in the storage unit 130 as the automaton data 136.

Here, the processing of the automaton creation unit 152 is described concretely. FIG. 16 is a diagram that supplements the description of the processing of the automaton creation unit 152. For convenience of explanation, let the query be
Q = /Syain/ACT/[contains(cast,"Asai")]chara[contains(name,"Blue")]
and let the conversion query obtained by converting each element of this query into a tag ID be
Q' = (2)[(5):e1](3)[(6):e2]
Automaton generation is described using this example. In the conversion query Q', "(2)" corresponds to "/Syain/ACT", "(3)" corresponds to "chara", "[(5):e1]" corresponds to "[contains(cast,"Asai")]", and "[(6):e2]" corresponds to "[contains(name,"Blue")]". The automaton corresponding to the conversion query Q' is shown in the lower part of FIG. 16.

The automaton shown in FIG. 16 includes a plurality of node structures 20 to 27 and event structures 30 to 34. For each line connecting the node structures 20 to 26 and the event structures 30 and 31, processing moves in the direction of the arrow when the condition attached to that line is satisfied. In FIG. 16, ε indicates that processing moves in the direction of the arrow unconditionally, and Σ\{n} indicates that processing moves in the direction of the arrow on any symbol other than n.

First, the automaton creation unit 152 analyzes the conversion query Q' and extracts the following:
Set of predicate path IDs: A = {a1, ..., an} (n is a natural number)
Set of branch path IDs: Z = {z1, ..., zn} (n is a natural number)
Context path ID: c
Evaluation path ID: d
Keyword set key(ai) for each ai ∈ A

For the conversion query Q' shown in FIG. 16, the automaton creation unit 152 extracts "(5), (6)" as the set of predicate path IDs and "(2), (3)" as the set of branch path IDs. It also extracts "(3)" as the context path ID; the context path ID is the path ID immediately preceding the last predicate part [] of the conversion query Q'.

The automaton creation unit 152 also extracts "(2)" as the evaluation path ID; the evaluation path ID is, for example, the leftmost path ID of the conversion query Q'. It then extracts "e1 (Asai)" and "e2 (Blue)" as the keyword sets key(ai).
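The extraction for FIG. 16 can be reproduced with a small tokenizer over the conversion query string. A minimal Python sketch, assuming the textual form "(id)" for main-path steps and "[(id):keyword]" for predicates, and assuming (as holds in this example) that every main-path ID is a branch path ID; `analyze` is a hypothetical name.

```python
import re

# "(2)" is a main-path step; "[(5):e1]" is a predicate with its keyword label.
TOKEN = re.compile(r"\((\d+)\)|\[\((\d+)\):(\w+)\]")


def analyze(conversion_query: str):
    """Return (predicate IDs, branch IDs, context ID, evaluation ID, keywords)."""
    predicate_ids, branch_ids, keywords = [], [], {}
    context_id = None
    last_main = None
    for match in TOKEN.finditer(conversion_query):
        if match.group(1) is not None:
            last_main = int(match.group(1))
            branch_ids.append(last_main)
        else:
            pid = int(match.group(2))
            predicate_ids.append(pid)
            keywords[pid] = match.group(3)
            context_id = last_main  # main step preceding the latest predicate
    evaluation_id = branch_ids[0]  # leftmost path ID
    return predicate_ids, branch_ids, context_id, evaluation_id, keywords


print(analyze("(2)[(5):e1](3)[(6):e2]"))
# ([5, 6], [2, 3], 3, 2, {5: 'e1', 6: 'e2'})
```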

Next, the automaton creation unit 152 creates the initial state Ini of the automaton (node structure 20 in FIG. 16), the open state Open (the state after reading the start symbol "["; node structure 21), and the close state Close (the state after reading the end symbol "/"; node structure 27). Here, Goto(Ini, "[") = Open and Goto(Ini, "/") = Close.

The automaton creation unit 152 performs the following processes 1-1 to 1-6 for each i = 1 to n. First, in process 1-1, the automaton creation unit 152 creates a state State(ai) corresponding to each predicate path ID ai ∈ A. In the example shown in FIG. 16, node structure 22 for State(a1), corresponding to (5), and node structure 24 for State(a2), corresponding to (6), are generated.

In process 1-2, the automaton creation unit 152 creates a keyword reference automaton that accepts key(ai) and connects it to the node structure of each state State(ai). In the example shown in FIG. 16, node structures 22 and 23 are connected in sequence from node structure 22 of State(a1) to event structure 30 "A1", and node structures 24, 25, and 26 are connected in sequence from node structure 24 of State(a2) to event structure 31 "A2".

Next, in process 1-3, the automaton creation unit 152 connects node structure 21 to node structure 22 of State(a1), and node structure 21 to node structure 24 of State(a2), so that Goto(Open, ai) = State(ai) for each state State(ai).

In process 1-4, the automaton creation unit 152 connects node structure 27 to node structure 24 of State(a2) so that Goto(Close, b) = State(ai) for any child b of ai in the path trie. In the example shown in FIG. 16, the child of the tag (name) with tag ID (6) is the tag (ID) with tag ID (7).

In process 1-5, the automaton creation unit 152 creates a state State(zi) corresponding to each branch path ID zi ∈ Z. In the example shown in FIG. 16, event structure 32 "Z1" for State(z1), corresponding to (2), and event structure 33 "Z2" for State(z2), corresponding to (3), are generated.

In process 1-6, the automaton creation unit 152 connects node structure 27 to event structure 32 of State(z1), and node structure 27 to event structure 33 of State(z2), so that Goto(Close, zi) = State(zi) for each state State(zi).

Next, the automaton creation unit 152 creates a state State(c) for the context path ID c. In the example shown in FIG. 16, event structure 34 "C" is created. Node structure 21 and event structure 34 are then connected so that Goto(Open, c) = State(c).

The automaton creation unit 152 also creates a state State(d) corresponding to the evaluation path ID d. In the example shown in FIG. 16, event structure 32 "D" is created (in FIG. 16, "Z1" and "D" are combined into the single event structure 32). Node structure 27 and event structure 32 are then connected so that Goto(Close, d) = State(d).

The automaton creation unit 152 executes the processes described above to create automaton data corresponding to the conversion query Q', and stores the created automaton data in the storage unit 130.
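The wiring built by processes 1-1 to 1-6 and by the State(c) and State(d) steps can be approximated by a transition table. The Python sketch below is a simplification under stated assumptions, not the automaton of FIG. 16 itself: each keyword reference automaton is replaced by a substring test, each BIN record is fed from Ini independently, and the record format "[(id) text" or "/(id)" as well as all names are hypothetical. For the records of FIG. 19 it reports the same event for each record as the walkthrough of the collation processing.

```python
import re
from collections import defaultdict

RECORD = re.compile(r"^([\[/])\((\d+)\)\s*(.*)$")


def build_automaton(pred_ids, keywords, branch_ids, context_id, eval_id, trie_children):
    goto = {("Ini", "["): "Open", ("Ini", "/"): "Close"}
    emit = defaultdict(list)  # (state, tag ID) -> events fired on that transition
    accept = {}               # State(ai) -> (keyword, event name)
    for i, a in enumerate(pred_ids, 1):
        state = f"State(a{i})"                      # process 1-1
        accept[state] = (keywords[i - 1], f"A{i}")  # process 1-2 (substring test)
        goto[("Open", a)] = state                   # process 1-3
        for b in trie_children.get(a, []):
            goto[("Close", b)] = state              # process 1-4
    for i, z in enumerate(branch_ids, 1):
        emit[("Close", z)].append(f"Z{i}")          # processes 1-5 and 1-6
    emit[("Open", context_id)].append("C")          # State(c)
    emit[("Close", eval_id)].append("D")            # State(d)
    return goto, emit, accept


def run_record(automaton, record):
    """Feed one BIN record from Ini and return the names of fired events."""
    goto, emit, accept = automaton
    symbol, tag, text = RECORD.match(record).groups()
    tag = int(tag)
    text = re.sub(r"/\(\d+\)$", "", text).strip()  # drop a trailing close tag
    state = goto[("Ini", symbol)]
    fired = list(emit.get((state, tag), []))
    target = goto.get((state, tag))
    if target in accept:
        keyword, event = accept[target]
        if keyword in text:
            fired.append(event)
    return fired


automaton = build_automaton([5, 6], ["Asai", "Blue"], [2, 3], 3, 2, {6: [7]})
for rec in ["[(1) Sigma Sentai Nakaharaja", "[(2)", "[(3) Sigma Blue 1",
            "[(6) Blue /(6)", "/(3)", "[(5) Tatsuya Asai /(5)", "/(2)", "/(1)"]:
    print(rec, "->", run_record(automaton, rec))
```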

Next, the data structures of the node structures and event structures included in the automaton data described above are explained. FIG. 17 is a diagram showing an example of the data structure of a node structure, and FIG. 18 is a diagram showing an example of the data structure of an event structure.

As shown in FIG. 17, a node structure includes a node ID that identifies the node structure, a pointer to an event structure, and pointers to other node structures. Taking node structure 21 shown in FIG. 16 as an example, the pointer to event structure 34 is stored as the pointer to an event structure, and the pointers to node structures 20, 22, and 24 are stored as the pointers to other node structures.

As shown in FIG. 18, an event structure includes an event ID that identifies the event structure, a query ID that identifies the query, an event type (context node detection event, predicate acceptance event, predicate evaluation event, or query evaluation event), the data position of the event structure, and pointers to other event structures.
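The records of FIGS. 17 and 18 map naturally onto plain data classes whose pointers are object references. A Python sketch under assumptions: the field and class names are hypothetical, a node structure is assumed to be able to hold several event pointers, and the single-letter codes of `EventType` follow the event labels C, A, Z, and D used in FIG. 16.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class EventType(Enum):
    CONTEXT_NODE_DETECTION = "C"
    PREDICATE_ACCEPTANCE = "A"
    PREDICATE_EVALUATION = "Z"
    QUERY_EVALUATION = "D"


@dataclass
class EventStruct:
    event_id: int
    query_id: str
    event_type: EventType
    data_position: int | None = None
    next_events: list[EventStruct] = field(default_factory=list)


@dataclass
class NodeStruct:
    node_id: int
    events: list[EventStruct] = field(default_factory=list)
    next_nodes: list[NodeStruct] = field(default_factory=list)


# Node structure 21 of FIG. 16: points to event structure 34 and nodes 20, 22, 24.
event34 = EventStruct(34, "Q1", EventType.CONTEXT_NODE_DETECTION)
node21 = NodeStruct(21, [event34], [NodeStruct(20), NodeStruct(22), NodeStruct(24)])
print([n.node_id for n in node21.next_nodes])  # [20, 22, 24]
```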

Returning to the description of FIG. 4, the collation processing unit 153 is a means for outputting the data corresponding to the query data 134 on the basis of the BIN file 133 and the automaton data 136. The processing of the collation processing unit 153 is described concretely below. For convenience of explanation, the description uses the BIN file shown in FIG. 19 and the automaton data shown in FIG. 16. FIG. 19 is a diagram showing an example of the data structure of a BIN file used to explain the collation processing.

An event E that occurs while the collation processing unit 153 feeds the BIN file into the automaton data is defined as E = (Q, T, P). Here, Q is the query ID, T is the event type, and P is the data position at the moment the event occurs.

When T of event E is a context node detection event (C), the collation processing unit 153 registers a new entry in the hit table 137 (see FIG. 12) for query ID Q and copies the current contents of the stack 138 (see FIG. 13) into the new entry.

When T of event E is a predicate acceptance event (Am), the collation processing unit 153 registers P of event E in the m-th item of the hit table 137 for query ID Q and in the m-th item of the stack 138.

When T of event E is a predicate evaluation event (Zm), the collation processing unit 153 deletes the entries whose m-th item is blank from the hit table 137 for query ID Q, and deletes the m-th item of the stack 138.

When T of event E is a query evaluation event (D), the collation processing unit 153 outputs the entries remaining in the hit table for query ID Q to the output unit 120 as correct answers.
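The four event rules above act on the hit table 137 and the stack 138 of a single query. The Python sketch below models the hit table as a list of dicts and the stack as a dict indexed by m; these representations and the class name are assumptions, and so is the reading that a predicate acceptance event fills only the blank m-th cells of existing rows, which the text does not state explicitly. Replaying events E1 to E9 of the walkthrough for FIG. 19 yields the output positions 1003 and 1006.

```python
from __future__ import annotations


class QueryMatcher:
    """Hit table and stack handling for one query ID (hypothetical sketch)."""

    def __init__(self, num_predicates: int):
        self.stack = {k: None for k in range(1, num_predicates + 1)}
        self.hits = []     # hit table rows: {"C": pos, 1: pos or None, ...}
        self.answers = []  # context positions output as correct answers

    def handle(self, event_type: str, m: int | None, pos: int) -> None:
        if event_type == "C":    # context node detection: new row, copy stack
            self.hits.append({"C": pos, **self.stack})
        elif event_type == "A":  # predicate acceptance Am
            self.stack[m] = pos
            for row in self.hits:
                if row[m] is None:  # assumption: only blank cells are filled
                    row[m] = pos
        elif event_type == "Z":  # predicate evaluation Zm
            self.hits = [row for row in self.hits if row[m] is not None]
            self.stack[m] = None
        elif event_type == "D":  # query evaluation: output surviving rows
            self.answers.extend(row["C"] for row in self.hits)
            self.hits = []


matcher = QueryMatcher(num_predicates=2)
events = [("C", None, 1003), ("A", 2, 1004), ("Z", 2, 1005), ("C", None, 1006),
          ("A", 2, 1007), ("Z", 2, 1008), ("A", 1, 1009), ("Z", 1, 1010),
          ("D", None, 1010)]
for event in events:
    matcher.handle(*event)
print(matcher.answers)  # [1003, 1006]
```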

On the basis of the above, the processing of the collation processing unit 153 using the automaton shown in FIG. 16 and the BIN file shown in FIG. 19 is described separately for each of the BIN file positions "1001" to "1011".

(BIN file position "1001")
The collation processing unit 153 feeds the data "[(1) Sigma Sentai Nakaharaja" at position "1001" of the BIN file into the automaton. The data starts at node structure 20 and moves to node structure 21, where no transition matches the next symbol; processing therefore returns to node structure 20, and the search for position "1001" ends.

(BIN file position "1002")
The collation processing unit 153 feeds the data "[(2)" at position "1002" of the BIN file into the automaton. The data starts at node structure 20 and moves to node structure 21, where no transition matches the next symbol; processing therefore returns to node structure 20, and the search for position "1002" ends.

(BIN file position "1003")
The collation processing unit 153 feeds the data "[(3) Sigma Blue 1" at position "1003" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 34. When event structure 34 is reached, the collation processing unit 153 generates event E1 = (Q1, C, 1003).

FIG. 20 is a diagram showing the state of the hit table 137 when event E1 = (Q1, C, 1003) occurs. The values of the stack 138 are copied into A1 to Am of row "1003" of the hit table 137 shown in FIG. 20 (at this stage nothing is registered in the stack 138, so nothing is copied into the hit table 137).

(BIN file position "1004")
The collation processing unit 153 feeds the data "[(6) Blue /(6)" at position "1004" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 31. When event structure 31 is reached, the collation processing unit 153 generates event E2 = (Q1, A2, 1004).

FIG. 21 is a diagram showing the state of the hit table 137 when event E2 = (Q1, A2, 1004) occurs, and FIG. 22 is a diagram showing the state of the stack 138 at the same moment. As shown in FIGS. 21 and 22, "1004" is registered in the "A2" position.

(BIN file position "1005")
The collation processing unit 153 feeds the data "/(3)" at position "1005" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 33. When event structure 33 is reached, the collation processing unit 153 generates event E3 = (Q1, Z2, 1005).

When event E3 = (Q1, Z2, 1005) occurs, the collation processing unit 153 refers to the hit table 137 and deletes the rows in which "A2" is not set. As shown in FIG. 21, "A2" already holds a value in the hit table 137 at this stage, so no row is deleted. In addition, when event E3 = (Q1, Z2, 1005) occurs, the collation processing unit 153 clears "A2" in the stack 138.

(BIN file position "1006")
The collation processing unit 153 feeds the data "[(3) Sigma Blue 2" at position "1006" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 34. When event structure 34 is reached, the collation processing unit 153 generates event E4 = (Q1, C, 1006).

FIG. 23 is a diagram showing the state of the hit table 137 when event E4 = (Q1, C, 1006) occurs. As shown in the figure, "1006" is registered in the "C" column of the hit table 137.

(BIN file position "1007")
The collation processing unit 153 feeds the data "[(6) Blue /(6)" at position "1007" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 31. When event structure 31 is reached, the collation processing unit 153 generates event E5 = (Q1, A2, 1007).

FIG. 24 is a diagram showing the state of the hit table 137 when event E5 = (Q1, A2, 1007) occurs, and FIG. 25 is a diagram showing the state of the stack 138 at the same moment. As shown in FIGS. 24 and 25, "1007" is registered in the "A2" position.

(BIN file position "1008")
The collation processing unit 153 feeds the data "/(3)" at position "1008" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 33. When event structure 33 is reached, the collation processing unit 153 generates event E6 = (Q1, Z2, 1008).

When event E6 = (Q1, Z2, 1008) occurs, the collation processing unit 153 refers to the hit table 137 and deletes the rows in which "A2" is not set. As shown in FIG. 24, "A2" holds a value in every row of the hit table 137 at this stage, so no row is deleted. In addition, when event E6 = (Q1, Z2, 1008) occurs, the collation processing unit 153 clears "A2" in the stack 138.

(BIN file position "1009")
The collation processing unit 153 feeds the data "[(5) Tatsuya Asai /(5)" at position "1009" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 30. When event structure 30 is reached, the collation processing unit 153 generates event E7 = (Q1, A1, 1009).

FIG. 26 is a diagram showing the state of the hit table 137 when event E7 = (Q1, A1, 1009) occurs, and FIG. 27 is a diagram showing the state of the stack 138 at the same moment. As shown in FIGS. 26 and 27, "1009" is registered in the "A1" position.

(BIN file position "1010")
The collation processing unit 153 feeds the data "/(2)" at position "1010" of the BIN file into the automaton. Starting from node structure 20, the data reaches event structure 32. When event structure 32 is reached, the collation processing unit 153 generates events E8 = (Q1, Z1, 1010) and E9 = (Q1, D, 1010).

When event E8 = (Q1, Z1, 1010) occurs, the collation processing unit 153 refers to the hit table 137 and deletes the rows in which "A1" is not set. As shown in FIG. 26, "A1" holds a value in every row of the hit table 137 at this stage, so no row is deleted. In addition, when event E8 = (Q1, Z1, 1010) occurs, the collation processing unit 153 clears "A1" in the stack 138.

When event E9 = (Q1, D, 1010) occurs, the collation processing unit 153 refers to the hit table 137 and outputs the position information registered in its "C" column to the output unit 120. In the example shown in FIG. 26, BIN file positions "1003" and "1006" are output; these positions identify the data corresponding to the query data 134. After event E9 = (Q1, D, 1010) has been processed, the collation processing unit 153 deletes the data registered in the hit table 137.

(BIN file position "1011")
The collation processing unit 153 feeds the data "/(1)" at position "1011" of the BIN file into the automaton. The data starts at node structure 20 and moves to node structure 27, where no transition matches the next symbol; processing therefore returns to node structure 20, and the search for position "1011" ends.

Next, the processing of the collation processing apparatus 100 according to the present embodiment is described. FIG. 28 is a flowchart showing the processing procedure of the collation processing apparatus 100 according to the present embodiment. As shown in the figure, the collation processing apparatus 100 acquires the XML data 131 (step S101); the path trie creation unit 141 creates the path trie 132 from the XML data 131 (step S102); and the BIN file creation unit 142 creates a BIN file from the XML data 131 and the path trie 132 (step S103).

The collation processing apparatus 100 then acquires the query data 134 (step S104) and determines whether the query data 134 contains a reverse axis (that is, whether axis conversion is necessary) (step S105).

If the query data 134 contains no reverse axis (axis conversion is unnecessary) (No in step S106), the process proceeds to step S108, which is described below.

If, on the other hand, the query data 134 contains a reverse axis (axis conversion is necessary) (Yes in step S106), the axis conversion processing unit 151 executes the axis conversion process on the query data 134 (step S107) and converts each element of the query data 134 into a tag ID (path ID) (step S108).

In the collation processing apparatus 100, the automaton creation unit 152 then creates the automaton data 136 from the query data 134 (step S109), and the collation processing unit 153 executes the collation process on the basis of the automaton data 136 and the BIN file 133 (step S110).

Next, the axis conversion process shown in step S107 of FIG. 28 is described. FIG. 29 is a flowchart showing the axis conversion process according to the present embodiment. As shown in the figure, the axis conversion processing unit 151 sets π to the path expression of the query data, initializes the sibling correspondence table 135 (step S201), and determines whether π contains a sibling axis (step S202).

If π contains no sibling axis (step S203, No), the process proceeds to step S208, which is described below. If π does contain a sibling axis (step S203, Yes), the unit applies the sibling axis conversion rule to the leftmost sibling axis in π (step S204) and registers the resulting sibling relationship in the sibling correspondence table 135 (step S205).

Where an equivalence rule is applicable, the axis conversion processing unit 151 then applies it to π (step S206) and updates the path expression π of the query data 134 (step S207).

Next, the axis conversion processing unit 151 determines whether π contains a parent axis (step S208). If it does not (step S209, No), the process proceeds to step S213, which is described below.

If π does contain a parent axis (step S209, Yes), the unit applies the parent axis conversion rule to the leftmost parent axis in π (step S210), applies an equivalence rule to π where one is applicable (step S211), and updates the path expression π of the query data 134 (step S212). Finally, it outputs the path expression π of the query data 134 together with the sibling correspondence table 135 (step S213).
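The rewriting rules themselves are not spelled out in this part of the description, so the following Python sketch only illustrates the control flow of FIG. 29 under stated assumptions. A path expression π is modeled as a list of steps; the sibling rule is assumed to rewrite `b/following-sibling::c` into `b/parent::*/c` while recording the pair (b, c) in a stand-in for the sibling correspondence table 135 (which keeps the document-order constraint the rewrite drops); and the parent rule is assumed to rewrite `a/b/parent::*` into `a[b]`. The names `Step`, `convert_axes`, and `render` are hypothetical, the equivalence rules of steps S206 and S211 are not modeled, and the sketch repeats each rule until no sibling or parent axis remains.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple

@dataclass(frozen=True)
class Step:
    axis: str                    # "child", "parent" or "following-sibling"
    test: str                    # element name, or "*"
    preds: Tuple[str, ...] = ()  # child-path predicates, e.g. ("b",) renders as [b]

def convert_axes(path: List[Step]) -> Tuple[List[Step], Set[Tuple[str, str]]]:
    """Rewrite sibling and parent steps into child steps (assumed rules, see lead-in)."""
    pi = list(path)
    sibling_table: Set[Tuple[str, str]] = set()  # stand-in for table 135: (earlier, later)

    # Sibling rule (assumed): b/following-sibling::c -> b/parent::*/c, record (b, c).
    while any(s.axis == "following-sibling" for s in pi):
        i = next(k for k, s in enumerate(pi) if s.axis == "following-sibling")  # leftmost
        if i == 0:
            raise ValueError("sibling step needs a preceding step")
        sibling_table.add((pi[i - 1].test, pi[i].test))
        pi[i:i + 1] = [Step("parent", "*"), Step("child", pi[i].test, pi[i].preds)]

    # Parent rule (assumed): a/b/parent::* -> a[b].
    while any(s.axis == "parent" for s in pi):
        i = next(k for k, s in enumerate(pi) if s.axis == "parent")  # leftmost
        if i < 2 or pi[i].test != "*" or pi[i].preds:
            raise ValueError("parent step pattern not covered by this sketch")
        a, b = pi[i - 2], pi[i - 1]
        pred = b.test + "".join(f"[{p}]" for p in b.preds)
        pi[i - 2:i + 1] = [replace(a, preds=a.preds + (pred,))]

    return pi, sibling_table

def render(pi: List[Step]) -> str:
    """Render a child-only path, e.g. [a[b], c] -> '/a[b]/c'."""
    return "".join("/" + s.test + "".join(f"[{p}]" for p in s.preds) for s in pi)

query = [Step("child", "a"), Step("child", "b"), Step("following-sibling", "c")]
converted, table = convert_axes(query)
print(render(converted), table)  # → /a[b]/c {('b', 'c')}
```

With these assumed rules, `/a/b/following-sibling::c` becomes the child-only path `/a[b]/c` plus the sibling entry (b, c), which is the kind of output step S213 is described as producing.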

Next, the automaton creation process of step S109 in FIG. 28 is described. FIG. 30 is a flowchart of the automaton creation process according to the present embodiment. As shown in FIG. 30, the automaton creation unit 152 analyzes the query data 134 and extracts a set of predicate path IDs, a set of branch path IDs, a context path ID, and a keyword set (step S301).

The automaton creation unit 152 then creates the initial state Ini, the start state Open, and the end state Close of the automaton (step S302), creates a state State(ai) corresponding to each predicate path ID ai (step S303), and creates a matching automaton that accepts the keyword set and connects it to State(ai) (step S304).

Next, the automaton creation unit 152 sets Goto(Open, ai) = State(ai) (step S305); for every child b of ai on the path trie, sets Goto(Close, b) = State(ai) (step S306); creates a state State(zi) for each branch path ID zi (step S307); and sets Goto(Open, zi) = State(zi) (step S308).

The automaton creation unit 152 then creates a state State(c) for the context path ID c (step S309) and sets Goto(Open, c) = State(c) (step S310), and creates a state State(d) corresponding to the evaluation path ID d (step S311) and sets Goto(Open, d) = State(d) (step S312).
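Steps S302 to S312 can be sketched in Python as the construction of a transition table, assuming that states are plain strings such as "State(2)", that path IDs are integers, and that `children` maps a path ID to its child path IDs in the path trie. `build_goto` is a hypothetical name; the keyword-matching automaton of step S304 is omitted, and step S306 follows the reading that the Close transition is keyed by each child b of ai.

```python
from typing import Dict, Iterable, List, Tuple

Goto = Dict[Tuple[str, int], str]  # (current state, path ID) -> next state

def build_goto(pred_ids: Iterable[int], children: Dict[int, List[int]],
               branch_ids: Iterable[int], context_id: int, eval_id: int) -> Goto:
    """Hypothetical transition-table version of steps S302-S312."""
    goto: Goto = {}
    # S302: Ini, Open and Close are represented simply by their string names.
    for a in pred_ids:
        goto[("Open", a)] = f"State({a})"          # S303, S305
        for b in children.get(a, []):
            goto[("Close", b)] = f"State({a})"     # S306: closing a child returns to State(a)
        # S304 (keyword-matching sub-automaton attached to State(a)) is omitted here.
    for z in branch_ids:
        goto[("Open", z)] = f"State({z})"          # S307, S308
    goto[("Open", context_id)] = f"State({context_id})"  # S309, S310
    goto[("Open", eval_id)] = f"State({eval_id})"        # S311, S312
    return goto

table = build_goto(pred_ids=[2], children={2: [4, 5]}, branch_ids=[3],
                   context_id=6, eval_id=7)
print(len(table))  # → 6
```

A dictionary keyed by (state, symbol) keeps each Goto assignment in the flowchart a single line of code, at the cost of leaving undefined transitions implicit.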

Next, the collation process of step S110 in FIG. 28 is described. FIG. 31 is a flowchart of the collation process according to the present embodiment. As shown in FIG. 31, the collation processing unit 153 sets s = Ini (the initial state) (step S401) and determines whether the BIN file 133 contains a next character a (step S402). If it does not (step S403, No), the collation process ends.

If the BIN file 133 does contain a next character a (step S403, Yes), the unit sets s = Goto(s, a) (step S404) and determines whether s is an event occurrence node (step S405).

If s is not an event occurrence node (step S406, No), the collation processing unit 153 returns to step S402. If s is an event occurrence node (step S406, Yes), it executes the event evaluation process (step S407) and then returns to step S402.
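The loop of FIG. 31 can be sketched in Python as follows. The transition table, the set of event occurrence nodes, and the `on_event` callback (standing in for the event evaluation of step S407) are hypothetical inputs, as is the name `collate`. The description does not say what happens when Goto(s, a) is undefined, so this sketch assumes a fall-back to Ini.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

def collate(stream: Iterable[Hashable],
            goto: Dict[Tuple[str, Hashable], str],
            event_nodes: Set[str],
            on_event: Callable[[str, int], None],
            initial: str = "Ini") -> None:
    """Drive the automaton over the BIN stream (sketch of steps S401-S407)."""
    s = initial                                # S401
    for pos, a in enumerate(stream):           # S402/S403: stop when no next character
        s = goto.get((s, a), initial)          # S404 (fall-back to Ini is an assumption)
        if s in event_nodes:                   # S405/S406
            on_event(s, pos)                   # S407: hand the event to the evaluator

events: List[Tuple[str, int]] = []
toy_goto = {("Ini", "open"): "Open", ("Open", 6): "Ctx",
            ("Ctx", "close"): "Close", ("Close", 6): "Ini"}
collate(["open", 6, "close", 6, "open", 6], toy_goto, {"Ctx"},
        lambda state, pos: events.append((state, pos)))
print(events)  # → [('Ctx', 1), ('Ctx', 5)]
```

Because each step is one dictionary lookup and one set-membership test, the document is consumed in a single forward pass, which is the stream-processing property the description relies on.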

Next, the event evaluation process of step S407 in FIG. 31 is described. FIG. 32 is a flowchart of the event evaluation process according to the present embodiment. As shown in FIG. 32, the collation processing unit 153 denotes the generated event by E = (Q, T, P) and the hit table 137 of Q by H(Q) (step S501), and initializes the stack 138 to Stack = ∅ (step S502).

The collation processing unit 153 then determines whether T is a context detection event (step S503). If it is (step S504, Yes), the unit adds a new entry (P, Stack) to the hit table H(Q) (step S505) and ends the event evaluation process.

If T is not a context detection event (step S504, No), the unit determines whether T is a predicate acceptance event (Am) (step S506). If it is (step S507, Yes), the unit writes P into the m-th item of the hit table H(Q) and into the m-th item of the stack (step S508), and ends the event evaluation process.

If T is not a predicate acceptance event (Am) (step S507, No), the unit determines whether T is a predicate acceptance event (Zm) (step S509). If it is (step S510, Yes), the unit deletes every entry of the hit table H(Q) whose m-th item is blank, deletes the m-th item of the stack 138 (step S511), and ends the event evaluation process.

If T is not a predicate acceptance event (Zm) either (step S510, No), the unit determines that T is a query evaluation event (step S512), outputs all entries of the hit table H(Q) as solutions (step S513), and clears the hit table H(Q) (step S514).
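A Python sketch of the event evaluation of FIG. 32 for a single query Q follows. It rests on several assumptions that the text does not settle: the event type is passed as a kind ("C" for context detection, "A" or "Z" together with a predicate number m), and any other kind is treated as a query evaluation event; "writing P into the m-th item of H(Q)" is read as filling that item in every entry where it is still empty; and H(Q) and the stack are kept per query across events rather than reset on each call. `HitEntry`, `QueryState`, and `evaluate_event` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HitEntry:
    context: int                 # position P at which the context was detected
    items: List[Optional[int]]   # m-th item: position that satisfied predicate m

@dataclass
class QueryState:
    num_predicates: int
    hits: List[HitEntry] = field(default_factory=list)                    # hit table H(Q)
    stack: List[Optional[int]] = field(default_factory=list, init=False)  # stack

    def __post_init__(self) -> None:
        self.stack = [None] * self.num_predicates

def evaluate_event(state: QueryState, kind: str, m: int, p: int) -> List[HitEntry]:
    """Handle one event (Q, T, P); returns solutions only for a query evaluation event."""
    if kind == "C":                            # context detection (S505)
        state.hits.append(HitEntry(p, list(state.stack)))
        return []
    if kind == "A":                            # predicate acceptance Am (S508)
        for entry in state.hits:
            if entry.items[m - 1] is None:
                entry.items[m - 1] = p
        state.stack[m - 1] = p
        return []
    if kind == "Z":                            # predicate event Zm (S511)
        state.hits = [e for e in state.hits if e.items[m - 1] is not None]
        state.stack[m - 1] = None
        return []
    solutions, state.hits = state.hits, []     # query evaluation (S512-S514)
    return solutions

q = QueryState(2)
evaluate_event(q, "A", 1, 1001)   # predicate 1 seen before the context
evaluate_event(q, "C", 0, 1003)   # context inherits the stack: items [1001, None]
evaluate_event(q, "A", 2, 1004)   # fills predicate 2: items [1001, 1004]
evaluate_event(q, "Z", 2, 0)      # keeps the entry, predicate 2 is filled
evaluate_event(q, "C", 0, 1006)   # second context: items [1001, None]
evaluate_event(q, "Z", 2, 0)      # drops the second entry
result = evaluate_event(q, "Q", 0, 0)
print([(e.context, e.items) for e in result])  # → [(1003, [1001, 1004])]
```

Under this reading, the stack carries predicate hits that occur before a context node, while the hit table collects hits that occur after it; the Zm event then filters out contexts whose predicate was never satisfied.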

As described above, in the collation processing apparatus 100 according to the present embodiment, the path trie creation unit 141 creates the path trie 132 from the XML data 131, and the BIN file creation unit 142 creates the BIN file 133 from the XML data 131 and the path trie 132. The axis conversion processing unit 151 then performs axis conversion on the query data 134 according to the axis conversion algorithm, the automaton creation unit 152 creates the automaton data 136 from the axis-converted query data 134, and the collation processing unit 153 feeds the BIN file 133 into the automaton data 136 and outputs the data matching the query data 134. Consequently, even when the query data contains retrograde axes, data matching the query can be retrieved from the XML data 131 by stream processing.

In addition, in the collation processing apparatus 100 according to the present embodiment, the BIN file creation unit 142 creates the BIN file 133 by converting each element of the XML data 131 into a tag ID, and the collation processing unit 153 performs collation using only numerical comparisons on this tag-ID-encoded BIN file 133. This reduces the processing load on the collation processing apparatus 100.
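To illustrate why collation reduces to numerical comparison, the following Python sketch assigns an integer path ID to each distinct root-to-element tag path while streaming through the XML, and emits the document as a sequence of ("open", id) and ("close", id) pairs. The flat dictionary used as a path trie and this token format are assumptions made for illustration; the actual structures of the path trie 132 and the BIN file 133 are those shown in FIGS. 7 to 9. `build_bin` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

def build_bin(xml_text: str) -> Tuple[Dict[Tuple[str, ...], int], List[Tuple[str, int]]]:
    """Stream the XML once, numbering each distinct tag path (hypothetical BIN format)."""
    trie: Dict[Tuple[str, ...], int] = {}  # flattened path trie: tag path -> path ID
    stream: List[Tuple[str, int]] = []     # BIN-like token stream
    path: List[str] = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            pid = trie.setdefault(tuple(path), len(trie) + 1)  # new paths get the next ID
            stream.append(("open", pid))
        else:
            stream.append(("close", trie[tuple(path)]))
            path.pop()
    return trie, stream

trie, stream = build_bin("<a><b/><c><b/></c><b/></a>")
print(trie)  # → {('a',): 1, ('a', 'b'): 2, ('a', 'c'): 3, ('a', 'c', 'b'): 4}
```

Once the tags are replaced by IDs, testing whether the current element lies on the path a/c/b is a single integer comparison against 4 in this example, rather than a string match on every tag along the path.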

Furthermore, in the collation processing apparatus 100 according to the present embodiment, the axis conversion processing unit 151 converts every axis in the query data 134 into a child axis on the basis of the axis conversion algorithm (the sibling axis conversion rule, the parent axis conversion rule, and the equivalence rule). Hierarchy management for the query data 134 therefore becomes unnecessary, and data matching the query data 134 can be retrieved at high speed.

Of the processes described in this embodiment, all or part of those described as being performed automatically may instead be performed manually, and all or part of those described as being performed manually may instead be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed as desired unless otherwise specified.

The configuration of the collation processing apparatus 100 shown in FIG. 4 is functional and conceptual, and it need not be physically configured as illustrated. That is, the specific form in which the devices are distributed or integrated is not limited to that shown; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to load, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device may be realized by a CPU (or an MCU or MPU) and a program analyzed and executed by that CPU, or as hardware based on wired logic.

FIG. 33 shows the hardware configuration of a computer included in the collation processing apparatus 100 shown in FIG. 4. The computer 60 comprises an input device 61 that accepts data input from a user, a monitor 62, a RAM (Random Access Memory) 63, a ROM (Read Only Memory) 64, a medium reading device 65 that reads data from a storage medium, a CPU (Central Processing Unit) 66, and an HDD (Hard Disk Drive) 67, all connected by a bus 68.

The HDD 67 stores a pre-processing program 67b and a post-processing program 67c that provide the same functions as the collation processing apparatus 100 described above. When the CPU 66 reads the pre-processing program 67b and the post-processing program 67c from the HDD 67 and executes them, a pre-processing process 66a and a post-processing process 66b are started, which realize the functions of the functional units of the collation processing apparatus 100. The pre-processing process 66a and the post-processing process 66b correspond to the pre-processing unit 140 and the post-processing unit 150 shown in FIG. 4, respectively.

The HDD 67 also stores various data 67a corresponding to the data held in the storage unit 130 of the collation processing apparatus 100. The various data 67a correspond to the XML data 131, path trie 132, BIN file 133, query data 134, sibling correspondence table 135, automaton data 136, hit table 137, and stack 138 shown in FIG. 4.

The CPU 66 stores the various data 67a in the HDD 67, reads the various data 67a from the HDD 67 into the RAM 63, and performs the collation process using the various data 63a stored in the RAM 63.

(Supplementary note 1) A collation processing program for causing a computer to execute:
a document storage procedure of storing, in a storage device, document data having a hierarchical structure in which elements are delimited by element identifiers;
an axis conversion procedure of, when a search expression for searching for data contained in the document data stored in the storage device is acquired, performing axis conversion on the acquired search expression to convert it into a search expression composed of child axes;
an automaton creation procedure of identifying the types of element identifiers contained in the search expression converted by the axis conversion procedure and creating an automaton corresponding to that search expression; and
a collation processing procedure of sequentially collating data contained in the document data with the automaton and outputting data matching the search expression.

(Supplementary note 2) The collation processing program according to supplementary note 1, wherein the axis conversion procedure determines whether the search expression contains a sibling axis and, if it does, converts the sibling axis into a parent axis and a child axis.

(Supplementary note 3) The collation processing program according to supplementary note 1 or 2, wherein the collation processing procedure sequentially stores, in a temporary storage table, data detected while sequentially collating the data contained in the document data with the automaton, and outputs the data stored in the temporary storage table when the collation is completed.

(Supplementary note 4) The collation processing program according to any one of supplementary notes 1 to 3, further causing the computer to execute a numerical conversion procedure of converting each element identifier contained in the document data stored in the storage device and in the search expression into a numerical value.

(Supplementary note 5) A collation processing method comprising:
a document storage step of storing, in a storage device, document data having a hierarchical structure in which elements are delimited by element identifiers;
an axis conversion step of, when a search expression for searching for data contained in the document data stored in the storage device is acquired, performing axis conversion on the acquired search expression to convert it into a search expression composed of child axes;
an automaton creation step of identifying the types of element identifiers contained in the search expression converted in the axis conversion step and creating an automaton corresponding to that search expression; and
a collation processing step of sequentially collating data contained in the document data with the automaton and outputting data matching the search expression.

(Supplementary note 6) A collation processing apparatus comprising:
document storage means for storing document data having a hierarchical structure in which elements are delimited by element identifiers;
axis conversion means for, when a search expression for searching for data contained in the document data stored in the document storage means is acquired, performing axis conversion on the acquired search expression to convert it into a search expression composed of child axes;
automaton creation means for identifying the types of element identifiers contained in the search expression converted by the axis conversion means and creating an automaton corresponding to that search expression; and
collation processing means for sequentially collating data contained in the document data with the automaton and outputting data matching the search expression.

As described above, the collation processing program and collation processing apparatus according to the present invention are useful for search systems and the like that retrieve data matching a search expression from document data having a hierarchical structure in which elements are delimited by element identifiers. They are particularly suitable when data matching a search expression must be retrieved at high speed regardless of how the search expression is constructed.

FIG. 1 is a diagram showing a tree representation and a stream representation of XML data.
FIG. 2 is a diagram showing an example of a query containing a retrograde axis.
FIG. 3 is a diagram showing the search result obtained when the XML data of FIG. 1 is searched with the query shown in FIG. 2.
FIG. 4 is a functional block diagram showing the configuration of the collation processing apparatus according to the present embodiment.
FIG. 5 is a diagram showing an example of the data structure of XML data.
FIG. 6 is a diagram showing the XML data of FIG. 5 in tree representation.
FIG. 7 is a diagram showing an example of the data structure of a path trie.
FIG. 8 is a diagram showing an example of the data structure of each tag shown in FIG. 7.
FIG. 9 is a diagram showing an example of the data structure of a BIN file.
FIG. 10 is a diagram showing an example of a query stored as query data.
FIG. 11 is a diagram showing an example of the data structure of a sibling correspondence table.
FIG. 12 is a diagram showing an example of the data structure of a hit table.
FIG. 13 is a diagram showing an example of the data structure of a stack.
FIG. 14 is a diagram outlining the processing of the axis conversion processing unit.
FIG. 15 is a supplementary diagram explaining the sibling axis conversion process.
FIG. 16 is a supplementary diagram explaining the processing of the automaton creation unit.
FIG. 17 is a diagram showing an example of the data structure of a node structure.
FIG. 18 is a diagram showing an example of the data structure of an event structure.
FIG. 19 is a diagram showing an example of the data structure of a BIN file used to explain the collation process.
FIG. 20 is a diagram showing the state of the hit table when event E1 = (Q1, C, 1003) occurs.
FIG. 21 is a diagram showing the state of the hit table when event E2 = (Q2, A2, 1004) occurs.
FIG. 22 is a diagram showing the state of the stack when event E2 = (Q2, A2, 1004) occurs.
FIG. 23 is a diagram showing the state of the hit table when event E4 = (Q1, C, 1006) occurs.
FIG. 24 is a diagram showing the state of the hit table when event E5 = (Q1, A2, 1007) occurs.
FIG. 25 is a diagram showing the state of the stack when event E5 = (Q1, A2, 1007) occurs.
FIG. 26 is a diagram showing the state of the hit table when event E7 = (Q1, A1, 1009) occurs.
FIG. 27 is a diagram showing the state of the stack when event E7 = (Q1, A1, 1009) occurs.
FIG. 28 is a flowchart showing the processing procedure of the collation processing apparatus according to the present embodiment.
FIG. 29 is a flowchart showing the axis conversion process according to the present embodiment.
FIG. 30 is a flowchart showing the automaton creation process according to the present embodiment.
FIG. 31 is a flowchart showing the collation process according to the present embodiment.
FIG. 32 is a flowchart showing the event evaluation process according to the present embodiment.
FIG. 33 is a diagram showing the hardware configuration of a computer included in the collation processing apparatus shown in FIG. 4.
FIG. 34 is a diagram for explaining the problems of the prior art.

Explanation of symbols

60 Computer
61 Input device
62 Monitor
63 RAM
63a, 67a Various data
64 ROM
65 Medium reading device
66 CPU
66a Pre-processing process
66b Post-processing process
67 HDD
67b Pre-processing program
67c Post-processing program
68 Bus
100 Collation processing apparatus
110 Input unit
120 Output unit
130 Storage unit
131 XML data
132 Path trie
133 BIN file
134 Query data
135 Sibling correspondence table
136 Automaton data
137 Hit table
138 Stack
140 Pre-processing unit
141 Path trie creation unit
142 BIN file creation unit
150 Post-processing unit
151 Axis conversion processing unit
152 Automaton creation unit
153 Collation processing unit

Claims

1. A collation processing program for causing a computer to execute:
a document storage procedure of storing, in a storage device, document data having a hierarchical structure in which elements are delimited by element identifiers;
an axis conversion procedure of acquiring a search expression for searching for data contained in the document data stored in the storage device and, when the search expression contains a sibling axis, performing axis conversion on the search expression, and, when the axis-converted search expression contains a parent axis, performing further axis conversion on it, thereby converting the search expression into a search expression composed only of child axes;
an automaton creation procedure of identifying the types of element identifiers contained in the search expression converted by the axis conversion procedure and creating an automaton corresponding to that search expression; and
a collation processing procedure of sequentially collating data contained in the document data with the automaton and outputting data matching the search expression.

2. The collation processing program according to claim 1, wherein the collation processing procedure sequentially stores, in a temporary storage table, data detected while sequentially collating the data contained in the document data with the automaton, and outputs the data stored in the temporary storage table when the collation is completed.

3. The collation processing program according to claim 1 or 2, further causing the computer to execute a numerical conversion procedure of converting each element identifier contained in the document data stored in the storage device and in the search expression into a numerical value.

4. A collation processing apparatus comprising:
document storage means for storing document data having a hierarchical structure in which elements are delimited by element identifiers;
axis conversion means for acquiring a search expression for searching for data contained in the document data stored in the document storage means and, when the search expression contains a sibling axis, performing axis conversion on the search expression, and, when the axis-converted search expression contains a parent axis, performing further axis conversion on it, thereby converting the search expression into a search expression composed only of child axes;
automaton creation means for identifying the types of element identifiers contained in the search expression converted by the axis conversion means and creating an automaton corresponding to that search expression; and
collation processing means for sequentially collating data contained in the document data with the automaton and outputting data matching the search expression.