JP3968749B2

JP3968749B2 - XPath processing method, XPath processing device, XPath processing program, and storage medium storing the program

Info

Publication number: JP3968749B2
Application number: JP2002132710A
Authority: JP
Inventors: 真鬼塚
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-05-08
Filing date: 2002-05-08
Publication date: 2007-08-29
Anticipated expiration: 2022-05-08
Also published as: JP2003323429A

Description

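[0001]
[Technical field of the invention]
The present invention relates to XPath processing using automata, and in particular to an XPath processing method, an XPath processing device, an XPath processing program, and a storage medium storing the program, each of which adds XPaths incrementally (as a difference).
[0002]
[Prior art]
One example is NewsML, a news distribution format based on XML (eXtensible Markup Language). NewsML can freely combine news materials such as articles, images, video, and audio, and can deliver information to a variety of devices such as websites, mobile phones, and televisions (TV data broadcasting). A NewsML recipient (user) obtains the information they need by registering search conditions with a filter engine. Because the search conditions are written in an XML query language, the filter engine must process a group of XPaths (the individual XPaths) and must also handle the addition of XPaths.
[0003]
[Problems to be solved by the invention]
In conventional XPath addition schemes, a separate deterministic automaton for processing XPaths is generated each time an XPath is added. The number of deterministic automata therefore grows with every addition, which strains the memory space of the computer that constitutes the filter engine. The problem becomes especially pronounced when many XPaths are added.
[0004]
The main object of the present invention is therefore to provide an XPath processing method, an XPath processing device, an XPath processing program, and a storage medium storing the program that can use memory space effectively when XPaths are added.
[0005]
[Means for solving the problems]
In view of the above problem, the inventors conducted intensive research, noted that memory space can be used effectively by incrementally adding XPath information to an already existing DFA, and thereby completed the present invention.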
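[0006]
[XPath processing method]
That is, the present invention that solves the above problem (claim 1) is an XPath processing method in an XPath processing device that processes XPaths using automata, in which the XPath processing device performs: a step of generating a nondeterministic automaton (hereinafter "NFA") from each individual XPath; a step of integrating the generated nondeterministic automata into a combined nondeterministic automaton and registering it in a storage device; a step of generating SAX events from input XML data; and a step of generating, in response to the generated SAX events, a deterministic automaton (hereinafter "DFA") using the combined nondeterministic automaton and registering it in the storage device. The method is characterized in that, when an XPath is added, the XPath to be added is converted into a nondeterministic automaton, the nondeterministic automaton is updated to reflect the added XPath, and then all start tags from the XML root tag to the tag currently being processed, which have been retained in advance, are input in order to the deterministic automaton, whereby the XPath information is added incrementally.
[0007]
With this configuration, the information of the XPath to be added is simply added incrementally to the DFA generated so far (the existing DFA), so no new DFA needs to be generated separately. Memory space can therefore be used effectively. In the embodiment described later, the XPaths (individual XPaths, XPath group) are generated by the inquiry parsing module of FIG. 2. The NFAs, the combined NFA, and the DFA are generated by the automaton management unit (data extraction module) of FIG. 2. Whether an XPath addition has been requested is determined in step S14 of the flow of FIG. 6. The incremental addition of XPath information to the DFA corresponds to the processing in step S16 (DFA update 1) and step S22 (DFA update 2) of the flow of FIG. 6. In this configuration, the information of the XPath to be added is incrementally added to the existing DFA, for example as in the flow (DFA update 1) shown in FIGS. 6 and 7 of the embodiment described later. In that embodiment, labelStack holds the list of names of elements that are not yet complete (still being processed) as of the SAX event currently input (a tag marks the position of an element). root represents the root state of the DFA, and current represents the current state of the DFA. Incidentally, the "order" in "start tags in order" may be ascending, descending, random (no particular order), and so on.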
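[0010]
Further, the present invention (claim 2) is characterized in that, in the configuration of claim 1, the information of the XPath to be added is incrementally added on the basis of a state flag that indicates, for each state of the DFA, up to which XPath (by count) the state has been updated.
[0011]
With this configuration, the state flag makes it possible to update exactly the parts that need updating, reliably and without waste. In the embodiment described later, varId corresponds to the state flag.
[0014]
Further, the present invention (claim 3) is characterized in that, in the configuration of claim 1 or claim 2, when a state transition is made in the DFA, the state flag of each state in the current DFA is checked; if a state has not been updated on the basis of the latest XPath, the state flag is used to identify the XPaths that were added after that state was last updated, and the deterministic automaton is updated using the NFA states corresponding to the identified XPaths.
[0015]
In this configuration, an XPath is added, for example, as shown in FIG. 13 of the embodiment described later.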
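[0016]
[XPath processing device]
Further, the XPath processing device of the present invention (claim 4) is characterized by comprising: means for generating SAX events from input XML data; means for converting individual XPaths based on users' search conditions into NFAs, integrating the generated NFAs into a combined NFA, and registering it in a storage device; means for generating a DFA using the registered combined NFA in response to the SAX events and registering it in the storage device; and means for, when an XPath addition is requested, updating the NFA to reflect the added XPath and then inputting all start tags from the XML root tag to the tag currently being processed, which have been retained in advance, in order to the NFA, thereby adding the XPath information incrementally.
[0017]
With this configuration, the XPath information only needs to be added incrementally to the existing DFA, so there is no need to generate multiple DFAs. Memory space can therefore be used effectively. In the embodiment described later, the filter engine corresponds to the XPath processing device. The XML parsing module corresponds to the means for generating SAX events. The inquiry parsing module corresponds to the means for generating the individual XPaths and the XPaths to be added. The automaton management unit corresponds to the means for converting XPaths into NFAs, generating the combined NFA, generating the DFA, and incrementally adding XPath information to the DFA. In this configuration, the information of the XPath to be added is incrementally added to the existing DFA, for example as in the flow (DFA update 1) shown in FIGS. 6 and 7 of the embodiment described later.
[0020]
[XPath processing program]
Further, the XPath processing program of the present invention (claim 5) is characterized by causing an XPath processing device, which is a computer, to execute the XPath processing method according to any one of claims 1 to 3 in order to process XPaths using automata.
[0021]
With this configuration, a computer on which the program is installed executes each step defined by the program and performs XPath processing (incremental update of the DFA).
[0022]
[Storage medium storing the XPath processing program]
Further, the storage medium of the present invention (claim 6) stores the XPath processing program according to claim 5.
[0023]
The XPath processing program is, for example, copied and recorded on a recording medium such as a CD-ROM and distributed on the market, or transmitted over a network. A computer on which the program is installed is then made to execute XPath processing (incremental update of the DFA).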
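[0024]
[Embodiments of the invention]
An embodiment of the XPath processing method of the present invention is described in detail below with reference to the drawings. The XPath processing method described below also embodies the XPath processing device and the XPath processing program.
[0025]
FIG. 1 shows an outline of a filter engine to which the XPath processing method is applied.
XML data generated in XML format by a data provider (not shown) is sent over a network such as an intranet to the filter engine 10, which executes the XPath processing method. Each user who receives XML data registers in advance with the filter engine 10 the conditions for the data they want (a personal profile) in the form of an XML query, as in the conventional example.
[0026]
The filter engine 10 filters and converts incoming XML data, such as news sources, according to the registered personal profiles and delivers the result to the individual users as XML data. NewsML is a concrete example of such XML data. As noted above, NewsML is a news distribution format based on XML that can freely combine news materials such as articles, images, video, and audio and deliver information to a variety of devices such as websites and mobile phones. It is also suited to structuring and centrally managing a wide range of news materials.
[0027]
The filter engine 10 also contains the XPath processing device and the XPath processing program. Incidentally, XML is a metalanguage recommended by the W3C (World Wide Web Consortium) as an Internet standard. A metalanguage is a language for defining languages. XML data (also called an XML document) is a document or data written in a language defined with XML.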
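[0028]
FIG. 2 is a block diagram showing the internal configuration of the filter engine 10 of FIG. 1. As shown in FIG. 2, the filter engine 10 comprises an XML parsing module 11, an inquiry parsing module 12, a data extraction module 13, a storage device 14, an automaton management unit 15, and a data conversion module 16. The automaton management unit 15 is made up of the data extraction module 13 and the storage device 14. The filter engine 10 is built from a router and a computer that has a main control unit consisting of a CPU and RAM, an external storage device such as a hard disk, and a NIC (Network Interface Card) for communication.
[0029]
The XML parsing module (XML parser) 11 parses the input XML data, converts it into internal-format XML data (SAX events), and outputs it to the data extraction module 13. Parsing is the analysis step that reads XML data written as text and analyzes the document elements, attributes, and so on specified by XML tags (the present invention does not restrict the parsing procedure). There are two standard APIs (Application Programming Interfaces) for manipulating XML data through an XML parser: DOM (Document Object Model) and SAX (Simple API for XML). In this embodiment the XML parsing module 11 supports the latter, SAX. A SAX-based XML parsing module 11 reads the XML data sequentially and invokes the registered handlers each time it detects an XML tag (start tag, end tag, or empty-element tag). A handler here is a program that defines methods for processing each element of the XML data through the SAX interface. A tag is a character string written in XML data to mark the position of an element and to hold its attributes.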
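As an illustration of the handler mechanism described above, the following minimal sketch uses Python's standard `xml.sax` module. The class name `OpenElementTracker` and the `snapshots` list are assumptions made for this example and do not appear in the patent. The handler keeps the names of the elements that have been opened but not yet closed, which is the role the description later assigns to labelStack.

```python
import xml.sax


class OpenElementTracker(xml.sax.ContentHandler):
    """Tracks elements whose start tag was seen but whose end tag was not."""

    def __init__(self):
        super().__init__()
        self.label_stack = []  # plays the role of labelStack in the description
        self.snapshots = []    # copy of the stack taken after every start tag

    def startElement(self, name, attrs):
        # Called by the parser for every start tag (and empty-element tag).
        self.label_stack.append(name)
        self.snapshots.append(list(self.label_stack))

    def endElement(self, name):
        # Called for every end tag; the innermost open element is closed.
        self.label_stack.pop()


handler = OpenElementTracker()
# The sample document is the one used in the operation example (FIG. 10).
xml.sax.parseString(b"<a><b><c/><d>test</d></b><d/></a>", handler)
```

After the start tag of the first `d` element, the stack holds `a`, `b`, `d`, which is the list of open elements that would be replayed if an XPath were added at that moment.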
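[0030]
The inquiry parsing module 12 parses (analyzes) the personal profile to be added (search conditions written in an XML query language) and separates it into a "data conversion operation" and a data extraction operation, namely the "XPath" (XPath group, individual XPaths). The XPath is output to the data extraction module 13 and the data conversion operation is output to the data conversion module 16. XPath (XML Path Language) is a language for pointing to a specific part of XML data. With XPath, any position in the data can be addressed even if no anchors or the like are embedded in the XML data.
[0031]
The data extraction module 13 converts each XPath received from the inquiry parsing module 12 into an NFA (generates an NFA) and registers it in, or adds it to, the storage device 14. The data extraction module 13 also generates and updates the DFA step by step from the NFA in response to the SAX events received from the XML parsing module 11. Using the DFA, the data extraction module 13 then outputs the partial XML extracted from the input XML data to the data conversion module 16 as internal-format XML data (filtered internal-format XML data).
[0032]
Incidentally, automaton is a general term for models that mathematically represent a computing mechanism such as a computer; an automaton has inputs, outputs, and states. A DFA (deterministic automaton) is an automaton in which each input leads to exactly one destination state. An NFA (nondeterministic automaton), by contrast, is an automaton in which, in some state, an input may lead to more than one destination state.
[0033]
The detailed functions of the automaton management unit 15, which comprises the data extraction module 13 and the storage device 14, are described later with reference to the drawings.
[0034]
The data conversion module 16 applies a predetermined conversion to the extracted internal-format XML data according to the data conversion operation and outputs the result as filtered XML data (converted XML data). The present invention does not restrict the predetermined conversion.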
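[0035]
Next, the automaton management unit 15 is described in detail.
FIG. 3 shows the in-memory data organization of the automaton management unit 15 of FIG. 2. Details of the DFA are shown in FIG. 5.
[0036]
As shown in FIG. 3, one NFA is generated for each individual XPath. The NFAs generated in this way are joined by a single root node, and that root is connected to each individual NFA by an epsilon edge (the combined NFA). An epsilon edge is an edge labeled with the "empty string" as commonly defined for automata, that is, an edge that is followed without consuming input.
The DFA is generated and updated one required state at a time from the combined (integrated) NFA as XML data is input.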
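The effect of the epsilon edges can be sketched as follows. This is a generic epsilon-closure routine, not code from the patent; the state numbers follow the operation example later in the description (combined root 0, NFA start states 1 and 4, and start state 7 for an XPath added later), and the names `EPS` and `epsilon_closure` are assumptions of this sketch.

```python
# Epsilon edges of the combined NFA: root state 0 points at the start state
# of every per-XPath NFA (state numbers taken from the operation example).
EPS = {0: [1, 4]}


def epsilon_closure(start, eps):
    """Return every state reachable from `start` using epsilon edges only."""
    seen, todo = [start], [start]
    while todo:
        state = todo.pop()
        for nxt in eps.get(state, []):
            if nxt not in seen:
                seen.append(nxt)
                todo.append(nxt)
    return sorted(seen)


initial = epsilon_closure(0, EPS)   # NFA state group of the DFA root state

EPS[0].append(7)                    # a third XPath is added: one more epsilon edge
extended = epsilon_closure(0, EPS)
```

Adding an XPath only appends one epsilon edge to the combined root, which is why the existing NFAs and the DFA states built from them can be kept and extended rather than rebuilt.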
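[0037]
FIG. 4 shows the concrete data structures of FIG. 3 as they appear in the program. In FIG. 4, the Variable class (class Variable) is instantiated once per individual XPath and has the following attributes.
* "id", an internal identifier that differs for each instance
* "xpath", the individual XPath expression
* "varName", a name by which the user distinguishes the Variable
An instance is, for example in object-oriented programming, an object actually created from a class definition used as a template.
[0038]
The State class (class State) represents a state of the NFA and has the following attributes.
* "edges", the set of edges representing transitions from this state to other states
* "type", indicating whether this state is terminal or non-terminal. When an XPath expression is converted into an NFA, the last state is terminal and all other states are non-terminal.
* "var", indicating from which Variable instance (that is, from which xpath) this state was generated
[0039]
The DFAState class (class DFAState) represents a state of the DFA, is defined by inheriting from the State class in the object-oriented sense, and has the following attributes.
* "states", the group of NFA states needed when generating the DFA from the NFA
* "varId", the largest Variable id taken into account so far, indicating up to which Variable instance's XPath this state was generated; it exists to speed up XPath addition
This varId corresponds to the "state flag" of the claims and expresses, for each state of the DFA, up to which XPath (by count) the state has been updated.
edges is inherited from the State class, but to speed up processing it may be redefined with a hash or map structure instead of a list.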
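[0040]
The Edge class (class Edge) represents an edge between NFA states (State) or between DFA states and has the following attributes.
* "to", the state this edge leads to
* "from", the state this edge starts from
[0041]
In addition, the following global variables are used.
* "gMaxVarId", the id of the most recently generated Variable instance, used to issue ids for Variable instances
* "stateStack", the list of states on the path from the root state of the automaton to the current state. When an endElement SAX event is processed, the state transition of the automaton has to be undone, so this path must be maintained.
* "labelStack", the list of names of elements that are not yet complete as of the SAX event currently input (that is, startElement has been input but endElement has not). This embodiment keeps the list in order to add XPaths incrementally.
* "root", the root state of the DFA
* "current", the current state of the DFA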
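The classes and global variables listed above could be written down roughly as follows. This is only a sketch of FIG. 4 as described in the text: the attribute names follow the description (adapted to Python naming), while the `label` field on `Edge`, the default values, and the use of `dataclass` are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Variable:
    id: int          # internal identifier, one per registered XPath
    xpath: str       # the XPath expression itself
    var_name: str    # user-visible name of the variable


@dataclass(eq=False)
class State:         # a state of the NFA
    type: str = "non-terminal"          # "terminal" for the last state of an XPath
    var: Optional[Variable] = None      # XPath this state was generated from
    edges: list = field(default_factory=list)


@dataclass(eq=False)
class DFAState(State):                  # a state of the DFA, inherits from State
    states: list = field(default_factory=list)  # NFA states it stands for
    var_id: int = 0                     # state flag: newest XPath already reflected


@dataclass(eq=False)
class Edge:
    label: str                          # assumed: tag name carried by the edge
    from_state: State
    to: Optional[State] = None          # None while the target DFAState is not built


# Global variables named in the description.
g_max_var_id = 2                        # two XPaths registered initially
root = DFAState(var_id=g_max_var_id)
root.edges.append(Edge("a", root))      # edge whose target is generated lazily
current = root
state_stack, label_stack = [], []
```

`eq=False` keeps identity-based comparison, which avoids recursive comparisons between states and edges that reference each other.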
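[0042]
FIG. 5 illustrates the in-memory data structure of the DFA of FIG. 4. Unlike FIG. 4, however, FIG. 5 uses a map structure rather than a list to implement "edges".
[0043]
The DFAState on the left of FIG. 5 is an instance of the DFAState class and represents one state of the DFA. In the situation shown, its attributes states, edges, type, var, and varId each have a value.
Here, states holds the group of NFA states. edges holds pairs of an edge label and a reference to an Edge instance. type is "non-terminal" because the state is non-terminal. var is a reference to a Variable instance. In this example varId is 3, meaning that this DFAState reflects the states up to the third Variable (XPath).
[0044]
The Edge instances are in the middle of FIG. 5 (center left).
The DFAState instances that the Edges lead to are at the center right of FIG. 5. As FIG. 5 shows, the destination of an Edge is allowed to be Null (the DFAState has not been generated yet).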
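[0045]
[Processing flow]
FIGS. 6 to 9 illustrate the processing flow of this embodiment. The invention is described more concretely below with reference to these flows.
[0046]
FIG. 6 is the overall processing flow of the XPath processing method.
In step S11, at the top, gMaxVarId, stateStack, and labelStack are initialized. gMaxVarId is set to the number of XPaths. In the next step, S12, NFAs are generated from the initially registered individual XPaths (the XPath group) and registered (updated) in the automaton management unit 15 (storage device 14).
[0047]
In step S13, the generated NFAs are integrated into a combined NFA (see the middle part of FIG. 3). The root state of the DFA is also generated and set in rootState and current. These settings are stored in the storage device 14 of the automaton management unit 15.
[0048]
The next conditional branch determines whether an XPath addition has been requested. Specifically, if there is an addition request in step S14, the addition is accepted in step S15 and the "DFA update 1" subroutine is executed in step S16. "DFA update 1" is the processing shown in FIG. 7 and is described later. If there is no addition request in step S14, processing moves to step S17.
[0049]
The conditional branch in step S17 waits for a SAX event from the input XML data. When a SAX event is present, step S18 determines whether it is an endEvent or a startEvent. If it is a startEvent (startElement, startDocument), step S19 determines whether the DFA already has a destination state for the transition corresponding to that SAX event. If the destination state has not been generated (Null; No), step S20 generates the destination DFA state by referring to the NFA. Step S20 also sets varId, the state flag, to gMaxVarId. varId thereby expresses, for each state of the DFA, up to which XPath (by count) the state has been updated. These changes are reflected in the automaton management unit 15. Processing then moves to step S23.
[0050]
If, in step S19, the DFA already has the destination state (already generated; Yes), step S21 determines whether the varId of the destination state equals gMaxVarId.
[0051]
If they are not equal in step S21 (No), "DFA update 2" is executed in step S22 before moving to step S23. "DFA update 2" is the processing shown in FIG. 8 and is described later. If varId and gMaxVarId are equal in step S21 (Yes), processing moves directly to step S23.
[0052]
In step S23, current is pushed onto stateStack (one entry added), the tag name of the startEvent is pushed onto labelStack (one entry added), and current is set to the state after the transition. A SAX event is output as needed. Processing then moves to step S25, which outputs the result (internal-format XML data).
[0053]
If the SAX event in step S18 is an endEvent (endElement, endDocument), step S24 pops stateStack, sets the result in current, and pops labelStack. A SAX event is output as needed. Popping means removing the most recently added entry. After step S24, processing moves to step S25 and the result is output (internal-format XML data).
[0054]
Step S26 determines whether processing is finished. If not (No), processing returns to step S14 and continues. If so (Yes), processing moves to End and terminates.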
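[0055]
FIG. 7 is the processing flow of the subroutine "DFA update 1" in step S16 of FIG. 6.
Because one XPath is being added (a new XPath), as shown in the flowchart of FIG. 6, the first step, S161, increments gMaxVarId by 1. The next step, S162, generates an NFA from the newly added XPath and adds (inserts) it into the combined NFA of the automaton management unit 15.
[0056]
The next step, S163, initializes the loop variable I used below (I = 0). It also sets the root state of the generated NFA in the temporary variable ustates, adds ustates to the states of rootState, adds the edge or edges leaving ustates to the edges of rootState, sets the varId of rootState to gMaxVarId, and sets current to rootState.
[0057]
The decision in step S164 checks whether the loop variable I is less than labelStack.size. If it is (Yes), step S165 takes the I-th element from labelStack and raises a start event (startElement, startDocument).
[0058]
The decision in step S166 compares the varId of the destination state reached by the raised SAX event with gMaxVarId. If the two are equal, processing proceeds to step S168. If they are not equal, step S167 updates the destination state (states, edges, varId) as follows.
[0059]
In step S167, the result of transit(states, varId of the destination, input SAX event) is set in the temporary variable newStates. The transit function is described with reference to FIG. 9. As a result of S167, the current state of the DFA is updated.
[0060]
Specifically, newStates is added to the states of the destination. The result of getEdges(newStates) is added to the edges of the destination. The varId of the destination is set to gMaxVarId, which means that the destination state is fully updated. getEdges is a function that extracts all edges connected to the states given in its parameter newStates.
[0061]
The next step, S168, pushes current onto stateStack, pushes the tag name onto labelStack, and sets current to the state after the transition.
[0062]
The next step, S169, increments the loop variable I and returns to the loop test in step S164. If, in step S164, the loop variable I is greater than or equal to labelStack.size (No), the subroutine terminates and returns to step S16 of FIG. 6 (Return, step S170).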
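[0063]
FIG. 8 is the flow of the subroutine "DFA update 2" in step S22 of FIG. 6.
The processing of FIG. 8 updates the states, edges, and varId of the destination state. That is, the result of transit(states, varId of the destination, input SAX event) is set in the temporary variable newStates. The transit function is described with reference to FIG. 9. As a result of "DFA update 2", the current state of the DFA is updated.
[0064]
Specifically, "DFA update 2" adds newStates to the states of the destination and removes duplicate entries from those states. It adds the result of getEdges(newStates) to the edges of the destination. It sets the varId of the destination to gMaxVarId, which means that the destination state is fully updated. getEdges is a function that extracts all edges connected to the states given in its parameter newStates.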
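[0065]
FIG. 9 is the flow of the function transit used in FIGS. 7 and 8.
The inputs (parameters) in step S41 of the flow of FIG. 9 are three: states (a group of NFA states), varId (an id attribute value of a Variable instance), and the input SAX event.
[0066]
The next step, S42, initializes the loop variable I (I = 0) and the result storage area rstates. The decision in step S43 checks whether the loop variable I is less than states.size. If I is less than states.size (Yes), step S44 takes the I-th element from states, sets it in the temporary variable state, and sets its attribute var in the temporary variable vvar.
[0067]
The decision in step S45 checks whether the id of vvar is greater than the destination varId. If it is (Yes), step S46 adds to rstates every NFA state that can be reached from state by the input SAX event, and processing proceeds to step S47. If the id of vvar is less than or equal to the destination varId (No), processing proceeds directly to step S47.
[0068]
Step S47 increments the loop variable I and returns to the loop test in step S43. If, in step S43, the loop variable I is not less than states.size (No), the transit function ends and returns rstates. This return value is what is assigned to newStates in FIGS. 7 and 8.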
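The transit function of FIG. 9 can be sketched in a few lines. The table `NFA` below is a reconstruction, from the operation example in the text, of the NFA for "/a/b" (states 1 to 3), "/a//c" (states 4 to 6), and the later added "/a//d" (states 7 to 9); the table layout, the `"*"` wildcard convention, and the omission of the combined root state 0 are assumptions of this sketch.

```python
# state -> (id of the XPath that produced it, [(edge label, next state), ...])
# "*" stands for an edge that accepts any tag (used for the "//" step).
NFA = {
    1: (1, [("a", 2)]), 2: (1, [("b", 3)]), 3: (1, []),
    4: (2, [("a", 5)]), 5: (2, [("*", 5), ("c", 6)]), 6: (2, []),
    7: (3, [("a", 8)]), 8: (3, [("*", 8), ("d", 9)]), 9: (3, []),
}


def transit(states, var_id, label):
    """Follow `label` from the NFA states whose XPath id exceeds `var_id`."""
    rstates = []                                   # S42: result area
    for state in states:                           # S43, S44, S47: loop over states
        vvar_id, edges = NFA[state]
        if vvar_id > var_id:                       # S45: only newer XPaths count
            for edge_label, nxt in edges:          # S46: collect reachable states
                if edge_label in (label, "*") and nxt not in rstates:
                    rstates.append(nxt)
    return rstates


# Destination states that already reflect XPaths 1 and 2 need only XPath 3.
after_a = transit([1, 4, 7], 2, "a")
after_b = transit([2, 5, 8], 2, "b")
after_d = transit([3, 5, 8], 2, "d")
# With var_id 0 nothing is skipped, as when a state is generated from scratch.
fresh_a = transit([1, 4, 7], 0, "a")
```

Filtering on the XPath id is what makes the update a difference: states contributed by XPaths that the destination already reflects are not recomputed.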
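[0069]
[State of the automaton management unit during operation]
Next, an operation example of the XPath processing method of this embodiment is described more concretely.
FIGS. 10 to 14 are used to explain the operation example. FIG. 10 shows the points in time of the operation, and FIGS. 11 to 14 illustrate how the inside of the automaton management unit changes at each of those points.
[0070]
FIG. 10 shows the following four points in time, 1. to 4., in the operation of the XPath processing method of this embodiment.
1. The point at which two XPaths, "/a/b" and "/a//c", have been registered
2. The point at which XML has then been input partway (<a><b><c/><d>)
3. The point at which the XPath "/a//d" has been added
4. The point at which XML input has then continued (test</d></b><d/></a>)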
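[0071]
FIG. 11 illustrates the inside of the automaton management unit at point 1. of FIG. 10. As the upper part of FIG. 11 shows, two XPaths, "/a/b" and "/a//c", are registered. As described above, the XPaths are separated and generated from the personal profile by the inquiry parsing module 12 (see FIG. 2).
[0072]
The NFA in the middle part of FIG. 11 is the combined NFA, in which the two XPaths of the upper part have been converted and connected to a root node. The middle part shows seven states and six edges connecting them. A double circle denotes a terminal state and a single circle a non-terminal state. The arrows are edges, each labeled with the tag it accepts. ε denotes an epsilon transition, * denotes any tag, and the others (a, b, c) are the tag names themselves.
[0073]
In the DFA in the lower part of FIG. 11, the root node has been generated and its NFA state group is set to {0, 1, 4}. This is the group of NFA states reachable from the NFA root state 0 by epsilon transitions. Because the only edge leaving {0, 1, 4} is a, an edge a has been generated, and its destination is null. The "2" shown above the DFA state is varId, the state flag. Here varId is 2 because, as the upper part shows, two XPaths are registered.
[0074]
FIG. 12 illustrates the inside of the automaton management unit at point 2. of FIG. 10. The difference from FIG. 11 lies in the DFA in the lower part: the DFA states and edges required by the XML input <a><b><c/><d>, shown at 2. of FIG. 10, have been generated.
[0075]
From the NFA state group {0, 1, 4} of the DFA root state, the XML input <a> leads to the NFA state group {2, 5}, so a DFA state is generated for it. Because the edges leaving {2, 5} are {b, c, *}, the edges {b, c, other} are generated and attached to the DFA state. Their destinations are initially null.
[0076]
DFA states and edges are generated in the same way, as needed, for the XML input <b><c/><d>. Specifically, the input <b> leads to the NFA states {3, 5}, so a DFA state is generated. Because the edges leaving {3, 5} are {c, *}, the edges {c, other} are generated and attached to the DFA state. Their destinations are initially null. The inputs <c/> and <d> are then processed in turn.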
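[0077]
FIG. 13 illustrates the inside of the automaton management unit at point 3. of FIG. 10. In the upper part of FIG. 13, one XPath, "/a//d", has been added, as shown at 3. of FIG. 10. In the middle part of FIG. 13, the NFA corresponding to the added XPath "/a//d" has been added, namely states 7, 8, and 9 and their edges.
[0078]
The DFA in the lower part of FIG. 13 is updated according to the processing flows of FIGS. 6 to 9; that is, the information of the added XPath is incrementally added to the DFA generated so far.
[0079]
First, the root state 7 of the NFA of the added XPath is added to the states of the DFA root state. Because the XPath being processed is the third one, varId, the state flag, is incremented to 3.
[0080]
At this point labelStack holds {a, b, d}, so SAX events are generated in the order a, b, d and the DFA is advanced from its root state. First, a updates the next state and moves to it: 8 is added to its states, giving {2, 5, 8}, and an edge d is added.
Next, b updates the next state and moves to it: 8 is added to its states, giving {3, 5, 8}, and an edge d is added.
[0081]
Next, d updates the next state and moves to it. As the lower part of FIG. 13 shows, 8 and 9 are added to its states, giving {5, 8, 9}. The varId of every DFA state updated in this way is incremented to 3.
[0082]
FIG. 14 illustrates the inside of the automaton management unit at point 4. of FIG. 10. Compared with FIG. 13, which shows the preceding situation, one edge d has been added to the DFA.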
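The operation example of FIGS. 10 to 14 can be replayed with the following end-to-end sketch. It is a simplified illustration under several assumptions, not the patent's implementation: only XPaths made of `/name` and `//name` steps are supported, the combined root and its epsilon edges are represented by placing the NFA start states directly in the DFA root, DFA edges are keyed by the tag actually seen (there is no separate `other` edge), and the two stacks are rebuilt during the replay. All class and function names are made up for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass(eq=False)
class NState:                       # one NFA state
    var_id: int                     # id of the XPath that produced this state
    terminal: bool = False
    edges: list = field(default_factory=list)   # (label, NState); "*" = any tag


@dataclass(eq=False)
class DState:                       # one DFA state
    states: list                    # NFA states it stands for
    var_id: int                     # state flag: newest XPath already reflected
    edges: dict = field(default_factory=dict)   # tag name -> DState, built lazily


def compile_xpath(xpath, var_id):
    """Turn e.g. '/a//c' into a chain of NFA states; return its start state."""
    start = cur = NState(var_id)
    for axis, name in re.findall(r"(//?)([^/]+)", xpath):
        if axis == "//":
            cur.edges.append(("*", cur))        # descendant step: loop on any tag
        nxt = NState(var_id)
        cur.edges.append((name, nxt))
        cur = nxt
    cur.terminal = True
    return start


def transit(states, var_id, label):
    """NFA states reached on `label` from states of XPaths newer than var_id."""
    out = []
    for s in states:
        if s.var_id > var_id:
            for lab, target in s.edges:
                if lab in (label, "*") and target not in out:
                    out.append(target)
    return out


class FilterEngine:
    def __init__(self, xpaths):
        self.g_max_var_id = 0
        starts = []
        for xp in xpaths:
            self.g_max_var_id += 1
            starts.append(compile_xpath(xp, self.g_max_var_id))
        self.root = self.current = DState(starts, self.g_max_var_id)
        self.state_stack, self.label_stack = [], []

    def start_element(self, label):
        nxt = self.current.edges.get(label)
        if nxt is None:                                  # S20: generate lazily
            nxt = DState(transit(self.current.states, 0, label), self.g_max_var_id)
            self.current.edges[label] = nxt
        elif nxt.var_id < self.g_max_var_id:             # S22 / S167: stale state
            for s in transit(self.current.states, nxt.var_id, label):
                if s not in nxt.states:
                    nxt.states.append(s)
            nxt.var_id = self.g_max_var_id
        self.state_stack.append(self.current)            # S23
        self.label_stack.append(label)
        self.current = nxt

    def end_element(self):                               # S24
        self.current = self.state_stack.pop()
        self.label_stack.pop()

    def add_xpath(self, xpath):                          # "DFA update 1"
        self.g_max_var_id += 1
        self.root.states.append(compile_xpath(xpath, self.g_max_var_id))
        self.root.var_id = self.g_max_var_id
        open_tags = self.label_stack
        self.state_stack, self.label_stack = [], []      # rebuilt by the replay
        self.current = self.root
        for label in open_tags:                          # replay open start tags
            self.start_element(label)

    def matched(self):
        """Ids of the XPaths that accept the current element."""
        return sorted(s.var_id for s in self.current.states if s.terminal)


engine = FilterEngine(["/a/b", "/a//c"])   # point 1 of FIG. 10
for tag in ("a", "b", "c"):
    engine.start_element(tag)
engine.end_element()                       # <c/> closes immediately
engine.start_element("d")                  # point 2: <a><b><c/><d> has been read
engine.add_xpath("/a//d")                  # point 3: third XPath arrives mid-document
```

In this sketch the DFA state reached by `c` below `b` is not on the replayed path, so it keeps its old state flag and would only be brought up to date if the input entered it again. That is the lazy part of the scheme, corresponding to "DFA update 2".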
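[0083]
In the XPath processing method described above, when an XPath is added, it is converted into an NFA, and an epsilon edge is set from the root node of the existing NFA to the root node of the new NFA. Then all start tags from the XML root tag to the tag currently being processed, which have been retained in advance, are input in order to the DFA, so that the DFA states are updated one after another. In other words, the information of the added XPath is incrementally added to the existing DFA. Because a single DFA is updated, memory space is used more efficiently than in the existing technique, in which multiple DFAs end up being generated. Taking FIG. 12 as an example, in the conventional technique an added XPath "/a//d" would cause a new DFA corresponding to it to be generated separately from the existing DFA, straining memory space; in the present invention no such duplicate DFA is generated.
[0084]
Processing uses the state flag varId, which expresses for each state of the DFA up to which XPath (by count) the state has been updated, so XPath addition (incremental addition of the added XPath's information to the existing DFA) can be performed reliably and quickly.
[0085]
Further, as shown in FIGS. 6, 7, 13, 14, and elsewhere, when a state transition is made in the DFA, the state flag varId of each state in the current DFA is checked; if a state has not been updated on the basis of the latest XPath, varId is used to identify the XPaths added after the state was last updated, and the DFA is updated using the NFA states corresponding to the identified XPaths. Memory space is therefore used effectively, and XPath addition can be performed reliably and quickly.
[0086]
This embodiment also keeps labelStack, the list of names of elements that are not yet complete as of the SAX event currently input (startElement has been input but endElement has not). This, too, allows XPath addition to be performed reliably and quickly.
[0087]
The present invention is not limited to the embodiment described above and can be modified in many ways. For example, NewsML is only one example, and the XML data is not limited to NewsML; it may be NITF (News Industry Text Format) data or the like. The filter engine 10 may be installed, for example, by an ASP (Application Service Provider) for corporate or individual users, or by a company or organization for its own employees or members. The means and technique for incrementally adding XPath information to the DFA shown in the embodiment are only examples, and the present invention is not limited to them. The flows described above may be transmitted over a network as an XPath processing program or stored on a storage medium such as a CD-ROM and distributed.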
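[0088]
[Effects of the invention]
The present invention described above provides excellent effects such as the following.
Because the existing DFA is reused, the memory space of the computer can be used efficiently. XPath information can be added incrementally to the DFA reliably, which makes memory use more efficient still. The parts to be added (updated) can be identified and the incremental addition performed on them reliably and without waste, which again makes memory use more efficient.
[0089]
Further, according to the present invention, the information of the additional XPath only needs to be added incrementally to the existing DFA, so there is no need to generate multiple DFAs. The memory space of the computer can therefore be used effectively. The incremental addition can also be performed more reliably, which makes memory use more efficient still.
[0090]
Further, the XPath processing program of the present invention and the storage medium storing it cause a computer on which the program is installed to execute each step defined by the program and to add XPath information incrementally. The memory space of the computer can therefore be used effectively.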
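[Brief description of the drawings]
[FIG. 1] An outline of a filter engine to which the XPath processing method according to the embodiment of the present invention is applied.
[FIG. 2] A block diagram showing the internal configuration of the filter engine of FIG. 1.
[FIG. 3] The in-memory data organization of the automaton management unit of FIG. 2.
[FIG. 4] The concrete data structures of FIG. 3 as they appear in the program.
[FIG. 5] The in-memory data organization of the DFA of FIG. 4.
[FIG. 6] The overall processing flow of the XPath processing method.
[FIG. 7] The processing flow of the subroutine DFA update 1 of FIG. 6.
[FIG. 8] The processing flow of the subroutine DFA update 2 of FIG. 6.
[FIG. 9] The flow of the function transit used in FIGS. 7 and 8.
[FIG. 10] The points in time of the operation of the XPath processing method according to the embodiment of the present invention.
[FIG. 11] The inside of the automaton management unit at point 1. of FIG. 10.
[FIG. 12] The inside of the automaton management unit at point 2. of FIG. 10.
[FIG. 13] The inside of the automaton management unit at point 3. of FIG. 10.
[FIG. 14] The inside of the automaton management unit at point 4. of FIG. 10.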
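[Explanation of reference signs]
10 ... Filter engine (XPath processing device)
11 ... XML parsing module
12 ... Inquiry parsing module
13 ... Data extraction module
14 ... Storage device
15 ... Automaton management unit
16 ... Data conversion module
[0001]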
BACKGROUND OF THE INVENTION
The present invention relates to XPath processing using automata, and more particularly to an XPath processing method, an XPath processing device, an XPath processing program, and a storage medium storing the program, each of which adds XPaths incrementally (as a difference).
[0002]
[Prior art]
For example, NewsML is a news distribution format based on XML (eXtensible Markup Language). NewsML can freely combine news materials such as news articles, images, video, and audio, and can send information to various devices such as websites, mobile phones, and televisions (TV data broadcasts). A recipient (user) of NewsML can obtain the necessary information by registering search conditions in the filter engine. Since the search conditions are described in an XML query language, the filter engine processes the XPath group (the individual XPaths) and also has to handle the addition of XPaths.
[0003]
[Problems to be solved by the invention]
In the conventional XPath addition method, a deterministic automaton for processing the XPath is generated separately every time an XPath is added. For this reason, the number of deterministic automata increases with every added XPath, and as a result the memory space of the computer constituting the filter engine comes under pressure. The problem becomes especially conspicuous when many XPaths are added.
[0004]
Accordingly, the main object of the present invention is to provide an XPath processing method, an XPath processing device, an XPath processing program, and a storage medium storing the program that can use the memory space effectively when XPaths are added.
[0005]
[Means for Solving the Problems]
In view of the above problems, the present inventors conducted intensive research, noted that the memory space can be used effectively by adding XPath information to the existing DFA as a difference, and thereby completed the present invention.
[0006]
 [XPath processing method]
 That is, the present invention (claim 1) that solves the above-described problem is an XPath processing method in an XPath processing device that processes XPaths using automata, in which the XPath processing device performs: a step of generating a nondeterministic automaton (hereinafter referred to as "NFA") from each individual XPath; a step of integrating the generated nondeterministic automata to generate a combined nondeterministic automaton and registering it in a storage device; a step of generating SAX events from input XML data; and a step of generating a deterministic automaton (hereinafter referred to as "DFA") using the combined nondeterministic automaton according to the generated SAX events and registering it in the storage device. The method is characterized in that, when an XPath is added, the XPath to be added is converted into a nondeterministic automaton, the nondeterministic automaton is updated to reflect the added XPath, and then all the start tags from the XML root tag to the tag currently being processed, which have been retained in advance, are input in order to the deterministic automaton, whereby the XPath information is added as a difference.
[0007]
 According to this configuration, only the difference, namely the information of the XPath to be added, is added to the DFA generated so far (the existing DFA), so it is not necessary to generate a new DFA separately. For this reason, the memory space can be used effectively. In the embodiment of the invention described later, the XPath (individual XPaths, XPath group) is generated by the inquiry parsing module of FIG. 2. The NFA, the combined NFA, and the DFA are generated by the automaton management unit (data extraction module) in FIG. 2. Whether an XPath addition has been requested is determined in step S14 of the flow of FIG. 6. The addition of the XPath information to the DFA as a difference corresponds to the processing performed in step S16 (DFA update 1) and step S22 (DFA update 2) in the flow of FIG. 6. In this configuration, for example, as in the flow (DFA update 1) shown in FIGS. 6 and 7 of the embodiment described later, the information of the XPath to be added is added to the existing DFA as a difference. In the embodiment described later, labelStack holds the list of names of elements that are not yet completed (still being processed) as of the SAX event currently input (a tag marks the position of an element). root represents the root state of the DFA, and current represents the current state of the DFA. Incidentally, the "order" in "start tags in order" may be ascending order, descending order, random order (no order is specified), and the like.
[0010]
 Further, the present invention (claim 2) is characterized in that, in the configuration of claim 1, the information of the XPath to be added is added as a difference on the basis of a state flag that indicates, for each state in the DFA, up to which XPath (by count) the state has been updated.
[0011]
According to this configuration, the state flag can reliably update the portion to be updated without waste. In the embodiment described later, the status flag corresponds to varId.
[0014]
 Further, the present invention (claim 3) is characterized in that, in the configuration of claim 1 or claim 2, when a state transition is made in the DFA, the state flag of each state in the current DFA is checked; if a state has not been updated on the basis of the latest XPath, the state flag is used to identify the XPaths added after that state was updated, and the deterministic automaton is updated using the NFA states corresponding to the identified XPaths.
[0015]
In this configuration, for example, an XPath is added as shown in FIG.
[0016]
[XPath processing equipment]
 Further, the XPath processing device of the present invention (claim 4) is characterized by comprising: means for generating SAX events from input XML data; means for converting individual XPaths based on user search conditions into NFAs, integrating the generated NFAs to generate a combined NFA, and registering it in a storage device; means for generating a DFA using the registered combined NFA in response to the SAX events and registering it in the storage device; and means for, when an XPath addition is requested, updating the NFA to reflect the added XPath and then inputting all the start tags from the XML root tag to the tag currently being processed, which have been retained in advance, in order to the NFA, thereby adding the XPath information as a difference.
[0017]
 According to this configuration, the XPath information only needs to be added to the existing DFA as a difference, so there is no need to generate a plurality of DFAs. Therefore, the memory space can be used effectively. In the embodiment described later, the filter engine corresponds to the XPath processing device. The XML parsing module corresponds to the means for generating SAX events. The inquiry parsing module corresponds to the means for generating the individual XPaths and the XPaths to be added. In addition, the automaton management unit corresponds to the means that converts XPaths to NFAs, generates the combined NFA, generates the DFA, and adds the XPath information to the DFA as a difference. Further, in this configuration, for example, as in the flow (DFA update 1) shown in FIGS. 6 and 7 of the embodiment described later, the information of the XPath to be added is added to the existing DFA as a difference.
[0020]
 [XPath processing program]
 Further, the XPath processing program of the present invention (claim 5) is characterized by causing an XPath processing device, which is a computer, to execute the XPath processing method according to any one of claims 1 to 3 in order to process XPaths using automata.
[0021]
According to this configuration, the computer in which this program is installed is caused to execute each step based on the program, and XPath processing (DFA differential update) is performed.
[0022]
 [Storage medium storing XPath processing program]
 The storage medium of the present invention storing the XPath processing program (claim 6) stores the XPath processing program according to claim 5.
[0023]
This XPath processing program is copied and recorded on a recording medium such as a CD-ROM and distributed in the market, or transmitted over a network, for example. Then, the computer in which this program is installed is caused to execute XPath processing (DFA differential update).
[0024]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment (embodiment) of the XPath processing method of the present invention will be described in detail with reference to the drawings. Note that the XPath processing method described below also embodies an XPath processing apparatus and an XPath processing program.
[0025]
FIG. 1 is a diagram showing an outline of a filter engine to which the XPath processing method is applied.
XML data generated according to the XML format by a data provider (not shown) is transmitted to the filter engine 10 that executes the XPath processing method via a network such as an intranet. In the filter engine 10, each user who receives XML data registers in advance the data condition (personal profile) he wants in the filter engine 10 in the form of an XML query as in the conventional example.
[0026]
The filter engine 10 filters and converts XML data such as a news source sent to it according to the registered personal profiles, and distributes the result to individual users as XML data. A specific example of XML data such as a news source is NewsML. As mentioned above, NewsML is a news distribution format based on XML, which can freely combine news materials such as news articles, images, video, and audio, and can send information to various devices such as websites and mobile phones. It is also suitable for structuring and centrally managing various news materials such as news articles, images, video, and audio.
[0027]
The filter engine 10 also includes an XPath processing device and an XPath processing program. Incidentally, XML is a metalanguage recommended by the World Wide Web Consortium (W3C) as an Internet standard. A metalanguage is a language for defining languages. XML data (also referred to as an XML document) is a document or data created using a language defined with XML.
[0028]
FIG. 2 is a block diagram showing an internal configuration of the filter engine 10 of FIG. 1. As shown in FIG. 2, the filter engine 10 includes an XML parsing module 11, an inquiry parsing module 12, a data extraction module 13, a storage device 14, an automaton management unit 15, and a data conversion module 16. It is assumed that the automaton management unit 15 is made up of the data extraction module 13 and the storage device 14. Incidentally, the filter engine 10 consists of a router and a computer that has a main control device composed of a CPU and a RAM, an external storage device composed of a hard disk or the like, and a NIC (Network Interface Card) for communication.
[0029]
The XML parsing module (XML parser) 11 parses input XML data, converts it into internal format XML data (SAX events), and outputs it to the data extraction module 13. Note that parsing is analysis processing that reads XML data described in a text format and analyzes the document elements, attributes, and the like specified by XML tags (the present invention does not particularly limit the parsing procedure). Incidentally, the APIs (Application Programming Interfaces) for manipulating XML data through the XML parsing module 11 include two types of standard interfaces, DOM (Document Object Model) and SAX (Simple API for XML). In this embodiment, the XML parsing module 11 supports the latter, SAX. Note that the SAX-based XML parsing module 11 reads the XML data sequentially and invokes the registered handlers each time an XML tag (start tag, end tag, empty element tag) is detected. Here, a handler is a program that defines methods for processing each element of the XML data based on the SAX interface. Further, a tag is a character string described in the XML data to mark the position of an element and to hold its attributes.
[0030]
The inquiry parsing module 12 parses (analyzes) the personal profile to be added (search conditions described in the XML query language) and separates it into a "data conversion operation" and a data extraction operation, the "XPath" (XPath group, individual XPaths). The XPath is output to the data extraction module 13, and the data conversion operation is output to the data conversion module 16. XPath (XML Path Language) is a language for indicating a specific part of XML data. If XPath is used, an arbitrary position in the data can be indicated even if no anchor or the like is embedded in the XML data.
[0031]
The data extraction module 13 converts each XPath input from the inquiry parsing module 12 into NFA (generates NFA) and registers / adds it to the storage device 14. Further, the data extraction module 13 sequentially generates / updates DFA using NFA in accordance with the SAX event input from the XML parsing module 11. Then, the data extraction module 13 outputs the partial XML extracted from the input XML data to the data conversion module 16 as internal format XML data (filtered internal format XML data) using DFA.
[0032]
Incidentally, an automaton is a general term for a model that mathematically represents a calculation mechanism such as a computer, and has an input, an output, and a state. Among these, DFA (deterministic automaton) is an automaton in which a transition destination (transition destination) with respect to an input is determined to be one. On the other hand, an NFA (nondeterministic automaton) is an automaton in which a plurality of transition destinations (transition destinations) for an input exist in a certain state.
[0033]
The detailed functions of the automaton management unit 15 including the data extraction module 13 and the storage device 14 will be described in detail later with reference to the drawings.
[0034]
The data conversion module 16 performs a predetermined conversion from the data conversion operation and the extracted XML data in the internal format, and outputs the result as filtered XML data (converted XML data). The predetermined conversion is not particularly limited in the present invention.
[0035]
Next, the automaton management unit 15 will be described in detail.
FIG. 3 is a diagram showing the data structure in memory of the automaton management unit 15 of FIG. 2. Details of the DFA are shown in FIG. 5.
[0036]
As shown in FIG. 3, an NFA is generated for each XPath. The plurality of NFAs generated in this way are combined under a single root node, which is connected to each NFA by an epsilon edge (combined NFA). An epsilon edge is a transition on the "empty character string", as normally defined for automata.
The DFA is built from the combined (integrated) NFA, and its necessary states are generated / updated sequentially as the XML data is input.
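The combination of per-XPath NFAs under one root can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the tuple-based edge representation, the use of "" as the epsilon label, and the restriction to absolute paths made of "/name" and "//name" steps are all assumptions made here. The state numbering is chosen so that root 0 reaches states 1 and 4 by epsilon edges, as in the operation example described later.

```python
from collections import defaultdict

EPS = ""    # label used for epsilon edges (a convention of this sketch)
ANY = "*"   # label that matches any tag


def build_combined_nfa(xpaths):
    """Build one NFA per XPath and hang each off shared root state 0.

    Only absolute paths made of "/name" and "//name" steps are handled.
    Returns (edges, terminals); edges maps state -> [(label, target), ...].
    """
    edges = defaultdict(list)
    terminals = set()
    next_id = 1
    for xpath in xpaths:
        state = next_id
        next_id += 1
        edges[0].append((EPS, state))           # epsilon edge from the root
        descendant = False
        for part in xpath.split("/")[1:]:
            if part == "":                      # empty part marks a "//" step
                descendant = True
                continue
            if descendant:
                edges[state].append((ANY, state))   # self-loop: skip any depth
                descendant = False
            edges[state].append((part, next_id))
            state = next_id
            next_id += 1
        terminals.add(state)                    # last state of each XPath
    return dict(edges), terminals


def epsilon_closure(edges, states):
    """All states reachable from `states` using epsilon edges only."""
    closure, todo = set(states), list(states)
    while todo:
        for label, target in edges.get(todo.pop(), []):
            if label == EPS and target not in closure:
                closure.add(target)
                todo.append(target)
    return closure


edges, terminals = build_combined_nfa(["/a/b", "/a//c"])
print(sorted(epsilon_closure(edges, {0})))   # [0, 1, 4]
print(sorted(terminals))                     # [3, 6]
```

The epsilon closure of the root is what would become the NFA state group of the DFA root state.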
[0037]
FIG. 4 is a diagram showing the concrete program-level data structure of FIG. 3. In FIG. 4, the Variable class (class Variable) is a class that generates an instance for each XPath, and has the following attributes.
* "id", which is an internal identifier that is different for each instance
* "xpath", which is an individual XPath expression
* "varName", which is a name by which the user distinguishes the Variable
Note that an instance refers to an object actually created in object-oriented programming using a class definition as a template.
[0038]
The State class (class State) is a class that expresses a state of the NFA and has the following attributes.
* "edges", a set of edges representing transitions from this state to other states
* "type", indicating whether the state is a terminal or non-terminal state
* "var", indicating from which Variable instance (that is, from which xpath) the state was created
When an XPath expression is converted to an NFA, the last state is a terminal state and all other states are non-terminal states.
[0039]
The DFAState class (class DFAState) represents a state of the DFA. It is defined from the State class using object-oriented inheritance, and has the following attributes.
* "states", which expresses the NFA state group required when generating the DFA from the NFA
* "varId", which is used to speed up the XPath addition process and holds the maximum Variable id, indicating up to which Variable instance's XPath has been taken into account in generating this state
This varId corresponds to the "state flag" in the claims, and expresses, for each state in the DFA, up to which XPath's information the state has been updated.
Although "edges" is inherited from the State class, it can be redefined using a hash or map structure instead of a list in order to increase the processing speed.
[0040]
The Edge class (class Edge) represents an edge between NFA states or an edge between DFA states, and has the following attributes.
* "to", the state that is the destination of the edge
* "from", the state that is the source of the edge
[0041]
In addition, the following global variables are used.
* "gMaxVarId", which holds the id of the most recently created Variable instance and is used to issue Variable instance ids
* "stateStack", which holds the list of states on the path from the root state of the automaton to the current state; this path must be managed because processing an endElement SAX event requires undoing a state transition of the automaton
* "labelStack", which holds the list of names of elements not yet completed by the SAX events input so far (that is, elements for which startElement has been input but endElement has not); this list is held because this embodiment adds XPath differences
* "root", which expresses the root state of the DFA
* "current", which expresses the current state in the DFA
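The classes and global variables described above might be written as follows. This is only a sketch under stated assumptions: the "label" field of Edge is an assumed addition (the description lists only "to" and "from", with labels kept in the map of FIG. 5), "from" is spelled "from_" because it is a Python keyword, and grouping the global variables into a Context object is a choice made here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Variable:
    id: int          # internal identifier, unique per instance
    xpath: str       # the individual XPath expression
    varName: str     # user-visible name of the Variable


@dataclass(eq=False, repr=False)
class State:
    """One NFA state."""
    type: str = "non-terminal"                 # "terminal" or "non-terminal"
    var: Optional[Variable] = None             # Variable this state came from
    edges: List["Edge"] = field(default_factory=list)


@dataclass(eq=False, repr=False)
class Edge:
    label: str                                 # assumed field: tag name, "*" or ""
    from_: State                               # "from" is a Python keyword
    to: Optional[State] = None                 # None = destination not generated


@dataclass(eq=False, repr=False)
class DFAState(State):
    """One DFA state; edges is redefined as a label -> Edge map."""
    edges: Dict[str, Edge] = field(default_factory=dict)
    states: List[State] = field(default_factory=list)   # NFA state group
    varId: int = 0                             # state flag


@dataclass
class Context:
    """Groups the global variables (the grouping is a choice of this sketch)."""
    gMaxVarId: int = 0
    stateStack: List[DFAState] = field(default_factory=list)
    labelStack: List[str] = field(default_factory=list)
    root: Optional[DFAState] = None
    current: Optional[DFAState] = None


v = Variable(id=1, xpath="/a/b", varName="x")
start = State(var=v)
ctx = Context(gMaxVarId=1, root=DFAState(states=[start], varId=1))
ctx.current = ctx.root
ctx.root.edges["a"] = Edge(label="a", from_=ctx.root)   # destination still None
```

Keeping "to" optional mirrors the Null transition destinations that the DFA is allowed to hold until the corresponding input actually arrives.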
[0042]
FIG. 5 is a diagram illustrating the data structure in memory of the DFA of FIG. 4. In FIG. 5, however, unlike FIG. 4, a map structure is used instead of a list to realize "edges".
[0043]
The DFAState located on the left side of FIG. 5 is an instance of the DFAState class and expresses one state of the DFA. In FIG. 5, its attribute values states, edges, type, var, and varId are set.
Here, states represents an NFA state group. edges represents a set of pairs of an edge label and a reference to an Edge instance. type is "non-terminal" because the state is not a terminal state. var is a reference to a Variable instance. In this example, varId is 3, which means that this DFAState reflects the XPaths up to the third Variable.
[0044]
An Edge instance group is located in the middle left of FIG. 5.
A DFAState instance group corresponding to the transition destinations of the Edges is located in the middle right of FIG. 5. As shown in FIG. 5, the transition destination of an Edge is allowed to be Null (no DFAState has been generated).
[0045]
[Process flow]
FIGS. 6 to 9 are diagrams illustrating the processing flow in the present embodiment. Hereinafter, the present invention will be described more specifically with reference to these flows.
[0046]
FIG. 6 is an overall processing flow of the XPath processing method.
In the first step S11, gMaxVarId, stateStack, and labelStack are initialized. Note that gMaxVarId is set to the number of XPaths. In the next step S12, an NFA is generated from each initially registered XPath (XPath group) and registered (updated) in the automaton management unit 15 (storage device 14).
[0047]
In the process of the next step S13, the generated NFA groups are integrated to generate a combined NFA (see the middle diagram in FIG. 3). Also, the root state of the DFA is generated and set to rootState and current. These settings are stored in the storage device 14 of the automaton management unit 15.
[0048]
In the next selection condition process, it is determined whether there is an XPath addition request. Specifically, if there is an XPath addition request in step S14, the addition of XPath is accepted in step S15, and the "DFA update 1" subroutine processing is performed in step S16. “DFA update 1” is the processing shown in FIG. 7, but will be described later. On the other hand, if there is no request for adding an XPath in step S14, the process proceeds to step S17.
[0049]
In the branch of step S17, the process waits for a SAX event from the input XML data. When a SAX event arrives, it is determined in step S18 whether the SAX event is an endEvent or a startEvent. If the SAX event is a startEvent (startElement, startDocument), it is determined whether the transition destination state corresponding to the SAX event exists in the DFA (step S19). If the transition destination state has not been generated (Null; No), it is generated in step S20 with reference to the NFA. In step S20, varId, the state flag, is set to gMaxVarId. As a result, varId expresses up to which XPath's information the state in the DFA has been updated. These changes are reflected in the automaton management unit 15. Then, the process proceeds to step S23.
[0050]
On the other hand, if the transition destination state corresponding to the SAX event already exists in the DFA (generated; Yes) in step S19, it is determined whether the varId of the state after the transition is equal to gMaxVarId (step S21).
[0051]
If it is determined in step S21 that they are not equal (No), "DFA update 2" is performed in step S22 and the process then proceeds to step S23. "DFA update 2" is the processing shown in FIG. 8 and will be described later. If it is determined in step S21 that varId and gMaxVarId are equal (Yes), the process proceeds directly to step S23.
[0052]
In step S23, current is pushed onto stateStack (adding one item), the tag name of the startEvent is pushed onto labelStack (adding one item), and current is set to the state after the transition. SAX events are also output as necessary. The process then proceeds to step S25 and outputs the result (internal format XML data).
[0053]
If, on the other hand, the SAX event is an endEvent (endElement, endDocument) in step S18, stateStack is popped in step S24, the result is set to current, and labelStack is popped. SAX events are also output as necessary. Note that pop means removing the most recently added item. After step S24, the process proceeds to step S25 to output the result (internal format XML data).
[0054]
Then, in step S26, it is determined whether or not the process is completed. If the process is not completed (No), the process proceeds to step S14 to continue the process. When it is the end (Yes), the process shifts to End to end the process.
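The start / end handling of steps S18 to S24 can be sketched with Python's standard xml.sax module. This is a simplified illustration under assumptions, not the flow of FIG. 6 itself: the NFA is written out by hand for "/a/b" and "/a//c" with an assumed state numbering, DFA states are represented as frozensets in a dictionary rather than as DFAState objects, the varId check of step S21 is omitted, and the class name LazyDFAHandler and its "matches" list are invented for the example.

```python
import xml.sax

# Combined NFA for "/a/b" and "/a//c"; the state numbers are an assumption.
NFA = {1: [("a", 2)], 2: [("b", 3)], 4: [("a", 5)], 5: [("*", 5), ("c", 6)]}
TERMINAL = {3, 6}
ROOT = frozenset({0, 1, 4})          # epsilon closure of NFA root state 0


def move(states, tag):
    """NFA states reachable from `states` by `tag` ("*" matches any tag)."""
    return frozenset(t for s in states for label, t in NFA.get(s, [])
                     if label in (tag, "*"))


class LazyDFAHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.dfa = {}                # (DFA state, tag) -> DFA state, built on demand
        self.current = ROOT
        self.stateStack = []
        self.labelStack = []
        self.matches = []            # paths of elements that hit a terminal state

    def startElement(self, name, attrs):
        key = (self.current, name)
        if key not in self.dfa:                  # S19/S20: not generated yet
            self.dfa[key] = move(self.current, name)
        self.stateStack.append(self.current)     # S23
        self.labelStack.append(name)
        self.current = self.dfa[key]
        if self.current & TERMINAL:
            self.matches.append("/" + "/".join(self.labelStack))

    def endElement(self, name):
        self.current = self.stateStack.pop()     # S24
        self.labelStack.pop()


handler = LazyDFAHandler()
xml.sax.parseString(b"<a><b/><x><c/></x><d/></a>", handler)
print(handler.matches)     # ['/a/b', '/a/x/c']
print(len(handler.dfa))    # 5 transitions were generated on demand
```

Only transitions that the input actually exercises are created, which is the point of generating the DFA sequentially from SAX events.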
[0055]
FIG. 7 is a processing flow of the subroutine “DFA update 1” in step S16 of FIG.
In the first step S161, since one XPath has been added (addition of a new XPath), as shown in the flowchart, gMaxVarId is incremented by 1. In the next step S162, an NFA is generated from the newly added XPath and added (inserted) to the combined NFA of the automaton management unit 15.
[0056]
In the next step S163, a loop variable I to be used later is initialized (I = 0). Also, the root state of the generated NFA is set to a temporary variable states, states is added to the states of rootState, the edges originating from states are added to the edges of rootState, gMaxVarId is set to the varId of rootState, and rootState is set to current.
[0057]
In the determination of the next step S164, it is determined whether the loop variable I is smaller than labelStack.size. If it is smaller (Yes), the I-th element is extracted from labelStack in step S165, and a start event (startElement, startDocument) is generated.
[0058]
In the determination of the next step S166, it is determined whether the varId of the transition destination state reached by the generated SAX event is equal to gMaxVarId. If they are equal, the process proceeds to step S168. If they are not equal, the transition destination state (states, edges, varId) is updated in step S167 as follows.
[0059]
That is, in step S167, the result of transit(states, transition destination varId, input SAX event) is set in the temporary variable newStates. The processing of transit will be described with reference to FIG. 9. As a result of step S167, the current DFA state is updated.
[0060]
Specifically, newStates is added to the states of the transition destination. The result of getEdges(newStates) is added to the edges of the transition destination. gMaxVarId is set to the varId of the transition destination, which means that the transition destination state has been fully updated. Here, getEdges is a function that extracts all edges connected to the states in its parameter newStates.
[0061]
In the next step S168, current is stacked on stateStack, the tag name is stacked on labelStack, and current is set to the state after transition.
[0062]
In the next step S169, the loop variable I is incremented, and the process returns to the loop determination of step S164. If the loop variable I is not smaller than labelStack.size in step S164 (No), the subroutine ends and the process returns to step S16 in FIG. 6 (Return, step S170).
[0063]
FIG. 8 is a flowchart showing the processing of the subroutine “DFA update 2” in step S22 of FIG.
In the processing of FIG. 8, the states, edges, and varId of the transition destination state are updated. That is, the result of transit(states, transition destination varId, input SAX event) is set in the temporary variable newStates. The processing of transit will be described with reference to FIG. 9. As a result of this "DFA update 2", the current DFA state is updated.
[0064]
Specifically, in "DFA update 2", newStates is added to the states of the transition destination, and duplicate entries in the states of the transition destination are removed. The result of getEdges(newStates) is added to the edges of the transition destination. gMaxVarId is set to the varId of the transition destination, which means that the transition destination state has been fully updated. Here, getEdges is a function that extracts all edges connected to the states in its parameter newStates.
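The update of a transition destination described above (add newStates, remove duplicates, add the result of getEdges, set varId) can be sketched as follows. The dictionary-based state, the function name apply_update, and the NFA fragment with its state numbers are assumptions made for illustration; reporting the "*" label as "other" follows the edge naming used in the operation example, and a None value stands for a transition destination that has not been generated yet.

```python
# Fragment of a combined NFA; the state numbers are an assumption of this sketch.
NFA = {2: [("b", 3)], 5: [("*", 5), ("c", 6)], 8: [("*", 8), ("d", 9)]}


def get_edges(new_states):
    """Labels of all NFA edges leaving new_states; "*" is reported as "other"."""
    labels = []
    for state in new_states:
        for label, _target in NFA.get(state, []):
            name = "other" if label == "*" else label
            if name not in labels:
                labels.append(name)
    return labels


def apply_update(dest, new_states, g_max_var_id):
    """Bring one DFA state (a dict in this sketch) up to date."""
    for state in new_states:
        if state not in dest["states"]:          # keep the state group duplicate-free
            dest["states"].append(state)
    for label in get_edges(new_states):
        dest["edges"].setdefault(label, None)    # None = destination not generated
    dest["varId"] = g_max_var_id                 # marks the state as fully updated


dest = {"states": [2, 5], "edges": {"b": None, "c": None, "other": None}, "varId": 2}
apply_update(dest, [8], 3)
print(dest["states"])        # [2, 5, 8]
print(list(dest["edges"]))   # ['b', 'c', 'other', 'd']
print(dest["varId"])         # 3
```

Using setdefault leaves any destination that already exists untouched, so only the edges contributed by the newly added XPath appear as new entries.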
[0065]
FIG. 9 is a diagram showing a flow showing processing of the function “transit” used in FIGS. 7 and 8.
The input (parameters) in step S41 of the flow of FIG. 9 consists of three items: states (an NFA state group), varId (the value of the id attribute of a Variable instance), and the input SAX event.
[0066]
In the next step S42, the loop variable I is initialized (I = 0), and the result storage area rstates is initialized. In the determination of the next step S43, it is determined whether the loop variable I is smaller than states.size. If I is smaller than states.size (Yes), in step S44 the I-th element is extracted from states and set in the temporary variable state, and its attribute var is set in the temporary variable vvar.
[0067]
In the determination process in the next step S45, it is determined whether or not the id of vvar is larger than the transition destination varId. If the id of the vvar exceeds the transition destination varId (Yes), as shown in step S46, all the NFA states that can be shifted from the state by the input SAX event are added to rstates, and the process proceeds to step S47. On the other hand, if the id of vvar is less than or equal to the transition destination varId (No), the process proceeds to step S47 as it is.
[0068]
In the next step S47, the loop variable I is incremented, and the process returns to the loop condition of step S43. If the loop variable I is not smaller than states.size in step S43 (No), the transit function ends and rstates is returned. That is, rstates is assigned to newStates in FIGS. 7 and 8.
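The transit function of FIG. 9 can be sketched as below. Several things are assumptions of this sketch rather than statements of the embodiment: the NFA and its state numbers are written out by hand, the var attribute of each NFA state is replaced by a lookup table VAR_ID holding the Variable id, the SAX event is reduced to its tag name, and "states" is taken to be the NFA state group of the DFA state being left.

```python
# Combined NFA for "/a/b" (states 1-3), "/a//c" (4-6) and "/a//d" (7-9).
NFA = {
    1: [("a", 2)], 2: [("b", 3)],
    4: [("a", 5)], 5: [("*", 5), ("c", 6)],
    7: [("a", 8)], 8: [("*", 8), ("d", 9)],
}
# id of the Variable (XPath) from which each NFA state was created.
VAR_ID = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3}


def transit(states, dest_var_id, tag):
    """Return NFA states reachable by `tag`, but only via XPaths whose
    Variable id is larger than the destination's varId."""
    rstates = []                                     # S42
    for state in states:                             # S43, S44, S47
        if VAR_ID.get(state, 0) > dest_var_id:       # S45
            for label, target in NFA.get(state, []):     # S46
                if label in (tag, "*") and target not in rstates:
                    rstates.append(target)
    return rstates


print(transit([2, 5, 8], 2, "b"))   # [8]
print(transit([2, 5, 8], 0, "b"))   # [3, 5, 8]
```

With a destination varId of 2, only state 8 (created from the third XPath) contributes, so the work done is proportional to what was added since the destination was last updated; with varId 0 the call degenerates into an ordinary subset-construction step.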
[0069]
[State of the automaton manager corresponding to the operation]
Next, an operation example of the XPath processing method of the present embodiment will be described more specifically.
FIGS. 10 to 14 are diagrams used to explain an operation example of the XPath processing method. Among these, FIG. 10 is a diagram showing the points in time of the operation of the XPath processing method. FIGS. 11 to 14 illustrate the internal state of the automaton management unit as it changes at each point in time.
[0070]
First, FIG. 10 shows the following four points in time, 1. to 4., in the operation of the XPath processing method of this embodiment.
1. When the two XPaths "/a/b" and "/a//c" are registered
2. After that, when XML has been input partway (<a><c/><d>)
3. When the XPath "/a//d" is added
4. After that, when XML input is continued (test</d><d/></a>)
[0071]
FIG. 11 is a diagram illustrating the internal state of the automaton management unit at time point 1. of FIG. 10. As shown in the upper diagram of FIG. 11, two XPaths, "/a/b" and "/a//c", are registered. As described above, the XPaths are separated and generated from the personal profile by the inquiry parsing module 12 (see FIG. 2).
[0072]
The NFA in the middle diagram of FIG. 11 shows an NFA (combined NFA) in which the two XPaths in the upper diagram are converted and connected to the root node. In the middle diagram, seven states and six edges connecting them are shown. Here, a double circle indicates a terminal state, and a single circle indicates a non-terminal state. Arrows are edges, each indicating a tag to be accepted. Also, ε represents the epsilon transition, * is an arbitrary tag, and (a, b, c) other than that represents the tag name itself.
[0073]
In the DFA in the lower diagram of FIG. 11, a root node is generated, and {0, 1, 4} is set as its NFA state group. This is the group of NFA states reachable from root state 0 of the NFA by epsilon transitions. Since only the edge a leaves {0, 1, 4}, a is generated as an edge, and its transition destination is null. Note that the "2" shown above the state in the DFA is varId, the state flag. Since the number of XPaths is 2, as shown in the upper diagram, varId is set to 2.
[0074]
FIG. 12 is a diagram illustrating the internal state of the automaton management unit at time point 2. of FIG. 10. The difference from FIG. 11 is the DFA part in the lower diagram: as shown at 2. of FIG. 10, the necessary DFA states and edges are generated by the XML input <a><c/><d>.
[0075]
Since the NFA state group reachable from the NFA state group {0, 1, 4} of the DFA root state by the XML input <a> is {2, 5}, a DFA state for it is generated. Since the edge group leaving {2, 5} is {b, c, *}, {b, c, other} is generated as its edges and connected to the DFA state. Their transition destinations are initially null.
[0076]
Similarly, the states and edges of the DFA are generated as needed based on the XML input <c/><d>. Specifically, since the NFA state group reachable by the input is {3, 5}, a DFA state for it is generated. Since the edge group leaving {3, 5} is {c, *}, {c, other} is generated as its edges and connected to the DFA state. Their transition destinations are initially null. Next, the input <c/> and the input <d> are processed in order.
[0077]
FIG. 13 is a diagram illustrating the internal state of the automaton management unit at time point 3. of FIG. 10. As shown in the upper diagram of FIG. 13, one XPath, "/a//d", is added. In the middle diagram of FIG. 13, an NFA corresponding to the added XPath "/a//d" is added, specifically states 7, 8, and 9 and the edges related to them.
[0078]
In the DFA in the lower diagram of FIG. 13, updating according to the processing flows of FIGS. 6 to 9, that is, a process of adding a difference in the added XPath information to the DFA generated so far is executed.
[0079]
First, root state 7 of the NFA of the added XPath is added to the DFA root state. Also, since the added XPath is the third one to be processed, the state flag varId is incremented to 3.
[0080]
Since {a, b, d} is stacked in labelStack at this time, SAX events are generated in the order a, b, d, and transitions are made from the DFA root state. First, the next state is updated and entered by a. In this update, 8 is added to its states, which become {2, 5, 8}, and d is added as an edge.
Next, the next state is updated and entered by b. In this update, 8 is added to its states, which become {3, 5, 8}, and d is added as an edge.
[0081]
Next, the next state is updated and entered by d. In this update, as shown in the lower diagram of FIG. 13, its states become {5, 8, 9}. The varId of each of these updated DFA states is incremented to 3.
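The replay just described can be reproduced with a small sketch of "DFA update 1". Everything concrete here is an assumption made for illustration: the NFA state numbering is inferred from the state groups quoted in the text, the DFA is pre-built by hand along the path a, b, d with the state group {5} assumed for the state reached by d before the addition, the edge map holds only destinations that already exist (a missing key plays the role of a null destination), and a compact transit helper is repeated so that the block is self-contained.

```python
# Combined NFA after adding "/a//d" (states 7-9) to "/a/b" and "/a//c".
NFA = {
    1: [("a", 2)], 2: [("b", 3)],
    4: [("a", 5)], 5: [("*", 5), ("c", 6)],
    7: [("a", 8)], 8: [("*", 8), ("d", 9)],
}
VAR_ID = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3}


class DFAState:
    def __init__(self, states, var_id):
        self.states = list(states)   # NFA state group
        self.edges = {}              # tag -> DFAState already generated
        self.varId = var_id          # state flag


def transit(states, dest_var_id, tag):
    """Targets reachable by `tag` from states of XPaths newer than dest_var_id."""
    out = []
    for s in states:
        if VAR_ID.get(s, 0) > dest_var_id:
            for label, target in NFA.get(s, []):
                if label in (tag, "*") and target not in out:
                    out.append(target)
    return out


def dfa_update_1(root, label_stack, new_nfa_root, g_max_var_id):
    root.states.append(new_nfa_root)              # S163: merge new NFA root
    root.varId = g_max_var_id
    current, state_stack = root, []
    for tag in label_stack:                       # S164, S165, S169: replay tags
        dest = current.edges[tag]
        if dest.varId != g_max_var_id:            # S166: stale destination?
            for s in transit(current.states, dest.varId, tag):   # S167
                if s not in dest.states:
                    dest.states.append(s)
            dest.varId = g_max_var_id
        state_stack.append(current)               # S168
        current = dest
    return current, state_stack


# DFA as assumed before the addition, along the open path a, b, d.
root = DFAState([0, 1, 4], 2)
s_a = root.edges["a"] = DFAState([2, 5], 2)
s_b = s_a.edges["b"] = DFAState([3, 5], 2)
s_d = s_b.edges["d"] = DFAState([5], 2)

current, state_stack = dfa_update_1(root, ["a", "b", "d"], 7, 3)
print(root.states, s_a.states, s_b.states, s_d.states)
# [0, 1, 4, 7] [2, 5, 8] [3, 5, 8] [5, 8, 9]
```

Only the three states on the open path are touched; any other DFA state would keep varId 2 and be brought up to date later, when a transition actually reaches it.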
[0082]
FIG. 14 is a diagram illustrating the internal state of the automaton management unit at time point 4. of FIG. 10. In FIG. 14, one edge d is added to the DFA compared with FIG. 13, which shows the previous state.
[0083]
In the XPath processing method described above, when an XPath is added, it is converted into an NFA, and an epsilon edge is set from the root node of the existing NFA to the root node of the new NFA. Then, by sequentially inputting to the DFA all start tags from the XML root tag held in advance up to the tag currently being processed, the states of the DFA are sequentially updated. That is, the difference corresponding to the added XPath information is added to the existing DFA. Therefore, by updating a single DFA, the memory space can be used more efficiently than in the existing technology, in which a plurality of DFAs are generated. Conventionally, taking FIG. 12 as an example, when an XPath is added, a new DFA corresponding to the added XPath "/a//d" is generated separately from the existing DFA and puts pressure on the memory space. In the present invention, however, no such duplicate DFA is generated.
[0084]
At this time, each state in the DFA is processed using the state flag varId, which expresses up to which XPath's information the state has been updated, so that the XPath addition process (adding the difference of the added XPath information to the existing DFA) can be performed reliably and at high speed.
[0085]
Also, as shown in FIGS. 6, 7, 13, 14, and the like, when a state transition is performed in the DFA, the state flag varId of each state in the current DFA is checked. If the state has not been updated based on the latest XPath, the XPaths added after the state was last updated are identified using the state flag varId, and the DFA is updated using the NFA states corresponding to the identified XPaths. In this way, the memory space can be used effectively, and the XPath addition process can be performed reliably and at high speed.
[0086]
Also, in the present embodiment, labelStack, which is an element name list that has not been completed by the currently input SAX event (startElement has been input but endElement has not been input), is held. For this reason, the XPath addition process can be performed reliably and at high speed.
[0087]
The present invention is not limited to the above-described embodiment, and can be widely modified. For example, NewsML is only an example, and the XML data is not limited to NewsML. For example, the XML data may be NITF (News Industry Text Format) data. The filter engine 10 is installed, for example, by an ASP (Application Service Provider) for companies or individual users, or by a company, an organization, or the like for its employees or members. Further, the means / method for adding the difference of the XPath information to the DFA shown in the above embodiment is an example, and the present invention is not limited to it. The above-described flows and the like may be transmitted over a network as an XPath processing program, or stored in a storage medium such as a CD-ROM and distributed.
[0088]
[Effects of the Invention]
According to the present invention described above, excellent effects such as the following are obtained.
That is, according to the present invention, since the existing DFA is used, the memory space of the computer can be used efficiently. Also, the difference of the XPath information can be reliably added to the DFA, so the memory space can be used more efficiently. Also, the portion to be added (updated) can be identified and the difference can be reliably added without waste, so the memory space can be used more efficiently. Also, the difference can be added more reliably, so the memory space can be used more efficiently. Also, the difference can be reliably added to the portion to be added (the portion to be updated) with even less waste, so the memory space can be used more efficiently.
[0089]
Also, according to the present invention, since only the difference corresponding to the information of the added XPath is added to the existing DFA, it is not necessary to generate a plurality of DFAs. Therefore, the memory space of the computer can be used effectively. Also, according to the present invention, the difference addition can be performed more reliably. Therefore, the memory space can be used more efficiently.
[0090]
Also, the XPath processing program of the present invention and the storage medium storing the program cause the computer in which the program is installed to execute each step based on the program and thereby add the difference of the XPath information. Therefore, the memory space of the computer can be used effectively.
[Brief description of the drawings]
FIG. 1 is a diagram showing an outline of a filter engine to which an XPath processing method according to an embodiment of the present invention is applied.
FIG. 2 is a block diagram showing the internal configuration of the filter engine of FIG. 1.
FIG. 3 is a diagram showing the data structure in memory of the automaton management unit of FIG. 2.
FIG. 4 is a diagram showing the concrete program-level data structure of FIG. 3.
FIG. 5 is a diagram showing the data structure in memory of the DFA of FIG. 4.
FIG. 6 is an overall processing flow of the XPath processing method.
FIG. 7 is a processing flow of the subroutine "DFA update 1" of FIG. 6.
FIG. 8 is a processing flow of the subroutine "DFA update 2" of FIG. 6.
FIG. 9 is a flow showing the processing of the function "transit" used in FIGS. 7 and 8.
FIG. 10 is a diagram illustrating the points in time of the operation of the XPath processing method according to the embodiment of the present invention.
FIG. 11 is a diagram illustrating the internal state of the automaton management unit at time point 1. of FIG. 10.
FIG. 12 is a diagram illustrating the internal state of the automaton management unit at time point 2. of FIG. 10.
FIG. 13 is a diagram illustrating the internal state of the automaton management unit at time point 3. of FIG. 10.
FIG. 14 is a diagram illustrating the internal state of the automaton management unit at time point 4. of FIG. 10.
[Explanation of symbols]
10 ... Filter engine (XPath processing device)
11 ... XML parsing module
12 ... Inquiry parsing module
13 ... Data extraction module
14 ... Storage device
15 ... Automaton management unit
16 ... Data conversion module

Claims

An XPath processing method in an XPath processing apparatus that processes XPath using an automaton,
wherein the XPath processing apparatus performs the steps of:
generating a non-deterministic automaton from each individual XPath;
integrating the generated non-deterministic automata to generate a combined non-deterministic automaton and registering it in a storage device;
generating SAX events from input XML data; and
generating, in response to the generated SAX events, a deterministic automaton using the combined non-deterministic automaton and registering it in a storage device,
and wherein, when an XPath is added, the XPath processing apparatus converts the added XPath into a non-deterministic automaton, updates the non-deterministic automaton with the added XPath, and then inputs all start tags, from the XML root tag held in advance up to the tag currently being processed, in order to the deterministic automaton, thereby adding the difference of the XPath information,
the XPath processing method being characterized by the above.

In each state in the deterministic automaton, the difference of the added XPath information is added based on a state flag indicating up to which XPath's information the state has been updated,
the XPath processing method according to claim 1 being characterized by the above.

When a state transition is performed in the deterministic automaton, the state flag of each state in the current deterministic automaton is checked, and
if the state has not been updated based on the latest XPath, the XPaths added after the state was updated are identified using the state flag, and the deterministic automaton is updated using the states of the non-deterministic automaton corresponding to the identified XPaths,
the XPath processing method according to claim 1 or claim 2 being characterized by the above.

Means for generating SAX events from input XML data;
means for converting individual XPaths based on a user's search condition into non-deterministic automata, integrating the generated non-deterministic automata to generate a combined non-deterministic automaton, and registering it in a storage device;
means for generating, in response to the SAX events, a deterministic automaton using the registered combined non-deterministic automaton and registering it in a storage device; and
means for, when there is an XPath addition request, generating a non-deterministic automaton from the XPath to be added, updating the non-deterministic automaton with the added XPath, and then inputting all start tags, from the XML root tag held in advance up to the tag currently being processed, in order to the deterministic automaton, thereby adding the difference of the XPath information;
an XPath processing apparatus comprising the above means.

An XPath processing program that causes an XPath processing apparatus, which is a computer, to execute the XPath processing method according to any one of claims 1 to 3 in order to process XPath using an automaton.

A storage medium storing the XPath processing program according to claim 5.