JP4242701B2

JP4242701B2 - Storage search device, storage search program, and storage search program recording medium

Info

Publication number: JP4242701B2
Application number: JP2003146784A
Authority: JP
Inventors: 史朗春日; 恭太郎堀口; 光明綱川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-05-23
Filing date: 2003-05-23
Publication date: 2009-03-25
Anticipated expiration: 2023-05-23
Also published as: JP2004348593A

Description

【０００１】
【発明の属する技術分野】
本発明は、構造化文書を格納し、検索するコンピュータシステムに用いられるミドルウェアに関し、特に、メタ情報を格納する機能を持たないオブジェクト指向データベースに対して構造化文書およびメタ情報を格納し、検索するミドルウェアに関する。
【０００２】
【従来の技術】
近年、ＸＭＬ（eXtensible Markup Language）をはじめとする構造化文書が、インターネット上の様々な情報共有のためのデータフォーマットとして、利用されるようになっている。ＸＭＬは、１９９７年１２月に、標準化団体Ｗ３Ｃ（World Wide Web Consorium）により標準化された構造化文書の規格の一種である。このＸＭＬ規格に沿って書かれたデータをＸＭＬ文書と呼ぶ。
【０００３】
ＸＭＬ文書は、人が解読・編集可能な文書である。しかし、同時に、ＸＭＬ文書は、タグを用いて構造化されており、コンピュータプログラムが、容易に処理することが可能なデータでもある。ＸＭＬ文書のタグは、見かけは文書中に埋めこまれた「<」と「>」で囲まれた文字列である。タグには、開始タグと終了タグがあり、開始タグと終了タグで囲まれた領域を要素と呼ぶ。要素は、複数の子要素を持ち、それぞれの子要素が複数の孫要素を持つというように、入れ子状に記述できる。そのため、ＸＭＬ文書は、多段階の木構造を表現することができる。
【０００４】
現在、ＸＭＬ文書によって表現される情報は、多岐に渡り、ＸＭＬ規格にタグの付け方の規則を規定することで、特定の用途への応用が行われている。例えば、企業間連携のためのＲｏｓｅｔｔａＮｅｔ（http://www.rosettanet.gr.jp/）やｅｂＸＭＬ（http://www.ebxml.org/）、リソース情報記述のためのＲＤＦ(
Resource Definition Framework,http://www.w3.org/RDF/）、マルチメデイア情報記述のためのＳＶＧ（Scalable Vector Graphics）やＳＭＩＬ（Synchronized Multimedia Integration Language）などがある。上記の特定用途のＸＭＬ文書を利用するシステムは、それぞれのシステムが処理すべきＸＭＬ文書であることを確認するために、ＸＭＬ用のスキーマ言語（ＸＭＬ Schema, http://www.w3.org/XML/Schema）を用いて検証を行い、規定外のＸＭＬ文書を排除することで、処理対象のＸＭＬ文書のみに処理を注力することができる。
【０００５】
ＸＭＬ文書をコンピュータプログラムが処理する際には、ＸＭＬ文書が表現する木構造をコンピュータメモリ上の木構造に変換した方が便利である。このように、ＸＭＬ文書をコンピュータメモリ上の木構造として表現したものを、ＤＯＭ（Document Object Model）と呼ぶ。ＤＯＭは、同じくＷ３Ｃにより標準化されている。ＤＯＭは、ノードとリンクよりなるノード・リンクモデルでＸＭＬ文書を表現する。ＸＭＬ文書の要素は、ＤＯＭのノードに相当する。
【０００６】
コンピュータメモリ上のＤＯＭデータを処理するシステムを作成する際に、ＤＯＭデータ中のノードを指し示す検索式が利用できれば便利である。そのために、同じくＷ３Ｃにより、ＸＰａｔｈ（XML Path Language）という表記方法が標準化されている。ＸＰａｔｈのようなパス検索式を用いることで、ＤＯＭデータ中の条件に合うノードを指し示すことができる。
【０００７】
上記のようなＸＭＬに関する様々な技術の規格化が行われ、様々なコンピュータシステムがＸＭＬをベースとして開発されるようになったため、近年、ＸＭＬ文書を格納するためのデータベースの必要性も増している。ＸＭＬを格納するデータベースには、大きく分けて、リレーショナルデータベース、オブジェクト指向データベース、文書データベースの３種類がある。
【０００８】
リレーショナルデータベースにＸＭＬ文書を格納するには、ＸＭＬ文書をリレーショナルデータベースの格納モデルである二次元の表に変換する必要がある。現在、リレーショナルモデルに基づくリレーショナルデータベース管理システム（ＲＤＢＭＳ）は、データベース管理システム（ＤＢＭＳ）の主流として、顧客管理データベースや物品管理データベースなどに広く利用されている。従って、信頼性の高いリレーショナルデータベース管理システム（ＲＤＢＭＳ）を利用することは容易であるが、ＸＭＬ文書を二次元の表形式に変換するには、元となるＸＭＬ文書の形式や利用目的を分析し、最適な変換方法を検討し、リレーショナルスキーマを設計する必要がある。そのため、設計・構築コストが高く、大規模なシステム開発には向くが、中小規模のシステム開発には不向きである。
【０００９】
オブジェクト指向データベースにＸＭＬ文書を格納するには、ＸＭＬ文書をそのままデータベースに格納すればよい。これは、オブジェクト指向データベースは、ＸＭＬ文書の基本構造である木構造をオブジェクトの親子関係として、そのままの形で格納することができるからである。そのため、システム開発のコスト低減や、構築期間の短縮が重要な中小規模のシステム開発においては、複雑なスキーマ設計が必要ないという理由から、ＸＭＬ文書を木構造データとしてデータベースに格納し、パス検索式を用いて検索を行うことが可能なオブジェクト指向データベースが盛んに利用されている。なお、以後の説明において、構造化文書を格納するオブジェクト指向データベースをオブジェクト指向構造化文書データベースと呼ぶ。
【００１０】
文書データベースにＸＭＬ文書を格納する際には、構造化文書を文章として格納する。文書データベースは、構造化文書を文章として扱い、自然言語解析を施し、索引付けを行い、データベースに格納するので、文章の類似検索が可能なデータベースである。そのため、文書データベースは、ＸＭＬ文書のうち、文章データを格納する場合に特化して利用されるが、文章を扱うシステム開発以外には、用いられない。
【００１１】
オブジェクト指向構造化文書データベースの構造化文書の格納は、図１９に示す構造化文書を図２０（ａ）のようなノードとリンクの木構造として表現し、ノードオブジェクトとその間のリンクという形式で保存することで実現されている。尚、図２０（ｂ）は図２０（ａ）の表記方法を説明している凡例であるが、これによれば、木構造には、必ずルートノードがあり、構造化文書の要素は要素ノード、属性は属性ノード、文字列はテキストノードとして格納される。
【００１２】
オブジェクト指向構造化文書データベースは、このノードとリンクの木構造に対して、木構造取得機能、木構造操作機能、およびパス検索機能の３つの機能を有する。
【００１３】
木構造取得機能は、データベースに格納された構造化文書を木構造としてアクセスし、ノード情報を取得する機能である。これにより、データベースクライアントは、木構造を辿り、ノードの情報を取得することができる。また、木構造を辿ることで、元の構造化文書を再構成することができる。例えば、図２０（ａ）に示すノードｎ_００２を基点と指定すると、図２１に示す部分構造化文書を取り出すことができる。
【００１４】
木構造操作機能は、データベースに格納された構造化文書を木構造としてアクセスし、ノード情報を操作する機能である。これにより、データベースクライアントは、基点となるノードを指定し、そのノードへ新しい子ノードの追加を行うことができる。本機能を用いると、構造化文書中に別の構造化文書を、部分構造化文書として埋め込むことができる。例えば、図２２に示す部分構造化文書をノードｎ₀₀₂の子ノードに追加すると、図２３に示す木構造になる。この機能を部分構造化文書挿入と呼ぶ。尚、部分構造化文書挿入については、ルートノード（図２３におけるｎ₀₀₀）を基点として構造化文書自身の挿入を指定することで、構造化文書の全文書挿入を実現することができる。
【００１５】
パス検索機能は、パス検索式により該当するノード群をノード集合として取得する機能である。パスは、複数の要素名や属性名を“/”で区切った文字列で、ＵＮＩＸ（登録商標）ＯＳなどで用いられているディレクトリパスと似た概念であり、構造化文書の木構造を辿る順序を表している。また、パス検索式には、条件式を付加することができる。条件式は、木構造を辿る際に、ノードの絞込みを行うことを指示する。図２４は、パス検索式の一例である。この例では、ｏｒｄｅｒノードの子の、ｂｏｏｋノードの子の、ａｕｔｈｏｒノードを返却することと、ｐｒｉｃｅノードの値が２００以上であるｂｏｏｋノードに限ることを表している。図２４に示すパス検索式は、ルートノードを基点とし、図２３に示すノード集合Ｎ＝｛ｎ₀₀₅｝が返却される。
【００１６】
以上のように、ＸＭＬ文書のような構造化された文書を格納する必要がある中小規模のデータベースシステムには、オブジェクト指向データベースが適している。
【００１７】
尚、この出願に関連する先行技術文献情報としては、次のものがある。
【００１８】
【特許文献１】
特開２００１−３３１４７９
【００１９】
【発明が解決しようとする課題】
ところで、従来、オブジェクト指向データベースとアプリケーションプログラムを用いてコンピュータシステムを開発する際には、ＸＭＬ文書のような構造化文書に文書外追加情報（例えば、ＸＭＬ文書の作成者、日付、更新履歴など。以下、これをメタ情報と呼ぶ）を付加して、オブジェクト指向データベースに格納する場合が多い。このような場合には、元となるＸＭＬ文書のスキーマに任意の構造を付加する機能があるため、ＸＭＬ文書のスキーマにメタ情報を追加するという方法で対応していた。
【００２０】
しかしながら、上記の方法においては、オブジェクト指向データベースのスキーマ上では元のＸＭＬ文書とメタ情報部分は区別されないため、
（１）ＸＭＬ文書をオブジェクト指向データベースに格納する際には、ＸＭＬ文書とメタ情報を結合してから格納する
（２）オブジェクト指向データベースからＸＭＬ文書を検索する際には、元のＸＭＬ文書にメタ情報部分が含まれたまま取り出される
という処理が行われることになる。特に（２）については、扱うＸＭＬ文書がＲＤＦやＳＭＩＬなど規格化されたＸＭＬ文書である場合には、メタ情報部分が不正スキーマとしてエラーになるため、規格に合わないＸＭＬ文書をそのままＲＤＦやＳＭＩＬなどの処理に使えないという問題が発生する。
【００２１】
そのため、通常、アプリケーションが不要な部分であるメタ情報部分を削除し、ＲＤＦやＳＭＩＬの処理で扱えるようにする手段が必要となり、このことから、上記コンピュータシステムの開発においては以下の問題が発生していた。
【００２２】
（１）メタ情報の付加および削除処理にアプリケーションが対応する必要があり、開発コストが余計にかかる。
【００２３】
（２）メタ情報が増えるとその度にアプリケーションを修正する必要がある。
【００２４】
本発明は、上記の事情を鑑みたものであり、オブジェクト指向構造化文書データベースがメタ情報を格納する機能を有しなくても、アプリケーションプログラムに依存せずに、メタ情報の追加・変更・削除に柔軟に対応できる格納検索装置、格納検索プログラム、および格納検索プログラム記録媒体を提供することを目的とする。
【００２５】
【課題を解決するための手段】
上記目的を達成するため、請求項１記載の本発明は、アプリケーションプログラムからの指示に基づいてオブジェクト指向構造化文書データベースにアクセスし、情報処理を行う格納検索装置であって、構造化文書とともに前記オブジェクト指向構造化文書データベースに格納される前記構造化文書のメタ情報のパスに関する情報を記憶する設定情報記憶手段と、前記アプリケーションプログラムからの格納指示のもと受け取った前記構造化文書を前記オブジェクト指向構造化文書データベースに格納するとともに、前記アプリケーションプログラムから受け取ったメタ情報を、前記設定情報記憶手段に記憶されている前記メタ情報のパスに関する情報に従って、格納された前記構造化文書に挿入し、拡張構造化文書として格納する格納手段と、前記アプリケーションプログラムからの検索指示であるパス検索式と前記設定情報記憶手段に記憶された前記メタ情報のパスに関する情報との比較に基づいて、取得する文書が前記構造化文書か前記メタ情報かを判定し、当該判定結果に基づいて、前記オブジェクト指向構造化文書データベースに格納された前記拡張構造化文書から該当する前記構造化文書又は前記メタ情報を分離して取得する検索手段と、を有し、当該検索手段は、前記判定結果が構造化文書である場合には、前記拡張構造化文書から前記メタ情報を取り除くことを要旨とする。
【００２６】
請求項２記載の本発明は、請求項１に記載の格納検索装置を構成する各手段としてコンピュータを機能させることを要旨とする。
【００２７】
請求項３記載の本発明は、請求項２に記載された格納検索プログラムをコンピュータ読み取り可能な記録媒体に記録していることを要旨とする。
【００３１】
【発明の実施の形態】
以下、図面を用いて本発明の実施の形態について説明する。
【００３２】
図１は本発明の実施形態に係る格納・検索システム１の概略構成図である。図１に示す格納・検索システム１は、アプリケーション１００、格納・検索装置２００、オブジェクト指向構造化文書データベース（以下、データベースと呼ぶ）３００を備えている。尚、格納・検索システム１は、構成としては、一つからなる装置、各構成要素が分散されて複数の装置がネットワーク接続されたシステムなどのいずれの構成であっても良い。
【００３３】
アプリケーション１００は、格納・検索装置２００を利用するアプリケーションプログラムであり、その処理の中で構造化文書をデータベースに格納し、構造化文書を検索することを必要とするアプリケーションプログラムである。ここで、構造化文書を検索するために用いる検索式は上述したパス検索式である。
【００３４】
格納・検索装置２００は、アプリケーション１００から渡された構造化文書およびメタ情報をデータベース３００に格納する機能と、アプリケーション１００から渡されたパス検索式をもってデータベース３００に格納された構造化文書およびメタ情報の検索を行い、その検索結果をアプリケーション１００に返却する機能と、を有するミドルウェアプログラムが記録され、実行される装置である。
【００３５】
データベース３００は、構造化文書を格納するオブジェクト指向構造化文書データベースであり、上述した木構造取得機能、木構造操作機能、およびパス検索機能の３つの機能を有する。
【００３６】
さらに詳しくは、格納・検索装置２００は、制御装置２０１、設定情報辞書２０２、格納装置２０３および、検索装置２０４を備えている。
【００３７】
制御装置２０１は、アプリケーション１００からパス検索式を受け取ると、他の装置２０２乃至２０４を制御し、アプリケーション１００に検索結果を返却するようになっている。
【００３８】
設定情報辞書２０２は、格納・検索装置２００の動作を決定する設定情報を格納する辞書である。
【００３９】
格納装置２０３は、制御装置２０１から構造化文書およびメタ情報を受け取り、データベース３００に格納するようになっている。
【００４０】
検索装置２０４は、制御装置２０１からパス検索式を受け取り、データベース３００に対し検索を実行し、返却結果を制御装置２０１に返却するようになっている。
【００４１】
尚、設定情報辞書２０２に格納される設定情報には、データベース３００に格納する構造化文書に対して、どの位置にメタ情報を付加するかを示すメタ情報パスＰが含まれている。
【００４２】
次に、本実施の形態に係る格納・検索システム１の動作を図２乃至５を用いて説明する。ここで、図２は、格納・検索システム１の処理手順を示すフローチャート図である。図３乃至５は、図２の各ステップＳ１００、Ｓ２００、およびＳ３００を詳細に説明するフローチャート図である。
【００４３】
図２に示すように、格納・検索システム１は、まず、辞書の設定を行い、次に、アプリケーション１００からの指示により、構造化文書の格納もしくは、構造化文書の検索を行う（ステップＳ１００〜Ｓ４００）。尚、複数の構造化文書を処理する場合においては、ステップＳ１００をはじめに一度だけ行い、以降は個々の構造化文書について任意の順序でステップＳ２００およびステップＳ３００を繰り返し行う。
【００４４】
ここで、上述の各ステップについて説明する。まず、図３を用いて辞書の設定ステップＳ１００について説明する。
【００４５】
ユーザは、上述したメタ情報パスＰの一覧であるメタ情報パス集合Ｐ_1-nを生成し（ステップＳ１０１）、生成したメタ情報パス集合Ｐ_1-nを設定情報辞書２０２に対し設定入力する（ステップＳ１０２）。
【００４６】
次に、図４を用いて構造化文書の格納ステップＳ２００について説明する。
【００４７】
アプリケーション１００は、格納する構造化文書Ｄとメタ情報集合Ｍ_1-nを生成し、制御装置２０１に入力し、構造化文書格納を指示する（ステップＳ２０１）。なお、メタ情報Ｍ_iは、パスＰ_iに対応するメタ情報である。
【００４８】
制御装置２０１は、ステップＳ２０１で入力された構造化文書Ｄを、格納装置２０３に入力し、構造化文書の格納を指示する（ステップＳ２０２）。
【００４９】
格納装置２０３は、ステップＳ２０２で入力された構造化文書Ｄをデータベース３００に入力し、全文書挿入を指示する（ステップＳ２０３）。
【００５０】
データベース３００は、ステップＳ２０３で入力され指示された構造化文書Ｄを用いて、全文書挿入を実行する（ステップＳ２０４）。
【００５１】
次に、制御装置２０１は、ステップＳ２０１で入力されたメタ情報集合Ｍ_1-n、および設定情報辞書２０２より取り出したメタ情報パス集合Ｐ_1-nを格納装置２０３に入力し、メタ情報格納を指示する（ステップＳ２０５）。
【００５２】
格納装置２０３は、ステップＳ２０５で入力されたメタ情報Ｍ_iをメタ情報パスＰ_iに従ってデータベース３００に入力し、部分構造化文書挿入を指示する（ステップＳ２０６）。
【００５３】
データベース３００は、ステップＳ２０６で入力され指示されたメタ情報Ｍ_i、およびメタ情報パスＰ_iに基づいて部分構造化文書挿入を実行する（ステップＳ２０７）。この際、メタ情報パスＰ_iの指し示す位置にメタ情報Ｍ_iを挿入する。
【００５４】
次に、図５を用いて構造化文書の検索ステップＳ３００について説明する。
【００５５】
アプリケーション１００は、データベース３００より構造化文書またはメタ情報を取得するためのパス検索式Ｑを生成する（ステップＳ３０１）。この際、パス検索式Ｑの条件に、構造化文書Ｄおよびメタ情報集合Ｍ_1-nを指し示すパスを指定することができる。
【００５６】
アプリケーション１００は、ステップＳ３０１で生成したパス検索式Ｑを制御装置２０１に入力し、検索実行を指示する（ステップＳ３０２）。
【００５７】
制御装置２０１は、ステップＳ３０２で入力されたパス検索式Ｑを検索装置２０４に入力し、検索実行を指示する（ステップＳ３０３）。
【００５８】
検索装置２０４は、ステップＳ３０３で入力されたパス検索式Ｑをデータベース３００に入力し、検索実行を指示すると、データベース３００は、検索を実行し、検索装置２０４にノード集合Ｎ_1-mを返却する（ステップＳ３０４）。
【００５９】
検索装置２０４は、ステップＳ３０４で返却されたノード集合Ｎ_1-mを、制御装置２０１に返却する（ステップＳ３０５）。
【００６０】
制御装置２０１は、設定情報辞書２０２よりメタ情報パス集合Ｐ_1-nを取得し、該メタ情報パス集合Ｐ_1-nと、パス検索式Ｑから条件を除いたパスＰ_Qと、を比較する（ステップＳ３０６）。これは、具体的には、メタ情報パス集合Ｐ_1-n中の全てのメタ情報パスＰ_kについて、パスＰ_Qがメタ情報パスＰ_k自身またはその子孫ノードを指し示すかどうかで判定するものである。パスＰ_Qがメタ情報パスＰ_k自身またはその子孫ノードを指し示さない場合には、パス検索式Ｑは構造化文書Ｄの部分構造化文書集合を指し示すものとみなし、ステップＳ３０８を実行する。これに対して、パスＰ_Qがメタ情報パスＰ_k自身またはその子孫ノードを指し示す場合には、パス検索式Ｑはメタ情報集合Ｍ_1-nの部分構造化文書集合を指し示すものとみなし、ステップＳ３０９を実行する（ステップＳ３０７）。
【００６１】
パスＰ_Qがメタ情報パスＰ_k自身またはその子孫ノードを指し示さない場合には、制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ_1-mの個々のノードＮ_jについて、部分構造化文書取得を実行する（ステップＳ３０８）。部分構造化文書取得は、ノードＮ_j以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。ただし、この際、設定情報辞書２０２より、メタ情報パス集合Ｐ_1-nを取得し、これらのパスに該当するノードに関しては取得しない。これにより生成されるノードＮ_jを頂点とする部分構造化文書を部分構造化文書Ｅ_jとする。最終的に、ノード集合Ｎ_1-mの全てのノードについて部分構造化文書を生成し、部分構造化文書集合Ｅ_1-mを得る。
【００６２】
これに対して、パスＰ_Qがメタ情報パスＰ_k自身またはその子孫ノードを指し示す場合には、制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ_1-mの個々のノードＮ_jについて、部分構造化文書取得を実行する（ステップＳ３０９）。部分構造化文書取得は、ノードＮ_j以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。これにより生成されるノードＮ_jを頂点とする部分構造化文書を部分構造化文書Ｅ_jとする。最終的に、ノード集合Ｎ_1-mの全てのノードについて、部分構造化文書を生成し、部分構造化文書集合Ｅ_1-mを得る。
【００６３】
制御装置２０１は、ステップＳ３０８又はＳ３０９で生成した部分構造化文書集合Ｅ_1-mをアプリケーション１００に返却する（ステップＳ３１０）。
【００６４】
次に、具体的に、構造化文書としてＸＭＬ（eXtensible Markup Language）、データベース３００は、パス検索式としてＸＰａｔｈ（XML Path Language）をサポートするデータベース（以下、ＸＭＬＤＢと呼ぶ）を用いた場合の格納・検索システム１について説明する。
【００６５】
この格納・検索システム１は、上述した図２のフローチャートに示す動作を行う。ここで、実際に処理においては、アプリケーション１００の利用目的とユーザの操作に応じて、任意の順序でステップＳ２００およびＳ３００を必要回数繰り返すが、説明上、ステップＳ１００乃至Ｓ３００を１度のみ行うものとする。
【００６６】
まず、図３を用いて辞書の設定ステップＳ１００について説明する。
【００６７】
ユーザは、設定情報辞書２０２に対して、図６に示すメタ情報パスＰ₁およびＰ₂の一覧であるメタ情報パス集合Ｐ_1-2を生成する（ステップＳ１０１）。
【００６８】
そして、ユーザは、ステップＳ１０１で生成したメタ情報パス集合Ｐ_1-2を、設定情報辞書２０２に対し設定する（ステップＳ１０２）。
【００６９】
次に、図４を用いて構造化文書の格納ステップＳ２００について説明する。
【００７０】
アプリケーション１００は、図７に示す構造化文書Ｄと、図８に示すメタ情報集合Ｍ_1-2を生成し、制御装置２０１に入力し、構造化文書格納を指示する（ステップＳ２０１）。なお、メタ情報Ｍ_i（ｉ＝１，２）は、パスＰ_i（ｉ＝１，２）に対応するメタ情報である。
【００７１】
制御装置２０１は、ステップＳ２０１で入力された構造化文書Ｄを、格納装置２０３に入力し、構造化文書の格納を指示する（ステップＳ２０２）。
【００７２】
格納装置２０３は、ステップＳ２０２で入力された構造化文書Ｄをデータベース３００に入力し、全文書挿入を指示する（ステップＳ２０３）。
【００７３】
データベース３００は、ステップＳ２０３で入力され指示された構造化文書Ｄを用いて、全文書挿入を実行する（ステップＳ２０４）。格納された構造化文書Ｄのデータベース内での構造を図９に示す。
【００７４】
制御装置２０１は、ステップＳ２０１で入力されたメタ情報集合Ｍ_1-2、および設定情報辞書２０２より取り出されたメタ情報パス集合Ｐ_1-2を格納装置２０３に入力し、メタ情報格納を指示する（ステップＳ２０５）。
【００７５】
格納装置２０３は、ステップＳ２０５で入力されたメタ情報Ｍ_iをメタ情報パスＰ_iに従って、データベース３００に入力し、部分構造化文書挿入を指示する（ステップＳ２０６）。
【００７６】
データベース３００は、ステップＳ２０６で入力され指示されたメタ情報Ｍ_i、およびメタ情報パスＰ_iに基づいて部分構造化文書挿入を実行する（ステップＳ２０７）。この際、メタ情報パスＰ_iの指し示す位置にメタ情報Ｍ_iを挿入する。挿入された構造化文書Ｄとメタ情報集合Ｍ_1-2のデータベース内での構造を図１０に示す。図１０においては、メタ情報パスＰ₁の示すｎ₀₀₆の位置にメタ情報Ｍ₁が、メタ情報パスＰ₂の示すｎ₀₀₇の位置にメタ情報Ｍ₂が挿入されている。
【００７７】
次に、図５を用いて構造化文書の検索ステップＳ３００について説明する。
【００７８】
アプリケーション１００は、データベース３００より構造化文書を取得するための図１１に示すパス検索式Ｑを生成する（ステップＳ３０１）。図１１に示すパス検索式Ｑは、条件としてメタ情報を指定し（メタ情報であるfilename属性が‘homepage1.xml’であるもの）、構造化文書Ｄの部分構造化文書取得を表している（パス“/RDF”配下の部分構造化文書を取得）。
【００７９】
アプリケーション１００は、ステップＳ３０１で生成したパス検索式Ｑを制御装置２０１に入力し、検索実行を指示する（ステップＳ３０２）。
【００８０】
制御装置２０１は、ステップＳ３０２で入力されたパス検索式Ｑを検索装置２０４に入力し、検索実行を指示する（ステップＳ３０３）。
【００８１】
検索装置２０４は、ステップＳ３０３で入力されたパス検索式Ｑをデータベース３００に入力し、検索実行を指示すると、データベース３００は、検索を実行し、検索装置２０４にノード集合Ｎ_1-mを返却する（ステップＳ３０４）。返却されるノード集合Ｎ_1-m（ｍ＝１であり、Ｎ₁）を図１２に示す。
【００８２】
検索装置２０４は、ステップＳ３０４で返却されたノード集合Ｎ₁を、制御装置２０１に返却する（ステップＳ３０５）。
【００８３】
制御装置２０１は、設定情報辞書２０２より、メタ情報パス集合Ｐ_1-2を取得し、パス検索式Ｑから条件を除いたパスＰ_Qと比較する（ステップＳ３０６）。この例におけるパスＰ_Qを図１３に示す。パスＰ_Qが指し示すパスはルートノードの子ノードの“ＲＤＦ”ノードである。メタ情報パス集合Ｐ_1-2中の全てのメタ情報パスＰ_kについて、パスＰ_Qが指し示すノードが、メタ情報パスＰ_k自身かその子孫ノードであるようなメタ情報パスＰ_kが存在しないので、パス検索式Ｑは構造化文書Ｄの部分構造化文書集合を返却するものとみなしステップＳ３０８を実行する（ステップＳ３０７）。
【００８４】
制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ₁のノードＮ₁について、部分構造化文書取得を実行する（ステップＳ３０８）。部分構造化文書取得は、ノードＮ₁以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。ただし、この際、設定情報辞書２０２より、図６に示すメタ情報パス集合Ｐ_1-2を取得し、これらのパスに該当するノードは取得しない。これにより生成されるノードＮ₁を頂点とする部分構造化文書を部分構造化文書Ｅ₁とする。この具体例においては、ノード集合Ｎ₁のノードはノードＮ₁だけであるので、ノードＮ_１より部分構造化文書集合Ｅ₁を得る。図１４に生成される部分構造化文書集合Ｅ₁を示す。この時、データベース３００内の木構造に付加されていたメタ情報は、部分構造化文書集合Ｅ₁には付加されず、元の構造化文書Ｄに含まれている要素だけが出力される。
【００８５】
制御装置２０１は、ステップＳ３０８で生成した部分構造化文書集合Ｅ₁をアプリケーション１００に返却する（ステップＳ３１０）。
【００８６】
次に、ステップＳ３０７において、パス検索式Ｑがメタ情報を取得する場合の処理を以下に示す。
【００８７】
アプリケーション１００は、データベース３００より構造化文書を取得するための図１５に示すパス検索式Ｑ’を生成する（ステップＳ３０１）。パス検索式Ｑ’は、条件として構造化文書を指定し（構造化文書の“/RDF/Description/dc:creator”要素が‘春日’であるもの）、メタ情報の部分構造化文書取得を表している（パス“/RDF/change_log/log”配下の部分構造化文書を取得）。
【００８８】
アプリケーション１００は、ステップＳ３０１で生成したパス検索式Ｑ’を制御装置２０１に入力し、検索実行を指示する（ステップＳ３０２）。
【００８９】
制御装置２０１は、ステップＳ３０２で入力されたパス検索式Ｑ’を検索装置２０４に入力し、検索実行を指示する（ステップＳ３０３）。
【００９０】
検索装置２０４は、ステップＳ３０３で入力されたパス検索式Ｑ’をデータベース３００に入力し、検索実行を指示する（ステップＳ３０４）。データベース３００は、検索を実行し、検索装置２０４にノード集合Ｎ’_1-mを返却する。返却されるノード集合Ｎ’_1-m（ｍ＝１であり、Ｎ’₁）を図１６に示す。
【００９１】
検索装置２０４は、ステップＳ３０４で返却されたノード集合Ｎ’₁を制御装置２０１に返却する（ステップＳ３０５）。
【００９２】
制御装置２０１は、設定情報辞書２０２よりメタ情報パス集合Ｐ_1-2を取得し、パス検索式Ｑから条件を除いたパスＰ_Q _’と比較する（ステップＳ３０６）。この例におけるパスＰ_Q _’を図１７に示す。パスＰ_Qが指し示すパスはメタ情報集合Ｍ₂の子ノードの“log”ノードである。メタ情報パス集合Ｐ_1-2中の全てのメタ情報パスＰ_kについて、パスＰ_Qが指し示すノードが、メタ情報パスＰ_k自身かその子孫ノードであるようなメタ情報パスＰ_kが存在する（「メタ情報パスＰ₂」が該当）ので、パス検索式Ｑ’はメタ情報集合Ｍ_1-2の部分構造化文書集合を返却するものとみなしステップＳ３０９を実行する（ステップＳ３０７）。
【００９３】
制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ’₁のノードＮ’₁について、部分構造化文書取得を実行する（ステップＳ３０９）。部分構造化文書取得は、ノードＮ’₁以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。これにより生成されるノードＮ’₁を頂点とする部分構造化文書を部分構造化文書Ｅ’₁とする。この具体例においては、ノード集合Ｎ’_1-のノードはノードＮ’₁だけであるので、ノードＮ’_１より部分構造化文書集合Ｅ’₁を得る。図１８に生成される部分構造化文書集合Ｅ’₁を示す。
【００９４】
制御装置２０１は、ステップＳ３０９で生成した部分構造化文書集合Ｅ’₁をアプリケーション１００に返却する（ステップＳ３１０）。
【００９５】
従って、本実施の形態の格納・検索システム１によれば、メタ情報を格納する機能のないデータベース３００とアプリケーション１００の間に、ミドルウェアとしての格納・検索装置２００を用いることで、アプリケーション１００に依存せずに、メタ情報の追加・変更・削除に柔軟に対応することができるので、システム設計・開発の利便性の向上を図ることができる。
【００９６】
具体的には、オブジェクト指向構造化文書データベースに構造化文書を格納する際には、メタ情報を合わせて格納することができ、構造化文書を検索する際には、構造化文書とメタ情報を別々に取得することができる。また、構造化文書を取得する際には、メタ情報を条件として検索することが可能となり、メタ情報を取得する際には、構造化文書を条件として検索することが可能となる。
尚、上記実施の形態の格納・検索装置２００に格納されたミドルウェアプログラムは、ハードディスク、フレキシブルディスク、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ−ＲＯＭなどのコンピュータ読み取り可能な記録媒体に記録することも、通信ネットワークを介して配信することも可能である。
【００９７】
【発明の効果】
以上説明したように、本発明によれば、オブジェクト指向構造化文書データベースがメタ情報を格納する機能を有しなくても、アプリケーションプログラムに依存せずに、メタ情報の追加・変更・削除に柔軟に対応できる格納検索装置、格納検索プログラム、および格納検索プログラム記録媒体を提供することができる。
【００９８】
これにより、メタ情報を格納する機能を有しないオブジェクト指向構造化文書データベースを利用して、構造化文書およびメタ情報を格納・検索するコンピュータシステムのシステム開発コストを低減させることができる。
【図面の簡単な説明】
【図１】本発明の実施の形態に係る格納・検索システムの概略構成図である。
【図２】本発明の実施の形態に係る格納・検索システムの動作を示すフローチャートである
【図３】本発明の実施の形態に係る格納・検索システムの辞書の設定動作を示すフローチャートである。
【図４】本発明の実施の形態に係る格納・検索システムの構造化文書の格納動作を示すフローチャートである。
【図５】本発明の実施の形態に係る格納・検索システムの構造化文書の検索動作を示すフローチャートである。
【図６】メタ情報パスの一例である。
【図７】構造化文書の一例である。
【図８】メタ情報の一例である。
【図９】オブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図１０】オブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図１１】パス検索式の一例である。
【図１２】ノード集合の一例である。
【図１３】パスの一例である。
【図１４】部分構造化文書の一例である。
【図１５】パス検索式の一例である。
【図１６】ノード集合の一例である。
【図１７】パスの一例である。
【図１８】部分構造化文書の一例である。
【図１９】構造化文書の一例である。
【図２０】オブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図２１】取り出された部分構造化文書の一例である。
【図２２】挿入する部分構造化文書の一例である。
【図２３】挿入後のオブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図２４】パス検索式の一例である。
【符号の説明】
１格納・検索システム
１００アプリケーション
２００格納・検索装置
２０１制御装置
２０２設定情報辞書
２０３格納装置
２０４検索装置
３００オブジェクト指向構造化データベース
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to middleware used in a computer system that stores and retrieves structured documents, and in particular to middleware that stores structured documents and meta information in, and retrieves them from, an object-oriented database that has no function for storing meta information.
[0002]
[Prior art]
In recent years, structured documents such as XML (eXtensible Markup Language) have come to be used as data formats for sharing various kinds of information on the Internet. XML is a structured document standard that was standardized in December 1997 by the standardization body W3C (World Wide Web Consortium). Data written in accordance with the XML standard is called an XML document.
[0003]
An XML document is a document that can be read and edited by a person. However, at the same time, the XML document is structured using tags, and is also data that can be easily processed by a computer program. The tag of the XML document is a character string surrounded by “<” and “>” that is embedded in the document. The tag includes a start tag and an end tag, and an area surrounded by the start tag and the end tag is called an element. Elements can be described in a nested manner, such as having multiple child elements, each child element having multiple grandchild elements. Therefore, the XML document can express a multi-level tree structure.
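As a minimal illustration of the nesting described above, the following Python sketch parses a small XML document with the standard library module `xml.etree.ElementTree` and walks it as a tree. The sample document and its element names (`order`, `book`, `title`, `author`) are made up for illustration and are not taken from the figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
XML_TEXT = "<order><book><title>XML Basics</title><author>Kasuga</author></book></order>"

def depth_listing(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an Element yields its child elements
        rows.extend(depth_listing(child, depth + 1))
    return rows

root = ET.fromstring(XML_TEXT)
listing = depth_listing(root)
for depth, tag in listing:
    print("  " * depth + tag)  # indentation mirrors the nesting level
```

Each start tag and end tag pair becomes one entry, and the depth value shows how child and grandchild elements form a multi-level tree.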
[0004]
Currently, XML documents express a wide variety of information, and XML is applied to specific uses by layering rules for tagging on top of the XML standard. Examples include RosettaNet (http://www.rosettanet.gr.jp/) and ebXML (http://www.ebxml.org/) for inter-company collaboration, RDF (Resource Description Framework, http://www.w3.org/RDF/) for describing resource information, and SVG (Scalable Vector Graphics) and SMIL (Synchronized Multimedia Integration Language) for describing multimedia information. A system that uses such special-purpose XML documents verifies, with a schema language for XML (XML Schema, http://www.w3.org/XML/Schema), that each document is one it is meant to process; by rejecting non-conforming XML documents, it can concentrate its processing on the target XML documents alone.
[0005]
When a computer program processes an XML document, it is more convenient to convert the tree structure represented by the XML document into a tree structure on a computer memory. A representation of an XML document as a tree structure on a computer memory is called a DOM (Document Object Model). DOM is also standardized by W3C. DOM expresses an XML document with a node / link model composed of nodes and links. The elements of the XML document correspond to DOM nodes.
[0006]
When creating a system for processing DOM data on a computer memory, it is convenient if a search expression indicating a node in the DOM data can be used. Therefore, the notation method called XPath (XML Path Language) is also standardized by W3C. By using a path search expression such as XPath, it is possible to indicate a node that meets the conditions in the DOM data.
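The idea of pointing at nodes with a path can be sketched in Python as follows. This is only an illustration: it uses `xml.etree.ElementTree`, which implements a limited subset of XPath rather than the full language, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with two book elements.
XML_TEXT = (
    "<order>"
    "<book><title>A</title><author>Kasuga</author></book>"
    "<book><title>B</title><author>Horiguchi</author></book>"
    "</order>"
)

root = ET.fromstring(XML_TEXT)

# The path is evaluated relative to the root element <order>:
# select every <author> that is a child of a <book> child.
authors = [node.text for node in root.findall("./book/author")]
print(authors)
```

The path names the sequence of elements to follow, so the caller does not have to write the tree traversal by hand.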
[0007]
Since various technologies related to XML have been standardized as described above and various computer systems have come to be developed on the basis of XML, the need for databases that store XML documents has also increased in recent years. Databases that store XML fall roughly into three types: relational databases, object-oriented databases, and document databases.
[0008]
To store an XML document in a relational database, the XML document must be converted into a two-dimensional table, which is the storage model of a relational database. At present, relational database management systems (RDBMS) based on the relational model are the mainstream of database management systems (DBMS) and are widely used for customer management databases, article management databases, and the like. A highly reliable RDBMS is therefore readily available, but converting XML documents into a two-dimensional table format requires analyzing the format and intended use of the source XML documents, working out the optimal conversion method, and designing a relational schema. The design and construction costs are consequently high, which makes this approach suitable for large-scale system development but not for small and medium-scale system development.
[0009]
To store an XML document in an object-oriented database, the XML document can be stored in the database as it is. This is because an object-oriented database can store the tree structure, which is the basic structure of an XML document, unchanged as parent-child relationships between objects. For this reason, in small and medium-scale system development, where reducing development cost and shortening the construction period are important, object-oriented databases are widely used: they require no complicated schema design, store XML documents as tree-structured data, and can be searched with path search expressions. In the following description, an object-oriented database that stores structured documents is referred to as an object-oriented structured document database.
[0010]
When an XML document is stored in a document database, the structured document is stored as text. A document database treats a structured document as text, applies natural language analysis, indexes it, and stores it, which makes similarity search over text possible. For this reason, document databases are used specifically for storing the textual data in XML documents, and are not used outside the development of systems that handle text.
[0011]
Storage of a structured document in an object-oriented structured document database is realized by representing the structured document shown in FIG. 19 as a tree structure of nodes and links as in FIG. 20(a), and saving it in the form of node objects and the links between them. FIG. 20(b) is a legend explaining the notation of FIG. 20(a). According to this legend, the tree structure always has a root node, elements of the structured document are stored as element nodes, attributes as attribute nodes, and character strings as text nodes.
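The node and link representation can be sketched as a small Python data structure. This is an assumption-laden illustration, not the database's actual storage format: the class name `Node`, its fields, and the sample document are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    kind: str                      # "root", "element", "attribute", or "text"
    name: Optional[str] = None     # element or attribute name; None for root and text
    value: Optional[str] = None    # attribute value or text content
    children: List["Node"] = field(default_factory=list)  # links to child nodes

    def add(self, child: "Node") -> "Node":
        """Link a child node under this node and return the child."""
        self.children.append(child)
        return child

# Tree for the hypothetical document <book id="1"><title>XML</title></book>
root = Node("root")
book = root.add(Node("element", "book"))
book.add(Node("attribute", "id", "1"))
title = book.add(Node("element", "title"))
title.add(Node("text", value="XML"))

def count_kinds(node, counts=None):
    """Count how many nodes of each kind the tree contains."""
    counts = {} if counts is None else counts
    counts[node.kind] = counts.get(node.kind, 0) + 1
    for child in node.children:
        count_kinds(child, counts)
    return counts

kinds = count_kinds(root)
print(kinds)
```

The single root node sits above the document element, and attributes and character strings are separate nodes linked under their owning element.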
[0012]
The object-oriented structured document database has three functions of a tree structure acquisition function, a tree structure operation function, and a path search function for the tree structure of nodes and links.
[0013]
The tree structure acquisition function accesses a structured document stored in the database as a tree structure and acquires node information. With it, a database client can traverse the tree structure and acquire the information of each node. By traversing the tree structure, the original structured document can also be reconstructed. For example, when node n₀₀₂ shown in FIG. 20(a) is designated as the base point, the partial structured document shown in FIG. 21 can be extracted.
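Extracting the document below a base node can be pictured with the following Python sketch. It uses `xml.etree.ElementTree` as a stand-in for the database's tree; the sample document is hypothetical and does not reproduce FIG. 19 or FIG. 21.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document.
root = ET.fromstring(
    "<order><book><title>A</title><price>250</price></book><note>gift</note></order>"
)

# Designate the <book> element as the base point and rebuild the
# document fragment from that node and all of its descendants.
base = root.find("book")
partial = ET.tostring(base, encoding="unicode")
print(partial)
```

Only the base node and its descendants are serialized; the sibling `note` element is outside the extracted fragment.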
[0014]
The tree structure operation function accesses a structured document stored in the database as a tree structure and manipulates node information. With it, a database client can designate a node as a base point and add a new child node to that node. Using this function, another structured document can be embedded in a structured document as a partial structured document. For example, when the partial structured document shown in FIG. 22 is added as a child of node n₀₀₂, the tree structure shown in FIG. 23 results. This function is called partial structured document insertion. Whole-document insertion of a structured document can also be realized with partial structured document insertion, by designating the root node (n₀₀₀ in FIG. 23) as the base point and specifying the structured document itself as the document to insert.
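Insertion under a base node can be sketched in the same way. Again `xml.etree.ElementTree` stands in for the database, and the documents are hypothetical examples rather than the contents of FIG. 22 and FIG. 23.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document and the fragment to embed in it.
root = ET.fromstring("<order><book><title>A</title></book></order>")
fragment = ET.fromstring("<author>Kasuga</author>")

# Designate <book> as the base point and add the fragment as a new child.
base = root.find("book")
base.append(fragment)

result = ET.tostring(root, encoding="unicode")
print(result)
```

The fragment becomes the last child of the base node, which is how one document ends up embedded inside another.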
[0015]
The path search function acquires the group of nodes matching a path search expression as a node set. A path is a character string in which element names and attribute names are separated by “/”; it is a concept similar to the directory paths used in the UNIX (registered trademark) OS and the like, and represents the order in which the tree structure of the structured document is traversed. A conditional expression can be added to a path search expression; it instructs the search to narrow down the nodes while traversing the tree structure. FIG. 24 shows an example of a path search expression. This example returns the author nodes that are children of book nodes that are children of the order node, restricted to book nodes whose price node has a value of 200 or more. Evaluated with the root node as the base point, the path search expression of FIG. 24 returns the node set N = {n₀₀₅} shown in FIG. 23.
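A path search with a condition of this kind can be sketched as follows. FIG. 24 is not reproduced in the text, so the exact expression is unknown; an XPath form such as `/order/book[price>=200]/author` is only a guess. Because the XPath subset in `xml.etree.ElementTree` has no numeric comparison predicates, the condition is applied in Python code, and the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document: one book at or above the threshold, one below.
XML_TEXT = (
    "<order>"
    "<book><author>Kasuga</author><price>250</price></book>"
    "<book><author>Horiguchi</author><price>150</price></book>"
    "</order>"
)

def authors_of_books_priced_at_least(order, minimum):
    """Follow order -> book -> author, keeping only books whose price >= minimum."""
    found = []
    for book in order.findall("book"):
        price = book.findtext("price")
        if price is not None and float(price) >= minimum:
            found.extend(book.findall("author"))
    return found

order = ET.fromstring(XML_TEXT)
names = [a.text for a in authors_of_books_priced_at_least(order, 200)]
print(names)
```

The path decides which nodes are returned (the author nodes), while the condition only filters which book nodes are traversed on the way.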
[0016]
As described above, an object-oriented database is suitable for small and medium-scale database systems that need to store structured documents such as XML documents.
[0017]
The prior art document information related to this application includes the following.
[0018]
[Patent Document 1]
JP 2001-331479 A
[0019]
[Problems to be solved by the invention]
Conventionally, when a computer system is developed using an object-oriented database and an application program, information external to the document (for example, the creator, date, and update history of an XML document; hereinafter called meta information) is often added to a structured document such as an XML document before it is stored in the object-oriented database. In such cases, because the schema of the original XML document can be extended with arbitrary structure, the approach taken was to add the meta information to the schema of the XML document.
[0020]
However, with the above method the original XML document and the meta information part are not distinguished in the schema of the object-oriented database, so the following processing takes place:
(1) When an XML document is stored in the object-oriented database, the XML document and the meta information are combined and then stored.
(2) When an XML document is retrieved from the object-oriented database, it is extracted with the meta information part still included in the original XML document.
Regarding (2) in particular, if the XML document is of a standardized type such as RDF or SMIL, the meta information part is rejected as a schema violation, so the retrieved XML document no longer conforms to the standard and cannot be used as-is for RDF or SMIL processing.
[0021]
For this reason, the application normally needs a means of deleting the unneeded meta information part so that the document can be handled by RDF or SMIL processing, and this has given rise to the following problems in the development of such computer systems.
[0022]
(1) The application must handle the addition and deletion of meta information, which adds development cost.
[0023]
(2) The application must be modified every time the meta information grows.
[0024]
The present invention has been made in view of the above circumstances, and its object is to provide a storage and retrieval apparatus, a storage and retrieval program, and a storage and retrieval program recording medium that can flexibly accommodate the addition, modification, and deletion of meta information without depending on the application program, even when the object-oriented structured document database has no function for storing meta information.
[0025]
[Means for Solving the Problems]
In order to achieve the above object, the invention according to claim 1 is a storage and retrieval apparatus that accesses an object-oriented structured document database and performs information processing based on instructions from an application program, the gist of which is that it comprises: setting information storage means for storing information on the path of meta information of a structured document, the meta information being stored in the object-oriented structured document database together with the structured document; storage means for storing the structured document, received under a storage instruction from the application program, in the object-oriented structured document database, and for inserting the meta information received from the application program into the stored structured document in accordance with the information on the path of the meta information stored in the setting information storage means, so that the result is stored as an extended structured document; and search means for determining whether the document to be acquired is the structured document or the meta information, based on a comparison between a path search expression that is a search instruction from the application program and the information on the path of the meta information stored in the setting information storage means, and for separating and acquiring the corresponding structured document or meta information from the extended structured document stored in the object-oriented structured document database based on the determination result; wherein the search means removes the meta information from the extended structured document when the determination result is the structured document.
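The determination made by the search means, and the removal of meta information, can be sketched in Python as follows. This is a simplified illustration under several assumptions: paths are plain slash-separated element names written from the document element, the path `/RDF/file_info` is a hypothetical meta information path (`/RDF/change_log` is modeled on the path mentioned in the embodiment), and the query strings are guesses because the expressions in the figures are not reproduced in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical setting information: the registered meta information paths.
META_PATHS = ["/RDF/file_info", "/RDF/change_log"]

def strip_conditions(query):
    """Drop [...] conditions from a path search expression, leaving the bare path."""
    out, depth = [], 0
    for ch in query:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)

def targets_meta(query, meta_paths=META_PATHS):
    """True if the bare path points at a meta information path or one of its descendants."""
    steps = strip_conditions(query).strip("/").split("/")
    for meta in meta_paths:
        meta_steps = meta.strip("/").split("/")
        if steps[:len(meta_steps)] == meta_steps:
            return True
    return False

def without_meta(element, path, meta_paths=META_PATHS):
    """Copy the subtree at `path`, skipping every node registered as meta information."""
    clone = ET.Element(element.tag, dict(element.attrib))
    clone.text = element.text
    for child in element:
        child_path = path + "/" + child.tag
        if child_path in meta_paths:
            continue  # meta information is not returned with the structured document
        kept = without_meta(child, child_path, meta_paths)
        kept.tail = child.tail
        clone.append(kept)
    return clone

# Hypothetical extended structured document: original content plus a change log.
doc = ET.fromstring(
    "<RDF><Description><creator>Kasuga</creator></Description>"
    "<change_log><log>created</log></change_log></RDF>"
)
clean = ET.tostring(without_meta(doc, "/RDF"), encoding="unicode")
print(clean)
```

A query whose bare path is `/RDF` is treated as a request for the structured document, so the meta nodes are stripped from the result, while a query whose bare path is `/RDF/change_log/log` is treated as a request for meta information and would be returned unfiltered. A condition may mention meta information without changing this decision, because conditions are removed before the comparison.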
[0026]
The gist of the invention described in claim 2 is a storage and retrieval program that causes a computer to function as each of the means constituting the storage and retrieval apparatus according to claim 1.
[0027]
The gist of the invention described in claim 3 is that the storage and retrieval program according to claim 2 is recorded on a computer-readable recording medium.
[0031]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0032]
FIG. 1 is a schematic configuration diagram of a storage / retrieval system 1 according to an embodiment of the present invention. A storage / retrieval system 1 shown in FIG. 1 includes an application 100, a storage / retrieval apparatus 200, and an object-oriented structured document database (hereinafter referred to as a database) 300. The storage / retrieval system 1 may have any configuration such as a single device or a system in which each component is distributed and a plurality of devices are connected to the network.
[0033]
The application 100 is an application program that uses the storage / retrieval apparatus 200 and that, in the course of its processing, needs to store structured documents in the database and to search for structured documents. The search expression used to search for structured documents is the path search expression described above.
[0034]
The storage / retrieval apparatus 200 is an apparatus on which a middleware program is recorded and executed. The program has a function of storing the structured document and meta information passed from the application 100 in the database 300, and a function of searching the structured documents and meta information stored in the database 300 with a path search expression passed from the application 100 and returning the search result to the application 100.
[0035]
The database 300 is an object-oriented structured document database that stores structured documents, and has the three functions of the above-described tree structure acquisition function, tree structure operation function, and path search function.
[0036]
More specifically, the storage / retrieval device 200 includes a control device 201, a setting information dictionary 202, a storage device 203, and a retrieval device 204.
[0037]
When receiving a path search expression from the application 100, the control device 201 controls the other devices 202 to 204 and returns the search result to the application 100.
[0038]
The setting information dictionary 202 is a dictionary that stores setting information that determines the operation of the storage / retrieval apparatus 200.
[0039]
The storage device 203 receives structured documents and meta information from the control device 201 and stores them in the database 300.
[0040]
The search device 204 receives a path search expression from the control device 201, executes the search against the database 300, and returns the result to the control device 201.
[0041]
Note that the setting information stored in the setting information dictionary 202 includes a meta information path P indicating where the meta information is added to the structured document stored in the database 300.
[0042]
Next, the operation of the storage / retrieval system 1 according to the present embodiment will be described with reference to FIGS. Here, FIG. 2 is a flowchart showing a processing procedure of the storage / retrieval system 1. 3 to 5 are flowcharts for explaining in detail each step S100, S200, and S300 of FIG.
[0043]
As shown in FIG. 2, the storage / retrieval system 1 first sets the dictionary and then stores or searches for structured documents in accordance with instructions from the application 100 (steps S100 to S400). When a plurality of structured documents are processed, step S100 is performed only once at the beginning, and thereafter steps S200 and S300 are repeated in an arbitrary order for each structured document.
[0044]
Each of the steps described above will now be explained. First, the dictionary setting step S100 will be described with reference to FIG. 3.
[0045]
The user generates a meta information path set P_1-n, which is a list of the meta information paths P described above (step S101), and inputs the generated meta information path set P_1-n to the setting information dictionary 202 (step S102).
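The dictionary setting step can be pictured with a minimal sketch. The class name, method name, and the two path strings below are assumptions made for illustration only; the source does not specify how the setting information dictionary 202 is implemented or what the paths look like.

```python
from dataclasses import dataclass, field


@dataclass
class SettingInformationDictionary:
    """Hypothetical stand-in for the setting information dictionary 202."""

    meta_paths: list = field(default_factory=list)  # meta information path set P_1-n

    def set_meta_paths(self, paths):
        # Step S102: store the path set supplied by the user.
        self.meta_paths = list(paths)


# Step S101: the user builds the path set (assumed example paths).
dictionary = SettingInformationDictionary()
dictionary.set_meta_paths(["/RDF/@filename", "/RDF/change_log"])
```

Because the paths live only in this dictionary, changing which nodes count as meta information would not require touching the application in this sketch.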
[0046]
Next, the structured document storing step S200 will be described with reference to FIG. 4.
[0047]
The application 100 inputs the structured document D to be stored and the meta information set M_1-n to the control device 201 and instructs storage of the structured document (step S201). The meta information M_i is the meta information corresponding to the path P_i.
[0048]
The control device 201 inputs the structured document D input in step S201 to the storage device 203 and instructs storage of the structured document (step S202).
[0049]
The storage device 203 inputs the structured document D input in step S202 to the database 300 and instructs whole-document insertion (step S203).
[0050]
The database 300 executes whole-document insertion using the structured document D input and instructed in step S203 (step S204).
[0051]
Next, the control device 201 inputs the meta information set M_1-n input in step S201 and the meta information path set P_1-n extracted from the setting information dictionary 202 to the storage device 203 and instructs storage of the meta information (step S205).
[0052]
The storage device 203 inputs the meta information M_i input in step S205 to the database 300 in accordance with the meta information path P_i and instructs partially structured document insertion (step S206).
[0053]
The database 300 executes partially structured document insertion based on the meta information M_i and the meta information path P_i input and instructed in step S206 (step S207). At this time, the meta information M_i is inserted at the position indicated by the meta information path P_i.
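Steps S201 to S207 can be sketched as follows, assuming an in-memory dictionary in place of the database 300 and Python's `xml.etree.ElementTree` in place of the database's tree operations. The function names, the document content, and the path "/RDF/change_log" are illustrative assumptions; namespaces and attribute-valued meta information are left out for brevity.

```python
import xml.etree.ElementTree as ET

database = {}  # hypothetical stand-in for database 300, keyed by document id


def insert_meta(root, meta_path, meta_element):
    """Append meta_element under the parent named by meta_path (e.g. '/RDF/change_log')."""
    steps = meta_path.strip("/").split("/")
    if steps[0] != root.tag or steps[-1] != meta_element.tag:
        raise ValueError("meta path does not match the document root or the meta element")
    parent = root
    for step in steps[1:-1]:
        parent = parent.find(step)
        if parent is None:
            raise ValueError("meta path points outside the stored document")
    # Steps S206-S207: partially structured document insertion.
    parent.append(meta_element)


def store(doc_id, document_xml, meta_set, meta_paths):
    # Steps S202-S204: whole-document insertion.
    root = ET.fromstring(document_xml)
    database[doc_id] = root
    # Steps S205-S207: insert each M_i at the position given by P_i.
    for meta_xml, meta_path in zip(meta_set, meta_paths):
        insert_meta(root, meta_path, ET.fromstring(meta_xml))
    return root


stored = store(
    "doc1",
    "<RDF><Description><creator>Kasuga</creator></Description></RDF>",
    ["<change_log><log>created</log></change_log>"],
    ["/RDF/change_log"],
)
```

After the call, the stored tree holds both the original elements and the inserted meta information, which corresponds to what the claims call an extended structured document.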
[0054]
Next, the structured document search step S300 will be described with reference to FIG. 5.
[0055]
The application 100 generates a path search expression Q for acquiring a structured document or meta information from the database 300 (step S301). At this time, paths that point to the structured document D and to the meta information set M_1-n can be specified in the conditions of the path search expression Q.
[0056]
The application 100 inputs the path search expression Q generated in step S301 to the control device 201 and instructs execution of the search (step S302).
[0057]
The control device 201 inputs the path search expression Q input in step S302 to the search device 204 and instructs execution of the search (step S303).
[0058]
When the search device 204 inputs the path search expression Q input in step S303 to the database 300 and instructs execution of the search, the database 300 executes the search and returns the node set N_1-m to the search device 204 (step S304).
[0059]
The search device 204 returns the node set N_1-m returned in step S304 to the control device 201 (step S305).
[0060]
The control device 201 reads the meta information path set P_1-n from the setting information dictionary 202 and compares the meta information path set P_1-n with the path P_Q obtained by removing the conditions from the path search expression Q (step S306). Specifically, for every meta information path P_k in the meta information path set P_1-n, it is judged whether the path P_Q points to the meta information path P_k itself or to one of its descendant nodes. If the path P_Q does not point to any meta information path P_k itself or to its descendant nodes, the path search expression Q is regarded as pointing to a partially structured document set of the structured document D, and step S308 is executed. On the other hand, if the path P_Q points to a meta information path P_k itself or to one of its descendant nodes, the path search expression Q is regarded as pointing to a partially structured document set of the meta information set M_1-n, and step S309 is executed (step S307).
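The comparison in steps S306 and S307 can be sketched as below. This is a minimal illustration under stated assumptions: conditions are taken to be XPath-style bracketed predicates without nesting, paths are absolute and slash-separated, and the function names and example expressions are hypothetical.

```python
import re


def strip_conditions(path_expression):
    """Return P_Q: the path search expression with bracketed conditions removed.

    Assumes predicates are not nested and contain no ']' inside string literals.
    """
    return re.sub(r"\[[^\]]*\]", "", path_expression)


def points_into(path_q, meta_path):
    """True if path_q is meta_path itself or one of its descendants."""
    q_steps = path_q.strip("/").split("/")
    p_steps = meta_path.strip("/").split("/")
    return q_steps[: len(p_steps)] == p_steps


def targets_meta_information(path_expression, meta_paths):
    """Step S307: True selects step S309, False selects step S308."""
    path_q = strip_conditions(path_expression)
    return any(points_into(path_q, p_k) for p_k in meta_paths)


meta_paths = ["/RDF/@filename", "/RDF/change_log"]  # assumed P_1-2
q_document = "/RDF[@filename='homepage1.xml']"
q_meta = "/RDF[Description/creator='Kasuga']/change_log/log"
```

With these assumed inputs, `q_document` reduces to "/RDF", which lies above both meta paths, while `q_meta` reduces to "/RDF/change_log/log", a descendant of the second meta path.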
[0061]
When the path P_Q does not point to any meta information path P_k itself or to its descendant nodes, the control device 201 acquires a partially structured document for each node N_j of the node set N_1-m returned in step S305 (step S308). Partially structured document acquisition obtains all descendant nodes below the node N_j and assembles them into a structured document. At this time, however, the meta information path set P_1-n is acquired from the setting information dictionary 202, and the nodes corresponding to these paths are not acquired. The partially structured document generated in this way, with the node N_j as its vertex, is referred to as the partially structured document E_j. Finally, partially structured documents are generated for all nodes of the node set N_1-m to obtain the partially structured document set E_1-m.
[0062]
On the other hand, when the path P_Q points to a meta information path P_k itself or to one of its descendant nodes, the control device 201 acquires a partially structured document for each node N_j of the node set N_1-m returned in step S305 (step S309). Partially structured document acquisition obtains all descendant nodes below the node N_j and assembles them into a structured document. The partially structured document generated in this way, with the node N_j as its vertex, is referred to as the partially structured document E_j. Finally, partially structured documents are generated for all nodes of the node set N_1-m to obtain the partially structured document set E_1-m.
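The difference between steps S308 and S309 is only whether nodes at meta information paths are skipped while the subtree is assembled. The following sketch illustrates this with `xml.etree.ElementTree`; the function names, the `exclude_meta` flag, and the sample document are assumptions, and meta paths are matched by exact string comparison for simplicity.

```python
import copy
import xml.etree.ElementTree as ET


def acquire_partial_document(node, node_path, meta_paths, exclude_meta):
    """Copy node N_j with all descendants; drop meta nodes when exclude_meta is True."""
    partial = copy.deepcopy(node)  # leave the stored tree untouched
    if exclude_meta:
        _prune(partial, node_path, meta_paths)
    return partial


def _prune(element, element_path, meta_paths):
    # Remove attributes whose path is registered as a meta information path.
    for name in list(element.attrib):
        if f"{element_path}/@{name}" in meta_paths:
            del element.attrib[name]
    # Remove child elements at meta information paths, recurse into the rest.
    for child in list(element):
        child_path = f"{element_path}/{child.tag}"
        if child_path in meta_paths:
            element.remove(child)
        else:
            _prune(child, child_path, meta_paths)


extended = ET.fromstring(
    '<RDF filename="homepage1.xml">'
    "<Description><creator>Kasuga</creator></Description>"
    "<change_log><log>created</log></change_log>"
    "</RDF>"
)
meta_paths = ["/RDF/@filename", "/RDF/change_log"]  # assumed P_1-2

# Step S308: structured document requested, meta information is left out.
e_document = acquire_partial_document(extended, "/RDF", meta_paths, exclude_meta=True)
# Step S309: meta information requested, the subtree is returned as stored.
e_meta = acquire_partial_document(
    extended.find("change_log/log"), "/RDF/change_log/log", meta_paths, exclude_meta=False
)
```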
[0063]
The control device 201 returns the partially structured document set E_1-m generated in step S308 or S309 to the application 100 (step S310).
[0064]
Next, the storage / retrieval system 1 will be described specifically for the case where XML (eXtensible Markup Language) is used as the structured document and a database that supports XPath (XML Path Language) as the path search expression (hereinafter referred to as an XML DB) is used as the database 300.
[0065]
The storage / retrieval system 1 performs the operation shown in the flowchart of FIG. 2. In actual processing, steps S200 and S300 are repeated as many times as necessary depending on the purpose of the application 100 and the user's operation. For the sake of explanation, however, steps S100 to S300 are assumed here to be performed only once each.
[0066]
First, the dictionary setting step S100 will be described with reference to FIG. 3.
[0067]
The user generates a meta information path set P_1-2, which is a list of the meta information paths P₁ and P₂ shown in FIG. 6 (step S101).
[0068]
The user then sets the meta information path set P_1-2 generated in step S101 in the setting information dictionary 202 (step S102).
[0069]
Next, the structured document storing step S200 will be described with reference to FIG. 4.
[0070]
The application 100 inputs the structured document D shown in FIG. 7 and the meta information set M_1-2 shown in FIG. 8 to the control device 201 and instructs storage of the structured document (step S201). The meta information M_i (i = 1, 2) is the meta information corresponding to the path P_i (i = 1, 2).
[0071]
The control device 201 inputs the structured document D input in step S201 to the storage device 203 and instructs storage of the structured document (step S202).
[0072]
The storage device 203 inputs the structured document D input in step S202 to the database 300 and instructs whole-document insertion (step S203).
[0073]
The database 300 executes whole-document insertion using the structured document D input and instructed in step S203 (step S204). The structure of the stored structured document D in the database is shown in FIG. 9.
[0074]
The control device 201 inputs the meta information set M_1-2 input in step S201 and the meta information path set P_1-2 extracted from the setting information dictionary 202 to the storage device 203 and instructs storage of the meta information (step S205).
[0075]
The storage device 203 inputs the meta information M_i input in step S205 to the database 300 in accordance with the meta information path P_i and instructs partially structured document insertion (step S206).
[0076]
The database 300 executes partially structured document insertion based on the meta information M_i and the meta information path P_i input and instructed in step S206 (step S207). At this time, the meta information M_i is inserted at the position indicated by the meta information path P_i. The structure in the database of the structured document D with the meta information set M_1-2 inserted is shown in FIG. 10. In FIG. 10, the meta information M₁ has been inserted at the position of node N₀₀₆ indicated by the meta information path P₁, and the meta information M₂ has been inserted at the position of node N₀₀₇ indicated by the meta information path P₂.
[0077]
Next, the structured document search step S300 will be described with reference to FIG. 5.
[0078]
The application 100 generates the path search expression Q shown in FIG. 11 for acquiring a structured document from the database 300 (step S301). The path search expression Q shown in FIG. 11 specifies meta information as a condition (the filename attribute, which is meta information, is 'homepage1.xml') and represents acquisition of a partially structured document of the structured document D (it acquires the partially structured document under the path "/RDF").
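A query of this kind can be tried out with the limited XPath support in Python's `xml.etree.ElementTree`. The sketch below is not the expression of FIG. 11, which is not reproduced in this text; the container element `db`, the second document, and the query string are assumptions chosen to mirror the description (a condition on the filename attribute, a result rooted at RDF).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XML DB: a container holding two extended
# structured documents, each carrying a filename attribute as meta information.
collection = ET.fromstring(
    "<db>"
    '<RDF filename="homepage1.xml">'
    "<Description><creator>Kasuga</creator></Description>"
    "</RDF>"
    '<RDF filename="homepage2.xml">'
    "<Description><creator>Example Author</creator></Description>"
    "</RDF>"
    "</db>"
)

# Analogue of a path search expression with a meta information condition.
# ElementTree paths are relative to the element being searched, so the
# leading "/" of an absolute XPath is omitted here.
query = "RDF[@filename='homepage1.xml']"
node_set = collection.findall(query)
```

Only the document whose meta information matches the condition is returned as a node, which is then the starting point for partially structured document acquisition.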
[0079]
The application 100 inputs the path search expression Q generated in step S301 to the control device 201 and instructs execution of the search (step S302).
[0080]
The control device 201 inputs the path search expression Q input in step S302 to the search device 204 and instructs execution of the search (step S303).
[0081]
When the search device 204 inputs the path search expression Q input in step S303 to the database 300 and instructs execution of the search, the database 300 executes the search and returns the node set N_1-m to the search device 204 (step S304). The returned node set N_1-m (m = 1, N₁) is shown in FIG. 12.
[0082]
The search device 204 returns the node set N₁ returned in step S304 to the control device 201 (step S305).
[0083]
The control device 201 reads the meta information path set P_1-2 from the setting information dictionary 202 and compares it with the path P_Q obtained by removing the condition from the path search expression Q (step S306). The path P_Q in this example is shown in FIG. 13. The path P_Q points to "RDF", a child node of the root node. Because the meta information path set P_1-2 contains no meta information path P_k for which the node pointed to by the path P_Q is the meta information path P_k itself or one of its descendant nodes, the path search expression Q is regarded as returning a partially structured document set of the structured document D, and step S308 is executed (step S307).
[0084]
The control device 201 acquires a partially structured document for the node N₁ of the node set N₁ returned in step S305 (step S308). Partially structured document acquisition obtains all descendant nodes below the node N₁ and assembles them into a structured document. At this time, however, the meta information path set P_1-2 shown in FIG. 6 is acquired, and the nodes corresponding to these paths are not acquired. The partially structured document generated in this way, with the node N₁ as its vertex, is referred to as the partially structured document E₁. In this example, the node set N₁ contains only the node N₁, so the partially structured document set E₁ is obtained from the node N₁. FIG. 14 shows the generated partially structured document set E₁. At this time, the meta information added to the tree structure in the database 300 is not output to the partially structured document set E₁; only the elements included in the original structured document D are output.
[0085]
The control device 201 returns the partially structured document set E₁ generated in step S308 to the application 100 (step S310).
[0086]
Next, the processing performed when it is determined in step S307 that the path search expression acquires meta information will be described.
[0087]
The application 100 generates the path search expression Q' shown in FIG. 15 for acquiring a structured document from the database 300 (step S301). The path search expression Q' specifies the structured document as a condition (the "/RDF/Description/dc:creator" element of the structured document is 'Kasuga') and represents acquisition of a partially structured document of the meta information (it acquires the partially structured documents under the path "/RDF/change_log/log").
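The reverse direction, a condition on the structured document with a target inside the meta information, can be illustrated in the same way. `xml.etree.ElementTree` does not accept a multi-step path inside a predicate, so this sketch evaluates the condition and the target in two steps instead of one XPath expression. The sample data, the function name, and the omission of the `dc:` namespace prefix are assumptions made for illustration; FIG. 15 itself is not reproduced in this text.

```python
import xml.etree.ElementTree as ET

# Hypothetical extended structured documents: Description belongs to the
# structured document, change_log is inserted meta information.
collection = ET.fromstring(
    "<db>"
    "<RDF><Description><creator>Kasuga</creator></Description>"
    "<change_log><log>created</log></change_log></RDF>"
    "<RDF><Description><creator>Example Author</creator></Description>"
    "<change_log><log>imported</log></change_log></RDF>"
    "</db>"
)


def search_logs_by_creator(root, creator):
    """Emulate a query of the form /RDF[Description/creator='...']/change_log/log."""
    node_set = []
    for rdf in root.findall("RDF"):
        # Condition evaluated on the structured document part.
        if rdf.findtext("Description/creator") == creator:
            # Target nodes taken from the meta information part.
            node_set.extend(rdf.findall("change_log/log"))
    return node_set


logs = search_logs_by_creator(collection, "Kasuga")
```

Because structured document and meta information sit in one tree, a single search can use either part as the condition and the other as the result.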
[0088]
The application 100 inputs the path search expression Q' generated in step S301 to the control device 201 and instructs execution of the search (step S302).
[0089]
The control device 201 inputs the path search expression Q' input in step S302 to the search device 204 and instructs execution of the search (step S303).
[0090]
The search device 204 inputs the path search expression Q' input in step S303 to the database 300 and instructs execution of the search (step S304). The database 300 executes the search and returns the node set N'_1-m to the search device 204. The returned node set N'_1-m (m = 1, N'₁) is shown in FIG. 16.
[0091]
The search device 204 returns the node set N'₁ returned in step S304 to the control device 201 (step S305).
[0092]
The control device 201 reads the meta information path set P_1-2 from the setting information dictionary 202 and compares it with the path P_Q' obtained by removing the condition from the path search expression Q' (step S306). The path P_Q' in this example is shown in FIG. 17. The path P_Q' points to the "log" node, which is a child node of the meta information M₂. Because the meta information path set P_1-2 contains a meta information path P_k (here, the meta information path P₂) for which the node pointed to by the path P_Q' is the meta information path P_k itself or one of its descendant nodes, the path search expression Q' is regarded as returning a partially structured document set of the meta information set M_1-2, and step S309 is executed (step S307).
[0093]
The control device 201 acquires a partially structured document for the node N'₁ of the node set N'₁ returned in step S305 (step S309). Partially structured document acquisition obtains all descendant nodes below the node N'₁ and assembles them into a structured document. The partially structured document generated in this way, with the node N'₁ as its vertex, is referred to as the partially structured document E'₁. In this example, the node set N'_1-m contains only the node N'₁, so the partially structured document set E'₁ is obtained from the node N'₁. FIG. 18 shows the generated partially structured document set E'₁.
[0094]
The control device 201 returns the partially structured document set E'₁ generated in step S309 to the application 100 (step S310).
[0095]
Therefore, according to the storage / retrieval system 1 of the present embodiment, the storage / retrieval apparatus 200 is placed as middleware between the application 100 and the database 300, which has no function of storing meta information. Addition, change, and deletion of meta information can therefore be handled flexibly without depending on the application 100, which improves the convenience of system design and development.
[0096]
Specifically, when a structured document is stored in an object-oriented structured document database, meta information can be stored together with it. When a search is performed, the structured document and the meta information that were stored together can be acquired separately. Further, when a structured document is acquired, the search can use meta information as a condition, and when meta information is acquired, the search can use the structured document as a condition.
The middleware program stored in the storage / retrieval apparatus 200 of the above embodiment may be recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD-ROM. It is also possible to distribute via
[0097]
[Effect of the invention]
As described above, according to the present invention, it is possible to provide a storage retrieval device, a storage retrieval program, and a storage retrieval program recording medium that can flexibly handle addition, change, and deletion of meta information without depending on the application program, even if the object-oriented structured document database does not have a function of storing meta information.
[0098]
Accordingly, it is possible to reduce the system development cost of a computer system that stores and retrieves structured documents and meta information using an object-oriented structured document database that does not have a function of storing meta information.
[Brief description of the drawings]
FIG. 1 is a schematic configuration diagram of a storage / retrieval system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the storage / retrieval system according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the dictionary setting operation of the storage / retrieval system according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the structured document storage operation of the storage / retrieval system according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the structured document search operation of the storage / retrieval system according to the embodiment of the present invention.
FIG. 6 is an example of a meta information path.
FIG. 7 is an example of a structured document.
FIG. 8 is an example of meta information.
FIG. 9 is an example of a structured document stored in an object-oriented structured document database.
FIG. 10 is an example of a structured document stored in an object-oriented structured document database.
FIG. 11 is an example of a path search expression.
FIG. 12 is an example of a node set.
FIG. 13 is an example of a path.
FIG. 14 is an example of a partially structured document.
FIG. 15 is an example of a path search expression.
FIG. 16 is an example of a node set.
FIG. 17 is an example of a path.
FIG. 18 is an example of a partially structured document.
FIG. 19 is an example of a structured document.
FIG. 20 is an example of a structured document stored in an object-oriented structured document database.
FIG. 21 is an example of the extracted partially structured document.
FIG. 22 is an example of a partially structured document to be inserted.
FIG. 23 is an example of a structured document stored in an object-oriented structured document database after insertion.
FIG. 24 is an example of a path search expression.
[Explanation of symbols]
1 Storage / retrieval system
100 Application
200 Storage / retrieval device
201 Control device
202 Setting information dictionary
203 Storage device
204 Search device
300 Database (object-oriented structured document database)

Claims

A storage and retrieval apparatus that accesses an object-oriented structured document database based on an instruction from an application program and performs information processing, comprising:
Setting information storage means for storing information on a path of meta information of the structured document stored in the object-oriented structured document database together with the structured document;
Storage means for storing the structured document received with a storage instruction from the application program in the object-oriented structured document database, and for inserting the meta information received from the application program into the stored structured document according to the information on the path of the meta information stored in the setting information storage means, thereby storing it as an extended structured document; and
Search means for determining whether the document to be acquired is the structured document or the meta information based on a comparison between a path search expression that is a search instruction from the application program and the information on the path of the meta information stored in the setting information storage means, and for separating and acquiring the corresponding structured document or meta information from the extended structured document stored in the object-oriented structured document database based on the determination result,
Wherein, when the determination result is the structured document, the search means removes the meta information from the extended structured document.

A storage retrieval program for causing a computer to function as each means constituting the storage retrieval device according to claim 1.

A storage retrieval program recording medium, wherein the storage retrieval program according to claim 2 is recorded in a computer-readable recording medium.