JP3765459B2

JP3765459B2 - XML document search device

Info

Publication number: JP3765459B2
Application number: JP05496099A
Authority: JP
Inventors: 智弘小野; 智西山; 貞夫小花
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 1999-03-03
Filing date: 1999-03-03
Publication date: 2006-04-12
Anticipated expiration: 2019-03-03
Also published as: JP2000250938A

Description

[0001]
[Technical field of the invention]
The present invention relates to an XML document search apparatus, and in particular to an XML document search apparatus that can retrieve desired data from an XML (eXtensible Markup Language) database even when the user does not know the tag definition (DTD: Document Type Definition) of the document type to be searched.
[0002]
[Prior art]
In recent years, XML has attracted attention as a language for describing and exchanging documents on the Internet and intranets. Unlike HTML, XML uses tags that describe the structure of a document, so a document can be described and managed in units of fine-grained elements rather than as a single block. Several databases for storing and retrieving documents written in XML have been announced to date; one example is the product eXcelon from Object Design.
[0003]
In XML documents, tags can be freely defined and used by users. It is therefore likely that documents will be written not with a DTD common to all users, but with DTDs that information providers have defined or extended on their own. As a result, XML documents whose DTDs are structurally different but semantically similar will be scattered across the Internet and intranets.
[0004]
FIG. 8 shows two DTDs that are structurally different but semantically similar, together with examples of XML documents and search expressions based on them. FIG. 8(a) is a DTD that defines the tags paper, title, author, and date (the name of a tag is called an element name) and shows that paper contains the other three. (a′) in the same figure is a DTD that defines the tags article, Title, page, and writer and shows that article contains the other three.
[0005]
(b) in the same figure shows an XML document. It conforms to the DTD whose starting point (root element name) is paper, and the values of its elements are SAMPLE TITLE, john, and 1103. (b′) in the same figure shows an XML document that conforms to the DTD whose starting point is article, with the element values SAMPLE TITLE, 123, and john.
[0006]
Further, (c) in the same figure is a search expression written in XML-QL that retrieves the value of title from XML documents whose root element name is paper and whose author value is john. Likewise, (c′) in the same figure retrieves the value of Title from XML documents whose root element name is article and whose writer value is john.
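Purely as an illustration of the situation described above, the following Python sketch stores one document of each type and shows that two separately written queries are needed to collect the same information. The helper select is a hypothetical stand-in for a single XML-QL query, not XML-QL itself, and the nesting of the elements is an assumption based on the DTD descriptions.

```python
import xml.etree.ElementTree as ET

# One document per type, using the values given in the text. The element
# nesting (a root with three children) is an assumption.
PAPER_XML = (
    "<paper><title>SAMPLE TITLE</title>"
    "<author>john</author><date>1103</date></paper>"
)
ARTICLE_XML = (
    "<article><Title>SAMPLE TITLE</Title>"
    "<page>123</page><writer>john</writer></article>"
)


def select(docs, root, where_tag, where_value, want_tag):
    """Hypothetical stand-in for one query: from every document rooted at
    `root` whose `where_tag` child equals `where_value`, return the text of
    its `want_tag` child."""
    results = []
    for text in docs:
        elem = ET.fromstring(text)
        if elem.tag != root:
            continue
        if elem.findtext(where_tag) == where_value:
            results.append(elem.findtext(want_tag))
    return results


docs = [PAPER_XML, ARTICLE_XML]

# A separate query must be written for each document type, because the
# element names differ (author/title versus writer/Title).
titles_from_paper = select(docs, "paper", "author", "john", "title")
titles_from_article = select(docs, "article", "writer", "john", "Title")
print(titles_from_paper + titles_from_article)
```

Applying the paper-style query to article documents finds nothing, which is why the user has to know both DTDs in advance.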
[0007]
FIG. 9 is an explanatory diagram of an example of document search using such an XML database. The process configuration consists of a database client 31, which accepts a user's search request and sends a request to the database in a database operation language, and an XML database 32, which stores XML documents and offers operations in the database operation language to the outside. In this conventional configuration, a user or application program that wants to retrieve all or part of a document from the database must understand the DTD of every type (for example, the paper type and the article type) in which the target document is likely to exist, and must issue a search operation for each of those types, as indicated by 33 and 34 in the figure.
[0008]
[Problems to be solved by the invention]
As described above, XML databases whose DTDs are structurally different but semantically similar are scattered across the Internet and intranets. Consequently, when a user or application program tries to retrieve XML documents from such databases, a separate search expression must be written for the documents of every DTD that might contain the required value, which is inefficient.
[0009]
For example, in the case of FIG. 9, to find the titles of works written by john, separate search expressions must be written and issued for paper and for article, as indicated by 33 and 34 in FIG. 9, because the XML database treats the documents defined by paper and by article as different.
A further problem is that this cost grows as the number of documents written under similar but different DTDs increases.
[0010]
An object of the present invention is to eliminate the above problems of the prior art and to provide an XML document search apparatus with which a user can efficiently search XML databases whose DTDs are structurally different but semantically similar, without being aware of the differences between the DTDs.
[0011]
[Means for Solving the Problems]
In order to achieve the above object, the present invention provides an XML document search apparatus for retrieving a desired document from XML documents, comprising: means for extracting the element name of a tag from an input search expression; means for extracting a synonym of the extracted element name; means for comparing the synonym with a category index corresponding to the tag definition (DTD) of an XML database and obtaining from the category index the element name of a tag that matches the synonym; and means for creating an output search expression using the element name of the tag obtained from the category index. The invention is characterized in that the XML database is searched using the output search expression.
[0012]
According to this invention, the input search expression is automatically converted, on the basis of synonyms of the tag element names written in it, into an output search expression whose element names correspond to the tag definitions of documents that actually exist in the XML database. The database client therefore does not need to know the DTD of the document type to be searched, the search procedure becomes simpler, and the search range can be widened.
[0013]
[Embodiments of the invention]
Hereinafter, the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing a configuration of an embodiment of an XML document search system of the present invention.
As shown in FIG. 1, the XML document search system includes an XML document search device 1, an XML database 2, and a database client 3.
[0014]
The XML document search apparatus 1 comprises an input analysis unit 11 that receives and analyzes input from the outside; a category analogy unit 12 that receives a set of elements and outputs a category name characterizing that set; a category index management unit 14 that manages a category index 13 corresponding to the DTD information of the XML database 2; a synonym extraction unit 15 that outputs a plurality of synonyms for a given keyword; an output synthesis unit 16 that sends the processing results of the search apparatus 1 to the outside; and a central control unit 17 that controls all of these units.
[0015]
The configuration of the XML document search apparatus 1 is described in more detail below. The input analysis unit 11 accepts database operation requests from the database client 3 and extracts the parameters of each request. It also accepts responses from the XML database 2. The category analogy unit 12 receives from the central control unit 17 a set of elements belonging to the same element name, infers category names that characterize the set, and sends the one with the highest reliability to the central control unit 17. The category index management unit 14 manages the category index 13, which corresponds to the DTD information of the XML database 2.
[0016]
The category index 13 uses as its index key a “category name” that characterizes the set of elements corresponding to a given tag of a DTD, and holds the corresponding actual DTD as the value. The “category name” is derived from the actual values in the XML database 2 by inference using a thesaurus.
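One possible in-memory layout for such a category index is sketched below. The class and method names (DtdEntry, CategoryIndex, register, lookup) are illustrative assumptions; the text specifies only that the category name serves as the key and the DTD as the value.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DtdEntry:
    """A DTD reduced to its root element name and its element names."""
    root: str
    elements: list[str]


@dataclass
class CategoryIndex:
    # category name (index key) -> DTD entries registered under that name
    by_category: dict[str, list[DtdEntry]] = field(default_factory=dict)

    def register(self, category: str, entry: DtdEntry) -> None:
        """File a DTD under a category name, creating the key if needed."""
        self.by_category.setdefault(category, []).append(entry)

    def lookup(self, category: str) -> list[DtdEntry]:
        """Return the DTD entries for a category name, or an empty list."""
        return self.by_category.get(category, [])


index = CategoryIndex()
index.register("book", DtdEntry("paper", ["paper", "title", "author", "date"]))
index.register("book", DtdEntry("article", ["article", "Title", "page", "writer"]))
print([entry.root for entry in index.lookup("book")])
```

Keeping a list of entries per key reflects the fact that several structurally different DTDs can share one category name.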
[0017]
The synonym extraction unit 15 outputs a plurality of synonyms for a given keyword. An existing thesaurus DB, such as QZS Dictionary Server, can be used for this purpose. The output synthesis unit 16 receives the parameters of the search expressions produced by the XML document search apparatus 1 in response to a database operation request from the database client 3, composes a plurality of search expressions from them, and sends these to the XML database 2. It also forwards to the database client 3 the responses from the XML database 2 that are transferred from the input analysis unit 11. The central control unit 17 receives parameters from the input analysis unit 11, performs database operation processing and category index construction and modification processing using the category analogy unit 12, the category index management unit 14, and the synonym extraction unit 15, and sends the results to the output synthesis unit 16.
[0018]
Next, the operation of the XML document search apparatus 1 having the above configuration will be described below. First, the operation performed by the central control unit 17 when the XML document retrieval apparatus 1 is first connected to the XML database 2 will be described with reference to the flowchart of FIG. 2 and the specific example of FIG. This operation is an operation for constructing the category index 13 from the values in the actual XML database 2.
[0019]
In step S1, all root element names and their corresponding types (DTDs) are acquired from the XML database 2, and a DTD registration request is issued to the category index management unit 14, which registers the DTDs in the category index 13. In the example of FIG. 3, the root element name “paper” stored in the XML database 2 and its DTD “paper, title, author, date”, the next root element name “article” and its DTD “article, Title, page, writer”, the next root element name “trip” and its DTD “destination, departure, arrival”, and so on are acquired from the XML database 2 and provisionally registered in the category index 13.
[0020]
In step S2, an arbitrary number of documents (data) are acquired from the XML database 2 for one of the root element names. In the example of FIG. 3, the documents “SAMPLE, john, 9701”, “SAMPLE2, john, 9811”, and so on, corresponding to the root element name “paper”, are acquired from the XML database 2.
[0021]
In step S3, the acquired documents are sent to the category analogy unit 12 to obtain a category name that represents them. The category analogy unit 12 infers candidate category names from the documents and sends the one with the highest reliability (cname) to the central control unit 17. In the example of FIG. 3, the category analogy unit 12 is assumed to infer the category name “book” from the documents “SAMPLE, john, 9701” and “SAMPLE2, john, 9811”.
[0022]
In step S4, a registration request for the cname is issued to the category index management unit 14. The category index management unit 14 registers and manages the cname in the category index 13 in association with the root element name. In the example of FIG. 3, “book” as cname is associated with the root element name “paper” and registered in the category index 13.
[0023]
In step S5, it is determined whether a cname has been associated with every root element name. If not, the process returns to step S2 and the above operation is repeated. In the example of FIG. 3, the documents “Flower, 101, thomas”, “Animals, 100, tom”, and “Database, 56, john” corresponding to the root element name “article” are acquired next; the category name “book”, for example, is inferred from them, and the cname “book” is registered in the category index 13 in association with the root element name “article”.
[0024]
When the above process is repeated and the determination in step S5 is affirmative, the category index construction process ends. With the above operation, for example, the category index 13 as shown in FIG. 5 is created.
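Steps S1 to S5 can be sketched as follows. The function names and the callback interface are assumptions made for illustration, and toy_infer_category is only a placeholder that returns the result assumed in the FIG. 3 example; the thesaurus-based inference itself is not reproduced here.

```python
def build_category_index(dtds, fetch_documents, infer_category):
    """Build a category index: category name -> [(root, element names), ...].

    dtds            -- root element name -> element names of its DTD (step S1)
    fetch_documents -- callable returning sample documents for a root (step S2)
    infer_category  -- callable returning the most reliable category name,
                       cname, for a list of documents (step S3)
    """
    index = {}
    # The loop ends once every root element name has a cname (step S5).
    for root, elements in dtds.items():
        documents = fetch_documents(root)                       # step S2
        cname = infer_category(documents)                       # step S3
        index.setdefault(cname, []).append((root, elements))    # step S4
    return index


# Values taken from the FIG. 3 example in the text.
SAMPLE_DTDS = {
    "paper": ["paper", "title", "author", "date"],
    "article": ["article", "Title", "page", "writer"],
}
SAMPLE_DOCS = {
    "paper": ["SAMPLE,john,9701", "SAMPLE2,john,9811"],
    "article": ["Flower,101,thomas", "Animals,100,tom", "Database,56,john"],
}


def toy_infer_category(documents):
    # Placeholder only: always answers "book", the outcome the text assumes.
    # A real category analogy unit would consult a thesaurus.
    return "book"


index = build_category_index(
    SAMPLE_DTDS, lambda root: SAMPLE_DOCS[root], toy_infer_category
)
print(index)
```

Passing the document source and the inference step in as callables keeps the loop itself independent of any particular database or thesaurus.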
[0025]
It is necessary to maintain the constructed category index by changing it with the insertion or update of the data type, or improving the accuracy of the category name as the number of stored documents increases or changes. The update of the category name is performed by the central control unit 17, the category index management unit 14, and the category analogy unit 12 triggered by a data operation or a data type operation.
[0026]
Next, the data search processing operation of the XML document search apparatus 1 will be described with reference to the flowchart of FIG. 4 and the explanatory diagram of FIG.
In step S11, it is determined whether a search expression has been input through a search operation of the database client 3. If so, the process proceeds to step S12, where a counter i is set to 1. In step S13, the root element name, the parameter element names, and their values are extracted from the search expression 21. Let x be the number of extracted parameters (the root element name plus the parameter element names).
[0027]
For example, as shown in FIG. 5, when the search expression 21 is input from the database client 3, it is sent through the input analysis unit 11 to the central control unit 17. The central control unit 17 extracts from the search expression 21 the root element name “document”, the parameter element name “author” with its value “john”, and the other element name “title”. In this case, the number of parameters is x = 3.
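The extraction of step S13 can be illustrated with the following sketch. SearchExpression is a hypothetical in-memory representation chosen for brevity; it does not parse XML-QL syntax, and the function name extract_parameters is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class SearchExpression:
    """Hypothetical in-memory form of a search expression (not XML-QL)."""
    root: str          # root element name
    conditions: dict   # element name -> required value
    wanted: list       # element names whose values are requested


def extract_parameters(expr):
    """Return (root, parameter element names, values, x) as in step S13."""
    names = list(expr.conditions) + list(expr.wanted)
    x = 1 + len(names)  # the root element name plus the parameter element names
    return expr.root, names, dict(expr.conditions), x


# The input of the FIG. 5 example: title of documents whose author is john.
expr21 = SearchExpression(
    root="document", conditions={"author": "john"}, wanted=["title"]
)
root, names, values, x = extract_parameters(expr21)
print(root, names, values, x)
```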
[0028]
In step S14, the extracted root element name and parameter element names are passed to the synonym extraction unit 15 to obtain the synonyms of each. In the example of FIG. 5, the root element name “document” and the parameter element names “author” and “title” are passed to the synonym extraction unit 15, which returns the synonyms of the root element name and of each parameter element name to the central control unit 17. A commercially available thesaurus DB 23 can be used as the synonym extraction unit 15.
[0029]
In step S15, the synonyms of the root element name, for example the synonyms of “document” such as book, paper, Paper, Document, and article, are sent through the category index management unit 14 to the category index 13, and the root element names and DTDs whose category name is one of these synonyms are acquired from the category index 13. In the example of FIG. 5, the root element names “paper” and “article”, which correspond to the category name “book”, are acquired from the category index 13 together with the DTD of each root element name.
[0030]
In step S16, it is determined whether any of the synonyms of the root element name exists in the category index. If not, the process ends. If so, the process proceeds to step S17, where the number of root element names acquired from the category index is taken as k, the DTD of the i-th root element name is acquired, and the element names in that DTD that match the synonyms are selected. The number of selected element names is taken as y.
[0031]
In the example of FIG. 5, the DTD “paper, title, author, date” of the root element name “paper” is acquired, and the element names that match the synonyms of the parameters below the root element name, “author, writer, Author, ..., Title, title, Theme, ...”, are selected from the DTD. In this example “paper, title, author” match, so “paper, title, author” is selected.
[0032]
In step S18, it is determined whether the number y of matched element names equals the number x of parameters extracted from the search expression. If it does, the process proceeds to step S19 and one output search expression is created. In the example of FIG. 5, one output search expression is created using “paper, title, author”.
[0033]
In step S20, it is determined whether i ≧ k holds. If this determination is negative, or if the determination in step S18 is negative, the process proceeds to step S21 and i is incremented by 1. The process then returns to step S17, where the DTD of the next root element name (“article” in the example of FIG. 5) is acquired and the element names in that DTD that match the synonyms are selected. In this example, “article, writer, Title” is selected. This operation is repeated until the determination in step S20 becomes affirmative, whereupon the process proceeds to step S22 and the output synthesis unit 16 composes the output search expressions. In the example of FIG. 5, this yields the output search expressions 22a and 22b.
[0034]
In step S23, the search expressions 22a and 22b are sent to the XML database 2. In step S24, responses from the XML database 2 are collected and sent to the output synthesis unit 16 via the input analysis unit 11, and in step S25, the collection results are sent from the output synthesis unit 16 to the database client 3.
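The conversion performed in steps S13 to S22 can be sketched end to end as follows. The toy thesaurus and category index reproduce the values of the FIG. 5 example, while the dictionary layout of the output expressions, the grouping of synonyms per keyword, and the function name rewrite are assumptions made for illustration.

```python
# Toy thesaurus: keyword -> synonyms (values from the FIG. 5 example).
THESAURUS = {
    "document": {"book", "paper", "Paper", "Document", "article"},
    "author": {"author", "writer", "Author"},
    "title": {"Title", "title", "Theme"},
}

# Category index: category name -> [(root element name, DTD element names)].
CATEGORY_INDEX = {
    "book": [
        ("paper", ["paper", "title", "author", "date"]),
        ("article", ["article", "Title", "page", "writer"]),
    ],
}


def rewrite(root, condition, wanted):
    """Convert one input expression into one output expression per DTD
    whose element names cover every name used in the input."""
    cond_name, cond_value = condition
    names = [root, cond_name, wanted]                       # step S13
    x = len(names)
    synonyms = {n: THESAURUS.get(n, set()) | {n} for n in names}   # step S14

    # Step S15: DTDs filed under a category name that is a root synonym.
    candidates = [
        entry
        for category, entries in CATEGORY_INDEX.items()
        if category in synonyms[root]
        for entry in entries
    ]

    outputs = []
    for _dtd_root, elements in candidates:                  # steps S17 to S21
        mapping = {}
        for name in names:
            hits = [e for e in elements if e in synonyms[name]]
            if hits:
                mapping[name] = hits[0]
        y = len(mapping)
        if y == x:                                          # step S18
            outputs.append({                                # step S19
                "root": mapping[root],
                "where": {mapping[cond_name]: cond_value},
                "select": mapping[wanted],
            })
    return outputs                                          # step S22


expressions = rewrite("document", ("author", "john"), "title")
for expression in expressions:
    print(expression)
```

With the example data this produces one expression for paper (author, title) and one for article (writer, Title), corresponding to the two output search expressions of the example. An input whose root has no matching category yields an empty list, which corresponds to the negative branch of step S16.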
[0035]
As described above, according to the above-described embodiment, the user can efficiently search the XML database without being aware of the difference or arrangement of the DTD element names.
[0036]
Next, a second embodiment of the present invention will be described with reference to FIGS. 6 and 7. FIG. 6 is an explanatory diagram of the operation of constructing the category index 13. In this embodiment, the category analogy unit 12 shown in FIG. 3 is not used; instead, an arbitrary number or all of the root element names stored in the XML database 2, together with their corresponding DTDs, are acquired from the XML database 2 and registered in the category index 13. With this method, root element names and DTDs with contents such as those shown in FIG. 7 are registered as the category index 13.
[0037]
Next, the data search processing operation of the XML document search apparatus 1 will be described with reference to FIG. 7. The operation of this embodiment differs from that of FIG. 5 in that the central control unit 17 searches the root element names of the category index 13 using the synonyms of the root element name acquired from the synonym extraction unit 15. In all other respects it is the same as in FIG. 5.
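A minimal sketch of this variant follows, assuming the index is a plain mapping from root element name to the element names of its DTD. The name candidate_roots and the literal synonym set are illustrative; the synonyms are those listed for “document” in the first embodiment's example.

```python
# Second-embodiment index: root element name -> element names of its DTD.
# No inferred category name is stored.
ROOT_INDEX = {
    "paper": ["paper", "title", "author", "date"],
    "article": ["article", "Title", "page", "writer"],
    "trip": ["destination", "departure", "arrival"],
}


def candidate_roots(root_synonyms, root_index):
    """Return the registered root element names found among the synonyms."""
    return [root for root in root_index if root in root_synonyms]


document_synonyms = {"book", "paper", "Paper", "Document", "article"}
roots = candidate_roots(document_synonyms, ROOT_INDEX)
print(roots)  # ['paper', 'article']
```

Because the lookup relies on the root element names themselves rather than on an inferred category, a DTD whose root name is not among the thesaurus synonyms would be missed, which is consistent with the lower accuracy noted for this embodiment.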
[0038]
According to this embodiment, the accuracy of XML database search is expected to be slightly lower than in the first embodiment, but the category index 13 can be constructed with a simple configuration and at low cost.
[0039]
[Effects of the invention]
As is apparent from the above description, according to the present invention, tag element names are extracted from the input search expression and converted, on the basis of their synonyms, into tag element names stored in the XML database, and an output search expression is created from them. The user therefore does not need to know in advance the DTD of the document type in the XML database to be searched, and can write a search expression easily. As a result, the user can search efficiently and can obtain search results with good accuracy.
[0040]
Further, the category index is automatically updated whenever documents in the XML database are added, changed, or deleted, so it can be kept in the best state without any maintenance work.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing an operation of category index construction according to the first embodiment of this invention.
FIG. 3 is an operation explanatory diagram of category index construction according to the first embodiment.
FIG. 4 is a flowchart showing the data search processing operation of the XML document search apparatus according to the first embodiment of the present invention.
FIG. 5 is an operation explanatory diagram of data search processing of the XML document search apparatus of the first embodiment.
FIG. 6 is an operation explanatory diagram of category index construction according to the second embodiment of this invention.
FIG. 7 is an operation explanatory diagram of a data search process of the XML document search device according to the second embodiment of the present invention.
FIG. 8 is an explanatory diagram of an example of a DTD, an XML document, and a search expression.
FIG. 9 is an explanatory diagram of a conventional XML document search method.
[Explanation of symbols]
1 ... XML document search apparatus, 2 ... XML database, 3 ... database client, 11 ... input analysis unit, 12 ... category analogy unit, 13 ... category index, 14 ... category index management unit, 15 ... synonym extraction unit, 16 ... output synthesis unit, 21 ... input search expression, 22a, 22b ... output search expressions.

Claims

An XML document search apparatus for retrieving a desired document from a plurality of XML documents, comprising:
means for extracting the element name of a tag from an input search expression;
means for extracting a synonym of the extracted element name;
means for comparing the synonym with a category index corresponding to a tag definition (DTD) of an XML database, and obtaining from the category index the element name of a tag that matches the synonym; and
means for creating an output search expression using the element name of the tag obtained from the category index,
wherein the XML database is searched using the output search expression.

The XML document search apparatus according to claim 1,
wherein the input search expression has a root element name, and the element name of a tag that matches a synonym of the root element name is obtained from the category index.

The XML document search apparatus according to claim 1 or 2,
wherein the category index includes a category name and the element names of tags positioned below the category name, a synonym of the root element name of the input search expression is compared with the category name of the category index, and, when the two match, a synonym of an element name subordinate to the root element name is further compared with the element names of the tags associated with that category name in the category index.

The XML document search apparatus according to claim 3,
wherein the category name of the category index is determined by inference based on a plurality of documents stored in the XML database.

The XML document search apparatus according to any one of claims 1 to 4,
wherein the category index is updated as the contents of the XML database change.