JP2000250938A

JP2000250938A - Retrieval device for XML document

Info

Publication number: JP2000250938A
Application number: JP11054960A
Authority: JP
Inventors: Toshihiro Ono; 智弘小野; Satoshi Nishiyama; 智西山; Sadao Obana; 貞夫小花
Original assignee: KDD Corp
Current assignee: KDDI Corp
Priority date: 1999-03-03
Filing date: 1999-03-03
Publication date: 2000-09-14
Anticipated expiration: 2019-03-03
Also published as: JP3765459B2

Abstract

PROBLEM TO BE SOLVED: To enable a user to efficiently search an XML (extensible markup language) database containing DTDs (document type definitions) that differ structurally but are semantically similar, without being aware of the differences between the DTDs. SOLUTION: An input analysis unit 11 extracts the element names of an input search expression produced by a database client 3. A central control unit 17 acquires synonyms of the extracted element names via a synonym extraction unit 15, compares these synonyms with the element names stored in a category inference unit 12, and selects the element names that match. The central control unit 17 then produces an output search expression from the selected element names and searches an XML database 2 with it. The search result from the database 2 is returned to the client 3 via the units 11 and 17 and an output synthesis unit 16.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to an XML document retrieval device, and more particularly to an XML document retrieval device that can retrieve desired data from an XML (eXtensible Markup Language) database even when the user does not know the tag definitions of the document type (DTD: Document Type Definition) of the documents to be searched.

[0002]

[Description of the Related Art] In recent years, XML has attracted attention as a language for describing and exchanging documents on the Internet and on intranets. Unlike HTML, XML uses tags for describing structured documents, which makes it possible to describe and manage a document in units of fine-grained elements rather than as a single block. To date, several databases for storing and retrieving documents written in XML have been announced; one example is the product named eXcelon from Object Design.

[0003] In XML documents, users can freely define and use tags. It is therefore likely that documents will be written not with a DTD shared by all users but with DTDs that each information provider has defined or extended independently. As a result, XML documents whose DTDs differ in structure but are semantically similar will be scattered across the Internet and intranets.

[0004] FIG. 6 shows an example of two DTDs that differ in structure but are semantically similar, together with XML documents and search expressions based on them. FIG. 6(a) is a DTD that defines the tags paper, title, author, and date (the name of a tag is called an element name) and indicates that paper contains the other three. FIG. 6(a') is a DTD that defines the tags article, Title, page, and writer and indicates that article contains the other three.

[0005] FIG. 6(b) shows an XML document. It conforms to the DTD whose starting point (root element name) is paper, and the values of its elements are SAMPLE TITLE, john, and 1103. FIG. 6(b') shows a document that conforms to the DTD whose starting point is article, with the element values SAMPLE TITLE, 123, and john.

[0006] FIG. 6(c) is a search expression written in XML-QL. It retrieves the value of title from XML documents whose root element name is paper and whose author value is john. FIG. 6(c') retrieves the value of Title from XML documents whose root element name is article and whose writer value is john.
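
The need for one query per document type can be illustrated with a short Python sketch. It is not XML-QL: it uses the standard library's xml.etree.ElementTree, and the document strings and the select helper are hypothetical, built only from the element names and values quoted in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents modelled on the element names and values in the text.
PAPER_DOC = ("<paper><title>SAMPLE TITLE</title>"
             "<author>john</author><date>1103</date></paper>")
ARTICLE_DOC = ("<article><Title>SAMPLE TITLE</Title>"
               "<page>123</page><writer>john</writer></article>")


def select(doc_text, root, where_elem, where_value, want_elem):
    """Return the text of want_elem if the document has the given root element
    and its where_elem child equals where_value; otherwise return None."""
    node = ET.fromstring(doc_text)
    if node.tag != root:
        return None
    if node.findtext(where_elem) != where_value:
        return None
    return node.findtext(want_elem)


docs = [PAPER_DOC, ARTICLE_DOC]
# Each document type needs its own query with its own element names.
paper_hits = [select(d, "paper", "author", "john", "title") for d in docs]
article_hits = [select(d, "article", "writer", "john", "Title") for d in docs]
```

Each query finds only the document written against its own DTD, which is why a client that knows only one set of element names misses the other document.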

[0007] FIG. 9 illustrates an example of document retrieval using such an XML database. The process configuration consists of a database client 31, which accepts a user's search request and sends the request to the database in a database operation language, and an XML database 32, which stores XML documents and exposes operations in the database operation language to the outside. In this conventional configuration, when a user or an application program wants to obtain all or part of a document from the database, the user must understand the DTD of every type in which the target document might exist (for example, the paper type and the article type) and must issue a separate search operation for each type, as shown by 33 and 34 in the figure.

[0008]

[Problems to Be Solved by the Invention] As described above, XML databases whose DTDs differ in structure but are semantically similar are scattered across the Internet and intranets. Consequently, when a user or an application program tries to retrieve XML documents from such a database, a separate search expression must be written for the documents of every DTD that might hold the required value, which is inefficient.

[0009] Taking FIG. 9 as an example, suppose the user wants to know the titles of works written by john. Because the XML database treats documents defined by paper and documents defined by article as different, separate search expressions must be written and submitted for paper and for article, as shown by 33 and 34 in FIG. 9. A further problem is that this cost grows as the number of documents written under similar but different DTDs increases.

[0010] An object of the present invention is to eliminate the problems of the prior art described above and to provide an XML document retrieval device with which a user can efficiently search XML databases whose DTDs differ in structure but are semantically similar, without being aware of the differences between the DTDs.

[0011]

[Means for Solving the Problems] To achieve the above object, the present invention provides an XML document retrieval device for retrieving a desired document from XML documents, comprising: means for extracting tag element names from an input search expression; means for extracting synonyms of the extracted element names; means for comparing the synonyms against a category index corresponding to the tag definitions (DTDs) of an XML database and obtaining from the category index the tag element names that match the synonyms; and means for creating an output search expression using the tag element names obtained from the category index. The device is characterized in that it searches the XML database using the output search expression.

[0012] According to the present invention, an input search expression is automatically converted, on the basis of synonyms of the tag element names written in it, into an output search expression whose element names correspond to the tag definitions of documents that actually exist in the XML database. The database client therefore does not need to know the DTDs of the document types to be searched, the search procedure becomes simpler, and the search range can be widened.

[0013]

[Embodiments of the Invention] The present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of one embodiment of the XML document retrieval system of the present invention. As shown in FIG. 1, the XML document retrieval system consists of an XML document retrieval device 1, an XML database 2, and a database client 3.

[0014] The XML document retrieval device 1 consists of: an input analysis unit 11, which accepts and analyzes input from outside; a category inference unit 12, which receives a set of elements and outputs a category name that characterizes the set; a category index management unit 14, which manages a category index 13 corresponding to the DTD information of the XML database 2; a synonym extraction unit 15, which outputs multiple synonyms of a given keyword; an output synthesis unit 16, which sends the processing results of the retrieval device 1 to the outside; and a central control unit 17, which controls all of these units.

[0015] The configuration of the XML document retrieval device 1 is now described in more detail. The input analysis unit 11 accepts database operation requests from the database client 3 and extracts the parameters of each request. It also accepts responses from the XML database 2. The category inference unit 12 receives from the central control unit 17 a set of elements that belong to the same element name, infers category names that characterize the set, and sends the one with the highest confidence to the central control unit 17. The category index management unit 14 manages the category index 13 corresponding to the DTD information of the XML database 2.

[0016] The category index 13 uses as its index key a "category name" that characterizes the set of elements corresponding to a given tag of a DTD, and holds the corresponding actual DTD as the value. The category name is derived from the actual values in the XML database 2 by inference using a thesaurus.
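
As a rough illustration of this key-to-DTD arrangement, the category index could be modelled as a mapping from category name to the DTDs filed under it. The Python sketch below is an assumption about one possible layout, not the structure used in the patent; the register helper and the tuple format are hypothetical, and the sample entries reuse the "book" category and the paper and article DTDs from the description.

```python
from collections import defaultdict

# category name -> list of (root element name, child element names) pairs
category_index = defaultdict(list)


def register(category_name, root, children):
    """File one DTD, given as a root plus its child element names, under a category."""
    category_index[category_name].append((root, tuple(children)))


register("book", "paper", ["title", "author", "date"])
register("book", "article", ["Title", "page", "writer"])

# Looking up one category name yields every structurally different DTD filed under it.
roots_for_book = [root for root, _ in category_index["book"]]
```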

[0017] The synonym extraction unit 15 outputs multiple synonyms of a given keyword. An existing thesaurus database can be used for this purpose, for example a thesaurus database such as QZS Dictionary Server. The output synthesis unit 16 receives the parameters of the search expressions that the XML document retrieval device 1 produces in response to a database operation request from the database client 3, synthesizes multiple search expressions, and sends them to the XML database 2. It also sends to the database client 3 the responses from the XML database 2 that are forwarded by the input analysis unit 11. The central control unit 17 receives parameters from the input analysis unit 11, performs database operation processing and category index construction and modification using the category inference unit 12, the category index management unit 14, and the synonym extraction unit 15, and sends the results to the output synthesis unit 16.

[0018] The operation of the XML document retrieval device 1 configured as above is described next. First, the operation that the central control unit 17 performs when the XML document retrieval device 1 is connected to the XML database 2 for the first time is described with reference to the flowchart of FIG. 2 and the concrete example of FIG. 3. This operation builds the category index 13 from the actual values in the XML database 2.

[0019] In step S1, all root element names and their corresponding types (DTDs) are obtained from the XML database 2, and a DTD registration request is issued to the category index management unit 14. The category index management unit 14 registers the DTDs in the category index 13. In the example of FIG. 3, the root element name "paper" and its DTD "paper, title, author, date", the next root element name "article" and its DTD "article, Title, page, writer", the next root element name "trip" and its DTD "destination, departure, arrival", and so on are obtained from the XML database 2 and provisionally registered in the category index 13.

[0020] In step S2, an arbitrary number of documents (data) are obtained from the XML database 2 for one of the root element names. In the example of FIG. 3, the documents "SAMPLE, john, 9701", "SAMPLE2, john, 9811", and so on, which correspond to the root element name "paper", are obtained from the XML database 2.

[0021] In step S3, the obtained documents are sent to the category inference unit 12, and a category name that represents them is obtained. The category inference unit 12 infers candidate category names from the documents and sends the one with the highest confidence (cname) to the central control unit 17. In the example of FIG. 3, the category inference unit 12 is assumed to infer the category name "book" ("本" in the original) from the documents "SAMPLE, john, 9701" and "SAMPLE2, john, 9811".

[0022] In step S4, a registration request for the cname is issued to the category index management unit 14. The category index management unit 14 registers the cname in the category index 13 in association with the root element name and manages it. In the example of FIG. 3, the cname "book" is registered in the category index 13 in association with the root element name "paper".

[0023] In step S5, it is determined whether a cname has been associated with every root element name. If not, the process returns to step S2 and the above operations are repeated. In the example of FIG. 3, the documents "Flower, 101, thomas", "Animals, 100, tom", and "Database, 56, john", which correspond to the root element name "article", are obtained next; the category name "book", for example, is inferred from them; and the cname "book" is registered in the category index 13 in association with the root element name "article".

[0024] The above processing is repeated, and when the determination in step S5 becomes affirmative, the category index construction ends. These operations produce a category index 13 such as the one shown in FIG. 5.
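
Steps S1 to S5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: XML_DATABASE is a hypothetical in-memory stand-in for the XML database 2, and infer_category is a placeholder for the category inference unit 12 whose candidate names and confidence scores are made up; the description does not specify how the inference itself is computed.

```python
# Hypothetical stand-in for the XML database 2:
# root element name -> (child element names of the DTD, sample documents).
XML_DATABASE = {
    "paper": (["title", "author", "date"],
              [("SAMPLE", "john", "9701"), ("SAMPLE2", "john", "9811")]),
    "article": (["Title", "page", "writer"],
                [("Flower", "101", "thomas"), ("Animals", "100", "tom"),
                 ("Database", "56", "john")]),
}


def infer_category(documents):
    """Placeholder for the category inference unit 12.

    Returns (category name, confidence) candidates. The values are invented
    for illustration; a real unit would consult a thesaurus."""
    return [("book", 0.9), ("report", 0.4)]


def build_category_index(database):
    index = {}
    # S1: register every root element name together with its DTD.
    for root, (dtd, _) in database.items():
        index[root] = {"dtd": dtd, "cname": None}
    # S2 to S5: repeat until every root element name has a cname.
    for root, (_, documents) in database.items():
        candidates = infer_category(documents)             # S2, S3
        cname, _ = max(candidates, key=lambda c: c[1])      # keep the most confident
        index[root]["cname"] = cname                        # S4
    return index


index = build_category_index(XML_DATABASE)
```

With the placeholder inference, both "paper" and "article" end up associated with the category name "book", matching the outcome described for FIG. 3.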

[0025] The constructed category index must be maintained: it must be changed when data types are inserted or updated, and the accuracy of its category names should be improved as the stored documents increase or change. The category names are updated by the central control unit 17, the category index management unit 14, and the category inference unit 12, triggered by data operations and data type operations.

[0026] Next, the data retrieval processing of the XML document retrieval device 1 is described with reference to the flowchart of FIG. 4 and the explanatory diagram of FIG. 5. In step S11, it is determined whether a search expression has been input through a search operation by the database client 3. If so, the process proceeds to step S12, where a counter i is set to 1. In step S13, the root element name, the parameter element names, and their values are extracted from the search expression 21. Let x be the number of extracted parameters (the root element name plus the parameter element names).

[0027] For example, as shown in FIG. 5, when the search expression 21 is input from the database client 3, it is sent through the input analysis unit 11 to the central control unit 17. From the search expression 21 the central control unit 17 extracts the root element name "document" ("文書" in the original), the parameter element name "author" ("著者") with its value "john", and the other element name "title" ("題名"). In this case the number of parameters is x = 3.

[0028] In step S14, the extracted root element name and parameter element names are passed to the synonym extraction unit 15, and synonyms of each are obtained. In the example of FIG. 5, the root element name "document" and the parameter element names "author" and "title" are passed to the synonym extraction unit 15, which returns the synonyms of the root element name and of the parameter element names to the central control unit 17. A commercially available thesaurus database 23 can be used as the synonym extraction unit 15.
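
A small Python sketch of the synonym lookup in step S14 follows. The THESAURUS table is a hypothetical stand-in for a thesaurus database, filled with the synonyms that the description lists for the FIG. 5 example, and returning the queried name itself as the first entry is an assumption made here for convenience.

```python
# Hypothetical thesaurus entries; a real system would query a thesaurus database.
THESAURUS = {
    "document": ["book", "paper", "Paper", "Document", "article"],
    "author": ["author", "writer", "Author"],
    "title": ["Title", "title", "Theme"],
}


def synonyms(element_name):
    """Return the element name itself followed by its known synonyms, without duplicates."""
    result = [element_name]
    for word in THESAURUS.get(element_name, []):
        if word not in result:
            result.append(word)
    return result


expanded = {name: synonyms(name) for name in ["document", "author", "title"]}
```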

[0029] In step S15, the synonyms of the root element name (for example book, paper, Paper, Document, and article, which are synonyms of "document") are sent through the category index management unit 14 to the category index 13, and the root element names and DTDs whose category name is one of these synonyms are obtained from the category index 13. In the example of FIG. 5, the root element names "paper" and "article", which correspond to the category name "book", are obtained from the category index 13, together with the DTD of each root element name.

[0030] In step S16, it is determined whether any of the synonyms of the root element name exists in the category index. If not, the processing ends. If so, the process proceeds to step S17: the number of root element names obtained from the category index is denoted k, the DTD of the i-th root element name is obtained, and the element names in that DTD that match the synonyms are selected. Let y be the number of element names selected.

[0031] In the example of FIG. 5, the DTD "paper, title, author, date" of the root element name "paper" is obtained, and the element names that match the synonyms of the parameters below the root element name, "author, writer, Author, ..., Title, title, Theme, ...", are selected from the DTD. In this example "paper, title, author" match, so "paper, title, author" is selected.

[0032] In step S18, it is determined whether the number of matching element names y equals the number of parameters x extracted from the search expression. If it does, the process proceeds to step S19 and one output search expression is created. In the example of FIG. 5, one output search expression is created using "paper, title, author".

[0033] In step S20, it is determined whether i ≥ k holds. If it does not, or if the determination in step S18 was negative, the process proceeds to step S21 and i is incremented by 1. The process then returns to step S17, where the DTD of the next root element name ("article" in the example of FIG. 5) is obtained and the element names in that DTD that match the synonyms are selected; in this example "article, writer, Title" is selected. These operations are repeated, and when the determination in step S20 becomes affirmative, the process proceeds to step S22, where the output synthesis unit 16 synthesizes the output search expressions. In the example of FIG. 5, this synthesis yields the output search expressions 22a and 22b.

[0034] In step S23, the search expressions 22a and 22b are sent to the XML database 2. In step S24, the responses from the XML database 2 are collected and sent through the input analysis unit 11 to the output synthesis unit 16, and in step S25 the collected results are sent from the output synthesis unit 16 to the database client 3.
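
The rewriting part of this flow (steps S13 to S22) can be sketched in Python as below. It is an illustration under assumptions, not the patented implementation: CATEGORY_INDEX and SYNONYMS are hypothetical tables filled with the FIG. 5 example values, the rewrite function and its dictionary output format are invented for this sketch, and sending the resulting expressions to a database (steps S23 to S25) is left out.

```python
# Hypothetical tables modelled on the FIG. 5 example.
CATEGORY_INDEX = {
    "book": {"paper": ["title", "author", "date"],
             "article": ["Title", "page", "writer"]},
}
SYNONYMS = {
    "document": ["book", "paper", "Paper", "Document", "article"],
    "author": ["author", "writer", "Author"],
    "title": ["Title", "title", "Theme"],
}


def rewrite(root, params):
    """Map an input query (root element name plus parameter element names) onto
    every indexed DTD in which all of the names have a synonym match."""
    x = 1 + len(params)                                # S13: number of parameters
    outputs = []
    for category in SYNONYMS.get(root, []):            # S14, S15: synonyms of the root
        for real_root, dtd in CATEGORY_INDEX.get(category, {}).items():
            mapping = {root: real_root}                # S17: select matching names
            for p in params:
                match = next((e for e in dtd if e in SYNONYMS.get(p, [])), None)
                if match is not None:
                    mapping[p] = match
            if len(mapping) == x:                      # S18: y equals x
                outputs.append(mapping)                # S19: one output expression
    return outputs                                     # S22: collected expressions


queries = rewrite("document", ["author", "title"])
```

For the example input, the sketch yields two mappings, one onto paper/author/title and one onto article/writer/Title, which correspond to the two output search expressions 22a and 22b.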

[0035] Thus, according to the embodiment described above, the user can search the XML database efficiently without being aware of differences in the names or arrangement of DTD elements.

[0036] Next, a second embodiment of the present invention is described with reference to FIGS. 6 and 7. FIG. 6 illustrates the operation of building the category index 13. In this embodiment, the category inference unit 12 shown in FIG. 3 is not used; instead, an arbitrary number, or all, of the root element names stored in the XML database 2 and their corresponding DTDs are obtained from the XML database 2 and registered in the category index 13. With this method, the root element names and DTDs shown in FIG. 7 are registered as the category index 13.

[0037] Next, the data retrieval processing of the XML document retrieval device 1 is described with reference to FIG. 7. The operation of this embodiment differs from that of FIG. 5 in that the central control unit 17 searches the root element names of the category index 13 using the synonyms of the root element name obtained from the synonym extraction unit 15. In all other respects it is the same as in FIG. 5.
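
The difference from the first embodiment can be sketched as a lookup keyed directly by root element name. In the Python sketch below, ROOT_INDEX and ROOT_SYNONYMS are hypothetical values taken from the examples in the description, and candidate_roots is an invented helper name.

```python
# Hypothetical index for the second embodiment: keyed by root element name,
# with no inferred category names.
ROOT_INDEX = {
    "paper": ["title", "author", "date"],
    "article": ["Title", "page", "writer"],
    "trip": ["destination", "departure", "arrival"],
}
# Assumed thesaurus output for the input root element name "document".
ROOT_SYNONYMS = ["book", "paper", "Paper", "Document", "article"]


def candidate_roots(synonyms, index):
    """Return the indexed root element names that appear among the synonyms."""
    return [root for root in index if root in synonyms]


roots = candidate_roots(ROOT_SYNONYMS, ROOT_INDEX)
```

Here a DTD is found only when the thesaurus happens to list its root element name, so a root name absent from the synonym list is missed even if its documents are relevant.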

[0038] With this embodiment, the accuracy of XML database retrieval is expected to be somewhat lower than in the first embodiment, but the embodiment has the advantage that the category index 13 can be built simply and at low cost.

[0039]

[Effects of the Invention] As is clear from the above description, according to the present invention, tag element names are extracted from an input search expression and converted, on the basis of their synonyms, into tag element names stored in the XML database in order to create output search expressions. The user therefore does not need to know in advance the DTDs of the document types in the XML database to be searched and can write a search expression easily. As a result, the user can search efficiently and obtain accurate search results.

[0040] In addition, the category index is updated automatically whenever documents in the XML database are added, changed, or deleted, so it can be kept in the best condition without any maintenance.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of one embodiment of the present invention.

FIG. 2 is a flowchart showing the category index construction operation of the first embodiment of the present invention.

FIG. 3 is an explanatory diagram of the category index construction operation of the first embodiment.

FIG. 4 is a flowchart showing the data retrieval processing of the XML document retrieval device of the first embodiment of the present invention.

FIG. 5 is an explanatory diagram of the data retrieval processing of the XML document retrieval device of the first embodiment.

FIG. 6 is an explanatory diagram of the category index construction operation of the second embodiment of the present invention.

FIG. 7 is an explanatory diagram of the data retrieval processing of the XML document retrieval device of the second embodiment of the present invention.

FIG. 8 is an explanatory diagram of an example of DTDs, XML documents, and search expressions.

FIG. 9 is an explanatory diagram of a conventional XML document retrieval method.

[Explanation of symbols]

1: XML document retrieval device; 2: XML database; 3: database client; 11: input analysis unit; 12: category inference unit; 13: category index; 14: category index management unit; 15: synonym extraction unit; 16: output synthesis unit; 21: input search expression; 22a, 22b: output search expressions.

Continued from the front page. (72) Inventor: Sadao Obana, 2-1-15 Ohara, Kamifukuoka-shi, Saitama, c/o KDD Research Laboratories Inc. (株式会社ケイディディ研究所). F-terms (reference): 5B075 KK02 KK07 KK13 KK37 KK39 ND03 ND35 NK02 NK32 NK35 PP23 PP25 PP26 PR06 QM07 QP03 UU06

Claims

[Claims]

1. An XML document retrieval device for retrieving a desired document from a plurality of XML documents, comprising: means for extracting tag element names from an input search expression; means for extracting synonyms of the extracted element names; means for comparing the synonyms against a category index corresponding to the tag definitions (DTDs) of an XML database and obtaining from the category index the tag element names that match the synonyms; and means for creating an output search expression using the tag element names obtained from the category index, wherein the XML database is searched using the output search expression.

2. The XML document retrieval device according to claim 1, wherein the input search expression has a root element name, and tag element names that match synonyms of the root element name are obtained from the category index.

3. The XML document retrieval device according to claim 1, wherein the category index consists of category names and the tag element names positioned below each category name; synonyms of the root element name of the input search expression are compared with the category names of the category index; and, when they match, synonyms of the element names below the root element name are further compared with the tag element names of the category index that are associated with the matching category name.

4. The XML document retrieval device according to claim 3, wherein the category names of the category index are determined by inference from the tag data stored in the XML database.

5. The XML document retrieval device according to claim 1, wherein the category index is updated in response to changes in the content of the XML database.