JP3287307B2

JP3287307B2 - Structured document search system, structured document search method, and recording medium storing structured document search program

Info

Publication number: JP3287307B2
Application number: JP17318498A
Authority: JP
Inventors: みさ波内
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-06-19
Filing date: 1998-06-19
Publication date: 2002-06-04
Anticipated expiration: 2018-06-19
Also published as: JP2000010988A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to structured document search technology. More particularly, it relates to a structured document search system and a structured document search method that can process compound queries whose search conditions include both the document structure of a structured document and character strings contained in the document.

[0002]

2. Description of the Related Art

Search systems already exist that decompose the document elements making up a structured document and store them in a database. Each element is stored together with its text and with the relationships between document elements (referred to as "document structure information"). As a result, both the document structure and the character strings it contains can be specified as search conditions.

[0003] One example of this type of conventional structured document search system is the paper by R. Sacks-Davis et al. entitled "Database Systems for Structured Documents" (referred to as "Reference 1"). It was published in 1994 in the Proceedings of the 1st International Conference on the Application of Database Technologies and their Integration (ADTI'94), pp. 272-283.

[0004] Relational databases and object-oriented databases are both used to store document elements. An object-oriented database has a particular advantage: the relationships between document elements can be represented directly as relationships between data items in the database. This makes it possible to traverse the document structure from the beginning of the document and find the document elements that contain the character string specified in a search condition. It also makes it possible to detect a specific structure, for example "a <paper> document element containing five or more <author> document elements". This example is described in Reference 1 above.
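The structural query quoted above can be sketched as follows. This is not code from Reference 1 or from this patent. It uses Python's standard xml.etree.ElementTree as a stand-in for an object-oriented database in which parent-child links are stored directly. It also assumes that "containing" means "having as direct children". The function name papers_with_many_authors is hypothetical.

```python
import xml.etree.ElementTree as ET


def papers_with_many_authors(root: ET.Element, min_authors: int = 5) -> list[ET.Element]:
    """Return <paper> elements that have at least `min_authors` direct <author> children.

    Element.iter() walks the tree depth-first in document order, which mirrors
    following the document structure from the beginning of the document.
    """
    hits = []
    for paper in root.iter("paper"):
        # findall("author") matches direct children only (assumed reading of "contains").
        if len(paper.findall("author")) >= min_authors:
            hits.append(paper)
    return hits


doc = ET.fromstring(
    "<papers>"
    "<paper id='p1'>" + "<author/>" * 5 + "</paper>"
    "<paper id='p2'><author/><author/></paper>"
    "</papers>"
)
print([p.get("id") for p in papers_with_many_authors(doc)])  # ['p1']
```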

[0005] Two methods have conventionally been known for searching a database in which document elements are decomposed and managed.

[0006] The first method performs a conditional search as follows:

- it starts from the document element that appears first in the original document;
- it visits the child elements of each document element in turn, following the containment relationships;
- it evaluates the condition at each element it visits.

Here, this method is referred to as the "top-down method".

[0007] In the second method, the database is prepared in advance with sets of document elements, one set for each document element name (tag name). A search then looks within these classified sets for the elements that contain character strings matching the condition. This method is referred to as the "bottom-up method".
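The sketch below illustrates the two conventional methods. It assumes an XML document parsed with Python's standard xml.etree.ElementTree in place of a database. The function names are hypothetical. The sample document is also hypothetical; it only borrows the recipe element names used in the example later in this description. Each function reports how many elements it examined, which makes the efficiency difference discussed in [0011] and [0012] visible.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def top_down_search(root: ET.Element, tag: str, keyword: str):
    """Top-down: start at the first element and visit children in document order.

    Returns (matching elements, number of elements visited).
    """
    hits, visited = [], 0
    stack = [root]
    while stack:
        elem = stack.pop()
        visited += 1
        if elem.tag == tag and keyword in (elem.text or ""):
            hits.append(elem)
        stack.extend(reversed(list(elem)))  # reversed so children pop in document order
    return hits, visited


def build_tag_sets(root: ET.Element) -> dict:
    """Bottom-up preparation: classify every element by tag name in advance."""
    sets = defaultdict(list)
    for elem in root.iter():
        sets[elem.tag].append(elem)
    return sets


def bottom_up_search(tag_sets: dict, tag: str, keyword: str):
    """Bottom-up: examine only the pre-built set for `tag`.

    Returns (matching elements, number of elements examined).
    """
    candidates = tag_sets.get(tag, [])
    hits = [e for e in candidates if keyword in (e.text or "")]
    return hits, len(candidates)


root = ET.fromstring(
    "<recipe><title>Curry</title>"
    "<step><item>onion</item><exp>chop the onion</exp></step>"
    "<step><item>rice</item><exp>boil the rice</exp></step></recipe>"
)
td_hits, visited = top_down_search(root, "exp", "onion")
bu_hits, examined = bottom_up_search(build_tag_sets(root), "exp", "onion")
print([e.text for e in td_hits], visited)   # ['chop the onion'] 8
print([e.text for e in bu_hits], examined)  # ['chop the onion'] 2
```

The bottom-up sets hold bare elements with no link back to their parents, because ElementTree elements do not record their parent. This mirrors the loss of document structure information described in [0013].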

[0008] Dedicated document search systems are also known. These do not use a database; instead, they achieve fast document search by incorporating a general full-text search engine.

[0009] One example of combining an object-oriented database with a document search system is the paper by T. W. Yan et al. entitled "Integrating a Structured-Text Retrieval System with an Object-Oriented Database System" (referred to as "Reference 2"). It was published in 1994 in the Proceedings of the 20th VLDB Conference, pp. 740-749.

【００１０】[0010]

Problems to Be Solved by the Invention

In the top-down method, the containment (parent-child) relationships between document elements are reflected in the data structure in the database. When the search conditions include a condition on the document structure, the document structure information is therefore easy to obtain and process. However, the top-down method suffers from poor search efficiency.

[0011] The reason is that conditions on document elements are evaluated in structural order, starting from the first document element of the document. The set of search targets therefore cannot be narrowed down by document element name or similar criteria, and unnecessary document elements are likely to be visited.

[0012] In the bottom-up method, by contrast, a set is generated in advance for each document element name. The search is therefore effectively pre-narrowed by the document element names in the search conditions, and document elements with other names are never evaluated. Search efficiency is good, but conditions based on the document structure are difficult to evaluate.

[0013] The reason is that the document elements are stored as classified sets independent of the document structure, so the information about the document structure is lost.

[0014] A search method using a full-text search engine can find arbitrary character strings quickly. As with the bottom-up method, however, the structural information is lost. A full-text search alone therefore cannot answer a query that includes conditions on the document structure.

[0015] Another approach combines a database storing the document structure information with a full-text search engine. In such a system, the database accesses needed to look up structural information become a bottleneck and degrade performance.

[0016] The present invention has been made in view of the above problems. One object is to provide a structured document search system and method that can answer queries using the document structure of structured documents as a search condition.

[0017] Another object of the present invention is to provide a structured document search system and method that can quickly answer queries whose search condition is that a document contains an arbitrary character string (for example, a matching condition on a character string contained in the document).

[0018] Still another object of the present invention is to provide a structured document search system and method that can quickly answer compound queries. Such queries include both a search condition on the document structure of structured documents and a matching condition (search condition) on character strings contained in the documents.

【００１９】[0019]

Means for Solving the Problems

To achieve the above objects, the structured document search system of the present invention comprises:

structured document registration means that receives a structured document input from input means and stores the full text of the structured document in a text storage unit;

document element decomposition/storage means that decomposes the text of the input structured document into document elements, assigns each a unique ID (identification number), and stores the text corresponding to each document element, together with the relationships between document elements, in a document parts storage unit;

full-text search means comprising:
- full-text index generation means, which generates a full-text index using the input structured document and its constituent document elements as units. The ID of each document element, the ID of the parent document element containing it, and the ID of the original document are first added as text to the document element's text, and the index is generated over the text in that state;
- full-text search execution means, which searches on text and document structure using the full-text index generated by the full-text index generation means;

query analysis means that receives a search request for structured documents input from the input means and analyzes the query conditions;

structure condition determination means that evaluates only conditions relating to structure, based on the document structure information stored in the document parts storage unit;

query execution means that, according to the analysis by the query analysis means, uses the full-text search means or the structure condition determination means to search for documents or document elements that match the search conditions; and

document element retrieval means that takes the document element IDs obtained by the full-text search means or the structure condition determination means, retrieves the information on the corresponding document elements from the document parts storage unit, retrieves the text of the original structured document from the text storage unit using the original document ID, and causes output means to display it.

When a compound query consisting of a plurality of conditions is given, the query execution means searches for each condition one at a time. For the second and subsequent conditions, it generates a new query condition that embeds the results of the previously executed search, thereby narrowing down the set of search targets.

The system further comprises document element condition determination means. This means takes as input the ID of a document element and a search condition relating to that element, for example a condition on its parent or child document elements with the element as the starting point. It searches the document structure in the document parts storage unit and determines whether the document element satisfies the condition.

【００２０】[0020]

【００２１】[0021]

【００２２】[0022]

【００２３】[0023]

Embodiments of the Invention

Embodiments of the present invention are described below. The essential parts of a preferred embodiment are outlined first.

[0024] In a first preferred embodiment, the structured document search system of the present invention comprises two components:

- full-text index generation means (132 in FIG. 1), which adds to the text of each document element its document element ID, the ID of the parent document element containing it, and the ID of the original structured document, all as text, and generates the full-text index in this form;
- document element decomposition/storage means (112 in FIG. 1), which stores the document structure information.

When a query on structured documents includes a character string matching condition together with a condition on the document structure, the full-text search execution means (131 in FIG. 1) performs the search. When the query includes only conditions on the document structure, the structure condition determination means (116 in FIG. 1) performs the search using only the information in the document parts storage unit (122 in FIG. 1).

[0025] In a second preferred embodiment, the structured document search system of the present invention comprises document element condition determination means (218 in FIG. 10). Starting from a given document element, this means determines whether each of a plurality of query conditions holds for all document elements related to it.

Suppose the search for some of the conditions in a compound query returns a sufficiently small number of results. In that case, the system takes the document element IDs obtained from that search and the information in the document parts storage unit (122 in FIG. 10). Using them, it evaluates all remaining unprocessed conditions at once, over all document elements related to the elements found by the search.
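The behaviour described for the document element condition determination means 218 can be sketched as follows. The sketch rests on assumptions that the patent does not specify:

- the document parts store is an in-memory dict of Element records;
- each remaining condition is a hypothetical (relation, tag, keyword) tuple, where relation is "parent" or "child" relative to the starting element;
- "sufficiently small" is a hypothetical fixed threshold, SMALL_ENOUGH.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    eid: str
    tag: str
    text: str = ""
    parent: Optional[str] = None
    children: list[str] = field(default_factory=list)


# Hypothetical condition form: (relation, tag, keyword), relative to the start element.
Condition = tuple[str, str, str]

SMALL_ENOUGH = 10  # hypothetical threshold for a "sufficiently small" partial result


def satisfies(start: Element, cond: Condition, store: dict[str, Element]) -> bool:
    """Check one condition by following parent/child links in the parts store."""
    relation, tag, keyword = cond
    if relation == "parent":
        related = [store[start.parent]] if start.parent else []
    elif relation == "child":
        related = [store[c] for c in start.children]
    else:
        raise ValueError(f"unknown relation: {relation}")
    return any(e.tag == tag and keyword in e.text for e in related)


def filter_small_result(ids: list[str], remaining: list[Condition],
                        store: dict[str, Element]) -> Optional[list[str]]:
    """Evaluate all remaining conditions at once for each candidate element.

    Returns None when the partial result is too large, signalling that the
    caller should continue with condition-by-condition full-text search.
    """
    if len(ids) > SMALL_ENOUGH:
        return None
    return [i for i in ids if all(satisfies(store[i], c, store) for c in remaining)]


store = {
    "r1": Element("r1", "recipe", children=["s1", "s2"]),
    "s1": Element("s1", "step", parent="r1", children=["i1", "x1"]),
    "i1": Element("i1", "item", "onion", parent="s1"),
    "x1": Element("x1", "exp", "chop the onion", parent="s1"),
    "s2": Element("s2", "step", parent="r1", children=["i2", "x2"]),
    "i2": Element("i2", "item", "rice", parent="s2"),
    "x2": Element("x2", "exp", "boil the rice", parent="s2"),
}
# An earlier search returned both <step> elements; keep those that have an
# <item> child mentioning "onion" and a <recipe> parent.
print(filter_small_result(["s1", "s2"], [("child", "item", "onion"), ("parent", "recipe", "")], store))  # ['s1']
```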

[0026] Embodiments of the present invention are described in detail below with reference to the drawings.

[0027] [First Embodiment] FIG. 1 shows the configuration of the first embodiment of the present invention. Referring to FIG. 1, the first embodiment comprises:

- a data processing device 100 that operates under program control;
- an input device 150 such as a keyboard;
- a storage device 120;
- an output device 140.

[0028] The storage device 120 includes a text storage unit 121, a document parts storage unit 122, and a full-text index storage unit 123.

[0029] The data processing device 100 includes structured document management means 110 and full-text search means 130.

[0030] The structured document management means 110 includes structured document registration means 111, document element decomposition/storage means 112, query processing means 113, and document (element) retrieval means 117.

[0031] The query processing means 113 includes query analysis means 114, query execution means 115, and structure condition determination means 116.

[0032] The full-text search means 130 includes full-text search execution means 131 and full-text index generation means 132.

[0033] These means operate as follows.

[0034] The structured document management means 110 stores and manages structured documents and their structure information in the storage device 120. It also processes queries (search requests) on the stored structured documents and returns the documents that match the conditions to the output device 140.

[0035] The structured document registration means 111 receives a structured document input from the input device 150 and stores its entire text in the text storage unit 121. It then passes the entire text to the document element decomposition/storage means 112 so that the structure information can be extracted and managed.

[0036] The document element decomposition/storage means 112 decomposes the text of the structured document into document elements. It stores the text corresponding to each document element, together with the relationships between document elements, in the document parts storage unit 122 of the storage device 120. Each stored piece of document element information is assigned a unique ID (identification information).
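The decomposition step can be sketched as follows. Several assumptions apply:

- well-formed XML, parsed with Python's standard xml.etree.ElementTree, stands in for SGML (SGML features such as omitted tags are not handled);
- the document parts store is a plain dict;
- element IDs are generated in the hypothetical format "<doc>-E<n>".

Only each element's own leading text is recorded.

```python
import itertools
import xml.etree.ElementTree as ET


def decompose(doc_id: str, source: str) -> dict:
    """Split a document into elements, give each a unique ID, and record parent/child links.

    Returns a hypothetical document parts store: element ID -> record.
    """
    counter = itertools.count(1)
    store: dict = {}

    def visit(elem: ET.Element, parent_id):
        eid = f"{doc_id}-E{next(counter)}"  # unique within this document
        store[eid] = {
            "tag": elem.tag,
            "text": (elem.text or "").strip(),  # element's own leading text only
            "parent": parent_id,
            "children": [],
            "doc": doc_id,
        }
        if parent_id is not None:
            store[parent_id]["children"].append(eid)
        for child in elem:
            visit(child, eid)

    visit(ET.fromstring(source), None)
    return store


parts = decompose("D1", "<recipe><title>Curry</title><step><item>onion</item></step></recipe>")
for eid, rec in parts.items():
    print(eid, rec["tag"], rec["parent"], rec["children"])
# D1-E1 recipe None ['D1-E2', 'D1-E3']
# D1-E2 title D1-E1 []
# D1-E3 step D1-E1 ['D1-E4']
# D1-E4 item D1-E3 []
```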

[0037] The query processing means 113 receives a search request for structured documents input from the input device 150, and its query analysis means 114 analyzes the query conditions. The query execution means 115 then uses the full-text search means 130 and the structure condition determination means 116 to search for documents or document elements that match the search conditions.

[0038] When a compound query consisting of a plurality of conditions is given, the query execution means 115 searches for each individual condition in turn. For the second and subsequent conditions, it generates a new query condition that narrows down the set of search targets using the results of the previously executed search.

[0039] The structure condition determination means 116 evaluates only conditions relating to structure, based on the document structure information stored in the document parts storage unit 122.

[0040] The document (element) retrieval means 117 takes the document IDs or document element IDs returned from the document parts storage unit 122 or the full-text search execution means 131. It retrieves the corresponding text and sends it to the output device 140.

[0041] The full-text search means 130 generates the full-text search index and uses full-text search to find documents containing arbitrary character strings at high speed.

[0042] The full-text search execution means 131 uses the full-text index stored in the full-text index storage unit 123 to search for arbitrary character strings and combinations of them.

[0043] The full-text index generation means 132 generates the full-text index using structured documents and their constituent document elements as units. Three IDs are first added as text to each document element's text: the ID of the document element, the ID of the parent document element containing it, and the ID of the original document. The full-text index is then generated over the text in that state.
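The effect of adding the IDs as text can be sketched as follows. The full-text engine is replaced by a simple in-memory inverted index. The "eid:", "pid:" and "did:" token prefixes, the tokenizer, and the function names are hypothetical; the patent only states that the IDs are added as text. Once the parent ID is an ordinary indexed word, a parent-child condition becomes one more keyword in an AND search.

```python
import re
from collections import defaultdict


def indexed_text(eid, parent_id, doc_id, text):
    """Append element ID, parent element ID and document ID to the text as ordinary words.

    The "eid:", "pid:" and "did:" prefixes are a hypothetical token format.
    """
    return f"{text} eid:{eid} pid:{parent_id or 'none'} did:{doc_id}"


def tokenize(s):
    # Words, plus ID tokens such as "pid:d1-e3"; everything is lower-cased.
    return re.findall(r"[\w:-]+", s.lower())


def build_index(elements):
    """elements: iterable of (element ID, parent ID, document ID, text).

    Returns an inverted index: token -> set of element IDs.
    """
    index = defaultdict(set)
    for eid, pid, did, text in elements:
        for tok in tokenize(indexed_text(eid, pid, did, text)):
            index[tok].add(eid)
    return index


def search_all(index, *terms):
    """AND search: IDs of elements whose indexed text contains every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


index = build_index([
    ("D1-E1", None, "D1", ""),                 # <recipe>
    ("D1-E2", "D1-E1", "D1", "Curry"),         # <title>
    ("D1-E3", "D1-E1", "D1", ""),              # first <step>
    ("D1-E4", "D1-E3", "D1", "chop the onion"),
    ("D1-E5", "D1-E1", "D1", ""),              # second <step>
    ("D1-E6", "D1-E5", "D1", "fry the onion"),
])
print(sorted(search_all(index, "onion")))               # ['D1-E4', 'D1-E6']
print(sorted(search_all(index, "onion", "pid:D1-E3")))  # ['D1-E4']
```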

[0044] The storage device 120 stores the text bodies and structure information of structured documents, as well as the full-text index used by the full-text search means 130.

[0045] The text storage unit 121 stores the text bodies of structured documents.

[0046] The document parts storage unit 122 stores the structure information of structured documents.

[0047] The full-text index storage unit 123 stores the full-text index used by the full-text search means 130.

[0048] The input device 150 passes input structured documents, and queries on structured documents, to the structured document management means 110.

[0049] The output device 140 displays structured documents or document elements on a screen or similar device in an appropriate format.

[0050] FIGS. 2, 3 and 4 are flowcharts showing the processing flow of the first embodiment of the present invention. The overall operation of the first embodiment is described in detail below with reference to FIGS. 1 to 4.

[0051] Before any structured document is searched, a full-text index is generated by the procedure shown in FIG. 2.

[0052] A structured document is input from the input device 150 to the structured document management means 110. The structured document registration means 111 stores the entire document in the text storage unit 121 (step A1 in FIG. 2).

[0053] Next, the document element decomposition/storage means 112 extracts the document elements and assigns each a unique ID (step A2 in FIG. 2).

[0054] The document element decomposition/storage means 112 uses the information obtained while extracting the document elements to generate the document structure information in the document parts storage unit 122 (step A3 in FIG. 2). At the same time, the full-text index generation means 132 generates the full-text index, using each extracted document element as a unit (step A4 in FIG. 2).

[0055] At this point, the index covers more than the text corresponding to each document element. The ID of that document element, the ID of the parent document element containing it, and the ID of the original document are added as text, and the full-text index is generated over the text in that state.

[0056] Next, the search processing in the first embodiment of the present invention is described with reference to the flowcharts of FIGS. 3 and 4.

[0057] First, the query analysis means 114 analyzes the search request for structured documents input from the input device 150 (step B1 in FIG. 3). It then determines whether the request is a compound query consisting of a plurality of search conditions (step B2 in FIG. 3).

[0058] If the determination in step B2 of FIG. 3 shows a compound query, the query execution means 115 takes out one search condition (step B3). It chooses the condition on the document element that lies further outward in the containment relationships, that is, the one whose position in the structured document is closer to the beginning. This provides the parent document element IDs to be specified when the subsequent search conditions are evaluated. In terms of the tree structure of the document elements, this corresponds to narrowing down the target set from the part of the tree nearer the root toward the leaves.

[0059] If the query is not a compound query, step B3 is skipped.

[0060] Next, the query execution means 115 determines whether the search condition being processed includes, as a condition, a character string (keyword) contained in a document element (step B4 in FIG. 3).

[0061] If the search condition includes a keyword condition, the query execution means 115 further determines whether results from previously executed processing exist (step B5 in FIG. 3).

[0062] When the first condition of a compound query is evaluated, no processing has been executed yet, so this determination is false. When the second and subsequent conditions are evaluated, results from the previously executed processing exist. The query execution means 115 therefore adds a further condition to the search condition being processed: the element's parent document element ID must be one of the document element IDs in the previous search result (step B6 in FIG. 3).

[0063] In step A4 of FIG. 2, the parent document element ID was added as text to the full-text search text. Step B6 of FIG. 3 therefore amounts to a kind of narrowing-down process that exploits this added text.

[0064] The full-text search execution means 131 then runs a full-text search for the keyword under the newly generated search condition, using the full-text index stored in the full-text index storage unit 123. It retrieves the IDs of the document elements that match the condition (step B7 in FIG. 3).

[0065] If no keyword is included as a condition, the processing of steps B4 to B7 in FIG. 3 is not performed. Instead, the structure condition determination means 116 refers to the structure information in the document parts storage unit 122 and searches for document elements that match the search condition. It then retrieves the IDs of the matching document elements (step B8 in FIG. 3).
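Evaluating a structure-only condition from the parts store alone (step B8) can be sketched as follows. The store layout (element ID mapped to a tag name and parent ID) and the two example condition kinds (required parent tag and minimum number of children) are hypothetical. The patent does not define a condition language.

```python
from typing import Optional

# Hypothetical document parts store: element ID -> (tag name, parent element ID).
PARTS: dict[str, tuple[str, Optional[str]]] = {
    "E1": ("recipe", None),
    "E2": ("title", "E1"),
    "E3": ("step", "E1"),
    "E4": ("item", "E3"),
    "E5": ("exp", "E3"),
    "E6": ("step", "E1"),
    "E7": ("exp", "E6"),
}


def match_structure(parts, tag: str, parent_tag: Optional[str] = None,
                    min_children: int = 0) -> list[str]:
    """Evaluate a purely structural condition using only the parts store (no full-text index)."""
    child_count: dict[str, int] = {}
    for _, parent in parts.values():
        if parent is not None:
            child_count[parent] = child_count.get(parent, 0) + 1
    hits = []
    for eid, (etag, parent) in parts.items():
        if etag != tag:
            continue
        if parent_tag is not None and (parent is None or parts[parent][0] != parent_tag):
            continue
        if child_count.get(eid, 0) < min_children:
            continue
        hits.append(eid)
    return hits


print(match_structure(PARTS, "step", min_children=2))   # ['E3']
print(match_structure(PARTS, "exp", parent_tag="step"))  # ['E5', 'E7']
```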

[0066] When the search processing or condition determination processing finishes, the query execution means 115 determines whether any search condition in the query remains unprocessed (step B9 in FIG. 4). If one does, the process returns to step B3 in FIG. 3, and the same processing is repeated for the unprocessed condition.
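The loop of steps B3 to B9 can be sketched for keyword-only conditions as follows. Several simplifications and assumptions apply:

- the conditions are assumed to be already ordered from the outermost element inward (step B3);
- structure-only conditions (step B8) are left out;
- a small in-memory inverted index stands in for the full-text engine;
- the "pid:" token format and all names are hypothetical.

Step B6 embeds the previous results as "parent ID is one of these". Here that is done by searching once per previous hit and taking the union of the results.

```python
import re
from collections import defaultdict

# Hypothetical elements: (element ID, parent ID, document ID, text).
ELEMENTS = [
    ("E1", None, "D1", "Vegetable curry"),
    ("E2", "E1", "D1", "onion"),
    ("E3", "E1", "D1", "carrot"),
    ("E4", None, "D2", "Onion soup"),
    ("E5", "E4", "D2", "onion"),
]


def build_index(elements):
    """Index each element's text with its IDs appended as ordinary words."""
    index = defaultdict(set)
    for eid, pid, did, text in elements:
        augmented = f"{text} eid:{eid} pid:{pid or 'none'} did:{did}"
        for tok in re.findall(r"[\w:-]+", augmented.lower()):
            index[tok].add(eid)
    return index


def full_text(index, *terms):
    """AND search over the inverted index."""
    return set.intersection(*(index.get(t.lower(), set()) for t in terms))


def run_compound(index, keywords):
    """Evaluate keyword conditions one at a time, outermost first.

    From the second condition on, only children of earlier hits qualify (step B6).
    """
    result = None
    for kw in keywords:
        if result is None:   # first condition: no earlier result (step B5 is false)
            result = full_text(index, kw)
        else:                # step B6: parent ID must be one of the earlier hits
            result = set().union(*(full_text(index, kw, f"pid:{p}") for p in result))
        if not result:
            break            # nothing left to narrow
    return result or set()


index = build_index(ELEMENTS)
print(sorted(run_compound(index, ["curry", "onion"])))  # ['E2']
```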

[0067] When step B9 of FIG. 4 determines that all search conditions have been processed, the document (element) retrieval means 117 uses the obtained document element IDs. It obtains either the corresponding text, or the body of the structured document that contains each document element (step B10 in FIG. 4).

[0068] Finally, the document (element) retrieval means 117 sends the obtained text to the output device 140, which presents it to the user in an appropriate format (step B11 in FIG. 4).

[0069] Next, the operation and effects of the first embodiment of the present invention are described.

[0070] In the first embodiment, the full-text index covers more than the text. It also covers, in text form, the ID of the document element, the ID of the parent element containing it, and the document ID of the structured document containing it. The full-text index is thus generated in the form "text" + "document element ID" + "structured document ID". Consider a compound query in which a parent document element is specified in a condition on the document structure. Such a condition can be evaluated by a full-text search alone, without referring to the separately stored and managed structure information.

[0071] In the first embodiment, the search conditions may also contain no text matching condition and consist only of conditions on the document structure. In that case, the condition is evaluated from the stored structure information without using full-text search. Condition evaluation can therefore run independently of the full-text search function.

[0072] [Example 1] To describe the first embodiment in more detail, an example applying it to a concrete case is given below. The configuration and processing flow of this example are the same as those of the first embodiment described above.

[0073] This example uses SGML (Standard Generalized Markup Language), a structured document format standardized as ISO 8879 and JIS X 4151.

[0074] FIG. 5 shows a DTD (Document Type Definition) composed of five document elements: "recipe", "title", "step", "item", and "exp". Suppose an SGML document conforming to this DTD, as shown in FIG. 6, is input from the input device 150. The structured document registration means 111 then stores the entire text of FIG. 6 in the text storage unit 121 (step A1 in FIG. 2).

[0075] Next, the document element decomposing/storing means 112 decomposes the document into document elements, extracts them, and assigns a unique ID to each one (step A2 in FIG. 2). Information indicating the relations between the document elements, as shown in FIG. 7, is generated in the document part storage unit 122 (step A3).

[0076] The full-text index generation means 132 then generates a full-text index in the full-text index storage unit 123, taking each individual document element as the unit of indexing.

[0077] At this time, several IDs are inserted as character strings into the text of each document element, as shown in FIG. 8:

- the ID of the document element itself;
- the ID of the parent document element that contains it;
- the ID of the original document.

The full-text index generation means 132 treats the character string "text" + "own ID" + "parent ID" + "document ID" as a single "document" and generates the full-text index from it (step A4 in FIG. 4). In the example of FIG. 8, the values are as follows:

- text: "<item>Fry the meat</item>";
- own ID: pid=8;
- parent IDs: pid=6 and pid=1;
- document ID (doc_id): did=100.
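A sketch of how the indexed string of FIG. 8 could be assembled is shown below. It is built on several assumptions:

- Each element is represented by a hypothetical `Element` record with a parent pointer.
- Following FIG. 8, which lists both pid=6 and pid=1, the sketch appends the IDs of all enclosing elements rather than only the immediate parent.
- Tags are padded with spaces only so that a whitespace tokenizer sees them as separate terms.
- Element contents other than the FIG. 8 item text are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Element:
    pid: int               # element ID assigned at registration
    tag: str               # element type name, e.g. "item"
    text: str              # the element's own character data
    parent: Optional[int]  # pid of the enclosing element; None for the root
    did: int               # ID of the structured document the element belongs to


def index_text(pid: int, elements: Dict[int, Element]) -> str:
    """Return the text indexed for one element: content, own ID, ancestor IDs, document ID."""
    elem = elements[pid]
    ids = [f"pid={elem.pid}"]
    ancestor = elem.parent
    while ancestor is not None:  # walk up the containment chain to the root
        ids.append(f"pid={ancestor}")
        ancestor = elements[ancestor].parent
    # Tags are space-separated so a simple whitespace tokenizer treats them as terms.
    return f"<{elem.tag}> {elem.text} </{elem.tag}> " + " ".join(ids) + f" did={elem.did}"


# Hypothetical chain mirroring FIG. 8: recipe (1) > step (6) > item (8), document 100.
elements = {
    1: Element(1, "recipe", "", None, 100),
    6: Element(6, "step", "", 1, 100),
    8: Element(8, "item", "Fry the meat", 6, 100),
}
print(index_text(8, elements))
# <item> Fry the meat </item> pid=8 pid=6 pid=1 did=100
```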

[0078] Now assume that a compound query is input through the input device 150. Its query conditions include both a character-string matching condition and a document structure condition, as shown in FIG. 9.

[0079] In the query processing means 113, the query analysis means 114 first decomposes this query into two conditions (steps B1-B2 in FIG. 3):

(1) "<step> contains the character string 'fry'" (condition 1);
(2) "an <exp> contained in that <step> contains 'butter'" (condition 2).

[0080] Condition 1 concerns <step>, which contains <exp> and is therefore the outer element in the inclusion relation. Condition 1 is selected as the first condition to process (step B3 in FIG. 3). If there is no inclusion relation between the document elements specified in the conditions, any of them may be selected first.
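The outer-first selection of step B3 could be sketched as follows. The sketch makes these assumptions:

- The inclusion relation is available as a hypothetical `CHILDREN` table derived from the DTD of FIG. 5. The exact content model is not given in this section; the text only states that <step> contains <exp>.
- Conditions are sorted by how many other conditioned element types enclose them.

Python's `sorted` is stable, so conditions with no inclusion relation keep their input order. That is one valid choice under [0080].

```python
from typing import Dict, List, Set, Tuple

# Hypothetical content model assumed for the FIG. 5 DTD: element type -> types it may contain.
CHILDREN: Dict[str, Set[str]] = {
    "recipe": {"title", "step"},
    "step": {"item", "exp"},
}


def contains(outer: str, inner: str) -> bool:
    """True if an `inner` element can occur at any depth inside an `outer` element."""
    stack = list(CHILDREN.get(outer, set()))
    seen: Set[str] = set()
    while stack:
        tag = stack.pop()
        if tag == inner:
            return True
        if tag not in seen:
            seen.add(tag)
            stack.extend(CHILDREN.get(tag, set()))
    return False


def order_conditions(conds: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Order (tag, keyword) conditions so that enclosing element types come first."""
    def nesting(cond: Tuple[str, str]) -> int:
        # How many conditioned element types enclose this condition's element type.
        return sum(contains(other_tag, cond[0]) for other_tag, _ in conds)
    # sorted() is stable, so conditions with no inclusion relation keep their given order.
    return sorted(conds, key=nesting)


print(order_conditions([("exp", "butter"), ("step", "fry")]))
# [('step', 'fry'), ('exp', 'butter')]
```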

[0081] The query execution means 115 determines that condition 1 includes a character-string matching condition (step B4 in FIG. 3). It also determines that no search results from earlier processing exist (step B5 in FIG. 3). It then delegates this query to the full-text search execution means 131.

[0082] The full-text search execution means 131 refers to the full-text index in the full-text index storage unit 123. It searches for document elements that contain both of the character strings "<step>" and "fry" (step B7 in FIG. 3).

[0083] As a result, the document elements shown in FIG. 16 are obtained as satisfying the condition.

[0084] Because the unprocessed condition 2 remains, the processing starting from step B3 in FIG. 3 is executed again.

[0085] Condition 2 contains a keyword, and a search result for condition 1 exists. The query execution means 115 therefore embeds the search result of condition 1 into condition 2 to generate a new search condition (steps B4-B6 in FIG. 3).

[0086] The new search condition generated here is: "search for document elements that contain both of the character strings '<exp>' and 'butter' and that also contain the character string 'pid=(document element ID of the search result of condition 1)'" (condition 2-1).

[0087] The latter half of condition 2-1 (from "and that also" onward) is the newly added condition. It narrows down the set of search targets.

[0088] When condition 1 yields a plurality of document elements, a condition "pid=(document element ID)" is generated for each of them. The search is then executed for each condition (step B7 in FIG. 3).

[0089] In this case, among the document elements in the tree structure of FIG. 7, the elements with IDs 6 and 9 satisfy condition 1. Two conditions are therefore generated:

(1) "search for document elements that contain both of the character strings 'exp' and 'butter' and that also contain the character string 'pid=6'";
(2) "search for document elements that contain both of the character strings 'exp' and 'butter' and that also contain the character string 'pid=9'".
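Steps B4-B6 and the condition expansion of [0088]-[0089] can be sketched as a small planning function. This is an illustration under the following assumptions:

- A condition is a hypothetical `(tag, keyword)` pair.
- A full-text query is a list of terms to be ANDed.
- `None` as the keyword stands for a purely structural condition. Such a condition is handed to the structure condition determination means 116 instead of the full-text search.

```python
from typing import List, Optional, Tuple


def plan_condition(
    tag: str, keyword: Optional[str], prior: Optional[List[int]]
) -> Tuple[str, List[List[str]]]:
    """Decide how to evaluate one condition; returns (route, list of AND-term queries)."""
    if keyword is None:
        # No character-string condition: judge on the stored structure (means 116).
        return ("structure", [])
    terms = [f"<{tag}>", keyword]
    if not prior:
        # Step B5 -> B7: no earlier result, search the full-text index directly.
        return ("fulltext", [terms])
    # Step B6: embed each earlier result as an extra "pid=<id>" term (condition 2-1 style).
    return ("fulltext", [terms + [f"pid={pid}"] for pid in prior])


print(plan_condition("exp", "butter", [6, 9]))
# ('fulltext', [['<exp>', 'butter', 'pid=6'], ['<exp>', 'butter', 'pid=9']])
```

Returning one term list per earlier result matches [0088]. A full-text engine that supports OR could evaluate these lists together, as [0090] allows.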

[0090] Depending on the capabilities of the full-text search means 130, these conditions may be evaluated simultaneously.

[0091] Processing search condition 2-1 yields the document elements with ID=8 and ID=11.

[0092] This is the result of the original query. The document (element) extracting means 117 therefore uses the obtained document element IDs to retrieve the information of the corresponding document elements from the document part storage unit 122. It then uses the original document ID to retrieve the entire text of the original structured document from the text storage unit 121. The retrieved content is displayed by the output device 140 (steps B10-B11 in FIG. 4).

[0093] On the other hand, a query may contain no keyword and consist only of a structural condition, as shown in FIG. 10. In the example of FIG. 10, the query retrieves each <recipe> that has four or more <step> elements. In this case:

- The query execution means 115 delegates processing to the structure condition determination means 116.
- The structure condition determination means 116 searches only the document part storage unit 122. Among the stored <recipe> elements, it finds those that have four or more <step> document elements.
- These elements are retrieved by the document (element) extracting means 117 and passed to the output device 140, which outputs them in an appropriate form.
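The structure-only query of FIG. 10 needs nothing but the parent links held in the document part storage unit. The sketch below assumes a hypothetical table that maps each element ID to `(tag, parent_id)`. It also assumes that <step> elements are direct children of <recipe>. No text and no full-text index are touched.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def recipes_with_min_steps(
    structure: Dict[int, Tuple[str, Optional[int]]], min_steps: int = 4
) -> List[int]:
    """IDs of <recipe> elements having at least `min_steps` direct <step> children."""
    # Count <step> children per parent using only the stored parent links.
    step_counts = Counter(
        parent for tag, parent in structure.values() if tag == "step" and parent is not None
    )
    return sorted(
        pid for pid, (tag, _) in structure.items()
        if tag == "recipe" and step_counts[pid] >= min_steps
    )


# Hypothetical structure table: element ID -> (tag, parent ID).
structure = {
    1: ("recipe", None), 2: ("title", 1),
    3: ("step", 1), 4: ("step", 1), 5: ("step", 1), 6: ("step", 1),
    20: ("recipe", None), 21: ("step", 20),
}
print(recipes_with_min_steps(structure))  # [1]
```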

[0094] [Second Embodiment] Next, a second embodiment of the present invention will be described.

[0095] FIG. 11 shows the configuration of the second embodiment of the present invention. In the second embodiment, the query processing means 213 has the configuration of the query processing means 113 of the first embodiment (described with reference to FIG. 1). It additionally includes a document element condition determination means 218.

[0096] The document element condition determination means 218 operates as follows.

[0097] The document element condition determination means 218 receives as input the ID of a document element and a search condition associated with it. It searches the document structure in the document part storage unit 122 and determines whether the given document element satisfies the condition. The search condition may concern more than the document element whose ID is given. It may concern any document element related to it, such as its parent or child document elements, and the document element condition determination means 218 handles all of these.
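A possible shape for the document element condition determination means 218 is sketched below. The sketch rests on these assumptions:

- The scope of "related" elements is every element of the same structured document, that is, the subtree under the topmost ancestor. This is enough to reach sibling elements such as <exp> and ancestor-level elements such as <title>.
- The table layout `(tag, text, parent_id)` and the function names are hypothetical.
- Keyword matching is a plain substring test.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical document-part table: element ID -> (tag, text, parent ID).
Parts = Dict[int, Tuple[str, str, Optional[int]]]


def related_elements(pid: int, parts: Parts) -> List[int]:
    """All elements in the same document: the subtree under pid's topmost ancestor."""
    root = pid
    while parts[root][2] is not None:  # climb parent links to the root element
        root = parts[root][2]
    children: Dict[int, List[int]] = {}
    for child, (_, _, parent) in parts.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    found, stack = [], [root]
    while stack:  # depth-first walk down from the root
        node = stack.pop()
        found.append(node)
        stack.extend(children.get(node, []))
    return found


def satisfies(pid: int, conditions: List[Tuple[str, str]], parts: Parts) -> bool:
    """True if every (tag, keyword) condition is met by some element related to pid."""
    related = related_elements(pid, parts)
    return all(
        any(parts[r][0] == tag and keyword in parts[r][1] for r in related)
        for tag, keyword in conditions
    )


parts: Parts = {
    1: ("recipe", "", None),
    2: ("title", "Easy curry", 1),
    3: ("step", "", 1),
    4: ("item", "curry roux", 3),
    5: ("exp", "add ketchup", 3),
}
print(satisfies(4, [("exp", "ketchup"), ("title", "delicious")], parts))  # False
```

In this hypothetical data, element 4 meets "<exp> contains 'ketchup'" but not "<title> contains 'delicious'", so the combined check fails.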

[0098] FIGS. 12 and 13 are flowcharts showing the processing flow of the second embodiment of the present invention. The overall operation of the second embodiment will be described in detail with reference to FIGS. 11 to 13.

[0099] Steps B1-B8 in FIG. 12 and steps B10-B11 in FIG. 13 are the same as the processing shown in FIGS. 3 and 4, respectively, so their description is omitted.

In the first embodiment, step B9 of FIG. 4 may determine that another unprocessed condition remains. In that case a new search condition expression is generated and the search by the full-text search means 130 continues. In the second embodiment, by contrast, it is determined whether the search in step B7 or step B8 returned exactly one document element (ID) (step C1 in FIG. 13).

[0100] If the number of document elements is one, that ID and all remaining unprocessed search conditions are sent to the document element condition determination means 218. The document element condition determination means 218 then performs the condition judgment using the information in the document part storage unit 122 (step C2 in FIG. 13).

[0101] In the second embodiment, the document element condition determination means 218 is invoked when the search for a condition returns exactly one result. However, if the processing speed of the data processing device 200 is sufficiently high, this threshold may be raised above one. This holds as long as the processing does not become slower than that of the full-text search means 130.
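The switch described in [0099]-[0101] can be sketched as a loop. The loop runs full-text searches condition by condition. Once the candidate set is small enough, it hands the remaining conditions to a structural checker. The sketch assumes the following:

- The callables `fulltext` and `structural` are hypothetical stand-ins for the full-text search means 130 and the document element condition determination means 218.
- `threshold` is the tunable result count from [0101]; it is 1 in the second embodiment.

```python
from typing import Callable, List, Optional, Set, Tuple

Condition = Tuple[str, str]  # hypothetical (element tag, keyword) pair


def evaluate_compound(
    conditions: List[Condition],
    fulltext: Callable[[Condition, Optional[Set[int]]], Set[int]],
    structural: Callable[[int, List[Condition]], bool],
    threshold: int = 1,
) -> Set[int]:
    """Evaluate conditions in order, switching to a structural check when few candidates remain."""
    prior: Optional[Set[int]] = None  # None means "no earlier result" (first condition)
    for i, cond in enumerate(conditions):
        # Full-text search; `prior` lets the search be narrowed with pid= terms.
        prior = fulltext(cond, prior)
        remaining = conditions[i + 1:]
        if remaining and len(prior) <= threshold:
            # Steps C1-C2: judge all remaining conditions on the stored structure at once.
            return {pid for pid in prior if structural(pid, remaining)}
    return prior if prior is not None else set()
```

With an empty result the early exit also returns an empty set, which is the same answer the remaining full-text searches would give.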

[0102] Next, the operation and effects of the second embodiment of the present invention will be described.

[0103] In the second embodiment, a search using some of the conditions of a compound query may return one result or a sufficiently small number of results. When this happens, the search means is switched from the full-text search execution means 131 to the document element condition determination means 218, and the remaining search conditions are evaluated together against the document part storage unit 122. This avoids running a separate full-text search through the full-text search execution means 131 for each individual condition of the compound query. It therefore saves processing cost in the data processing device 200.

[0104] [Example 2] Next, to describe the second embodiment in further detail, an example that applies it to a concrete case will be described. The configuration and processing flow of this example are the same as those of the second embodiment described above.

[0105] Assume that structured documents are registered as shown in FIGS. 5, 6, 7, and 8, and that a compound query as shown in FIG. 14 is given.

[0106] There is no inclusion relation among the document elements <item>, <exp>, and <title>. The first condition, "<item> contains the character string 'curry roux'", is therefore processed first (steps B1-B3 in FIG. 12).

[0107] Assume that the search by the full-text search execution means 131 yields only one document element satisfying the condition (ID=n-2). In this example, this is detected (step C1 in FIG. 13). That document element ID is sent to the document element condition determination means 218 together with all remaining conditions: "<exp> contains the character string 'ketchup' and <title> contains the character string 'delicious'" (condition 3).

[0108] The document element condition determination means 218 obtains the information of the document element with ID=n-2 from the document part storage unit 122. Starting from that element, it traces all the document elements related to it and judges condition 3 against them (step C2 in FIG. 13).

[0109] The document element with ID=n-2 does not satisfy condition 3. The document (element) extracting means 117 therefore retrieves no document element information (step B10 in FIG. 13), and the output device 140 displays a message indicating that no document satisfies the condition (step B11 in FIG. 13).

[0110] Next, a third embodiment of the present invention will be described.

[0111] Referring to FIG. 15, the third embodiment of the present invention includes a recording medium 400 on which a structured document search program is recorded. The recording medium 400 may be a magnetic disk, a semiconductor memory, or another recording medium.

[0112] The structured document search program is read from the recording medium 400 into the data processing device 300 and controls its operation. Under the control of this program, the data processing device 300 executes the processing described below. This processing is identical to that performed by the data processing devices 100 and 200 in the first and second embodiments.

[0113] First, a structured document is supplied from the input device 150 to the structured document management means 210. The document body and information representing its document structure are then generated in the storage device 120; the document structure information is stored in the document part storage unit 122.

In this state, a query on the structured documents may be given through the input device 150. The structured document management means 210 analyzes the query, and the query processing means 213 or the full-text search means 130 retrieves the document elements or structured documents that match the conditions. The retrieved document elements or structured documents are output to the output device 140 in an appropriate format.

[0114]

EFFECTS OF THE INVENTION As described above, the present invention provides the following effects.

[0115] A first effect of the present invention concerns compound queries made up of a plurality of search conditions. These include conditions on the document structure of a structured document and conditions on the character strings contained in document elements. Such a query can be processed by the full-text search means alone. This improves processing speed compared with judging conditions by directly accessing the structure information of the document elements.

[0116] The reason is that, in the present invention, the full-text index used by the full-text search means is generated so that it includes the inclusion relations between document elements and the structured document IDs.

[0117] A second effect of the present invention applies when a search using some of the conditions of a compound query returns one result or a sufficiently small number of results. In that case the remaining search conditions can be evaluated together, which improves processing speed.

[0118] The reason is that, in the present invention, the system may detect that the number of search results is one or sufficiently small while unprocessed query conditions remain. When it does, the search means is switched from full-text search to a means that judges conditions directly on the structure information of the documents.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of the first embodiment of the present invention, showing the data registration process.

FIG. 3 is a flowchart for explaining the operation of the first embodiment of the present invention, showing the search operation (part 1).

FIG. 4 is a flowchart for explaining the operation of the first embodiment of the present invention, showing the search operation (part 2).

FIG. 5 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing an example DTD of an SGML document to be processed.

FIG. 6 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing an example SGML document to be processed.

FIG. 7 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing the structure information of an SGML document to be processed.

FIG. 8 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing an example of the text from which the full-text index is generated.

FIG. 9 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing an example search condition.

FIG. 10 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing an example search condition.

FIG. 11 is a block diagram showing the configuration of the second embodiment of the present invention.

FIG. 12 is a flowchart for explaining the operation of the second embodiment of the present invention, showing the search operation (part 1).

FIG. 13 is a flowchart for explaining the operation of the second embodiment of the present invention, showing the search operation (part 2).

FIG. 14 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing an example search condition.

FIG. 15 is a block diagram showing the configuration of the third embodiment of the present invention.

FIG. 16 is a diagram for concretely explaining the operation of the first embodiment of the present invention, showing an example search result.

[Explanation of symbols]

100, 200, 300 Data processing device
110, 210 Structured document management means
111 Structured document registration means
112 Document element decomposition/analysis means
113, 213 Query processing means
114 Query analysis means
115, 215 Query execution means
117 Document (element) extracting means
218 Document element condition determination means
120 Storage device
121 Text storage unit
122 Document part storage unit
123 Full-text index storage unit
130 Full-text search means
131 Full-text search execution means
132 Full-text index generation means
140 Output device
150 Input device
400 Recording medium

Continuation of front page

(56) References cited: JP-A-7-44579 (JP, A); JP-A-6-28403 (JP, A); JP-A-8-241332 (JP, A); JP-A-5-158984 (JP, A); JP-A-8-255155 (JP, A); JP-A-6-301721 (JP, A); JP-A-4-217073 (JP, A); JP-A-7-225771 (JP, A); JP-A-7-319918 (JP, A)

(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/30; G06F 12/00; JICST file (JOIS)

Claims

(57) [Claims]

1. A structured document search apparatus comprising:

structured document registration means for receiving a structured document input from input means and storing the full text of the structured document in a text storage unit;

document element decomposition/storage means for decomposing the input text of the structured document into document elements, assigning each a unique ID (identification number), and storing the text corresponding to each document element and the relations between the document elements in a document part storage unit;

full-text index generation means for generating a full-text index in units of the structured document and of the document elements constituting it, wherein the index for a document element is generated over text to which the ID of that document element, the ID of the parent document element containing it, and the ID of the original document have been added;

full-text search execution means for executing a search on the structured documents concerning text and document structure, using the full-text index generated by the full-text index generation means;

query analysis means for receiving a search request for the structured documents input from the input means and analyzing the query conditions;

structure condition determination means for determining only conditions concerning structure, based on the stored structure information of the documents;

query execution means for executing, according to the analysis by the query analysis means, a search for documents or document elements that match the search conditions, using the full-text search means or the structure condition determination means; and

document element extraction means for:
(a) retrieving from the document part storage unit the information of the document elements corresponding to the document element IDs obtained as the result of the search by the full-text search means or the structure condition determination means;
(b) retrieving the text of the original structured document from the text storage unit based on the original document ID; and
(c) controlling the output of this text to output means;

wherein, when a compound query made up of a plurality of conditions is given, the query execution means searches the plurality of conditions one at a time, and in searching the second and subsequent conditions uses a new query condition that embeds the search results of the previously executed conditions as a condition and thereby narrows the set of search targets;

the apparatus further comprising document element condition determination means which receives as input the ID of a document element and a search condition concerning the document elements related to it, such as its parent or child document elements. Starting from that document element, this means searches the document structure in the document part storage unit and determines whether the document element satisfies the condition.