JP2000250930A

JP2000250930A - Structured document retrieval system

Info

Publication number: JP2000250930A
Application number: JP11052814A
Authority: JP
Inventors: Takashi Shimojima; 崇下島; Masao Ito; 正雄伊藤; Takeshi Tsurubayashi; 健鶴林; Shinichi Nakai; 信一中井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-03-01
Filing date: 1999-03-01
Publication date: 2000-09-14

Abstract

PROBLEM TO BE SOLVED: To retrieve all structured documents by designating various structural conditions. SOLUTION: A document retrieval system which deals with the structured documents includes a structure analysis means 106 which analyzes the logical structure of a structured document that is registered, a means 107 for producing structural information holding document data, which produces the structural information holding document data including the information on the hierarchical structure of the document for each of elements that are classified into the logical structures by the means 106, an index information production means 108 which produces the index information for execution of retrieval from the structural information holding document data, a retrieval condition analysis means 109 which analyzes an inputted retrieval condition and converts it into another retrieval condition that is suitable to the retrieval processing and an index information retrieval means 110 which executes retrieval using the index information according to the retrieval condition that is converted by the means 109. Thus, it is made possible to retrieve the structured documents by designating various structural conditions.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a structured document retrieval system that searches structured documents within a document management system in which structured documents having a logical structure, such as papers and manuals, are managed by computer. In particular, it enables efficient document retrieval.

[0002]

2. Description of the Related Art As the volume of electronic documents grows, interest in registering and retrieving structured documents that have a logical structure, such as manuals, minutes, and specifications, has been increasing. The logical structure of a structured document is defined by a DTD (Document Type Definition). A conventional document management system for registering and retrieving structured documents is described in Japanese Unexamined Patent Publication No. H10-240752. As shown in FIG. 37, this system comprises: a document structure analysis program 3701 that analyzes the document structure; an analyzed document data storage area 3705 that stores the analyzed document data; a structure index creation program 3702 that creates a structure index for searching by document structure; a structure index storage area 3706 that stores the created structure index; a structured full-text data generation program 3703 that generates data (structured full-text data) corresponding to the structure index of each document to be registered; a structured full-text data storage area 3707 that stores this data; a character string index creation program 3704 that creates, from the structured full-text data of each document to be registered, a character string index for full-text search; and a character string index storage area 3708 that stores the created character string index.

[0003] In this system, when a structured document is registered, the document structure analysis program 3701 first analyzes the logical structure of the document to be registered, creates analyzed document data, and registers that data in the analyzed document data storage area 3705.

[0004] Next, the structure index creation program 3702 superimposes the logical structures of the documents to be registered one after another in registration order. A group of elements that have the same position of appearance and the same type within a document is represented by a single meta-element, and a group of character string data that have the same position of appearance within a document is represented by a single meta character string datum. This yields a structure index consisting of a tree of meta-elements and meta character string data (collectively called metanodes in this system). Every metanode in the structure index is given an identifier that uniquely identifies it within the index (called a context identifier in this system), and the structure index is registered in the structure index storage area 3706.

[0005] FIG. 38 shows the process of creating the structure index. In FIG. 38, document 1, document 2, and document 3 each represent the analyzed document data of a document to be registered. The structure index is built up by superimposing the structures of these analyzed document data on the existing structure index one after another. When document 1 is input first, the structure index is still in its initial (empty) state, so a tree equivalent to the analyzed data is generated and registered in the structure index as is, leaving the index in the state shown at 3801. The newly generated meta-elements are assigned context identifiers E1 through E5, and the newly generated meta character string data are assigned context identifiers C1 through C3.

[0006] Next, when document 2 is input, nothing is done for the portions whose structure overlaps the existing structure index (3801); only the partial structures that had no counterpart on 3802 (the shaded portions in the figure) are newly registered. The newly generated meta-elements are assigned context identifiers E6 and E7, and the newly generated meta character string datum is assigned context identifier C4.

[0007] Next, when document 3 is input, nothing is done for the portions whose structure overlaps the existing structure index (3802); only the partial structures that had no counterpart on 3802 (the shaded portions in the figure) are newly registered. The newly generated meta-elements are assigned context identifiers E8, E9, and E10, and the newly generated meta character string data are assigned context identifiers C5 and C6.

[0008] Thus, once the three documents have been registered, the structure index is in the state shown at 3803.
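The superimposition described above can be illustrated with a minimal Python sketch. It is not the implementation of the cited prior-art system. It assumes that an analyzed document is a nested `(tag, children)` tuple with plain strings as character data, that "position of appearance" means the child index under the same metanode path, and that identifiers are handed out depth-first; `MetaNode`, `StructureIndex`, and the sample documents are hypothetical and do not reproduce FIG. 38.

```python
from dataclasses import dataclass, field
from itertools import count

TEXT = "#text"  # hypothetical marker for character string data


@dataclass
class MetaNode:
    context_id: str                               # "E<n>" for meta-elements, "C<n>" for meta character string data
    kind: str                                     # tag name, or TEXT
    children: dict = field(default_factory=dict)  # (position, kind) -> MetaNode


class StructureIndex:
    def __init__(self):
        self.roots = {}
        self._element_ids = count(1)
        self._text_ids = count(1)

    def _new_id(self, kind):
        if kind == TEXT:
            return f"C{next(self._text_ids)}"
        return f"E{next(self._element_ids)}"

    def add(self, node, siblings=None, position=0):
        """Superimpose one analyzed node; return the context IDs newly created."""
        if siblings is None:
            siblings = self.roots
        kind = TEXT if isinstance(node, str) else node[0]
        created = []
        meta = siblings.get((position, kind))
        if meta is None:
            # No counterpart in the existing index: register a new metanode.
            meta = MetaNode(self._new_id(kind), kind)
            siblings[(position, kind)] = meta
            created.append(meta.context_id)
        if kind != TEXT:
            for i, child in enumerate(node[1]):
                created += self.add(child, meta.children, i)
        return created


index = StructureIndex()
doc1 = ("doc", [("title", ["A"]), ("body", [("p", ["x"])])])
doc2 = ("doc", [("title", ["B"]), ("body", [("p", ["y"]), ("p", ["z"])])])
first = index.add(doc1)    # every node is new, so every node gets an identifier
second = index.add(doc2)   # only the second <p> and its text have no counterpart
```

In this sketch the second document contributes just two new metanodes, which mirrors how only the shaded partial structures are registered for documents 2 and 3.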

[0009] Next, for each document to be registered, the structured full-text data generation program 3703 generates data consisting of definitions of the correspondence between every character string contained in the analyzed document data for that document and the context identifier that denotes that character string in the structure index (called structured full-text data in this system), and registers the data in the structured full-text data storage area 3707.

[0010] Next, the character string index creation program 3704 creates, from the structured full-text data corresponding to each document to be registered, a character string index for full-text search that includes the context identifiers, and registers it in the character string index storage area 3708.

[0011] A search in this system first refers to the structure index to determine the set of context identifiers that satisfy the specified structural condition.

[0012] Next, character strings are searched using those context identifiers as keys, which yields the group of documents that satisfy the specified condition.

[0013]

Problems to be Solved by the Invention However, although structured documents come in a number of different document types, the prior art could neither register documents conforming to all such general-purpose document types nor perform searches that take the document type into account.

[0014] In addition, with the prior-art method, a search condition such as "documents that contain '○○' anywhere under a chapter" requires referring to the structure index to obtain the set of all matching identifiers and then performing an OR search based on those identifiers, which makes the search slow.

[0015] The present invention solves these problems of the prior art. Its object is to provide a structured document retrieval system that can search all structured documents by specifying various structural conditions and that can also search structured documents with the document type taken into account.

[0016]

Means for Solving the Problems To this end, the structured document retrieval system of the present invention is provided with: a structure analysis unit that analyzes the logical structure of a document when the structured document is registered; a structure information holding document data creation unit that attaches to the data divided by logical structure, as information on its hierarchical structure, a character string formed by joining tag names, their orders of appearance, and delimiter characters; and an index information creation unit that creates, from the structure information holding document data, index information for high-speed searching.

[0017] With this configuration, a search can specify various structural conditions of a document. Examples are: documents whose title contains the character string "構造化" ("structured"); documents that contain the character string "登録" ("registration") within Chapter 1; and documents with a chapter title that contains the character string "方法" ("method").

[0018] Furthermore, in the structured document retrieval system of the present invention, when a structured document is registered, the information registered on its logical structure includes not only the structure character string but also the document type of the document, which is registered as a document type character string.

[0019] This makes it possible to search with the document type specified, for example for documents whose document type is [論文] (paper) and whose Chapter 1 title contains the character string "登録" ("registration").

[0020]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The invention according to claim 1 is a document retrieval system that handles structured documents, provided with: structure analysis means for analyzing the logical structure of a structured document to be registered; structure information holding document data creation means for creating structure information holding document data that includes, for each element separated out by logical structure by the structure analysis means, information on its hierarchical structure; index information creation means for creating index information for searching from the created structure information holding document data; search condition analysis means for analyzing an input search condition and converting it into a search condition suited to the search processing; and index information search means for searching with the index information on the basis of this search condition. This makes it possible to search structured documents by specifying various structural conditions.

[0021] The invention according to claim 2 is provided with result list display means for displaying a list of the search results retrieved by the index information search means, and entity display means for displaying the entity data of a document selected from this list of search results. A desired document can thus be selected from among the documents retrieved on the basis of the structural conditions, and its entity data displayed.

[0022] In the invention according to claim 3, the structure information holding document data creation means includes the logical structure information of each element of the document to be registered in the structure information holding document data as a character string formed by joining tag names, their orders of appearance, and delimiter characters. This allows searches that specify various structural conditions.

[0023] In the invention according to claim 4, the structure information holding document data creation means includes a character string indicating the document type of the document to be registered in the structure information holding document data, which allows a search to specify the document type.

[0024] The invention according to claim 5 is provided with document type/structure table creation means that assigns a unique identifier to each document type and each tag name of the documents to be registered before the structure-analyzed data produced by the structure analysis means is passed to the structure information holding document data creation means. For documents whose document types differ but whose logical structures are identical, such as an original-language document and its Japanese translation, assigning the same values to the document type and tag name identifiers makes it possible to extract those documents with a single search condition expression.

[0025] In the invention according to claim 6, the index information creation means creates index information that includes the identifiers uniquely assigned to the document types and tag names. This enables efficient, high-speed retrieval for structure-specifying search requests that specify a document type or tag name.

[0026] In the invention according to claim 7, the document type/structure table creation means holds information on the tag names that appear multiple times, and the structure information holding document data creation means includes order-of-appearance information in the structure information holding document data only for elements whose tag names appear multiple times. Because order-of-appearance information can then be left out of the index information, search speed increases.

[0027] Embodiments of the present invention are described below with reference to the drawings. The present invention is in no way limited to these embodiments and can be carried out in various forms without departing from its gist.

[0028] (First Embodiment) As shown in FIG. 1, the structured document retrieval system of the first embodiment comprises: display means 101 that displays the progress of document registration and displays the search conditions specified for a document search and the search results; a recording device 102 that stores the documents to be registered; a search engine 104 that actually registers the documents to be searched, searches them, and displays the search results; input means 103 for entering various commands and conditions during document registration and document search; and a search database 105 that stores the various data created by the search engine 104.

[0029] The search engine 104 is broadly divided into registration means, search means, and display means. The registration means comprises: a structure analysis unit 106 that analyzes the logical structure of a document to be registered; a structure information holding document data creation unit 107 that creates structure information holding document data by attaching hierarchical structure information to each piece of data separated out by logical structure by the structure analysis unit 106; and an index information creation unit 108 that creates index information for high-speed searching from the structure information holding document data created by the structure information holding document data creation unit 107. Units 106 to 108 are described in detail in the explanation of the document registration flow.

[0030] The search means comprises a search condition analysis unit 109 that converts a search condition received from the input means 103 into a condition suited to searching with this retrieval apparatus, and an index information search unit 110 that receives the search condition analyzed by the search condition analysis unit 109 and actually carries out the search using the index information. Units 109 and 110 are described in detail in the explanation of the document search flow.

[0031] The display means comprises a result list display unit 111 that displays a list of search results on the display means 101, and an entity display unit 112 that displays on the display means 101 the entity data of a document selected from the search result list via the input means 103.

[0032] The search database 105 comprises: a structure-analyzed data storage unit 113 that stores the structure-analyzed data created by the structure analysis unit 106; a structure information holding document data storage unit 114 that stores the data created by the structure information holding document data creation unit 107; an index information storage unit 115 that stores the index information created by the index information creation unit 108; an entity data storage unit 116 that stores the entity data of registered documents; and a list data storage unit 117 that stores bibliographic data for the search result list. It thus stores the data used for searching structured documents and displaying the results.

[0033] Next, the document registration processing of this structured document retrieval system is described using a concrete example of a structured document.

[0034] First, in response to a registration request from the input means 103, the document to be registered is read from the storage means 102. Next, the structure analysis unit 106 converts the structure of the document into a form that the structure information holding document data creation unit 107 can interpret. The structure analysis unit 106 analyzes the structure by referring to the DTD of the document to be registered. In this way the structured document, which is a sequence of characters, is converted into a data structure that the structure information holding document data creation unit 107 can interpret (hereinafter called structure-analyzed data).

[0035] FIG. 2 shows an example of a DTD. The DTD defines where document elements may appear, in what order, how many times, and so on. Part of what is defined in the example of FIG. 2 is explained below. The <論文> (paper) element has a <書誌> (bibliography) element and a <本文> (body) element as its child elements. The <書誌> element has a <タイトル> (title) element, an <著者> (author) element, and a <日付> (date) element as its child elements. The <タイトル>, <著者>, and <日付> elements have no child elements and contain character strings. The <本文> element has <章> (chapter) elements as its child elements; the "*" symbol written after <章> means that any number of <章> elements may appear under <本文>.
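As a rough illustration of what such a definition constrains, the following Python sketch encodes only the parent-child relations described above and checks a sequence of child tags against them. It is not the DTD of FIG. 2: the table `CONTENT_MODEL`, the function `children_conform`, and the reading of an unmarked child as "exactly once" are assumptions made for this example, and the content of <章> is left out because it is not described here.

```python
# Content models for the part of FIG. 2 described in the text.
# Each entry is (child tag, occurrence): "1" = exactly once, "*" = zero or more.
CONTENT_MODEL = {
    "論文": [("書誌", "1"), ("本文", "1")],
    "書誌": [("タイトル", "1"), ("著者", "1"), ("日付", "1")],
    "本文": [("章", "*")],
}


def children_conform(parent_tag, child_tags):
    """Check a sequence of child tag names against the parent's content model."""
    position = 0
    for tag, occurrence in CONTENT_MODEL[parent_tag]:
        if occurrence == "1":
            if position >= len(child_tags) or child_tags[position] != tag:
                return False
            position += 1
        else:  # "*": consume any run of this tag, including an empty one
            while position < len(child_tags) and child_tags[position] == tag:
                position += 1
    # Every child must have been accounted for by the model.
    return position == len(child_tags)
```

Under these assumptions a <本文> with no chapters and one with three chapters both conform, while a <書誌> whose children are out of order does not.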

[0036] The structure analysis unit 106 converts a structured document of the document type defined by the DTD of FIG. 2 into, for example, a structured document of the form shown in FIG. 3. FIG. 4 shows the structured document of FIG. 3, as analyzed by the structure analysis unit 106, as a tree.

[0037] Next, the processing flow of the structure information holding document data creation unit 107 is described with reference to FIG. 5. First, the structure-analyzed data of the document to be registered is read from the structure-analyzed data storage unit 113 (step 501). Next, a unique number (hereinafter called the document number) is assigned to each document read in (step 502). Next, each constituent element of the document is assigned a number that is unique within the document (hereinafter called the element number) (step 503). Next, the tree-structure address of the element is written as a character string that joins the document element names (tag names) with their orders of appearance (hereinafter called the structure character string) (step 504).

[0038] Written this way, the tree-structure address of the title string "構造化文書の管理方法" ("Method for managing structured documents") in the structured document of FIG. 3 can be expressed as "\論文_1\書誌_1\タイトル_1\". Here the order of appearance is a number indicating the position of the element among the elements that have the same parent element and the same tag name; it follows the tag name, separated by "_". Hierarchy levels are separated by "\", which is also added at the beginning and end of the structure character string. Although "_" and "\" are used in this example, any characters may be used as long as they do not occur in tag names or orders of appearance. Nor do "_" and "\" each have to be fixed to a single character; different characters may be assigned, for example "\" for the first level separator and "$" for the second.
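The construction of a structure character string can be sketched in Python as follows. This is an illustrative sketch rather than the patent's implementation: the function names `occurrence_orders` and `structure_string` and their parameters are hypothetical, and a single separator per role is assumed even though the text allows different characters at different levels.

```python
from collections import Counter


def occurrence_orders(sibling_tags):
    """For each child, its rank among siblings that share the same tag name."""
    seen = Counter()
    orders = []
    for tag in sibling_tags:
        seen[tag] += 1
        orders.append(seen[tag])
    return orders


def structure_string(path, level_sep="\\", order_sep="_"):
    """Join (tag, order) steps from the root down into one address string."""
    for tag, _ in path:
        # Separators must not occur in tag names, as the text requires.
        if level_sep in tag or order_sep in tag:
            raise ValueError(f"separator occurs in tag name: {tag!r}")
    steps = [f"{tag}{order_sep}{order}" for tag, order in path]
    # The level separator is also placed at the beginning and the end.
    return level_sep + level_sep.join(steps) + level_sep


title_address = structure_string([("論文", 1), ("書誌", 1), ("タイトル", 1)])
```

Here `title_address` is the string `\論文_1\書誌_1\タイトル_1\` given in the text, and `occurrence_orders(["章", "章"])` yields the orders 1 and 2 for two sibling chapters.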

[0039] Finally, a record is created as a set of the document number, element number, structure character string, and element content (character string) (step 505). Steps 503 to 505 are repeated for all constituent elements (step 506). The records are then collected into the structure information holding document data and stored in the structure information holding document data storage unit 114 (step 507).
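Steps 502 to 507 can be sketched in Python as a single tree walk. The sketch rests on several assumptions that the text does not confirm: a document is a nested `(tag, children)` tuple with plain strings as content, a record is emitted only for elements that directly carry text, and element numbers follow the order in which records are emitted. `Record`, `make_records`, and the sample document (including its author and chapter text) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    doc_no: int      # document number (step 502)
    elem_no: int     # element number, unique within the document (step 503)
    structure: str   # structure character string (step 504)
    content: str     # element content


def make_records(doc_no, root):
    """Walk a (tag, children) tree and build one record per text-bearing element."""
    records = []

    def walk(node, order, prefix):
        tag, children = node
        structure = f"{prefix}{tag}_{order}\\"
        text = "".join(c for c in children if isinstance(c, str))
        if text:
            # Step 505: bundle the four fields into a record.
            records.append(Record(doc_no, len(records) + 1, structure, text))
        seen = Counter()
        for child in children:
            if not isinstance(child, str):
                seen[child[0]] += 1   # order among same-tag siblings
                walk(child, seen[child[0]], structure)

    walk(root, 1, "\\")
    return records


document = ("論文", [
    ("書誌", [("タイトル", ["構造化文書の管理方法"]), ("著者", ["(author)"])]),
    ("本文", [("章", ["(chapter 1 text)"]), ("章", ["(chapter 2 text)"])]),
])
records = make_records(1, document)
```

With this input the first record carries the title under `\論文_1\書誌_1\タイトル_1\`, and the second chapter is addressed as `\論文_1\本文_1\章_2\`.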

[0040] FIG. 6 shows an example of the structure information holding document data created from the example structured document of FIG. 3. Reference numerals 601 to 607 are assigned to distinguish the individual records shown in FIG. 6 and are used in the later explanation of search processing with a worked example. Here the structured document of FIG. 3 has been assigned the document number "1" in step 502. In this example the second document registered would be assigned document number "2", and so on, but document numbers need not be assigned in ascending order like this as long as each registered document has a unique number. Likewise, in FIG. 6 the element numbers are assigned to the structural elements in the order in which they appear in the document, but element numbers need only be unique within a document.

[0041] Next, the processing flow of the index information creation unit 108 is described with reference to FIG. 7. First, the structure information holding document data of the document to be registered is read from the structure information holding document data storage unit 114 (step 701). Next, character chains of a predetermined number of characters are extracted from a content character string in that data (step 702). For each such character chain, the corresponding document number, element number, and a number indicating the position within the element of the first character of the chain (hereinafter called the character position number) are recorded as index information for the content character string (hereinafter called content character string index information) (step 703). This processing is repeated for all content character strings of the document to be registered (step 704). Next, the same processing as steps 702 to 704 is applied to the structure character strings of the document (steps 705, 706, and 707). The index information created here is called structure character string index information. Finally, the content character string index information and the structure character string index information are added to the index information storage unit 115 (step 708).

[0042] FIG. 8 shows part of an example of the structure character string index information and content character string index information that the index information creation unit 108 creates for record 601 of the structure information holding document data of FIG. 6. Entry 801 in FIG. 8 means that, in the document with document number "1", the structure character string of the element with element number "1" contains the character chain "\論" starting at character position "1". FIG. 8 shows only part of the index information; in practice, index information is created for all content character strings and structure character strings of the document to be registered. Accordingly, when index information is created for the structure character string of record 602 in FIG. 6, the character chain "\論" again starts at character position "1", just as in 601, so the entry [1-2-1] is added to the index information for "\論" at 801.

【００４３】In this example, character chains are extracted two characters at a time and index information is created for each chain, but the chain length need not be two characters.

【００４４】Repeating the above registration processing each time a registration target document is input adds further index information.

【００４５】Next, the flow of document search processing in this embodiment is described using a concrete search condition. Suppose first that the condition "documents whose title contains the character string 構造化 (structured)" is given as the search condition from the input means 103.

【００４６】The processing flow of the search condition analysis unit 109 is described with reference to FIG. 9. First, step 901 determines whether the search condition contains a condition that specifies structure. If it does, processing proceeds to step 902. If the structure specification designates a unique terminal of the logical structure, the structure condition is created as a character string for suffix matching whose first character is a wildcard (*, a character representing any string of zero or more characters); this string is hereinafter called the structure condition character string (step 903). The term "unique terminal" is explained later with a concrete example. If the search designates anything other than a unique terminal, a structure condition character string for intermediate matching, with wildcards as its first and last characters, is created (step 904). Finally, a search condition expression is created from the structure condition character string created here and the specified content character string (step 905). If step 901 finds that the search does not specify structure, the structure condition character string is set to the Null value (step 906) and the search condition expression is created.

【００４７】In the example here, the search condition, documents whose title contains the character string "構造化" (structured), is a search that specifies structure. The condition contains a structure specification for "タイトル" (title) and designates a unique terminal, so the structure condition character string is "*\タイトル_1\". No appearance order is specified for the structural element "タイトル", but because this element appears only once in a document, the structure condition character string is created with the appearance order "1". Here "*" is the wildcard. The resulting search condition expression is: structure character string = "*\タイトル_1\" & content character string = "*構造化*".

【００４８】Here, a "search designating a unique terminal of the logical structure" means a search that designates exactly one of the elements that have character data directly beneath them in a tree structure such as the one shown in FIG. 4. Specifically, the shaded elements in FIG. 4 are the "terminals".

【００４９】In contrast, for a search that designates everything under a given logical structure element, such as "documents in which Chapter 1 contains the character string 登録 (registration)", the structure condition character string is "*\章_01\*", that is, a structure condition character string for intermediate matching. Intermediate matching is also used when the search condition designates a terminal of the logical structure but the matching element is not limited to one (is not unique), as in "documents whose chapter title contains the character string 方法 (method)". In this example the chapter title of any chapter qualifies, so the structure condition character string is "*\章題*".
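The choice between suffix matching and intermediate matching (steps 902 to 906) can be sketched as below. This is an illustrative sketch only: the function name, the (tag, order) path representation, and the rule that a trailing "\" is written only when the last appearance order is known are assumptions chosen so that the output reproduces the example strings in the text. The text writes the Chapter 1 order as "01"; the sketch uses "1".

```python
def structure_condition(path, unique_terminal):
    """Build a structure condition character string.

    path: list of (tag, order) pairs; order is None when unspecified.
    unique_terminal: True when the path designates exactly one leaf element.
    Returns None when no structure is specified (the Null value of step 906).
    """
    if not path:
        return None
    segments = [tag if order is None else f"{tag}_{order}" for tag, order in path]
    body = "\\" + "\\".join(segments)
    if path[-1][1] is not None:
        body += "\\"  # close the last segment only when its order is known
    if unique_terminal:
        return "*" + body        # suffix matching (step 903)
    return "*" + body + "*"      # intermediate matching (step 904)

print(structure_condition([("タイトル", 1)], True))
print(structure_condition([("章", 1)], False))
print(structure_condition([("章題", None)], False))
```

The caller is assumed to supply order 1 for elements, such as the title, that occur only once per document.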

【００５０】Next, the processing flow of the index information search unit 110 is described with reference to FIG. 10. First, in step 1001, index information is read from the index information storage unit 115. Next, in step 1002, the specified character string is searched for using the content character string index information. The search result is a set of pairs of the document number and element number in which the character string occurs, one pair per hit. In step 1003, if step 1002 returned no hits, the result is "no matching document" and processing ends. Otherwise processing proceeds to step 1004; if the structure condition character string is Null, the result of step 1002 is taken as the final result (step 1007) and processing ends. If the structure condition character string is not Null, a search for the specified structure condition character string is performed using the structure character string index information (step 1005). This search also yields pairs of document number and element number. Finally, in step 1006, the pairs of document number and element number that appear in the results of both step 1002 and step 1005 are taken as the final result, and processing ends.
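The control flow of steps 1002 to 1007 amounts to a set intersection with two early exits. The following sketch assumes the two index searches are supplied as functions returning (document number, element number) pairs; the names retrieve, search_content, and search_structure are hypothetical.

```python
def retrieve(content_query, structure_cond, search_content, search_structure):
    """Return the set of (doc_no, elem_no) pairs satisfying both conditions."""
    content_hits = set(search_content(content_query))        # step 1002
    if not content_hits:                                      # step 1003
        return set()                                          # no matching document
    if structure_cond is None:                                # step 1004
        return content_hits                                   # step 1007
    structure_hits = set(search_structure(structure_cond))    # step 1005
    return content_hits & structure_hits                      # step 1006

# Stub searches standing in for the index lookups.
content = lambda q: [(1, 3), (2, 1)]
structure = lambda c: [(1, 3), (5, 5)]
print(retrieve("構造化", "*\\タイトル_1\\", content, structure))
```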

【００５１】The search using index information performed in steps 1002 and 1005 is now described in detail. As an example, consider searching for the character string "構造化" (structured). From "構造化", the two-character chains "構造" and "造化" can be extracted. The number of characters in each extracted chain is the same as that of the chains in the index information. Suppose that index information such as that shown at 1110 in FIG. 11 has been created for these two chains. From it, the entries that share the same document number and element number, and whose character position numbers are consecutive from the "構造" chain to the "造化" chain, are extracted as search results. In the example of FIG. 11, entries 1121, 1122, and 1123 share the same document number and element number. Of these, the character position numbers are consecutive in 1121 and 1123, and the pairs of document number and element number of these entries form the search result.
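A sketch of this chain-by-chain lookup is given below. The posting values are invented for illustration (the contents of FIG. 11 are not reproduced in the text), and the function name find_string is hypothetical. The sketch handles plain strings only; wildcard handling is left out.

```python
def find_string(index, query, n=2):
    """Return (doc_no, elem_no) pairs where query occurs, using n-character chains.

    index: dict mapping chain -> list of (doc_no, elem_no, position).
    """
    chains = [query[i:i + n] for i in range(len(query) - n + 1)]
    if not chains:
        return set()
    # Start from the postings of the first chain, then require each following
    # chain to occur in the same element at the next consecutive position.
    candidates = set(index.get(chains[0], []))
    for offset, chain in enumerate(chains[1:], start=1):
        postings = set(index.get(chain, []))
        candidates = {(d, e, p) for (d, e, p) in candidates
                      if (d, e, p + offset) in postings}
    return {(d, e) for d, e, _ in candidates}

# Hypothetical postings: positions are consecutive in (1, 3) and (3, 2) only.
sample_index = {
    "構造": [(1, 3, 5), (2, 1, 4), (3, 2, 1)],
    "造化": [(1, 3, 6), (2, 1, 9), (3, 2, 2)],
}
print(find_string(sample_index, "構造化"))
```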

【００５２】The results of searching the content character strings and the structure character strings by the above method are compared, and the pairs of document number and element number that appear in both results are obtained as the final search result.

【００５３】The bibliographic information (title, author, date, and so on) of the documents obtained is stored in the list data storage unit 117 as data for displaying the result list.

【００５４】Finally, the search result display processing is described. First, the result list display unit 111 reads the search result list data from the list data storage unit 117 and displays it on the display means 101. Next, when one document is selected from this search result list through the input means 103 as an entity display request, the entity display unit 112 reads the entity data of that document from the entity data storage unit 116 and displays it on the display means 101.

【００５５】As described above, in this embodiment the logical structure information of a structured document is registered as a character string that concatenates tag names with their appearance order, which makes it possible to search with a variety of structural conditions.

【００５６】(Second Embodiment) The second embodiment of the present invention describes a structured document retrieval system that supports searches specifying a document type.

【００５７】The configuration of this structured document retrieval system is the same as in FIG. 1 of the first embodiment. However, processing that handles document type information is added to the processing performed by the structure information holding document data creation unit 107, and as a result the processing of the index information creating unit 108, the search condition analysis unit 109, and the index information search unit 110 differs from that of the first embodiment.

【００５８】The flow of registration processing in the second embodiment is now described, taking the structured document example of FIG. 3 as the first registration target document. The structured document of FIG. 3 has the "論文" (paper) type DTD shown in FIG. 2. The processing of the structure analysis unit 106 is the same as in the first embodiment and its description is omitted. The processing of the structure information holding document data creation unit 107 is shown in FIG. 12. In the second embodiment, step 505 of FIG. 5, which is part of the processing of the structure information holding document data creation unit 107 in the first embodiment, is replaced by step 1201, and the record for each document component is created so as to include a document type character string.

【００５９】FIG. 13 is an example of the structure information holding document data created from the structured document example of FIG. 3 in the second embodiment. Compared with FIG. 6, the example for the first embodiment, a document type character string has been added. In the example of FIG. 3, the first line contains a document type declaration of the form <!DOCTYPE 論文 SYSTEM "paper.dtd">, and this "論文" is used as the document type character string. Next, the index information creating unit 108 of the second embodiment creates index information in the same way as in the first embodiment. Whereas the first embodiment creates two kinds of index information, for structure character strings and for content character strings, the second embodiment additionally creates index information for document type character strings, and this is also added to the index information storage unit 115.

【００６０】Next, the case is described in which the second registration target document is a structured document such as that of FIG. 15, which has the "レシピ" (recipe) type DTD shown in FIG. 14, a DTD different from "論文" (paper).

【００６１】The processing of the structure analysis unit 106 is the same as in the first embodiment and its description is omitted.

【００６２】FIG. 16 shows the tree structure obtained by analyzing the structured document of FIG. 15 with the structure analysis unit 106.

【００６３】Next, FIG. 17 shows the result of the structure information holding document data creation unit 107 performing the processing of FIG. 12, as for the first registration target document. As FIG. 17 shows, this registration target document is assigned a document number different from that of the first registration target document, and its document type character string is "レシピ" (recipe), its document type.

【００６４】Next, as for the first registration target document, the index information creating unit 108 creates index information for the structure character strings, content character strings, and document type character string, and adds it to the index information storage unit 115. As index information for a document type character string, the document numbers of the documents having that document type are stored in association with the character string representing the document type.

【００６５】The above registration processing is repeated each time a new registration target document is input.

【００６６】Next, the flow of document search processing in the second embodiment is described using a concrete search condition. Suppose that the condition "documents whose document type is [論文] (paper) and whose Chapter 1 chapter title contains the character string 登録 (registration)" is given as the search condition from the input means 103. In the second embodiment, registration takes the document type into account: each component holds a document type character string, and index information is kept for that character string. This is what makes a search specifying the document type possible.

【００６７】FIG. 18 shows the processing flow of the search condition analysis unit 109 in the second embodiment. In the second embodiment, before the processing of the search condition analysis unit 109 of the first embodiment is performed, it is determined whether the search specifies a document type (step 1801). If it does, step 1802 sets that document type as the document type condition character string; if it does not, step 1803 sets the document type condition character string to Null. Processing then proceeds to step 901. Step 901 and the subsequent steps are the same as in the first embodiment and their description is omitted.

【００６８】The search condition in this example is "documents whose document type is [論文] (paper) and whose Chapter 1 chapter title contains the character string 登録 (registration)", so the search condition expression created by the above processing is: document type character string = "論文" & structure character string = "*\章_1\章題_1\" & content character string = "*登録*".

【００６９】Next, the processing flow of the index information search unit 110 in the second embodiment is described with reference to FIG. 19. Steps 1001, 1002, and 1003, which search for the content character string, are the same as in the first embodiment and their description is omitted. If the search for the content character string returned at least one hit and the document type character string of the search condition expression is Null, processing proceeds to step 1007, the result of step 1002 is taken as the final search result, and processing ends. If the document type character string of the search condition expression is not Null, step 1902 searches for the document type character string using the index information. Next, in step 1004, if the structure character string of the search condition expression is Null, step 1903 extracts the pairs of document number and element number as the final result on the basis of the results of steps 1002 and 1902, and processing ends. If the structure character string of the search condition expression is not Null, step 1005 searches for the structure character string using the index information. Finally, step 1904 extracts the pairs of document number and element number as the final result on the basis of the results of steps 1002, 1902, and 1005, and processing ends.
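The flow of FIG. 19 might be sketched as follows. The description says only that the final result is extracted "on the basis of" the individual results, so the way they are combined here (filtering the content hits by document number, then intersecting with the structure hits) is an assumption, as are the function and parameter names. The document type index is modeled as a mapping from a document type character string to a set of document numbers.

```python
def retrieve_with_type(content_hits, type_cond, structure_cond,
                       doctype_index, search_structure):
    """Return (doc_no, elem_no) pairs matching content, type, and structure."""
    hits = set(content_hits)                       # result of step 1002
    if not hits:
        return set()                               # step 1003: no matching document
    if type_cond is None:
        return hits                                # step 1007, as the text describes
    docs = doctype_index.get(type_cond, set())     # step 1902
    typed = {(d, e) for d, e in hits if d in docs}
    if structure_cond is None:                     # step 1004
        return typed                               # step 1903
    return typed & set(search_structure(structure_cond))  # steps 1005 and 1904

doctype_index = {"論文": {1}, "レシピ": {2}}
content_hits = [(1, 4), (2, 3)]
search_structure = lambda cond: [(1, 4), (1, 7)]
print(retrieve_with_type(content_hits, "論文", "*\\章_1\\章題_1\\",
                         doctype_index, search_structure))
```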

【００７０】The search result display processing is the same as in the first embodiment and its description is omitted.

【００７１】As described above, in this embodiment the document type of the registration target document is registered as a document type character string, in addition to the structure character string, as information on the logical structure of the structured document. Structured documents of any document type can therefore be registered, and a search can specify the document type.

【００７２】(Third Embodiment) The third embodiment of the present invention describes a structured document retrieval system in which the document type is set by a document type identifier.

【００７３】As shown in FIG. 20, this system newly includes, in the registration means of the search engine 104, a document type/structure table creation unit 2001 that creates a document type table associating document types with document type identifiers and a structure table associating document type identifiers with the elements of structured documents. It also newly includes, in the search database 105, a document type/structure table storage unit 2002 that stores the created document type table and structure table. The rest of the configuration is the same as in the first embodiment (FIG. 1).

【００７４】The flow of registration processing in the third embodiment is now described. The processing of the structure analysis unit 106 is the same as in the first and second embodiments and its description is omitted.

【００７５】FIG. 21 shows the processing flow of the document type/structure table creation unit 2001. The description takes as the registration target document the structured document example of FIG. 3, which has the DTD shown in FIG. 2. First, in step 2101, the DTD of the registration target document (FIG. 2) is read. Next, in step 2102, it is determined whether the document type defined by this DTD is registered in the document type table. The document type table is a table consisting of document types and the identifiers uniquely assigned to them (called document type identifiers). If the document type is already registered in the document type table, processing ends. In this example, the registration target document of FIG. 3 is assumed to be the first document registered, so the document type table is empty at this point. Processing therefore proceeds to step 2103, where an identifier is assigned to the document type "論文" (paper), and the pair is added to the table that manages the correspondence between document types and document type identifiers, that is, the document type table (step 2104).

【００７６】Next, steps 2105, 2106, and 2107 are repeated for every tag name defined in this DTD. Step 2105 reads a tag name, step 2106 assigns it a unique identifier (hereinafter, tag identifier), and step 2107 adds the entry to the table that manages the correspondence among tag names, tag identifiers, and document type identifiers (hereinafter, structure table).

【００７７】FIG. 22 is an example of the document type table and structure table created when the DTD of FIG. 2 is read. In the example of FIG. 22 the document type identifier is "1" and the tag identifiers are assigned in ascending order from "1", but the identifiers need not be assigned in ascending order, and need not be numbers, as long as they uniquely identify the document type and the tag names.

【００７８】Next, suppose that a registration request arrives for another registration target document (say, the tenth), a structured document such as that of FIG. 24, which has the DTD shown in FIG. 23. Compared with the DTD of FIG. 2, the DTD of FIG. 23 differs in that its element names (tag names) are in English rather than Japanese, but it has the same logical structure. When a document is registered whose DTD defines the same structure as a document type already registered in the document type table and the structure table, the document type/structure table creation unit 2001 assigns to that DTD's document type identifier and tag identifiers the same values as the identifiers already assigned to the corresponding registered document type and tag names.
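The table creation of steps 2101 to 2107, including the reuse of identifiers for a structurally identical DTD, could look roughly like this. The text does not say how two DTDs are judged to define the same structure, so the sketch assumes the caller supplies a hashable structural signature (here the parent index of each tag); the class name, the short tag lists, and the tag names title and chpt are likewise hypothetical.

```python
class TypeTables:
    """Document type table and structure table (illustrative sketch)."""

    def __init__(self):
        self.doc_types = {}   # document type name -> document type identifier
        self.structure = {}   # (document type identifier, tag name) -> tag identifier
        self._shapes = {}     # structural signature -> (type identifier, tag identifiers)
        self._next_tag = 1

    def register(self, doc_type, tags, shape):
        if doc_type in self.doc_types:              # step 2102: already registered
            return self.doc_types[doc_type]
        if shape in self._shapes:                   # same structure seen before
            type_id, tag_ids = self._shapes[shape]
        else:
            type_id = len(self._shapes) + 1         # step 2103
            tag_ids = list(range(self._next_tag, self._next_tag + len(tags)))
            self._next_tag += len(tags)
            self._shapes[shape] = (type_id, tag_ids)
        self.doc_types[doc_type] = type_id          # step 2104
        for tag, tag_id in zip(tags, tag_ids):      # steps 2105 to 2107
            self.structure[(type_id, tag)] = tag_id
        return type_id

tables = TypeTables()
shape = (-1, 0, 0, 2)  # hypothetical signature: parent index of each tag
a = tables.register("論文", ["論文", "タイトル", "章", "章題"], shape)
b = tables.register("paper", ["paper", "title", "chpt", "chpt-title"], shape)
print(a, b, tables.structure[(a, "章題")], tables.structure[(b, "chpt-title")])
```

Both document types end up with the same document type identifier, and corresponding tags share a tag identifier, which is the property the later search example relies on.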

【００７９】FIG. 25 shows the document type table and structure table after the document type/structure table creation unit 2001 has processed the DTD shown in FIG. 23.

【００８０】The processing flow of the structure information holding document data creation unit 107, which follows, is almost the same as in the second embodiment, except that tag identifiers are used instead of tag names when the structure character string is created. In addition, instead of a document type character string field holding the document type, a document type identifier field is created.

【００８１】FIG. 26 shows the structure information holding document data obtained as a result of the processing of the structure information holding document data creation unit 107 when the structured documents of FIG. 3 and FIG. 24 are registered in the third embodiment. Reference numerals 2601 to 2614 are assigned to distinguish the individual records shown in FIG. 26 and are used in the worked search example described later.

【００８２】Next, unlike in the second embodiment, the index information creating unit 108 of the third embodiment does not create index information for the document type. This is because the second embodiment holds the document type information as a character string, whereas the third embodiment holds it as a numeric document type identifier.

【００８３】Next, the flow of document search processing in the third embodiment is described. As a concrete search condition, suppose that "documents whose document type is [論文] (paper) and whose Chapter 1 chapter title contains the character string 方法 (method)" is given.

【００８４】FIG. 27 shows the processing flow of the search condition analysis unit 109 in the third embodiment. As in the second embodiment, the third embodiment determines whether the search specifies a document type. If it does, the document type table is read (step 2701), and the identifier of the specified document type, rather than a document type character string, is extracted (step 2702). Thereafter, as in the second embodiment, a structure condition character string for suffix matching or for intermediate matching is created according to the structure condition specified. The third embodiment differs from the second, however, in that the structure table is read (step 2703) and the structure condition character string is created with reference to it.

【００８５】The search condition in this example is "documents whose document type is [論文] (paper) and whose Chapter 1 chapter title contains the character string 方法 (method)", so the search condition expression created by the above processing is: document type identifier = 1 & structure character string = "*\7_1\8_1\" & content character string = "*方法*".
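Creating the structure condition character string with reference to the structure table (step 2703) can be pictured as replacing each tag name by its tag identifier. In this sketch the mapping of 章 to 7 and 章題 to 8 is inferred from the expression above, and the English tag name chpt is hypothetical; the helper name is also an assumption. The sketch handles segments of the form tag_order only, and would need extending for tag names that contain "_" or for conditions such as "*\章題*".

```python
def to_identifier_condition(condition, tag_ids):
    """Replace tag names in a structure condition string by tag identifiers."""
    converted = []
    for segment in condition.split("\\"):
        name, sep, order = segment.partition("_")
        if name in tag_ids:
            converted.append(str(tag_ids[name]) + sep + order)
        else:
            converted.append(segment)  # wildcard or empty segment: keep as is
    return "\\".join(converted)

# Tag identifiers for document type identifier 1 (values inferred, see above).
tag_ids = {"章": 7, "章題": 8, "chpt": 7, "chpt-title": 8}
print(to_identifier_condition("*\\章_1\\章題_1\\", tag_ids))
print(to_identifier_condition("*\\chpt_1\\chpt-title_1\\", tag_ids))
```

Because the Japanese and English tag names map to the same identifiers, both conditions yield the same identifier-based string.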

【００８６】次に、第３の実施の形態における索引情報
検索部110の処理の流れを図２８を用いて説明する。ま
ず、内容文字列についての検索を行なうステップ１００
１、１００２、１００３は第１のおよび第２の実施の形
態と同様であるため説明を省略する。次に内容文字列に
ついて検索を行なった結果が０件でなかった場合で、さ
らに検索条件式の文書型識別子がNullであった場合はス
テップ１００７へ進み、ステップ１００２の結果を最終
検索結果として終了する。検索条件式の文書型識別子が
Nullでなく、さらにステップ１００４にて検索条件式の
構造文字列がNullであった場合は、ステップ２８０２へ
進み、ステップ１００２の結果の文書のうち文書型識別
子が検索条件式で指定された値と一致するものを最終結
果として抽出し、終了する。検索条件式の構造文字列が
Nullでなかった場合は、ステップ１００５で構造文字列
について索引情報を用いて検索を行なう。次にステップ
２８０３にて、ステップ１００２とステップ１００５の
結果のうち両方に存在する文書番号と要素番号の組を仮
結果として抽出し、最後にステップ２８０４にて、２８
０３で得た仮結果のうちで文書型識別子が検索条件式で
指定された値と一致する文書を最終結果として抽出し、
終了する。Next, the processing flow of the index information search unit 110 in the third embodiment is described with reference to FIG. 28. Steps 1001, 1002, and 1003, which search on the content character string, are the same as in the first and second embodiments, so their description is omitted. If the content character string search returns at least one hit and the document type identifier in the search condition expression is Null, the process proceeds to step 1007 and ends with the result of step 1002 as the final search result. If the document type identifier is not Null and step 1004 finds that the structure character string in the search condition expression is Null, the process proceeds to step 2802, extracts from the documents found in step 1002 those whose document type identifier matches the value specified in the search condition expression as the final result, and ends. If the structure character string is not Null, step 1005 searches on the structure character string using the index information. Step 2803 then extracts, as a provisional result, the pairs of document number and element number that appear in both the step 1002 and step 1005 results. Finally, step 2804 extracts from the provisional result of step 2803 the documents whose document type identifier matches the value specified in the search condition expression as the final result, and the process ends.
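The branching of FIG. 28 described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the use of `None` for Null, the callable standing in for step 1005, and the `doc_type_of` lookup are all assumptions made for the example, and the zero-hit case is assumed to return an empty result.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Hit = Tuple[int, int]  # (document number, element number)


def final_result(
    content_hits: Set[Hit],                       # result of step 1002
    doc_type_id: Optional[int],                   # None stands in for Null
    structure_string: Optional[str],              # None stands in for Null
    search_structure: Callable[[str], Set[Hit]],  # stands in for step 1005
    doc_type_of: Dict[int, int],                  # document number -> document type identifier
) -> Set[Hit]:
    """Follow the step order of FIG. 28 as described in the text."""
    if not content_hits:
        return set()  # assumed handling of the zero-hit case
    if doc_type_id is None:
        return content_hits  # step 1007: content result is final
    if structure_string is None:
        # step 2802: keep only documents of the requested type
        return {h for h in content_hits if doc_type_of[h[0]] == doc_type_id}
    structure_hits = search_structure(structure_string)  # step 1005
    provisional = content_hits & structure_hits           # step 2803
    # step 2804: filter the provisional result by document type
    return {h for h in provisional if doc_type_of[h[0]] == doc_type_id}
```

Representing each hit as a (document number, element number) pair lets step 2803 be a plain set intersection.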

【００８７】In this example, the search condition is "documents whose document type is [論文] and whose Chapter 1 title contains the character string “方法” (method)". With the search condition expression created as "document type identifier = 1 & structure character string = “*\7_1\8_1\” & content character string = “*登録*”", the processing of FIG. 28 above yields records 2604 and 2611 of FIG. 26. Record 2611 has document type [paper], and its character string “方法” is contained in a [chpt-title] element. Even so, because the document type/structure table creation unit 2001 assigned it the same document type identifier as [論文] and the same values in the corresponding structure character string, both 2604 and 2611 are obtained as search results.

【００８８】The search result display processing is the same as in the first and second embodiments, so its description is omitted.

【００８９】As described above, in this embodiment the document type/structure table creation unit 2001 assigns the same document type identifier and tag identifier values to different DTDs that define the same structure. As a result, a single search condition expression can also retrieve documents that have the same structure but a different document type.
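The effect of sharing identifiers can be illustrated with two small lookup tables. The sketch below is an assumption-laden example: the table names and the helper function are hypothetical, and it does not show how unit 2001 decides that two DTDs define the same structure. Only the values used (document type identifier 1 for [論文] and [paper], tag identifier 8 for the chapter title) follow the text.

```python
# Hypothetical tables: two DTD names share one document type identifier,
# and equivalent tag names share one tag identifier.
DOC_TYPE_IDS = {"論文": 1, "paper": 1}
TAG_IDS = {"章題": 8, "chpt-title": 8}


def resolves_to_same_condition(type_a: str, tag_a: str, type_b: str, tag_b: str) -> bool:
    """True when both (document type, tag) pairs map to the same identifiers."""
    return (DOC_TYPE_IDS[type_a], TAG_IDS[tag_a]) == (DOC_TYPE_IDS[type_b], TAG_IDS[tag_b])
```

Because records are stored with identifiers rather than names, a condition written against [論文] also matches a [paper] record.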

【００９０】(Fourth Embodiment) The structured document search system of the fourth embodiment of the present invention makes searching more efficient by adding the document type identifier and the tag identifier to the index information for content character strings.

【００９１】The configuration of this system is the same as FIG. 20 of the third embodiment. However, the content of the index information created by the index information creation unit 108 differs, and as a result the processing of the index information search unit 110 differs from that of the third embodiment.

【００９２】The flow of document registration processing in the fourth embodiment is described next, taking the structured document example shown in FIG. 3 as the first document to be registered. The structured document of FIG. 3 has the "paper" (論文) type DTD shown in FIG. 2. The processing of the structure analysis unit 106, the document type/structure table creation unit 2001, and the structure information holding document data creation unit 107 is the same as in the third embodiment, so its description is omitted. As a result, structure information holding data the same as that of FIG. 26 is obtained.

【００９３】Next, FIG. 29 shows the processing flow of the index information creation unit 108 in the fourth embodiment. The only difference from the third embodiment is that, when index information on character chains is created from a content character string, the document type identifier and the tag identifier are added alongside the document number, element number, and character position number (step 2901). The method of creating the structure character string index information is unchanged from the third embodiment.

【００９４】FIG. 30 shows part of an example of the content character string index information that the index information creation unit 108 of the fourth embodiment creates for records 2601 and 2608 of the structure information holding data of FIG. 26. Entry 3001 of FIG. 30 indicates that the character chain “書検” occurs starting at character position 2 from the beginning of the content character string of the element with element number 1 and tag identifier 3, in the document with document type identifier 1 and document number 10.
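An index entry of the kind described for FIG. 30 can be sketched as below. This is only an illustration under stated assumptions: the `Posting` type, the `index_content` function, and the dictionary layout are hypothetical, and the sample content string “文書検索” is invented so that the chain “書検” falls at position 2 as in entry 3001.

```python
from typing import Dict, List, NamedTuple


class Posting(NamedTuple):
    doc_type_id: int  # added in the fourth embodiment
    doc_no: int
    elem_no: int
    tag_id: int       # added in the fourth embodiment
    position: int     # 1-based position of the chain's first character


def index_content(
    index: Dict[str, List[Posting]],
    text: str,
    doc_type_id: int,
    doc_no: int,
    elem_no: int,
    tag_id: int,
    n: int = 2,
) -> None:
    """Register every n-character chain of `text` with its identifiers."""
    for i in range(len(text) - n + 1):
        chain = text[i:i + n]
        index.setdefault(chain, []).append(
            Posting(doc_type_id, doc_no, elem_no, tag_id, i + 1)
        )
```

Calling `index_content(index, "文書検索", 1, 10, 1, 3)` registers the chains “文書”, “書検”, and “検索”, each carrying the document type identifier and tag identifier alongside its position.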

【００９５】Next, the flow of document search processing in the fourth embodiment is described. FIG. 31 shows the processing in the search condition analysis unit 109. The processing related to document type designation (steps 1801, 2701, 2702, and 1803) is the same as in the third embodiment, so its description is omitted. Step 3101 then checks whether the request is a structure search that specifies only a tag name. A structure search that specifies only a tag name is one such as "documents whose document type is [論文] and whose chapter title contains “登録方”": the structure information given consists of a tag name alone, so more than one structure element may satisfy it. If the search specifies only a tag name, the structure table is read (step 2703), the tag identifier corresponding to the specified tag name is extracted (step 3102), and a search condition expression consisting of the document type identifier, tag identifier, and content character string designations is created (step 3103). For the example above, the search condition expression is "document type identifier = 1 & tag identifier = 8 & content character string = “*登録方*”". This expression assumes that the document type table and structure table have been created as shown in FIG. 25. If the search does not specify only a tag name, the processing from step 901 of FIG. 27 onward is performed and a search condition expression like that of the third embodiment is created (step 3104).
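The tag-name-only branch (steps 2703, 3102, and 3103) amounts to two table lookups followed by assembling the expression. The sketch below assumes hypothetical table contents and a dictionary representation of the search condition expression; the tag name “章題” for the chapter title is also an assumption, while the identifier values 1 and 8 follow the example in the text.

```python
from typing import Dict, Union

# Hypothetical stand-ins for the document type table and structure table of FIG. 25.
DOC_TYPE_TABLE: Dict[str, int] = {"論文": 1}
STRUCTURE_TABLE: Dict[str, int] = {"章題": 8}


def tag_only_expression(doc_type: str, tag_name: str, text: str) -> Dict[str, Union[int, str]]:
    """Build a search condition expression for a tag-name-only structure search."""
    return {
        "doc_type_id": DOC_TYPE_TABLE[doc_type],  # document type designation
        "tag_id": STRUCTURE_TABLE[tag_name],      # step 3102: tag name -> tag identifier
        "content": f"*{text}*",                   # substring condition on the content
    }
```

No structure character string appears in the result, which is what later lets the search run against the content character string index alone.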

【００９６】FIG. 32 shows the processing of the index information search unit 110 in the fourth embodiment. Step 3201 determines whether the search condition expression includes a structure character string designation. If it does not, that is, if the structural condition is specified by document type alone or by tag name, the index information is first read (step 1001) and the search is then performed using only the content character string index information (step 3202). If step 3201 finds that the search condition expression includes a structure character string, the same search processing as in the third embodiment, shown in FIG. 28, is performed (step 3203).

【００９７】The search of step 3202, which uses only the content character string index information, is described in detail with reference to FIG. 33. Let the search condition expression be "document type identifier = 1 & tag identifier = 8 & content character string = “*登録方*”". From the character string “登録方”, the two-character chains “登録” and “録方” can be extracted. The number of characters in each extracted chain is made equal to the chain length used in the index information. Assuming that index information such as 3310 of FIG. 33 has been created for these two chains, the entries that share the same document number and element number are taken out first. In the example of FIG. 33 these are 3321 and 3322. Of these, the entries whose character position numbers are consecutive from the “登録” chain to the “録方” chain, and whose document type identifier and tag identifier values match the search condition expression, are extracted as the search result. In the example of FIG. 33, 3321 is extracted, and its pair of document number and element number becomes the search result.

【００９８】In the example of FIG. 33, the search condition expression specified values for both the document type identifier and the tag identifier. When only the document type identifier is specified, the tag identifier value in the index information is simply ignored and the entries whose document type identifier matches are extracted.
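The content-only search of step 3202 can be sketched as follows. This is a simplified illustration, not the patent's code: the `Posting` type, the function name, and the in-memory index layout are assumptions, and `tag_id=None` models the case where only the document type identifier is specified.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Posting(NamedTuple):
    doc_type_id: int
    doc_no: int
    elem_no: int
    tag_id: int
    position: int  # 1-based position of the chain's first character


def search_content_only(
    index: Dict[str, List[Posting]],
    text: str,
    doc_type_id: int,
    tag_id: Optional[int] = None,
    n: int = 2,
) -> Set[Tuple[int, int]]:
    """Return (document number, element number) pairs containing `text`."""
    chains = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not chains:
        return set()
    results: Set[Tuple[int, int]] = set()
    for first in index.get(chains[0], []):
        if first.doc_type_id != doc_type_id:
            continue
        if tag_id is not None and first.tag_id != tag_id:
            continue  # tag identifier is ignored when not specified
        # Every later chain must occur in the same element at the next position.
        consecutive = all(
            any(
                p.doc_no == first.doc_no
                and p.elem_no == first.elem_no
                and p.position == first.position + offset
                for p in index.get(chain, [])
            )
            for offset, chain in enumerate(chains[1:], start=1)
        )
        if consecutive:
            results.add((first.doc_no, first.elem_no))
    return results
```

The identifier filter is applied to the same postings that are checked for adjacency, so no lookup in the structure character string index is needed.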

【００９９】The search result display processing is the same as in the first, second, and third embodiments, so its description is omitted.

【０１００】As described above, in the fourth embodiment the index information creation unit 108 adds the document type identifier and tag identifier of the document being registered to the index information for content character strings at registration time. Consequently, for a search request whose structural condition is specified by document type or by tag name, only the content character string index information needs to be searched. The structure character string index information need not be searched, which makes high-speed searching possible.

【０１０１】(Fifth Embodiment) In the structured document search system of the fifth embodiment of the present invention, the appearance-order data is removed from the structure character string, and the appearance-order information is held outside the structure character string.

【０１０２】The configuration of this structured document search system is the same as FIG. 20 of the third embodiment. However, the content of the document type table created by the document type/structure table creation unit 2001 differs, and accordingly the processing of the structure information holding document data creation unit 107, the search condition analysis unit 109, and the index information search unit 110 differs from that of the third embodiment.

【０１０３】The flow of document registration processing in the fifth embodiment is described next, using the structured document example shown in FIG. 3 as the document to be registered. The structured document of FIG. 3 has the "paper" (論文) type DTD shown in FIG. 2. The processing of the structure analysis unit 106 is the same as in the third embodiment, so its description is omitted.

【０１０４】Next, the document type table created by the document type/structure table creation unit 2001 in the fifth embodiment is described. When the DTD of the document to be registered is as in FIG. 2, then, as noted in the description of FIG. 2 for the first embodiment, the 〈章〉 (chapter), 〈段落〉 (paragraph), and 〈節〉 (section) elements, which are marked with "*", can appear more than once in a document. These element names (tag names) are added to the fields of the document type table.

【０１０５】FIG. 34 shows the document type table and structure table created for the DTD of FIG. 2 by the method of the fifth embodiment. As FIG. 34 shows, in addition to the fields of the document type table of the third embodiment, the table has fields that record the tag names of elements that can appear repeatedly.

【０１０６】Next, FIG. 35 shows the structure information holding document data that the structure information holding document data creation unit 107 of the fifth embodiment creates when the document to be registered is that of FIG. 3. Reference numerals 3501 to 3507 are assigned to distinguish the individual records shown in FIG. 35 and are used in the worked search example described later. As a comparison with FIG. 26 shows, in the fifth embodiment the structure character string is created without the appearance order of each element. Repeated-occurrence fields are also added, corresponding to FIG. 34. The "repeated occurrence 1" field records the appearance order, in that record, of the element “章” (chapter; tag identifier 7); it has no value if the record's structure character string does not contain tag identifier 7. Likewise, the "repeated occurrence 2" and "repeated occurrence 3" fields record the appearance order, in that record, of “段落” (paragraph; tag identifier 11) and “節” (section; tag identifier 9), respectively.
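The relationship between a third-embodiment structure character string and a fifth-embodiment record can be sketched as a small conversion. This is an illustration under assumptions: the field names `repeat1` to `repeat3`, the function name, and the input format `\tag_order\...` (inferred from strings such as “\7_1\8_1\”) are not taken from the figures, and the patent builds the record directly rather than by converting an existing string.

```python
import re
from typing import Dict, Optional, Tuple

# Tag identifier -> repeated-occurrence field, following the text:
# 7 = chapter, 11 = paragraph, 9 = section. Field names are hypothetical.
REPEAT_FIELDS: Dict[int, str] = {7: "repeat1", 11: "repeat2", 9: "repeat3"}


def split_structure(structure: str) -> Tuple[str, Dict[str, Optional[int]]]:
    """Strip appearance orders from the string and move them to separate fields."""
    parts = re.findall(r"\\(\d+)_(\d+)", structure)  # (tag identifier, order) pairs
    plain = "\\" + "".join(f"{tag}\\" for tag, _ in parts)
    fields: Dict[str, Optional[int]] = {name: None for name in REPEAT_FIELDS.values()}
    for tag, order in parts:
        name = REPEAT_FIELDS.get(int(tag))
        if name is not None:  # orders of non-repeatable elements are dropped
            fields[name] = int(order)
    return plain, fields
```

For the input `\7_1\9_2\10_1\` the order-free string is `\7\9\10\`, with the chapter order in `repeat1`, the section order in `repeat3`, and `repeat2` left without a value.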

【０１０７】The processing of the index information creation unit 108 in the fifth embodiment is the same as in the third embodiment, so its description is omitted.

【０１０８】Next, the flow of document search processing in the fifth embodiment is described. FIG. 36 shows the processing flow of the search condition analysis unit 109 in the fifth embodiment. It differs from the processing of the third embodiment only in steps 3601, 3602, 3603, and 3604. Step 3601 determines whether the search specifies "everything under a certain element". Such a search is a condition like "Chapter 1 contains the character string “○○”"; in this case, the sections (節) and other elements under the chapter (章) are included in the search target. The "certain element" need not be unique like "Chapter 1"; it may be one with multiple possible instances, such as simply "chapter".

【０１０９】According to that result, steps 3602 and 3603 create a structure condition character string as in steps 2704 and 2705 of FIG. 27, but following the structure character string format of the fifth embodiment, so that the string contains no appearance order. In the next step, 3604, if the structural condition specifies the appearance order of some element, the document type table is consulted and the appearance order for the corresponding repeated-occurrence field is extracted.

【０１１０】As a concrete example, consider the search condition "documents whose document type is [論文] and in which a section title within a section of Chapter 1 contains “入力” (input)". Here the structural condition is not one that specifies "everything under a certain element", so step 3603 creates “*\7\9\10\” as a structure condition character string for suffix matching without appearance order. Next, the element whose appearance order is specified in the structural condition is "chapter", with appearance order 1. According to the document type table of FIG. 34, for document type [論文] "chapter" is registered in the repeated occurrence 1 field, which gives the expression "repeated occurrence 1 = 1". The search condition expression finally created at step 905 is "document type identifier = 1 & structure character string = “*\7\9\10\” & repeated occurrence 1 = 1 & content character string = “*入力*”".
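The construction of this expression can be sketched as below. The representation is an assumption throughout: the dictionary form of the expression, the field names `repeat1` to `repeat3`, the tag name “節題” for the section title, and the function name are hypothetical, while the identifier values 7, 9, and 10 and the resulting string follow the example in the text.

```python
from typing import Dict, List, Optional, Tuple, Union

TAG_IDS: Dict[str, int] = {"章": 7, "節": 9, "節題": 10}
# Which repeated-occurrence field each repeatable tag uses (per the FIG. 34 description).
REPEAT_FIELD_OF: Dict[str, str] = {"章": "repeat1", "段落": "repeat2", "節": "repeat3"}


def build_expression(
    doc_type_id: int,
    path: List[Tuple[str, Optional[int]]],  # (tag name, appearance order or None)
    text: str,
) -> Dict[str, Union[int, str]]:
    """Build an order-free suffix-match structure condition plus repeat conditions."""
    structure = "*\\" + "".join(f"{TAG_IDS[name]}\\" for name, _ in path)  # step 3603
    expr: Dict[str, Union[int, str]] = {
        "doc_type_id": doc_type_id,
        "structure": structure,
        "content": f"*{text}*",
    }
    for name, order in path:
        if order is not None:
            expr[REPEAT_FIELD_OF[name]] = order  # step 3604
    return expr
```

For example, `build_expression(1, [("章", 1), ("節", None), ("節題", None)], "入力")` gives the structure condition `*\7\9\10\` together with `repeat1 = 1`, mirroring the expression in the text.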

【０１１１】Next, the processing of the index information search unit 110 in the fifth embodiment is described for the search condition expression used in the explanation of the search condition analysis unit 109 above. For the expression "document type identifier = 1 & structure character string = “*\7\9\10\” & repeated occurrence 1 = 1 & content character string = “*入力*”", the structure character string and the content character string are searched using the index information as in the third embodiment, and among those results the one whose document type identifier is 1 is record 3506 of FIG. 35. In the fifth embodiment, the repeated-occurrence field designation is then checked as well. Because the search condition expression specifies "repeated occurrence 1 = 1", the repeated occurrence 1 field of record 3506 is consulted; its value equals the specified value 1, so 3506 qualifies as the final search result.
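The additional check on the repeated-occurrence fields reduces to comparing field values. The sketch below assumes records and expressions are dictionaries with hypothetical field names `repeat1` to `repeat3`; the sample values given for record 3506 in the usage note are invented for illustration, apart from repeated occurrence 1 = 1, which follows the text.

```python
from typing import Dict, Optional, Union

Record = Dict[str, Union[int, str, None]]


def passes_repeat_check(record: Record, expr: Record) -> bool:
    """True if every repeated-occurrence condition in expr matches the record."""
    return all(
        record.get(field) == value
        for field, value in expr.items()
        if field.startswith("repeat")
    )
```

A record such as `{"doc_type_id": 1, "structure": "\\7\\9\\10\\", "repeat1": 1, "repeat2": None, "repeat3": 1}` passes a condition of `repeat1 = 1` and fails `repeat1 = 2`.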

【０１１２】The search result display processing in the fifth embodiment is the same as in the first, second, third, and fourth embodiments, so its description is omitted.

【０１１３】This embodiment has been described using an example in which document type identifiers and tag identifiers are created as in the third embodiment and the structure character string is built from tag identifiers. The same result can be achieved when the structure character string is built directly from tag names, as in the first and second embodiments, provided that there is a means of holding repeated-occurrence information in place of the document type table of FIG. 34 and that the structure information holding document data holds the repeated-occurrence information as in FIG. 35.

【０１１４】As described above, in the fifth embodiment the appearance-order information is held outside the structure character string at document registration, and only for elements that can appear more than once. This shortens the structure character string and speeds up searching. Holding the appearance-order information outside the structure character string also makes it possible to handle a variety of structure designations.

【０１１５】[0115]

[Effects of the Invention] As is clear from the above description, the structured document search system of the present invention can search registered structured documents under a variety of structural conditions. Registering structured documents with awareness of their logical structure and document type enables searches that specify the logical structure and document type, and performing search processing suited to the specified structural condition enables high-speed structure-conditioned searching.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a structured document search system according to a first embodiment of the present invention;

FIG. 2 is a diagram showing an example of a DTD (document type definition) according to the first embodiment of the present invention;

FIG. 3 is a diagram showing an example of a structured document according to the first embodiment of the present invention;

FIG. 4 is a diagram graphically showing the structure of the structured document of FIG. 3;

FIG. 5 is a diagram showing a processing procedure of a structure information holding document data creation unit according to the first embodiment of the present invention;

FIG. 6 is a diagram showing an example of structure information holding document data according to the first embodiment of the present invention;

FIG. 7 is a diagram showing a processing procedure of an index information creation unit according to the first embodiment of the present invention;

FIG. 8 is a diagram showing an example of index information according to the first embodiment of the present invention;

FIG. 9 is a diagram showing a processing procedure of a search condition analysis unit according to the first embodiment of the present invention;

FIG. 10 is a diagram showing a processing procedure of an index information search unit according to the first embodiment of the present invention;

FIG. 11 is a diagram showing details of a search method using index information according to the first embodiment of the present invention;

FIG. 12 is a diagram showing a processing procedure of a structure information holding document data creation unit according to the second embodiment of the present invention;

FIG. 13 is a diagram showing an example of structure information holding document data according to the second embodiment of the present invention;

FIG. 14 is a diagram showing a second example of a DTD (document type definition) according to the second embodiment of the present invention;

FIG. 15 is a diagram showing a second example of a structured document according to the second embodiment of the present invention;

FIG. 16 is a diagram graphically showing the structure of the structured document of FIG. 15;

FIG. 17 is a diagram showing a second example of structure information holding document data according to the second embodiment of the present invention;

FIG. 18 is a diagram showing a processing procedure of a search condition analysis unit according to the second embodiment of the present invention;

FIG. 19 is a diagram showing a processing procedure of an index information search unit according to the second embodiment of the present invention;

FIG. 20 is a configuration diagram of a structured document search system according to a third embodiment of the present invention;

FIG. 21 is a diagram showing a processing procedure of a document type/structure table creation unit according to the third embodiment of the present invention;

FIG. 22 is a diagram showing an example of a document type table and a structure table at the initial point in the third embodiment of the present invention;

FIG. 23 is a diagram showing an example of a DTD (document type definition) according to the third embodiment of the present invention;

FIG. 24 is a diagram showing an example of a structured document according to the third embodiment of the present invention;

FIG. 25 is a diagram showing an example of a document type table and a structure table at the time when registration of the DTD of FIG. 23 is completed in the third embodiment of the present invention;

FIG. 26 is a diagram showing an example of structure information holding document data according to the third embodiment of the present invention;

FIG. 27 is a diagram showing a processing procedure of a search condition analysis unit according to the third embodiment of the present invention;

FIG. 28 is a diagram showing a processing procedure of an index information search unit according to the third embodiment of the present invention;

FIG. 29 is a diagram showing a processing procedure of an index information creation unit according to the fourth embodiment of the present invention;

FIG. 30 is a diagram showing an example of content character string index information according to the fourth embodiment of the present invention;

FIG. 31 is a diagram showing a processing procedure of a search condition analysis unit according to the fourth embodiment of the present invention;

FIG. 32 is a diagram showing a processing procedure of an index information search unit according to the fourth embodiment of the present invention;

FIG. 33 is a diagram showing details of the search method using only the content character string index information, performed in step 3202 of FIG. 32, according to the fourth embodiment of the present invention;

FIG. 34 is a diagram showing an example of a document type table and a structure table according to the fifth embodiment of the present invention;

FIG. 35 is a diagram showing an example of structure information holding document data according to the fifth embodiment of the present invention;

FIG. 36 is a diagram showing a processing procedure of a search condition analysis unit according to the fifth embodiment of the present invention;

FIG. 37 is a diagram showing a configuration of a document registration system according to a conventional technique;

FIG. 38 is a diagram showing a process of generating a structure index in a conventional technique.

[Explanation of symbols]

101 display means
102 storage means
103 input means
104 search engine
105 search database
106 structure analysis unit
107 structure information holding document data creation unit
108 index information creation unit
109 search condition analysis unit
110 index information search unit
111 result list display unit
112 entity display unit
113 structure-analyzed data storage unit
114 structure information holding document data storage unit
115 index information storage unit
116 entity data storage unit
117 list data storage unit
2001 document type/structure table creation unit
2002 document type/structure table storage unit
3701 document structure analysis program
3702 structure index creation program
3703 structured full-text data generation program
3704 character string index creation program
3705 analyzed document data storage area
3706 structure index storage area
3707 structured full-text data storage area
3708 character string index storage area

Continuation of front page
(72) Inventor: Takeshi Tsurubayashi, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture
(72) Inventor: Shinichi Nakai, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture
F-terms (reference): 5B075 ND03 ND35 NK02 NK43 NK54 UU05

Claims

[Claims]

In a document retrieval system for handling a structured document, a structure analyzing means for analyzing a logical structure of a structured document to be registered, and a hierarchical structure of each element divided into logical structures by the structure analyzing means. Structure information holding document data creating means for creating structure information holding document data containing information; index information creating means for creating index information for performing a search from the created structure information holding document data; A search condition analyzing unit that analyzes a search condition and converts the search condition into a search condition suitable for a search process; and an index information search unit that performs a search using the index information based on the search condition. Structured document search system.

2. The structured document search system according to claim 1, further comprising: result list displaying means for displaying a list of the search results retrieved by said index information searching means; and entity displaying means for displaying the entity data of a document selected from said search result list.

3. The structured document search system according to claim 1, wherein said structure information holding document data creating means includes, in the structure information holding document data, the logical structure information of each element of the document to be registered as a character string formed by concatenating a tag name, its order of appearance, and a delimiter.
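As an illustration of the character string described in claim 3, the following sketch builds one string per element from its tag name, its order of appearance among same-named siblings, and a delimiter. The "/" delimiter, the bracketed order notation, and the function name structure_strings are assumptions chosen for readability; the claim does not fix a concrete notation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DELIMITER = "/"  # assumed delimiter; the claim only requires that one exists


def structure_strings(root):
    """Return (structure string, text) pairs for the root and all its descendants."""
    results = []

    def walk(elem, order, prefix):
        label = f"{elem.tag}[{order}]"  # tag name plus order of appearance
        path = f"{prefix}{DELIMITER}{label}" if prefix else label
        results.append((path, (elem.text or "").strip()))
        seen = Counter()  # counts same-named siblings seen so far
        for child in elem:
            seen[child.tag] += 1
            walk(child, seen[child.tag], path)

    walk(root, 1, "")
    return results


root = ET.fromstring("<paper><title>A</title><sec>x</sec><sec>y</sec></paper>")
for path, text in structure_strings(root):
    print(path, text)
```

With the order embedded in the string, a condition such as "the second sec of the paper" reduces to a plain string match against "paper[1]/sec[2]".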

4. The structured document search system according to claim 1, wherein said structure information holding document data creating means includes, in the structure information holding document data, a character string indicating the document type information of the document to be registered.

5. The structured document search system according to claim 1, further comprising document type / structure table creating means for assigning a unique identifier to each document type and each tag name of a document to be registered, before the structure-analyzed data produced by said structure analyzing means is passed to said structure information holding document data creating means.

6. The structured document search system according to claim 5, wherein said index information creating means creates index information including said identifier uniquely assigned to each of a document type and a tag name.
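The identifier assignment of claims 5 and 6 can be sketched as a pair of lookup tables whose numeric identifiers, rather than the names themselves, are placed in the index key. The IdTable class, the sequential integer identifiers, and the tuple-shaped key are hypothetical choices for this illustration only.

```python
class IdTable:
    """Assigns a unique integer identifier to each distinct name on first sight."""

    def __init__(self):
        self._ids = {}

    def id_for(self, name):
        if name not in self._ids:
            self._ids[name] = len(self._ids) + 1  # next unused identifier
        return self._ids[name]


doc_types = IdTable()  # document type name -> identifier
tags = IdTable()       # tag name -> identifier


def index_key(doc_type, tag_path):
    """Build an index key from identifiers instead of the raw names."""
    return (doc_types.id_for(doc_type), tuple(tags.id_for(t) for t in tag_path))


print(index_key("paper", ["paper", "sec", "title"]))
print(index_key("manual", ["manual", "sec"]))
```

Because a tag name such as "sec" receives the same identifier wherever it appears, keys stay short and a structural condition can be compared as a sequence of integers.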

7. The structured document search system according to claim 5, wherein said document type / structure table creating means holds information on tag names that appear multiple times, and said structure information holding document data creating means includes information on the order of appearance in the structure information holding document data only for elements whose tag names appear multiple times.
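The restriction in claim 7 can be illustrated with a two-pass sketch: a first pass records which tag names occur more than once under a common parent, and a second pass attaches the order of appearance only to elements with those tag names. The function names, the "/" delimiter, the bracket notation, and the choice to detect repetition among siblings are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repeated_tags(root):
    """Collect tag names that occur more than once under any single parent."""
    repeated = set()
    for parent in root.iter():
        counts = Counter(child.tag for child in parent)
        repeated.update(tag for tag, n in counts.items() if n > 1)
    return repeated


def compact_paths(root, repeated, delimiter="/"):
    """Build structure strings, adding the order only for repeated tag names."""
    paths = []

    def walk(elem, order, prefix):
        label = f"{elem.tag}[{order}]" if elem.tag in repeated else elem.tag
        path = f"{prefix}{delimiter}{label}" if prefix else label
        paths.append(path)
        seen = Counter()  # counts same-named siblings seen so far
        for child in elem:
            seen[child.tag] += 1
            walk(child, seen[child.tag], path)

    walk(root, 1, "")
    return paths


root = ET.fromstring(
    "<paper><title>A</title><sec><p>x</p></sec><sec><p>y</p><p>z</p></sec></paper>"
)
for path in compact_paths(root, repeated_tags(root)):
    print(path)
```

Elements such as the title, which never repeat, carry no order information, so their structure strings are shorter and a condition on them needs no order to be specified.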