JPH05225240A

JPH05225240A - Document data base device

Info

Publication number: JPH05225240A
Application number: JP4234911A
Authority: JP
Inventors: Hiroshi Okumura; 洋奥村
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1991-09-25
Filing date: 1992-09-02
Publication date: 1993-09-03

Abstract

PURPOSE: To extract part of the content of a structured document stored in a document database. CONSTITUTION: Of the information received from a search expression analysis means 11, a document retrieval means 12 passes the information designating a document set to a document set retrieval means 13, and passes the information relating to document structure extraction to a document structure extracting means 14. The document set retrieval means 13 retrieves a set of documents from a document storage part 2 based on the information it receives. The document structure extracting means 14 extracts the document structure of each element of the document set retrieved by the document set retrieval means 13. A document allocating means 15 allocates (lays out) the documents that were extracted and restructured by the document structure extracting means 14 and therefore have no layout.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a document database apparatus that supports computer-based document processing, in particular document processing that reuses existing documents.

[0002]

2. Description of the Related Art: Computer-based document databases have long been used by many users, particularly in offices, to store and reuse documents. Document filing devices existed for this purpose, but a user who did not know where a document was stored could not retrieve it from a document filing device, so such a device could not serve as a document database.

[0003] To solve this problem, conventional document database devices combined a document filing device with a document search device. Attributes are registered for each document handled by such a device, before the document is stored in the document filing device. The attributes may be registered automatically by the system or manually by the user.

[0004] Here, attributes are, for example, keywords supplied by the user, keywords registered automatically by the system, and information carried by the document itself, such as its author or creation date. These attributes are registered as an index.

[0005] At search time, the document search device uses the index, keyed on the attributes above, to search the document filing device and obtain the location and name of each matching document. Based on that result, it then takes the entire desired document out of the document filing device and returns it as the search result.
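To make the limitation described in [0005] and [0006] concrete, here is a minimal Python sketch of the conventional arrangement: an attribute index over a filing device whose search step can only hand back whole documents. The class and field names are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class StoredDocument:
    location: str               # directory in the filing device
    name: str                   # document name
    attributes: Dict[str, str]  # keywords, author, creation date, ...
    body: bytes                 # the whole document, stored as one unit

class ConventionalDocumentDatabase:
    """Filing device plus attribute index that can only return whole documents."""

    def __init__(self) -> None:
        self.filing: Dict[Tuple[str, str], StoredDocument] = {}
        self.index: Dict[Tuple[str, str], Set[Tuple[str, str]]] = {}

    def store(self, doc: StoredDocument) -> None:
        # Attributes are registered in the index when the document is filed.
        key = (doc.location, doc.name)
        self.filing[key] = doc
        for attr, value in doc.attributes.items():
            self.index.setdefault((attr, value), set()).add(key)

    def search(self, attr: str, value: str) -> List[StoredDocument]:
        # 1) The index yields the location and name of every matching document.
        hits = sorted(self.index.get((attr, value), set()))
        # 2) Each hit is taken out of the filing device in its entirety.
        return [self.filing[key] for key in hits]
```

Whatever part of a hit the caller actually needs, `search` always returns the full `body`, which is the transfer and reorganization cost discussed in [0007].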

[0006] However, the conventional document database device described above has the problem that it can only retrieve whole documents.

[0007] Consequently, a user or application that needs only a certain part of a document must first copy the whole document from the remote document filing device to a local area (for example, a storage device), analyze the entire document, and then extract only the needed parts (document contents) based on that analysis. The amount of data transferred is therefore far larger than necessary, which degrades transfer efficiency, and because the whole document is obtained it must then be reorganized. This places a very heavy burden on the user.

[0008] To overcome these drawbacks, some systems use a facility that can handle two-dimensional tables, typified by a relational database, or insert tags into documents so that document contents can be extracted. Such systems treat a document as a two-dimensional table and provide partial extraction either by storing that table in a relational database or by embedding tags in the document. The table is represented as a sequence of key-value pairs of the form (key: value), for example (title: "document database device").

[0009] In recent years, structured documents that have a logical structure have come into use in place of the unstructured documents described above. If a document conforms to, for example, the international standard ODA (Open Document Architecture; ISO 8613), the document shown in FIG. 13 has a nested layout structure in which blocks are placed inside frames, as shown in FIG. 14, and has the internal representation (document structure) shown in FIG. 15. In the document structure of FIG. 15, the part above the dotted line indicated by arrow A is the logical structure of the document, the part below the dotted line indicated by arrow B is its layout structure, and the part between the two dotted lines is the content (content portion) of the document. The layout structure of the document is generated from its logical structure using the layout template shown in FIG. 16 (called the generic layout structure in ODA). This process is called allocation (layout).

[0010] As shown in FIG. 15, the internal structure is composed of finer-grained document parts, which form a tree of parent-child relationships (parent document part, child document part, grandchild document part, and so on). For example, each constituent of the logical structure and each constituent of the layout structure shown in FIG. 15 is a document part.

[0011] In practice, the document structure of FIG. 15 is represented by the logical structure shown in FIG. 17 and the layout structure shown in FIG. 18, each of which holds the document content (in practice, refers to it by a pointer or similar).
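The tree of document parts in [0010] and the pointer-based sharing of content in [0011] can be sketched as follows. This is a minimal illustration under assumed names (`DocumentPart`, `ContentPortion`, the `kind` labels); it is not the ODA data model itself, only the parent-child tree and the idea that the logical and layout structures reference the same content rather than copying it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ContentPortion:
    text: str

@dataclass
class DocumentPart:
    kind: str                                   # e.g. "title", "page", "frame", "block"
    content: Optional[ContentPortion] = None    # reference (pointer) to shared content
    children: List["DocumentPart"] = field(default_factory=list)
    # Back-link to the parent part; excluded from repr/eq to avoid cycles.
    parent: Optional["DocumentPart"] = field(default=None, repr=False, compare=False)

    def add(self, child: "DocumentPart") -> "DocumentPart":
        child.parent = self
        self.children.append(child)
        return child

# One content portion, referenced from both structures (cf. FIGS. 17 and 18).
title_text = ContentPortion("Document database device")

logical_root = DocumentPart("document-logical-root")
logical_title = logical_root.add(DocumentPart("title", title_text))

layout_root = DocumentPart("document-layout-root")
page = layout_root.add(DocumentPart("page"))
frame = page.add(DocumentPart("frame"))
block = frame.add(DocumentPart("block", title_text))
```

Because both `logical_title` and `block` hold the same `ContentPortion` object, the content is stored once and the two structures differ only in how they organize it.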

[0012] However, because conventional document database devices can handle documents only as whole units, they cannot handle the document structure of structured documents such as documents conforming to ODA.

[0013] As a result, the following problems arose.

[0014] (1) Because a document was treated as a two-dimensional table without hierarchy, its hierarchical logical structure could not be handled. Suppose, for example, that the logical structure of a document consists of a title, an author name, and paragraphs, and that each paragraph consists of a paragraph title and paragraph contents. To retrieve the paragraph titles of such a document, a conventional document database device had to retrieve either the whole document or the whole paragraphs. Editing the document after retrieval was therefore time-consuming, and the amount of data transferred was large, making transfer inefficient.
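The example in [0014] can be made concrete with a small sketch. It assumes a hypothetical document held as nested Python dicts (title, author, paragraphs made of a paragraph title and contents) and uses the JSON-serialized size as a rough stand-in for transfer volume; neither the encoding nor the size measure comes from the patent.

```python
import json

# Hypothetical document following the example in [0014].
document = {
    "title": "Example title",
    "author": "Example author",
    "paragraphs": [
        {"paragraph_title": "Introduction", "contents": "Lorem ipsum " * 50},
        {"paragraph_title": "Method", "contents": "Dolor sit amet " * 50},
    ],
}

def whole_document(doc):
    """Conventional behaviour: the entire document is transferred."""
    return doc

def paragraph_titles(doc):
    """Structure-aware behaviour: only the paragraph titles are extracted."""
    return [p["paragraph_title"] for p in doc["paragraphs"]]

def transfer_size(payload):
    # Serialized size in bytes, a rough proxy for the amount of data sent.
    return len(json.dumps(payload).encode("utf-8"))

print(paragraph_titles(document))  # ['Introduction', 'Method']
print(transfer_size(paragraph_titles(document)) < transfer_size(whole_document(document)))  # True
```

With the hierarchy available, the request names exactly the components it needs, so the payload no longer carries the paragraph contents.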

[0015] (2) When a single document is to be obtained as the result of searching multiple documents, a conventional document database device can only express the result as a "table", for example by extracting just the authors from several documents and combining them into one document; it cannot produce a document with a hierarchical structure that a table cannot express. The user therefore has to retrieve the whole documents and then edit them into a single document, which is laborious and places a heavy burden on the user. A further problem is that a large amount of document data must be transferred, so transfer efficiency is poor.

[0016]

Problem to Be Solved by the Invention: As described above, conventional document database devices cannot handle the document structure of structured documents and can handle documents only as whole units, so the user has to retrieve a whole document and then edit it to create the desired document. Editing and laying out (allocating) the content of the retrieved document therefore takes time and effort, placing a heavy burden on the user.

[0017] In addition, because unnecessary information in the document is transmitted along with the needed parts, the amount of transmitted data is large and transmission efficiency is poor.

[0018] Accordingly, an object of the present invention is to provide a document database device that can extract part of the content of a structured document stored in a document database, can allocate (lay out) the extracted content, and can reduce the amount of transmitted data, the transmission time, the document editing time, and the document allocation time.

[0019]

Means for Solving the Problem: To achieve the above object, the document database device of the first invention comprises designation means for designating information relating to the extraction of the document structure of a structured document, and document structure extracting means for extracting, based on the information designated by the designation means, the document structure of the structured document to be searched.

[0020] The document database device of the second invention comprises: document storage means for storing a plurality of structured documents; designation means for designating information that specifies a document set and information relating to document structure extraction; document set search means for retrieving, from the document storage means, the document set that matches the information specifying the document set; and document structure extracting means for extracting, based on the information relating to document structure extraction, the document structure of each element of the document set retrieved by the document set search means.
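The division of labour in the second invention (store, select a document set, then extract the same structure from every element) can be sketched as a small pipeline. This is a simplified illustration under assumed names; the attribute condition is a plain Python predicate and extraction keeps only named top-level components, whereas the patent's extraction expression is richer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

@dataclass
class StructuredDocument:
    attributes: Dict[str, str]
    root: Node

def search_document_set(storage: List[StructuredDocument],
                        condition: Callable[[Dict[str, str]], bool]) -> List[StructuredDocument]:
    """Document set search: keep stored documents whose attributes satisfy the condition."""
    return [doc for doc in storage if condition(doc.attributes)]

def extract_structure(root: Node, wanted: List[str]) -> Node:
    """Document structure extraction: keep only the named top-level components."""
    return Node(root.name, root.text, [c for c in root.children if c.name in wanted])

def retrieve(storage: List[StructuredDocument],
             condition: Callable[[Dict[str, str]], bool],
             wanted: List[str]) -> List[Node]:
    # The same extraction is applied to every element of the retrieved set.
    return [extract_structure(doc.root, wanted)
            for doc in search_document_set(storage, condition)]
```

For instance, `retrieve(storage, lambda a: a.get("type") == "patent", ["title"])` would return one title-only tree per matching patent.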

[0021] The document database device of the third invention is characterized in that, in the first or second invention, it further comprises document allocating means for allocating the document structure extracted by the document structure extracting means.

[0022] The document database device of the fourth invention further comprises, in the first or second invention, document structure display means for displaying the document structure extracted by the document structure extracting means in a predetermined display format.

[0023] The document database device of the fifth invention further comprises, in the first or second invention, a plurality of document structure display means that display document structures in mutually different display formats, and display selection means for selecting one or more of the document structure display means.

[0024] The document database device of the sixth invention is characterized in that, in the third invention, the extracted document structure is allocated according to the document allocation template corresponding to the structured document to which the document structure extracted by the document structure extracting means belonged.

[0025] The document database device of the seventh invention further comprises, in the third invention, template storage means for storing a plurality of document allocation templates, and template designation means for designating a desired template among them. The document allocating means allocates the extracted document structure based on the document allocation template designated by the template designation means.

[0026] The document database device of the eighth invention further comprises, in the sixth invention, template editing means for editing the document allocation templates stored in the template storage means.

[0027]

Operation: According to the first invention, a document structure is extracted from the structured document to be searched based on the information relating to document structure extraction, so that only part of the structured document can be extracted.

[0028] According to the second invention, the document set matching the information that specifies a document set is retrieved from among the plurality of structured documents, and a document structure is then extracted from each element of that set (each retrieved structured document) based on the information relating to document structure extraction. Only the parts of the content that have the same document structure can thus be extracted from each of several structured documents.

[0029] According to the third invention, the document allocating means allocates the content of the document structure extracted by the document structure extracting means, so an allocated (laid-out) document can be obtained.

[0030] According to the fourth invention, the document structure display means displays the content of the document structure extracted by the document structure extracting means, so the extracted content can be checked visually.

[0031] According to the fifth invention, the display selection means selects one or more of the plurality of document structure display means, and the selected display means displays the content of the document structure extracted by the document structure extracting means, so the extracted content can be checked visually in the desired display format.

[0032] According to the sixth invention, the document allocating means allocates the document structure extracted by the document structure extracting means according to a preset document allocation template corresponding to the structured document to which that structure belonged, so an allocated document can be obtained.

[0033] According to the seventh invention, the document allocating means allocates the document structure extracted by the document structure extracting means according to the document allocation template designated by the template designation means, so an allocated document can be obtained.

[0034] According to the eighth invention, the document template editing means creates, deletes, and modifies the document allocation templates stored in the template storage means. Because the document allocating means allocates the document structure extracted by the document structure extracting means according to the template edited by the document template editing means, an allocated document matching the desired layout can be obtained.
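The template handling of the sixth to eighth inventions ([0032] to [0034]) can be sketched as a small store with create, change, and delete operations plus a selection rule. The template contents (`columns`, `lines_per_page`) and all names are hypothetical placeholders; a real allocation template would describe pages, frames, and blocks.

```python
from typing import Dict, Optional

class TemplateStore:
    """Template storage means holding named document allocation templates."""

    def __init__(self) -> None:
        self._templates: Dict[str, Dict[str, int]] = {}

    # Template editing means: create / change / delete ([0034]).
    def create(self, name: str, template: Dict[str, int]) -> None:
        self._templates[name] = dict(template)

    def change(self, name: str, **settings: int) -> None:
        self._templates[name].update(settings)

    def delete(self, name: str) -> None:
        del self._templates[name]

    def get(self, name: str) -> Dict[str, int]:
        return self._templates[name]

def select_template(store: TemplateStore, source_template: str,
                    designated: Optional[str] = None) -> Dict[str, int]:
    """Pick the template handed to the document allocating means.

    Without a designation, the template of the document the extracted
    structure came from is used (sixth invention); a template designated
    by the user takes precedence (seventh invention).
    """
    return store.get(designated if designated is not None else source_template)
```

Keeping the selection rule separate from the store means the editing operations never need to know which invention's selection policy is in force.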

[0035]

Embodiments: Embodiments of the present invention are described below with reference to the accompanying drawings.

[0036] First, a first embodiment of the present invention is described with reference to FIGS. 1 to 11.

[0037] FIG. 1 is a functional block diagram of a first embodiment of the document database device according to the present invention. As shown in the figure, the document database device comprises a structured document search device 1, document storage means 2, an input/output control device 3, a keyboard 4, a mouse 5, and a display device 6.

[0038] The document storage means 2 is a large-capacity filing system built around, for example, a magnetic disk, and stores a plurality of structured documents having an internal structure such as the one shown in FIG. 14.

[0039] The input/output control device 3 controls input and output between the structured document search device 1 and the keyboard 4, the mouse 5, and the display device 6. For example, a document created on the display screen of the display device 6 is stored in the document storage means 2 through the input/output control device 3, and a structured document read from the document storage means 2 by the structured document search device 1 is sent through the input/output control device 3 to the display device 6 and shown on its screen.

[0040] The keyboard 4 and the mouse 5 are operated to enter various data, commands, and the like, and output corresponding to that input is shown on the display screen of the display device 6.

[0041] The search expression analysis means 11 analyzes a search expression (described in detail later). It splits the given search expression into a document structure extraction expression and a document set search expression, and instructs the document search means 12 to execute the search.

[0042] The document search means 12 executes the search. In response to the instruction to search, it sends the appropriate instructions and data to the document set search means 13, the document structure extracting means 14, and the document allocating means 15.

[0043] The document set search means 13 searches the documents in the document storage means 2. Using attributes and keywords assigned to the documents in advance, it finds the set of documents that satisfy the given condition, namely the document set search expression contained in the search expression.

[0044] The document structure extracting means 14 retrieves document parts within a structured document. It extracts the designated part, that is, the part specified by the document structure extraction expression in the search expression, and then changes the structure if necessary. It also reads the document allocation template corresponding to the structured document.

[0045] The document allocating means 15 performs allocation according to a document allocation template. Extracting part of a document logical structure destroys the original allocation, so the document allocating means 15 gives the resulting structured document, that is, the extracted document logical structure, a new layout structure according to the document allocation template.
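The re-allocation step in [0045] can be pictured with a deliberately simplified sketch: the extracted logical structure is reduced to an ordered list of leaf texts, and a hypothetical template that only states how many blocks fit on a page is used to build a fresh page structure. A real ODA layout process is far more involved; this only shows that the new layout is derived from the template rather than recovered from the original document.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Template:
    # Hypothetical, much-simplified stand-in for a document allocation template.
    blocks_per_page: int

def allocate(logical_leaves: List[str], template: Template) -> List[List[str]]:
    """Give an extracted logical structure a new layout structure.

    Each logical leaf becomes one block; blocks are placed on pages in
    document order, template.blocks_per_page at a time.
    """
    if template.blocks_per_page < 1:
        raise ValueError("blocks_per_page must be positive")
    pages: List[List[str]] = []
    for i in range(0, len(logical_leaves), template.blocks_per_page):
        pages.append(logical_leaves[i:i + template.blocks_per_page])
    return pages

extracted = ["Title", "Author", "Paragraph title 1", "Paragraph title 2", "Paragraph title 3"]
print(allocate(extracted, Template(blocks_per_page=2)))
# [['Title', 'Author'], ['Paragraph title 1', 'Paragraph title 2'], ['Paragraph title 3']]
```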

[0046] The document structure display means 16 displays a structured document that has no layout structure, that is, a document logical structure. It shows the extracted document logical structure on the display device 6.

[0047] In this embodiment both the document allocating means 15 and the document structure display means 16 are provided, but they never operate at the same time: only one of them is set to operate, and the user can designate and switch between them. When information designated by the user reaches the document search means 12 through the input/output control device 3 and the search expression analysis means 11, the document search means 12 switches between the two means based on that information. If the document allocating means 15 is designated, the extracted document logical structure is allocated and then displayed; if the document structure display means 16 is designated, the structured document without a layout structure, that is, the document logical structure, is displayed.
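The switching described in [0047] amounts to dispatching on a user designation so that exactly one output means runs. A minimal sketch, assuming the extracted structure is a flat list of leaf texts and using two hypothetical stand-in functions for the document allocating means 15 and the document structure display means 16:

```python
from typing import Callable, Dict, List

def allocate_and_render(leaves: List[str]) -> str:
    # Stand-in for document allocating means 15: two blocks per "page".
    pages = [leaves[i:i + 2] for i in range(0, len(leaves), 2)]
    return "\n--- page break ---\n".join("\n".join(p) for p in pages)

def show_logical_structure(leaves: List[str]) -> str:
    # Stand-in for document structure display means 16: an indented outline, no layout.
    return "document\n" + "\n".join("  " + leaf for leaf in leaves)

OUTPUT_MEANS: Dict[str, Callable[[List[str]], str]] = {
    "allocate": allocate_and_render,
    "structure": show_logical_structure,
}

def present(leaves: List[str], mode: str) -> str:
    """Run only the single means named by the user's designation."""
    return OUTPUT_MEANS[mode](leaves)
```

A table of callables also makes the fifth invention's "several display formats" straightforward: adding a format is one more dictionary entry.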

[0048] Of course, instead of this switching arrangement, the configuration described above may omit either the document allocating means 15 or the document structure display means 16. Also, although a plurality of document structure display means are provided, only one may be provided.

[0049] In the configuration described above, a search expression (search condition) must be set in order to obtain the desired search result from the structured documents in the document storage means 2. The search expression is described next.

[0050] FIG. 2 shows an example of a search expression. The search expression serves as the designation means described above and consists of information specifying a document set (a document set search expression) and information relating to document structure extraction (a document structure extraction expression).

[0051] In the search expression shown in FIG. 2, "From" and "Project" occur in pairs: "From" 23A with "Project" 23B, "From" 24A with "Project" 24B, and "From" 25A with "Project" 25B. Each "From" designates the range to be searched, each "Project" designates the part to be extracted from that range, and "Collapse" 26 specifies that the hierarchy be made one level shallower. The value "*" 27 given as the search range of "From" 24A means "anything".

[0052] "Project" and "From" designate their ranges by attribute names. These attribute names are assigned to the document parts of the document logical structure when the structured document is created, so that each document part can be identified. The attribute value need not be the name of the document part; it may also be, for example, the creation date or the creator's name.

[0053] The outermost "From" 23A designates the document set. It can hold a document set search expression, which is passed to the document set search means 13. The document set search expression may specify document attribute values, such as the creation date or the document type, for example "patent" or "paper on object-oriented languages". It may also reach into the structure of the structured documents by specifying a component of that structure, in which case every structured document having the specified component becomes a search target. For example, specifying "document database device" as the component "title" makes every structured document whose "title" component is "document database device" a search target, so that documents in the same technical field can be retrieved.
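The two kinds of document set condition in [0053], one on document attributes and one on a component inside the logical structure, can be sketched as a single predicate. The tuple form of the conditions and all names are assumptions for illustration; the patent does not define a concrete syntax here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

@dataclass
class Node:
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

@dataclass
class StructuredDocument:
    attributes: Dict[str, str]
    root: Node

def component_values(node: Node, name: str) -> Iterator[str]:
    """Yield the text of every component called `name` anywhere in the tree."""
    if node.name == name:
        yield node.text
    for child in node.children:
        yield from component_values(child, name)

def matches(doc: StructuredDocument,
            attribute: Optional[Tuple[str, str]] = None,
            component: Optional[Tuple[str, str]] = None) -> bool:
    # attribute, e.g. ("type", "patent"): checked against the document's attributes.
    if attribute and doc.attributes.get(attribute[0]) != attribute[1]:
        return False
    # component, e.g. ("title", "document database device"): checked in the logical structure.
    if component and component[1] not in component_values(doc.root, component[0]):
        return False
    return True
```

A document passes only if it satisfies every condition that was supplied; with no conditions, every stored document is in the set.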

[0054] In the search expression of FIG. 2, the outermost "From" 23A designates as the document set all documents in the directory "/データベース研究関連/論文" ("database-research-related/papers") of the document storage means 2. For each element (each document) of this set, "Project" 23B directs extraction of the title, the author name, and the paragraphs. The inner "From" 25A then designates the paragraphs, and the inner "Project" 25B directs that the paragraph titles be further extracted from them. Finally, "Collapse" 26 directs that the paragraph titles, which sit directly under the paragraphs, be moved directly under the document logical root.
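The effect of the From/Project/Collapse combination in [0054] can be sketched with a small evaluator. The encoding below is a hypothetical reconstruction: a project list holds component names (or "*" for anything) and nested `From` items, and `collapse=True` lifts the projected children into the parent's place. The pair "From" 24A / "Project" 24B is left out because its contents are not described in the text.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

@dataclass
class From:
    """Search range `name` ("*" = anything); `project` lists what to keep from it.

    collapse=True lifts the projected children one level up, replacing the
    range component itself.
    """
    name: str
    project: List[Union[str, "From"]]
    collapse: bool = False

def apply_project(node: Node, project: List[Union[str, From]]) -> List[Node]:
    kept: List[Node] = []
    for child in node.children:
        for item in project:
            if isinstance(item, str) and item in ("*", child.name):
                kept.append(child)                       # keep the whole component
                break
            if isinstance(item, From) and item.name in ("*", child.name):
                inner = apply_project(child, item.project)
                kept.extend(inner if item.collapse else [Node(child.name, child.text, inner)])
                break
    return kept

def extract(root: Node, project: List[Union[str, From]]) -> Node:
    return Node(root.name, root.text, apply_project(root, project))

# Outer Project: title, author name and paragraphs; the inner From/Project keeps
# only the paragraph titles, and Collapse moves them directly under the root.
expression = ["title", "author", From("paragraph", ["paragraph_title"], collapse=True)]
```

Applied to a document with two paragraphs, `extract(doc, expression)` yields a root whose children are the title, the author, and the two paragraph titles, with the paragraph level removed.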

[0055] The search expression shown in FIG. 2 may be replaced by a search expression of any other form, provided it can designate a document set and specify what to extract from the documents.

[0056] A search expression such as the one above is entered by operating the keyboard 4 and the mouse 5 and is shown on the display screen of the display device 6. Once the search expression (search condition) has been set on the display screen in this way, it is passed to the search expression analysis means 11 through the input/output control device 3.

[0057] When the retrieval expression analysis means 11 receives a retrieval expression such as the one shown in FIG. 2, it analyzes the expression. It then passes two parts to the document retrieval means 12: a document set retrieval expression (document set retrieval expression 21 in FIG. 2) and a document structure extraction expression (document structure extraction expression 22 in FIG. 2).
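The split into a set-selecting part and an extraction part can be sketched as below. This is a hedged illustration only. The dictionary keys `"from"`, `"where"` and `"project"` are a hypothetical encoding chosen for this sketch and are not the notation of FIG. 2.

```python
from typing import Any, Dict, Tuple


def split_expression(expr: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a combined retrieval expression into two parts.

    Assumed input shape (illustrative only):
        {"from": <directory>, "where": <optional filter>, "project": [...]}
    The outermost "from"/"where" select the document set (the role of
    expression 21); "project" says what to extract from each document
    (the role of expression 22).
    """
    set_part = {key: expr[key] for key in ("from", "where") if key in expr}
    structure_part = {"project": expr.get("project", [])}
    return set_part, structure_part


if __name__ == "__main__":
    combined = {
        "from": "/Database research related/Papers",
        "project": ["title", "author name", {"paragraph": ["paragraph title"]}],
    }
    doc_set, extraction = split_expression(combined)
    print(doc_set)      # {'from': '/Database research related/Papers'}
    print(extraction)
```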

[0058] The document structure extraction expression 22 shown in FIG. 2 can be represented as a tree structure (hereinafter, a parse tree) as shown in FIG. 3. In this embodiment, therefore, the retrieval expression analysis means 11 does not pass the structure-extraction information to the document retrieval means 12 in the form of the document structure extraction expression 22 (see FIG. 2). Instead, it derives the parse tree 30 shown in FIG. 3 and passes that. The document structure extraction expression 22 itself may, of course, be passed instead.
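A parse tree like the one in FIG. 3 can be modeled with a node type that carries a label, an optional structure-changing tag, and children. The sketch below is a minimal Python version. The `ParseNode` class and the tuple-based spec format are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class ParseNode:
    label: str
    tag: Optional[str] = None                     # e.g. "Collapse"
    children: List["ParseNode"] = field(default_factory=list)


# A spec entry is either a bare label or (label, tag, [child specs]).
Spec = Union[str, Tuple[str, Optional[str], list]]


def build(spec: Spec) -> ParseNode:
    if isinstance(spec, str):
        return ParseNode(spec)
    label, tag, kids = spec
    return ParseNode(label, tag, [build(k) for k in kids])


def build_parse_tree(specs: List[Spec]) -> ParseNode:
    """Root node whose children are the top-level projections."""
    return ParseNode("root", children=[build(s) for s in specs])


def render(node: ParseNode, depth: int = 0) -> List[str]:
    """Indented text view of the tree, with tags shown in brackets."""
    line = "  " * depth + node.label + (f" [{node.tag}]" if node.tag else "")
    lines = [line]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = build_parse_tree(
        ["title", "author name", ("paragraph", "Collapse", ["paragraph title"])]
    )
    print("\n".join(render(tree)))
```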

[0059] On receiving the document set retrieval expression 21 and the parse tree 30, the document retrieval means 12 passes the document set retrieval expression 21 to the document set retrieval means 13. The document set retrieval means 13 then searches the document storage means 2 for the structured documents that match this expression. Usually there are several, although in some cases only one document matches. The document set retrieval means 13 returns a set of pointers to these structured documents to the document retrieval means 12. FIG. 4 shows an example of such a pointer set.

[0060] Here it is assumed that, as shown in FIG. 4, the document set retrieval means 13 retrieves the structured documents 41A, 42A, 43A, 44A and 45A. It returns the pointers 41, 42, 43, 44 and 45, which point to these documents, to the document retrieval means 12.
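The document set retrieval step can be sketched as follows, assuming the storage is a simple mapping from document id to path and that a "pointer" is just a document id. Both assumptions are made for this illustration. A real filing system would use its own directory index.

```python
from typing import Dict, List


def retrieve_document_set(storage: Dict[str, str], directory: str) -> List[str]:
    """Return 'pointers' to all documents stored under `directory`.

    `storage` maps a document id to its path; the ids stand in for the
    pointers 41-45 of FIG. 4. Both the mapping and the id scheme are assumed.
    """
    prefix = directory.rstrip("/") + "/"
    return sorted(doc_id for doc_id, path in storage.items() if path.startswith(prefix))


if __name__ == "__main__":
    storage = {
        "41A": "/Database research related/Papers/a.doc",
        "42A": "/Database research related/Papers/b.doc",
        "43A": "/Database research related/Papers/2/c.doc",
        "90X": "/Other/d.doc",
    }
    print(retrieve_document_set(storage, "/Database research related/Papers"))
    # ['41A', '42A', '43A']
```

The trailing "/" in the prefix prevents a sibling directory such as ".../Papers2" from matching.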

[0061] The document retrieval means 12 passes the received set of pointers to the document structure extraction means 14, together with the parse tree 30 it has already received from the retrieval expression analysis means 11. The document structure extraction means 14 then processes the structured documents one at a time. For each pointer, it reads the information needed to set pointers into the file of the structured document that pointer points to. It performs document structure extraction on that structured document based on this information. It then transfers the extracted document logical structure to the document retrieval means 12.

[0062] FIG. 5 illustrates how the information for setting pointers into the file of a structured document stored in the document storage means 2 is read.

[0063] The reading of this information is described for the case where the structured document 41A, pointed to by the pointer 41 in FIG. 4, is the structured document shown in FIG. 14.

[0064] In this embodiment, the document storage means 2 stores in advance an index file 52. This file holds an index for referring to each constituent element of the document logical structure 51 of the structured document 41A. The document structure extraction means 14 therefore obtains the information for setting pointers by reading in the index file 52 that corresponds to the structured document 41A pointed to by the pointer 41. At this stage the file of the structured document need not be opened. It must be opened only when the pointers are actually set. The information for setting pointers into the files of the structured documents pointed to by the pointers 42 to 45 is obtained in the same way.
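The index-file approach can be sketched as a label-to-byte-span map that is read without touching the document file. The document file is read only when a span is followed. The JSON layout of the index and the (offset, length) span format are assumptions for this sketch. The patent does not specify the file format.

```python
import io
import json
from typing import BinaryIO, Dict, List, TextIO, Tuple

Span = Tuple[int, int]          # (byte offset, length) inside the document file


def load_index(index_file: TextIO) -> Dict[str, List[Span]]:
    """Read an index file; assumed format is JSON: label -> [[offset, length], ...]."""
    raw = json.load(index_file)
    return {label: [(int(o), int(n)) for o, n in spans] for label, spans in raw.items()}


def read_component(doc_file: BinaryIO, span: Span) -> bytes:
    """Follow one pointer; only at this point is the document file read."""
    offset, length = span
    doc_file.seek(offset)
    return doc_file.read(length)


if __name__ == "__main__":
    index = load_index(io.StringIO('{"title": [[0, 18]], "author name": [[18, 7]]}'))
    document = io.BytesIO(b"Document databasesOkumura")
    print(read_component(document, index["title"][0]))         # b'Document databases'
    print(read_component(document, index["author name"][0]))   # b'Okumura'
```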

[0065] As an alternative to the method described above, the index file 52 may be omitted. In that case, the file of the structured document 41A is opened and its document logical structure 51 is read. The document content (content portion) 53 is not read at this time, because the content of a document is generally far larger than its logical structure.

[0066] Next, the document structure extraction processing performed by the document structure extraction means 14 is described.

[0067] (1) First, pointers (links) are set from the parse tree 30 (see FIG. 3) into the corresponding structured document (the structured document 41A in this example), according to the labels of the parse tree. This step uses the information read earlier for setting pointers into the file of the structured document (the index file 52 in this example). FIG. 6 shows the state after the pointers are set. As FIG. 6 shows, pointers P1 to P4 are set to the constituent elements (document parts) of the document logical structure 51 whose labels match the "title", "author name", "paragraph" and "paragraph title" nodes of the parse tree 30. In other words, the document structure extraction means 14 now holds pointers P1 to P4, each of which associates an element of the parse tree 30 with the corresponding constituent element of the logical structure 51. In the state shown in FIG. 6, the parse tree 30 resides in the document structure extraction means 14. The document logical structure 51 and the document content portion 53 remain in the document storage means 2. The layout structure of the structured document 41A is omitted here.
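Step (1) can be sketched as a tree walk that records, for every parse-tree label, the paths of the matching components. The recorded paths play the role of pointers P1 to P4. The dictionary node shape and the use of index paths as "pointers" are assumptions for this illustration.

```python
from typing import Dict, List, Tuple

Path = Tuple[int, ...]   # child indices from the root of the logical structure


def link_labels(structure: dict, labels: List[str]) -> Dict[str, List[Path]]:
    """Map each parse-tree label to the paths of the logical-structure
    components that carry the same label (the role of pointers P1-P4).

    `structure` nodes are assumed to look like {"label": str, "children": [...]}.
    """
    links: Dict[str, List[Path]] = {label: [] for label in labels}

    def walk(node: dict, path: Path) -> None:
        if node["label"] in links:
            links[node["label"]].append(path)
        for i, child in enumerate(node.get("children", [])):
            walk(child, path + (i,))

    walk(structure, ())
    return links
```

A label can match several components (for example, one "paragraph" per section), so each label maps to a list of paths rather than a single pointer.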

[0068] (2) Next, the document logical structure is copied based on this link information. FIG. 7 shows an example of the copied document logical structure. As shown in FIG. 7, the document structure extraction means 14 reads the copied document logical structure 71 and the document content portion 72 into itself. Because only the document logical structure 71 and the document content portion 72 are read in this embodiment, no layout structure corresponding to the document logical structure 71 is obtained (no layout structure exists).
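Step (2) can be sketched as a projection. It copies only the components selected by a nested extraction spec and keeps the component contents. The spec format (label to nested spec, with an empty dict meaning "copy the whole component") and the node shape are assumptions made for this sketch. Returning `None` when nothing matches corresponds to the "none" result described for documents that do not fit the extraction expression.

```python
import copy
from typing import Dict, Optional

# Extraction spec: label -> nested spec; an empty dict means "copy the whole component".
Spec = Dict[str, dict]


def project(node: dict, spec: Spec) -> Optional[dict]:
    """Return a copy of `node` keeping only the children selected by `spec`.

    Returns None when nothing matches. Node shape {"label", "children", "text"}
    is assumed; intermediate nodes are rebuilt with only "label" and "children".
    """
    kept = []
    for child in node.get("children", []):
        if child["label"] not in spec:
            continue
        sub_spec = spec[child["label"]]
        if not sub_spec:
            kept.append(copy.deepcopy(child))   # copy the component and its content
        else:
            inner = project(child, sub_spec)
            if inner is not None:
                kept.append(inner)
    if not kept:
        return None
    return {"label": node["label"], "children": kept}
```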

[0069] (3) Further, if any node of the parse tree carries a tag that changes the document structure, the structure is changed according to that tag.

[0070] As shown in FIG. 7, the parse tree 30 contains the node "paragraph" 32 tagged with "Collapse" 31, so the document logical structure 71 shown in FIG. 7 is changed as shown in FIG. 8. As FIG. 8 shows, the "Collapse" 31 tag removes the "paragraph title" level, so that the document content 81 lies directly below the paragraph node 82. The extracted document logical structure shown in FIG. 8 becomes a new structured document.
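One way to read the Collapse operation of FIG. 8 is as a single-level splice. In this reading, every element child of a tagged node is replaced by that child's own children, so the content moves up one level. The sketch below implements that reading in Python. The node shape, with content stored as plain strings in `children`, is an assumption. The patent does not define Collapse more precisely than the figure.

```python
from typing import List, Union

Node = dict  # {"label": str, "children": list of Node or str}; a str is content


def collapse(node: Node, target: str) -> Node:
    """Return a copy in which every node labeled `target` loses one level:
    each of its element children is replaced by that child's own children."""
    children: List[Union[Node, str]] = []
    for child in node.get("children", []):
        children.append(child if isinstance(child, str) else collapse(child, target))
    if node["label"] == target:
        flat: List[Union[Node, str]] = []
        for child in children:
            if isinstance(child, str):
                flat.append(child)
            else:
                flat.extend(child.get("children", []))
        children = flat
    return {"label": node["label"], "children": children}
```

For example, a "paragraph" node whose only child is a "paragraph title" node holding "Background" becomes a "paragraph" node that holds "Background" directly.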

[0071] Steps (1) to (3) complete the document structure extraction processing for one document. A document logical structure is not necessarily extracted from every structured document. It is extracted only from structured documents whose logical structure conforms to the document structure extraction expression (parse tree). The extracted document logical structure may therefore be "none".

[0072] When the document retrieval means 12 finishes retrieving the structured documents and the transferred set of document logical structures contains at least one document, it creates a list of these documents. It sends the list to the display device 6 via the input/output control device 3. The display screen of the display device 6 then shows a list of the retrieved document set (the document logical structures, each treated as a structured document). FIG. 9 shows an example of this display, in which a list of five documents appears in the search result window 91.

[0073] In this display state, the user can operate the keyboard 4 or the mouse 5 to select a desired document logical structure from the displayed list of document logical structures (structured documents) and then request that it be read out. This instruction information is transmitted to the input/output control device 3. In response, the document retrieval means 12 selects the corresponding document logical structure from the set of document logical structures transferred from the document structure extraction means 14.

[0074] If the document layout (allocating) means 15 has been designated to operate, the document retrieval means 12 sends the selected document logical structure to it. The document layout means 15 gives the received document logical structure a layout structure. It uses the document layout template that corresponds to the structured document to which this logical structure (the extracted document structure) originally belonged. The layout result is displayed on the display device 6 via the input/output control device 3. FIG. 10 shows an example of this display. Because "What is a document database" was selected from the list, the document window 101 shows the contents of the document structure obtained by giving the document logical structure of FIG. 8 a layout structure.
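The layout step picks the template belonging to the extract's source document and assigns a style to each piece of content. This can be sketched as follows. The template format (component label to style name), the `"source"` key on the extract, and the default style are assumptions for this illustration. They are not the template format of FIG. 16.

```python
from typing import Dict, List, Tuple

Template = Dict[str, str]   # component label -> style name (assumed format)


def lay_out(structure: dict, templates: Dict[str, Template],
            default_style: str = "body") -> List[Tuple[str, str]]:
    """Assign a style to every content string, using the template of the
    structured document the extract came from (structure["source"]).
    A label missing from the template inherits its parent's style."""
    template = templates[structure["source"]]
    out: List[Tuple[str, str]] = []

    def walk(node: dict, style: str) -> None:
        style = template.get(node["label"], style)
        for child in node.get("children", []):
            if isinstance(child, str):
                out.append((style, child))
            else:
                walk(child, style)

    walk(structure, default_style)
    return out
```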

[0075] If instead the document structure display means 16 has been designated to operate, the selected document logical structure is sent to the document structure display means 16. It converts the structure into a form that can be shown with a predetermined display method. This displayable structure is shown on the display device 6 via the input/output control device 3. FIG. 11 shows an example. In the display window 111, the search result is presented so that each of the names "title", "author name" and "paragraph" in the document logical structure of FIG. 8 appears with its corresponding document content.

[0076] In this way, the document found by the search is displayed on the display screen of the display device 6.

[0077] As described above, in the first embodiment the user specifies both a retrieval condition and a structure extraction for structured documents. A document set matching the set-designating information is then retrieved from among multiple structured documents. Next, a document logical structure is extracted from each element of that set (each retrieved structured document), based on the information concerning structure extraction. As a result, only those parts of each of several structured documents that share the same document logical structure can be extracted.

[0078] Further, the document layout means lays out the document logical structure extracted by the document structure extraction means. It uses the document layout template corresponding to the structured document to which that logical structure belonged, so a laid-out document can be obtained.

[0079] Further, the document structure display means displays the content of the document logical structure extracted by the document structure extraction means, so the extracted content can be checked visually.

[0080] Next, a second embodiment of the present invention is described with reference to FIG. 12.

[0081] FIG. 12 is a functional block diagram of the second embodiment of the document database device according to the present invention. It differs from the first embodiment's functional block diagram (FIG. 1) in two ways. The document structure display means 16 is removed. The following are added: a template storage means 121, a template designation means 122, a template editing means 123, several document structure display means 124-1, 124-2 and 124-3, and a display selection means 125. In FIG. 12, parts that perform the same functions as the components shown in FIG. 1 carry the same reference numerals.

[0082] The template storage means 121 is a filing system comprising, for example, a magnetic disk. It stores multiple document layout templates used to obtain different layout structures, including, for example, a template such as the one shown in FIG. 16.

[0083] The template designation means 122 designates the document layout template that the document layout means 15 uses when performing layout. It passes the template designated by the user to the document layout means 15. To do this, it searches the template storage means 121 based on the user's instruction information and obtains information identifying the actual document layout template.

[0084] The template editing means 123 modifies the document layout templates in the template storage means 121. The user gives an instruction identifying the template to be edited and an instruction to delete or change it. Based on these, the template editing means 123 deletes or changes that template in the template storage means 121. In addition, when the user instructs it to create a template, the template editing means 123 creates a new document layout template and stores it in the template storage means 121.
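The three editing operations (create, change, delete) can be sketched over an in-memory dictionary standing in for the template storage means 121. The class name, the method names, and the template format are hypothetical.

```python
from typing import Dict

Template = Dict[str, str]   # component label -> style name (assumed format)


class TemplateEditor:
    """Create, change and delete layout templates in an in-memory store."""

    def __init__(self, store: Dict[str, Template]):
        self.store = store

    def create(self, name: str, template: Template) -> None:
        if name in self.store:
            raise KeyError(f"template already exists: {name}")
        self.store[name] = dict(template)

    def change(self, name: str, updates: Template) -> None:
        if name not in self.store:
            raise KeyError(f"no such template: {name}")
        self.store[name].update(updates)

    def delete(self, name: str) -> None:
        if name not in self.store:
            raise KeyError(f"no such template: {name}")
        del self.store[name]
```

Refusing to overwrite on `create` keeps the creation and change operations distinct, which mirrors the separate instructions described in the text.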

[0085] The document structure display means 124-1, 124-2 and 124-3 display structured documents that have no layout structure, that is, document logical structures. They show an extracted document logical structure on the display device 6, each in a different display format.

[0086] The display selection means 125 knows in advance which document structure display means corresponds to which display format. It selects the desired document structure display means according to the user's input indicating a display format. In this embodiment, if the input indicates several display formats, the display selection means 125 selects the document structure display means for each of them. The document logical structure can therefore be displayed in one or more display formats.
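The display selection can be sketched as a table from format name to renderer. One renderer is returned per requested format, and a preset default is used when none is given. Only "table" and "tree" renderers are shown. The hypertext format is omitted. The format names, the node shape, and the default are assumptions for this illustration.

```python
from typing import Callable, Dict, List

Node = dict  # {"label": str, "children": list of Node or str}; a str is content


def as_table(node: Node) -> str:
    """'name: content' rows for each top-level component (cf. FIG. 11)."""
    rows = []
    for child in node["children"]:
        text = " ".join(c for c in child["children"] if isinstance(c, str))
        rows.append(f"{child['label']}: {text}")
    return "\n".join(rows)


def as_tree(node: Node, depth: int = 0) -> str:
    """Indented outline of labels, with content shown quoted."""
    lines = ["  " * depth + node["label"]]
    for child in node.get("children", []):
        if isinstance(child, str):
            lines.append("  " * (depth + 1) + repr(child))
        else:
            lines.append(as_tree(child, depth + 1))
    return "\n".join(lines)


DISPLAYS: Dict[str, Callable[[Node], str]] = {"table": as_table, "tree": as_tree}


def select_displays(formats: List[str], default: str = "table") -> List[Callable[[Node], str]]:
    """Pick one renderer per requested format; use the preset default if none."""
    return [DISPLAYS[f] for f in (formats or [default])]
```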

[0087] The document retrieval processing of the document database device configured as above is now described. In the second embodiment, retrieving documents that match the retrieval condition (retrieval expression) works basically as in the first embodiment, so that description is omitted here.

[0088] The second embodiment differs from the first in three main respects:
(1) A document logical structure is laid out according to the rules of a desired document layout template chosen from among multiple templates.
(2) A document logical structure is displayed by a desired document structure display means chosen from among multiple display means.
(3) New document layout templates can be added, and existing ones can be deleted or changed.

[0089] These processing operations are described next.

[0090] To display the result of laying out an extracted document logical structure, the user operates the keyboard 4 or the mouse 5 to set at least the following: a "retrieval expression", "layout instruction information" requesting layout of the document logical structure, and "instruction information identifying the desired document layout template". If no document layout template is designated, a preset document layout template is applied.

[0091] Of this user input, the instruction information identifying the desired document layout template goes to the template designation means 122 via the input/output control device 3. Based on this information, the template designation means 122 searches the template storage means 121 for the template actually to be applied. It then notifies the document layout means 15 of information indicating that template, for example identification information for the template or the address at which it is stored.
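Template designation, including the fall-back to a preset template described in [0090], can be sketched as a lookup that returns a template identifier. The `TemplateStore` class, the use of names as identifiers, and raising `KeyError` for an unknown name are assumptions made for this sketch.

```python
from typing import Dict, Optional


class TemplateStore:
    """Stand-in for the template storage means 121: name -> template (assumed)."""

    def __init__(self, templates: Dict[str, dict], default: str):
        if default not in templates:
            raise ValueError("the default template must be stored")
        self._templates = dict(templates)
        self.default = default

    def lookup(self, name: str) -> Optional[dict]:
        return self._templates.get(name)


def designate_template(store: TemplateStore, requested: Optional[str]) -> str:
    """Return the id of the template to apply, falling back to the preset
    default when the user names none. Unknown names raise KeyError."""
    if requested is None:
        return store.default
    if store.lookup(requested) is None:
        raise KeyError(f"no such layout template: {requested}")
    return requested
```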

[0092] The retrieval expression and the layout instruction information, by contrast, go to the retrieval expression analysis means 11 via the input/output control device 3. The retrieval expression analysis means 11 processes the retrieval expression as in the first embodiment and notifies the document retrieval means 12 of the result. It passes the layout instruction information to the document retrieval means 12 unchanged. Because it has received layout instruction information, the document retrieval means 12 knows that it must pass the search result matching the retrieval expression to the document layout means 15. The retrieval processing that produces this result is the same as in the first embodiment.

[0093] Suppose, for example, that a document layout template such as the one shown in FIG. 16 is stored in the template storage means 121 and has been designated. Suppose also that the retrieval expression shown in FIG. 2 is input, as in the first embodiment. The document structure extraction means 14 then extracts the document logical structure shown in FIG. 8, and the document retrieval means finishes its retrieval processing. The search result window 91 shown in FIG. 9 is displayed, and the user selects "What is a document database?" in that window.

[0094] Under these assumptions, the document logical structure is input to the document retrieval means 12, which passes it to the document layout means 15. The document layout means 15 uses the template information it has already received to read the template to be applied (in this example, the template shown in FIG. 16) from the template storage means 121. It then lays out the input document logical structure according to that template. The layout result goes to the display device 6 via the input/output control device 3, which displays a result like the document window 101 in FIG. 10.

[0095] Next, consider displaying a structured document without a layout structure, that is, a document logical structure as it is. To do this, the user operates the keyboard 4 or the mouse 5 to set at least a "retrieval expression", "display instruction information" requesting display of the document logical structure, and "information indicating the desired display format". If no display format is specified, the structure is shown in a preset display format.

[0096] Of this user input, the information indicating the desired display format goes to the display selection means 125 via the input/output control device 3. Based on it, the display selection means 125 selects the desired document structure display means from among 124-1, 124-2 and 124-3.

[0097] The retrieval expression and the display instruction information, by contrast, go to the retrieval expression analysis means 11 via the input/output control device 3. The retrieval expression analysis means 11 processes the retrieval expression as in the first embodiment and notifies the document retrieval means 12 of the result. It passes the display instruction information to the document retrieval means 12 unchanged. Because it has received display instruction information, the document retrieval means 12 knows that it must pass the search result matching the retrieval expression to the document structure display means. The retrieval processing that produces this result is the same as in the first embodiment.

[0098] Suppose, for example, that the document structure display means are configured as follows: 124-1 displays the document logical structure as a table, 124-2 as a tree structure, and 124-3 as hypertext. Suppose also that the retrieval expression shown in FIG. 2 is input, as in the first embodiment. The document structure extraction means 14 then extracts the document logical structure shown in FIG. 8, and the document retrieval means finishes its retrieval processing. The search result window 91 shown in FIG. 9 is displayed, and the user selects "What is a document database?" in that window.

[0099] Under these assumptions, the document logical structure is input to the document retrieval means 12, which passes it to the document structure display means. If the document structure display means 124-1 has been selected, it converts the document logical structure into a table so that it can be displayed. This table is shown on the display device 6 via the input/output control device 3 as the display window 111 of FIG. 11.

[0100] If the document structure display means 124-2 has been selected, the document logical structure is shown in a display window as a tree structure corresponding to the structure in FIG. 8. If 124-3 has been selected, it is shown in a display window as hypertext corresponding to the structure in FIG. 8. If several document structure display means have been selected, the search result is shown in a separate display window in each selected format.

[0101] Finally, the editing of document layout templates is described.

[0102] Information relating to editing operations, such as an instruction from the user to create a document allocation template, information designating the document allocation template to be edited, and deletion/modification instructions, is input to the template editing means 123 through the input/output control device 3. On the basis of the information designating the template to be edited and the deletion/modification instructions, the template editing means 123 deletes or modifies the corresponding document allocation template in the template storage means 121. In accordance with the creation instruction, it also creates a new document allocation template and stores it in the template storage means 121.
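The create/modify/delete cycle of the template editing means can be sketched in Python as a single dispatch over editing instructions. This is a minimal sketch under stated assumptions: the instruction dictionary format, the `AllocationTemplate` class and its per-label `rules` mapping are hypothetical, and a plain dict stands in for the template storage means 121.

```python
from dataclasses import dataclass, field


@dataclass
class AllocationTemplate:
    """Hypothetical document allocation template: logical label -> layout attributes."""
    name: str
    rules: dict[str, dict] = field(default_factory=dict)


def edit_templates(storage: dict[str, AllocationTemplate], instruction: dict) -> None:
    """Apply one editing instruction (create / modify / delete) to the template store.

    `storage` stands in for template storage means 121; this function plays the
    role of template editing means 123. The instruction format is an assumption.
    """
    op, name = instruction["op"], instruction["template"]
    if op == "create":
        if name in storage:
            raise ValueError(f"template already exists: {name}")
        storage[name] = AllocationTemplate(name, dict(instruction.get("rules", {})))
    elif op == "modify":
        if name not in storage:
            raise KeyError(name)
        storage[name].rules.update(instruction["rules"])  # add or overwrite per-label rules
    elif op == "delete":
        del storage[name]  # raises KeyError if the template does not exist
    else:
        raise ValueError(f"unknown editing operation: {op}")
```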

[0103] As described above, according to the second embodiment, the document allocating means allocates the document logical structure extracted by the document structure extracting means in accordance with the document allocation template designated by the template designating means, so an allocated (laid-out) document can be obtained.
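Allocating an extracted logical structure according to a template can be pictured as mapping each logical element to layout attributes. The Python sketch below is hypothetical: the `Node` model, the per-label template dictionary, the `frame`/`font_pt` attributes and the `DEFAULT_RULE` fallback are assumptions, not the layout model of the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Element of an extracted document logical structure (hypothetical model)."""
    label: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


# Assumed fallback for elements the template does not mention.
DEFAULT_RULE = {"frame": "body", "font_pt": 10}


def allocate(root: Node, template: dict[str, dict]) -> list[dict]:
    """Give every text-bearing element a layout block, in document order.

    The template maps a logical label to layout attributes; unmatched labels
    fall back to DEFAULT_RULE.
    """
    blocks: list[dict] = []

    def walk(node: Node) -> None:
        if node.text:
            rule = {**DEFAULT_RULE, **template.get(node.label, {})}
            blocks.append({"label": node.label, "text": node.text, **rule})
        for child in node.children:
            walk(child)

    walk(root)
    return blocks
```

Because the template rule is merged over the default, a template only needs to list the attributes it changes for a given element type.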

[0104] Further, the template editing means edits existing document allocation templates and creates new ones, and the document allocating means allocates the document logical structure extracted by the document structure extracting means in accordance with the created or edited template, so an allocated document matching the desired layout can be obtained.

[0105] Furthermore, because the contents of the document logical structure extracted by the document structure extracting means are displayed by the one or more document structure display means selected by the display selecting means, the contents of the extracted document logical structure can be checked visually in the desired display format.

[0106]

[Effects of the Invention] As described above, according to the present invention, a document structure is extracted from a structured document to be searched on the basis of information relating to extraction of the document structure, which has the advantage that only part of a structured document can be extracted.

[0107] Further, a document set matching the information that designates the document set is retrieved from a plurality of structured documents, and a document structure is then extracted from each element of that set (each retrieved structured document) on the basis of the information relating to extraction of the document structure. This has the advantage that, from each of several structured documents, only the part having the same document structure can be extracted.
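The two-stage process just described (retrieve the document set, then extract the same structure from every element of it) can be sketched in Python. The set predicate and the slash-separated label path are assumptions standing in for the search expression of FIG. 2, whose actual syntax is not reproduced here; only the order of operations follows the text, and the `Node` model is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Element of a structured document (hypothetical model)."""
    label: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def find_path(node: Node, path: list[str]) -> list[Node]:
    """Return every element reached by following the label path from `node`."""
    if not path or node.label != path[0]:
        return []
    if len(path) == 1:
        return [node]
    return [hit for child in node.children for hit in find_path(child, path[1:])]


def search_and_extract(
    store: list[Node], in_set: Callable[[Node], bool], path: str
) -> list[list[Node]]:
    """Retrieve the document set, then extract the same structure from each element."""
    doc_set = [doc for doc in store if in_set(doc)]     # document set search
    labels = path.split("/")                            # e.g. "report/summary"
    return [find_path(doc, labels) for doc in doc_set]  # per-document extraction
```

Only the matched sub-structures are returned, which is what allows less data to be transmitted than whole documents.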

[0108] Further, because the contents of the extracted document structure are allocated, an allocated document can be obtained.

[0109] Further, because the contents of the extracted document structure are displayed, they can be checked visually.

[0110] Further, because the contents of the extracted document structure are displayed by the one or more selected document structure display means, they can be checked visually in the desired display format.

[0111] Further, because the extracted document structure is allocated in accordance with a preset document allocation template corresponding to the structured document to which that structure belonged, an allocated document can be obtained.

[0112] Further, because the document allocating means allocates the document structure extracted by the document structure extracting means in accordance with the document allocation template designated by the template designating means, an allocated document can be obtained.
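[0111] and [0112] describe two ways of choosing the template: a preset template tied to the source structured document, or one designated by the user. The Python sketch below assumes that a designated template takes precedence when both are available; the text does not say how the two are combined, and the dictionaries and the `source_type` key are hypothetical.

```python
from typing import Optional


def select_template(
    source_type: str,
    presets: dict[str, dict],
    storage: dict[str, dict],
    designated: Optional[str] = None,
) -> dict:
    """Choose the document allocation template for an extracted structure.

    If the user designated a template (cf. template designating means 122),
    look it up in storage; otherwise use the preset template associated with
    the type of structured document the extracted structure came from.
    """
    if designated is not None:
        return storage[designated]
    return presets[source_type]
```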

[0113] Furthermore, because the extracted document structure is allocated in accordance with an edited document allocation template, an allocated document matching the desired layout can be obtained.

[0114] From the above, it is possible to provide a document database device that can extract part of the document contents from structured documents stored in a document database, can allocate (lay out) the extracted contents, and can reduce the amount of transmitted data, the transmission time, the document editing time and the document allocation time.

[Brief description of drawings]

FIG. 1 is a functional block diagram showing a first embodiment of the document database device according to the present invention.

FIG. 2 is a diagram showing a search expression used in this embodiment.

FIG. 3 is a diagram showing the parse tree obtained by analyzing the search expression shown in FIG. 2.

FIG. 4 is a diagram showing a set of pointers to structured documents in this embodiment.

FIG. 5 is a diagram explaining how the information for setting pointers into a structured document file is read in this embodiment.

FIG. 6 is a diagram showing the correspondence established by linking the parse tree to a structured document in this embodiment.

FIG. 7 is a diagram showing the logical structure of a document copied on the basis of the linking shown in FIG. 6.

FIG. 8 is a diagram showing an example of the logical structure of a document extracted on the basis of the search expression shown in FIG. 2.

FIG. 9 is a diagram showing an example of a list of the document set returned as a search result in this embodiment.

FIG. 10 is a diagram showing a display example of a search result in which a document layout structure has been added to the document logical structure of an extracted document in this embodiment.

FIG. 11 is a diagram showing a display example of a search result consisting of the document logical structure of an extracted document (without a document layout structure) in this embodiment.

FIG. 12 is a functional block diagram showing a second embodiment of the document database device according to the present invention.

FIG. 13 is a diagram showing an example of a structured document.

FIG. 14 is a diagram showing the layout image represented by the document allocation structure of the structured document shown in FIG. 13.

FIG. 15 is a diagram showing the internal representation (document structure) of the structured document shown in FIG. 13.

FIG. 16 is a diagram showing an example of an allocation template.

FIG. 17 is a diagram showing the document logical structure of the structured document shown in FIG. 13.

FIG. 18 is a diagram showing the document allocation structure of the structured document shown in FIG. 13.

[Explanation of symbols]

1 ... Structured document search device, 2 ... Document storage means, 3 ... Input/output control device, 4 ... Keyboard, 5 ... Mouse, 6 ... Display device, 11 ... Search expression analysis means, 12 ... Document search means, 13 ... Document set search means, 14 ... Document structure extraction means, 15 ... Document allocation means, 16, 124-1, 124-2, 124-3 ... Document structure display means, 121 ... Template storage means, 122 ... Template designating means, 123 ... Template editing means, 125 ... Display selecting means.

[Claims]

1. A document database device comprising: designating means for designating information relating to extraction of a document structure of a structured document; and document structure extracting means for extracting, on the basis of the information designated by the designating means, a document structure from a structured document to be searched.

2. A document database device comprising: document storage means for storing a plurality of structured documents; designating means for designating information that designates a document set and information relating to extraction of a document structure; document set search means for retrieving, from the document storage means, a document set matching the information that designates the document set; and document structure extracting means for extracting, on the basis of the information relating to extraction of the document structure, the document structure of each element of the document set retrieved by the document set search means.

3. The document database apparatus according to claim 1, further comprising a document allocating means for allocating the document structure extracted by said document structure extracting means.

4. The document database device according to claim 1, further comprising a document structure display means for displaying the document structure extracted by said document structure extraction means in a predetermined display format.

5. The document database device according to claim 1 or 2, further comprising: a plurality of document structure display means for displaying document structures in mutually different display formats; and display selecting means for selecting one or more of the document structure display means.

6. The document database device according to claim 3, wherein the document allocating means allocates the extracted document structure in accordance with a document allocation template corresponding to the structured document to which the document structure extracted by the document structure extracting means belongs.

7. The document database device according to claim 3, wherein the document allocating means further comprises template storage means for storing a plurality of document allocation templates and template designating means for designating a desired document allocation template from among the stored templates, and allocates the extracted document structure on the basis of the document allocation template designated by the template designating means.

8. The document database apparatus according to claim 6, further comprising template editing means for editing the document allocation template stored in said template storage means.