JP2002073589A

JP2002073589A - Device and method for extracting document information and recording medium with its program recorded thereon

Info

Publication number: JP2002073589A
Application number: JP2000256353A
Authority: JP
Inventors: Daishiro Yokozeki; 大子郎横関; Takahiko Murayama; 隆彦村山; Shuichiro Yamamoto; 修一郎山本
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-08-25
Filing date: 2000-08-25
Publication date: 2002-03-12
Anticipated expiration: 2020-08-25
Also published as: JP3607182B2

Abstract

PROBLEM TO BE SOLVED: To provide a document information extracting device and method that extract only a specified subtree document from document information having a tree structure, and a recording medium on which the program is recorded. SOLUTION: A rule processing part 11 creates, in a tree structure information holding part 12, an evaluation result holding part having the same tree structure as the document to be extracted from. The truth values of the evaluation result holding part in the tree structure information holding part 12 are initialized to 'False'. A section processing part 13 is then started and section processing is performed. When section processing for one section finishes, the truth values of its evaluation result are logically combined with the contents of the tree structure information holding part 12, so that the evaluation results of the individual sections are accumulated and held. When all sections have been evaluated, the subtree whose evaluation result in the tree structure information holding part 12 is 'True' is extracted from the document to be extracted from.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document information extraction device and method capable of extracting only a specific subtree document from document information having a tree structure, and to a recording medium on which the program is recorded.

[0002]

[Prior Art] In recent years, it has become common to create and manage document information in XML (Extensible Markup Language), which evolved from HTML (Hyper Text Markup Language) and whose structure can be represented as a tree. Document information in HTML or XML describes the structure of a document using marks called tags, which indicate how parts of the document are defined. In particular, whereas the kinds and meanings of usable tags are fixed in advance in HTML, XML allows users to define their own. Related companies therefore define such tags among themselves to make data exchange between them easier.

[0003]

[Problems to Be Solved by the Invention] However, the following conventional methods for obtaining arbitrary required document information from tree-structured document information described in XML each have problems. First, in the method that handles the XML tree structure as a tree and extracts the required document with a programming language, dedicated programming is needed for each document structure to be extracted. Extraction is therefore laborious and not general-purpose, and because many programs are required it is also inefficient in terms of equipment. Second, in the method that combines a means for designating a target part of the tree structure with a program that executes an action, such as deleting or not deleting, when the target part is found, the correspondence between the two must be specified each time according to the document structure to be extracted. This method is likewise not general-purpose and is inefficient.

[0004] The present invention has been made in view of the above problems, and its object is to provide a document information extraction device capable of extracting only a specific subtree document from document information having a tree structure. More specifically, its object is to provide a document information extraction device and method, and a recording medium on which the program is recorded, that compare the tree structure of definition body information, which describes a conversion rule for converting document information into document information that can be disclosed to a user, with the tree structure of the document information, and extract from the document information a subtree document having the same structure and values as the definition body information.

[0005]

[Means for Solving the Problems] To solve the above problems, the present invention is a document information extraction device that extracts, from document information having a tree structure, all or part of the document information as a subtree document to be disclosed to each user according to that user's authority. The device comprises: definition body information management means for extracting, from a definition body information database and in accordance with given user authority information and document information, definition body information describing a conversion rule for converting the document information into document information that can be disclosed to the user; definition body interpretation means for interpreting the tree structure of the definition body information extracted by the definition body information management means; and subtree extraction means for comparing the tree structure of the definition body information interpreted by the definition body interpretation means with the tree structure of the document information and extracting a subtree document. The subtree extraction means comprises: first tree structure information holding means that has an area for holding an evaluation result for each node corresponding to the tree structure of the document information, with a false value set as the initial evaluation result; leaf node processing means that, taking the terminal nodes of a tree structure as leaf nodes, compares the value of a designated leaf node in the tree structure of the definition body information with the value of a designated leaf node in the tree structure of the document information and, when the two match, records a true value in the first tree structure information holding means as the evaluation result of the leaf node; branch node processing means that, taking the nodes below the node being processed as child nodes, logically combines the evaluation results of the child nodes, including leaf nodes, and records the combined result in the first tree structure information holding means as the evaluation result of the node being processed; section processing means that searches the document information for a node identical to a node constituting the tree structure of the definition body information, starts the branch node processing means, and, after the branch node processing means finishes its evaluation, evaluates the subtree document located on the branches below that node and records the evaluation result of a subtree containing only true values as a true value; and rule processing means that starts the section processing means and, after the section processing means finishes its evaluation, extracts as the subtree document the subtree consisting of nodes whose evaluation result is recorded as a true value. With this configuration, the tree structure of the definition body information, which describes a conversion rule for converting document information into document information that can be disclosed to the user, can be compared with the tree structure of the document information, and a subtree document having the same structure and values as the definition body information can be extracted from the document information.
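The leaf node and branch node processing described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Node` class, the `evaluate` function, and the `results` dictionary (standing in for the first tree structure information holding means) are hypothetical names, and the text does not say which logical operation "logically combines" refers to. The sketch assumes a logical AND over the children required by the definition body, where each required child must be matched by at least one child of the document node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A node of either the document tree or the definition body tree."""
    tag: str
    value: Optional[str] = None                       # set on leaf nodes only
    children: List["Node"] = field(default_factory=list)


def evaluate(doc: Node, rule: Node, results: Dict[int, bool]) -> bool:
    """Evaluate `doc` against `rule`, recording True results in `results`.

    `results` maps id(document node) -> True; a missing key means the
    initial value False.
    """
    if doc.tag != rule.tag:
        matched = False
    elif not rule.children:
        # Leaf node processing: compare the two leaf values.
        matched = doc.value == rule.value
    else:
        # Branch node processing (assumption: logical AND over the rule's
        # children, each satisfied by at least one document child).
        matched = all(
            any(evaluate(child, wanted, results) for child in doc.children)
            for wanted in rule.children
        )
    if matched:
        results[id(doc)] = True
    return matched
```

For a document `A` with leaves `B = "b1"` and `C = "c1"` and a definition body that asks for `A` containing `B = "b1"`, this sketch marks `A` and `B` as true and leaves `C` at its initial false value.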

[0006] In the above document information extraction device, when the definition body information extracted by the definition body information management means designates a plurality of subtree documents to be extracted, the subtree extraction means further comprises second tree structure information holding means that has an area for holding an evaluation result for each node corresponding to the tree structures of both the document information and the first tree structure information holding means, with a false value set as the initial evaluation result. The rule processing means has: means for logically combining into the second tree structure information holding means, and accumulating there, the evaluation result of the subtree document recorded in the first tree structure information holding means after the section processing means finishes its evaluation, for each of the plurality of subtree documents designated for extraction in the definition body information; and means for extracting as the subtree document the subtree consisting of nodes whose evaluation result in the second tree structure information holding means is recorded as a true value. With this configuration, the evaluation results for the extraction target document that correspond to the plurality of subtree documents designated for extraction in the definition body information are accumulated and held, and the subtree documents can be extracted together at the end.

[0007] In the above document information extraction device, the definition body information is described in a markup language and includes an identifier that designates an element or value to be extracted by means of symbols called tags, which serve as labels, and a constraint condition that specifies the condition under which the element or value specified by the identifier is extracted; the designated subtree document is extracted from document information described in the markup language.
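As a rough illustration of an identifier combined with a constraint condition, the following Python sketch checks one leaf of a document against one leaf of a definition body. It assumes the leaf text of the definition body has the form `*=="b1"`, built from the identifier "*" and the conditional expression given as an example in the embodiment; the exact syntax, the regular expression `CONSTRAINT`, and the function `leaf_matches` are assumptions made for this sketch, and only the equality operator is handled.

```python
import re
import xml.etree.ElementTree as ET

# Assumed leaf syntax: the identifier "*" followed by an equality constraint,
# for example  *=="b1"
CONSTRAINT = re.compile(r'^\s*\*\s*==\s*"([^"]*)"\s*$')


def leaf_matches(rule_leaf: ET.Element, doc_leaf: ET.Element) -> bool:
    """Return True if doc_leaf has the same tag and satisfies the constraint."""
    if rule_leaf.tag != doc_leaf.tag:
        return False
    m = CONSTRAINT.match(rule_leaf.text or "")
    if m is None:
        return False          # constraint forms other than == are not handled
    return (doc_leaf.text or "").strip() == m.group(1)


# Hypothetical definition body and document fragments.
rule = ET.fromstring('<section><A><B>*=="b1"</B></A></section>')
doc = ET.fromstring("<A><B>b1</B><C>c1</C></A>")
print(leaf_matches(rule.find("A/B"), doc.find("B")))
```

A fuller treatment would parse other comparison operators and report unsupported constraints instead of silently returning false.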

[0008] In the above document information extraction device, the markup language is XML.

[0009] The present invention is also a document information extraction method for extracting, from document information having a tree structure, all or part of the document information as a subtree document to be disclosed to each user according to that user's authority. The method includes: a process of extracting, from a definition body information database and in accordance with given user authority information and document information, definition body information describing a conversion rule for converting the document information into document information that can be disclosed to the user; a process of interpreting the tree structure of the extracted definition body information; and a process of comparing the tree structure of the interpreted definition body information with the tree structure of the document information and extracting a subtree document. The process of extracting the subtree document has: leaf node processing that, taking the terminal nodes of a tree structure as leaf nodes, compares the value of a designated leaf node in the tree structure of the definition body information with the value of a designated leaf node in the tree structure of the document information and, when the two match, records a true value as the evaluation result of the leaf node in first tree structure information holding means, which has an area for holding an evaluation result for each node corresponding to the tree structure of the document information and in which a false value is set as the initial evaluation result; branch node processing that, taking the nodes below the node being processed as child nodes, logically combines the evaluation results of the child nodes, including leaf nodes, and records the combined result in the first tree structure information holding means as the evaluation result of the node being processed; section processing that searches the document information for a node identical to a node constituting the tree structure of the definition body information, starts the branch node processing, and, after the branch node processing finishes its evaluation, evaluates the subtree document located on the branches below that node and records the evaluation result of a subtree containing only true values as a true value; and rule processing that starts the section processing and, after the section processing finishes its evaluation, extracts as the subtree document the subtree consisting of nodes whose evaluation result is recorded as a true value.

[0010] The present invention is also a recording medium on which is recorded a program used in a document information extraction method for extracting, from document information having a tree structure, all or part of the document information as a subtree document to be disclosed to each user according to that user's authority. The program includes: a process of extracting, from a definition body information database and in accordance with given user authority information and document information, definition body information describing a conversion rule for converting the document information into document information that can be disclosed to the user; a process of interpreting the tree structure of the extracted definition body information; and a process of comparing the tree structure of the interpreted definition body information with the tree structure of the document information and extracting a subtree document. As the process of extracting the subtree document, the program causes a computer to execute: leaf node processing that, taking the terminal nodes of a tree structure as leaf nodes, compares the value of a designated leaf node in the tree structure of the definition body information with the value of a designated leaf node in the tree structure of the document information and, when the two match, records a true value as the evaluation result of the leaf node in first tree structure information holding means, which has an area for holding an evaluation result for each node corresponding to the tree structure of the document information and in which a false value is set as the initial evaluation result; branch node processing that, taking the nodes below the node being processed as child nodes, logically combines the evaluation results of the child nodes, including leaf nodes, and records the combined result in the first tree structure information holding means as the evaluation result of the node being processed; section processing that searches the document information for a node identical to a node constituting the tree structure of the definition body information, starts the branch node processing, and, after the branch node processing finishes its evaluation, evaluates the subtree document located on the branches below that node and records the evaluation result of a subtree containing only true values as a true value; and rule processing that starts the section processing and, after the section processing finishes its evaluation, extracts as the subtree document the subtree consisting of nodes whose evaluation result is recorded as a true value.

[0011]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to FIGS. 1 to 10. FIG. 1 is a block diagram illustrating the configuration of the document information extraction device according to the embodiment. In FIG. 1, reference numeral 1 denotes a subtree extraction processing unit that extracts a subtree document from the extraction target document using designated definition body information. Reference numeral 2 denotes a user information management unit that provides a user's authority information from the user's ID information and the like. Reference numeral 3 denotes a user information database that uniquely records each user's ID information and the like together with that user's authority information. Reference numeral 4 denotes a definition body information management unit that designates definition body information from the document information to be extracted and the user's authority information. Reference numeral 5 denotes a definition body information database in which definition body information is recorded in advance in association with user authority information. Reference numeral 6 denotes a definition body interpretation processing unit that interprets the tree structure of the designated definition body information.
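The data flow among units 2 to 5 can be pictured with a small Python sketch. Everything concrete in it is a placeholder: the two dictionaries stand in for the user information database 3 and the definition body information database 5, and the user IDs, authority names, document name, and definition body strings are invented for illustration. The text states only that definition body information is designated from the document information and the user's authority information; keying the lookup on a (document name, authority) pair is an assumption.

```python
from typing import Dict, Tuple

# Stand-in for user information database 3: user ID -> authority information.
USER_DB: Dict[str, str] = {
    "user-001": "manager",
    "user-002": "staff",
}

# Stand-in for definition body information database 5, keyed (as an
# assumption) by document name and authority.
DEFINITION_DB: Dict[Tuple[str, str], str] = {
    ("order.xml", "manager"): "<definition for managers>",
    ("order.xml", "staff"): "<definition for staff>",
}


def lookup_authority(user_id: str) -> str:
    """Role of user information management unit 2."""
    return USER_DB[user_id]


def select_definition(user_id: str, document_name: str) -> str:
    """Role of definition body information management unit 4."""
    authority = lookup_authority(user_id)
    return DEFINITION_DB[(document_name, authority)]
```

The selected definition body would then be passed to the definition body interpretation processing unit 6 and on to the subtree extraction processing unit 1.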

[0012] The subtree extraction processing unit 1 consists of a rule processing unit 11, a tree structure information holding unit 12 (the second tree structure information holding means), a section processing unit 13, a tree structure information temporary holding unit 14 (the first tree structure information holding means), a branch node processing unit 15, and a leaf node processing unit 16. The rule processing unit 11 creates, in the tree structure information holding unit 12, an evaluation result holding unit in which structure recording units, each recording a plurality of elements, are arranged in the same tree structure as the given document information. For each of the plurality of subtree documents designated for extraction in the definition body information supplied by the definition body interpretation processing unit 6, it also logically combines the evaluation result of the subtree document recorded in the tree structure information temporary holding unit 14 into the tree structure information holding unit 12 and accumulates it there, and it extracts as the subtree document the subtree whose evaluation result in the tree structure information holding unit 12 is recorded as a true value. The tree structure information holding unit 12 is a recording unit made up of structure recording units, each recording a plurality of elements, and is a buffer that logically combines, accumulates, and holds the evaluation results of subtrees. The section processing unit 13 creates, in the tree structure information temporary holding unit 14, an evaluation result holding unit in which structure recording units, each recording a plurality of elements, are arranged in the same tree structure as the document information. It also searches the document information for a node identical to a node constituting the tree structure of the definition body information, starts the branch node processing unit 15, and, after the branch node processing unit 15 finishes its evaluation, evaluates the subtree document located on the branches below that node and records the evaluation result of a subtree containing only true values as a true value. The tree structure information temporary holding unit 14 is a recording unit made up of structure recording units, each recording a plurality of elements, and is a buffer that temporarily holds the evaluation results of a subtree. The branch node processing unit 15 logically combines the evaluation results of the child nodes, including leaf nodes, located on the branches below the node being processed, and records the result in the tree structure information temporary holding unit 14. The leaf node processing unit 16 compares the value of the terminal portion of the tree structure of the designated definition body information with the value of the terminal portion of the tree structure of the designated document information and, when the two match, records a true value in the tree structure information temporary holding unit 14 as the evaluation result of the leaf node.

[0013] The tree structure information holding unit 12 and the tree structure information temporary holding unit 14 of the subtree extraction processing unit 1, as well as the user information database 3 and the definition body information database 5, are each assumed to consist of a non-volatile memory such as a hard disk device, a magneto-optical disk device, or a flash memory, a read-only recording medium such as a CD-ROM, a volatile memory such as RAM (Random Access Memory), or a computer-readable and writable recording medium formed by a combination of these.

[0014] The rule processing unit 11, the section processing unit 13, the branch node processing unit 15, and the leaf node processing unit 16 of the subtree extraction processing unit 1, as well as the user information management unit 2, the definition body information management unit 4, and the definition body interpretation processing unit 6, may each be realized by dedicated hardware. Alternatively, each may consist of a memory and a CPU (central processing unit), with its function realized by loading a program that implements the function of that unit into the memory and executing it.

[0015] An input device, a display device, and the like (none of which are shown) are connected to the document information extraction device of this embodiment as peripheral devices. Here, an input device means an input device such as a keyboard or a mouse, and a display device means a CRT (Cathode Ray Tube) display device, a liquid crystal display device, or the like.

[0016] Next, the operation of the embodiment is described with reference to FIGS. 2 to 10. First, the XML representation and the tree structure representation of the extraction target document and the definition body information are described with reference to FIGS. 2 to 6. FIG. 2 is a schematic diagram showing an example of the extraction target document and the extracted document described in this embodiment, and FIG. 3 is a schematic diagram illustrating the evaluation result holding unit for the extraction target document that is constructed in the tree structure information holding unit 12 or the tree structure information temporary holding unit 14. In FIG. 2, <A> and <B> each represent one node (joint), and tags such as <B> and <C> written between a pair of tags bearing the same symbol, such as <A> and </A>, represent branch nodes or leaf nodes located below that node. In FIG. 3, the evaluation result holding unit for the extraction target document constructed in the tree structure information holding unit 12 or the tree structure information temporary holding unit 14 consists of structure recording units D01 to D18, which are arranged in correspondence with the nodes shown in FIG. 2 and form a tree structure. FIG. 4(a) shows the recorded contents of the evaluation result holding unit that the rule processing unit 11 creates in the tree structure information holding unit 12; each structure recording unit holds a link to the extraction target document and the truth value of the evaluation result. FIG. 4(b) shows the recorded contents of the evaluation result holding unit that the section processing unit 13 creates in the tree structure information temporary holding unit 14; each structure recording unit holds a link to the extraction target document, a link to the tree structure information holding unit 12, and the truth value of the evaluation result. FIG. 5 is a schematic diagram showing an example of the definition body information described in this embodiment. In the definition body information, each group of subtree documents to be extracted is expressed as a section (<section>). The document information extraction device of the present invention extracts, from the extraction target document, subtree documents having the same structure and values as the tags written between the <section> and </section> tags of each section. The definition body information is expressed, for example, by the identifier "*", which indicates the part to be extracted, and the conditional expression == "b1", which gives the condition for extraction, both written between <B> and </B>. FIG. 6 is a schematic diagram of this definition body information converted into a tree structure representation. In FIG. 6, R02, indicating "section x", and R11, indicating "section y", each a group of subtree documents to be extracted, are arranged below R01, which indicates that this is definition body information processed by the rule processing unit 11. Below each of these are arranged R03 indicating node A, R04 indicating node B, and so on.
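The structure recording units of FIG. 4 can be pictured as two small record types that mirror the document tree. The Python sketch below is only one possible reading of the description: `Record` stands for a unit in the tree structure information holding unit 12 (a link to the document plus a truth value), `TempRecord` for a unit in the tree structure information temporary holding unit 14 (with an additional link to holding unit 12), and the `children` lists, which give the records their tree shape, are an assumption. All names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Structure recording unit in holding unit 12, after FIG. 4(a)."""
    doc_node: ET.Element              # link to the extraction target document
    result: bool = False              # truth value, initially False
    children: List["Record"] = field(default_factory=list)


@dataclass
class TempRecord:
    """Structure recording unit in temporary holding unit 14, after FIG. 4(b)."""
    doc_node: ET.Element              # link to the extraction target document
    main: Record                      # link to holding unit 12
    result: bool = False
    children: List["TempRecord"] = field(default_factory=list)


def build_main(node: ET.Element) -> Record:
    """Mirror the document tree with one Record per node."""
    return Record(node, False, [build_main(child) for child in node])


def build_temp(rec: Record) -> TempRecord:
    """Mirror holding unit 12 with one TempRecord per Record."""
    return TempRecord(rec.doc_node, rec, False,
                      [build_temp(child) for child in rec.children])
```

Keeping a direct link from each temporary record to its counterpart in holding unit 12 means a section's results can be merged node by node without searching the tree again.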

[0017] Based on the above, the operation of the subtree extraction processing unit 1 will be described with reference to the flowcharts of FIGS. 7 to 10. First, it is assumed that the definition body information determined from the given user information has been supplied to the subtree extraction processing unit 1, via the definition body interpretation processing unit 6, as the tree structure shown in FIG. 6. It is also assumed that the document information has been given in advance. FIG. 7 is a flowchart illustrating the operation of the rule processing unit 11 of the subtree extraction processing unit 1. First, the rule processing unit 11 creates, in the tree structure information holding unit 12, an evaluation result holding unit having the same tree structure as the extraction target document, as shown in FIG. 3 (step S1). The truth values of the evaluation result holding unit created in the tree structure information holding unit 12 are initialized to "False" (step S2). Next, the section processing unit 13 is activated and section processing is executed (step S3). When section processing for one section finishes, the truth values of its evaluation result ("False" or "True") are logically combined with the contents of the tree structure information holding unit 12, so that the evaluation results of the individual sections are accumulated and held (step S4). Once the evaluation result for one section has been stored, it is determined whether all sections have been evaluated (step S5). If not all sections have been evaluated (NO in step S5), the process returns to step S3 and the next section is evaluated. If all sections have been evaluated (YES in step S5), the process proceeds to step S6. The subtrees whose evaluation result in the tree structure information holding unit 12 is "True" are then extracted from the extraction target document (step S6), and the process ends.
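The rule-processing loop of steps S1 to S6 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the evaluation results are kept as a dictionary from node identifier (such as "D01") to a boolean, that the "logical combination" of step S4 is a logical OR (as the later summary of the embodiment states), and that section processing is supplied as a callable. The names `rule_process` and `evaluate_section` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Flags = Dict[str, bool]  # node identifier -> evaluation result


def rule_process(node_ids: Iterable[str],
                 sections: Iterable[str],
                 evaluate_section: Callable[[str], Flags]) -> List[str]:
    """Accumulate per-section results and return the nodes to extract."""
    node_ids = list(node_ids)
    # S1-S2: one evaluation slot per document node, initialised to False.
    accumulated: Flags = {node_id: False for node_id in node_ids}
    for section in sections:                # S5: repeat until every section is done
        result = evaluate_section(section)  # S3: section processing
        for node_id in node_ids:            # S4: accumulate with a logical OR
            accumulated[node_id] = accumulated[node_id] or result.get(node_id, False)
    # S6: the nodes flagged True make up the subtrees to extract.
    return [node_id for node_id in node_ids if accumulated[node_id]]


# Usage with canned section results standing in for real section processing.
canned = {"x": {"D02": True, "D03": True}, "y": {"D10": True}}
selected = rule_process(["D01", "D02", "D03", "D10"], ["x", "y"], lambda s: canned[s])
```

Because results are only ever OR-ed in, a node selected by one section stays selected no matter what later sections report.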

[0018] FIG. 8 is a flowchart illustrating the operation of the section processing unit 13 of the subtree extraction processing unit 1. The section processing unit 13, activated by the rule processing unit 11, creates in the tree structure information temporary holding unit 14 an evaluation result holding unit having the same tree structure as the extraction target document, as shown in FIG. 3 (step S11). The truth values of the evaluation result holding unit created in the tree structure information temporary holding unit 14 are initialized to "False" (step S12). Once the tree structure information temporary holding unit 14 has been initialized, the extraction target document is searched for a child node having the same name as the tag described at the top level between <section> and </section>, as one section to be extracted (step S13). Next, it is determined whether a matching node was found in the extraction target document (step S14). If no matching node was found (NO in step S14), the recorded contents of the tree structure information temporary holding unit are returned to the rule processing unit 11 without any further processing (step S17). If a matching node was found (YES in step S14), the branch node processing unit 15 is activated and branch node processing is executed (step S15). When branch node processing finishes, the subtree document constituting the section is evaluated: the subtree document located on the branches below the node is evaluated, and "True" is recorded as the evaluation result of a subtree containing only true values (step S16). When the evaluation of the subtree document constituting the section finishes, the recorded contents of the tree structure information temporary holding unit are returned to the rule processing unit 11 (step S17), and the process ends.
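The control flow of section processing can be sketched as below. This is an illustration under several assumptions that the text does not settle: the tree is modelled with a hypothetical `Node` class, the search of step S13 looks at every node of the document, branch node processing is passed in as a callable, and the final subtree evaluation of step S16 is left out because its exact rule is not spelled out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical tree node used for both the document and the definition body."""
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


Flags = Dict[int, bool]  # id(document node) -> evaluation result


def section_process(document: Node, section: Node,
                    evaluate_branch: Callable[[Node, Node, Flags], bool]) -> Flags:
    # S11-S12: temporary evaluation results, one per document node, all False.
    flags: Flags = {id(node): False for node in walk(document)}
    for rule_top in section.children:  # tags written at the top level of <section>
        # S13: look for document nodes with the same name (assumed: anywhere in the tree).
        matches = [node for node in walk(document) if node.name == rule_top.name]
        # S14 NO: with no match the flags are returned untouched.
        for doc_node in matches:
            evaluate_branch(doc_node, rule_top, flags)  # S15: branch node processing
    # S16 (final evaluation of the subtree) is intentionally omitted in this sketch.
    return flags  # S17: hand the temporary results back to rule processing
```

Keeping the temporary flags separate from the accumulated ones means a section that fails to match leaves the accumulated results unchanged.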

[0019] FIG. 9 is a flowchart illustrating the operation of the branch node processing unit 15 of the subtree extraction processing unit 1. The branch node processing unit 15, activated by the section processing unit 13, creates within itself truth value holding units that hold the truth values of the child nodes located below the node being processed (step S21). One truth value holding unit is created for each type (name) of child node described in the corresponding branch node of the definition body information. The truth values of the created truth value holding units are initialized to "False" (step S22). Next, it is determined whether a child node located below the node being processed is a branch node that has further branches below it, or a leaf node that has no nodes below it and holds only a value (step S23). If the child node is a leaf node that has no nodes below it and holds only a value (NO in step S23), the leaf node processing unit 16 is activated and leaf node processing is executed (step S32). When leaf node processing finishes, its result is returned to the caller (step S33), and the process ends. If the child node is a branch node that has nodes below it (YES in step S23), branch node processing is executed again for that child node (step S24). When the branch node processing called in step S24 finishes, the truth value corresponding to the type (name) of the child node just processed is read from the truth value holding unit, its logical sum (OR) with the result of that branch node processing is computed, and the result is written back to the same truth value holding unit (step S25). Next, it is determined whether all child nodes have been evaluated (step S26). If not all child nodes have been evaluated (NO in step S26), the process returns to step S23 and the next child node is evaluated. If all child nodes have been evaluated (YES in step S26), it is determined whether the truth value holding units for all types (names) of child nodes are "True" (step S27). If the truth value holding units for all types (names) of child nodes are "True" (YES in step S27), "True" is set in the corresponding branch node of the tree structure information temporary holding unit 14 (step S28), "True" is returned to the caller (step S29), and the process ends. If any of the truth value holding units of the child nodes is "False" (NO in step S27), "False" is set in the corresponding branch node (step S30), "False" is returned to the caller (step S31), and the process ends. Note that, if all the types (names) of child nodes described in the corresponding branch node of the definition body information, and the truth values of the evaluation results of the individual child nodes, are expressed as in [Equation 1], then the evaluation performed when a child node of the above branch node is itself a branch node with nodes below it is expressed by [Equation 2]. Here, the logical sum in [Equation 2] corresponds to step S25, and the logical product corresponds to step S27.
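Read together, steps S25 and S27 amount to a logical AND, over the child names listed in the definition body, of a logical OR over the document children that carry each name. The sketch below implements that reading. It is an illustration only: the `Node` class and function names are hypothetical, the branch-or-leaf decision of step S23 is taken from the definition body node (an assumption), document children not named in the definition body are ignored (also an assumption), and leaf evaluation is reduced to a plain equality check.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node; in a rule, value=None means 'no condition'."""
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


Flags = Dict[int, bool]  # id(document node) -> evaluation result


def evaluate_leaf(doc_leaf: Node, rule_leaf: Node, flags: Flags) -> bool:
    """Simplified stand-in for leaf node processing: equality check only."""
    ok = rule_leaf.value is None or doc_leaf.value == rule_leaf.value
    flags[id(doc_leaf)] = ok
    return ok


def evaluate_branch(doc_node: Node, rule_node: Node, flags: Flags) -> bool:
    # S21-S22: one truth value slot per child name in the rule, all False.
    rules = {child.name: child for child in rule_node.children}
    slots = {name: False for name in rules}
    for doc_child in doc_node.children:       # S26: visit every child
        rule_child = rules.get(doc_child.name)
        if rule_child is None:
            continue                           # not named in the rule: ignored (assumption)
        if rule_child.children:                # S23 YES: branch node
            result = evaluate_branch(doc_child, rule_child, flags)  # S24
        else:                                  # S23 NO: leaf node
            result = evaluate_leaf(doc_child, rule_child, flags)    # S32
        slots[doc_child.name] = slots[doc_child.name] or result     # S25: logical OR
    verdict = all(slots.values())              # S27: logical AND over all names
    flags[id(doc_node)] = verdict              # S28 / S30
    return verdict                             # S29 / S31
```

The OR within a name lets one matching sibling satisfy a repeated element, while the AND across names requires every element type listed in the definition body to be satisfied at least once.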

[0020] FIG. 10 is a flowchart illustrating the operation of the leaf node processing unit 16 of the subtree extraction processing unit 1. The leaf node processing unit 16, activated by the branch node processing unit 15, first determines whether a conditional expression specifying a value is set at the position in the definition body information corresponding to the leaf node (step S41). If no conditional expression specifying the value of the leaf node is set (NO in step S41), the condition is regarded as satisfied, the conditional expression check of step S42 is skipped, and the process proceeds to step S43. If a conditional expression specifying the value of the leaf node is set (YES in step S41), it is determined whether the value held by the leaf node satisfies the conditional expression set in the definition body information (step S42). If the value of the leaf node does not satisfy the conditional expression (NO in step S42), the truth value in the tree structure information temporary holding unit 14 corresponding to the leaf node is set to "False" (step S47), "False" is returned to the caller (step S48), and the process ends. If the value of the leaf node satisfies the conditional expression (YES in step S42), it is determined whether an identifier is set in the definition body information (step S43). If an identifier is set (YES in step S43), the truth values in the tree structure information temporary holding unit 14 corresponding to the subtree at and below the leaf node are set to "True" (step S44), "True" is returned to the caller (step S45), and the process ends. If no identifier is set (NO in step S43), the truth value in the tree structure information temporary holding unit 14 corresponding to the leaf node is set to "True" (step S46), "True" is returned to the caller (step S45), and the process ends.
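The leaf node flow of steps S41 to S48 can be sketched as a single function. This is a minimal illustration with hypothetical names: the conditional expression is reduced to an optional expected value compared for equality, and the "*" identifier is modelled as a boolean `extract` flag.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical document node."""
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


Flags = Dict[int, bool]  # id(document node) -> evaluation result


def evaluate_leaf(doc_leaf: Node, extract: bool,
                  expected: Optional[str], flags: Flags) -> bool:
    # S41-S42: a condition, if present, must hold; no condition counts as satisfied.
    if expected is not None and doc_leaf.value != expected:
        flags[id(doc_leaf)] = False  # S47
        return False                 # S48
    if extract:                      # S43 YES: identifier present
        for node in walk(doc_leaf):  # S44: flag the subtree at and below the leaf
            flags[id(node)] = True
    else:                            # S43 NO
        flags[id(doc_leaf)] = True   # S46: flag only the leaf itself
    return True                      # S45
```

Both branches after step S43 return "True"; they differ only in how much of the document is flagged.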

[0021] As described above, the rule processing unit 11 activates the section processing unit 13, the section processing unit 13 activates the branch node processing unit 15, and the branch node processing unit 15 activates either the branch node processing unit 15 or the leaf node processing unit 16; each evaluates the designated subtree and returns the evaluation result to its caller. As a result, each time the section processing unit completes one run, an evaluation result is obtained from the extraction target document for one unit (section) of the subtree document to be extracted. Rule processing repeats the evaluation of the extraction target document as many times as there are sections specified in the definition body information and, when all of them are finished, identifies the subtree documents to be extracted by taking the logical sum (OR) of the evaluation results of the individual sections.

[0022] Next, an example that uses the document information extraction apparatus described in the above embodiment will be described with reference to FIGS. 11 to 15. This example describes a case in which the document information extraction apparatus is used to distribute information when estimate documents are exchanged between a personal computer manufacturer and its component suppliers. FIG. 11 is a schematic diagram illustrating an example of the present embodiment. In FIG. 11, reference numeral 50 denotes an information management center that uses the document information extraction apparatus. Reference numeral 51 denotes the user information management unit of the document information extraction apparatus. Reference numeral 52 denotes the definition body information management unit of the document information extraction apparatus. Reference numeral 53 denotes the subtree extraction processing unit of the document information extraction apparatus. Reference numeral 54 denotes document information Da. The document information Da is a document that specifies the contents of an estimate; as shown in FIG. 12, the information needed for the estimate, such as part name, model name, quantity, and price, is described as tags for each component of the personal computer, such as memory and disk. The document information Da is assumed to have been registered in the information management center 50 in advance by the personal computer manufacturer. Reference numerals 55 and 56 denote definition body information Rb and definition body information Rc, which are set for quotation supplier B and quotation supplier C, respectively, both component suppliers for the personal computer. The definition body information Rb and the definition body information Rc are also assumed to have been registered in the information management center 50 in advance by the personal computer manufacturer. As for the contents of the definition body information, FIG. 13(a) shows the definition body information for quotation supplier B, and FIG. 13(b) shows the definition body information for quotation supplier C. In FIG. 13(a), because quotation supplier B is a component supplier that handles memory, the tags are configured to extract only information on memory. Similarly, in FIG. 13(b), because quotation supplier C is a component supplier that handles disks, the tags are configured to extract only information on disks. Reference numerals 57 and 58 denote extracted document Tb and extracted document Tc, which are extracted for quotation supplier B and quotation supplier C, respectively. Reference numerals 60 and 70 denote quotation supplier B and quotation supplier C, respectively.

[0023] Now, when quotation supplier B 60 and quotation supplier C 70 access the information management center 50, the user information management unit 52 identifies the user attributes of each of quotation supplier B 60 and quotation supplier C 70 and outputs them to the definition body information management unit 53. The definition body information management unit 53 uses the user attributes of quotation supplier B 60 and quotation supplier C 70, together with the pre-registered document information Da 54, to extract the definition body information Rb 55 and the definition body information Rc 56 from the pre-registered definition body information. The definition body information Rb 55 and the definition body information Rc 56 are output to the subtree extraction processing unit 51 (the definition body interpretation processing unit is omitted here for reasons of space), and extraction of subtree documents from the document information Da 54 is executed. The extracted document Tb 57 is shown in FIG. 14. In FIG. 14, only the information on memory, which quotation supplier B 60 handles, has been extracted from the document specifying the contents of the estimate shown in FIG. 12. This extracted document Tb 57 is sent to quotation supplier B 60. The extracted document Tc 58 is shown in FIG. 15. In FIG. 15, only the information on disks, which quotation supplier C 70 handles, has been extracted from the document specifying the contents of the estimate shown in FIG. 12. This extracted document Tc 58 is sent to quotation supplier C 70. As described above, using the document information extraction apparatus of the present invention makes it possible to send each quotation supplier, from a single document specifying the contents of an estimate, information specifying the estimate contents from which only the relevant part has been extracted.
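The outcome of this example, one estimate document yielding a different extract per supplier, can be illustrated with a deliberately simplified sketch. It does not implement the flag-based evaluation of the embodiment; it only filters parts by name to show the input and output shapes. The XML element names, the sample values, and the `RULES` table are all hypothetical, since FIGS. 12 to 15 are not reproduced in the text.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical estimate document with one memory part and one disk part.
QUOTE = (
    "<estimate>"
    "<part><name>memory</name><model>M-100</model>"
    "<quantity>2</quantity><price>50</price></part>"
    "<part><name>disk</name><model>D-200</model>"
    "<quantity>1</quantity><price>80</price></part>"
    "</estimate>"
)

# Hypothetical per-supplier rule: which part name each supplier may see.
RULES = {"B": "memory", "C": "disk"}


def extract_for(supplier: str, xml_text: str = QUOTE) -> str:
    """Return an estimate containing only the parts the supplier handles."""
    root = ET.fromstring(xml_text)
    wanted = RULES[supplier]
    result = ET.Element(root.tag)
    for part in root.findall("part"):
        if part.findtext("name") == wanted:
            result.append(copy.deepcopy(part))  # copy so the source tree is untouched
    return ET.tostring(result, encoding="unicode")
```

In the embodiment the per-supplier rule would be a definition body tree evaluated node by node; the fixed name lookup here only stands in for that step.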

[0024] The document information extraction apparatus described in the above embodiment and the information management center described in the example may each realize the functions of the respective apparatus by recording a program for realizing those functions on a computer-readable recording medium and causing a computer system to read and execute the program recorded on that recording medium.

[0025] Here, the "computer system" includes an OS and hardware such as peripheral devices and, if a WWW (World Wide Web) system is used, also includes a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as floppy (registered trademark) disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" further includes media that hold a program dynamically for a short time (a transmission medium or transmission wave), as when the program is sent via a computer network such as the Internet or a communication line such as a telephone line, and media that hold the program for a certain period of time, such as the volatile memory inside the computer systems that act as server and client in that case.

[0026] The above program may be one that realizes only part of the functions described above, or one that realizes those functions in combination with a program already stored in the computer system, that is, a so-called difference file (difference program).

[0027]

[Effects of the Invention] According to the present invention as described above, the subtree extraction means included in a document information extraction apparatus, which extracts from the same document information having a tree structure all or part of the document information as a subtree document to be disclosed to each user according to that user's authority, is provided with: first tree structure information holding means, initialized to false values, for recording evaluation results of the document information; leaf node processing means for comparing the value of a designated terminal part of the tree structure of the definition body information with the value of a designated terminal part of the tree structure of the document information and, when the two match, recording a true value in the first tree structure information holding means as the evaluation result of the leaf node; branch node processing means for logically combining the evaluation results of the child nodes, including leaf nodes, located on the branches below a node in the tree structure, and recording the result in the first tree structure information holding means; section processing means for creating, in the first tree structure information holding means, an evaluation result holding unit in which structure recording units that record a plurality of elements are arranged in the same tree structure as the document information, searching the document information for a node identical to a node constituting the tree structure of the definition body information, activating the branch node processing means and, after evaluation by the branch node processing means is complete, evaluating the subtree document located on the branches below that node and recording as true the evaluation result of a subtree containing only true values; and rule processing means for activating the section processing means and, after evaluation by the section processing means is complete, extracting as a subtree document the subtree for which a true value is recorded in the evaluation result of the evaluation result holding unit. This makes it possible to compare the tree structure of the definition body information, which describes the conversion rule for converting the document information into document information that can be disclosed to the user, with the tree structure of the document information, and to extract from the document information a subtree document having the same structure and values as the definition body information.

[0028] In the above document information extraction apparatus, the present invention further comprises second tree structure information holding means, initialized to false values, for recording evaluation results of the document information, and the rule processing means comprises: means for creating, in the second tree structure information holding means, an evaluation result holding unit in which structure recording units that record a plurality of elements are arranged in the same tree structure as the document information; means for logically combining, for each of the plurality of subtree documents specified for extraction in the definition body information, the evaluation result of the subtree document recorded in the first tree structure information holding means into the second tree structure information holding means and accumulating it there; and means for extracting as a subtree document the subtree for which a true value is recorded in the evaluation result of the second tree structure information holding means. This makes it possible to accumulate and hold the evaluation results for the extraction target document that correspond to the plurality of subtree documents specified for extraction in the definition body information, and to extract the subtree documents together at the end. Consequently, when arbitrary required document information is acquired from document information that has a tree structure and is described in a markup language such as XML, there is no need for dedicated programming for each document structure to be extracted, which was a problem of the prior art, and subtree documents can be extracted efficiently. Likewise, there is no need to specify anew, for each document structure to be extracted, the correspondence that pairs a means of designating the target portion of the tree structure with a program that executes an action (such as deleting or not deleting) when the target portion is found, which was also a problem of the prior art, and subtree documents can again be extracted efficiently.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the configuration of an embodiment of the present invention.

FIG. 2 is a schematic diagram showing an example of an extraction target document and an extracted document described in the embodiment.

FIG. 3 is a schematic diagram illustrating the evaluation result holding unit for the extraction target document, configured in the tree structure information holding unit.

FIG. 4 is a schematic diagram illustrating an example of the recorded contents of the evaluation result holding unit.

FIG. 5 is a schematic diagram showing an example of the definition body information described in the embodiment.

FIG. 6 is a schematic diagram showing the tree structure representation of the definition body information described in FIG. 5.

FIG. 7 is a flowchart illustrating the operation of the rule processing unit of the embodiment.

FIG. 8 is a flowchart illustrating the operation of the section processing unit of the embodiment.

FIG. 9 is a flowchart illustrating the operation of the branch node processing unit of the embodiment.

FIG. 10 is a flowchart illustrating the operation of the leaf node processing unit of the embodiment.

FIG. 11 is a schematic diagram illustrating an example of the embodiment.

FIG. 12 is a schematic diagram illustrating the extraction target document described in the example.

FIG. 13 is a schematic diagram illustrating the definition body information described in the example.

FIG. 14 is a schematic diagram illustrating a subtree document extracted in the example.

FIG. 15 is a schematic diagram illustrating a subtree document extracted in the example.

[Explanation of symbols]

1 Subtree extraction processing unit
2 User information management unit
3 User information database
4 Definition body information management unit
5 Definition body information database
6 Definition body interpretation processing unit
11 Rule processing unit
12 Tree structure information holding unit
13 Section processing unit
14 Tree structure information temporary holding unit
15 Branch node processing unit
16 Leaf node processing unit
50 Information management center
51 Subtree extraction processing unit
52 User information management unit
53 Definition body information management unit
54 Document information Da
55 Definition body Rb
56 Definition body Rc
57 Extracted document Tb
58 Extracted document Tc
60 Quotation supplier B
70 Quotation supplier C

Continued from the front page

(51) Int. Cl.7: G06F 17/30 419; FI: G06F 17/30 419A
(72) Inventor: Shuichiro Yamamoto, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
F-term (reference): 5B009 QA06 VA02; 5B075 ND02 ND35 PP02 PP03 PQ02 UU06

Claims

[Claims]

1. A document information extraction apparatus for extracting, from document information having a tree structure, all or part of the document information as a subtree document to be disclosed to each user according to that user's authority, the apparatus comprising:
definition body information management means for extracting, from a definition body information database and in accordance with given authority information of the user and the document information, definition body information describing a conversion rule for converting the document information into document information that can be disclosed to the user;
definition body interpretation means for interpreting the tree structure from the definition body information extracted by the definition body information management means; and
subtree extraction means for comparing the tree structure of the definition body information interpreted by the definition body interpretation means with the tree structure of the document information and extracting the subtree document,
wherein the subtree extraction means comprises:
first tree structure information holding means having an area for holding an evaluation result for each node corresponding to the tree structure of the document information, a false value being set in each evaluation result as an initial value;
leaf node processing means for treating a terminal node of the tree structure as a leaf node, comparing the value of a designated leaf node in the tree structure of the definition body information with the value of the designated leaf node in the tree structure of the document information, and, when the two match, recording a true value in the first tree structure information holding means as the evaluation result of the leaf node;
branch node processing means for treating nodes below the node being processed as child nodes, logically synthesizing the evaluation results of the child nodes, including leaf nodes, and recording the result of the logical synthesis in the first tree structure information holding means as the evaluation result of the node being processed;
section processing means for searching the document information for a node identical to a node constituting the tree structure of the definition body information, activating the branch node processing means, and, after the branch node processing means has completed its evaluation, evaluating the subtree documents located on the branches below that node and recording a true value as the evaluation result of a subtree that contains only true values; and
rule processing means for activating the section processing means and, after the section processing means has completed its evaluation, extracting as the subtree document a subtree consisting of nodes whose evaluation results are recorded as true values.
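
The division of labor among the leaf node, branch node, section, and rule processing recited in claim 1 can be illustrated with a minimal Python sketch. Everything below is a hypothetical illustration rather than the claimed implementation: the `Node` class, the function names, the use of logical AND as the "logical synthesis", the rule that a whole matching subtree is marked true, and the retention of ancestor nodes in the extracted tree are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node shared by the document and the definition body."""
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def walk(node: Node):
    """Yield a node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def leaf_processing(rule: Node, doc: Node, results: Dict[int, bool]) -> bool:
    # Record True only when the designated leaf matches by name and value.
    results[id(doc)] = rule.name == doc.name and rule.value == doc.value
    return results[id(doc)]


def branch_processing(rule: Node, doc: Node, results: Dict[int, bool]) -> bool:
    if not rule.children:
        return leaf_processing(rule, doc, results)
    # Assumption: every rule child must be met by some same-named child (AND).
    outcomes = [
        any([branch_processing(rc, dc, results)
             for dc in doc.children if dc.name == rc.name])
        for rc in rule.children
    ]
    results[id(doc)] = rule.name == doc.name and all(outcomes)
    return results[id(doc)]


def section_processing(rule: Node, node: Node, results: Dict[int, bool]) -> None:
    if node.name == rule.name and branch_processing(rule, node, results):
        for n in walk(node):  # assumption: the whole matching subtree is True
            results[id(n)] = True
        return
    for n in walk(node):  # discard partial matches left by a failed attempt
        results[id(n)] = False
    for child in node.children:
        section_processing(rule, child, results)


def prune(node: Node, results: Dict[int, bool]) -> Optional[Node]:
    """Copy the nodes marked True, keeping ancestors so the tree stays connected."""
    kept = [c for c in (prune(ch, results) for ch in node.children) if c is not None]
    if results[id(node)] or kept:
        return Node(node.name, node.value, kept)
    return None


def rule_processing(rule: Node, document: Node) -> Optional[Node]:
    results = {id(n): False for n in walk(document)}  # initial value: False
    section_processing(rule, document, results)
    return prune(document, results)


# Usage: disclose only the "person" subtrees whose "dept" leaf is "sales".
document = Node("staff", children=[
    Node("person", children=[Node("dept", "sales"), Node("name", "Sato")]),
    Node("person", children=[Node("dept", "dev"), Node("name", "Abe")]),
])
rule = Node("person", children=[Node("dept", "sales")])
extracted = rule_processing(rule, document)
```

The `results` dictionary plays the role of the first tree structure information holding means: it has one entry per document node and starts out entirely false, so a node is extracted only if some processing step explicitly records a true value for it.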

2. The document information extraction apparatus according to claim 1, wherein, when a plurality of subtree documents to be extracted are specified in the definition body information extracted by the definition body information management means,
the subtree extraction means further comprises second tree structure information holding means having an area for holding an evaluation result for each node corresponding to the tree structure of the first tree structure information holding means, a false value being set in each evaluation result as an initial value, and
the rule processing means comprises:
means for, after the section processing means has completed its evaluation for each of the plurality of subtree documents specified in the definition body information, logically synthesizing the evaluation results of that subtree document recorded in the first tree structure information holding means and accumulating the result in the second tree structure information holding means; and
means for extracting, as the subtree document, a subtree consisting of nodes whose evaluation results in the second tree structure information holding means are recorded as true values.
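
The accumulation recited in claim 2 can be sketched as follows. This is a hedged illustration only: the claim does not name the logical operation, so the sketch assumes logical OR, and the path-string keys and the function names `new_holding_unit`, `accumulate`, and `extract` are hypothetical.

```python
from typing import Dict, Iterable, List


def new_holding_unit(node_paths: Iterable[str]) -> Dict[str, bool]:
    """Create a holding unit with one entry per node, initialized to False."""
    return {path: False for path in node_paths}


def accumulate(second: Dict[str, bool], first: Dict[str, bool]) -> None:
    """Merge one section's results into the running total (assumed to be OR)."""
    for path, value in first.items():
        second[path] = second[path] or value


def extract(second: Dict[str, bool]) -> List[str]:
    """Return the nodes whose accumulated evaluation result is True."""
    return [path for path, value in second.items() if value]


# Usage: two subtree specifications evaluated against the same document.
paths = ["/staff", "/staff/person[1]", "/staff/person[2]", "/staff/person[3]"]
second = new_holding_unit(paths)

section_a = new_holding_unit(paths)
section_a["/staff/person[1]"] = True  # result of the first specification

section_b = new_holding_unit(paths)
section_b["/staff/person[3]"] = True  # result of the second specification

for first in (section_a, section_b):
    accumulate(second, first)

selected = extract(second)
```

Keeping the per-section results separate from the accumulated results means that a false value recorded while evaluating one specification cannot erase a true value recorded for an earlier one.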

3. The document information extraction apparatus according to any one of the preceding claims, wherein the definition body information includes an identifier that specifies an element or a value to be extracted by means of a symbol, called a tag, that carries meaning, and a constraint condition that specifies a condition to be satisfied by the element or value specified by the identifier, and wherein the specified subtree document is extracted from document information described in a markup language.

4. The document information extraction apparatus according to claim 3, wherein the markup language is XML (Extensible Markup Language).
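
For the XML case of claims 3 and 4, the pairing of a tag identifier with a constraint condition can be sketched with Python's standard `xml.etree.ElementTree` module. The function `extract_elements`, its parameters, the equality-only constraint, and the sample document are assumptions made for illustration; the claims do not prescribe this form.

```python
import xml.etree.ElementTree as ET


def extract_elements(xml_text: str, identifier: str,
                     constraint_tag: str, constraint_value: str) -> ET.Element:
    """Collect elements named `identifier` whose direct child `constraint_tag`
    has text equal to `constraint_value`, under a copy of the root tag."""
    root = ET.fromstring(xml_text)
    result = ET.Element(root.tag)
    for element in root.iter(identifier):
        # Constraint condition (assumed here to be a simple equality test).
        if element.findtext(constraint_tag) == constraint_value:
            result.append(element)
    return result


# Usage: the identifier is the tag "person"; the constraint is dept == "sales".
source = (
    "<staff>"
    "<person><dept>sales</dept><name>Sato</name></person>"
    "<person><dept>dev</dept><name>Abe</name></person>"
    "</staff>"
)
disclosed = extract_elements(source, "person", "dept", "sales")
serialized = ET.tostring(disclosed, encoding="unicode")
```

A real definition body would presumably allow richer constraints than equality on one child element; the sketch only shows how a tag name selects candidate elements and a condition filters them.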

5. A document information extraction method for extracting, from document information having a tree structure, all or part of the document information as a subtree document to be disclosed to each user according to that user's authority, the method comprising:
a step of extracting, from a definition body information database and in accordance with given authority information of the user and the document information, definition body information describing a conversion rule for converting the document information into document information that can be disclosed to the user;
a step of interpreting the tree structure from the extracted definition body information; and
a step of comparing the interpreted tree structure of the definition body information with the tree structure of the document information and extracting the subtree document,
wherein the step of extracting the subtree document comprises:
leaf node processing that treats a terminal node of the tree structure as a leaf node, compares the value of a designated leaf node in the tree structure of the definition body information with the value of the designated leaf node in the tree structure of the document information, and, when the two match, records a true value as the evaluation result of the leaf node in a first tree structure information holding unit, which has an area for holding an evaluation result for each node corresponding to the tree structure of the document information and in which a false value is set in each evaluation result as an initial value;
branch node processing that treats nodes below the node being processed as child nodes, logically synthesizes the evaluation results of the child nodes, including leaf nodes, and records the result of the logical synthesis in the first tree structure information holding unit as the evaluation result of the node being processed;
section processing that searches the document information for a node identical to a node constituting the tree structure of the definition body information, activates the branch node processing, and, after the branch node processing has completed its evaluation, evaluates the subtree documents located on the branches below that node and records a true value as the evaluation result of a subtree that contains only true values; and
rule processing that activates the section processing and, after the section processing has completed its evaluation, extracts as the subtree document a subtree consisting of nodes whose evaluation results are recorded as true values.

6. A computer-readable recording medium on which is recorded a program used in a document information extraction method for extracting, from document information having a tree structure, all or part of the document information as a subtree document to be disclosed to each user according to that user's authority, the program causing a computer to execute:
a step of extracting, from a definition body information database and in accordance with given authority information of the user and the document information, definition body information describing a conversion rule for converting the document information into document information that can be disclosed to the user;
a step of interpreting the tree structure from the extracted definition body information; and
a step of comparing the interpreted tree structure of the definition body information with the tree structure of the document information and extracting the subtree document,
wherein the step of extracting the subtree document comprises:
leaf node processing that treats a terminal node of the tree structure as a leaf node, compares the value of a designated leaf node in the tree structure of the definition body information with the value of the designated leaf node in the tree structure of the document information, and, when the two match, records a true value as the evaluation result of the leaf node in a first tree structure information holding unit, which has an area for holding an evaluation result for each node corresponding to the tree structure of the document information and in which a false value is set in each evaluation result as an initial value;
branch node processing that treats nodes below the node being processed as child nodes, logically synthesizes the evaluation results of the child nodes, including leaf nodes, and records the result of the logical synthesis in the first tree structure information holding unit as the evaluation result of the node being processed;
section processing that searches the document information for a node identical to a node constituting the tree structure of the definition body information, activates the branch node processing, and, after the branch node processing has completed its evaluation, evaluates the subtree documents located on the branches below that node and records a true value as the evaluation result of a subtree that contains only true values; and
rule processing that activates the section processing and, after the section processing has completed its evaluation, extracts as the subtree document a subtree consisting of nodes whose evaluation results are recorded as true values.