JP2001101207A - Document summarizing device

Info

Publication number: JP2001101207A
Application number: JP27772999A
Authority: JP
Inventors: Fumito Masui; 文人桝井; Junichi Fukumoto; 淳一福本; Mitsuo Shimohata; 光夫下畑
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1999-09-30
Filing date: 1999-09-30
Publication date: 2001-04-13

Abstract

PROBLEM TO BE SOLVED: To present the information a user wants accurately while also providing a summary of the entire document. SOLUTION: An attention information determining means 2 extracts attention information, based on predetermined rules, from a document input through a document information input unit 1 and stores it in a text information storage unit 5. When a character string in a reference relationship with the attention information exists, a reference information determining means 6 stores the corresponding reference information in the text information storage unit 5. When a character string related to the attention information exists, a relevance determining means 10 stores the corresponding related information in the text information storage unit 5. A summary sentence generating means 11 then generates a summary of the input document from the attention information, reference information, and related information stored in the text information storage unit 5.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a document summarizing apparatus that summarizes and displays a document, and more particularly to a document summarizing apparatus that extracts and displays information in keeping with the context.

【０００２】[0002]

2. Description of the Related Art

Some document summarizing apparatuses extract a summary from given original text data according to predetermined rules. One conventional apparatus of this kind is disclosed in, for example, JP-A-8-255172.

[0003] The technique disclosed in that publication is a document retrieval system with an original-text processing unit. Working from an original-text database that stores the original text data making up documents, the unit sets a plurality of sentence types for identifying the content of a sentence, such as opinions and proposals, creates sentence-level excerpt data classified into those types, and stores it as an excerpt database. The original-text processing unit includes a type determination unit that extracts the excerpt data corresponding to a specified sentence type, and a shaping unit that shapes the excerpt data into a predetermined format, for example by removing conjunctions.

[0004] Its objectives are:
(1) to reduce the load of the user's document search work by extracting and displaying only the desired excerpts and information from the full text of a retrieved document;
(2) to support diverse search purposes by extracting and displaying excerpts and information from the same document according to various viewpoints and criteria; and
(3) when only excerpts and information are extracted from the full text of a retrieved document according to a predetermined criterion and displayed, to make the amount of displayed excerpts and information adjustable in order of priority.

【０００５】[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

When information of interest is extracted from a document and summarized, the relationships among the individual pieces of extracted information must be understood. In addition to extracting attention information from individual sentences, it is therefore important to grasp the anaphoric and reference relationships among the individual pieces of attention information across the document as a whole.

[0006] In the conventional method described above, however, the scope of information extraction assumes sentence-by-sentence processing, so the contextual relationships among pieces of information across the whole document cannot be grasped.

[0007] For example, in the conventional technique described above, when the user uses the type determination dictionary to select, from the prepared 5W1H excerpt information types, a type expressing an acting subject such as "who", the names of persons who are the subject of some action, such as "Prime Minister", "former Prime Minister", and "Director-General of the Science and Technology Agency", are extracted from the original text and displayed on the excerpt information screen. Likewise, if the user selects a type expressing a place such as "where", words expressing places, such as "Prime Minister's official residence" and "domestic nuclear power plants", are extracted from the same sentence and displayed on the excerpt information screen.

[0008] With this method, for a given document, the relationship between "Prime Minister" and "Prime Minister's official residence" can be grasped from the acting subject and the "where" type that appear in sentence 1. If, however, "National Diet Building" appears as a "where" type in sentence 2, the span covering "Prime Minister" and "National Diet Building" exceeds the recognition unit, the sentence, so "Prime Minister" in sentence 1 could not be associated with "National Diet Building" in sentence 2.

【０００９】[0009]

MEANS FOR SOLVING THE PROBLEMS

To solve the above problems, the present invention adopts the following configurations.

<Configuration 1> A document summarizing apparatus comprising: attention information determining means for extracting attention information from character strings in an input document according to predetermined rules; reference information determining means for, when the input document contains a character string in a reference relationship with the attention information, attaching reference information indicating this to the input document; relevance determining means for, when the input document contains a character string related to the attention information, attaching related information indicating this to the input document; and summary sentence generating means for generating a summary of the input document based on the attention information, the reference information, and the related information.

[0010] <Configuration 2> The document summarizing apparatus according to Configuration 1, further comprising customizing means for, upon receiving an arbitrary instruction from the user, generating a summary of the input document that reflects the user's instruction, based on the attention information, the reference information, and the related information.

[0011] <Configuration 3> The document summarizing apparatus according to Configuration 1 or 2, comprising: tag information extracting means for determining whether tag information is attached to the input document and, if so, extracting the kind and attributes of the tag information; summary sentence generating means for generating a summary of the input document based on the attention information and the kind and attributes of the tag information extracted by the tag information extracting means; and customizing means for, upon receiving an arbitrary instruction from the user, generating a summary of the input document that reflects the user's instruction, based on the attention information, the reference information, the related information, and the kind and attributes of the tag information.

[0012] <Configuration 4> The document summarizing apparatus according to Configuration 3, comprising: image information extracting means for determining whether the input document contains image information to which tag information is attached and, if so, extracting it; summary sentence generating means for generating a summary of the input document based on the attention information and the kind and attributes of the tag information extracted by the tag information extracting means, and for attaching the image when image information has been extracted by the image information extracting means; and customizing means for, upon receiving an arbitrary instruction from the user, generating a summary of the input document that reflects the user's instruction, based on the attention information, the reference information, the related information, and the kind and attributes of the tag information, and for attaching the image when image information has been extracted by the image information extracting means.

【００１３】[0013]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention are described in detail below using specific examples.

[0014] <<Specific Example 1>> <Configuration> FIG. 1 is a block diagram showing Specific Example 1 of the document summarizing apparatus of the present invention. The apparatus is implemented on a microcomputer or the like and comprises a document information input unit 1, attention information determining means 2, a character string pattern database (character string pattern DB) 3, a syntax pattern database (syntax pattern DB) 4, a text information storage unit 5, reference information determining means 6, syntax analysis rules 7, a word dictionary 8, attribute information determining means 9, relevance determining means 10, summary sentence generating means 11, and a summary information output unit 12.

[0015] The document information input unit 1 is input means, such as a keyboard, mouse, OCR (optical character reader), or voice input, with which the user inputs natural language information. The attention information determining means 2 has a function of extracting attention information from character strings in the document information input through the document information input unit 1, according to predetermined rules. Specifically, it determines whether the input document contains a character string related to information the user wants, or a character string matching one in the character string pattern database 3, and if so marks the corresponding portion as attention information.

[0016] The character string pattern database 3 stores, grouped by the meaning or attribute each carries, affix patterns such as the "さん" (-san) in "○○さん" and the "省" (ministry) in "大蔵省" (Ministry of Finance), and function-word patterns such as the "首相" (Prime Minister) in "○○首相" and the "銀行" (bank) in "日本銀行" (Bank of Japan).

[0017] The syntax pattern database 4 stores specific predicate patterns for telling apart identical-looking strings. For example, the "福井" (Fukui) in "福井は話した" ("Fukui spoke") and the "福井" in "福井に到着した" ("arrived in Fukui") are the same character string on the surface, but the predicate patterns distinguish the former as denoting a person and the latter as denoting a place.
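The disambiguation described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the table `PREDICATE_PATTERNS` and the function `classify_by_predicate` are hypothetical names, the two entries merely stand in for the syntax pattern database 4, and plain substring matching is assumed in place of real syntactic analysis.

```python
# Hypothetical stand-in for predicate patterns in the syntax pattern database 4:
# (text that follows the noun, kind the noun is taken to denote).
PREDICATE_PATTERNS = [
    ("は話した", "person"),    # "<noun> spoke"
    ("に到着した", "place"),   # "arrived in <noun>"
]


def classify_by_predicate(noun, sentence):
    """Return the kind suggested by the predicate that follows `noun`, or None."""
    for tail, kind in PREDICATE_PATTERNS:
        # Assumption: the predicate directly follows the noun in the sentence.
        if noun + tail in sentence:
            return kind
    return None


if __name__ == "__main__":
    print(classify_by_predicate("福井", "福井は話した"))
    print(classify_by_predicate("福井", "福井に到着した"))
```

The same surface string receives a different kind depending only on the predicate that follows it, which is the point the paragraph makes.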

[0018] The text information storage unit 5 temporarily stores the character string of each attention information portion determined by the attention information determining means 2, the accompanying ancillary information, the position the attention information occupies within the document, and the like.

[0019] The reference information determining means 6 has a function of recognizing and marking, among the attention information and character strings in the input information, attention information that points to the same entity. That is, it extracts character strings in the input document that are in a reference relationship with the attention information. To judge reference relationships, it uses the characteristic syntax patterns stored in the syntax pattern database 4 and calls on the word dictionary 8 and the syntax analysis rules 7 to analyze the syntax and stylistic structure of the document.

[0020] The syntax analysis rules 7 are, for example, rules at the level used in the syntactic analysis of a machine translation apparatus to identify words, phrases, clauses, and the like. The word dictionary 8 stores the general words and phrases needed for analysis with the syntax analysis rules 7.

[0021] The attribute information determining means 9 has a function of identifying attributes of the attention information in the input information and marking or attaching them. Specifically, when the attribute of a piece of attention information is ambiguous, for example when several attributes are conceivable, it determines which one is correct.

[0022] The relevance determining means 10 is a functional unit that recognizes and marks the relevance between pieces of attention information in the input information.

[0023] The summary sentence generating means 11 is a generation unit that combines the information determined and marked by each of the means from the attention information determining means 2 through the relevance determining means 10 with the information stored in the text information storage unit 5, and produces a summary. That is, it creates a summary based on the attention information, reference information, and related information attached to the input document stored in the text information storage unit 5.

[0024] The summary information output unit 12 consists of a CRT, a printer, a network, or the like. It outputs the summary information that the series of means creates from the attention information extracted from the input natural language information, taking into account its attributes, relevance, and reference relationships.

[0025] The attention information determining means 2, reference information determining means 6, attribute information determining means 9, relevance determining means 10, and summary sentence generating means 11 each consist of software corresponding to the function of that means, together with hardware such as a processor and memory that executes it. The character string pattern database 3, syntax pattern database 4, text information storage unit 5, syntax analysis rules 7, and word dictionary 8 are implemented on a storage device such as a magnetic disk device or semiconductor memory.

[0026] <Operation> FIG. 2 is a flowchart showing the operation of the document summarizing apparatus. The document information input through the document information input unit 1 consists of text character strings, as typified by newspaper articles and e-mail, and is input all at once or in chunks (step S101).

[0027] FIG. 3 is an explanatory diagram showing an example of input information. Suppose the input document information contains the sentence shown there: "4月1日，○△首相は…を発表した．" ("On April 1, Prime Minister ○△ announced ...").

[0028] For the input document information, the attention information determining means 2 consults the character string pattern database 3 (step S102) and checks whether there is a character string that matches a pattern in the database or that relates to information the user wants (step S103). Suppose the character string pattern database 3 holds information such as the following.

[0029] FIG. 4 is an explanatory diagram of the character string patterns stored in the character string pattern database 3. Here, [...] means that any characters may appear any number of times. When such characters are immediately followed by a character string such as "大統領" (President) or "首相" (Prime Minister), the kind is person name and the attribute is politician.

[0030] Given the input information of FIG. 3 and the character string patterns of FIG. 4, the pattern "首相" (Prime Minister) matches. The character string "○△" in "○△首相" (Prime Minister ○△) in the document information is therefore judged to be attention information, marked by means such as tagging in a markup language like SGML (Standard Generalized Markup Language), HTML (Hypertext Markup Language), or XML (Extensible Markup Language) (step S104), and stored in the text information storage unit 5 (step S105).

[0031] FIG. 5 is an explanatory diagram showing an example of document information in which attention information is marked. As shown in FIG. 5(a), the pattern "首相" in the document information matches, so the string is judged to be attention information. Once judged to be attention information, it is tagged as shown in FIG. 5(b) and given the kind and attributes shown in FIG. 4.
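Steps S102 to S105 can be sketched in Python as below. This is only an illustration under stated assumptions: `TITLE_PATTERNS` is a hypothetical two-entry stand-in for the character string pattern database 3, the `<attention ...>` tag syntax is invented for the example (the actual tag format of FIG. 5(b) is not reproduced here), and the "[...]" wildcard is narrowed to a run of characters without whitespace or punctuation.

```python
import re

# Hypothetical stand-in for the character string pattern database 3 (FIG. 4):
# trailing title -> (kind, attributes).
TITLE_PATTERNS = {
    "首相": ("person_name", ["prime_minister", "politician"]),
    "大統領": ("person_name", ["president", "politician"]),
}

# Assumption: the name is a shortest run without whitespace or punctuation.
NAME = r"[^\s，．、。,.]+?"
TITLES = "|".join(re.escape(title) for title in TITLE_PATTERNS)
PATTERN = re.compile("(?P<name>" + NAME + ")(?P<title>" + TITLES + ")")


def mark_attention(text):
    """Return (marked_text, records) for every title pattern found in text."""
    records = []

    def tag(match):
        kind, attrs = TITLE_PATTERNS[match.group("title")]
        # Keep what the text information storage unit 5 would hold:
        # the string, its kind and attributes, and its position.
        records.append({
            "name": match.group("name"),
            "kind": kind,
            "attributes": attrs,
            "start": match.start(),
        })
        return '<attention kind="{}" attrs="{}">{}</attention>'.format(
            kind, ",".join(attrs), match.group(0))

    return PATTERN.sub(tag, text), records


if __name__ == "__main__":
    marked, records = mark_attention("4月1日，○△首相は…を発表した．")
    print(marked)
    print(records)
```

Returning the records alongside the marked text mirrors the description, in which the marked document and the per-item information are both kept for the later steps.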

[0032] If no character string matches a pattern in the character string pattern database 3 in step S103, the process proceeds to step S106.

[0033] Next, the reference information determining means 6 examines the reference relationships of the attention information. First, the syntax analysis rules 7 are invoked (step S106). For the marked document information stored in the text information storage unit 5, the syntax patterns of the syntax pattern database 4 and the word dictionary 8 are then used to determine whether any attention information or noun phrase has a referent or a referring expression (step S107).

[0034] For a noun phrase such as the attention information "○△", paraphrases such as "彼" (he), "総理" (Premier), and "その政治家" (that politician) often appear in the document. These paraphrases all point to the same entity "○△", so they can be said to be in a reference relationship with one another.

[0035] Accordingly, for each character string judged to be such a paraphrase, the position in the document at which the paraphrase was found is marked (step S108) and stored in the text information storage unit 5 (step S109). If no reference information is found in step S107, the process proceeds to step S110.
[0036] FIG. 6 is an example of document information in which reference information is marked. In the attention information (person name: Prime Minister, politician), "person name" is the kind and "Prime Minister" and "politician" are attributes, as shown in FIG. 4. Because the attributes here form a set of information characterizing the person name, there may be more than one. In the reference information (003, 005, 006, 007: he), "he" is the paraphrase and 003 and so on are the positions at which the paraphrase appears. A position here indicates, for example, the ordinal number of the noun phrase within the input document, or the ordinal number of the sentence and of the noun phrase within that sentence, but any positional information may be used as long as it identifies the location within the target document.
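The position bookkeeping of steps S108 and S109 might look like the following sketch. It assumes the document has already been split into an ordered list of noun phrases and that the set of paraphrases for the entity is already known; in the description both come from syntactic analysis, which is not modeled here. The function name `collect_references` and the three-digit position format are illustrative choices.

```python
from collections import defaultdict


def collect_references(noun_phrases, paraphrases):
    """Map each paraphrase to the noun-phrase positions where it appears.

    Positions are 1-based ordinals of noun phrases in the document,
    formatted as three digits in the style of "(003, 005: he)".
    """
    positions = defaultdict(list)
    for index, phrase in enumerate(noun_phrases, start=1):
        if phrase in paraphrases:
            positions[phrase].append("{:03d}".format(index))
    return dict(positions)


if __name__ == "__main__":
    # Hypothetical noun-phrase sequence for a short document.
    phrases = ["○△首相", "政府", "彼", "総理", "彼"]
    print(collect_references(phrases, {"彼", "総理", "その政治家"}))
```

Any other position scheme, such as (sentence number, noun-phrase number) pairs, would serve equally well, as the paragraph notes.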

[0037] The attribute information determining means 9 examines the attribute information of the attention information. First, the syntax analysis rules 7 are invoked, and the syntax pattern database 4 and the word dictionary 8 are consulted (step S110). The syntax analysis rules determine whether attribute information can be obtained for the attention information in the marked document information stored in the text information storage unit 5 (step S111). If it can, the information is attached to the marked document information and stored in the text information storage unit 5 (steps S112, S113).

[0038] FIG. 7 is an explanatory diagram showing an example of document information in which attribute information is marked. The reference information of FIG. 6 has become attention information here for the following reason: if, for example, "議会" (assembly) were attention information, the referent of "衆院議会" (House of Representatives assembly) would be attention information, so this reference information is treated on a par with attention information. Although the illustrated example converts the reference information into attention information in this way, the conversion is not strictly necessary as long as it remains clear which noun phrase the reference information refers to.

[0039] If no attribute information is obtained in step S111, or if it has already been attached, the process proceeds to step S114. Attribute information may be obtained from the syntax pattern database 4 or the word dictionary 8, or from the information saved in the text information storage unit 5.

[0040] FIG. 8 is an explanatory diagram showing an example of the syntax patterns in the syntax pattern database 4. FIG. 9 is an explanatory diagram showing another example of document information in which attribute information is marked.

[0041] In FIG. 8, the pattern number is a number unique to each pattern in the syntax pattern database 4, and the syntax pattern is an abstraction of the combination of syntactic elements that the attribute information determining means 9 needs in order to obtain the required information. Pattern 22, for example, is a sequence of two nouns separated by a comma, in which the first noun denotes a person and the second noun is unknown.

[0042] For example, suppose there is a noun sequence such as "課長" (section manager), "特許一郎" (a placeholder personal name). "課長" is known to denote a person from semantic information in the word dictionary or the like. "特許一郎" has been judged to be a noun (phrase) by morphological analysis, but it is not in the word dictionary and is therefore an unknown word. In this case "課長 (person), 特許一郎 (unknown word)" matches pattern number 22, the "kind" of the pattern is determined to be person name, and the pattern is interpreted. The "interpretation of the pattern" indicates what relationship the elements of the matched syntax pattern (the two nouns in pattern 22) should be understood to have. In this example, following the interpretation, "課長" is an attribute of "特許一郎", and "特許一郎" is a person name. In the example of FIG. 9, "元首相，特許太郎" (former Prime Minister, followed by a placeholder personal name) is such a pattern. The document information in FIG. 9 illustrates attribute information obtained from a syntax pattern and is not directly related to the document information in FIG. 7.
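A rough sketch of how pattern 22 could be applied is shown below. The dictionary contents, the function name `apply_pattern_22`, and the returned record layout are all hypothetical; the sketch also assumes that morphological analysis has already produced the two nouns on either side of the comma.

```python
# Hypothetical stand-in for the word dictionary 8: word -> semantic class.
WORD_DICTIONARY = {
    "課長": "person",      # section manager
    "元首相": "person",    # former Prime Minister
    "東京": "place",       # Tokyo
}


def apply_pattern_22(first_noun, second_noun):
    """Pattern 22: "<person noun>, <unknown noun>".

    If the first noun denotes a person and the second noun is not in the
    dictionary, interpret the second noun as a person name and the first
    noun as its attribute. Return None when the pattern does not match.
    """
    first_is_person = WORD_DICTIONARY.get(first_noun) == "person"
    second_is_unknown = second_noun not in WORD_DICTIONARY
    if first_is_person and second_is_unknown:
        return {
            "text": second_noun,
            "kind": "person_name",
            "attributes": [first_noun],
        }
    return None


if __name__ == "__main__":
    print(apply_pattern_22("課長", "特許一郎"))
    print(apply_pattern_22("元首相", "特許太郎"))
```

The "interpretation" of the pattern is encoded directly in the returned record: the unknown word becomes the name and the known person noun becomes one of its attributes.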

[0043] The relevance determining means 10 judges the relevance between pieces of attention information. First, the marked document information stored in the text information storage unit 5 is consulted, the syntax analysis rules 7 are invoked for this document information, and the syntax pattern database 4 and the word dictionary 8 are consulted (step S114).

[0044] The syntax analysis rules 7 determine, for each piece of attention information in the marked document information stored in the text information storage unit 5, whether it is related to other attention information (step S115).

[0045] Here, the syntax analysis rules 7 consult the syntax pattern database 4. FIG. 10 is an explanatory diagram showing an example of a syntax pattern. With a syntax pattern such as that in FIG. 10, in document information such as "…日本政府を代表して首相が参列する" ("... the Prime Minister attends on behalf of the Japanese government"), "首相" (Prime Minister) is recognized as belonging to "日本政府" (the Japanese government), and the two are judged to be related. Furthermore, by using the reference information, the relationship to "日本政府" can also be established for "○△", the entity behind "首相". FIG. 11 is an explanatory diagram of other document information, marked with attribute information, that shows such an example.
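The relevance step and its propagation through reference information can be sketched as follows. The regular expression is an assumed stand-in for the syntax pattern of FIG. 10, whose actual form is not given in the text; `find_relations` and the `references` mapping (referring expression to entity) are hypothetical names. Because the pattern matches greedily on word characters, it only behaves sensibly on short clauses like the one in the description.

```python
import re

# Assumed stand-in for the FIG. 10 pattern:
# "<organization>を代表して<member>が" ("<member> ... on behalf of <organization>").
ON_BEHALF_OF = re.compile(r"(?P<org>\w+)を代表して(?P<member>\w+?)が")


def find_relations(sentence, references):
    """Return (item, organization) pairs found in the sentence.

    `references` maps a referring expression to the entity it points to,
    so a relation found for the expression is also recorded for the entity.
    """
    relations = []
    for match in ON_BEHALF_OF.finditer(sentence):
        org = match.group("org")
        member = match.group("member")
        relations.append((member, org))
        entity = references.get(member)
        if entity is not None:
            # Propagate the relation to the entity behind the expression.
            relations.append((entity, org))
    return relations


if __name__ == "__main__":
    refs = {"首相": "○△"}
    print(find_relations("日本政府を代表して首相が参列する", refs))
```

The second pair in the result is what the paragraph describes: the link between "○△" and "日本政府" is never stated in the sentence and is obtained only through the reference information.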

[0046] When a relationship is found in this way, the information is attached to the corresponding attention information in the marked document information and stored in the text information storage unit 5 (steps S116, S117).

[0047] FIG. 12 is an explanatory diagram of document information in which attribute information is marked. It shows the result of recognizing the "relevance" between "○△首相" (Prime Minister ○△) and "日本政府" (the Japanese government). The reason the relevance becomes (003: government) (005: government) is as follows.

【００４８】先ず、（図１２に示す前の段階の）文書情
報が図７に示すように、「…<注目情報（人名：首相，
政治家）｜参照情報（003，005，006，007：彼）（00
4，008：総理）（008：その政治家）>○△首相<注目情
報>…」という状態であったとする。この場合、図１１
に示す“首相”の参照情報（004，008：総理）と、“○
△首相”の参照情報（004，008：総理）が一致する（要
するに参照関係にある）ことから決定される。First, suppose that the document information (at the stage before that shown in FIG. 12) is, as shown in FIG. 7, in the state “... <attention information (person name: prime minister, politician) | reference information (003, 005, 006, 007: he) (004, 008: prime minister) (008: that politician)>Prime Minister ○△<attention information> ...”. In this case, the relevance is determined from the fact that the reference information (004, 008: prime minister) of “the prime minister” shown in FIG. 11 matches the reference information (004, 008: prime minister) of “Prime Minister ○△” (in short, the two are in a reference relationship).

【００４９】また、仮に、図１２に示す前の段階の文書
情報が、「…<注目情報（人名：首相，政治家）｜参照
情報（003，005，006，007：彼）（004，008：総理）
（008：その政治家）｜関連性（003，005：政府）>○△
首相<注目情報>…」という状態であったとすると、図１
１に示す“日本政府”の参照情報（003，005：政府）と
“○△首相”の関連情報（003，005：政府）が一致する
ことからも関連性が決定される。Alternatively, suppose that the document information at the stage before that shown in FIG. 12 is in the state “... <attention information (person name: prime minister, politician) | reference information (003, 005, 006, 007: he) (004, 008: prime minister) (008: that politician) | relevance (003, 005: government)>Prime Minister ○△<attention information> ...”. In that case, the relevance is also determined from the fact that the reference information (003, 005: government) of “the Japanese government” shown in FIG. 11 matches the relevance information (003, 005: government) of “Prime Minister ○△”.
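The two matching rules in the preceding paragraphs can be sketched as set intersections over the marked entries. This is only an illustrative model: the `Marked` record, its field names, and the encoding of each entry as a (sentence numbers, expression) tuple are assumptions introduced here, not the format used in the figures.

```python
from dataclasses import dataclass, field

@dataclass
class Marked:
    """One marked piece of attention information (hypothetical layout)."""
    text: str
    references: set = field(default_factory=set)  # {((sentence ids...), expression)}
    relations: set = field(default_factory=set)   # same encoding as references

def shares_reference(a, b):
    """True when a and b carry at least one identical reference entry."""
    return bool(a.references & b.references)

def matches_relation(a, b):
    """True when a relevance entry of a equals a reference entry of b."""
    return bool(a.relations & b.references)

pm = Marked("首相", references={(("004", "008"), "総理")})
named = Marked("○△首相", references={
    (("003", "005", "006", "007"), "彼"),
    (("004", "008"), "総理"),
    (("008",), "その政治家"),
})
gov = Marked("日本政府", references={(("003", "005"), "政府")})
named.relations.add((("003", "005"), "政府"))
```

Here `shares_reference(pm, named)` corresponds to the reference-relationship case, and `matches_relation(named, gov)` to the case where relevance information is already attached.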

【００５０】ステップＳ１１５において、関連性が見つ
からなかった場合は、そのままステップＳ１１８に進
む。If no relevance is found in step S115, the process proceeds directly to step S118.

【００５１】尚、上記ステップＳ１０１からステップＳ
１１７の一連の処理が進む毎に、文書内の注目情報とそ
れに関する情報はマーク付き文書情報として更新され、
文章情報格納部５に保持される。ここで保持されるマー
ク情報は、以降または同一処理内で行われる同様の処理
において参照・利用可能である。また、文字列パターン
データベース３および構文パターンデータベース４で
は、上記の一連の処理が行われる際に必要とされる語句
やその組み合わせのパターン情報がデータベースとして
予め蓄積されており、各処理の必要な時点で参照され
る。Each time the series of processes from step S101 to step S117 proceeds, the attention information in the document and the information related to it are updated as marked document information and held in the text information storage unit 5. The mark information held here can be referred to and used in similar processing performed later or within the same process. The character string pattern database 3 and the syntax pattern database 4 store in advance, as databases, the pattern information of the words, phrases, and combinations thereof needed for the above series of processes, and are referred to whenever a process requires them.

【００５２】最後に、要約文生成手段１１により、文章
情報格納部５からマーク付き文書情報が参照される（ス
テップＳ１１８）。要約文生成手段１１では、マークさ
れた情報を参照しながら、重要またはユーザが所望する
注目情報を組み合わせて要約文を作成する（ステップＳ
１１９）。そして、要約文生成手段１１で作成した要約
情報を要約情報出力部１２より出力する（ステップＳ１
２０）。Finally, the summary sentence generating means 11 refers to the marked document information in the text information storage unit 5 (step S118). While referring to the marked information, the summary sentence generating means 11 creates a summary sentence by combining the attention information that is important or desired by the user (step S119). The summary information created by the summary sentence generating means 11 is then output from the summary information output unit 12 (step S120).

【００５３】図１３は、要約情報の一例を示す説明図で
ある。図示の要約情報は、○○○改革案の詳細を示すよ
うにしたものである。出力の仕方としては、例えば、デ
フォルトとして「4月1日，○△首相は衆院議会で○○○
改革案を発表した．」という要約を出力し、○○○改革
案の文字列を異なる表示としておく。そして、この○○
○改革案についての詳細を必要に応じて順次出力すると
いった方法が考えられる。これらの要約情報も、注目情
報や参照情報等の情報によって、入力文書中のどこに位
置していても抽出することができる。FIG. 13 is an explanatory diagram showing an example of the summary information. The illustrated summary information shows the details of the ○○○ reform plan. As one possible output method, the summary “On April 1, Prime Minister ○△ announced the ○○○ reform plan in the House of Representatives.” is output by default, with the character string “○○○ reform plan” displayed in a different style. The details of the ○○○ reform plan are then output sequentially as needed. Such summary information can also be extracted wherever it is located in the input document, by using information such as the attention information and the reference information.

【００５４】〈効果〉以上のように具体例１によれば、
入力文書中の注目情報と参照情報と関連情報とに基づ
き、入力文書中の要約文を生成するようにしたので、以
下のような効果が得られる。<Effects> As described above, according to specific example 1, the summary sentence of the input document is generated based on the attention information, the reference information, and the related information in the input document, so the following effects are obtained.

【００５５】(1)テキスト情報の注目すべき部分または
ユーザが所望する情報に関して注目すべき情報を自動的
に判断し、注目情報の種類や属性、相互の関連性を把握
して要約情報を作成するため、従来と比較してユーザが
所望する情報をより的確にかつ簡潔に提供することがで
きる。(1) Automatically determines the notable part of the text information or the information to be noticed with respect to the information desired by the user, and creates summary information by grasping the types and attributes of the noticed information and their relevance to each other. Therefore, the information desired by the user can be provided more accurately and simply as compared with the related art.

【００５６】(2)各情報をマークした文書情報を保持し
て処理を進めるため、文書全体を対象とした注目情報の
関連性や参照関係をも把握することができる。(2) Since the process is carried out while holding the document information in which each information is marked, it is possible to grasp the relevance and reference relationship of the attention information for the entire document.

【００５７】《具体例２》具体例２は、具体例１の構成
に加えて要約情報にカスタマイズして出力する構成を付
加したものである。<< Specific Example 2 >> In the specific example 2, in addition to the configuration of the specific example 1, a configuration for customizing and outputting summary information is added.

【００５８】〈構成〉図１４は、具体例２の文書要約装
置の構成図である。図の装置は、文書情報入力部１、注
目情報決定手段２、文字列パターンデータベース（文字
列パターンＤＢ）３、構文パターンデータベース（構文
パターンＤＢ）４、文章情報格納部５、参照情報決定手
段６、構文解析ルール７、単語辞書８、属性情報決定手
段９、関連性決定手段１０、要約文生成手段１１、要約
情報出力部１２、カスタマイズ手段１３からなる。ここ
で、カスタマイズ手段１３以外は、具体例１と同様であ
るため、ここでの説明は省略する。<Structure> FIG. 14 is a diagram showing the structure of a document summarizing apparatus according to the second embodiment. The apparatus shown in the figure includes a document information input unit 1, attention information determining means 2, a character string pattern database (character string pattern DB) 3, a syntax pattern database (syntax pattern DB) 4, a text information storage unit 5, and a reference information determining means 6. , A syntax analysis rule 7, a word dictionary 8, attribute information determining means 9, relevance determining means 10, summary sentence generating means 11, summary information output section 12, and customizing means 13. Here, the configuration other than the customizing unit 13 is the same as that of the first embodiment, and the description is omitted here.

【００５９】カスタマイズ手段１３は、ユーザの指示に
よって、注目情報決定手段２から関連性決定手段１０の
各手段によって決定され、マークされた情報と、文章情
報格納部５に格納された情報とを組み合わせ、例えば、
一覧表やインデックス、注目箇所のハイライトなどを行
い、要約情報をカスタマイズする機能を有している。The customizing means 13 has a function of customizing the summary information according to the user's instruction: it combines the marked information determined by each of the means from the attention information determining means 2 through the relevance determining means 10 with the information stored in the text information storage unit 5, and produces, for example, a list, an index, or highlighting of points of interest.

【００６０】〈動作〉図１５は、具体例２の動作を示す
フローチャートである。具体例２において、文書情報入
力部１からの入力は、新聞記事や電子メールに代表され
るようなテキスト文字列で構成されるような文書情報で
あり、一括またはあるまとまりで入力される（ステップ
Ｓ２０１）。<Operation> FIG. 15 is a flowchart showing the operation of specific example 2. In specific example 2, the input from the document information input unit 1 is document information consisting of text character strings, typified by newspaper articles and e-mail, and is input all at once or in certain units (step S201).

【００６１】具体例２において、ステップＳ２０１から
ステップＳ２１７までの動作については具体例１におけ
るステップＳ１０１からステップＳ１１７までの動作と
同様であるため、ここでの説明は省略する。In the specific example 2, the operation from step S201 to step S217 is the same as the operation from step S101 to step S117 in the specific example 1, and the description is omitted here.

【００６２】ステップＳ２１７において文章情報格納部
５にマーク付き文書情報が格納されると、あるいはステ
ップＳ２１５において関連性のある注目情報がなかった
場合、次に、要約文生成手段１１およびカスタマイズ手
段１３によって文章情報格納部５からマーク付き文書情
報が参照される（ステップＳ２１８）。ここで、ユーザ
によって要約情報のカスタマイズが指定されているかを
判定し（ステップＳ２１９）、指定されている場合はス
テップＳ２２０に進み、指定されていない場合は、ステ
ップＳ２２１に進む。When the marked document information has been stored in the text information storage unit 5 in step S217, or when no related attention information was found in step S215, the summary sentence generating means 11 and the customizing means 13 refer to the marked document information in the text information storage unit 5 (step S218). It is then determined whether the user has specified customization of the summary information (step S219). If so, the process proceeds to step S220; if not, it proceeds to step S221.

【００６３】ステップＳ２２０では、カスタマイズ手段
１３により、マークされた情報を参照しながら、重要ま
たはユーザが所望する注目情報を組み合わせて、ユーザ
の目的に合った形で要約情報をカスタマイズする。In step S220, the customizing means 13 refers to the marked information, combines the attention information that is important or desired by the user, and customizes the summary information in a form suited to the user's purpose.

【００６４】図１６は、人名リストとしてカスタマイズ
した要約情報の説明図である。図１７は、製品カタログ
としてカスタマイズした要約情報の説明図である。FIG. 16 is an explanatory diagram of summary information customized as a personal name list. FIG. 17 is an explanatory diagram of summary information customized as a product catalog.

【００６５】図１６の例では、人名に対する職業、所属
組織、その組織の場所といった項目で要約情報が生成さ
れている。また、図１７の例では、製品毎に、その製品
の種別や生産元，発売元、立地、その状況といった形で
要約情報が生成されている。In the example of FIG. 16, summary information is generated for each person's name with items such as occupation, affiliated organization, and the location of that organization. In the example of FIG. 17, summary information is generated for each product in the form of the product type, manufacturer, distributor, location, and status.
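A personal name list of the kind described for FIG. 16 can be sketched as a simple tabulation of marked entries. The dictionary keys, the sample entry, and the function name `person_list` are illustrative assumptions; the actual layout of FIG. 16 is not reproduced in the text, and unknown items are left blank rather than filled in.

```python
# Hypothetical marked entries; the keys are illustrative, not taken from FIG. 16.
entries = [
    {"name": "○△", "occupation": "首相", "organization": "日本政府", "location": None},
]

def person_list(entries, columns=("name", "occupation", "organization", "location")):
    """Render marked person entries as tab-separated rows, header row first."""
    rows = ["\t".join(columns)]
    for entry in entries:
        # Items that were never marked are shown as "-" instead of being guessed.
        rows.append("\t".join(entry.get(col) or "-" for col in columns))
    return "\n".join(rows)

table = person_list(entries)
```

A product catalog as in FIG. 17 would follow the same shape with a different set of columns.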

【００６６】一方、ステップＳ２２１では、要約文生成
手段１１により、マークされた情報を参照しながら、重
要またはユーザが所望する注目情報を組み合わせて要約
文を作成する。On the other hand, in step S221, the summary sentence generating means 11 creates a summary sentence by combining important or desired information of interest by the user while referring to the marked information.

【００６７】そして、要約文生成手段１１またはカスタ
マイズ手段１３でカスタマイズされた要約情報を要約情
報出力部１２より出力する（ステップＳ２２２）。Then, the summary information customized by the summary sentence generating means 11 or the customizing means 13 is output from the summary information output section 12 (step S222).
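The branch in steps S219 to S222 can be sketched as a small dispatch function. The callables passed in stand for the summary sentence generating means 11 and the customizing means 13; their signatures and the name `produce_summary` are assumptions made for illustration.

```python
from typing import Callable, Optional

def produce_summary(marked_doc: str,
                    summarize: Callable[[str], str],
                    customize: Optional[Callable[[str], str]] = None) -> str:
    """Choose between customization and plain summarization (steps S219 to S222)."""
    if customize is not None:
        # S219 -> S220: the user specified a customization.
        result = customize(marked_doc)
    else:
        # S219 -> S221: no customization, create an ordinary summary sentence.
        result = summarize(marked_doc)
    # S222: the result is what would be handed to the summary information output unit.
    return result
```

Passing `customize=None` models the case in which the user has not specified any customization.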

【００６８】〈効果〉以上のように、具体例２によれ
ば、具体例１の構成に加えて要約情報をカスタマイズす
る手段を設けたので、具体例１の効果に加えて、次のよ
うな効果が得られる。<Effects> As described above, according to specific example 2, means for customizing the summary information is provided in addition to the configuration of specific example 1, so the following effects are obtained in addition to those of specific example 1.

【００６９】(1)テキスト情報の注目すべき部分を自動
的に判断し、注目情報の関連をユーザの目的に応じてま
とめ、一覧表として出力することができる。(1) A noteworthy portion of text information is automatically determined, and the association of the noticeable information can be summarized according to the purpose of the user and output as a list.

【００７０】(2)文章のままでは計算機の表示装置（要
約情報出力部１２）で一度に表示しきれないような文書
情報であっても、必要な情報を取りこぼすことなく一度
に表示することが可能となる。(2) Even document information that, as plain text, cannot be displayed all at once on the display device of the computer (summary information output unit 12) can be displayed at one time without missing any necessary information.

【００７１】《具体例３》具体例３は、タグ情報を含む
文書情報からタグ情報を抽出するようにしたものであ
る。<< Third Specific Example >> In a third specific example, tag information is extracted from document information including tag information.

【００７２】〈構成〉図１８は、具体例３の文書要約装
置の構成図である。図の装置は、文書情報入力部１、注
目情報決定手段２、文字列パターンデータベース（文字
列パターンＤＢ）３、構文パターンデータベース（構文
パターンＤＢ）４、文章情報格納部５、参照情報決定手
段６、構文解析ルール７、単語辞書８、属性情報決定手
段９、関連性決定手段１０、要約文生成手段１１、要約
情報出力部１２、カスタマイズ手段１３ａ、タグパター
ンデータベース（タグパターンＤＢ）１４、タグ情報抽
出手段１５からなる。ここで、タグパターンデータベー
ス１４、タグ情報抽出手段１５とカスタマイズ手段１３
ａ以外は、具体例２と同様であるため、ここでの説明は
省略する。<Structure> FIG. 18 is a configuration diagram of the document summarizing apparatus of specific example 3. The illustrated apparatus consists of a document information input unit 1, attention information determining means 2, a character string pattern database (character string pattern DB) 3, a syntax pattern database (syntax pattern DB) 4, a text information storage unit 5, reference information determining means 6, a syntax analysis rule 7, a word dictionary 8, attribute information determining means 9, relevance determining means 10, summary sentence generating means 11, a summary information output unit 12, customizing means 13a, a tag pattern database (tag pattern DB) 14, and tag information extracting means 15. Everything other than the tag pattern database 14, the tag information extracting means 15, and the customizing means 13a is the same as in specific example 2, so its description is omitted here.

【００７３】タグパターンデータベース１４は、基本的
なタグセットの書式やパターンなどを格納するデータベ
ースである。タグ情報抽出手段１５は、入力された情報
に例えばＳＧＭＬやＨＴＭＬ、ＸＭＬなどのタグ情報が
付与されているかどうかをタグパターンデータベース１
４にアクセスして判断し、タグ情報が付与されていた場
合は、それらの種類や位置情報を抽出し、文章情報格納
部５に格納する機能を有している。The tag pattern database 14 is a database that stores basic tag set formats, patterns, and the like. The tag information extracting means 15 has a function of accessing the tag pattern database 14 to determine whether tag information such as SGML, HTML, or XML is attached to the input information and, if so, extracting its type and position information and storing them in the text information storage unit 5.

【００７４】また、カスタマイズ手段１３ａは、ユーザ
の指示によって、このタグ情報抽出手段１５から関連性
決定手段１０の各手段によって決定され、マークされた
情報と、文章情報格納部５に格納された情報とを組み合
わせ、例えば、一覧表やインデックス、注目箇所のハイ
ライトなどを行い、要約情報をカスタマイズする機能を
有している。The customizing means 13a has a function of customizing the summary information according to the user's instruction: it combines the marked information determined by each of the means from the tag information extracting means 15 through the relevance determining means 10 with the information stored in the text information storage unit 5, and produces, for example, a list, an index, or highlighting of points of interest.

【００７５】〈動作〉図１９、図２０は、具体例３の動
作を示すフローチャート（その１、その２）である。<Operation> FIGS. 19 and 20 are flowcharts (parts 1 and 2) showing the operation of the third embodiment.

【００７６】具体例３において、文書情報入力部１から
入力される文書は、ＣＤ−ＲＯＭに格納されたＳＧＭＬ
タグ付きの新聞記事やＷＷＷ情報に代表されるようなＨ
ＴＭＬまたはＳＨＴＭＬのようなタグ情報とテキスト文
字列情報で構成されるタグ付き文書情報であり、一括ま
たはあるまとまりで入力される（ステップＳ３０１）。In specific example 3, the document input from the document information input unit 1 is tagged document information consisting of tag information and text character string information, such as SGML-tagged newspaper articles stored on a CD-ROM, or HTML or SHTML as typified by WWW information, and is input all at once or in certain units (step S301).

【００７７】図２１は、入力文書の一例を示す説明図で
ある。図示のようなタグ付き文書情報は、タグ情報抽出
手段１５によってタグパターンデータベース１４が参照
され（ステップＳ３０２）、データベース中のタグパタ
ーンと一致するタグ情報、またはユーザが指示するタグ
情報があるかどうかが調べられる（ステップＳ３０
３）。FIG. 21 is an explanatory diagram showing an example of an input document. For tagged document information such as that illustrated, the tag information extracting means 15 refers to the tag pattern database 14 (step S302) and checks whether the document contains tag information that matches a tag pattern in the database, or tag information designated by the user (step S303).

【００７８】図２２は、タグパターンの一例を示す説明
図である。ステップＳ３０３において、タグパターンデ
ータベース１４に例えば図２２のようなタグパターン情
報が格納されていた場合、これと一致する文書中のタグ
情報はタグと判断され、その種類および属性が把握され
る。FIG. 22 is an explanatory diagram showing an example of a tag pattern. In step S303, if tag pattern information as shown in FIG. 22, for example, is stored in the tag pattern database 14, tag information in a document that matches this is determined to be a tag, and its type and attribute are grasped.

【００７９】例えば、“<DATE>4月1日</DATE>”という
部分の“<DATE>”および</DATE>”はＳＧＭＬまたはＨ
ＴＭＬのタグ情報であると判断され、その間の文字列は
日付に関する情報として把握される。また、“<TITLE>
…</TITLE>”という部分の“<TITLE>”および</TITLE
>”はＳＧＭＬタグ情報であると判断され、その間の文
字列は題目であるとして把握される。更に、タグの種類
が一意に判断されるため、ここでこのタグ付き文書はＳ
ＧＭＬであると把握される。For example, in the part “<DATE>April 1</DATE>”, “<DATE>” and “</DATE>” are judged to be SGML or HTML tag information, and the character string between them is grasped as information about a date. Likewise, in the part “<TITLE>…</TITLE>”, “<TITLE>” and “</TITLE>” are judged to be SGML tag information, and the character string between them is grasped as the title. Furthermore, because the tag type is thereby determined uniquely, this tagged document is recognized here as SGML.
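The lookup against the tag pattern database can be sketched with a regular expression and a small dictionary. The dictionary `TAG_PATTERNS` stands in for the tag pattern database 14, whose real contents (FIG. 22) are not reproduced in the text; the function name and the returned tuple layout are likewise assumptions. The sketch does not handle nested tags of the same name.

```python
import re

# Hypothetical stand-in for the tag pattern database 14: tag name -> meaning.
TAG_PATTERNS = {"DATE": "date", "TITLE": "title"}

# A paired tag <NAME>...</NAME>; the closing tag must repeat the opening name.
TAG_RE = re.compile(r"<(?P<tag>[A-Za-z]+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)

def extract_tags(text, patterns=TAG_PATTERNS):
    """Return (tag, meaning, body, start, end) for each known tag pair in text."""
    found = []
    for m in TAG_RE.finditer(text):
        tag = m.group("tag").upper()
        if tag in patterns:
            # Type and position are recorded, as the description requires.
            found.append((tag, patterns[tag], m.group("body"), m.start(), m.end()))
    return found

hits = extract_tags("<TITLE>○○○改革案</TITLE><DATE>4月1日</DATE>")
```

Each hit carries the tag type, its meaning, the enclosed string, and its position, which is the information the description says is stored in the text information storage unit 5.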

【００８０】把握されたタグ情報は、要約装置内で認識
でき、かつ、処理し易いようなタグ付けなどの手段でマ
ーキングされ（ステップＳ３０４）、文章情報格納部５
に格納される（ステップＳ３０５）。図２３は、タグ情
報がマーキングされたタグ付き文書情報の一例である。The grasped tag information is marked by means, such as tagging, that the summarizing apparatus can recognize and easily process (step S304), and is stored in the text information storage unit 5 (step S305). FIG. 23 is an example of tagged document information in which tag information is marked.

【００８１】図１９に戻って、ステップＳ３０３におい
てタグパターンにマッチする文字列がない場合はステッ
プＳ３０６に進む。以下、ステップＳ３０６からステッ
プＳ３２１までの動作については具体例２におけるステ
ップＳ２０２からステップＳ２１７と同様であるため、
ここでの説明は省略する。Returning to FIG. 19, if no character string matches a tag pattern in step S303, the process proceeds to step S306. The operations from step S306 to step S321 are the same as those from step S202 to step S217 in specific example 2, so their description is omitted here.

【００８２】ステップＳ３２２では、要約文生成手段１
１およびカスタマイズ手段１３によって、文章情報格納
部５からマーク付き文書情報が参照される。次に、ユー
ザによって要約情報のカスタマイズが指定されているか
を判断し（ステップＳ３２３）、指定されている場合は
ステップＳ３２４に進み、指定されていない場合はステ
ップＳ３２５へ進む。In step S322, the summary sentence generating means 11 and the customizing means 13 refer to the marked document information in the text information storage unit 5. Next, it is determined whether the user has specified customization of the summary information (step S323). If so, the process proceeds to step S324; if not, it proceeds to step S325.

【００８３】要約文生成手段１１は、マークされた情報
を参照しながら、重要またはユーザが所望する注目情報
を組み合わせて要約文を作成し、必要な箇所にはタグ情
報も入力時のタグの種類に合わせて付与する（ステップ
Ｓ３２５）。The summary sentence generating means 11 creates a summary sentence by combining the attention information that is important or desired by the user while referring to the marked information, and attaches tag information where necessary, in accordance with the tag type used in the input (step S325).

【００８４】カスタマイズ手段１３では、マークされた
情報を参照しながら、重要またはユーザが所望する注目
情報を組み合わせて、例えば上述した図１６のような人
名リストであったり、図１７のような製品カタログであ
るというように、ユーザの目的にあった形で要約情報を
カスタマイズし、必要な箇所にはタグ情報も入力時のタ
グの種類に合わせて付与する（ステップＳ３２４）。The customizing means 13 refers to the marked information, combines the attention information that is important or desired by the user, and customizes the summary information in a form suited to the user's purpose, for example a personal name list as in FIG. 16 or a product catalog as in FIG. 17, attaching tag information where necessary in accordance with the tag type used in the input (step S324).

【００８５】最後に、要約文生成手段１１およびカスタ
マイズ手段１３によって、ユーザが所望する注目情報を
考慮した要約情報が、ユーザの所望する形態（通常のテ
キスト文書またはタグ付き文書）として要約情報出力部
１２より出力される（ステップＳ３２６）。Finally, the summary information produced by the summary sentence generating means 11 and the customizing means 13, which takes into account the attention information desired by the user, is output from the summary information output unit 12 in the form the user desires (an ordinary text document or a tagged document) (step S326).

【００８６】〈効果〉以上のように、具体例３によれ
ば、具体例２の構成に加えて、タグ情報を含む文書情報
からタグ情報を抽出する手段を設けたので、具体例２の
効果に加えて、更に次のような効果が得られる。<Effects> As described above, according to specific example 3, means for extracting tag information from document information containing tag information is provided in addition to the configuration of specific example 2, so the following effects are obtained in addition to those of specific example 2.

【００８７】(1)ＳＧＭＬやＨＴＭＬ、ＸＭＬなどのよ
うなタグ付き文書であっても、通常の文書情報と同様に
注目情報を把握し、要約情報を提供することができる。
(2)タグパターン情報を保持しているため、通常の文書
情報の要約情報にタグを付与して出力したり、タグ付き
文書の要約情報を異種のタグ付き情報として提供するこ
とができる。(1) Even for a tagged document such as SGML, HTML, or XML, attention information can be grasped and summary information can be provided in the same manner as ordinary document information.
(2) Since the tag pattern information is held, it is possible to add a tag to the summary information of normal document information and to output the summary information, or to provide the summary information of a tagged document as heterogeneous tagged information.

【００８８】《具体例４》具体例４は、文書中に画像情
報が含まれている場合は画像情報も抽出するようにした
ものである。<< Specific Example 4 >> In the specific example 4, when image information is included in a document, the image information is also extracted.

【００８９】〈構成〉図２４は、具体例４の文書要約装
置の構成図である。図の装置は、文書情報入力部１、注
目情報決定手段２、文字列パターンデータベース（文字
列パターンＤＢ）３、構文パターンデータベース（構文
パターンＤＢ）４、文章情報格納部５、参照情報決定手
段６、構文解析ルール７、単語辞書８、属性情報決定手
段９、関連性決定手段１０、要約文生成手段１１ａ、要
約情報出力部１２、カスタマイズ手段１３ｂ、タグパタ
ーンデータベース（タグパターンＤＢ）１４、タグ情報
抽出手段１５、画像情報抽出手段１６、画像情報格納部
１７からなる。ここで、要約文生成手段１１ａ、カスタ
マイズ手段１３ｂ、画像情報抽出手段１６、画像情報格
納部１７以外は、具体例３と同様であるため、ここでの
説明は省略する。<Structure> FIG. 24 is a diagram showing the structure of a document summarizing apparatus according to the fourth embodiment. The apparatus shown in the figure includes a document information input unit 1, attention information determining means 2, a character string pattern database (character string pattern DB) 3, a syntax pattern database (syntax pattern DB) 4, a text information storage unit 5, and a reference information determining means 6. , Syntax analysis rules 7, word dictionary 8, attribute information determination means 9, relevance determination means 10, summary sentence generation means 11a, summary information output unit 12, customization means 13b, tag pattern database (tag pattern DB) 14, tag information It comprises an extracting unit 15, an image information extracting unit 16, and an image information storage unit 17. Here, except for the summary sentence generation unit 11a, the customization unit 13b, the image information extraction unit 16, and the image information storage unit 17, the description is omitted because it is the same as that of the specific example 3.

【００９０】画像情報抽出手段１６は、文章情報格納部
５に格納されているマーク付き文書情報を参照し、文書
内の画像情報を調べる機能を有している。タグパターン
データベース１４は、基本的なタグセットの書式やパタ
ーンなどを格納するデータベースである。タグ情報抽出
手段１５は、入力された情報に例えばＳＧＭＬやＨＴＭ
Ｌ、ＸＭＬなどのタグ情報が付与されているかどうかを
タグパターンデータベース１４にアクセスして判断し、
タグ情報が付与されていた場合は、それらの種類や位置
情報を抽出し、文章情報格納部５に格納する機能を有し
ている。The image information extracting means 16 has a function of referring to the marked document information stored in the text information storage unit 5 and checking for image information in the document. The tag pattern database 14 is a database that stores basic tag set formats, patterns, and the like. The tag information extracting means 15 has a function of accessing the tag pattern database 14 to determine whether tag information such as SGML, HTML, or XML is attached to the input information and, if so, extracting its type and position information and storing them in the text information storage unit 5.

【００９１】また、要約文生成手段１１ａは、タグ情報
抽出手段１５から関連性決定手段１０までの各手段によ
って決定されマークされた情報と、文章情報格納部５お
よび画像情報格納部１７に格納された情報とを組み合わ
せ、必要に応じて画像位置情報を埋め込んで画像を添付
した形の要約として生成する機能を有している。The summary sentence generating means 11a has a function of combining the marked information determined by each of the means from the tag information extracting means 15 through the relevance determining means 10 with the information stored in the text information storage unit 5 and the image information storage unit 17, embedding image position information as necessary, and generating a summary with images attached.

【００９２】更に、カスタマイズ手段１３ｂは、ユーザ
の指示によって、タグ情報抽出手段１５から関連性決定
手段１０の各手段によって決定されマークされた情報
と、文章情報格納部５および画像情報格納部１７に格納
された情報とを組み合わせ、また、必要に応じて画像位
置情報を埋め込んで画像を添付した形で、例えば、一覧
表やインデックス、注目箇所のハイライトなどを行い、
要約情報をカスタマイズする機能を有している。Further, the customizing means 13b has a function of customizing the summary information according to the user's instruction: it combines the marked information determined by each of the means from the tag information extracting means 15 through the relevance determining means 10 with the information stored in the text information storage unit 5 and the image information storage unit 17, embeds image position information as necessary so that images are attached, and produces, for example, a list, an index, or highlighting of points of interest.

【００９３】〈動作〉図２５、図２６は、具体例４の動
作を示すフローチャート（その１、その２）である。<Operation> FIGS. 25 and 26 are flowcharts (Nos. 1 and 2) showing the operation of the fourth embodiment.

【００９４】具体例４において、文書情報入力部１から
入力される文書は、ＣＤ−ＲＯＭに格納されたＳＧＭＬ
タグ付きの新聞記事やＷＷＷ情報に代表されるようなＨ
ＴＭＬまたはＳＨＴＭＬのようなタグ情報とテキスト文
字列情報で構成されるタグ付き文書情報であり、一括ま
たはあるまとまりで入力される（ステップＳ４０１）。In specific example 4, the document input from the document information input unit 1 is tagged document information consisting of tag information and text character string information, such as SGML-tagged newspaper articles stored on a CD-ROM, or HTML or SHTML as typified by WWW information, and is input all at once or in certain units (step S401).

【００９５】入力されたタグ付き文書情報は、タグ情報
抽出手段１５によってタグパターンデータベース１４が
参照され（ステップＳ４０２）、データベース中のタグ
パターンと一致するタグ情報、または、ユーザが指示す
るタグ情報があるかどうかが調べられる（ステップＳ４
０３）。For the input tagged document information, the tag information extracting means 15 refers to the tag pattern database 14 (step S402) and checks whether the document contains tag information that matches a tag pattern in the database, or tag information designated by the user (step S403).

【００９６】ステップＳ４０３からステップＳ４０５の
動作については、具体例３におけるステップＳ３０３か
らステップＳ３０５の動作と同様である。The operations in steps S403 to S405 are the same as the operations in steps S303 to S305 in the third embodiment.

【００９７】次に、画像情報抽出手段１６では、文章情
報格納部５に格納されているマークされた文書情報を参
照して（ステップＳ４０６）、文書内の画像情報を調べ
る（ステップＳ４０７）。その結果、マーク文書内に画
像または画像表示を指定するような部分が見つかった場
合、画像情報抽出手段１６は画像表示部分とそれに対応
する文字列部分をセットにして画像情報格納部１７に格
納する（ステップＳ４０８、Ｓ４０９）。Next, the image information extracting means 16 refers to the marked document information stored in the text information storage unit 5 (step S406) and checks for image information in the document (step S407). If an image, or a portion that designates the display of an image, is found in the marked document, the image information extracting means 16 stores the image display portion and the corresponding character string portion as a set in the image information storage unit 17 (steps S408 and S409).

【００９８】図２７は、タグ付き文書情報の一例を示す
説明図である。図２８は、画像情報格納部１７に格納さ
れた画像情報の一例を示す説明図である。FIG. 27 is an explanatory diagram showing an example of tagged document information. FIG. 28 is an explanatory diagram illustrating an example of the image information stored in the image information storage unit 17.

【００９９】例えば、図２７に示すような文において、
タグが付与された文字列部分“ＯＫＴＡＣ４５０００”
と画像情報の位置を示すリンク情報“<img name=“…ok
tac45000.gif”>”の部分をセットにして図２８に示す
ように画像情報格納部１７に格納する。一方、画像情報
部分が見つからない場合は、そのままステップＳ４１０
に進む。For example, in a sentence such as that shown in FIG. 27, the tagged character string portion “OKTAC45000” and the link information “<img name=“…oktac45000.gif”>”, which indicates the location of the image information, are stored as a set in the image information storage unit 17 as shown in FIG. 28. If no image information portion is found, the process proceeds directly to step S410.
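The pairing of a tagged character string with its image link can be sketched as follows. Several points are assumptions made for illustration: that the tagged string immediately precedes the `<img name="...">` element, that it is wrapped in a tag named `PRODUCT` (hypothetical; FIG. 27 is not reproduced in the text), and that the image store is a plain list of records. The elided part of the file path is kept as “…” exactly as in the description.

```python
import re

# Assumption: a tagged string directly followed by an <img name="..."> link
# forms one (text, image) set for the image information storage unit.
PAIR_RE = re.compile(
    r'<(?P<tag>[A-Za-z]+)>(?P<text>[^<]+)</(?P=tag)>\s*<img\s+name="(?P<link>[^"]+)"\s*>',
    re.IGNORECASE,
)

def extract_image_pairs(marked_text):
    """Return one {"text", "image"} record per tagged string and image link pair."""
    return [{"text": m.group("text"), "image": m.group("link")}
            for m in PAIR_RE.finditer(marked_text)]

# "PRODUCT" is a hypothetical tag name; the path prefix is elided in the source.
sample = '<PRODUCT>OKTAC45000</PRODUCT><img name="…oktac45000.gif">'
store = extract_image_pairs(sample)
```

When no image link is present the function returns an empty list, which corresponds to proceeding directly to step S410.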

【０１００】以下、ステップＳ４１０からステップＳ４
２５までの動作については、具体例３におけるステップ
Ｓ３０６からステップＳ３２１までの動作と同様であ
る。The operations from step S410 to step S425 are the same as those from step S306 to step S321 in specific example 3.

【０１０１】ステップＳ４０１からステップＳ４２５ま
での動作終了後、要約文生成手段１１およびカスタマイ
ズ手段１３ｂによって、文章情報格納部５からマーク付
き文書情報が参照される（ステップＳ４２７）。ここ
で、ユーザによって要約情報のカスタマイズが指定され
ている場合はステップＳ４２８に進み、指定されていな
い場合はステップＳ４２９に進む。After the operation from step S401 to step S425 is completed, the summary sentence generating means 11 and the customizing means 13b refer to the marked document information from the text information storage section 5 (step S427). Here, if customization of the summary information is specified by the user, the process proceeds to step S428, and if not, the process proceeds to step S429.

【０１０２】ステップＳ４２８では、要約文生成手段１
１が、マークされた情報を参照しながら、重要またはユ
ーザが所望する注目情報を組み合わせ、必要に応じて画
像情報を添えて要約文を作成し、必要な箇所にはタグ情
報も入力時のタグの種類に合わせて付与される。In step S428, the summary sentence generating means 11 refers to the marked information, combines the attention information that is important or desired by the user, and creates a summary sentence accompanied by image information as necessary; tag information is also attached where necessary, in accordance with the tag type used in the input.

【０１０３】一方、ステップＳ４２９では、カスタマイ
ズ手段１３ｂによりマークされた情報を参照しながら、
重要またはユーザの注目情報を組み合わせて、例えば、
人名リストであったり、製品カタログであるといったよ
うに、ユーザの目的にあった形で画像情報を添えて要約
情報をカスタマイズし、必要な箇所にはタグ情報も入力
時のタグの種類に合わせて付与される。In step S429, on the other hand, the customizing means 13b refers to the marked information, combines the attention information that is important or of interest to the user, and customizes the summary information, accompanied by image information, in a form suited to the user's purpose, such as a personal name list or a product catalog; tag information is also attached where necessary, in accordance with the tag type used in the input.

【０１０４】図２９は、人名リストとしてカスタマイズ
した要約情報の説明図である。図３０は、製品カタログ
としてカスタマイズした要約情報の説明図である。FIG. 29 is an explanatory diagram of summary information customized as a personal name list. FIG. 30 is an explanatory diagram of summary information customized as a product catalog.

【０１０５】このようにして、要約文生成手段１１およ
びカスタマイズ手段１３ｂによって、ユーザが所望する
注目情報を考慮した要約情報が、ユーザの所望する形態
（通常のテキスト文書またはタグ付き文書）として得ら
れる（ステップＳ４３０）。得られた要約情報には画像
情報が付与されており、タグ付き文書の場合はサンプル
の縮小画像などが添付された形で要約情報出力部１２よ
り提供される。In this way, the summary sentence generating means 11 and the customizing means 13b obtain the summary information that takes into account the attention information desired by the user, in the form the user desires (an ordinary text document or a tagged document) (step S430). Image information is attached to the obtained summary information; in the case of a tagged document, it is provided from the summary information output unit 12 with a reduced sample image or the like attached.

【０１０６】ユーザは、必要に応じて、縮小画像または
参照部分を指定することで、図２９や図３０に示すよう
に画像情報が保持されている文書情報を参照することが
できる。The user can refer to the document information holding the image information as shown in FIGS. 29 and 30 by designating the reduced image or the reference portion as necessary.

【０１０７】〈効果〉以上のように具体例４によれば、
具体例３の構成に加えて、文書中に含まれる画像情報を
抽出する画像情報抽出手段を設けるようにしたので、具
体例３の効果に加えて、更に次のような効果が得られ
る。<Effects> As described above, according to specific example 4, image information extracting means for extracting image information contained in a document is provided in addition to the configuration of specific example 3, so the following effects are obtained in addition to those of specific example 3.

【０１０８】(1)テキスト情報の注目すべき部分を自動
的に判断し、更に、要約を表示する際に関連する画像情
報を同じに表示させることができる。 (2)検索結果を要約して表示するときに、画像情報を挿
し絵として付与することで、文章だけでは伝わりにくい
情報（色、形）などを感覚的に伝達することが可能とな
る。(1) A noteworthy portion of the text information is automatically determined, and related image information can be displayed together with the summary when the summary is displayed. (2) When search results are summarized and displayed, attaching image information as illustrations makes it possible to convey intuitively information (such as color and shape) that is difficult to convey with text alone.

【０１０９】尚、上記具体例４では、要約情報に画像を
付加するようにしたが、入力文書中に音声情報が含まれ
ていた場合は、画像情報と同様な方法で音声情報を要約
に付加するようにしてもよい。In specific example 4 above, images are added to the summary information; however, if the input document contains audio information, the audio information may be added to the summary in the same manner as the image information.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing specific example 1 of the document summarizing apparatus of the present invention.

FIG. 2 is a flowchart showing the operation of the document summarizing apparatus of specific example 1.

FIG. 3 is an explanatory diagram showing an example of input information.

FIG. 4 is an explanatory diagram of the character string patterns stored in the character string pattern database.

FIG. 5 is an explanatory diagram showing an example of document information in which attention information is marked.

FIG. 6 is an explanatory diagram showing an example of document information in which reference information is marked.

FIG. 7 is an explanatory diagram showing an example of document information in which attribute information is marked.

FIG. 8 is an explanatory diagram showing an example of a syntax pattern in the syntax pattern database 4.

FIG. 9 is an explanatory diagram showing another example of document information in which attribute information is marked.

FIG. 10 is an explanatory diagram showing an example of a syntax pattern.

FIG. 11 is an explanatory diagram of other document information in which attribute information is marked.

FIG. 12 is an explanatory diagram of document information in which attribute information is marked.

FIG. 13 is an explanatory diagram showing an example of summary information.

FIG. 14 is a configuration diagram of the document summarizing apparatus of specific example 2.

FIG. 15 is a flowchart showing the operation of specific example 2.

FIG. 16 is an explanatory diagram of summary information customized as a personal name list.

FIG. 17 is an explanatory diagram of summary information customized as a product catalog.

FIG. 18 is a configuration diagram of the document summarizing apparatus of specific example 3.

FIG. 19 is a flowchart (part 1) showing the operation of specific example 3.

FIG. 20 is a flowchart (part 2) showing the operation of specific example 3.

FIG. 21 is an explanatory diagram showing an example of an input document.

FIG. 22 is an explanatory diagram showing an example of a tag pattern.

FIG. 23 is an explanatory diagram showing an example of tagged document information in which tag information is marked.

FIG. 24 is a configuration diagram of the document summarizing apparatus of specific example 4.

FIG. 25 is a flowchart (part 1) showing the operation of specific example 4.

FIG. 26 is a flowchart (part 2) showing the operation of specific example 4.

FIG. 27 is an explanatory diagram showing an example of tagged document information.

FIG. 28 is an explanatory diagram showing an example of the image information stored in the image information storage unit 17.

FIG. 29 is an explanatory diagram of summary information customized as a personal name list.

FIG. 30 is an explanatory diagram of summary information customized as a product catalog.

[Explanation of symbols]

1 Document information input unit
2 Attention information determining means
5 Sentence information storage unit
6 Reference information determining means
9 Attribute information determining means
10 Relevance determining means
11 Summary sentence generating means
13, 13a, 13b Customizing means
15 Tag information extracting means
16 Image information extracting means
17 Image information storage unit

Continuation of front page: (72) Inventor: Mitsuo Shimohata, c/o Oki Electric Industry Co., Ltd., 1-7-12 Toranomon, Minato-ku, Tokyo. F-term (reference): 5B075 ND03 NK02 NS01

Claims

[Claims]

1. A document summarizing apparatus comprising: attention information determining means for extracting attention information from character strings in an input document based on a predetermined rule; reference information determining means for, when a character string in a reference relationship with the attention information exists in the input document, adding reference information indicating this to the input document; relevance determining means for, when a character string related to the attention information exists in the input document, adding related information indicating this to the input document; and summary sentence generating means for generating a summary sentence of the input document based on the attention information, the reference information, and the related information.

2. The document summarizing apparatus according to claim 1, further comprising customizing means for, when an arbitrary instruction is received from a user, generating a summary of the input document that reflects the user's instruction, based on the attention information, the reference information, and the related information.

3. The document summarizing apparatus according to claim 1, comprising: tag information extracting means for determining whether tag information is added to the input document and, if tag information is added, extracting the type and attribute of the tag information; summary sentence generating means for generating a summary of the input document based on the attention information and the type and attribute of the tag information extracted by the tag information extracting means; and customizing means for, when an arbitrary instruction is received from a user, generating a summary of the input document that reflects the user's instruction, based on the attention information, the reference information, the related information, and the type and attribute of the tag information.

4. The document summarizing apparatus according to claim 3, comprising: image information extracting means for determining whether image information to which tag information is added exists in the input document and, if such image information exists, extracting that image information; summary sentence generating means for generating a summary of the input document based on the attention information and the type and attribute of the tag information extracted by the tag information extracting means and, when image information has been extracted by the image information extracting means, adding the image; and customizing means for, when an arbitrary instruction is received from a user, generating a summary of the input document that reflects the user's instruction, based on the attention information, the reference information, the related information, and the type and attribute of the tag information and, when image information has been extracted by the image information extracting means, adding the image.
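
The flow recited in claims 1 and 2 can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: the "predetermined rule" is modeled as a list of regular expressions, a reference relationship as a pronoun pointing to the most recent attention string, and relatedness as a shared word with an earlier attention string. The class, the function names, and the `kinds` parameter (standing in for a user instruction) are hypothetical and do not come from the specification.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    attention: list = field(default_factory=list)  # strings matched by a rule
    reference: list = field(default_factory=list)  # antecedents of referring words
    related: list = field(default_factory=list)    # earlier attention strings echoed here


def determine_attention(sentences, patterns):
    """Mark strings that match any rule (assumed here to be regular expressions)."""
    for s in sentences:
        for pattern in patterns:
            s.attention.extend(m.group(0) for m in re.finditer(pattern, s.text))


def determine_reference(sentences, referring_words):
    """Assumption: a referring word points at the most recent attention string."""
    antecedent = None
    for s in sentences:
        words = set(re.findall(r"\w+", s.text.lower()))
        if antecedent is not None and words & referring_words:
            s.reference.append(antecedent)
        if s.attention:
            antecedent = s.attention[-1]


def determine_relevance(sentences):
    """Assumption: a sentence is related if it repeats a word of an earlier attention string."""
    seen = []
    for s in sentences:
        words = set(re.findall(r"\w+", s.text))
        if not s.attention:
            for a in seen:
                keys = {w for w in re.findall(r"\w+", a) if len(w) > 2}
                if keys & words:
                    s.related.append(a)
        seen.extend(s.attention)


def summarize(texts, patterns, referring_words,
              kinds=("attention", "reference", "related")):
    """Keep sentences carrying any requested mark; `kinds` models a user instruction."""
    sentences = [Sentence(t) for t in texts]
    determine_attention(sentences, patterns)
    determine_reference(sentences, referring_words)
    determine_relevance(sentences)
    return " ".join(s.text for s in sentences if any(getattr(s, k) for k in kinds))


texts = [
    "Mr. Tanaka joined Oki in 1990.",
    "He designed the summarizer.",
    "The weather was fine.",
    "Tanaka later moved to Tokyo.",
]
rules = [r"Mr\. [A-Z]\w+"]
pronouns = {"he", "she", "it"}
print(summarize(texts, rules, pronouns))
print(summarize(texts, rules, pronouns, kinds=("attention",)))
```

Restricting `kinds` is one simple way to picture the customization of claim 2: the same marked document yields a different summary depending on which kinds of marks the user asks for.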