JPH10340271A

JPH10340271A - Document abstract preparation device, and storage medium where document abstract generation program is recorded

Info

Publication number: JPH10340271A
Application number: JP9150575A
Authority: JP
Inventors: Yasuhiro Ishitobi; 康浩石飛; Yoshihiro Ueda; 良寛上田
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1997-06-09
Filing date: 1997-06-09
Publication date: 1998-12-22

Abstract

PROBLEM TO BE SOLVED: To provide a document abstract preparation device that prepares, by a simple operation and in accordance with the retrieval intention, an abstract for judging whether a document obtained by document retrieval is needed. SOLUTION: To prepare the abstract of a document, a document division means 1 divides the document data into document elements. A retrieval intention holding means 2 holds the input retrieval intention. An adaptability calculation means 3 calculates the adaptability of each divided document element to the retrieval intention. A document element extraction means 4 extracts adapted document elements from the document elements based on the calculated adaptability. An abstract generation means 5 generates the abstract of the document from the adapted document elements.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document abstract creation apparatus and to a storage medium storing a document abstract creation program, and in particular to a document abstract creation apparatus that creates an abstract of a document and a storage medium storing a document abstract creation program that creates an abstract of a document.

[0002]

[Prior Art] Various methods have been devised for retrieving documents as needed from a large volume of document data. As the volume of document data to be searched grows, faster and more appropriate search processing is required, and today document retrieval using computers and the like is also in use.

[0003] The proportion of needed documents among the results of a document search is called precision. Various techniques for raising precision are being developed, but in the end the user must read through each retrieved document to judge whether it is needed.

[0004] However, when a large number of documents is retrieved, the time and effort this judgment takes become enormous. There is therefore a demand to present the retrieved documents as abstracts in order to support this judgment.

[0005] As one method of creating abstracts of retrieved documents, Japanese Unexamined Patent Publication No. H2-93866 discloses registering important words (keywords) in advance for each document field and extracting the sentences that contain those keywords.

[0006] However, this method can create abstracts only for documents that belong to the limited fields for which keywords have been registered in advance. In addition, the user must keep registering additional keywords, and that effort is not negligible.

[0007] In the method described in "Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences" (Zechner, COLING'96), a table of weighted keywords is built. To create an abstract, the sum of the weights of the keywords contained in each sentence of the document is computed and used as that sentence's score. A fixed number of high-scoring sentences are then extracted and arranged in order of appearance to form the abstract of the document.
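The sentence-scoring approach described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function name, the plain substring matching, and the weight table passed in by the caller are hypothetical and are not taken from the cited paper.

```python
def extract_by_keyword_weight(sentences, weights, k):
    """Return the k highest-scoring sentences in their order of appearance.

    sentences: list of sentence strings.
    weights:   dict mapping keyword -> weight (hypothetical weight table).
    A sentence's score is the sum of the weights of the keywords it contains.
    """
    scored = []
    for index, sentence in enumerate(sentences):
        # Assumption: a keyword "occurs" if it appears as a substring.
        score = sum(w for keyword, w in weights.items() if keyword in sentence)
        scored.append((score, index))
    # Highest score first; ties are broken in favor of the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Restore document order before returning.
    return [sentences[i] for i in sorted(i for _, i in top)]
```

Because the score is a plain sum, a long sentence that happens to contain many keywords outranks a short, focused one, which is the weakness the following paragraph points out.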

[0008] However, because this method computes a score mechanically for every sentence regardless of sentence length, the score calculated from the keyword weights does not necessarily represent the importance of the sentence appropriately.

[0009] Moreover, all of the methods described above create the abstract without regard to the intention or purpose of the user's document search, so information the user needs may lie in the parts that are not extracted. This is insufficient as support for judging whether a document obtained by document retrieval matches the user's intention.

[0010] It has therefore been proposed to use the keywords, search expressions, and the like that are entered for document retrieval when creating the abstract. Japanese Unexamined Patent Publication No. H6-309368 discloses a method that, in the course of searching structured documents, extracts the sentences containing the character strings entered for the search and lists the extracted sentences to form an abstract.

[0011] Japanese Unexamined Patent Publication No. H7-182373 discloses a technique that analyzes in detail the context structure of the documents obtained by a search and uses the result to present background and examples related to a topic specified by the user, thereby creating an abstract from the viewpoint the user desires.

[0012]

[Problems to Be Solved by the Invention] However, with the method that extracts and lists the sentences containing the character strings entered for document retrieval, when those strings appear frequently in a document the abstract becomes huge, hardly smaller than the original document.

[0013] With the technique that analyzes context structure to create an abstract from the viewpoint the user desires, the position that the user-specified topic occupies within the original document is analyzed in detail. When the search returns a large number of documents, complicated operations are therefore required merely to create the abstracts.

[0014] The present invention has been made in view of these points, and its object is to provide a document abstract creation apparatus that creates, by a simple operation and in accordance with the user's intention, an abstract for judging whether a document obtained by document retrieval is needed.

[0015] Another object of the present invention is to provide a storage medium storing a document abstract creation program that causes a computer to create, by a simple operation and in accordance with the user's intention, an abstract for judging whether a document obtained by document retrieval is needed.

[0016]

[Means for Solving the Problems] To solve the above problems, the present invention provides a document abstract creation apparatus for creating an abstract of a document, comprising: document dividing means for dividing the document into document elements; user intention holding means for holding an input user intention; relevance calculating means for calculating the relevance of each divided document element to the user intention; document element extracting means for extracting relevant document elements from the document elements based on the relevance; and abstract creating means for creating an abstract of the document from the relevant document elements.

[0017] When an abstract of a document is created with this apparatus, the document dividing means divides the document into document elements, and the user intention holding means holds the input user intention. The relevance calculating means calculates the relevance of each divided document element to the user intention. The document element extracting means extracts relevant document elements from the document elements based on the relevance. The abstract creating means then creates the abstract of the document from the relevant document elements.

[0018] In this way, the first document abstract creation apparatus of the present invention divides a document into document elements and extracts the elements that are highly relevant to the user intention to create the abstract, so an abstract for judging whether the document is needed can be created by a simple operation in accordance with the user's intention.

[0019] To solve the above problems, the present invention also provides a document abstract creation apparatus for creating an abstract of a document, comprising: user intention holding means for holding an input user intention; important keyword extracting means for extracting important keywords from the user intention; important sentence extracting means for extracting from the document the important sentences that contain the important keywords; and abstract creating means for creating an abstract of the document from the important sentences.

[0020] When an abstract of a document is created with this apparatus, the user intention holding means holds the input user intention, and the important keyword extracting means extracts important keywords from the user intention. The important sentence extracting means extracts from the document the important sentences that contain the important keywords. The abstract creating means then creates the abstract of the document from the important sentences.
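A minimal sketch of this second apparatus follows. The text here does not say how important keywords are derived from the user intention, so the sketch assumes the intention is a Boolean search expression and treats every token that is not an operator or a parenthesis as a keyword. Both function names are hypothetical.

```python
import re

# Assumption: these words are operators of the search expression, not keywords.
OPERATORS = {"and", "or", "not"}

def extract_keywords(search_expression):
    """Return the distinct non-operator tokens of the expression, in order."""
    tokens = re.findall(r"[^\s()]+", search_expression)
    keywords = []
    for token in tokens:
        if token.lower() not in OPERATORS and token not in keywords:
            keywords.append(token)
    return keywords

def extract_important_sentences(sentences, keywords):
    """Keep the sentences that contain at least one keyword (substring match)."""
    return [s for s in sentences if any(k in s for k in keywords)]
```

For example, `extract_keywords("CompanyA and joint and (development or research)")` yields the four tokens `CompanyA`, `joint`, `development`, and `research`. Keywords that contain spaces would need a richer tokenizer than this one.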

[0021] In this way, the second document abstract creation apparatus of the present invention extracts from a document the important sentences that contain the important keywords to create the abstract, so an abstract for judging whether the document is needed can be created by a simple operation in accordance with the user's intention.

[0022] To solve the above problems, the present invention further provides a storage medium storing a document abstract creation program for creating an abstract of a document, the program causing a computer to function as: document dividing means for dividing the document into document elements; user intention holding means for holding an input user intention; relevance calculating means for calculating the relevance of each divided document element to the user intention; document element extracting means for extracting relevant document elements from the document elements based on the relevance; and abstract creating means for creating an abstract of the document from the relevant document elements.

[0023] When this document abstract creation program causes a computer to create an abstract of a document, the computer functions as document dividing means that divides the document into document elements and as user intention holding means that holds the input user intention. The computer also functions as relevance calculating means that calculates the relevance of each divided document element to the user intention and as document element extracting means that extracts relevant document elements from the document elements based on the relevance. The computer further functions as abstract creating means that creates the abstract of the document from the relevant document elements.

[0024] In this way, the document abstract creation program of the present invention causes a computer to realize the function of extracting from a document the important sentences that contain the important keywords and creating an abstract of the document, so an abstract for judging whether the document is needed can be created by a simple operation in accordance with the user's intention.

[0025]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the principle configuration of the first document abstract creation apparatus of the present invention. The first document abstract creation apparatus comprises document dividing means 1, user intention holding means 2, relevance calculating means 3, document element extracting means 4, and abstract creating means 5, and creates from document data an abstract that matches the input user intention.

[0026] The document dividing means 1 divides the document data into document elements A, B, C, and so on. The user intention holding means 2 holds the input user intention. The relevance calculating means 3 calculates the relevance of each of the divided document elements A, B, C, and so on to the user intention. When the relevance calculated for document elements A and C is high, the document element extracting means 4 extracts document elements A and C as relevant document elements based on that relevance. The abstract creating means 5 creates the abstract of the document from the extracted relevant document elements A and C.

[0027] In this way, the first document abstract creation apparatus of the present invention divides a document into document elements and extracts the elements that are highly relevant to the user intention to create the abstract, so an abstract for judging whether the document is needed can be created by a simple operation in accordance with the user's intention.

[0028] FIG. 2 shows an embodiment of a document search device to which the first document abstract creation apparatus of the present invention is applied. The document search device 10 shown in the figure is connected to an input device 20, which accepts input from the user, and to an output device 30, which outputs the created abstract to the user. The document search device 10 may be realized by a computer or the like. The output device 30 may consist solely of a display device such as a monitor.

[0029] The document search device 10 comprises a document data storage unit 11, a document designation unit 12, a document division unit 13, a user intention storage unit 14, a relevance calculation unit 15, a relevance storage unit 16, and an abstract creation unit 17.

[0030] The document data storage unit 11 stores input document data and, when a designation of document data is received, outputs the corresponding document data. The document data stored in the document data storage unit 11 may be a group of documents obtained as the result of a document search. The document designation unit 12 accepts the designation of the document data, among the stored document data, for which an abstract is to be created.

[0031] The document division unit 13 divides the designated and output document data into document elements. The document division unit 13 defines several division units for dividing a document, and when abstract creation does not proceed successfully it changes the division unit and divides the document data again.

[0032] The user intention storage unit 14 accepts and stores the user intention, that is, the purpose for which the abstract of the document is being created. The relevance calculation unit 15 calculates, for each of the divided document elements, its relevance to the user intention. The relevance storage unit 16 stores the correspondence between the calculated relevance values and the document elements.

[0033] The abstract creation unit 17 extracts relevant document elements from the document elements based on the relevance values stored in the relevance storage unit 16 and creates the abstract of the document. Next, the procedure by which the document search device 10 creates an abstract of a document is described.

[0034] FIG. 3 is a flowchart explaining the procedure for creating a document abstract in the document search device 10 shown in FIG. 2. It is assumed that document data has been stored in the document data storage unit 11 in advance. The description below follows the step numbers in the figure.
[S1] The user intention storage unit 14 accepts, via the input device 20, a search expression representing the user intention, that is, the purpose for which the abstract is created, and stores it.
[S2] The document designation unit 12 accepts, via the input device 20, the designation of the document to be abstracted and passes it to the document data storage unit 11.
[S3] The document division unit 13, supplied with the designated document from the document data storage unit 11, divides the document into sentence-level document elements.
[S4] The relevance calculation unit 15 obtains the user intention from the user intention storage unit 14 and calculates the relevance of every document element received from the document division unit 13. Each calculated relevance is stored in the relevance storage unit 16 together with its correspondence to the document element.
[S5] The abstract creation unit 17 checks the relevance of each document element stored in the relevance storage unit 16 to determine whether any document element has a relevance of 1.0. If such an element exists, the process proceeds to step S6. If not, the process proceeds to step S7.
[S6] The divided document elements have been supplied to the abstract creation unit 17 from the document division unit 13, so the abstract creation unit 17 extracts those with a relevance of 1.0 as relevant document elements.
[S7] The abstract creation unit 17 determines, from the current division unit of the document elements, whether the division unit can be enlarged. If it can, the process proceeds to step S8. If the division unit is already the largest and cannot be enlarged further, the process proceeds to step S9.
[S8] The abstract creation unit 17 requests the document division unit 13 to change the division unit. On receiving the request, the document division unit 13 re-divides the document supplied from the document data storage unit 11 using the larger division unit, and the process returns to step S4. In this re-division, a current division unit of "sentence" becomes "paragraph", "paragraph" becomes "subsection", "subsection" becomes "section", and "section" becomes "chapter".
[S9] The divided document elements have been supplied to the abstract creation unit 17 from the document division unit 13, so the abstract creation unit 17 extracts those whose relevance is the maximum value as relevant document elements.
[S10] The abstract creation unit 17 creates the abstract of the document from the extracted relevant document elements and outputs the created abstract to the output device 30.
[S11] It is determined whether an abstract of the next document is to be created with the same search expression. If so, the process proceeds to step S2. If not, the processing of this flowchart ends.
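The control flow of steps S3 to S9 can be sketched as a loop that tries division units from finest to coarsest. This is an illustrative sketch, not the patented implementation: `build_abstract`, the list of splitter functions, and the `score` callable are hypothetical names, and the caller is assumed to supply a scoring function that returns a relevance between 0 and 1.

```python
def build_abstract(document, splitters, score, target=1.0):
    """Return the document elements that would make up the abstract.

    splitters: functions ordered from the finest division unit (sentence)
               to the coarsest; each maps a document to a list of elements.
    score:     function mapping one element to its relevance in [0, 1].
    """
    elements, scores = [], []
    for split in splitters:                    # S3, and S8 on later passes
        elements = split(document)
        scores = [score(e) for e in elements]  # S4
        hits = [e for e, s in zip(elements, scores) if s >= target]  # S5
        if hits:
            return hits                        # S6
    # S7 found no larger unit: fall back to the best-scoring elements (S9).
    if not elements:
        return []
    best = max(scores)
    return [e for e, s in zip(elements, scores) if s == best]
```

Passing the splitters as a list keeps the escalation order (sentence, paragraph, and so on) a matter of configuration, which matches the later remark that not every division unit has to be defined.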

[0035] The document search device 10 of the present invention creates an abstract of a document by the procedure above. How the document search device 10 actually creates an abstract is now described step by step.

[0036] FIG. 4 shows an example of two documents stored in the document data storage unit 11 shown in FIG. 2. Suppose that document data 40 and 50 are stored in the document data storage unit 11 and that the user wants abstracts concerning joint development or joint research by "Company A". A search expression such as "Company A and joint and (development or research)" is then entered. This expression requires either that the keywords "Company A", "joint", and "development" all be present, or that the keywords "Company A", "joint", and "research" all be present.
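The reading of the search expression given in this paragraph amounts to expanding it into alternative keyword sets. The sketch below shows one way to do that with a small recursive-descent parser. The grammar (only `and`, `or`, and parentheses, with `and` binding tighter than `or`, and adjacent words merged into one keyword) is an assumption made for illustration, and `tokenize` and `to_keyword_sets` are hypothetical names.

```python
import re
from itertools import product

def tokenize(expression):
    """Split into '(', ')', 'and', 'or', and keywords (adjacent words merged)."""
    tokens, words = [], []
    for raw in re.findall(r"\(|\)|[^\s()]+", expression):
        if raw in ("(", ")") or raw.lower() in ("and", "or"):
            if words:
                tokens.append(" ".join(words))
                words = []
            tokens.append(raw.lower())
        else:
            words.append(raw)
    if words:
        tokens.append(" ".join(words))
    return tokens

def to_keyword_sets(expression):
    """Expand the expression into a list of keyword sets, one per alternative."""
    tokens = tokenize(expression)
    pos = 0

    def parse_or():
        nonlocal pos
        alternatives = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            alternatives = alternatives + parse_and()
        return alternatives

    def parse_and():
        nonlocal pos
        alternatives = parse_atom()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            right = parse_atom()
            # Distribute "and" over the alternatives on both sides.
            alternatives = [a | b for a, b in product(alternatives, right)]
        return alternatives

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            alternatives = parse_or()
            pos += 1  # skip the closing ")"
            return alternatives
        return [frozenset([token])]

    return parse_or()
```

Applied to the expression above, the function returns two sets, {Company A, joint, development} and {Company A, joint, research}. The sketch does no error handling for malformed expressions.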

[0037] When this search expression has been accepted by the user intention storage unit 14 via the input device 20 and document data 40 and 50 have been designated, the document search device 10 first divides document data 40 and 50 into document elements in units of sentences.

[0038] FIG. 5 shows the document data 40 of FIG. 4 divided into document elements in units of sentences. When a document is divided, each document element is given a number (NO.). When the relevance of each document element to the search expression is calculated, document element NO. 3 contains the keywords "Company A", "joint", and "development" together, so its relevance is 1.0. Document element NO. 6 contains only the keyword "Company A", so its relevance is 0.33. The remaining document elements contain none of the keywords, so their relevance is 0.
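The text gives relevance values but not an explicit formula. One formula that reproduces the reported values (1.0 for three of three keywords, 0.67 for two, 0.33 for one) is the largest fraction of keywords matched across the alternative keyword sets of the search expression. The sketch below assumes that formula and plain substring matching. The function name and the sample sentences in the usage note are invented for illustration and do not come from the figures.

```python
def relevance(element, keyword_sets):
    """Largest fraction of keywords matched over the alternative keyword sets.

    keyword_sets: one set per alternative of the search expression, e.g.
    [{"Company A", "joint", "development"}, {"Company A", "joint", "research"}].
    This formula is an inference from the worked example, not a quoted rule.
    """
    best = 0.0
    for keywords in keyword_sets:
        matched = sum(1 for k in keywords if k in element)
        best = max(best, matched / len(keywords))
    return best
```

With the two keyword sets shown in the docstring, an invented sentence such as "Company A began joint development of a printer." scores 1.0, while "Company A was founded." scores one third.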

[0039] Because a document element with a relevance of 1.0 exists here, the document does not need to be divided again. FIG. 6 shows the abstracts created according to the relevance calculated for the document elements shown in FIG. 5.

[0040] In the present invention, the abstract creation unit 17 can create three kinds of abstract. Abstract 41 is created by extracting only the document elements with a relevance of 1.0. Because the user's purpose in creating the abstract is to find documents about joint development or joint research by "Company A", the user can judge whether document data 40 is needed as soon as abstract 41 is displayed.

[0041] Abstract 42 is created by extracting the title of the original document and adding it to the document elements with a relevance of 1.0. Making clear what kind of document the abstracted text comes from helps the user's judgment.

[0042] Abstract 43 is created by extracting the document elements with a relevance of 1.0 and highlighting the keywords they contain. Here the keywords are emphasized with reverse video, but the point size or typeface may be changed, or the keywords underlined, instead. Highlighting keywords in this way helps the user's judgment, for example when the created abstract is long.
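The three abstract variants (elements only, elements with the document title, elements with highlighted keywords) can be sketched with one rendering function. Square brackets stand in for reverse video here because plain text has no such attribute. The marker characters and both function names are assumptions made for illustration.

```python
import re

def highlight(text, keywords, left="[", right="]"):
    """Wrap every keyword occurrence in marker strings."""
    if not keywords:
        return text
    # Try longer keywords first so an overlapping shorter one does not win.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in ordered)
    return re.sub(pattern, lambda m: left + m.group(0) + right, text)

def render_abstract(elements, title=None, keywords=()):
    """Join the relevant elements, optionally with a title and highlighting."""
    lines = [title] if title else []
    lines.extend(highlight(e, keywords) for e in elements)
    return "\n".join(lines)
```

Calling `render_abstract(elements)` gives the first variant, adding `title=...` gives the second, and adding `keywords=...` gives the third.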

[0043] FIG. 7 shows the document data 50 of FIG. 4 divided into document elements in units of sentences. When the relevance of each document element to the search expression is calculated, document element NO. 1 contains the keywords "joint" and "development" together, so its relevance is 0.67. Document element NO. 3 also contains the keywords "joint" and "development" together, so its relevance is 0.67. Document elements NO. 5 and NO. 7 each contain the keyword "Company A", so their relevance is 0.33. The remaining document elements contain none of the keywords, so their relevance is 0.

[0044] Because no document element with a relevance of 1.0 exists here, the division unit of the document is enlarged. FIG. 8 shows the document data 50 shown in FIG. 7 divided into document elements in units of paragraphs.

[0045] When the relevance of each document element to the search expression is calculated, document element NO. 1 contains the keywords "joint" and "development" together, so its relevance is 0.67. Document element NO. 3 contains the keywords "Company A", "joint", and "development" together, so its relevance is 1.0. Document element NO. 4 contains the keyword "Company A", so its relevance is 0.33.

[0046] Because a document element with a relevance of 1.0 exists here, the document does not need to be divided again. FIG. 9 shows the abstracts created according to the relevance calculated for the document elements shown in FIG. 8.

[0047] Abstract 51 extracts only the document elements with a relevance of 1.0. The user can judge whether document data 50 is needed as soon as abstract 51 is displayed. Abstract 52 extracts the title of the original document and adds it to the document elements with a relevance of 1.0. Abstract 53 extracts the document elements with a relevance of 1.0 and displays the keywords they contain with emphasis.

【００４８】As described above, this embodiment of the present invention divides a document into document elements and extracts those with high relevance to the user intention to create the abstract, so an abstract for judging whether the document is needed can be created with a simple operation in accordance with the user intention.

【００４９】In the above description, the division units that the document division unit 13 uses to divide a document are defined as "sentence", "paragraph", "chapter", and "section", but not all of these need to be defined. It is also possible to define only frequently used division units such as "sentence" and "paragraph".

【００５０】Alternatively, when no document element with a relevance of 1.0 exists at the "sentence" unit, the division unit may be changed to "n consecutive sentences". The n-sentence elements are extracted without being constrained by the logical structure of the text. For example, if a document consisting of sentence "a", sentence "b", sentence "c", sentence "d", and so on is divided with the unit "2 sentences", the document elements are "ab", "bc", "cd", and so on.
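The "n consecutive sentences" unit amounts to an overlapping sliding window over the sentence list. A minimal sketch, with the function name `sliding_windows` chosen here for illustration:

```python
def sliding_windows(sentences: list[str], n: int) -> list[list[str]]:
    """Return every run of n consecutive sentences, overlapping by n - 1."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A document shorter than n sentences yields no elements.
    return [sentences[i:i + n] for i in range(len(sentences) - n + 1)]
```

Because the windows overlap, keywords that fall in adjacent sentences always land together in at least one element, regardless of paragraph boundaries.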

【００５１】Furthermore, when the created abstract is too large, the document elements making up the abstract may be filtered down to a moderate number. For example, in a cohesive document element such as a paragraph, the main point is often stated at the beginning or the end, so document elements in those positions are extracted preferentially. Alternatively, the document element in which the keywords of the search formula lie closest together may be extracted, or the user may be asked to enter further keywords and the document elements containing them extracted. It is also possible to keep only the first fixed number of document elements of the created abstract.
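One of the filters mentioned above, preferring elements at the start or end of a paragraph and then capping the count, might look like the sketch below. The `Element` record and the `limit_abstract` function are hypothetical names, and the tie-breaking by document order is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    order: int         # position of the element in the whole document
    index_in_par: int  # position of the element inside its paragraph
    par_length: int    # number of elements in that paragraph
    text: str


def limit_abstract(elements: list[Element], max_elements: int) -> list[Element]:
    """Keep at most max_elements, preferring paragraph-initial and -final ones."""
    def at_edge(e: Element) -> bool:
        return e.index_in_par in (0, e.par_length - 1)

    # Edge elements sort first; within each group, earlier elements sort first.
    ranked = sorted(elements, key=lambda e: (not at_edge(e), e.order))
    # Restore document order so the abstract still reads top to bottom.
    return sorted(ranked[:max_elements], key=lambda e: e.order)
```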

【００５２】Limiting the size of the abstract by these methods lets the user read it in less time and judge whether the document is needed. An experiment was conducted with the first document retrieval apparatus 10 described above: from about one year of newspaper articles, abstracts were created for the 26 documents hit by the search formula "A Co., Ltd. and joint and (development or research)", and the necessity of each document was judged from its abstract.

【００５３】As a result, for about 77% of the abstracts, the judgment matched the one made by reading the original document in full. The 26 abstracts created by the document retrieval apparatus 10 contained about 27% as many characters as the original documents.

【００５４】By comparison, with a conventional abstract creation apparatus that extracts the text of the first paragraph as the abstract of a newspaper article, the abstract is on average 34% of the size of the original document, and the necessity judgment based on such an abstract matched the judgment made after reading the original document in about 57% of cases.

【００５５】Next, the second document abstract creation apparatus of the present invention is described. FIG. 10 is a block diagram showing the principle configuration of the second document abstract creation apparatus of the present invention.

【００５６】The second document abstract creation apparatus of the present invention comprises user intention holding means 61, important keyword extraction means 62, important sentence extraction means 63, and abstract creation means 64, and creates from document data an abstract that reflects the input user intention.

【００５７】The user intention holding means 61 holds the input user intention. The important keyword extraction means 62 extracts important keywords from the user intention held in the user intention holding means 61. The important sentence extraction means 63 extracts important sentences A and C, which contain the important keywords, from the document data. The abstract creation means 64 creates the abstract of the document from the extracted important sentences A and C.

【００５８】In this way, the second document abstract creation apparatus of the present invention extracts important sentences containing the important keywords from the document to create the abstract, so an abstract for judging whether the document is needed can be created with a simple operation in accordance with the user intention.

【００５９】FIG. 11 shows an embodiment of a document retrieval apparatus to which the second document abstract creation apparatus of the present invention is applied. The document retrieval apparatus 70 shown in the figure is connected to an input device 80 that accepts input from the user and an output device 90 that outputs the created abstract to the user. The document retrieval apparatus 70 may be implemented on a computer or the like. The output device 90 may consist solely of a display device such as a monitor.

【００６０】The document retrieval apparatus 70 comprises a document data storage unit 71, a user intention storage unit 72, a keyword extraction mechanism unit 73, a keyword holding mechanism unit 74, an important sentence selection mechanism unit 75, and an abstract creation unit 76.

【００６１】The document data storage unit 71 stores the document data of the document group obtained as a result of document retrieval and, when the user designates document data via the input device 80, outputs the corresponding document data.

【００６２】The user intention storage unit 72 accepts and stores the user intention, that is, the purpose for which the abstract of the document is to be created. The user intention entered here may be either a search formula composed of words and logical operators or a natural-language statement.

【００６３】The keyword extraction mechanism unit 73 extracts important keywords from the input user intention. When the user intention is entered as a search formula, the words contained in the formula are extracted as important keywords. When the user intention is entered in natural language, it is morphologically analyzed, and the independent (content) words among the words it contains are extracted as important keywords.

【００６４】When instructed, the keyword extraction mechanism unit 73 also extracts document keywords from the input document data. Document keywords are extracted using the tf*IDF product. For every word appearing in the document data, this method obtains tf (term frequency, the number of times the word occurs in the document) and IDF (inverse DF), which is derived from the reciprocal of DF (document frequency, the number of documents in the document set in which the word appears), and computes the tf*IDF product given by equation (1) below.

【００６５】[0065]

[Equation 1]
$$\mathrm{tf} \times \mathrm{IDF} = \mathrm{tf} \times \log\left(\frac{N}{\mathrm{DF}}\right) \qquad (1)$$

Here, N is the total number of documents in the document set. The keyword extraction mechanism unit 73 takes as document keywords either the words whose calculated tf*IDF product is at or above a fixed value, or a fixed number of words with the largest tf*IDF products. Words that need not become document keywords even when their tf*IDF product is large can be held in a stop word list, so that the words selected as document keywords are appropriate.
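Equation (1) and the "fixed number of largest products" rule can be sketched as below. The function names `tf_idf` and `document_keywords` are illustrative, the natural logarithm is an assumption (the source does not state the base), and documents are assumed to be already split into word lists.

```python
import math
from collections import Counter


def tf_idf(doc_words: list[str], corpus: list[list[str]],
           stop_words: frozenset[str] = frozenset()) -> dict[str, float]:
    """tf * log(N / DF) for each non-stop word of one document.

    `corpus` is the whole document set (including the document itself),
    so N = len(corpus) and DF >= 1 for every word of the document.
    """
    n_docs = len(corpus)
    tf = Counter(w for w in doc_words if w not in stop_words)
    scores = {}
    for word, freq in tf.items():
        df = sum(1 for d in corpus if word in d)
        if df:  # skip words absent from the corpus to avoid dividing by zero
            scores[word] = freq * math.log(n_docs / df)
    return scores


def document_keywords(scores: dict[str, float], top_k: int) -> list[str]:
    """Pick the top_k words by tf*IDF product (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]
```

A threshold-based variant would instead keep every word whose score is at or above a chosen value.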

【００６６】The keyword holding mechanism unit 74 holds the extracted important keywords and document keywords. The important sentence selection mechanism unit 75 selects important sentences from the supplied document data based on the input keywords. The selection of important sentences is described later with reference to flowcharts.

【００６７】The abstract creation unit 76 arranges the selected important sentences in their order of appearance to create the abstract of the document. The procedure by which the important sentence selection mechanism unit 75 selects important sentences is described next. There are two selection methods.

【００６８】FIG. 12 is a flowchart showing the procedure of the first method of selecting important sentences in the important sentence selection mechanism unit 75 shown in FIG. 11. The description below follows the step numbers in the figure.
[S21] Calculate the tf*IDF product of each word in the supplied document.
[S22] Set the tf*IDF product of every important keyword to a value higher than any tf*IDF product calculated in step S21.
[S23] Create a word list L of the important keywords.
[S24] For each sentence in the supplied document, calculate a score by summing the tf*IDF products of the words the sentence contains.
[S25] Select a fixed number (N) of sentences in descending order of score and take them as the important sentences.
[S26] Determine whether the important sentences selected in step S25 contain any important keyword. If they do, proceed to step S27. If they do not, proceed to step S28.
[S27] Delete from the word list L the important keywords found in the important sentences in step S26.
[S28] Determine whether the word list L is empty. If it is empty, the processing of this flowchart ends. If it is not empty, proceed to step S29.
[S29] Add 1 to N to obtain the new N.
[S30] Determine whether the sentence S ranked N-th by score contains an important keyword. If it does, proceed to step S31. If it does not, return to step S29.
[S31] Delete from the word list L the important keywords found in sentence S in step S30.
[S32] Additionally select sentence S as an important sentence and return to step S28.
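Steps S21 to S32 can be sketched as the function below. It is an illustration under assumptions, not the patented implementation: the tf*IDF products of step S21 are passed in precomputed as `word_scores`, sentences are assumed to be word lists, "higher than any product" in S22 is realized as the maximum plus 1, and the loop also stops when the sentences run out (a guard the flowchart does not mention, added so that a keyword occurring nowhere in the document cannot loop forever). The name `select_important_sentences` is hypothetical.

```python
def select_important_sentences(sentences: list[list[str]],
                               word_scores: dict[str, float],
                               important: list[str], n: int) -> list[int]:
    """Return indices of the selected sentences in order of appearance."""
    scores = dict(word_scores)                            # S21 (precomputed)
    boost = max(scores.values(), default=0.0) + 1.0       # S22 (assumed: max + 1)
    for kw in important:
        scores[kw] = boost
    remaining = set(important)                            # S23: word list L
    ranked = sorted(                                      # S24: score and rank
        range(len(sentences)),
        key=lambda i: -sum(scores.get(w, 0.0) for w in sentences[i]))
    selected = ranked[:n]                                 # S25: top N sentences
    for i in selected:                                    # S26, S27
        remaining -= set(sentences[i])
    rank = n
    while remaining and rank < len(ranked):               # S28 plus added guard
        i = ranked[rank]                                  # S29: next-ranked sentence
        rank += 1
        found = set(important) & set(sentences[i])        # S30
        if found:
            remaining -= found                            # S31
            selected.append(i)                            # S32
    return sorted(selected)
```

Returning the indices sorted matches the abstract creation unit's behaviour of arranging important sentences in their order of appearance.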

【００６９】By arranging the important sentences selected in this way in their order of appearance to create the abstract, the second document retrieval apparatus 70 of the present invention can create a document abstract that follows the user intention.

【００７０】Next, the second method of selecting important sentences is described. When weighted document keywords already exist for the document data, this method uses those document keywords and their weights to select the important sentences.

【００７１】FIG. 13 is a flowchart showing the procedure of the second method of selecting important sentences in the important sentence selection mechanism unit 75 shown in FIG. 11. The description below follows the step numbers in the figure.
[S41] Create a keyword table from the document keywords and the important keywords. The weight of each important keyword is set to a value larger than the maximum weight of the document keywords.
[S42] Calculate a score for each sentence in the document and create a sentence table. The score is the sum of the weights of the keywords contained in the sentence.
[S43] Take the sentence with the highest score as an important sentence and move it from the sentence table to the important sentence list.
[S44] Update the keyword table: reduce the weights of the keywords contained in the sentence selected in step S43, and mark those keywords as used.
[S45] Determine whether the keyword table contains a keyword whose weight is greater than the weight lower limit. If it does, return to step S42. If it does not, the processing of this flowchart ends.
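Steps S41 to S45 can be sketched as follows. This is a hedged illustration: the default decay factor of 1/10 and lower limit of 2 are the values this document uses in its worked example, "larger than the maximum" in S41 is realized as the maximum plus 1, the "used" mark is kept as a plain set, and the early exits when the sentence table is empty or the best score is 0 are added guards not present in the flowchart. The name `select_by_weights` is hypothetical.

```python
def select_by_weights(sentences: list[list[str]],
                      doc_weights: dict[str, float],
                      important: list[str],
                      decay: float = 0.1,
                      lower_limit: float = 2.0) -> list[int]:
    """Return indices of the selected sentences in order of appearance."""
    weights = dict(doc_weights)                              # S41: keyword table
    top = max(weights.values(), default=0.0) + 1.0           # assumed: max + 1
    for kw in important:
        weights[kw] = top
    used: set[str] = set()                                   # "used" column
    table = dict(enumerate(sentences))                       # sentence table
    chosen: list[int] = []
    while table:
        score = {i: sum(weights.get(k, 0.0) for k in set(s)) # S42
                 for i, s in table.items()}
        best = max(score, key=lambda i: (score[i], -i))      # earliest wins ties
        if score[best] <= 0:                                 # added guard
            break
        chosen.append(best)                                  # S43
        for k in set(table.pop(best)):                       # S44
            if k in weights:
                weights[k] *= decay
                used.add(k)
        if not any(w > lower_limit for w in weights.values()):  # S45
            break
    return sorted(chosen)
```

Because the weights of already-covered keywords shrink after each pick, later picks favour sentences that introduce keywords not yet represented in the abstract.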

【００７２】The following example shows how important sentences are selected by the second method and an abstract of the document is created. FIG. 14 shows an example of document data stored in the document data storage unit 71 shown in FIG. 11.

【００７３】The document data 100 consists of a title and nine paragraphs. FIG. 15 shows the weighted document keywords attached to the document data shown in FIG. 14.

【００７４】The table 101 of weighted document keywords is prepared in advance. When the search formula "(China | Chinese) * cuisine" is entered (where "|" denotes logical OR and "*" denotes logical AND), "China" or "Chinese", together with "cuisine", become the important keywords, and the keyword table is created.
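Extracting the important keywords from a search formula written with "|" and "*" can be sketched as below. The helper name `keywords_from_expression` is hypothetical, and the sketch only handles the symbolic operators and parentheses (ASCII or full-width); formulas that spell operators as words such as "and" / "or" would need extra handling.

```python
import re


def keywords_from_expression(expression: str) -> list[str]:
    """Return the distinct words of a search formula, in order of appearance."""
    # Split on operators and parentheses only, so multi-word keywords survive.
    tokens = re.split(r"[()|*｜＊（）]+", expression)
    keywords: list[str] = []
    for token in tokens:
        word = token.strip()
        if word and word not in keywords:
            keywords.append(word)
    return keywords
```

For the formula above this yields the three words "China", "Chinese", and "cuisine", which then receive the boosted weight in the keyword table.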

【００７５】FIG. 16 shows the keyword table first created from the document data shown in FIG. 14. The keyword table 102 lists the document keywords and the important keywords; the important keywords are given a weight (5 here) larger than the maximum weight of the document keywords (4 here). Each keyword also has a "used" column for recording whether it has been used.

【００７６】FIG. 17 shows the sentence table created based on the keyword table 102 shown in FIG. 16. The sentence table 103 lists the order of appearance and the score of each sentence.

【００７７】The abstract of the document is created from the document data 100, the keyword table 102, and the sentence table 103 described above. FIG. 18 shows abstracts of the document data shown in FIG. 14: (A) is the abstract created by the second document abstract creation apparatus of the present invention, and (B) is an abstract created from the document keywords only.

【００７８】The abstract shown here is the one obtained when the keyword table update in step S44 reduces a keyword's weight to 1/10 and the weight lower limit is set to 2.

【００７９】In the figure, abstract 104 contains sentence 104a, which includes the words "China", "Chinese", and "cuisine" from the search formula. Thus, even for a document 100 whose main subject is not that of the search formula "(China | Chinese) * cuisine", the second document retrieval apparatus of the present invention can create an abstract from which the user can quickly judge whether the document matches the purpose of the search.

【００８０】In the above description the score of a sentence is the sum of the weights of the keywords it contains, but the score may instead be the sum, over the keywords in the sentence, of each keyword's frequency multiplied by its weight.
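The two scoring variants differ only in whether repeated occurrences of a keyword count more than once. A small sketch, with the function name `sentence_score` and the `use_frequency` flag chosen for illustration:

```python
from collections import Counter


def sentence_score(words: list[str], weights: dict[str, float],
                   use_frequency: bool = False) -> float:
    """Sum keyword weights, optionally multiplied by in-sentence frequency."""
    counts = Counter(words)
    return sum(weights[k] * (counts[k] if use_frequency else 1)
               for k in counts if k in weights)
```

With `use_frequency=True`, a sentence that mentions a heavily weighted keyword several times is ranked above one that mentions it once.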

【００８１】The length of the abstract can be adjusted by changing the weight lower limit.

【００８２】[0082]

[Effects of the Invention] As described above, the first document abstract creation apparatus of the present invention divides a document into document elements and extracts those with high relevance to the user intention to create the abstract, so an abstract for judging whether the document is needed can be created with a simple operation in accordance with the user intention.

【００８３】Likewise, the second document abstract creation apparatus of the present invention extracts important sentences containing important keywords from a document to create the abstract, so an abstract for judging whether the document is needed can be created with a simple operation in accordance with the user intention.

【００８４】Furthermore, the document abstract creation program of the present invention causes a computer to extract important sentences containing important keywords from a document and to create the abstract of the document, so an abstract for judging whether the document is needed can be created with a simple operation in accordance with the user intention.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram showing the principle configuration of the first document abstract creation apparatus of the present invention.

【図２】FIG. 2 is a diagram showing an embodiment of a document retrieval apparatus to which the first document abstract creation apparatus of the present invention is applied.

【図３】FIG. 3 is a flowchart explaining the procedure for creating a document abstract in the document retrieval apparatus shown in FIG. 2.

【図４】FIG. 4 is a diagram showing an example of two documents stored in the document data storage unit shown in FIG. 2.

【図５】FIG. 5 is a diagram showing the document data of FIG. 4 divided into document elements in units of "sentence".

【図６】FIG. 6 is a diagram showing the abstract created from the relevance results for the document elements shown in FIG. 5.

【図７】FIG. 7 is a diagram showing the document data of FIG. 4 divided into document elements in units of "sentence".

【図８】FIG. 8 is a diagram showing the document data of FIG. 7 divided into document elements in units of "paragraph".

【図９】FIG. 9 is a diagram showing the abstract created from the relevance results for the document elements shown in FIG. 8.

【図１０】FIG. 10 is a block diagram showing the principle configuration of the second document abstract creation apparatus of the present invention.

【図１１】FIG. 11 is a diagram showing an embodiment of a document retrieval apparatus to which the second document abstract creation apparatus of the present invention is applied.

【図１２】FIG. 12 is a flowchart showing the procedure of the first method of selecting important sentences in the important sentence selection mechanism unit shown in FIG. 11.

【図１３】FIG. 13 is a flowchart showing the procedure of the second method of selecting important sentences in the important sentence selection mechanism unit shown in FIG. 11.

【図１４】FIG. 14 is a diagram showing an example of document data stored in the document data storage unit shown in FIG. 11.

【図１５】FIG. 15 shows the weighted document keywords attached to the document data shown in FIG. 14.

【図１６】FIG. 16 is a diagram showing the keyword table first created from the document data shown in FIG. 14.

【図１７】FIG. 17 is a diagram showing the sentence table created based on the keyword table shown in FIG. 16.

【図１８】FIG. 18 is a diagram showing abstracts of the document data shown in FIG. 14, in which (A) is the abstract created by the second document abstract creation apparatus of the present invention and (B) is an abstract created from the document keywords only.

[Explanation of symbols]

1 Document dividing means
2 User intention holding means
3 Relevance calculating means
4 Document element extracting means
5 Abstract creating means

Claims

[Claims]

1. A document abstract creation apparatus for creating an abstract of a document, comprising: document dividing means for dividing the document into document elements; user intention holding means for holding an input user intention; relevance calculating means for calculating the relevance of the divided document elements to the user intention; document element extracting means for extracting relevant document elements from the document elements based on the relevance; and abstract creating means for creating an abstract of the document from the relevant document elements.

2. The document abstract creation apparatus according to claim 1, wherein the document dividing means defines a plurality of division units for dividing the document, and changes the division unit when no relevant document element is extracted by the document element extracting means.

3. The document abstract creation apparatus according to claim 1, further comprising important document element extracting means for extracting important document elements from the relevant document elements based on a stored important document element condition, and supplying the important document elements to the abstract creating means in place of the relevant document elements.

4. A document abstract creation apparatus for creating an abstract of a document, comprising: user intention holding means for holding an input user intention; important keyword extracting means for extracting an important keyword from the user intention; important sentence extracting means for extracting from the document an important sentence containing the important keyword; and abstract creating means for creating an abstract of the document from the important sentence.

5. The document abstract creation apparatus according to claim 4, wherein said important keyword extracting means sets a document keyword extracted from said document as said important keyword.

6. The document abstract creation apparatus according to claim 4, further comprising display means for highlighting the important keyword when displaying the abstract.

7. A storage medium storing a document abstract creation program for creating an abstract of a document, the program causing a computer to function as: document dividing means for dividing the document into document elements; user intention holding means for holding an input user intention; relevance calculating means for calculating the relevance of the divided document elements to the user intention; document element extracting means for extracting relevant document elements from the document elements based on the relevance; and abstract creating means for creating an abstract of the document from the relevant document elements.

8. A storage medium storing a document abstract creation program for creating an abstract of a document, the program causing a computer to function as: user intention holding means for holding an input user intention; important keyword extracting means for extracting an important keyword from the user intention; important sentence extracting means for extracting from the document an important sentence containing the important keyword; and abstract creating means for creating an abstract of the document from the important sentence.