JPH02112069A

JPH02112069A - Automatic summarizing system

Info

Publication number: JPH02112069A
Application number: JP63263932A
Authority: JP
Inventors: Junichi Matsuda; 純一松田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-10-21
Filing date: 1988-10-21
Publication date: 1990-04-24

Abstract

PURPOSE: To extract summary sentences using word segmentation alone by providing, in a natural language processing system equipped with a dictionary and word segmentation means, means for accumulating specific phrases that have already appeared. CONSTITUTION: A central processing unit 2 performs all processing from reading a Japanese text to outputting the result. A storage device 4, which holds the information needed for processing, includes as storage areas a dictionary database, a word segmentation result database, a document database, and a keyword database. Conditions for a word to be a keyword or for a sentence to be a key sentence are incorporated into the natural language processing system in advance. While reading the sentences to be processed one at a time, the system determines whether each sentence contains a keyword or is a key sentence. Based on the result, only some of the sentences are extracted and a summary is created automatically. Thus the substance of a text can be extracted automatically and grasped in a short time.

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a method for automatically extracting summary sentences from a text, and in particular to a method for automatically extracting the summary sentences to be translated in, for example, a machine translation system.

[Prior art]

As a conventional method for automatically obtaining the gist of a text, there is, for example, the method shown in Japanese Patent Laid-Open No. Sho 61-117658, in which keywords are determined automatically from the frequency of the words appearing in the text, and the most important sentence of each paragraph is selected on the basis of those keywords to form the summary.

On the other hand, conventional machine translation systems treat every sentence in a text as subject to translation, and when only certain sentences are to be translated, a person has to specify them. From the viewpoint of improving the efficiency of machine translation, there is also a method, described for example in Japanese Patent Laid-Open No. Sho 61-255468, in which an input sentence whose translation has taken longer than a predetermined time is given a partial translation or a word-by-word translation. However, there has been no method for automatically extracting and translating part of a text.

[Problem to be solved by the invention]

However, with the summarization method described above, it is difficult to decide how frequently a word must appear to be treated as a keyword, and one sentence per paragraph may not be enough to grasp the meaning of the text.

An object of the present invention is to provide a new method for extracting summary sentences that extracts the summary sentences necessary and sufficient for grasping the meaning of a text using word segmentation alone, without syntactic or semantic analysis. The method can also be applied to machine translation to make translation processing more efficient.

[Means for solving the problem]

In the present invention, conditions for a word to be a keyword or for a sentence to be a key sentence are incorporated into the natural language processing system in advance. While reading the sentences to be processed one at a time, the system determines whether each sentence contains a keyword or is a key sentence. Based on the result, only some of the sentences are output, and a summary is created automatically.

[Operation]

Because the automatically generated summary document contains fewer sentences than the original, the content of the document can be grasped in a short time.

Also, since the same words and phrases recur across the extracted sentences, the resulting summary document is easy to read.

[Embodiment]

An example in which the present invention is applied to a Japanese document processing system is described in detail below with reference to the drawings.

FIG. 2 is a hardware configuration diagram of the Japanese document processing system. Reference numeral 1 denotes a device for inputting Japanese text.

Reference numeral 2 denotes the central processing unit of the system, which performs all processing from reading the Japanese text to outputting the result. Reference numeral 3 denotes a device for outputting the result. Reference numeral 4 denotes a storage device that holds the information needed for processing; here its storage areas are assumed to include a dictionary database, a word segmentation result database, a document database, and a keyword database.

FIG. 3 shows example records of the word segmentation result database, which stores word segmentation results. Each record consists of a word spelling 301, a part of speech 302, and a dictionary type 303.

FIG. 4 shows example records of the document database, which stores documents. Each record consists of a sentence number 401, a sentence 402, a dependency target sentence number 403, and a summary sentence flag 404.

FIG. 5 shows example records of the keyword database, which stores important words that have already appeared. Each record consists of a sentence number 501 and a word 502.

Next, the processing method of the present invention is described following the flowchart shown in FIG. 1. First, the sentence number SN and the keyword database record number M are set to the initial value 1 (101). The input Japanese sentence is read from the input device 1, its sentence number and text are registered in the document database (102), and the sentence is segmented into words (103). Many word segmentation methods are already known; for example, the method disclosed in Japanese Patent Laid-Open No. Sho 62-169262 may be used. The word segmentation results are stored in the word segmentation result database, which has been cleared beforehand, as word spelling 301, part of speech 302, and dictionary type 303.

After the Japanese sentence has been read, the record number N of the word segmentation result database and the final record number J of the keyword database are set (104).

The contents of the word segmentation result database are read in order (105).

Next, by determining whether the words read include words that have already appeared, the sentence on which the input Japanese sentence depends is identified. The details are as follows. Here, previously seen words are limited to nouns. First, it is determined whether the part of speech 302 of the word read is a noun (106). If it is not a noun, processing moves to the next record of the word segmentation result database. If it is a noun, it is determined whether the keyword database record number M is 0. If M is not 0, unprocessed words remain in the keyword database, so it is determined whether the word from the word segmentation result database matches the M-th record of the keyword database (108).

If they do not match, the keyword database record number M is decremented by 1 (109) and the same processing is repeated. If they match, the sentence number of the M-th keyword database record is compared with the dependency target sentence number of the SN-th document database record (110); a blank sentence number is treated as 0. If the sentence number of the M-th keyword database record is larger, the dependency target sentence number of the SN-th document database record is changed to that sentence number (111). Otherwise the document database is not updated. Processing then moves to the next word in the word segmentation result database.

If the keyword database record number M is 0, the word read matches no word in the keyword database, that is, it is a new word, and processing moves to the next word in the word segmentation result database.

On moving to the next word, 1 is added to the record number N of the word segmentation result database (112) and it is checked whether a word exists in that record (113). If one exists, it is read and the process of determining whether it matches a word in the keyword database is repeated. If no word exists in the N-th record, all words in the word segmentation result database have been processed.

Next, it is checked whether a dependency target sentence number is registered in the SN-th record of the document database (114). If none is registered, no dependency target could be identified, and the default value 'SN-1' is registered in the dependency target sentence number field of the document database (115).

Next, the words in the word segmentation result database are added to the keyword database. The details are as follows. First, the record number N of the word segmentation result database and the record number M of the keyword database are set (116). Next, the words in the word segmentation result database are read in order (117) and it is determined whether each is a noun (118). If it is a noun, the word and the sentence number are registered in the keyword database (119). Words other than nouns are not registered.

Then the record number of the word segmentation result database is advanced by 1 (120), and it is determined whether unprocessed words remain (121). If they do, the record number of the keyword database is advanced by 1 (122).

The same processing is repeated for the remaining words in the word segmentation result database. When all of them have been processed, it is determined whether there is a next Japanese sentence (123). If there is, the sentence number is advanced by 1 (124) and the same processing is repeated for that sentence.

When all input Japanese sentences have been processed, the sentences to be extracted as the summary are selected and output. Here, starting from the last sentence, the sentences it depends on are traced in turn and output. First, a flag is set on the SN-th sentence of the document database to mark it for output (125).

Next, the dependency target sentence number of the SN-th sentence is assigned to SN (126), and it is determined whether SN is 0 (127).

If it is not 0, flagging of dependency target sentences continues. When SN becomes 0, all sentences to be output have been selected, and they are output in order starting from the first sentence.

1 is added to the sentence number SN in turn (128), and it is determined whether a Japanese sentence exists (129). If one exists, it is determined whether its flag is set (130). If the flag is set, the Japanese sentence is output (131); otherwise it is not. Then 1 is added to SN and the same processing continues. When the last Japanese sentence has been reached, the summary generation processing is complete.

As an example, the summarization of the document stored in the document database of FIG. 4 is shown. The dependency target sentence numbers obtained by the method described above are given in column 403. FIG. 6 is a tree structure showing the dependency relations between the sentences. In this method, the chain of sentences connecting the first sentence to the last, that is, sentences 1, 3, 4, 9, and 10, is extracted as the summary.

The summary result is shown in FIG. 7.

To select some of the sentences as a summary, it is also conceivable to define specific keywords and extract only the sentences containing them, without examining the relations between sentences. This method is described following the flowchart shown in FIG. 8. First, the input Japanese sentence is read from the input device 1 (801) and segmented into words (802). The word segmentation results are stored in the word segmentation result database, which has been cleared beforehand, as word spelling 301, part of speech 302, and dictionary type 303.

After the Japanese sentence has been read, the record number N of the word segmentation result database is set to the initial value 1 (803), and it is determined whether the sentence read is a title sentence (804). To decide whether a sentence is a title, conditions can be used such as, for example, the sentence having no period at its end and being immediately followed by a line feed code or blank spaces.

If the sentence read is a title sentence, the keyword database is cleared (805) and the keyword database record number M is set to the initial value 1 (806). Next, the N-th record of the word segmentation result database is read (807), and it is determined from the part of speech 302 whether the word is a content word (808).

If it is a content word, its word spelling 301 is first registered in the M-th record of the keyword database (809), and the keyword database record number M is increased by 1 (810). Next, the record number N of the word segmentation result database is increased by 1 (811). If the N-th record of the word segmentation result database is not a content word, N is simply increased by 1 without any other action (811). It is then determined whether an unprocessed word remains in the N-th record of the word segmentation result database (812); if so, the same processing is repeated. If not, all words in the word segmentation result database have been processed, the title sentence is output (820), and processing moves to the next sentence.

If the Japanese sentence read is not a title sentence, the N-th record of the word segmentation result database is read (813) and the keyword database record number M is set to the initial value 1 (814). It is then checked whether the word in the M-th record of the keyword database matches the word in the N-th record of the word segmentation result database (815).

If they match, the Japanese sentence read contains a keyword; it is regarded as an important sentence, output to the output device 3 (820), and processing moves to the next sentence. If they do not match, 1 is added to the keyword database record number M (816), and it is determined whether a word remains in the M-th record of the keyword database (817). If one remains, it is again checked whether the word in the M-th keyword record matches the word in the N-th record of the word segmentation result database (815). If none remains, the record number N of the word segmentation result database is increased by 1 (818), and it is determined whether an unprocessed word remains in the N-th record (819). If one remains, the N-th record of the word segmentation result database is read and the same processing is repeated. If none remains, the words of the sentence contain no keyword; the input Japanese sentence is regarded as unimportant, and processing moves to the next sentence without outputting it.

On moving to the next sentence, it is first checked whether a next sentence exists (821). If not, all sentences have been processed. If so, the next sentence is read (801), and the same processing is repeated until the end of the text.

Keywords may be determined not only as the content words in the title sentence but also as the content words appearing in the first or last sentence, or as words that are underlined or set in a different typeface such as bold, Gothic, or italic. In this case, the algorithm apart from the keyword determination part is the same as in FIG. 8.

If the dictionary built into the system is divided into a basic word dictionary and a technical term dictionary, the following method of extracting summary sentences is also conceivable.

This method is described following the flowchart shown in FIG. 9. First, the input Japanese sentence is read from the input device 1 (901) and segmented into words (902), and the word segmentation results are stored in the word segmentation result database as word spelling 301, part of speech 302, and dictionary type 303. Next, the words in the word segmentation result database are read in order (903), and the dictionary type 303 is used to check whether the word read is registered in the technical term dictionary (904). If it is, the input Japanese sentence is regarded as important, output to the output device 3 (905), and processing moves to the next sentence. If the word is not registered in the technical term dictionary, it is determined whether all words in the word segmentation result database have been processed (906); if unprocessed words remain, the next word is read from the word segmentation result database (903).

The same processing continues. When all words in the word segmentation result database have been processed without a match, the input Japanese sentence is regarded as unimportant, and processing moves to the next sentence without outputting it. First, it is checked whether a next sentence exists (907).

If there is no next sentence, all sentences have been processed and the summary extraction processing ends. If there is, the next sentence is read (901), and the same processing is repeated until the end of the text.

If the system has a user dictionary, the algorithm of FIG. 9 can also be used with the user dictionary in place of the technical term dictionary.

A crude automatic summarization method is to extract only the first and last paragraphs of a text. This method is not very effective as automatic summarization, but it is considered effective when combined with a machine translation system.

The automatic summarization methods described above can be applied to determine whether a given document is important. An example of a method of selecting important documents from a plurality of documents and outputting them is described following the flowchart shown in FIG. 10.

First, a document is read (1001) and its sentences are read in order (1002). Next, it is determined whether the sentence read qualifies as a summary sentence (1003). If it does, 1 is added to the number of summary sentences (1004) and 1 is also added to the total number of sentences (1005); if it does not, 1 is added only to the total number of sentences (1005). Next, it is determined whether all sentences in the document have been read (1006). If not, the next sentence is read (1002). If so, it is determined whether (number of summary sentences)/(total number of sentences) exceeds a certain value (1007). If it does, the document is output (1008); if not, nothing is output. It is then determined whether there is a next document (1009). If there is, it is read (1001); otherwise all processing ends.

[Effect of the invention]

According to the present invention, the gist of a text can be extracted automatically and grasped in a short time. Furthermore, when this method is applied to machine translation, translation processing can be made more efficient.

[Brief description of the drawings]

FIG. 1 is a flowchart showing an embodiment of the present invention; FIG. 2 is a hardware configuration diagram of the Japanese document processing system according to the present invention; FIG. 3 shows example records of the word segmentation result database; FIG. 4 shows example records of the document database; FIG. 5 shows example records of the keyword database; FIG. 6 shows an example tree structure representing the dependency structure of the sentences; FIG. 7 shows an example summary result; and FIGS. 8, 9, and 10 are flowcharts showing modifications of the embodiment. 1: input device; 2: central processing unit; 3: output device; 4: storage device.

Claims

[Claims]
1. An automatic summarization method in a natural language processing system having a dictionary and word segmentation means, characterized in that means for accumulating specific phrases that have already appeared is provided, whereby specific sentences are extracted automatically.
2. The automatic summarization method according to claim 1, wherein the extraction means analyzes connections between the sentences in a document.
3. An automatic summarization method in a natural language processing system having a dictionary and word segmentation means, characterized in that the sentences to be extracted are determined automatically by examining whether each sentence in a document contains a specific word.
4. The automatic summarization method according to claim 3, wherein the extraction means extracts sentences containing words that appear in the title.
5. The automatic summarization method according to claim 3, wherein the extraction means extracts sentences containing underlined words or words set in a different typeface.
6. The automatic summarization method according to claim 3, wherein, when the dictionary of the natural language processing system is divided into a technical term dictionary and a basic word dictionary, the extraction means extracts sentences containing words registered in the technical term dictionary.
7. The automatic summarization method according to claim 3, wherein, when the dictionary of the natural language processing system includes a user dictionary, the extraction means extracts sentences containing words registered in the user dictionary.
8. An automatic summarization method in a natural language processing system having means for identifying titles and paragraphs, characterized in that the sentences to be extracted are determined automatically by examining the position of each sentence in a document.
9. The automatic summarization method according to claim 8, wherein the extraction means extracts the title, the first paragraph, and the last paragraph.
10. The automatic summarization method according to claim 8, wherein the extraction means extracts the title and the first and last sentences of each paragraph.
11. A machine translation method characterized in that summary sentences created by the method according to claim 1, 3, or 8 are the object of translation.
12. A natural language processing system that identifies important documents using the ratio of the number of summary sentences extracted by the method according to claim 1, 3, or 8 to the total number of sentences in the document.