JP3206600B2

JP3206600B2 - Document generation device

Info

Publication number: JP3206600B2
Application number: JP02183689A
Authority: JP
Inventors: 忠信宮内; 義文松永
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1989-01-30
Filing date: 1989-01-30
Publication date: 2001-09-10
Anticipated expiration: 2016-09-10
Also published as: JPH02255960A

Description

[Detailed description of the invention]

[Field of industrial application]

The present invention relates to a computer for document processing, such as a word processor. More particularly, it relates to a document generation device with which a user attaches additional information to a document through a simple operation and thereby easily obtains a new document from which information such as a summary has been extracted.

[Conventional technology]

With recent advances in computer technology, documents can be handled at high speed on computers, and offices are rapidly digitizing their documents. Because a digitized document is easy to process, research on machine translation, writing-revision support, and similar applications is active. However, putting these technologies into practice involves many technical problems, so their applications tend to be limited and they have not yet come into wide general use. Information processing devices and dictionary lookup devices have therefore been developed that look up the words of a digitized text in an electronic dictionary and attach annotations or markings to them.

[Problems to be solved by the invention]

However, a dictionary lookup device and the like play only a supporting role in translation and in comprehending manuscripts, and their output is difficult to use directly as a document.

An object of the present invention is to provide a document generation device whose output can be used directly as a document and which can output a high-value-added document useful for translation, comprehension of manuscripts, and the like.

[Means for solving the problem]

The present invention comprises: storage means for storing a document; selection means for sequentially selecting the paragraphs of the document stored in the storage means; first input means for inputting a predetermined format; second input means for inputting a keyword; first determination means for determining whether the paragraph selected by the selection means consists of a single sentence; second determination means for determining, when the first determination means determines that the paragraph consists of a single sentence, whether the character sequence in that sentence matches the predetermined format; first extraction means for extracting that sentence when the second determination means determines that it matches the predetermined format; third determination means for determining, when the first determination means determines that the paragraph does not consist of a single sentence, whether the keyword occurs in that paragraph; second extraction means for extracting, when the third determination means determines that the keyword occurs, the sentence containing the keyword; and document generation means for generating a single document that holds, in order, the sentences extracted by the first extraction means and the second extraction means.

Steps 102, 111, 112, 118, and 119 of FIG. 7, described later, are examples of the processing performed by the first determination means, the second determination means, the first extraction means, the third determination means, and the second extraction means, respectively.

[Action]

In the present invention, the storage means stores a document; the selection means sequentially selects the paragraphs of the document stored in the storage means; the first input means inputs a predetermined format; the second input means inputs a keyword; and the first determination means determines whether the selected paragraph consists of a single sentence. When it does, the second determination means determines whether the character sequence in that sentence matches the predetermined format, and when it matches, the first extraction means extracts that sentence. When the paragraph does not consist of a single sentence, the third determination means determines whether the keyword occurs in the paragraph, and when it does, the second extraction means extracts the sentence containing the keyword. The document generation means then generates a single document that holds, in order, the sentences extracted by the first and second extraction means.
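The claimed flow can be summarized in a few lines of code. The following Python sketch is illustrative only: it assumes each paragraph arrives as a list of already-separated sentence strings, and it writes the "predetermined format" as a regular expression rather than in the format notation of the embodiment. The function name `generate_document` is hypothetical.

```python
import re


def generate_document(paragraphs, format_regex, keyword):
    """Collect the extracted sentences in document order.

    paragraphs   -- list of paragraphs, each a list of sentence strings
    format_regex -- the "predetermined format", written here as a regular
                    expression (an assumption, not the patent's notation)
    keyword      -- the keyword entered through the second input means
    """
    extracted = []
    for paragraph in paragraphs:                         # selection means
        if len(paragraph) == 1:                          # first determination: one sentence?
            if re.fullmatch(format_regex, paragraph[0]): # second determination
                extracted.append(paragraph[0])           # first extraction
        elif any(keyword in s for s in paragraph):       # third determination
            extracted.extend(s for s in paragraph if keyword in s)  # second extraction
    return extracted                                     # sentences of the new document
```

In the flowchart of FIG. 7 described later, a one-sentence paragraph that fails the item-format check also falls through to the keyword scan (step 111 to step 114); this sketch follows the wording of the claim only.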

[Embodiment]

An embodiment of the document generation device according to the present invention is described below.

FIG. 1 is a block diagram showing the conceptual configuration of the document generation device according to the present invention. The device comprises: an original text storage unit 11 that electronically stores, as document data, an original document input by the user; a generation condition input unit 12 for setting, on the basis of the original document stored in the original text storage unit 11, the conditions for creating a new document; a sentence extraction unit 13 that extracts sentences having a specific syntax from the original document stored in the original text storage unit 11 according to the generation conditions input from the generation condition input unit 12; and a document generation unit 14 that generates the desired new document from the sentences extracted by the sentence extraction unit 13.

FIG. 2 is a block diagram showing the basic configuration of the document generation device according to the present invention. In the figure, reference numerals identical to those in FIG. 1 denote equivalent parts.

The original text input unit 15 converts a document to be processed into digitized document data that a computer can process; after conversion by the original text input unit 15, the original document is first stored in the original text storage unit 11. The information processing unit 10 generates a new document on the basis of the original document stored in the original text storage unit 11 and the generation conditions input from the generation condition input unit 12. The information processing unit 10 consists of a microprocessor and comprises an original text scanning unit 18 that segments the text into words using the dictionary unit 16, a sentence extraction unit 13 that extracts specific sentences from the original document stored in the original text storage unit 11 according to the input conditions, and a document generation unit 14 that generates a new document from the extracted sentences. The document generated by the information processing unit 10 is temporarily stored in the result storage unit 17 and output from the display unit 19 or the printing unit 20.

Means for directly inputting an original text may be connected to the original text storage unit 11, and separate result output means and storage means may be provided for the document generation unit 14. The specific sentences extracted from a document include sentences having a specific syntax, such as item names, and sentences containing specific words, such as keywords; various electronic dictionaries may be provided for extracting these sentences and for generating documents.

FIG. 3 is a system configuration diagram in which the document generation device described above is applied to a Japanese summary generation system. This system outputs a summary of various kinds of Japanese printed matter when the user specifies simple conditions; the summaries are used mainly to aid understanding of long texts and to save time.

In FIG. 3, reference numeral 21 denotes an optical character reader (OCR), serving as the original input unit, that reads an original A and converts it into document data; 22 denotes an editing/information input unit, serving as the generation condition input unit, for inputting various data and commands such as generation conditions; and 23 denotes a laser printer, serving as the printing unit, that prints the generated new document on recording paper. The remaining configuration is the same as in FIG. 2, and equivalent parts are denoted by the same reference numerals.

Next, the operation of the Japanese summary generation system described above is explained with reference to FIGS. 2 to 7, taking the creation of a summary of a document as an example.

First, the original A is read with the OCR 21. In this example the original A is printed matter, but a document already digitized with a word processor or the like may instead be read from a storage medium such as a floppy disk. Alternatively, document creation means for writing a document on the spot may be provided, and the edited result used directly as the original. The input means in that case is not limited to typing on an ordinary keyboard; with online handwriting recognition on a tablet, for example, even people unfamiliar with computers can easily create an original.

The original A, converted into document data by the OCR 21, is first stored on the magnetic disk of the original text storage unit 11 and then undergoes summary creation in the information processing unit 10. The processing described below may be performed each time an original is input, or on a document selected from several stored documents.

First, generation conditions are set at the editing/information input unit 22. FIG. 4 shows an example of inputting generation conditions as displayed on the screen of the display unit 19. In this example, for the original shown in FIG. 5, the chapter names (1. …, 2. …, etc.), the section names (2.1 …, 2.2 …, etc.), the itemized (bulleted) portions, and the sentences containing keywords are extracted to form the summary. As shown in FIG. 4, the generation conditions are entered by filling in chapter and section names in the item field and concrete examples in the bullet and keyword fields. On the initial screen, general conditions such as the selection of items and bullets are preset.

These generation conditions may be typed on the keyboard if the original is relatively short, but they can also be copied directly from the original. For example, selecting the entry "XXXXX High-Speed Virtual Machine" in document holder 1 of FIG. 4 displays the original (FIG. 5) in a window in another area of the screen (not shown). To input the item names of the original, for example, the user selects "Item" in the condition input menu, which opens the area to its right, and then selects and combines the corresponding parts of the original one after another with a pointing device (mouse). Entering the bullet and keyword portions in the same way makes setting the generation conditions easy. Because the already loaded documents are listed in document holder 1, after finishing the generation conditions the user selects the document to be summarized again and specifies "Start", whereupon the information processing unit 10 begins sentence extraction and generation.

Next, the original text scanning unit 18 (FIG. 2) of the information processing unit 10 structures the document. First, the original is segmented into words with reference to the dictionary unit 16. An example of word segmentation for the beginning of the original is shown below.

|1|はじめに|XXXXX|は|※※らし|い|システム|だが|実行速度|が|…
(word by word: "1" / "Introduction" / "XXXXX" / topic particle / "※※-like" / adjective ending / "system" / "but" / "execution speed" / subject particle / …)

After the text has been divided into words in this way, it is structured as shown in FIG. 6. Here, attention is paid to the sentence-ending marks, namely the Japanese full stop (。) and the period (．), and to line-feed symbols: the text up to each sentence-ending mark, including any other symbols in it, is grouped into a sentence, and the text is divided into paragraphs at each line-feed symbol. Words, sentences, and paragraphs are each numbered; each paragraph records the numbers of its first and last sentences, and each sentence records the numbers of its first and last words. From this result, the sentence extraction unit 13 extracts specific sentences according to the set generation conditions by the procedure described below and sends them to the document generation unit 14, which combines the extracted sentences into a new document. The generated document is temporarily stored in the result storage unit 17 and, as specified at start time, output to the CRT display of the display unit 19 or the laser printer of the printing unit 20. If the output is unsatisfactory, its contents can be revised with an editing device (not shown), and several documents can then be output together.
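The structuring step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the dictionary is a plain set of words, segmentation uses greedy longest matching (a common stand-in; the patent does not say how dictionary unit 16 is consulted), and only the full-width 。 and ． end a sentence. The names `segment`, `structure`, `Sentence`, and `Paragraph` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: only the full-width full stop and period end a sentence; the
# ASCII "." is left out so that section numbers such as "2.1" are not split.
SENTENCE_END = {"。", "．"}


@dataclass
class Sentence:
    first_word: int   # number of the sentence's first word
    last_word: int    # number of the sentence's last word


@dataclass
class Paragraph:
    first_sentence: int   # number of the paragraph's first sentence
    last_sentence: int    # number of the paragraph's last sentence


def segment(text, dictionary):
    """Greedy longest-match segmentation (a stand-in technique)."""
    longest = max(1, max(map(len, dictionary), default=1))
    words, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:   # unknown characters become 1-char words
                words.append(piece)
                i += size
                break
    return words


def structure(text, dictionary):
    """Number words, sentences and paragraphs in the manner of FIG. 6."""
    words, sentences, paragraphs = [], [], []
    for line in text.split("\n"):                  # a line feed closes a paragraph
        if not line.strip():
            continue
        first_sentence = len(sentences)
        start = len(words)
        for word in segment(line, dictionary):
            words.append(word)
            if word in SENTENCE_END:                 # a full stop closes a sentence
                sentences.append(Sentence(start, len(words) - 1))
                start = len(words)
        if start < len(words):                       # trailing text without a full stop
            sentences.append(Sentence(start, len(words) - 1))
        paragraphs.append(Paragraph(first_sentence, len(sentences) - 1))
    return words, sentences, paragraphs
```

A paragraph consists of a single sentence exactly when its first and last sentence numbers are equal, which is the test used in step 102 below.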

Next, the procedure by which the information processing unit 10 extracts the target sentences is described on the basis of the flowchart of FIG. 7, with reference to FIG. 6 above.

First, sentences having a specific syntax (item names and itemized portions) are extracted from the document structured by the original text scanning unit 18. Because such sentences generally occupy a single line and typically begin with an ordering mark, they can be extracted by comparing them with the format of the concrete examples entered by the user or with basic formats preset in the system itself.

In FIG. 7, it is first determined, for the segmented paragraphs in order, whether the last paragraph has been passed, that is, whether any unprocessed paragraphs remain (step 101). If a paragraph remains, it is determined whether it consists of only one sentence (step 102). This is checked from the paragraph structure: if the numbers of its first and last sentences are the same, the paragraph is judged to consist of one sentence.

If the paragraph consists of only one sentence, the format of that sentence is obtained (step 103). A format describes the arrangement of symbols and other characters in a sentence and is represented in a special notation inside the information processing unit 10. An example follows.

# : one digit
$ : one uppercase letter
@ : one lowercase letter
% : one other ordering character (e.g., イ, ロ, ハ, Roman numerals)
_ : space
→ : tab
* : character string

Other symbols (., >, -, etc.) are written as they are. For example, from a tab-indented heading "1.2 書式の例" (1.2 Example of formats), these special symbols give the format "→#.#_*".
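The format notation above can be derived mechanically. The sketch below is a hypothetical Python rendering: it writes the symbols in ASCII (#, $, @, _, *), and the sets of literal symbols and ordering marks are assumptions chosen for illustration, since the text gives only examples.

```python
from itertools import groupby

# Characters copied into the format unchanged (an assumed set; the text
# names only a few examples such as ".", ">" and a dash).
LITERAL_SYMBOLS = set(".>-()．＞－（）")
# Single characters treated as ordering marks ("%"); an assumed set covering
# the iroha sequence and Roman numerals mentioned in the text.
ORDERING_MARKS = set("イロハニホヘトⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩ")


def char_class(ch):
    """Map one character to its format symbol, or None for free text."""
    if "0" <= ch <= "9":
        return "#"
    if "A" <= ch <= "Z":
        return "$"
    if "a" <= ch <= "z":
        return "@"
    if ch == " ":
        return "_"
    if ch == "\t":
        return "→"
    if ch in LITERAL_SYMBOLS:
        return ch
    return None


def sentence_format(text):
    """Derive the format string of a one-sentence paragraph."""
    parts = []
    for symbol, group in groupby(text, key=char_class):
        run = "".join(group)
        if symbol is not None:
            parts.append(symbol * len(run))   # one symbol per character
        elif len(run) == 1 and run in ORDERING_MARKS:
            parts.append("%")                 # a lone ordering mark
        else:
            parts.append("*")                 # any other run is a character string
    return "".join(parts)
```

A run of ordinary text, such as a Japanese heading, collapses to a single `*`, while each digit or ASCII letter keeps its own symbol.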

Next, it is determined whether extraction of itemized (bulleted) portions has been selected (step 104). If it has, the sentence number currently held is saved (step 105), and it is determined whether the format of the sentence matches the format determined from the concrete example entered when the conditions were input (step 106). If no concrete example was specified when the conditions were input, several general formats preset in the system are used. Moreover, because an itemized list consists of clauses written one after another in the same format, the next sentence can be regarded as part of the list if it likewise forms a one-sentence paragraph with the same format.

If the formats match, the sentence number counter is advanced and the next sentence is examined (step 107); the counter keeps advancing until a sentence no longer matches. When the formats no longer match, it is determined whether the format matched for more than one sentence (step 108); if so, the extraction flag is set for all the matching sentences (step 109). Through this operation, whether or not extraction of itemized portions was specified, the sentence number counter ends up pointing to the next sentence to be processed.
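Steps 104 to 109 amount to finding runs of consecutive one-sentence paragraphs that share a bullet format. The Python sketch below assumes that the format string of each paragraph has already been computed, with `None` for paragraphs of more than one sentence; the function name `bullet_runs` and this input shape are hypothetical, and the saving of the sentence counter (step 105) is omitted.

```python
def bullet_runs(formats, bullet_formats):
    """Return the indices of paragraphs flagged as itemized entries.

    formats        -- formats[i] is the format of paragraph i if it is a
                      one-sentence paragraph, otherwise None
    bullet_formats -- formats taken from the user's examples or system defaults
    """
    flagged = []
    i = 0
    while i < len(formats):
        fmt = formats[i]
        if fmt is None or fmt not in bullet_formats:   # step 106: no match
            i += 1
            continue
        j = i
        while j + 1 < len(formats) and formats[j + 1] == fmt:  # step 107: same format follows
            j += 1
        if j > i:                                      # step 108: more than one sentence matched
            flagged.extend(range(i, j + 1))            # step 109: flag them all
        i = j + 1                                      # counter now points at the next sentence
    return flagged
```

Requiring at least two consecutive matches reflects step 108: a lone line that happens to look like a bullet is not treated as an itemized list.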

Next, if extraction of items has not been selected, processing proceeds to step 114, described later. If it has been selected, the sentence is compared with the item format to determine whether they match (step 111). If they do not match, processing proceeds to step 114; if they match, the extraction flag of the sentence is set as an item name (step 112), the paragraph number counter is advanced (step 113), and processing returns to step 101.

On the other hand, if the paragraph consists of several sentences at step 102, sentences containing specific words (keywords, katakana words, abbreviations) specified with the generation condition input unit 12 (FIG. 1) are extracted. First, the first sentence of the paragraph is set in the sentence number counter (step 114), and it is determined whether the last sentence of the paragraph has been passed, that is, whether any unprocessed sentences remain (step 115). If a sentence remains, its first word is set in the word number counter (step 116), and it is determined whether the last word of the sentence has been passed, that is, whether any unprocessed words remain (step 117). If a word remains, it is determined whether the word is an extraction target by comparing the attributes obtained from the dictionary lookup during word segmentation with the input conditions (step 118). If it is an extraction target, the extraction flag of the current sentence is set (step 119) and the sentence number counter is advanced (step 120). At this time, if a sentence number was saved at step 105, that saved number is advanced.

If the last word has been reached at step 117, processing proceeds directly to step 120, and the above operations are repeated until the last sentence of the paragraph. If the word at step 118 is not an extraction target, the word number counter is advanced (step 121) and processing returns to step 117. When no sentences remain in the paragraph at step 115, processing moves to step 113. Processing ends when the last paragraph has been reached.
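Steps 114 to 121 are two nested scans. This Python sketch assumes the numbering of FIG. 6 (each paragraph holds its first and last sentence numbers, each sentence its first and last word numbers) and represents the dictionary attributes found during segmentation as a plain mapping from word to attribute; `Span`, `flag_keyword_sentences`, and the attribute names are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    first: int   # number of the first sentence (or word)
    last: int    # number of the last sentence (or word), inclusive


def flag_keyword_sentences(paragraph, sentences, words, attributes, wanted, flags):
    """Set flags[s] for every sentence s of the paragraph that contains a target word."""
    for s in range(paragraph.first, paragraph.last + 1):       # steps 114, 115, 120
        sentence = sentences[s]
        for w in range(sentence.first, sentence.last + 1):     # steps 116, 117, 121
            if attributes.get(words[w]) in wanted:            # step 118: attribute check
                flags[s] = True                               # step 119
                break                                         # on to the next sentence
    return flags
```

Breaking out of the word loop as soon as one target word is found mirrors the jump from step 119 to step 120.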

In this example, sentences are extracted for keywords entered by the user, but by replacing the attributes held in the dictionary unit 16, extraction can also be made to target the words of a specific technical field.

After the target sentence extraction described above is complete, the summary is generated. First, the name of the original document is written in enlarged characters at the beginning of the result storage unit 17 (FIG. 2). Then the processed document is checked from the beginning for extraction flags, and each flagged sentence is written to the result storage unit 17 in order. At this point, the output becomes easier to read if the keyword portions are emphasized with bold characters or underlining, or if line-feed symbols or suitable tabs are inserted before and after item-name sentences. Finally, pagination is performed and page-break symbols are inserted, completing the summary document. As a result of this processing, a summary such as that shown in FIG. 8 is produced from the original shown in FIG. 5.
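The assembly of the summary might look like the following Python sketch. Plain text cannot carry enlarged or bold characters, so the `[LARGE]` prefix and the `**…**` emphasis are hypothetical markup, the 40-line page is an assumed default, and a form feed stands in for the page-break symbol. The name `build_summary` is likewise hypothetical.

```python
import re


def build_summary(title, sentences, flags, item_flags, keywords, lines_per_page=40):
    """Write the title, then every flagged sentence in order, then paginate."""
    lines = [f"[LARGE] {title}", ""]      # hypothetical markup for enlarged characters
    pattern = None
    if keywords:
        # one pass, longest alternative first, so overlapping keywords are wrapped once
        alternatives = sorted(keywords, key=len, reverse=True)
        pattern = re.compile("|".join(map(re.escape, alternatives)))
    for text, flagged, is_item in zip(sentences, flags, item_flags):
        if not flagged:
            continue
        if pattern is not None:
            text = pattern.sub(lambda m: f"**{m.group(0)}**", text)  # hypothetical bold markup
        if is_item:
            lines.append("")              # line feed before an item-name sentence
        lines.append(text)
    pages = [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]
    return "\f".join("\n".join(page) for page in pages)   # form feed as page break
```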

If necessary, the result can be edited further to obtain a more natural and easily understood summary. Moreover, if the system handles documents conforming to the Office Document Architecture (ODA), that is, structured documents, the document has a logical structure, which makes it easy to extract chapters and sections and to handle figures as described in application (3) below, enabling more advanced processing.

The embodiment above describes the creation of a summary of a Japanese manuscript, but the document generation device according to the present invention is not limited to this embodiment and admits various applications, examples of which follow.

(1) Japanese-English summary generation system

The embodiment above creates a Japanese summary. Item names, itemized portions, and the like are not only easy to extract but also have simple syntax that needs no semantic analysis, so they are comparatively easy to translate. Exploiting this, a Japanese-English summary generation system such as that shown in FIG. 9 can be realized. In this system, for a Japanese original input from the manuscript input unit 31, a Japanese summary extraction unit 32 extracts items and the like according to the generation conditions, a syntax analysis unit 33 performs semantic analysis, a conversion unit 34 translates the result into English, and an English summary generation unit 35 generates an English summary, which is output by the output unit 36. The output English summary can be used for purposes such as Japanese-to-English translation support.

(2) Table-of-contents generation system

While a document is being written, changes in chapter structure or additions to the content often require the table of contents to be changed, and creating a table of contents tends to be tedious when a document has several authors. By applying the document generation device according to the present invention, however, a table-of-contents generation system such as that shown in FIG. 10, which readily creates a table of contents automatically, can be realized. In this system, an item extraction unit 42 extracts, from the document input through the manuscript input unit 41, chapter and section names and specific lines such as "References" and "Index". Next, an information addition unit 43 numbers these lines in order, writes the corresponding page number to the right of each item, arranges the lines into suitable paragraphs, and writes a heading such as "Table of Contents" at the top; the result is output by the output unit 44. If the table of contents created in this way is output at the front of the processed document, the document can be used as a report or the like simply by adding a cover.
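The numbering and page-number steps of the information addition unit 43 can be sketched in Python as follows, assuming the extracted lines arrive as (title, page) pairs. The name `make_toc`, the 40-column default width, and the dot leaders are illustrative choices, not details from the patent.

```python
def make_toc(entries, width=40):
    """Number each (title, page) entry and right-align its page number."""
    lines = ["Table of Contents", ""]                  # heading written at the top
    for number, (title, page) in enumerate(entries, start=1):
        left = f"{number}. {title}"                    # sequential numbering
        page_str = str(page)
        # a dot leader fills the gap; at least two dots even for long titles
        dots = "." * max(2, width - len(left) - len(page_str))
        lines.append(f"{left}{dots}{page_str}")
    return "\n".join(lines)
```

For example, `make_toc([("Introduction", 1), ("References", 12)], width=20)` yields lines such as `1. Introduction....1`, each padded to the chosen width.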

(3) Presentation support system
Presentations such as research talks are commonly given with overhead projectors (OHPs) or slides, but preparing the originals for these devices is laborious. It would be very convenient to create them automatically from ordinary documents at hand, for example when originals are needed at short notice. Noting that presentation originals consist mainly of item names, keywords, figures and tables, a presentation support system as shown in FIG. 11 can be realized. In this system, the sentence extraction unit 52 and the figure extraction unit 53 first extract sentences and figures, respectively, from the document entered through the original input unit 51, according to the input conditions for the originals. Next, the layout unit 54 enlarges the text so that each sheet holds about 10 lines, which keeps it easy to read when projected, and, immediately after each page on which a figure appears, enlarges the corresponding figure so that it fills one sheet. The resulting form is then printed by the output unit 55, for example with a laser beam printer, directly onto OHP film as finished originals. The user simply takes the printed originals and gives the presentation with the OHP. If the data of the originals is stored electronically in a result storage unit (not shown), the generated originals can be edited further and the edited originals output directly.
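The pagination performed by the layout unit 54 might look roughly like the sketch below. It assumes that sentences and figures have already been extracted into one ordered list, models a figure with a hypothetical `Figure` record, and ignores actual font scaling; only the two rules (about 10 lines per sheet, and each figure on its own sheet right after the page where it appears) come from the description above.

```python
from dataclasses import dataclass
from typing import List, Union

LINES_PER_SHEET = 10  # "about 10 lines per sheet", per the description


@dataclass
class Figure:
    name: str  # hypothetical identifier for an extracted figure


Item = Union[str, Figure]


def layout_sheets(items: List[Item], lines_per_sheet: int = LINES_PER_SHEET) -> List[List[Item]]:
    """Group extracted sentences into sheets of at most `lines_per_sheet` lines.
    Each figure is placed alone on its own sheet, immediately after the text
    sheet on which it appeared."""
    sheets: List[List[Item]] = []
    current: List[str] = []      # text lines of the sheet being filled
    pending: List[Figure] = []   # figures referenced on the current text sheet

    def flush() -> None:
        if current:
            sheets.append(list(current))
            current.clear()
        for fig in pending:
            sheets.append([fig])  # one enlarged figure per sheet
        pending.clear()

    for item in items:
        if isinstance(item, Figure):
            pending.append(item)
        else:
            if len(current) == lines_per_sheet:
                flush()  # sheet is full: emit it, then any figures it referenced
            current.append(item)
    flush()
    return sheets


items = [f"line {i}" for i in range(4)] + [Figure("fig1")] + [f"line {i}" for i in range(4, 12)]
for sheet in layout_sheets(items):
    print(sheet)
```

Deferring figures until their text sheet is closed is what places each figure sheet "immediately after" the page that mentions it.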

Transmissive image projection devices that project a computer's output screen directly onto a projection screen have also been developed recently. Combined with such a device, the system makes it possible to give a presentation without any hard copy.

[Effects of the invention]

As described above, the document generation device according to the present invention extracts specific sentences on the basis of generation conditions entered in advance and generates a new document from the extracted sentences. The output therefore concentrates the information of interest and can be used effectively as a high-value-added document, for example for translation or for understanding the original. Moreover, because the document is produced by relatively simple processing, the device is easier to implement than conventional machine translation systems and the like.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a document generation device according to the present invention; FIG. 2 is a block diagram showing the basic configuration of the document generation device; FIG. 3 is a system configuration diagram of the document generation device applied to a Japanese summary generation system; FIG. 4 is an explanatory diagram showing a display state on the screen of the display unit; FIG. 5 shows an example of an original text; FIG. 6 is an explanatory diagram showing an example of structuring a document; FIG. 7 is a flowchart showing the procedure for extracting target sentences in the information processing unit; FIG. 8 shows an example of a summary; FIG. 9 is a block diagram showing the basic configuration of a Japanese-English summary creation system; FIG. 10 is a block diagram showing the basic configuration of a table-of-contents creation system; and FIG. 11 is a block diagram showing the basic configuration of a presentation support system.

10: information processing unit; 11: original text storage unit; 12: generation condition input unit; 13: sentence extraction unit; 14: document generation unit; 15: original text input unit; 16: dictionary unit; 17: result storage unit; 18: original text scanning unit; 19: display unit; 20: printing unit.

Continuation of the front page
(72) Inventor: Yoshifumi Matsunaga, c/o Fuji Xerox Co., Ltd., Granforet, 3-57-6 Yoyogi, Shibuya-ku, Tokyo
(56) References cited: JP-A-62-271171 (JP, A); JP-A-63-89966 (JP, A); JP-A-63-298563 (JP, A); JP-A-60-138670 (JP, A); JP-A-63-163925 (JP, A)

Claims

(57) [Claims]

1. A document generation device comprising: storage means for storing a document; selection means for sequentially selecting paragraphs of the document stored in the storage means; first input means for inputting a predetermined format; second input means for inputting a keyword; first determination means for determining whether a paragraph of the document selected by the selection means consists of a single sentence; second determination means for determining, when the first determination means determines that the paragraph consists of a single sentence, whether the character sequence of that sentence matches the predetermined format; first extraction means for extracting the sentence when the second determination means determines that the character sequence matches the predetermined format; third determination means for determining, when the first determination means determines that the paragraph does not consist of a single sentence, whether the keyword is present in the paragraph; second extraction means for extracting one sentence containing the keyword when the third determination means determines that the keyword is present; and document generation means for generating a document that sequentially holds the sentences extracted by the first extraction means and the second extraction means.

2. The document generation apparatus according to claim 1, wherein said document is a structured document.
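Read procedurally, Claim 1 describes a simple loop over paragraphs. The sketch below is one possible reading, not the patented implementation: the "predetermined format" is modeled as a regular expression, sentence splitting is a naive punctuation-based assumption, and the function names `split_sentences` and `generate_document` are hypothetical. The comments map each step to the claimed means.

```python
import re
from typing import List


def split_sentences(paragraph: str) -> List[str]:
    """Naive sentence splitter (an assumption): break after '.', '!', '?'
    or the Japanese full stop, and drop empty pieces."""
    parts = re.split(r"(?<=[.!?\u3002])\s*", paragraph.strip())
    return [p for p in parts if p]


def generate_document(paragraphs: List[str], fmt: str, keyword: str) -> List[str]:
    """Select paragraphs in order and collect sentences as described in Claim 1."""
    pattern = re.compile(fmt)  # "predetermined format", modeled as a regex
    result: List[str] = []
    for para in paragraphs:                       # selection means
        sentences = split_sentences(para)
        if len(sentences) == 1:                   # first determination: single sentence?
            if pattern.fullmatch(sentences[0]):   # second determination: matches format?
                result.append(sentences[0])       # first extraction
        elif keyword in para:                     # third determination: keyword present?
            for s in sentences:                   # second extraction: the first
                if keyword in s:                  # sentence containing the keyword
                    result.append(s)
                    break
    return result                                 # document generation: kept in order


paragraphs = [
    "Chapter 1 Overview",
    "This device extracts sentences. A keyword appears here. Nothing else.",
    "No match in this one. Really none.",
    "Just one sentence.",
]
print(generate_document(paragraphs, r"Chapter \d+ .*", "keyword"))
```

Single-sentence paragraphs are typically headings, so testing them against a format while testing longer paragraphs for a keyword yields an outline-plus-highlights summary.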