JPH1185884A

JPH1185884A - Document processing system and record medium recorded with program

Info

Publication number: JPH1185884A
Application number: JP9246956A
Authority: JP
Inventors: Makoto Takimoto; 誠滝本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-09-11
Filing date: 1997-09-11
Publication date: 1999-03-30

Abstract

PROBLEM TO BE SOLVED: To always prepare a routine document with a routine format, even when documents of various formats are inputted. SOLUTION: This document processing system is provided with a key word detecting means 27 for detecting, from text data obtained from an inputted document 11, each key word coinciding with an item of the routine format; a unit extracting means 29a for extracting, as a unit, each detected key word in the text data together with the character string that follows it as item information; a key word position detecting means 28 for detecting the position within the routine format of each key word detected by the key word detecting means; and a unit writing means 29b for preparing the routine document by writing the unit corresponding to each key word at the position detected for it within the routine format by the key word position detecting means.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document processing system that uses the characters contained in externally input documents of various formats to create a fixed-form document having a fixed format composed of a plurality of items and the item information corresponding to each item, and to a recording medium on which a program for creating a fixed-form document having this fixed format is recorded.

[0002]

[Prior Art] In recent years, document processing systems that create various kinds of documents using computer systems such as workstations and personal computers have been developed. Among the documents such systems create are fixed-form documents, in which a plurality of items (required entries) are set in advance and which consist of the item information corresponding to each item. Specific examples of such fixed-form documents include resumes and receipts.

[0003] In one method of creating such a fixed-form document, a computer or word processor running document creation software is used and, as shown in FIG. 6, the fixed format 3 of the fixed-form document is displayed on the display screen 2a of the display device 2 of the computer 1, with input areas 4 for item information displayed on the fixed format 3. The operator then reads the content of the documents written on the sheets of paper 5 stacked beside the computer 1 and uses the keyboard 6 to enter the item information corresponding to each item into each input area 4.

[0004] In another method, the paper 5 is set in an optical character reader (hereinafter abbreviated as OCR), the item information of each item written on the paper 5 is read directly, and the read item information is written into the item information area for each item of a fixed format stored in advance. In this case, the format of the paper 5 must be stored in the OCR beforehand.

[0005]

[Problems to Be Solved by the Invention] However, the operator's manual input method and the method using an optical character reader (OCR) described above still have the following problems. In the method in which the operator enters data using a keyboard or a pointing device, the creation efficiency and the creation accuracy, such as the probability of entering an item incorrectly, vary with the operator's skill.

[0006] In the method using an OCR, the format of the paper 5 must be stored in the OCR in advance, as described above. Consequently, whenever the format of the paper 5 changes, for example through a layout change or the addition or deletion of an item, the format information registered in the OCR must be updated at the same time.

[0007] Furthermore, with an OCR that needs format information for each format of the paper 5, when information is taken in from documents that have no nationally standardized format, such as resumes and receipts, and compiled into one and the same fixed-form document, the format of each document on the paper 5 must be registered in the OCR, and the operator must select and set the applicable format every time the OCR reads a document on the paper 5. This operation is extremely cumbersome and impractical.

[0008] The present invention has been made in view of these circumstances. Its object is to provide a document processing system that, by detecting keywords contained in a document, can determine at which position of the fixed format each character of each document should be written even if the format of the input documents is not uniform, and can thereby create a fixed-form document having the desired fixed format from the input document efficiently and accurately, and to provide a recording medium on which a program for creating this fixed-form document is recorded.

[0009]

[Means for Solving the Problems] The present invention is a document processing system that uses the characters contained in externally input documents of various formats to create a fixed-form document having a fixed format composed of a plurality of items and the item information corresponding to each item.

[0010] The system comprises: keyword detection means for detecting, from text data obtained from the input document, each keyword that matches an item of the fixed format; unit extraction means for extracting, as a unit, each detected keyword in the text data together with the character string that follows it as item information; keyword position detection means for detecting the position within the fixed format of each keyword detected by the keyword detection means; and unit writing means for creating the fixed-form document by writing the unit corresponding to each keyword at the position detected for it within the fixed format by the keyword position detection means.

[0011] The operating principle by which a document processing system configured in this way can create a fixed-form document is as follows. A fixed-form document created by this document processing system consists of a plurality of items and the item information that follows each item. The fixed format, which defines the document layout, contains the position information on the fixed-form document of each item and of the item information that follows it.

[0012] The format of the documents input to this document processing system, on the other hand, is not uniform. Nevertheless, each item needed to create the fixed-form document and the item information that follows it are always present in the input document, although their positions are not specified. Therefore, by taking as a keyword each independent block of characters in the input document that matches an item, the position of that keyword on the fixed format can be detected.

[0013] Moreover, the character string that follows a keyword in the input document can be regarded as item information. The keyword and the character string that follows it are therefore defined as one unit; extracting this unit from the input document and writing it at the detected position on the fixed format yields one item and its item information in the fixed-form document.

[0014] By performing the above processing on every keyword in the input document that can be regarded as an item, one complete fixed-form document is created from an input document whose format is unspecified.
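
The operating principle of paragraphs [0011] to [0014] can be illustrated with a minimal Python sketch. It is not the patented implementation: the `FIXED_FORMAT` dictionary, the (row, column) positions, and the function names `extract_units` and `write_units` are all hypothetical, and the input is assumed to have already been split into an ordered list of text blocks.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed format: item name -> (row, column) of its slot.
FIXED_FORMAT: Dict[str, Tuple[int, int]] = {
    "name": (0, 0),
    "address": (1, 0),
    "telephone number": (2, 0),
}


def extract_units(blocks: List[str],
                  items: Dict[str, Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Pair each block that matches an item with the block that follows it."""
    units = []
    for i, block in enumerate(blocks):
        has_next = i + 1 < len(blocks)
        if block in items and has_next and blocks[i + 1] not in items:
            units.append((block, blocks[i + 1]))
    return units


def write_units(units: List[Tuple[str, str]],
                fixed_format: Dict[str, Tuple[int, int]]
                ) -> Dict[Tuple[int, int], Tuple[str, str]]:
    """Place each unit at the position the fixed format assigns to its keyword."""
    return {fixed_format[keyword]: (keyword, info) for keyword, info in units}


# Input document whose items appear in a different order from the fixed format.
blocks = ["telephone number", "555-1234", "name", "Nihon Jiro",
          "address", "Setagaya-ku, Tokyo"]
document = write_units(extract_units(blocks, FIXED_FORMAT), FIXED_FORMAT)
for position in sorted(document):
    print(position, document[position])
```

Because each unit is placed by looking up its keyword in the fixed format, the order of the items in the input does not affect the layout of the result.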

[0015] The document processing system of claim 2 adds to the document processing system described above image conversion means for converting an input document into image data, and character recognition means for converting the image data produced by the image conversion means into text data.

[0016] In a document processing system configured in this way, even if the input document is not one created with a word processor or spreadsheet software but one printed or handwritten on paper, the input document is converted into image data and then into text data.

[0017] From that point on, one complete fixed-form document is created from the input document of unspecified format by the same processing procedure as in the document processing system described above. In the document processing system of claim 3, the unit extraction means of the document processing system described above extracts a unit that holds one or more keywords belonging to a given keyword, together with the character strings that follow these keywords as item information.

[0018] With this configuration, when the items of the fixed format and the items contained in the input document do not match exactly, for example when several items in the input document together carry the meaning of a single item of the fixed format, one piece of item information ends up attached to these two consecutive items.

[0019] In this case it is therefore appropriate to process the consecutive keywords and the item information that follows them as one unit, which makes creation of the fixed-form document from the input document more reliable.
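
The grouping of consecutive keywords into one unit can be sketched as follows. This is only an illustration under assumptions: the function name `extract_grouped_units`, the keyword set, and the sample "contact" keyword are hypothetical, and the source does not say how a run of keywords is recognized beyond their being consecutive.

```python
from typing import List, Set, Tuple


def extract_grouped_units(blocks: List[str],
                          keywords: Set[str]) -> List[Tuple[List[str], str]]:
    """Group each run of consecutive keywords with the item information after it."""
    units: List[Tuple[List[str], str]] = []
    pending: List[str] = []  # consecutive keywords still waiting for item information
    for block in blocks:
        if block in keywords:
            pending.append(block)
        elif pending:
            units.append((pending, block))
            pending = []
        # A non-keyword block with no pending keyword is skipped in this sketch,
        # and keywords left pending at the end of the input are dropped.
    return units


keywords = {"name", "contact", "telephone number"}
blocks = ["name", "Nihon Jiro", "contact", "telephone number", "555-1234"]
units = extract_grouped_units(blocks, keywords)
print(units)
```

Here "contact" and "telephone number" share the single value "555-1234", so they are kept together in one unit rather than producing a keyword with no item information.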

[0020] In the document processing system of claim 4, when the keyword detection means of the document processing system described above fails to detect keywords for all items of the fixed format, a unit is created using a keyword entered by operator input.

[0021] With this configuration, when the input document contains no item corresponding to an item of the fixed format but does contain the item information for that item, the operator enters the keyword corresponding to the item, whereby a unit is formed, and the fixed-form document is created using this unit.

[0022] The invention of claim 5 is a recording medium on which is recorded a computer-readable program that uses the characters contained in externally input documents of various formats to create a fixed-form document having a fixed format composed of a plurality of items and the item information corresponding to each item.

[0023] The program recorded on this recording medium causes a computer to: detect, from text data obtained from the input document, each keyword that matches an item of the fixed format; extract, as a unit, each detected keyword in the text data together with the character string that follows it as item information; detect the position of each detected keyword within the fixed format; and create the fixed-form document by writing the unit corresponding to each keyword at the position detected for it within the fixed format.

[0024] By using a recording medium on which such a program is recorded, even a general document processing system that lacks the functions described above can, simply by having this recording medium installed, automatically create a fixed-form document from an input document of unspecified format.

[0025]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a document processing system according to the embodiment. This document processing system is implemented on a kind of information processing apparatus such as a computer.

[0026] Paper 12 bearing an externally input document 11 of any of various formats is set in the image scanner 13, which serves as the image conversion means. The document 11 on the paper 12 is converted into one sheet's worth of image data, which is sent both to the input unit 14 and to the character recognition unit 15. The character recognition unit 15 recognizes the shape (pattern) of each character contained in the sheet's image data, converts it into one sheet's worth of text data, and sends the text data to the input unit 14.

[0027] The input unit 14 stores the received sheet of image data in the image data memory 16 and the received sheet of text data in the text data memory 17. The input unit 14 also receives the various data and operation commands that the user (operator) enters from the operation unit 18, such as a keyboard or mouse.

[0028] The main storage of this document processing system holds: a fixed format memory 19 that stores the fixed format, which indicates the positions on one sheet of the fixed-form document of each item and of the item information that follows it in the fixed-form document to be finally created; an item (keyword) memory 20 that stores each item (keyword) contained in that fixed format or in the final fixed-form document; and a fixed-form document data memory 21 that temporarily stores the created fixed-form document as text data.

[0029] The main storage further holds a display memory 24, in which a fixed-form image area 24a and an input image area 24b to be shown in windows on the display device 23 are formed, and an output buffer 22 for outputting the fixed-form document stored in the fixed-form document data memory 21 to the display device 23 or the printer 25.

[0030] The display device 23 displays the created fixed-form document, and the printer 25 prints it on paper 26. In addition, the document processing system implemented on this information processing apparatus includes a keyword detection unit 27, a keyword position detection unit 28, a unit extraction unit 29a, and a unit writing unit 29b, each formed as a program module within an application program recorded on a recording medium such as an HDD. The keyword detection unit 27 consists of a temporary keyword setting unit 27a and a keyword determination unit 27b.

[0031] The following description takes as an example the case in which the input document 11 is a resume written in any of various formats and this resume document 11 is made into a fixed-form document having a preset fixed format.

[0032] As shown in FIG. 2, the fixed format memory 19 stores, for the one-sheet (one-page) resume that is the fixed-form document to be created, the positions of a plurality of items 30 (name, address, telephone number, date of birth, family members living together, relationship, name, and so on) and of the frames 31a into which the actual item information 31 following each item 30 is written, that is, character strings such as "Nihon Taro", "Setagaya-ku, Tokyo ...", and "555-1234".

[0033] As shown in FIG. 2, the item (keyword) memory 20 stores the items 30 of the fixed-form document or of the fixed format memory 19 described above, such as name, address, telephone number, date of birth, and family members living together.

[0034] The fixed-form image area 24a of the display memory 24 stores each item 30 of the fixed format and the frames 31a for entering item information, and the input image area 24b stores the image data of the input document 11 held in the image data memory 16. The image data in the two image areas 24a and 24b of the display memory 24 are displayed simultaneously on the screen of the display device 23. This display is used when the fixed-form document could not be created automatically, for example because a keyword (item) was not found, and the user (operator) operates the operation unit 18 to set a keyword manually or to specify the position on the fixed format at which a unit is to be written.

[0035] The operations of the image scanner 13, character recognition unit 15, keyword detection unit 27, keyword position detection unit 28, unit extraction unit 29a, and unit writing unit 29b in this configuration are described below with reference to the flowcharts of FIGS. 4 and 5.

[0036] In the flowchart of FIG. 4, the user (operator) sets the paper 12 bearing the document 11 in the image scanner 13. The image scanner 13 converts the document 11 on the paper 12 into image data and sends it to the character recognition unit 15 and to the input unit 14 (step S1). The input unit 14 writes the image data into the image data memory 16.

[0037] The character recognition unit 15 performs character recognition on the one sheet (one page) of image data, converts it into text data, and sends the text data to the input unit 14 (S2). The input unit 14 divides the received text data into blocks, one per ruled-line frame, and temporarily writes them to the text data memory 17. As shown in FIG. 2, the text data memory 17 therefore holds, converted to character codes, a plurality of blocked character strings: the items 30 contained in the document 11, such as "name" and "address", and the item information 31 that follows each item 30, such as "Nihon Jiro" and "Setagaya-ku, Tokyo ...".

[0038] Next, the temporary keyword setting unit 27a of the keyword detection unit 27 starts and defines each of these blocked character strings in the text data as a temporary keyword (S3).

[0039] Next, the image data of the input document 11 in the image data memory 16 is expanded into the input image area 24b of the display memory 24 and displayed in a window on the screen of the display device 23 (S4).

[0040] Next, the keyword determination unit 27b starts and takes out one of the temporary keywords, each of which represents one of the blocks defined earlier (S4a). It then performs the process of determining the proper keyword for that temporary keyword, that is, the process of searching for the keyword corresponding to an item 30 contained in the fixed-form document (S5).

[0041] In this keyword search process, as shown in FIG. 5, it is first determined whether the number of characters in the temporary keyword is larger or smaller than that of the longest item (keyword) stored in the item (keyword) memory 20 (referred to as the keyword MAX character count) (S20). If the temporary keyword is longer than the keyword MAX character count, it cannot possibly be a keyword (item), so the keyword determination process for this temporary keyword ends and processing returns to S4a to read the next temporary keyword.

[0042] If the number of characters in the temporary keyword does not exceed the keyword MAX character count, the leading character of the temporary keyword is extracted (S21), and the temporary keyword is compared with an item (keyword) in the item (keyword) memory 20 that has the same leading character to check whether the two are identical (S22, S23).

[0043] If the temporary keyword and the keyword do not match, the item (keyword) memory 20 is searched for the next candidate keyword (item) (S25), and S22 and S23 are repeated as long as candidates remain.

[0044] If the temporary keyword matches an item (keyword) read from the item (keyword) memory 20, the temporary keyword in the text data is set as a keyword (S24) and stored in memory (S27).

[0045] When no further candidates remain, the temporary keyword is set as item information 31 following a keyword (item) (S26). Thus, as shown for the text data memory 17 in FIG. 2, the character strings set as keywords from among the temporary keywords in the text data are "name", "address", "telephone number", "date of birth", and "family members living together". Although they are not treated as keywords in FIG. 2, "relationship" and "name" in the lower part of the text data, while not main keywords, can be regarded as keywords (items) from the standpoint of the item information 31 "father" and "Nihon Ichiro". In such a case, a function may be provided that gives keywords a parent-child relationship, with "family members living together" as the main keyword and "relationship" and "name", which follow it, as sub-keywords.
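
The keyword determination flow of FIG. 5 (S20 to S27) can be sketched in Python as follows. This is a rough illustration, not the patent's implementation: the function name `classify_blocks` and the contents of `ITEM_MEMORY` are hypothetical, and labeling an over-length block as item information is an assumption, since the text only says that processing moves on to the next temporary keyword in that case.

```python
from typing import List, Tuple

# Hypothetical contents of the item (keyword) memory.
ITEM_MEMORY = ["name", "address", "telephone number", "date of birth",
               "family members living together"]


def classify_blocks(blocks: List[str],
                    item_memory: List[str]) -> List[Tuple[str, str]]:
    """Label each temporary keyword as a keyword or as item information."""
    max_len = max(len(item) for item in item_memory)  # keyword MAX character count
    result = []
    for temp in blocks:                     # S4a: take one temporary keyword
        label = "item information"          # S26: default when nothing matches
        if temp and len(temp) <= max_len:   # S20: longer strings cannot be keywords
            head = temp[0]                  # S21: leading character
            for candidate in item_memory:   # S25: move to the next candidate
                # S22, S23: compare only candidates sharing the leading character
                if candidate[0] == head and candidate == temp:
                    label = "keyword"       # S24: set as keyword
                    break
        result.append((temp, label))        # S27: store the result
    return result


labels = classify_blocks(
    ["name", "Nihon Jiro", "address", "Setagaya-ku, Tokyo"], ITEM_MEMORY)
for block, label in labels:
    print(f"{block}: {label}")
```

The length check and the leading-character check only reduce the number of full string comparisons; the final decision still rests on an exact match against an entry in the item memory.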

[0046] Next, the keyword position detection unit 28 starts, processing returns to S6 of FIG. 4, and the process of determining which position on the fixed format the set keyword (item) corresponds to is performed.

[0047] The case in which all the items needed to fix the desired position on the fixed format are available is called "confirmed" (S8), and processing proceeds to the unitization process of S13. The case in which the current keyword alone is not enough to fix the desired position on the fixed format but there are no other candidate positions, that is, the case in which keywords have not been detected for all items of the fixed format, is called "approximately confirmed" (S7); the user then uses the operation unit 18 to enter the missing keyword (item) and item information 31 from the displayed image data, and processing proceeds to the unitization process of S13.

[0048] The case in which there are several candidates for the desired position on the fixed format and no single one can be chosen is called "undeterminable" (S9). In this case the user either re-inputs the document 11 through the image scanner 13 or uses the operation unit 18 to select the position and enter the item information (S11, S12).
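
The three outcomes of position determination (S7, S8, S9) can be modeled as a small classification, sketched below. The names `PositionStatus`, `candidate_positions`, `classify_position`, and the `keyword_sufficient` flag are hypothetical; the source does not state how sufficiency is decided, and treating zero candidate positions as undeterminable is an added assumption.

```python
from enum import Enum
from typing import List, Tuple


class PositionStatus(Enum):
    CONFIRMED = "confirmed"                  # S8
    APPROXIMATE = "approximately confirmed"  # S7
    UNDETERMINABLE = "undeterminable"        # S9


def candidate_positions(keyword: str,
                        slots: List[Tuple[str, Tuple[int, int]]]
                        ) -> List[Tuple[int, int]]:
    """Return every position in the fixed format whose item equals the keyword."""
    return [position for item, position in slots if item == keyword]


def classify_position(candidates: List[Tuple[int, int]],
                      keyword_sufficient: bool) -> PositionStatus:
    """Map the candidate positions for one keyword to one of the three outcomes."""
    if len(candidates) != 1:       # several candidates (or, by assumption, none)
        return PositionStatus.UNDETERMINABLE
    if keyword_sufficient:         # everything needed to fix the position is present
        return PositionStatus.CONFIRMED
    return PositionStatus.APPROXIMATE


# "name" appears twice in the resume format: the applicant's and a family member's.
SLOTS = [("name", (0, 0)), ("address", (1, 0)), ("name", (6, 1))]
print(classify_position(candidate_positions("name", SLOTS), True))
print(classify_position(candidate_positions("address", SLOTS), True))
```

In the resume example, the repeated "name" item is exactly the kind of case in which the operator would have to choose the position by hand.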

[0049] When position detection on the fixed format has finished for one keyword, the unit extraction unit 29a starts, defines the keyword in the text data of the text data memory 17 and the item information consisting of the character string that follows it as one unit, and extracts this unit (S13). This process is called unitization, and the unitized text data is stored as a unit with its parent-child relationship preserved.

[0050] The unit writing unit 29b writes the extracted unit at the position (item) on the fixed format of that unit's keyword and stores the result in the fixed-form document data memory 21 (S14). It also sends the result to the output buffer 22.

[0051] The unit writing unit 29b then expands the image data corresponding to the unit data stored in the output buffer 22 into the fixed-form image area 24a of the display memory 24 and displays it in a window on the display device 23 (S15).

[0052] This completes the writing into the fixed-form document of one keyword (item 30) and the item information 31 that follows it. If unread keywords remain in the image data of the scanned document 11 (S16), processing returns to S4a and reading of the next temporary keyword begins.

[0053] By repeating the processing from S4a to S15 in this way, each item 30 (keyword) extracted from an input document 11 of any of various formats and the item information 31 that follows it can be written automatically at the position of the corresponding item in the fixed format.

[0054] Each item 30 and its item information 31 in the input document 11 can thus be written as a unit at its position in the fixed format, so that the desired fixed-form document is created.

[0055] At the user's request, the fixed-form document expanded in the fixed-form image area 24a can be printed by the printer 25. The created fixed-form document can also be saved in a file device and recalled as needed for display or printing. Although the fixed-form document format, the items 30, and the item information 31 are displayed in a window at S15, the display is not limited to this timing and may instead be made when the writing position is determined at S7 and S8.

[0056] The present invention is not limited to the embodiment described above. In the system of the embodiment, the document 11 is input in a state printed or handwritten on the paper 12. However, a document created with, for example, a word processor or computer software can also be input directly in text data form. In this case, the image scanner 13 and the character recognition unit 15 in FIG. 1 are unnecessary, and the text data of the input document is written directly into the text data memory 17 via the input unit 14.
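The two input paths can be sketched as a single entry point. This is a hedged illustration only: `to_text` and its `recognize` parameter are hypothetical names, with `recognize` standing in for the character recognition unit 15 and `bytes` standing in for scanned image data.

```python
from typing import Callable, Optional, Union


def to_text(document: Union[str, bytes],
            recognize: Optional[Callable[[bytes], str]] = None) -> str:
    """Return the text data to be stored for keyword detection."""
    if isinstance(document, str):
        # Text input: scanner and character recognition are bypassed.
        return document
    if recognize is None:
        raise ValueError("image input needs a character recognizer")
    # Image input: convert to text through character recognition.
    return recognize(document)
```

Everything after this point (keyword detection, unit extraction, writing) works on the returned text and does not need to know which path produced it.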

[0057] The method described in the above embodiment can be written, as a program executable by a computer, to a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory and applied to various devices, or it can be transmitted over a communication medium and applied to various devices. A computer that implements this apparatus reads the program recorded on the recording medium and executes the processing described above with its operation controlled by the program.

[0058]

[Effects of the Invention] As described in detail above, according to the present invention, a keyword corresponding to each item contained in documents of various formats is detected, the keyword and the item information that follows it are defined as a unit, and the unit is written at the keyword position determined within the fixed format.

[0059] Therefore, even if the formats of the input documents vary, the system can determine where in the fixed format each character of each document should be written, and a standard document having the desired fixed format can be created from the input document efficiently and accurately. In addition, there is no need to manage format information for the input documents; only the fixed format needs to be managed, which simplifies document format management.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a document management system according to an embodiment of the present invention.

FIG. 2 is a diagram showing the storage contents of the memories in the document management system.

FIG. 3 is a diagram showing the storage contents of the display memory in the document management system.

FIG. 4 is a flowchart showing the overall operation of the document management system.

FIG. 5 is a flowchart showing the keyword search operation of the document management system.

FIG. 6 is a schematic diagram for explaining a conventional method of creating a standard document.

[Explanation of symbols]

11 ... Document
12 ... Paper
13 ... Image scanner
14 ... Input unit
15 ... Character recognition unit
16 ... Image data memory
17 ... Text data memory
18 ... Operation unit
19 ... Standard format memory
20 ... Item (keyword) memory
22 ... Output buffer
23 ... Display device
24 ... Display memory
24a ... Standard image area
24b ... Input image area
27 ... Keyword search unit
27a ... Temporary keyword setting unit
27b ... Keyword determination unit
28 ... Keyword position detection unit
29a ... Unit extraction section
29b ... Unit writing section
30 ... Item
31 ... Item information

Claims

[Claims]

1. A document processing system that creates a standard document having a fixed format composed of a plurality of items and item information corresponding to each item, using the characters contained in externally input documents of various formats, the system comprising: keyword detection means for detecting, from text data obtained from the input document, each keyword that matches an item of the fixed format; unit extraction means for extracting, as a unit, each detected keyword in the text data together with the character string that follows it as the item information; keyword position detection means for detecting the position within the fixed format of each keyword detected by the keyword detection means; and unit writing means for creating the standard document by writing the unit corresponding to each keyword at the position within the fixed format detected by the keyword position detection means.

2. A document processing system that creates a standard document having a fixed format composed of a plurality of items and item information corresponding to each item, using the characters contained in externally input documents of various formats, the system comprising: image conversion means for converting the input document into image data; character recognition means for converting the image data produced by the image conversion means into text data; keyword detection means for detecting, from the text data produced by the character recognition means, each keyword that matches an item of the fixed format; unit extraction means for extracting, as a unit, each detected keyword in the text data together with the character string that follows it as the item information; keyword position detection means for detecting the position within the fixed format of each keyword detected by the keyword detection means; and unit writing means for creating the standard document by writing the unit corresponding to each keyword at the position within the fixed format detected by the keyword position detection means.

3. The document processing system according to claim 1, wherein the unit extraction means extracts a unit comprising a given keyword, one or more keywords belonging to that keyword, and the character strings that follow these keywords as the item information.

4. The document processing system according to claim 1 or 2, wherein, when the keyword detection means does not detect keywords for all items of the fixed format, the unit is created using a keyword entered by an input operation.

5. A computer-readable recording medium on which is recorded a program for creating a standard document having a fixed format composed of a plurality of items and item information corresponding to each item, using the characters contained in externally input documents of various formats, the program causing a computer to: detect, from text data obtained from the input document, each keyword that matches an item of the fixed format; extract, as a unit, each detected keyword in the text data together with the character string that follows it as the item information; detect the position within the fixed format of each detected keyword; and create the standard document by writing the unit corresponding to each keyword at the detected position within the fixed format.