JPH0464188A

JPH0464188A - Method and device for generating document format

Info

Publication number: JPH0464188A
Application number: JP2174838A
Authority: JP
Inventors: Kiichiro Watanabe; 起一郎渡邊
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-07-02
Filing date: 1990-07-02
Publication date: 1992-02-28

Abstract

PURPOSE: To reduce workload and prevent mis-setting of position information by setting the read mode of each field in character-frame units and converting the setting information and the detection information into a document format. CONSTITUTION: The image data of an input document is first stored in image memory 1; a document position detecting part 2 then reads it out and detects the document position. A field position detecting part 3 extracts the frame lines of the character frames and detects the position of each field in character-frame units. Next, a character recognition part 4 recognizes the character in each character frame, and a read mode setting part 5 sets the read mode based on the recognition result. Finally, the detection information from the document position detecting part 2, the detection information from the field position detecting part 3, and the setting information from the read mode setting part 5 are input to an output information conversion part 6 and converted into the document format. The workload is thereby reduced and mis-setting of position information is prevented.

Description

[Detailed description of the invention]

[Table of contents]
Overview
Industrial application field
Conventional technology
Problem to be solved by the invention
Means to solve the problem (Fig. 1)
Operation
Example (Figs. 2 to 5)
Effect of the invention

[Summary]

The invention relates to a form format creation method and a form format creation device, and aims to reduce the work required to create a form format and to prevent errors in setting position information. The image data of an input form is stored in memory, and the position of the form is detected from the stored image data. At the detected position, the character frame lines in each field are extracted so that the position of each field is detected in character-frame units. The characters in each detected field are recognized in character-frame units and, based on the recognition results, the read mode in each field is set in character-frame units. The setting information and the detection information are then converted into a form format, thereby creating the form format.

[Industrial application field]

The present invention relates to a form format creation method and a form format creation device. More specifically, it relates to a method and device used when a character recognition device recognizes characters on a form, and in particular to ones that can automatically create the format of the form to be recognized.

[Conventional technology]

Conventionally, forms in a variety of formats have been used for automatic reading by character reading devices.

To read such a form, its format must be created in advance. Conventionally, form formats have been created by the following methods.

(1) Reading information for the target form, such as the number of fields, field positions, number of characters to read in each field, reading font (handwritten or printed), and field read mode, is written on a fixed form for format creation, which is then read by the character reading device.

(2) The reading information is entered directly through an MMI (Man Machine Interface).

[Problem to be solved by the invention]

The conventional methods described above have the following drawbacks.

(1) The reading information to be entered is normally documented in a manual, so the operator must carry out a complicated task while consulting the manual.

As a result, entering the reading information takes a long time and work efficiency is poor.

(2) Errors in measuring and converting position information, such as field positions, occur easily.

The object of the present invention is to eliminate these drawbacks, reduce the work required to create a form format, and prevent errors in setting position information.

[Means to solve the problem]

FIG. 1 is a diagram showing the principle of the present invention. In the figure, 1 is an image memory, 2 is a form position detection unit, 3 is a field position detection unit, 4 is a character recognition unit, and 5 is a read mode setting unit.

To achieve the above object, the present invention is configured as follows.

(1) A form format creation method characterized in that: image data of an input form is stored in memory; the position of the form is detected from the stored image data; at the detected position, the character frame lines in each field are extracted so that the position of each field is detected in character-frame units; the characters in each detected field are recognized in character-frame units; based on the recognition results, the read mode in each field is set in character-frame units; and the setting information and each item of detection information are converted into a form format, thereby creating the form format.

(2) A form format creation device characterized by comprising: an image memory 1 that stores image data of an input form; a form position detection unit 2 that detects the form position from the image data in the image memory 1; a field position detection unit 3 that detects field positions in character-frame units by extracting, at the detected form position, the frame lines of the character frames in each field; a character recognition unit 4 that recognizes the characters in each detected field in character-frame units; a read mode setting unit 5 that sets the read mode in each field in character-frame units based on the recognition results of the character recognition unit 4; and an output information conversion unit 6 that converts the setting information of the read mode setting unit 5 and each item of detection information into a form format, wherein the form format output is obtained from the output of the output information conversion unit 6.

[Operation]

Configured as described above, the present invention operates as follows.

Image data of a form input with an image scanner or the like is first stored in the image memory 1. The form position detection unit 2 then reads out the image data in the image memory 1 and detects the form position.

Detecting the form position yields the form size.

Next, the field position detection unit 3 extracts the frame lines of the character frames using ruled line extraction processing and detects the position of each field in character-frame units.

The character recognition unit 4 then recognizes the character in each detected character frame, and the read mode setting unit 5 sets the read mode based on the recognition result.

Finally, the output information conversion unit 6 receives the detection information from the form position detection unit 2, the detection information from the field position detection unit 3, and the setting information from the read mode setting unit 5, and converts them into a form format.

In this way, a form format can be created automatically simply by writing characters of the type to be recognized in the character frames of the form whose format is to be created and inputting the image data of that form.

[Example]

An embodiment of the present invention is described below with reference to the drawings.

FIGS. 2 to 5 show one embodiment of the present invention: FIG. 2 is an example of a form, FIG. 3 shows examples of character fields, FIG. 4 is a block diagram of the form format creation device, and FIG. 5 is a flowchart of the form format creation process.

In the figures, the same reference numerals as in FIG. 1 denote the same parts. In addition, 7 is an image scanner, 8 is a CPU, and 9 is a ruled line extraction unit.

This embodiment uses a form like the one shown in FIG. 2. There are five rows to process, numbered (1) to (5). Only row (1) contains two fields; each of the other rows contains one field.

The number of characters per row is 9 for row (1), 3 for row (2), 4 for row (3), 4 for row (4), and 4 for row (5).

The number of characters to read per field is 4 for the first field of row (1), 5 for the second field of row (1), 3 for the field of row (2), 4 for the field of row (3), 4 for the field of row (4), and 4 for the field of row (5).

Examples of character fields are shown in FIG. 3.

In FIG. 3(A), the kana "あいうえお" are written in every character frame of the field. Recognizing these characters identifies the field as a kana field.

In FIG. 3(B), the first two character frames are blank and the rest contain the digits "900". When a field contains blanks like this and the read mode within the field is otherwise uniform, the blanks are given the same read mode. How blanks are specified can be chosen freely.

In FIG. 3(C), the first two characters are the kana "あい" and the remaining three are the digits "123". Mixing character types within a field in this way can also be used to specify subfield division.

In FIG. 3(D), "A" denotes alphabetic only, "N" numeric only, and "K" kana only, and the rest is blank (a blank follows the character to its left).

In such a case, assigning read modes in advance to characters that are rarely misread, such as letters and digits, makes it possible to specify mixed characters.
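The read-mode rules illustrated by FIG. 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation described in the patent: the mode names ("kana", "numeric", "alpha"), the function names, and the optional `markers` mapping for designator letters as in FIG. 3(D) are all introduced here for illustration.

```python
from itertools import groupby

def classify(ch, markers=None):
    """Map one recognized character to a hypothetical read mode (None = blank)."""
    if markers and ch in markers:
        return markers[ch]                 # designator letter, as in FIG. 3(D)
    if not ch or ch.isspace():
        return None                        # blank character frame
    if "\u3040" <= ch <= "\u30ff":
        return "kana"                      # hiragana and katakana blocks
    if ch.isdigit():
        return "numeric"
    if ch.isascii() and ch.isalpha():
        return "alpha"
    return None

def set_read_modes(chars, markers=None):
    """Assign a read mode to every character frame of one field."""
    modes = [classify(c, markers) for c in chars]
    known = {m for m in modes if m is not None}
    if len(known) == 1:
        # Uniform field: blanks take the same mode, as in FIG. 3(B).
        return [next(iter(known))] * len(modes)
    for i in range(1, len(modes)):
        if modes[i] is None:
            # Mixed field: a blank follows the character to its left, as in FIG. 3(D).
            modes[i] = modes[i - 1]
    return modes

def subfields(modes):
    """Split a field into runs of equal read mode, as in FIG. 3(C)."""
    return [(mode, len(list(run))) for mode, run in groupby(modes)]
```

For the field of FIG. 3(C), `subfields(set_read_modes(list("あい123")))` yields one kana run of two frames followed by one numeric run of three frames.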

The device shown in FIG. 4, for example, is used as the form format creation device.

The image scanner 7 reads the form to be processed, such as the one shown in FIG. 2. The CPU 8 stores the image data read by the image scanner in the image memory 1. The basic information needed to create the form format is then obtained from the image data stored in the image memory 1.

The form position detection unit 2 detects the top, bottom, left, and right edges in the form image data held in the image memory 1, and thereby detects the paper size (form height and form width).
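The edge detection described above can be sketched as a bounding-box search. The patent does not say how the edges are found, so this is only one plausible approach; it assumes a binarized image given as a list of rows in which 1 marks a pixel belonging to the form and 0 marks the scanner background, and the function names are hypothetical.

```python
from typing import List, Tuple

def detect_form_bounds(image: List[List[int]]) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) pixel indices of the form region."""
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    if not rows or not cols:
        raise ValueError("no form pixels found")
    return rows[0], rows[-1], cols[0], cols[-1]

def form_size(image: List[List[int]]) -> Tuple[int, int]:
    """Return (form height, form width) in pixels."""
    top, bottom, left, right = detect_form_bounds(image)
    return bottom - top + 1, right - left + 1
```

Converting the pixel counts to physical paper dimensions would additionally require the scan resolution, which is not covered here.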

The field position detection unit 3 detects field positions by extracting the frame lines of the character frames in each field. The frame lines themselves are extracted by ruled line extraction processing in the ruled line extraction unit 9.

For forms whose character frames are printed in a dropout color, an image scanner capable of detecting the dropout color is used as the image scanner 7 so that the character frames can be extracted.

The ruled line extraction processing detects the coordinates of the end points of the vertical lines (the left and right sides of a character frame) and the horizontal lines (the top and bottom sides of a character frame) on the form.
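One simple way to obtain line end points is to scan for long runs of dark pixels. The sketch below is an illustration only: the patent does not specify the ruled line extraction algorithm, and the binary image layout (1 = dark pixel), the `min_len` threshold, and the function names are assumptions.

```python
def horizontal_runs(image, min_len):
    """Return (y, x_start, x_end) for every run of 1s at least min_len pixels long."""
    segments = []
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_len:
                    segments.append((y, start, x - 1))
            else:
                x += 1
    return segments

def vertical_runs(image, min_len):
    """Return (x, y_start, y_end) for vertical lines by scanning the transposed image."""
    transposed = [list(col) for col in zip(*image)]
    return horizontal_runs(transposed, min_len)
```

Setting `min_len` well above the stroke length of handwritten characters keeps character strokes from being mistaken for frame lines; real scans would also need tolerance for skew and broken lines.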

Field positions are determined by taking into account the spacing between character frames within a field.

If forms are designed so that the minimum spacing between fields is at least the maximum spacing between character frames within a field, then knowing that minimum spacing in advance makes it possible to detect the field positions (row center position and the left-end and right-end coordinates of each field).
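The spacing rule above amounts to splitting a row of character frames wherever the gap reaches the known minimum field spacing. A minimal sketch, assuming each character frame on a row is given as a hypothetical `(left, right)` pair of x coordinates:

```python
def group_into_fields(frames, min_field_gap):
    """Group the character frames of one row into fields.

    frames: list of (left, right) x-extents of character frames.
    A gap of min_field_gap or more starts a new field. If the minimum field
    gap equals the maximum in-field gap, that boundary case is ambiguous.
    """
    fields = []
    for left, right in sorted(frames):
        if fields and left - fields[-1][-1][1] < min_field_gap:
            fields[-1].append((left, right))
        else:
            fields.append([(left, right)])
    return fields

def field_extents(fields):
    """Return the (left end, right end) coordinates of each field."""
    return [(field[0][0], field[-1][1]) for field in fields]
```

With nine frames spaced 2 apart except for one gap of 14, and `min_field_gap=10`, the row splits into a 4-character field and a 5-character field, matching row (1) of the form in FIG. 2.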

Detecting the field positions with the field position detection unit 3 yields the following information.

a. Number of rows to process: determined from the row center positions of the fields.

b. Row number: assigned in order from the top row.

c. Number of fields in a row: fields whose row center lines have the same coordinate, numbered in order from the left.

d. Number of characters in a row: detected from the left and right sides of the character frames. This can be done by counting only the left sides or only the right sides.

e. Field number: assigned to the fields in a row in order from the left.

f. Number of characters to read in a field: detected in the same way as the number of characters in a row.

g. Character number: assigned to the character frames in a field in order from the left.

h. Reading font (handwritten or printed): distinguished by, for example, the size of the character frame.
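Items a, b, and d of the list above can be sketched as a grouping of character frames by row center coordinate. The frame representation `(left, right, y_center)` and the dictionary keys are assumptions made for this illustration, and frames are grouped only when their center coordinates match exactly; a real scan would need a tolerance.

```python
from collections import defaultdict

def summarize_rows(frames):
    """Derive row count, row numbers, and characters per row from character frames.

    frames: list of (left, right, y_center) tuples, one per character frame.
    Returns one dict per row, ordered from the top of the form.
    """
    rows = defaultdict(list)
    for left, right, y_center in frames:
        rows[y_center].append((left, right))
    summary = []
    for row_number, y in enumerate(sorted(rows), start=1):
        summary.append({
            "row_number": row_number,          # assigned from the top row
            "center_y": y,
            "chars_in_row": len(rows[y]),      # one left side per character frame
        })
    return summary
```

The number of rows to process is then simply `len(summarize_rows(frames))`.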

Next, the character recognition unit 4 recognizes the characters that have been written in advance in the character frames of each field (see FIG. 3).

The read mode setting unit 5 uses the character recognition results to set the read mode for each character frame.

The output information conversion unit 6 converts the form position detection information (the form size detection information), the field position detection information, and the read mode setting information into a form format, thereby creating the basic part of the form format.
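The conversion step can be pictured as merging the three kinds of information into one record. The patent does not define the layout of the form format, so the structure below is entirely hypothetical: the function name, the dictionary keys, and the shape of the input field records are assumptions chosen only to show which inputs are combined.

```python
def to_form_format(form_height, form_width, fields):
    """Combine form size, field positions, and read modes into one record.

    fields: list of dicts with keys "row", "field", "left", "right",
    "center_y", and "read_modes" (one read mode per character frame).
    """
    return {
        "form": {"height": form_height, "width": form_width},
        "fields": [
            {
                "row": f["row"],
                "field": f["field"],
                "left": f["left"],
                "right": f["right"],
                "center_y": f["center_y"],
                "char_count": len(f["read_modes"]),
                "read_modes": list(f["read_modes"]),
            }
            for f in fields
        ],
    }
```

A record like this could then be serialized in whatever layout the character reading device expects.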

The form format creation process performed by the above device is described below with reference to the flowchart in FIG. 5. The step numbers are given in parentheses.

Initial information is set (101), and the form is read with the image scanner 7. The image data of the form read by the image scanner 7 is stored in the image memory 1 (102).

The form position detection unit 2 then detects the form position (103), the ruled line extraction unit 9 extracts the frame lines of the character frames (104), and the field position detection unit 3 detects the field positions (105).

Next, the characters in each field are extracted one character frame at a time and recognized (106).

Using the character recognition results, the read mode setting unit 5 sets the read mode for each character frame (107). This is repeated for all fields (108), and the output information conversion unit 6 then converts the output of each of the above units into a form format and outputs it (109).
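The control flow of steps 103 to 109 can be sketched as a function that takes each processing stage as a parameter. The stage functions and their signatures are hypothetical placeholders; only the order of the calls and the per-field loop follow the flowchart description.

```python
def create_form_format(image, detect_form, extract_lines, detect_fields,
                       recognize, set_modes, convert):
    """Run the processing stages in flowchart order on a stored form image."""
    form_info = detect_form(image)               # step 103: form position
    lines = extract_lines(image, form_info)      # step 104: frame lines
    fields = detect_fields(lines)                # step 105: field positions
    modes_per_field = []
    for field in fields:                         # step 108: repeat for all fields
        chars = recognize(image, field)          # step 106: character recognition
        modes_per_field.append(set_modes(chars)) # step 107: read mode setting
    return convert(form_info, fields, modes_per_field)  # step 109: conversion
```

Passing the stages in as callables keeps the sketch independent of any particular recognition or line extraction method.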

The embodiment has been described above, but the present invention can also be implemented as follows.

(1) Setting the following information before the form is read makes more complex format specifications possible.

a. If the first field of the form is designated as an ID field, the number written in that field is defined as the form format number.

b. If several read modes are defined before character recognition, as in FIG. 3, special specifications are also possible.

c. To create a format for a form that also contains image fields, a maximum character frame size is specified, and any frame larger than that is judged to be an image field.
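The image field rule in item c is a size threshold. A minimal sketch, assuming frames are given as hypothetical `(width, height)` pairs and that a frame exactly at the maximum size still counts as a character frame (the boundary case is an assumption):

```python
def classify_frames(frames, max_char_width, max_char_height):
    """Label each frame as a character frame or an image field by its size."""
    return [
        "image" if width > max_char_width or height > max_char_height else "character"
        for width, height in frames
    ]
```

For example, with a maximum character frame of 12 by 12, a 10 by 10 frame is labeled "character" and a 50 by 40 frame is labeled "image".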

(2) For a simple form that needs no control information such as data processing, data checks, or edit specifications, the form format can be created by the processing of the above embodiment alone.

For a form that needs control information or fine adjustments (such as a change of read mode), or when a character is rejected during reading, the settings can be made interactively.

[Effect of the invention]

As described above, the present invention has the following effects.

(1) The form format output is obtained automatically by writing, in the character frames of the form whose format is to be created, characters of the type to be recognized and having the form read by an image reader or the like.

(2) Consequently, the amount of work is reduced compared with the conventional methods, and errors in setting position information are prevented.

[Brief description of the drawings]

FIG. 1 is a diagram showing the principle of the present invention. FIGS. 2 to 5 show one embodiment of the present invention: FIG. 2 is an example of a form, FIG. 3 shows examples of character fields, FIG. 4 is a block diagram of the form format creation device, and FIG. 5 is a flowchart of the form format creation process.

1: Image memory
2: Form position detection unit
3: Field position detection unit
4: Character recognition unit
5: Read mode setting unit
6: Output information conversion unit

Claims

[Claims]

(1) A form format creation method characterized in that: image data of an input form is stored in memory; the position of the form is detected from the stored image data; at the detected position, the character frame lines in each field are extracted so that the position of each field is detected in character-frame units; the characters in each detected field are recognized in character-frame units; based on the recognition results, the read mode in each field is set in character-frame units; and the setting information and each item of detection information are converted into a form format, thereby creating the form format.

(2) A form format creation device characterized by comprising: an image memory (1) that stores image data of an input form; a form position detection unit (2) that detects the form position from the image data in the image memory (1); a field position detection unit (3) that detects field positions in character-frame units by extracting, at the detected form position, the frame lines of the character frames in each field; a character recognition unit (4) that recognizes the characters in each detected field in character-frame units; a read mode setting unit (5) that sets the read mode in each field in character-frame units based on the recognition results of the character recognition unit (4); and an output information conversion unit (6) that converts the setting information of the read mode setting unit (5) and each item of detection information into a form format, wherein the form format output is obtained from the output of the output information conversion unit (6).