JPS61193281A

JPS61193281A - Document input system

Info

Publication number: JPS61193281A
Application number: JP60032633A
Authority: JP
Inventors: Kunihiro Okada; 邦弘岡田; Yasuaki Nakano; 中野　康明; Osamu Kunisaki; 国崎　修; Hiromichi Fujisawa; 藤沢　浩道
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1985-02-22
Filing date: 1985-02-22
Publication date: 1986-08-27

Abstract

PURPOSE: To determine the format information of a document to be identified automatically. The frames are extracted from the image of the document, and the characters inside them are identified. Every character that would not appear on a blank specimen document of the same kind is deleted. The structure of the image is then understood from the remaining characters and the frames.
CONSTITUTION: The construction (structure) data and physical data of the specimen documents are read into memory 53 one after another. They are compared with the construction data and physical data of the input document held in memory 54, which determines the specimen document that the input document matches. The character recognition result and the necessary parts of the construction data are then edited and written to the output device 9. As an example of editing, suppose the item title of the frame to which a data frame belongs is 'applicant' and the recognition result is 'YAMADA TARO'. The record 'applicant'='YAMADA TARO' is then written out.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Application of the Invention]

The present invention relates to a document processing method. In particular, it relates to a document input method suited to reading characters on tabular documents.

[Background of the invention]

Conventional document readers (hereinafter OCR) require the reading areas to be printed in a color that the OCR does not sense (a dropout color), which makes printing expensive. In addition, the position of each character reading area must be described as a distance from the edge of the document. The number of characters, character type, checking method, and so on must also be specified for each area, which is tedious. Hereinafter, information such as the position of a character reading area and the number of characters, character type, and checking method within it is referred to as format information. Character recognition is now required not only of dedicated OCRs, as in the past, but also of document filing systems and office-automation workstations. The problems above, however, have prevented its widespread use.

Prior art exists, for example Japanese Patent Laid-Open Publication No. Sho 58-207184 (published December 2, 1983). It discloses a method of removing, from an input image, a fixed image stored in memory in advance. It also discloses a method of identifying the type of document using that fixed image. Because this method stores the image itself, however, it requires a large amount of memory. It may also fail to remove the fixed image correctly when the document is deformed (stretched or shrunk, rotated, misaligned, etc.).

[Purpose of the invention]

An object of the present invention is to provide means that automatically understand the structure of an input document and automatically generate its format. This greatly widens the range of documents that can be handled and simplifies the work of creating format information.

Another object of the present invention is to provide means that enable continuous reading of fixed-form documents of different types. The type of each fixed-form input document is identified automatically, and format information stored in advance is used.

[Summary of the invention]

To achieve these objects, the present invention works from the document to be identified itself. It extracts the frames, and the relationships between the frames, from the image of the document to be identified. It then recognizes the characters inside the frames. Every character that would not be present on a blank document of the same kind (hereinafter called a sample document) is deleted; in other words, the filled-in entries are removed. The structure of the image is then understood from the remaining characters and the frames, so that the format information of the document to be identified is determined automatically.

[Embodiments of the invention]

The present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an apparatus that implements the document understanding method of the present invention. Each part of the apparatus is connected to a bus 1, and the overall operation is controlled by a control unit 2. The information on a document 3 (the document image) is scanned by a photoelectric conversion device 4. It is then digitized and stored in memory 51 via the bus 1. Memory 51 forms part of memory 5 together with memories 52, 53, and 54, which are described later. Known high-efficiency coding may be applied during digitization, which saves storage capacity in the memory holding the document image.

In the following description each pixel is binarized to one bit. A pixel may also be represented by multiple levels, however, and color information may be added by photoelectric conversion with a color scanner.

Document processing has a registration mode and an identification mode. The mode is selected through man-machine interaction with the control unit 2, using the keyboard 6 and display 7.

The registration mode is described first. FIG. 2 is a flowchart of registration-mode processing, which is executed by software in the control unit 2. In FIG. 2, step 201 inputs the image of the document to be identified and stores it in memory 51. In step 202, the input document image is normalized (position correction, skew correction, etc.), and the resulting image is stored in memory 52. In step 203, line extraction is performed on this normalized image, and the extracted line pattern is stored in memory 53. FIG. 3 illustrates line pattern extraction.

Horizontal line extraction on an input image such as (A) yields a horizontal line pattern such as (B). Likewise, vertical line extraction on (A) yields a vertical line pattern such as (C). The logical OR of (B) and (C) gives an image such as (D). Hereinafter (D) is called the line pattern; the details of line pattern extraction are described later.

In step 204, contours are traced in the line pattern to extract sequences of coordinate points along them, one sequence per contour. Only inner contours are processed below. An inner contour is the boundary of black pixels surrounding a white region. In other words, it is a contour that runs clockwise when traced with the white region on the right and the black region on the left. In step 205, those inner contours that form rectangles are extracted; rectangle determination is described in detail later. Hereinafter a rectangular inner contour is called a frame. In step 206, the frames are reordered from top left to bottom right using the coordinates of their four corners.
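Step 206 orders the frames only by the coordinates of their corners; the exact sort key is not given. The minimal Python sketch below makes two assumptions. First, each frame is reduced to a bounding box (left, top, right, bottom) with y increasing downward. Second, frames whose top edges lie within a hypothetical tolerance `row_tol` form one row. The function name is also hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a frame as its bounding box
# (left, top, right, bottom) in pixels, with y increasing downward.
Box = Tuple[int, int, int, int]


def order_frames(frames: List[Box], row_tol: int = 5) -> List[Box]:
    """Order frames from top left to bottom right (cf. step 206).

    Frames whose top edges differ by at most `row_tol` pixels are treated
    as one row; rows are taken top to bottom, frames in a row left to right.
    """
    rows: List[List[Box]] = []
    for box in sorted(frames, key=lambda b: (b[1], b[0])):
        # Compare against the first (topmost) frame of the current row.
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]
```

The row tolerance absorbs small differences in the top edges of frames that share a row after skew correction. Without it, a frame only a pixel higher would jump ahead of its left neighbour.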

In step 207, when several frames have been extracted, the relationships between them are extracted. This is the relation code Cmn between frame m and frame n. An example definition of the relation code Cmn follows.

Cmn = 0: none of the conditions below holds
Cmn = 1: n contains m
Cmn = -1: m contains n
Cmn = 2: m is directly above n
Cmn = -2: m is directly below n
Cmn = 3: m is directly to the left of n
Cmn = -3: m is directly to the right of n
Cmn = 4: m is above n
Cmn = -4: m is below n
Cmn = 5: m is to the left of n
Cmn = -5: m is to the right of n

In step 208, the image of the region corresponding to each extracted frame is cut out of the original image. These images are sent in turn to the character recognition unit 8, which recognizes the characters inside each frame. Only the image inside a frame is sent to the character recognition unit, so recognition is not disturbed by markings printed in non-dropout colors or by unneeded parts outside the frame. Each target is also a band-shaped region, so its characters are easy to segment and recognize. After the characters in all frames have been recognized, the results are shown frame by frame on the display 7.

Character strings judged not to represent the structure of the target document are deleted with the keyboard 6. In FIG. 3(A), for example, "Hitachi Taro", "Tokyo", "30,1.1", and so on are deleted. Character strings that do represent the structure can likewise be changed using the keyboard 6 and display 7. The corrected data are used as the data of the sample form, and processing continues as follows. Character strings other than those representing the structure of the target document can also be deleted automatically, instead of with the keyboard.

For example, after the structure of the input document has been understood as described below, the character strings in frames judged to be data frames can be deleted automatically. In step 209, each frame m is examined to understand the structure of the input document. Its number of recognized characters Nm, character string Km, and relation codes Cmn are matched against knowledge about document structure. The knowledge is expressed in "if ... then ..." form, that is, as a condition and a conclusion. Examples of knowledge follow; a, b, and c denote frame numbers.

1. (Condition) Ka = 'Applicant'
   (Conclusion) Frame a is an item frame; item name of frame a = 'Applicant'
2. (Condition) Na = 0 & Kb = 'Applicant' & Cab = -3
   (Conclusion) Frame a is a data frame subordinate to frame b; attribute of frame a = 'Name'
   (Na = 0 indicates a blank frame.)
3. (Condition) Na = 0 & Kb = 'Prefecture' & Cab = -4 & Kc = 'Registered domicile' & Cac = -3
   (Conclusion) Frame a is a data frame subordinate to frames b and c; attribute of frame a = 'Address: prefecture'
4. (Condition) Ka = 'Showa __ year __ month __ day'
   (Conclusion) Frame a is an independent item/data frame; item name of frame a = 'Date and time'; unit name of frame a = 'Showa year/month/day'; attribute of frame a = 'Time: year/month/day'
5. (Condition) Na = 0
   (Conclusion) Frame a is an independent item/data frame; item name of frame a = 'Miscellaneous'; attribute of frame a = 'Miscellaneous'

Here an item frame is a frame used as a heading, which normally has no data written in it. A data frame is a frame into which the characters (character string) representing the content of an item are to be written. An attribute is the type of characters to be written into the frame, their permitted range, and so on.

Concretely, each piece of knowledge above is implemented as a subroutine in the program, written as an "if ... then ..." statement. Matching against a piece of knowledge consists of substituting frame numbers m and n for a and b and checking whether the condition holds. The result is returned as an argument; when the condition holds, the conclusion is returned as an argument as well. For example, suppose frame numbers m and n are found that satisfy the condition of knowledge 1 for a and b. The conclusion, with frame number m substituted for frame number a, is then registered as structure data of the input document. The structure data consist of the frame relation codes described above, the frame type (item frame, data frame, or both), dependency relationships, item names, attribute codes, and so on.

When a frame matches several pieces of knowledge, one can decide, for example, to give priority to the piece that appears first. A fallback can also be defined for frames that match no other knowledge (item 5 above). If it is given the lowest priority, the structure of every frame is determined. Once all frames have been matched against the knowledge and their structures determined, registration of the structure data is complete. In step 210, the structure data, the physical data of the frames (the coordinates of their four corners, etc.), and the mask data are written to the output device 9. The mask data give the positions of characters preprinted inside data frames, such as the unit name "Showa year/month/day" in knowledge 4 above. The structure data, frame physical data, and mask data correspond to the format data of a conventional OCR. When they are written out, an identification number and a file name for the input document are entered using the keyboard 6 and display 7.
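The matching of step 209 against prioritized if-then knowledge can be sketched in Python as follows. This is an illustration under assumptions, not the patented program, and it encodes only knowledge 1 and 2. The data layout (`texts`, `C`) and all function names are hypothetical. Knowledge 5 is treated as an unconditional fallback, reflecting its role as the lowest-priority rule.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical encoding of the inputs to step 209:
#   texts[m] is the recognized string K_m of frame m (so N_m = len(texts[m]))
#   C[m][n]  is the relation code C_mn (-3: m is directly right of n)
Result = Optional[Dict[str, object]]


def knowledge_1(a: int, texts: List[str], C: List[List[int]]) -> Result:
    # If K_a = 'Applicant' then frame a is an item frame named 'Applicant'.
    if texts[a] == "Applicant":
        return {"frame": a, "type": "item", "item_name": "Applicant"}
    return None


def knowledge_2(a: int, texts: List[str], C: List[List[int]]) -> Result:
    # If N_a = 0 and some frame b has K_b = 'Applicant' and C_ab = -3,
    # then frame a is a data frame subordinate to b with attribute 'Name'.
    if texts[a]:
        return None
    for b, text in enumerate(texts):
        if b != a and text == "Applicant" and C[a][b] == -3:
            return {"frame": a, "type": "data", "parent": b, "attribute": "Name"}
    return None


def fallback(a: int, texts: List[str], C: List[List[int]]) -> Result:
    # Lowest-priority rescue rule (role of knowledge 5); assumed unconditional.
    return {"frame": a, "type": "item/data", "item_name": "Miscellaneous",
            "attribute": "Miscellaneous"}


# Earlier entries take priority when several rules match the same frame.
RULES: List[Callable[[int, List[str], List[List[int]]], Result]] = [
    knowledge_1, knowledge_2, fallback]


def understand(texts: List[str], C: List[List[int]]) -> List[Dict[str, object]]:
    """Assign each frame the conclusion of the first rule whose condition holds."""
    structure = []
    for a in range(len(texts)):
        for rule in RULES:
            result = rule(a, texts, C)
            if result is not None:
                structure.append(result)
                break
    return structure
```

Consider a label frame reading "Applicant" with a blank frame directly to its right, given as `understand(["Applicant", ""], [[0, 3], [-3, 0]])`. Frame 0 becomes an item frame, and frame 1 becomes a data frame subordinate to frame 0 with attribute 'Name'.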

Next, line pattern extraction is described in detail. FIG. 4 is a flowchart, in PAD form, of the line pattern extraction process. Line pattern extraction consists of horizontal line extraction and vertical line extraction. Only horizontal line extraction is described here, since vertical line extraction is analogous.

In FIG. 4, 401 is the entry to line extraction; it receives the normalized image Q stored in memory 52. Q is represented as two-dimensional data Q(i, j) (i = 0 to I-1, j = 0 to J-1), as in FIG. 3(A). Step 402 repeats steps 403 to 409 for each scan line number j, producing a two-dimensional pattern A(i, j) such as that of FIG. 3(B).

The steps are as follows:
- Step 403 is initialization: A(i, j) is cleared to 0, and the run length B, described below, is set to 0.
- Step 404 is a loop over the I pixels of the scan line.
- Step 405 determines whether Q(i, j) is 1 or 0. If it is 1, step 406 increments the black run length B.
- If Q(i, j) is 0, step 407 checks whether the run length B up to the previous pixel exceeds a threshold C. If it does, step 408 sets the B pixels A(i-B, j) to A(i-1, j) to 1 (black).
- Step 409 then resets the run length B.
- Steps 410 and 411 perform the same processing as step 407 onward at the last point of the scan line (i = I-1).

Because of the test in step 407, only sufficiently long black horizontal segments are extracted. Characters and symbols written on the document consist of short segments, so they are almost never extracted. Step 412 is the exit of this process; it outputs the pattern A(i, j), which is stored in memory 53. As the above shows, pattern A(i, j) reflects the presence of line segments.
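The run-length procedure of FIG. 4 translates directly into code. This sketch uses plain Python lists, with the image indexed as `q[j][i]` and 1 for black; the function names are hypothetical. It resets the run length at every white pixel. Vertical lines are obtained by applying the same routine to the transposed image, and the two patterns are then OR-ed as in FIG. 3(D).

```python
from typing import List

Image = List[List[int]]  # image[j][i]: scan line j, pixel i; 1 = black


def extract_horizontal_lines(q: Image, threshold: int) -> Image:
    """Keep only horizontal black runs longer than `threshold` (FIG. 4)."""
    height, width = len(q), len(q[0])
    a = [[0] * width for _ in range(height)]   # pattern A, all 0 (cf. 403)
    for j in range(height):                    # loop over scan lines (402)
        run = 0                                # run length B (403)
        for i in range(width):                 # loop over the I pixels (404)
            if q[j][i] == 1:                   # 405
                run += 1                       # 406: extend the black run
            else:
                if run > threshold:            # 407: long enough to be a line
                    for k in range(i - run, i):
                        a[j][k] = 1            # 408: A(i-B, j) .. A(i-1, j)
                run = 0                        # 409: reset B
        if run > threshold:                    # 410-411: run ends at i = I-1
            for k in range(width - run, width):
                a[j][k] = 1
    return a


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def line_pattern(q: Image, threshold: int) -> Image:
    """OR of the horizontal and vertical line patterns, as in FIG. 3(D)."""
    h = extract_horizontal_lines(q, threshold)
    v = transpose(extract_horizontal_lines(transpose(q), threshold))
    return [[x | y for x, y in zip(rh, rv)] for rh, rv in zip(h, v)]
```

A short stroke such as `[[0, 1, 1, 0]]` with a threshold of 3 produces an all-zero pattern. This is the behaviour that keeps handwritten or printed characters out of the line pattern.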

Next, rectangle extraction is described in detail. FIG. 5 illustrates the rectangle extraction process.

The coordinates of an inner contour are represented as one-dimensional data X(i), Y(i). Among the contour points, four are of interest: those giving the maximum and minimum of X(i) + Y(i), and those giving the maximum and minimum of X(i) - Y(i). These correspond to P4, P2, P3, and P1 in FIG. 5. When the frame is a rectangle, P1 to P4 are its four corners.

Between P1 and P2, the points giving the maximum and minimum of X are found; call them Q12 and Q11. Between P2 and P3, the points giving the maximum and minimum of Y are found; call them Q22 and Q21. The other sides are treated in the same way. When the X coordinates of the four points P1, Q11, Q12, and P2 differ only slightly, side P1-P2 is judged to be a straight line. The other three sides are judged in the same way, and when all four sides are straight, the frame is judged to be a rectangle.
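A rough Python sketch of the FIG. 5 rectangle test follows. It assumes the inner contour is a closed list of (X, Y) points with Y increasing downward, and it uses a hypothetical straightness tolerance `tol`. It does not compute Q11, Q12, and the other Q-points separately. Instead, it checks the coordinate spread over all points of each side, which covers those points as well as the corners.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (X(i), Y(i)) with Y increasing downward (assumed)


def corner_indices(contour: List[Point]) -> List[int]:
    """Indices of the points extremal in X+Y and X-Y (candidate corners)."""
    s = [x + y for x, y in contour]
    d = [x - y for x, y in contour]
    return sorted({s.index(min(s)), s.index(max(s)),
                   d.index(min(d)), d.index(max(d))})


def is_rectangle(contour: List[Point], tol: int = 2) -> bool:
    """Judge a closed inner contour rectangular if all four sides are straight.

    Walking the contour from one corner to the next, a side whose endpoints
    differ more in Y than in X is treated as vertical, and its X spread
    (max - min) must not exceed `tol`; horizontal sides are tested on Y.
    """
    corners = corner_indices(contour)
    if len(corners) != 4:
        return False  # extremal points coincide: not a four-cornered shape
    n = len(contour)
    for k in range(4):
        start, end = corners[k], corners[(k + 1) % 4]
        steps = (end - start) % n  # wrap around for the last side
        side = [contour[(start + t) % n] for t in range(steps + 1)]
        xs = [p[0] for p in side]
        ys = [p[1] for p in side]
        vertical = abs(xs[-1] - xs[0]) < abs(ys[-1] - ys[0])
        spread = (max(xs) - min(xs)) if vertical else (max(ys) - min(ys))
        if spread > tol:
            return False
    return True
```

A slanted quadrilateral has four distinct extreme points, but it fails the test. Its slanted sides have a coordinate spread larger than `tol`.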

Next, relation code extraction is described in detail. FIG. 6 is a flowchart, in PAD form, of the relation code extraction process. In FIG. 6, 601 is the entry. It receives the coordinates of the four corners of each frame and the contour length R, stored in memory 52. The four corners of frame m are denoted (X1(m), Y1(m)) to (X4(m), Y4(m)). Steps 602 and 603 form a double loop over frame numbers m and n.

In step 604, Cmn is initialized to 0. Condition 605 detects that frame m contains frame n; when it holds, 1 is assigned to Cmn and -1 to Cnm. The remaining relation codes are determined in the same way. Step 606 is the exit, which outputs the matrix C of relation codes.
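The patent does not define "directly above", "above", and so on geometrically. The sketch below therefore rests on stated assumptions. Frames are reduced to bounding boxes. Two frames are treated as "directly" adjacent when the gap between their facing edges is at most a hypothetical tolerance `tol`. For containment, the sketch follows the sign convention of the code table given for step 207. (The description of condition 605 assigns the containment signs the other way round.)

```python
from typing import List, Tuple

# Hypothetical representation: each frame as (left, top, right, bottom),
# y increasing downward; `tol` allows for the ruled-line width between
# the inner contours of adjacent frames.
Box = Tuple[int, int, int, int]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def overlap(a0: int, a1: int, b0: int, b1: int) -> bool:
    """True if the open intervals (a0, a1) and (b0, b1) overlap."""
    return min(a1, b1) > max(a0, b0)


def relation_code(m: Box, n: Box, tol: int = 3) -> int:
    if contains(n, m):
        return 1    # n contains m (sign convention of the code table)
    if contains(m, n):
        return -1   # m contains n
    cols = overlap(m[0], m[2], n[0], n[2])  # share some x range
    rows = overlap(m[1], m[3], n[1], n[3])  # share some y range
    if cols and abs(n[1] - m[3]) <= tol:
        return 2    # m directly above n
    if cols and abs(m[1] - n[3]) <= tol:
        return -2   # m directly below n
    if rows and abs(n[0] - m[2]) <= tol:
        return 3    # m directly left of n
    if rows and abs(m[0] - n[2]) <= tol:
        return -3   # m directly right of n
    if m[3] <= n[1]:
        return 4    # m above n
    if n[3] <= m[1]:
        return -4   # m below n
    if m[2] <= n[0]:
        return 5    # m left of n
    if n[2] <= m[0]:
        return -5   # m right of n
    return 0        # none of the conditions holds


def relation_matrix(frames: List[Box], tol: int = 3) -> List[List[int]]:
    """Matrix C with C[m][n] = relation code of frame m to frame n (602-606)."""
    k = len(frames)
    return [[0 if m == n else relation_code(frames[m], frames[n], tol)
             for n in range(k)] for m in range(k)]
```

The "directly" codes are tested before the plain directional codes. A frame that touches another therefore receives the stronger relation; this is the one used by knowledge 2 and 3.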

This concludes the description of the registration mode. Registration-mode processing is assumed to be performed in advance so that format information such as structure data is stored. It can also be combined with the identification-mode processing described next.

Next, the processing of an input document in the identification mode is described. FIG. 7 is a flowchart of identification-mode processing. Step 701 captures the image of the input document. Steps 702 to 707 are the same as steps 202 to 207 in FIG. 2:
- 702: normalization
- 703: line pattern extraction
- 704: contour extraction
- 705: rectangle extraction
- 706: frame ordering
- 707: relation code extraction

In step 708, the structure data and physical data of the sample documents are read one after another into memory 53. They are compared with the structure data and physical data of the input document in memory 54 to determine which sample document the input document matches. The input document is then read using the structure data and physical data of that sample document, as follows. Step 709 is a loop that repeats steps 710 to 717 below for each frame. Step 710 is a test so that only data frames are read. Step 711 extracts only the region inside the data frame. Step 712 erases preprinted characters using the mask data (only when mask data exist).
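The comparison in step 708 is described only as comparing structure data and physical data. One possible realization is sketched below under several assumptions: a hypothetical record layout with `frames` and `C`, a mean coordinate-offset tolerance `pos_tol`, and an acceptance threshold `min_score`. The sketch scores each sample by the fraction of relation codes that agree.

```python
from typing import Dict, Optional


def match_score(sample: Dict, doc: Dict, pos_tol: float = 10.0) -> float:
    """Fraction of frame pairs whose relation codes agree (0.0 if incompatible).

    `sample` and `doc` are hypothetical records holding "frames" (ordered
    bounding boxes, the physical data) and "C" (relation-code matrix).
    """
    fs, fd = sample["frames"], doc["frames"]
    if len(fs) != len(fd) or not fd:
        return 0.0
    # Physical data: mean absolute difference of corresponding coordinates.
    diffs = [abs(p - q) for bs, bd in zip(fs, fd) for p, q in zip(bs, bd)]
    if sum(diffs) / len(diffs) > pos_tol:
        return 0.0
    k = len(fd)
    pairs = [(m, n) for m in range(k) for n in range(k) if m != n]
    if not pairs:
        return 1.0
    agree = sum(sample["C"][m][n] == doc["C"][m][n] for m, n in pairs)
    return agree / len(pairs)


def identify(samples: Dict[str, Dict], doc: Dict,
             min_score: float = 0.9) -> Optional[str]:
    """Return the id of the best-matching sample document, or None."""
    best_id, best = None, 0.0
    for sample_id, sample in samples.items():
        score = match_score(sample, doc)
        if score > best:
            best_id, best = sample_id, score
    return best_id if best >= min_score else None
```

Returning `None` when no sample scores high enough leaves room for the interactive fallback mentioned later, where the operator enters the sample document number from the keyboard.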

Step 713 selects the character type determined by the attribute code and sets the corresponding recognition dictionary (the standard patterns used for character recognition). Step 714 recognizes the characters inside the frame.

Step 715 selects the word dictionary determined by the attribute code. Step 716 then performs word matching to correct misread or unread characters in the recognition result. Step 717 edits the character recognition result together with the necessary parts of the structure data and writes them to the output device 9. As an example of editing, suppose the item name of the item frame to which a data frame belongs is "Applicant" (corresponding to knowledge 2 above). If the recognition result for this frame is "Yamada Taro", the result is written out as the record

'Applicant' = 'Yamada Taro'
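Steps 715 to 717 can be illustrated with the `difflib.get_close_matches` function from Python's standard library, which stands in here for the word dictionary matching. The patent does not specify the matching method, and the dictionary contents and function names below are hypothetical.

```python
import difflib
from typing import List


def correct_word(recognized: str, dictionary: List[str], cutoff: float = 0.6) -> str:
    """Replace a recognition result by its closest dictionary word, if any."""
    matches = difflib.get_close_matches(recognized, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized


def make_record(item_name: str, value: str) -> str:
    """Format one output record as 'item name' = 'value'."""
    return f"'{item_name}' = '{value}'"
```

With a misread zero in place of the final O, `correct_word("Yamada Tar0", ["Yamada Taro", "Suzuki Ichiro"])` returns "Yamada Taro". `make_record("Applicant", "Yamada Taro")` then yields the record shown above. A result with no sufficiently close dictionary word is passed through unchanged.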

A second example is an independent item/data frame with item name = 'Date and time' and unit name = 'Showa year/month/day' (corresponding to knowledge 4). The unit name portion is erased by the mask data. If the recognition result is then 58529, the edited output is

'Date and time' = 'Showa 58 (1983), May 29'

In this second example, the position coordinates of the characters are obtained from the recognition unit 8. They are compared with the mask data to determine which characters form the year, month, and day, and the output is edited accordingly.
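The date example relies on the character positions reported by the recognition unit. The sketch below assumes the mask data reduce to the x positions of the erased unit marks (year, month, day). Each recognized digit is assigned to the first unit mark to its right. The Showa-to-Gregorian conversion uses the fact that Showa 1 is 1926. The function names are hypothetical.

```python
import datetime
from typing import List, Tuple


def split_by_mask(chars: List[Tuple[int, str]], unit_x: List[int]) -> List[str]:
    """Group recognized characters into fields delimited by masked unit marks.

    chars:  (x position, character) pairs from the recognizer.
    unit_x: x positions of the erased preprinted units (year, month, day),
            left to right; a character belongs to the first unit lying to
            its right. Characters right of the last unit are dropped.
    """
    fields = [""] * len(unit_x)
    for x, ch in sorted(chars):
        for k, ux in enumerate(unit_x):
            if x < ux:
                fields[k] += ch
                break
    return fields


def showa_date(chars: List[Tuple[int, str]], unit_x: List[int]) -> datetime.date:
    """Convert a 'Showa Y year M month D day' field to a Gregorian date."""
    year, month, day = (int(f) for f in split_by_mask(chars, unit_x))
    return datetime.date(1925 + year, month, day)  # Showa 1 = 1926
```

Take the digits 5, 8, 5, 2, 9 at x = 10, 20, 50, 80, 90, with unit marks at x = 30, 60, 100. The fields are "58", "5", and "29", which gives Showa 58, May 29, that is, `datetime.date(1983, 5, 29)`.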

One embodiment of the present invention has been described above. Processing found in ordinary character recognition or graphics processing devices can readily be added to this embodiment. Examples include displaying and correcting the frame extraction results or the character recognition results.

In frame recognition, several thresholds may also be provided for line extraction and rectangle determination. When automatic determination is difficult, the optimum value may be selected interactively, or binarization may be retried at a different level. Furthermore, the document type may be known in advance in the identification mode. In that case, the number of the sample document may be entered from the keyboard, and the matching of structure data and the like may be skipped.

In this embodiment, a line pattern is extracted from the document and frames are extracted from it. It is also possible to extract contours directly from the original document pattern and extract frames from them. In that case, the contours other than frames can later be used for other processing, such as character segmentation and character recognition.

In this embodiment, the description of document structure is limited to frames. It can be extended, however, to ruled lines (such as solid or broken lines) and to circles. For example, a solid ruled line can be detected by extracting an elongated outer contour. Underlined title characters can then be recognized by cutting out the region above the ruled line. A circle can be detected by expressing the contour coordinate sequence in polar coordinates and detecting a straight line in that space.

Characters outside the frames can also be read and used in the structure description. It is also effective to use color information to add color attributes to the structure description.

Furthermore, the invention is not limited to photoelectric conversion directly from documents. It can also be applied to document images read from an image file.

[Effect of the invention]

As described above, the present invention can understand the structure of an input document and generate its format automatically, even when no sample document is available. This eliminates the conventional manual creation of format information. Tabular documents printed in black can also be read, so there is no need to reprint them in a dropout color. The preparation work is thus simplified, and the range of target documents is greatly expanded.

Furthermore, the present invention can identify the type of a fixed-form input document automatically and use format information stored in advance. This makes it possible to read fixed-form documents of different types continuously.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of an apparatus that implements the document processing method of the present invention. FIGS. 2, 4, 6, and 7 are flowcharts explaining the processing in the control unit of FIG. 1. FIGS. 3 and 5 illustrate the processing in FIG. 2.

1: bus; 2: control unit; 3: document; 5: memory; 6: keyboard; 7: display; 8: character recognition unit; 9: output device.

Claims

[Claims]

1. A document input method comprising:
- means for inputting a document image;
- means for extracting frame portions from the document image;
- means for identifying characters present inside the frame portions;
- means for making corrections, such as deletion, insertion, and replacement, to the identified characters; and
- means for understanding the structure of the image from the frame portions and the corrected characters.