JPH06119491A - Slip comprehension system - Google Patents

Slip comprehension system

Info

Publication number
JPH06119491A
JPH06119491A (application JP4269027A / JP26902792A)
Authority
JP
Japan
Prior art keywords
entry
frame
slip
entry frame
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4269027A
Other languages
Japanese (ja)
Inventor
Masami Oguro
雅己 小黒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP4269027A priority Critical patent/JPH06119491A/en
Publication of JPH06119491A publication Critical patent/JPH06119491A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To reduce the labor required when preparing slip definition information and to improve recognition accuracy in a slip comprehension system that segments and recognizes the characters written inside entry frames on a slip whose entry frames are designated by black frame lines.
CONSTITUTION: A slip definition table 108 is prepared that associates the physical layout relationship between the items of the slip to be recognized with entry types such as address and name. An entry frame extracting means 105 extracts, from the image data of an input slip 101, rectangles circumscribing connected runs of white pixels, and rectangles larger than a threshold value are judged to be entry frames. A type judging means 107 determines the type of each entered item by collating the physical layout relationship of the items obtained from the determined entry frames with the definition information in the slip definition table 108. A character string recognizing means 109 then recognizes the character string in each entry frame using knowledge specific to the item type.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a form understanding system that segments and recognizes the characters written inside the entry frames of a form whose entry frames are designated by black frame lines.

[0002]

2. Description of the Related Art FIG. 6 is a diagram for explaining a conventional technique. In FIG. 6, 601 denotes an input form and 602 denotes a form definition table in which conventional form definition information is stored.

[0003] Conventionally, in order to detect the type of the character string entered in an entry frame, the definition information for each type was mapped to the absolute coordinates of the entry position on the form (or relative coordinates from a marker or the like). For example, as shown in FIG. 6, a form definition table 602 prepared in advance is given the x, y coordinates Sx, Sy of the upper left corner of item 1 and the sizes Lx, Ly of the entry frame in each direction, together with the type entered at that position (a name in this example). From this it can be seen that the character string written at those coordinates represents a name.
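For illustration only, such a conventional, coordinate-based definition table can be pictured as a list of entries keyed by absolute position; this is a minimal sketch, and the field names are assumptions rather than the format of any actual prior-art system.

# Sketch of the coordinate-based form definition table of FIG. 6 (assumed encoding).
# Each entry ties an entry type to an absolute position and size on the form.
CONVENTIONAL_DEFINITION_TABLE = [
    {"item": 1, "Sx": 120, "Sy": 80, "Lx": 300, "Ly": 40, "type": "name"},
    # every further item would need its own exact coordinates to be specified
]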

[0004] On the other hand, in character string recognition, many methods improve the recognition rate by using knowledge that depends on the type of the character string, for example the knowledge that a name field contains only words representing personal names, and that a telephone number field contains only digits and symbols.

[0005]

SUMMARY OF THE INVENTION In the above conventional method, the coordinates of the entry frames must be specified in advance. Therefore, when the entry frames are densely arranged, their position coordinates must be specified exactly, which makes preparing the form definition information troublesome. Moreover, even when the coordinates are specified exactly, the method cannot cope with positional errors caused by uneven form printing or with a slight skew introduced when the form is scanned; a gap arises between the actual position of an entry frame and its defined position, the image cannot be cut out correctly when character recognition is performed, and misrecognition results. These phenomena are particularly pronounced in forms whose entry frames are drawn with black frame lines.

[0006] An object of the present invention is to provide a form understanding system that eliminates the need for precise specification of position coordinates, thereby reducing the labor required to create form definition information, and that is robust to positional displacement so that each entered character string is recognized accurately.

[0007]

[Means for Solving the Problems] To solve the above problems, a form definition table is prepared in advance that associates the physical layout relationship between the items of the form to be recognized with entry types such as address and full name. The type of each entry item is then determined by an image storage means that stores image data input from a scanner or the like, an entry frame extraction means that extracts from this data rectangles circumscribing connected runs of white pixels and decides that rectangles whose size is at least a threshold are entry frames, and a type determination means that determines the entry types by collating the physical layout relationship of the items obtained by the entry frame extraction means with the definition information in the form definition table. Further, for each entry item, a character string recognition means performs character segmentation, character recognition, and word matching using knowledge specific to the item's type, converting the image data into code information, which is then stored.

[0008]

[Operation] In the present invention, entry frames are detected automatically from the rectangles formed by connected components of white pixels, and the entry types are determined by collating their physical layout relationship with the physical layout relationship of the entry frames defined in the form definition table; therefore, precise position coordinates need not be specified as form definition information. In addition, the system is robust to positional displacement and can recognize each entered character string with high accuracy.

[0009]

[Embodiments] Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, FIG. 2 is a processing flowchart of the embodiment, and FIG. 3 shows a description example of the form definition table used in the embodiment.

[0010] In FIG. 1, reference numeral 101 denotes an input form in which the entry frames are designated by black frame lines; it is read by an image input device 102 such as a scanner and input as binary data. Reference numeral 103 denotes a processing device, comprising a CPU, memory, and the like, for processing the input form. An image storage means 104 stores the image data input from the image input device 102 in the memory of the processing device 103 (process 201 in FIG. 2).

[0011] An entry frame extraction means 105 detects the regions of connected white pixels present in the image data of the input form 101, generates a rectangle circumscribing each region, and thereby extracts the frames (process 202 in FIG. 2). The processing of the entry frame extraction means 105 will be described later in detail with reference to FIG. 4.

[0012] The input image is framed by black ruled lines. Therefore, if the image contained only the entry frames, the connected white pixel regions would be exactly the closed regions enclosed by the entry frames, one per entry frame. In practice, however, characters are written in each item, so closed regions enclosed by character strokes also exist. The entry frame judgment means 106 within the entry frame extraction means 105 therefore judges, from the size of each extracted rectangle, whether it is the closed region of an entry frame or part of a character, and selects only the closed regions of entry frames (process 203 in FIG. 2). The extraction process is repeated until all entry frames have been extracted (process 204 in FIG. 2).
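A minimal Python sketch of this size test follows; the numeric thresholds are illustrative assumptions, since the embodiment only states that a rectangle must exceed a threshold to be judged an entry frame rather than part of a character.

def is_entry_frame(rect, min_width=40, min_height=40):
    """Judge whether a circumscribing rectangle is an entry frame.

    rect is (x0, y0, x1, y1) in pixels. The threshold values are assumed
    here for illustration; only the comparison against a size threshold is
    taken from the description of the entry frame judgment means 106.
    """
    width = rect[2] - rect[0]
    height = rect[3] - rect[1]
    return width >= min_width and height >= min_height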

[0013] The form definition table 108 stores a plurality of definition tables, each describing the type of every item on an input form. FIG. 3 shows a description example of one such definition table; it records the type of each item, whether the frame is an item frame or a character frame, and, for character frames, the number of characters. A character frame is a frame into which only one character is written, whereas an item frame is a frame into which a string of one or more characters is written. In the example shown in FIG. 3, the first row is an address; in the second row, the twelve character frames from the left are a telephone number and the remaining five character frames are a postal code; the third row is a name; and in the fourth row, from the left, an item frame holds a product name, the next two character frames hold a model number, and the remaining two character frames hold a color. The table thus defines a form for entering these pieces of information.
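A minimal sketch of one such definition table in Python, following the example of FIG. 3; the row/cell encoding and the field names are assumptions made for illustration, not the storage format of the embodiment.

# Assumed encoding of one definition table from the form definition table 108.
# Each row lists its frames left to right; "item" frames hold a string,
# "char" runs give the number of single-character frames.
DEFINITION_TABLE = [
    # row 1: one item frame for the address
    [{"kind": "item", "type": "address", "count": 1}],
    # row 2: 12 character frames (telephone number), then 5 (postal code)
    [{"kind": "char", "type": "telephone", "count": 12},
     {"kind": "char", "type": "postal_code", "count": 5}],
    # row 3: one item frame for the name
    [{"kind": "item", "type": "name", "count": 1}],
    # row 4: product name, 2 character frames (model), 2 character frames (color)
    [{"kind": "item", "type": "product_name", "count": 1},
     {"kind": "char", "type": "model", "count": 2},
     {"kind": "char", "type": "color", "count": 2}],
]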

[0014] A type determination means 107 uses information such as the layout of the items obtained by the entry frame extraction means 105 and the number of consecutive character frames to select the matching definition table from among the definition tables stored in the form definition table 108 (process 205 in FIG. 2). The processing performed by the type determination means 107 will be described later in detail with reference to FIG. 5.

[0015] A character string recognition means 109 is a processing means for recognizing the character string image inside each entry frame. The character string recognition means 109 performs character string recognition using knowledge derived from the item type obtained by the type determination means 107 (process 206 in FIG. 2).

[0016] For this character string recognition, a technique such as the one described in the following reference can be applied. [Reference] Nakabayashi et al., "High-speed frameless handwritten character string reading method using ambiguous term retrieval," IEICE Transactions, D-II, Vol. J74-D-II, No. 11, pp. 1528-1537.

[0017] First, candidate characters are extracted by recognition against a character dictionary 110. Then, for an address item for example, the character strings used in addresses are prepared as a dictionary 111 (hereinafter referred to as the term dictionary), and the character candidates are collated with the term dictionary 111. Highly accurate character recognition can thereby be realized.
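The collation step can be pictured as choosing, among the per-position character candidates, the term-dictionary entry they best support. The Python sketch below follows that reading; the function name, data layout, and additive scoring rule are assumptions for illustration and do not reproduce the method of the cited reference.

def match_term(candidates_per_char, term_dictionary):
    """Pick the dictionary term best supported by the character candidates.

    candidates_per_char: one dict per character position mapping candidate
    character -> score, as produced by matching against the character
    dictionary 110. term_dictionary: list of valid strings for the item type
    (e.g. address words). The additive scoring is an illustrative assumption.
    """
    best_term, best_score = None, float("-inf")
    for term in term_dictionary:
        if len(term) != len(candidates_per_char):
            continue
        score = sum(cands.get(ch, 0.0)
                    for ch, cands in zip(term, candidates_per_char))
        if score > best_score:
            best_term, best_score = term, score
    return best_term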

[0018] FIG. 4 is a diagram for explaining the entry frame extraction means, which extracts connected runs of white pixels. In FIG. 4, each circle represents a pixel, the outermost row of black pixels represents the black frame line, and the figure in the center represents a figure written inside the item.

[0019] As shown in FIG. 4(a), the entry frame extraction means 105 scans from the top edge in the scanning direction, groups white pixels that are adjacent in any of the up, down, left, or right directions into the same region (hereinafter this process is called labeling), and generates and updates the coordinates of the four corners of the circumscribing rectangle as it goes. For example, up to line i, points A and B are labeled as belonging to different regions, but when line j is processed, the circumscribing rectangle containing point A and the circumscribing rectangle containing point B are connected by the white pixels of line j and labeled as the same region. In this way, a connected region 401 of white pixels labeled as a single region is finally obtained, and the absolute coordinates on the form of the four corners of its circumscribing rectangle can be extracted.
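A minimal sketch of this labeling in Python, assuming the binary image is a 2D list with 0 for white and 1 for black. Using a union-find over provisional labels is one standard way to realize the merge described for lines i and j; it is not necessarily the exact implementation of the embodiment.

def label_white_regions(image):
    """4-connected labeling of white pixels (value 0) with bounding boxes.

    Returns a list of (x0, y0, x1, y1) circumscribing rectangles, one per
    connected white region. A union-find over provisional labels merges the
    regions that a later scan line (line j in FIG. 4) connects.
    """
    height, width = len(image), len(image[0])
    parent = []                      # union-find parents over provisional labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    labels = [[-1] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 0:     # black pixel: not part of a white region
                continue
            up = labels[y - 1][x] if y > 0 else -1
            left = labels[y][x - 1] if x > 0 else -1
            if up == -1 and left == -1:
                parent.append(len(parent))           # open a new provisional label
                labels[y][x] = len(parent) - 1
            else:
                labels[y][x] = up if up != -1 else left
                if up != -1 and left != -1 and find(up) != find(left):
                    union(up, left)                  # regions joined by this line

    boxes = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x] == -1:
                continue
            root = find(labels[y][x])
            x0, y0, x1, y1 = boxes.get(root, (x, y, x, y))
            boxes[root] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return list(boxes.values())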

[0020] FIG. 4(b) shows an example in which there are two connected regions of white pixels. Processing similar to that described for (a) yields the connected regions 402 and 403, and two rectangular regions are obtained.

[0021] FIG. 5 is an explanatory diagram of the type determination means 107 shown in FIG. 1. In FIG. 5, 501 shows the physical layout of the entry frames extracted by the entry frame extraction means 105. 502 is entry frame information obtained by analyzing the layout 501; it shows the state inside memory in which the item frames and character frames are expanded into a two-dimensional array corresponding to the layout of the entry frames. 503 and 504 show entry frame information obtained from the form definition table 108.

[0022] To create the entry frame information 502, each entry frame is assigned to a row and a column according to the center coordinates of the frame. Each entry frame is also classified as an item frame or a character frame according to its size, and consecutive character frames are grouped together and their number is counted.
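A minimal sketch of this grouping, assuming each extracted frame is a bounding box and that a fixed width threshold separates character frames from item frames; the tolerance and threshold values, like the function name, are assumptions for illustration.

def build_frame_info(frames, row_tolerance=20, char_frame_max_width=60):
    """Arrange extracted entry frames into rows and classify them.

    frames: list of (x0, y0, x1, y1) rectangles from the frame extraction.
    Frames whose center y coordinates lie within row_tolerance of the row are
    placed in the same row; frames narrower than char_frame_max_width are
    treated as character frames, and consecutive character frames in a row
    are merged into one run with a count. All thresholds are illustrative.
    """
    def center(f):
        return ((f[0] + f[2]) / 2, (f[1] + f[3]) / 2)

    rows = []                                   # list of lists of frames
    for f in sorted(frames, key=lambda f: center(f)[1]):
        if rows and abs(center(f)[1] - center(rows[-1][0])[1]) < row_tolerance:
            rows[-1].append(f)
        else:
            rows.append([f])

    info = []
    for row in rows:
        row.sort(key=lambda f: center(f)[0])    # left-to-right order
        cells = []
        for f in row:
            kind = "char" if (f[2] - f[0]) < char_frame_max_width else "item"
            if kind == "char" and cells and cells[-1]["kind"] == "char":
                cells[-1]["count"] += 1          # extend the run of character frames
            else:
                cells.append({"kind": kind, "count": 1})
        info.append(cells)
    return info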

[0023] The entry frame information 503 and 504 of the form definition table 108, on the other hand, is the predefined physical layout relationship expanded into a two-dimensional array in the same manner as the entry frame information 502. Since the form definition table 108 defines, as shown in FIG. 3, the entry type, the item-frame/character-frame distinction, and the number of characters for character frames, this information is equivalent to the entry frame information 502 with the entry types added.

[0024] The type determination means 107 collates the input form against the plurality of definition tables, using the item-frame/character-frame distinction and the number of character frames as keys, finds the matching definition table, and thereby obtains the entry type of each entry frame. For example, in the entry frame information 502 corresponding to the physical layout 501 of the entry frames extracted by the entry frame extraction means 105, an item frame exists in the first row, first column, and no entry frame exists in the first row, second column, so it can be seen that the layout does not match the definition given by the entry frame information 504 of the definition table.

[0025] For character frames, adjacent frames in each row are first all merged together (in the example of FIG. 5, all the adjacent character frames in the third row are merged into one), and on the definition table side, when character frames are adjacent, their character counts are likewise added before the collation. In this way, the entry frame information 502 matches the entry frame information 503, and an entry type is assigned to each frame. At recognition time, the character frames are divided again according to the definition table.
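A minimal sketch of the collation in Python, reusing the layout produced by the grouping sketch above and the assumed definition-table encoding. Merging adjacent character runs before comparison mirrors the handling of the third row in FIG. 5, but the helper names and the first-match selection rule are hypothetical.

def merge_char_runs(row):
    """Merge adjacent character-frame runs in one row before collation."""
    merged = []
    for cell in row:
        if cell["kind"] == "char" and merged and merged[-1]["kind"] == "char":
            merged[-1] = {"kind": "char",
                          "count": merged[-1]["count"] + cell.get("count", 1)}
        else:
            merged.append({"kind": cell["kind"], "count": cell.get("count", 1)})
    return merged

def layouts_match(frame_info, definition_table):
    """Check whether the extracted layout matches one definition table."""
    if len(frame_info) != len(definition_table):
        return False
    for row, def_row in zip(frame_info, definition_table):
        a, b = merge_char_runs(row), merge_char_runs(def_row)
        if [(c["kind"], c["count"]) for c in a] != [(c["kind"], c["count"]) for c in b]:
            return False
    return True

def select_definition_table(frame_info, definition_tables):
    """Return the first stored definition table whose layout matches."""
    for table in definition_tables:
        if layouts_match(frame_info, table):
            return table
    return None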

[0026] When the form is input with a skew, the entry frames, which are connected regions of white pixels, are extracted in a skewed state. The slope of the horizontal edge can therefore be computed for any item frame among those entry frames. Using this slope information, the center coordinates of each entry frame are corrected, and the entry frames belonging to the same row are detected.
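A minimal sketch of such a correction; rotating the frame centers by the negative of the measured skew angle is one simple way to realize it, since the embodiment does not fix the exact formula. The function name and interface are assumptions.

import math

def deskew_centers(centers, slope):
    """Correct entry frame center coordinates for a measured skew.

    centers: list of (cx, cy) entry frame centers; slope: tangent of the skew
    angle measured from the horizontal edge of some item frame. The centers
    are rotated by -atan(slope) about the origin (an illustrative choice).
    """
    angle = -math.atan(slope)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(cx * cos_a - cy * sin_a, cx * sin_a + cy * cos_a)
            for cx, cy in centers]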

[0027]

[Effects of the Invention] As described above, according to the present invention, entry frames drawn with black ruled lines are detected from the connected white pixel regions, so detailed definitions such as the absolute coordinates of the entry frames become unnecessary. In addition, by obtaining the coordinates of the four corner points of the circumscribing rectangle when extracting a connected region of white pixels, a skewed entry frame can be detected, so displacement due to image skew can be handled by the same means.

[0028] Furthermore, since the entry types of the input form can be determined by using a definition table that associates the physical layout relationship of the entry frames with the entry types, recognition accuracy can be improved at character recognition time by using knowledge specific to each entry type.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a processing flowchart of the embodiment of the present invention.

FIG. 3 is a diagram showing a description example of a form definition table according to the embodiment of the present invention.

FIG. 4 is an explanatory diagram of the entry frame extraction means according to the embodiment of the present invention.

FIG. 5 is an explanatory diagram of the type determination means according to the embodiment of the present invention.

FIG. 6 is a diagram for explaining the conventional technique.

[Explanation of symbols]

101 input form
102 image input device
103 processing device
104 image storage means
105 entry frame extraction means
106 entry frame judgment means
107 type determination means
108 form definition table
109 character string recognition means
110 character dictionary
111 term dictionary

Claims (1)

[Claims]

1. A form understanding system for classifying and recognizing, from a form image containing one or more entry frames enclosed by black frame lines, the character string entered in each entry frame according to item entry types determined in advance for the entry frames of the form, the system comprising: an image storage means (104) for storing image data of an input form (101) input from an image input device (102); an entry frame extraction means (105) for extracting the vertical and horizontal coordinates of the four corner points of each quadrilateral formed by connected components of same-colored pixels and determining the entry frames by selecting quadrilaterals through comparison with a predetermined threshold value; a form definition table (108) that stores information associating entry types with entry frame positions; a type determination means (107) for determining the type of each entry frame by collating the physical layout relationship between the entry frames defined in the form definition table (108) with the physical layout relationship between the entry frames obtained by the entry frame extraction means (105); and a character string recognition means (109) for performing character segmentation, recognition, and knowledge processing on the character string within each entry frame using knowledge specific to its entry type.
JP4269027A 1992-10-08 1992-10-08 Slip comprehension system Pending JPH06119491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4269027A JPH06119491A (en) 1992-10-08 1992-10-08 Slip comprehension system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4269027A JPH06119491A (en) 1992-10-08 1992-10-08 Slip comprehension system

Publications (1)

Publication Number Publication Date
JPH06119491A true JPH06119491A (en) 1994-04-28

Family

ID=17466661

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4269027A Pending JPH06119491A (en) 1992-10-08 1992-10-08 Slip comprehension system

Country Status (1)

Country Link
JP (1) JPH06119491A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11219442A (en) * 1997-11-25 1999-08-10 Mitsubishi Electric Corp Document edition output device


Similar Documents

Publication Publication Date Title
US5799115A (en) Image filing apparatus and method
US5668892A (en) Table recognition apparatus
US5774580A (en) Document image processing method and system having function of determining body text region reading order
US5048107A (en) Table region identification method
JPS6159568A (en) Document understanding system
EP0843275B1 (en) Pattern extraction apparatus and method for extracting patterns
US8077976B2 (en) Image search apparatus and image search method
JPH05242292A (en) Separating method
JP3851742B2 (en) Form processing method and apparatus
JP2926066B2 (en) Table recognition device
US11270146B2 (en) Text location method and apparatus
JPH06119491A (en) Slip comprehension system
JP3276555B2 (en) Format recognition device and character reader
Hu et al. Construction of partitioning paths for touching handwritten characters
JPH0728935A (en) Document image processor
JPS6214277A (en) Picture processing system
JPH07160810A (en) Character recognizing device
JPH0589190A (en) Drawing information checking system
JP2918363B2 (en) Character classification method and character recognition device
JP4763113B2 (en) High speed labeling method
JPS63131287A (en) Character recognition system
JP3276554B2 (en) Format recognition device and character reader
JP2954218B2 (en) Image processing method and apparatus
JPH03126188A (en) Character recognizing device
JP2000207490A (en) Character segmenting device and character segmenting method