JPH06274690A

JPH06274690A - Character recognizing device and optical character reader

Info

Publication number: JPH06274690A
Application number: JP5083811A
Authority: JP
Inventors: Teruki Oikawa; 晃樹及川; Toshio Tsutsumida; 敏夫堤田
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1993-03-19
Filing date: 1993-03-19
Publication date: 1994-09-30

Abstract

PURPOSE: To correct image data quickly and recognize characters with high accuracy by dividing the region formed by an internal boundary frame and an external boundary frame into a plurality of regions, detecting the regions that require correction, and then performing correction selectively. CONSTITUTION: A correction unit 16 consists of an internal boundary frame correction unit and an external boundary frame correction unit. The internal boundary frame correction unit applies internal boundary frame correction, as in a conventional device, to the internal small regions in a page memory 11 designated by a control unit 15, thereby correcting the image data in a field memory 17. Likewise, the external boundary frame correction unit applies external boundary frame correction to the external small regions in the page memory 11 designated by the control unit 15, thereby correcting the image data in the field memory 17. Correcting the image data in the field memory 17 in this way allows written characters to be reproduced with high accuracy and speeds up the correction processing.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that recognizes image data of characters, numerals, and the like as character data, and to an optical character reader that optically scans a form or the like and performs character recognition on the image data thus read.

[0002]

[Prior Art] An optical character reader (OCR) is a device that optically scans characters, numerals, or arbitrarily defined symbols (hereinafter simply "characters") written on a form or the like to generate image data, and recognizes that image data as characters with high accuracy.

[0003] FIG. 9 is a block diagram of a conventional character recognition device of the kind provided in such an optical character reader. In FIG. 9, 91 is a page memory that stores the image data to be recognized, 92 is a frame detection/removal unit, 93 is a field memory, 94 is a character cutout unit, and 95 is a character recognition unit.

[0004] FIG. 10 shows an example of a form image to be recognized (part of a field provided with character frames). As FIG. 10 shows, an object to be recognized, such as a form, is preprinted with character entry frames in a color similar to that of the characters (a non-dropout color; hereinafter simply "frames").

[0005] The page memory 91 stores the form image data obtained by optically scanning the object to be recognized. An X-Y coordinate system is set on the page memory 91; for example, the X-axis direction is notionally taken as the row direction and the Y-axis direction as the column direction.

[0006] The frame detection/removal unit 92 computes projection histograms (cumulative black-pixel counts) in the row and column directions for the form image in the page memory 91, as shown in FIG. 11, and detects the frame position from them. It then sets an internal boundary frame inside the frame position and cuts out the frame region so formed as the region containing the image data.
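The projection histograms described in [0006] can be sketched in a few lines of Python. This is a minimal illustration, assuming the page memory is held as a list of rows of 0 (white) and 1 (black); the function name `projection_histograms` and the toy image are not from the source.

```python
def projection_histograms(image):
    """Return (row_hist, col_hist) of black-pixel counts for a binary image.

    image is a list of rows; each row is a list of 0 (white) or 1 (black).
    """
    # Count black pixels along each row (one value per Y position).
    row_hist = [sum(row) for row in image]
    # Count black pixels along each column (one value per X position).
    col_hist = [sum(col) for col in zip(*image)]
    return row_hist, col_hist


# Toy 5x6 image: a rectangular frame with one black pixel inside it.
image = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
row_hist, col_hist = projection_histograms(image)
print(row_hist)  # [6, 2, 3, 2, 6]
print(col_hist)  # [5, 2, 3, 2, 2, 5]
```

The frame lines show up as peaks at both ends of each histogram, which is what the threshold-based frame detection relies on.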

[0007] The field memory 93 stores the image cut out by the frame detection/removal unit 92. An X-Y coordinate system is also set on this memory; for example, the X-axis direction is notionally taken as the row direction and the Y-axis direction as the column direction.

[0008] The character cutout unit 94 takes row-direction and column-direction projections of the image in the field memory 93 and, based on these projections, cuts out and outputs character patterns.
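The source does not say how the cut positions are derived from the projections. One common approach, shown here purely as an assumption, is to split the column projection wherever it drops to zero; the function name `split_columns` is hypothetical.

```python
def split_columns(col_hist):
    """Return half-open (start, end) column ranges where the projection is non-zero.

    Assumption: characters are separated by at least one empty column.
    """
    spans, start = [], None
    for x, count in enumerate(col_hist):
        if count and start is None:
            start = x                  # a character run begins
        elif not count and start is not None:
            spans.append((start, x))   # the run ended at the previous column
            start = None
    if start is not None:
        spans.append((start, len(col_hist)))  # run reaches the right edge
    return spans


print(split_columns([0, 2, 3, 0, 0, 1, 4, 1]))  # [(1, 3), (5, 8)]
```

Touching characters would need a more elaborate rule; the sketch only covers the well-separated case.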

[0009] The character recognition unit 95 identifies characters based on the character patterns from the character cutout unit 94 and outputs the recognition result.

[0010] FIG. 12 illustrates how the frame detection/removal unit detects the frame and sets the internal boundary frame. A rectangular frame consisting of a top side, a right side, a left side, and a bottom side is normally used. First, as shown in FIG. 11, the row-direction projection histogram is computed, taking the X-axis direction as the row direction and the Y-axis direction as the column direction. Scanning this histogram in the column direction, the position where its value first reaches or exceeds a threshold THL is taken as the upper edge of the frame's top side, and the position where the value subsequently falls to or below THL is taken as the lower edge of the frame's top side. Next, the position obtained by adding a predetermined margin WL to the detected lower edge is taken as the top side of the internal boundary frame. The right, left, and bottom sides are specified in the same way to determine the internal boundary frame.
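The top-side detection in [0010] can be sketched as follows. This is a minimal illustration, assuming the row projection is a list of black-pixel counts indexed by Y from the top of the image; the function name and the sample values chosen for THL and WL are illustrative only.

```python
def internal_top_boundary(row_hist, thl, wl):
    """Locate the frame's top side in a row projection and offset it by wl.

    Returns (upper_edge, lower_edge, internal_top), or None if the top
    side cannot be delimited.
    """
    upper = None
    for y, count in enumerate(row_hist):
        if upper is None:
            if count >= thl:
                upper = y             # first row reaching the threshold
        elif count <= thl:
            return upper, y, y + wl   # first later row back at or below it
    return None


# Rows 2-3 hold the frame's top line; rows 7-8 hold its bottom line.
row_hist = [0, 0, 9, 9, 1, 2, 1, 9, 9, 0]
print(internal_top_boundary(row_hist, thl=5, wl=1))  # (2, 4, 5)
```

The other three sides would be handled the same way, scanning the relevant projection from the opposite end or along the other axis.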

[0011] FIG. 13 illustrates the character-string image cut out by the frame detection/removal unit. As the figure shows, simply cutting out the frame region loses the portions where a character touches the frame and the portions where a character protrudes beyond the frame. To make up for these missing portions, internal boundary frame correction and external boundary frame correction are performed after the frame region has been cut out, as shown in FIGS. 14 and 15.

[0012] FIGS. 14(a), 14(b), and 14(c) illustrate the internal boundary frame correction processing.

[0013] First, as shown in FIG. 14(a), the form image in the page memory 91 is searched for image data within the frame region. Because the image data subject to character recognition normally consists of black pixels, the search is carried out by detecting the presence or absence of black pixels. Next, as shown in FIG. 14(b), black pixel positions A1 and A2 within the frame region in the page memory 91 are detected, the points corresponding to A1 and A2 in the field memory 93 are located, and the span between them is filled in with black pixels. As a result, image data such as that shown in FIG. 14(c) is stored in the field memory 93.
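The fill step in [0013] can be sketched as below. The source does not specify how the span between A1 and A2 is interpolated, so the straight-line fill, the `origin` offset that maps page-memory coordinates to field-memory coordinates, and the function name `fill_between` are all assumptions made for illustration.

```python
def fill_between(field, a1, a2, origin):
    """Blacken the straight run between page-memory points a1 and a2.

    field  : field memory as a list of rows of 0/1, modified in place
    a1, a2 : (x, y) positions in page-memory coordinates
    origin : (x, y) of the field memory's top-left corner in page memory
    """
    ox, oy = origin
    (x1, y1), (x2, y2) = a1, a2
    steps = max(abs(x2 - x1), abs(y2 - y1))
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        # Map the interpolated page-memory point into field-memory coordinates.
        x = round(x1 + (x2 - x1) * t) - ox
        y = round(y1 + (y2 - y1) * t) - oy
        if 0 <= y < len(field) and 0 <= x < len(field[0]):
            field[y][x] = 1
    return field


field = [[0] * 4 for _ in range(4)]
fill_between(field, a1=(11, 20), a2=(11, 23), origin=(10, 20))
print([row[1] for row in field])  # [1, 1, 1, 1]
```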

[0014] FIGS. 15(a) to 15(e) illustrate the external boundary frame correction processing. As shown in FIG. 15(a), the form image in the page memory 91 is searched for black pixels within each region formed by the external boundary frame and the internal boundary frame. The external boundary frame is set outside the frame: the position offset by a predetermined margin WL from the upper edge of the frame is taken as the top position of the external boundary frame, and the right, left, and bottom positions are specified in the same way as for the internal boundary frame. Next, as shown in FIG. 15(b), a black pixel position B1 in the region formed by the external boundary frame and a black pixel position B2 in the region formed by the internal boundary frame are detected in the page memory 91; the points corresponding to B1 and B2 are then located in the field memory 93, and the span between them is filled in with black pixels (FIG. 15(c)). Further, as shown in FIG. 15(d), the portion protruding beyond the external boundary frame is traced starting from position B1 on the external boundary in the page memory 91, the point corresponding to B1 is located in the field memory 93, and the traced range is filled in (FIG. 15(e)).
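The relationship between the frame and the two boundary frames can be sketched as follows. This is an illustration under assumptions: coordinates increase downward and to the right, the same margin WL is used on every side, and the names `Rect` and `boundary_frames` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    top: int
    left: int
    bottom: int
    right: int


def boundary_frames(frame_outer, frame_inner, wl):
    """Return (internal, external) boundary frames offset by margin wl.

    frame_outer / frame_inner are the outer and inner edges of the printed
    frame line. The internal boundary frame lies wl inside the inner edge,
    the external boundary frame lies wl outside the outer edge.
    """
    internal = Rect(frame_inner.top + wl, frame_inner.left + wl,
                    frame_inner.bottom - wl, frame_inner.right - wl)
    external = Rect(frame_outer.top - wl, frame_outer.left - wl,
                    frame_outer.bottom + wl, frame_outer.right + wl)
    return internal, external


# Frame line 2 pixels thick: outer edge at 10..50, inner edge at 12..48.
internal, external = boundary_frames(Rect(10, 10, 50, 50), Rect(12, 12, 48, 48), wl=3)
print(internal)  # Rect(top=15, left=15, bottom=45, right=45)
print(external)  # Rect(top=7, left=7, bottom=53, right=53)
```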

[0015] After the image data in the field memory 93 has been corrected in this way, it is cut out by the character cutout unit 94 and recognized by the character recognition unit 95, which improves character recognition accuracy.

[0016]

[Problems to Be Solved by the Invention] In the optical character reader described above, simply cutting out the frame region shortens the processing time. However, when a character touches the frame or is written so that it protrudes from the frame, the cut-out character pattern may overlap the frame, or part of the protruding character may be lost. This lowers character recognition accuracy.

[0017] Moreover, when correction is performed to allow for such frame contact and protrusion, detailed removal processing is applied to the entire region formed by the external boundary frame and the internal boundary frame, regardless of whether any contact or protrusion has actually occurred. Consequently, even when there is little contact or protrusion, the start of character recognition is delayed, and the speed of the reading process as a whole drops.

[0018] The present invention was made against this background, and its object is to provide a character recognition device that recognizes image data as character data at high speed and with high accuracy.

[0019] Another object of the present invention is to provide an optical character reader that uses this character recognition device together with an optical image input device to enable fast and highly accurate character recognition.

[0020]

[Means for Solving the Problems] The character recognition device of the present invention, which achieves the above object, comprises: character entry frame detecting means for detecting a character entry frame in the image data to be processed; boundary frame setting means for setting an internal boundary frame and an external boundary frame inside and outside the detected character entry frame, respectively; flag data generating means for dividing each of the frame regions formed by the internal boundary frame, the external boundary frame, and the character entry frame into a plurality of small regions and generating flag data indicating the presence or absence of image data in each small region; correction means for determining a correction region based on the flag data and correcting the image data in that correction region; and character data recognition means for recognizing the corrected image data as character data. The device thus determines the correction region and performs frame removal selectively.

[0021] In this configuration, the flag data generating means comprises: first means for detecting the presence or absence of image data in each internal small region formed by dividing the internal boundary frame side by side; second means for detecting the presence or absence of image data in each external small region formed by dividing the external boundary frame side by side; and third means for determining the correction region according to the detection results.

[0022] The optical character reader of the present invention, which achieves the other object above, has an optical image input device that optically scans an object to be recognized to generate image data including character entry frames in which characters have been entered, and a character recognition device that recognizes the characters based on that image data. It is characterized in that the character recognition device of the present invention described above is used as the character recognition device.

[0023]

[Operation] In the character recognition device of the present invention and the optical character reader using it, the frame regions formed by the internal boundary frame and the external boundary frame are divided into a plurality of small regions, and each small region is checked to see whether it requires correction. The small regions to be corrected are determined from the result, and frame removal is performed selectively. The image data can therefore be corrected faster than when detailed frame removal and other correction are applied to the whole region, without any loss of correction accuracy. In particular, when no region requiring correction is detected, character recognition proceeds immediately without any correction, so the processing time is greatly shortened.

[0024]

[Embodiment] In this embodiment, character recognition of the image data obtained by optically scanning a form was performed by a character recognition device. FIG. 1 is a block diagram of this character recognition device. In FIG. 1, 11 is a page memory, 12 is a frame detection unit (character entry frame detecting means), 13 is a boundary frame setting unit (boundary frame setting means), 14 is a flag data generation unit (flag data generating means), 15 is a control unit, 16 is a correction unit (correction means), 17 is a field memory, 18 is a character cutout unit, and 19 is a character recognition unit (character data recognition means).

[0025] The page memory 11, field memory 17, character cutout unit 18, and character recognition unit 19 have functions equivalent to those of the corresponding components 91, 93, 94, and 95 of the conventional device described above.

[0026] The frame detection unit 12 detects the frame from the form image data stored in the page memory 11 in the same way as the conventional device. The boundary frame setting unit 13 sets the internal boundary frame and the external boundary frame, also as in the conventional device. This embodiment uses rectangular frames and, as in the conventional device, sets a rectangular internal boundary frame and a rectangular external boundary frame, each consisting of a top side, a right side, a left side, and a bottom side. The boundary frame setting unit 13 further outputs the image data within the frame region formed by the internal boundary frame to the field memory 17. The frame detection method is not particularly limited; any method may be used. Likewise, the shapes of the internal boundary frame, the external boundary frame, and the frame are not particularly limited, and any shape may be used.

[0027] As shown in FIG. 2, the flag data generation unit 14 consists of an internal boundary frame dividing unit 141, an external boundary frame dividing unit 142, and an image data detection unit 143.

[0028] As shown in FIG. 3, the internal boundary frame dividing unit 141 detects the four vertices of the rectangular internal boundary frame, namely the upper-left, upper-right, lower-right, and lower-left vertices. It then divides the frame into four sides (top, bottom, left, and right) according to the up, down, left, and right directions at each vertex. At the upper-left vertex of the internal boundary frame, the top side lies to the right of the vertex and the left side lies below it, so the top side of the internal boundary frame is designated for the rightward direction and the left side for the downward direction. At the other vertices, the top, bottom, left, and right sides of the internal boundary frame are designated in the same way according to direction. FIG. 4 shows the designated directions at each vertex of the internal boundary frame; a circle marks each direction in which a side is designated at that vertex. This designates the internal small regions, each containing one side.
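The side-by-side division in [0028] can be sketched as below. The direction table of FIG. 4 is not reproduced in the text, so this sketch simply pairs the four vertices into four sides; the function name `split_into_sides` and the (x, y) vertex convention are assumptions.

```python
def split_into_sides(top, left, bottom, right):
    """Split a rectangular boundary frame into its four sides.

    Each side is returned as a pair of (x, y) vertices.
    """
    upper_left, upper_right = (left, top), (right, top)
    lower_left, lower_right = (left, bottom), (right, bottom)
    return {
        "top": (upper_left, upper_right),      # rightward from the upper-left vertex
        "left": (upper_left, lower_left),      # downward from the upper-left vertex
        "right": (upper_right, lower_right),
        "bottom": (lower_left, lower_right),
    }


sides = split_into_sides(top=2, left=3, bottom=10, right=20)
print(sides["top"])   # ((3, 2), (20, 2))
print(sides["left"])  # ((3, 2), (3, 10))
```

The same helper would apply unchanged to the external boundary frame, which is divided in the same manner.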

[0029] In the same way as the internal boundary frame dividing unit 141, the external boundary frame dividing unit 142 detects the vertices of the external boundary frame as shown in FIG. 5 and, at each vertex, designates the top, bottom, left, and right sides of the external boundary frame in the directions shown in FIG. 6. This designates the external small regions, each containing one side.

[0030] The image data detection unit 143 detects the presence or absence of image data on each of the four sides of the internal boundary frame designated by the internal boundary frame dividing unit 141, and generates a contact flag from the result. The contact flag indicates "present" if image data is detected and "absent" if it is not.

[0031] Similarly, the image data detection unit 143 detects the presence or absence of image data on each of the four sides of the external boundary frame, and generates a protrusion flag from the result.

[0032] The protrusion flag indicates "present" if image data is detected and "absent" if it is not. (The above constitute the first and second means.)
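Flag generation as described in [0030] to [0032] can be sketched as follows, assuming a binary image held as a list of rows and a boundary frame given by inclusive pixel coordinates. The function name `side_flags` is hypothetical; the same routine yields contact flags when given the internal boundary frame and protrusion flags when given the external one.

```python
def side_flags(image, top, left, bottom, right):
    """Return {side: bool}: True if any black pixel lies on that side."""
    return {
        "top": any(image[top][x] for x in range(left, right + 1)),
        "bottom": any(image[bottom][x] for x in range(left, right + 1)),
        "left": any(image[y][left] for y in range(top, bottom + 1)),
        "right": any(image[y][right] for y in range(top, bottom + 1)),
    }


# 7x7 image with a vertical stroke in column 3 running down from row 2.
image = [[0] * 7 for _ in range(7)]
for y in range(2, 7):
    image[y][3] = 1

# The stroke crosses only the bottom side of the boundary frame (1,1)-(5,5).
contact = side_flags(image, top=1, left=1, bottom=5, right=5)
print(contact)  # {'top': False, 'bottom': True, 'left': False, 'right': False}
```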

[0033] In this embodiment, the image data is stored as black pixels and the presence or absence of black pixels is checked, but the presence or absence of image data may be checked by any method suited to the conditions.

[0034] As shown in FIG. 7, the correction unit 16 consists of an internal boundary frame correction unit 161 and an external boundary frame correction unit 162.

[0035] The control unit 15 selects every small region containing a side whose contact flag is "present" and controls the internal boundary frame correction unit 161 so that internal boundary frame correction is performed only on those small regions. Similarly, it selects every small region containing a side whose protrusion flag is "present" and controls the external boundary frame correction unit 162 so that external boundary frame correction is performed only on those small regions. As shown in FIG. 8, whether the internal boundary frame correction unit 161 and the external boundary frame correction unit 162 are executed is thus determined by the contents of the contact flag and the protrusion flag (the third means).
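The control logic of [0035] can be sketched as follows. The flags are assumed to be dictionaries keyed by side name, and the two correction routines are passed in as callbacks; all names here are illustrative and stand in for units 161 and 162, whose internals the sketch does not model.

```python
def run_selective_correction(contact, protrusion, correct_internal, correct_external):
    """Correct only the flagged small regions; return how many were corrected."""
    internal = [side for side, present in contact.items() if present]
    external = [side for side, present in protrusion.items() if present]
    for side in internal:
        correct_internal(side)   # internal boundary frame correction
    for side in external:
        correct_external(side)   # external boundary frame correction
    return len(internal) + len(external)


log = []
contact = {"top": False, "bottom": True, "left": False, "right": False}
protrusion = {"top": False, "bottom": True, "left": False, "right": False}
corrected = run_selective_correction(
    contact, protrusion,
    correct_internal=lambda side: log.append(("internal", side)),
    correct_external=lambda side: log.append(("external", side)),
)
print(corrected)  # 2
print(log)        # [('internal', 'bottom'), ('external', 'bottom')]
```

When every flag is "absent", neither callback runs and the function returns 0, which corresponds to proceeding straight to character recognition.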

[0036] The internal boundary frame correction unit 161 applies internal boundary frame correction, as in the conventional device, to the internal small regions in the page memory 11 designated by the control unit 15, thereby correcting the image data in the field memory 17. Similarly, the external boundary frame correction unit 162 applies external boundary frame correction, as in the conventional device, to the external small regions in the page memory 11 designated by the control unit 15, thereby correcting the image data in the field memory 17.

[0037] Correcting the image data in the field memory 17 as described above allows the written characters to be reproduced with high accuracy and allows the correction processing to be performed quickly. Character recognition can therefore be performed accurately and at high speed.

[0038] When the number of corrected regions is large, accurately recognizing the image data as character data is expected to become difficult. The number of corrected regions can therefore be counted and displayed when the character recognition result is output, serving as a guide to the reliability of the recognition.

[0039] The results produced by a character recognition device are often corrected further by hand. Referring to this reliability at that stage makes it possible to correct low-reliability characters first, which improves the efficiency of the correction work.
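One way to use the corrected-region count as a review guide, as suggested in [0038] and [0039], is to sort the recognition results so that the most heavily corrected characters come first. The result format, a list of (character, corrected_region_count) pairs, and the function name `review_order` are assumptions made for this sketch.

```python
def review_order(results):
    """Sort (character, corrected_region_count) pairs, most-corrected first.

    Characters with equal counts keep their original reading order.
    """
    return sorted(results, key=lambda item: item[1], reverse=True)


results = [("A", 0), ("8", 3), ("7", 1), ("B", 0)]
print(review_order(results))  # [('8', 3), ('7', 1), ('A', 0), ('B', 0)]
```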

[0040]

[Effects of the Invention] As described in detail above, the character recognition device of the present invention divides the region formed by the internal boundary frame and the external boundary frame into a plurality of regions, detects the regions that require correction, and then performs correction selectively. The image data can therefore be corrected faster than when detailed frame removal and other correction are applied to the whole region. In addition, correction accuracy is not reduced, so characters can be recognized with high accuracy.

[0041] In particular, when no region requiring correction is detected, character recognition proceeds immediately without any correction, so the processing time can be greatly shortened.

[Brief description of drawings]

FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram of the flag data generation unit according to the embodiment.

FIG. 3 is an explanatory diagram of the vertices of the internal boundary.

FIG. 4 is an explanatory diagram of the designated directions at each internal boundary vertex.

FIG. 5 is an explanatory diagram of the vertices of the external boundary.

FIG. 6 is an explanatory diagram of the designated directions at each external boundary.

FIG. 7 is a block diagram of the correction unit.

FIG. 8 is an explanatory diagram of the control operation in response to the flag data.

FIG. 9 is a block diagram of a conventional character recognition device.

FIG. 10 is an explanatory diagram of a form image.

FIG. 11 is an explanatory diagram of a projection histogram.

FIG. 12 is an explanatory diagram of the internal boundary.

FIG. 13 is an explanatory diagram of cut-out image data.

FIG. 14 is an explanatory diagram of the internal boundary frame correction processing.

FIG. 15 is an explanatory diagram of the external boundary frame correction processing.

[Explanation of symbols]

11: page memory; 12: frame detection unit; 13: boundary frame setting unit; 14: flag data generation unit; 15: control unit; 16: correction unit; 17: field memory; 18: character cutout unit; 19: character recognition unit

Claims

[Claims]

1. A character recognition device comprising: character entry frame detecting means for detecting a character entry frame in image data to be processed; boundary frame setting means for setting an internal boundary frame and an external boundary frame inside and outside the detected character entry frame, respectively; flag data generating means for dividing each of the frame regions formed by the internal boundary frame, the external boundary frame, and the character entry frame into a plurality of small regions and generating flag data indicating the presence or absence of image data in each small region; correction means for determining a correction region based on the flag data and correcting the image data in the correction region; and character data recognition means for recognizing the corrected image data as character data.

2. The character recognition device according to claim 1, wherein the flag data generating means comprises: first means for detecting the presence or absence of image data in each internal small region formed by dividing the internal boundary frame side by side; second means for detecting the presence or absence of image data in each external small region formed by dividing the external boundary frame side by side; and third means for determining the correction region according to the detection results.

3. An optical character reader having an optical image input device that optically scans an object to be recognized to generate image data including a character entry frame in which characters have been entered, and a character recognition device that recognizes the characters based on the image data, wherein the character recognition device is the character recognition device according to claim 1 or 2.