JP2000132636A - Character recognition method and device therefor

Info

Publication number: JP2000132636A
Application number: JP10308114A
Authority: JP
Inventors: Masanobu Sato; 誠展佐藤; Naoya Tanaka; 直哉田中
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-10-29
Filing date: 1998-10-29
Publication date: 2000-05-12

Abstract

PROBLEM TO BE SOLVED: To detect the position of a character frame accurately, and to recognize the characters written in it, even when the character frame is faint in the image data or background printing exists around the character frame, by detecting the position of the character frame based on frame area direction codes obtained within the detection area of the character frame. SOLUTION: The area in which a character frame is to be detected is determined from the form image data stored in an image data storage part 12 and the character frame registration data, including frame coordinate values, stored in a character frame registration data storage part 13. Objects formed by connected black picture elements within the frame detection area are contour-traced to extract frame area direction codes. An image is created in which candidate frame lines, generated from the frame line candidate positions, are drawn; the corner position candidate coordinates of the character frame are detected from this image and stored in a frame corner candidate coordinate storage part 18. The inner position of the character frame is then determined from the corner position candidate coordinates stored in the frame corner candidate coordinate storage part 18 and the character frame registration data stored in the character frame registration data storage part 13.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a character recognition method and apparatus, and more particularly to a character recognition method and apparatus for optically reading characters written in character frames.

[0002]

[Description of the Related Art] When a conventional character recognition device recognizes characters written in character frames printed in a non-dropout color, it detects the position of each character frame by projecting the black pixels in the horizontal and vertical directions within an area where the character frame is presumed to exist, and computing histograms from the projections.

[0003] This presumed area is determined in consideration of the vertical and horizontal printing misalignment of the form paper in the form image data, the geometric deformation of the image caused by errors in the image scanner mechanism, and the like. The position of a character frame is detected from the positions on the histogram whose values are equal to or greater than a threshold derived from the pre-registered height and width of the character frame.
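
The conventional projection approach described above can be sketched as follows. This is only an illustrative sketch, not the implementation of any actual device: it assumes the binary image is a list of rows in which 1 denotes a black pixel, and the function names column_histogram and frame_line_columns are hypothetical.

```python
def column_histogram(image):
    """Project black pixels (value 1) vertically: one count per column."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def frame_line_columns(image, threshold):
    """Return the columns whose black-pixel count reaches the threshold."""
    hist = column_histogram(image)
    return [x for x, count in enumerate(hist) if count >= threshold]


# Column 1 is a solid vertical line; column 3 is a faint line with gaps.
image = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(frame_line_columns(image, threshold=3))
```

With a threshold of 3, only the solid line in column 1 is reported; the faint line in column 3 has only two black pixels and is dropped, which is the weakness of this approach that the following paragraphs describe.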

[0004]

[Problems to be Solved by the Invention] However, if, for example, a character frame is faint in the form image data, the histogram value at the character frame position falls below the threshold, and the character frame can no longer be detected correctly.

[0005] FIG. 13 shows a state in which character frames cannot be detected correctly: (a) is an explanatory diagram of form image data in which the character frames are faint, (b) is an explanatory diagram of the histogram obtained by black-pixel projection from the area in (a), and (c) is an explanatory diagram of the erroneously detected character frame positions.

[0006] As shown in FIG. 13, the form image data 1 contains characters 3 written in character frames 2 (see (a)). Suppose that, in the histogram obtained by projecting the pixels of area 4 of the form image data 1 in the vertical direction, the positions whose histogram values are equal to or greater than a threshold Bh, derived from the height of the character frame 2 in the frame registration data, are taken as frame position candidates. The positions L2, L5, and L10 of the faint character frames 2 are then not selected, because their histogram values are at or below the threshold Bh (see (b)). Instead, the positions L1, L3, L4, L8, L9, and L13, which exceed the threshold Bh and are close to the width and inter-frame pitch of the character frames 2 in the frame registration data, are selected.

[0007] As a result of this selection, frame F1 defined by positions L1 and L3, frame F2 defined by positions L4 and L8, and frame F3 defined by positions L9 and L13 are detected as the character frames 2 (see (c)), so the positions of the character frames 2 are detected incorrectly.

[0008] In other words, a conventional character recognition device detects the position of the character frame 2 by a selection based on a histogram of the pixel distribution. Consequently, when the frame lines are faint in the form image data because of the characteristics of the image scanner or the reflectance of the character frame 2 printed on the form paper, or when background printing exists around the character frame 2, the device cannot distinguish the character frame 2 and the characters 3 from the background, and cannot detect the position of the character frame 2 accurately.

[0009] An example of a device that detects the character frame position by such a selection based on a histogram of the pixel distribution is the image clipping device and character recognition device disclosed in Japanese Patent No. 2761467.

[0010] An object of the present invention is to provide a character recognition method and apparatus capable of accurately detecting the character frame position and recognizing the characters written in the character frame, even if the character frame is faint in the image data or background printing exists around the character frame.

[0011]

[Means for Solving the Problems] To achieve the above object, the character recognition method according to the present invention is a character recognition method that optically reads characters written in a character frame and performs image recognition, characterized in that the position of the character frame is detected based on frame area direction codes obtained within the detection area of the character frame.

[0012] With the above configuration, when characters written in a character frame are optically read and recognized, the position of the character frame is detected based on the frame area direction codes obtained within the detection area of the character frame. As a result, even if the character frame is faint in the image data or background printing exists around the character frame, the character frame position can be detected accurately and the characters written in the character frame can be recognized.

[0013] The character recognition apparatus according to the present invention can implement the above character recognition method.

[0014]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0015] FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention. As shown in FIG. 1, the character recognition device 10 includes an image scanner unit 11, an image data storage unit 12, a character frame registration data storage unit 13, a character frame position detection unit 14, a character recognition unit 15, a direction code storage unit 16, a frame candidate storage unit 17, and a frame corner candidate coordinate storage unit 18.

[0016] The image scanner unit 11 optically reads a form in which characters are written in character frames printed in a non-dropout color, performs photoelectric conversion, and acquires form image data consisting of binary data. The acquired form image data is output to the image data storage unit 12. The image data storage unit 12 is a memory that stores the form image data read by the image scanner unit 11 as a two-dimensional image.

[0017] The character frame registration data storage unit 13 stores character frame registration data, including pre-registered frame coordinate values, and consists of a magnetic disk, an optical disk, a floppy disk, or the like. The character frame registration data storage unit 13 stores the reference position and size of each character frame.

[0018] FIG. 2 is an explanatory diagram showing an example of the character frame registration data stored in the character frame registration data storage unit. As shown in FIG. 2, the character frame registration data consists of data on three character frames 19a, 19b, and 19c: the coordinates (X_S, Y_S) are the upper-left coordinates of the first frame on the form; the distances Dx1 and Dx2 are the distances from the first frame to the upper-left corners of the second and third frames, respectively; the widths W1, W2, and W3 are the widths of the character frames; and the heights H1, H2, and H3 are the heights of the character frames.
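
As an illustration of how the registration data above could be turned into frame rectangles, here is a minimal sketch. The FrameRegistration class and the frame_rectangles function are hypothetical names. The sketch also assumes that all frames share the same upper edge Y_S (the text gives only horizontal distances) and that the lower-right corner is the upper-left corner plus the width and height.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FrameRegistration:
    x_s: int            # X_S: upper-left x of the first frame
    y_s: int            # Y_S: upper-left y of the first frame
    offsets: List[int]  # Dx1, Dx2, ...: distance from frame 1 to each later frame
    widths: List[int]   # W1, W2, W3, ...
    heights: List[int]  # H1, H2, H3, ...


def frame_rectangles(reg: FrameRegistration) -> List[Tuple[int, int, int, int]]:
    """Return (x_left, y_upper, x_right, y_lower) for every registered frame."""
    lefts = [reg.x_s] + [reg.x_s + d for d in reg.offsets]
    return [
        (xl, reg.y_s, xl + w, reg.y_s + h)
        for xl, w, h in zip(lefts, reg.widths, reg.heights)
    ]


reg = FrameRegistration(x_s=100, y_s=50, offsets=[40, 80],
                        widths=[30, 30, 30], heights=[40, 40, 40])
print(frame_rectangles(reg))
```

The numeric values are made up for the example; real registration data would come from the form definition.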

[0019] The character frame position detection unit 14 receives the form image data from the image data storage unit 12 and the character frame registration data from the character frame registration data storage unit 13, and determines the positions of the character frames by comparing the form image data with the character frame registration data. The image at the determined character frame positions is supplied to the character recognition unit 15.

[0020] The character recognition unit 15 performs character recognition on the form image data inside the character frames, using the supplied image at the character frame positions. The direction code storage unit 16, the frame candidate storage unit 17, and the frame corner candidate coordinate storage unit 18 are memories that store, respectively, the frame area direction codes, the frame line candidates, and the frame corner candidates obtained while the character frame position detection unit 14 determines the character frame positions.

[0021] FIG. 3 is a flowchart showing the flow of the character frame position detection processing performed by the character frame position detection unit of FIG. 1. As shown in FIG. 3, character frame area determination processing is performed first: the areas in which character frames are to be detected are determined from the form image data stored in the image data storage unit 12 and the character frame registration data, including frame coordinate values, stored in the character frame registration data storage unit 13 (step S101).

[0022] Next, frame area direction code extraction processing is performed: each object formed by connected black pixels within the frame detection areas obtained by the frame area determination processing is contour-traced to extract frame area direction codes, which are stored in the direction code storage unit 16 in association with the contour coordinates (step S102).

[0023] Next, frame line candidate detection processing is performed: candidates for the inner positions of the character frames are obtained from the frame area direction codes stored in the direction code storage unit 16 and stored in the frame line candidate storage unit 17 (step S103).

[0024] Next, frame corner candidate creation processing is performed: an image is created in which candidate frame lines, generated from the frame line candidate positions stored in the frame line candidate storage unit 17, are drawn; labeling is applied to this image; and the corner position candidate coordinates of the character frames are then detected and stored in the frame corner candidate coordinate storage unit 18 (step S104).

[0025] Finally, frame position determination processing is performed: the inner positions of the character frames are determined from the corner position candidate coordinates stored in the frame corner candidate coordinate storage unit 18 and the character frame registration data, including frame coordinate values, stored in the character frame registration data storage unit 13 (step S105).

[0026] The character frame position detection unit 14 performs the character frame position detection processing in this way. The processing in each of the above steps is described in detail below with reference to FIGS. 4 to 12. FIG. 4 is an explanatory diagram showing an example of form image data read by the image scanner unit of FIG. 1 and stored in the image data storage unit. FIGS. 5 to 12 are explanatory diagrams of the operation by which the character frame position detection unit of FIG. 1 detects character frame positions from the form image data.

[0027] As shown in FIG. 4, the frame lines of the character frames are faint in the form image data stored in the image data storage unit 12, so the three character frames 19a, 19b, and 19c cannot be detected correctly.

[0028] The character frame area determination processing (step S101) is described with reference to FIGS. 5 and 6.

[0029] The positions of the frames (character frames) in the form image data deviate from the positions registered in the frame registration data because of the mechanical accuracy of the image scanner 11 and the vertical and horizontal errors of the form paper; this geometric error depends on the mechanical accuracy and the quality of the form paper. Here, let N_H be the error tolerance of the form image data in the horizontal direction and N_V the error tolerance in the vertical direction. The detection area of each frame then extends +/-N_H in the horizontal direction and +/-N_V in the vertical direction around the position indicated by the character frame registration data.

[0030] Let the upper-left coordinate position of the n-th character frame obtained from the character frame registration data be (Xnl, Ynu) and its lower-right coordinate position be (Xnr, Ynd). For each frame, the areas in which the top, bottom, left, and right line segments forming the frame are searched for are as follows:

Frame detection area of the left vertical line: upper-left (Xnl - N_H, Ynu - N_V), lower-right (Xnl + N_H, Ynd + N_V)
Frame detection area of the right vertical line: upper-left (Xnr - N_H, Ynu - N_V), lower-right (Xnr + N_H, Ynd + N_V)
Frame detection area of the upper horizontal line: upper-left (Xnl - N_H, Ynu - N_V), lower-right (Xnr + N_H, Ynu + N_V)
Frame detection area of the lower horizontal line: upper-left (Xnl - N_H, Ynd - N_V), lower-right (Xnr + N_H, Ynd + N_V)
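
The four search areas above follow directly from the registered frame corners and the tolerances N_H and N_V. A minimal sketch, in which the function name search_regions and the dictionary keys are hypothetical:

```python
def search_regions(xnl, ynu, xnr, ynd, n_h, n_v):
    """Return ((upper-left), (lower-right)) of the search area for each frame line.

    (xnl, ynu) and (xnr, ynd) are the registered upper-left and lower-right
    corners of the n-th frame; n_h and n_v are the horizontal and vertical
    tolerances N_H and N_V.
    """
    return {
        "left":   ((xnl - n_h, ynu - n_v), (xnl + n_h, ynd + n_v)),
        "right":  ((xnr - n_h, ynu - n_v), (xnr + n_h, ynd + n_v)),
        "top":    ((xnl - n_h, ynu - n_v), (xnr + n_h, ynu + n_v)),
        "bottom": ((xnl - n_h, ynd - n_v), (xnr + n_h, ynd + n_v)),
    }


regions = search_regions(100, 50, 130, 90, n_h=5, n_v=3)
for name, (upper_left, lower_right) in regions.items():
    print(name, upper_left, lower_right)
```

Note that the left and top areas share the same upper-left corner, so they overlap around the frame's upper-left corner; the frame position determination step later relies on that overlap.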

[0031] FIG. 5(a) is an explanatory diagram showing the frame detection areas of the two vertical lines of the first frame of the form image data, FIG. 5(b) shows the frame detection areas of the two vertical lines of the second frame, and FIG. 5(c) shows the frame detection areas of the two vertical lines of the third frame. FIG. 6(a) is an explanatory diagram showing the frame detection areas of the two horizontal lines of the first frame of the form image data, FIG. 6(b) shows the frame detection areas of the two horizontal lines of the second frame, and FIG. 6(c) shows the frame detection areas of the two horizontal lines of the third frame.

[0032] The frame area direction code extraction processing (step S102) is described with reference to FIGS. 7 and 8.

[0033] FIG. 7(a) is an explanatory diagram showing the frame area direction codes, where i is the current trace point in the contour trace and (i-1) is the previous trace point. In the contour trace, the search for the next trace point proceeds counterclockwise around the current trace point. FIG. 7(b) is an explanatory diagram showing the frame area direction codes obtained when the outer contour of an object of 3 x 3 pixels is traced, as an example. FIG. 7(c) is an explanatory diagram showing the frame area direction code data stored in the direction code storage unit.

[0034] FIG. 8(a) is an explanatory diagram showing the frame area direction codes for a hole region of size 2 x 2 inside an object. FIG. 8(b) is an explanatory diagram showing the data obtained from the frame area direction codes of FIG. 8(a).

[0035] The subsequent frame line candidate detection processing (step S103) uses the frame area direction codes as features. It relies on the fact that frame area direction code 0 is detected on the left side of a line, frame area direction code 4 on the right side, frame area direction code 6 on the top side, and frame area direction code 2 on the bottom side.

[0036] Accordingly, for the objects in each of the frame detection areas shown in FIGS. 5(a), 5(b), 5(c), 6(a), 6(b), and 6(c), the frame area direction codes of the outer contour, and of the inner contour when the object has a hole, are obtained and stored in the direction code storage unit 16 in association with the contour coordinates.
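
The association between contour points and direction codes can be sketched as follows. The numbering used here is an assumption, since FIG. 7(a) is not reproduced in this text: in image coordinates with y growing downward, 0 = down, 2 = right, 4 = up, 6 = left, with the odd codes for the diagonals. This choice is consistent with the statement that a counterclockwise trace yields code 0 on the left side of a line, 2 on the bottom, 4 on the right, and 6 on the top. The sketch takes an already traced, ordered contour as input and does not implement the tracing itself; direction_codes and STEP_TO_CODE are hypothetical names.

```python
# Assumed code numbering: step (dx, dy) from the previous trace point to the
# current one, in image coordinates where y grows downward.
STEP_TO_CODE = {
    (0, 1): 0,    # down
    (1, 1): 1,    # down-right
    (1, 0): 2,    # right
    (1, -1): 3,   # up-right
    (0, -1): 4,   # up
    (-1, -1): 5,  # up-left
    (-1, 0): 6,   # left
    (-1, 1): 7,   # down-left
}


def direction_codes(contour):
    """Attach a direction code to every point of a closed, ordered contour.

    The code at point i is derived from the step (i-1) -> i; for i = 0 the
    previous point is the last point, because the contour is closed.
    Returns a list of (x, y, code) records.
    """
    records = []
    for i, (x, y) in enumerate(contour):
        px, py = contour[i - 1]
        records.append((x, y, STEP_TO_CODE[(x - px, y - py)]))
    return records


# Outer contour of a 3 x 3 pixel object, traced counterclockwise on screen:
# down the left side, right along the bottom, up the right side, left along the top.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
for record in direction_codes(square):
    print(record)
```

Successive contour points must be 8-connected neighbours; any other step is not in the table and raises a KeyError.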

[0037] The frame line candidate detection processing (step S103) is described with reference to FIG. 9.

[0038] In this processing, the frame area direction codes are read from the direction code storage unit 16. Within the detection area of the left vertical line of a frame, the contour pixels having frame area direction code 4, which characterizes the right side of a line, are projected in the vertical direction to generate a histogram. Similarly, within the frame detection area of the right vertical line of the same frame, the pixels having frame area direction code 0, which characterizes the left side of a line, are projected in the vertical direction to generate a histogram.

[0039] Further, within the detection area of the upper horizontal line of the same frame, the contour pixels having frame area direction code 2, which characterizes the bottom side of a line, are projected in the horizontal direction to generate a histogram. For the lower horizontal line of the same frame, the contour pixels having frame area direction code 6, which characterizes the top side of a line, within the detection area of that line are likewise projected in the horizontal direction to generate a histogram.

[0040] After each of these processes, the positions at which the two largest histogram values were detected in each frame detection area are stored in the frame line candidate storage unit 17.
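
The histogram step can be sketched as follows, assuming the direction code data is available as (x, y, code) records and a detection area is given as inclusive bounds (x0, y0, x1, y1). The function name top_line_candidates and its parameters are hypothetical, and the tie-breaking rule (smaller coordinate first) is an arbitrary choice that the text does not specify.

```python
from collections import Counter


def top_line_candidates(records, region, code, axis, keep=2):
    """Return the `keep` positions with the most contour pixels of a given code.

    records: iterable of (x, y, code) contour records.
    region:  (x0, y0, x1, y1), inclusive bounds of the frame detection area.
    code:    direction code to count (e.g. 4 inside a left-line area).
    axis:    "x" to bin by x (vertical projection, for vertical lines) or
             "y" to bin by y (horizontal projection, for horizontal lines).
    """
    x0, y0, x1, y1 = region
    counts = Counter()
    for x, y, c in records:
        if c == code and x0 <= x <= x1 and y0 <= y <= y1:
            counts[x if axis == "x" else y] += 1
    # Largest count first; ties are broken by the smaller coordinate.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [position for position, _ in ranked[:keep]]


records = [
    (5, 1, 4), (5, 2, 4), (5, 3, 4),   # right edge of a line at x = 5
    (7, 1, 4), (7, 2, 4),              # right edge of another stroke at x = 7
    (9, 4, 4),                         # isolated pixel
    (5, 1, 0),                         # different code: ignored
]
print(top_line_candidates(records, (0, 0, 10, 10), code=4, axis="x"))
```

Because only contour pixels with the expected direction code are counted, filled regions and edges facing the other way do not inflate the histogram in the way they would in a plain black-pixel projection.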

[0041] FIG. 9(a) is an explanatory diagram showing an example of the histograms obtained from the first frame. FIG. 9(b) is an explanatory diagram showing an example of the histograms obtained from the second frame. FIG. 9(c) is an explanatory diagram showing an example of the histograms obtained from the third frame. In these figures, the frame line candidate positions are the X coordinate values X11 to X1a and the Y coordinate values Y11 to Y1a, and these are stored in the frame line candidate storage unit 17.

[0042] The frame corner candidate creation processing (step S104) is described with reference to FIG. 10.

[0043] Within each frame detection area, a vertical line is drawn through each X coordinate value, and a horizontal line through each Y coordinate value, held in the frame line candidate storage unit 17. An image containing these vertical and horizontal lines is created, and labeling is applied to it. The corner coordinates of each closed region thus detected are then taken as corner position candidate coordinates of the character frame and stored in the frame corner candidate coordinate storage unit 18.
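
A simplified sketch of this step follows. It is not the procedure of the embodiment as such: for brevity it draws each candidate line across the whole canvas rather than only inside its own frame detection area, uses a plain 4-connected flood fill as the labeling, and treats a white region as closed when it does not touch the canvas border. The names closed_regions and corner_candidates are hypothetical.

```python
from collections import deque


def closed_regions(width, height, xs, ys):
    """Draw lines at the candidate xs / ys and return bounding boxes
    (x0, y0, x1, y1) of the white regions fully enclosed by lines."""
    canvas = [[0] * width for _ in range(height)]
    for x in xs:                       # vertical candidate lines
        for y in range(height):
            canvas[y][x] = 1
    for y in ys:                       # horizontal candidate lines
        for x in range(width):
            canvas[y][x] = 1

    seen = [[False] * width for _ in range(height)]
    regions = []
    for sy in range(height):
        for sx in range(width):
            if canvas[sy][sx] or seen[sy][sx]:
                continue
            # Label one white region with a breadth-first flood fill.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not canvas[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            touches_border = (x0 == 0 or y0 == 0
                              or x1 == width - 1 or y1 == height - 1)
            if not touches_border:
                regions.append((x0, y0, x1, y1))
    return regions


def corner_candidates(region):
    """Corner coordinates of one closed region, keyed by corner type."""
    x0, y0, x1, y1 = region
    return {"ul": (x0, y0), "ur": (x1, y0), "ll": (x0, y1), "lr": (x1, y1)}


regions = closed_regions(7, 7, xs=[1, 5], ys=[1, 5])
print(regions, corner_candidates(regions[0]))
```

Each closed region contributes one upper-left, upper-right, lower-left, and lower-right candidate; spurious lines produced by background printing simply add further regions and candidates, which the next step has to filter out.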

[0044] FIG. 10(a) is an explanatory diagram showing an example of an image in which lines have been drawn in each frame detection area according to the frame line candidate positions obtained in FIG. 9. FIG. 10(b) is an explanatory diagram showing the closed regions Lp1 to Lp17 obtained by the labeling. The upper-left, lower-left, upper-right, and lower-right corner coordinates of the closed regions Lp1 to Lp17 shown in FIG. 10(b) are taken as frame corner candidate coordinates and stored in the frame corner candidate coordinate storage unit 18.

[0045] The frame position determination processing (step S105) is described with reference to FIG. 11.

[0046] Within the region where the frame detection area of the left vertical line of the first frame overlaps the frame detection area of the upper horizontal line of the first frame, each upper-left frame corner candidate coordinate is taken as a first-frame upper-left coordinate candidate. For each first-frame upper-left coordinate candidate, the upper-left, lower-left, upper-right, and lower-right coordinate positions of every character frame are placed relative to that candidate in accordance with the frame registration data, and the distances from these positions to the nearest upper-left, lower-left, upper-right, and lower-right frame corner candidate coordinates, respectively, are accumulated. The upper-left, lower-left, upper-right, and lower-right frame corner candidate coordinates selected for the candidate whose accumulated value is smallest are taken as the character frame inner coordinates.
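
The selection rule above can be sketched as follows. The names accumulated_distance and choose_anchor, the corner-type keys ("ul", "ur", "ll", "lr"), and the use of Euclidean distance are assumptions made for illustration; the text only speaks of a "distance value". Expected corner positions are given as offsets from the first frame's upper-left corner, as derived from the registration data.

```python
import math


def accumulated_distance(anchor, expected_offsets, candidates):
    """Sum, over all expected corners, the distance to the nearest candidate
    corner of the same type.

    anchor:           (x, y) first-frame upper-left coordinate candidate.
    expected_offsets: list of (kind, dx, dy) from the registration data.
    candidates:       dict mapping kind -> non-empty list of (x, y) candidates.
    """
    ax, ay = anchor
    total = 0.0
    for kind, dx, dy in expected_offsets:
        ex, ey = ax + dx, ay + dy
        total += min(math.hypot(ex - cx, ey - cy) for cx, cy in candidates[kind])
    return total


def choose_anchor(anchors, expected_offsets, candidates):
    """Pick the anchor whose accumulated distance is smallest."""
    return min(anchors,
               key=lambda a: accumulated_distance(a, expected_offsets, candidates))


# One 10 x 10 frame; (40, 30) stands in for a corner caused by background print.
expected_offsets = [("ul", 0, 0), ("ur", 10, 0), ("ll", 0, 10), ("lr", 10, 10)]
candidates = {
    "ul": [(10, 10), (40, 30)],
    "ur": [(20, 10)],
    "ll": [(10, 20)],
    "lr": [(20, 20)],
}
print(choose_anchor([(10, 10), (40, 30)], expected_offsets, candidates))
```

The true corner (10, 10) accumulates a distance of zero because every expected corner coincides with a candidate, whereas the background corner (40, 30) finds no nearby partners and accumulates a large distance.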

[0047] FIG. 11(a) is an explanatory diagram of the case where the character frames are arranged in accordance with the character frame registration data using closed region Lp1 as the reference. FIG. 11(b) is an explanatory diagram of the case where the character frames are arranged in accordance with the character frame registration data using closed region Lp10 as the reference. In the example of this embodiment, the upper-left corners of closed regions Lp1 and Lp10 within region S shown in FIG. 10(b) are the character frame corner candidate coordinates. Arranging the character frames in accordance with the character frame registration data with closed region Lp1 as the reference yields FIG. 11(a), and arranging them with closed region Lp10 as the reference yields FIG. 11(b).

[0048] The upper-left corner of closed region Lp1 is the upper-left corner of a character frame in the form image data. The upper-left, lower-left, upper-right, and lower-right coordinates of each character frame, placed relative to the upper-left corner of closed region Lp1 in accordance with the frame registration data, therefore lie close to other frame corner candidate coordinates, and the distances are small.

[0049] The upper-left corner of closed region Lp10 is a frame corner candidate coordinate derived from background printing and is therefore unrelated to the frame positions. No frame corner candidate coordinates are found near the upper-left, lower-left, upper-right, and lower-right coordinates of each character frame placed relative to the upper-left corner of closed region Lp10 in accordance with the frame registration data, so the accumulated distance is large.

[0050] As a result, the upper-left corner of closed region Lp1 is determined to be the frame position reference, the coordinates at the following positions shown in FIG. 11(a) are determined to be the frame inner positions, and the image within these frames is output to the character recognition unit 15.

[0051]

             Upper left    Upper right   Lower left    Lower right
1st frame    (X12, Y11)    (X14, Y11)    (X12, Y13)    (X14, Y14)
2nd frame    (X15, Y15)    (X17, Y15)    (X15, Y17)    (X17, Y17)
3rd frame    (X18, Y18)    (X1a, Y18)    (X18, Y1a)    (X1a, Y1a)

The character recognition unit 15 then performs character recognition on the form image data within the frame inner coordinates detected by the character frame position detection unit 14.

[0052] FIG. 12 is an explanatory diagram showing the character image data within the detected frame inner coordinates, which is the result of the character recognition processing by the character recognition device of FIG. 1. As shown in FIG. 12, the processing described above accurately detects the character frame positions from form image data in which the three character frames 19a, 19b, and 19c could not otherwise be detected correctly, so that the characters written in the character frames can be recognized.

[0053] Thus, according to the present invention, even if the frame lines are faint in the form image data because of the characteristics of the image scanner or the reflectance of the character frames printed on the form paper, or background printing exists around the character frames, the positions of the character frames can be detected accurately without mistaking the background printed around the character frames, or the characters written in them, for the character frames.

[0054] Therefore, the character frame and the characters can be distinguished from the background, which was not possible with conventional character frame position detection based on selection by pixel distribution, so the character frame position can be detected accurately and the characters written in the character frame can be recognized.
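The conventional approach referred to here relies on the distribution of black pixels, which FIG. 13(b) describes as a histogram obtained by black-pixel projection. A minimal sketch of such a projection is given below to illustrate why a faint frame line can be missed. The function name `column_projection`, the tiny sample image, and the rule of taking the tallest column as the frame line are assumptions made for illustration and are not taken from the patent.

```python
from typing import List


def column_projection(image: List[List[int]]) -> List[int]:
    """Count the black pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


# Column 0 is a faint frame line (only one pixel survived scanning);
# column 2 is a character stroke written inside the frame.
FAINT_FRAME_IMAGE = [
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

histogram = column_projection(FAINT_FRAME_IMAGE)
# Naive selection rule (assumption): the tallest column is taken as the frame line.
peak_column = histogram.index(max(histogram))
```

In this sample the stroke column yields a higher count than the faint frame line, so a selection based only on the projection would pick the stroke, which corresponds to the kind of erroneous detection shown in FIG. 13(c).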

[0055]

[Effects of the Invention] As described above, according to the present invention, when characters written in a character frame are optically read and recognized as an image, the position of the character frame is detected based on the frame area direction code obtained in the detection area of the character frame. Therefore, even if the character frame is faint in the image data or background printing exists around the character frame, the character frame position can be detected accurately and the characters written in the character frame can be recognized.

[0056] Further, the character recognition device according to the present invention can implement the above character recognition method.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing an example of the character frame registration data stored in the character frame registration data storage unit.

FIG. 3 is a flowchart showing the flow of the character frame position detection processing performed by the character frame position detection unit of FIG. 1.

FIG. 4 is an explanatory diagram showing an example of form image data read by the image scanner unit of FIG. 1 and stored in the image data storage unit.

FIG. 5 shows the frame detection areas defined by two vertical lines in the form image data, in which (a) is an explanatory diagram of the first frame, (b) is an explanatory diagram of the second frame, and (c) is an explanatory diagram of the third frame.

FIG. 6 shows the frame detection areas defined by two horizontal lines in the form image data, in which (a) is an explanatory diagram of the first frame, (b) is an explanatory diagram of the second frame, and (c) is an explanatory diagram of the third frame.

FIG. 7(a) is an explanatory diagram showing the frame area direction codes in contour tracing when the current trace point is i and the previous trace point is (i-1), (b) is an explanatory diagram showing the frame area direction codes when the outer contour of an object of 3 × 3 pixels is traced as an example, and (c) is an explanatory diagram showing the frame area direction code data stored in the direction code storage unit.

FIG. 8(a) is an explanatory diagram showing the frame area direction codes for a hole area of size 2 × 2 within an object, and (b) is an explanatory diagram showing the data obtained from the frame area direction codes of (a).

FIG. 9 shows examples of histograms, in which (a) is an explanatory diagram of an example obtained from the first frame, (b) is an explanatory diagram of an example obtained from the second frame, and (c) is an explanatory diagram of an example obtained from the third frame.

FIG. 10(a) is an explanatory diagram showing an example of an image in which lines are drawn, according to the frame line candidate positions, in each frame detection area obtained in FIG. 9, and (b) is an explanatory diagram showing the closed areas Lp1 to Lp17 obtained by labeling.

FIG. 11 shows cases in which character frames are arranged according to the character frame registration data, in which (a) is an explanatory diagram based on the closed area Lp1 and (b) is an explanatory diagram based on the closed area Lp10.

FIG. 12 is an explanatory diagram showing the character image data within the detected frame inside coordinates, which is the result of the character recognition processing by the character recognition device of FIG. 1.

FIG. 13 shows a state in which character frames cannot be detected normally, in which (a) is an explanatory diagram of form image data in which the character frames are faint, (b) is an explanatory diagram of a histogram of the black-pixel projection obtained from the area in (a), and (c) is an explanatory diagram of the erroneously detected character frame positions.
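FIG. 7(a) and FIG. 7(b) refer to direction codes assigned to each step from the previous trace point (i-1) to the current trace point i while the contour of an object is traced. The actual code values are defined in the figure and are not reproduced in this text, so the sketch below assumes a hypothetical four-direction numbering (0 = right, 1 = down, 2 = left, 3 = up, with y increasing downward) and a precomputed, closed list of trace points. The names `DIRECTION_CODES`, `direction_codes`, and `SQUARE_3X3_CONTOUR` are illustrative only.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel position, y increasing downward

# Hypothetical numbering; the patent's own code values are defined in FIG. 7(a).
DIRECTION_CODES: Dict[Tuple[int, int], int] = {
    (1, 0): 0,   # right
    (0, 1): 1,   # down
    (-1, 0): 2,  # left
    (0, -1): 3,  # up
}


def direction_codes(trace_points: List[Point]) -> List[int]:
    """Return one code per step from trace point (i-1) to trace point i.

    The contour is treated as closed, so for i = 0 the previous point is
    the last point in the list.
    """
    codes = []
    for i in range(len(trace_points)):
        prev_x, prev_y = trace_points[i - 1]
        cur_x, cur_y = trace_points[i]
        codes.append(DIRECTION_CODES[(cur_x - prev_x, cur_y - prev_y)])
    return codes


# Outer contour of a 3 x 3 pixel object, traced clockwise from the top-left pixel.
SQUARE_3X3_CONTOUR: List[Point] = [
    (0, 0), (1, 0), (2, 0),
    (2, 1), (2, 2),
    (1, 2), (0, 2),
    (0, 1),
]

codes = direction_codes(SQUARE_3X3_CONTOUR)
```

Under this assumed numbering, a long run of identical codes corresponds to a straight stretch of contour, which is the kind of regularity a frame line would produce and a handwritten stroke typically would not.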

[Explanation of symbols]

10 Character recognition device
11 Image scanner unit
12 Image data storage unit
13 Character frame registration data storage unit
14 Character frame position detection unit
15 Character recognition unit
16 Direction code storage unit
17 Frame candidate storage unit
18 Frame corner candidate coordinate storage unit
19a, 19b, 19c Character frame
(X_S, Y_S) Coordinates
Dx Distance
H Vertical width
Lp Closed area
S Area
W Horizontal width

Claims

[Claims]

1. A character recognition method for optically reading a character written in a character frame and recognizing an image, characterized in that a position of the character frame is detected based on a frame area direction code obtained in a detection area of the character frame.

2. The character recognition method according to claim 1, wherein the detection is performed through: a character frame area determination process for determining the detection area of the character frame from image data obtained by performing image processing on the character frame and character frame registration data including a frame coordinate value; a frame area direction code extraction process for extracting the frame area direction code from within the determined detection area; a frame line candidate detection process for obtaining, from the extracted frame area direction code, a frame line candidate position that is a candidate for a character frame inside position; a frame corner candidate creation process for detecting corner position candidate coordinates of the character frame after performing labeling on an image in which candidate frame lines created based on the frame line candidate positions are written; and a frame position determination process for determining the character frame inside position from the corner position candidate coordinates and the character frame registration data.

3. The character recognition method according to claim 1, wherein the frame area direction code is obtained by contour tracing an object formed by connected black pixels in the detection area of the character frame.

4. The character recognition method according to claim 1, wherein said character frame is printed in a non-dropout color.

5. A character recognition device for optically reading a character written in a character frame and recognizing an image, comprising character frame position detection means for detecting a position of the character frame based on a frame area direction code obtained in a detection area of the character frame.

6. The character recognition device according to claim 5, wherein the character frame position detection means determines the detection area of the character frame from image data obtained by performing image processing on the character frame and character frame registration data including a frame coordinate value, extracts the frame area direction code from within the determined detection area, obtains, from the extracted frame area direction code, a frame line candidate position that is a candidate for a character frame inside position, detects corner position candidate coordinates of the character frame after performing labeling on an image in which candidate frame lines created based on the frame line candidate positions are written, and determines the character frame inside position from the corner position candidate coordinates and the character frame registration data.

7. The character recognition device according to claim 5, wherein the frame area direction code is obtained by contour tracing an object formed by connected black pixels in the detection area of the character frame.

8. The character recognition device according to claim 5, further comprising: an image scanner unit that optically reads a form in which characters are written in character frames and obtains form image data composed of binary data by photoelectric conversion; an image data storage unit that stores the form image data output from the image scanner unit; a character frame registration data storage unit that stores character frame registration data including character frame coordinate values registered in advance; and a character recognition unit that performs character recognition based on the image of the character frame output from the character frame position detection unit.

9. The character recognition device according to claim 5, wherein said character frame is printed in a non-dropout color.