JPS61255483A

JPS61255483A - character recognition device

Info

Publication number: JPS61255483A
Application number: JP60096964A
Authority: JP
Inventors: ▲はい▼　東善; Touzen Hai; Shigemi Osada; 茂美長田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1985-05-08
Filing date: 1985-05-08
Publication date: 1986-11-13

Abstract

(57) [Summary] This bulletin contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Summary]

In a character recognition device such as an optical character reader (OCR), which cuts out and recognizes one at a time the characters written on a form, the device is configured to accept forms whose character entry frames are printed in ordinary printing ink rather than in a dropout color. This reduces the cost of forms.

[Industrial application field]

The present invention relates to a character recognition device, and in particular to a character recognition device that uses forms on which character entry frames are printed.

A character recognition device, typified by an optical character reader (OCR), first observes a character pattern written on a form (by hand or by printing) with an observation device such as a photoelectric conversion scanner. It then takes one character's worth of image data out of the resulting image data and extracts the features of that character pattern. Finally, it determines the character type of the pattern by a procedure such as comparing these features with standard features. These standard features were extracted in advance from a standard character pattern for each character type and stored in a recognition dictionary.

Taking out one character's worth of image data is called character extraction or segmentation. To make this processing easier, many character recognition devices use forms on which character entry frames are printed.

The character entry frame must not be included in what is recognized. For this reason, the frames on such forms are often printed in dropout color, that is, in ink that is visible to humans but cannot be observed by the observation unit.

To reduce the cost of forms, there are also character recognition devices that use forms printed in ordinary ink. These devices are configured to separate the character entry frame after segmentation. In this case, the device must be configured so that the character entry frame is reliably separated.

[Conventional technology]

FIG. 4 is a block diagram of a conventional character recognition device that uses forms on which the character entry frame is printed in ordinary ink.

Reference numeral 1 denotes an observation unit. It observes, by raster scanning, a form on which a character pattern has been written, and outputs binary image data. In this data, image-constituent pixels (hereinafter "black pixels") are "1" and background-constituent pixels (hereinafter "white pixels") are "0".

Reference numeral 2 denotes a segmentation unit. From the image data obtained by observation unit 1 (for example, one line of characters), it extracts one character's worth of image data, including the character entry frame.

Reference numerals 31, 32, 33, 34 and 35 denote the components of a character entry frame separation unit. This unit separates the character entry frame from the one character's worth of image data cut out by segmentation unit 2, by converting the black pixels "1" belonging to the frame into white pixels "0".

Reference numeral 31 denotes an input buffer that stores the one character's worth of image data cut out by segmentation unit 2 as a two-dimensional array.

Reference numeral 32 denotes a projection unit (see FIG. 5). It computes the horizontal and vertical projections of the one character's worth of two-dimensional array image data stored in input buffer 31.

Reference numeral 33 denotes a detection unit that detects the pixels corresponding to the character entry frame from the projections obtained by projection unit 32.

Reference numeral 34 denotes a processing unit that separates the character entry frame detected by detection unit 33 from the image data stored in input buffer 31.

Reference numeral 35 denotes an output buffer that stores the image data after processing unit 34 has separated the character entry frame from it.

Reference numeral 4 denotes a feature extraction unit that extracts the features of the character pattern from the image data stored in output buffer 35.

Reference numeral 5 denotes a feature dictionary that stores features extracted in advance from a standard character pattern for each character type.

Reference numeral 6 denotes a matching unit. It determines the character type of the character pattern by comparing the features extracted by feature extraction unit 4 with the contents of feature dictionary 5.
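The matching step can be pictured as a nearest-neighbor lookup. The sketch below is a minimal illustration only. The source does not specify what the features are or how they are compared, so the following are all assumptions:

- the fixed-length numeric feature vectors;
- the Euclidean distance;
- the names `match` and `dictionary`.

```python
import math


def match(features, dictionary):
    """Return the character type whose stored feature vector is nearest.

    `dictionary` maps a character type to its standard feature vector.
    Euclidean distance is an assumed comparison measure, not the source's.
    """
    best_type, best_dist = None, math.inf
    for char_type, reference in dictionary.items():
        dist = math.dist(features, reference)  # requires Python 3.8+
        if dist < best_dist:
            best_type, best_dist = char_type, dist
    return best_type


print(match([0.9, 0.2], {"0": [1.0, 0.0], "1": [0.0, 1.0]}))
```

A real feature dictionary would typically hold several reference vectors per character type. The single-vector form here only keeps the example short.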

[Problem that the invention seeks to solve]

The character recognition device configured as above works from the horizontal and vertical projections of one character's worth of image data, including the character entry frame. From these projections it detects the pixels corresponding to the frame and then separates them from the original image data.
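The projection-based separation performed by units 32 to 34 can be sketched as follows. This minimal illustration assumes a 0/1 image stored as a list of rows. A row or column counts as frame when its projection reaches a fraction `ratio` of its length. That rule and the default value of `ratio` are hypothetical, since the source does not give the detection criterion.

```python
def projections(img):
    """Horizontal projection (black count per row) and vertical projection (per column)."""
    rows = [sum(row) for row in img]
    cols = [sum(col) for col in zip(*img)]
    return rows, cols


def remove_frame_by_projection(img, ratio=0.8):
    """Clear every row and column whose projection reaches `ratio` of its length.

    `ratio` is a hypothetical threshold chosen for illustration.
    """
    h, w = len(img), len(img[0])
    rows, cols = projections(img)
    frame_rows = {r for r in range(h) if rows[r] >= ratio * w}
    frame_cols = {c for c in range(w) if cols[c] >= ratio * h}
    return [[0 if (r in frame_rows or c in frame_cols) else img[r][c]
             for c in range(w)] for r in range(h)]


# An upright 6x6 frame with a short vertical stroke inside it.
img = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
for row in remove_frame_by_projection(img):
    print(row)
```

In this sketch only rows and columns that are almost entirely black are treated as frame. A frame edge skewed across several rows spreads its pixels over several projection values and can fall below the threshold.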

Consequently, if the form is skewed, for example, the image data cut out by segmentation unit 2 is skewed as well. As illustrated in FIG. 6, the pixels corresponding to the character entry frame are then not correctly detected and removed, and the recognition rate drops.

An object of the present invention is therefore to improve the recognition rate of a character recognition device that inputs a character pattern together with its character entry frame.

[Means for solving the problems]

FIG. 1 is a block diagram of the principle of the present invention.

Reference numeral 71 denotes a first identification unit. It identifies the entire region enclosed by black pixels in one character's worth of image data, including the character entry frame.

Reference numeral 72 denotes a second identification unit that identifies the white-pixel regions enclosed by black pixels in the image data.

Reference numeral 73 denotes a third identification unit that identifies the pixels belonging to a third region. The third region consists of pixels that belong to the first region identified by first identification unit 71 but not to the second region identified by second identification unit 72.

Reference numeral 74 denotes a first processing unit. It detects the pixels corresponding to the character entry frame from the third region identified by third identification unit 73 and the second region identified by second identification unit 72.

Reference numeral 75 denotes a second processing unit. It separates the pixels corresponding to the character entry frame, as detected by first processing unit 74, from the one character's worth of image data that includes the frame.

[Operation]

The first identification unit 71 processes the two-dimensional array image data of one character, including the character entry frame, as illustrated in FIG. 5.

1. It raster-scans the data with the vertical direction as the main scanning direction. On each scan, it identifies the range from the first black pixel detected to the last black pixel detected, which gives the region shown in FIG. 2(a).
2. It raster-scans again with the horizontal direction as the main scanning direction, which gives the region shown in FIG. 2(b).
3. It takes the region common to (a) and (b) as the first region ①, shown in FIG. 2(c).
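The first identification unit's scan can be sketched as follows. This is a minimal sketch, assuming the image is a list of rows with 1 for black and 0 for white. The function names are illustrative, not from the source.

Each scan line keeps the span from its first to its last black pixel. The vertical and horizontal results are then combined with a logical AND.

```python
def transpose(m):
    """Swap rows and columns so column scans can reuse the row-scan code."""
    return [list(col) for col in zip(*m)]


def span_mask(img):
    """Per row: mark every pixel from the first to the last black pixel."""
    mask = []
    for row in img:
        black = [c for c, v in enumerate(row) if v == 1]
        if black:
            lo, hi = black[0], black[-1]
            mask.append([lo <= c <= hi for c in range(len(row))])
        else:
            mask.append([False] * len(row))  # no black pixel on this line
    return mask


def first_region(img):
    """Region 1: pixels inside the black-pixel span on both scan axes."""
    vertical = transpose(span_mask(transpose(img)))  # columns scanned, FIG. 2(a)
    horizontal = span_mask(img)                      # rows scanned, FIG. 2(b)
    return [[a and b for a, b in zip(ra, rb)]
            for ra, rb in zip(vertical, horizontal)]


print(first_region([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```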

The second identification unit 72 processes the same image data in a similar way.

1. It raster-scans the data with the vertical direction as the main scanning direction and identifies the white-pixel runs sandwiched between black pixels, which gives the region shown in FIG. 2(d).
2. It raster-scans again with the horizontal direction as the main scanning direction, which gives the region shown in FIG. 2(e).
3. It takes the region common to (d) and (e) as the second region ②, shown in FIG. 2(f).
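The second identification unit can be sketched in the same way, again as a minimal sketch assuming a list-of-rows 0/1 image and illustrative function names. On each scan line it keeps only the white pixels lying strictly between the first and last black pixels, that is, white pixels with black on both sides. It then ANDs the two scan directions.

```python
def transpose(m):
    """Swap rows and columns so column scans can reuse the row-scan code."""
    return [list(col) for col in zip(*m)]


def enclosed_mask(img):
    """Per row: mark white pixels strictly between the first and last black pixel."""
    mask = []
    for row in img:
        black = [c for c, v in enumerate(row) if v == 1]
        lo, hi = (black[0], black[-1]) if black else (0, -1)  # empty range if no black
        mask.append([row[c] == 0 and lo < c < hi for c in range(len(row))])
    return mask


def second_region(img):
    """Region 2: white pixels enclosed by black pixels on both scan axes."""
    vertical = transpose(enclosed_mask(transpose(img)))  # FIG. 2(d)
    horizontal = enclosed_mask(img)                      # FIG. 2(e)
    return [[a and b for a, b in zip(ra, rb)]
            for ra, rb in zip(vertical, horizontal)]


print(second_region([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```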

The third identification unit 73 obtains the third region ③ (hatched), shown in FIG. 2(g), as the region that belongs to the first region ① but not to the second region ②.
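The third region is a plain set difference of the two masks. The sketch below also packs both regions into one label map, using 2 for region ②, 3 for region ③ and 0 elsewhere. This encoding is a hypothetical choice made to feed a four-direction scan. It relies on region ② always lying inside region ①, which holds because a pixel enclosed on both axes is also inside both black-pixel spans.

```python
def third_region(first, second):
    """Region 3: pixels in region 1 that are not in region 2 (FIG. 2(g))."""
    return [[f and not s for f, s in zip(rf, rs)]
            for rf, rs in zip(first, second)]


def label_regions(first, second):
    """Hypothetical label map: 2 = region 2, 3 = region 3, 0 = outside region 1."""
    return [[2 if s else (3 if f else 0) for f, s in zip(rf, rs)]
            for rf, rs in zip(first, second)]


first = [[True, True], [True, False]]
second = [[False, True], [False, False]]
print(third_region(first, second))
print(label_regions(first, second))
```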

Next, the first processing unit 74 raster-scans the image data, labeled with the second region ② and the third region ③ as in FIG. 2(g). It scans sequentially from the four directions: top, bottom, left and right.

On each scan, it takes the pixels of the third region ③ that are encountered first and excludes any already detected by an earlier scan. It detects the remaining pixels if they form a run of no more than a predetermined number of consecutive pixels. The pixels detected in this way are the pixels corresponding to the character entry frame.
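The four-direction scan can be read in more than one way; the sketch below is one reading. It operates on a label map in which 2 marks region ②, 3 marks region ③ and 0 marks everything else. The following are assumptions, not details stated in the source:

- the direction order (top, bottom, left, right);
- the rule that already-detected pixels are dropped from the run before its length is compared;
- the parameter name `max_run` for the "predetermined number".

```python
def detect_frame(labels, max_run):
    """Mark frame pixels by scanning the label map from all four sides in turn.

    On every scan line, the first run of region-3 pixels met from the edge is
    taken. Pixels already marked by an earlier scan are dropped, and the rest
    are marked as frame if there are at most `max_run` of them.
    """
    h, w = len(labels), len(labels[0])
    frame = [[False] * w for _ in range(h)]
    directions = [
        [[(r, c) for r in range(h)] for c in range(w)],            # top to bottom
        [[(r, c) for r in reversed(range(h))] for c in range(w)],  # bottom to top
        [[(r, c) for c in range(w)] for r in range(h)],            # left to right
        [[(r, c) for c in reversed(range(w))] for r in range(h)],  # right to left
    ]
    for lines in directions:
        for line in lines:
            run = []
            for r, c in line:
                if labels[r][c] == 3:
                    run.append((r, c))
                elif run:
                    break  # the first run of region-3 pixels has ended
            new = [(r, c) for r, c in run if not frame[r][c]]
            if 0 < len(new) <= max_run:
                for r, c in new:
                    frame[r][c] = True
    return frame


# Frame edges are region 3, enclosed white is region 2, and a stroke (3) sits inside.
labels = [
    [3, 3, 3, 3, 3, 3],
    [3, 2, 2, 2, 2, 3],
    [3, 2, 3, 2, 2, 3],
    [3, 2, 3, 2, 2, 3],
    [3, 2, 2, 2, 2, 3],
    [3, 3, 3, 3, 3, 3],
]
for row in detect_frame(labels, max_run=2):
    print(["F" if v else "." for v in row])
```

In this sketch each scan line stops at the first region-③ run it meets from the edge. As a result, a character stroke inside the frame is shielded by region-② pixels. A frame edge seen lengthwise forms a run that is too long, and is left for the perpendicular scans to pick up.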

As a result, the pixels in the area hatched in FIG. 2(h) are detected as the pixels corresponding to the character entry frame. The second processing unit 75 separates these pixels from the original image data, yielding the image data shown in FIG. 2(i).

The input character pattern is then recognized as in the conventional example. Its features are extracted and compared with the features of the standard character patterns.

[Embodiment]

FIG. 3 is a configuration diagram of the main parts of the embodiment.

The first identification unit 71 consists of:

- a scanning unit 71a that raster-scans the two-dimensional array image data stored in input buffer 31;
- a buffer 71b that stores the pixels identified in the vertical scan;
- a buffer 71c that stores the pixels identified in the horizontal scan;
- an AND circuit 71d that extracts the pixels common to the contents of buffer 71b and buffer 71c.

The second identification unit 72 consists of:

- a scanning unit 72a that raster-scans the two-dimensional array image data stored in input buffer 31;
- a buffer 72b that stores the pixels identified in the vertical scan;
- a buffer 72c that stores the pixels identified in the horizontal scan;
- an AND circuit 72d that extracts the pixels common to the contents of buffer 72b and buffer 72c.

The first processing unit 74 consists of:

- an input buffer 74a that stores, as a two-dimensional array, the image data labeled with the second region ② identified by second identification unit 72 and the third region ③ identified by third identification unit 73;
- a scanning unit 74b that raster-scans the image data stored in input buffer 74a from the four directions (top, bottom, left and right);
- an output buffer 74c that stores the pixels corresponding to the character entry frame extracted by each scan.

[Effect of the invention]

As explained above, the present invention can separate the character entry frame accurately and extract only the character pattern, even when the frame cut out together with the character pattern is skewed. The recognition rate can therefore be improved.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the principle of the present invention.
FIGS. 2(a) to 2(i) are explanatory diagrams of the operation.
FIG. 3 is a configuration diagram of the main parts of the embodiment.
FIG. 4 is a configuration diagram of a conventional example.
FIG. 5 is an explanatory diagram of the conventional example.
FIG. 6 is an explanatory diagram of the problem.

In the figures:

- 1 is the observation unit;
- 2 is the segmentation unit;
- 4 is the feature extraction unit;
- 5 is the feature dictionary;
- 6 is the matching unit;
- 31 is the input buffer;
- 35 is the output buffer;
- 71 is the first identification unit;
- 72 is the second identification unit;
- 73 is the third identification unit;
- 74 is the first processing unit;
- 75 is the second processing unit.

Claims

[Claims] A character recognition device comprising:

- a first identification unit (71) that identifies the entire region enclosed by image-constituent pixels in one character's worth of image data including a character entry frame;
- a second identification unit (72) that identifies background-pixel regions enclosed by image-constituent pixels in the image data;
- a third identification unit (73) that identifies pixels belonging to a third region, which belongs to the first region identified by the first identification unit (71) but not to the second region identified by the second identification unit (72);
- a first processing unit (74) that detects, from the second region and the third region, pixels corresponding to the character entry frame in the image data;
- a second processing unit (75) that separates the detected pixels corresponding to the character entry frame from the image data.