JPH06236455A

JPH06236455A - Character recognizing device

Info

Publication number: JPH06236455A
Application number: JP5022655A
Authority: JP
Inventors: Hiroshi Yoshida; 浩史吉田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1993-02-10
Filing date: 1993-02-10
Publication date: 1994-08-23

Abstract

PURPOSE: To provide a highly accurate, easy-to-operate, high-performance character recognition device that can accurately recognize even partially crushed (defaced) characters by eliminating the influence of the crushing, without requiring crushed characters to be registered in advance. CONSTITUTION: A crush (defacement) detecting part 160 detects crushing in an input character pattern. An identifying part 180 then collates the feature matrix obtained from the input character pattern with the feature matrices of standard characters prepared in advance, and during this collation the feature matrix elements belonging to the crushed parts are either excluded or given reduced influence. As a result, even partially crushed characters can be recognized accurately, free of the influence of the crushing.

Description

Detailed Description of the Invention

[0001]

Field of the Invention: The present invention relates to a character recognition device that can accurately recognize characters even when they are partially crushed.

[0002]

Description of the Related Art: A conventional character recognition device that recognizes characters with high accuracy and also deals with crushed character patterns is disclosed in Japanese Utility Model Application No. Hei 01-117796 (Japanese Unexamined Utility Model Publication No. Hei 03-58759). That device calculates the line width from the input character pattern; extracts, from the input character pattern and this line width, a plurality of sub-patterns representing line elements in two or more directions; divides the character-frame area of each sub-pattern into a plurality of sub-regions; extracts a feature matrix representing the distribution of character strokes in each direction; and collates the extracted feature matrix with dictionary matrices prepared in advance to obtain candidate character names. Furthermore, if a predetermined specific character name appears among the candidate characters of the identification result, the device divides the pre-computed circumscribing frame of the character pattern into small regions, judges from the stroke width in each small region whether the input character pattern is a crushed character, and on the basis of that judgment decides whether to output the identification result as the recognition result, that is, whether to reject it.

[0003]

Problems to Be Solved by the Invention: However, because the conventional recognition method first infers the possibility of a crushed character from the candidate character names of the identification result, the reject test described above can be applied only to characters registered in advance. Characters that are not registered receive no such processing at all; crushed characters among them undergo ordinary recognition, and a recognition result of very low certainty is output. Misread characters therefore increase, and a character recognition device with high recognition accuracy cannot be realized.

[0004] In addition, the crushed-character test requires that characters that are easily crushed, and character names that are likely to be misread when crushed, be investigated and registered in advance. This investigation is a sophisticated analysis task that uses an enormous amount of test data, dedicated software, and a high-performance computer; it is not something that a user or operator of a character recognition device can easily carry out and register. It is therefore extremely difficult for individual users to make effective use of the process of detecting and rejecting crushed characters.

[0005] Even if such registration were possible, it would involve cumbersome work requiring specialized knowledge, so it would be impossible to realize a character recognition device with good operability that even a non-expert could use easily.

[0006] The conventional device has the following problems. It can process only characters registered in advance, and registering new characters is cumbersome, requires specialized knowledge, and cannot be done easily. In addition, the stroke-width differences between the small regions obtained by dividing the character circumscribing frame are minute, so crushing cannot be detected effectively. As a result, crushed characters cannot be properly rejected, misread characters cannot be reduced, and accurate recognition is impossible. Moreover, because cumbersome operation is required, a device with good operability that even non-experts can use easily cannot be built. The object of the present invention is to eliminate these problems and to provide a high-performance character recognition device with high accuracy and good operability that requires no prior registration and that removes the influence of crushing, so that even characters containing crushed portions can be recognized accurately.

[0007]

Means for Solving the Problems: To solve the above problems, the present invention provides a character recognition device comprising an image input unit that obtains input character patterns of characters, symbols, figures, and the like; a storage unit that stores the input character pattern; a character frame dividing unit that divides the area inside the circumscribing frame of the stored input character pattern into a plurality of small regions; a feature matrix extraction unit that calculates a feature value for each divided small region to obtain a feature matrix; and an identification unit that performs recognition by collating the feature matrix with the feature matrices of standard character patterns prepared in advance. The device is characterized in that it further comprises a crush detection unit that detects crushed regions of the input character pattern, and in that the identification unit, when collating the feature matrices, either excludes the feature values of the small regions contained in the crushed regions detected by the crush detection unit or reduces their influence.

[0008]

Operation: According to the present invention, the crush detection unit detects crushing in the input character pattern. When the feature matrix obtained from the input character pattern is collated with the feature matrices of standard characters prepared in advance, the feature matrix elements belonging to the crushed parts are either excluded from the collation or given reduced influence. Even characters with crushed portions can therefore be recognized accurately with the influence of crushing reduced, which solves the problems described above.

[0009]

Embodiment: An embodiment of the character recognition device of the present invention is described below with reference to FIGS. 1 to 9. FIG. 1 is a block diagram of one embodiment of the character recognition device. The character recognition device 100 comprises an image input unit 110, a pattern register 120, a character frame dividing unit 130, a line width calculation unit 140, a sub-pattern extraction unit 150, a crush detection unit 160, a feature matrix extraction unit 170, an identification unit 180, and an output terminal 190. FIG. 2 is a block diagram of one embodiment of the crush detection unit 160, which consists of a line width calculation unit 161, a contour tracking unit 162, a local line width extraction unit 163, a determination unit 164, and a crushed region determination unit 165. FIG. 3 shows an example of an input character pattern. FIG. 4 shows examples of sub-patterns. FIG. 5 illustrates the operation of the contour tracking unit 162. FIG. 6 illustrates the operation of the crush detection unit 160. FIG. 7 shows an example of a feature matrix. FIG. 8 shows an example of a standard character pattern. FIG. 9 shows an example of the feature matrix of a standard character pattern.

[0010] The operation of the character recognition device of this embodiment is described in detail below with reference to FIGS. 1 to 9. The image input unit 110 photoelectrically converts an optical signal S from a form on which characters, figures, symbols, and the like (hereinafter "characters") are written, and generates an electrical signal quantized into black-and-white binary values, with, for example, character strokes as black pixels and the background as white pixels (hereinafter "form image data"). It then cuts out the character pattern of each character (hereinafter "input character pattern") and outputs each cut-out input character pattern to the pattern register 120.

[0011] The character patterns are cut out as follows. First, the form image data are scanned in the direction parallel to the text lines to build a distribution (projection profile) of black pixels, and the data of each text line are extracted from this distribution. Next, each extracted line is scanned in the direction perpendicular to the line direction to build a black-pixel distribution for that line, from which the spacing between successive characters is detected. The region of each character is then detected from the black-pixel distribution of each line and the detected character spacing, and the character pattern of each character is cut out on the basis of that region. In the following, this embodiment is described concretely for the case in which the input character pattern shown in FIG. 3 is obtained.
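The two-pass projection segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the form image is a list of rows of 0/1 integers (1 = black) with horizontal text lines, and the function names `runs` and `segment_characters` are hypothetical. For simplicity each character box spans the full height of its text line.

```python
def runs(profile):
    """Return half-open (start, end) intervals where the projection profile is non-zero."""
    intervals, start = [], None
    for i, value in enumerate(list(profile) + [0]):  # trailing sentinel closes a final run
        if value and start is None:
            start = i
        elif not value and start is not None:
            intervals.append((start, i))
            start = None
    return intervals


def segment_characters(page):
    """Cut a binary page (list of rows, 1 = black) into character boxes (x0, y0, x1, y1)."""
    boxes = []
    width = len(page[0])
    # Scan parallel to the text lines: black pixels per row give the text-line bands.
    row_profile = [sum(row) for row in page]
    for top, bottom in runs(row_profile):
        band = page[top:bottom]
        # Scan perpendicular to the line: black pixels per column give the character gaps.
        col_profile = [sum(row[x] for row in band) for x in range(width)]
        for left, right in runs(col_profile):
            boxes.append((left, top, right, bottom))
    return boxes
```

Each returned box uses half-open coordinates, so `[row[x0:x1] for row in page[y0:y1]]` crops the corresponding input character pattern.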

[0012] The pattern register 120 is a storage unit that sequentially stores the input character patterns received one character at a time from the image input unit 110; in this embodiment it has a capacity of 128 × 128 pixels. The input character pattern stored in the pattern register 120 can be freely referenced by the character frame dividing unit 130, the line width calculation unit 140, the sub-pattern extraction unit 150, and the crush detection unit 160.

[0013] The character frame dividing unit 130 extracts the circumscribing frame of the input character pattern, divides the area inside the frame into N × M small regions, and outputs the division result to the feature matrix extraction unit 170 and the identification unit 180. In this embodiment the circumscribing frame is divided into eight equal parts both horizontally and vertically, giving 8 × 8 small regions; for the input character pattern of FIG. 3, these are the small regions delimited by the ruled lines 301.
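A minimal sketch of extracting the circumscribing frame and dividing it into 8 × 8 equal small regions, assuming the same 0/1 row-list image; `bounding_box` and `cell_of` are hypothetical helper names, and the frame is taken as half-open (x0 ≤ x < x1, y0 ≤ y < y1). The integer formula assigns every pixel inside the frame to exactly one small region, even when the frame size is not a multiple of 8.

```python
def bounding_box(img):
    """Return the half-open circumscribing frame (x0, y0, x1, y1) of the black pixels, or None."""
    ys = [y for y, row in enumerate(img) if any(row)]
    xs = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not ys:
        return None
    return xs[0], ys[0], xs[-1] + 1, ys[-1] + 1


def cell_of(x, y, box, n=8, m=8):
    """Map a pixel inside the frame to its (row, col) index on an n x m grid of equal regions."""
    x0, y0, x1, y1 = box
    return (y - y0) * n // (y1 - y0), (x - x0) * m // (x1 - x0)
```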

[0014] The line width calculation unit 140 calculates, from the input character pattern in the pattern register 120, the line width of the line elements (strokes) making up the pattern, and outputs the resulting line width to the sub-pattern extraction unit 150 and the feature matrix extraction unit 170. In this embodiment the input character pattern is scanned with a 2 × 2 window; the number Q of window positions at which all four pixels are black and the total number A of black pixels in the pattern are counted, and the average line width W of the input character pattern is calculated with the approximation

$$W = \frac{A}{A - Q} \qquad (1)$$

For the input character pattern shown in FIG. 3, A = 290 and Q = 160, so W = 290/(290 − 160) = 290/130 ≈ 2.23.
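Equation (1) works because, for a straight stroke of width w and length l, A = wl and Q = (w − 1)(l − 1), so A − Q = w + l − 1 ≈ l and A/(A − Q) ≈ w when the stroke is much longer than it is wide. A minimal sketch, assuming a 0/1 row-list image; the function name is hypothetical:

```python
def average_line_width(img):
    """Approximate mean stroke width W = A / (A - Q), Equation (1)."""
    A = sum(sum(row) for row in img)  # total number of black pixels
    # Q: positions of a 2 x 2 window whose four pixels are all black.
    Q = sum(
        1
        for y in range(len(img) - 1)
        for x in range(len(img[0]) - 1)
        if img[y][x] and img[y][x + 1] and img[y + 1][x] and img[y + 1][x + 1]
    )
    return A / (A - Q) if A else 0.0  # A > Q whenever the image has any black pixel
```

A 3-pixel-wide, 10-pixel-long bar gives A = 30 and Q = 18, so W = 30/12 = 2.5, which shows that the estimate runs somewhat low for short strokes.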

[0015] From the input character pattern stored in the pattern register 120 and the average line width supplied by the line width calculation unit 140, the sub-pattern extraction unit 150 extracts four sub-patterns, each consisting of the line elements in one of four directions (horizontal, vertical, left-diagonal, and right-diagonal), and outputs them to the feature matrix extraction unit 170. In this embodiment the sub-patterns are extracted as follows. For the horizontal sub-pattern, for example, the input character pattern is scanned horizontally to detect runs of consecutive black pixels, and a run is extracted as part of the horizontal sub-pattern when its length L satisfies

$$L > 2W \qquad (2)$$

where W is the average line width supplied by the line width calculation unit 140. From the input character pattern of FIG. 3, the horizontal (H), vertical (V), left-diagonal (L), and right-diagonal (R) sub-patterns shown in FIG. 4 are extracted.
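The run-length rule of Equation (2) generalizes to all four directions by scanning along a direction vector. The sketch below is illustrative only: the direction vectors in `DIRECTIONS` are hypothetical, y grows downward, and which diagonal the text calls "left" and which "right" is an assumption.

```python
# Hypothetical direction vectors (dx, dy), y growing downward.
# Assumed: "L" runs down-right ("\"), "R" runs up-right ("/").
DIRECTIONS = {"H": (1, 0), "V": (0, 1), "L": (1, 1), "R": (1, -1)}


def subpattern(img, W, direction):
    """Keep only black runs along `direction` whose length L satisfies L > 2W (Equation (2))."""
    dx, dy = DIRECTIONS[direction]
    h, w = len(img), len(img[0])

    def black(x, y):
        return 0 <= x < w and 0 <= y < h and img[y][x] == 1

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if black(x, y) and not black(x - dx, y - dy):  # (x, y) starts a run
                run, cx, cy = [], x, y
                while black(cx, cy):
                    run.append((cx, cy))
                    cx, cy = cx + dx, cy + dy
                if len(run) > 2 * W:
                    for px, py in run:
                        out[py][px] = 1
    return out
```

For a one-pixel-thick horizontal line six pixels long with W = 2.2, the horizontal sub-pattern keeps all six pixels, because 6 > 4.4, while the vertical sub-pattern keeps none, because each vertical run has length 1.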

[0016] Next, the operation of the crush detection unit 160 is described in detail. The line width calculation unit 161 in the crush detection unit 160 calculates, from the input character pattern in the pattern register 120, the line width of the line elements making up the pattern, and outputs the resulting line width to the determination unit 164. In this embodiment the line width is calculated by the same processing as in the line width calculation unit 140.

[0017] The contour tracking unit 162 in the crush detection unit 160 traces the contour of the input character pattern. For each contour pixel it extracts the pixel's coordinates and the direction number, as numbered in FIG. 5(A), that points to the next contour pixel, and it outputs these coordinates and direction numbers to the local line width extraction unit 163 as a contour sequence. The contour of the input character pattern is traced as follows.
(1) The pattern register is raster-scanned from the upper-left pixel, scanning successively in the X direction while stepping in the Y direction (hereinafter sometimes called the column direction). The first black pixel found is taken as the start black pixel and as the first contour pixel.
(2) The eight neighbors of the first black pixel, numbered as in FIG. 5(A), are examined in order from 1 to 8, and the first black neighbor found is taken as the second contour pixel.
(3) Black pixels are then searched for in the same way as in (2) among the eight neighbors of the second contour pixel, giving the third contour pixel.
(4) Contour pixel detection is repeated in the same way until the detected contour pixel coincides with the start contour pixel.
(5) When the detected pixel coincides with the start contour pixel and the above processing ends, the scan of (1) is resumed from the start pixel to find another black pixel. If that black pixel lies outside the contours already extracted, it is taken as a new start black pixel, and steps (2) onward are repeated.
(6) The processing ends when the entire area of the pattern register has been scanned.

[0018] With the method described above, in the upper-left part of the input character pattern of FIG. 3, shown in FIG. 5(B), the black pixel 501 becomes the start black pixel. Thereafter, each pixel tested in direction 1 of FIG. 5(A) is judged to be black, and the contour coordinates 502 are detected. In the lower-left part of the input character pattern, shown in FIG. 5(C), the pixel 505 in direction 1 and the pixel 506 in direction 2 from contour coordinate 504 are tested in turn. Because both are white, the pixel 507 in direction 3 is detected as the next contour coordinate, and the contour coordinates 508 and 509 are then detected in the same way. As a result, for the input character pattern of FIG. 3, the contour 602 starting at pixel 601 and the contour 604 starting at pixel 603 in FIG. 6 are extracted.
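Steps (1) to (4) can be approximated with standard Moore-neighbor contour tracing. This is a swapped-in, well-known technique rather than the patent's exact procedure. Instead of always testing the neighbors in the fixed order 1 to 8, the search begins just after the previously visited white neighbor, which keeps the trace on the boundary. FIG. 5(A) is not reproduced, so the direction numbering is an assumption: 1 = east, numbered clockwise (2 = south-east, 3 = south, ..., 8 = north-east) with y growing downward. This numbering is consistent with the text's statement that direction 3 is perpendicular to direction 1. Only one outer contour is traced, and the rescan of step (5) is omitted. All names are hypothetical.

```python
# Assumed direction codes 1..8 = E, SE, S, SW, W, NW, N, NE (y grows downward).
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def first_black(img):
    """Step (1): raster scan from the upper-left pixel; return the first black pixel (x, y)."""
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v:
                return (x, y)
    return None


def trace_contour(img, start):
    """Moore-neighbor tracing of the outer contour through `start`.

    Returns a list of (x, y, code), where code (1..8) points to the next contour pixel.
    """
    h, w = len(img), len(img[0])

    def black(x, y):
        return 0 <= x < w and 0 <= y < h and img[y][x] == 1

    def step(c, b):
        # Scan the 8 neighbors of c clockwise, starting just after the white backtrack pixel b.
        k = OFFSETS.index((b[0] - c[0], b[1] - c[1]))
        for i in range(1, 9):
            idx = (k + i) % 8
            dx, dy = OFFSETS[idx]
            if black(c[0] + dx, c[1] + dy):
                pdx, pdy = OFFSETS[(idx - 1) % 8]  # last white neighbor becomes the new backtrack
                return idx, (c[0] + dx, c[1] + dy), (c[0] + pdx, c[1] + pdy)
        return None  # isolated pixel

    c, b = start, (start[0] - 1, start[1])  # west neighbor is white after a raster scan
    first = step(c, b)
    if first is None:
        return [(start[0], start[1], None)]
    contour, move = [], first
    while True:
        idx, nxt, nb = move
        contour.append((c[0], c[1], idx + 1))
        c, b = nxt, nb
        move = step(c, b)
        if c == start and move[1] == first[1]:  # back at start, about to repeat the first move
            break
    return contour
```

For a 2 × 2 block of black pixels this yields four contour pixels with codes 1, 3, 5, 7 (east, south, west, north), that is, a clockwise trace.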

[0019] Next, the local line width extraction unit 163 in the crush detection unit 160 takes each pixel of the contour sequence supplied by the contour tracking unit 162 as a reference point, scans the input character pattern in the direction perpendicular to the contour direction at that pixel, and counts the consecutive black pixels. This count is taken as the local line width, at that pixel, of the line element running in the contour direction. The local line widths and the contour sequence are output to the determination unit 164. Taking the contour sequence of FIG. 6 as an example, pixel 605 lies on a contour running in direction 1 of FIG. 5(A). The input character pattern is therefore scanned in the perpendicular direction 3, that is, the direction 606 in FIG. 6, and the count of consecutive black pixels gives the local line width at that pixel.
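A minimal sketch of the local line width measurement. It assumes direction codes 1..8 = E, SE, S, SW, W, NW, N, NE with y growing downward (FIG. 5(A) is not reproduced), and it assumes outer contours are traced clockwise on screen, so the stroke interior lies 90 degrees clockwise from the direction of travel. This matches the example above, in which direction 1 is scanned along direction 3. The function name is hypothetical.

```python
# Assumed direction codes 1..8 = E, SE, S, SW, W, NW, N, NE (y grows downward).
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def local_line_width(img, x, y, code):
    """Count consecutive black pixels from contour pixel (x, y), scanning perpendicular
    to contour direction `code` (turned 90 degrees clockwise, so 1 -> 3, 5 -> 7)."""
    dx, dy = OFFSETS[(code - 1 + 2) % 8]
    h, w = len(img), len(img[0])
    count = 0
    while 0 <= x < w and 0 <= y < h and img[y][x] == 1:
        count += 1
        x, y = x + dx, y + dy
    return count
```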

[0020] Next, the determination unit 164 in the crush detection unit 160 uses the average line width W of the input character pattern supplied by the line width calculation unit 161, together with the contour sequence and the local line width of each contour pixel supplied by the local line width extraction unit 163. From these it determines whether the input character pattern contains a portion that may be crushed (hereinafter, a crush candidate region). If a crush candidate region exists, its coordinates are output to the crushed region determination unit 165. A crush candidate region is found by checking whether any part of the contour sequence satisfies Equation (3). If a part of the contour sequence satisfies Equation (3), that part, together with the area covered by the local line width measured from each of its pixels, is taken as a crush candidate region; if it does not, that part of the contour sequence is judged to contain no crush candidate region.

$$L \ge T_L \qquad (3)$$

Here L is the number of consecutive contour pixels in the contour sequence that share the same contour direction and whose local line width $W_i$ satisfies Equation (5). $T_L$ is a predetermined fixed value or a calculated threshold; in this embodiment it is calculated in advance by Equation (4):

$$T_L = k_L \times W \qquad (4)$$

where $k_L$ is a predetermined constant (2 in this embodiment) and W is the average line width described above.

$$W_i \ge T_W \qquad (5)$$

Here $T_W$ is a threshold calculated in advance from the average line width W; in this embodiment it is given by Equation (6):

$$T_W = k_W \times W \qquad (6)$$

where $k_W$ is a predetermined constant (2 in this embodiment).

[0021] For the input character pattern of FIG. 3, the line width calculation unit 161 yields an average line width of W ≈ 2.2, as described above, so Equations (6) and (4) give thresholds $T_W = T_L = 2W$, just under 4.5 pixels. Because local line widths and run lengths are whole pixel counts, the determination unit 164 detects contour portions in which at least 5 consecutive contour pixels share the same contour direction and each has a local line width of 5 pixels or more. It outputs each such portion to the crushed region determination unit 165 as a crush candidate region. As shown in FIG. 6, two crush candidate regions are output for the input character pattern of FIG. 3: the contour sequence from 607 to 608 together with the area spanned by its local line widths, and the contour sequence from 608 to 609 together with the area spanned by its local line widths.
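Equations (3) to (6) can be sketched as a single pass over one contour sequence. This is a minimal illustration under stated assumptions: each sample is a hypothetical tuple (x, y, code, w_i) holding a contour pixel, its direction code, and its local line width. Direction codes are assumed to be 1..8 = E, SE, S, SW, W, NW, N, NE with y growing downward, and the stroke interior is assumed to lie 90 degrees clockwise from the contour direction. A run that wraps past the end of the closed contour back to its start is not merged, which is a simplification.

```python
# Assumed direction codes 1..8 = E, SE, S, SW, W, NW, N, NE (y grows downward).
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def crush_candidates(samples, W, k_L=2.0, k_W=2.0):
    """samples: (x, y, code, w_i) for consecutive pixels of one contour sequence.
    Returns crush candidate regions (sets of pixels) per Equations (3)-(6)."""
    T_L, T_W = k_L * W, k_W * W  # Equations (4) and (6)
    regions, run = [], []

    def close_run():
        if run and len(run) >= T_L:  # Equation (3): L >= T_L
            region = set()
            for x, y, code, w_i in run:
                dx, dy = OFFSETS[(code - 1 + 2) % 8]  # perpendicular, into the stroke
                region.update((x + i * dx, y + i * dy) for i in range(w_i))
            regions.append(region)

    for s in samples:
        wide = s[3] >= T_W  # Equation (5): w_i >= T_W
        if wide and run and run[-1][2] == s[2]:
            run.append(s)  # same direction and wide enough: extend the run
        else:
            close_run()
            run = [s] if wide else []
    close_run()
    return regions
```

With the FIG. 3 values A = 290 and Q = 160, 2W lies between 4 and 5, so a run of five east-going contour pixels, each with a local width of 5, becomes a 5 × 5 candidate region, while a following run of only four south-going pixels is rejected.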

[0022] From the crush candidate regions supplied by the determination unit 164, the crushed region determination unit 165 determines the crushed portion and outputs its coordinates to the identification unit 180. The crushed portion consists of the pixels that are contained in common in two or more crush candidate regions. For the two crush candidate regions supplied by the determination unit 164 in FIG. 6, the pixels of the rectangular area bounded by pixels 608, 609, 610, and 611 belong to both candidate regions, so this area is output to the identification unit 180 as the crushed region.
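The "common to two or more candidate regions" rule amounts to counting, for each pixel, how many candidate regions contain it. A minimal sketch, assuming each candidate region is a set of (x, y) pixels; the function name is hypothetical:

```python
from collections import Counter


def crushed_pixels(candidate_regions):
    """Return the pixels contained in two or more crush candidate regions (the crushed portion)."""
    counts = Counter(p for region in candidate_regions for p in set(region))
    return {p for p, n in counts.items() if n >= 2}
```

Two overlapping rectangles, such as a candidate region below a horizontal edge and one beside a vertical edge, leave only their shared corner block, which matches the rectangle bounded by 608 to 611 in FIG. 6.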

[0023] The feature matrix extraction unit 170 takes the horizontal, vertical, left-diagonal, and right-diagonal sub-patterns supplied by the sub-pattern extraction unit 150 and the division coordinates supplied by the character frame dividing unit 130, and counts the black pixels of each sub-pattern in each divided small region. From these black pixel counts and the average line width, it extracts the horizontal, vertical, left-diagonal, and right-diagonal feature matrices according to Equation (7):

$$K_H(m,n) = \frac{B_H(m,n)}{W},\quad K_V(m,n) = \frac{B_V(m,n)}{W},\quad K_L(m,n) = \frac{B_L(m,n)}{W},\quad K_R(m,n) = \frac{B_R(m,n)}{W} \qquad (7)$$

where $K_H$, $K_V$, $K_L$, and $K_R$ are the horizontal, vertical, left-diagonal, and right-diagonal feature matrices; $B_H$, $B_V$, $B_L$, and $B_R$ are the corresponding black pixel count matrices; $(m, n)$ is the element index of each matrix; and $W$ is the average line width described above. From the sub-patterns shown in FIG. 4, the feature matrices shown in FIG. 7 are extracted.
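Equation (7) for one direction can be sketched as follows, assuming the sub-pattern is a 0/1 row-list image of the same size as the pattern register and the frame is the half-open box (x0, y0, x1, y1) divided into equal small regions. The function name is hypothetical; it would be called once for each of the H, V, L, and R sub-patterns.

```python
def feature_matrix(sub, box, W, n=8, m=8):
    """Equation (7): black-pixel count of sub-pattern `sub` in each of the n x m
    equal small regions of the frame `box` = (x0, y0, x1, y1), divided by W."""
    x0, y0, x1, y1 = box
    B = [[0] * m for _ in range(n)]
    for y in range(y0, y1):
        for x in range(x0, x1):
            if sub[y][x]:
                B[(y - y0) * n // (y1 - y0)][(x - x0) * m // (x1 - x0)] += 1
    return [[b / W for b in row] for row in B]
```

Dividing by W normalizes for stroke thickness, so a bold and a thin rendering of the same stroke give similar element values.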

[0024] The identification unit 180 uses the coordinates of the crushed portion supplied by the crush detection unit 160 and the character frame coordinates supplied by the character frame dividing unit 130 to determine which of the divided small regions the crushed portion falls in. It then collates the feature matrix of the input character pattern with the feature matrices of the standard characters held in advance in a dictionary in the identification unit 180. The collation uses Equation (8) and covers only those elements that do not belong to the small regions of the crushed portion. The character code of the standard character that minimizes the distance D of Equation (8) is output from the output terminal 190 as the recognition result for the input character pattern.

$$D = \Bigl(\sum_i (g_i - k_i)^2\Bigr)^{1/2} \qquad (8)$$

Here $g_i$ is an element of the feature matrix of the standard character, $k_i$ is an element of the feature matrix of the input character pattern, and $i$ ranges over the region numbers of the feature matrices of the standard character and the input character pattern that do not belong to the crushed portion.
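A minimal sketch of the masked collation in Equation (8). It assumes each character's features are a list of direction matrices (for example `[K_H, K_V, K_L, K_R]`) and that the same crushed small regions are skipped in every direction matrix. The dictionary format, a mapping from character code to such a list, and all function names are hypothetical.

```python
import math


def crushed_cells(pixels, box, n=8, m=8):
    """Small regions (row, col) of the n x m grid that contain at least one crushed pixel."""
    x0, y0, x1, y1 = box
    return {((y - y0) * n // (y1 - y0), (x - x0) * m // (x1 - x0)) for x, y in pixels}


def masked_distance(g, k, masked):
    """Equation (8) summed over all direction matrices, skipping cells in `masked`."""
    total = 0.0
    for G, K in zip(g, k):  # e.g. [K_H, K_V, K_L, K_R]
        for r, (g_row, k_row) in enumerate(zip(G, K)):
            for c, (gv, kv) in enumerate(zip(g_row, k_row)):
                if (r, c) not in masked:
                    total += (gv - kv) ** 2
    return math.sqrt(total)


def classify(features, dictionary, masked=frozenset()):
    """Return the character code whose standard feature matrices are nearest to `features`."""
    return min(dictionary, key=lambda code: masked_distance(dictionary[code], features, masked))
```

In a toy 2 × 2 case where a blot inflates one input element, the unmasked distance can favor the wrong standard character, while masking that cell restores the correct match.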

[0025] The output terminal 190 is a data output terminal for outputting the recognition result externally. Other systems, media for recording the recognition result, communication networks, other information processing systems, and the like are connected to it.

[0026] The present invention is not limited to the embodiment described above. For example, when the identification unit 180 performs collation according to Equation (8), this embodiment excludes the elements of the regions corresponding to the crushed portion, but the invention is not limited to this. The same effect can be obtained by, for example, replacing the element values of the crushed portion in the feature matrix of the input character pattern with a predetermined fixed value, or with the average of the element values outside the crushed portion, and then performing the collation.
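The replacement variant can be sketched as a preprocessing step on the input feature matrix, after which the ordinary, unmasked Euclidean distance of Equation (8) is applied. This is a minimal sketch: the function name is hypothetical, the mean is assumed to be taken over the non-crushed elements of the same matrix, and `masked` is a set of (row, col) small-region indices.

```python
def fill_masked(K, masked, fill=None):
    """Return a copy of input feature matrix K with crushed cells replaced by `fill`,
    or by the mean of the non-crushed elements when `fill` is None."""
    keep = [v for r, row in enumerate(K) for c, v in enumerate(row) if (r, c) not in masked]
    if fill is None:
        fill = sum(keep) / len(keep) if keep else 0.0
    return [[fill if (r, c) in masked else v for c, v in enumerate(row)]
            for r, row in enumerate(K)]
```

Filling with the mean keeps every matrix the same size, so an existing distance routine and dictionary can be reused unchanged.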

[0027] The fixed values kW and kL used in equations (6) and (4) are not limited to the values of this embodiment. Any suitable values may be used within the scope of the present invention. The following may also be changed as appropriate within that scope: the image input method, the configuration of the pattern register, the method of calculating the average line width, the contour tracking method, the feature matrix extraction method, and the character frame division method. Likewise, the operation of each component, the processing method, the flow of input/output signals, the number, position, and shape of the components, and other conditions may be changed as desired.

【００２８】[0028]

[Effects of the Invention] As described in detail above, this invention performs identification either by excluding the feature matrix element values that correspond to the collapsed portion found by the collapse detection unit, or by reducing their influence. There is therefore no need to survey and register in advance the characters that collapse easily, or the characters that tend to be misread when collapsed. No complicated operation or specialized knowledge is required. The result is a character recognition device with good operability that even a non-expert can operate easily.

[0029] The local line width is detected for each contour pixel, and collapse is detected for each continuous run of contour pixels. The collapsed area can therefore be located accurately, and the feature matrix elements affected by that area can reliably be excluded from the distance calculation or have their influence reduced. Recognition accuracy is consequently high and the misreading rate low. Less time is spent checking and correcting recognition results, which greatly improves work efficiency. A highly accurate, high-performance character recognition device thus becomes possible.
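Run-based collapse detection can be sketched as follows, though only under explicit assumptions. The contour is assumed to be given as a closed list of pixels in tracking order, each with its local line width. A pixel is flagged when its local width exceeds a multiple of the average line width. Consecutive flagged pixels form a run, and runs of at least a minimum length are reported as bounding boxes. The threshold rule and the parameters `width_ratio` and `min_run` are hypothetical. They are not a reproduction of the patent's equations (4) and (6) or of the constants kW and kL, whose exact roles are not given in this section.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def detect_collapsed_runs(contour: Sequence[Point], local_widths: Sequence[float],
                          mean_width: float, width_ratio: float = 2.0,
                          min_run: int = 3) -> List[Box]:
    """Return bounding boxes of runs of contour pixels with abnormally wide strokes.

    contour: closed contour in tracking order; local_widths[j] is the local line
    width measured at contour[j]. width_ratio and min_run are hypothetical.
    """
    n = len(contour)
    flags = [w > width_ratio * mean_width for w in local_widths]
    if n == 0 or not any(flags):
        return []
    if all(flags):
        runs = [list(range(n))]
    else:
        # Start at an unflagged pixel so that a run wrapping past the end of the
        # closed contour is collected as one piece.
        start = flags.index(False)
        runs, current = [], []
        for step in range(n):
            j = (start + step) % n
            if flags[j]:
                current.append(j)
            elif current:
                runs.append(current)
                current = []
        if current:
            runs.append(current)
    boxes: List[Box] = []
    for run in runs:
        if len(run) < min_run:
            continue  # too short to count as a collapsed portion
        xs = [contour[j][0] for j in run]
        ys = [contour[j][1] for j in run]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
widths = [5, 5, 1, 1, 1, 1, 1, 5]  # wide strokes around the top-left corner
print(detect_collapsed_runs(square, widths, mean_width=1.0))
# prints: [(0, 0, 1, 1)]
```

The returned boxes play the role of the collapsed-portion coordinates that the collapse detection unit passes to the identification unit.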

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the character recognition device of the present invention.

FIG. 2 is a block diagram showing an embodiment of the collapse detection unit.

FIG. 3 is a diagram showing an example of an input character pattern.

FIG. 4 is a diagram showing an example of a sub-pattern.

FIG. 5 is an explanatory diagram of the operation of the contour tracking unit.

FIG. 6 is an explanatory diagram of the operation of the collapse detection unit.

FIG. 7 is a diagram showing an example of a feature matrix.

FIG. 8 is a diagram showing an example of a standard pattern.

FIG. 9 is a diagram showing an example of the feature matrix of a standard pattern.

[Explanation of symbols]

100 Character recognition device
110 Image input unit
120 Pattern register
130 Character frame division unit
140 Line width calculation unit
150 Sub-pattern extraction unit
160 Collapse detection unit
170 Feature matrix extraction unit
180 Identification unit
190 Output terminal

Claims


1. A character recognition device comprising: an image input unit that obtains an input character pattern of characters, symbols, figures, or the like; a storage unit that stores the input character pattern; a character frame division unit that divides the area within the circumscribed character frame of the stored input character pattern into a plurality of small areas; a feature matrix extraction unit that calculates a feature value for each divided small area to obtain a feature matrix; and an identification unit that performs recognition by collating the feature matrix with the feature matrix of a standard character pattern prepared in advance; the character recognition device being characterized in that it further comprises a collapse detection unit that detects a collapsed region of the input character pattern, and in that, when collating the feature matrices, the identification unit either excludes from the collation the feature values of the small areas included in the collapsed region detected by the collapse detection unit, or reduces their degree of influence.