JPH03282985A

JPH03282985A - Noise processing system for hand-written numerical recognition

Info

Publication number: JPH03282985A
Application number: JP2084390A
Authority: JP
Inventors: Shoji Miki; 三木　章司; Hirotaka Tsubota; 浩貴坪田; Hiroshi Kameyama; 博史亀山
Original assignee: Glory Ltd
Current assignee: Glory Ltd
Priority date: 1990-03-30
Filing date: 1990-03-30
Publication date: 1991-12-13

Abstract

PURPOSE: To reliably recognize a numeral by removing a noise element only when the length of the part corresponding to the noise element is at or below a prescribed value, and then comparing the numeral with the classification codes again. CONSTITUTION: A handwritten character is thinned into a straight line P1, a dot P2, a straight line P3, a loop P4, a dot P5, and a straight line P6. Since the dot P2 is known to be noise that frequently appears at bending points and the like, it is registered in advance as a dot branch to be treated as noise. Since the dot P5 is not an end-point branch, it is not subject to noise processing. The dot branch number of the noise element is then specified, and it is decided whether the length of the corresponding branch is at or below a prescribed value; when it is, that end-point dot branch is removed. Consequently, misrecognition due to noise can be prevented.

Description

DETAILED DESCRIPTION OF THE INVENTION Object of the Invention: (Industrial Application Field) The present invention relates to a noise processing method in digit recognition for improving the recognition rate of digits handwritten on checks and the like.

(Prior art) Recognition methods for handwritten digits and the like include those shown in Japanese Patent Application Laid-Open Nos. 1-116781 and 1-116782. In these methods, handwritten digits are preprocessed and thinned, features are extracted using loop, straight-line, and arc elements, and the digits are recognized by comparing the extracted features with pre-registered classification codes.

(Problem to be Solved by the Invention) In the conventional recognition methods described above, short lines are also removed as noise during preprocessing. Therefore, for the handwritten character shown in Figure 12(A), for example, whisker part 1 is removed as noise at the preprocessing stage and the character is processed as the digit shown in Figure 12(B), so it could not be recognized as "8".

The present invention was made in view of the above circumstances. Its object is to provide a noise processing method for handwritten digit recognition that enables reliable digit recognition by performing noise processing only when certain conditions are met.

Structure of the Invention (Means for Solving the Problems) The present invention relates to a noise processing method in handwritten digit recognition, in which handwritten digits are preprocessed and thinned, features are extracted using loop, straight-line, and arc elements, and the digits are recognized by comparison with pre-registered classification codes. The above object is achieved as follows: noise is left in place rather than removed during preprocessing; codes that contain noise are prepared among the pre-registered classification codes; and when the recognition result corresponds to a code containing noise, the part corresponding to the noise element is removed only when its length is at or below a predetermined value, after which the result is compared with the classification codes again.

(Operation) The present invention is a noise processing method used when handwritten digits on checks and the like are read, preprocessed, and thinned, features are extracted using loop, straight-line, and arc elements, and the handwritten digits are recognized by comparison with pre-registered classification codes. The preprocessing that unconditionally removes, as noise, any part whose length is within a fixed value is used in combination with dot processing such as that shown in Japanese Patent Application Laid-Open No. 1-116782, and small-loop processing, connecting processing, vertical-bar processing, and a countermeasure against crushed characters that uses the binary data are introduced into the character recognition flow.

(Example) The present invention presupposes the character recognition shown in Japanese Patent Application Laid-Open No. 1-116782 or No. 1-116781 mentioned above. Figure 1 shows the processing operation of the invention in detail. The preprocessed, thinned form of the character is obtained (step S1), the character candidates in the major classification "0" to "9" are determined (step S2), and it is then determined whether any candidate exists (step S3).

If there is a candidate, it is checked whether a dot branch that is a noise target is present in the candidate (step S4). The dots themselves are defined as appropriate when the reference code patterns are created. Here, a dot branch is a thin whisker that arises at bending points and the like as an artifact of thinning, or a part that is unnecessary for the digit. For example, the handwritten character shown in Figure 3(A) is thinned as shown in Figure 3(B), where P1 is a straight line, P2 is a dot, P3 is a straight line, P4 is a loop, P5 is a dot, and P6 is a straight line. Since dot P2 is known to be noise that often appears at bending points and the like, it is registered in advance as a dot branch that is a noise target. A dot is treated as a dot regardless of its element type (arc or straight line). The same applies to dot P5, whose type does not matter either; in this case, however, dot P5 is not an end-point branch (a branch with one end being an end point), so it is not subject to noise processing.
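The end-point branch condition described above can be sketched as follows. This is only an illustration under stated assumptions: the thinned character is assumed to be stored as a list of branches whose two ends are node identifiers, and the `Branch` class, its fields, and both helper functions are hypothetical names that do not appear in the patent. The reading "exactly one end is a free end point" is also an interpretation of the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str    # label such as "P2" (illustrative)
    kind: str    # "line", "arc", "loop", or "dot"
    start: int   # node id at one end of the branch
    end: int     # node id at the other end of the branch


def node_degrees(branches):
    """Count how many branch ends meet at each node of the thinned character."""
    degrees = Counter()
    for b in branches:
        degrees[b.start] += 1
        degrees[b.end] += 1
    return degrees


def is_endpoint_branch(branch, degrees):
    """Assumed reading: exactly one end of the branch is a free end point."""
    free_ends = [n for n in (branch.start, branch.end) if degrees[n] == 1]
    return len(free_ends) == 1


# Hypothetical topology loosely modeled on P1 to P6; the node ids are invented.
branches = [
    Branch("P1", "line", 0, 1),
    Branch("P2", "dot", 1, 9),    # whisker hanging off a bending point
    Branch("P3", "line", 1, 2),
    Branch("P4", "loop", 2, 2),
    Branch("P5", "dot", 2, 3),    # short segment joined at both ends
    Branch("P6", "line", 3, 4),
]
degrees = node_degrees(branches)
noise_targets = [b.name for b in branches
                 if b.kind == "dot" and is_endpoint_branch(b, degrees)]
```

With this invented topology, only the dot that hangs off a junction qualifies as a noise target, while the dot joined at both ends does not.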

If a dot branch that is a noise target is present in the candidate at step S4, the number of that dot branch is specified (step S5; P2 in Figure 3(B)), and it is determined whether the length of the branch corresponding to that dot branch number is at or below a predetermined value (step S6). If it is, noise processing 1 is performed (step S7). Noise processing 1 here means removing a branch that is the end-point dot branch specified in step S5 and whose length is at or below a certain value.
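Steps S5 to S7 can be illustrated with a short sketch. It assumes each branch already carries flags saying whether it is a dot and whether it is an end-point branch; the `Branch` fields, the function name, and the length unit are assumptions made for illustration, and the patent does not give a concrete threshold value.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    number: int         # branch number within the thinned character
    is_dot: bool        # defined as a dot in the reference code pattern
    is_endpoint: bool   # one end of the branch is a free end point
    length: float       # branch length (pixels assumed)


def noise_processing_1(branches, noise_branch_numbers, max_length):
    """Drop only the specified end-point dot branches that are short enough."""
    kept = []
    for b in branches:
        targeted = (b.number in noise_branch_numbers
                    and b.is_dot and b.is_endpoint)     # step S5
        if targeted and b.length <= max_length:         # step S6
            continue                                    # step S7: remove it
        kept.append(b)
    return kept
```

A whisker that is specified as a noise target but longer than `max_length` is kept, which is the behavior that lets a long stroke survive to the recognition step.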

After noise processing 1, or when the answer at step S4 or S6 is NO, character recognition is performed (step S8), and it is determined whether the recognition result is OK (step S9); if it is OK, the recognition result is output. The character recognition processing is described in the above-mentioned publications and proceeds as shown in Figure 2. First, data structuring and feature extraction are performed on the read data. Next, a tentative recognition is made for the digits "0" to "9" based on the extracted features, and each recognized digit is subjected to major classification, a weight check (which evaluates how well proportioned the elements are as a character; for example, in the case of Figure 10(A), the check uses the proportion of the whole occupied by each element shown in Figure 10(B)), a distortion check (which evaluates straightness and smoothness), a position check (which evaluates the positions of the central points of the character; for example, in the case of Figure 11(A), the check uses the vertical position of intersection 4 shown in Figure 11(B)), and a detailed evaluation. After that, a crushed-character check and an overall judgment are performed.

On the other hand, when there is no candidate at step S3, or when the recognition result is not OK at step S9, noise processing 2 is performed (step S10). Noise processing 2 means unconditionally removing end-point branches within a certain length. After noise processing 2, character recognition is performed again (step S11); if the recognition result is OK, the result is output, and if it is not OK, small-loop processing follows (step S12).

Small-loop processing (step S20) removes a small loop such as small loop 2 shown in Figure 4, that is, a loop whose constituent branch length is at or below a certain value, and connects the two points with a single line. Small loop 2 often arises when a ballpoint pen delivers ink poorly: the inside of the stroke is left blank, and thinning turns it into a loop. After small-loop processing, character recognition is performed again (step S21), and it is determined whether the recognition result is OK (step S22); if OK, the result is output, and if not, connecting processing follows (step S23). Connecting processing means joining with a line two end points whose distance from each other is within a certain length. After connecting processing, character recognition is performed again (step S24), and it is determined whether the recognition result is OK (step S25); if OK, the result is output, and if not, vertical-bar removal follows (step S26). Vertical-bar removal removes vertical bar 3 in Figure 8(B), which is the branch shared by the two loops shown in Figure 8(A). After vertical-bar removal, character recognition is performed again (step S27), it is determined whether the recognition result is OK, and in either case the result is output (step S28).
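The retry cascade described above (noise processing 2, small-loop processing, connecting processing, vertical-bar removal, each followed by another recognition attempt) can be sketched generically. The function name, the convention that `recognize` returns `None` when the result is not OK, and the choice to apply each step to the output of the previous one are all assumptions for illustration; the individual processing steps are passed in as callables and are not implemented here.

```python
def recognize_with_fallbacks(character, recognize, fallbacks):
    """Try recognition, then apply each fallback step in order and retry.

    character: any representation of the thinned character (assumed opaque)
    recognize: callable returning a digit string, or None when not OK
    fallbacks: ordered callables, e.g. noise processing 2, small-loop
               processing, connecting processing, vertical-bar removal
    """
    result = recognize(character)
    if result is not None:
        return result
    for process in fallbacks:
        character = process(character)   # transform, then try again
        result = recognize(character)
        if result is not None:
            return result
    return None                          # every attempt failed: reject
```

Ordering the fallbacks from least to most aggressive mirrors the flow in the text, so a character is altered only as much as is needed to obtain an acceptable result.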

Next, the countermeasure against misrecognition of crushed characters is explained with reference to the flowchart in Figure 6. Typical examples of crushing are those shown as binarized data in Figures 5(A) to 5(C). Figures 5(A) to 5(C) are a crushed "0", "8", and "0", respectively, but when thinned they become "1", "9", and "6". Conversely, misrecognition can be prevented by examining the width in the original binarized data only when the character has been recognized as "1", "9", "6", or the like. Take the example of Figure 5(B), that is, the case where the character has been recognized as "9". First, the bulge of the upper loop, averaged over three points, is extracted as α as shown in Figure 7 (step S30), and then the line width near the crushed region is extracted as β (step S31). It is then determined whether the neighboring line width β is larger than a·α, where a is a coefficient (step S32). If β is larger, a check-out at the binary level is performed and the result recognized as "9" is invalidated (step S33). If the neighboring line width β is smaller, the result is output; that is, recognition as "9" is accepted.
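The comparison in steps S30 to S33 reduces to a single inequality, sketched below. The function name and argument names are hypothetical, the source does not state a value for the coefficient a, and how α and β are measured from the binarized image is outside this sketch.

```python
def accept_thinned_result(bulge_samples, neighbor_line_width, a):
    """Return True when the result obtained from the thinned character may be output.

    bulge_samples: loop bulge measurements (the text averages three points)
    neighbor_line_width: line width beta near the suspected crushed region
    a: coefficient applied to alpha (value not given in the source)
    """
    alpha = sum(bulge_samples) / len(bulge_samples)   # step S30
    beta = neighbor_line_width                        # step S31
    if beta > a * alpha:                              # step S32
        return False                                  # step S33: invalidate
    return True
```

For instance, with bulge samples averaging 5 and an illustrative coefficient of 2.0, a neighboring line width of 12 would invalidate the result, while a width of 8 would let it through.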

According to the present invention, for a handwritten digit having whisker part 1 as shown in Figure 12(A), the candidate determination at step S3 finds that a candidate exists, so character recognition is performed on the character as it is, and it can be recognized as "8".

With the conventional method, by contrast, in which whiskers at or below a certain length are removed, character recognition is performed after the noise processing, so the character may be read as "6" or rejected.

Effects of the invention: As described above, according to the noise processing method of the present invention, when the recognition result corresponds to a code containing noise, the part corresponding to the noise element is removed only when its length is at or below a predetermined value, and the classification judgment is then made. Misrecognition due to noise can therefore be prevented.

[Brief explanation of drawings]

Figure 1 is a flowchart showing an embodiment of the present invention; Figure 2 is a flowchart showing the general operation of character recognition; Figures 3(A) and 3(B) are diagrams for explaining character feature extraction; Figures 4 to 9 are diagrams for explaining the present invention; Figures 10(A) and 10(B) are diagrams for explaining the weight check; Figures 11(A) and 11(B) are diagrams for explaining the position check; and Figures 12(A) and 12(B) are diagrams for explaining noise processing in the thinning of handwritten characters. 1: whisker part, 2: small loop, 3: vertical bar.

Claims

[Claims]

1. A noise processing method for handwritten digit recognition in which handwritten digits are preprocessed and thinned, features are extracted using loop, straight-line, and arc elements, and the digits are recognized by comparison with pre-registered classification codes, characterized in that noise is left without being removed in the preprocessing, codes containing noise are prepared among the pre-registered classification codes, and, when the recognition result corresponds to a code containing noise, the part corresponding to the noise element is removed only when its length is at or below a predetermined value, after which the result is compared with the classification codes again.