JPH1139429A

JPH1139429A - Character recognition device

Info

Publication number: JPH1139429A
Application number: JP9191069A
Authority: JP
Inventors: Misako Suwa; 美佐子諏訪; Satoshi Naoi; 聡直井; Yoshinobu Hotsuta; 悦伸堀田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1997-07-16
Filing date: 1997-07-16
Publication date: 1999-02-12
Anticipated expiration: 2017-07-16
Also published as: JP3391223B2

Abstract

(57) [Abstract]

[Problem] A character recognition device that recognizes handwritten characters is required to reduce misread characters and to relax the restrictions on how forms are filled in. The object of the present invention is to improve character recognition accuracy by determining whether a character carries a strike-out line drawn to cancel a writing error, and rejecting the character when it does.

[Solution] A character recognition device that recognizes characters by extracting feature data from an input character pattern and comparing it with dictionary data comprises: a line segment pattern generation unit that extracts a pattern of line segments in a predetermined direction from the character pattern; a line segment extraction unit that extracts feature data from the generated line segment pattern; and a strike-out line determination unit that analyzes the feature data of the line segment pattern to determine whether a strike-out line is present.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an optical character recognition (OCR) device. A character recognition device that recognizes handwritten characters is required to reduce misread characters and to relax the restrictions on how forms are filled in.

[0002]

[Prior Art] FIG. 9 is a conceptual diagram of the configuration of a conventional character recognition device (OCR). The units of the character recognition device 1 operate as follows. First, the observation unit 2 reads the characters (and character frames) written on a form A with a scanner or the like and, after photoelectric conversion, converts them into binary image data. The character segmentation unit 3 separates the form image into individual characters. The preprocessing unit 4 performs noise removal, size normalization, and the like on each segmented character. The feature extraction unit 5 extracts feature values for character recognition. The dictionary unit 6 stores a dictionary of feature values for each character type. The dictionary matching unit 7 matches the extracted feature values against the dictionary and takes the character type with the closest feature values as the candidate, and the result output unit 8 outputs the recognition result B accordingly.

[0003] There are various feature extraction methods; FIG. 10 shows one example. In this method, as shown in FIG. 10(1), the pixels on the contour of the character's pixel pattern are extracted and each is assigned a direction code indicating the direction of the contour. In this example there are four directions: horizontal, right-rising diagonal, vertical, and left-rising diagonal. The pixels are counted for each direction, and the sequence of counts is treated as a feature vector (see FIG. 10(2)). The dictionary unit 6 stores vector values for each character type obtained in the same way. The distance (feature distance) between the vector of the input character and each dictionary vector is calculated, and candidates are ranked in order of increasing distance. If the distance to the candidate characters is large, or if the difference between the first and second candidates is small, the character may be treated as unrecognizable.
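The matching and rejection step described above can be sketched in Python as follows. This is an illustrative sketch only: the source does not name the distance measure, so Euclidean distance is assumed, and the function name `match`, the parameters `max_distance` and `min_margin`, and the sample vectors are hypothetical.

```python
import math

def match(features, dictionary, max_distance, min_margin):
    """Rank dictionary entries by feature distance and pick the closest.

    features:     feature vector of the input character (e.g. per-direction counts).
    dictionary:   {character type: feature vector}.
    max_distance: assumed limit above which the best candidate is too far away.
    min_margin:   assumed minimum gap between first and second candidates.
    Returns (character type or None if unrecognizable, ranked candidates).
    """
    # Euclidean distance is an assumption; the source only says "feature distance".
    ranked = sorted((math.dist(features, vec), ch) for ch, vec in dictionary.items())
    best_distance, best_char = ranked[0]
    too_far = best_distance > max_distance
    ambiguous = len(ranked) > 1 and ranked[1][0] - best_distance < min_margin
    if too_far or ambiguous:
        return None, ranked
    return best_char, ranked

# Made-up dictionary of 4-direction count vectors.
SAMPLE_DICTIONARY = {"0": (10, 4, 10, 4), "1": (2, 1, 14, 1), "8": (10, 6, 10, 6)}
```

With these made-up vectors, an input equal to the "1" entry is accepted, while an input midway between "0" and "8" is returned as unrecognizable because the two best distances are too close.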

[0004] Some conventional OCR systems provide a reject function whereby a character string entered by mistake is not read if a line segment (strike-out line) is drawn through it (Japanese Unexamined Patent Publication No. Sho 61-36874). However, this technique is effective only on the condition that the line segment is "sufficiently long" compared with the width of one character; it is not effective for a strike-out line that corrects only a single character and whose length is about one character width.

[0005] When a single character is to be struck out, a rule is used such as filling in the whole character frame or drawing a long horizontal line across the character frame. If the rule is followed, the struck-out character is clearly separated in feature distance from genuine characters and can be distinguished; however, the rule is a burden on the writer and is often not followed. FIG. 11 shows an example of a form filled in with handwritten characters. Consequently, a considerable proportion of characters that the writer intended to strike out may be misread instead of rejected. For example, a "0" struck out with a short horizontal bar is mistaken for an "8". This is because the feature distance between a struck-out "0" and an "8" is small, so the two are not easy to distinguish.

[0006] To avoid this problem with the conventional method, it is necessary not only to determine which genuine character the pattern corresponds to but also to distinguish it from struck-out characters, and many features must be extracted to increase the feature distance. The feature dictionary also becomes large.

[0007]

[Problem to Be Solved by the Invention] Focusing on the fact that strike-out lines most often take the form of horizontal, vertical, or diagonal line segments, the present invention aims to improve character recognition accuracy by determining, for each character pattern segmented character by character from the image of a filled-in form, whether a strike-out line is present, and rejecting the character when one is found.

[0008]

[Means for Solving the Problem] FIG. 1 shows the principle configuration of the character recognition device of the present invention. The observation unit 2, character segmentation unit 3, preprocessing unit 4, feature extraction unit 5, dictionary unit 6, dictionary matching unit 7, and result output unit 8 may be the same as in the conventional device. These parts may also be configured differently from what the figure shows.

[0009] Invention of claim 1: The device comprises a line segment pattern generation unit 91 that extracts a pattern of line segments in a predetermined direction from the character pattern, a line segment extraction unit 92 that extracts feature data from the generated line segment pattern, and a strike-out line determination unit 93 that analyzes the feature data of the line segment pattern to determine whether a strike-out line is present. The line segment pattern generation unit 91 picks out of the character pattern the pattern corresponding to line segments in a predetermined direction, for example the horizontal direction, and generates a line segment pattern. This generation is performed, for example, as in claim 2 below.

[0010] The line segment extraction unit 92 extracts feature data from the generated line segment pattern. The feature data may be extracted by the histogram method of claim 3, by the same method used to extract feature data from a character pattern, or by some other method. The strike-out line determination unit 93 analyzes the feature data to determine whether a line segment that is not part of the character pattern is present. If there is such a line segment, it is regarded as a strike-out line and the character is deleted by the result output unit.

[0011] Invention of claim 2: The line segment pattern generation unit 91 moves a rectangular extraction window, elongated in the extraction target direction, over the character pattern; when the number of black pixels in the extraction window is equal to or less than a predetermined value, it converts all pixels in the window to white pixels, and when the number exceeds the predetermined value, it converts all pixels to black pixels, thereby generating a line segment pattern.

[0012] FIG. 2 illustrates line segment extraction. If the elongated extraction window contains many black pixels, that part is likely to be a line segment in the window's direction; if it contains few black pixels, that part is unlikely to be a line segment in that direction. Therefore, by generating a pattern in which the window contents are replaced by black or white pixels accordingly, as shown in FIG. 2(a), the pixels that make up line segments in the direction of the extraction window are extracted, as in FIG. 2(b).

[0013] Invention of claim 3: The strike-out line determination unit 93 scans the line segment pattern in the extraction target direction to generate a pixel-count histogram, and determines whether a strike-out line is present from the width and height of the peaks located in a predetermined region of the generated histogram.

[0014] FIG. 3 illustrates the relationship between characters and line segment extraction. Not all pixels in the line segment pattern originate from a strike-out line, so those most likely to belong to a strike-out line are picked out. For example, many characters have horizontal line segments close to the upper or lower boundary (see FIG. 3(2)), whereas strike-out lines rarely lie near the upper or lower boundary; when evaluating the horizontal histogram, accuracy is therefore higher if peaks close to the upper and lower boundaries are excluded.

[0015] Invention of claim 4: The line segment extraction unit 92 extracts feature data from the line segment pattern in the same way as feature data is extracted from a character pattern, and the strike-out line determination unit 93 analyzes the extracted feature data to determine whether a strike-out line is present.

[0016] Invention of claim 5: When analyzing the feature data of the line segments extracted by the line segment extraction unit 92, the strike-out line determination unit 93 determines whether a strike-out line is present using a determination criterion set according to the character type recognized from the feature data of the input character pattern.

[0017] For example, the patterns "0" and "8" have no peak exceeding the threshold in the histogram obtained by horizontal scanning, whereas, as shown in FIG. 3(2), the standard pattern of "4" has one peak exceeding the threshold near the center. Therefore, if a histogram peak exists near the center of a pattern recognized as "0" or "8", a strike-out line is very likely present; but even if a histogram peak exists near the center of a pattern recognized as "4", a strike-out line is unlikely. Accuracy is therefore higher if the criterion for the presence of a strike-out line is varied by character type.

[0018] Invention of claim 6: The line segment pattern generation unit 91 generates a plurality of line segment patterns, the line segment extraction unit 92 extracts line segment feature data for each line segment pattern, and the strike-out line determination unit 93 detects the presence or absence of a strike-out line in any of the directions.

[0019] The vertical, right-rising diagonal, and left-rising diagonal directions can be handled in the same way as the horizontal direction. If a rule can be imposed that strike-out lines must be horizontal, processing can be sped up by detecting only the horizontal direction.

[0020] Invention of claim 7: The line segment pattern generation unit 91 rotates the character pattern by a predetermined angle, if necessary, before generating the line segment pattern. For example, if the pattern is rotated in 45-degree steps and processed four times, the diagonal and vertical directions can be handled simply by repeating the same subsequent processing.

[0021] Invention of claim 8: A character recognition program that causes a character recognition device to extract a pattern of line segments in a predetermined direction from a character pattern, extract feature data of the extracted line segment pattern, and analyze the feature data of the line segment pattern to determine whether a strike-out line is present is recorded on a computer-readable recording medium.

[0022] With the above configuration, a strike-out line that the writer drew to indicate a writing error can be properly identified and the character removed. Characters that should have been cancelled are therefore less often misrecognized as other characters, and accuracy is improved.

[0023]

[Embodiments of the Invention] An example of an optical character recognition (OCR) device to which the present invention is applied is described below. FIG. 4 is a block diagram of a character recognition device that mainly targets handwritten numerals and allows them to be struck out with a line segment about one character width long.

[0024] FIG. 5 shows a flowchart of the overall processing of this embodiment.

s1 to s6: The character recognition device 1 comprises an observation unit 2 that reads the characters (and character frames) written on a form with a scanner or the like and, after photoelectric conversion, converts them into binary image data; a character segmentation unit 3 that separates the form image into individual characters by a conventional method; a preprocessing unit 4 that performs noise removal, size normalization, and the like on each segmented character; a feature extraction unit 5 that extracts feature values for character recognition; a dictionary unit 6 that stores a dictionary of feature values for each character type; and a dictionary matching unit 7 that matches the extracted feature values against the dictionary and outputs an intermediate recognition result. Up to this point the device is the same as the conventional one.

[0025] s7 to s10: The intermediate recognition result and the normalized character pattern generated by the preprocessing unit are sent to the line segment pattern generation unit 91 and the strike-out line determination unit 93 for strike-out processing. Strike-out determination is performed by detecting line segments in the horizontal, vertical, right-rising diagonal, and left-rising diagonal directions in the normalized character pattern. A threshold file 94 holds determination thresholds set for each character type and each direction; the threshold set corresponding to the character type ranked first in the intermediate recognition result is retrieved, and the presence or absence of a strike-out line is determined on that basis.

[0026] If the character is determined to be struck out, the character type code of the recognition result is replaced with a reject code and output as the final recognition result. If the character is determined to be an ordinary character without a strike-out line, the intermediate recognition result is output unchanged as the final recognition result.

[0027] The following description centers on the strike-out determination process. Strike-out processing is performed by the line segment pattern generation unit 91, line segment extraction unit 92, strike-out line determination unit 93, and threshold file 94 shown in FIG. 4, together with a line segment pattern rotation unit (not shown).

[0028] The line segment pattern generation unit 91 extracts line segment patterns from the normalized character pattern as follows. FIG. 6 shows a flowchart of the strike-out processing.

[0029] (1) Extraction of vertical, horizontal, right-rising diagonal, and left-rising diagonal line segments

The strike-out lines to be detected are those formed by vertical, horizontal, or diagonal line segments. Here, the diagonal directions are the directions of the diagonals of the character's circumscribed rectangle. FIG. 7 shows the circumscribed rectangle and the directions of the extracted line segments. The thin frames in FIG. 7(1) and (2) are the circumscribed rectangle of the character; in FIG. 7(1), line 1 is the horizontal direction, 2 the left-rising diagonal, 3 the vertical direction, and 4 the right-rising diagonal.

[0030] Line segment extraction is described below using the horizontal direction as an example. To extract the other directions, the angle of the diagonal of the circumscribed rectangle is calculated, the coordinates of the character pattern are transformed so that the pattern is rotated by that angle (see FIG. 7(2)), and the same processing is applied. Alternatively, the pattern may be rotated successively by fixed angles of 45, 90, and 135 degrees.
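As a rough illustration of the rotation step, the following Python sketch computes the diagonal angle of a circumscribed rectangle and rotates pixel coordinates so that a line along that diagonal becomes horizontal. The function names are hypothetical, rotation about the origin is an assumption, and resampling the rotated coordinates back onto a pixel grid is left out.

```python
import math

def diagonal_angle(lw, lh):
    """Angle in radians between the horizontal and the diagonal of a
    circumscribed rectangle of width lw and height lh."""
    return math.atan2(lh, lw)

def rotate_points(points, angle):
    """Rotate (x, y) points about the origin by -angle, so that a line
    inclined at `angle` to the x axis ends up horizontal."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]
```

For a rectangle 30 pixels wide and 40 pixels high, the end of the diagonal at (30, 40) is mapped onto the x axis at a distance of 50, the diagonal's length. Passing the negated angle handles the other diagonal in the same way.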

[0031] Within the character pattern region, consider a horizontally long rectangular window of height n and width m (n < m). The black pixels inside this window are counted, and the ratio B of the number of black pixels to the total number of pixels in the rectangle (n × m) is obtained.

[0032] With TH1 as the threshold for this ratio: if B > TH1, all pixels inside the rectangular window are replaced with black pixels; if B ≤ TH1, all pixels inside the rectangular window are replaced with white pixels. The window is moved over every position within the character's circumscribed rectangle to extract the horizontal line segment pattern. FIG. 2(a) shows an example with n × m = 3 × 8 and TH1 = 0.7.
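A minimal Python sketch of the window operation with ratio B and threshold TH1 follows. It assumes that the window advances one pixel at a time and that the output is the union of all windows whose ratio exceeds TH1; the source does not say how overlapping windows are combined. The function name and the list-of-lists image representation are illustrative.

```python
def extract_line_pattern(pattern, n=3, m=8, th1=0.7):
    """Return the horizontal line segment pattern of a binary image.

    pattern: list of rows, each a list of 0 (white) or 1 (black).
    n, m:    window height and width in pixels (n < m, a wide window).
    th1:     black-pixel ratio B above which the window is painted black.
    """
    height, width = len(pattern), len(pattern[0])
    out = [[0] * width for _ in range(height)]  # start from an all-white image
    for top in range(height - n + 1):
        for left in range(width - m + 1):
            black = sum(pattern[y][x]
                        for y in range(top, top + n)
                        for x in range(left, left + m))
            if black / (n * m) > th1:  # B > TH1: the whole window becomes black
                for y in range(top, top + n):
                    for x in range(left, left + m):
                        out[y][x] = 1
    return out
```

With the default 3 × 8 window, a horizontal bar three pixels thick survives unchanged, while a one-pixel-wide vertical stroke never fills more than 3 of the 24 window pixels and is removed.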

[0033] (2) Counting the line segments in each direction

A pixel-count histogram is generated from the line segment pattern obtained above, and the line segments within a predetermined region are counted. Again, horizontal line segments are used as the example.

[0034] The horizontal line segment pattern is scanned horizontally and the black pixels are counted to create a black-pixel-count histogram. However, as shown in FIG. 3(3), for numerals such as "5" and "2" the topmost or bottommost stroke of the character is often detected as a horizontal line segment, so the histogram is created excluding a band of width LE = LH × TH2 from the top and from the bottom of the character, where LH is the height of the circumscribed rectangle. TH2 is the ratio of the width of the excluded band to the height of the circumscribed rectangle. Alternatively, the histogram may be created over the whole pattern and the bands excluded at determination time, or the bands may be excluded when the line segment pattern is generated. FIG. 6 shows such an example.

[0035] Letting LW be the width of the circumscribed rectangle of the character pattern, the histogram is cut at the position LC that satisfies LC = LW × TH3. TH3 is the ratio of the cut-off width to the width LW of the circumscribed rectangle.

[0036] For each peak portion remaining after the cut (the checkered portion in FIG. 3(1)), let H be its height above the cut position and W its width at the cut position; a peak satisfying H/W > TH4 is judged to be a horizontal line segment.

[0037] A character for which the number of horizontal line segments thus found is equal to or greater than the threshold TH5 is determined to be a character struck out by a horizontal line segment. That is, with LH the height of the circumscribed rectangle of the pattern, a peak for which (H/W > TH4) and (its position y lies in LE ≤ y ≤ LH − LE) is a horizontal strike-out line candidate.
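The histogram test with LE, LC, H, W and the thresholds TH2 to TH5 might be sketched in Python as follows. This is one possible reading, not a definitive implementation: a peak is taken to be a run of consecutive rows whose count exceeds LC, W is the number of rows in the run, H is the largest count in the run minus LC, and rows are indexed from 0 so that the kept band is LE ≤ y < LH − LE. The function names are hypothetical.

```python
def count_strikeout_candidates(line_pattern, th2, th3, th4):
    """Count histogram peaks that look like horizontal line segments."""
    lh = len(line_pattern)       # LH: height of the circumscribed rectangle
    lw = len(line_pattern[0])    # LW: width of the circumscribed rectangle
    le = int(lh * th2)           # LE: rows ignored at the top and at the bottom
    lc = lw * th3                # LC: level at which the histogram is cut

    # Black-pixel count per row; rows inside the excluded bands count as 0.
    hist = [sum(row) if le <= y < lh - le else 0
            for y, row in enumerate(line_pattern)]

    candidates = 0
    y = 0
    while y < lh:
        if hist[y] > lc:
            start = y
            while y < lh and hist[y] > lc:
                y += 1
            w = y - start                  # W: peak width at the cut (assumed: rows)
            h = max(hist[start:y]) - lc    # H: peak height above the cut
            if h / w > th4:
                candidates += 1
        else:
            y += 1
    return candidates

def has_strikeout(line_pattern, th2, th3, th4, th5):
    """True when at least TH5 candidate line segments are found."""
    return count_strikeout_candidates(line_pattern, th2, th3, th4) >= th5
```

Under this reading a thin full-width line in the middle of the pattern gives a tall, narrow peak and counts as a candidate, whereas a thick filled band gives a low H/W ratio and does not.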

[0038] If the number of horizontal strike-out line candidates ≥ TH5, the character is determined to be struck out by a horizontal line segment.

(3) The thresholds TH1 to TH5 are set to optimal values for each extraction direction and each character type and are prepared as a threshold file. The set of thresholds to be used is selected according to the character type given by the primary recognition result for the character under examination. Vertical, right-rising diagonal, and left-rising diagonal line segments are processed in the same way. The search region for strike-out line candidates is the circumscribed rectangle for the horizontal and vertical directions; for the diagonal directions it may be a rectangle containing the diagonal of the circumscribed rectangle.
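One way to picture the threshold file and the final accept-or-reject decision is the Python sketch below. Everything concrete in it is an assumption made for illustration: the table layout, the reject code `"?"`, the numeric values (the source gives only the FIG. 2 example TH1 = 0.7), and the `count_candidates` callback that stands in for the line segment processing of one direction.

```python
from typing import Callable, Dict, NamedTuple, Tuple

REJECT = "?"  # hypothetical reject code

class ThresholdSet(NamedTuple):
    th1: float  # black-pixel ratio for the extraction window
    th2: float  # excluded band as a fraction of LH
    th3: float  # histogram cut level as a fraction of LW
    th4: float  # minimum H/W of a peak
    th5: int    # minimum number of candidate line segments

# Placeholder values keyed by (character type, direction); not from the source.
THRESHOLD_FILE: Dict[Tuple[str, str], ThresholdSet] = {
    ("0", "horizontal"): ThresholdSet(0.7, 0.2, 0.5, 2.0, 1),
    ("4", "horizontal"): ThresholdSet(0.7, 0.2, 0.5, 2.0, 2),
}

def final_result(first_candidate: str,
                 count_candidates: Callable[[str, ThresholdSet], int],
                 table: Dict[Tuple[str, str], ThresholdSet] = THRESHOLD_FILE) -> str:
    """Return REJECT if any direction listed for the first-candidate
    character type yields at least TH5 candidates, else the character."""
    for (char_type, direction), ts in table.items():
        if char_type == first_candidate and count_candidates(direction, ts) >= ts.th5:
            return REJECT
    return first_candidate
```

In this made-up table a single horizontal candidate rejects a character recognized as "0", while a character recognized as "4", which normally has one horizontal stroke of its own, is rejected only when two candidates are found. Character types with no entry are passed through unchanged, which is a choice of this sketch.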

[0039] FIG. 8 shows examples of characters determined by the present invention to be struck out.

[0040]

[Effects of the Invention] As described above, the present invention makes it possible to properly identify characters that have been struck out, and reduces misreading of characters struck out with a line segment in a fixed direction.

[Brief description of the drawings]

FIG. 1 is a principle configuration diagram.

FIG. 2 is an explanatory diagram of line segment extraction.

FIG. 3 shows examples of characters and line segment extraction.

FIG. 4 is a configuration block diagram of the embodiment.

FIG. 5 is a flowchart of the overall processing.

FIG. 6 is a flowchart of the strike-out processing.

FIG. 7 is an explanatory diagram of line segment extraction.

FIG. 8 shows examples of strike-out determination.

FIG. 9 is a conceptual diagram of the configuration of a character recognition device.

FIG. 10 is an explanatory diagram of character feature extraction.

FIG. 11 shows an example of a handwritten form.

[Explanation of symbols]

1 Character recognition device
2 Observation unit
3 Character segmentation unit
4 Preprocessing unit
5 Feature extraction unit
6 Dictionary unit
7 Dictionary matching unit
8 Result output unit
91 Line segment pattern generation unit
92 Line segment extraction unit
93 Strike-out line determination unit
94 Threshold file

Claims

[Claims]

1. A character recognition device that recognizes characters by extracting feature data from an input character pattern and comparing it with dictionary data, comprising: a line segment pattern generation unit that extracts a pattern of line segments in a predetermined direction from the character pattern; a line segment extraction unit that extracts feature data from the generated line segment pattern; and a strike-out line determination unit that analyzes the feature data of the line segment pattern to determine whether a strike-out line is present.

2. The character recognition device according to claim 1, wherein the line segment pattern generation unit moves a rectangular extraction window, elongated in the extraction target direction, over the character pattern, and generates the line segment pattern by converting all pixels in the extraction window to white pixels when the number of black pixels in the window is equal to or less than a predetermined value, and to black pixels when the number exceeds the predetermined value.

3. The character recognition device according to claim 1, wherein the line segment extraction unit scans the line segment pattern in the extraction target direction to generate a pixel-count histogram, and the strike-out line determination unit determines whether a strike-out line is present from the width and height of a peak located in a predetermined region of the generated histogram.

4. The character recognition device according to claim 1, wherein the line segment extraction unit extracts feature data from the line segment pattern in the same way as feature data is extracted from a character pattern, and the strike-out line determination unit analyzes the extracted feature data to determine whether a strike-out line is present.

5. The character recognition device according to claim 1, wherein, when analyzing the feature data of the line segments extracted by the line segment extraction unit, the strike-out line determination unit determines whether a strike-out line is present using a determination criterion set according to the character type recognized from the feature data of the input character pattern.

6. The character recognition device according to claim 1, wherein the line segment pattern generation unit generates a plurality of line segment patterns, the line segment extraction unit extracts line segment feature data for each line segment pattern, and the strike-out line determination unit detects the presence or absence of a strike-out line in any of the directions.

7. The character recognition device according to claim 1, wherein the line segment pattern generation unit generates the line segment pattern after rotating the character pattern by a predetermined angle.

8. A computer-readable recording medium on which is recorded a character recognition program that causes a character recognition device, which recognizes characters by extracting feature data from an input character pattern and comparing it with dictionary data, to function as: a line segment pattern generation unit that extracts a pattern of line segments in a predetermined direction from the character pattern; a line segment extraction unit that extracts feature data of the extracted line segment pattern; and a strike-out line determination unit that analyzes the feature data of the line segment pattern to determine whether a strike-out line is present.