JPH0576671B2

Info

Publication number: JPH0576671B2
Application number: JP60106405A
Authority: JP
Inventors: Masahiro Shimizu; Mariko Takenochi
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1985-05-17
Filing date: 1985-05-17
Publication date: 1993-10-25
Also published as: JPS61262985A

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a character recognition device that recognizes printed and handwritten characters in newspapers, magazines, and the like, and converts them into information such as JIS codes.

[Prior Art]

Conventional character recognition devices target documents with clear character spacing, that is, documents for which the absolute position on the paper of the text to be read is known in advance, and this restricted the documents that could be processed. To solve this problem, a character string to be recognized was cut out of the input document as a rectangle of width $W$ and height $H$, and individual character patterns were cut out of the character string by exploiting the fact that the height-to-width ratio of a character is approximately "1". (See, for example, Akiyama, Naito, and Masuda, "Method for extracting individual characters from vertically and horizontally written documents," IEICE Technical Report PRL83-7, published by the Institute of Electronics and Communication Engineers.)

[Problems to Be Solved by the Invention]

In practice, however, many characters have an aspect ratio that is not close to "1", and methods that cut out individual characters using the height of the character string as the reference produced errors in cutting out individual characters.

An object of the present invention is to provide a character recognition device that can cut out individual characters from a character string and recognize them even when the aspect ratio of the characters is not close to "1".

[Means for Solving the Problems]

The character recognition device of the present invention comprises: an image input section that inputs an image containing the characters to be recognized; a character string cutting section that cuts out the character string to be recognized from the image input by the image input section as a rectangle of width $W$ and height $H$; a sub-character pattern extracting section that scans the rectangle in the direction perpendicular to the character string direction to obtain a histogram of the pixels forming characters, and extracts sub-character patterns, each consisting of a continuous run of character portions in which the histogram value is "1" or more; an individual character pattern extracting section that finds the maximum value $W_j$ among the sub-character pattern widths $W_i$ satisfying $|W_i - H| \le \alpha$ ($\alpha$: constant), takes $W_j$ as the reference width $W_s$ of a character pattern, combines adjacent sub-character patterns, and treats the combination as an individual character pattern when its width $W_p$ satisfies $|W_p - W_s| \le \beta$ ($\beta$: constant); and a recognition section that calculates features of the character patterns obtained by the individual character pattern extracting section and compares those features with a dictionary to extract recognition candidate characters.
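The histogram, sub-character pattern, and reference-width steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation. The function names, the list-of-lists image representation (1 = character pixel), the assumption of a horizontally written string, and the choice to return `None` when no width qualifies are all assumptions made for the example; the text does not say what happens in that last case.

```python
from typing import List, Optional, Tuple


def column_histogram(rect: List[List[int]]) -> List[int]:
    """Count character pixels (value 1) in each column of the cut-out rectangle."""
    return [sum(column) for column in zip(*rect)]


def extract_sub_patterns(hist: List[int]) -> List[Tuple[int, int]]:
    """Return (start, width) for every maximal run of columns whose count is >= 1."""
    runs = []
    start = None
    for x, count in enumerate(hist):
        if count >= 1 and start is None:
            start = x                      # a run of character columns begins
        elif count < 1 and start is not None:
            runs.append((start, x - start))  # the run ended at column x - 1
            start = None
    if start is not None:                  # run reaching the right edge
        runs.append((start, len(hist) - start))
    return runs


def reference_width(widths: List[int], height: int, alpha: int) -> Optional[int]:
    """Largest width W_i with |W_i - H| <= alpha, or None if none qualifies (assumed)."""
    candidates = [w for w in widths if abs(w - height) <= alpha]
    return max(candidates) if candidates else None


# Toy 3-row rectangle (H = 3); the pixel values are invented for illustration.
rect = [
    [1, 1, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 0, 1, 1, 1],
]
hist = column_histogram(rect)
runs = extract_sub_patterns(hist)
w_s = reference_width([width for _, width in runs], height=len(rect), alpha=1)
print(hist, runs, w_s)
```

In this toy input the width-1 sub-pattern is excluded from the reference-width candidates because it differs from the height by more than `alpha`.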

[Operation]

With this configuration, the character string cut out as a rectangle is scanned in the direction perpendicular to the character string direction to obtain a histogram; character breaks are detected from the histogram to obtain the sub-character patterns that are the constituent elements of character patterns; the maximum width of the sub-character patterns in the character string is taken as the reference width of a character pattern; and sub-character patterns are combined on the basis of that reference width to extract individual character patterns. As a result, even characters whose aspect ratio is not close to "1" can be cut out accurately and recognized.
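The combining step can be sketched as below. The text states only the acceptance condition (the combined width must lie within $\beta$ of the reference width); the greedy left-to-right search that keeps the widest acceptable combination, and the fallback of emitting a lone sub-pattern when no combination fits, are assumptions of this sketch, as are the function name and the toy coordinates.

```python
from typing import List, Tuple


def merge_sub_patterns(
    runs: List[Tuple[int, int]], w_s: int, beta: int
) -> List[Tuple[int, int]]:
    """Combine adjacent (start, width) sub-patterns into individual characters.

    The span from the left edge of the first sub-pattern to the right edge of
    the last one equals the sum of their widths plus the gaps between them.
    A combination is accepted when |span - w_s| <= beta.
    """
    characters = []
    i = 0
    while i < len(runs):
        start = runs[i][0]
        best_end = i                       # assumed fallback: the sub-pattern alone
        for j in range(i, len(runs)):
            span = runs[j][0] + runs[j][1] - start
            if span > w_s + beta:          # span only grows with j, so stop here
                break
            if abs(span - w_s) <= beta:
                best_end = j               # keep the widest acceptable combination
        last_start, last_width = runs[best_end]
        characters.append((start, last_start + last_width - start))
        i = best_end + 1
    return characters


# Eight invented sub-patterns; reference width 30, tolerance 3.
runs = [(0, 30), (34, 12), (50, 14), (68, 29),
        (101, 13), (118, 14), (136, 28), (168, 10)]
characters = merge_sub_patterns(runs, w_s=30, beta=3)
print(characters)
```

Here the two narrow pairs are each merged into one character, while the final narrow sub-pattern (for instance a punctuation mark) is kept on its own by the assumed fallback.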

[Embodiment]

An embodiment of the present invention is described below with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram of an embodiment of the character recognition device according to the present invention. Image input section 1 scans an image containing the characters to be recognized, inputs the image as a binary signal, and stores it in image memory 2. Character string cutting section 3 scans image memory 2 and cuts out a character string as a rectangle. Sub-character pattern extracting section 4 scans the character string cut out by character string cutting section 3 perpendicular to the string direction, obtains a histogram of the character portions, and extracts the sub-character patterns that are the constituent elements of character patterns. Individual character pattern extracting section 5 determines individual character patterns from combinations of the sub-character patterns extracted by sub-character pattern extracting section 4. Recognition section 6 obtains features such as strokes for each character pattern extracted by individual character pattern extracting section 5, compares them with the character features registered in advance in dictionary 7, and takes the most similar character as the recognition candidate character. Display section 8 displays the recognition result obtained by recognition section 6.
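The dictionary comparison performed by recognition section 6 might look like the following sketch. The text says only that the most similar registered character becomes the candidate; the squared Euclidean distance used here, the three-element feature tuples (stroke count, mean position, mean length), and all numeric values are assumptions invented for illustration.

```python
from typing import Dict, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_candidate(features: Sequence[float],
                   dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary character whose features are closest to `features`."""
    return min(dictionary, key=lambda ch: squared_distance(features, dictionary[ch]))


# Hypothetical dictionary: (stroke count, mean stroke position, mean stroke length).
dictionary = {
    "文": (4, 0.50, 0.60),
    "一": (1, 0.50, 1.00),
    "十": (2, 0.50, 0.90),
}
candidate = best_candidate((4, 0.45, 0.55), dictionary)
print(candidate)
```

A practical system would probably return several ranked candidates rather than a single one, but the text describes only the most similar character being selected.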

The character recognition device configured in this way is described in detail below, taking the input image shown in FIG. 2 as an example.

An image such as that shown in FIG. 2, input from image input section 1, is binarized and stored in image memory 2. Character string cutting section 3 cuts out a character string whose absolute position is determined in advance from the input image stored in image memory 2, using a rectangle R as shown in FIG. 3a.

Next, sub-character pattern extracting section 4 scans the character string cut out by rectangle R in the direction perpendicular to the string direction, obtains the histogram of the character portions as shown in FIG. 3b, cuts out the sub-character patterns formed by continuous character portions, and obtains the width $W_i$ ($i = 1, 2, \ldots, 8$) of each sub-character pattern. FIG. 3c shows the cut-out sub-character patterns $P_{s1}, P_{s2}, \ldots, P_{s8}$.

Individual character pattern extracting section 5 takes the sub-character patterns extracted by sub-character pattern extracting section 4 and, among the widths $W_i$ that satisfy $|W_i - H| \le \alpha$ ($\alpha$: constant) with respect to the height $H$ of the character string cut out by rectangle R, finds the maximum value $W_j$. The corresponding pattern $P_j$ becomes the reference character pattern, and $W_j$ becomes the reference width of a character pattern. In FIG. 3b, for example, $W_1$ is the maximum, so the reference character pattern is $P_1$. Adjacent sub-character patterns are then combined. When the sub-character pattern widths $W_i$ and the gaps $b_i$ between sub-character patterns satisfy

$$\left|\sum_i W_i + \sum_i b_i - W_1\right| \le \beta \quad (\beta\text{: constant}),$$

where the sums run over the sub-character patterns being combined and the gaps between them, the adjacent sub-character patterns are merged into one individual character pattern. The individual character patterns $P_1, P_2, \ldots, P_6$ are determined in this way, as shown in FIG. 4.

For each individual character pattern $P_i$ obtained by individual character pattern extracting section 5, recognition section 6 checks whether $M$ or more pixels, including the pixel of interest, are connected in each of the directions indicated by the arrows in FIG. 5b, and sets a direction code accordingly. It then examines the connectivity of the pixels for each direction code to extract strokes, and extracts features such as the number, position, and length of the strokes. FIG. 5a shows the stroke extraction result for the character 「文」. The extracted features are compared with the features registered in dictionary 7, and the most similar character is taken as the recognition candidate character and displayed on display section 8.
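The direction-code test used by recognition section 6 can be illustrated with the sketch below. FIG. 5b is not reproduced here, so the set of four directions (horizontal, vertical, and the two diagonals), the code numbering, and the function names are assumptions; only the rule "M or more connected pixels including the pixel of interest" comes from the text. The later step of linking equally coded pixels into strokes is not shown.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical direction codes mapped to (dy, dx) steps; FIG. 5b may differ.
DIRECTIONS = {1: (0, 1), 2: (1, 1), 3: (1, 0), 4: (1, -1)}


def run_length(img: List[List[int]], y: int, x: int, dy: int, dx: int) -> int:
    """Length of the run of 1-pixels through (y, x) along +/-(dy, dx), including (y, x)."""
    height, width = len(img), len(img[0])
    length = 1
    for sign in (1, -1):                   # walk both ways from the pixel of interest
        cy, cx = y + sign * dy, x + sign * dx
        while 0 <= cy < height and 0 <= cx < width and img[cy][cx]:
            length += 1
            cy += sign * dy
            cx += sign * dx
    return length


def direction_codes(img: List[List[int]], m: int) -> Dict[Tuple[int, int], Set[int]]:
    """Map each character pixel to the codes of directions whose run has >= m pixels."""
    codes = {}
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if value:
                codes[(y, x)] = {
                    code for code, (dy, dx) in DIRECTIONS.items()
                    if run_length(img, y, x, dy, dx) >= m
                }
    return codes


# Toy 5x5 plus sign: one horizontal and one vertical stroke crossing at the center.
img = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
codes = direction_codes(img, m=3)
print(codes[(2, 2)], codes[(2, 0)], codes[(0, 2)])
```

With `m=3`, pixels on the horizontal bar receive code 1, pixels on the vertical bar receive code 3, and the crossing pixel receives both, which is the information a stroke extractor would need to separate the two strokes.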

[Effects of the Invention]

As described above, the character recognition device of the present invention comprises an image input section, a character string cutting section, a sub-character pattern extracting section, an individual character pattern extracting section, and a recognition section. It scans a character string cut out as a rectangle in the direction perpendicular to the character string direction to obtain a histogram, detects character breaks from the histogram to obtain the sub-character patterns that are the constituent elements of character patterns, takes the maximum width of the sub-character patterns in the character string as the reference width of a character pattern, and combines sub-character patterns on the basis of that reference width to extract individual character patterns. Consequently, individual character patterns can be extracted accurately from the character string to be recognized even when the aspect ratio of the character patterns is not close to "1", and the accuracy of character recognition can be improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the character recognition device of the present invention; FIG. 2 shows an example of an input image; FIG. 3 illustrates the operation of cutting out sub-character patterns from a character string; FIG. 4 illustrates the result of cutting out individual character patterns; and FIG. 5 illustrates the character recognition method.

1: image input section, 2: image memory, 3: character string cutting section, 4: sub-character pattern extracting section, 5: individual character pattern extracting section, 6: recognition section, 7: dictionary, 8: display section.

Claims

1. A character recognition device comprising: an image input section that inputs an image containing characters to be recognized; a character string cutting section that cuts out a character string to be recognized from the image input by the image input section as a rectangle of width $W$ and height $H$; a sub-character pattern extracting section that scans the rectangle in the direction perpendicular to the character string direction to obtain a histogram of the pixels forming characters, and extracts sub-character patterns each consisting of continuous character portions in which the histogram value is "1" or more; an individual character pattern extracting section that finds the maximum value $W_j$ among the widths $W_i$ of the sub-character patterns obtained by the sub-character pattern extracting section that satisfy $|W_i - H| \le \alpha$ ($\alpha$: constant), takes $W_j$ as the reference width $W_s$ of a character pattern, combines adjacent sub-character patterns among the sub-character patterns, and takes the combined sub-character patterns as an individual character pattern when the width $W_p$ of the combination satisfies $|W_p - W_s| \le \beta$ ($\beta$: constant); and a recognition section that calculates features of the character patterns obtained by the individual character pattern extracting section and compares those features with a dictionary to extract recognition candidate characters.