JPH0576671B2 - - Google Patents

Info

Publication number
JPH0576671B2
Authority
JP
Japan
Prior art keywords
character
sub
character pattern
width
individual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP60106405A
Other languages
Japanese (ja)
Other versions
JPS61262985A (en)
Inventor
Masahiro Shimizu
Mariko Takenochi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP60106405A
Publication of JPS61262985A
Publication of JPH0576671B2
Granted

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION
Field of Industrial Application
The present invention relates to a character recognition device that recognizes printed and handwritten characters, such as those in newspapers and magazines, and converts them into coded information such as JIS codes.

Prior Art
Conventional character recognition devices target documents with clearly defined character spacing, that is, documents in which the absolute position of the text on the page is known in advance, and this restricted the documents that could be handled. To address this problem, a method was used in which the character string to be recognized is cut out of the input document as a rectangle of width W and height H, and individual character patterns are cut out of the string by exploiting the fact that the height-to-width ratio of a character is approximately 1 (see, for example, Akiyama, Naito, and Masuda, "Method for extracting individual characters from vertically and horizontally written documents," IEICE Technical Report PRL83-7, Institute of Electronics and Communication Engineers).

Problems to Be Solved by the Invention
In practice, however, many characters have an aspect ratio that is not close to 1, and a method that segments individual characters using the height of the character string as the reference produced segmentation errors.

An object of the present invention is to provide a character recognition device that can cut individual characters out of a character string and recognize them even when the aspect ratio of the characters is not close to 1.

Means for Solving the Problems
The character recognition device of the present invention comprises: an image input section that inputs an image containing the characters to be recognized; a character string cutting section that cuts out the character string to be recognized from the input image as a rectangle of width $W$ and height $H$; a sub-character pattern extraction section that scans the rectangle in the direction perpendicular to the character string direction to obtain a histogram of the pixels forming characters, and extracts sub-character patterns, each consisting of a continuous run of character portions, where a character portion is a position at which the histogram value is 1 or more; an individual character pattern extraction section that finds the maximum value $W_j$ among the sub-character pattern widths $W_i$ satisfying $|W_i - H| \le \alpha$ ($\alpha$: constant), takes $W_j$ as the reference width $W_s$ of a character pattern, and, when the width $W_p$ of a combination of adjacent sub-character patterns satisfies $|W_p - W_s| \le \beta$ ($\beta$: constant), takes the combined sub-character patterns as an individual character pattern; and a recognition section that computes features of each character pattern obtained by the individual character pattern extraction section and matches those features against a dictionary to extract recognition candidate characters.
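As a rough illustration of the sub-character pattern extraction described above, the following Python sketch builds the histogram for a horizontally written line (so the scan perpendicular to the string direction runs down each column) and returns every run of consecutive columns whose count is 1 or more. The function names and the list-of-lists image representation are assumptions made for illustration; they do not appear in the patent.

```python
from typing import List, Tuple


def column_histogram(image: List[List[int]]) -> List[int]:
    """Count character pixels (value 1) in each column of a binary line image."""
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def extract_sub_patterns(hist: List[int]) -> List[Tuple[int, int]]:
    """Return (start, width) for each maximal run of columns with count >= 1."""
    runs: List[Tuple[int, int]] = []
    start = None  # start column of the run currently being scanned, if any
    for x, count in enumerate(hist):
        if count >= 1 and start is None:
            start = x                       # a character portion begins
        elif count < 1 and start is not None:
            runs.append((start, x - start))  # a character break ends the run
            start = None
    if start is not None:                   # run that touches the right edge
        runs.append((start, len(hist) - start))
    return runs
```

Each returned width corresponds to one $W_i$ in the text, and the distance between the end of one run and the start of the next is the gap between sub-character patterns.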

Operation
With this configuration, the character string cut out as a rectangle is scanned in the direction perpendicular to the character string direction to obtain a histogram; character breaks are detected from the histogram to obtain the sub-character patterns that are the constituent elements of character patterns; the maximum width of the sub-character patterns in the character string is taken as the reference width of a character pattern; and individual character patterns are extracted by combining sub-character patterns on the basis of that reference width. As a result, even characters whose aspect ratio is not close to 1 can be cut out accurately and recognized.
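The choice of the reference width can be sketched in a few lines of Python. This is a minimal sketch assuming that widths and the line height are given in pixels; the function name is hypothetical, and returning `None` when no width lies within the tolerance of the line height is an assumption, since the patent does not say what happens in that case.

```python
from typing import Optional, Sequence


def reference_width(widths: Sequence[int], height: int, alpha: int) -> Optional[int]:
    """Largest sub-character pattern width W_i with |W_i - H| <= alpha.

    Returns None when no width qualifies (a fallback assumed here,
    not specified in the patent).
    """
    candidates = [w for w in widths if abs(w - height) <= alpha]
    return max(candidates) if candidates else None
```

Restricting the maximum to widths close to the line height keeps an unusually wide run, such as two touching characters, from being chosen as the reference.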

Embodiment
An embodiment of the present invention is described below with reference to FIGS. 1 to 5.

FIG. 1 is a block diagram of an embodiment of the character recognition device according to the present invention. Reference numeral 1 denotes an image input section, which scans an image containing the characters to be recognized, inputs it as a binary signal, and stores it in an image memory 2. Reference numeral 3 denotes a character string cutting section, which scans the image memory 2 and cuts out a character string as a rectangle. Reference numeral 4 denotes a sub-character pattern extraction section, which scans the character string cut out by the character string cutting section 3 perpendicular to the string direction, obtains a histogram of the character portions, and extracts the sub-character patterns that are the constituent elements of character patterns. Reference numeral 5 denotes an individual character pattern extraction section, which determines individual character patterns from combinations of the sub-character patterns extracted by the sub-character pattern extraction section 4. Reference numeral 6 denotes a recognition section, which obtains features such as strokes for each character pattern extracted by the individual character pattern extraction section 5, matches them against the character features registered in advance in a dictionary 7, and takes the most similar character as the recognition candidate character. Reference numeral 8 denotes a display section, which displays the recognition result obtained by the recognition section 6.
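The dictionary lookup performed by the recognition section can be pictured as a nearest-neighbour search. The sketch below is only an illustration under stated assumptions: the patent does not specify how features are encoded or how similarity is measured, so the fixed-length numeric feature vectors, the squared Euclidean distance, and the function name are all hypothetical.

```python
from typing import Dict, Sequence


def best_candidate(features: Sequence[float],
                   dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary character whose features are closest to `features`.

    Similarity is measured by squared Euclidean distance (an assumption;
    the patent only says the most similar character is selected).
    """
    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(dictionary, key=lambda ch: distance(features, dictionary[ch]))
```

For example, with a dictionary holding one feature vector per character, the function returns the key whose vector differs least from the features measured on the extracted character pattern.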

The character recognition device configured in this way is described in detail below, taking the input image shown in FIG. 2 as an example.

The image shown in FIG. 2, input from the image input section 1, is binarized and stored in the image memory 2. The character string cutting section 3 cuts out a character string whose absolute position is determined in advance from the input image stored in the image memory 2, as a rectangle R such as that shown in FIG. 3a. The sub-character pattern extraction section 4 then scans the character string cut out as rectangle R in the direction perpendicular to the string direction, obtains a histogram of the character portions as shown in FIG. 3b, cuts out the sub-character patterns formed by continuous character portions, and obtains the width $W_i$ ($i = 1, 2, \ldots, 8$) of each sub-character pattern. FIG. 3c shows the sub-character patterns $P_{s1}, P_{s2}, \ldots, P_{s8}$ that are cut out.

The individual character pattern extraction section 5 considers the widths $W_i$ of the sub-character patterns extracted by the sub-character pattern extraction section 4 that satisfy $|W_i - H| \le \alpha$ ($\alpha$: constant), where $H$ is the height of the character string cut out as rectangle R, and finds the maximum value $W_j$ among them. The corresponding pattern $P_j$ is taken as the reference character pattern and $W_j$ as the reference width of a character pattern. In FIG. 3b, for example, $W_1$ is the maximum, so the reference character pattern is $P_1$. Adjacent sub-character patterns are then combined, and when the sub-character pattern widths $W_i$ and the gaps $b_i$ between sub-character patterns satisfy $\left| \sum_i W_i + \sum_i b_i - W_1 \right| \le \beta$ ($\beta$: constant), the adjacent sub-character patterns are merged into an individual character pattern. The individual character patterns $P_1, P_2, \ldots, P_6$ are determined in this way, as shown in FIG. 4.

For each individual character pattern $P_i$ obtained by the individual character pattern extraction section 5, the recognition section 6 checks whether M or more pixels, including the pixel of interest, are connected in each of the directions indicated by the arrows in FIG. 5b, and sets a direction code accordingly. For each direction code it examines the connectivity of the pixels to extract strokes, and extracts features such as the number, position, and length of the strokes. FIG. 5a shows the result of stroke extraction for the character 「文」. The extracted features are matched against the features registered in the dictionary 7, and the most similar character is taken as the recognition candidate character and displayed on the display section 8.
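The merging step performed by the individual character pattern extraction section 5 can be sketched as follows. Sub-character patterns are given as (start, width) pairs in left-to-right order, so the span from the start of the first pattern to the end of the last equals the sum of the widths plus the sum of the gaps, which is the combined width compared with the reference width. The greedy left-to-right search, the early stop once the span exceeds the reference width plus the tolerance, and the fallback of keeping an unmatched sub-character pattern on its own are assumptions for illustration; the patent states only the acceptance condition.

```python
from typing import List, Tuple


def merge_sub_patterns(runs: List[Tuple[int, int]], ws: int,
                       beta: int) -> List[Tuple[int, int]]:
    """Group adjacent (start, width) runs whose combined span is within beta of ws."""
    chars: List[Tuple[int, int]] = []
    i = 0
    while i < len(runs):
        start = runs[i][0]
        merged_to = i  # fallback (assumed): keep the run as its own character
        for j in range(i, len(runs)):
            span = runs[j][0] + runs[j][1] - start  # widths plus gaps
            if span > ws + beta:
                break                               # further merging only widens
            if abs(span - ws) <= beta:
                merged_to = j                       # acceptance condition met
                break
        end = runs[merged_to][0] + runs[merged_to][1]
        chars.append((start, end - start))
        i = merged_to + 1
    return chars
```

With a reference width of 30 and a tolerance of 4, two narrow runs of widths 10 and 12 separated by a gap of 8 are merged into one character of width 30, while a run of width 29 is accepted unchanged.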

Effects of the Invention
As described above, the character recognition device of the present invention is provided with an image input section, a character string cutting section, a sub-character pattern extraction section, an individual character pattern extraction section, and a recognition section. It scans the character string cut out as a rectangle in the direction perpendicular to the character string direction to obtain a histogram, detects character breaks from the histogram to obtain the sub-character patterns that are the constituent elements of character patterns, takes the maximum width of the sub-character patterns in the character string as the reference width of a character pattern, and extracts individual character patterns by combining sub-character patterns on the basis of that reference width. Consequently, when individual character patterns are extracted from the character string to be recognized, they can be extracted accurately even if the aspect ratio of the character patterns is not close to 1, and the accuracy of character recognition can be improved.

Brief Description of the Drawings

FIG. 1 is a block diagram of an embodiment of the character recognition device of the present invention, FIG. 2 is a diagram showing an example of an input image, FIG. 3 is an explanatory diagram of the operation of cutting sub-character patterns out of a character string, FIG. 4 is an explanatory diagram of the result of cutting out individual character patterns, and FIG. 5 is an explanatory diagram of the character recognition method.
Reference numerals: 1: image input section; 2: image memory; 3: character string cutting section; 4: sub-character pattern extraction section; 5: individual character pattern extraction section; 6: recognition section; 7: dictionary; 8: display section.

Claims (1)

1. A character recognition device comprising: an image input section that inputs an image containing characters to be recognized; a character string cutting section that cuts out a character string to be recognized from the image input by the image input section as a rectangle of width $W$ and height $H$; a sub-character pattern extraction section that scans the rectangle in the direction perpendicular to the character string direction to obtain a histogram of the pixels forming characters, and extracts sub-character patterns each consisting of continuous character portions, a character portion being a position at which the histogram value is 1 or more; an individual character pattern extraction section that finds the maximum value $W_j$ among the widths $W_i$ of the sub-character patterns obtained by the sub-character pattern extraction section that satisfy $|W_i - H| \le \alpha$ ($\alpha$: constant), takes $W_j$ as the reference width $W_s$ of a character pattern, and, when the width $W_p$ of a combination of adjacent sub-character patterns satisfies $|W_p - W_s| \le \beta$ ($\beta$: constant), takes the combined sub-character patterns as an individual character pattern; and a recognition section that computes features of the character patterns obtained by the individual character pattern extraction section and matches those features against a dictionary to extract recognition candidate characters.
JP60106405A 1985-05-17 1985-05-17 Character recognizing device Granted JPS61262985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60106405A JPS61262985A (en) 1985-05-17 1985-05-17 Character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP60106405A JPS61262985A (en) 1985-05-17 1985-05-17 Character recognizing device

Publications (2)

Publication Number Publication Date
JPS61262985A JPS61262985A (en) 1986-11-20
JPH0576671B2 JPH0576671B2 (en) 1993-10-25

Family

ID=14432769

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60106405A Granted JPS61262985A (en) 1985-05-17 1985-05-17 Character recognizing device

Country Status (1)

Country Link
JP (1) JPS61262985A (en)

Also Published As

Publication number Publication date
JPS61262985A (en) 1986-11-20

Similar Documents

Publication Publication Date Title
US4813078A (en) Character recognition apparatus
JP2661898B2 (en) Character recognition device
JPH0576671B2 (en)
JP3276555B2 (en) Format recognition device and character reader
JP3476595B2 (en) Image area division method and image binarization method
JP2537973B2 (en) Character recognition device
JPS6316392A (en) Character recognizing device
JPH0584553B2 (en)
JPH0797390B2 (en) Character recognition device
JPH0782525B2 (en) Character recognition device
JPH0664628B2 (en) Character recognition device
JPH07107700B2 (en) Character recognition device
JPS63221495A (en) Character recognizing device
JPS63225883A (en) Character recognition device
JPS62219187A (en) Character recognizing device
JPH083829B2 (en) Character recognition method
JP3100825B2 (en) Line recognition method
JPH01137385A (en) Character recognizing device
JPS6378287A (en) Character recognizing device
JPS6255783A (en) Character recognizing device
JPS6316391A (en) Character recognizing device
JPH0632079B2 (en) Character recognition device
JPH0350689A (en) Character recognizing device
Babić Cursive word raw segmentation based on scanning Skew slots
JPH02214992A (en) Character recognizing device

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term