JPH11238099A - Character recognition device, method therefor and computer readable recording medium stored with character recognition program

Info

Publication number: JPH11238099A
Application number: JP10336918A
Authority: JP
Inventors: Mariko Takenouchi; 磨理子竹之内; Minoru Takakura; 穂高倉; Ichiro Nakao; 一郎中尾
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-12-19
Filing date: 1998-11-27
Publication date: 1999-08-31
Also published as: TW406246B; KR19990063196A; CN1153168C; KR100498683B1; CN1221927A

Abstract

PROBLEM TO BE SOLVED: To provide a character recognition device capable of correctly recognizing characters even when the character image to be recognized is smudged. SOLUTION: A character image input unit 302 inputs a normalized, binarized character image to be recognized, and an image region dividing unit 303 divides the area containing the character image into 16 regions. A feature amount extraction unit 304 extracts a feature amount for each divided region, and an image region classification unit 305 classifies the regions into two groups. A recognition unit 306 calculates the city block distance between the feature amount of the character image and the standard feature amount in a recognition dictionary 301 separately for the first group and the second group, multiplies each group's distance by a prescribed coefficient, sums the results, and outputs the standard character with the smallest total as the recognition result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character recognition apparatus and method for recognizing a character from an input character image to be recognized.

[0002]

[Prior Art] To improve the accuracy of character recognition, a character recognition apparatus is described in Japanese Patent Application Laid-Open No. 61-125688. This apparatus has a recognition dictionary in which a normalized standard character image is divided into 16 regions, four equal parts each vertically and horizontally, and a standard feature amount for each region is registered together with a character code. The standard feature amount is obtained by classifying the arrangement patterns of black pixels, which represent character strokes, into four directions (horizontal, upper left, vertical, upper right) and counting how many times each of the four patterns occurs in the region. The recognition dictionary therefore holds a four-dimensional standard feature amount for each region of each character code, giving a 4 × 16 = 64-dimensional standard feature amount per character.

[0003] In this apparatus, the character image to be recognized is input as a binary image in which character strokes are "black" and the background is "white". After being normalized to a 16 × 16 pixel area, the image is divided into 16 regions of 4 pixels each vertically and horizontally, and the same kind of feature amount as in the recognition dictionary is extracted from each region. Next, as the similarity between the character image and each character indicated by a character code registered in the recognition dictionary, the city block distance between the standard feature amount (64 dimensions) and the extracted feature amount (likewise 64 dimensions) is calculated. Because a smaller city block distance means a higher similarity, the input character image is recognized as the character whose character code has the standard feature amount with the minimum city block distance.
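The conventional matching step described above can be sketched as follows. This is a minimal illustration, not the cited apparatus's implementation: the function names and the dictionary layout (a mapping from character code to a flat feature list) are assumptions made for the example.

```python
def city_block_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize_conventional(features, dictionary):
    """Return the character code whose standard feature vector is nearest.

    `dictionary` is a hypothetical layout: {character_code: feature_vector}.
    All dimensions are weighted equally, as in the conventional apparatus.
    """
    return min(
        dictionary,
        key=lambda code: city_block_distance(features, dictionary[code]),
    )
```

Because every dimension contributes equally, a region whose strokes have filled in distorts the total just as much as any other region, which is the weakness the invention addresses.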

[0004] Dividing the image into regions in this way makes it possible to distinguish, for example, "土" from "士", which could not be recognized correctly when the counts of the four patterns (horizontal, upper left, vertical, upper right) were used as the standard feature amount without any division.

[0005]

[Problem to Be Solved by the Invention] However, when such a conventional character recognition apparatus receives the character image as a binary image and the input image is smudged, for example because of the scanner resolution, it cannot recognize the character correctly no matter how finely the regions are subdivided, and even if the feature elements are increased from four patterns to eight to raise the precision of the feature amount and of the similarity calculation.

[0006] A specific example follows. FIG. 1 and FIG. 2 are both normalized binary character images of the character "電". The character image "電" 101 shown in FIG. 1 was read correctly by the scanner, whereas the character image "電" 201 shown in FIG. 2 was smudged during scanning. When these two character images 101 and 201 are processed by the conventional character recognition apparatus, the character image "電" 101 is correctly recognized as the character "電", but the character image "電" 201 is misrecognized as the character "t".

[0007] An object of the present invention is to provide a character recognition apparatus with improved recognition accuracy that can recognize a character correctly even when the character image to be recognized is smudged.

[0008]

[Means for Solving the Problem] To solve the above problem, the present invention is a character recognition apparatus that recognizes a recognition target character image as a character code, comprising: a recognition dictionary in which a standard image of each character is divided into N (N ≥ 2) partial image regions and the standard features of the partial character images contained in the partial image regions are registered in advance, character by character, together with the character code; image region dividing means for dividing the recognition target character image into N partial image regions; feature extracting means for extracting the features of the partial character images contained in the partial image regions produced by the image region dividing means; partial image region classifying means for classifying those partial image regions into a plurality of groups; similarity calculating means for calculating, for each group, an elementary similarity between the features of the partial character images of the recognition target character image and the corresponding standard features in the recognition dictionary, applying a predetermined weight to the elementary similarity of each group, and calculating the similarity between the recognition target character image and each character in the recognition dictionary; and recognized character selecting means for selecting the character code of the character with the highest similarity as the recognized character.

[0009]

[Embodiments of the Invention] Embodiments of the character recognition apparatus according to the present invention are described below with reference to the drawings. (Embodiment 1) FIG. 3 is a configuration diagram of Embodiment 1 of the character recognition apparatus according to the present invention. This character recognition apparatus comprises a recognition dictionary 301, a character image input unit 302, an image region dividing unit 303, a feature amount extraction unit 304, an image region classification unit 305, a recognition unit 306, and a recognition result output unit 307.

[0010] The recognition dictionary 301 registers in advance, for each character, a character code and a standard feature amount representing the features of the standard character image of that character. FIG. 4(a) shows an example of the registered contents of the recognition dictionary 301. The registered contents 401 include a character code 402 and a standard feature amount 403 for the numeral "1". FIG. 4(b) shows the standard character image of the numeral "1" from which the standard feature amount 403 was extracted.

[0011] In the character image 404, character strokes are shown as "black" pixels and the background as "white" pixels within a normalized area of 16 pixels vertically and horizontally. To extract the standard feature amount 403 of the character image 404, the image is divided into N regions (here N = 16). In the figure, as indicated by the broken lines, the image is divided equally into 16 regions of 4 pixels each vertically and horizontally.
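The equal division of a normalized 16 × 16 image into 16 regions of 4 × 4 pixels can be sketched as below. The image representation (a list of rows holding 1 for "black" and 0 for "white") and the zero-based (row, column) keys are assumptions for illustration; how they map onto the region labels used in the figures is not fixed here.

```python
def split_into_regions(image, parts=4):
    """Split a square binary image (list of rows) into parts x parts blocks.

    Returns a dict keyed by zero-based (block_row, block_col); each value is
    the block's pixels as a list of rows.
    """
    size = len(image)
    if size % parts != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square and divisible by `parts`")
    step = size // parts
    regions = {}
    for r in range(parts):
        for c in range(parts):
            rows = image[r * step:(r + 1) * step]
            regions[(r, c)] = [row[c * step:(c + 1) * step] for row in rows]
    return regions
```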

[0012] The standard feature amount 403 uses as feature elements the arrangement patterns of "black" pixels at boundary points between "black" and "white" pixels of the character image 404, as shown in FIG. 5, and counts, for each divided region, the number of matches with each pattern. The feature element "horizontal" 501 is represented by patterns 502 and 503, in which "black" pixels are arranged horizontally. The "*" attached to a pixel indicates that a boundary point between a character stroke and the background is taken as the pixel of interest. The feature element "upper left" 504 is represented by patterns 505 and 506, in which black pixels are arranged toward the upper left. Here "◎" indicates a pixel that may be either "black" or "white". The feature element "vertical" 507 is represented by patterns 508 and 509, in which "black" pixels are arranged vertically. The feature element "upper right" 510 is represented by patterns 511 and 512, in which "black" pixels are arranged toward the upper right.

[0013] Counting the occurrences of the feature elements 501, 504, 507, and 510 in region 405 of FIG. 4(b) gives the following. For "horizontal" 501, pattern 502 occurs once with "black" pixel 406 as the pixel of interest, and pattern 503 occurs once with "black" pixel 407 as the pixel of interest, so the standard feature amount of the feature element "horizontal" 501 is "2". With "white" pixel 408 in region 405 as the pixel of interest, the "upper left" pattern 505 occurs once, so the standard feature amount of the feature element "upper left" 504 is "1". Patterns 508 and 509 of the feature element "vertical" 507 do not occur in region 405, so that standard feature amount is "0". With "white" pixels 409 and 410 in region 405 each taken as the pixel of interest, pattern 511 occurs twice, so the standard feature amount of the feature element "upper right" 510 is "2". The standard feature amount identified by <horizontal 1><vertical 1> in the standard feature amount 403 of FIG. 4(a), which corresponds to region 405, is therefore "2, 1, 0, 2".

[0014] Similarly, in region 411, pattern 502 occurs four times with "black" pixels 412, 413, 414, and 415 as the pixels of interest, so the standard feature amount of the feature element "horizontal" 501 is "4". The standard feature amount identified by <horizontal 1><vertical 2> in the standard feature amount 403 of FIG. 4(a), which corresponds to region 411, is "4, 0, 0, 0". Note that the area 416 above region 411 is treated as "white" pixels when counting the standard feature amount.

[0015] The standard feature amounts of the remaining 14 regions are counted in the same way. The standard feature amount 403 thus counts, for each of the 16 divided regions, the occurrences of the four feature elements "horizontal" 501, "upper left" 504, "vertical" 507, and "upper right" 510 derived from the arrangement patterns of "black" pixels, and is a 4 × 16 = 64-dimensional quantity for the character code 402.
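To make the 64-dimensional layout concrete, the sketch below concatenates per-region counts into one flat vector. The two non-zero entries are the values stated above for regions 405 and 411; every other region is filled with zeros purely as a placeholder (those are not the values of FIG. 4(a)). The (horizontal, vertical) key convention and the concatenation order are assumptions.

```python
# Order of the four feature elements within each region's 4-tuple.
FEATURE_ELEMENTS = ("horizontal", "upper_left", "vertical", "upper_right")


def flatten_features(region_counts, parts=4):
    """Concatenate per-region count tuples into one flat feature vector.

    `region_counts` maps 1-based (horizontal, vertical) labels to 4-tuples.
    Regions missing from the mapping are filled with zeros (placeholder).
    """
    vector = []
    for h in range(1, parts + 1):
        for v in range(1, parts + 1):
            counts = region_counts.get((h, v), (0,) * len(FEATURE_ELEMENTS))
            if len(counts) != len(FEATURE_ELEMENTS):
                raise ValueError(f"region {(h, v)} needs 4 counts")
            vector.extend(counts)
    return vector


# Values given in the description for regions 405 and 411 of the numeral "1".
partial_counts = {(1, 1): (2, 1, 0, 2), (1, 2): (4, 0, 0, 0)}
vector = flatten_features(partial_counts)
```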

[0016] FIGS. 6 to 12 show other examples of the registered contents of the recognition dictionary 301 together with the corresponding standard character images. FIG. 6(a) shows the registered contents 601 for the letter "i", and FIG. 6(b) shows its character image 602. Like the numeral "1" shown in FIG. 4, the registered contents 601 consist of a character code 603 and a standard feature amount 604 obtained by dividing the area of the character image 602 into 16 regions.

[0017] Similarly, FIG. 7 shows the registered contents and standard character image for the letter "t", FIG. 8 for the kanji "雲", FIG. 9 for the kanji "雷", FIG. 10 for the kanji "宅", FIG. 11 for the kanji "電", and FIG. 12 for the kanji "竜". The character image input unit 302 has a scanner or the like; it inputs the character image to be recognized as a binary image in which character strokes are "black" pixels and the background is "white" pixels, and normalizes it to 16 × 16 pixels. The character images to be recognized are the same as the character images "電" 101 and "電" 201 shown in FIGS. 1 and 2 and used in the description of the prior art.

[0018] The image region dividing unit 303 divides the character image input by the character image input unit 302 into N regions (here N = 16), that is, into 16 regions formed by four equal parts each vertically and horizontally. FIG. 13 shows the character image 101 of FIG. 1 divided into 16 regions. The feature amount extraction unit 304 stores the arrangement patterns of "black" pixels that form the feature elements shown in FIG. 5, namely "horizontal" 502 and 503, "upper left" 505 and 506, "vertical" 508 and 509, and "upper right" 511 and 512, together with their pixels of interest (marked "*" in the figure). For each region produced by the image region dividing unit 303, taken in order (for example <horizontal 1><vertical 1>, <horizontal 1><vertical 2>, <horizontal 1><vertical 3>, ..., <horizontal 4><vertical 3>, <horizontal 4><vertical 4>), it takes each boundary-point pixel corresponding to a "*" mark as the pixel of interest and, whenever the surrounding pixels match one of the stored arrangement patterns, increments the count of the corresponding feature amount by 1.

[0019] Through this processing, the feature amount extraction unit 304 extracts a four-dimensional feature amount from each of the 16 regions, that is, a 64-dimensional feature amount for the character image, and stores the extracted feature amount. FIG. 14 shows the 64-dimensional feature amount extracted from the character image "電" 101 shown in FIG. 1. Likewise, the 64-dimensional feature amount shown in FIG. 15 is extracted from the character image "電" 201 shown in FIG. 2.

[0020] The image region classification unit 305 classifies the N regions (N = 16) produced by the image region dividing unit 303 into a plurality of groups, here two groups: a first group and a second group. FIG. 16 shows the regions of the character images 101 and 201 classified into a first group 1601 in the peripheral part and a second group 1602 in the central part (the hatched part). The first group 1601 therefore contains 12 of the regions produced by the image region dividing unit 303: <horizontal 1><vertical 1> to <vertical 4>, <horizontal 2><vertical 1>, <horizontal 2><vertical 4>, <horizontal 3><vertical 1>, <horizontal 3><vertical 4>, and <horizontal 4><vertical 1> to <vertical 4>. The second group 1602 likewise contains the four regions <horizontal 2><vertical 2>, <horizontal 2><vertical 3>, <horizontal 3><vertical 2>, and <horizontal 3><vertical 3>.
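The peripheral/central split listed above follows a simple rule: a region belongs to the first group if either of its indices lies on the outer edge of the 4 × 4 grid. A minimal sketch, assuming regions are labelled by 1-based (horizontal, vertical) pairs as in the text (the function name is hypothetical):

```python
def classify_regions(parts=4):
    """Split the parts x parts grid of region labels into two groups.

    Returns (peripheral, central): lists of 1-based (horizontal, vertical)
    labels. A region is peripheral if it touches the outer edge of the grid.
    """
    peripheral, central = [], []
    for h in range(1, parts + 1):
        for v in range(1, parts + 1):
            on_edge = h in (1, parts) or v in (1, parts)
            (peripheral if on_edge else central).append((h, v))
    return peripheral, central
```

For the 4 × 4 grid this yields 12 peripheral regions and the 4 central regions (2, 2), (2, 3), (3, 2), (3, 3), matching the enumeration in the description.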

[0021] When it has finished classifying the image regions, the image region classification unit 305 notifies the recognition unit 306 of the image regions classified into the first group and the second group. On receiving this notification, the recognition unit 306 reads out, group by group, the feature amounts stored in the feature amount extraction unit 304 and, for every character code registered in the recognition dictionary 301, calculates the city block distance to the standard feature amount corresponding to that character code.

[0022] On being notified by the image region classification unit 305 of the first group 1601 and the second group 1602 as shown in FIG. 16, the recognition unit 306 calculates the city block distances d1 and d2, using equation (1) for the first group 1601 and equation (2) for the second group 1602. In equation (1), Foi is the i-th feature amount classified into the first group and stored in the feature amount extraction unit 304, and Fsi is the corresponding i-th standard feature amount registered in the recognition dictionary 301. In equation (2), Foj is the j-th feature amount classified into the second group and stored in the feature amount extraction unit 304, and Fsj is the corresponding j-th standard feature amount registered in the recognition dictionary 301.
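Equations (1) and (2) themselves are not reproduced in this text. Assuming the standard definition of city block distance and the symbols defined in the paragraph above, they would presumably take the following form; the summation ranges (all first-group feature amounts for i, all second-group feature amounts for j) are an assumption.

```latex
d_1 = \sum_{i \,\in\, \text{first group}} \left| F_{oi} - F_{si} \right| \qquad (1)

d_2 = \sum_{j \,\in\, \text{second group}} \left| F_{oj} - F_{sj} \right| \qquad (2)
```

With 12 peripheral and 4 central regions of 4 feature elements each, the first sum would run over 48 terms and the second over 16.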

[0023] Next, the recognition unit 306 multiplies the calculated city block distance d1 by α, multiplies the city block distance d2 by β, and adds the two to obtain the total city block distance d for each character. It calculates the total city block distance for every character stored in the recognition dictionary 301, takes the character with the minimum total city block distance d as the one most similar to the character image to be recognized, and notifies the recognition result output unit 307 of that character's character code.
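The weighted combination d = α·d1 + β·d2 and the minimum search can be sketched as follows. The function names, the index-list representation of the two groups, and the toy data are all hypothetical; the default coefficients 1.0 and 0.5 are the values used in this embodiment's worked example. The toy vectors are invented only to show how down-weighting the central group can change the outcome; they are not taken from the figures.

```python
def group_distance(features, standard, indices):
    """City block distance restricted to the given feature indices."""
    return sum(abs(features[k] - standard[k]) for k in indices)


def total_distance(features, standard, peripheral_idx, central_idx,
                   alpha=1.0, beta=0.5):
    """Weighted total d = alpha * d1 + beta * d2."""
    d1 = group_distance(features, standard, peripheral_idx)
    d2 = group_distance(features, standard, central_idx)
    return alpha * d1 + beta * d2


def recognize(features, dictionary, peripheral_idx, central_idx,
              alpha=1.0, beta=0.5):
    """Return the character code with the smallest weighted total distance."""
    return min(
        dictionary,
        key=lambda code: total_distance(
            features, dictionary[code], peripheral_idx, central_idx,
            alpha, beta),
    )


# Toy 6-dimensional example (invented data): indices 0-3 are "peripheral",
# indices 4-5 are "central". The input's central values are inflated to
# mimic a smudged center.
PERIPHERAL = [0, 1, 2, 3]
CENTRAL = [4, 5]
toy_dictionary = {"den": [2, 1, 0, 2, 3, 3], "t": [4, 3, 1, 3, 8, 8]}
smudged = [2, 1, 0, 2, 9, 9]

weighted = recognize(smudged, toy_dictionary, PERIPHERAL, CENTRAL)
unweighted = recognize(smudged, toy_dictionary, PERIPHERAL, CENTRAL,
                       alpha=1.0, beta=1.0)
```

In this toy case the unweighted search picks "t" (distance 8 versus 12), while the weighted search picks "den" (6.0 versus 7.0), because the distorted central dimensions count for only half.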

[0024] With α = 1.0 and β = 0.5, calculating the total city block distance d for the character image "電" 101 shown in FIG. 1 gives the results shown in FIG. 17(a). The total city block distance d of the character "電" is the minimum, "14", so the character code "4545" (see FIG. 11) is sent to the recognition result output unit 307.

[0025] Calculating the total city block distance d for the character image "電" 201 shown in FIG. 2 gives the results shown in FIG. 17(b). Here too, the character code "4545" is sent to the recognition result output unit 307. The present apparatus can thus recognize the character correctly even from the smudged character image "電" 201 shown in FIG. 2. For comparison, FIG. 17(c) shows the city block distances calculated by the conventional apparatus described above for the character images "電" 101 and "電" 201. It clearly shows that the character image "電" 201 was not recognized correctly.

[0026] The recognition result output unit 307 has a display such as a CRT and displays the standard character image corresponding to the character code notified by the recognition unit 306. Next, the operation of this embodiment is described with reference to the flowchart shown in FIG. 18. The character image input unit 302 inputs the character image to be recognized as a normalized binary character image (S1802).

[0027] The image region dividing unit 303 divides the area containing the character image into N regions (S1804). For each divided region, the feature amount extraction unit 304 extracts from the character image a feature amount whose dimension equals the number of feature elements (S1806). The image region classification unit 305 classifies the divided regions into a first group, the peripheral part of the character image, and a second group, the central part of the character image (S1808).

[0028] The recognition unit 306 determines whether each feature amount extracted by the feature amount extraction unit 304 belongs to the first group, the peripheral part (S1810). If it belongs to the peripheral part, the recognition unit calculates the city block distance d1 to the standard feature amount of a character registered in the recognition dictionary 301, multiplies it by the coefficient 1.0 (S1812), and proceeds to S1816. If it does not belong to the peripheral part, it belongs to the central part, so the recognition unit calculates the city block distance d2 to the standard feature amount of the character registered in the recognition dictionary 301 and multiplies it by the coefficient 0.5 (S1814).

[0029] Next, the recognition unit 306 adds the peripheral city block distance d1 obtained in S1812 to the central city block distance d2 multiplied by 0.5 obtained in S1814 to calculate the total city block distance d, and stores the character code of the standard feature amount used together with the total city block distance d (S1816). Steps S1810 to S1816 are repeated until all characters registered in the recognition dictionary 301 have been compared (S1818). The character code with the smallest total city block distance d is selected as the character of the character image to be recognized and is sent to the recognition result output unit 307 (S1820).

[0030] The recognition result output unit 307 stores the character code, displays the standard character image corresponding to the character code, and ends the processing. In this embodiment, the first group 1601 and the second group 1602 correspond to the peripheral and central parts of the character image, but other classifications are possible: the regions may be divided into the left and right sides of the character image to focus on the "偏" (hen) component of the character, or into the upper and lower sides to focus on the "旁" (tsukuri) component, so as to raise the accuracy of character recognition. (Embodiment 2) FIG. 19 is a configuration diagram of Embodiment 2 of the character recognition apparatus according to the present invention. This character recognition apparatus comprises a recognition dictionary 301, a character image input unit 302, an image region dividing unit 303, a feature amount extraction unit 304, an image region classification unit 305, a candidate character recognition unit 1901, a recognition unit 1902, and a recognition result output unit 307. Components that are the same as in the character recognition apparatus of Embodiment 1 are given the same reference numerals and their description is omitted; only the components specific to this embodiment are described.

[0031] Once the image region classification unit 305 has classified the regions into the peripheral part (first group) and the central part (second group), the candidate character recognition unit 1901 uses equation (1) to calculate the city block distance d1 between the feature amounts of the peripheral part and the corresponding standard feature amounts of each character registered in the recognition dictionary 301. FIG. 20(a) shows, for the character images "電" 101 and "電" 201, the characters with small city block distance d1, that is, the characters whose peripheral part is highly similar to that of the character image. The candidate character recognition unit 1901 actually stores character codes rather than the characters "電", "雷", "雲", and so on themselves.

[0032] When it has finished calculating the city block distance d1 between the character image to be recognized and every character registered in the recognition dictionary 301, the candidate character recognition unit 1901 notifies the recognition unit 1902 of the character codes and city block distances d1 of the characters with the smallest d1, for example the top three. The recognition unit 1902 has almost the same configuration as the recognition unit 306 of Embodiment 1, but it does not calculate the city block distance d1, because the candidate character recognition unit 1901 has already done so. It also calculates the city block distance d2, using equation (2), only for the characters whose character codes were notified by the candidate character recognition unit 1901. It multiplies the calculated city block distance d2 by 0.5, adds the result to the city block distance d1 to obtain the total city block distance d, and notifies the recognition result output unit 307 of the character with the smallest d as the recognized character.

[0033] FIG. 20(b) shows the total city block distances d for the character image "電" 101 and the character image "電" 201. As a result, both character images "電" 101 and 201 are correctly recognized as the character "電". In this embodiment, the candidate character recognition unit 1901 narrows down the characters for which the city block distance d2 between the central feature amounts of the character image and the standard feature amounts must be calculated (from all characters in the recognition dictionary 301 to, for example, three), so processing is faster.

[0034] Next, the operation of this embodiment is described with reference to the flowchart of FIG. 21. Steps S2102 to S2106 are the same as S1802 to S1806 of Embodiment 1, so their description is omitted. In S2108, the image area classification unit 305 classifies each region into the peripheral part (first group) or the central part (second group) and notifies the candidate character recognition unit 1901 and the recognition unit 1902 of the classification.

[0035] The candidate character recognition unit 1901 calculates, using equation (1), the city block distance d1 between the feature amounts of the notified peripheral regions and the corresponding standard feature amounts registered in the recognition dictionary 301 (S2110). It repeats S2110 for every character registered in the recognition dictionary 301 (S2112), selects the three characters with the smallest (closest) city block distance d1, and notifies the recognition unit 1902 of their character codes (S2114).

[0036] The recognition unit 1902 calculates, using equation (2), the city block distance d2 between the feature amounts of the central regions of the character image to be recognized (the second group) and the central-region standard feature amounts of each character code notified by the candidate character recognition unit 1901, and multiplies d2 by the coefficient 0.5 (S2116). It adds the city block distance d1 notified by the candidate character recognition unit 1901, multiplied by the coefficient 1.0, to d2 multiplied by the coefficient 0.5 to obtain the total city block distance d (S2118). S2116 and S2118 are repeated until d has been calculated for the standard feature amounts of all three notified characters (S2120). The recognition unit 1902 selects the character with the smallest total city block distance d as the recognition result, notifies the recognition result output unit 307 of its character code (S2122), and ends the processing.

(Embodiment 3)
FIG. 22 is a configuration diagram of Embodiment 3 of the character recognition device according to the present invention. This character recognition device comprises the recognition dictionary 301, the character image input unit 302, the image area division unit 303, the feature amount extraction unit 304, the image area classification unit 305, a large classification dictionary 2201, a large classification unit 2202, a recognition unit 2203, and the recognition result output unit 307. Components identical to those of Embodiment 1 carry the same reference numerals and are not described again; only the components specific to this embodiment are described.

[0037] The large classification dictionary 2201 groups characters into similar-character units, each containing characters whose standard shapes resemble one another, and registers for each unit a group standard feature amount and the character code group of the characters belonging to the unit. FIG. 23 shows an example of the contents of the large classification dictionary 2201, illustrating the similar-character units "D1", "D2", and "D3". The unit "D1" groups the digit "1" with the letters "i" and "t"; its character code group 2301 (shown in the figure as the actual characters "1", "i", and "t" rather than as character codes) and its group standard feature amount 2302 are registered.

[0038] The similar-character unit "D2" groups the kanji "雲", "電", and "雷"; its character code group 2303 and group standard feature amount 2304 are registered. The similar-character unit "D3" groups the kanji "宅", "電", and "竜"; its character code group 2305 and group standard feature amount 2306 are registered.

[0039] This grouping into similar-character units is obtained by applying cluster analysis to the standard features of the recognition dictionary 301 and sorting the characters into clusters, one per similar-character unit. Cluster analysis is described in "Multivariate Statistical Analysis" (「多変量統計解析法」) by Yutaka Tanaka and Kazumasa Wakimoto, Gendai Sugakusha (現代数学社), pp. 230-244. Here, a character may be registered in more than one similar-character unit. The group standard feature amount is the simple average of the standard feature amounts of the characters on which the group is based.
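
As a rough illustration of the last two points (overlapping registration and the simple average), a dictionary entry for one similar-character unit could be built as below. The function name, the data layout and the feature values are hypothetical, and the cluster analysis that decides group membership is not shown.

```python
def build_group_entry(member_codes, recognition_dictionary):
    """Return the code group and the element-wise mean of its members'
    standard feature vectors (the group standard feature amount)."""
    vectors = [recognition_dictionary[code] for code in member_codes]
    count = len(vectors)
    group_feature = [sum(column) / count for column in zip(*vectors)]
    return {"codes": list(member_codes), "feature": group_feature}


# Made-up standard feature vectors, for illustration only.
toy_recognition_dictionary = {
    "a": [0, 2, 4],
    "b": [2, 4, 6],
    "c": [4, 6, 8],
}

# "b" is registered in both units, as overlapping registration allows.
unit_d2 = build_group_entry(["a", "b"], toy_recognition_dictionary)
unit_d3 = build_group_entry(["b", "c"], toy_recognition_dictionary)
```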

[0040] The similar-character units may of course be formed in other ways, for example by grouping kanji that share the same left-hand radical (hen, 偏) or the same right-hand component (tsukuri, 旁). The large classification unit 2202 calculates, using equation (1), the city block distance d1 between the feature amounts of those regions of the character image to be recognized that the image area classification unit 305 classified into the first group and the corresponding group standard feature amounts in the large classification dictionary 2201. In this case, Fsi is the group standard feature amount. The unit calculates d1 for every similar-character unit registered in the large classification dictionary 2201, selects the unit with the smallest d1 (the highest similarity), and notifies the recognition unit 2203 of its character code group.

[0041] FIG. 24(a) shows the city block distances d1 calculated by the large classification unit 2202 for the character images "電" 101 and 201 to be recognized. The similar-character units are listed in descending order of similarity. For the character image 101, the unit "D2" ("雲", "電", "雷") 2303 is selected; for the character image 201, the unit "D3" ("宅", "電", "竜") 2305 is selected. The selected character codes are notified to the recognition unit 2203.

[0042] On receiving the character code group from the large classification unit 2202, the recognition unit 2203 calculates, using equation (1), the city block distance d1 between the peripheral (first group) feature amounts of the character image to be recognized and the corresponding standard feature amounts registered in the recognition dictionary 301 for each notified character code. Likewise, it calculates, using equation (2), the city block distance d2 between the central (second group) feature amounts and the corresponding standard feature amounts, and multiplies d2 by the coefficient 0.5. It sums d1 and the weighted d2 to obtain the total city block distance d and stores it together with the character code. After doing this for every code in the notified character code group, it notifies the recognition result output unit 307 of the character code with the smallest total city block distance d.
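
The coarse-then-fine matching of this embodiment can be sketched as follows. This is a minimal sketch under assumed data layouts: `large_dictionary` maps a unit name to its code group and group standard feature vector for the peripheral regions, and `recognition_dictionary` maps a character code to its peripheral and central standard feature vectors. All names and values are illustrative, not taken from the patent.

```python
def city_block(a, b):
    """City block (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_then_recognize(obj_peripheral, obj_central,
                            large_dictionary, recognition_dictionary,
                            coef_d1=1.0, coef_d2=0.5):
    # Large classification: pick the similar-character unit whose group
    # standard feature is closest to the peripheral features.
    best_unit = min(
        large_dictionary,
        key=lambda unit: city_block(obj_peripheral, large_dictionary[unit][1]),
    )
    # Fine recognition: only the characters of the selected unit.
    totals = {}
    for code in large_dictionary[best_unit][0]:
        std_peripheral, std_central = recognition_dictionary[code]
        totals[code] = (coef_d1 * city_block(obj_peripheral, std_peripheral)
                        + coef_d2 * city_block(obj_central, std_central))
    return best_unit, min(totals, key=totals.get)


toy_recognition_dictionary = {
    "1": ([0, 0], [0, 0]),
    "i": ([0, 1], [1, 0]),
    "x": ([8, 8], [5, 5]),
    "y": ([9, 9], [7, 7]),
}
toy_large_dictionary = {
    "D1": (["1", "i"], [0, 0.5]),
    "D2": (["x", "y"], [8.5, 8.5]),
}
result = classify_then_recognize([9, 8], [7, 7],
                                 toy_large_dictionary,
                                 toy_recognition_dictionary)
```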

[0043] FIG. 24(b) shows the total city block distances d calculated by the recognition unit 2203 for the character images "電" 101 and 201 to be recognized. For both character images 101 and 201, the character with the highest similarity is "電". In this embodiment, a similar-character unit is first selected using the feature amounts of those image regions that reflect the character shape more clearly, which narrows the candidate characters. The similarity is then calculated, as in Embodiment 1, by multiplying the city block distances by coefficients that differ according to the classified image region. This greatly reduces the time needed for matching against the recognition dictionary 301.

[0044] Next, the operation of this embodiment is described with reference to the flowchart of FIG. 25. The steps up to S2506 are the same as those up to S1806 of Embodiment 1, so their description is omitted. The image area classification unit 305 classifies each region into the peripheral first group or the central second group and notifies the large classification unit 2202 and the recognition unit 2203 of the classified regions (S2508).

[0045] The large classification unit 2202 calculates the city block distance d1 between the peripheral feature amounts of the character image to be recognized and the corresponding group standard feature amounts of a similar-character unit in the large classification dictionary 2201 (S2510). S2510 is repeated until every similar-character unit registered in the large classification dictionary 2201 has been matched (S2512). The large classification unit 2202 then selects the similar-character unit with the smallest d1, that is, the highest similarity, and notifies the recognition unit 2203 of the character code group of the characters in that unit (S2514).

[0046] When notified of the character codes by the large classification unit 2202, the recognition unit 2203 determines whether a feature amount of the character image to be recognized belongs to the peripheral part (S2516). If it does, the unit refers to the recognition dictionary 301, calculates with equation (1) the city block distance d1 between the feature amount and the corresponding standard feature amount of the notified character code, and multiplies d1 by the coefficient 1.0 (S2518). If the feature amount does not belong to the peripheral part, the unit refers to the recognition dictionary 301, calculates with equation (2) the city block distance d2 between the feature amount and the corresponding standard feature amount of the notified character code, and multiplies d2 by the coefficient 0.5 (S2520). It then sums the values obtained in S2518 and S2520 to obtain the total city block distance d and stores it with the character code (S2522). Matching against the recognition dictionary 301 is repeated from S2516 to S2522 for each notified character code (S2524), the character code with the smallest total city block distance d (the highest similarity) is selected (S2526), and the processing ends.

[0047] In this embodiment, the large classification dictionary 2201 registers the group standard features of each similar-character unit for all image regions. It may instead register, together with the character code group, only the group standard features of the peripheral regions classified into the first group.

(Embodiment 4)
FIG. 26 is a configuration diagram of Embodiment 4 of the character recognition device according to the present invention. This character recognition device comprises the recognition dictionary 301, the character image input unit 302, the image area division unit 303, the feature amount extraction unit 304, a crush degree calculation unit 2601, a recognition unit 2602, and the recognition result output unit 307. Components identical to those of Embodiment 1 carry the same reference numerals and are not described again; only the components specific to this embodiment are described.

[0048] For each of the 16 regions produced by the image area division unit 303, the crush degree calculation unit 2601 calculates the proportion of the region occupied by character elements. Specifically, it counts the "black" pixels in the region, divides the count by the region's total pixel count of 16, and multiplies by 100. A region whose proportion is below the threshold is regarded as a normal part and classified into the first group; a region whose proportion is at or above the threshold is regarded as a crushed part and classified into the second group. Here the threshold is 75%.
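
The classification rule above can be sketched in a few lines. The sketch assumes a square binary image stored as a list of rows with 1 for black and 0 for white, and a side length divisible by the region size; the function name and the test image are illustrative.

```python
def classify_regions_by_crush(image, region_size=4, threshold_percent=75.0):
    """Split a square binary image into square regions and sort them into
    normal (first group) and crushed (second group) by black-pixel ratio."""
    regions_per_side = len(image) // region_size
    normal, crushed = [], []
    for row in range(regions_per_side):
        for col in range(regions_per_side):
            black = sum(
                image[y][x]
                for y in range(row * region_size, (row + 1) * region_size)
                for x in range(col * region_size, (col + 1) * region_size)
            )
            ratio = black / (region_size * region_size) * 100
            # At or above the threshold counts as crushed.
            if ratio >= threshold_percent:
                crushed.append((row, col))
            else:
                normal.append((row, col))
    return normal, crushed


# 16x16 test image: region (0, 0) has 16 black pixels, region (1, 1) has 12,
# region (2, 2) has 11, and every other region is empty.
image = [[0] * 16 for _ in range(16)]
for y in range(0, 4):
    for x in range(0, 4):
        image[y][x] = 1
for y in range(4, 7):
    for x in range(4, 8):
        image[y][x] = 1
for y in range(8, 10):
    for x in range(8, 12):
        image[y][x] = 1
for x in range(8, 11):
    image[10][x] = 1

normal_regions, crushed_regions = classify_regions_by_crush(image)
```

A region with 12 of 16 black pixels sits exactly at 75% and is therefore treated as crushed, while 11 of 16 (68.75%) remains normal.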

[0049] FIG. 27 shows, for the character images "電" 101 and 201 of FIGS. 1 and 2, the normal parts 2701 (first group) and the crushed parts 2702 (second group) as classified by the crush degree calculation unit 2601. The figure shows that the character image 201 contains more crushed parts than the character image 101. The crush degree calculation unit 2601 notifies the recognition unit 2602 of the regions classified into the first and second groups.

[0050] When notified of the first-group and second-group regions by the crush degree calculation unit 2601, the recognition unit 2602 calculates the city block distance d1 between the feature amounts of the first-group regions of the character image to be recognized and the corresponding standard feature amounts in the recognition dictionary 301, and multiplies d1 by a coefficient α. In the same way, it calculates the city block distance d2 between the feature amounts of the second-group regions and the corresponding standard feature amounts in the recognition dictionary 301, and multiplies d2 by a coefficient β. It sums the two weighted values to obtain the total city block distance d, stores d together with the character code corresponding to the standard feature amounts, and repeats this processing for every character registered in the recognition dictionary 301.
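
The weighting step can be illustrated with a small sketch. It assumes feature amounts are stored per region as short lists keyed by a region index, and that the set of normal (first group) regions has already been determined; the names and numbers are made up for the example.

```python
def total_distance(obj_features, std_features, normal_regions,
                   alpha=1.0, beta=0.5):
    """Total distance d = alpha * d1 + beta * d2, where d1 sums over normal
    regions and d2 over crushed regions."""
    d1 = d2 = 0
    for region, values in obj_features.items():
        dist = sum(abs(o - s) for o, s in zip(values, std_features[region]))
        if region in normal_regions:
            d1 += dist
        else:
            d2 += dist
    return alpha * d1 + beta * d2


def recognize(obj_features, dictionary, normal_regions, alpha=1.0, beta=0.5):
    """Return the code whose standard features give the smallest d."""
    return min(
        dictionary,
        key=lambda code: total_distance(obj_features, dictionary[code],
                                        normal_regions, alpha, beta),
    )


obj = {0: [1, 1], 1: [5, 5]}          # region 1 is assumed to be crushed
toy_dictionary = {
    "a": {0: [1, 1], 1: [9, 9]},
    "b": {0: [4, 4], 1: [5, 5]},
}
weighted = recognize(obj, toy_dictionary, normal_regions={0})
unweighted = recognize(obj, toy_dictionary, normal_regions={0}, beta=1.0)
```

In this toy case the down-weighted crushed region changes the outcome: "a" wins with β = 0.5, whereas equal weights would select "b".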

[0051] For the character image "電" 101 to be recognized, the first group contains 13 regions, so the city block distance d1 is calculated by equation (3). The second group contains 3 regions, so the city block distance d2 is calculated by equation (4). For the character image "電" 201 to be recognized, the first group contains 8 regions and the second group contains 8 regions, so i runs from 1 to 32 in equation (3) and j runs from 1 to 32 in equation (4). Foi, Fsi, Foj, Fsj, i, and j are as defined in the description of Embodiment 1.
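
Equations (3) and (4) are not reproduced in this text. As a hedged reconstruction only: since 8 regions correspond to indices 1 to 32, each region appears to contribute 4 feature values, so the 13 normal regions and 3 crushed regions of the character image 101 would give index ranges of 1 to 52 and 1 to 12. Under that assumption, and assuming the same absolute-difference form as a city block distance, the equations would read roughly as follows; the exact notation of the original may differ.

```latex
% Assumed form of equation (3): first group (13 regions x 4 features)
d_1 = \sum_{i=1}^{52} \left| F_{oi} - F_{si} \right|

% Assumed form of equation (4): second group (3 regions x 4 features)
d_2 = \sum_{j=1}^{12} \left| F_{oj} - F_{sj} \right|

% Total distance with the coefficients alpha and beta of this embodiment
d = \alpha \, d_1 + \beta \, d_2
```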

[0052] When the total city block distance d is calculated for the character images "電" 101 and 201 with α = 1.0 and β = 0.5, the character "電" gives the smallest d, that is, the highest similarity, for both images. It is therefore selected as the recognition result and notified to the recognition result output unit 307. Next, the operation of this embodiment is described.

[0053] The operation of this embodiment differs from that of Embodiment 1 only in S1808 to S1814 (see FIG. 18), so only those steps are described. In S1808 of Embodiment 1, the image area classification unit 305 classified the regions into a preset peripheral first group and central second group. In this embodiment, for each region produced by the image area division unit 303, the proportion of "black" pixels, which correspond to character elements, is calculated. If the value is below the threshold, the region is classified into the first group as a normal part; if it is at or above the threshold, the region is classified into the second group as a crushed part.

[0054] In S1810 of Embodiment 1, the recognition unit 306 determined whether a feature amount of the character image to be recognized belonged to the peripheral first group. In this embodiment, the recognition unit 2602 determines whether it belongs to the normal first group. If it belongs to the normal first group, processing moves to S1812; if it belongs to the crushed second group, processing moves to S1814. In S1812 and S1814, the recognition unit 2602 calculates the city block distances d1 and d2, treating the normal part in place of the peripheral part and the crushed part in place of the central part, and multiplies d1 and d2 by the coefficients 1.0 and 0.5, respectively.

[0055] In this embodiment, the divided regions are classified into the first and second groups according to the character image to be recognized that is input from the character image input unit 302, so recognition accuracy is higher than when the regions are classified into preset groups.

(Embodiment 5)
FIG. 29 is a configuration diagram of Embodiment 5 of the character recognition device according to the present invention. This character recognition device comprises the recognition dictionary 301, the character image input unit 302, the image area division unit 303, the feature amount extraction unit 304, the crush degree calculation unit 2601, a candidate character recognition unit 2901, a recognition unit 2902, and the recognition result output unit 307. Components identical to those of Embodiments 1 and 4 carry the same reference numerals and are not described again. The candidate character recognition unit 2901 and the recognition unit 2902 of this embodiment have substantially the same configurations as the candidate character recognition unit 1901 and the recognition unit 1902 of Embodiment 2, but differ slightly because the crush degree calculation unit 2601 takes the place of the image area classification unit 305.

[0056] The crush degree calculation unit 2601 determines whether the proportion of character elements in each region of the character image to be recognized is below a threshold, for example 75%. It notifies the candidate character recognition unit 2901 of the regions below the threshold as the normal first group, and notifies the recognition unit 2902 of the regions at or above the threshold as the crushed second group. The candidate character recognition unit 2901 reads from the feature amount extraction unit 304 the feature amounts of the normal regions classified into the first group, as notified by the crush degree calculation unit 2601, and calculates the city block distance d1 to the corresponding standard feature amounts registered in the recognition dictionary 301.

[0057] When the regions of the character images "電" 101 and 201 to be recognized are classified into the first and second groups as shown in FIG. 27, the city block distance d1 between the feature amounts of the normal first group and the corresponding standard feature amounts in the recognition dictionary 301 is calculated using equation (3). The city block distances d1 obtained for all characters in the recognition dictionary 301 are as shown in FIG. 30(a). The character codes of the three characters with the smallest d1, that is, the highest similarity, are paired with their city block distances d1 and notified to the recognition unit 2902.

[0058] The recognition unit 2902 calculates, using equation (4), the city block distance d2 between the feature amounts classified into the second group and the corresponding standard feature amounts registered in the recognition dictionary 301 for each notified character code. It sums d1 multiplied by the coefficient α and the calculated d2 multiplied by the coefficient β to obtain the total city block distance d. With α = 1.0 and β = 0.5, the total city block distances d for the character images "電" 101 and 201 to be recognized are as shown in FIG. 30(b). As a result, the character code of the character "電", which has the highest similarity, is notified to the recognition result output unit 307.

[0059] The operation of this embodiment differs only slightly from that of Embodiment 2, so only the differences are described briefly with reference to FIG. 21. In S2108, the crush degree calculation unit 2601 classifies the regions into the normal first group and the crushed second group. In S2110, the candidate character recognition unit 2901 obtains the city block distance d1 using the feature amounts of the first group. As a result, in S2114 the characters for which the city block distance d2 to the standard feature amounts of the recognition dictionary 301 must be calculated are narrowed to three.

(Embodiment 6)
FIG. 31 is a configuration diagram of Embodiment 6 of the character recognition device according to the present invention. This character recognition device comprises the recognition dictionary 301, the character image input unit 302, the image area division unit 303, the feature amount extraction unit 304, the crush degree calculation unit 2601, the large classification dictionary 2201, a large classification unit 3101, a recognition unit 3102, and the recognition result output unit 307.

[0060] Components identical to those of Embodiments 1, 3, and 4 carry the same reference numerals and are not described again. The large classification unit 3101 calculates the city block distance d1 between the feature amounts of the regions classified into the first group, as notified by the crush degree calculation unit 2601, and the corresponding group standard feature amounts registered in the large classification dictionary 2201. It obtains d1 for every similar-character unit registered in the large classification dictionary 2201, selects the most similar unit, and notifies the recognition unit 3102 of the character codes of the characters in that unit.

[0061] When the regions are classified into the normal parts 2701 of the first group and the crushed parts 2702 of the second group as shown in FIG. 27, the large classification unit 3101 calculates the city block distance d1 using equation (3) and the like, as in Embodiment 4. In this case, Fsi is the group standard feature amount of the large classification dictionary 2201. FIG. 32(a) shows this city block distance d1 for each of the character images "電" 101 and 201 to be recognized. The most similar similar-character unit is "D2" for the character image 101 and "D3" for the character image 201.

[0062] When notified of the character code group by the large classification unit 3101, the recognition unit 3102 handles each feature amount of the character image to be recognized according to its group. For a feature amount classified into the first group, it calculates the city block distance d1 to the corresponding standard feature amount of the notified character code in the recognition dictionary 301, using equation (3) and the like. For a feature amount classified into the second group, it calculates the city block distance d2 to the corresponding standard feature amount in the recognition dictionary 301, using equation (4) and the like. It multiplies the resulting d1 and d2 by the coefficients α and β, respectively, and sums them to obtain the total city block distance d. This processing is performed only for the characters in the notified character code group. The character code of the character with the smallest total city block distance d is notified to the recognition result output unit 307.

[0063] FIG. 32(b) shows the total city block distances for the character images "電" 101 and 201 to be recognized when the coefficients α and β are 1.0 and 0.5, respectively. In both cases the recognition result is the character "電". Calculating the city block distance d1 to the group standard feature amounts of the large classification dictionary 2201 in this way narrows the set of characters for which the total city block distance d must be obtained from the recognition dictionary 301, which both improves recognition accuracy and shortens the time required for recognition processing.

[0064] Next, the operation of this embodiment is described briefly with reference to the flowchart of Embodiment 3 (FIG. 25). In place of S2508, the crush degree calculation unit 2601 classifies each region produced by the image area division unit 303 into the normal first group or the crushed second group. In S2510, the large classification unit 3101 calculates the city block distance d1 between the first-group feature amounts of the character image to be recognized and the group standard feature amounts of the large classification dictionary 2201.

【００６５】[0065]

In S2516, the recognition unit 3102 determines whether or not each feature amount belongs to the first group of normal parts, and thereafter performs the same processing as in Embodiment 3. In Embodiments 1 to 6 above, the standard normalized character images registered in the recognition dictionary 301 are 16 × 16 pixels, but other sizes may of course be used. In that case, the recognition target character image input by the character image input unit 302 need only be given the same size.

【００６６】[0066]

Although the region of the standard character image is divided into 16 equal parts of 4 pixels each vertically and horizontally, it may of course be divided using different numbers of pixels vertically and horizontally, giving a "p × q" division. In that case, the recognition target character image need only be divided into "p × q" regions in the same way by the image area division unit 303. Furthermore, although the image area division unit 303 divides the normalized recognition target character image into regions, the character image input unit 302 may instead leave the character image unnormalized. The image area division unit then divides the input recognition target character image into "p × q" regions based on the center of gravity or at an equal pitch, and the feature amount extraction unit counts the number of feature elements in each region and normalizes that count by a region size, such as the area or length of the region, to obtain the feature amount. In this case, the standard feature amounts of the standard character images, extracted by the same method, are registered in the recognition dictionary.
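The equal-pitch "p × q" division with size normalization can be illustrated with a short sketch. It rests on several assumptions that are not in the source: the image is a 2D list of 0/1 pixels, the "feature element" is simplified to a foreground pixel (the embodiment's actual feature elements are not reproduced here), and the function names are hypothetical.

```python
def divide_equal_pitch(height, width, p, q):
    """Split a height x width image into p rows by q columns of regions.

    Returns (top, bottom, left, right) bounds with bottom/right exclusive.
    Boundaries are rounded down, so region sizes may differ by one pixel.
    """
    rows = [(i * height // p, (i + 1) * height // p) for i in range(p)]
    cols = [(j * width // q, (j + 1) * width // q) for j in range(q)]
    return [(t, b, l, r) for (t, b) in rows for (l, r) in cols]


def normalized_features(image, p, q):
    """Per-region count of foreground pixels divided by the region area."""
    height, width = len(image), len(image[0])
    features = []
    for t, b, l, r in divide_equal_pitch(height, width, p, q):
        count = sum(image[y][x] for y in range(t, b) for x in range(l, r))
        area = (b - t) * (r - l)
        # Normalizing by area makes regions of unequal size comparable.
        features.append(count / area if area else 0.0)
    return features
```

Because each count is divided by its own region area, an unnormalized input image and a dictionary image of a different size yield feature amounts on the same scale.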

【００６７】[0067]

In the above embodiments, the regions are classified into two groups, a first group and a second group, and the city-block distances indicating similarity are multiplied by two coefficients to weight the similarity. As another embodiment, the regions may be classified into three or more groups, with a different coefficient applied to the city-block distance of each group, to further improve the accuracy of character recognition. Also, although the above embodiments use the city-block distance to obtain the similarity, the invention is not limited to this, and the Euclidean distance or the Mahalanobis distance may be used instead.
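A sketch of this generalization, with an arbitrary number of groups and a pluggable distance, might look like the following. The function names and data layout are assumptions for illustration. The Mahalanobis distance is omitted because it needs covariance data that the text does not specify.

```python
import math


def city_block(a, b):
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_distance(target, standard, groups, weights, metric=city_block):
    """Sum of per-group distances, each scaled by that group's coefficient.

    groups  : list of lists of region indices (two, three, or more groups)
    weights : one coefficient per group
    """
    if len(groups) != len(weights):
        raise ValueError("need exactly one coefficient per group")
    total = 0.0
    for regions, w in zip(groups, weights):
        total += w * sum(metric(target[r], standard[r]) for r in regions)
    return total
```

With two groups and weights of 1.0 and 0.5 this reduces to the two-coefficient case used in the embodiments.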

【００６８】[0068]

In Embodiments 1 to 6 above, the configurations are shown in FIGS. 3, 19, 22, 26, 29, and 31, respectively. As another embodiment of the present invention, however, a program that realizes the functions of the components shown in those figures may be recorded on a computer-readable recording medium. The medium is then mounted in a character recognition device lacking those functions, and the device reads the program and achieves the same effects as the present invention.

【００６９】[0069]

[Effects of the Invention] As described above, the present invention is a character recognition device that recognizes a recognition target character image as a character code, comprising: a recognition dictionary in which a standard image of each character is divided into N (N ≧ 2) partial image regions and the standard features of the partial character images contained in the respective partial image regions are registered in advance, character by character, together with character codes; image area dividing means for dividing the recognition target character image into N partial image regions; feature extracting means for extracting the features of the partial character images contained in the partial image regions divided by the image area dividing means; partial image region classifying means for classifying the partial image regions divided by the image area dividing means into a plurality of groups; similarity calculating means for calculating, for each group, the elementary similarity between the features of the partial character images of the recognition target character image and the corresponding standard features in the recognition dictionary, applying a predetermined weight to the elementary similarity of each group, and calculating the similarity between the recognition target character image and each character in the recognition dictionary; and recognized character selecting means for selecting, as the recognized character, the character code of the character with the highest similarity.

【００７０】[0070]

With this configuration, the overall similarity is calculated after the elementary similarity of each group is weighted according to that group's contribution to character recognition, so a character recognition device with improved recognition accuracy is obtained even when the recognition target character image is crushed. In addition, the partial image region classifying means has a peripheral/central classification unit that classifies the partial image regions into a first group in the peripheral part of the recognition target character image and a second group in its central part. The similarity calculating means has: an elementary similarity calculation unit that separately calculates, for the first group and the second group, the elementary similarity, which is the city-block distance, Euclidean distance, or Mahalanobis distance between the features of the partial character images and the corresponding standard features in the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the first group contributes more to the similarity than the second group.
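The peripheral/central classification can be sketched as below. The text only states that regions are split into a peripheral first group and a central second group. Treating the outermost ring of grid cells as "peripheral" and numbering regions row by row are assumptions made for this example, and the function name is hypothetical.

```python
def classify_peripheral_central(p, q):
    """Split a p x q grid of regions into border cells and interior cells.

    Regions are numbered row by row from 0.
    Returns (first_group, second_group).
    """
    first, second = [], []
    for i in range(p):
        for j in range(q):
            on_border = i in (0, p - 1) or j in (0, q - 1)
            (first if on_border else second).append(i * q + j)
    return first, second
```

For a 4 × 4 division this yields 12 peripheral regions and 4 central regions. The resulting index lists can be passed directly to a per-group distance calculation.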

【００７１】[0071]

With this configuration, the similarity is calculated with a reduced contribution from the elementary similarity of the central part of the character image, where crushing generally occurs in recognition target character images, so the accuracy of character recognition can be improved. In addition, the partial image region classifying means has a peripheral/central classification unit that classifies the partial image regions into a first group in the peripheral part of the recognition target character image and a second group in its central part. The similarity calculating means has: a candidate character selection unit that calculates the elementary similarity of the first group, which is the city-block distance, Euclidean distance, or Mahalanobis distance between the features of the partial character images and the corresponding standard features in the recognition dictionary, and selects from the recognition dictionary the characters ranking highest in that elementary similarity as candidate characters; a second-group elementary similarity calculation unit that calculates the elementary similarity of the second group for the candidate characters selected by the candidate character selection unit; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the first group contributes more to the similarity than the second group.
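The two-stage procedure described above (shortlist on the first group, then total similarity for the shortlist only) can be sketched as follows. The number of candidates kept, the function names, and the data layout are assumptions. The default coefficients of 1.0 and 0.5 follow the values the document gives for the first and second groups.

```python
def city_block(a, b):
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def group_distance(target, standard, regions):
    """Elementary similarity (a distance) over one group of regions."""
    return sum(city_block(target[r], standard[r]) for r in regions)


def recognize_two_stage(target, dictionary, group1, group2,
                        num_candidates=3, alpha=1.0, beta=0.5):
    """Shortlist on the first group, then rank the shortlist on the total."""
    # Stage 1: first-group distance d1 for every dictionary entry.
    d1 = {code: group_distance(target, std, group1)
          for code, std in dictionary.items()}
    candidates = sorted(d1, key=d1.get)[:num_candidates]
    # Stage 2: second-group distance d2 only for the shortlisted codes.
    totals = {code: alpha * d1[code]
              + beta * group_distance(target, dictionary[code], group2)
              for code in candidates}
    best = min(totals, key=totals.get)
    return best, candidates
```

The second-group distance is computed for at most `num_candidates` entries, so the cost of stage 2 no longer grows with the size of the dictionary.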

【００７２】[0072]

With this configuration, candidate characters are selected from the features of the peripheral part, which generally represents the shape of a recognition target character image well, and the overall similarity is calculated only for the selected candidate characters, so processing speed can be increased along with recognition accuracy. In addition, the partial image region classifying means has a peripheral/central classification unit that classifies the partial image regions into a first group in the peripheral part of the recognition target character image and a second group in its central part. The device further comprises: a character group classification dictionary in which groups of standard images of similarly shaped characters are divided into N (N ≧ 2) partial image regions and the character group standard features of the partial character images contained in the respective partial image regions are registered in advance together with character code groups in units of similar characters; and character group selecting means for calculating the similarity between the features of the partial character images of the recognition target character image classified into the first group and the corresponding character group standard features registered in the character group classification dictionary, and selecting the character codes of a character group with high similarity. The similarity calculating means calculates the similarity with the features of the recognition target partial character images only against those standard features registered in the recognition dictionary that correspond to the character codes selected by the character group selecting means. The similarity calculating means has: an elementary similarity calculation unit that separately calculates, for the first group and the second group, the elementary similarity, which is the city-block distance, Euclidean distance, or Mahalanobis distance between the features of the partial character images and the corresponding standard features in the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the first group contributes more to the similarity than the second group.

【００７３】[0073]

With this configuration, a character group in units of similarly shaped characters is selected using the features of the portion that represents the recognition target character image well, and the more accurate similarity calculation is performed only on the selected character group, so processing speed can be increased further along with recognition accuracy. In addition, the partial image region classifying means has: a partial character image ratio calculation unit that calculates, for each partial image region, the proportion of the partial image region occupied by the partial character image; and a normal/crushed part determination unit that classifies the partial image regions into a first group of normal parts and a second group of crushed parts according to whether that proportion is below the threshold or at or above it.
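The ratio-based classification can be illustrated with a short sketch. It assumes a binary image stored as a 2D list with 1 for character (black) pixels and regions given as pixel bounds; the function name is hypothetical. The default threshold of 0.75 follows the 75% value the document states for this threshold.

```python
def classify_by_fill_ratio(image, regions, threshold=0.75):
    """Split regions into normal (first group) and crushed (second group).

    image   : 2D list of 0/1 pixels, 1 = character (black) pixel
    regions : list of (top, bottom, left, right), bottom/right exclusive
    A region whose black-pixel ratio is below the threshold counts as
    normal; at or above the threshold it counts as crushed.
    """
    normal, crushed = [], []
    for index, (t, b, l, r) in enumerate(regions):
        black = sum(image[y][x] for y in range(t, b) for x in range(l, r))
        ratio = black / ((b - t) * (r - l))
        (crushed if ratio >= threshold else normal).append(index)
    return normal, crushed
```

Unlike a fixed peripheral/central split, this grouping is recomputed for every input image, so only the regions that are actually filled in receive the smaller coefficient.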

【００７４】[0074]

With this configuration, each partial image region of an individual recognition target character image is judged to be a normal part or a crushed part and classified into the first group or the second group, and a predetermined weight is applied to the elementary similarities of the first-group and second-group partial image regions. The reliability of the similarity therefore increases, and the accuracy of character recognition improves further. In addition, the similarity calculating means has: an elementary similarity calculation unit that separately calculates, for the first group and the second group, the elementary similarity, which is the city-block distance, Euclidean distance, or Mahalanobis distance between the features of the partial character images and the corresponding standard features in the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the first group contributes more to the similarity than the second group.

【００７５】[0075]

With this configuration, the contribution of the crushed parts of the partial image regions to the similarity is reduced, so recognition accuracy can be improved. In addition, the similarity calculating means has: a candidate character selection unit that calculates the elementary similarity of the first group, which is the city-block distance, Euclidean distance, or Mahalanobis distance between the features of the partial character images and the corresponding standard features in the recognition dictionary, and selects from the recognition dictionary the characters ranking highest in that elementary similarity as candidate characters; a second-group elementary similarity calculation unit that calculates the elementary similarity of the second group for the candidate characters selected by the candidate character selection unit; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the first group contributes more to the similarity than the second group.

【００７６】[0076]

With this configuration, candidate characters are selected from the similarity of only the first group of partial image regions judged to be normal parts, and the similarity is calculated for the selected candidate characters, so processing speed can be increased and recognition accuracy improved. In addition, the device comprises: a character group classification dictionary in which groups of standard images of similarly shaped characters are divided into N (N ≧ 2) partial image regions and the character group standard features of the partial character images contained in the respective partial image regions are registered in advance together with character code groups in units of similar characters; and character group selecting means for calculating the similarity between the features of the partial character images of the recognition target character image classified into the first group and the corresponding character group standard features registered in the character group classification dictionary, and selecting the character codes of a character group with high similarity. The similarity calculating means calculates the similarity with the features of the recognition target partial character images only against those standard features registered in the recognition dictionary that correspond to the character codes selected by the character group selecting means. The similarity calculating means has: a similarity calculation unit that separately calculates, for the first group and the second group, the elementary similarity, which is the city-block distance, Euclidean distance, or Mahalanobis distance between the features of the partial character images and the corresponding standard features in the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the first group contributes more to the similarity than the second group.

【００７７】[0077]

With this configuration, the character group containing the character of the recognition target character image can be narrowed down from the features of the partial image regions judged to be normal parts, so the similarity calculation can be sped up. In addition, the coefficients multiplied in the similarity totaling unit are 1.0 for the elementary similarity of the first group and 0.5 for the elementary similarity of the second group. Because the similarity is expressed as a distance, the recognized character selecting means selects the character with the smallest similarity value as the character with the highest similarity.

【００７８】[0078]

With this configuration, the contribution of the central part of the recognition target character image to the similarity is made half that of the peripheral part, which dramatically improves the character recognition rate. In addition, the threshold is 75%. With this configuration, the normal parts and crushed parts are appropriately classified into the first group and the second group, and the accuracy of character recognition improves.

【００７９】[0079]

The present invention is also a character recognition method for recognizing a recognition target character image as a character code, comprising: an image area dividing step of dividing the recognition target character image into N (N ≧ 2) partial image regions; a feature extracting step of extracting the features of the partial character images contained in the partial image regions; a partial image region classifying step of classifying the partial image regions into a plurality of groups; a similarity calculating step of calculating, for each group, the elementary similarity between the features of the partial character images of the recognition target character image and the corresponding standard features in a recognition dictionary, in which a standard image of each character is divided into N partial image regions and the standard features of the partial character images contained in the respective partial image regions are registered in advance, character by character, together with character codes, applying a predetermined weight to the elementary similarity of each group, and calculating the similarity between the recognition target character image and each character in the recognition dictionary; and a recognized character selecting step of selecting, as the recognized character, the character code of the character with the highest similarity.

【００８０】[0080]

With this configuration, the overall similarity is calculated after the elementary similarity of each group is weighted according to that group's contribution to character recognition, so a character recognition method with improved recognition accuracy is obtained even when the recognition target character image is crushed. Furthermore, the present invention is a computer-readable recording medium on which are recorded a recognition dictionary and a program. In the recognition dictionary, a standard image of each character is divided into N (N ≧ 2) partial image regions and the standard features of the partial character images contained in the respective partial image regions are registered in advance, character by character, together with character codes. The program has: an image area dividing step of dividing the recognition target character image into N partial image regions; a feature extracting step of extracting the features of the partial character images contained in the partial image regions divided in the image area dividing step; a partial image region classifying step of classifying the partial image regions divided in the image area dividing step into a plurality of groups; a similarity calculating step of calculating, for each group, the elementary similarity between the features of the partial character images of the recognition target character image and the corresponding standard features in the recognition dictionary, applying a predetermined weight to the elementary similarity of each group, and calculating the similarity between the recognition target character image and each character in the recognition dictionary; and a recognized character selecting step of selecting, as the recognized character, the character code of the character with the highest similarity.

【００８１】[0081]

With this configuration, by mounting this recording medium in a conventional character recognition device and having the device read the program, the device can be used as a character recognition device with improved recognition accuracy even when the recognition target character image is crushed.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of a recognition target character image input to the image input unit.

FIG. 2 is a diagram showing another example of a recognition target character image input to the image input unit.

FIG. 3 is a configuration diagram of Embodiment 1 of the character recognition device according to the present invention.

FIG. 4(a) is a diagram showing an example of the registered contents for the numeral "1" in the recognition dictionary of the above embodiment. FIG. 4(b) is a diagram showing the standard character image "1" on which the registered contents in (a) are based.

FIG. 5 is a diagram explaining the feature elements used for extracting standard feature amounts.

FIG. 6(a) is a diagram showing an example of the registered contents for the letter "i" in the recognition dictionary of the above embodiment. FIG. 6(b) is a diagram showing the standard character image "i" on which the registered contents in (a) are based.

FIG. 7(a) is a diagram showing an example of the registered contents for the letter "t" in the recognition dictionary of the above embodiment. FIG. 7(b) is a diagram showing the standard character image "t" on which the registered contents in (a) are based.

FIG. 8(a) is a diagram showing an example of the registered contents for the kanji "雲" in the recognition dictionary of the above embodiment. FIG. 8(b) is a diagram showing the standard character image "雲" on which the registered contents in (a) are based.

FIG. 9(a) is a diagram showing an example of the registered contents for the kanji "雷" in the recognition dictionary of the above embodiment. FIG. 9(b) is a diagram showing the standard character image "雷" on which the registered contents in (a) are based.

FIG. 10(a) is a diagram showing an example of the registered contents for the kanji "宅" in the recognition dictionary of the above embodiment. FIG. 10(b) is a diagram showing the standard character image "宅" on which the registered contents in (a) are based.

FIG. 11(a) is a diagram showing an example of the registered contents for the kanji "電" in the recognition dictionary of the above embodiment. FIG. 11(b) is a diagram showing the standard character image "電" on which the registered contents in (a) are based.

FIG. 12(a) is a diagram showing an example of the registered contents for the kanji "竜" in the recognition dictionary of the above embodiment. FIG. 12(b) is a diagram showing the standard character image "竜" on which the registered contents in (a) are based.

FIG. 13 is a diagram showing the state in which the recognition target character image "電" 101 shown in FIG. 1 has been divided into regions by the image area division unit of the above embodiment.

FIG. 14 is a diagram showing the feature amounts of the recognition target character image "電" 101 extracted by the feature amount extraction unit of the above embodiment.

FIG. 15 is a diagram showing the feature amounts of the recognition target character image "電" 201 shown in FIG. 2, extracted by the feature amount extraction unit of the above embodiment.

FIG. 16 is a diagram showing the state in which the image area classification unit of the above embodiment has classified the regions divided by the image area division unit into a first group in the peripheral part and a second group in the central part.

FIG. 17(a) is a diagram showing the characters with the smallest total city-block distances for the recognition target character image "電" 101, as calculated by the recognition unit of the above embodiment. FIG. 17(b) is a diagram showing the characters with the smallest total city-block distances for the recognition target character image "電" 201, as calculated by the recognition unit of the above embodiment. FIG. 17(c) is a diagram showing, as a reference example, the characters with the smallest city-block distances for the recognition target character images "電" 101 and 201 obtained with a conventional character recognition device.

FIG. 18 is a flowchart explaining the operation of the above embodiment.

FIG. 19 is a configuration diagram of Embodiment 2 of the character recognition device according to the present invention.

FIG. 20(a) is a diagram showing that the candidate character recognition unit of the above embodiment selects candidate characters with small city-block distances using the first-group feature amounts of the recognition target character images "電" 101 and 201. FIG. 20(b) is a diagram showing that the recognition unit of the above embodiment selects the character for the recognition target character images "電" 101 and 201 using the result of candidate character recognition.

【図２１】上記実施の形態の動作を説明するフローチャ
ートである。FIG. 21 is a flowchart illustrating an operation of the embodiment.

【図２２】本発明に係る文字認識装置の実施の形態３の
構成図である。FIG. 22 is a configuration diagram of a character recognition device according to a third embodiment of the present invention.

【図２３】上記実施の形態の大分類辞書の登録内容の一
例を示す図である。FIG. 23 is a diagram showing an example of registered contents of the large classification dictionary according to the embodiment.

【図２４】(a)は、上記実施の形態の大分類部で認識対
象文字画像「電」１０１，２０１の第１群の特徴量を用
いて類似文字単位が選択されることを説明する図であ
る。(b)は、上記実施の形態の認識部で認識対象文字画
像「電」１０１，２０１の文字が選出されることを示す
図である。FIG. 24A is a diagram illustrating that a similar character unit is selected using the first group of feature amounts of the recognition target character images “den” 101 and 201 in the large classification unit according to the embodiment. It is. (b) is a diagram showing that the characters of the recognition target character images “den” 101 and 201 are selected by the recognition unit of the above embodiment.

FIG. 25 is a flowchart illustrating the operation of the above embodiment.

FIG. 26 is a configuration diagram of a character recognition device according to a fourth embodiment of the present invention.

FIG. 27 is a diagram illustrating that the crushing degree calculation unit of the above embodiment classifies the recognition target character images "電" 101 and 201 into a first group of normal parts and a second group of crushed parts.

FIG. 28 is a diagram showing that the recognition unit of the above embodiment selects the character for the recognition target character images "電" 101 and 201.

FIG. 29 is a configuration diagram of a character recognition device according to a fifth embodiment of the present invention.

FIG. 30(a) is a diagram showing that the candidate character recognition unit of the above embodiment selects candidate characters using the feature amounts of the first group, the normal parts, of the recognition target character images "電" 101 and 201. FIG. 30(b) is a diagram showing that the recognition unit of the above embodiment selects the character for the recognition target character images "電" 101 and 201 using the result of the candidate character recognition unit.

FIG. 31 is a configuration diagram of a character recognition device according to a sixth embodiment of the present invention.

FIG. 32(a) is a diagram showing that the large classification unit of the above embodiment selects a similar-character unit using the feature amounts of the first group, the normal parts, of the recognition target character images "電" 101 and 201. FIG. 32(b) is a diagram showing that the recognition unit of the above embodiment selects the character for the recognition target character images "電" 101 and 201.

[Explanation of symbols]

301 Recognition dictionary
302 Character image input unit
303 Image region division unit
304 Feature amount extraction unit
305 Image region classification unit
306, 1902, 2203, 2602, 2902, 3102 Recognition unit
307 Recognition result output unit
1901, 2901 Candidate character recognition unit
2201 Large classification dictionary
2202, 3101 Large classification unit
2601 Crushing degree calculation unit

Claims

[Claims]

1. A character recognition device for recognizing a character image to be recognized as a character code, comprising: a recognition dictionary in which a standard image of each character is divided into N (N ≧ 2) partial image regions and the standard features of the partial character images included in the respective partial image regions are registered in advance, character by character, together with character codes; an image region dividing unit that divides a recognition target character image into N partial image regions; a feature extraction unit that extracts a feature of the partial character image included in each partial image region divided by the image region dividing unit; partial image region classifying means for classifying the partial image regions divided by the image region dividing unit into a plurality of groups; similarity calculating means for calculating, for each group, the elementary similarity between the features of the partial character images of the recognition target character image and the corresponding standard features of the recognition dictionary, weighting the elementary similarity of each group by a predetermined weight, and calculating the similarity between the recognition target character image and each character in the recognition dictionary; and recognized character selecting means for selecting the character code of the character with the highest similarity as the recognized character.
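As a reading aid, the flow recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature (`black_ratio`, the fraction of black pixels in a region), the city-block elementary similarity, the function names, and the tiny 2 x 2 division are all hypothetical choices made here for brevity.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # 1 = black (character) pixel, 0 = white pixel


def divide_regions(image: Image, rows: int, cols: int) -> List[Image]:
    """Split the image into rows * cols equal partial image regions (row-major)."""
    h, w = len(image) // rows, len(image[0]) // cols
    return [
        [line[c * w:(c + 1) * w] for line in image[r * h:(r + 1) * h]]
        for r in range(rows)
        for c in range(cols)
    ]


def black_ratio(region: Image) -> float:
    """Hypothetical feature: fraction of black pixels in one partial image region."""
    total = sum(len(line) for line in region)
    return sum(map(sum, region)) / total


def weighted_distance(features: Sequence[float], standard: Sequence[float],
                      groups: Sequence[int], weights: Sequence[float]) -> float:
    """Accumulate a city-block distance per group, then apply each group's weight."""
    per_group = [0.0] * len(weights)
    for f, s, g in zip(features, standard, groups):
        per_group[g] += abs(f - s)
    return sum(w * d for w, d in zip(weights, per_group))


def recognize(image: Image, dictionary: Dict[str, Sequence[float]],
              groups: Sequence[int], weights: Sequence[float],
              rows: int = 2, cols: int = 2) -> str:
    """Return the code whose standard features are nearest (smallest distance)."""
    features = [black_ratio(r) for r in divide_regions(image, rows, cols)]
    return min(dictionary,
               key=lambda code: weighted_distance(features, dictionary[code],
                                                  groups, weights))


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
dictionary = {"A": [1.0, 0.0, 0.0, 0.75], "B": [0.0, 1.0, 1.0, 0.0]}
result = recognize(image, dictionary, groups=[0, 0, 1, 1], weights=[1.0, 0.5])
```

Because a distance is used as the similarity measure in this sketch, the "highest similarity" corresponds to the smallest weighted distance.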

2. The character recognition device according to claim 1, wherein the partial image region classifying means includes a peripheral/central classifying unit that classifies the partial image regions into a first group in the peripheral portion and a second group in the central portion of the recognition target character image, and the similarity calculating means includes: an elementary similarity calculating unit that calculates, separately for the first group and the second group, the elementary similarity, which is a city-block distance, a Euclidean distance, or a Mahalanobis distance, between the features of the partial character images and the corresponding standard features of the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the contribution of the first group to the similarity is greater than the contribution of the second group.
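The peripheral/central grouping and the per-group distances of claim 2 might look like the sketch below. It assumes, purely for illustration, that the regions form a grid whose border cells are the peripheral first group and whose interior cells are the central second group; the claim itself does not fix that boundary. The Mahalanobis option is omitted because it would need covariance data not given here, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def peripheral_central_groups(rows: int, cols: int) -> List[int]:
    """Row-major group index per region: 0 = peripheral (border), 1 = central."""
    return [
        0 if r in (0, rows - 1) or c in (0, cols - 1) else 1
        for r in range(rows)
        for c in range(cols)
    ]


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def totaled_similarity(features: Sequence[float], standard: Sequence[float],
                       groups: Sequence[int],
                       metric: Callable[[Sequence[float], Sequence[float]], float],
                       coefficients: Sequence[float] = (1.0, 0.5)) -> float:
    """Compute the metric separately per group, then sum with per-group coefficients."""
    total = 0.0
    for g, coeff in enumerate(coefficients):
        a = [f for f, gi in zip(features, groups) if gi == g]
        b = [s for s, gi in zip(standard, groups) if gi == g]
        total += coeff * metric(a, b)
    return total


groups = peripheral_central_groups(4, 4)           # 16 regions, as in the abstract
features = [0.0] * 16
standard = [1.0 if g == 1 else 0.0 for g in groups]  # differs only in the centre
d_city = totaled_similarity(features, standard, groups, city_block)
d_eucl = totaled_similarity(features, standard, groups, euclidean)
```

A larger coefficient on the first group means that a mismatch in the peripheral regions moves the total more than an equal mismatch in the central regions.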

3. The character recognition device according to claim 1, wherein the partial image region classifying means includes a peripheral/central classifying unit that classifies the partial image regions into a first group in the peripheral portion and a second group in the central portion, and the similarity calculating means includes: a candidate character selection unit that calculates the elementary similarity of the first group, which is a city-block distance, a Euclidean distance, or a Mahalanobis distance between the features of the partial character images and the corresponding standard features of the recognition dictionary, and selects from the recognition dictionary a group of characters ranked highest in elementary similarity as candidate characters; a second-group elementary similarity calculating unit that calculates the elementary similarity of the second group for the candidate characters selected by the candidate character selection unit; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the contribution of the first group to the similarity is greater than the contribution of the second group.
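The two-stage selection of claim 3 can be illustrated as follows: first-group distances are computed for every dictionary entry, and second-group distances only for the best-ranked candidates. The function name, the `top_k` parameter, the city-block distance, and the sample data are assumptions made for this sketch.

```python
from typing import Dict, List, Sequence, Tuple


def recognize_two_stage(features: Sequence[float],
                        dictionary: Dict[str, Sequence[float]],
                        groups: Sequence[int],
                        top_k: int = 2,
                        coefficients: Sequence[float] = (1.0, 0.5)
                        ) -> Tuple[str, List[str]]:
    """Return (recognized code, candidate codes chosen in the first stage)."""

    def group_distance(standard: Sequence[float], g: int) -> float:
        # City-block distance restricted to the regions of group g.
        return sum(abs(f - s)
                   for f, s, gi in zip(features, standard, groups) if gi == g)

    # Stage 1: rank all characters by first-group distance and keep the top_k.
    first = {code: group_distance(std, 0) for code, std in dictionary.items()}
    candidates = sorted(first, key=first.get)[:top_k]

    # Stage 2: second-group distance for the candidates only, then weighted sum.
    totals = {
        code: coefficients[0] * first[code]
        + coefficients[1] * group_distance(dictionary[code], 1)
        for code in candidates
    }
    return min(totals, key=totals.get), candidates


features = [1.0, 0.0, 1.0, 0.0]
groups = [0, 0, 1, 1]
dictionary = {
    "X": [1.0, 0.0, 0.0, 1.0],
    "Y": [1.0, 0.25, 1.0, 0.0],
    "Z": [0.0, 1.0, 1.0, 0.0],
}
best, candidates = recognize_two_stage(features, dictionary, groups)
```

In this example "Z" is dropped after the first stage, so its second-group distance is never computed.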

4. The character recognition device according to claim 1, wherein the partial image region classifying means includes a peripheral/central classifying unit that classifies the partial image regions into a first group in the peripheral portion and a second group in the central portion of the recognition target character image, the device further comprising: a character group classification dictionary in which a standard image group of characters having similar shapes is divided into N (N ≧ 2) partial image regions and the character group standard features of the partial character images included in the respective partial image regions are registered in advance together with the character code group of each similar-character unit; and character group selecting means for calculating the similarity between the features of the partial character images of the recognition target character image classified into the first group and the corresponding character group standard features registered in the character group classification dictionary, and selecting the character codes of a character group with high similarity, wherein the similarity calculating means calculates the similarity with the features of the partial character images of the recognition target only for the standard features registered in the recognition dictionary that correspond to the character codes selected by the character group selecting means, and the similarity calculating means includes: an elementary similarity calculating unit that calculates, separately for the first group and the second group, the elementary similarity, which is a city-block distance, a Euclidean distance, or a Mahalanobis distance, between the features of the partial character images and the corresponding standard features of the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the contribution of the first group to the similarity is greater than the contribution of the second group.
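The narrowing by a character group classification dictionary in claim 4 can be sketched as below. The data layout (a list of entries with `codes` and first-group `features`), the labels, and the function names are hypothetical; the sketch also selects a single best group, whereas the claim only requires a group with high similarity.

```python
from typing import Dict, List, Sequence


def select_character_group(first_group_features: Sequence[float],
                           group_dictionary: List[dict]) -> List[str]:
    """Return the codes of the similar-character unit nearest in first-group features."""
    def distance(entry: dict) -> float:
        return sum(abs(f - s)
                   for f, s in zip(first_group_features, entry["features"]))
    return min(group_dictionary, key=distance)["codes"]


def recognize_in_group(features: Sequence[float], groups: Sequence[int],
                       group_dictionary: List[dict],
                       dictionary: Dict[str, Sequence[float]],
                       coefficients: Sequence[float] = (1.0, 0.5)) -> str:
    """Compare detailed standard features only for codes of the selected group."""
    first = [f for f, g in zip(features, groups) if g == 0]
    codes = select_character_group(first, group_dictionary)

    def total(code: str) -> float:
        return sum(coefficients[g] * abs(f - s)
                   for f, s, g in zip(features, dictionary[code], groups))

    return min(codes, key=total)


features = [1.0, 0.0, 0.5, 0.5]
groups = [0, 0, 1, 1]
group_dictionary = [
    {"codes": ["A1", "A2"], "features": [0.9, 0.1]},
    {"codes": ["B1", "B2"], "features": [0.0, 1.0]},
]
dictionary = {
    "A1": [1.0, 0.0, 0.5, 0.5],
    "A2": [1.0, 0.0, 0.0, 1.0],
    "B1": [0.0, 1.0, 0.5, 0.5],
    "B2": [0.0, 1.0, 0.0, 0.0],
}
best = recognize_in_group(features, groups, group_dictionary, dictionary)
```

The entries "B1" and "B2" are never compared in detail, which is the point of the preliminary group selection.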

5. The character recognition device according to claim 1, wherein the partial image region classifying means includes: a partial character image ratio calculating unit that calculates, for each partial image region, the ratio of the partial character image in that partial image region; and a normal/crushed portion determination unit that classifies the partial image regions into a first group of normal portions and a second group of crushed portions according to whether the partial character image ratio is less than, or not less than, a threshold value.
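The ratio-based classification of claim 5 can be sketched as follows. Two points are assumptions made here rather than statements from the claim: the "ratio of the partial character image" is taken to be the fraction of black pixels in the region, and a region whose ratio is at or above the threshold is treated as crushed. The default of 0.75 follows the 75% threshold value of claim 10; the function name is hypothetical.

```python
from typing import List

Region = List[List[int]]  # 1 = black (character) pixel, 0 = white pixel


def classify_normal_crushed(regions: List[Region],
                            threshold: float = 0.75) -> List[int]:
    """Return 0 (first group, normal) or 1 (second group, crushed) per region."""
    groups = []
    for region in regions:
        pixels = [p for line in region for p in line]
        ratio = sum(pixels) / len(pixels)  # assumed meaning of the ratio
        # Assumption: a mostly black region has lost its strokes, i.e. is crushed.
        groups.append(1 if ratio >= threshold else 0)
    return groups


regions = [
    [[1, 1], [1, 1]],  # ratio 1.00
    [[1, 0], [0, 0]],  # ratio 0.25
    [[1, 1], [1, 0]],  # ratio 0.75
]
groups = classify_normal_crushed(regions)
```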

6. The character recognition device according to claim 5, wherein the similarity calculating means includes: an elementary similarity calculating unit that calculates, separately for the first group and the second group, the elementary similarity, which is a city-block distance, a Euclidean distance, or a Mahalanobis distance, between the features of the partial character images and the corresponding standard features of the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the contribution of the first group to the similarity is greater than the contribution of the second group.

7. The character recognition device according to claim 5, wherein the similarity calculating means includes: a candidate character selection unit that calculates the elementary similarity of the first group, which is a city-block distance, a Euclidean distance, or a Mahalanobis distance between the features of the partial character images and the corresponding standard features of the recognition dictionary, and selects from the recognition dictionary a group of characters ranked highest in elementary similarity as candidate characters; a second-group elementary similarity calculating unit that calculates the elementary similarity of the second group for the candidate characters selected by the candidate character selection unit; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the contribution of the first group to the similarity is greater than the contribution of the second group.

8. The character recognition device according to claim 5, further comprising: a character group classification dictionary in which a standard image group of characters having similar shapes is divided into N (N ≧ 2) partial image regions and the character group standard features of the partial character images included in the respective partial image regions are registered in advance together with the character code group of each similar-character unit; and character group selecting means for calculating the similarity between the features of the partial character images of the recognition target character image classified into the first group and the corresponding character group standard features registered in the character group classification dictionary, and selecting the character codes of a character group with high similarity, wherein the similarity calculating means calculates the similarity with the features of the partial character images of the recognition target only for the standard features registered in the recognition dictionary that correspond to the character codes selected by the character group selecting means, and the similarity calculating means includes: an elementary similarity calculating unit that calculates, separately for the first group and the second group, the elementary similarity, which is a city-block distance, a Euclidean distance, or a Mahalanobis distance, between the features of the partial character images and the corresponding standard features of the recognition dictionary; and a similarity totaling unit that multiplies the elementary similarities of the first group and the second group by respective predetermined coefficients and sums them, such that the contribution of the first group to the similarity is greater than the contribution of the second group.

9. The character recognition device according to claim 2, wherein the coefficients by which the similarity totaling unit multiplies are 1.0 for the elementary similarity of the first group and 0.5 for the elementary similarity of the second group, and the recognized character selecting means selects the character with the smallest similarity value as the character with the highest similarity.
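With the coefficients of claim 9, the totaled similarity and the selection rule can be written compactly. The symbols $d_1(c)$, $d_2(c)$, $D(c)$, and $\hat{c}$ are notation introduced here for illustration and do not appear in the claims; $d_1(c)$ and $d_2(c)$ denote the elementary similarities (distances) of the first and second groups for a dictionary character $c$.

```latex
D(c) = 1.0 \cdot d_1(c) + 0.5 \cdot d_2(c),
\qquad
\hat{c} = \arg\min_{c} D(c)
```

As a purely hypothetical numerical example, a character with $d_1 = 10$ and $d_2 = 8$ gives $D = 10 + 4 = 14$, while a character with $d_1 = 12$ and $d_2 = 2$ gives $D = 12 + 1 = 13$, so the second character would be selected because its total is smaller.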

10. The character recognition device according to claim 5, wherein said threshold value is 75%.

11. A character recognition method for recognizing a character image to be recognized as a character code, comprising: an image region dividing step of dividing the recognition target character image into N (N ≧ 2) partial image regions; a feature extraction step of extracting a feature of the partial character image included in each partial image region divided in the image region dividing step; a partial image region classifying step of classifying the partial image regions into a plurality of groups; a similarity calculating step of calculating, for each group, the elementary similarity between the features of the partial character images of the recognition target character image and the corresponding standard features of a recognition dictionary, in which a standard image of each character is divided into N partial image regions and the standard features of the partial character images included in the respective partial image regions are registered in advance, character by character, together with character codes, weighting the elementary similarity of each group by a predetermined weight, and calculating the similarity between the recognition target character image and each character in the recognition dictionary; and a recognized character selecting step of selecting the character code of the character with the highest similarity as the recognized character.

12. The character recognition method according to claim 11, wherein the partial image region classifying step includes a peripheral/central classifying sub-step of classifying the partial image regions into a first group in the peripheral portion and a second group in the central portion of the recognition target character image, and the similarity calculating step includes: an elementary similarity calculating sub-step of calculating, separately for the first group and the second group, the elementary similarity, which is a city-block distance, a Euclidean distance, or a Mahalanobis distance, between the features of the partial character images and the corresponding standard features of the recognition dictionary; and a similarity totaling sub-step of multiplying the elementary similarities of the first group and the second group by respective predetermined coefficients and summing them, such that the contribution of the first group to the similarity is greater than the contribution of the second group.

13. The character recognition method according to claim 11, wherein the partial image region classifying step includes: a partial character image ratio calculating sub-step of calculating, for each partial image region, the ratio of the partial character image in that partial image region; and a normal/crushed portion determining sub-step of classifying the partial image regions into a first group of normal portions and a second group of crushed portions according to whether the partial character image ratio is less than, or not less than, a threshold value.

14. A computer-readable recording medium having recorded thereon a program that uses a recognition dictionary, in which a standard image of each character is divided into N (N ≧ 2) partial image regions and the standard features of the partial character images included in the respective partial image regions are registered in advance, character by character, together with character codes, the program comprising: an image region dividing step of dividing a recognition target character image into N partial image regions; a feature extraction step of extracting a feature of the partial character image included in each partial image region divided in the image region dividing step; a partial image region classifying step of classifying the partial image regions divided in the image region dividing step into a plurality of groups; a similarity calculating step of calculating, for each group, the elementary similarity between the features of the partial character images of the recognition target character image and the corresponding standard features of the recognition dictionary, weighting the elementary similarity of each group by a predetermined weight, and calculating the similarity between the recognition target character image and each character in the recognition dictionary; and a recognized character selecting step of selecting the character code of the character with the highest similarity as the recognized character.

15. The computer-readable recording medium according to claim 14, wherein the partial image region classifying step includes: a partial character image ratio calculating sub-step of calculating, for each partial image region, the ratio of the partial character image in that partial image region; and a normal/crushed portion determining sub-step of classifying the partial image regions into a first group of normal portions and a second group of crushed portions according to whether the partial character image ratio is less than, or not less than, a threshold value.