JP2000155806A

JP2000155806A - Character recognition method and its device and dictionary preparation method and its device and character quality judgment method and recording medium

Info

Publication number: JP2000155806A
Application number: JP10328980A
Authority: JP
Inventors: 立 ▲せん▼; Ritsu Sen; Toshihiro Suzuki; 俊博鈴木
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1998-11-19
Filing date: 1998-11-19
Publication date: 2000-06-06

Abstract

PROBLEM TO BE SOLVED: To recognize characters quickly and with high accuracy, from high-quality to low-quality character images, even in a multi-font environment. SOLUTION: A character quality determination parameter detection unit 103 detects character quality determination parameters of an input character image, including the variance of the local maxima of its distance-transformed image. A condition determination unit 104 evaluates conditions on these parameters to judge whether the input character image is "high quality" or "low quality". A dictionary selection unit 108 selects a high-quality character dictionary 112 or a low-quality character dictionary 114 according to the judged quality, and a matching unit 107 matches the input character image against the selected dictionary.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field of the invention]

[0001] The present invention relates to character recognition technology.

[Prior art]

[0002] Character recognition uses feature quantities of a character image, but these feature quantities vary considerably with the quality of the character image. An example of a character recognition technique that takes character image quality into account is described in Japanese Patent Application Laid-Open No. 7-57057. In this prior art, a plurality of masks representing features of the same character pattern are stored in a dictionary. The input image is matched against the dictionary to obtain the recognition candidate categories up to the Nth rank, together with their distance values and dictionary mask numbers. The quality of the character image is determined from these, and the candidate categories are screened according to the character quality.

[0003] As a general method for determining the quality of a character image, a method that uses the character line width is also known.

[Problems to be solved by the invention]

[0004] In the technique of the aforementioned Japanese Patent Application Laid-Open No. 7-57057, character quality is determined after the feature quantities extracted from the character image have been matched against the dictionary. Because masks of different categories are stored in the dictionary, there is a concern that interference between categories lowers the accuracy of the quality determination. In addition, storing a plurality of masks for the same character pattern enlarges the dictionary, and matching against a large dictionary takes time, which may reduce the overall processing speed.

[0005] The character quality determination method that uses the character line width can be expected to reach a certain degree of accuracy in an environment that handles only one font. In a multi-font environment that handles several fonts with different line widths, however, highly accurate determination cannot be expected.

[0006] Accordingly, an object of the present invention is to provide a character recognition method and apparatus that can recognize character images accurately and quickly, from high-quality to low-quality images, even in a multi-font environment. Another object of the present invention is to provide a method and apparatus for creating the dictionaries used by such a character recognition apparatus. A further object of the present invention is to provide a character quality determination method suited to multi-font character recognition apparatuses and the like.

[Means for solving the problems]

[0007] The character recognition method of the present invention comprises: a character image input step of inputting a character image to be recognized; a parameter detection step of detecting, from the input character image, character quality determination parameters for determining its quality; a condition determination step of determining the quality of the input character image by evaluating conditions on the detected character quality determination parameters; a dictionary selection step of selecting a previously prepared high-quality character dictionary when the condition determination step judges the image to be high quality, and a previously prepared low-quality character dictionary when it judges the image to be low quality; a character recognition step of recognizing the input character image using the selected dictionary; and a step of outputting the recognition result. The method is characterized in that the character quality determination parameters detected in the parameter detection step include the variance of the local maxima of the distance-transformed image of the input character image. According to a preferred aspect of the present invention, the character quality determination parameters detected in the parameter detection step further include one or more of the following: the ratio of black pixels to all pixels in the input character image; the number of black pixels of the input character image divided by its contour length; and the contour length of the input character image divided by the number of black pixels that hold a local maximum of its distance-transformed image.

[0008] The character recognition apparatus of the present invention comprises: a character image input unit that inputs a character image to be recognized; a character quality determination unit that detects, from the input character image, character quality determination parameters for determining its quality and determines the quality of the input character image by evaluating conditions on the detected parameters; a low-quality character dictionary; a high-quality character dictionary; a character recognition unit that recognizes the input character image using the high-quality character dictionary when the character quality determination unit judges the image to be high quality, and using the low-quality character dictionary when it judges the image to be low quality; and an output unit that outputs the recognition result of the character recognition unit. The apparatus is characterized in that the character quality determination parameters detected by the character quality determination unit include the variance of the local maxima of the distance-transformed image of the input character image.

[0009] The dictionary creation method of the present invention comprises: a character image input step of inputting a character image for dictionary creation; a parameter detection step of detecting, from the input character image, character quality determination parameters for determining its quality; a condition determination step of determining the quality of the input character image by evaluating conditions on the detected parameters; a step of creating a high-quality character dictionary by using input images judged to be high quality in the condition determination step as training images for that dictionary; and a dictionary creation step of creating a low-quality character dictionary by using input images judged to be low quality in the condition determination step as training images for that dictionary. The method is characterized in that the character quality determination parameters detected in the parameter detection step include the variance of the local maxima of the distance-transformed image of the input character image.

[0010] The dictionary creation apparatus of the present invention comprises: a character image input unit that inputs a character image for dictionary creation; a character quality determination unit that detects, from the input character image, character quality determination parameters for determining its quality and determines the quality of the input character image by evaluating conditions on the detected parameters; and a dictionary creation unit that creates a high-quality character dictionary by using input images judged to be high quality by the character quality determination unit as training images for that dictionary, and creates a low-quality character dictionary by using input images judged to be low quality as training images for that dictionary. The apparatus is characterized in that the character quality determination parameters detected by the character quality determination unit include the variance of the local maxima of the distance-transformed image of the input character image.

[0011] The character quality determination method of the present invention is characterized in that the variance of the local maxima of the distance-transformed image of a character image is used as a parameter for determining the quality of that character image.

[Embodiments of the invention]

[0012] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a character recognition / dictionary creation apparatus according to one embodiment of the present invention. This apparatus combines the function of a character recognition apparatus with the function of creating dictionaries for character recognition, and is configured so that the functional parts common to both are shared.

[0013] In FIG. 1, the character image input unit 100 inputs a character image to be recognized, or a training character image for dictionary creation, from an image input device such as a scanner, from a storage device, or from a communication line. The character quality determination unit 102 determines the quality of the character image input from the character image input unit 100. It consists of a character quality determination parameter detection unit 103, which detects the character quality determination parameters of the input character image, and a condition determination unit 104, which evaluates conditions on the detected parameters to judge whether the input character image is "high quality" or "low quality". The determination result is sent to the character recognition unit 106 and the dictionary creation unit 110.

[0014] The character recognition unit 106 recognizes the input character image using either the high-quality character dictionary 112 or the low-quality character dictionary 114. It consists of a dictionary selection unit 108, which selects the high-quality character dictionary 112 or the low-quality character dictionary 114 according to the character quality determined by the character quality determination unit 102, and a matching unit 107, which performs feature matching between the selected dictionary and the input character image. The output unit 116 outputs the recognition result of the matching unit 107 to an output device such as a printer or display, to a medium such as a storage device or floppy disk, or to a communication line.

[0015] The dictionary creation unit 110 creates the high-quality character dictionary 112 and the low-quality character dictionary 114, using character images input through the character image input unit 100 as training images. An input character image judged to be high quality by the character quality determination unit 102 is used as a training image for the high-quality character dictionary 112, and an input character image judged to be low quality is used as a training image for the low-quality character dictionary 114.

[0016] The processing is now described more specifically. First, the processing performed as a character recognition apparatus is described with reference to the flowchart shown in FIG. 2.

[0017] In FIG. 2, the character image to be recognized is input through the character image input unit 100 (step 200). The character quality determination parameter detection unit 103 of the character quality determination unit 102 detects one or more kinds of character quality determination parameters from this input character image (steps 202 to 207).

[0018] The detection of the essential character quality determination parameter is described first. A distance-transformed image is created from the input character image (step 202). A distance-transformed image is an image in which each black pixel of the character image is labeled with the distance to its nearest white pixel. FIG. 3(a) shows an example of the distance-transformed image of a high-quality character image of the character "灌". Next, the local maxima of the distance-transformed image are calculated (step 203). These local maxima are the local maxima of the labels attached to the black pixels of the distance-transformed image. FIG. 3(b) shows the image of the local maxima of the distance-transformed image of FIG. 3(a). Such a local-maximum image is also called a skeleton in the technical field to which the present invention relates. Next, the variance of the local maxima of the distance-transformed image is calculated (step 204). This variance of the local maxima is the essential character quality determination parameter.
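Steps 202 to 204 can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: the text does not fix the distance metric or the exact definition of a local maximum, so the sketch assumes a city-block (4-neighbour) distance, treats the area outside the image as white, counts a black pixel as a local maximum when its label is not smaller than any of its 8 neighbours, and uses the population variance. The function names `distance_transform`, `local_maxima`, and `maxima_variance` are hypothetical.

```python
from collections import deque
from statistics import pvariance


def distance_transform(img):
    """Label each black pixel (1) with its city-block distance to the
    nearest white pixel (0). Pixels outside the image count as white
    (an assumption; the source does not say)."""
    h, w = len(img), len(img[0])
    # Work on a copy padded with a one-pixel white border.
    dist = [[None] * (w + 2) for _ in range(h + 2)]
    queue = deque()
    for y in range(h + 2):
        for x in range(w + 2):
            inside = 1 <= y <= h and 1 <= x <= w
            if not inside or img[y - 1][x - 1] == 0:
                dist[y][x] = 0
                queue.append((y, x))
    # Multi-source breadth-first search outward from all white pixels.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h + 2 and 0 <= nx < w + 2 and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return [row[1:-1] for row in dist[1:-1]]  # strip the padding


def local_maxima(dist):
    """Return the labels of black pixels that are not smaller than any
    of their 8 neighbours (assumed definition of a local maximum)."""
    h, w = len(dist), len(dist[0])
    values = []
    for y in range(h):
        for x in range(w):
            d = dist[y][x]
            if d == 0:
                continue  # white pixel
            neighbours = [
                dist[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(d >= n for n in neighbours):
                values.append(d)
    return values


def maxima_variance(img):
    """Variance of the local maxima of the distance-transformed image."""
    values = local_maxima(distance_transform(img))
    return pvariance(values) if values else 0.0
```

Under these assumptions a stroke of uniform width yields a variance of zero, while a stroke that merges into a filled blob yields local maxima of differing size and therefore a larger variance.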

[0019] Even with this essential character quality determination parameter alone, the quality of a character image can be determined more accurately across various fonts than with the method that uses the character line width. In other words, multiple fonts can be handled. This is explained with reference to FIG. 8. FIG. 8 shows, for five fonts, character images of "灌" of various qualities sorted in descending order of the variance of the local maxima of their distance-transformed images. FIG. 8 confirms that, for every font, the character images are sorted in an orderly manner according to quality, from crushed character images to faded character images. This means that the variance of the local maxima of the distance-transformed image is an excellent character quality determination parameter even in a multi-font environment.

[0020] As described above, character quality can be determined with high accuracy using only the variance of the local maxima of the distance-transformed image. To improve the determination accuracy further, the three parameters described next are also detected. The user can specify, as needed, whether these three parameters are used and which one or more of them are used.

[0021] As one of these parameters, the ratio of black pixels to all pixels of the input character image is calculated (step 205). As another parameter, the number of black pixels of the input character image divided by the contour length is calculated (step 206). Here, the contour length (more precisely, the black-pixel contour length) is the number of black pixels that are adjacent to a white pixel. As a further parameter, the contour length divided by the number of black pixels that hold a local maximum of the distance-transformed image is calculated (step 207).
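The three optional parameters of steps 205 to 207 can be sketched as follows. This is a sketch under assumptions: adjacency is taken to mean 4-neighbour adjacency, the area outside the image is treated as white, and the number of black pixels holding a local maximum of the distance-transformed image is passed in as `num_maxima_pixels` rather than recomputed here. The function and key names are hypothetical.

```python
def contour_length(img):
    """Number of black pixels (1) with at least one white 4-neighbour.
    Pixels outside the image count as white (an assumption)."""
    h, w = len(img), len(img[0])

    def is_white(y, x):
        return not (0 <= y < h and 0 <= x < w) or img[y][x] == 0

    return sum(
        1
        for y in range(h)
        for x in range(w)
        if img[y][x] == 1
        and any(is_white(y + dy, x + dx)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    )


def auxiliary_parameters(img, num_maxima_pixels):
    """Compute the three optional quality parameters for a binary image."""
    total = len(img) * len(img[0])
    black = sum(sum(row) for row in img)
    contour = contour_length(img)
    if contour == 0 or num_maxima_pixels == 0:
        raise ValueError("image contains no black pixels")
    return {
        "black_ratio": black / total,                       # step 205
        "black_per_contour": black / contour,               # step 206
        "contour_per_maximum": contour / num_maxima_pixels, # step 207
    }
```

For a filled 3 x 3 square centred in a 5 x 5 image, for example, the contour length is 8 because only the centre pixel has no white neighbour.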

[0022] The condition determination unit 104 of the character quality determination unit 102 evaluates conditions on the character quality determination parameters detected as described above and judges whether the input character image is "high quality" or "low quality" (step 210). Table 1 below shows an example of these determination conditions.

[0023]

[Table 1]
Condition 1: the variance of the local maxima of the distance-transformed image is greater than the threshold 1.
Condition 2: the ratio of black pixels to all pixels is greater than the threshold 0.4.
Condition 3: the number of black pixels divided by the contour length is greater than the threshold 2.
Condition 4: the contour length divided by the number of black pixels holding a local maximum of the distance-transformed image is greater than the threshold 1.5.

That is, the image is judged "low quality" when conditions 1 to 4 all hold, and "high quality" when even one of them fails. However, only the conditions for the parameters actually in use are evaluated. For example, when only the essential "variance of the local maxima of the distance-transformed image" is used as the character quality determination parameter, the character quality is determined solely by whether condition 1 holds. When the "ratio of black pixels to all pixels" is used in addition, the image is judged "low quality" only when conditions 1 and 2 both hold, and "high quality" otherwise.
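The decision rule of step 210 can be sketched as follows, using the example thresholds given in Table 1 (1, 0.4, 2, and 1.5). The dictionary keys and the function name `judge_quality` are hypothetical; only the thresholds and the all-conditions-must-hold rule come from the text.

```python
# Example thresholds from Table 1; the key names are illustrative only.
THRESHOLDS = {
    "maxima_variance": 1.0,      # variance of the local maxima
    "black_ratio": 0.4,          # black pixels / all pixels
    "black_per_contour": 2.0,    # black pixels / contour length
    "contour_per_maximum": 1.5,  # contour length / local-maximum pixels
}


def judge_quality(params):
    """Return "low" when every supplied parameter exceeds its threshold,
    otherwise "high". Only the parameters present in `params` are
    evaluated; the variance is always required."""
    if "maxima_variance" not in params:
        raise ValueError("the variance of the local maxima is essential")
    unknown = set(params) - set(THRESHOLDS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    all_hold = all(value > THRESHOLDS[name] for name, value in params.items())
    return "low" if all_hold else "high"
```

A call such as `judge_quality({"maxima_variance": 1.2, "black_ratio": 0.3})` returns `"high"`, because the condition on the black-pixel ratio fails even though the condition on the variance holds.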

[0024] FIG. 4(a) shows the distance-transformed image of a crushed (low-quality) character image of the character "灌", and FIG. 4(b) shows the image of its local maxima. Table 2 below shows examples of the character quality determination parameters for this character image and for the high-quality character image of "灌" that corresponds to the distance-transformed image of FIG. 3(a).

[0025]

[Table 2]

As is clear from comparing Table 2 with Table 1, for the character image corresponding to FIG. 3(a) each parameter is smaller than its threshold. Conditions 1 to 4 therefore all fail, and the image is judged "high quality". For the character image corresponding to FIG. 4(a), each parameter is larger than its threshold, so conditions 1 to 4 all hold and the image is judged "low quality".

[0026] Next, the dictionary selection unit 108 of the character recognition unit 106 selects the low-quality character dictionary 114 when the input character image has been judged "low quality" (step 211), and selects the high-quality character dictionary 112 when it has been judged "high quality" (step 213). The matching unit 107 performs feature matching between the dictionary selected by the dictionary selection unit 108 (114 or 112) and the input character image and decides the recognition result (step 213). Naturally, after several candidates have been obtained by feature matching, post-processing such as knowledge processing may be applied to decide the final recognition result. The recognition result is output by the output unit 116 (step 214).
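The selection and matching flow can be sketched as follows. The text does not specify the feature quantities or the matching metric, so this sketch assumes that each dictionary maps a character category to one reference feature vector and that matching picks the category at the smallest Euclidean distance. All names are hypothetical.

```python
import math


def select_dictionary(quality, high_dict, low_dict):
    """Pick the dictionary that corresponds to the judged quality."""
    return low_dict if quality == "low" else high_dict


def match(features, dictionary):
    """Return the category whose reference vector is nearest to
    `features` (Euclidean distance; an assumed metric)."""
    return min(dictionary,
               key=lambda label: math.dist(features, dictionary[label]))


def recognize(features, quality, high_dict, low_dict):
    """Select a dictionary by quality, then match against it only."""
    return match(features, select_dictionary(quality, high_dict, low_dict))
```

Because only one of the two dictionaries is searched for each input, the matching cost depends on the size of the selected dictionary rather than on the combined size of both.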

[0027] As described above, according to the present invention, the quality of a character image is determined with high accuracy even in a multi-font environment, and one of two dictionaries (112 or 114) is used according to the determined character quality. Because each dictionary is prepared specifically for recognizing high-quality characters or low-quality characters, both high-quality and low-quality character images can be recognized with high accuracy. Moreover, each dictionary can generally be made smaller than a dictionary built to cover every character quality. Consequently, recognizing characters with a dedicated dictionary chosen according to character quality, as in the present invention, allows a higher processing speed than a configuration that uses one large common dictionary regardless of character quality.

[0028] The processing that the character recognition / dictionary creation apparatus of the present invention performs as a dictionary creation apparatus is described with reference to the flowchart shown in FIG. 5.

[0029] In FIG. 5, a character image serving as a training image for dictionary creation is first input through the character image input unit 100 (step 300). The character quality determination parameter detection unit 103 of the character quality determination unit 102 detects the character quality determination parameters from the input character image (step 301). The content of this detection processing and of the detected parameters is as described in connection with steps 202 to 207 of FIG. 2. The condition determination unit 104 of the character quality determination unit 102 evaluates conditions on the detected parameters and judges whether the input character image is "high quality" or "low quality" (step 302). The content of this condition determination is as described in connection with step 210 of FIG. 2. When the input character image is judged to be a low-quality character, the dictionary creation unit 110 uses it as a training image for low-quality characters to create the low-quality character dictionary 114 (step 303). When it is judged to be high quality, the input character image is used as a training image for high-quality characters to create the high-quality character dictionary 112 (step 304).
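The routing of training images in steps 302 to 304 can be sketched as follows. The text does not describe how a dictionary is built from its training images, so this sketch assumes that a dictionary stores the mean feature vector of each category. The quality judgment is passed in as a callable `judge`, and all names are hypothetical.

```python
from collections import defaultdict


def build_dictionaries(samples, judge):
    """Split training samples by judged quality and build one dictionary
    per quality class.

    samples: iterable of (label, features, quality_params) tuples.
    judge:   callable mapping quality_params to "high" or "low".
    Returns (high_dict, low_dict), each mapping label -> mean vector.
    """
    buckets = {"high": defaultdict(list), "low": defaultdict(list)}
    for label, features, quality_params in samples:
        buckets[judge(quality_params)][label].append(features)

    def mean_vectors(groups):
        # Assumed dictionary form: per-category mean of the feature vectors.
        return {
            label: tuple(sum(column) / len(column) for column in zip(*vectors))
            for label, vectors in groups.items()
        }

    return mean_vectors(buckets["high"]), mean_vectors(buckets["low"])
```

With this arrangement a low-quality sample never contributes to the high-quality dictionary, and vice versa, which is the separation the paragraph describes.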

[0030] As described above, the character quality of a character image can be determined with high accuracy. According to the result of this determination, each input character image is used either as a training image for high-quality characters or as a training image for low-quality characters when the dictionaries are created. A high-quality character dictionary 112 that enables accurate recognition of high-quality characters and a low-quality character dictionary 114 that enables accurate recognition of low-quality characters can therefore be created.

[0031] The method or apparatus of the present invention described above can also be realized in software on a general-purpose computer such as the one shown in FIG. 6, in which a CPU 400, a memory 401, an input interface 402, an output interface 403, an auxiliary storage device 404, and a drive 405 for a computer-readable recording medium 406 such as a floppy disk or CD-ROM are connected by a bus 407. In this case, the program that causes the computer to execute the processing steps described in connection with FIG. 2 or FIG. 5 is, for example, read into the memory 401 through the drive 405 from the recording medium 406 on which it is recorded, or is first stored in the auxiliary storage device 404 and read into the memory 401 at execution time. The high-quality character dictionary and the low-quality character dictionary are stored, for example, in the auxiliary storage device 404 and read into the memory 401 when the processing is executed. The character image to be recognized, or the character image serving as a training image for dictionary creation, is read into the memory 401, for example, from a scanner connected through the input interface 402, from the auxiliary storage device 404, or from the recording medium 406. The recognition result is output, for example, to an output device connected through the output interface 403, to the auxiliary storage device 404, or to the recording medium 406.

[0032] The method or apparatus of the present invention can also be realized as a client-server system such as the one shown schematically in FIG. 7. For example, the program for executing the processing corresponding to the character quality determination unit 102, the character recognition unit 106, and the dictionary creation unit 110 of FIG. 1 is placed on the server computer 500 together with the dictionaries 112 and 114, and the program for executing the processing corresponding to the character image input unit 100 and/or the output unit 116 is placed on the client computers 501 and 502. Character images are then input from the client computers 501 and 502 to the server computer 500, and the character recognition results are output from the server computer 500 to the client computers 501 and 502.

【００３３】[0033]

【発明の効果】以上に述べたように、本発明の文字認識
方法及び装置によれば、マルチフォント環境において
も、高品質の文字画像と低品質の文字画像のいずれに対
しても高速かつ高精度の文字認識が可能になる。本発明
の辞書作成方法及び装置によれば、そのような高速、高
精度の文字認識を可能にするための高品質文字用辞書と
低品質文字用辞書を作成することができる。また、本発
明の文字品質判定方法によれば、マルチフォント環境に
おいても、文字画像の品質を高精度に判定することが可
能になる、等々の効果を得られるものである。As described above, according to the character recognition method and apparatus of the present invention, character recognition can be performed at high speed and with high accuracy on both high-quality and low-quality character images, even in a multi-font environment. According to the dictionary creation method and apparatus of the present invention, a high-quality character dictionary and a low-quality character dictionary that enable such high-speed, high-accuracy character recognition can be created. Further, according to the character quality determination method of the present invention, the quality of a character image can be determined with high accuracy even in a multi-font environment. These and other effects are obtained.
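As an illustration only, the dictionary creation summarized above (learning images judged high quality go to the high-quality character dictionary, the others to the low-quality character dictionary) can be sketched in Python as follows. The function name build_dictionaries, the caller-supplied is_high_quality and extract_features callables, and the choice of averaging feature vectors per character category are all assumptions made for this sketch and are not taken from the embodiment.

```python
from collections import defaultdict


def build_dictionaries(samples, is_high_quality, extract_features):
    """Split learning images by judged quality and build two dictionaries.

    samples          : iterable of (category_label, image) pairs
    is_high_quality  : hypothetical callable, image -> bool (quality judgment)
    extract_features : hypothetical callable, image -> list of floats
    Returns (high_dict, low_dict); each maps a category label to the mean
    feature vector of its learning images (averaging is an assumption).
    """
    groups = {"high": defaultdict(list), "low": defaultdict(list)}
    for label, image in samples:
        key = "high" if is_high_quality(image) else "low"
        groups[key][label].append(extract_features(image))

    def mean_vectors(by_label):
        # Average the feature vectors of one category component by component.
        return {
            label: [sum(column) / len(column) for column in zip(*vectors)]
            for label, vectors in by_label.items()
        }

    return mean_vectors(groups["high"]), mean_vectors(groups["low"])
```

The quality judgment is passed in as a callable so that the same condition determination can be shared between dictionary creation and recognition.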

[Brief description of the drawings]

【図１】本発明の一実施形態としての文字認識／辞書作
成装置の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a character recognition / dictionary creating apparatus according to an embodiment of the present invention.

【図２】文字認識装置としての処理フローを示すフロー
チャートである。FIG. 2 is a flowchart showing a processing flow as a character recognition device.

【図３】文字「灌」の高品質文字画像の距離変換画像と
その極大値画像の例を示す図である。FIG. 3 is a diagram illustrating an example of a distance-converted image of a high-quality character image of the character “灌” and its local maximum value image.

【図４】文字「灌」の潰れ文字画像の距離変換画像とそ
の極大値画像の例を示す図である。FIG. 4 is a diagram illustrating an example of a distance-converted image of a crushed character image of the character “灌” and its local maximum value image.

【図５】辞書作成装置としての処理フローを示すフロー
チャートである。FIG. 5 is a flowchart showing a processing flow as a dictionary creation device.

【図６】本発明をソフトウェアにより実現するためのコ
ンピュータの構成例を示すブロック図である。FIG. 6 is a block diagram illustrating a configuration example of a computer for realizing the present invention by software.

【図７】本発明をソフトウェアにより実現するためのク
ライアント・サーバシステムの概略図である。FIG. 7 is a schematic diagram of a client-server system for realizing the present invention by software.

【図８】５種類のフォントに関し、さまざまな品質の
「灌」の文字画像を、その距離変換画像の極大値の分散
の大きい順にソートした結果を示す図である。FIG. 8 is a diagram showing the result of sorting character images of “灌” of various qualities, in five fonts, in descending order of the variance of the local maximum values of their distance-converted images.

[Explanation of symbols]

１００文字画像入力部１０２文字品質判定部１０３文字品質判定パラメータ検出部１０４条件判定部１０６文字認識部１０７マッチング部１０８辞書選択部１１０辞書作成部１１２高品質文字用辞書１１４低品質文字用辞書１１６出力部
100 Character image input unit
102 Character quality determination unit
103 Character quality determination parameter detection unit
104 Condition determination unit
106 Character recognition unit
107 Matching unit
108 Dictionary selection unit
110 Dictionary creation unit
112 High-quality character dictionary
114 Low-quality character dictionary
116 Output unit

Claims

[Claims]

1. A character recognition method comprising: a character image input step of inputting a character image to be recognized; a parameter detection step of detecting a character quality determination parameter for determining the quality of the character image input in the character image input step; a condition determination step of determining the quality of the input character image by performing a condition determination on the character quality determination parameter detected in the parameter detection step; a dictionary selection step of selecting a high-quality character dictionary prepared in advance when the condition determination step determines that the quality is high, and selecting a low-quality character dictionary prepared in advance when the condition determination step determines that the quality is low; a character recognition step of performing character recognition of the input character image using the dictionary selected in the dictionary selection step; and a recognition result output step of outputting the recognition result of the character recognition step, wherein the character quality determination parameter detected in the parameter detection step includes a variance of the local maximum values of a distance-converted image of the input character image.

2. The character recognition method according to claim 1, wherein the character quality determination parameter detected in the parameter detecting step further includes a ratio of black pixels of the input character image to all pixels.

3. The character recognition method according to claim 1, wherein the character quality determination parameter detected in the parameter detection step further includes a value obtained by dividing the number of black pixels of the input character image by its contour length.

4. The character recognition method according to claim 1, wherein the character quality determination parameter detected in the parameter detection step includes a value obtained by dividing the contour length of the input character image by the number of black pixels having a local maximum value in the distance-converted image of the input character image.

5. A character recognition device comprising: a character image input unit for inputting a character image to be recognized; a character quality determination unit that detects a character quality determination parameter for determining the quality of the character image input from the character image input unit and determines the quality of the input character image by performing a condition determination on the detected character quality determination parameter; a high-quality character dictionary; a low-quality character dictionary; a character recognition unit that performs character recognition of the input character image using the high-quality character dictionary when the character quality determination unit determines that the quality is high, and using the low-quality character dictionary when the character quality determination unit determines that the quality is low; and an output unit that outputs the recognition result of the character recognition unit, wherein the character quality determination parameter detected by the character quality determination unit includes a variance of the local maximum values of a distance-converted image of the input character image.

6. A dictionary creation method comprising: a character image input step of inputting a character image for creating a dictionary; a parameter detection step of detecting a character quality determination parameter for determining the quality of the character image input in the character image input step; a condition determination step of determining the quality of the input character image by performing a condition determination on the character quality determination parameter detected in the parameter detection step; and a dictionary creation step of creating a high-quality character dictionary by using input images determined to be high quality in the condition determination step as learning images for the high-quality character dictionary, and creating a low-quality character dictionary by using input images determined to be low quality in the condition determination step as learning images for the low-quality character dictionary, wherein the character quality determination parameter detected in the parameter detection step includes a variance of the local maximum values of a distance-converted image of the input character image.

7. A dictionary creation device comprising: a character image input unit for inputting a character image for creating a dictionary; a character quality determination unit that detects a character quality determination parameter for determining the quality of the character image input from the character image input unit and determines the quality of the input character image by performing a condition determination on the detected character quality determination parameter; and a dictionary creation unit that creates a high-quality character dictionary using input images determined to be high quality by the character quality determination unit as learning images for the high-quality character dictionary, and creates a low-quality character dictionary using input images determined to be low quality by the character quality determination unit as learning images for the low-quality character dictionary, wherein the character quality determination parameter detected by the character quality determination unit includes a variance of the local maximum values of a distance-converted image of the input character image.

8. A character quality judging method, wherein a variance of a local maximum value of a distance-converted image of a character image is used as a parameter for judging the quality of the character image.

9. A computer-readable recording medium on which is recorded a program for causing a computer to execute the character image input step, the parameter detection step, the condition determination step, the dictionary selection step, the character recognition step, and the recognition result output step according to claim 1, 2, 3, or 4.
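As an illustration only, the character quality determination parameters named in claims 1 to 4 and 8, and the dictionary selection of claim 1, can be sketched in Python as follows. Several points are assumptions made for this sketch rather than definitions taken from the claims: the distance conversion uses a city-block (4-neighbour) distance with everything outside the image treated as white; a local maximum is any black pixel whose distance value is not smaller than that of each of its 4-neighbours; the contour length is approximated by the number of black pixels at distance 1; and the quality condition is left to a caller-supplied predicate is_high_quality. All function names are hypothetical.

```python
from statistics import pvariance

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed 4-neighbourhood


def distance_transform(img):
    """Two-pass city-block distance from each black pixel (1) to the nearest
    white pixel (0). Everything outside the image counts as white."""
    h, w = len(img), len(img[0])
    big = h + w  # larger than any distance that can occur
    d = [[big if px else 0 for px in row] for row in img]
    for y in range(h):  # forward pass: top-left to bottom-right
        for x in range(w):
            if d[y][x]:
                up = d[y - 1][x] if y > 0 else 0
                left = d[y][x - 1] if x > 0 else 0
                d[y][x] = min(d[y][x], up + 1, left + 1)
    for y in range(h - 1, -1, -1):  # backward pass: bottom-right to top-left
        for x in range(w - 1, -1, -1):
            if d[y][x]:
                down = d[y + 1][x] if y < h - 1 else 0
                right = d[y][x + 1] if x < w - 1 else 0
                d[y][x] = min(d[y][x], down + 1, right + 1)
    return d


def local_maxima(d):
    """Distance values of black pixels that are >= all of their 4-neighbours."""
    h, w = len(d), len(d[0])
    values = []
    for y in range(h):
        for x in range(w):
            v = d[y][x]
            if v == 0:
                continue
            around = [d[y + dy][x + dx] for dy, dx in NEIGHBOURS
                      if 0 <= y + dy < h and 0 <= x + dx < w]
            if all(v >= n for n in around):
                values.append(v)
    return values


def quality_parameters(img):
    """Compute the four parameters for a binary image (list of rows of 0/1)."""
    d = distance_transform(img)
    maxima = local_maxima(d)
    black = sum(1 for row in d for v in row if v > 0)
    total = len(img) * len(img[0])
    contour = sum(1 for row in d for v in row if v == 1)  # assumed contour length
    return {
        "maxima_variance": pvariance(maxima) if maxima else 0.0,
        "black_ratio": black / total,
        "black_per_contour": black / contour if contour else 0.0,
        "contour_per_maximum": contour / len(maxima) if maxima else 0.0,
    }


def select_dictionary(params, is_high_quality, high_dict, low_dict):
    """Pick the dictionary according to a caller-supplied quality condition."""
    return high_dict if is_high_quality(params) else low_dict
```

The thresholds and the direction of each comparison are deliberately not fixed here; they belong to the condition determination and would be supplied through is_high_quality.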