JP3763954B2

JP3763954B2 - Learning data creation method and recording medium for character recognition

Info

Publication number: JP3763954B2
Application number: JP34091797A
Authority: JP
Inventors: 秀明山形
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1997-12-11
Filing date: 1997-12-11
Publication date: 2006-04-05
Anticipated expiration: 2017-12-11
Also published as: JPH11175663A

Description

[0001]
[Technical field of the invention]
The present invention relates to a method for creating learning data for character recognition from multi-value character image data, and a recording medium on which a program for executing the creation method is recorded.
[0002]
[Prior art]
Several methods for obtaining the optimum binarization threshold for a multi-value character image have been proposed. For example, there is an optimal binarization method in which a multi-value image is binarized with a plurality of thresholds, (total contour number)² / (number of black pixels) is obtained at each threshold, and the optimum threshold for recognition is determined based on the threshold at which this value is largest (see JP-A-2-24787). There is also a binarization device for image data that obtains the average line width at each threshold when multi-value image data is binarized with a plurality of thresholds, selects the average line width closest to an optimum line width setting value, and determines the binarization threshold based on that average line width (see JP-A-5-282494).
[0003]
[Problems to be solved by the invention]
The conventional methods described above calculate a binarization threshold for obtaining a high-quality binary character image. When creating learning data for character recognition (OCR), however, the learning data must include the low-quality character images that are to be recognized in order to improve robustness in recognizing low-quality documents. As described above, many techniques have been proposed for obtaining a good-quality character image from the OCR recognition target image, but no technique has been proposed for obtaining low-quality character images to be used as learning images.
[0004]
Even among low-quality images, a character image of such extremely low quality that a human cannot make it out is not desirable as learning data. Judging whether a human can identify an image is an extremely difficult problem, partly because of individual differences, but processing close to that judgment is possible by using a parameter that indicates the state of the line segments, (total contour number)² / (number of black pixels), as shown in the former of the publications cited above.
[0005]
The present invention was made against the background of the above circumstances, and its object is to provide a learning data creation method for character recognition, and a recording medium, that create low-quality character images for learning within the range that a human can still identify.
[0006]
[Means for Solving the Problems]
In order to achieve the above object, the invention according to claim 1 is a learning data creation method for character recognition in which a multi-value character image is binarized with a plurality of binarization thresholds and the plurality of binarized character images are used as learning data for character recognition, characterized in that the binarization threshold at which the value of (contour length)² / (number of black pixels) of the character image is maximum is calculated as a blurred image creation threshold, the binarization threshold at which the value is minimum is calculated as a collapsed image creation threshold, and the plurality of binarization thresholds are calculated based on the blurred image creation threshold and the collapsed image creation threshold.
[0007]
The invention according to claim 2 is characterized in that the sizes of the plurality of character images binarized with the plurality of binarization thresholds are compared with the size of the optimally binarized character image, and, among the plurality of character images, those whose difference in size satisfies a predetermined condition are used as learning data for character recognition.
[0008]
According to a third aspect of the present invention, there is provided a computer-readable recording medium on which a program for causing a computer to realize the learning data creation method for character recognition according to the first or second aspect is recorded.
[0009]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
FIG. 1 shows the configuration of an embodiment of the present invention. In the figure, 1 is an image input unit such as a scanner, 2 is a memory for storing the read multi-value character image, 3 is a learning image creation unit for creating learning images, 4 is a blurred image creation threshold value calculation unit, 5 is a collapsed image creation threshold value calculation unit, 6 is an optimal binarization threshold value calculation unit, 7 is an optimal binarized image memory for storing the image binarized with the optimal binarization threshold value, 8 is a noise removing unit for removing noise from the adopted learning images, and 9 is a learning image memory for storing the learning images from which noise has been removed. FIG. 2 is a flowchart of the learning image creation processing according to the present invention.
[0010]
A document or the like is read by the image input unit 1 such as a scanner, and a multi-value (16 gradations, 256 gradations, etc.) character image is input (step 101). Next, the blurred image creation threshold value calculation unit 4 calculates a threshold value for creating a blurred image (step 102). Here, the blurred image creation threshold is the binarization threshold at which the value of (contour length)² / (number of black pixels) is a local maximum, which is larger than the optimal binarization threshold described later, and which, among such thresholds, has the smallest difference from the optimal binarization threshold. The "contour length" of the present invention is synonymous with the "total contour number" described in JP-A-2-24787, so that (contour length)² / (number of black pixels) = (total contour number)² / (number of black pixels). For example, in the case of FIG. 6, the number of black pixels is 3, and the total number of places where a white pixel and a black pixel are adjacent (the contour length) is 8.
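The parameter defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `contour_metric`, the list-of-rows image representation, the use of 4-neighbour adjacency, and the treatment of pixels outside the image as white are not taken from the patent. The three-pixel patch is a guess at an arrangement like the one in FIG. 6; it reproduces the stated counts of 3 black pixels and a contour length of 8.

```python
def contour_metric(binary):
    """Return (contour_length, black_count, contour_length**2 / black_count).

    `binary` is a list of rows of 0/1 values, 1 meaning black.
    Contour length is counted as the number of pixel sides where a black
    pixel touches a white pixel. Pixels outside the image are treated as
    white (an assumption of this sketch).
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    black = 0
    contour = 0
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            black += 1
            # Check the four sides of this black pixel.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if not inside or not binary[ny][nx]:
                    contour += 1
    ratio = contour ** 2 / black if black else 0.0
    return contour, black, ratio


# Three black pixels in a row (hypothetical stand-in for FIG. 6).
patch = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour, black, ratio = contour_metric(patch)
print(contour, black, ratio)
```

Evaluating this function on the image binarized at every candidate threshold gives the curve that the later steps search for local extrema.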
[0011]
FIG. 3 shows how (contour length)² / (number of black pixels) changes with the threshold for the character "量", together with the corresponding changes in the character image. The character image of "量" is drawn in outline. The threshold (THmax) at which the value of (contour length)² / (number of black pixels) is the maximum (max) is the blurred image creation threshold.
[0012]
The collapsed image creation threshold value calculation unit 5 calculates a threshold value for creating a collapsed image (step 103). Here, the collapsed image creation threshold is the binarization threshold at which the value of (contour length)² / (number of black pixels) is a local minimum, which is smaller than the optimal binarization threshold described later, and which, among such thresholds, has the smallest difference from the optimal binarization threshold. The threshold (THmin) shown in FIG. 3, at which the value of (contour length)² / (number of black pixels) is the minimum (min), is the collapsed image creation threshold.
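The selection of the two thresholds in steps 102 and 103 might be sketched as follows. The sketch assumes that the parameter has already been evaluated at every integer threshold and stored in a list `metric`, and that the optimal binarization threshold `t_opt` is an integer index into that list; the function name and the handling of flat regions (plateaus) are choices of this sketch, not details given in the description.

```python
def pick_blur_and_collapse(metric, t_opt):
    """Pick the blurred and collapsed image creation thresholds.

    metric[t] is (contour length)^2 / (number of black pixels) at threshold t.
    Returns (th_max, th_min): the local maximum nearest above t_opt and the
    local minimum nearest below t_opt, or None where no such extremum exists.
    """
    def is_local_max(t):
        return metric[t - 1] < metric[t] >= metric[t + 1]

    def is_local_min(t):
        return metric[t - 1] > metric[t] <= metric[t + 1]

    # Scan upward from t_opt for the first local maximum.
    th_max = next(
        (t for t in range(t_opt + 1, len(metric) - 1) if is_local_max(t)),
        None,
    )
    # Scan downward from t_opt for the first local minimum.
    th_min = next(
        (t for t in range(t_opt - 1, 0, -1) if is_local_min(t)),
        None,
    )
    return th_max, th_min


# Hypothetical metric values for thresholds 0..8, with t_opt = 4.
example_metric = [5, 3, 4, 6, 7, 9, 8, 10, 6]
print(pick_blur_and_collapse(example_metric, 4))
```

Scanning outward from the optimal threshold returns the extremum with the smallest difference from it, which is the condition stated in the text.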
[0013]
Subsequently, the learning image creation unit 3 sets n binarization thresholds between the blurred image creation threshold (THmax) and the collapsed image creation threshold (THmin) (step 104). For example, the m-th binarization threshold TH(m) (m = 0, 1, ..., n−1) is obtained as follows.
[0014]
TH(m) = m (THmax − THmin) / (n − 1)
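A minimal sketch of step 104 follows. As printed, the expression yields the offset m (THmax − THmin) / (n − 1), which starts at 0; so that the resulting values actually lie between THmin and THmax, this sketch adds THmin to each offset. That addition is an assumption of the sketch, as are the function name and the example values.

```python
def intermediate_thresholds(th_min, th_max, n):
    """Return n evenly spaced thresholds from th_min to th_max inclusive.

    Each value is th_min plus the offset m * (th_max - th_min) / (n - 1);
    adding th_min is an assumption of this sketch.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (th_max - th_min) / (n - 1)
    return [th_min + m * step for m in range(n)]


# Hypothetical values: THmin = 80, THmax = 160, n = 5.
thresholds = intermediate_thresholds(80, 160, 5)
print(thresholds)  # [80.0, 100.0, 120.0, 140.0, 160.0]
```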
The optimal binarization threshold value calculation unit 6 calculates the optimal binarization threshold (step 105). The optimal binarization threshold is calculated by the following procedure: (1) the pixels at which the density gradient is a local maximum are found in the multi-value character image; (2) the average of the pixel values of those pixels is obtained and used as the optimal binarization threshold.
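The two-step procedure can be illustrated as below. The description does not say how the density gradient or its local maxima are computed, so everything specific here is an assumption: the gradient is taken as the magnitude of central differences clamped at the border, a pixel counts as a local maximum when its gradient is nonzero and not smaller than any of its 8 neighbours, and the function name `optimal_threshold` is hypothetical.

```python
def optimal_threshold(image):
    """Average the pixel values at local maxima of the density gradient.

    `image` is a list of rows of numeric density values.
    """
    h, w = len(image), len(image[0])

    def grad(y, x):
        # Central differences, clamped at the image border.
        gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
        gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
        return (gx * gx + gy * gy) ** 0.5

    g = [[grad(y, x) for x in range(w)] for y in range(h)]
    picked = []
    for y in range(h):
        for x in range(w):
            if g[y][x] == 0:
                continue  # flat regions carry no edge information
            neighbours = [
                g[ny][nx]
                for ny in range(max(y - 1, 0), min(y + 2, h))
                for nx in range(max(x - 1, 0), min(x + 2, w))
                if (ny, nx) != (y, x)
            ]
            # Step (1): keep pixels whose gradient is a local maximum.
            if all(g[y][x] >= v for v in neighbours):
                picked.append(image[y][x])
    if not picked:
        raise ValueError("image has no density gradient")
    # Step (2): the mean of their pixel values is the threshold.
    return sum(picked) / len(picked)


# A soft vertical edge: the steepest point sits on the value 100.
edge = [[0, 0, 100, 200, 200] for _ in range(3)]
print(optimal_threshold(edge))  # 100.0
```

The idea behind the procedure is that the steepest points of the density profile lie on the stroke edges, so their average value is a natural place to cut between ink and background.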
[0015]
The learning image creation unit 3 binarizes the multi-value character image with the optimal binarization threshold and stores the result in the optimal binarized image memory 7 as the optimal binarized image (step 106). Next, the size of each image binarized with the n thresholds set in step 104 is compared with the size of the optimal binarized image. If the sizes do not differ greatly, the image is adopted as a learning image (No in step 107); if they differ greatly, it is not adopted (Yes in step 107).
[0016]
For example, when the upper-left coordinates of the circumscribed rectangle of the optimal binarized image are (Xss, Yss) and its lower-right coordinates are (Xse, Yse), and the upper-left coordinates of the image binarized with TH(m) are (Xrs, Yrs) and its lower-right coordinates are (Xre, Yre), the image is adopted as a learning image if all of the following conditions are satisfied. FIG. 4 is a diagram for explaining the comparison of size with the optimal binarized image.
[0017]
|Xss - Xrs| < Ts
|Xse - Xre| < Ts
|Yss - Yrs| < Ts
|Yse - Yre| < Ts
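The four conditions amount to comparing the circumscribed rectangles corner by corner. The following sketch shows one way to write that check; the function names, the list-of-rows representation with 1 for black, and the choice to reject an image with no black pixels are assumptions of the sketch.

```python
def bounding_box(binary):
    """Return (x_start, y_start, x_end, y_end) of the black pixels, or None."""
    coords = [
        (x, y)
        for y, row in enumerate(binary)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def similar_size(optimal, candidate, ts):
    """True if every rectangle coordinate differs by less than ts."""
    box_s = bounding_box(optimal)    # (Xss, Yss, Xse, Yse)
    box_r = bounding_box(candidate)  # (Xrs, Yrs, Xre, Yre)
    if box_s is None or box_r is None:
        return False  # an empty image is rejected (assumption)
    return all(abs(s - r) < ts for s, r in zip(box_s, box_r))


optimal_img = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
thick_img = [[1, 1, 1, 1] for _ in range(4)]
print(similar_size(optimal_img, thick_img, ts=2))  # True: each edge moved by 1
print(similar_size(optimal_img, thick_img, ts=1))  # False
```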
The noise removing unit 8 compares each binary image adopted as a learning image with the optimal binarized image and removes noise (step 108). FIG. 5 is a diagram for explaining noise removal. Specifically, any connected component of black pixels in the image binarized at TH(m) that contains none of the black pixels of the optimal binarized image is removed.
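The noise removal of step 108 can be sketched as a connected-component pass over the image binarized at TH(m). The description does not state which connectivity is used, so the 8-neighbour connectivity here is an assumption, as are the function name `remove_noise` and the list-of-rows representation.

```python
from collections import deque


def remove_noise(candidate, optimal):
    """Erase components of `candidate` that share no black pixel with `optimal`.

    Both images are lists of rows of 0/1 values of the same size.
    Components are formed with 8-neighbour connectivity (an assumption).
    """
    h, w = len(candidate), len(candidate[0])
    result = [row[:] for row in candidate]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not candidate[sy][sx] or seen[sy][sx]:
                continue
            # Collect one connected component with breadth-first search.
            component = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if candidate[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep the component only if it overlaps the optimal image.
            if not any(optimal[y][x] for y, x in component):
                for y, x in component:
                    result[y][x] = 0
    return result


optimal_bin = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
candidate_bin = [
    [0, 0, 0, 0, 1],  # isolated speck, absent from the optimal image
    [1, 1, 1, 0, 0],  # thickened stroke that overlaps the optimal image
    [0, 0, 0, 0, 0],
]
cleaned = remove_noise(candidate_bin, optimal_bin)
print(cleaned)
```

The thickened stroke survives because it overlaps the optimal image, while the isolated speck is erased.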
[0018]
Finally, the learning image creation unit 3 outputs a plurality of images from which noise removal has been completed to the memory 9 as learning images (step 109). The plurality of learning binary character images that are output are used to create a pattern dictionary.
[0019]
The present invention is not limited to the above-described embodiments, and can be realized by software. When the present invention is realized by software, a computer system including a CPU, a memory, a display device, a hard disk, a keyboard, a CD-ROM drive, a mouse, etc. is prepared as shown in FIG. A computer-readable recording medium such as a CD-ROM stores a program for realizing the learning data creation function and processing procedure of the present invention. The multi-value character image is stored in, for example, a hard disk. Then, the CPU reads a program for realizing the processing functions and processing procedures described above from the recording medium, creates a binary character image for learning from the multi-value character image read from the hard disk or the like, and writes it to the hard disk or the like.
[0020]
[Effect of the invention]
As described above, according to the present invention, a plurality of learning character images are created from a multi-value character image using a plurality of binarization thresholds that take into account the binarization threshold for creating a blurred image and the binarization threshold for creating a collapsed image. Therefore, even if the recognition target image is a low-quality character image that is blurred or collapsed, it can be recognized with high accuracy.
[Brief description of the drawings]
FIG. 1 shows a configuration of an embodiment of the present invention.
FIG. 2 is a flowchart of learning image creation processing according to the present invention.
FIG. 3 shows the change in (contour length)² / (number of black pixels) with respect to the threshold for the character "量", and the change in the character image.
FIG. 4 is a diagram illustrating a comparison of the size with an optimal binarized image.
FIG. 5 is a diagram illustrating noise removal.
FIG. 6 is a diagram illustrating the number of black pixels and the contour length.
FIG. 7 shows a configuration example when the present invention is realized by software.
[Explanation of symbols]
1 Image input unit
2 Multi-value character image memory
3 Learning image creation unit
4 Blurred image creation threshold value calculation unit
5 Collapsed image creation threshold value calculation unit
6 Optimal binarization threshold value calculation unit
7 Optimal binarized image memory
8 Noise removing unit
9 Learning image memory

Claims

1. A learning data creation method for character recognition in which a multi-value character image is binarized with a plurality of binarization thresholds and the plurality of binarized character images are used as learning data for character recognition, characterized in that the binarization threshold at which the value of (contour length)² / (number of black pixels) of the character image is maximum is calculated as a blurred image creation threshold, the binarization threshold at which the value is minimum is calculated as a collapsed image creation threshold, and the plurality of binarization thresholds are calculated based on the blurred image creation threshold and the collapsed image creation threshold.

2. The learning data creation method for character recognition according to claim 1, wherein the sizes of the plurality of character images binarized with the plurality of binarization thresholds are compared with the size of the optimally binarized character image, and, among the plurality of character images, a character image whose difference in size satisfies a predetermined condition is used as learning data for character recognition.

3. A computer-readable recording medium recording a program for causing a computer to implement the learning data creation method for character recognition according to claim 1.