JP6191286B2

JP6191286B2 - Character recognition apparatus, character recognition method, and computer program for character recognition

Info

Publication number: JP6191286B2
Application number: JP2013141680A
Authority: JP
Inventors: 悠人永田; 悠介野中; 亜希子楠本; 守人塩原
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-07-05
Filing date: 2013-07-05
Publication date: 2017-09-06
Anticipated expiration: 2033-07-05
Also published as: JP2015014940A

Description

The present invention relates to, for example, a character recognition device, a character recognition method, and a computer program for character recognition that recognize characters appearing in an image.

To recognize characters (including symbols and digits) appearing in an image, it is preferable that the character region, in which a character appears, and the background region, in which the background around the character appears, can be accurately distinguished. A character recognition device therefore determines, for example, a binarization threshold based on the luminance distribution of the image and binarizes the image into pixels whose luminance is equal to or higher than the threshold and pixels whose luminance is lower than the threshold, thereby distinguishing the character region from the background region. However, the shadow of another object may fall on the subject bearing the characters to be recognized, for example a license plate. In that case, the image of the subject contains a character region, a background region, and regions in which shadow overlaps a character or the background. Consequently, even if the background is uniform, the luminance distribution becomes multimodal. As a result, it becomes difficult to set an appropriate binarization threshold, and the character region and the background region may not be separated accurately.
To address this, a technique has been proposed that converts a grayscale image into binary images using various binarization thresholds and determines the optimum binarization threshold from the character recognition result, reliability, and position obtained for each threshold (see, for example, Patent Document 1).

Patent Document 1: Japanese Laid-Open Patent Publication No. 8-153163 (JP-A-8-153163)

However, depending on how the shadow falls, the character recognition device may be unable to distinguish the character region from the background region with a single binarization threshold. For example, in an image in which the background region is brighter than the character region, if the region where shadow falls on the background is darker than the character region, the device cannot separate the character region alone from the other regions with one binarization threshold. If the character region is not identified accurately, the device may, for example, mistake a shadowed part of the background for part of a character, producing an incorrect recognition result.

Accordingly, an object of this specification is to provide a character recognition device that can accurately recognize characters from an image of a region containing the characters even when a shadow falls on that region.

According to one embodiment, a character recognition device is provided. The character recognition device includes: a histogram generation unit that generates a histogram of the luminance values of pixels in a predetermined region of an image that contains a character; a division unit that divides the histogram into a plurality of histogram peaks, each of which contains a local maximum of the frequency of luminance values; a binary image generation unit that classifies each of the histogram peaks into either a first group or a second group while varying the combination of peaks assigned to the first group and peaks assigned to the second group, and that generates, for each combination, a binary image of the predetermined region by setting pixels whose luminance values correspond to peaks in the first group to a first pixel value and pixels whose luminance values correspond to peaks in the second group to a second pixel value, thereby generating a plurality of binary images; and a recognition unit that, for each of the binary images, detects the character appearing in that binary image and obtains the likelihood of that character, and that takes, from among the characters detected in the binary images, the character with the highest likelihood as the character contained in the predetermined region.

The objects and advantages of the invention are realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and do not restrict the invention as claimed.

The character recognition device disclosed in this specification can accurately recognize characters from an image of a region containing the characters even when a shadow falls on that region.

(a) is a diagram showing an example of an image of a subject on which a shadow falls. (b) is a diagram showing the histogram of the luminance values of the image shown in (a). (c) is a diagram showing an example of a binary image obtained by binarizing the image shown in (a) with a single binarization threshold. (d) is a diagram showing an example of a binary image obtained by binarizing the image shown in (a) into the set of pixels whose luminance corresponds to the histogram peak for the character region and the set of pixels whose luminance corresponds to the other histogram peaks.
A schematic configuration diagram of a character recognition device according to one embodiment.
A block diagram showing the functions of the processing unit.
A diagram showing an example of a triangle whose vertices are pairs of luminance value and frequency.
A diagram showing an example of the relationship between the binarization thresholds and the generated binary images when the histogram has three peaks.
An operation flowchart of the character recognition process.
A diagram showing the correspondence between the histogram peaks and the character region, the background region, the region where shadow falls on a character, and the region where shadow falls on the background, for the case where the luminance histogram has four peaks.

A character recognition device according to one embodiment is described below with reference to the drawings.
First, an example of an image in which the character region cannot be distinguished from the other regions with a single binarization threshold is described. FIG. 1(a) shows an example of an image of a subject on which a shadow falls. In image 100, black characters are drawn on a white background. A curved region 101, where shadow falls on something other than the characters, is also darker than the surrounding background.

FIG. 1(b) shows the histogram of the luminance values of the image shown in FIG. 1(a). In FIG. 1(b), the horizontal axis represents luminance and the vertical axis represents frequency. Histogram peaks 111 to 114 correspond, respectively, to the luminance distribution of the region where shadow falls on a character, the region where shadow falls on the background, the character region, and the background region. Suppose that a binarization threshold Th is set between peaks 113 and 114 so as to separate the background region, which has the highest luminance, from the other regions.

FIG. 1(c) shows an example of a binary image obtained by binarizing the image of FIG. 1(a) with the binarization threshold Th shown in FIG. 1(b). In this example, binary image 120 is binarized into the unshadowed background region and everything else, so it is difficult to tell the shadowed background from the character region. Moreover, because the character region is brighter than the shadowed background in this example, setting the threshold Th between peaks 112 and 113 would merely binarize the image into shadowed and unshadowed regions. Again, the character region alone would not be separated from the other regions. In this example, it is preferable to binarize the image so that pixels whose luminance belongs to peak 111 or peak 113 are distinguished from pixels whose luminance belongs to peak 112 or peak 114. Binarizing the image in this way distinguishes character region 131 from background region 132 in binary image 130, as shown in FIG. 1(d). This reduces the likelihood that a shadowed part of the background is mistaken for part of a character.

This character recognition device therefore obtains a histogram of the luminance values of an image of a subject that contains characters and divides the histogram into its individual peaks. The device classifies each peak into either a first group or a second group while varying the combination. For each combination, it generates a binary image by giving pixels whose luminance corresponds to a peak in the first group a different value from pixels whose luminance corresponds to a peak in the second group, producing a plurality of binary images. It then performs character recognition on each binary image and takes the character corresponding to the most likely recognition result as the character appearing in the image.
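The final selection step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `pick_best_character` and the `recognize` callable (assumed to return a pair of character and likelihood for one binary image) are hypothetical.

```python
def pick_best_character(binary_images, recognize):
    """Run `recognize` on every binary image and keep the most likely result.

    `recognize` is a hypothetical callable that takes one binary image and
    returns a (character, likelihood) pair; it stands in for whatever
    character recognizer is actually used.
    """
    results = [recognize(img) for img in binary_images]
    if not results:
        raise ValueError("no binary images to recognize")
    # The result with the highest likelihood is taken as the character.
    return max(results, key=lambda r: r[1])
```

Because the recognizer is passed in as a parameter, the sketch is independent of how the likelihood is actually computed.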

FIG. 2 is a schematic configuration diagram of a character recognition device according to one embodiment. The character recognition device 1 recognizes at least one character appearing in an image that the imaging unit 2 generates by photographing a subject containing at least one character to be recognized. To that end, the character recognition device 1 includes an image acquisition unit 11, an output unit 12, a storage unit 13, a storage medium access device 14, and a processing unit 15. The character recognition device 1 may further include a display device, such as a liquid crystal display, for displaying the character recognition result. The processing unit 15 is connected to the image acquisition unit 11, the output unit 12, the storage unit 13, and the storage medium access device 14 via, for example, a bus.

The imaging unit 2 is, for example, a digital video camera and is installed so that the subject falls within its imaging range. For example, when the subject containing the characters to be recognized is a vehicle license plate, the imaging unit 2 is installed outdoors so that the front of a vehicle traveling on a road falls within the imaging range. The imaging unit 2 photographs the subject to generate an image in which the subject appears. Each time it generates an image, the imaging unit 2 outputs that image to the image acquisition unit 11.
In this embodiment, the image generated by the imaging unit 2 is a gray image whose pixel values consist of luminance values only.

The image acquisition unit 11 includes a communication interface for connecting to the imaging unit 2 and a control circuit for that interface. The communication interface can be, for example, an interface conforming to a communication standard for connecting peripheral devices, such as Universal Serial Bus (USB), or a video signal input interface.
Alternatively, the image acquisition unit 11 may include a communication interface for connecting to a communication network conforming to a communication standard such as Ethernet (registered trademark), together with its control circuit. In this case, the image acquisition unit 11 acquires images from the imaging unit 2 via the communication network.
Each time it acquires an image, the image acquisition unit 11 stores that image in the storage unit 13.

The output unit 12 includes, for example, a communication interface for connecting the character recognition device 1 to other equipment and a control circuit for that interface. The communication interface can be an interface conforming to a communication standard for connecting peripheral devices, such as USB, or to a communication standard such as Ethernet (registered trademark).
The output unit 12 receives information representing the recognition result produced by the processing unit 15 from the processing unit 15 and outputs that information to other equipment. The image acquisition unit 11 and the output unit 12 may be integrated.

The storage unit 13 includes, for example, a readable and writable semiconductor memory and a read-only semiconductor memory. The storage unit 13 stores the computer program executed on the processing unit 15, various information used to recognize characters appearing in an image, and various data generated during the character recognition process. The storage unit 13 may also hold the image received from the image acquisition unit 11 until the character recognition process finishes.

The storage medium access device 14 is a device that accesses a storage medium 16 such as a magnetic disk, a semiconductor memory card, or an optical storage medium. The storage medium access device 14 reads, for example, the computer program for character recognition that is stored on the storage medium 16 and executed on the processing unit 15, and either passes it to the processing unit 15 or stores it in the storage unit 13.

The processing unit 15 includes one or more processors and their peripheral circuits. The processing unit 15 recognizes the characters appearing in the image received from the image acquisition unit 11.

FIG. 3 is a block diagram showing the functions of the processing unit 15. The processing unit 15 includes a cutout unit 21, a histogram generation unit 22, a division unit 23, a binary image generation unit 24, and a recognition unit 25. Each of these units is, for example, a functional module implemented by a computer program executed on a processor of the processing unit 15. Alternatively, these units may be implemented in the character recognition device 1 as an integrated circuit that integrates the circuits realizing their respective functions.

The cutout unit 21 cuts out of the image, for each character, a recognition target region containing the character region and the background region surrounding it. In this embodiment, the cutout unit 21 cuts out the recognition target region by exploiting the fact that many edge components are detected in such a region, as disclosed in Y. Ariki and T. Teranishi, "Indexing and classification of TV news articles based on telop recognition", Proc. of 4th ICDAR, pp. 422-497, 1997.

The cutout unit 21 first detects edge pixels, where the luminance value changes spatially, using an edge detection filter such as a Prewitt filter or a Sobel filter. It then counts the number of edge pixels in each pixel row in the horizontal direction and each pixel column in the vertical direction. For example, for each block made up of a number of consecutive horizontal pixel rows corresponding to the expected character height in the image, the cutout unit 21 obtains the total number of edge pixels and determines that a block whose total is equal to or greater than a predetermined number contains a character string. The cutout unit 21 takes the upper and lower boundaries of a block containing a character string as the upper and lower edges of the recognition target region. Similarly, within each block containing a character string, the cutout unit 21 obtains the total number of edge pixels for each sub-block made up of a number of consecutive vertical pixel columns corresponding to the expected character width in the image. It determines that a sub-block whose total is equal to or greater than a predetermined value contains one character, and takes the left and right boundaries of that sub-block as the left and right edges of the recognition target region.
In this way, a recognition target region containing the character is cut out for each character.
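The row-block step of this procedure might look like the following sketch. Several details are assumptions made only for illustration: the image is a list of rows of integer luminance values, the gradient magnitude is approximated as |gx| + |gy|, the blocks tile the image without overlapping, and the names `sobel_edges`, `text_row_blocks`, `mag_thresh`, `block_h`, and `min_edges` are hypothetical.

```python
def sobel_edges(img, mag_thresh):
    """Return a 0/1 edge map computed with 3x3 Sobel kernels.

    Border pixels are left as 0 because the 3x3 window does not fit there.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if abs(gx) + abs(gy) >= mag_thresh:
                edges[y][x] = 1
    return edges


def text_row_blocks(edges, block_h, min_edges):
    """Return (top, bottom) row ranges of blocks that likely contain text.

    Blocks are block_h rows tall and do not overlap (an assumption); a block
    is kept when its total edge-pixel count is at least min_edges.
    """
    row_counts = [sum(row) for row in edges]
    blocks = []
    for top in range(0, len(edges) - block_h + 1, block_h):
        if sum(row_counts[top:top + block_h]) >= min_edges:
            blocks.append((top, top + block_h))
    return blocks
```

The same counting applied to columns inside each returned block would give the left and right edges of the per-character sub-blocks.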

The cutout unit 21 may instead cut out the recognition target region from the image using another method for extracting a region that contains a character.

For each recognition target region, the histogram generation unit 22 examines the luminance value of every pixel in the region and generates a histogram of luminance values by obtaining, for each luminance value, the number of pixels having that value, that is, its frequency. The histogram generation unit 22 may instead obtain, for each band of a predetermined luminance width (for example, a width of 2 to 8 when luminance values range from 0 to 255), the number of pixels whose luminance falls within that band as the frequency. The histogram generation unit 22 then stores the luminance histogram in the storage unit 13.
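A minimal sketch of this histogram step follows, assuming the region is a list of rows of integer luminance values in the range 0 to 255. The function name `luminance_histogram` and the parameters `bin_width` and `levels` are hypothetical; `bin_width=1` gives one bin per luminance value, and larger values give the banded variant.

```python
def luminance_histogram(region, bin_width=1, levels=256):
    """Count how many pixels fall into each luminance band.

    Band k covers luminance values k*bin_width .. (k+1)*bin_width - 1.
    """
    n_bins = (levels + bin_width - 1) // bin_width  # ceiling division
    hist = [0] * n_bins
    for row in region:
        for value in row:
            hist[value // bin_width] += 1
    return hist
```

With `bin_width=4`, for instance, luminance values 0 to 3 share the first bin and 252 to 255 share the last of 64 bins.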

The division unit 23 divides the histogram into a plurality of histogram peaks. Each histogram peak contains a local maximum of the frequency of luminance values.

To do so, the division unit 23 computes, as disclosed for example in Japanese Laid-Open Patent Publication No. 8-221513, a skew value indicating the bias of the histogram distribution, taking the range from the minimum to the maximum luminance value as the analysis range. If the histogram is biased toward low luminance values, the division unit 23 resets the analysis range to run from the minimum luminance value to the mean and examines the skew value again. If the histogram is biased toward high luminance values, it resets the analysis range to run from the mean to the maximum and examines the skew value again. When the skew value indicates that the bias of the distribution is small, the division unit 23 takes the mean luminance within that analysis range as a binarization threshold separating one histogram peak from another. The division unit 23 then repeats this processing, treating each of the two analysis ranges delimited by the binarization threshold as a new analysis range. When the total frequency between two adjacent binarization thresholds falls below a predetermined frequency threshold (for example, 1/10 of the total number of pixels in the recognition target region), the division unit 23 discards the binarization threshold obtained immediately before.
It then treats the portion lying between two consecutive binarization thresholds obtained earlier as one histogram peak.

Alternatively, when the skew value within the analysis range indicates that the bias of the distribution is small, the division unit 23 finds a first luminance value, which has the maximum frequency among the luminance values below the mean luminance of the analysis range. Similarly, it finds a second luminance value, which has the maximum frequency among the luminance values above that mean. The division unit 23 then finds a third luminance value, which has the minimum frequency between the first and second luminance values. Taking luminance as the horizontal axis and frequency as the vertical axis, the division unit 23 treats each of the first to third luminance values, paired with its frequency, as the coordinates of a point and calculates the area of the triangle connecting the three points.

FIG. 4 shows an example of a triangle whose vertices are pairs of luminance value and frequency. In FIG. 4, the horizontal axis represents the luminance value and the vertical axis represents the frequency. Line 400 represents the histogram, and points 401 to 403 represent the coordinates formed by the first to third luminance values and their respective frequencies. The division unit 23 calculates the area of triangle 410, whose vertices are points 401 to 403.

If the area is larger than a predetermined threshold, the division unit 23 takes the third luminance value as a binarization threshold. The predetermined threshold is, for example, determined experimentally in advance. When the area is larger than the predetermined threshold, the division unit 23 may instead determine the binarization threshold by applying, for example, Otsu's binarization method within the analysis range. If the area is less than the predetermined threshold, the division unit 23 does not take the third luminance value as a binarization threshold and determines that the entire analysis range belongs to a single histogram peak.
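The peak-valley-peak triangle test can be sketched as below. This is an illustration under stated assumptions rather than the method of the embodiment: the histogram is a list indexed by luminance, the analysis range `lo..hi` is inclusive, ties are broken by taking the lowest luminance, and the names `triangle_area`, `valley_threshold`, and `area_thresh` are hypothetical.

```python
def triangle_area(p1, p2, p3):
    """Area of the triangle through three (luminance, frequency) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def valley_threshold(hist, lo, hi, area_thresh):
    """Return the valley luminance as a threshold, or None for a single peak."""
    values = range(lo, hi + 1)
    total = sum(hist[v] for v in values)
    if total == 0:
        return None
    mean = sum(v * hist[v] for v in values) / total
    low_side = [v for v in values if v < mean]
    high_side = [v for v in values if v > mean]
    if not low_side or not high_side:
        return None
    v1 = max(low_side, key=lambda v: hist[v])          # first luminance value
    v2 = max(high_side, key=lambda v: hist[v])         # second luminance value
    v3 = min(range(v1, v2 + 1), key=lambda v: hist[v])  # third (valley)
    area = triangle_area((v1, hist[v1]), (v2, hist[v2]), (v3, hist[v3]))
    return v3 if area > area_thresh else None
```

A deep, wide valley between two tall peaks yields a large triangle and therefore a threshold, while a unimodal range yields a degenerate triangle and is reported as one peak.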

The division unit 23 may also find the individual histogram peaks, and the luminance value of maximum frequency in each peak, by some other method of separating a histogram into peaks. Then, for each pair of adjacent histogram peaks, it may set a binarization threshold between the two peaks by applying Otsu's binarization method with the range between the two peaks' maximum-frequency luminance values as the analysis range.
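As a rough sketch of this variant, Otsu's method (choosing the threshold that maximizes the between-class variance) can be restricted to the range between two adjacent peak maxima. The peak positions are assumed to be already known and sorted; the names `otsu_threshold` and `thresholds_between_peaks` are hypothetical, and the convention that values below the returned threshold form the lower class is a choice made here.

```python
def otsu_threshold(hist, lo, hi):
    """Otsu's method restricted to luminance values lo..hi (inclusive).

    Returns t such that values < t form the lower class and values >= t the
    upper class. Falls back to lo + 1 if no valid split exists.
    """
    total = sum(hist[lo:hi + 1])
    sum_all = sum(v * hist[v] for v in range(lo, hi + 1))
    best_t, best_score = lo + 1, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # luminance sum of the lower class
    for t in range(lo + 1, hi + 1):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def thresholds_between_peaks(hist, peak_positions):
    """One threshold for each pair of adjacent peak maxima (sorted ascending)."""
    return [otsu_threshold(hist, a, b)
            for a, b in zip(peak_positions, peak_positions[1:])]
```

For a histogram with n peaks this yields n - 1 thresholds, which are the peak boundaries used by the later binarization step.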

For each recognition target region, the division unit 23 stores in the storage unit 13 the binarization thresholds representing the boundaries between the histogram peaks.

The binary image generation unit 24 classifies each of the histogram peaks into either a first group or a second group. In doing so, it creates a plurality of peak combinations by varying which peaks are assigned to the first group and which to the second group. For each combination, the binary image generation unit 24 sets the pixels in the recognition target region whose luminance values correspond to peaks in the first group to a first pixel value (for example, 0) and sets the pixels whose luminance values correspond to peaks in the second group to a second pixel value (for example, 255). The binary image generation unit 24 thereby generates a plurality of binary images for each recognition target region. It may create binary images for all possible combinations or for only a subset of them.
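The enumeration of group assignments can be sketched as follows, assuming the peak boundaries are given as an ascending list of thresholds and that a luminance value equal to a threshold belongs to the upper peak. The names `peak_index` and `enumerate_binary_images` are hypothetical, and skipping the two assignments that put every peak in the same group is a choice made here, since those would produce uniform images.

```python
from itertools import product


def peak_index(value, thresholds):
    """Index of the histogram peak that a luminance value falls into."""
    return sum(1 for t in thresholds if value >= t)


def enumerate_binary_images(region, thresholds, first_value=0, second_value=255):
    """Generate one binary image per non-trivial two-group assignment.

    Returns a list of (groups, image) pairs, where groups[k] is 0 if peak k
    is in the first group and 1 if it is in the second group.
    """
    n_peaks = len(thresholds) + 1
    images = []
    for groups in product((0, 1), repeat=n_peaks):
        if len(set(groups)) < 2:
            continue  # all peaks in one group: nothing to separate
        image = [[first_value if groups[peak_index(v, thresholds)] == 0
                  else second_value
                  for v in row]
                 for row in region]
        images.append((groups, image))
    return images
```

With n peaks this produces 2^n - 2 images, so three peaks give six images and four peaks give fourteen; generating only a subset, as the text allows, would amount to filtering the assignments before binarizing.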

FIG. 5 shows an example of the relationship between the binarization thresholds and the generated binary images when the histogram has three peaks. The upper part of FIG. 5 shows an example of a luminance histogram, in which the horizontal axis represents the luminance value and the vertical axis represents the frequency. Histogram peaks 501 to 503 correspond, respectively, to the shadowed background region, the character region, and the unshadowed background region. Binarization thresholds Th1 and Th2 are set between peaks 501 and 502 and between peaks 502 and 503, respectively. The lower part of FIG. 5 shows examples of the generated binary images.

For example, the binary image 511 is generated by binarizing the recognition target region so that the value of pixels having a luminance value lower than the binarization threshold Th1 is lower than the value of pixels having other luminance values. Conversely, the binary image 512 is generated by binarizing the recognition target region so that the value of pixels having a luminance value lower than the binarization threshold Th1 is higher than the value of pixels having other luminance values.

The binary image 513 is generated by binarizing the recognition target region so that the value of pixels having a luminance value equal to or higher than the binarization threshold Th1 and lower than the binarization threshold Th2 is lower than the value of pixels having other luminance values. Conversely, the binary image 514 is generated by binarizing the recognition target region so that the value of pixels having a luminance value equal to or higher than the binarization threshold Th1 and lower than the binarization threshold Th2 is higher than the value of pixels having other luminance values.

Further, the binary image 515 is generated by binarizing the recognition target region so that the value of pixels having a luminance value equal to or higher than the binarization threshold Th2 is lower than the value of pixels having other luminance values. Conversely, the binary image 516 is generated by binarizing the recognition target region so that the value of pixels having a luminance value equal to or higher than the binarization threshold Th2 is higher than the value of pixels having other luminance values.
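The six binary images of FIG. 5 can be reproduced with a short sketch. It assumes that the two thresholds split the luminance axis into three bands (below Th1, from Th1 up to but excluding Th2, and Th2 or above), matching the boundary conventions stated above. The function names, the sample region, and the threshold values 80 and 160 are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def band_index(luminance, thresholds):
    """Index of the band containing `luminance`.

    Band 0 is below thresholds[0]; band k covers
    thresholds[k-1] <= luminance < thresholds[k].
    """
    return bisect_right(thresholds, luminance)


def binarize_by_band(region, thresholds, low_bands, low=0, high=255):
    """Give pixels in `low_bands` the lower value and all other pixels the higher value."""
    return [[low if band_index(v, thresholds) in low_bands else high for v in row]
            for row in region]


def six_images(region, th1, th2):
    """Build the images labelled 511 to 516 for a three-peak histogram."""
    thresholds = [th1, th2]
    all_bands = {0, 1, 2}
    images = {}
    for k, band in enumerate(sorted(all_bands)):
        # Odd labels (511, 513, 515): the selected band is darker than the rest.
        images[511 + 2 * k] = binarize_by_band(region, thresholds, {band})
        # Even labels (512, 514, 516): the selected band is brighter than the rest.
        images[512 + 2 * k] = binarize_by_band(region, thresholds, all_bands - {band})
    return images


# Hypothetical 2x3 recognition target region and thresholds.
region = [[30, 120, 200],
          [90, 170, 40]]
images = six_images(region, th1=80, th2=160)
```

Each odd/even pair differs only in polarity, which matters for template matching because the templates fix which pixel value represents the character.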

The binary image generation unit 24 stores the generated binary images in the storage unit 13 for each recognition target region.

For each recognition target region, the recognition unit 25 performs character recognition processing on each of the binary images to detect the character appearing in each binary image and to obtain the certainty of that character. The recognition unit 25 then determines that, among the characters detected from the respective binary images, the character with the highest certainty is the character appearing in the recognition target region.

For each recognition target region, the recognition unit 25 performs template matching between each of the binary images obtained for that region and a plurality of templates corresponding to the individual characters to be recognized. For example, the recognition unit 25 calculates a normalized cross-correlation value between the recognition target region and each template. Then, for each binary image of the recognition target region, the recognition unit 25 takes the maximum normalized cross-correlation value as the certainty that the character corresponding to the template giving that maximum appears in the recognition target region, and identifies the character of that template.

The recognition unit 25 compares the certainties obtained for the respective binary images and recognizes the character with the highest certainty as the character appearing in the recognition target region.

The recognition unit 25 recognizes the character string appearing in the image by arranging the characters recognized for the respective recognition target regions in the order in which those regions are arranged.
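The template matching step can be illustrated with a small sketch. It assumes the zero-mean form of normalized cross-correlation and assumes each binary image has already been scaled to the template size; the text specifies neither detail. The function names and the 3x3 templates for "I" and "L" are hypothetical.

```python
import math


def ncc(image, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in image for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A uniform image or template has no variance; treat it as uncorrelated.
    return num / den if den else 0.0


def recognize(binary_images, templates):
    """Return the (character, certainty) pair with the highest NCC over all images."""
    best_char, best_score = None, -math.inf
    for image in binary_images:
        for char, template in templates.items():
            score = ncc(image, template)
            if score > best_score:
                best_char, best_score = char, score
    return best_char, best_score


# Hypothetical templates: character pixels are 0, surrounding pixels are 255.
templates = {
    "I": [[255, 0, 255], [255, 0, 255], [255, 0, 255]],
    "L": [[0, 255, 255], [0, 255, 255], [0, 0, 0]],
}
# Two candidate binary images of the same region with opposite polarity.
inverted = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
correct = [[255, 0, 255], [255, 0, 255], [255, 0, 255]]
char, certainty = recognize([inverted, correct], templates)
```

The inverted candidate correlates negatively with the "I" template, so only the candidate whose polarity matches the templates produces a high certainty. This is the reason both polarities of each grouping are generated when template matching is used.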

Note that the recognition unit 25 may detect the character appearing in a binary image of the recognition target region, and obtain its certainty, according to another method for recognizing characters appearing on an image. For example, the recognition unit 25 extracts, from each binary image of the recognition target region, feature amounts relating to the shape of a character, such as its intersections and end points. Then, for each binary image, the recognition unit 25 may input the feature amounts extracted from that binary image to a classifier that outputs the certainty of a predetermined character, thereby obtaining the certainty that the predetermined character appears in the binary image. For each binary image, the recognition unit 25 may then take the character with the highest certainty as the detected character and take that highest certainty as the certainty of the character for that binary image. Such a classifier may be, for example, a multilayer perceptron, a support vector machine, or a Real AdaBoost classifier, and is prepared in advance for each character to be recognized. These classifiers are trained in advance, according to a so-called supervised learning algorithm suited to the classifier, on a plurality of sample images in which the character to be recognized appears and a plurality of sample images in which it does not.

The processing unit 15 outputs information representing the character string recognized by the recognition unit 25 to another device via the output unit 12. At that time, the processing unit 15 may also output the image used for the recognition, together with the information representing the recognized character string, to the other device via the output unit 12. Alternatively, the processing unit 15 may store the information representing the character string recognized by the recognition unit 25 in the storage unit 13.

FIG. 6 is an operation flowchart of the character recognition processing executed by the processing unit 15. The processing unit 15 executes this character recognition processing for each image.

The cutout unit 21 of the processing unit 15 cuts out a recognition target region for each character from the image (step S101). The histogram generation unit 22 of the processing unit 15 generates a histogram of luminance values for each recognition target region (step S102). The dividing unit 23 of the processing unit 15 divides the histogram of luminance values into individual histogram peaks (step S103).

For each recognition target region, the binary image generation unit 24 of the processing unit 15 classifies the histogram peaks into two groups while varying the combination. The binary image generation unit 24 then generates a plurality of binary images by assigning a different value for each group to the pixels having luminance values corresponding to the histogram peaks belonging to that group (step S104). For each binary image of each recognition target region, the recognition unit 25 of the processing unit 15 detects the character appearing in that binary image and obtains its certainty (step S105). The recognition unit 25 takes the character detected from the binary image with the highest certainty as the character appearing in the recognition target region (step S106). The recognition unit 25 then obtains the character string appearing in the image by arranging the characters recognized for the respective recognition target regions in the order of those regions, and outputs information representing the character string via the output unit 12 (step S107). The processing unit 15 then ends the character recognition processing.
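The flow of steps S101 to S107 can be summarized as the following skeleton. It is only a sketch of the control flow: the five callables passed in (`cut_out`, `make_histogram`, `split_peaks`, `make_binary_images`, `detect`) are hypothetical stand-ins for the cutout unit 21, the histogram generation unit 22, the dividing unit 23, the binary image generation unit 24, and the recognition unit 25, and `detect` is assumed to return a `(character, certainty)` pair.

```python
def recognize_characters(image, cut_out, make_histogram, split_peaks,
                         make_binary_images, detect):
    """Run steps S101 to S107 and return the recognized character string."""
    recognized = []
    for region in cut_out(image):                               # S101
        histogram = make_histogram(region)                      # S102
        peaks = split_peaks(histogram)                          # S103
        binary_images = make_binary_images(region, peaks)       # S104
        detections = [detect(b) for b in binary_images]         # S105
        best_char, _ = max(detections, key=lambda d: d[1])      # S106
        recognized.append(best_char)
    # S107: arrange the characters in region order.
    return "".join(recognized)
```

Passing the stages in as parameters keeps the skeleton independent of how each stage is realized, for example template matching versus a feature-based classifier in step S105.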

As described above, this character recognition device divides the histogram of the luminance values of a recognition target region on an image into a plurality of histogram peaks, and classifies the histogram peaks into two groups while varying the combination. The character recognition device generates a plurality of binary images by assigning a different value for each group to the pixels having luminance values corresponding to the histogram peaks belonging to that group. The character recognition device then performs character recognition on each binary image and takes the character detected from the binary image that gave the most certain recognition result as the character actually appearing in the image. Consequently, even when the histogram of luminance values has a multimodal shape with three or more peaks, for example because a shadow is cast on the subject region, the character recognition device can perform character recognition based on a binary image in which the character region and the background region are appropriately separated. The character recognition device can therefore accurately recognize characters appearing in an image.

Next, a character recognition device according to a second embodiment will be described. When there are four histogram peaks, the character recognition device according to the second embodiment reduces the number of binary images to be generated by taking into account the characteristics of the luminance change caused by shadows.

The character recognition device according to the second embodiment differs from the character recognition device according to the first embodiment in the processing of the binary image generation unit 24 of the processing unit 15. The binary image generation unit 24 is therefore described below. For the other components of the character recognition device according to the second embodiment, refer to the description of the corresponding components of the character recognition device according to the first embodiment.

When the histogram of luminance values for a recognition target region has four peaks, and the first value or the second value is assigned, peak by peak, to the pixels having the luminance values included in each peak, the binary image generation unit 24 would generate at most 2^4 = 16 binary images.

Here, when there are four histogram peaks, each histogram peak is presumed to correspond to one of the following: an unshadowed character region, a shadowed character region, an unshadowed background region, or a shadowed background region. In addition, the luminance of pixels included in a shadowed region decreases. Therefore, the possible relationships between the histogram peaks and these four types of regions are limited to the four patterns shown in the table 700 of FIG. 7.

In the table 700 shown in FIG. 7, Pattern 1 and Pattern 2 show the assumed relationship between the histogram peaks and the regions in the case of so-called outlined (white) characters, in which the luminance of the character region is higher than that of the background region. Pattern 3 and Pattern 4, on the other hand, show the assumed relationship between the histogram peaks and the regions when the luminance of the character region is lower than that of the background region. Ranks 1 to 4 denote the rank of each of the four histogram peaks in descending order of luminance value. "Character + shadow region" denotes a shadowed character region, and "background + shadow region" denotes a shadowed background region.

Here, suppose that in the character templates used by the recognition unit 25, the pixels included in the character region have the first pixel value (for example, 0) and the pixels surrounding the character region have the second pixel value (for example, 255). In this case, in order to generate a binary image for which the normalized cross-correlation value with the template of the same character as the one appearing in the binary image is maximized, it is preferable that, in the binary image as well, the pixels included in the character region have the first pixel value and the pixels included in the background region have the second pixel value.

In the case of Pattern 1, the first pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 1 or 2, and the second pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 3 or 4. In the case of Pattern 2, the first pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 1 or 3, and the second pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 2 or 4.

Similarly, in the case of Pattern 3, the second pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 1 or 2, and the first pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 3 or 4. In the case of Pattern 4, the second pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 1 or 3, and the first pixel value is assigned to pixels having luminance values corresponding to the histogram peak of rank 2 or 4. A binary image in which the character region and the background region are separated is thereby obtained.

Thus, when there are four histogram peaks, the relationship corresponds to one of Patterns 1 to 4 above, so the binary image generation unit 24 needs to generate at most four binary images. Note that when the recognition unit 25 performs character recognition based on feature amounts such as the positions or the number of character intersections and end points, it does not matter which of the character region and the background region has the higher luminance. In this case, therefore, the binary image generation unit 24 needs to generate only two binary images: one corresponding to Pattern 1 or 3, and one corresponding to Pattern 2 or 4.
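The reduction from 16 assignments to four (or two) can be written out explicitly. In this sketch, each grouping lists the pixel value given to ranks 1 to 4 in descending order of luminance, with 0 as the first pixel value and 255 as the second. Pattern 3 is taken here as the polarity inverse of Pattern 1 (ranks 3 and 4 receive the first pixel value), which is an assumption of this sketch, and the function name `four_peak_groupings` and its `polarity_matters` flag are hypothetical.

```python
FIRST, SECOND = 0, 255  # example pixel values for character and background

# Pixel value assigned to the peaks of rank 1, 2, 3, 4 (highest luminance first).
PATTERNS = {
    1: (FIRST, FIRST, SECOND, SECOND),   # bright character: ranks 1 and 2
    2: (FIRST, SECOND, FIRST, SECOND),   # bright character: ranks 1 and 3
    3: (SECOND, SECOND, FIRST, FIRST),   # dark character: ranks 3 and 4
    4: (SECOND, FIRST, SECOND, FIRST),   # dark character: ranks 2 and 4
}


def four_peak_groupings(polarity_matters=True):
    """Return the groupings to try for a four-peak histogram.

    With template matching the polarity matters, so all four patterns are
    returned. With polarity-insensitive features, Patterns 3 and 4 duplicate
    Patterns 1 and 2 up to inversion, so two groupings suffice.
    """
    wanted = (1, 2, 3, 4) if polarity_matters else (1, 2)
    return [PATTERNS[k] for k in wanted]


candidates = four_peak_groupings()
reduced = four_peak_groupings(polarity_matters=False)
```

Compared with the 2^4 = 16 assignments of the exhaustive approach, the recognizer runs on at most a quarter as many binary images, or an eighth when polarity is irrelevant.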

As described above, the character recognition device according to the second embodiment takes into account the decrease in luminance in shadowed regions and generates only the binary images corresponding to combinations in which the histogram peaks that may contain a character and the histogram peaks that may not are classified into different groups. Since this reduces the number of binary images to be generated, the character recognition device can also reduce the amount of computation of the character recognition processing as a whole.

Note that, even when the histogram of luminance values has five or more peaks, the binary image generation unit can reduce the number of binary images to be generated by considering the cause of such a luminance distribution and generating only those binary images in which the character region and the background region may be distinguished.

Further, when the number of histogram peaks is three or fewer, the binary image generation unit 24 may generate binary images for all combinations that classify the histogram peaks into two groups. When the number of histogram peaks is four or more, the binary image generation unit 24 may generate binary images for only some of the combinations.

Note that, in each of the above embodiments and modifications, the image generated by the imaging unit 2 may be a color image represented in the RGB color system. In this case, the processing unit 15 may convert the color image into, for example, an image in the HLS color system, and execute each process of the processing unit 15 based on the luminance value of each pixel of the converted image.
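As an illustration of this conversion, the following sketch extracts the L component of the HLS color system using `colorsys.rgb_to_hls` from the Python standard library. The function name `luminance_plane`, the representation of the image as rows of `(R, G, B)` tuples in the range 0 to 255, and the rescaling of the result back to 0 to 255 are assumptions of this sketch.

```python
import colorsys


def luminance_plane(rgb_image):
    """Convert rows of (R, G, B) tuples in 0..255 to HLS lightness values in 0..255."""
    plane = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            # colorsys expects components in [0, 1] and returns (h, l, s).
            _, lightness, _ = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
            out_row.append(round(lightness * 255))
        plane.append(out_row)
    return plane


# Illustrative 1x3 image: white, black, and pure red pixels.
plane = luminance_plane([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]])
```

The resulting single-channel plane can then be fed to the histogram generation step in place of a grayscale image.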

All examples and specific terms recited herein are intended for instructional purposes, to help the reader understand the present invention and the concepts contributed by the inventor to furthering the art, and are to be construed as not being limited to such specifically recited examples and conditions, nor to the organization of any example in this specification relating to a showing of the superiority or inferiority of the present invention. Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and scope of the present invention.

DESCRIPTION OF SYMBOLS
1 Character recognition device
2 Imaging unit
11 Image acquisition unit
12 Output unit
13 Storage unit
14 Storage medium access device
15 Processing unit
16 Storage medium
21 Cutout unit
22 Histogram generation unit
23 Dividing unit
24 Binary image generation unit
25 Recognition unit

Claims

A character recognition device comprising:
a histogram generation unit that generates a histogram of luminance values of pixels in a predetermined region including a character on an image;
a dividing unit that divides the histogram into a plurality of histogram peaks, each of which includes a local maximum of the frequency of luminance values;
a binary image generation unit that classifies each of the plurality of histogram peaks into either a first group or a second group, varies the combination of histogram peaks classified into the first group and histogram peaks classified into the second group, and, for each of the combinations, generates a plurality of binary images of the predetermined region by setting, among the pixels in the predetermined region, pixels having luminance values corresponding to the histogram peaks classified into the first group to a first pixel value and pixels having luminance values corresponding to the histogram peaks classified into the second group to a second pixel value; and
a recognition unit that, for each of the plurality of binary images, detects a character appearing in the binary image and obtains the certainty of the character, and takes, among the characters detected from the respective binary images, the character with the highest certainty as the character included in the predetermined region.

The character recognition device according to claim 1, wherein the binary image generation unit generates only binary images corresponding to combinations of histogram peaks in which, among the plurality of histogram peaks, peaks in which a character may appear are classified into the first group and the other peaks are classified into the second group.

The character recognition device according to claim 2, wherein, when the number of histogram peaks is four, the binary image generation unit generates a binary image corresponding to a combination in which, in descending order of luminance value, the first and third histogram peaks are classified into the first group and the second and fourth histogram peaks are classified into the second group, and a binary image corresponding to a combination in which the first and second histogram peaks are classified into the first group and the third and fourth histogram peaks are classified into the second group.

A character recognition method comprising:
generating a histogram of luminance values of pixels in a predetermined region including a character on an image;
dividing the histogram into a plurality of histogram peaks, each of which includes a local maximum of the frequency of luminance values;
classifying each of the plurality of histogram peaks into either a first group or a second group, varying the combination of histogram peaks classified into the first group and histogram peaks classified into the second group, and, for each of the combinations, generating a plurality of binary images of the predetermined region by setting, among the pixels in the predetermined region, pixels having luminance values corresponding to the histogram peaks classified into the first group to a first pixel value and pixels having luminance values corresponding to the histogram peaks classified into the second group to a second pixel value; and
for each of the plurality of binary images, detecting a character appearing in the binary image and obtaining the certainty of the character, and taking, among the characters detected from the respective binary images, the character with the highest certainty as the character included in the predetermined region.

A computer program for character recognition that causes a computer to execute:
generating a histogram of luminance values of pixels in a predetermined region including a character on an image;
dividing the histogram into a plurality of histogram peaks, each of which includes a local maximum of the frequency of luminance values;
classifying each of the plurality of histogram peaks into either a first group or a second group, varying the combination of histogram peaks classified into the first group and histogram peaks classified into the second group, and, for each of the combinations, generating a plurality of binary images of the predetermined region by setting, among the pixels in the predetermined region, pixels having luminance values corresponding to the histogram peaks classified into the first group to a first pixel value and pixels having luminance values corresponding to the histogram peaks classified into the second group to a second pixel value; and
for each of the plurality of binary images, detecting a character appearing in the binary image and obtaining the certainty of the character, and taking, among the characters detected from the respective binary images, the character with the highest certainty as the character included in the predetermined region.