JP6671613B2

JP6671613B2 - Character recognition method and computer program

Info

Publication number: JP6671613B2
Application number: JP2017049764A
Authority: JP
Inventors: 栄竹内; 克犬嶋
Original assignee: SOFNEC CO., LTD.
Current assignee: SOFNEC CO., LTD.
Priority date: 2017-03-15
Filing date: 2017-03-15
Publication date: 2020-03-25
Anticipated expiration: 2037-03-15
Also published as: JP2018152026A

Description

The present invention relates to a character recognition method that recognizes individual characters in an image in which many colors are used and that, in particular, can also recognize characters that use multiple colors within a single character, have a gradation applied, or are hatched.

In moving images such as television footage, characters are often overlaid on the picture, and a function for extracting only the characters is sometimes required. Most recent images are color images in which both the background and the characters usually use multiple colors, so it is not easy to extract only the characters from a target color image and recognize them. Patent Document 1 proposes a "character recognition device and image processing program" that extracts character strings even from images that have a background.

Patent Document 1: JP 2015-184691 A

The invention described in Patent Document 1 is suited to character recognition and text processing for television and movie subtitles and for documents, in which characters of the same color and size are lined up. However, that invention is not well suited to recognizing characters such as the captions (telops) of television variety programs, where characters are sparsely scattered over the whole image, vary in size, or differ in color from one character to the next within a string.

An object of the present invention is to extract characters with high accuracy and recognize them, not only from subtitles and documents but also from color images containing characters that differ in position, size, and color.

The present invention is a character recognition method for recognizing characters contained in an image, comprising the steps of: acquiring a plurality of binary images generated from a target image; extracting connected components from each binary image; determining whether a combination of neighboring connected components (hereinafter, a "connected component group") is a character candidate to be subjected to character recognition; and applying a neural network to each connected component group determined to be a character candidate to determine whether it is a character or a non-character, obtaining a character code and its likelihood if the result is a character, or the likelihood of being a non-character if the result is a non-character. The character training data for the neural network assigns the same code to the same character regardless of typeface, and the non-character training data for the neural network includes fractal figures generated by a fractal generation process and data in which a plurality of characters are combined at random. The invention can be applied to any image, but it is particularly effective for character recognition in color images containing many kinds of colors. It is also useful for recognizing characters in grayscale monochrome images.

According to the image processing method of the present invention, a neural network is used to determine whether a candidate is a character or a non-character, so recognition accuracy improves as operational experience accumulates. There are various kinds of neural networks; the embodiment below uses a convolutional neural network (hereinafter, "CNN"). Because character recognition is performed on a plurality of binary images, highly accurate recognition results can be expected even for a color image whose background and characters each contain multiple colors. For example, even if a connected component vanishes in some binary images and the character cannot be recognized there, it may still be recognizable in another binary image. Before a connected component group is passed to the neural network, a simple check determines whether it could be a character at all, which keeps processing fast.

In the present invention, assigning the same code to the same character enhances the generalization ability of the neural network, so that even characters handwritten by different people can be recognized. Slightly stylized characters that do not exist in any available font can also be handled.
In addition, because the non-character training data includes fractal figures generated by a fractal generation process and data in which a plurality of characters are combined at random, non-character training data can be generated quickly and in large quantities, which makes it easier to pick out noise contained in the binary images.

In the present invention, it is desirable to classify the pixels into N groups (N >= 3) by the K-means method, split the N groups into two sets, and use as character recognition targets the 2^N - 2 binary images in which the pixels of one set are shown in white and the pixels of the other set in black. Because these 2^N - 2 binary images include pairs of binary images that are black-and-white inversions of each other, outlined (white) characters and characters with a border can also be recognized.
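The count of binary images follows from a short counting argument, sketched here for reference. The symbol P for the number of mutually inverted pairs is introduced only for this illustration and does not appear in the original text.

```latex
\[
\underbrace{2^{N}}_{\text{white/black assignments of the } N \text{ groups}}
\;-\;
\underbrace{2}_{\text{all white, all black}}
\;=\; 2^{N}-2,
\qquad
P \;=\; \frac{2^{N}-2}{2} \;=\; 2^{N-1}-1 .
\]
```

For N = 3 this gives 6 binary images that form 3 inverted pairs.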

In the present invention, it is desirable to output the neural network's determination result for each character candidate obtained from each binary image, together with the position and size of that candidate's circumscribed rectangle.

Characters contained in an input color image can be separated from the background and recognized with high accuracy. In particular, characters that use multiple colors or a gradation within a single character, and characters that stand isolated in the image, can be recognized. Even for monochrome input, characters in a grayscale image can be recognized with high accuracy when the luminance differs from character to character or within a single character.

The drawings, all of which relate to an embodiment of the present invention, are as follows.
FIG. 1 is a functional block diagram showing the configuration of the character recognition device.
FIG. 2 illustrates character data used for training.
FIG. 3 illustrates fractal figures generated for learning non-characters.
FIG. 4 illustrates character-like non-character data used for training.
FIG. 5 explains the number of binary images.
FIG. 6 explains why character recognition requires a plurality of binary images.
FIG. 7 likewise explains why character recognition requires a plurality of binary images.
FIG. 8 is a flowchart showing the character recognition process.
FIG. 9 explains scanning a binary image to extract estimated character areas.
FIG. 10 explains the meaning of "connected component".
FIG. 11 explains the labeling of connected components using dilation and erosion.
FIG. 12 explains the connected components that make up a character candidate and their circumscribed rectangles.
FIG. 13 explains character candidates that are excluded from character determination by the CNN.
FIG. 14 illustrates CNN determination results for character candidates.
FIG. 15 explains character recognition results obtained from a plurality of binary images.

The character recognition processing of one embodiment of the present invention is described below, with reference to the drawings, under the following headings.
<< 1. Functional block configuration of the character recognition device >>
<< 2. Preprocessing by the character recognition device (machine learning (generation of CNN discriminator 22)) >>
<< 3. Preprocessing by the character recognition device (generation of multiple binary images) >>
<< 4. Main processing by the character recognition device (recognition of each character contained in the original image) >>

<< 1. Functional block configuration of the character recognition device >>
The functional configuration of the computer that carries out this embodiment (hereinafter, the "character recognition device") is described with reference to FIG. 1.
The character recognition device 1 is realized by a computer, such as a personal computer or a smartphone, and a computer program installed on that computer.
The character recognition device 1 includes a processing unit 2, a storage unit 3, and a communication interface unit 4. It is also provided, as appropriate, with an input operation unit such as a mouse and keyboard used by the operator, an output unit such as a display and printer, a camera, and the like, but these are not shown.

The storage unit 3 stores the input image to be processed, training samples for character identification, parameters such as various thresholds, various intermediate results produced by the processing unit 2, and so on, and is realized by a storage device such as memory or a hard disk.
The intermediate results include the pixel groups of estimated character areas, connected components, character candidates, and the character recognition results for each binary image.
The storage unit 3 also holds the programs that make the computer function as the character recognition device 1. These programs are loaded into memory, and a CPU (not shown) executes the loaded program code, whereby each part of the processing unit 2 operates.
Next, the processing unit 2 is described.

The processing unit 2 includes a machine learning data acquisition unit 20, a machine learning unit 21, a CNN discriminator 22, a binary image acquisition unit 23, an estimated character area scanning unit 24, a connected component extraction unit 25, a character candidate selection unit 26, a character candidate recognition unit 27, and a character recognition result output unit 28. Each of the units 20 to 28 is described below.

The machine learning data acquisition unit 20 acquires, via the communication interface unit 4, character data and non-character data for machine learning from an external communication network or information processing device. Fractal figures are used for learning non-character data; they may be obtained from outside, or a fractal figure generation unit 20a may be provided inside the character recognition device 1. This embodiment is described on the assumption that the fractal figure generation unit 20a generates the fractal figures used as non-character data.

The machine learning unit 21 performs learning using the machine learning data and stores the resulting parameters in the CNN discriminator 22. In this embodiment, a character candidate supplied by the character candidate recognition unit 27 is judged to be a character or a non-character (noise) by the CNN implemented in the CNN discriminator 22, and the result is returned to the character candidate recognition unit 27.
Machine learning is described in detail later in << 2. Preprocessing by the character recognition device (machine learning (generation of CNN discriminator 22)) >>.

The binary image acquisition unit 23 acquires, via the communication interface unit 4, the binary image data to be processed from an external communication network or information processing device. Alternatively, the original color image may be acquired from outside and the binary images generated by an image binarization processing unit 23a provided inside the character recognition device 1. This embodiment is described on the assumption that the image binarization processing unit 23a generates the binary images.
The generation of binary images is described in detail later in << 3. Preprocessing by the character recognition device (generation of multiple binary images) >>.

The estimated character area scanning unit 24 scans one binary image vertically and horizontally, starting from its upper-left vertex, and extracts estimated character areas, each of which is presumed to contain one or more characters.

The connected component extraction unit 25 extracts connected components from an estimated character area. Depending on constraints such as resolution, some pixels of different characters may be joined together. Dilation and erosion are therefore applied as appropriate, labeling is performed by a known technique, and each connected component is extracted.

The character candidate selection unit 26 determines whether a connected component group, such as one whose circumscribed rectangles partly overlap or one whose circumscribed rectangles do not overlap but lie close together, is suitable as a target for character recognition, and if so treats it as a character candidate. Only such character candidates are passed to the CNN discriminator 22 for determination.
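The overlap and closeness test on circumscribed rectangles can be sketched as follows. This is only an illustration under stated assumptions: the function names, the inclusive rectangle convention, the use of the larger of the horizontal and vertical gaps as the distance, and the `max_gap` threshold are all hypothetical, since the text does not specify how the distance between rectangles is measured.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int]            # (x, y)
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def bounding_rect(pixels: Iterable[Pixel]) -> Rect:
    """Circumscribed rectangle of a non-empty set of pixels."""
    xs, ys = zip(*pixels)
    return min(xs), min(ys), max(xs), max(ys)


def rect_gap(a: Rect, b: Rect) -> int:
    """Larger of the horizontal and vertical gaps between two rectangles.

    Returns 0 when the rectangles overlap; rectangles that sit side by
    side with no empty pixel between them give 1.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def may_belong_together(a: Rect, b: Rect, max_gap: int = 2) -> bool:
    """Assumed rule: group two components if their rectangles overlap or are close."""
    return rect_gap(a, b) <= max_gap
```

For instance, `rect_gap((0, 0, 2, 2), (4, 0, 6, 2))` evaluates to 2, so the two rectangles would be grouped with the default threshold but not with `max_gap=1`.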

The character candidate recognition unit 27 uses the CNN discriminator 22 to determine whether a group of one or more connected components selected as a character candidate is a character or a non-character. If the result is a character, the CNN outputs the character code and its likelihood; if it is a non-character, the CNN outputs the information "not a character" and the likelihood of being a non-character.
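How such an output might be read can be sketched as below. The sketch assumes that the network produces one raw score per class and that a softmax turns the scores into likelihoods; neither detail is stated in the text. It relies on the convention given in the machine learning section that character codes are positive integers and non-character codes are negative integers. The function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def softmax(scores: Dict[int, float]) -> Dict[int, float]:
    """Convert raw class scores into likelihoods that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {code: math.exp(s - peak) for code, s in scores.items()}
    total = sum(exps.values())
    return {code: e / total for code, e in exps.items()}


def interpret(scores: Dict[int, float]) -> Tuple[bool, int, float]:
    """Return (is_character, code, likelihood) for the best-scoring class.

    Assumption: positive codes denote characters and negative codes
    denote non-characters, so the sign alone decides the first field.
    """
    probs = softmax(scores)
    code = max(probs, key=probs.get)
    return code > 0, code, probs[code]
```

A caller would pass a mapping from class code to score, for example `interpret({0x33: 4.0, 0x38: 1.0, -1: 0.5})`, and receive the winning code together with its likelihood.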

The character recognition result output unit 28 outputs the character recognition results to a printer, screen, or the like provided in the character recognition device 1, or outputs them as input data for subsequent processing such as text processing.

<< 2. Preprocessing by the character recognition device (machine learning (generation of CNN discriminator 22)) >>
This processing acquires training data from outside or generates it internally, performs machine learning, and stores the parameters obtained by learning in the CNN discriminator 22.

The training data consists of character data and non-character data.
For character data, that is, samples corresponding to a character code, it suffices to supply images in which the character is drawn, but recognition accuracy is improved by preparing images that are as varied as possible. For example, a character with a given character code is drawn in many different fonts.
FIG. 2 shows an example of character data. For the Arabic numeral "3", various fonts and handwritten forms are stored in association with the same character code. In this way, the same code is assigned to the same character regardless of typeface or of whether it is printed or handwritten, which improves the generalization ability of the CNN. If different codes were assigned according to typeface and the like, the so-called overfitting problem, in which the network fails to adapt to data it has not been trained on, would be more likely to arise.

Among character candidates, the data judged not to be characters (non-character data) is of two kinds.
The first consists of natural objects and the like appearing in live-action footage that become targets of character recognition as a result of binarization; the second consists of several characters lined up that are picked up together as a single target of character recognition.

As training data for the first non-character pattern, fractal figures, which are often used to simulate natural objects, can be used. Mountain terrain is generated at random by a fractal terrain generation method, binary images are produced by dividing the terrain along contour lines, and training data is extracted from these binary images as appropriate. FIG. 3 (a), (b), and (c) show randomly generated mountain terrain divided along contour lines, in ascending order of elevation. The parts enclosed by broken-line rectangles in the figure are examples of data arbitrarily selected and registered as non-character data. Character data is given positive integer codes, whereas non-character data is given negative integer codes, so that the sign of the code alone immediately tells whether an item is a character or a non-character.
Fractal figures are used as the non-character data corresponding to natural objects because many non-characters, that is, noise, resemble fractal figures.
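The terrain-and-contour procedure can be illustrated with the following sketch. The text does not name a specific fractal terrain algorithm; the diamond-square (midpoint displacement) method is used here purely as an assumption, and the function names, the `roughness` parameter, and the contour levels are hypothetical.

```python
import random
from typing import List


def fractal_heightmap(n: int, roughness: float = 0.6, seed: int = 0) -> List[List[float]]:
    """Diamond-square height map of size (2**n + 1) x (2**n + 1)."""
    rng = random.Random(seed)
    size = 2 ** n + 1
    h = [[0.0] * size for _ in range(size)]
    for y in (0, size - 1):                      # random heights at the four corners
        for x in (0, size - 1):
            h[y][x] = rng.uniform(-1.0, 1.0)
    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2
        # Diamond step: centre of each square = mean of its four corners + noise.
        for y in range(half, size, step):
            for x in range(half, size, step):
                mean = (h[y - half][x - half] + h[y - half][x + half]
                        + h[y + half][x - half] + h[y + half][x + half]) / 4.0
                h[y][x] = mean + rng.uniform(-scale, scale)
        # Square step: edge midpoints = mean of the neighbours that exist + noise.
        for y in range(0, size, half):
            for x in range((y + half) % step, size, step):
                near = [h[ny][nx]
                        for ny, nx in ((y - half, x), (y + half, x),
                                       (y, x - half), (y, x + half))
                        if 0 <= ny < size and 0 <= nx < size]
                h[y][x] = sum(near) / len(near) + rng.uniform(-scale, scale)
        step, scale = half, scale * roughness    # finer grid, smaller noise
    return h


def contour_binary(h: List[List[float]], level: float) -> List[List[int]]:
    """Binary image: 1 (foreground) where the terrain is at or above `level`."""
    return [[1 if v >= level else 0 for v in row] for row in h]


terrain = fractal_heightmap(5, seed=42)
# Three binary images at increasing elevation, in the spirit of FIG. 3 (a)-(c).
samples = [contour_binary(terrain, level) for level in (-0.3, 0.0, 0.3)]
```

Rectangular patches cropped from such binary images could then be stored as non-character samples with negative codes.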

The second non-character pattern is noise that resembles characters. As illustrated in FIG. 4 (1a) to (2b), such data can be produced by randomly generating images in which characters are arranged in a grid or a triangle.
Kanji that combine complex radicals might also match this training data. Even so, such a character should match the training data for that character itself with a higher likelihood. For example, the non-character data of FIG. 4 (2a) resembles the character data of FIG. 4 (3a). However, because the CNN of this embodiment is trained on both non-character data and character data, the likelihood for FIG. 4 (3a) is expected to be higher when the input is the character 「轟」.

<< 3. Preprocessing by the character recognition device (generation of multiple binary images) >>
This embodiment assumes that characters are extracted from a color image. If the original image is already a two-level image (not necessarily black and white) or a monochrome document, a single binary image suffices, but a color or grayscale image requires a plurality of binary images. The procedure for generating binary images in this embodiment is briefly described next.

All pixels in the original image are classified into N groups (N is 3 or more) by the K-means method. The N groups are then split into a group of white pixels and a group of black pixels. The number of groups N may be chosen as appropriate in view of the number of colors used in the original color image and the required speed and accuracy of character recognition. In this way, this embodiment can generate a plurality of binary images at once with a single algorithm, namely processing by the K-means method. In the example of FIG. 5, the number of groups is N = 3, so there are 2^3 ways of assigning white and black. The cases in which all groups are white or all are black are excluded, leaving the six binary images (2) to (7) as processing targets. Black pixels are hereinafter called "foreground pixels".
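The enumeration of binary images can be sketched as follows. The sketch assumes that a K-means step has already produced a label map giving each pixel a group index from 0 to N - 1; the clustering itself is omitted. The function name and the bit-mask encoding of each black/white split are hypothetical.

```python
from typing import Iterator, List, Tuple


def binary_images(labels: List[List[int]],
                  n_groups: int) -> Iterator[Tuple[int, List[List[int]]]]:
    """Yield (mask, image) for every split of the groups into black and white.

    Bit g of `mask` being set means group g is painted black (foreground = 1).
    The all-white case (mask 0) and the all-black case (mask 2**n_groups - 1)
    are skipped, so exactly 2**n_groups - 2 images are produced.
    """
    for mask in range(1, 2 ** n_groups - 1):
        yield mask, [[(mask >> g) & 1 for g in row] for row in labels]


# Hypothetical 2 x 3 label map with N = 3 groups.
labels = [[0, 1, 2],
          [2, 1, 0]]
images = dict(binary_images(labels, 3))
```

With this encoding, the image for mask m and the image for the bitwise complement of m (within N bits) are black-and-white inversions of each other, which corresponds to the inverted pairs mentioned in the text.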

In FIG. 5, images (2) and (7), for example, are merely black-and-white inversions of each other, so it might seem sufficient to run character recognition on only one of them. However, as illustrated in FIG. 6 (a), images also contain bordered characters and outlined (white) characters. Because this embodiment treats only black foreground pixels as targets of character recognition, the outlined character "Z" in FIG. 6 (a) could be missed: it is merely surrounded by a thin band of foreground pixels, so those pixels may be judged to be a non-character or may not be picked up as a character candidate at all. If the inverted binary image shown in FIG. 6 (b) is also prepared, characters that are outlined in the original image also become targets of character recognition.

This embodiment must also recognize characters in which a single character uses multiple colors or has a gradation applied. Having a plurality of binary images is meaningful for this purpose as well. For example, FIG. 7 shows a capital letter "K" with a gradation after binarization. It is difficult to identify the capital "K" from any one of FIG. 7 (a), (b), or (c) alone, but it can be recognized by combining the information obtained from these three binary images.

<< 4. Main processing by the character recognition device (recognition of each character contained in the binary images) >>
The description follows the processing flow of FIG. 8.
First, J binary images are acquired (step S10), and the image counter variable j (an integer, j = 1 to J) is set to its initial value 1 (step S11).

For the binary image under consideration, the image is scanned to extract estimated character areas (step S12).
As shown in FIG. 9, the image is first scanned vertically downward from its upper-left vertex. A region R1 in which foreground pixels are spread out horizontally is found. However, if the height of the circumscribed rectangle of this group of foreground pixels is at or below a predetermined threshold, the region is judged to be noise, is not treated as a target of character recognition, and the downward scan resumes. If the height and width of the circumscribed rectangle of region R2 are at or above the predetermined thresholds, the region is presumed to contain one or more characters and is processed in step S13 and later.
In this way, a certain amount of noise can already be removed at the image scanning stage.
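A simplified version of this scan can be sketched as follows. The sketch makes a single downward pass, treats each run of consecutive non-empty rows as one candidate band, and keeps a band only if its circumscribed rectangle meets both minimum sizes. The single-pass simplification, the function name, and the threshold parameters `min_h` and `min_w` are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def estimated_regions(img: List[List[int]], min_h: int, min_w: int) -> List[Rect]:
    """Scan rows from the top and return rectangles of sufficiently large bands."""
    regions: List[Rect] = []
    top: Optional[int] = None
    blank = [0] * len(img[0])
    # A blank sentinel row at the end closes a band that reaches the bottom edge.
    for y, row in enumerate(img + [blank]):
        if any(row):
            if top is None:
                top = y                      # a new band of foreground rows starts
        elif top is not None:
            band = img[top:y]
            xs = [x for r in band for x, v in enumerate(r) if v]
            left, right, bottom = min(xs), max(xs), y - 1
            # Bands that are too small are treated as noise and skipped.
            if bottom - top + 1 >= min_h and right - left + 1 >= min_w:
                regions.append((left, top, right, bottom))
            top = None
    return regions


image = [
    [0, 0, 1, 0, 0, 0],   # one-row speck: rejected as noise when min_h > 1
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
print(estimated_regions(image, min_h=3, min_w=3))  # [(1, 2, 4, 4)]
```

A fuller implementation would also scan each band horizontally to split it at blank columns, as the mention of vertical and horizontal scanning suggests.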

Connected components are extracted from the estimated character area extracted in step S12 (step S13).
The terms "connected" and "connected component" are explained here with reference to FIG. 10 (a). In the present invention, these terms adapt the notion of connectedness in a topological space so that it can be applied to a collection of pixels, which is a discrete set.

Let U = [1, W] × [1, H] be the set whose elements are all pixels of the binary image, and let B be the set of foreground pixels in U, so that B ⊆ U. The foreground pixels enclosed by the broken-line ellipse in the figure are the elements of B.
The adjacency relation between pixels is the key concept here. Either only the pixels above, below, left, and right are treated as neighbors (4-connectivity), or the diagonal pixels are treated as neighbors as well (8-connectivity); either may be chosen.
In FIG. 10 (a), for pixels p, q, r ∈ B, p and q are adjacent and q and r are adjacent. When any two pixels can be reached from each other by following mutually adjacent pixels in this way, they are said to be "connected", and the subset C of B consisting only of such pixels is called a "connected component". The subset D of B is likewise a connected component (in the figure, the pixels that are elements of C and D are enclosed by dash-dot ellipses). The intersection of two connected components, such as C and D, is the empty set.
A character consists of one or more connected components. The character 「あ」 in FIG. 10 (b) consists of a single connected component, and the character 「談」 in FIG. 10 (d) consists of 11 connected components. A proper subset of a connected component is not itself a connected component. For example, FIG. 10 (c) is a set obtained by taking only some of the pixels of FIG. 10 (b), so it is no longer a connected component and is outside the scope of processing in this embodiment.
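The definition of a connected component under 4- or 8-connectivity can be made concrete with a small labeling sketch. It uses a breadth-first search over foreground pixels; the function name and the `eight` flag are hypothetical, and the text itself only refers to labeling by a known technique.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (x, y)

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def connected_components(img: List[List[int]], eight: bool = False) -> List[Set[Pixel]]:
    """Return the connected components of the foreground (non-zero) pixels."""
    height, width = len(img), len(img[0])
    offsets = N8 if eight else N4
    seen: Set[Pixel] = set()
    components: List[Set[Pixel]] = []
    for y in range(height):
        for x in range(width):
            if not img[y][x] or (x, y) in seen:
                continue
            # Start a new component and grow it through adjacent foreground pixels.
            comp = {(x, y)}
            seen.add((x, y))
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for dx, dy in offsets:
                    nx, ny = cx + dx, cy + dy
                    if (0 <= nx < width and 0 <= ny < height
                            and img[ny][nx] and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        comp.add((nx, ny))
                        queue.append((nx, ny))
            components.append(comp)
    return components


staircase = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(len(connected_components(staircase)))              # 3 under 4-connectivity
print(len(connected_components(staircase, eight=True)))  # 1 under 8-connectivity
```

The example shows why the choice of adjacency matters: pixels that touch only diagonally form separate components under 4-connectivity but a single component under 8-connectivity.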

Next, the method of extracting connected components is described.
Connected components can be extracted easily from an estimated character region by labeling the foreground pixels according to the adjacency relation. However, because of limits on image resolution and similar constraints, it often happens that multiple characters share pixels. Dilation and erosion are used to solve this problem.
In the example of FIG. 11, as shown in FIG. 11(a), the adjacent characters "た" and "け" are joined at the portion marked with a dashed line (FIG. 11(b) is an enlarged view of that portion). The image is therefore eroded, as shown in FIG. 11(c), which separates pixels that should not have been adjacent. Labeling is performed on the eroded image, and the nearby pixels removed by the erosion are added back to the resulting connected components P1 and P2 to give connected components NP1 and NP2.
A side effect of this dilation and erosion is that fine specks and whiskers caused by image noise are also removed.
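The erode, label, and restore sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions: a 4-neighbor erosion, 4-connectivity labeling, and restoration of removed pixels by growing each labeled core outward through the original foreground. The embodiment does not specify the structuring element or the restoration rule, and the names `erode`, `label`, and `split_touching` are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def erode(pixels):
    """Keep only pixels whose four neighbors are all foreground."""
    return {(x, y) for (x, y) in pixels
            if all((x + dx, y + dy) in pixels for dx, dy in N4)}


def label(pixels):
    """Map each pixel to a component id using 4-connectivity."""
    labels = {}
    next_id = 0
    for seed in sorted(pixels):
        if seed in labels:
            continue
        labels[seed] = next_id
        queue = deque([seed])
        while queue:
            x, y = queue.popleft()
            for dx, dy in N4:
                n = (x + dx, y + dy)
                if n in pixels and n not in labels:
                    labels[n] = next_id
                    queue.append(n)
        next_id += 1
    return labels


def split_touching(pixels):
    """Erode, label the eroded image, then add removed pixels back."""
    labels = label(erode(pixels))
    # Grow every labeled core outward through the original foreground so
    # that each removed pixel rejoins a core that reaches it first.
    queue = deque(sorted(labels))
    while queue:
        x, y = queue.popleft()
        for dx, dy in N4:
            n = (x + dx, y + dy)
            if n in pixels and n not in labels:
                labels[n] = labels[(x, y)]
                queue.append(n)
    components = {}
    for pixel, k in labels.items():
        components.setdefault(k, set()).add(pixel)
    return list(components.values())


# Two hypothetical 3x3 blobs joined by a one-pixel bridge at (3, 1).
blob_a = {(x, y) for x in range(3) for y in range(3)}
blob_b = {(x, y) for x in range(4, 7) for y in range(3)}
joined = blob_a | {(3, 1)} | blob_b
parts = split_touching(joined)
```

In this sketch, any group of pixels that disappears entirely under erosion is never labeled and is therefore dropped, which corresponds to the removal of specks and whiskers noted above. A practical implementation would have to choose the erosion strength so that thin strokes of real characters are not lost in the same way.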

A character consists of one or more connected components. Therefore, before the CNN-based character determination, groups of connected components that can serve as character candidates are extracted (step S14).
A character candidate is a group of connected components that is presumed to form a single character and is therefore worth submitting to the CNN for determination.
The character "た" at the left end of the character string illustrated in FIG. 12 consists of three connected components, Pa, Pb, and Pc. Running the CNN on all seven elements of the power set of {Pa, Pb, Pc}, excluding the empty set, would be undesirable in terms of processing speed. In this embodiment, the circumscribed rectangles of the connected components are therefore used as follows.

The character string in FIG. 12(a) consists of the connected components Pa, Pb, ..., Ph, and, as shown in FIG. 12(b), their circumscribed rectangles are denoted rPa, rPb, ..., rPh. Scanning starts from the X coordinate of the upper-left vertex of the leftmost circumscribed rectangle rPa. Because rPa overlaps rPb and rPc, the group of connected components (Pa, Pb, Pc) inside the rectangle Rect1 enclosing these three circumscribed rectangles is taken as a character candidate (at this stage only a provisional one).
The circumscribed rectangle rPd lies to the right of rPb and rPc, but because the X coordinates x3 and x4 are far apart, rPd is not included in Rect1.
Scanning then resumes horizontally to the right from x4, the X coordinate of the upper-left vertex of rPd. The X coordinates x5, x6, x7, and x8 of the upper-right vertices of the circumscribed rectangles to the right of the starting point x4 are extracted. Because rPd is narrow (its width is x5 − x4), the group (Pd, Pe) inside the rectangle Rect2, which also encloses the neighboring rectangle rPe on the right, is taken as a provisional character candidate. The group (Pd, Pe, Pf) inside the rectangle Rect3, which additionally encloses the next rectangle rPf, may also be taken as a provisional candidate. The X coordinate x8 of the upper-right vertex of rPg and rPh, at the right end of the character string, is too far from the starting X coordinate x4, so the group (Pd, Pe, Pf, Pg, Ph) inside the rectangle Rect4 enclosing them is not taken as a provisional candidate.

For simplicity, only comparisons between the x coordinates of the circumscribed rectangles have been described, but the y coordinates are of course compared as well. For example, taking connected component Pg, the circumscribed rectangles of Pg and Ph are vertically close to each other, so the group (Pg, Ph) inside the rectangle Rect5 enclosing rPg and rPh is also taken as a provisional character candidate.
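The rectangle-based selection described above can be sketched as follows. This is a simplified illustration and not the procedure of the embodiment: overlapping rectangles are first merged from left to right, and each merged group is then extended rightward for as long as the enclosing rectangle stays within a maximum width. The function `candidate_groups`, the parameter `max_width`, and all coordinates are assumptions made for this example; they are not taken from FIG. 12. The sketch also emits single-rectangle groups that the text does not mention, and it applies no vertical distance test beyond the overlap check.

```python
def union(a, b):
    """Smallest rectangle (left, top, right, bottom) enclosing a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_groups(rects, max_width):
    """Return provisional candidates as (component names, enclosing rectangle)."""
    order = sorted(rects, key=lambda name: rects[name][0])
    # Step 1: scanning left to right, merge each rectangle into the previous
    # cluster when the two overlap.
    clusters = []
    for name in order:
        r = rects[name]
        if clusters and overlaps(clusters[-1][1], r):
            names, box = clusters[-1]
            clusters[-1] = (names + [name], union(box, r))
        else:
            clusters.append(([name], r))
    # Step 2: from each cluster, extend to the right while the enclosing
    # rectangle is no wider than max_width.
    candidates = []
    for i, (names, box) in enumerate(clusters):
        names = list(names)
        candidates.append((tuple(names), box))
        for more_names, more_box in clusters[i + 1:]:
            grown = union(box, more_box)
            if grown[2] - grown[0] > max_width:
                break
            names, box = names + more_names, grown
            candidates.append((tuple(names), box))
    return candidates


# Hypothetical circumscribed rectangles (left, top, right, bottom).
rects = {
    "Pa": (0, 0, 10, 20), "Pb": (8, 2, 18, 8), "Pc": (9, 12, 18, 20),
    "Pd": (30, 0, 34, 20), "Pe": (37, 0, 44, 20), "Pf": (46, 2, 50, 18),
    "Pg": (60, 0, 78, 6), "Ph": (60, 9, 78, 20),
}
candidates = candidate_groups(rects, max_width=22)
names = [group for group, _ in candidates]
```

For eight connected components the non-empty power set has 255 elements, whereas this sketch yields only a handful of groups, which is the reduction in CNN evaluations that the rectangle-based approach aims for.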

When character candidates are selected using circumscribed rectangles in this way, recognition accuracy may be affected by noise that falls inside a rectangle. In this embodiment, however, this is not treated as a problem, for two reasons. First, owing to the nature of the binarization method, even if noise is present in one binary image, in most cases a noise-free character is obtained by taking nearly the same rectangle from another binary image. Second, the CNN used for character determination can be trained to be robust to such noise. If training has achieved good generalization, then even when only noisy images are available for recognition, the result will merely have a somewhat lower likelihood, and the quality of the final result should not be greatly affected.

The above is the basic method of determining character candidates.
However, the candidates obtained include many that can be seen not to form a character by a simple test alone, without CNN-based identification. Therefore, before the CNN is applied, the character candidates to be submitted for identification are screened (step S15). Narrowing down the number of candidates with simple tests in this way is effective for speeding up the overall processing.
Examples of such tests are given below.

(1) If the only connected components touching the top, bottom, left, or right edge of the circumscribed rectangle are extremely small, they are excluded from the connected components forming the character candidate (in FIG. 13(a), c1 is excluded and c2 is taken as the character candidate).
(2) Upper and lower limits on the size of the circumscribed rectangle are set in advance, and character candidates above the upper limit or below the lower limit are excluded (c3 and c4 in FIG. 13(b)).
(3) Candidates whose circumscribed rectangle has an extreme aspect ratio are excluded. For example, c6 in FIG. 13(c) has a height-to-width ratio of 1:2 and, compared with the adjacent candidate c5, is unlikely to be a single character. Some characters do have extreme aspect ratios, however (the kanji numeral "一", for example), so this test must be balanced against them. In applications where recognition accuracy matters more than processing speed, the aspect-ratio test may be omitted.
(4) Candidates containing too many connected components are excluded (c7 in FIG. 13(d)).
(5) Candidates are excluded if the sum of the areas of the circumscribed rectangles of their connected components is too small relative to the area of the overall circumscribed rectangle (c8 in FIG. 13(e)). The ratio of pixel counts is not used here so that characters such as "口" are not excluded.

Tests (1) to (5) above for judging the suitability of a character candidate are merely examples. What matters is that the candidates submitted to CNN-based character identification can be selected with due regard to the trade-off between recognition accuracy and processing speed.
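Tests (2) to (5) can be sketched as a single screening function over the circumscribed rectangles of a candidate's connected components. Test (1) is omitted because it requires per-component pixel information. The function name `passes_screening` and every threshold value below are hypothetical; the embodiment does not state concrete limits.

```python
def bounding_box(rects):
    """Rectangle (left, top, right, bottom) enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def passes_screening(component_rects, min_side=8, max_side=200,
                     max_aspect=1.8, max_components=12, min_fill=0.3):
    """Apply tests (2) to (5); all thresholds are illustrative assumptions."""
    box = bounding_box(component_rects)
    width, height = box[2] - box[0], box[3] - box[1]
    # (2) Size limits on the overall circumscribed rectangle.
    if not (min_side <= width <= max_side and min_side <= height <= max_side):
        return False
    # (3) Extreme aspect ratio.
    if max(width, height) / min(width, height) > max_aspect:
        return False
    # (4) Too many connected components.
    if len(component_rects) > max_components:
        return False
    # (5) Component rectangles cover too little of the overall rectangle.
    if sum(area(r) for r in component_rects) / (width * height) < min_fill:
        return False
    return True


# A single hollow square such as "口" has one component whose rectangle
# equals the overall rectangle, so test (5) keeps it.
kuchi = [(0, 0, 20, 20)]
too_small = [(0, 0, 3, 3)]
too_wide = [(0, 0, 40, 20)]
too_many = [(i, 0, i + 1, 20) for i in range(20)]
sparse = [(0, 0, 4, 4), (16, 16, 20, 20)]
results = [passes_screening(c) for c in (kuchi, too_small, too_wide, too_many, sparse)]
```

Because test (5) compares rectangle areas and not pixel counts, a thin outline that spans the whole rectangle is retained, which matches the stated reason for not using the pixel ratio.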

Next, the character candidates (groups of one or more connected components) that the simple tests judged likely to be characters are submitted to the CNN (step S16).
Each character candidate is input to a CNN that has been trained in advance on prepared character data and non-character data. If the CNN judges the input to be a character, it returns the character code and its likelihood; if it judges the input not to be a character, it returns the result "not a character" together with the non-character likelihood. In this embodiment, because the CNN is trained on both character data and non-character data, it can judge whether the input is a character or a non-character (that is, noise) and give the corresponding likelihood at the same time.
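One possible way to turn the raw output of such a classifier into the results described above is sketched below. It assumes, purely for illustration, that the network ends in a softmax layer with one class per character code plus one dedicated non-character class; the embodiment does not specify the output layer. The names `NON_CHARACTER`, `interpret`, and `top_k`, and the score values, are hypothetical.

```python
import math

NON_CHARACTER = None  # hypothetical label for the non-character class


def softmax(scores):
    """Convert raw scores to likelihoods that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(labels, scores, top_k=3):
    """Return character candidates in descending likelihood, or a non-character result."""
    ranked = sorted(zip(labels, softmax(scores)),
                    key=lambda pair: pair[1], reverse=True)
    best_label, best_likelihood = ranked[0]
    if best_label is NON_CHARACTER:
        return {"is_character": False, "likelihood": best_likelihood}
    characters = [(c, p) for c, p in ranked if c is not NON_CHARACTER]
    return {"is_character": True, "candidates": characters[:top_k]}


labels = ["た", "だ", "に", NON_CHARACTER]
as_character = interpret(labels, [4.0, 2.5, 1.0, 0.5])
as_noise = interpret(labels, [0.1, 0.2, 0.0, 5.0])
```

Returning several ranked character codes, not only the best one, leaves room for later processing to combine results obtained from different binary images.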

FIG. 14(a) shows the output when the input is judged to be a character; there may be more than one combination of character code and likelihood. Because it is difficult to obtain a unique determination from a single binary image, it suffices here to obtain candidate character codes in descending order of likelihood. FIG. 14(b) shows the result when the input is judged not to be a character.
These outputs are stored in the storage unit 3 together with the position (upper-left corner) and the height and width of the circumscribed rectangle of the character candidate, and are referred to in subsequent processing. Only the connected component groups judged with high likelihood to be characters need be output.
Contradictory character and non-character judgments may be returned for a single connected component group, but because the recognition results of all the binary images are ultimately combined, a reasonable judgment is obtained. In other words, it does not matter if a single binary image fails to settle clearly whether the group is a character or a non-character, or, if it is a character, what its character code is.

If processing has not been completed for all estimated character regions in one binary image (No in step S17), the process returns to step S12 and the image is scanned to extract the next estimated character region. In the example of FIG. 9, once the vertical scan is complete, the image is scanned horizontally to the right from its upper-left coordinates. A region R3 in which foreground pixels extend vertically is found, but if its horizontal length is at or below a predetermined threshold, it is judged to be noise and is not subjected to character recognition. The horizontal scan continues, and if the height and width of the circumscribed rectangle of region R4 are at or above predetermined thresholds, the region is presumed to contain one or more characters and the processing from step S13 onward is executed.
When character recognition for one binary image is complete (Yes in step S17), it is determined whether all J binary images have been processed. If not (No in step S18), the variable j is incremented (step S19), and the process returns to step S12 to scan the j-th image and extract estimated character regions.

When character recognition of the foreground pixels has been completed for every binary image (Yes in step S18), the character recognition results for all the binary images are output to a screen or printer, or to another processing system (step S20). An example of such a system is context-aware text processing. This subsequent processing may be performed by another information processing device or inside the character recognition device 1. FIG. 15 illustrates recognition results for connected component groups located at nearly the same position in multiple binary images. The CNN's determinations differ from one binary image to another, and how these determinations are used is left to the subsequent processing.

One embodiment of the present invention has been described above. The present invention is not limited to this embodiment, however, and various embodiments are possible without departing from the scope of the claims. For example, although multiple binary images are assumed, the present invention can of course be applied to a single binary image. Nor is it necessary to use the k-means method to generate binary images from a single color or grayscale image. Furthermore, the processing flow shown in FIG. 8 is merely an example; for instance, the determination result may be output each time character recognition is performed on one image.

The invention can recognize characters contained in color images with high accuracy and is expected to find wide use as a basic technology for extracting text from television telops, road traffic signs, signboards, and the like.

1: Character recognition device
2: Processing unit
20: Machine learning data acquisition unit
21: Machine learning unit
22: CNN discriminator
23: Binary image acquisition unit
24: Estimated character area scanning unit
25: Connected component extraction unit
26: Character candidate selection unit
27: Character candidate recognition unit
28: Character recognition result output unit
3: Storage unit
4: Communication interface unit

Claims

1. A character recognition method for recognizing characters included in an image, comprising the steps of: obtaining a plurality of binary images generated from a target image; extracting connected components from each of the binary images; determining whether a combination of connected components (hereinafter, "connected component group") is a character candidate to be subjected to character recognition; and subjecting the connected component group determined to be a character candidate to a neural network to determine whether it is a character or a non-character, and obtaining the character code and its likelihood if the result is a character, or the non-character likelihood if the result is a non-character; wherein the teacher data for characters in the neural network assigns the same code to the same character regardless of differences in typeface, and the teacher data for non-characters in the neural network includes fractal patterns generated by a fractal generation process and patterns obtained by combining a plurality of characters.

2. The character recognition method according to claim 1, wherein classification into N (N ≥ 3) groups is performed by the k-means method, the N classified groups are divided into two sets, pixels included in one set are displayed in white and pixels included in the other in black, and the resulting 2^N − 2 binary images are subjected to character recognition. 3.