JP2006235817A

JP2006235817A - Character-recognizing device, character-recognizing method and recording medium for character recognition program

Info

Publication number: JP2006235817A
Application number: JP2005047157A
Authority: JP
Inventors: Shingo Ando; 慎吾安藤; Yoshinori Kusachi; 良規草地; Akira Suzuki; 章鈴木; Kenichi Arakawa; 賢一荒川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-02-23
Filing date: 2005-02-23
Publication date: 2006-09-07

Abstract

PROBLEM TO BE SOLVED: To improve recognition accuracy in character recognition, in an image identification technique that identifies a character string by photographing it as an image.

SOLUTION: A character recognition device 12, which recognizes characters in an image, comprises: first-stage character recognition means 21, which cuts out local images while scanning an input image, extracts their features, and determines character candidates based on the similarity between those features and the features of characters registered in a dictionary; deformation estimation means 22, which estimates the degree of deformation of the character candidates based on the character candidates and the image; deformation correction means 23, which cuts out a corrected local image from the image based on the degree of deformation; and character re-recognition means 24, which cuts out local images while scanning the corrected local image, extracts their features, and determines new character candidates based on the similarity between those features and the features of the characters registered in the dictionary, and on the character candidates.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an image identification technique that captures a character string as an image and identifies the character string. One concrete industrial application is, for example, a Japanese translation system for signboards.

Identifying character strings that appear in a scene generally involves four steps: locating the character string, locating the character regions, binarization, and character identification. With such techniques, however, illumination changes, cluttered backgrounds, and similar factors cause string localization, character-region localization, and binarization to fail, so character string identification accuracy is low.

To solve this problem, one technique for identifying character strings in scenes proceeds in three steps: full-screen search, narrowing down the character candidates, and estimating the character string with a language model (see, for example, Non-Patent Document 1).

Non-Patent Document 1: Yoshinori Kusachi, Naoki Ito, Akira Suzuki, Kenichi Arakawa, "Character Recognition in Scenes without Text Areas for Image Indexing" (in Japanese), IEICE Technical Report PRMU2004-89, Oct. 2004, pp. 37-42.

However, in the above method the feature definitions and the identification algorithm have limitations, and identification accuracy is insufficient. In particular, character deformation caused by the imaging direction is known to strongly affect identification accuracy. Moreover, many character candidates remain in background regions, so estimating the character string with the language model does not work well.

The present invention has been made in view of these circumstances, and its object is to provide a character recognition technique that solves the above problems.

To solve the above problems, the invention according to claim 1 is a character recognition device that recognizes characters in an image, comprising: first-stage character recognition means that cuts out local images while scanning an input image, extracts their features, and determines character candidates based on the similarity between those features and the features of characters registered in a dictionary; deformation estimation means that estimates the degree of deformation of the character candidates based on the character candidates and the image; deformation correction means that cuts out a corrected local image from the image based on the degree of deformation; and character re-recognition means that cuts out local images while scanning the corrected local image, extracts their features, and determines new character candidates based on the similarity between those features and the features of the characters registered in the dictionary, and on the character candidates.

The invention according to claim 2 is characterized in that the degree of deformation consists of affine transformation parameters.

The invention according to claim 3 is characterized in that the character re-recognition means outputs, as the recognition result, the character candidates judged to have high similarity by both the first-stage character recognition means and the character re-recognition means.

The invention according to claim 4 is a character recognition method in a character recognition device that recognizes characters in an image, comprising: a first-stage character recognition step in which first-stage character recognition means cuts out local images while scanning an input image, extracts their features, and determines character candidates based on the similarity between those features and the features of characters registered in a dictionary; a deformation estimation step in which deformation estimation means estimates the degree of deformation of the character candidates based on the character candidates and the image; a deformation correction step in which deformation correction means cuts out a corrected local image from the image based on the degree of deformation; and a character re-recognition step in which character re-recognition means cuts out local images while scanning the corrected local image, extracts their features, and determines new character candidates based on the similarity between those features and the features of the characters registered in the dictionary, and on the character candidates.

The invention according to claim 5 is characterized in that the degree of deformation consists of affine transformation parameters.

The invention according to claim 6 is characterized in that, in the character re-recognition step, the character candidates judged to have high similarity in both the first-stage character recognition step and the character re-recognition step are output as the recognition result.

The invention according to claim 7 is a recording medium on which is recorded a program that implements, in computer-executable form, the character recognition device or the character recognition method according to any one of claims 1 to 6.

In the inventions according to claims 1 to 7, identification accuracy is improved by estimating the degree of deformation of each character candidate, correcting the deformation, and then recalculating the similarity.

In the inventions according to claims 3 and 6, the character candidates can be narrowed down further.

The inventions according to claims 1 to 7 make it possible to improve recognition accuracy in character recognition.

In addition, according to claims 3 and 6, narrowing down the character candidates further reduces the number of character candidates in background regions.

Embodiments of the present invention are described below with reference to the drawings.

First, an example in which the character recognition device is applied to a character string translation system is described with reference to FIG. 1. The character string translation system consists of a mobile terminal 11, such as a camera-equipped PDA, a character recognition device 12, a character string estimation device 13, and a translation device 14.

In this character string translation system, the user captures an image with the mobile terminal 11 and sends it to the character recognition device 12. The character recognition device 12 extracts character candidates from the image and sends them to the character string estimation device 13. The character string estimation device 13 estimates a character string from the character candidates and sends it to the translation device 14. The translation device 14 translates the character string and sends the translation result to the mobile terminal 11, where the user can view it. The system thus lets the user see the translation of a character string from a photograph of it.

Next, the configuration of the character recognition device 12 is described with reference to FIG. 2. As shown in FIG. 2, the character recognition device 12 consists of first-stage character recognition means 21, deformation estimation means 22, deformation correction means 23, and character re-recognition means 24.

The first-stage character recognition means 21 extracts character candidates from an image. It can be implemented, for example, by the method described in Non-Patent Document 1 ("Character Recognition in Scenes without Text Areas for Image Indexing", IEICE Technical Report PRMU2004-89, Oct. 2004, pp. 37-42).

The first-stage character recognition means 21 is now described by example. To handle characters of different sizes, it generates multi-resolution images, cuts out images of a fixed size while shifting the position, and performs a coarse-to-fine search. The result is used as an index. In image search, when a keyword is input, only the corresponding characters are extracted from the index, their regularity is checked, and the images judged to be regular are output as the result.
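
As an illustration of the scanning described above, the sketch below builds downscaled copies of a grayscale image and enumerates fixed-size windows at each scale. It is a minimal sketch under assumptions not stated in the source: images are lists of pixel rows, downscaling is nearest-neighbour, and the `scales`, `window`, and `stride` values (and the helper names `downscale` and `sliding_windows`) are hypothetical.

```python
def downscale(image, factor):
    """Nearest-neighbour resize of a list-of-rows image; factor < 1 shrinks it.
    (Assumption: the source does not say how the multi-resolution images are built.)"""
    h, w = len(image), len(image[0])
    nh, nw = max(1, int(h * factor)), max(1, int(w * factor))
    return [[image[int(y / factor)][int(x / factor)] for x in range(nw)]
            for y in range(nh)]


def sliding_windows(image, window, stride, scales):
    """Yield (scale, top, left, patch) for every window x window crop at each scale."""
    for scale in scales:
        img = downscale(image, scale)
        h, w = len(img), len(img[0])
        for top in range(0, h - window + 1, stride):
            for left in range(0, w - window + 1, stride):
                patch = [row[left:left + window] for row in img[top:top + window]]
                yield scale, top, left, patch


# Hypothetical example: an 8 x 8 image scanned at two resolutions.
image = [[x + 10 * y for x in range(8)] for y in range(8)]
windows = list(sliding_windows(image, window=4, stride=2, scales=[1.0, 0.5]))
print(len(windows))  # 9 windows at full size + 1 at half size = 10
```

Dividing `top` and `left` by `scale` maps a window back to original-image coordinates, which is the kind of position later recorded in the index.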

The coarse-to-fine search consists of pattern learning and pattern identification, which are described below.

(Pattern learning)
Pattern learning consists of four stages: feature extraction, creation of a category hierarchy, pattern generation by geometric deformation, and dictionary generation.

In feature extraction, an original pattern (w × w) of each character photographed from the front is prepared and its features are extracted. The weighted direction code histogram (WDCH) feature is used. WDCH was developed for OCR and targets binary images, but it extends easily to grayscale images. An outline of the algorithm follows, where M and N are positive constants.
1: Compute the gradient magnitude and direction of the original pattern with the Sobel operator.
2: Quantize the gradient direction into M directions.
3: Divide the original pattern into an N × N grid.
4: In each grid cell, sum the gradient magnitudes for each of the M directions.
5: Treat the result as an N × N × M feature vector and normalize its norm.
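
Steps 1 to 5 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function name `wdch`, the default values M = 8 and N = 4, quantizing directions over the full 360 degrees, and skipping the one-pixel border where the 3 × 3 Sobel kernel does not fit are all assumptions.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def wdch(pattern, m=8, n=4):
    """Weighted direction histogram of a square grayscale pattern (list of rows).
    Assumed defaults: m = 8 directions, n = 4 grid cells per side."""
    size = len(pattern)
    hist = [0.0] * (n * n * m)
    for y in range(1, size - 1):          # assumption: skip the 1-pixel border
        for x in range(1, size - 1):
            # Step 1: Sobel gradient at (x, y).
            gx = sum(SOBEL_X[j][i] * pattern[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * pattern[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Step 2: quantize the direction into m bins over [0, 2*pi).
            angle = math.atan2(gy, gx) % (2 * math.pi)
            d = int(angle / (2 * math.pi) * m) % m
            # Step 3: locate the n x n grid cell containing (x, y).
            row, col = y * n // size, x * n // size
            # Step 4: add the gradient magnitude to that cell's direction bin.
            hist[(row * n + col) * m + d] += magnitude
    # Step 5: normalize the n*n*m vector to unit L2 norm.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A vertical step edge: every gradient points along +x, i.e. direction bin 0.
edge = [[0] * 8 + [255] * 8 for _ in range(16)]
feature = wdch(edge)
print(len(feature))  # 128 = 4 * 4 * 8
```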

Because WDCH is based on gradient values, it is robust to brightness changes. Summing the gradient values within each grid cell also absorbs small shape variations, such as those caused by differences between fonts.

To create the category hierarchy, the categories are clustered by the similarity of their feature vectors into a hierarchical structure. Each node contains multiple categories, except the lowest-level nodes, each of which contains a single category.
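
The source does not say which clustering method builds the hierarchy. One plausible choice, shown here only as a sketch, is agglomerative clustering that repeatedly merges the two clusters whose mean feature vectors have the highest cosine similarity; the resulting binary tree, the use of one mean vector per category, and the names `build_hierarchy` and `leaves` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def build_hierarchy(features):
    """features: {category: mean feature vector}. Returns a nested-tuple binary
    tree whose leaves are single categories (assumed agglomerative scheme)."""
    # Each cluster is (subtree, number of categories, centroid).
    clusters = [(cat, 1, list(vec)) for cat, vec in features.items()]
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        _, i, j = max((cosine(clusters[i][2], clusters[j][2]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        (ti, ni, vi), (tj, nj, vj) = clusters[i], clusters[j]
        centroid = [(ni * a + nj * b) / (ni + nj) for a, b in zip(vi, vj)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((ti, tj), ni + nj, centroid))
    return clusters[0][0]


def leaves(tree):
    """Categories contained in a node (a leaf is a bare category name)."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


tree = build_hierarchy({"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]})
print(tree)  # ('C', ('A', 'B'))
```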

In pattern generation by geometric deformation, deformed patterns of each character category are generated to simulate changes in viewpoint. The original pattern is geometrically deformed by a five-parameter affine transformation: rotation, vertical skew, horizontal skew, aspect ratio, and scaling. A generated pattern may be larger than the original pattern, but features are extracted only from the part that lies within the original pattern's window, and these feature vectors are used for dictionary generation.
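
The five-parameter deformation can be sketched as a composed 2 × 2 matrix applied about the pattern centre, keeping only the original w × w window as the text describes. The composition order, the definition of aspect ratio as a horizontal stretch, nearest-neighbour sampling, and all function and parameter names below are assumptions for illustration.

```python
import math


def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def affine_matrix(rotation=0.0, v_skew=0.0, h_skew=0.0, aspect=1.0, scale=1.0):
    """Compose scale @ rotation @ h-skew @ v-skew @ aspect (assumed order)."""
    c, s = math.cos(rotation), math.sin(rotation)
    m = [[scale, 0.0], [0.0, scale]]
    for t in ([[c, -s], [s, c]],             # rotation
              [[1.0, h_skew], [0.0, 1.0]],   # horizontal skew
              [[1.0, 0.0], [v_skew, 1.0]],   # vertical skew
              [[aspect, 0.0], [0.0, 1.0]]):  # aspect ratio (horizontal stretch)
        m = matmul(m, t)
    return m


def invert(m):
    """Inverse of a non-singular 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def warp(pattern, m, background=0):
    """Deform a w x w pattern by m about its centre, keeping only the original window."""
    w = len(pattern)
    centre = (w - 1) / 2
    inv = invert(m)
    out = [[background] * w for _ in range(w)]
    for y in range(w):
        for x in range(w):
            # Map each output pixel back to the source pattern (inverse mapping).
            dx, dy = x - centre, y - centre
            sx = round(inv[0][0] * dx + inv[0][1] * dy + centre)
            sy = round(inv[1][0] * dx + inv[1][1] * dy + centre)
            if 0 <= sx < w and 0 <= sy < w:
                out[y][x] = pattern[sy][sx]
    return out


# Hypothetical parameter settings for three training variants.
pattern = [[x + 10 * y for x in range(8)] for y in range(8)]
deformed = [warp(pattern, affine_matrix(**p))
            for p in ({"rotation": 0.1}, {"h_skew": 0.2}, {"scale": 1.2})]
```

In practice one would sweep a grid of parameter values, extract a feature vector from each warped pattern, and pass all of them on to dictionary generation.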

In dictionary generation, the dictionary of each node is created by the following procedure.

In the first stage, the features are compressed at each level of the hierarchy. Principal component analysis is applied to all feature vectors, including those of the geometrically deformed patterns, and the vectors are compressed using the eigenvectors with the largest eigenvalues. The compressed feature vector is denoted f(c, r, p), where c is the category, r is the compression rate, and p is the deformation parameter.

In the second stage, a dictionary is generated for each node. Letting C be the set of categories in a node, principal component analysis is applied to the vectors f(C, r, p) to obtain the subspace E_d(C, r). Here d, the number of dimensions of the subspace, is an integer fixed by the system and determined from the contribution rate.
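
As a sketch of how a node subspace E_d(C, r) might be computed, the code below takes the compressed vectors f(C, r, p), finds eigenvectors of their autocorrelation matrix by power iteration with deflation, and keeps eigenvectors until the cumulative contribution rate reaches a threshold. Using the uncentred autocorrelation matrix (common in the subspace method) rather than the centred covariance, the 0.95 default threshold, and the helper names are assumptions; a real system would use a numerical library's eigensolver.

```python
import math


def autocorrelation(vectors):
    """Uncentred autocorrelation matrix (1/n) * sum of v v^T (assumed, not centred)."""
    dim, n = len(vectors[0]), len(vectors)
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(dim)]
            for i in range(dim)]


def top_eigen(mat, iters=500):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    dim = len(mat)
    vec = [1.0 + 0.1 * i for i in range(dim)]   # arbitrary non-uniform start
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    val = 0.0
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        val = math.sqrt(sum(x * x for x in nxt))  # ||M v|| -> eigenvalue for unit v
        if val == 0.0:
            break
        vec = [x / val for x in nxt]
    return val, vec


def node_subspace(vectors, contribution=0.95):
    """Orthonormal basis of E_d: eigenvectors kept until the cumulative
    contribution rate (share of total eigenvalue mass) reaches `contribution`."""
    mat = autocorrelation(vectors)
    total = sum(mat[i][i] for i in range(len(mat)))   # trace = sum of eigenvalues
    basis, acc = [], 0.0
    while acc < contribution * total and len(basis) < len(mat):
        val, vec = top_eigen(mat)
        if val <= 1e-12:
            break
        basis.append(vec)
        acc += val
        # Deflation: remove the component just found.
        mat = [[mat[i][j] - val * vec[i] * vec[j] for j in range(len(mat))]
               for i in range(len(mat))]
    return basis


data = [[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
basis = node_subspace(data, contribution=0.95)
print(len(basis))  # d = 2: eigenvalues 4.5 and 0.5 together reach the 95% threshold
```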

Coarse-to-fine search is achieved by setting the compression rate lower at each deeper level. The upper levels perform fast but less accurate identification; the lower levels perform slower but more accurate identification.

(Pattern identification)
Small regions of size W × W are cut out while the position is moved over the entire surface of each multi-resolution image, and pattern identification is performed on them. Pattern identification performs a coarse-to-fine search while following multiple paths through the hierarchy. An outline of the algorithm follows.
1. Feature extraction: Cut out regions while varying the position over the entire surface of each resolution image, and extract their features. The features of all cutout regions are computed in advance.
2. Initialization: Start from the root node of the tree.
3. Candidate node setting: Set the first-level nodes as the candidate nodes of every cutout region. Repeat steps 4 to 6 for each cutout region.
4. Compression: Compress the features of the cutout region using the compression rate of the next level down. The result is denoted I'(r).
5. Projection distance calculation: Using the subspace of candidate node C, compute the projection distance L(C) of I'(r) according to the equation given in the original publication, where D is the subspace dimension.
6. Screening: Rank the candidate nodes by the distance values above, and update the candidate nodes using distance and rank thresholds.
7. Peak detection: For each candidate node over all cutout regions, compute the three-dimensional (vertical, horizontal, resolution) spatial connectivity to obtain segments. In each segment, keep only the candidate node at the minimum-distance peak and delete the other candidate nodes.
8. Local screening: Process each set of peaks at the same resolution as follows. First, divide the space into blocks, sort the peaks in each block by distance value, and keep at most a fixed number from the top. Then shift the block grid by half a block horizontally and vertically and repeat the same processing.
9. Integration of peaks with the same candidate character: Take two peaks with the same candidate character and, if their center coordinates and resolutions are close to each other, merge them into the one with the smaller distance value. Repeat until no pair of peaks to merge remains.
10. Candidate node update: Register the nodes connected below each candidate node as the new candidate nodes.
11. Termination check: If the lowest level has been reached, output the remaining candidate nodes as the index and terminate; otherwise, return to step 4. The index format is (category name, position, size, similarity).
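
The projection-distance equation itself is not reproduced in this text. A common form in the subspace method, assumed here, is L(C) = ||I'(r)||^2 - sum over i = 1..D of (I'(r) . e_i)^2, where e_1, ..., e_D are orthonormal basis vectors of the node subspace. The sketch below computes that quantity and applies the rank and distance thresholds of step 6; the threshold values and the names `projection_distance` and `screen` are hypothetical.

```python
def projection_distance(x, basis):
    """Assumed subspace-method distance: ||x||^2 minus the squared length of
    x's projection onto an orthonormal basis."""
    norm2 = sum(v * v for v in x)
    proj2 = sum(sum(a * b for a, b in zip(x, e)) ** 2 for e in basis)
    return norm2 - proj2


def screen(x, candidates, max_rank, max_distance=None):
    """Rank candidate nodes by projection distance (smaller is better) and keep
    those within `max_rank` and, if given, within `max_distance`."""
    ranked = sorted((projection_distance(x, basis), node)
                    for node, basis in candidates.items())
    return [(d, node) for d, node in ranked[:max_rank]
            if max_distance is None or d <= max_distance]


# Hypothetical compressed feature and three one-dimensional node subspaces.
x = [1.0, 2.0, 0.0]
nodes = {"n1": [[1.0, 0.0, 0.0]], "n2": [[0.0, 1.0, 0.0]], "n3": [[0.0, 0.0, 1.0]]}
print(screen(x, nodes, max_rank=2))  # [(1.0, 'n2'), (4.0, 'n1')]
```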

The spatial connectivity in step 7 may also be computed in two dimensions (vertical, horizontal) instead of three. Steps 7 to 9 serve to reduce the amount of processing and trade accuracy against processing cost; they need not be performed at every level, only at predetermined levels.
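
Step 9 can be sketched as a greedy merge loop. What counts as "close" is not specified in the source, so the centre-offset and scale-ratio tolerances below, the `Peak` record, and the assumption that coordinates are already expressed in original-image pixels are all hypothetical.

```python
import math
from typing import NamedTuple


class Peak(NamedTuple):
    char: str      # candidate character (category)
    x: float       # centre in original-image pixels (assumed)
    y: float
    scale: float   # resolution of the image the peak was found in
    dist: float    # projection distance (smaller is better)


def integrate_peaks(peaks, max_offset=2.0, max_scale_ratio=1.2):
    """Merge pairs of same-character peaks with nearby centres and resolutions,
    keeping the smaller-distance peak, until no mergeable pair remains.
    Tolerances are hypothetical."""
    peaks = list(peaks)
    merged = True
    while merged:
        merged = False
        for i in range(len(peaks)):
            for j in range(i + 1, len(peaks)):
                p, q = peaks[i], peaks[j]
                if p.char != q.char:
                    continue
                near = math.hypot(p.x - q.x, p.y - q.y) <= max_offset
                ratio = max(p.scale, q.scale) / min(p.scale, q.scale)
                if near and ratio <= max_scale_ratio:
                    keep = p if p.dist <= q.dist else q
                    peaks = [r for k, r in enumerate(peaks) if k not in (i, j)]
                    peaks.append(keep)
                    merged = True
                    break
            if merged:
                break
    return peaks


ps = [Peak("A", 10, 10, 1.0, 0.5), Peak("A", 11, 10, 1.0, 0.3),
      Peak("B", 10, 10, 1.0, 0.1), Peak("A", 50, 50, 1.0, 0.2)]
print(len(integrate_peaks(ps)))  # 3: the two nearby "A" peaks merge
```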

(Image search)
In image search, when a keyword is input as a character string, the index obtained by pattern identification is searched for places where patterns are arranged spatially regularly, and images whose index contains such a place are output as the search result. The following rules for the spatial arrangement of patterns are used here:
(1) The patterns are nearly the same size.
(2) The pitch is nearly constant.
(3) The pitch lies within a fixed range relative to the size of the individual patterns.
(4) The order of the patterns matches the order of the input character string, and the angle between the direction in which the patterns are arranged and the horizontal or vertical direction lies within a fixed range.

In this case, the search algorithm finds every place in the index where any two characters of the input character string occur in forward order. At each such place, it computes the two-dimensional coordinates of the start position of a hypothetical occurrence of the input string and the two-dimensional vector representing the character advance, and casts a vote in the voting space formed by these parameters. Before voting, it checks whether the combination violates rules (1), (3), and (4) above, and casts no vote if it does. Finally, the voting space is searched for any location whose score is at or above a threshold.
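
The voting procedure can be sketched as follows. Index entries are assumed to be (category, x, y, size) tuples with y increasing downward, so horizontal text advances along +x and vertical text along +y; the tolerances, bin size, threshold, and the function name `vote_search` are hypothetical. Rule (2) is not checked per vote: pairs with a consistent pitch simply land in the same bin.

```python
import math
from collections import Counter


def vote_search(index, keyword, bin_size=4.0, size_ratio=1.3,
                pitch_range=(0.8, 2.0), angle_tol=math.radians(15), threshold=2):
    """index: list of (category, x, y, size). Returns [(bin, votes)] with
    votes >= threshold, where a bin is (start_x, start_y, advance_x, advance_y)
    quantized by bin_size. All tolerances are hypothetical."""
    votes = Counter()
    for i in range(len(keyword)):
        for j in range(i + 1, len(keyword)):          # any two characters, forward order
            step = j - i
            for ca, ax, ay, sa in index:
                if ca != keyword[i]:
                    continue
                for cb, bx, by, sb in index:
                    if cb != keyword[j]:
                        continue
                    # Rule (1): the two patterns are nearly the same size.
                    if max(sa, sb) / min(sa, sb) > size_ratio:
                        continue
                    # Per-character advance vector implied by this pair.
                    adv_x, adv_y = (bx - ax) / step, (by - ay) / step
                    # Rule (3): pitch within a range relative to the character size.
                    rel_pitch = math.hypot(adv_x, adv_y) / ((sa + sb) / 2)
                    if not pitch_range[0] <= rel_pitch <= pitch_range[1]:
                        continue
                    # Rule (4): advance runs left-to-right or top-to-bottom.
                    angle = math.atan2(adv_y, adv_x)
                    if min(abs(angle), abs(angle - math.pi / 2)) > angle_tol:
                        continue
                    # Vote for the implied string start position and advance vector.
                    start_x, start_y = ax - i * adv_x, ay - i * adv_y
                    key = tuple(round(v / bin_size)
                                for v in (start_x, start_y, adv_x, adv_y))
                    votes[key] += 1
    return [(key, n) for key, n in votes.items() if n >= threshold]


# Hypothetical index: "ABC" written left to right, plus one stray "A".
index = [("A", 0, 0, 10), ("B", 12, 0, 10), ("C", 24, 0, 10), ("A", 100, 100, 10)]
print(vote_search(index, "ABC"))  # [((0, 0, 3, 0), 3)]
```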

Because this algorithm restricts the candidate characters processed in each vote to just two categories, it runs fast even on indexes that contain many false candidate characters, and the nature of voting makes it robust to partially missing correct answers.

In this way, the first-stage character recognition means 21 determines character candidates (candidate category, position, size, similarity) from the image.

The deformation estimation means 22 estimates the degree of deformation of the character at each position and size from the character candidates (candidate category, position, size, similarity) and the image. This can be implemented, for example, with the parameter estimation method described in "Pose Estimation of 3D Objects by Support Vector Regression", IEICE Technical Report PRMU2004-91 (Oct. 2004).

That method targets the camera's yaw, pitch, and roll angles as parameters; for character deformation, however, it is considered appropriate to target parameters such as those of an affine transformation (vertical scaling, horizontal scaling, vertical skew, horizontal skew, rotation). To train the parameter estimation functions, several character fonts can be artificially deformed and used as training data.

A deformation parameter estimation function obtained in this way is derived for each character category. For parameter estimation, therefore, the estimation function corresponding to the "candidate category" property of a character candidate (candidate category, position, size, similarity) is called to compute the deformation parameter values. The feature vector input to that function is extracted from the local image specified by the candidate's "position, size" properties. These features may be the same as those used by the first-stage character recognition means 21, or different ones, for example a feature vector whose elements are the pixels of the image.

With the above method, deformation parameters can be estimated from each character candidate (candidate category, position, size, similarity). For candidates that share the same position and size, the parameter values are averaged into a single set of deformation parameters for that position and size. The result is output as the degree of deformation (position, size).
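
A sketch of the per-category dispatch and per-location averaging follows. The support vector regressors of the cited method are replaced by arbitrary callables, and `estimators`, `extract_feature`, and `estimate_deformation` are hypothetical names; any regressor trained for each category could be plugged in.

```python
from collections import defaultdict


def estimate_deformation(candidates, estimators, extract_feature):
    """candidates: iterable of (category, x, y, size, similarity).
    estimators: {category: callable(feature) -> tuple of deformation parameters}
    (hypothetical stand-ins for the per-category regressors).
    extract_feature: callable(x, y, size) -> feature of that local image.
    Returns {(x, y, size): parameter tuple averaged over candidates there}."""
    per_location = defaultdict(list)
    for category, x, y, size, _similarity in candidates:
        feature = extract_feature(x, y, size)
        # Call the estimation function for this candidate's category.
        per_location[(x, y, size)].append(estimators[category](feature))
    return {loc: tuple(sum(vals) / len(vals) for vals in zip(*params))
            for loc, params in per_location.items()}


# Toy stand-ins for trained per-category regressors (hypothetical).
estimators = {"A": lambda f: (f[0], 1.0), "B": lambda f: (f[0] + 2.0, 3.0)}
extract = lambda x, y, size: [float(x)]
candidates = [("A", 0, 0, 16, 0.9), ("B", 0, 0, 16, 0.8), ("A", 5, 5, 16, 0.7)]
print(estimate_deformation(candidates, estimators, extract))
# {(0, 0, 16): (1.0, 2.0), (5, 5, 16): (5.0, 1.0)}
```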

The deformation correction means 23 cuts out the corresponding deformation-corrected local image from the input image based on the degree of deformation (position, size).

FIG. 3 shows how a deformation-corrected local image is cut out from the input image. The local image is cut out so that the deformation of the character is corrected. For example, if the deformation parameters are those of an affine transformation, the deformation-corrected local image is obtained by applying the inverse of that affine transformation to the image and cutting out the result. The result is output as the corrected local image (position, size).
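
The inverse warp can be sketched without explicitly inverting the matrix: if the observed character is the upright character deformed by a 2 × 2 matrix about the candidate centre, then each upright pixel can be fetched from the image at the forward-mapped position, which is equivalent to applying the inverse transformation to the image before cropping. Nearest-neighbour sampling, a square crop, and the name `corrected_patch` are assumptions.

```python
def corrected_patch(image, cx, cy, size, a, background=0):
    """Crop a size x size patch centred at (cx, cy) with the affine deformation
    `a` (2 x 2, assumed estimated upstream) undone.

    The upright pixel at offset q lies at image position (cx, cy) + a @ q, so
    sampling there is equivalent to warping the image by the inverse of `a`."""
    h, w = len(image), len(image[0])
    half = (size - 1) / 2
    out = [[background] * size for _ in range(size)]
    for v in range(size):
        for u in range(size):
            qx, qy = u - half, v - half
            # Forward-map the upright offset into the observed image (nearest pixel).
            ix = round(cx + a[0][0] * qx + a[0][1] * qy)
            iy = round(cy + a[1][0] * qx + a[1][1] * qy)
            if 0 <= ix < w and 0 <= iy < h:
                out[v][u] = image[iy][ix]
    return out


image = [[x + 10 * y for x in range(10)] for y in range(10)]
# Character observed at twice its upright size around (5, 5).
print(corrected_patch(image, 5, 5, 3, [[2, 0], [0, 2]]))
# [[33, 35, 37], [53, 55, 57], [73, 75, 77]]
```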

The character re-recognition means 24 applies the same processing as the first-stage character recognition means 21 to the corrected local image (position, size) and computes its similarity. It uses this value as the new similarity of the corresponding character candidate (candidate category, position, size, similarity) received from the first-stage character recognition means 21, and outputs the result as a new character candidate.

At this point, for each position and size, it is possible to output as the recognition result only those candidates whose new similarity ranks within the top N and whose similarity rank in the first-stage character recognition means 21 lies within the top M. This keeps only the character candidates ranked highly by both the first-stage character recognition means 21 and the character re-recognition means 24, narrowing down the character candidates further. N and M are predetermined parameters; their values may be the same or different.
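
The dual rank condition can be sketched as follows for the candidates at one position and size. Higher similarity is assumed to mean a better match, each category is assumed to appear at most once per location, and `dual_rank_filter`, `n_new` (N), and `m_first` (M) are illustrative names; these N and M are unrelated to the WDCH constants.

```python
def dual_rank_filter(candidates, n_new, m_first):
    """candidates: list of (category, first_similarity, new_similarity) at one
    position and size. Keep those ranked within the top n_new by the new
    similarity AND within the top m_first by the first-stage similarity
    (assumption: larger similarity = better)."""
    top_new = sorted(candidates, key=lambda c: c[2], reverse=True)[:n_new]
    top_first = {c[0] for c in
                 sorted(candidates, key=lambda c: c[1], reverse=True)[:m_first]}
    return [c for c in top_new if c[0] in top_first]


cands = [("A", 0.9, 0.5), ("B", 0.8, 0.9), ("C", 0.1, 0.8)]
print(dual_rank_filter(cands, n_new=2, m_first=2))  # [('B', 0.8, 0.9)]
```

Only "B" survives: "A" ranks high in the first stage but drops after re-recognition, while "C" gains after re-recognition but ranked low in the first stage.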

In the above embodiment, the character recognition device 12 is implemented, for example, by the CPU of the computer constituting it, and the required first-stage character recognition, deformation estimation, deformation correction, and character re-recognition processes can be installed as application programs.

Data such as the processing and calculation results of the first-stage character recognition, deformation estimation, deformation correction, and character re-recognition processes may also be written to and read from internal memory or an external storage device.

Alternatively, a recording medium on which the program code of software implementing the functions of this embodiment is recorded may be supplied to a system or device, whose CPU (MPU) reads and executes the program code stored in the medium. In this case, the program code read from the storage medium itself implements the functions of the above embodiment. Examples of storage media for the program code include CD-ROM, DVD-ROM, CD-R, CD-RW, MO, and HDD.

FIG. 1: A character string translation system using the character recognition device.
FIG. 2: Block diagram of the character recognition device.
FIG. 3: Explanatory drawing of cutting out a local image after deformation correction.

Explanation of symbols

11 ... Mobile terminal
12 ... Character recognition device
13 ... Character string estimation device
14 ... Translation device
21 ... First-stage character recognition means
22 ... Deformation estimation means
23 ... Deformation correction means
24 ... Character re-recognition means

Claims

1. A character recognition device for recognizing characters in an image, comprising:
first-stage character recognition means for cutting out local images while scanning an input image, extracting their features, and determining character candidates based on the similarity between those features and the features of characters registered in a dictionary;
deformation estimation means for estimating a degree of deformation of the character candidates based on the character candidates and the image;
deformation correction means for cutting out a corrected local image from the image based on the degree of deformation; and
character re-recognition means for cutting out local images while scanning the corrected local image, extracting their features, and determining new character candidates based on the similarity between those features and the features of the characters registered in the dictionary, and on the character candidates.

2. The character recognition device according to claim 1, wherein the degree of deformation consists of affine transformation parameters.

3. The character recognition device according to claim 1, wherein the character re-recognition means outputs, as a recognition result, the character candidates judged to have high similarity by both the first-stage character recognition means and the character re-recognition means.

4. A character recognition method in a character recognition device for recognizing characters in an image, comprising:
a first-stage character recognition step in which first-stage character recognition means cuts out local images while scanning an input image, extracts their features, and determines character candidates based on the similarity between those features and the features of characters registered in a dictionary;
a deformation estimation step in which deformation estimation means estimates a degree of deformation of the character candidates based on the character candidates and the image;
a deformation correction step in which deformation correction means cuts out a corrected local image from the image based on the degree of deformation; and
a character re-recognition step in which character re-recognition means cuts out local images while scanning the corrected local image, extracts their features, and determines new character candidates based on the similarity between those features and the features of the characters registered in the dictionary, and on the character candidates.

5. The character recognition method according to claim 4, wherein the degree of deformation consists of affine transformation parameters.

6. The character recognition method according to claim 4 or 5, wherein, in the character re-recognition step, the character candidates judged to have high similarity in both the first-stage character recognition step and the character re-recognition step are output as a recognition result.

7. A recording medium on which is recorded a program that implements, in computer-executable form, the character recognition device or the character recognition method according to any one of claims 1 to 6.