JP2013186906A

JP2013186906A - Method and device for recognizing character string in image

Info

Publication number: JP2013186906A
Application number: JP2013046996A
Authority: JP
Inventors: yi-feng Pan; 屹峰潘; Suyuan Chen; チェヌ・スユアヌ; Junu Sunu; スヌ・ジュヌ; Yuan He; 源何
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-03-09
Filing date: 2013-03-08
Publication date: 2013-09-19
Anticipated expiration: 2033-03-08
Also published as: CN103310209A; JP6085999B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and device for recognizing a character string in an image.

SOLUTION: The method includes the steps of: extracting a character string region in an image; performing over-segmentation on the character string region; and recognizing a character string included in the character string region by a path search strategy on the basis of at least one of a language type context feature and a character width context feature.

Description

The present invention relates to the field of character recognition and, more specifically, to a method and apparatus for recognizing a character string in an image.

With the spread of digital image acquisition devices (for example, mobile phones and cameras), image search systems based on text information have attracted wide attention. In such systems, recognition of text in natural scene images is a major component of the overall system and strongly affects system performance. However, because of variations in text size and font and because of degraded image quality, it is still difficult for text recognition to achieve highly accurate results. In addition, a natural scene image usually contains several kinds of languages, which also strongly affects the accuracy of text recognition.

Therefore, a technique that can solve the above problems is desirable.

The main object of the present invention is to provide a method and apparatus for recognizing a character string in an image.

According to one aspect of the present invention, a method for recognizing a character string in an image is provided. The method includes a step of extracting a character string region in the image, a step of segmenting the character string region, and a step of recognizing the character string included in the character string region by a path search strategy (Path Searching Strategy) based on at least one of a language type context feature and a character width context feature.

FIG. 1 is a flowchart of a method for recognizing a character string in an image according to an embodiment of the present invention.
FIG. 2A is a diagram illustrating an exemplary character string region according to the present invention.
FIG. 2B is a diagram illustrating an exemplary pre-processed character string region according to the present invention.
FIG. 2C is a diagram illustrating an image of a character string region after exemplary over-segmentation according to the present invention.
FIG. 3 is a diagram illustrating an exemplary optimal path search according to the present invention.
FIG. 4 is a flowchart for determining a language type context feature based on adjacent characters according to an embodiment of the present invention.
FIG. 5 is a flowchart for determining a language type context feature based on adjacent characters according to another embodiment of the present invention.
FIG. 6A is a diagram illustrating an exemplary character string region according to the present invention.
FIG. 6B is a diagram illustrating an exemplary recognition result obtained without the language type context feature.
FIG. 6C is a diagram illustrating an exemplary recognition result obtained with the language type context feature.
FIG. 7 is a flowchart for determining a character width context feature based on the difference between a character width and a weighted average character width according to an embodiment of the present invention.
FIG. 8A is a diagram illustrating an exemplary character string region according to the present invention.
FIG. 8B is a diagram illustrating an exemplary character string recognition result obtained without the character width context feature.
FIG. 8C is a diagram illustrating an exemplary character string recognition result obtained with the character width context feature.
FIG. 9 is an exemplary flowchart for recognizing a character string in an image according to the present invention.
FIG. 10 is a block diagram of an apparatus for recognizing a character string in an image according to an embodiment of the present invention.
FIG. 11 is a block diagram of an apparatus for recognizing a character string in an image according to another embodiment of the present invention.
FIG. 12 is a block diagram of an apparatus for recognizing a character string in an image according to another embodiment of the present invention.
FIG. 13 is an exemplary structural diagram of a computing device that can be used to implement the method and apparatus for recognizing a character string in an image according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.

A process 100 of a method for recognizing a character string in an image according to an embodiment of the present invention will be described with reference to FIG. 1, FIGS. 2A to 2C, and FIG. 3.

As shown in FIG. 1, in step S105, a character string region in the image can be extracted. FIG. 2A is a diagram showing an exemplary character string region according to the present invention. As shown in FIG. 2A, a character string region in the image, namely the character string region containing “成就我[智]造”, can be extracted.

Optionally, the character string region can be pre-processed before the over-segmentation in step S110 described below. FIG. 2B is a diagram showing an exemplary pre-processed character string region according to the present invention. The pre-processing includes several basic image processing operations, such as binarization, image smoothing, skew removal, and connected-region extraction. Its main purpose is to provide reliable image quality for the subsequent character segmentation, extraction, and recognition. It should be understood that the pre-processing is not a mandatory step. For example, if the character string region in the image is very clear, the pre-processing may be omitted.
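Two of the pre-processing operations named above, binarization and connected-region extraction, can be sketched as follows. This is a minimal illustration and not the publication's implementation: the fixed threshold of 128, the assumption of dark text on a light background, the 4-connectivity, and the function names `binarize` and `connected_components` are all choices made here for the example.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (list of rows, values 0-255) to 0/1.

    Assumption: text is darker than the background, so 1 marks ink pixels.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Return the ink components as lists of (row, col), using 4-connectivity."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components
```

In practice the threshold would normally be chosen adaptively, and smoothing and skew removal would run before this step; they are left out to keep the sketch short.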

In step S110, over-segmentation can be performed on the character string region extracted in step S105. When the size, font, and layout of character strings in scene images vary considerably, adopting an over-segmentation strategy helps ensure, as far as possible, that adjacent characters are not left merged into a single segment. Based on the over-segmentation result, a segmentation candidate grid (lattice) can be constructed to facilitate the subsequent optimal path search, that is, the character string recognition. FIG. 2C is a diagram illustrating an image of a character string region after exemplary over-segmentation according to the present invention. As shown in FIG. 2C, over-segmentation has been performed on the character string region containing “成就我[智]造”.
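The idea of over-segmenting a line and then building a candidate grid can be sketched as below. The publication does not specify the segmentation algorithm, so this example makes a simplifying assumption: primitive segments are cut only at blank pixel columns of a binarized line (a real over-segmenter would also cut touching characters). The names `oversegment`, `candidate_grid`, and the parameter `max_merge` are hypothetical.

```python
def oversegment(binary):
    """Split a binarized text line (rows of 0/1) at blank columns.

    Returns primitive segments as (start_col, end_col) spans, end exclusive.
    """
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # a run of ink columns begins
        elif not ink and start is not None:
            spans.append((start, x))   # the run ends at a blank column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def candidate_grid(spans, max_merge=3):
    """Enumerate candidate characters as merges of consecutive primitive segments.

    Each candidate is (first_segment, last_segment_exclusive, (start_col, end_col)).
    A path through the grid is a chain of candidates that covers every segment once.
    """
    candidates = []
    for i in range(len(spans)):
        last = min(i + max_merge, len(spans))
        for j in range(i + 1, last + 1):
            candidates.append((i, j, (spans[i][0], spans[j - 1][1])))
    return candidates
```

Merging up to `max_merge` neighboring segments lets the later path search decide whether, for example, the two halves of a left-right structured Chinese character form one character or two.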

Subsequently, in step S115, the character string included in the character string region can be recognized by a path search strategy based on at least one of the language type context feature and the character width context feature. FIG. 3 is a diagram illustrating an exemplary optimal path search according to the present invention. FIG. 3 shows various search paths.

In general character string recognition, optimal path search is a commonly used method that performs segmentation and character string recognition simultaneously. In the segmentation candidate grid, each path corresponds to one character string recognition result. The purpose of the optimal path search is to find the path that optimizes the path objective function; the recognition result corresponding to that path is judged to be closest to the true result. Given a character string sequence X, a character type code sequence Y, and a corresponding segmentation path S, the path objective function can be expressed as follows:

[Equation 1]

Here, (外1) denotes a feature function, and (外2) denotes the weight of that feature function.
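The equation itself is not preserved in this text, only the statement that it combines feature functions with weights. One form consistent with that description is a weighted sum, sketched below. The symbols f_k, λ_k, and K, and the choice of maximization rather than minimization, are assumptions made for illustration and may differ from the notation in the original publication.

```latex
f(X, Y, S) = \sum_{k=1}^{K} \lambda_k \, f_k(X, Y, S),
\qquad
(Y^{*}, S^{*}) = \arg\max_{Y,\, S} \; f(X, Y, S)
```

Under this reading, each f_k scores one aspect of a candidate path (for example, language type consistency or character width consistency), and λ_k controls how much that aspect contributes to the total path score.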

In this embodiment, the feature functions include at least one of the language type context feature and the character width context feature.

Optionally, the feature functions (外3) may further include one or more of a single-character recognizer feature, a semantic context feature, and a geometric context feature. This is described later with reference to FIG. 9.
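The optimal path search over a segmentation candidate grid can be sketched with dynamic programming. This is an illustrative sketch, not the publication's algorithm: it assumes that a higher score is better, that each candidate carries a single unary score (standing in for recognizer-type features), and that context features are supplied through a pairwise callback. The names `best_path` and `pair_score` are hypothetical.

```python
def best_path(num_nodes, edges, pair_score=None):
    """Find the highest-scoring path from node 0 to node num_nodes - 1.

    edges: list of (start, end, label, unary_score) with start < end; each edge
    is one candidate character spanning primitive segments start..end-1.
    pair_score(prev_edge, cur_edge): context score for two consecutive candidates.
    Returns (total_score, labels), or (None, []) if no complete path exists.
    """
    if pair_score is None:
        pair_score = lambda prev, cur: 0
    # Process edges by start node so that all predecessors are scored first.
    order = sorted(range(len(edges)), key=lambda e: edges[e][0])
    best = {}  # edge index -> (best score of a path ending here, previous edge)
    for e in order:
        start, _end, _label, unary = edges[e]
        if start == 0:
            best[e] = (unary, None)
            continue
        options = [
            (best[p][0] + pair_score(edges[p], edges[e]) + unary, p)
            for p in best
            if edges[p][1] == start
        ]
        if options:
            best[e] = max(options, key=lambda t: t[0])
    finals = [e for e in best if edges[e][1] == num_nodes - 1]
    if not finals:
        return None, []
    e = max(finals, key=lambda k: best[k][0])
    total = best[e][0]
    labels = []
    while e is not None:  # backtrack along the stored predecessors
        labels.append(edges[e][2])
        e = best[e][1]
    return total, labels[::-1]
```

With no context score, a slightly higher recognizer score for a letter can beat the correct digit. A pairwise penalty on mixed digit/letter neighbors, passed as `pair_score`, can reverse that choice, which is the effect the context features described in this publication aim for.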

Because of the diversity of characters in natural scene images, it is usually difficult to obtain satisfactory recognition results based only on the single-character recognizer feature, the semantic context feature, and the geometric context feature. The technical solution of this embodiment also takes the language type context feature and the character width context feature into account, so recognition accuracy can be improved. Next, the process of determining the language type context feature is described with reference to FIGS. 4, 5, and 6A to 6C, and the process of determining the character width context feature is described with reference to FIGS. 7 and 8A to 8C.

First, the process of determining the language type context feature based on adjacent characters according to an embodiment of the present invention is described with reference to FIGS. 4, 5, and 6A to 6C.

Natural scene images usually contain different language types, for example, Chinese characters, English letters, and Arabic numerals. In addition, the characters in a single character string usually belong to one language type. On this basis, according to this embodiment, the path search function can be calculated using the language type context feature.

As shown in FIG. 4, after over-segmentation has been performed on the character string region in step S110, step S405 can determine whether each character in a path and one of its adjacent characters belong to the same language type. In step S410, the language type context feature can be calculated based on this determination result.

Specifically, given two adjacent characters (外4) and their codes (外5), the language type context feature function can be defined as follows:

[Equation]

Here, α is a penalty coefficient whose value may be determined empirically. Because the language type context feature penalizes only pairs of adjacent characters that belong to different language types, it drives the characters in the same line toward a single language type.
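The behavior of such a feature can be sketched as follows. The defining equation is not preserved in this text, so the sketch assumes the simplest form matching the description: a penalty of α for each adjacent pair whose language types differ, and 0 otherwise. The coarse classifier `language_type`, its three categories, and the decision to skip punctuation are illustrative assumptions, not part of the publication.

```python
def language_type(ch):
    """Coarse language type of one character (illustrative classifier only)."""
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "cjk"
    return "other"                   # punctuation, symbols, and so on


def language_context_penalty(labels, alpha=1.0):
    """Add alpha for each adjacent pair of labels with different language types.

    Pairs involving an "other" character are skipped (an assumption made here
    so that separators such as hyphens are not penalized).
    """
    total = 0.0
    for prev, cur in zip(labels, labels[1:]):
        a, b = language_type(prev), language_type(cur)
        if "other" in (a, b):
            continue
        if a != b:
            total += alpha
    return total
```

Applied to a phone-number-like string, a reading that mixes the letter "o" into a run of digits accumulates a penalty for each digit/letter boundary, while the all-digit reading accumulates none, so the path search is pushed toward the consistent reading.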

Subsequently, in step S115, the character string included in the character string region can be recognized by the path search strategy based on the language type context feature.

Next, the process of determining the language type context feature based on adjacent characters according to another embodiment of the present invention is described with reference to FIG. 5.

As shown in FIG. 5, after over-segmentation has been performed on the character string region in step S110, step S505 can determine, for each character in a search path, whether that character and each of its multiple adjacent characters belong to the same language type. In step S510, the language type context feature can be calculated based on this determination result.

Specifically, the target character may have more than one adjacent character. Assuming that the set of characters adjacent to the target character X₁ is (外6), the language type context feature function can be defined as follows:

[Equation]

Preferably, the penalty coefficient α is obtained from training samples by a machine learning algorithm. For a specific algorithm, see, for example, the non-patent document “Xiang-Dong Zhou, Jin-Lun Yu, Cheng-Lin Liu, Nagasaki, T., Marukawa, K.: Online Handwritten Japanese Character String Recognition Incorporating Geometric Context. ICDAR 2009. 7: 48-52”.
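The multi-neighbor variant can be sketched by comparing each character with every neighbor inside a small window. Since the defining equation is not preserved in this text, the form below is an assumption: α is added once for each pair of positions within `window` of each other whose language types differ. The function name, the `window` parameter, and the use of precomputed type labels as input are hypothetical.

```python
def language_context_penalty_window(types, alpha=1.0, window=2):
    """Penalize differing language types among nearby characters.

    types: sequence of language-type labels, one per character on the path.
    Each unordered pair (i, j) with 0 < j - i <= window is counted once.
    With window=1 this reduces to comparing immediate neighbors only.
    """
    total = 0.0
    n = len(types)
    for i in range(n):
        for j in range(i + 1, min(i + window, n - 1) + 1):
            if types[i] != types[j]:
                total += alpha
    return total
```

A wider window makes an isolated character of a different type more expensive than under the single-neighbor form, because it disagrees with more of its surroundings.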

FIG. 6A is a diagram illustrating an exemplary character string region according to the present invention, FIG. 6B is a diagram illustrating an exemplary recognition result obtained without the language type context feature, and FIG. 6C is a diagram illustrating an exemplary recognition result obtained with the language type context feature. As shown in FIG. 6B, the result obtained without the language type context feature is “o 1 o − 6 7 5 o 2 2 2 9”, in which all three “0” characters are misrecognized as “o”. As shown in FIG. 6C, the result obtained with the language type context feature is “0 1 o − 6 7 5 0 2 2 2 9”, in which only one of the three “0” characters is misrecognized as “o”. This shows that introducing the language type context feature can improve character recognition accuracy.

Next, the process of determining the character width context feature based on the difference between a character width and the weighted average character width according to an embodiment of the present invention is described with reference to FIG. 7 and FIGS. 8A to 8C. The character width context feature is determined for each search path by the method described below. In character string recognition, recognition errors caused by incorrect segmentation are relatively common. Moreover, even when writing styles differ, characters in the same line usually have the same character width for ease of layout. According to this embodiment, such errors can be corrected based on the character width context feature.

As shown in FIG. 7, after over-segmentation has been performed on the character string region in step S110, initial recognition can be performed on the character string region in step S705.

In step S710, a weighted average character width can be estimated from the initial recognition result. That is, the weighted average character width is (外7), where (外8) is the character width of the i-th character in the initial recognition result and (外9) is the character recognition confidence of (外10). In other words, the character recognition confidence (外11) is used as the weighting coefficient. For example, the distance from the right boundary of the first character to the left of the target character to the left boundary of the first character to the right of the target character (as shown in FIG. 8A) can be taken as the character width (外12) of the target character. It should be understood that this is only one example of a character width. For example, the character width may be any one of the following: the width of the character itself; the sum of the widths of the gaps on both sides of the character and the width of the character itself; the sum of half the widths of the gaps on both sides of the character and the width of the character itself; the sum of the width of the character itself and the width of the gap on its right side; and the sum of the width of the character itself and the width of the gap on its left side.
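The weighted average described in step S710 can be sketched as follows. The exact formula is not preserved in this text; the sketch assumes the usual confidence-weighted mean, in which the sum of confidence times width is divided by the sum of confidences. The function names and the (left, right) box representation are hypothetical.

```python
def width_between_neighbors(left_neighbor, right_neighbor):
    """Character width measured as in the FIG. 8A example.

    Boxes are (left_col, right_col). The width is the distance from the right
    boundary of the left neighbor to the left boundary of the right neighbor.
    """
    return right_neighbor[0] - left_neighbor[1]


def weighted_average_width(widths, confidences):
    """Confidence-weighted mean of the character widths of an initial result.

    Assumption: weights are normalized by their sum, so characters recognized
    with low confidence contribute little to the estimate.
    """
    if len(widths) != len(confidences):
        raise ValueError("widths and confidences must have the same length")
    total_conf = sum(confidences)
    if total_conf == 0:
        raise ValueError("at least one confidence must be non-zero")
    return sum(w * c for w, c in zip(widths, confidences)) / total_conf
```

Weighting by confidence means that a wrongly segmented fragment, which the recognizer typically scores poorly, distorts the estimated line width less than it would in a plain average.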

In step S715, the difference between the character width of each character in the search path and the weighted average character width can be determined.

For example, the character width context feature and an original feature may be combined to define the following new feature:

[Equation]

This definition means that a relatively small penalty is obtained when the character width of the target character is close to the average character width of the text line, and a relatively large penalty is obtained otherwise.
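One way to obtain the behavior described above is to subtract a width-deviation term from an existing feature score. The combining equation is not preserved in this text, so everything in this sketch is an assumption: the relative absolute deviation as the penalty, the subtraction from a base score, and the names `width_context_penalty`, `combined_score`, and `beta`.

```python
def width_context_penalty(width, mean_width):
    """Relative deviation of a character width from the line's average width.

    Returns 0 when the widths match and grows as they diverge.
    """
    return abs(width - mean_width) / mean_width


def combined_score(base_score, width, mean_width, beta=1.0):
    """Combine an existing feature score with the width penalty.

    beta is a hypothetical weight controlling how strongly width
    inconsistency lowers the score of a candidate character.
    """
    return base_score - beta * width_context_penalty(width, mean_width)
```

For a line whose average width is 40 pixels, a 20-pixel candidate (for example, half of a wrongly split character) receives a penalty of 0.5, while a 40-pixel candidate receives none.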

In step S720, the character width context feature can be calculated based on the difference described above.

Subsequently, in step S115, the character string included in the character string region can be recognized by the path search strategy based on the character width context feature.

FIG. 8A is a diagram illustrating an exemplary character string region according to the present invention, FIG. 8B is a diagram illustrating an exemplary character string recognition result obtained without the character width context feature, and FIG. 8C is a diagram illustrating an exemplary character string recognition result obtained with the character width context feature. As shown in FIG. 8B, without the character width context feature, the recognition result is “成就我胤造”. As shown in FIG. 8C, with the character width context feature, the recognition result is “成就我[智]造”. This shows that introducing the character width context feature can improve character recognition accuracy.

The cases in which the language type context feature is introduced and in which the character width context feature is introduced have each been described separately. In practice, both features can be introduced at the same time through Equation 1 above. Other features, such as the single-character recognizer feature, the semantic context feature, and the geometric context feature, can also be introduced.

Next, an exemplary process for recognizing a character string in an image according to the present invention is described with reference to FIG. 9.

As shown in FIG. 9, in addition to the language type context feature and the character width context feature described above, the single-character recognizer feature, the semantic context feature, and the geometric context feature are introduced at the same time.

It should be understood that these features need not all be introduced together; one or more of them may be introduced.

Next, an apparatus 1000 for recognizing a character string in an image according to an embodiment of the present invention is described with reference to FIG. 10.

As shown in FIG. 10, the apparatus 1000 for recognizing a character string in an image may include an extraction unit 1005, a segmentation unit 1010, and a recognition unit 1015.

The extraction unit 1005 can extract a character string region in the image. The segmentation unit 1010 can perform over-segmentation on the character string region. The recognition unit 1015 can recognize the character string included in the character string region by a path search strategy based on at least one of the language type context feature and the character width context feature.

Next, an apparatus 1000’ for recognizing a character string in an image according to another embodiment of the present invention is described with reference to FIG. 11. The apparatus 1000’ shown in FIG. 11 differs from the apparatus 1000 shown in FIG. 10 in that the recognition unit 1015 includes a language type determination subunit 1015-1 and a first calculation subunit 1015-2.

The language type determination subunit 1015-1 can determine whether each character in a path and one of its adjacent characters belong to the same language type. Alternatively, the language type determination subunit 1015-1 can determine, for each character in a path, whether that character and each of its multiple adjacent characters belong to the same language type.

The first calculation subunit 1015-2 can calculate the language type context feature based on this determination result.

Next, an apparatus 1000” for recognizing a character string in an image according to another embodiment of the present invention is described with reference to FIG. 12. The apparatus 1000” shown in FIG. 12 differs from the apparatus 1000 shown in FIG. 10 in that the recognition unit 1015 includes an initial recognition subunit 1015-3, an average character width estimation subunit 1015-4, a difference determination subunit 1015-5, and a second calculation subunit 1015-6.

The initial recognition subunit 1015-3 can perform initial recognition on the character string region. The average character width estimation subunit 1015-4 can estimate a weighted average character width from the initial recognition result. That is, the weighted average character width is (外13), where (外14) is the character width of the i-th character in the initial recognition result and (外15) is the character recognition confidence of (外16). The difference determination subunit 1015-5 can determine the difference between the character width of each character in a path and the weighted average character width. The second calculation subunit 1015-6 can calculate the character width context feature based on this difference.

Here, the character width is any one of the following: the width of the character itself; the sum of the widths of the gaps on both sides of the character and the width of the character itself; the sum of half the widths of the gaps on both sides of the character and the width of the character itself; the sum of the width of the character itself and the width of the gap on its right side; and the sum of the width of the character itself and the width of the gap on its left side.

The apparatuses 1000, 1000’, and 1000” for recognizing a character string in an image may each include a pre-processing unit (not shown), which can perform pre-processing on the character string region.

Optionally, the recognition unit 1015 can recognize the character string included in the character string region by a path search strategy based on at least one of the language type context feature and the character width context feature, together with at least one of the single-character recognizer feature, the semantic context feature, and the geometric context feature.

According to the embodiments of the present invention, using the language type context feature and the character width context feature can improve the recognition accuracy for character strings in natural scene images.

The language type context feature reinforces the reasonable constraint that “characters in the same text line belong to the same type”. The character width context feature reinforces the reasonable constraint that “characters in the same text line have similar widths”. According to the embodiments of the present invention, within a “segmentation-recognition” framework, the two new features (that is, the language type context feature and the character width context feature) and the conventional features are integrated into a single objective function to be optimized, and character strings in natural scene images are recognized on that basis. Experimental results also indicate that the two new features provided by the present invention contribute to a marked improvement in recognition accuracy for character strings in natural scene images.

上述の実施例による、画像中の文字列を認識する方法及び装置における各ステップや構成ユニットなどは、ソフトウェア、ファームウェア、ハードウェア又はそれらの任意の組み合わせの方式で実現されてもよい。ソフトウェア又はファームウェアにより実現される場合は、記憶媒体又はネットワークから、専用ハードウェア構造を有する装置（例えば図13に示す汎用装置1300）に、このソフトウェア又はファームウェアを構成するプログラムをインストールすることができる。この装置は、各種のプログラムがインストールされている時に、上述の各構成ユニットやステップの各種の機能を行うことができる。 Each step, component unit, and the like in the method and apparatus for recognizing a character string in an image according to the above-described embodiment may be realized by a method of software, firmware, hardware, or any combination thereof. When realized by software or firmware, a program constituting the software or firmware can be installed from a storage medium or a network to a device having a dedicated hardware structure (for example, the general-purpose device 1300 shown in FIG. 13). This apparatus can perform various functions of the above-described constituent units and steps when various programs are installed.

FIG. 13 is an exemplary structural diagram of a computing device that can be used to implement the method and device for recognizing a character string in an image according to the embodiments of the present invention.

In FIG. 13, a central processing unit (CPU) 1301 performs various processes according to a program stored in a ROM 1302 or a program loaded from a storage unit 1308 into a RAM 1303. The RAM 1303 also stores, as needed, the data required when the CPU 1301 executes the various processes. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to one another via a bus 1304. An input/output interface 1305 is also connected to the bus 1304.

The following are also connected to the input/output interface 1305: an input unit 1306 (including a keyboard, a mouse, and the like), an output unit 1307 (including a display such as a CRT or LCD, a speaker, and the like), a storage unit 1308 (including a hard disk and the like), and a communication unit 1309 (including a network interface card such as a LAN card, a modem, and the like). The communication unit 1309 performs communication processing via a network, for example the Internet. A drive 1310 may also be connected to the input/output interface 1305 as needed. A removable medium 1311, for example a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor storage device, may be mounted on the drive 1310 as needed, and a computer program read from it may be installed in the storage unit 1308 as needed.

When the series of processes described above is implemented in software, the program constituting the software may be installed from a network, for example the Internet, or from a storage medium, for example the removable medium 1311.

Those skilled in the art should understand that such a storage medium is not limited to the removable medium 1311 shown in FIG. 13, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1311 include a magnetic disk (including a floppy (registered trademark) disk), an optical disc (including CD-ROM and DVD), a magneto-optical disk (including MD (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1302, a hard disk included in the storage unit 1308, or the like, which stores the program and is distributed to the user together with the device that contains it.

The present disclosure also relates to a program product consisting of machine-readable (for example, computer-readable) instruction code. When this instruction code is read and executed by a machine, the method according to the above embodiments can be carried out. Accordingly, a storage medium storing the program product consisting of the machine-readable instruction code is also included in the present disclosure. Such storage media include, but are not limited to, magnetic disks (floppy disks), optical discs, magneto-optical disks, memory cards, and memory sticks.

In addition, elements and features described in one drawing or one embodiment of the present disclosure may be combined with elements and features shown in one or more other drawings or embodiments.

The steps of the series of processes described above may be performed chronologically in the order described, but need not be. Some steps may be performed in parallel or independently of one another.

It is also apparent that each process of the above method according to the present disclosure can be implemented as a computer-executable program stored on any of various machine-readable storage media.

The object of the present disclosure may also be achieved in the following way: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or device, and a computer or CPU in that system or device reads out and executes the program code.

In this case, as long as the system or device is capable of executing a program, the embodiments of the present invention are not limited to a program, and the program may take any form, for example an object program, a program executed by an interpreter, or a script program supplied to an operating system.

The machine-readable storage media mentioned above include, but are not limited to, various memories and storage units, semiconductor devices, disk units such as optical, magnetic, and magneto-optical disks, and other media suitable for storing information.

A client computer can also implement the present invention by connecting to a corresponding server via the Internet, downloading the computer program code according to the present invention, installing it on the computer, and then executing the program.

Finally, it should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not imply or suggest that any such actual relationship or order exists between those entities or operations. Furthermore, the terms "include" and "have" and their variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to that process, method, article, or device. Absent further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

With regard to the embodiments including each of the examples described above, the following appendices are further disclosed.

(Appendix 1)
A method for recognizing a character string in an image, comprising:
a step of extracting a character string region in the image;
a step of performing over-segmentation on the character string region; and
a step of recognizing a character string contained in the character string region by a path search strategy, based on at least one of a language type context feature and a character width context feature.
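The path search over an over-segmented region can be illustrated with a small dynamic-programming sketch in Python. It is not taken from the patent: the function name `best_segmentation_path`, the `score` callback, and the `max_merge` limit are hypothetical, and the sketch assumes that each candidate character is formed by merging consecutive primitive segments and that the path score is a sum of per-character scores. Context features that couple neighboring characters would need a richer search state (for example a beam search), which is omitted here.

```python
from typing import Callable, List, Tuple


def best_segmentation_path(
    n_segments: int,
    score: Callable[[int, int], float],
    max_merge: int = 3,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Find the highest-scoring way to group primitive segments into characters.

    A span (start, end) merges primitive segments start..end-1 into one
    candidate character; score(start, end) is a hypothetical per-character score.
    """
    if max_merge < 1:
        raise ValueError("max_merge must be at least 1")
    neg_inf = float("-inf")
    best = [neg_inf] * (n_segments + 1)  # best[k]: best score covering segments 0..k-1
    back = [-1] * (n_segments + 1)       # back[k]: start index of the last span ending at k
    best[0] = 0.0
    for end in range(1, n_segments + 1):
        for start in range(max(0, end - max_merge), end):
            if best[start] == neg_inf:
                continue
            candidate = best[start] + score(start, end)
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Walk the back-pointers from the end to recover the chosen spans.
    spans: List[Tuple[int, int]] = []
    pos = n_segments
    while pos > 0:
        spans.append((back[pos], pos))
        pos = back[pos]
    spans.reverse()
    return best[n_segments], spans


# Toy example: prefer candidate characters whose total width is close to 10 pixels.
segment_widths = [5, 5, 10, 10]


def toy_score(start: int, end: int) -> float:
    return -abs(sum(segment_widths[start:end]) - 10)


total, spans = best_segmentation_path(len(segment_widths), toy_score)
```

In this toy run the two narrow segments are merged into one character and the two wide segments are kept separate, which is the behavior over-segmentation followed by path search is meant to produce.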

(Appendix 2)
The method according to Appendix 1, wherein the language type context feature is determined, for each search path, by:
determining whether each character in the search path and one character adjacent to it belong to the same language type; and
calculating the language type context feature based on the result of the determination.

(Appendix 3)
The method according to Appendix 1, wherein the language type context feature is determined, for each search path, by:
determining whether each character in the search path and each of a plurality of characters adjacent to it belong to the same language type; and
calculating the language type context feature based on the result of the determination.
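The same-language check on adjacent characters can be sketched as follows. This Python example is only an illustration under stated assumptions: `language_type` is a crude, hypothetical classifier based on Unicode character names, and returning the fraction of agreeing pairs is one plausible way to turn the determination results into a feature value, not necessarily the patent's. A `window` of 1 compares each character with one neighbor; a larger `window` compares it with several neighbors.

```python
import unicodedata
from typing import List


def language_type(ch: str) -> str:
    """Crude language-type label derived from the Unicode character name (an assumption)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED"):
        return "han"
    if name.startswith("HIRAGANA") or name.startswith("KATAKANA"):
        return "kana"
    if name.startswith("LATIN"):
        return "latin"
    if name.startswith("DIGIT"):
        return "digit"
    return "other"


def language_context_feature(chars: List[str], window: int = 1) -> float:
    """Fraction of (character, following neighbor) pairs that share a language type.

    Each character is compared with up to `window` characters that follow it,
    so every unordered pair within the window is counted once.
    """
    types = [language_type(c) for c in chars]
    agree = 0
    total = 0
    for i in range(len(types)):
        last = min(i + window, len(types) - 1)
        for j in range(i + 1, last + 1):
            total += 1
            agree += types[i] == types[j]
    # A path with no pairs (zero or one character) is treated as fully consistent.
    return agree / total if total else 1.0


uniform = language_context_feature(list("ABCD"))
mixed = language_context_feature(list("AB漢D"))
mixed_wide = language_context_feature(list("AB漢D"), window=2)
```

A search path whose candidate characters mix language types receives a lower value than a uniform path, so the search is steered toward hypotheses in which one text line uses one language type.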

(Appendix 4)
The method according to Appendix 1, wherein the character width context feature is determined, for each search path, by:
performing initial recognition on the character string region;
estimating a weighted average character width from the result of the initial recognition according to Formula 5 below;
determining the difference between the character width of each character in the search path and the weighted average character width; and
calculating the character width context feature based on the difference,

[Formula 5: equation image not reproduced in the source]

where [symbol image 17] is the character width of the i-th character in the result of the initial recognition, and [symbol image 18] is the confidence of [symbol image 19].
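Because the equation image is not available, the following Python sketch assumes the standard confidence-weighted mean, that is, the sum of confidence times width divided by the sum of confidences. That form, the function names, and the choice of a negative mean relative deviation as the feature value are all assumptions made for illustration, not the patent's stated formula.

```python
from typing import Sequence


def weighted_average_width(widths: Sequence[float], confidences: Sequence[float]) -> float:
    """Confidence-weighted mean of the character widths from an initial recognition pass."""
    if len(widths) != len(confidences) or not widths:
        raise ValueError("widths and confidences must be non-empty and equally long")
    total_confidence = sum(confidences)
    if total_confidence <= 0:
        raise ValueError("confidences must sum to a positive value")
    return sum(w * c for w, c in zip(widths, confidences)) / total_confidence


def width_context_feature(path_widths: Sequence[float], mean_width: float) -> float:
    """Negative mean relative deviation from the estimated width (0.0 is the best value)."""
    if not path_widths:
        return 0.0
    deviation = sum(abs(w - mean_width) for w in path_widths)
    return -deviation / (len(path_widths) * mean_width)


# Initial recognition: a low-confidence 10-pixel character and a high-confidence 20-pixel one.
mean_width = weighted_average_width([10.0, 20.0], [1.0, 3.0])
consistent = width_context_feature([20.0, 20.0], 20.0)
inconsistent = width_context_feature([10.0, 30.0], 20.0)
```

Weighting by confidence lets reliably recognized characters dominate the width estimate, so a few badly segmented candidates in the initial pass distort it less than they would in a plain average.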

(Appendix 5)
The method according to Appendix 4, wherein the character width is any one of: the width of the character itself; the sum of the widths of the gaps on both sides of the character and the width of the character itself; the sum of half the widths of the gaps on both sides of the character and the width of the character itself; the sum of the width of the character itself and the width of the gap on its right side; and the sum of the width of the character itself and the width of the gap on its left side.
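The five alternative width definitions can be written out as a small Python helper. The enum `WidthMode` and the function `character_width` are hypothetical names introduced for this sketch; the arithmetic simply follows the five options listed above, with "half the widths of the gaps on both sides" read as half of the left gap plus half of the right gap.

```python
from enum import Enum


class WidthMode(Enum):
    GLYPH = "glyph"          # width of the character itself
    BOTH_GAPS = "both_gaps"  # character width plus both neighboring gaps
    HALF_GAPS = "half_gaps"  # character width plus half of each neighboring gap
    RIGHT_GAP = "right_gap"  # character width plus the gap on its right
    LEFT_GAP = "left_gap"    # character width plus the gap on its left


def character_width(glyph_width: float, left_gap: float, right_gap: float,
                    mode: WidthMode) -> float:
    """Return the character width under one of the five definitions."""
    if mode is WidthMode.GLYPH:
        return glyph_width
    if mode is WidthMode.BOTH_GAPS:
        return glyph_width + left_gap + right_gap
    if mode is WidthMode.HALF_GAPS:
        return glyph_width + (left_gap + right_gap) / 2
    if mode is WidthMode.RIGHT_GAP:
        return glyph_width + right_gap
    return glyph_width + left_gap


# A 10-pixel character with a 2-pixel gap on its left and a 4-pixel gap on its right.
widths_by_mode = {mode: character_width(10.0, 2.0, 4.0, mode) for mode in WidthMode}
```

Whichever definition is chosen should be applied consistently to both the initial recognition result and the characters on each search path, since the feature compares the two.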

(Appendix 6)
The method according to any one of Appendices 1 to 5, further comprising a step of preprocessing the character string region before performing over-segmentation on the character string region.

(Appendix 7)
The method according to any one of Appendices 1 to 5, wherein the step of recognizing the character string contained in the character string region by the path search strategy, based on at least one of the language type context feature and the character width context feature, comprises:
a step of recognizing the character string contained in the character string region by the path search strategy, based on at least one of the language type context feature and the character width context feature, together with at least one of a single-character recognizer feature, a semantic context feature, and a geometric context feature.
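One common way to use several features in a single path search is to score each candidate path with a weighted sum of its feature values and keep the highest-scoring path. The Python sketch below assumes that linear form; the feature names, the weights, and the functions `path_score` and `select_best_path` are hypothetical and are not specified in this section.

```python
from typing import Dict, Mapping, Tuple


def path_score(features: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of feature values; every feature must have a weight."""
    return sum(weights[name] * value for name, value in features.items())


def select_best_path(candidates: Mapping[str, Mapping[str, float]],
                     weights: Mapping[str, float]) -> Tuple[str, float]:
    """Return the label and score of the highest-scoring candidate path."""
    scored = {label: path_score(feats, weights) for label, feats in candidates.items()}
    best_label = max(scored, key=scored.get)
    return best_label, scored[best_label]


# Hypothetical weights and feature values for two competing readings of one text line.
weights: Dict[str, float] = {"recognizer": 1.0, "language": 0.5, "width": 2.0}
candidates = {
    "path_a": {"recognizer": 2.0, "language": 1.0, "width": -0.5},
    "path_b": {"recognizer": 2.5, "language": 0.0, "width": -1.0},
}
best_label, best_score = select_best_path(candidates, weights)
```

Here the second path has the stronger single-character recognizer value, but the language type and character width terms favor the first path, which illustrates how the context features can overrule an isolated recognition score.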

(Appendix 8)
A device for recognizing a character string in an image, comprising:
an extraction unit that extracts a character string region in the image;
a division unit that performs over-segmentation on the character string region; and
a recognition unit that recognizes a character string contained in the character string region by a path search strategy, based on at least one of a language type context feature and a character width context feature.
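The three-unit structure of the device maps naturally onto a small composition of callables. The Python sketch below is only an illustration: the class `StringRecognitionDevice`, its field names, and the toy stand-ins for the three units are hypothetical, and a real extraction unit would operate on image data rather than on a text string.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class StringRecognitionDevice:
    extract: Callable[[Any], List[Any]]    # extraction unit: image -> character string regions
    segment: Callable[[Any], List[Any]]    # division unit: region -> over-segmented pieces
    recognize: Callable[[List[Any]], str]  # recognition unit: pieces -> recognized string

    def run(self, image: Any) -> List[str]:
        """Apply the three units in order to every extracted region."""
        return [self.recognize(self.segment(region)) for region in self.extract(image)]


# Toy stand-ins: an "image" is a string whose regions are separated by "|".
device = StringRecognitionDevice(
    extract=lambda image: image.split("|"),
    segment=list,
    recognize="".join,
)
result = device.run("ab|cd")
```

Keeping the units as separate callables mirrors the appendix, where the recognition unit can be refined with subunits without changing the extraction or division units.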

(Appendix 9)
The device according to Appendix 8, wherein the recognition unit comprises:
a language type determination subunit that determines whether each character in the search path and one character adjacent to it belong to the same language type; and
a first calculation subunit that calculates the language type context feature based on the result of the determination.

(Appendix 10)
The device according to Appendix 8, wherein the recognition unit comprises:
a language type determination subunit that determines whether each character in the search path and each of a plurality of characters adjacent to it belong to the same language type; and
a first calculation subunit that calculates the language type context feature based on the result of the determination.

(Appendix 11)
The device according to Appendix 8, wherein the recognition unit comprises:
an initial recognition subunit that performs initial recognition on the character string region;
an average character width estimation subunit that estimates a weighted average character width from the result of the initial recognition according to Formula 6 below;
a difference determination subunit that determines the difference between the character width of each character in the search path and the weighted average character width; and
a second calculation subunit that calculates the character width context feature based on the difference,

[Formula 6: equation image not reproduced in the source]

where [symbol image 20] is the character width of the i-th character in the result of the initial recognition, and [symbol image 21] is the confidence of [symbol image 22].

(Appendix 12)
The device according to Appendix 11, wherein the character width is any one of: the width of the character itself; the sum of the widths of the gaps on both sides of the character and the width of the character itself; the sum of half the widths of the gaps on both sides of the character and the width of the character itself; the sum of the width of the character itself and the width of the gap on its right side; and the sum of the width of the character itself and the width of the gap on its left side.

(Appendix 13)
The device according to any one of Appendices 8 to 12, further comprising a preprocessing unit that performs preprocessing on the character string region.

(Appendix 14)
The device according to any one of Appendices 8 to 12, wherein the recognition unit recognizes the character string contained in the character string region by the path search strategy, based on at least one of the language type context feature and the character width context feature, together with at least one of a single-character recognizer feature, a semantic context feature, and a geometric context feature.

(Appendix 15)
A program for causing a computer to execute each step of the method according to Appendix 1.

(Appendix 16)
A computer-readable storage medium on which the program according to Appendix 15 is recorded.

The preferred embodiments of the present invention have been described above. However, the present invention is not limited to these embodiments, and any modification that does not depart from the spirit of the present invention falls within its technical scope.

Claims

1. A method for recognizing a character string in an image, comprising:
an extraction step of extracting a character string region in the image;
a division step of performing over-segmentation on the character string region; and
a recognition step of recognizing a character string contained in the character string region by a path search strategy, based on at least one of a language type context feature and a character width context feature.

2. The method of claim 1, wherein the language type context feature is determined by:
determining whether each character in the search path and one character adjacent to it belong to the same language type; and
calculating the language type context feature based on the result of the determination.

3. The method of claim 1, wherein the language type context feature is determined by:
determining whether each character in the search path and each of a plurality of characters adjacent to it belong to the same language type; and
calculating the language type context feature based on the result of the determination.

4. The method of claim 1, wherein the character width context feature is determined by:
performing initial recognition on the character string region;
estimating a weighted average character width from the result of the initial recognition according to Formula 7 below;
determining the difference between the character width of each character in the search path and the weighted average character width; and
calculating the character width context feature based on the difference,

[Formula 7: equation image not reproduced in the source]

where [symbol image 23] is the weighted average character width, [symbol image 24] is the character width of the i-th character in the result of the initial recognition, and [symbol image 25] is the confidence of [symbol image 26].

5. The method of claim 4, wherein the character width is any one of: the width of the character itself; the sum of the widths of the gaps on both sides of the character and the width of the character itself; the sum of half the widths of the gaps on both sides of the character and the width of the character itself; the sum of the width of the character itself and the width of the gap on its right side; and the sum of the width of the character itself and the width of the gap on its left side.

6. A device for recognizing a character string in an image, comprising:
an extraction unit that extracts a character string region in the image;
a division unit that performs over-segmentation on the character string region; and
a recognition unit that recognizes a character string contained in the character string region by a path search strategy, based on at least one of a language type context feature and a character width context feature.

7. The device of claim 6, wherein the recognition unit comprises:
a language type determination subunit that determines whether each character in the search path and one character adjacent to it belong to the same language type; and
a first calculation subunit that calculates the language type context feature based on the result of the determination.

8. The device of claim 6, wherein the recognition unit comprises:
a language type determination subunit that determines whether each character in the search path and each of a plurality of characters adjacent to it belong to the same language type; and
a first calculation subunit that calculates the language type context feature based on the result of the determination.

9. The device of claim 6, wherein the recognition unit comprises:
an initial recognition subunit that performs initial recognition on the character string region;
an average character width estimation subunit that estimates a weighted average character width from the result of the initial recognition according to Formula 8 below;
a difference determination subunit that determines the difference between the character width of each character in the search path and the weighted average character width; and
a second calculation subunit that calculates the character width context feature based on the difference,

[Formula 8: equation image not reproduced in the source]

where [symbol image 27] is the weighted average character width, [symbol image 28] is the character width of the i-th character in the result of the initial recognition, and [symbol image 29] is the confidence of [symbol image 30].

10. The device of claim 9, wherein the character width is any one of: the width of the character itself; the sum of the widths of the gaps on both sides of the character and the width of the character itself; the sum of half the widths of the gaps on both sides of the character and the width of the character itself; the sum of the width of the character itself and the width of the gap on its right side; and the sum of the width of the character itself and the width of the gap on its left side.