JP2017228297A - Text detection method and apparatus

Info

Publication number: JP2017228297A
Application number: JP2017122474A
Authority: JP
Inventors: マーウェンフォア; Wenhua Ma
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-06-23
Filing date: 2017-06-22
Publication date: 2017-12-28
Anticipated expiration: 2037-06-22
Also published as: JP6377214B2; CN107545261A

Abstract

PROBLEM TO BE SOLVED: To provide a text detection method and apparatus for finding text regions in natural scene images, which is a difficult task, and also to provide a text information extraction method and system. SOLUTION: A method for detecting a text region in an image includes: generating components from an input image; grouping the components to form component groups; classifying the component groups into text groups and non-text groups using features obtained from the distribution of component connections; and generating text regions from the text groups. The method thereby improves precision and recall at a time cost similar to that of the prior art. SELECTED DRAWING: Figure 2

Description

The present invention relates primarily, but not exclusively, to computer vision, image processing, and image understanding, and more particularly to a text detection method and apparatus.

Text in natural scene images and video frames carries important information for understanding and retrieving visual content. Detecting text in images, especially in natural images and video frames, is critical for many computer vision applications, such as computerized aids for visually impaired people or foreigners, automatic retrieval of images and videos, and robot navigation in urban environments.

Nevertheless, text detection in natural scenes is a difficult subject. The main challenges lie in the variety of text, which differs in font, size, skew angle, distortion, and so on. Environmental factors such as non-uniform illumination and reflections, poor lighting conditions, and complex backgrounds add further complexity.

In the related literature, text detection methods for detecting text regions in natural scenes mainly follow three steps: generating individual components from an image; grouping the components according to certain rules to generate component groups; and then verifying the component groups to remove non-text groups and recovering text regions (e.g., text lines, words) from the remaining text groups.

There are two main reasons why component grouping usually requires further verification. First, there are noise component groups made up of non-text components that happen to have a perceptual arrangement similar to that of a text group. For example, non-text components that are spatially close and similar in appearance may also be grouped together and retained. Second, because multi-line or multi-directional text is common in natural scenes, grouping text components correctly is very important for text region detection performance. During the component grouping step, multiple hypotheses are retained because there is too little evidence to determine the layout pattern. Based on this analysis, some text detection methods in published documents further include a group verification step, in which component groups are analyzed and classified as text groups or non-text groups. Text regions (e.g., text lines, words) are then recovered using only the text groups, while the non-text groups are removed.

For example, Chinese patent application No. 103077389 and patent application No. 10418274429 both disclose how to verify component groups based on group-level features and a classifier. Group-level features typically describe a group from two perspectives: regularity and character likelihood. The former includes the variance of the components belonging to the group in terms of size, color, gap, and stroke width, as well as the spatial arrangement of the components in the group. The character likelihood of the components in a group is usually measured by a character classifier, and the values are then aggregated within the group. These group-level features can be used as the input feature vector of a text classifier or as cascaded rules. A text confidence value is calculated for each group based on the features, and groups with high text confidence are retained. However, performance depends on the features and on the training samples used for the classifier. It is difficult to reject non-text groups with high regularity, and it is also difficult to accept text groups that differ from the training samples.

As a further example, U.S. Pat. Nos. 8,320,674 and 6,563,949 both disclose how to verify component groups based on recognition results. The component groups are recognized by an OCR engine, and groups with low recognition confidence are rejected. For complex layouts such as multi-line or multi-directional text, groups that satisfy a language model are retained. However, performance depends heavily on the recognition engine and the language model. In addition, recognizing every component in a group is time-consuming, especially when there are many components.

In fact, if the recognition result is regarded as one special feature of a group, the two types of prior art can be unified. A shortcoming common to both is that each group is evaluated separately; that is, the global information of the image is ignored.

References
The following references are referred to in the detailed description below.

Non-Patent Document 1: L. Neumann and J. Matas, "On combining multiple segmentations in scene text recognition", International Conference on Document Analysis and Recognition (ICDAR), pp. 523-527, 2013.
Non-Patent Document 2: Xu-cheng Yin, Xuwang Y., Kaizhu H., Hongwei Hao, "Robust text detection in natural scene images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 36, No. 5, 2014.
Non-Patent Document 3: Boris Epshtein, Eyal Ofek, Yonatan Wexler, "Detecting text in natural scenes with stroke width transform", Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, pp. 2963-2970, 2010.
Non-Patent Document 4: J. Matas, O. Chum, M. Urban and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions", Proc. of British Machine Vision Conference, pp. 384-396, 2002.
Non-Patent Document 5: Chang C. C., Lin C. J., "LIBSVM: A library for support vector machines", ACM Transactions on Intelligent Systems and Technology (TIST), 2003, 2(3): 389-396.

Terminology
The following terms appear frequently in this document and are defined in the detailed description below.
A component refers to a set of pixels in an image that have similar color, stroke width, or grayscale and are spatially connected.
A text component refers to a basic element of a character.
A component group refers to a set of components that have a similar appearance and are arranged in a straight line.
A component connection refers to a component set that includes at least two adjacent components in one component group.
A text group refers to a component group composed of text components.
A text region refers to the bounding box or quadrilateral of a text group and is the output of text detection.
A global prime feature refers to a common feature shared by most component connections of the text groups in an image. Usually it is selected context information, for example, a direction of approximately 90 degrees.

Accordingly, this disclosure proposes a novel text detection method and apparatus for improving text detection performance in images, particularly natural scene images. According to one aspect of the present invention, there is provided a method including: a component generation step of generating components from an input image; a component grouping step of grouping components that satisfy a similarity requirement to form component groups; a component connection extraction step of extracting component connections, each including at least two adjacent components in one component group; a feature acquisition step of acquiring a feature of all the component connections; a component group classification step of classifying the component groups into text groups and non-text groups based on the feature acquired in the feature acquisition step; and a text region generation step of generating text regions based on the text groups.

The main novelty of the present invention lies in component group classification. A global prime feature of the text in an image is extracted and used in component group classification. Global information, used alone or in combination with group-level features, is expected to improve the accuracy of text detection. The global prime feature is selected automatically from several features and can therefore adapt to different scenes.

The present invention is used to find text regions in natural scene images. It takes an original image file as input and produces a set of rectangles (bounding boxes of text groups) as output. Compared with the prior art, the present invention can improve both precision and recall at a comparable time cost.

FIG. 1 is a block diagram showing the hardware configuration of a computer system that executes an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a text detection apparatus.
FIG. 3 is a flowchart showing a text detection method performed by the text detection apparatus.
FIG. 4 is a flowchart showing a method of classifying component groups according to an embodiment of the present invention.
FIGS. 5A to 5C show an illustrative example of the generation of candidate text components according to an embodiment of the present invention.
FIG. 6 shows an illustrative example of a component grouping result according to an embodiment of the present invention.
FIG. 7 is a flowchart showing a method of acquiring a global prime feature according to an embodiment of the present invention.
FIG. 8 is a flowchart showing a method of acquiring the global distribution of component connection directions according to an embodiment of the present invention.
FIG. 9 shows an illustrative example of the global distribution of component connection directions.
FIG. 10 shows an illustrative example of the acquisition of a global prime feature.
FIG. 11 is a flowchart showing a method of selecting a feature in global prime feature extraction.
FIG. 12 is a flowchart showing a method of acquiring a global prime feature based on a predetermined feature.
FIGS. 13A and 13B show an illustrative example of the generation of text regions according to an embodiment of the present invention.
FIG. 14 is a flowchart showing a text information extraction method according to an embodiment of the present invention.
FIG. 15 is a block diagram showing a text information extraction system according to an embodiment of the present invention.

With reference to the drawings listed above, this section describes specific embodiments and their detailed construction and operation. Note that the embodiments described below are for illustration only and are not intended to be limiting. The embodiments therefore do not limit the scope of the present invention and can be modified in various ways within that scope. Those skilled in the art will recognize, in light of the teachings herein, that there are many equivalents to the exemplary embodiments described here.

FIG. 1 is a block diagram showing the hardware configuration of a computer system that executes an embodiment of the present invention.

As shown in FIG. 1, the system includes at least a computer 100 that includes a CPU 101, a RAM 102, a ROM 103, a system bus 104, an input device 105, an output device 106, and a drive 107. For example, the computer 100 may be an image recognition device. Note that the computer 100 may include one or more computers, and that each of a plurality of computers may individually implement the corresponding functions of the computer 100.

The CPU 101 executes overall processing according to a program held in the RAM 102 or the ROM 103. The RAM 102 is used as a temporary storage area when the CPU 101 executes various processes, such as those of the embodiments of the present invention.

The input device 105 allows a user to issue various instructions to the computer 100, and includes an imaging device (e.g., a scanner or a digital camera), a user input interface, and a network interface.

The output device 106 allows the user to output results such as those of the text detection of the present invention, and includes an output peripheral interface, a display device (e.g., a monitor, a CRT, a liquid crystal display, or a graphics controller), and a printer.

The drive 107 is used to drive a storage medium such as a hard disk, a memory card, or an optical disc (e.g., a CD-ROM or a DVD-ROM). For example, image data or a program for performing the text detection processing is held on the storage medium and driven by the drive 107.

The system bus 104 connects the CPU 101, the RAM 102, the ROM 103, the input device 105, the output device 106, and the drive 107. Data is communicated over the system bus 104. As used herein, the term "connected" means logically or physically connected, either directly or indirectly through one or more intermediaries.

In general, the input to the text detection of the present invention may be any type of image. For example, the image can be obtained by an imaging device such as a digital camera, a digital video camera, a sensor, or a scanning device (e.g., a scanner or a multifunction device).

The system shown in FIG. 1 is merely an example and is in no way intended to limit the present invention, its applications, or its uses. For example, when a program for performing the text detection processing is activated, the CPU 101 acquires an input image from the input device 105, extracts components, verifies the components, and generates text regions by executing all the steps disclosed in the present invention, such as the steps described in FIGS. 3 to 4, 7 to 8, 11 to 12, and 14. The CPU 101 then sends the result to the output device 106 through the system bus 104. The result may also be stored in the RAM 102, or sent to a remote computer via the network interface for use by other applications.

Furthermore, each unit, device, component, and/or assembly of the apparatus of the present invention, such as the apparatuses shown in FIGS. 2 and 15, can be implemented in software, hardware, firmware, or any combination thereof, and is configured to perform text detection.

FIG. 2 is a block diagram showing the configuration of the text detection apparatus. FIG. 3 is a flowchart showing the text detection method performed by the text detection apparatus shown in FIG. 2. The CPU 101 executes the method of the present invention using a program and image data held in the RAM 102 or the ROM 103.

As shown in FIG. 2, the text detection apparatus 200 includes an image input unit 201, a component generation unit 202, a component grouping unit 203, a component group classification unit 204, a text region generation unit 205, and a text region output unit 206.

The image input unit 201 is configured to acquire a natural scene image captured by an imaging device 207, or a natural scene image held in a storage device (e.g., a hard disk) for the text detection apparatus 200. The acquired image is treated as the input image; an example is shown in FIG. 5A.

The component generation unit 202 is configured to generate a set of candidate text components from the input image, as described in step S301 of FIG. 3.

In step S301, the component generation unit 202 generates a set of candidate text components. A component is usually a set of pixels that have similar color (Non-Patent Document 1), similar grayscale (Non-Patent Document 2), or similar stroke width (Non-Patent Document 3) and are spatially connected. Several methods may be used to generate components, such as color clustering, adaptive binarization, and morphological processing. According to an exemplary embodiment of the present invention, the components are generated from a grayscale image based on Maximally Stable Extremal Regions (MSER) (Non-Patent Document 4). As shown in FIG. 5B, the components are labeled as dark gray rectangles; components 501 and 502 are examples.

For better results, component filtering is applied after component generation to remove components that are clearly non-text. Features commonly used in component filtering include the component size, the component aspect ratio, the component density (the proportion of its bounding box occupied by the component's pixels), statistical features of the component's stroke width, and texture features extracted from the component region (e.g., wavelet, Gabor, LBP). These features may be used as a cascade of filters or as input to a trained classifier. According to an exemplary embodiment of the present invention, a support vector machine (hereinafter "SVM") classifier (Non-Patent Document 5) is used to distinguish text components from non-text components. As shown in FIG. 5C, some non-text components are removed by component filtering. For example, components 501 and 502 are removed because they are non-text components.
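The cascaded-filter variant of component filtering can be sketched as follows. This is a minimal illustration rather than the embodiment itself, which uses an SVM classifier: the `Component` fields, the helper names, and every threshold value are hypothetical and would need tuning on real data. Only the size, aspect ratio, and density features named in the text are used.

```python
from dataclasses import dataclass


@dataclass
class Component:
    width: int        # bounding-box width in pixels
    height: int       # bounding-box height in pixels
    pixel_count: int  # number of pixels that belong to the component


def aspect_ratio(c: Component) -> float:
    return c.width / c.height


def density(c: Component) -> float:
    # Share of the bounding box occupied by the component's own pixels.
    return c.pixel_count / (c.width * c.height)


# Hypothetical thresholds, chosen only for illustration.
MIN_HEIGHT, MAX_HEIGHT = 8, 300
MIN_ASPECT, MAX_ASPECT = 0.1, 10.0
MIN_DENSITY, MAX_DENSITY = 0.1, 0.95


def passes_filter(c: Component) -> bool:
    # Cheap checks run in sequence; the first failing check rejects the component.
    if not MIN_HEIGHT <= c.height <= MAX_HEIGHT:
        return False
    if not MIN_ASPECT <= aspect_ratio(c) <= MAX_ASPECT:
        return False
    return MIN_DENSITY <= density(c) <= MAX_DENSITY


def filter_components(components):
    return [c for c in components if passes_filter(c)]
```

In this sketch a long thin region such as `Component(400, 10, 4000)` is rejected by the aspect-ratio check, while a letter-like region such as `Component(20, 30, 240)` passes all three checks.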

The component grouping unit 203 is configured to group the components, as described in step S302 of FIG. 3.

In step S302, the component grouping unit 203 builds component groups by connecting candidate text components that satisfy a similarity requirement. The feature values that describe the similarity between two components fall into distance features, difference features, and ratio features; examples are the spatial distance, grayscale difference, color difference, boundary contrast difference, bounding box height ratio, width ratio, and stroke width ratio. The value of the distance feature is the normalized Euclidean distance between the centers of the components. Every difference feature is calculated by dividing the absolute value of the difference by the mean of the two values. Every ratio feature is calculated by dividing the larger value of the given property by the smaller value. Components that do not belong to any group are judged to be noise components and are removed.
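The three kinds of similarity feature described above can be written out as a short sketch. The formulas for the difference and ratio features follow the text; the normalization of the distance by the mean component height, the `Component` fields, and the threshold table `MAX_VALUES` are assumptions made for illustration, since the text does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    cx: float            # x coordinate of the component center
    cy: float            # y coordinate of the component center
    height: float        # bounding-box height
    gray: float          # mean grayscale value
    stroke_width: float  # mean stroke width


def distance_feature(a: Component, b: Component) -> float:
    # Euclidean distance between centers; dividing by the mean height
    # is an assumed normalization, not taken from the source.
    d = math.hypot(a.cx - b.cx, a.cy - b.cy)
    return d / ((a.height + b.height) / 2.0)


def difference_feature(x: float, y: float) -> float:
    # Absolute difference divided by the mean of the two values.
    mean = (x + y) / 2.0
    return abs(x - y) / mean if mean else 0.0


def ratio_feature(x: float, y: float) -> float:
    # Larger value divided by smaller value; assumes positive inputs.
    return max(x, y) / min(x, y)


def similarity_features(a: Component, b: Component) -> dict:
    return {
        "distance": distance_feature(a, b),
        "gray_diff": difference_feature(a.gray, b.gray),
        "height_ratio": ratio_feature(a.height, b.height),
        "stroke_ratio": ratio_feature(a.stroke_width, b.stroke_width),
    }


# Hypothetical upper bounds that a pair must respect to be connected.
MAX_VALUES = {"distance": 2.0, "gray_diff": 0.3,
              "height_ratio": 2.0, "stroke_ratio": 2.0}


def is_similar(a: Component, b: Component) -> bool:
    feats = similarity_features(a, b)
    return all(feats[name] <= limit for name, limit in MAX_VALUES.items())
```

With these definitions, identical components yield a distance of 0, differences of 0, and ratios of 1, so smaller values always mean more similar components and a single upper bound per feature is enough for the grouping test.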

FIG. 6 shows an illustrative example of a component grouping result according to an embodiment of the present invention. The text components shown in FIG. 6 are connected by lines and can be viewed as forming text component groups. However, besides the expected true text groups, there are groups that cut across true text groups, such as the groups in FIG. 6 made up of characters on different text lines. There are also still some groups made up of non-text components. For example, the non-text components of the windows shown in FIG. 6 are connected by lines and form a non-text component group. As another example, the non-text components of a window and a traffic light shown in FIG. 6 are connected by lines and form another non-text component group. Component group classification is therefore necessary.

The component group classification unit 204 is configured to classify the component groups based on a feature obtained from the distribution of the values of at least one candidate feature over all component connections. According to an exemplary embodiment of the present invention, this feature may be the global prime feature described in step S303 of FIG. 3. As used herein, a global prime feature refers to a common feature shared by most component connections of the text groups. Usually it is selected context information, for example, a direction of approximately 90 degrees.

In step S303, the component group classification unit 204 first acquires the global prime feature and then uses it to classify the component groups into text groups and non-text groups, as described in FIG. 4.

Reference is now made to FIG. 4, a flowchart showing the method of classifying component groups according to an embodiment of the present invention, which is executed in step S303.

In step S401, the component group classification unit 204 extracts component connections from the component groups. In this step, a component group is viewed as a set of component connections; according to an exemplary embodiment of the present invention, every two adjacent components in one component group are extracted as one component connection. As used herein, a component connection refers to a component set that includes at least two adjacent components in one component group.
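Extracting two-component connections from a group can be sketched in a few lines. The sketch assumes that each group is stored as a list of component identifiers already ordered along the group's line, so that neighbors in the list are the adjacent components; that representation and the function names are illustrative choices, not taken from the source.

```python
def extract_connections(group):
    # Pair each component with its successor: [a, b, c] -> [(a, b), (b, c)].
    return list(zip(group, group[1:]))


def extract_all_connections(groups):
    # Tag each connection with the index of the group it came from, so that
    # per-connection results can later be aggregated back per group.
    return [(group_id, pair)
            for group_id, group in enumerate(groups)
            for pair in extract_connections(group)]
```

A group of n components yields n - 1 connections under this scheme, and a single-component list yields none.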

In step S402, the component group classification unit 204 calculates a text confidence value for each component connection. The text confidence value of each component connection is calculated from a set of features extracted from the component connection, such as color similarity (e.g., grayscale difference, color difference), size similarity, direction, spatial distance, boundary contrast difference, bounding box height ratio, width ratio, and stroke width ratio.

According to an exemplary embodiment of the present invention, the text confidence value of a component connection may be obtained from a pre-trained classifier for text and non-text components. The pre-trained classifier is trained on positive and negative samples. A positive sample is a component connection consisting of two adjacent components in one component group. A negative sample is a component connection consisting of one component inside a group and one component outside that group (e.g., a component in another group or a noise component). A binary classifier (e.g., an SVM) is used to classify the component connections, and its output score is then converted into a text confidence value.
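The construction of positive and negative training samples described above can be sketched as follows. The sketch assumes ground-truth text groups given as ordered lists of component identifiers plus a list of noise components; pairing every in-group component with every outside component is one possible reading of the text, chosen here for simplicity, and the function name is hypothetical.

```python
def build_training_samples(groups, noise):
    # Positive samples: two adjacent components inside the same group.
    positives = [pair for group in groups for pair in zip(group, group[1:])]

    # Negative samples: one component inside a group, one outside it
    # (a member of another group or a noise component).
    negatives = []
    for i, group in enumerate(groups):
        outside = [c for j, other in enumerate(groups) if j != i for c in other]
        outside += list(noise)
        negatives.extend((inside, out) for inside in group for out in outside)
    return positives, negatives
```

In practice negatives produced this way greatly outnumber positives, so some form of subsampling would probably be applied before training the binary classifier; the source does not say how the samples are balanced.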

ステップＳ４０３では、コンポーネントグループ分類部２０４は全てのコンポーネント接続のグローバルプライム特徴を取得する。コンポーネント接続およびそのテキスト信頼値に基づいて、グローバルプライム特徴は、図７に示されるような特徴の集合から自動で選択された特徴または図１２に示されるような所定の特徴にしたがい取得される。 In step S403, the component group classification unit 204 acquires the global prime feature of all component connections. Based on the component connections and their text confidence values, the global prime feature is obtained according to either a feature automatically selected from a set of features, as shown in FIG. 7, or a predetermined feature, as shown in FIG. 12.

図７を参照すると、そこに示されるのは、ステップＳ４０３で実行されるグローバルプライム特徴を取得するための方法を示すフローチャートである。 Referring to FIG. 7, shown therein is a flowchart illustrating a method for obtaining a global prime feature performed in step S403.

ステップＳ７０１では、コンポーネント接続と、そのテキスト信頼値と、コンポーネント接続の候補特徴の集合と、を取得した後、コンポーネントグループ分類部２０４は候補特徴の集合における特徴ｉの値の分布を取得する。本実施の形態では、特徴ｉの値の分布は、グローバル分布と称される。候補特徴ｉは、コンポーネント接続の方向、コンポーネントの平均前景色、コンポーネントの平均背景色、平均境界コントラストまたはコンポーネント接続内の二つのコンポーネント間の距離などを含む。本明細書において、例えば図８において、コンポーネント接続の方向を特徴ｉとすることで方向のグローバル分布を取得することが説明される。 In step S701, after acquiring the component connection, its text confidence value, and the set of candidate features for component connection, the component group classification unit 204 acquires the distribution of the value of the feature i in the set of candidate features. In the present embodiment, the distribution of the value of feature i is referred to as a global distribution. Candidate features i include the direction of component connection, the average foreground color of the component, the average background color of the component, the average boundary contrast, or the distance between two components in the component connection. In the present specification, for example, in FIG. 8, it is described that a global distribution of directions is acquired by setting the direction of component connection as a feature i.

図８を参照すると、そこに示されるのは、ステップＳ７０１で実行されるコンポーネント接続の方向のグローバル分布を取得するための方法を示すフローチャートである。 Referring to FIG. 8, shown therein is a flowchart illustrating a method for obtaining a global distribution of component connection directions performed in step S701.

ステップＳ８０１において、コンポーネントグループ分類部２０４はコンポーネント接続から方向特徴を抽出することで、コンポーネント接続の方向を取得する。 In step S801, the component group classification unit 204 acquires the direction of component connection by extracting the direction feature from the component connection.

ステップＳ８０２において、コンポーネントグループ分類部２０４は、ひとつの画像における全てのコンポーネント接続に基づいて、方向特徴のテキスト信頼度重み付けヒストグラムを取得する。本発明の例示的な実施の形態によると、各方向において、ヒストグラムの値（ｙ軸）は、コンポーネント接続の特徴ｉの各値であってステップＳ４０２において算出されたそれらのテキスト信頼値によって重み付けされた特徴ｉの各値の頻度を表す。各方向におけるヒストグラム値は例えば図９に示され、方向の範囲は［０、１８０］度である。 In step S802, the component group classification unit 204 acquires a text-confidence-weighted histogram of the direction feature based on all component connections in one image. According to an exemplary embodiment of the present invention, for each direction, the histogram value (y-axis) represents the frequency of that value of feature i over the component connections, weighted by their text confidence values calculated in step S402. The histogram values for each direction are shown, for example, in FIG. 9, and the range of directions is [0, 180] degrees.

ステップＳ８０３において、コンポーネントグループ分類部２０４は図９に示されるようにスライド窓を用いて最上位のヒストグラムビンを見つける。分布はさらに幅「Ｄ」を伴う「Ｎ」個のビンへと定量化される。本発明の例示的な実施の形態によると、定量化誤差のグローバルプライム特徴への影響を最小化するために、幅「Ｄ」のスライド窓が用いられる。スライド中、窓の中のヒストグラム値（ｙ軸）が合計されて記録され、図９に示されるように最高値を有する窓が最上位のヒストグラムビンとされ、他のビンはそれに応じて決定される。 In step S803, the component group classification unit 204 finds the top histogram bin using a sliding window, as shown in FIG. 9. The distribution is further quantized into “N” bins of width “D”. According to an exemplary embodiment of the present invention, a sliding window of width “D” is used to minimize the influence of quantization error on the global prime feature. While the window slides, the histogram values (y-axis) inside the window are summed and recorded; the window with the highest sum is taken as the top histogram bin, as shown in FIG. 9, and the other bins are determined accordingly.

ステップＳ８０４において、コンポーネントグループ分類部２０４は最上位のヒストグラムビンに基づいてヒストグラムを定量化する。ヒストグラムは正規化され、その結果、分布の全てのビンのヒストグラム値（ｙ軸）の合計は１となる。ひとつの画像のグループ内のコンポーネント接続の特徴ｉの全ての値はひとつのグローバル分布に寄与する。 In step S804, the component group classification unit 204 quantizes the histogram based on the top histogram bin. The histogram is normalized so that the histogram values (y-axis) of all bins in the distribution sum to 1. All values of feature i of the component connections in the groups of one image contribute to one global distribution.
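
Steps S802 to S804 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fine histogram uses one cell per degree, the value range is treated as circular (so the window wraps around), and `bin_width=30` is a hypothetical choice of “D” that must divide the range evenly.

```python
def global_distribution(values, weights, bin_width=30, value_range=180):
    """Return (bin_starts, hist): bins aligned to the best sliding window, hist normalized to sum 1."""
    # S802: confidence-weighted fine histogram, one cell per unit of the feature.
    fine = [0.0] * value_range
    for v, w in zip(values, weights):
        fine[int(v) % value_range] += w

    def window_sum(start):
        return sum(fine[(start + k) % value_range] for k in range(bin_width))

    # S803: slide a window of width D and keep the first start with the largest sum.
    best_start, best_sum = 0, -1.0
    for start in range(value_range):
        s = window_sum(start)
        if s > best_sum:
            best_start, best_sum = start, s

    # S804: quantize into N bins aligned to the top window, then normalize.
    n_bins = value_range // bin_width
    starts = [(best_start + b * bin_width) % value_range for b in range(n_bins)]
    hist = [window_sum(s) for s in starts]
    total = sum(hist)
    if total > 0:
        hist = [h / total for h in hist]
    return starts, hist
```

Aligning the bin grid to the best window keeps a tight cluster of directions (for example 85 to 95 degrees) inside a single bin instead of splitting it across a fixed bin boundary at 90 degrees.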

ステップＳ７０２では、コンポーネントグループ分類部２０４はグローバル分布に基づいて、グローバルプライム特徴を選択する。図１０に示されるように、例えば、本明細書において、グローバル分布をＨ＝｛ｈ_ｉ、ｉ＝０、１、…Ｎ｝と表記する。ここで、
「Ｎ」は、分布内のビンの数である。
ｈ_ｔｏｐは分布の最上位のヒストグラムビンである。
ｈ_ｓｅｃは分布の二番目のヒストグラムビンである。
ｆ_ｔｏｐはｘ軸での最上位ビンの中心値である。
「ＣＬ」はコンポーネントの文字尤度の略である。単一のコンポーネントのＣＬは、ＳＶＭ分類器などの文字分類器によって得られる。一方、ビンの「ＣＬ」は、ビンのコンポーネント接続に含まれる全てのコンポーネントの平均スコアとして定義される。
In step S702, the component group classification unit 204 selects a global prime feature based on the global distribution. As shown in FIG. 10, for example, the global distribution is denoted herein as H = {h_i, i = 0, 1, …, N}, where:
“N” is the number of bins in the distribution.
h_top is the top histogram bin of the distribution.
h_sec is the second-highest histogram bin of the distribution.
f_top is the center value of the top bin on the x-axis.
“CL” is an abbreviation for the character likelihood of a component. The CL of a single component is obtained by a character classifier such as an SVM classifier. The “CL” of a bin, on the other hand, is defined as the average score of all components included in the component connections of the bin.

コンポーネント接続の分布の濃度が所定のしきい値よりも大きい場合に、コンポーネント接続の分布から得られる特徴が選択される。本発明の例示的な実施の形態によると、コンポーネント接続のグローバル分布が以下の条件のうちのひとつを満たす場合に、グローバルプライム特徴が選択される。
１．分布の最上位のヒストグラムビンｈ_ｔｏｐが所定の第１しきい値よりも大きい。
２．分布の最上位のヒストグラムビンにおけるコンポーネント接続の平均文字尤度ＣＬ（ｈ_ｔｏｐ）が他のどのビンの平均文字尤度よりも大きい。
３．分布の最上位のヒストグラムビンと分布の二番目のヒストグラムビンとの値の比ｈ_ｔｏｐ／ｈ_ｓｅｃが所定の第２しきい値よりも大きい。
A feature obtained from the distribution of component connections is selected when the concentration of that distribution is greater than a predetermined threshold. According to an exemplary embodiment of the present invention, a global prime feature is selected when the global distribution of the component connections satisfies one of the following conditions:
1. The value of the top histogram bin h_top of the distribution is greater than a predetermined first threshold.
2. The average character likelihood CL(h_top) of the component connections in the top histogram bin of the distribution is greater than that of any other bin.
3. The ratio h_top / h_sec between the values of the top histogram bin and the second-highest histogram bin of the distribution is greater than a predetermined second threshold.

図１０は、本発明の実施の形態に係るグローバルプライム特徴を取得することの例を示す。分布が上述の要件のうちのひとつを満たす場合、ｆ_ｔｏｐはグローバルプライム特徴である。 FIG. 10 shows an example of acquiring a global prime feature according to an embodiment of the present invention. If the distribution meets one of the above requirements, f _top is a global prime feature.
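
The three selection conditions can be checked with a few lines of code. The sketch below is illustrative only: the threshold values `t1` and `t2` are hypothetical placeholders (the specification only says "predetermined"), and `bin_cl` is assumed to hold the average character likelihood CL of each bin.

```python
def has_global_prime(hist, bin_cl, t1=0.5, t2=2.0):
    """Return (satisfied, top_index) for a normalized histogram with at least two bins."""
    top = max(range(len(hist)), key=hist.__getitem__)
    ranked = sorted(hist, reverse=True)
    h_top, h_sec = ranked[0], ranked[1]

    # Condition 1: the top bin value exceeds the first threshold.
    cond1 = h_top > t1
    # Condition 2: CL of the top bin exceeds CL of every other bin.
    cond2 = all(bin_cl[top] > cl for i, cl in enumerate(bin_cl) if i != top)
    # Condition 3: the ratio h_top / h_sec exceeds the second threshold
    # (an empty second bin is treated as an unbounded ratio).
    cond3 = h_sec == 0 or h_top / h_sec > t2

    # Any one condition is sufficient.
    return (cond1 or cond2 or cond3), top
```

When the check succeeds, the center of bin `top` on the x-axis plays the role of f_top.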

ステップＳ７０３では、コンポーネントグループ分類部２０４は全ての特徴が処理されたか否かを判定する。他の特徴が残っている場合、処理はステップＳ７０１に戻り、候補特徴の集合の中の他の特徴のグローバル分布を取得する。そうでなければ、処理はステップＳ７０４に進んでもよい。 In step S703, the component group classification unit 204 determines whether all the features have been processed. If other features remain, the process returns to step S701 to obtain a global distribution of other features in the candidate feature set. Otherwise, the process may proceed to step S704.

ステップＳ７０４では、コンポーネントグループ分類部２０４は、最も重要なグローバルプライム特徴を伴う特徴を選択する。異なる特徴のグローバルプライム特徴が比較され、最も重要なグローバルプライム特徴を伴う特徴が選択される。 In step S704, the component group classification unit 204 selects the feature with the most important global prime feature. The global prime features of the different features are compared and the feature with the most important global prime feature is selected.

本発明の例示的な実施の形態によると、図１１は、グローバルプライム特徴抽出において特徴を選択する方法であってステップＳ７０４で実行される方法を示すフローチャートである。 According to an exemplary embodiment of the present invention, FIG. 11 is a flowchart illustrating a method for selecting features in global prime feature extraction and performed in step S704.

ステップＳ１１０１において、異なる特徴のグローバル分布は変わりうるので、グローバルプライム特徴は特徴に関連するであることは注意されるべきである。本発明の例示的な実施の形態によると、グローバルプライム特徴を取得するために用いられる特徴は以下の特徴のうちの少なくともひとつであってもよい。方向、コンポーネントの色（例えば、平均前景色、平均背景色）、コンポーネント接続内の二つのコンポーネントの間の距離、平均境界コントラスト。 In step S1101, it should be noted that the global prime feature is feature-dependent, because the global distributions of different features can differ. According to an exemplary embodiment of the present invention, the feature used to obtain the global prime feature may be at least one of the following: direction, component color (e.g., average foreground color, average background color), distance between the two components in a component connection, and average boundary contrast.

ステップＳ１１０２では、コンポーネントグループ分類部２０４は異なる特徴のグローバル分布に基づいて、グローバルプライム特徴を選択する。本明細書において、ステップＳ１１０３で示されるようにグローバルプライム特徴が存在する場合、特徴を選択するための方法はｈ_ｔｏｐ／ｈ_ｓｅｃの値に基づく。比が大きいほど、グローバルプライム特徴はより重要となる。したがって、最も大きな比ｈ_ｔｏｐ／ｈ_ｓｅｃを伴う特徴がグローバルプライム特徴として選択されるであろう。すなわち、コンポーネントグループ分類部２０４は、ステップＳ１１０４に示されるように、ｈ_ｔｏｐ／ｈ_ｓｅｃの最大値を伴うグローバル分布を選択する。 In step S1102, the component group classification unit 204 selects a global prime feature based on the global distribution of different features. In this specification, if there is a global prime feature as shown in step S1103, the method for selecting the feature is based on the value of h _top / h _sec . The larger the ratio, the more important the global prime characteristics. Therefore, the feature with the largest ratio h _top / h _sec will be selected as the global prime feature. That is, the component group classification unit 204 selects a global distribution with the maximum value of h _top / h _sec as shown in step S1104.
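
The selection rule of steps S1102 to S1104 amounts to taking the candidate feature whose distribution has the largest ratio h_top / h_sec. A minimal sketch, assuming that `distributions` maps hypothetical feature names to normalized histograms and contains only features for which a global prime feature exists:

```python
def select_feature(distributions):
    """Return the name of the feature with the largest h_top / h_sec ratio, or None if empty."""
    def ratio(hist):
        ranked = sorted(hist, reverse=True)
        # An empty second bin is treated as an unbounded ratio.
        return float("inf") if ranked[1] == 0 else ranked[0] / ranked[1]

    if not distributions:
        return None
    return max(distributions, key=lambda name: ratio(distributions[name]))
```

For example, a direction histogram of `[0.7, 0.2, 0.1]` (ratio 3.5) would be preferred over a grayscale histogram of `[0.5, 0.4, 0.1]` (ratio 1.25).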

候補特徴をひとつずつ処理した後、異なる特徴にしたがうグローバルプライム特徴が比較され、ステップＳ７０４に示されるように最も重要なグローバルプライム特徴を伴う特徴が選択される。したがって、ステップＳ４０３の出力は選択された特徴にしたがうグローバルプライム特徴である。 After processing the candidate features one by one, the global prime features according to the different features are compared and the feature with the most important global prime feature is selected as shown in step S704. Therefore, the output of step S403 is a global prime feature according to the selected feature.

図１２は、所定の特徴に基づいてグローバルプライム特徴を取得する別の方法を示すフローチャートである。 FIG. 12 is a flowchart illustrating another method for obtaining a global prime feature based on a predetermined feature.

図１２を参照すると、例えばナンバープレート認識において所定の特徴を取得した後、テキストグループは基本的に水平方向に沿っており、したがってコンポーネント接続の方向が所定の特徴として用いられる。他の例で、道路標識認識では、多くの場合、テキストおよび周囲の色は一様であり、したがってそれが所定の特徴として用いられる。所定の特徴がコンポーネント接続の方向である場合、コンポーネントグループ分類部２０４は、ステップＳ１２０１およびステップＳ１２０２で示されるように、ステップＳ７０１およびステップＳ７０２で説明された方法と同じ方法を用いて、特徴のグローバル分布を取得し、グローバル分布に基づいてグローバルプライム特徴を選択する。 Referring to FIG. 12, a predetermined feature is obtained first. For example, in license plate recognition, text groups basically run along the horizontal direction, so the direction of component connection is used as the predetermined feature. As another example, in road sign recognition, the colors of the text and its surroundings are often uniform, so color is used as the predetermined feature. When the predetermined feature is the direction of component connection, the component group classification unit 204 acquires the global distribution of the feature and selects the global prime feature based on that distribution, as shown in steps S1201 and S1202, using the same methods as described for steps S701 and S702.

したがって、ステップＳ４０３の出力は所定の特徴にしたがうグローバルプライム特徴である。 Therefore, the output of step S403 is a global prime feature according to a predetermined feature.

特定の画像について、グローバルプライム特徴の例は以下の通りである。テキストグループ内の大抵のコンポーネント接続は同様な方向に沿ったものであり、その方向は［−１５、１５］度に属する。または、テキストグループ内の大抵のコンポーネント接続は同様の色／グレースケールのものであり、黒色値またはグレースケール値は［０、３０］に属する。 Examples of global prime features for a particular image are as follows: Most component connections within a text group are along a similar direction, which belongs to [-15, 15] degrees. Or, most component connections in a text group are of similar color / grayscale, with black or grayscale values belonging to [0, 30].

ステップＳ４０４において、コンポーネントグループ分類部２０４は、コンポーネント接続のテキスト信頼度を調整する。コンポーネント接続のテキスト信頼値は、ステップＳ４０３で抽出されたグローバルプライム特徴に基づいて調整される。グローバルプライム特徴にしたがうコンポーネント接続は増やされ、一方グローバルプライム特徴にしたがわないコンポーネント接続は減らされる。コンポーネント接続のテキスト信頼値を調整することによって調整処理が実行される。 In step S404, the component group classification unit 204 adjusts the text confidence values of the component connections based on the global prime feature extracted in step S403. The text confidence values of component connections that conform to the global prime feature are increased, while those of component connections that do not conform are decreased. The adjustment is carried out on the text confidence values of the component connections themselves.

ステップＳ７０１またはステップＳ１２０１で取得されたグローバル分布は既に正規化されているので、それは所定の特徴の確率分布として扱われ、コンポーネント接続のテキスト信頼値は確率的推論に基づいて調整される。 Since the global distribution obtained in step S701 or step S1201 has already been normalized, it is treated as a probability distribution of a predetermined feature, and the text confidence value of the component connection is adjusted based on the probabilistic reasoning.

コンポーネント接続がグローバルプライム特徴にしたがう確率をｈ_ｔｏｐとすると、コンポーネント接続のテキスト信頼値はグローバルプライム特徴からのその偏差にしたがって調整されるべきである。正規化された偏差は
と表される。ここで、
ｆ_ｃｕｒは現在のコンポーネント接続の特徴値である。
「Ｄ」は特徴分布のビン幅である。
Given that the probability that a component connection conforms to the global prime feature is h_top, the text confidence value of the component connection should be adjusted according to its deviation from the global prime feature. The normalized deviation is expressed as
[equation not reproduced in this text]
where:
f_cur is the feature value of the current component connection.
“D” is the bin width of the feature distribution.

例えば、グローバルプライム特徴が方向についてのものであり、ｆ_ｔｏｐがｘ軸での最上位ビンの中心であって９０度とし、ｆ_ｃｕｒが現在のコンポーネント接続の方向であって１０度とするとき、二つの間の偏差は８０度である。 For example, if the global prime feature is about direction and f _top is the center of the top bin on the x axis and is 90 degrees, and f _cur is the current component connection direction and 10 degrees, The deviation between the two is 80 degrees.
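
The worked example can be put into code under one explicit assumption: that the normalized deviation is the absolute difference between f_cur and f_top divided by the bin width D. The text does not print the formula, so this form is a guess consistent with the surrounding definitions, and the bin width of 30 degrees used below is hypothetical.

```python
def normalized_deviation(f_cur, f_top, bin_width):
    """Assumed form: |f_cur - f_top| / D (not confirmed by the text)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return abs(f_cur - f_top) / bin_width


# Example from the text: f_top = 90 degrees, f_cur = 10 degrees, deviation of 80 degrees.
deviation_in_bins = normalized_deviation(10, 90, 30)
```

Under this assumption a deviation of 80 degrees with 30-degree bins corresponds to roughly 2.67 bin widths.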

コンポーネント接続がグローバルプライム特徴にしたがわない確率を１−ｈ_ｔｏｐとすると、コンポーネント接続のテキスト信頼値は変えられるべきではない。 If the probability that a component connection does not follow the global prime feature is 1-h _top , the text confidence value of the component connection should not be changed.

上述の二つのケースをまとめると、コンポーネント接続のテキスト信頼値は以下の様に調整される。
ここで、
ＴＣ_ｏｒｇは、現在のコンポーネント接続の元のテキスト信頼値であり、ステップＳ４０２においてコンポーネント接続分類器によって提供される。
ＴＣ_ａｄｊは現在のコンポーネント接続の調整されたテキスト信頼値である。
「ｗ」は、上述の二つのケースを統合する調整ファクタである。
βは調整パラメータである。
ｃは補償パラメータである。
To summarize the above two cases, the text confidence value of the component connection is adjusted as follows.
[equation not reproduced in this text]
where:
TC_org is the original text confidence value of the current component connection, provided by the component connection classifier in step S402.
TC_adj is the adjusted text confidence value of the current component connection.
“w” is an adjustment factor that integrates the above two cases.
β is an adjustment parameter.
c is a compensation parameter.

コンポーネントグループ分類において、画像内のコンポーネント接続のグローバルプライム特徴が抽出され、ローカル情報（グループレベル特徴）と組み合わされると見ることができよう。 It can be seen that in component group classification, global prime features of component connections in an image are extracted and combined with local information (group level features).

ステップＳ４０５において、コンポーネントグループ分類部２０４はコンポーネントグループをテキストグループまたは非テキストグループに分類する。コンポーネントグループは、グループのテキスト信頼値に基づいて、テキストグループと非テキストグループとに分類される。非テキストグループは除去され、一方、テキストグループはテキスト領域を復元するために残される。一方で、コンポーネントグループのテキスト信頼値ＴＣ_ｇは、そのなかのコンポーネント接続の平均テキスト信頼値である。ＴＣ_ａｄｊ ^ｉは、グループ内のｉ番目のコンポーネント接続の調整されたテキスト信頼値であり、ステップＳ４０４で取得される。
ここで、Ｍはグループ内のコンポーネント接続の量である。
In step S405, the component group classification unit 204 classifies each component group as a text group or a non-text group based on the text confidence value of the group. Non-text groups are removed, while text groups are retained for restoring text regions. The text confidence value TC_g of a component group is the average text confidence value of the component connections in it:
$$TC_g = \frac{1}{M}\sum_{i=1}^{M} TC_{adj}^{i}$$
where TC_adj^i is the adjusted text confidence value of the i-th component connection in the group, obtained in step S404, and M is the number of component connections in the group.

一方、グループ内のコンポーネントの、サイズ、色、およびストローク幅の点での分散やグループ内のコンポーネントの空間配置などのグループレベル特徴を抽出することで、グループの別のテキスト信頼値を測定し、これはＴＣ_ｆと表される。したがって、グループの最終的なテキスト信頼値はそれら二つの重み付け和として定義される。
ＴＣ＝ωＴＣ_ｇ＋（１−ω）ＴＣ_ｆ
ここで、０≦ω≦１
In addition, another text confidence value of the group, denoted TC_f, is measured by extracting group-level features such as the variance of the components in the group in terms of size, color, and stroke width, and the spatial arrangement of the components in the group. The final text confidence value of the group is therefore defined as the weighted sum of the two:
$$TC = \omega\,TC_g + (1-\omega)\,TC_f$$
where 0 ≤ ω ≤ 1.

ω＝０の場合、グループ分類においてグループレベル特徴のみが用いられる。これは、グループレベル特徴および分類器に基づいてコンポーネントグループをいかに検証するかを開示する従来技術と同じである。 When ω = 0, only group level features are used in group classification. This is the same as the prior art that discloses how to verify component groups based on group level features and classifiers.

ω＝１の場合、グローバルプライム特徴情報のみが用いられ、これは上述の最初の例である。０＜ω＜１の場合、グローバルプライム特徴はグループ分類においてグループレベル特徴と統合され、これは上述の別の例である。 When ω = 1, only global prime feature information is used, which is the first example described above. If 0 <ω <1, the global prime feature is integrated with the group level feature in the group classification, which is another example described above.

所定値よりも高いテキスト信頼度を伴うコンポーネントグループはテキストグループとして判定されて残り、そうでなければ除去される。 A component group whose text confidence value is higher than a predetermined value is determined to be a text group and is retained; otherwise it is removed.
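
The group-level decision described above (average of the adjusted connection confidences, weighted sum with the group-level confidence, then a threshold test) can be sketched as follows. The default values of `omega` and `threshold` are hypothetical; the text only requires 0 ≤ ω ≤ 1 and a predetermined threshold.

```python
def group_confidence(adjusted_tc, tc_f, omega=0.5):
    """TC = omega * TC_g + (1 - omega) * TC_f, with TC_g the mean adjusted confidence."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    if not adjusted_tc:
        raise ValueError("a group needs at least one component connection")
    tc_g = sum(adjusted_tc) / len(adjusted_tc)
    return omega * tc_g + (1.0 - omega) * tc_f


def is_text_group(adjusted_tc, tc_f, omega=0.5, threshold=0.5):
    """A group is kept as a text group when its confidence exceeds the threshold."""
    return group_confidence(adjusted_tc, tc_f, omega) > threshold
```

Setting `omega=0.0` reproduces classification from group-level features alone, and `omega=1.0` uses only the global prime feature information.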

図１３Ａおよび図１３Ｂは、本発明の実施の形態に係るテキスト領域の生成の例を示す。 13A and 13B show an example of generating a text area according to the embodiment of the present invention.

図１３Ａに示される通り、コンポーネントグループ分類の後、ノイズコンポーネントグループ（例えば、図６に示されるような、窓や信号機によって構築される非テキストコンポーネントグループ）および偽コンポーネントグループ（例えば、図６に示されるような、異なるテキストラインの文字によって構成されるテキストコンポーネントグループ）はステップＳ３０３の後に除去される。グローバルプライム特徴によってテキストグループと非テキストグループとの区別が拡大されることを理解することができる。したがって、コンポーネントグループ分類の最終的な結果はよい効果である。 As shown in FIG. 13A, after component group classification, noise component groups (e.g., non-text component groups formed by windows or traffic lights, as shown in FIG. 6) and false component groups (e.g., text component groups composed of characters from different text lines, as shown in FIG. 6) have been removed after step S303. It can be seen that the global prime feature widens the distinction between text groups and non-text groups. The final result of component group classification is therefore good.

テキスト領域生成部２０５は、残ったテキストグループに基づいてテキスト領域を生成するよう構成され、これは図３のステップＳ３０４で説明される。 The text region generation unit 205 is configured to generate a text region based on the remaining text groups, as described for step S304 of FIG. 3.

ステップＳ３０４では、テキスト領域生成部２０５は残ったコンポーネントグループをテキスト領域に変換する。テキスト領域は、通常、グループのコンポーネントの矩形とグループのテキストラインとに基づいて生成される。このステップのひとつの例示的な実装は以下の通りである。 In step S304, the text area generation unit 205 converts the remaining component group into a text area. The text region is usually generated based on the group component rectangle and the group text line. One exemplary implementation of this step is as follows.

第１に、グループ内の全てのコンポーネントの中心の最小二乗回帰によりテキストラインが得られる。
次いで、テキストラインの平行シフトによって上限ラインを決定することで、グループ内のコンポーネントの極上点をカバーする。そして、同様に下限ラインを決める。
最後に、グループ内の最も左のコンポーネントおよび最も右のコンポーネントの矩形により左限ラインおよび右限ラインを決める。
First, a text line is obtained by least-squares regression on the centers of all components in the group.
Next, the upper boundary line is determined by shifting the text line in parallel so that it covers the topmost points of the components in the group. The lower boundary line is determined in the same way.
Finally, the left and right boundary lines are determined from the rectangles of the leftmost and rightmost components in the group.
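
The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: components are given as hypothetical `(x, y, w, h)` boxes in image coordinates (y grows downward), the text line is not vertical, and the boundary lines are represented by their intercepts at x = 0.

```python
def text_region(boxes):
    """Fit a text line through box centers and return the bounding lines of the group."""
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    n = len(centers)
    mean_x = sum(cx for cx, _ in centers) / n
    mean_y = sum(cy for _, cy in centers) / n

    # Step 1: least-squares regression of center y on center x gives the line slope.
    sxx = sum((cx - mean_x) ** 2 for cx, _ in centers)
    sxy = sum((cx - mean_x) * (cy - mean_y) for cx, cy in centers)
    slope = sxy / sxx if sxx else 0.0

    # Step 2: shift the line in parallel until it covers every box corner.
    # The smallest intercept is the upper line, the largest is the lower line.
    offsets = [yy - slope * xx
               for x, y, w, h in boxes
               for xx in (x, x + w)
               for yy in (y, y + h)]

    # Step 3: left and right limits come from the leftmost and rightmost boxes.
    return {
        "slope": slope,
        "upper_intercept": min(offsets),
        "lower_intercept": max(offsets),
        "left": min(x for x, _, _, _ in boxes),
        "right": max(x + w for x, _, w, _ in boxes),
    }
```

For three equally sized boxes on one horizontal row the slope is zero, and the region reduces to the axis-aligned rectangle enclosing them.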

図１３Ｂに示されるように、検出されたテキスト領域は生成されたテキスト領域の例である。 As shown in FIG. 13B, the detected text area is an example of a generated text area.

テキスト領域出力部２０６は、情報抽出や情報認識などのさらなる画像処理のために、テキスト領域の結果を出力デバイス１０６（例えば、画像認識デバイス）に出力するよう構成される。 The text area output unit 206 is configured to output the result of the text area to the output device 106 (for example, an image recognition device) for further image processing such as information extraction and information recognition.

グループ分類において、画像内のテキストのグローバルプライム特徴が抽出され、ローカル情報（グループレベル特徴）と組み合わされる。グローバルプライム特徴は、特徴の分布に基づいて選択される。したがって、本発明は、さまざまな風景に適合可能である。 In group classification, global prime features of text in an image are extracted and combined with local information (group level features). Global prime features are selected based on the distribution of features. Therefore, the present invention can be adapted to various landscapes.

図１４は、本発明の実施の形態に係るテキスト情報抽出方法を示す。本発明は、カメラで撮った画像や動画からテキスト情報を自動的に抽出する際に使用可能である。図１４に示されるように、ブロック１４０１では、入力画像または入力動画からのテキスト領域は、図３−図１３を参照して説明されたテキスト検出方法にしたがうテキスト検出方法を用いて検出される。 FIG. 14 shows a text information extraction method according to an embodiment of the present invention. The present invention can be used to automatically extract text information from images or videos taken with a camera. As shown in FIG. 14, in block 1401, a text region is detected from an input image or input video using the text detection method described with reference to FIGS. 3 to 13.

ブロック１４０２では、検出されたテキスト領域からテキストが抽出される。オプションで、入力動画からテキスト領域を検出する場合、入力動画のテキストはブロック１４０４に示されるように追跡される。 At block 1402, text is extracted from the detected text region. Optionally, when detecting a text region from the input video, the text of the input video is tracked as shown in block 1404.

ブロック１４０３では、抽出されたテキストに対してテキスト認識を行うことで、テキスト情報を取得する。 In block 1403, text information is acquired by performing text recognition on the extracted text.

図１５は、本発明の実施の形態に係るテキスト情報抽出システムを示すブロック図である。 FIG. 15 is a block diagram showing a text information extraction system according to the embodiment of the present invention.

図１５を参照すると、本発明の実施の形態に係るテキスト情報抽出システム１５００のブロック図が示される。システム１５００は、図１４を参照して説明された方法を実装するために用いられる。 Referring to FIG. 15, a block diagram of a text information extraction system 1500 according to an embodiment of the present invention is shown. The system 1500 is used to implement the method described with reference to FIG. 14.

図１５に示されるように、システム１５００は、テキスト検出装置１５０１と、抽出装置１５０２と、認識装置１５０３と、を備える。 As illustrated in FIG. 15, the system 1500 includes a text detection device 1501, an extraction device 1502, and a recognition device 1503.

テキスト検出装置１５０１は、入力画像または入力動画からテキスト領域を検出するよう構成され、図２に関して説明された装置２００と同じものである。 The text detection device 1501 is configured to detect a text region from an input image or input video and is the same as the device 200 described with reference to FIG. 2.

抽出装置１５０２は検出されたテキスト領域からテキストを抽出するよう構成される。
認識装置１５０３は、抽出されたテキストを認識することでテキスト情報を取得するよう構成される。
The extraction device 1502 is configured to extract text from the detected text region.
The recognition device 1503 is configured to acquire text information by recognizing the extracted text.

オプションで、システム１５００はさらに追跡装置１５０４を備える。追跡装置１５０４は、テキスト検出装置１５０１が入力動画からテキスト領域を検出するよう構成される場合、入力動画においてテキストを追跡するよう構成される。 Optionally, system 1500 further comprises a tracking device 1504. The tracking device 1504 is configured to track text in the input video if the text detection device 1501 is configured to detect a text region from the input video.

図２および図１５に関して上述された各部および装置は種々のステップを実装するための例示的なおよび／または好適なモジュールであることは理解されるであろう。モジュールは、ハードウエアユニット（プロセッサや特定用途向け集積回路など）および／またはソフトウエアモジュール（コンピュータプログラムなど）である。上では網羅的に種々のステップを実装するためのモジュールを説明したわけではない。しかしながら、所定の処理を実行するステップがある場合には、同じ処理を実装するための対応する機能モジュールまたはユニット（ハードウエアおよび／またはソフトウエアにより実装される）が存在する。上述のおよび後述のステップの組み合わせおよびこれらのステップに対応するユニットは、それらを構成要素とする技術的解が完全で適用可能である限り、本発明の開示に含まれる。 It will be appreciated that the parts and devices described above with respect to FIGS. 2 and 15 are exemplary and / or suitable modules for implementing the various steps. Modules are hardware units (such as processors and application specific integrated circuits) and / or software modules (such as computer programs). The above is not an exhaustive description of modules for implementing the various steps. However, if there are steps to perform a predetermined process, there is a corresponding functional module or unit (implemented by hardware and / or software) for implementing the same process. Combinations of the steps described above and below and units corresponding to these steps are included in the disclosure of the present invention so long as the technical solution comprising them is complete and applicable.

Claims

1. A text detection method for detecting a text area of an image, comprising:
a component generation step of generating components from an input image;
a component grouping step of grouping the components that meet similarity requirements to form component groups;
a component connection extraction step of extracting component connections, each including at least two adjacent components in one component group;
a feature acquisition step of acquiring a feature of all the component connections;
a component group classification step of classifying the component group into a text group or a non-text group based on the feature, acquired in the feature acquisition step, of the component connections in one component group and of the component connections in another component group different from that component group; and
a text region generation step of generating a text region based on the text group.

2. The method of claim 1, wherein the feature is obtained from a distribution of values of at least one of the candidate features of all component connections.

3. The method of claim 1 or 2, wherein the component group classification step further includes a text confidence value calculation step of calculating a text confidence value of each component connection, and the component group classification step classifies the component group into a text group or a non-text group based on the text confidence value.

4. The method of claim 3, wherein the component group classification step further includes
a text confidence value adjustment step of adjusting the text confidence value of the component connection according to the feature acquired in the feature acquisition step, and
the component group classification step classifies the component group into a text group or a non-text group based on a text confidence value of the component group obtained from the adjusted text confidence values of all component connections in the group.

5. The method of claim 2, wherein the feature is acquired in the feature acquisition step when the value of a histogram bin of the distribution is greater than a predetermined threshold.

6. The method of claim 1, wherein the feature is acquired in the feature acquisition step when the average character likelihood of the component connections in the top histogram bin of the distribution is greater than the average character likelihood of the component connections in any other histogram bin.

7. The method of claim …, wherein the feature is acquired in the feature acquisition step when the ratio between the values of the top histogram bin of the distribution and the second histogram bin of the distribution is greater than a predetermined threshold.

8. The method of claim 2, wherein the candidate feature of all component connections is a predetermined feature or a set of selected features.

9. The method of claim 2, wherein the candidate feature of all component connections is at least one of the following:
(1) the direction of the component connection,
(2) the average foreground color of the component,
(3) the average background color of the component,
(4) the average boundary contrast, and
(5) the distance between the two components in the component connection.

10. The method of claim 8, wherein the selected feature has the maximum ratio between the values of the top histogram bin and the second histogram bin.

11. The method of claim …, wherein the text confidence value of the component connection is calculated from a set of features extracted from the component connection and is obtained by a pre-trained classifier of text and non-text components.

12. The method of claim 4, wherein the text confidence value of a component group is calculated as the average text confidence value of all component connections in it.

13. The method of claim 4, wherein the text confidence value of a component group is calculated as a weighted sum of the average text confidence value of all component connections in the component group and a text confidence value determined based on a group-level feature.

14. The method of claim 13, wherein the group-level feature is at least one of the following:
(1) distribution of the components in the group,
(2) component size, color or stroke width,
(3) spatial arrangement of the components in the group.

15. A text detection device for detecting a text area of an image, comprising:
a component generation unit configured to generate components from an input image;
a component grouping unit configured to group the components that meet similarity requirements to form component groups;
a component group classification unit configured to extract component connections, each including at least two adjacent components in one component group, to acquire a feature of all the component connections, and to classify the component group into a text group or a non-text group based on the acquired feature of the component connections in one component group and of the component connections in another component group different from that component group; and
a text region generation unit configured to generate a text region based on the text group.

16. The device of claim 15, wherein the feature is obtained from a distribution of values of at least one of the candidate features of all component connections.

17. The device of claim 15 or 16, wherein the component group classification unit is further configured to calculate a text confidence value of each component connection, and the component group classification unit classifies the component group into a text group or a non-text group based on the text confidence value.

18. The device of claim 17, wherein the component group classification unit is further configured to adjust the text confidence value of a component connection according to the acquired feature, and classifies the component group into a text group or a non-text group based on a text confidence value of the component group obtained from the adjusted text confidence values of all component connections in the group.

19. A text information extraction method, comprising:
detecting a text region from an input image or an input video using the text detection method according to any one of claims 1 to 14;
extracting text from the detected text region; and
recognizing the extracted text to obtain text information.

20. The method of claim 19, further comprising tracking the text in the input video when the text region is detected from the input video using the text detection method according to claim 1.

21. A text information extraction system, comprising:
the text detection device according to any one of claims 15 to 18, configured to detect a text region from an input image or an input video;
an extraction device configured to extract text from the detected text region; and
a recognition device configured to acquire text information by recognizing the extracted text.

22. The system of claim 21, further comprising a tracking device configured to track the text in the input video when the text detection device is configured to detect the text region from the input video.