JP2016151978A - Image processing apparatus and image processing program

Info

Publication number: JP2016151978A
Application number: JP2015030162A
Authority: JP
Inventors: Kagenori Nagao (長尾 景則)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2015-02-19
Filing date: 2015-02-19
Publication date: 2016-08-22

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that specifies partial images each containing a single character more accurately than when the partial images are specified using only information indicating the character-likeness of regions.
SOLUTION: An image processing apparatus comprises: detection means that detects, from an image, partial images having the characteristics of a character as character candidate areas; extraction means that extracts the inclusion relationship between the partial images; and specification means that specifies the partial images each containing a single character by using information indicating the character-likeness of the regions in the partial images together with the inclusion relationship.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus and an image processing program.

Patent Document 1 aims to provide a method for determining the character area of an image that determines the character area accurately even when the image data contains a background image other than the character string and the character quality is poor. In the disclosed method, an image containing a character string that has a predetermined arrangement and consists of characters of the same size is input and binarized; character candidates whose features satisfy a predetermined condition are extracted from all the connected pixels contained in the processed data; and the circumscribed rectangular area containing all the character candidates whose mutual positional relationship forms the predetermined arrangement is determined to be the character area of the image. The document states that this configuration always determines the character area accurately.

Patent Document 2 discloses a character recognition processing method in which a scene containing characters on a two-dimensional surface or in three-dimensional space is input as a grayscale image; portions likely to correspond to character strokes are detected as regions by region segmentation of the scene image and evaluation of the density difference of each region; combinations of regions located near one another are extracted as character pattern candidates; a similarity is calculated from the distance between each character pattern candidate and the standard pattern of each recognition target character category stored in a character recognition dictionary; whether the character pattern candidate corresponds to a character is determined by whether at least one category gives a similarity greater than a threshold; a character pattern candidate determined to be a character is extracted as a high-similarity pattern; and the category that gives the highest similarity for the high-similarity pattern is taken as the character recognition result.
In this method, for a set of high-similarity patterns obtained from the character pattern candidates formed by combining neighboring regions, the degree of agreement between the two-dimensional shape features of the standard pattern of the category corresponding to each high-similarity pattern and the two-dimensional shape features of that high-similarity pattern is compared among the high-similarity patterns in the set. An update process that raises the similarity of high-similarity patterns with a high degree of agreement and lowers the similarity of the others is performed iteratively. Finally, the high-similarity pattern that gives the maximum similarity is extracted as the character pattern, and the category that gives that maximum similarity is output as the character recognition result.

Non-Patent Document 1 addresses the problem that characters in a scene may lose the image features that characters inherently have because of shadows, reflections, and the like, so that character/non-character determination can be difficult from a single candidate region. It discloses narrowing down character candidates by following the character/non-character determination on single candidate regions with classification that uses features between regions, such as pairs and triplets of candidate regions, and it discloses that the classification on candidate region pairs also takes the inclusion relationship between the regions into account.

JP 08-339421 A
Japanese Patent No. 2979089

L. Neumann, J. Matas, "Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search", in ICDAR 2011, pp. 687-691

When a partial image in which one character is described is specified from an image, information indicating the character-likeness of a region is used.
An object of the present invention is to provide an image processing apparatus and an image processing program that specify the partial image in which one character is described more accurately than when the partial image is specified using only information indicating the character-likeness of regions.

The gist of the present invention for achieving this object lies in the inventions of the following claims.
The invention of claim 1 is an image processing apparatus comprising: detection means for detecting, from an image, a partial image having the characteristics of a character as a character candidate area; extraction means for extracting an inclusion relationship between the partial images; and specifying means for specifying a partial image in which one character is described, using information indicating the character-likeness of a region in the partial image and the inclusion relationship.

The invention of claim 2 is the image processing apparatus according to claim 1, wherein the extraction means constructs a tree structure indicating the inclusion relationship, and the specifying means comprises: second extraction means for extracting, from the image and the partial image, a feature of a region in the partial image; calculation means for calculating, from the feature extracted by the second extraction means, information indicating the character-likeness of the region in the partial image; and second specifying means for specifying a partial image in which one character is described, using the information indicating character-likeness calculated by the calculation means and the inclusion relationship in the tree structure.

The invention of claim 3 is the image processing apparatus according to claim 2, wherein the second specifying means specifies, among the nodes from the root to a leaf of the tree structure, the node whose corresponding region in the partial image has the highest information indicating character-likeness as a partial image in which one character is described.

The invention of claim 4 is the image processing apparatus according to claim 3, wherein the second specifying means deletes the parent and child nodes of the node specified as a partial image in which one character is described.

The invention of claim 5 is an image processing program for causing a computer to function as: detection means for detecting, from an image, a partial image having the characteristics of a character as a character candidate area; extraction means for extracting an inclusion relationship between the partial images; and specifying means for specifying a partial image in which one character is described, using information indicating the character-likeness of a region in the partial image and the inclusion relationship.

According to the image processing apparatus of claim 1, a partial image in which one character is described can be specified more accurately than when it is specified using only information indicating the character-likeness of regions.

According to the image processing apparatus of claim 2, a partial image in which one character is described can be specified using the inclusion relationship in the tree structure.

According to the image processing apparatus of claim 3, among the nodes from the root to a leaf of the tree structure, the node whose corresponding region in the partial image has the highest information indicating character-likeness can be specified as a partial image in which one character is described.

According to the image processing apparatus of claim 4, the parent and child nodes of the node specified as a partial image in which one character is described can be deleted.

According to the image processing program of claim 5, a partial image in which one character is described can be specified more accurately than when it is specified using only information indicating the character-likeness of regions.

A conceptual module configuration diagram of a configuration example of the present embodiment.
Explanatory diagrams showing system configuration examples using the present embodiment.
A conceptual module configuration diagram of a configuration example of the present embodiment (character area specifying module).
Explanatory diagrams showing processing examples according to the present embodiment.
A flowchart showing a processing example according to the present embodiment (inclusion relation correction module).
A block diagram showing a hardware configuration example of a computer that realizes the present embodiment.

Hereinafter, an example of a preferred embodiment for realizing the present invention will be described with reference to the drawings.
FIG. 1 shows a conceptual module configuration diagram of a configuration example of the present embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, a module in the present embodiment refers not only to a module in a computer program but also to a module in a hardware configuration. The present embodiment therefore also serves as a description of a computer program for causing a computer to function as those modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to realize each function), as well as of a system and a method. For convenience of explanation, "store", "cause to store", and equivalent wording are used; when the embodiment is a computer program, these expressions mean storing in a storage device, or performing control so as to store in a storage device. Modules may correspond one-to-one to functions, but in implementation one module may be constituted by one program, a plurality of modules may be constituted by one program, or conversely one module may be constituted by a plurality of programs. A plurality of modules may be executed by one computer, or one module may be executed by a plurality of computers in a distributed or parallel environment. One module may include another module. Hereinafter, "connection" is used for logical connections (exchange of data, instructions, reference relationships between data, and the like) as well as physical connections. "Predetermined" means determined before the target process. It covers determination before the processing of the present embodiment starts and also, even after that processing has started, determination made before the target process according to the situation or state at that time or up to that time. When there are a plurality of "predetermined values", they may differ from one another, or two or more of them (including, of course, all of them) may be the same. A statement meaning "if A, do B" is used in the sense of "determine whether A holds, and if it is determined that A holds, do B", except where the determination of whether A holds is unnecessary.
A system or apparatus includes not only a configuration in which a plurality of computers, hardware units, apparatuses, and the like are connected by communication means such as a network (including one-to-one communication connections), but also a configuration realized by a single computer, hardware unit, apparatus, or the like. "Apparatus" and "system" are used as synonymous terms. Of course, "system" does not include a mere social "mechanism" (social system) that is an artificial arrangement.
For each process performed by each module, or for each process when a plurality of processes are performed within a module, the target information is read from a storage device, the process is performed, and the result is then written to the storage device. Descriptions of reading from the storage device before processing and writing to the storage device after processing may therefore be omitted. The storage device here may include a hard disk, RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and the like.

The image processing apparatus 100 according to the present embodiment specifies, from an image, a partial image in which one character is described. As shown in the example of FIG. 1, it has an image reception module 110, a character candidate area detection module 120, a character area specifying module 130, and a character recognition module 140.
The image processing apparatus 100 is mainly intended for character recognition in scene images. That is, it relates to a technique for specifying character areas in a still image or moving image (hereinafter also referred to as a scene image) captured by a camera (a digital camera, a camera built into a portable terminal such as a mobile phone (which may include a wearable terminal), or the like) and performing character recognition. An example of such a scene image is a landscape image that contains a character image such as a signboard. In particular, the image processing apparatus 100 relates to a technique for specifying a character area within a scene image. It may also be used for processing other than character recognition (for example, image restoration).
The processing result of the image processing apparatus 100 may be used for processing such as navigation, content-based image retrieval, translation of signs and notices in a scene image, assistance for the visually impaired, extraction of location information from a scene image, semantic tagging of a scene image, and automation of visual inspection work.
When character recognition is performed on a scene image, the scene image has the following inherent properties. The list emphasizes properties that do not arise in character recognition of documents.
- Character areas must be specified within a complicated scene.
- Shadows, reflections, mirror images on glass, and the like may be present.
- Fonts and layouts are diverse.
- Characters distorted by perspective, or characters on curved surfaces such as cylinders, may be targeted.
Because of these properties, a high character score (information indicating character-likeness) is not always obtained from a true character area in a scene image.

Terms used in describing the present embodiment are defined as follows.
A tree structure is a data structure that has the structure of a tree in graph theory. In graph theory, a tree means an acyclic graph (a graph with no loops).
A tree structure is represented by nodes and the edges connecting them. A tree used as a data structure is in most cases a rooted tree, in which a root node is designated, and the relationships between nodes are expressed in terms borrowed from family trees. Each node in the tree structure has zero or more child nodes, and child nodes lie below their parent in the tree (trees are generally drawn growing downward). A node that has a child node is the parent node as seen from that child. A node has at most one parent node.
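The rooted tree defined above can be sketched in a few lines. This is a minimal illustration only: the class name `Node`, the `label` field, and the helper `root_to_leaf_paths` are hypothetical names chosen for this example and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """One node of a rooted tree; `label` is a hypothetical payload."""
    label: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        # A node has at most one parent, so refuse to attach it twice.
        if child.parent is not None:
            raise ValueError(f"{child.label} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        # A node with zero child nodes is a leaf.
        return not self.children


def root_to_leaf_paths(root: Node) -> Iterator[List[Node]]:
    """Yield every path from the root down to a leaf, depth first."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node.is_leaf():
            yield path
        else:
            # Push in reverse so the first child is explored first.
            for child in reversed(node.children):
                stack.append(path + [child])


root = Node("whole image")
outer = root.add_child(Node("outer region"))
outer.add_child(Node("inner region"))
root.add_child(Node("other region"))
paths = [[n.label for n in p] for p in root_to_leaf_paths(root)]
print(paths)
```

Storing both `parent` and `children` makes it cheap to walk upward to the root or downward to the leaves, which is the kind of traversal a root-to-leaf comparison of nodes needs.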

The image reception module 110 is connected to the character candidate area detection module 120 and the character area specifying module 130, and passes scene image information 115 to both of them. The image reception module 110 receives an image containing a character image and passes that image to the character candidate area detection module 120 and the character area specifying module 130. Here, receiving an image includes, for example, reading an image with a camera, scanner, or the like; receiving an image from an external device via a communication line, as with a fax; and reading out an image stored on a hard disk (including one built into the computer and one connected via a network) or the like. The image may be a binary image or a multi-valued image (including a color image). One image or a plurality of images may be received. The content of the image is mainly the scene image described above.

The character candidate area detection module 120 is connected to the image reception module 110 and the character area specifying module 130, and passes character candidate area information 125 to the character area specifying module 130. The character candidate area detection module 120 detects, from the image received by the image reception module 110, partial images having the characteristics of a character as character candidate areas, and passes the detection result to the character area specifying module 130 as the character candidate area information 125. Examples of the characteristics of a character include high contrast against the background and small variation in density within the character area. As a specific example, among the detection techniques that exploit these characteristics, the present embodiment uses Maximally Stable Extremal Regions (MSER), which has attracted particular attention in recent years as a technique that is robust to shadows, reflections, and gradation characters. MSER is defined as in Equation (1) (see J. Matas, O. Chum, M. Urban, T. Pajdla, "Robust wide-baseline stereo from maximally stable extremal regions", Image and Vision Computing 22, pp. 761-767, 2004).
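Equation (1) itself does not appear in this text. For orientation, the stability criterion of the cited Matas et al. paper can be written as follows; that the patent's Equation (1) uses exactly this notation is an assumption.

```latex
% Q_1 \subset Q_2 \subset \dots : nested extremal regions obtained as the
% binarization threshold i increases; |Q| is the area (pixel count) of Q.
q(i) = \frac{\left| Q_{i+\Delta} \setminus Q_{i-\Delta} \right|}{\left| Q_i \right|}
% Q_{i^*} is maximally stable if and only if q(i) has a local minimum at i^*.
% \Delta is a parameter of the method.
```

In words, a region is kept when moving the threshold by ±Δ changes its area very little relative to its size.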

MSER (the processing performed by the character candidate area detection module 120, that is, the generation of the character candidate area information 125) will be described with reference to FIGS. 5 to 13.
(1) A target image 500 (a grayscale image) as shown in the example of FIG. 5 is taken as the processing target (scene image information 115) of the character candidate area detection module 120. The target image 500 is divided into three areas: a rectangular area 510 that is almost black, a rectangular area 530 that is almost white, and an area 520 with a gray gradation that becomes lighter from left to right.
(2) The target image 500 is binarized while the binarization threshold is moved from the darkest pixel value toward the brightest pixel value.
(3) When the threshold is at the darkest pixel value, every pixel has a value equal to or greater than the threshold, so no region is detected, as shown in the example of FIG. 6.

(4) As the threshold is moved, once it exceeds the pixel value of the area 510 in the example of FIG. 5, the area 510 is detected, and the first MSER (area 510A) is detected as shown in the example of FIG. 7.
(5) As the threshold is moved further toward the bright side, the neighborhood of the darkest part of the area 520 starts to be detected. However, varying the threshold by ±Δ changes the area of the detected region considerably, so by the definition of MSER the region near the darkest part of the area 520 is not detected as an MSER.
(6) Once the threshold becomes brighter than the brightest part of the area 520, varying the threshold no longer changes the area of the detected region. The area 520 is therefore detected at this stage and, as shown in the example of FIG. 8, the second MSER (area 520A), which is the union of the area 510 and the area 520, is detected. The second MSER (area 520A) contains the first detected MSER (area 510A).
(7) The threshold is moved further toward the bright side. When it becomes brighter than the area 530, the area 530 is detected and, as shown in the example of FIG. 9, the third MSER (target image 500A) is detected. The third MSER (target image 500A) contains the first and second detected MSERs (area 510A and area 520A).

(8) Next, the same processing is performed while the threshold is moved from the brightest pixel value toward the darkest pixel value, that is, in the direction opposite to the above. This is done to detect characters such as white characters on a dark background.
(9) The detection results are shown in the examples of FIGS. 10 to 13. As when the threshold is changed from dark to bright, the detected MSERs have inclusion relationships with one another.
The properties of MSER are as follows.
- The MSERs obtained when the binarization threshold is varied from dark to bright have inclusion relationships with one another, as do the MSERs obtained when it is varied from bright to dark.
- An MSER detected when the threshold is varied from dark to bright has no inclusion relationship with any MSER detected when the threshold is varied from bright to dark, except for an MSER that covers the entire image. The same holds in the reverse case.
- A region with gradation, such as the area 520 shown in the example of FIG. 5, can also be detected.
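The dark-to-bright sweep of steps (2) to (7) can be imitated on a toy image. The sketch below is a simplification under stated assumptions, not the patent's procedure. It treats all pixels at or below the threshold as a single region, which only works because the three areas of the toy image are adjacent. It also accepts a threshold as stable when the relative area change under ±`delta` is at most `max_variation`, instead of searching for a local minimum. The function names, the image size, and the parameter values are all hypothetical.

```python
def make_toy_image():
    """10 rows: a dark block, a left-to-right gradient, and a bright block."""
    row = [10] * 10 + list(range(60, 160)) + [250] * 10
    return [list(row) for _ in range(10)]


def stable_areas(image, delta=5, max_variation=0.05):
    """Sweep the threshold from dark to bright and collect stable region areas."""
    pixels = [value for row in image for value in row]

    def area(threshold):
        # Number of pixels detected when binarizing at this threshold.
        return sum(1 for value in pixels if value <= threshold)

    stable = []
    for t in range(256):
        a = area(t)
        if a == 0:
            continue  # nothing detected yet, as in step (3)
        # Relative change of the area when the threshold moves by +-delta.
        variation = (area(t + delta) - area(t - delta)) / a
        if variation <= max_variation and a not in stable:
            stable.append(a)
    return stable


print(stable_areas(make_toy_image()))  # -> [100, 1100, 1200]
```

The three areas correspond to the dark block alone, the dark block plus the gradient, and the whole image. Each later region contains the earlier ones, and no region inside the gradient is reported because its area keeps changing as the threshold moves.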

The character area specifying module 130 is connected to the image reception module 110, the character candidate area detection module 120, and the character recognition module 140. It receives the scene image information 115 from the image reception module 110 and the character candidate area information 125 from the character candidate area detection module 120, and passes specified character area information 135 to the character recognition module 140. The character area specifying module 130 extracts the inclusion relationship between the partial images, and then specifies a partial image in which one character is described by using the inclusion relationship together with information indicating the character-likeness of the region in each partial image (hereinafter also referred to as a character score). The character candidate areas produced by the character candidate area detection module 120 also include regions other than characters. In addition, when MSER is used, a plurality of kinds of candidate regions may be extracted from a single character (described later with reference to FIGS. 14 to 16). Character/non-character determination is performed on each of these character candidate areas to specify the character areas. The module configuration, processing, and the like within the character area specifying module 130 will be described later with reference to FIG. 4 and other drawings.
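One way to combine a character score with the inclusion relationship, loosely following claims 3 and 4, is sketched below. Several points are assumptions made for illustration and are not taken from the patent: regions are represented as sets of pixel coordinates, the parent of a region is the smallest region that strictly contains it, "parent and child nodes" is read as all ancestors and descendants, selection is greedy, and `min_score` is a hypothetical cut-off for non-character regions. The names `build_parents`, `ancestors`, and `select_characters` are likewise hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Region = FrozenSet[Tuple[int, int]]  # set of (row, col) pixel coordinates


def build_parents(regions: Dict[str, Region]) -> Dict[str, Optional[str]]:
    """Map each region to the smallest region that strictly contains it."""
    parents: Dict[str, Optional[str]] = {}
    for name, pixels in regions.items():
        containers = [other for other, p in regions.items() if pixels < p]
        parents[name] = min(containers, key=lambda o: len(regions[o]), default=None)
    return parents


def ancestors(name: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Walk from a node up to the root, collecting every containing region."""
    found = []
    current = parents[name]
    while current is not None:
        found.append(current)
        current = parents[current]
    return found


def select_characters(parents, scores, min_score=0.5):
    """Pick the best-scoring node, drop its ancestors and descendants, repeat."""
    remaining = set(parents)
    selected = []
    while remaining:
        best = max(sorted(remaining), key=lambda n: scores[n])
        if scores[best] < min_score:
            break  # everything left looks like a non-character
        selected.append(best)
        related = set(ancestors(best, parents))
        related |= {n for n in remaining if best in ancestors(n, parents)}
        remaining -= related | {best}
    return selected


regions = {
    "frame": frozenset((r, c) for r in range(4) for c in range(4)),
    "A": frozenset({(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}),
    "A_part": frozenset({(0, 0), (0, 1)}),
    "B": frozenset({(0, 3), (1, 3), (2, 3)}),
}
scores = {"frame": 0.1, "A": 0.9, "A_part": 0.4, "B": 0.7}
parents = build_parents(regions)
print(select_characters(parents, scores))
```

In this toy input, `A` has the highest score on its root-to-leaf path, so the enclosing `frame` and the fragment `A_part` are discarded, while `B`, which lies on a different path, is kept as a second character.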

The character recognition module 140 is connected to the character area specifying module 130 and receives the specific character area information 135 from it. The character recognition module 140 performs character recognition on the character area images (specific character area information 135) specified by the character area specifying module 130. Any known method may be used for character recognition.

FIG. 2 is an explanatory diagram showing an example system configuration using the present embodiment.
The image processing apparatus 100 is connected to a photographing apparatus 210 and an information processing apparatus 230. The system may be housed in a single housing or in separate housings. Communication between the apparatuses may be wireless, wired, or a combination of the two.
The photographing apparatus 210 is connected to the image processing apparatus 100. The photographing apparatus 210 is the camera described above; for example, it may be operated by a person, mounted on an automobile or the like, or installed at a fixed location like a surveillance camera.
The information processing apparatus 230 is connected to the image processing apparatus 100. The information processing apparatus 230 performs processing that uses the processing result of the image processing apparatus 100 (character recognition module 140). For example, the system is mounted on an automobile or the like and performs processing such as navigation, as described above. The information processing apparatus 230 may also perform processing that uses the processing result of the character area specifying module 130, for example image restoration as described above.

FIG. 3 is an explanatory diagram showing an example system configuration using the present embodiment.
The image processing apparatus 100, the user terminal 310A, the user terminal 310B, the user terminal 310C, and the information processing apparatus 330 are connected to one another via a communication line 390. The communication line 390 may be wireless, wired, or a combination of the two, and may be, for example, the Internet or an intranet serving as communication infrastructure.
The information processing apparatus 330 performs processing equivalent to that of the information processing apparatus 230 described above. The functions of the image processing apparatus 100, of the information processing apparatus 330, or of the combination of the two may be realized as a cloud service.
The user terminal 310 performs processing equivalent to that of the above-described photographing apparatus 210. For example, an image captured by the user terminal 310 may be processed by the image processing apparatus 100. Further, processing may be performed by the image processing apparatus 100 and the information processing apparatus 330 according to an instruction from the user terminal 310A, and the processing result may be transmitted to the user terminal 310B or the like.

FIG. 4 is a conceptual module configuration diagram of an example configuration of the present embodiment (character area specifying module 130).
The character area specifying module 130 includes an inclusion relationship construction module 410, a character area feature extraction module 420, a character score calculation module 430, and an inclusion relationship correction module 440.

The scene image information 115 is information on the scene image output from the image receiving module 110 (it may be the scene image itself) and is color (including grayscale) or black-and-white raster image information. A color image may be RGB or an image in another color space such as L*a*b*. As processing for a color input image, the present embodiment describes the case where only one specific channel is used (for example, only the R image of an RGB image); however, character areas may instead be specified individually for every channel and the results integrated at the end by, for example, taking a logical OR.
The character candidate area information 125 is the processing result of the character candidate area detection module 120 and is expressed by contour pixel position information, in-area pixel position information, and the like for each candidate area. In the present embodiment, a list of the position coordinates of the pixel set detected as an individual MSER is called MSER information, and the set of all MSER information detected from the entire target image is the character candidate area information 125.

The inclusion relationship construction module 410 is connected to the inclusion relationship correction module 440 and receives the character candidate area information 125. The inclusion relationship construction module 410 extracts the inclusion relationships between partial images. It may also construct a tree structure indicating the inclusion relationships.
MSERs have the property that one region includes another, and linking these inclusion relationships forms a tree structure (described above with reference to FIGS. 5 to 13). Specifically, the inclusion relationship construction module 410 checks the inclusion relationships among all MSERs in the character candidate area information 125 and constructs a plurality of tree structures. The plurality of tree structures comprises at least two trees; for example, as described above, there is a tree generated by varying the binarization threshold from dark to light and a tree generated by varying it from light to dark.
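The inclusion check described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes each MSER is given as a set of (x, y) pixel coordinates, and the function name `build_inclusion_forest` and the rule of taking the smallest strictly enclosing region as the parent are assumptions introduced here.

```python
def build_inclusion_forest(regions):
    """Return parent[i] for each region: the index of the smallest region that
    strictly contains region i, or None if region i is the root of a tree."""
    pixel_sets = [frozenset(r) for r in regions]
    parent = [None] * len(pixel_sets)
    for i, inner in enumerate(pixel_sets):
        best = None
        for j, outer in enumerate(pixel_sets):
            # Proper-subset test: outer strictly contains inner.
            if i != j and inner < outer:
                if best is None or len(outer) < len(pixel_sets[best]):
                    best = j
        parent[i] = best
    return parent


# Hypothetical toy data: region 0 contains regions 1 and 2; region 3 stands alone.
regions = [
    {(x, y) for x in range(6) for y in range(6)},
    {(x, y) for x in range(0, 2) for y in range(0, 2)},
    {(x, y) for x in range(3, 5) for y in range(3, 5)},
    {(10, 10), (10, 11)},
]
parent = build_inclusion_forest(regions)
roots = [i for i, p in enumerate(parent) if p is None]  # one root per tree
```

Because regions that do not contain one another never become parent and child, a single pass of this kind yields several trees at once, one per root.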

An example of a tree structure will be described with reference to FIGS. 14 to 16.
By the nature of MSER, a plurality of MSERs may be detected from a single character. For example, suppose the original character image (for example, a character image written on a signboard) is a binary image as shown in the example of FIG. 14. In a scene image captured by a camera, shadows, reflections, uneven illumination, camera shake, blur, and the like turn even an originally binary image into a multi-valued image, as shown in the example of FIG. 15.
As a result, a plurality of MSERs may be detected from one character, as shown in the example of FIG. 16.
Because these are MSERs, the individual detected regions have inclusion relationships that can be represented by the tree structure shown in the example of FIG. 16.

The character area feature extraction module 420 is connected to the character score calculation module 430 and receives the scene image information 115 and the character candidate area information 125. The character area feature extraction module 420 extracts the features of the region in each partial image from the target image (scene image information 115) and the partial images (character candidate area information 125) that are the processing result of the character candidate area detection module 120. Specifically, from the information on each MSER indicated by the character candidate area information 125 and the scene image information 115 corresponding to each MSER, the module extracts a plurality of feature quantities that reflect the character-likeness of the region and generates a feature vector whose elements are those feature quantities. Characteristics of a character image include, for example:
(1) small variation in color and shade within the character area
(2) high contrast against the background
(3) a simple outline
(4) composition from line segments of constant width
Examples of feature quantities that reflect the character-likeness of a region include:
(1) aspect ratio
(2) compactness
(3) convex hull area to surface ratio
(4) background color consistency
(5) relative segment height
(6) number of holes
(7) character color consistency
(8) skeleton length to perimeter ratio
(See L. Neumann, J. Matas, “A Method for Text Localization and Recognition in Real-World Images”, in ACCV 2010, p. 770-783, 2010).
The character area feature extraction module 420 arranges the feature quantities (real values) into a vector and uses it as the feature vector. In the above example, an 8-dimensional vector is generated.
The feature vector may include not only feature quantities extracted from the region of interest as described above but also feature quantities extracted from its relationship with neighboring regions. For example, the average color difference from the nearest neighboring region or the size ratio to it may be used as feature quantities.
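As a rough illustration of how such a feature vector might be assembled, the sketch below computes two of the listed quantities, aspect ratio and compactness, for a region given as a set of (x, y) pixel coordinates. The exact definitions used in the embodiment are not stated in this text; the definitions below (bounding-box width over height, and the square root of the area over a perimeter counted in exposed pixel edges) and the function name `shape_features` are assumptions for illustration only.

```python
import math


def shape_features(pixels):
    """Return [aspect_ratio, compactness] for a region given as (x, y) pixels."""
    pixels = set(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    area = len(pixels)
    # Perimeter: number of pixel edges that face a pixel outside the region
    # (4-neighbourhood).
    perimeter = sum(
        1
        for (x, y) in pixels
        for (dx, dy) in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (x + dx, y + dy) not in pixels
    )
    return [width / height, math.sqrt(area) / perimeter]


# Hypothetical regions: a 4x4 square and a 4x1 horizontal bar.
square = {(x, y) for x in range(4) for y in range(4)}
bar = {(x, 0) for x in range(4)}
square_features = shape_features(square)  # [1.0, 0.25]
bar_features = shape_features(bar)
```

The remaining feature quantities would be appended to the same list in a fixed order so that every region yields a vector of the same dimension.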

The character score calculation module 430 is connected to the character area feature extraction module 420 and the inclusion relationship correction module 440. The character score calculation module 430 calculates information indicating the character-likeness of the region in a partial image (the character score) from the features extracted by the character area feature extraction module 420.
Specifically, the character score calculation module 430 takes the feature vector generated by the character area feature extraction module 420 as input and calculates a score indicating how likely the corresponding MSER is to be a character area. For example, the character score may be the posterior probability of being a character, and the posterior probability may be computed by a neural network. More specifically, as an example of the character score, the posterior probability p(y|x) of the label y of the corresponding region, given the feature vector x generated by the character area feature extraction module 420, can be used. The label is binary (y ∈ {0, 1}), where y = 0 denotes a non-character and y = 1 denotes a character.
The posterior probability p(y|x) can be obtained, for example, as the output of a neural network (a multilayer perceptron with one intermediate layer) as shown in the example of FIG. 17.
Let the feature vector be a d-dimensional real vector x = (x_1, x_2, ..., x_d), and let the vector X = (x_1, x_2, ..., x_d, 1), obtained by appending a constant element 1, be the input to the neural network. The posterior probability p(y|x) is then obtained as the output of the neural network, as in Equation (2).
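Equation (2) itself is not reproduced in this text. Based only on the symbol definitions given for it (the sigmoid function σ, the intermediate node outputs h_j, the weights u_j and w_ij, the augmented input X with I components, and J intermediate nodes), one consistent reading is the following. It is a reconstruction under those assumptions, not the published equation; in particular, the use of a sigmoid at the intermediate nodes is assumed.

```latex
p(y = 1 \mid x) = \sigma\!\left( \sum_{j=1}^{J} u_j h_j \right), \qquad
h_j = \sigma\!\left( \sum_{i=1}^{I} w_{ij} X_i \right), \qquad
\sigma(a) = \frac{1}{1 + e^{-a}}
```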

In Equation (2), σ is the sigmoid function, h_j is the output of the j-th intermediate node, u_j is the weight of the link between the j-th intermediate node and the output node, w_ij is the weight of the link between the i-th input node and the j-th intermediate node, X_i is the i-th component of the vector X, I is the number of input dimensions, and J is the number of intermediate nodes.
The parameters u and w are obtained by learning. Specifically, a large number of feature vectors of character areas (positive examples) and feature vectors of non-character areas (negative examples) are prepared as training data, and maximum likelihood learning is performed so that the neural network outputs values as close as possible to y = 1 and y = 0, respectively.
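The scoring step can be illustrated with a small forward pass of a perceptron with one intermediate layer. This is a sketch under assumptions: the sigmoid activation at the intermediate nodes, the function name `character_score`, and the weight values are all hypothetical; in practice u and w would come from the training described above.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def character_score(x, w, u):
    """Posterior-style score in (0, 1) for a feature vector x.

    w[i][j] is the weight from input node i to intermediate node j
    (len(x) + 1 rows, the last row acting on the constant input 1);
    u[j] is the weight from intermediate node j to the output node.
    """
    X = list(x) + [1.0]  # append the constant element 1
    J = len(u)
    h = [sigmoid(sum(w[i][j] * X[i] for i in range(len(X)))) for j in range(J)]
    return sigmoid(sum(u[j] * h[j] for j in range(J)))


# Hypothetical weights for d = 2 features and J = 2 intermediate nodes.
w = [[0.5, -0.3], [0.8, 0.1], [0.0, 0.2]]
u = [1.2, -0.7]
score = character_score([0.4, 0.9], w, u)
```

A region would then be treated as more character-like the closer `score` is to 1.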

The inclusion relationship correction module 440 is connected to the inclusion relationship construction module 410 and the character score calculation module 430 and outputs the specific character area information 135. The inclusion relationship correction module 440 specifies the partial images in which one character is described, using the character-likeness information calculated by the character score calculation module 430 and the inclusion relationships in the tree structure.
The inclusion relationship correction module 440 may also specify, among the nodes on a path from the root to a leaf of the tree structure, the node whose corresponding region has the highest character-likeness information as the partial image in which one character is described. Instead of only the highest one, the top n partial images may be specified. For example, if the number of characters is known in advance, n may be set to that number.
The inclusion relationship correction module 440 may also delete the parent and child nodes of the node specified as the partial image in which one character is described.
Specifically, the inclusion relationship correction module 440 specifies the MSER with the highest character score output from the character score calculation module 430 as a character area. Here, "the highest character score" means the highest among the character scores of all nodes in the tree structure. Because it is extremely rare for one character to contain another character, the MSERs corresponding to the parents and children of the MSER specified as a character area are excluded from the character candidate areas and deleted from the tree structure that represents the inclusion relationships. The parents excluded here include the parent's parent, its parent in turn, and so on up to the root along that path. The children excluded include the children's children, their children in turn, and so on down to the leaves along those paths. In other words, among the nodes on the paths from the root to the leaves that pass through the target node, every node other than the target node is deleted. This splits one tree into a plurality of subtrees. The above processing is repeated until every tree consists of a single node.

This will be described with reference to FIGS. 18 to 22.
For each tree constructed by the inclusion relationship construction module 410, the MSER with the highest character score is selected and specified as a character area. The example in FIG. 18 is the tree structure of FIG. 16 annotated with the character score (a number) that the character score calculation module 430 calculated for each node (MSER). Here, a higher character score indicates that the region is more character-like. The MSER 1810, enclosed in a frame, has the highest score in this tree structure.
Because it is extremely rare for one character to contain another, the MSERs corresponding to the parents and children of the MSER specified as a character area are excluded from the character candidate areas. In the example of FIG. 18, the parent of the MSER 1810 (the MSER with character score 0.91), that parent's parent (the MSER with character score 0.26, the root), and the children of the MSER 1810 (the MSERs with character scores 0.11 and 0.13, two leaves) are deleted. As shown in the example of FIG. 19, this deletion splits one tree into a plurality of subtrees (three in the example of FIG. 19).
Again, for each tree, the MSER with the highest character score is selected and specified as a character area. As in the example of FIG. 20, the MSER 2010 is selected in the left tree and the MSER 2020 is selected in the right tree.
The MSERs corresponding to the parents and children of the MSERs specified as character areas are excluded from the character candidate areas. Specifically, the children of the MSER 2010 (the MSERs with character scores 0.21 and 0.03, two leaves) are deleted, and the parent of the MSER 2020 (the MSER with character score 0.83) and the children of the MSER 2020 (the MSERs with character scores 0.58 and 0.32, two leaves) are deleted, resulting in the state shown in the example of FIG. 21.
The above processing is repeated until every tree consists of a single node.

FIG. 22 is a flowchart showing an example of processing by the present embodiment (inclusion relationship correction module 440).
In step S2202, a tree having a plurality of nodes is selected.
In step S2204, the MSER having the maximum character score is specified as the character region.
In step S2206, the parent and child in the tree structure are deleted.
In step S2208, it is determined whether every tree has exactly one node. If so, the processing ends (step S2299); otherwise, the processing returns to step S2202.
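The loop of steps S2202 to S2208 can be sketched as follows. This is an illustrative reading, not the implementation of the embodiment: the forest is assumed to be given as a list `parent` (index of each node's parent, or None for a root) with a matching list `score`, the function name `select_character_regions` is hypothetical, and the tree and score values below are made up rather than taken from the figures.

```python
def select_character_regions(parent, score):
    """Return the indices of the nodes left once every tree is a single node."""
    parent = list(parent)
    alive = set(range(len(parent)))

    def root_of(n):
        while parent[n] is not None:
            n = parent[n]
        return n

    while True:
        # Group the remaining nodes into trees, keyed by their root.
        trees = {}
        for n in alive:
            trees.setdefault(root_of(n), []).append(n)
        multi = [nodes for nodes in trees.values() if len(nodes) > 1]
        if not multi:                                # S2208: all trees have one node
            return sorted(alive)
        nodes = multi[0]                             # S2202: a tree with several nodes
        best = max(nodes, key=lambda n: score[n])    # S2204: maximum character score
        # S2206: collect every ancestor and every descendant of the chosen node.
        doomed = set()
        a = parent[best]
        while a is not None:
            doomed.add(a)
            a = parent[a]
        stack = [best]
        while stack:
            cur = stack.pop()
            for n in nodes:
                if parent[n] == cur:
                    doomed.add(n)
                    stack.append(n)
        alive -= doomed
        # Nodes whose parent was deleted become the roots of new subtrees.
        for n in alive:
            if parent[n] in doomed:
                parent[n] = None


# Hypothetical tree: 0 is the root; 1 and 5 are its children; 2 is a child of 1;
# 3 and 4 are children of 2; 6 and 7 are children of 5.
parent = [None, 0, 1, 2, 2, 0, 5, 5]
score = [0.26, 0.91, 0.95, 0.11, 0.13, 0.80, 0.21, 0.03]
selected = select_character_regions(parent, score)  # [2, 5]
```

In this toy run node 1 is discarded despite its high score because it encloses the higher-scoring node 2, while node 5 survives as the best node of the subtree that is split off.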

The specific character area information 135 is the set of nodes extracted by the inclusion relationship correction module 440, that is, the MSERs specified as characters. In the example above, this is the state shown in FIG. 21. These are called specific character areas. The specific character area information 135 is expressed in the same format as the character candidate area information 125 or as index information into the character candidate area information 125.

Characters in a scene image may have lost the image characteristics that characters inherently have because of shadows, reflections, and the like, so a high character score is not always obtained from a character area. Consequently, if character/non-character determination is based on the character score alone, character areas with low character scores may go undetected.
The inclusion relationship correction module 440 can also detect a region with a low character score by using the inclusion relationship between regions.
Character regions are specified in order from the highest character score. Since a region having a high character score has high reliability as a character region, the processing is highly reliable as a whole.
A character area whose character score has been lowered by shadows, reflections, and the like belongs to a subtree made up of nodes with low character scores. However, because the region with the highest character score within that subtree is specified as the character area, the result remains reliable.

A hardware configuration example of the image processing apparatus of the present embodiment will be described with reference to FIG. 23. The configuration shown in FIG. 23 is implemented by, for example, a personal computer (PC) and includes a data reading unit 2317 such as a scanner and a data output unit 2318 such as a printer.

A CPU (Central Processing Unit) 2301 is a control unit that executes processing in accordance with a computer program describing the execution sequence of the modules described in the above embodiment, namely the image receiving module 110, the character candidate area detection module 120, the character area specifying module 130, the character recognition module 140, the inclusion relationship construction module 410, the character area feature extraction module 420, the character score calculation module 430, the inclusion relationship correction module 440, and the like.

A ROM (Read Only Memory) 2302 stores programs, calculation parameters, and the like used by the CPU 2301. A RAM (Random Access Memory) 2303 stores programs used in execution by the CPU 2301, parameters that change as appropriate during that execution, and the like. These are connected to one another by a host bus 2304 composed of a CPU bus or the like.

The host bus 2304 is connected via a bridge 2305 to an external bus 2306 such as a PCI (Peripheral Component Interconnect/Interface) bus.

A keyboard 2308 and a pointing device 2309 such as a mouse are input devices operated by an operator. A display 2310 is a liquid crystal display, a CRT (Cathode Ray Tube), or the like, and displays various kinds of information as text or image information.

An HDD (Hard Disk Drive) 2311 contains a hard disk (which may be a flash memory or the like), drives the hard disk, and records or reproduces programs executed by the CPU 2301 and information. The hard disk stores the target image, the scene image information 115, the character candidate area information 125, the specific character area information 135, character recognition results, and the like. It also stores various other data, various computer programs, and the like.

A drive 2312 reads data or programs recorded on a mounted removable recording medium 2313 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and supplies the data or programs to the RAM 2303, which is connected via an interface 2307, the external bus 2306, the bridge 2305, and the host bus 2304. The removable recording medium 2313 can also be used as a data recording area in the same way as the hard disk.

A connection port 2314 is a port for connecting an external connection device 2315 and has a connection unit such as USB or IEEE 1394. The connection port 2314 is connected to the CPU 2301 and other components via the interface 2307, the external bus 2306, the bridge 2305, the host bus 2304, and the like. A communication unit 2316 is connected to a communication line and executes data communication processing with external devices. The data reading unit 2317 is, for example, a scanner and executes document reading processing. The data output unit 2318 is, for example, a printer and executes document data output processing.

The hardware configuration of the image processing apparatus shown in FIG. 23 is only one configuration example. The present embodiment is not limited to the configuration shown in FIG. 23; any configuration capable of executing the modules described in the present embodiment may be used. For example, some modules may be implemented in dedicated hardware (for example, an application specific integrated circuit (ASIC)), some modules may reside in an external system and be connected via a communication line, and a plurality of the systems shown in FIG. 23 may be connected to one another via communication lines and operate in cooperation. The configuration may also be incorporated in, in addition to a personal computer, portable information communication devices (including mobile phones, smartphones, mobile devices, and wearable computers), information appliances, copiers, fax machines, scanners, printers, multifunction devices (image processing apparatuses having two or more of the functions of a scanner, printer, copier, fax machine, and the like), and so on.

The above-described embodiment may be modified as follows.
- A character-specified area display module may be added that displays the image of the region having the maximum character score together with its parents and children.
Furthermore, a character specification result correction module may be added that, in response to an operation by the operator, changes the character specification result to one of the parent or child regions displayed by the character-specified area display module.
This allows the operator to make corrections while limiting the choice to the region having the maximum character score and its parents and children, which reduces the operator's burden.
- An inclusion relationship correction module may be added that, when the maximum character score in a tree structure is less than (or less than or equal to) a predetermined value, excludes all candidate regions in that tree structure.
For example, a region with an extremely low character score is likely to be noise (a non-character area), so this processing can reduce the false detection rate.
- The present embodiment detects character areas, but it may also be applied to, for example, region labeling of images. Region labeling of an image means, for example, assigning labels such as sky, cloud, sea, ground, and tree to each region of a landscape image.
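The thresholding variant in the list above can be sketched as a filter applied to each tree before a character area is selected from it. The function name `drop_low_score_trees`, the representation of a tree as a list of node indices, and the threshold value are assumptions for illustration; the text allows either a strict or a non-strict comparison, and the strict "less than" form is used here.

```python
def drop_low_score_trees(trees, score, threshold):
    """Keep only the trees whose maximum character score reaches the threshold.

    A tree whose best score is below the threshold is treated as noise and all
    of its candidate regions are excluded.
    """
    return [nodes for nodes in trees if max(score[n] for n in nodes) >= threshold]


# Hypothetical data: two trees given as lists of node indices.
trees = [[0, 1, 2], [3, 4]]
score = [0.26, 0.91, 0.13, 0.05, 0.08]
kept = drop_low_score_trees(trees, score, threshold=0.1)  # [[0, 1, 2]]
```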

The program described above may be provided stored on a recording medium, or may be provided via communication means. In that case, for example, the program described above may be regarded as an invention of a "computer-readable recording medium on which a program is recorded."
A "computer-readable recording medium on which a program is recorded" refers to a computer-readable recording medium on which a program is recorded and which is used for installing, executing, or distributing the program.
Examples of the recording medium include: digital versatile discs (DVDs), such as "DVD-R, DVD-RW, DVD-RAM, etc.", which are standards established by the DVD Forum, and "DVD+R, DVD+RW, etc.", which are standards established by DVD+RW; compact discs (CDs), such as read-only memory (CD-ROM), CD recordable (CD-R), and CD rewritable (CD-RW); Blu-ray (registered trademark) Discs; magneto-optical disks (MO); flexible disks (FD); magnetic tape; hard disks; read-only memory (ROM); electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); flash memory; random access memory (RAM); and SD (Secure Digital) memory cards.
The program or a part thereof may be recorded on the recording medium for storage, distribution, and the like. It may also be transmitted by communication, using a transmission medium such as a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, an extranet, or the like, a wireless communication network, or a combination of these, and it may be carried on a carrier wave.
Furthermore, the program may be a part of another program, or may be recorded on a recording medium together with a separate program. It may also be divided and recorded across a plurality of recording media. It may be recorded in any form, such as compressed or encrypted, as long as it can be restored.

DESCRIPTION OF SYMBOLS
100 ... Image processing apparatus
110 ... Image reception module
115 ... Scene image information
120 ... Character candidate area detection module
125 ... Character candidate area information
130 ... Character area identification module
135 ... Specific character area information
140 ... Character recognition module
210 ... Photographing apparatus
230 ... Information processing device
310 ... User terminal
330 ... Information processing device
390 ... Communication line
410 ... Inclusion relation construction module
420 ... Character area feature extraction module
430 ... Character score calculation module
440 ... Inclusion relation correction module

Claims

An image processing apparatus comprising:
detecting means for detecting, from an image, partial images having features of a character as character candidate regions;
extraction means for extracting an inclusion relationship between the partial images; and
specifying means for specifying a partial image in which one character is described, using information indicating the character-likeness of a region in the partial image and the inclusion relationship.

The image processing apparatus according to claim 1, wherein
the extraction means constructs a tree structure indicating the inclusion relationship, and
the specifying means comprises:
second extraction means for extracting features of a region in the partial image from the image and the partial image;
calculating means for calculating information indicating the character-likeness of the region in the partial image from the features extracted by the second extraction means; and
second specifying means for specifying a partial image in which one character is described, using the information indicating the character-likeness calculated by the calculating means and the inclusion relationship in the tree structure.

The image processing apparatus according to claim 2, wherein the second specifying means specifies, as the partial image in which one character is described, the node that, among the nodes from the root to a leaf of the tree structure, has the highest information indicating the character-likeness of the region in the partial image corresponding to each node.

The image processing apparatus according to claim 3, wherein the second specifying means deletes a parent node and a child node of the node specified as the partial image in which one character is described.
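As an illustration only, and not as part of the claims, the selection of claim 3 and the deletion of claim 4 might be sketched in Python as follows. The parent map, the score table, the node names, and both function names are hypothetical. The sketch reads "parent node and child node" literally as the immediate parent and immediate children of the selected node, which is an assumption.

```python
from typing import Dict, List, Optional, Set


def best_on_path(path: List[str], score: Dict[str, float]) -> str:
    """Return the node with the highest character score on a root-to-leaf path."""
    return max(path, key=lambda n: score[n])


def prune_around(selected: str,
                 parent: Dict[str, Optional[str]],
                 nodes: Set[str]) -> Set[str]:
    """Remove the selected node's parent and children from the candidate set."""
    removed = {n for n in nodes if parent[n] == selected}  # immediate children
    if parent[selected] is not None:
        removed.add(parent[selected])  # immediate parent
    return nodes - removed


# Hypothetical chain: "outer" contains "mid", which contains "inner".
parent = {"outer": None, "mid": "outer", "inner": "mid"}
score = {"outer": 0.3, "mid": 0.8, "inner": 0.5}

chosen = best_on_path(["outer", "mid", "inner"], score)
remaining = prune_around(chosen, parent, set(parent))
print(chosen, sorted(remaining))  # prints: mid ['mid']
```

In this example the enclosing region and the enclosed region both overlap the selected one, so removing them leaves a single candidate for the character.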

An image processing program for causing a computer to function as:
detecting means for detecting, from an image, partial images having features of a character as character candidate regions;
extraction means for extracting an inclusion relationship between the partial images; and
specifying means for specifying a partial image in which one character is described, using information indicating the character-likeness of a region in the partial image and the inclusion relationship.