JP5991076B2 - Image processing apparatus and image processing program

Info

Publication number: JP5991076B2
Application number: JP2012185233A
Authority: JP
Inventors: 瑛一田中
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2012-08-24
Filing date: 2012-08-24
Publication date: 2016-09-14
Anticipated expiration: 2032-08-24
Also published as: JP2014044500A

Description

The present invention relates to an image processing apparatus and an image processing program.

Patent Document 1 discloses a character line segmentation method comprising: first storage means that stores multiple lines of binarized character patterns read from a form; second storage means that stores the row-direction projection distribution of black pixels, obtained by scanning the first storage means in the column direction and counting black pixels; third storage means that stores the bottom-edge addresses of characters for at least one line; and control means. The control means determines the top-edge address of a character line from the projection distribution, scans from one side in the column direction over a first row-direction scanning range starting at that top-edge address to detect the first black pixel, and then scans over a second row-direction scanning range that moves according to the black-pixel detection results, detecting the bottom-edge addresses of the characters one after another. It sequentially stores in the third storage means the maximum bottom-edge address found up to each detection, compares the stored contents with the bottom-edge addresses detected by scanning from the other side in the column direction, updates the contents with the smaller value, and segments the character line based on the updated result.

Patent Document 2 aims to improve the recognition accuracy of character recognition by extracting the line to be read and correcting column-direction offset and skew of the read character images. It discloses extracting the characters located at or near the column-direction center of a multi-line document image that was read along a meandering path, and extracting the line to be read based on those characters. When character images within the same read line are offset in the column direction, the offset is determined from the slope between characters and corrected according to that slope. When a read character image is slanted in the column direction, the slant is corrected by processing that minimizes its slope.

The method of Non-Patent Document 1 treats each connected component as one character string element and assumes that the character string direction is horizontal. First, pairs of connected components are formed on the condition that the two components are nearest neighbors and similar in size. Next, tracking proceeds leftward and rightward from each pair. During tracking, two parallel lines are computed from the top and bottom edges of the connected components, and the region between them is used as the prediction region for observation. The parallel lines are fitted by the least squares method to a fixed number of the most recently observed connected components. Non-Patent Document 1 shows segmentation accuracy that is robust to skew and curvature of character strings.

The method of Non-Patent Document 2 also treats each connected component as one character string element. First, the general-figure Voronoi diagram of the image is analyzed to build an adjacency graph between connected components. Next, subgraphs that are highly likely to be part of a character string are detected in the adjacency graph. Connected components are then tracked starting from these subgraphs. During tracking, the best of several adjacent connected components is observed; to decide which is best, the shape of the character string being tracked is estimated and used as a reference. The method of Non-Patent Document 2 shows segmentation accuracy that is robust to variation in the thickness and direction of character strings.

The method of Non-Patent Document 3 uses vertical runs of pixels in a binary image as character string elements and assumes that the character string direction is horizontal. First, the document image is reduced in size. After reduction, character string regions have a higher character-pixel density than other regions. The reduced image is then binarized. Next, pixel runs are tracked from a starting run at the left edge (or right edge) toward the right edge (or left edge). A Kalman filter is used for tracking. The method of Non-Patent Document 3 shows segmentation accuracy that is robust to skew and distortion of character strings.

Patent Document 1: Japanese Patent Laid-Open No. 62-262193
Patent Document 2: Japanese Patent Laid-Open No. 08-044819

Non-Patent Document 1: Daniel M. Oliveira, Rafael D. Lins, Gabriel Torreao, Jian Fan, Marcelo Thielo, "A New Method for Text-Line Segmentation for Warped Documents," in Proc. of Int. Conf. on Image Analysis and Recognition, Povoa de Varzim, Portugal, pp. 398-408, 2010.
Non-Patent Document 2: 岩田基, 黄瀬浩一, 松本啓之亮, "隣接グラフを用いた欧文文書からの文字列抽出" (Character string extraction from European-language documents using an adjacency graph), 情報処理学会論文誌 (IPSJ Journal), Vol. 49, No. 8, pp. 3239-3248, Aug 1999.
Non-Patent Document 3: Aurelie Lemaitre, Jean Camillerapp, "Text Line Extraction in Handwritten Document with Kalman Filter Applied on Low Resolution Image," 2nd Int. Conf. on Document Image Analysis for Libraries, Lyon, France, 2006.

An object of the present invention is to provide an image processing apparatus and an image processing program that prevent generation of an erroneous character string when a character string element is composed of multiple connected components and the region containing the next character string element also contains connected components that do not belong to the character string.

The gist of the present invention for achieving this object lies in the inventions of the following claims.
The invention of claim 1 is an image processing apparatus comprising: detection means for detecting combinations of connected components within a region of a target image; character string element candidate creation means for creating a set of character string element candidates by generating, based on the combinations of connected components detected by the detection means, a plurality of character string element candidates, each being a candidate for a character string element that can constitute a character string and each consisting of a plurality of characters; and character string element selection means for selecting, from the set of character string element candidates created by the character string element candidate creation means, a character string element candidate that follows a previously selected character string element as a character string element.
The invention of claim 2 is the image processing apparatus according to claim 1, wherein the character string element candidate creation means uses, as each connected component, an ellipse inscribed in the circumscribed rectangle of that connected component.
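The ellipse representation named in claim 2 can be illustrated with a minimal sketch. The box layout `(x0, y0, x1, y1)` and the helper names `inscribed_ellipse` and `ellipse_contains` are assumptions made for illustration; the claim only states that an ellipse inscribed in the circumscribed rectangle stands in for the connected component.

```python
def inscribed_ellipse(box):
    """Return (cx, cy, a, b) for the ellipse inscribed in an axis-aligned box.

    box is (x0, y0, x1, y1); this layout is an assumption of the sketch.
    a and b are the horizontal and vertical semi-axes.
    """
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, (x1 - x0) / 2, (y1 - y0) / 2)


def ellipse_contains(ellipse, x, y):
    """Return True if point (x, y) lies inside or on the ellipse."""
    cx, cy, a, b = ellipse
    if a == 0 or b == 0:
        return False  # degenerate box: treat as containing nothing
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0


ellipse = inscribed_ellipse((0, 0, 4, 2))
print(ellipse)                          # (2.0, 1.0, 2.0, 1.0)
print(ellipse_contains(ellipse, 2, 1))  # True: the center
print(ellipse_contains(ellipse, 0, 0))  # False: a box corner lies outside the ellipse
```

The corners of the circumscribed rectangle fall outside the inscribed ellipse, so an ellipse covers less empty area than the rectangle itself.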

The invention of claim 3 is the image processing apparatus according to claim 1 or claim 2, further comprising prediction means for predicting, based on a previously selected character string element, the region in which the character string element following it should lie, wherein the detection means detects combinations of connected components within the region predicted by the prediction means.

The invention of claim 4 is the image processing apparatus according to any one of claims 1 to 3, further comprising connected component selection means for selecting connected components by removing, from the combinations of connected components detected by the detection means, connected components that are unsuitable as character string elements, wherein the character string element candidate creation means creates the set of character string element candidates based on the combinations of connected components selected by the connected component selection means.

The invention of claim 5 is the image processing apparatus according to claim 3, or claim 4 as dependent on claim 3, wherein the prediction means creates a plurality of prediction regions and the detection means does not combine connected components contained in different prediction regions.

The invention of claim 6 is the image processing apparatus according to any one of claims 1 to 5, comprising output means for outputting the character string elements selected by the character string element selection means as a character string, and determination means for determining that processing has ended when a starting character string element, which is a connected component, is no longer detected in the image.

The invention of claim 7 is the image processing apparatus according to claim 6, further comprising character recognition means for performing character recognition on the character string output by the output means.

The invention of claim 8 is the image processing apparatus according to claim 6, further comprising calculation means for calculating a distortion amount based on the character string output by the output means, and correction means for correcting distortion of the image containing the character string based on the distortion amount calculated by the calculation means.

The invention of claim 9 is an image processing program for causing a computer to function as: detection means for detecting combinations of connected components within a region of a target image; character string element candidate creation means for creating a set of character string element candidates by generating, based on the combinations of connected components detected by the detection means, a plurality of character string element candidates, each being a candidate for a character string element that can constitute a character string and each consisting of a plurality of characters; and character string element selection means for selecting, from the set of character string element candidates created by the character string element candidate creation means, a character string element candidate that follows a previously selected character string element as a character string element.

According to the image processing apparatus of claim 1, generation of an erroneous character string can be prevented when a character string element is composed of multiple connected components and the region containing the next character string element also contains connected components that do not belong to the character string.
According to the image processing apparatus of claim 2, an ellipse inscribed in the circumscribed rectangle of a connected component can be used as that connected component.

According to the image processing apparatus of claim 3, the region in which the next character string element should lie can be predicted from a previously selected character string element.

According to the image processing apparatus of claim 4, connected components that are unsuitable as character string elements can be removed.

According to the image processing apparatus of claim 5, connected components contained in different prediction regions are kept from being combined.

According to the image processing apparatus of claim 6, character string elements can be output as a character string.

According to the image processing apparatus of claim 7, character recognition can be performed on the extracted character string.

According to the image processing apparatus of claim 8, distortion of the image containing the extracted character string can be corrected based on the distortion amount of that character string.

According to the image processing program of claim 9, generation of an erroneous character string can be prevented when a character string element is composed of multiple connected components and the region containing the next character string element also contains connected components that do not belong to the character string.

A conceptual module configuration diagram of a configuration example of the first embodiment.
An explanatory diagram showing an example of tracking-based character string segmentation.
An explanatory diagram showing an example in which the prediction region does not contain the intended character string element.
An explanatory diagram showing an example in which an incorrect character string element in the prediction region is selected.
An explanatory diagram showing an example of a prediction region.
An explanatory diagram showing an example of a prediction region.
An explanatory diagram showing a processing example according to the present embodiment.
A flowchart showing a processing example according to the first embodiment.
A conceptual module configuration diagram of a configuration example of the second embodiment.
A flowchart showing a processing example according to the second embodiment.
An explanatory diagram showing a processing example according to the third embodiment.
A conceptual module configuration diagram of a configuration example of the third embodiment.
A flowchart showing a processing example according to the third embodiment.
An explanatory diagram showing a processing example according to the fourth embodiment.
A flowchart showing a processing example according to the fourth embodiment.
A conceptual module configuration diagram of a configuration example of the fifth embodiment.
A conceptual module configuration diagram of a configuration example of the sixth embodiment.
A conceptual module configuration diagram of a configuration example of the seventh embodiment.
A conceptual module configuration diagram of a configuration example of the eighth embodiment.
A block diagram showing a hardware configuration example of a computer that implements the present embodiment.

Before the embodiments are described, the underlying technology is explained. This explanation is intended to make the embodiments easier to understand.
The embodiments described below belong to character string segmentation technology, which detects individual character string regions in a document image (a bitmap image, stroke image, or vector image).
In the embodiments, a character string is a set of character string elements lined up in the character string direction. A character string element is a set of connected components. In a bitmap image, a connected component corresponds to a set of connected character pixels. In a stroke image, it corresponds to a single stroke or a set of mutually overlapping strokes. A character string element is not necessarily a single character.
Character pixels (a pixel block) include at least a pixel region that is contiguous under 4-connectivity or 8-connectivity, and also include a set of such pixel regions. A set of such pixel regions means a plurality of contiguous (for example, 4-connected) pixel regions that lie near one another. "Near" covers, for example, pixel regions that are close to each other in distance, image regions cut out at blank positions after projecting vertically or horizontally so as to cut characters one at a time from a line of text, and image regions cut out at fixed intervals.
One pixel block often forms the image of one character. However, it need not be a pixel region that a human can actually recognize as a character. It may be part of a character or a pixel region that does not form a character; any block of pixels will do.
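To make the 4-connected and 8-connected pixel regions concrete, here is a minimal sketch of connected component labeling on a binary image. The grid representation (a list of rows, with 1 for a character pixel) and the function name `label_components` are assumptions for illustration; the text does not prescribe a labeling algorithm.

```python
from collections import deque


def label_components(grid, connectivity=8):
    """Return the connected components of 1-pixels as sorted lists of (row, col)."""
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited character pixel.
            queue = deque([(r, c)])
            seen.add((r, c))
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr, dc in steps:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            components.append(sorted(pixels))
    return components


# Two diagonal pixels: separate under 4-connectivity, joined under 8-connectivity.
grid = [[1, 0],
        [0, 1]]
print(len(label_components(grid, 4)))  # 2
print(len(label_components(grid, 8)))  # 1
```

The choice of connectivity changes how many connected components a character yields, which is one reason a character string element may consist of several components.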

In particular, the embodiments belong to character string segmentation by tracking.
Tracking repeats the process of observing the next character string element from the character string elements observed so far. At each step, the region in which the next character string element should lie is predicted. Here, "observing" means adding a character string element to the character string being tracked.
FIG. 2 is a schematic diagram for explanation. Solid circles represent character string elements C_k, dashed circles represent prediction regions R_k, and rectangles represent connected components c_{k,i}. Bold rectangles represent connected components contained in the prediction region R_k. Here, C_{k-1} has already been selected as a character string element. From this state, the prediction region R_k, which is the region in which the character string element following C_{k-1} should lie, is examined and the character string element C_k is observed (selected).
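The predict-then-observe cycle of tracking can be sketched as a generic loop. Everything concrete below is a toy assumption for illustration: elements are reduced to x positions, the prediction region is an interval around an assumed fixed pitch, and the names `track`, `predict_interval`, and `observe_in_interval` are hypothetical rather than taken from the embodiments.

```python
def track(start_element, predict_region, observe):
    """Repeat prediction and observation until no next element is found."""
    string = [start_element]
    while True:
        region = predict_region(string)   # where the next element should lie
        nxt = observe(region, string)     # the element found there, or None
        if nxt is None:
            break                         # nothing observed: tracking ends
        string.append(nxt)                # "observing" extends the tracked string
    return string


# Toy data: element positions along a line, with an assumed pitch and tolerance.
POSITIONS = [0, 10, 21, 30, 55]
PITCH, TOLERANCE = 10, 3


def predict_interval(string):
    """Predict an interval centered one pitch beyond the last element."""
    expected = string[-1] + PITCH
    return (expected - TOLERANCE, expected + TOLERANCE)


def observe_in_interval(region, string):
    """Return the first position inside the predicted interval, if any."""
    low, high = region
    for x in POSITIONS:
        if low <= x <= high:
            return x
    return None


print(track(0, predict_interval, observe_in_interval))  # [0, 10, 21, 30]
```

The element at 55 is never reached because no position falls in the interval predicted after 30, which mirrors how tracking stops when a prediction region is empty.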

Character string segmentation is an elemental technology for character recognition and distortion correction of document images.
Because tracking-based segmentation detects each character string individually, it achieves segmentation accuracy that is more robust to variation in character string shape and document layout than other approaches. Here, the shape of a character string refers to its thickness, direction, and curvature.

Prior tracking-based segmentation methods fail when the prediction region does not contain the next character string element. FIG. 3 is a schematic diagram for explanation. Suppose the character string element C_k that follows C_{k-1} is to be observed. A prediction region R_k is created, but R_k does not contain the connected components {c_{k,1}, c_{k,2}} that make up C_k. In this case, character string segmentation fails.

Segmentation also fails when the prediction region contains connected components that do not belong to the next character string element. FIG. 4 is a schematic diagram for explanation. In the example of FIG. 4, R_k contains connected components {c_{k,4}, c_{k,5}, c_{k,6}} that are not elements of C_k. Observing c_{k,6} here is not necessarily an error, because the purpose of segmentation is to detect the character string region correctly, not to detect the boundaries between character string elements correctly. However, tracking predicts, from the character string elements observed so far, the region in which the next element should lie, and observes the character string element contained in that region. Observing a character string element that is not optimal for tracking can therefore reduce the accuracy of subsequent observations and, in turn, the accuracy of character string segmentation.

For this reason, the methods of Non-Patent Document 1, Non-Patent Document 2, Non-Patent Document 3, and Patent Document 2 decide whether to observe a character string element based on its size, angle, and so on, in order to avoid erroneous observations. The method of Non-Patent Document 2 additionally observes the best of several candidate character string elements.
However, a character string element does not necessarily consist of a single connected component, whereas the methods of Non-Patent Document 1, Non-Patent Document 2, Non-Patent Document 3, and Patent Document 2 assume that it does. Because they decide whether to observe on a per-connected-component basis, they fail to segment the character string in the situations of FIG. 3 and FIG. 4.
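The alternative set out in claim 1, building candidates from combinations of connected components and then selecting among them, can be sketched as follows. The box layout `(x0, y0, x1, y1)`, the exhaustive enumeration of subsets, and the cost function in `select_next` are all assumptions chosen for illustration; the source does not specify how candidates are scored.

```python
from itertools import combinations


def union_box(boxes):
    """Bounding box (x0, y0, x1, y1) enclosing all given boxes."""
    x0s, y0s, x1s, y1s = zip(*boxes)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def make_candidates(components):
    """Create one candidate per non-empty combination of connected components."""
    candidates = []
    for size in range(1, len(components) + 1):
        for subset in combinations(components, size):
            candidates.append((subset, union_box(subset)))
    return candidates


def select_next(previous_box, candidates):
    """Pick the candidate whose box best matches the previous element.

    The cost (difference in top edge, bottom edge, and width) is a
    hypothetical stand-in for whatever criterion an implementation uses.
    """
    _, py0, _, py1 = previous_box
    prev_width = previous_box[2] - previous_box[0]

    def cost(candidate):
        _, (x0, y0, x1, y1) = candidate
        return abs(y0 - py0) + abs(y1 - py1) + abs((x1 - x0) - prev_width)

    return min(candidates, key=cost)


previous = (0, 0, 10, 10)      # previously selected character string element
upper = (12, 0, 22, 4)         # upper part of the next element
lower = (12, 6, 22, 10)        # lower part of the next element
noise = (14, 12, 18, 14)       # extra component below the line

candidates = make_candidates([upper, lower, noise])
print(len(candidates))                           # 7 non-empty combinations
best_subset, best_box = select_next(previous, candidates)
print(best_subset == (upper, lower), best_box)   # True (12, 0, 22, 10)
```

Judging each component alone would never produce the two-part element, while enumerating combinations lets the selector keep both parts and drop the extra component. The number of combinations grows exponentially with the number of components, so a real implementation would need to bound it.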

Examples of various preferred embodiments for realizing the present invention are described below with reference to the drawings.
FIG. 1 is a conceptual module configuration diagram of a configuration example of the first embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Modules in the embodiments therefore include not only modules in a computer program but also modules in a hardware configuration. Accordingly, the embodiments also serve as descriptions of computer programs for functioning as those modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to realize each function), and of systems and methods. For convenience of explanation, "store", "cause to store", and equivalent wording are used; when the embodiment is a computer program, these mean storing in a storage device or controlling so that storage in a storage device takes place. Modules may correspond one-to-one to functions, but in implementation one module may be composed of one program, several modules may be composed of one program, or conversely one module may be composed of several programs. Several modules may be executed by one computer, and one module may be executed by several computers in a distributed or parallel environment. One module may contain another module. In the following, "connection" is used for logical connections (exchange of data, instructions, reference relationships between data, and so on) as well as physical connections. "Predetermined" means determined before the process in question. It covers determination before processing according to the embodiment begins and also, even after that processing has begun, determination according to the situation or state at that time or up to that time, provided it precedes the process in question.
When there are several "predetermined values", they may all differ, or two or more of them (including all of them) may be the same. A statement of the form "if A, do B" means "determine whether A holds and, if it is determined that A holds, do B", except where determining whether A holds is unnecessary.
A system or apparatus includes configurations in which multiple computers, hardware units, devices, and the like are connected by communication means such as a network (including one-to-one communication connections), as well as configurations realized by a single computer, hardware unit, or device. "Apparatus" and "system" are used synonymously. Naturally, "system" does not include social "arrangements" (social systems) that are merely human conventions.
For each process performed by a module, or for each process when a module performs several, the target information is read from a storage device and the result is written back to the storage device after processing. Descriptions of reading from the storage device before processing and writing to it after processing may therefore be omitted. The storage device here may include a hard disk, RAM (Random Access Memory), an external storage medium, a storage device accessed over a communication line, a register in a CPU (Central Processing Unit), and the like.

The image processing apparatus according to the first embodiment cuts out a character string. As shown in the example of FIG. 1, it has a prediction module 110, a connected component detection module 120, a character string element candidate creation module 130, a character string element observation module 140, a connected component holding module 150, and a character string element holding module 160.

The prediction module 110 is connected to the connected component detection module 120 and the character string element observation module 140. Based on a previously selected character string element (here, the start character string element 105 or the character string element 145), the prediction module 110 predicts a prediction region 115 in which the character string element following that element should lie. In other words, the prediction module 110 predicts, from the character string elements observed so far, the region where the next character string element should be.
The connected component detection module 120 is connected to the prediction module 110, the character string element candidate creation module 130, and the connected component holding module 150. The connected component detection module 120 detects the combination of connected components (connected component set 125) within the prediction region 115 predicted by the prediction module 110. In other words, it detects the set of connected components contained in the prediction region. When no connected component is observed by the connected component detection module 120, tracking ends (an end signal 195 is output), and the set of character string elements held in the character string element holding module 160 is output as a character string.

The character string element candidate creation module 130 is connected to the connected component detection module 120 and the character string element observation module 140. Based on the combination of connected components detected by the connected component detection module 120, the character string element candidate creation module 130 creates a set of character string element candidates (character string element set 135), that is, candidates for the elements that can make up a character string. In other words, it creates a set of character string element candidates from combinations of the connected components in the set.
The character string element observation module 140 is connected to the character string element candidate creation module 130, the character string element holding module 160, and the prediction module 110. From the set of candidates created by the character string element candidate creation module 130, the character string element observation module 140 selects the character string element (character string element 145) that follows the previously selected character string element. In other words, it selects from the candidate set the character string element that suits the character string.

The connected component holding module 150 is connected to the connected component detection module 120. The connected component holding module 150 stores the connected components in the image in advance. For example, connected components are extracted from the target image using an existing technique, stored in the connected component holding module 150, and supplied to the connected component detection module 120.
The character string element holding module 160 is connected to the character string element observation module 140. The character string element holding module 160 stores the character string elements 145 output by the character string element observation module 140. The character string elements 145 stored in the character string element holding module 160 constitute a character string in the target image.

In the first embodiment, the prediction module 110 may be omitted. In that case, the connected component detection module 120 detects the combination of connected components within a target region of the image; that is, it detects the connected component set 125 based on the start character string element 105 or the character string element 145. The target region may be, for example, the region to the right of the start character string element 105 or the character string element 145 (for horizontal writing from left to right), the region to its left (for horizontal writing from right to left), or the region below it (for vertical writing).

The processing of the prediction module 110, the character string element candidate creation module 130, and the character string element observation module 140 is described in detail below.
The prediction module 110 sets a prediction region. The character string element candidate creation module 130 creates character string element candidates from combinations of connected components. The character string element observation module 140 then selects, from the candidates, the one that is suitable as part of the character string.
Schematic diagrams for explanation are shown in FIGS. 5 to 7.
First, the character string element candidate creation module 130 acquires the set of connected components {c_{k,1}, ..., c_{k,M}} contained in the prediction region R_k. Next, from combinations of the connected components, it creates a set of candidates {C_{k,1}, ..., C_{k,N}} for the subsequent character string element (step 1).
Next, the character string element observation module 140 selects, from the set of candidates, the character string element C_{k,n'} suitable for the character string (step 2).
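Step 1 above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that each connected component is represented by its bounding rectangle `(left, top, right, bottom)`, that every non-empty subset of components forms one candidate, and that a candidate is summarized by the rectangle enclosing its members. The names `union_rect` and `make_candidates` are hypothetical.

```python
from itertools import combinations

def union_rect(rects):
    """Smallest rectangle (left, top, right, bottom) enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def make_candidates(components):
    """Treat every non-empty subset of the components as one candidate (assumption)."""
    candidates = []
    for size in range(1, len(components) + 1):
        for subset in combinations(components, size):
            candidates.append((subset, union_rect(subset)))
    return candidates

# Three connected components found in a prediction region (made-up coordinates).
components = [(0, 0, 4, 10), (5, 0, 9, 10), (12, 1, 18, 9)]
candidates = make_candidates(components)
```

With three components this yields seven candidates, one per non-empty subset; the selection in step 2 then picks one of them.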

The method by which the prediction module 110 creates R_k may be chosen freely according to the purpose. One example is as follows.
Let the circumscribed rectangle of the character string element C_k be r_k = (x_k, y_k, w_k, h_k)^T, where x_k and y_k are the center coordinates of r_k, and w_k and h_k are the size of r_k. From the set of the m most recently observed character string elements {C_{k-1-m}, ..., C_{k-1}}, the region R_k in which the subsequent set of character string elements C_{k-1} should lie is calculated as a rectangle (x_k, y_k, w_k, h_k)^T according to Equations 1, 2, 3, and 4. Here δ_k is the distance between C_k and C_{k-1} and is calculated as in Equation 5, and θ_k is the angle formed by C_k and C_{k-1} and is calculated as in Equation 6. The example shown here uses the average over the set of circumscribed rectangles {r_{k-1-m}, ..., r_{k-1}} of the character string elements. This is illustrated in FIG. 5.
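Equations 1 to 6 are not reproduced in this text, so the following is only one plausible reading of the averaging approach described above, offered as a sketch. It assumes that the average distance and average angle between consecutive centers are used to step from the last observed rectangle, and that the predicted size is the average of the observed sizes. The function name `predict_region` is hypothetical.

```python
import math

def predict_region(rects):
    """Predict the next rectangle (x, y, w, h) from recent ones, oldest first.

    x, y are centre coordinates. Needs at least two rectangles.
    Angles are averaged naively, which assumes they do not wrap around +/- pi.
    """
    pairs = list(zip(rects, rects[1:]))
    # Average distance (delta) and angle (theta) between consecutive centres.
    delta = sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in pairs) / len(pairs)
    theta = sum(math.atan2(b[1] - a[1], b[0] - a[0]) for a, b in pairs) / len(pairs)
    # Average size of the observed rectangles.
    w = sum(r[2] for r in rects) / len(rects)
    h = sum(r[3] for r in rects) / len(rects)
    x_last, y_last = rects[-1][0], rects[-1][1]
    return (x_last + delta * math.cos(theta), y_last + delta * math.sin(theta), w, h)

# Three characters on a horizontal line, 10 units apart (made-up values).
recent = [(0, 0, 8, 10), (10, 0, 8, 10), (20, 0, 10, 10)]
region = predict_region(recent)
```

For this input the predicted center lies one average step to the right of the last character, on the same baseline.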

R_k may also be calculated based not on the circumscribed rectangle of C_k but on the ellipse defined by that circumscribed rectangle. In this case, the distance δ_k between C_k and C_{k-1} may exclude the elliptical regions, as in Equation 7. Radius(r_k, θ_k) is the radius, at angle θ_k, of the ellipse inscribed in the rectangle r_k, and is calculated by Equation 10. The size of C_k may be given by the diameters of the ellipse in the character string direction and in the direction orthogonal to the character string; these are calculated as τ_k and λ_k according to Equations 8 and 9, respectively. Each is calculated from the set of circumscribed rectangles {r_{k-1-m}, ..., r_{k-1}} of the character string elements, using the average in the same manner as in Equations 1 to 5. This is illustrated in FIG. 6.

Instead of the average, the least squares method may be used, or sequential estimation with a Kalman filter may be performed. Furthermore, the size of the region R_k may be enlarged by a certain ratio.
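For the ellipse-based variant, Equations 7 and 10 are not reproduced in this text. The sketch below uses the standard polar-form radius of an axis-aligned ellipse with semi-axes w/2 and h/2, and assumes that the gap between two elements is the center distance minus both radii along the connecting direction. Both function names are hypothetical, and the gap formula is an assumption rather than the patent's Equation 7.

```python
import math

def ellipse_radius(w, h, theta):
    """Radius at angle theta of the ellipse inscribed in a w-by-h rectangle."""
    a, b = w / 2.0, h / 2.0
    return (a * b) / math.hypot(b * math.cos(theta), a * math.sin(theta))

def gap_between(r_prev, r_next):
    """Centre distance minus both ellipse radii along the connecting direction (assumed)."""
    dx, dy = r_next[0] - r_prev[0], r_next[1] - r_prev[1]
    theta = math.atan2(dy, dx)
    return (math.hypot(dx, dy)
            - ellipse_radius(r_prev[2], r_prev[3], theta)
            - ellipse_radius(r_next[2], r_next[3], theta))

# Rectangles are (x, y, w, h) with x, y the centre (made-up values).
gap = gap_between((0, 0, 8, 4), (10, 0, 4, 4))
```

Measuring the gap rather than the center distance keeps δ_k comparable between wide and narrow characters, which is presumably why the text allows the elliptical regions to be excluded.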

The method by which the character string element observation module 140 selects the character string element C_{k,n'} may likewise be chosen freely according to the purpose.
One example is as follows.
Let e_k be the feature of a character string element. Using an evaluation function d that represents the distance between features e_k, the index n' satisfying Equation 11 is selected. Here e^m_{k,n} is the feature of the n-th character string element candidate, and e^p_{k,n} is the feature of the reference character string element. The feature e_k may be, for example, (x_k, y_k, w_k, h_k) or (θ_k, δ_k, τ_k, λ_k). These are calculated according to Equation 1, Equation 2, Equation 3, Equation 4, Equation 5, Equation 6, Equation 2, Equation 3, and Equation 4, respectively. Equation 12 is one possible evaluation function d, where [e]_i denotes the i-th feature of the character string element e and ψ_i is a weight.
The character string element C_{k,n'} suitable for the character string may also be selected by a decision tree.

The maximum total number of character string element candidates for a total of N connected components is 2^N (= M), but the number of candidates may be limited to 2^N or fewer depending on the application.
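Equations 11 and 12 are not reproduced in this text. As a hedged illustration of the selection described above, the sketch below assumes that d is a weighted sum of squared feature differences with weights ψ_i, and that n' is the index of the candidate minimizing d against the reference feature. The function names and the exact form of d are assumptions.

```python
def weighted_distance(e_m, e_p, psi):
    """Assumed form of d: sum over i of psi_i * ([e_m]_i - [e_p]_i)^2."""
    return sum(p * (a - b) ** 2 for a, b, p in zip(e_m, e_p, psi))

def select_candidate(features, reference, psi):
    """Index n' of the candidate whose feature is closest to the reference."""
    return min(range(len(features)),
               key=lambda n: weighted_distance(features[n], reference, psi))

# Features are (x, y, w, h); the reference is the predicted element (made-up values).
features = [(30, 0, 20, 10), (31, 1, 9, 10), (45, 0, 8, 10)]
reference = (30, 0, 8, 10)
n_best = select_candidate(features, reference, psi=(1, 1, 1, 1))
```

Here the second candidate wins: the first is far too wide and the third is too far along the line. Changing the weights `psi` shifts the balance between position and size.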

The present embodiment uses combinations of connected components in tracking. This makes it possible to cope with cases where a character string element does not necessarily consist of a single connected component and where the prediction region does not contain exactly the correct character string element (that is, where there is an excess or a deficiency).
Tracking may also exploit the fact that the distance between character string elements is constant; this distance corresponds to δ_k. In that case, observing character string elements at suitable breaks along the character string direction improves the accuracy of the cutout. The same holds not only for δ_k but also for other features that relate character string elements to one another.

FIG. 8 is a flowchart illustrating a processing example according to the first embodiment.
In step S802, the character string element C_1 is input to the prediction module 110.
In step S804, 1 is assigned to the variable k.
In step S806, k + 1 is assigned to the variable k.
In step S808, the prediction module 110 creates the prediction region R_k.
In step S810, the connected component detection module 120 detects, from the connected component holding module 150, the set S of connected components contained in R_k.
In step S812, the connected component detection module 120 determines whether #{S} > 0. If #{S} > 0, the process proceeds to step S814; otherwise, it proceeds to step S820. Here #{S} is the number of elements in the set S.

In step S814, the character string element candidate creation module 130 creates a set E of candidates for C_k from combinations of the elements of S.
In step S816, the character string element observation module 140 selects the element C_k from E.
In step S818, the character string element holding module 160 holds C_k, and the process returns to step S806.
In step S820, the connected component detection module 120 outputs {C_1, ..., C_k}.
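The loop of FIG. 8 can be sketched as a small driver that takes the four module behaviors as functions. This is an illustrative skeleton under assumptions: the parameter names are hypothetical, and the toy usage models connected components as integer positions on a line rather than image regions.

```python
def track_string(start, predict, detect, make_candidates, select):
    """Follow a character string from a start element until no component is found."""
    elements = [start]                                 # S802, S804: C_1, k = 1
    while True:
        region = predict(elements)                     # S806, S808: next k, region R_k
        found = detect(region)                         # S810: set S inside R_k
        if not found:                                  # S812: #{S} > 0 fails
            return elements                            # S820: output the elements
        candidates = make_candidates(found)            # S814: candidate set E
        elements.append(select(candidates, elements))  # S816, S818: choose and hold C_k

# Toy usage: components are positions on a line; each step looks 5 to 15 units ahead.
components = [10, 20, 30, 55]
result = track_string(
    start=0,
    predict=lambda elems: (elems[-1] + 5, elems[-1] + 15),
    detect=lambda region: [c for c in components if region[0] <= c <= region[1]],
    make_candidates=lambda found: found,
    select=lambda cands, elems: cands[0],
)
```

The component at position 55 is never reached because the region predicted after 30 is empty, which ends the tracking. In this toy the region always moves forward, so the loop is guaranteed to stop; a real predictor would need the same property.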

FIG. 9 is a conceptual module configuration diagram of a configuration example according to the second embodiment.
The image processing apparatus according to the second embodiment selects connected components and cuts out a character string. As shown in the example of FIG. 9, it has a prediction module 110, a connected component detection module 120, a connected component selection module 926, a character string element candidate creation module 130, a character string element observation module 140, a connected component holding module 150, and a character string element holding module 160. The second embodiment is the first embodiment with the connected component selection module 926 added. Parts of the same kind as in the foregoing embodiment are given the same reference numerals, and duplicate description is omitted (the same applies hereinafter).

The prediction module 110 is connected to the connected component detection module 120, the connected component selection module 926, and the character string element observation module 140. The prediction module 110 passes the prediction region 115 to the connected component detection module 120 and the connected component selection module 926.
The connected component detection module 120 is connected to the prediction module 110, the connected component selection module 926, and the connected component holding module 150. The connected component detection module 120 passes the connected component set 125 to the connected component selection module 926.
The connected component selection module 926 is connected to the prediction module 110, the connected component detection module 120, and the character string element candidate creation module 130. The connected component selection module 926 selects connected components by removing, from the combination of connected components detected by the connected component detection module 120 within the prediction region 115 (connected component set 125), those that are inappropriate as character string elements. The result of the selection is a character string element set 928. In other words, the connected component selection module 926 excludes elements that are inappropriate as character string elements from the connected component set 125 detected by the connected component detection module 120. Here, "inappropriate" refers to a connected component that has already been detected as part of another character string, or a connected component whose size differs greatly from that of the character string elements observed so far. Specifically, a statistic (mean, median, mode, or the like) of the sizes of the character string elements observed so far is calculated, and when the difference between that statistic and the size of a character string element is equal to or greater than a predetermined value, that character string element is judged to be inappropriate. This processing corresponds to noise removal and is performed when the cutout accuracy is to be improved.
The character string element candidate creation module 130 is connected to the connected component selection module 926 and the character string element observation module 140. The character string element candidate creation module 130 creates a set of character string element candidates based on the combination of connected components selected by the connected component selection module 926 (character string element set 928).
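The filtering performed by the connected component selection module 926 can be sketched as follows. This is a minimal sketch under assumptions: components are dictionaries with hypothetical `id` and `size` keys, the median is chosen as the statistic, and the size test is applied to each connected component. The threshold `max_diff` stands in for the predetermined value.

```python
from statistics import median

def select_components(components, observed_sizes, used_ids, max_diff):
    """Drop components already used by another string or too different in size."""
    reference = median(observed_sizes)    # statistic of the sizes observed so far
    return [c for c in components
            if c["id"] not in used_ids                  # not part of another string
            and abs(c["size"] - reference) < max_diff]  # removed when difference >= max_diff

components = [
    {"id": 1, "size": 10},
    {"id": 2, "size": 2},    # much smaller than the observed characters, likely noise
    {"id": 3, "size": 11},
    {"id": 4, "size": 10},   # already assigned to another character string
]
kept = select_components(components, observed_sizes=[9, 10, 11], used_ids={4}, max_diff=3)
kept_ids = [c["id"] for c in kept]
```

Components 1 and 3 survive; component 2 fails the size test and component 4 is excluded because it is already in use.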

FIG. 10 is a flowchart illustrating a processing example according to the second embodiment.
In step S1002, the character string element C_1 is input to the prediction module 110.
In step S1004, 1 is assigned to the variable k.
In step S1006, k + 1 is assigned to the variable k.
In step S1008, the prediction module 110 creates the prediction region R_k.
In step S1010, the connected component detection module 120 detects the set S' of connected components contained in R_k.
In step S1012, the connected component selection module 926 removes inappropriate elements from S' to create S.

In step S1014, the connected component selection module 926 determines whether #{S} > 0. If #{S} > 0, the process proceeds to step S1016; otherwise, it proceeds to step S1022.
In step S1016, the character string element candidate creation module 130 creates a set E of candidates for C_k from combinations of the elements of S.
In step S1018, the character string element observation module 140 selects the element C_k from E.
In step S1020, the character string element holding module 160 holds C_k, and the process returns to step S1006.
In step S1022, the connected component detection module 120 outputs {C_1, ..., C_k}.

FIG. 11 is an explanatory diagram illustrating a processing example according to the third embodiment. The third embodiment handles a plurality of prediction regions.
First, the multiple prediction region creation module 1206 creates a plurality of prediction regions. Next, the character string element candidate creation module 130 creates a set of character string element candidates from combinations of the connected components contained in each prediction region. Then, the character string element observation module 140 selects from the candidate set the character string element suitable as part of the character string.

A schematic diagram for explanation is shown in FIG. 11. First, the multiple prediction region creation module 1206 creates a plurality of prediction regions R_{k,1}, R_{k,2}, and R_{k,3}. From R_{k,1}, {c_{k,1}, c_{k,2}, c_{k,3}} is detected. From R_{k,2}, {c_{k,3}} is detected. From R_{k,3}, {c_{k,4}, c_{k,5}} is detected. Because connected components contained in different prediction regions are not combined with one another, the number of candidates is smaller than in the example of FIG. 7.
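The reduction in candidates can be checked with a short count. The sketch assumes that a candidate is any non-empty subset of the components in scope; FIG. 7 is not reproduced here, so the pooled figure is only an illustration of what combining all five components in a single region would give.

```python
def subset_count(n):
    """Number of non-empty subsets of n components."""
    return 2 ** n - 1

# Components detected in each prediction region, as listed in the text above.
regions = {
    "R_k,1": ["c_k,1", "c_k,2", "c_k,3"],
    "R_k,2": ["c_k,3"],
    "R_k,3": ["c_k,4", "c_k,5"],
}

# Candidates when each region is handled separately.
per_region = sum(subset_count(len(comps)) for comps in regions.values())

# Candidates if all distinct components were pooled into one region.
distinct = {c for comps in regions.values() for c in comps}
pooled = subset_count(len(distinct))
```

Handling the regions separately gives 7 + 1 + 3 = 11 candidates, against 31 when the five distinct components are pooled.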

FIG. 12 is a conceptual module configuration diagram of a configuration example according to the third embodiment. As shown in the example of FIG. 12, the third embodiment has a multiple prediction region creation module 1206, a connected component detection module 120, a character string element candidate creation module 130, a character string element observation module 140, a connected component holding module 150, and a character string element holding module 160.
The multiple prediction region creation module 1206 is connected to the connected component detection module 120 and the character string element observation module 140. The multiple prediction region creation module 1206 creates a plurality of prediction regions. For example, it creates a plurality of prediction regions R by varying a parameter (for example, θ) in the method described above with reference to the examples of FIGS. 5 and 6.

The connected component detection module 120 is connected to the multiple prediction region creation module 1206, the character string element candidate creation module 130, and the connected component holding module 150. The connected component detection module 120 does not combine connected components contained in different prediction regions. That is, it detects a combination of connected components for each prediction region created by the multiple prediction region creation module 1206.
The character string element candidate creation module 130 is connected to the connected component detection module 120 and the character string element observation module 140. For each prediction region, the character string element candidate creation module 130 creates a set of character string element candidates from combinations of the connected components in the set.
The character string element observation module 140 is connected to the character string element candidate creation module 130, the character string element holding module 160, and the multiple prediction region creation module 1206. The character string element observation module 140 selects, from the character string element candidates created for the respective prediction regions, the character string element suitable as part of the character string.

FIG. 13 is a flowchart illustrating a processing example according to the third embodiment.
In step S1302, a set of prediction regions A = {R_1, ..., R_{#{A}}} is created.
In step S1304, ∞ (a value representing infinity) is assigned to the variable min.
In step S1306, 1 is assigned to the variable a.
In step S1308, for R_a, the connected component detection module 120 detects connected components, the character string element candidate creation module 130 creates character string element candidates, and the character string element observation module 140 detects the character string element suitable as part of the character string, thereby creating C_a.

In step S1310, it is determined whether min > d(C_a). If min > d(C_a), the process proceeds to step S1312; otherwise, it proceeds to step S1316. The function d(C) is an evaluation function that returns a smaller value the more appropriate the character string element C is. For example, d(C) is implemented by Equation 12.
In step S1312, d(C_a) is assigned to the variable min.
In step S1314, a is assigned to the variable a'.
In step S1316, it is determined whether a < #{A}. If a < #{A}, the process proceeds to step S1318; otherwise, the process ends (step S1399).
In step S1318, a + 1 is assigned to the variable a, and the process returns to step S1308.
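The flow of FIG. 13 amounts to keeping the best-scoring result over all prediction regions. A minimal sketch follows, with the per-region processing of step S1308 abstracted into a hypothetical function `track_one` and the evaluation function passed in as `d`. Indices are 0-based here, unlike the 1-based variable a in the flowchart.

```python
import math

def best_over_regions(regions, track_one, d):
    """Return (a', C_a') for the region whose selected element has the smallest d."""
    best_cost, best_index, best_element = math.inf, None, None    # S1304
    for a, region in enumerate(regions):                          # S1306, S1316, S1318
        element = track_one(region)                               # S1308: create C_a
        cost = d(element)
        if best_cost > cost:                                      # S1310
            best_cost, best_index, best_element = cost, a, element  # S1312, S1314
    return best_index, best_element

# Toy usage: each region yields a character width; widths near 10 are preferred.
widths_by_region = [12, 9, 15]
a_best, c_best = best_over_regions(
    regions=[0, 1, 2],
    track_one=lambda region: widths_by_region[region],
    d=lambda width: abs(width - 10),
)
```

Because the comparison is strict, the first region wins when two regions tie, matching the min > d(C_a) test in step S1310.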

FIG. 14 is an explanatory diagram illustrating a processing example according to the fourth embodiment. The fourth embodiment forms combinations with respect to a variable of interest.
As a modification of the character string element candidate creation module 130 and the character string element observation module 140, only combinations determined by the variable of interest are considered. The set of connected components is sorted by a predetermined variable, and a start position and an end position within the sorted set are selected to delimit the connected components that form a character string element.
FIG. 14 shows a schematic diagram for explanation, and FIG. 15 shows the processing flow of the fourth embodiment.
First, the character string element candidate creation module 130 sorts the connected components contained in the prediction region by the variable of interest. In the example of FIG. 14, they are sorted by position along the direction in which tracking proceeds. An angle, a size, or the like may be used as the variable of interest instead. Next, for the sorted set of connected components, a set of character string element candidates is created from the combinations of a start position i and an end position j. In the example of FIG. 14, a candidate is created from c_{k,2}, c_{k,3}, and c_{k,4}, which lie between the start position i (= 2) and the end position j (= 4).
Then, the character string element observation module 140 selects from the candidates the character string element suitable as part of the character string.
This processing reduces the number of character string element candidates from a maximum of 2^#{S} to _{#{S}}C_2.
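The effect of restricting candidates to contiguous runs of the sorted components can be illustrated with a short enumeration. This is a sketch under the assumption that every pair i ≤ j, including single-component runs, defines one candidate. Counted that way there are n(n+1)/2 runs for n components, which is of the same quadratic order as the binomial count quoted above and far below the exponential subset count.

```python
def contiguous_runs(sorted_components):
    """All runs c_i..c_j (i <= j) of an already sorted component list."""
    n = len(sorted_components)
    return [sorted_components[i:j + 1] for i in range(n) for j in range(i, n)]

# Four components already sorted along the tracking direction.
ordered = ["c_k,1", "c_k,2", "c_k,3", "c_k,4"]
runs = contiguous_runs(ordered)
run_count = len(runs)                     # quadratic in the number of components
subset_total = 2 ** len(ordered) - 1      # non-empty subsets, for comparison
```

The run from position 2 to position 4 in the text corresponds to `["c_k,2", "c_k,3", "c_k,4"]`, which appears once in `runs`.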

図１５は、第４の実施の形態による処理例を示すフローチャートである。
ステップＳ１５０２では、連結成分のセットＳを注目変数についてソートする。
ステップＳ１５０４では、変数ｍｉｎに∞を代入する。
ステップＳ１５０６では、変数ｉに１を代入する。
ステップＳ１５０８では、変数ｊにｉを代入する。
ステップＳ１５１０では、ｍｉｎ＞ｄ（｛ｃ_ｉ，…，ｃ_ｊ｝）であるか否かを判断し、ｍｉｎ＞ｄ（｛ｃ_ｉ，…，ｃ_ｊ｝）である場合はステップＳ１５１２へ進み、それ以外の場合はステップＳ１５１８へ進む。
ステップＳ１５１２では、変数ｍｉｎにｄ（｛ｃ_ｉ，…，ｃ_ｊ｝）を代入する。 FIG. 15 is a flowchart illustrating a processing example according to the fourth exemplary embodiment.
In step S1502, the set S of connected components is sorted for the variable of interest.
In step S1504, ∞ is substituted for the variable min.
In step S1506, 1 is substituted into the variable i.
In step S1508, i is substituted into variable j.
In step _{S1510, min> d ({c,} ..., c j}) determines _{whether, min> d ({c i} , ..., c j}) if a flow proceeds to step S1512, Otherwise, the process proceeds to step S1518.
In step S1512, d ({c _i ,..., C _j }) is substituted for the variable min.

In step S1514, 1 is assigned to the variable i'.
In step S1516, i is assigned to the variable j'.
In step S1518, it is determined whether j < #{S}. If j < #{S}, the process proceeds to step S1520; otherwise, the process proceeds to step S1522.
In step S1520, j + 1 is assigned to the variable j, and the process returns to step S1512.
In step S1522, it is determined whether i < #{S}. If i < #{S}, the process proceeds to step S1524; otherwise, the process proceeds to step S1526.
In step S1524, i + 1 is assigned to the variable i, and the process returns to step S1508.
In step S1526, {c_{i'}, ..., c_{j'}} is output.
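The flowchart of FIG. 15 can be read as a minimum search over all runs c_i .. c_j. The sketch below is one possible reading, under two loudly stated assumptions: that i' and j' are meant to record the i and j of the best run found so far, and that after incrementing j the comparison of step S1510 is evaluated again. The function name `best_run` and the cost function passed as `d` are hypothetical; the text does not define d beyond its use in the comparison.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def best_run(
    sorted_components: Sequence[T], d: Callable[[Sequence[T]], float]
) -> List[T]:
    """Return the run c_i .. c_j that minimizes the cost d."""
    best = math.inf                       # S1504: min <- infinity
    best_i, best_j = 0, 0                 # i', j' (assumed to track the best run)
    n = len(sorted_components)
    for i in range(n):                    # S1506, S1522, S1524
        for j in range(i, n):             # S1508, S1518, S1520
            cost = d(sorted_components[i:j + 1])
            if best > cost:               # S1510
                best = cost               # S1512
                best_i, best_j = i, j     # S1514, S1516 (assumed meaning)
    return list(sorted_components[best_i:best_j + 1])   # S1526


# Hypothetical cost: components are widths, and a run is good when its
# total width is close to an expected element width of 8.
widths = [4, 3, 5, 9]
chosen = best_run(widths, lambda run: abs(sum(run) - 8))
```

Indices here are zero-based, whereas the flowchart counts from 1; the set of runs visited is the same.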

FIG. 16 is a conceptual module configuration diagram of a configuration example according to the fifth embodiment. The fifth embodiment cuts out a plurality of character strings and, as shown in the example of FIG. 16, has an end determination module 1610, a character string tracking module 1620, a character string output module 1630, and a connected component holding module 1640.
The end determination module 1610 is connected to the character string tracking module 1620, the character string output module 1630, and the connected component holding module 1640. The end determination module 1610 determines that processing has ended when a character string element for starting, which is a connected component, can no longer be detected in the image. Specifically, the end determination module 1610 creates a start character string element 1615 from the connected component holding module 1640 and passes it to the character string tracking module 1620. When the connected component holding module 1640 contains no connected component from which a start character string element can be created, the processing ends.

The character string tracking module 1620 is connected to the end determination module 1610, the character string output module 1630, and the connected component holding module 1640. The character string tracking module 1620 is any of the above-described embodiments or a combination thereof.
The character string output module 1630 is connected to the end determination module 1610, the character string tracking module 1620, and the connected component holding module 1640. The character string output module 1630 outputs, as a character string, the character string elements selected by the character string element observation module 140 (the character string elements 145 stored in the character string element holding module 160). That is, the character string output module 1630 outputs the set of character string elements held by the character string tracking module 1620 as a character string.
The connected component holding module 1640 is connected to the end determination module 1610, the character string tracking module 1620, and the character string output module 1630. The connected component holding module 1640 corresponds to the aforementioned connected component holding module 150. The character string tracking module 1620 updates the information in the connected component holding module 1640 so that the connected components included in a detected character string are marked as already detected.
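The interaction among the four modules of FIG. 16 amounts to a loop: take a start element from the components not yet marked, track a string from it, mark its components as detected, output the string, and stop when no start element remains. The sketch below illustrates that control flow only. The names `extract_strings` and `toy_track`, the integer component IDs, and the rule for choosing the start element (the first unmarked component) are assumptions for the example; the actual tracking is whatever the earlier embodiments provide.

```python
from typing import Callable, List, Set


def extract_strings(
    components: List[int],
    track: Callable[[int, Set[int]], List[int]],
) -> List[List[int]]:
    """Repeat tracking until no unmarked start component remains."""
    detected: Set[int] = set()            # "already detected" marks
    strings: List[List[int]] = []
    while True:
        remaining = [c for c in components if c not in detected]
        if not remaining:                 # end determination
            break
        start = remaining[0]              # start character string element
        string = track(start, detected)   # character string tracking
        detected.update(string)           # update the holding module
        detected.add(start)               # guarantees the loop terminates
        strings.append(string)            # character string output
    return strings


# Hypothetical tracker: a "string" is a run of consecutive component IDs.
pool = [1, 2, 3, 10, 11, 20]


def toy_track(start: int, detected: Set[int]) -> List[int]:
    run = [start]
    while run[-1] + 1 in pool and run[-1] + 1 not in detected:
        run.append(run[-1] + 1)
    return run


result = extract_strings(pool, toy_track)
```

Marking components as detected is what prevents the same component from seeding, or being absorbed into, a second string.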

FIG. 17 is a conceptual module configuration diagram of a configuration example according to the sixth embodiment. The sixth embodiment detects character strings in an image.
The sixth embodiment uses the above-described embodiments to detect character strings in a document image and, as shown in the example of FIG. 17, has a connected component creation module 1710, a character string cutout module 1720, and a connected component holding module 1730.
The connected component creation module 1710 is connected to the character string cutout module 1720 and the connected component holding module 1730. The connected component creation module 1710 creates connected components 1717 from the received document image 1705. When all connected components in the document image 1705 have been detected, it outputs a start signal 1715.
The character string cutout module 1720 is connected to the connected component creation module 1710 and the connected component holding module 1730. The character string cutout module 1720 corresponds to the embodiment shown in the example of FIG. 16. For the purpose of explanation, the connected component holding module 1730 is drawn separately.
The connected component holding module 1730 is connected to the connected component creation module 1710 and the character string cutout module 1720.
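The text does not say how the connected component creation module 1710 forms connected components. One common approach, shown here purely as an assumed illustration, is breadth-first labeling of a binarized image with 4-connectivity. The function name `connected_components`, the choice of 4-connectivity, and the convention that 1 means a black pixel are all assumptions.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def connected_components(image: List[List[int]]) -> List[List[Pixel]]:
    """Group black pixels (value 1) into 4-connected components."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    components: List[List[Pixel]] = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search from an unvisited black pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels: List[Pixel] = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


sample = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
found = connected_components(sample)
```

Once every pixel has been visited, all components are known, which corresponds to the point at which the start signal 1715 would be issued.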

FIG. 18 is a conceptual module configuration diagram of a configuration example according to the seventh embodiment. The seventh embodiment is a character recognition device; that is, it takes the output of the above-described embodiments as the target of character recognition.
As shown in the example of FIG. 18, the seventh embodiment has a character string processing device 1810 and a character string recognition module 1820.
The character string processing device 1810 is connected to the character string recognition module 1820. The character string processing device 1810 corresponds to the embodiment shown in the example of FIG. 17.
The character string recognition module 1820 is connected to the character string processing device 1810. The character string recognition module 1820 performs character recognition on the character string 1815 (the character string 1725 in FIG. 17) output by the character string processing device 1810, and outputs a character recognition result 1825. The character recognition result 1825 includes the position and text code of each individual character in the character string.

FIG. 19 is a conceptual module configuration diagram of a configuration example according to the eighth embodiment. The eighth embodiment is a document distortion estimation and correction device. That is, it uses the above-described embodiments to correct distortion of a document image: it receives the output (character strings) of the above-described embodiments and corrects the distortion of the document image so that the character strings become horizontal or vertical.
As shown in the example of FIG. 19, the eighth embodiment has a character string processing device 1910, a distortion amount estimation module 1920, and a distortion correction module 1930.
The character string processing device 1910 is connected to the distortion amount estimation module 1920 and the distortion correction module 1930. The character string processing device 1910 corresponds to the embodiment shown in the example of FIG. 17.

The distortion amount estimation module 1920 is connected to the character string processing device 1910 and the distortion correction module 1930. The distortion amount estimation module 1920 calculates a distortion amount 1925 based on the character string output by the character string processing device 1910 (the character string 1725 output by the character string cutout module 1720). That is, the distortion amount estimation module 1920 takes the received character string 1915 and estimates the distortion amount 1925 of the character string that is needed for correction. For example, it extracts the center line of each character string and detects its inclination.
The distortion correction module 1930 is connected to the character string processing device 1910 and the distortion amount estimation module 1920. The distortion correction module 1930 corrects the distortion of the document image 1905 containing the character strings, based on the distortion amount 1925 calculated by the distortion amount estimation module 1920. Because the distortion of a plurality of character strings is taken into account, it starts processing only after receiving an end signal 1917 indicating that the character string processing device 1910 has detected all character strings. Based on the estimated distortion amount 1925, it corrects the distortion of the document image 1905 and outputs a distortion-corrected document image 1935. Specifically, the correction is an affine transformation, such as rotating the image in the direction opposite to the detected inclination.
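As a rough illustration of the example given above (extract a center line per character string, detect its inclination, then apply an affine correction in the opposite direction), the sketch below fits a least-squares line through the centers of each string's elements, averages the angles, and rotates by the negative of that angle. Several things are assumptions rather than statements from the text: the use of element centers as the center line, the least-squares fit, simple averaging across strings, roughly horizontal strings, and the names `center_line_angle` and `rotate`.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def center_line_angle(centers: List[Point]) -> float:
    """Angle (radians) of the least-squares line through element centers."""
    n = len(centers)
    mean_x = sum(x for x, _ in centers) / n
    mean_y = sum(y for _, y in centers) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in centers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in centers)
    return math.atan2(sxy, sxx)


def rotate(point: Point, angle: float) -> Point:
    """Rotate a point about the origin (a special case of an affine map)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


# Two hypothetical strings, both tilted with slope 0.1.
strings = [
    [(0.0, 0.0), (10.0, 1.0), (20.0, 2.0)],
    [(0.0, 30.0), (10.0, 31.0), (20.0, 32.0)],
]
skew = sum(center_line_angle(s) for s in strings) / len(strings)
corrected = [[rotate(p, -skew) for p in s] for s in strings]
```

Estimating the angle over all strings before correcting anything mirrors the reason the distortion correction module waits for the end signal 1917.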

A hardware configuration example of the image processing apparatus according to the present embodiment will be described with reference to FIG. 20. The configuration shown in FIG. 20 is implemented by, for example, a personal computer (PC), and is a hardware configuration example that includes a data reading unit 2017 such as a scanner and a data output unit 2018 such as a printer.

A CPU (Central Processing Unit) 2001 is a control unit that executes processing according to a computer program describing the execution sequence of the modules described in the above embodiments, namely the prediction module 110, the connected component detection module 120, the character string element candidate creation module 130, the character string element observation module 140, the connected component selection module 926, the multiple prediction region creation module 1206, the end determination module 1610, the character string output module 1630, the connected component creation module 1710, the character string recognition module 1820, the distortion amount estimation module 1920, the distortion correction module 1930, and so on.

A ROM (Read Only Memory) 2002 stores programs, calculation parameters, and the like used by the CPU 2001. A RAM (Random Access Memory) 2003 stores programs used in execution by the CPU 2001 and parameters that change as appropriate during that execution. These are interconnected by a host bus 2004 composed of a CPU bus or the like.

The host bus 2004 is connected via a bridge 2005 to an external bus 2006 such as a PCI (Peripheral Component Interconnect/Interface) bus.

A keyboard 2008 and a pointing device 2009 such as a mouse are input devices operated by an operator. A display 2010 is, for example, a liquid crystal display device or a CRT (Cathode Ray Tube), and displays various types of information as text or image information.

An HDD (Hard Disk Drive) 2011 contains a hard disk, drives it, and records or reproduces programs executed by the CPU 2001 and other information. The hard disk stores the target image, connected components, character string element candidates, character string elements, and the like. It also stores various computer programs, such as other data processing programs.

A drive 2012 reads data or a program recorded on a mounted removable recording medium 2013, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and supplies the data or program to the RAM 2003, which is connected via an interface 2007, the external bus 2006, the bridge 2005, and the host bus 2004. The removable recording medium 2013 can also be used as a data recording area in the same way as the hard disk.

A connection port 2014 is a port for connecting an external connection device 2015 and has a connection unit such as USB or IEEE 1394. The connection port 2014 is connected to the CPU 2001 and other components via the interface 2007, the external bus 2006, the bridge 2005, the host bus 2004, and the like. A communication unit 2016 is connected to a communication line and executes data communication with external devices. The data reading unit 2017 is, for example, a scanner, and executes document reading processing. The data output unit 2018 is, for example, a printer, and executes document data output processing.

Note that the hardware configuration of the image processing apparatus shown in FIG. 20 is only one configuration example. The present embodiment is not limited to the configuration shown in FIG. 20, and any configuration capable of executing the modules described in the present embodiment may be used. For example, some modules may be implemented in dedicated hardware (such as an Application Specific Integrated Circuit (ASIC)), some modules may reside in an external system connected via a communication line, and a plurality of the systems shown in FIG. 20 may be connected to one another via communication lines so as to operate cooperatively. The configuration may also be incorporated into a copying machine, a fax machine, a scanner, a printer, or a multifunction machine (an image processing apparatus having two or more of the functions of a scanner, a printer, a copying machine, a fax machine, and the like).

Note that the above-described embodiments may be combined. For example, the second embodiment and the third embodiment may be combined.

The program described above may be provided stored on a recording medium, or may be provided by communication means. In that case, for example, the program described above may be regarded as an invention of a "computer-readable recording medium on which a program is recorded".
The "computer-readable recording medium on which a program is recorded" refers to a computer-readable recording medium on which a program is recorded and which is used for installing, executing, or distributing the program.
Recording media include, for example: digital versatile discs (DVD), namely "DVD-R, DVD-RW, DVD-RAM, etc.", which are standards established by the DVD Forum, and "DVD+R, DVD+RW, etc.", which are standards established by DVD+RW; compact discs (CD), such as read-only memory (CD-ROM), CD recordable (CD-R), and CD rewritable (CD-RW); Blu-ray Disc (registered trademark); magneto-optical disks (MO); flexible disks (FD); magnetic tape; hard disks; read-only memory (ROM); electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); flash memory; random access memory (RAM); and SD (Secure Digital) memory cards.
The program or a part of it may be recorded on such a recording medium for storage, distribution, or the like. It may also be transmitted by communication, using a transmission medium such as a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, or an extranet, a wireless communication network, or a combination of these, or it may be carried on a carrier wave.
Furthermore, the program may be a part of another program, or may be recorded on a recording medium together with a separate program. It may also be divided and recorded on a plurality of recording media. It may be recorded in any form, such as compressed or encrypted, as long as it can be restored.

DESCRIPTION OF SYMBOLS
110 ... Prediction module
120 ... Connected component detection module
130 ... Character string element candidate creation module
140 ... Character string element observation module
150 ... Connected component holding module
160 ... Character string element holding module
926 ... Connected component selection module
1206 ... Multiple prediction region creation module
1610 ... End determination module
1620 ... Character string tracking module
1630 ... Character string output module
1640 ... Connected component holding module
1710 ... Connected component creation module
1720 ... Character string cutout module
1730 ... Connected component holding module
1820 ... Character string recognition module
1920 ... Distortion amount estimation module
1930 ... Distortion correction module

Claims

An image processing apparatus comprising:
detecting means for detecting a combination of connected components in a region of a target image;
character string element candidate creating means for creating a set of character string element candidates by generating, based on the combination of connected components detected by the detecting means, a plurality of character string element candidates, each being a candidate for a character string element that can form a character string; and
character string element selecting means for selecting, from the set of character string element candidates created by the character string element candidate creating means, a character string element candidate that follows a previously selected character string element, as a character string element.

The image processing apparatus according to claim 1, wherein the character string element candidate creating means uses, as the connected component, an ellipse inscribed in a circumscribed rectangle of the connected component.

The image processing apparatus according to claim 1 or claim 2, further comprising predicting means for predicting, based on a previously selected character string element, a region in which the character string element following that character string element should be located,
wherein the detecting means detects the combination of connected components within the region predicted by the predicting means.

The image processing apparatus according to any one of claims 1 to 3, further comprising connected component selecting means for selecting connected components by removing, from the combination of connected components detected by the detecting means, any connected component that is inappropriate as a character string element,
wherein the character string element candidate creating means creates the set of character string element candidates based on the combination of connected components selected by the connected component selecting means.

The image processing apparatus according to claim 3, or claim 4 as dependent on claim 3, wherein the predicting means creates a plurality of prediction regions, and
the detecting means combines connected components contained in different prediction regions.

The image processing apparatus according to any one of claims 1 to 5, further comprising:
output means for outputting, as a character string, the character string elements selected by the character string element selecting means; and
determining means for determining, as the end, that a character string element for starting, which is a connected component, is no longer detected in the image.

The image processing apparatus according to claim 6, further comprising character recognition means for performing character recognition on the character string output by the output means.

The image processing apparatus according to claim 6, further comprising:
calculating means for calculating a distortion amount based on the character string output by the output means; and
correcting means for correcting distortion of the image containing the character string based on the distortion amount calculated by the calculating means.

An image processing program for causing a computer to function as:
detecting means for detecting a combination of connected components in a region of a target image;
character string element candidate creating means for creating a set of character string element candidates by generating, based on the combination of connected components detected by the detecting means, a plurality of character string element candidates, each being a candidate for a character string element that can form a character string; and
character string element selecting means for selecting, from the set of character string element candidates created by the character string element candidate creating means, a character string element candidate that follows a previously selected character string element, as a character string element.