JP2015184691A - Image processor and image processing program

Info

Publication number: JP2015184691A
Application number: JP2014057611A
Authority: JP
Inventors: Fujio Ihara (井原 富士夫)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2014-03-20
Filing date: 2014-03-20
Publication date: 2015-10-22
Anticipated expiration: 2034-03-20
Also published as: JP6303671B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor that, when extracting a character string from an image, can extract the character string even from an image having a background. SOLUTION: In the image processor, a plurality of binarization means binarize an image; selection means selects a pixel block on the basis of the possibility that a pixel block in each binarized image produced by the binarization means is a character; extraction means searches for adjacent characters starting from the pixel block selected by the selection means and extracts a character string; and combination means combines the character strings extracted by the extraction means.

Description

The present invention relates to an image processing apparatus and an image processing program.

Patent Document 1 addresses the problem of providing a character recognition method and apparatus that can be applied generally to character recognition in the field, such as outdoors or inside a factory. It discloses acquiring a plurality of character patterns by binarizing a captured image of the characters to be recognized with a plurality of thresholds, cutting out character regions by projecting and summing these character patterns in the X and Y directions, inputting the resulting characters to a neural network, and outputting, as the final recognition result, the candidate whose output value is the largest among those exceeding a set reference value.

Patent Document 2 addresses the problem of achieving character recognition that is robust to unknown lighting conditions. It discloses a plurality of different binarization means, means for integrating the character pattern candidates obtained from them, and analysis of the character pattern candidates to recognize them as a character string.

Patent Document 1: JP 2003-016386 A
Patent Document 2: JP 2006-059124 A

An object of the present invention is to provide an image processing apparatus and an image processing program that, when extracting a character string from an image, can extract the character string even from an image that has a background.

The gist of the present invention for achieving this object lies in the inventions of the following items.
The invention of claim 1 is an image processing apparatus comprising: a plurality of binarization means for binarizing an image; selection means for selecting a pixel block on the basis of the possibility that a pixel block in each binarized image produced by the binarization means is a character; extraction means for searching for adjacent characters starting from the pixel block selected by the selection means and extracting a character string; and combination means for combining the character strings extracted by the extraction means.

The invention of claim 2 is the image processing apparatus according to claim 1, wherein the selection means selects a pixel block for which the certainty factor of the character obtained by character recognition of each binarized image is greater than (or greater than or equal to) a predetermined first threshold and the complexity of that character is greater than (or greater than or equal to) a second threshold.

The invention of claim 3 is the image processing apparatus according to claim 2, wherein the complexity is the number of alternations within the circumscribed rectangle of the pixel block, the number of strokes of the character obtained as the character recognition result, or a combination of these.

The invention of claim 4 is the image processing apparatus according to claim 1, wherein the selection means selects a pixel block for which the certainty factor of the character obtained by character recognition of each binarized image is greater than (or greater than or equal to) a predetermined first threshold and the character is not a predetermined character.

The invention of claim 5 is the image processing apparatus according to any one of claims 1 to 4, wherein the extraction means sets a rectangle whose size depends on the size of the circumscribed rectangle of a first pixel block selected by the selection means, performs character recognition on the contents of that rectangle in each binarized image around the circumscribed rectangle, takes a combination of pixel blocks whose recognition certainty factor is greater than (or greater than or equal to) a third threshold as a second character adjacent to the first character, and then sets the second character as the first character, thereby extracting a character string.

The invention of claim 6 is the image processing apparatus according to claim 5, wherein the extraction means performs the character recognition after excluding combinations of pixel blocks that match a predetermined condition.

The invention of claim 7 is the image processing apparatus according to claim 6, wherein the predetermined condition is that the combination includes pixel blocks of different colors, that the combination includes different stroke widths, that the combination includes a stroke width different from the stroke width of the character in the first pixel block, or a combination of these.

The invention of claim 8 is the image processing apparatus according to any one of claims 1 to 7, wherein the combination means generates a character string image by logical-OR combination of the character string images in the binarized images, or counts occurrences of each character code for each character corresponding to the character string images in the binarized images and generates a character string image on the basis of the counting result.

The invention of claim 9 is the image processing apparatus according to any one of claims 1 to 8, wherein the binarization means perform binarization with a plurality of different thresholds, binarization with a plurality of different thresholds after inverting the image, or a combination of these.

The invention of claim 10 is an image processing program for causing a computer to function as: a plurality of binarization means for binarizing an image; selection means for selecting a pixel block on the basis of the possibility that a pixel block in each binarized image produced by the binarization means is a character; extraction means for searching for adjacent characters starting from the pixel block selected by the selection means and extracting a character string; and combination means for combining the character strings extracted by the extraction means.

According to the image processing apparatus of claim 1, when a character string is extracted from an image, the character string can be extracted even from an image that has a background.

According to the image processing apparatus of claim 2, a pixel block that is a character can be selected using the certainty factor of the recognized character and the complexity of the character.

According to the image processing apparatus of claim 3, the number of alternations or the number of strokes can be used as the complexity.

According to the image processing apparatus of claim 4, a pixel block that is a character can be selected using the certainty factor of the recognized character and predetermined characters.

According to the image processing apparatus of claim 5, characters around the circumscribed rectangle of the first pixel block can be extracted.

According to the image processing apparatus of claim 6, character recognition can be performed after excluding combinations of pixel blocks that match a predetermined condition.

According to the image processing apparatus of claim 7, the predetermined condition can be that the combination includes pixel blocks of different colors, that it includes different stroke widths, or that it includes a stroke width different from the stroke width of the character in the first pixel block.

According to the image processing apparatus of claim 8, a character string image can be generated by logical-OR combination or on the basis of the result of counting character codes.

According to the image processing apparatus of claim 9, binarization can be performed using a plurality of different thresholds.

According to the image processing program of claim 10, when a character string is extracted from an image, the character string can be extracted even from an image that has a background.

FIG. 1 is a conceptual module configuration diagram of a configuration example of the present embodiment.
FIG. 2 is a flowchart showing a processing example according to the present embodiment.
FIG. 3 is a conceptual module configuration diagram of a configuration example inside the starting character search module of the present embodiment.
FIG. 4 is a flowchart showing a processing example according to the present embodiment.
FIG. 5 is a conceptual module configuration diagram of a configuration example inside the character string search module of the present embodiment.
FIG. 6 is a flowchart showing a processing example according to the present embodiment.
FIG. 7 is a conceptual module configuration diagram of a configuration example inside the character string search module of the present embodiment.
FIG. 8 is a flowchart showing a processing example according to the present embodiment.
FIGS. 9 to 16 are explanatory diagrams each showing a processing example according to the present embodiment.
FIG. 17 is a block diagram showing a hardware configuration example of a computer that implements the present embodiment.

Hereinafter, an example of a preferred embodiment for realizing the present invention will be described with reference to the drawings.
FIG. 1 shows a conceptual module configuration diagram of a configuration example of the present embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, a module in the present embodiment refers not only to a module in a computer program but also to a module in a hardware configuration. The present embodiment therefore also serves as a description of a computer program for causing a computer to function as those modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to realize each function), and of a system and a method. For convenience of explanation, the words “store”, “cause to store”, and their equivalents are used; when the embodiment is a computer program, these words mean storing in a storage device or performing control so as to store in a storage device. Modules may correspond one-to-one to functions, but in an implementation one module may consist of one program, a plurality of modules may consist of one program, or conversely one module may consist of a plurality of programs. A plurality of modules may be executed by one computer, or one module may be executed by a plurality of computers in a distributed or parallel environment. One module may include another module. Hereinafter, “connection” is used for logical connections (exchange of data, instructions, reference relationships between data, and so on) as well as physical connections. “Predetermined” means determined before the processing in question; it covers not only determination before the processing of the present embodiment starts but also, even after that processing has started, determination according to the situation or state at that time, or up to that time, as long as it precedes the processing in question. When there are a plurality of “predetermined values”, they may all differ, or two or more of them (including, of course, all of them) may be the same. A statement meaning “if A, do B” is used to mean “determine whether A holds, and do B if it is determined that A holds”, except where determining whether A holds is unnecessary.
A system or apparatus includes not only a configuration in which a plurality of computers, pieces of hardware, devices, and the like are connected by communication means such as a network (including one-to-one communication connections), but also a configuration realized by a single computer, piece of hardware, device, or the like. “Apparatus” and “system” are used as synonyms. Of course, “system” does not include what is merely a social “mechanism” (a social system), which is an artificial arrangement.
For each process performed by a module, or for each process when a module performs a plurality of processes, the target information is read from a storage device, the process is performed, and the result is written to the storage device. Descriptions of reading from the storage device before processing and writing to it after processing may therefore be omitted. The storage device here may include a hard disk, RAM (Random Access Memory), an external storage medium, a storage device accessed over a communication line, a register in a CPU (Central Processing Unit), and the like.

The image processing apparatus 100 according to the present embodiment extracts a character string from an image. As shown in the example of FIG. 1, it includes an image reception module 110; a binarization (1) module 120A, a binarization (2) module 120B, ..., and a binarization (N) module 120Z; starting character search modules 130A, 130B, ..., 130Z; character string search modules 140A, 140B, ..., 140Z; and a character string composition module 150.

The image reception module 110 is connected to the binarization (1) module 120A, the binarization (2) module 120B, the binarization (N) module 120Z, and the character string search module 140Z. The image reception module 110 receives a target image. Receiving an image includes, for example, capturing an image with a camera or the like, extracting an image from a moving image, reading an image with a scanner or the like, receiving an image from an external device over a communication line by fax or the like, and reading an image stored on a hard disk (including one built into the computer as well as one connected via a network) or the like. The image is a multivalued image (including a color image). One image or a plurality of images may be received. The image content includes character strings such as designed character strings in advertising pamphlets, character strings written on signboards in landscape images, and captions (telops) in moving images. The character string may have a background. The position, size, color, and so on of the characters are unknown, and the characters need not be arranged in a straight line. Specifically, this corresponds to landscape images taken with a camera and to images extracted from moving images of a classroom lecture, moving images recorded by a car's drive recorder, and the like (also called scene images).

There are a plurality of binarization modules 120, starting character search modules 130, and character string search modules 140. Because the modules of each kind perform equivalent processing, the binarization (1) module 120A, the starting character search module 130A, and the character string search module 140A are described as representatives.
The binarization (1) module 120A is connected to the image reception module 110 and the starting character search module 130A, and passes the binarized image layer (1) 122A to the starting character search module 130A. The binarization (1) module 120A binarizes the image received by the image reception module 110. Each binarization module 120 performs a different binarization process. The “different binarization processes” may, for example, differ in the method itself: the binarization (1) module 120A may use the p-tile method, the binarization (2) module 120B the mode method, the binarization (N) module 120Z the discriminant analysis method, and so on. It suffices that each outputs a different binarized image. The “different binarization processes” may also be binarization with a plurality of different thresholds, binarization with a plurality of different thresholds after inverting the target image, or a combination of these. The target image is inverted in order to handle white (reversed-out) characters and the like.
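The multi-threshold variant with inversion can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names `binarize`, `invert`, and `binarized_layers`, the 8-bit grayscale representation, and the convention that dark pixels become foreground are all assumptions made for the example.

```python
from typing import List

Gray = List[List[int]]    # assumed input: 8-bit grayscale image, values 0..255
Binary = List[List[int]]  # binary layer: 1 = foreground, 0 = background


def binarize(image: Gray, threshold: int) -> Binary:
    """Mark pixels darker than the threshold as foreground (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def invert(image: Gray) -> Gray:
    """Invert an 8-bit image so that light characters become dark."""
    return [[255 - px for px in row] for row in image]


def binarized_layers(image: Gray, thresholds: List[int]) -> List[Binary]:
    """Return one layer per threshold for the image, then one per threshold
    for the inverted image (hypothetical counterpart of layers 122A..122Z)."""
    layers = [binarize(image, t) for t in thresholds]
    inverted = invert(image)
    layers += [binarize(inverted, t) for t in thresholds]
    return layers
```

With two thresholds this yields four layers; dark-on-light text tends to appear in the first two and light-on-dark text in the last two.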

The binarized image 900 shown in the example of FIG. 9 is the result of one of the binarization modules 120 binarizing a scene image that contains a signboard bearing the characters “なかた商店” (Nakata Shoten). In addition to the image of the characters “なかた商店”, the binarized image 900 may contain noise (non-character) 910, noise (non-character) 920, and noise (non-character) 930 near the character “た” (ta). In conventional techniques, cutting out the image of a single character is affected by such noise, so that, as indicated by the connection direction 1010 in the example of FIG. 10, the character image may end up including noise (non-character) 910, noise (non-character) 920, and so on in addition to the image of “た”.

The starting character search module 130A is connected to the binarization (1) module 120A, the character string search module 140A, the character string search module 140B, and the character string search module 140Z. It selects a pixel block on the basis of the possibility that a pixel block in the binarized image layer (1) 122A produced by the binarization (1) module 120A is a character. The pixel block selected here is one that is highly likely to be a character. A pixel block includes at least a region of pixels that are contiguous under 4-connectivity or 8-connectivity, and also includes a set of such pixel regions. A set of such pixel regions means a plurality of regions, each contiguous under 4-connectivity or the like, that lie near one another; “near” means, for example, that the regions are close in distance (within a predetermined distance of one another). One pixel block often corresponds to the image of one character, but it need not be a pixel region that a human would actually recognize as a character. It may be part of a character or a region that does not form a character; any cluster of pixels will do. The character image selected by the starting character search module 130A serves as the starting character when searching for a character string.
“Selecting on the basis of the possibility of being a character” means selecting a pixel block that is highly likely to be a character. Here, “highly likely to be a character” means, for example, that a value indicating the possibility of being a character is equal to or greater than a predetermined threshold, or that, when the blocks are sorted in descending order of that value, the block ranks above a predetermined position (for example, it has the highest value). Consequently, if a binarized image contains no pixel block that is highly likely to be a character, no pixel block is selected and no character string can be extracted from that binarized image. If a pixel block is selected in another binarized image, however, the character string can still be extracted.
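To make the notion of a 4-connected or 8-connected pixel block concrete, the following sketch labels the connected foreground regions of a binary image with a breadth-first search. It is a minimal illustration under assumptions: the names `label_pixel_blocks` and `bounding_box` are hypothetical, foreground pixels are assumed to have the value 1, and the grouping of nearby regions into one block (the “set of pixel regions” case) is not covered.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def label_pixel_blocks(binary: List[List[int]], connectivity: int = 8) -> List[List[Pixel]]:
    """Return the connected foreground regions (value 1) of a binary image."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blocks: List[List[Pixel]] = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one region starting from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            block: List[Pixel] = []
            while queue:
                cr, cc = queue.popleft()
                block.append((cr, cc))
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and binary[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            blocks.append(block)
    return blocks


def bounding_box(block: List[Pixel]) -> Tuple[int, int, int, int]:
    """Circumscribed rectangle as (top, left, bottom, right), inclusive."""
    rows = [r for r, _ in block]
    cols = [c for _, c in block]
    return (min(rows), min(cols), max(rows), max(cols))
```

Two diagonally touching pixels form one block under 8-connectivity and two blocks under 4-connectivity, which is why the choice of connectivity affects how thin strokes are grouped.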

The starting character search module 130A may select a pixel block for which the certainty factor of the character obtained by character recognition of the binarized image layer (1) 122A (for example, the reciprocal of the distance, in the feature space used for character recognition, between the target character image and a character in the dictionary) is greater than (or greater than or equal to) a predetermined first threshold, and for which the complexity of that character is greater than (or greater than or equal to) a second threshold. The “complexity” here may be the number of alternations within the circumscribed rectangle of the pixel block (the number of transitions from black to white and from white to black), the number of strokes of the character obtained as the recognition result, or a combination of these.
Alternatively, the starting character search module 130A may select a pixel block for which the certainty factor of the character obtained by character recognition of the binarized image layer (1) 122A is greater than (or greater than or equal to) a predetermined first threshold and the character is not a predetermined character.
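The two selection rules can be sketched as a single filter. This is an illustrative sketch under assumptions, not the module's actual code: the `Candidate` type, the function names, and the use of the “greater than or equal to” variant of each comparison are choices made for the example, and the recognizer that produces the certainty factor and complexity is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Candidate:
    char: str          # character returned by a (hypothetical) recognizer
    certainty: float   # certainty factor of the recognition result
    complexity: float  # e.g. number of alternations or stroke count


def certainty_from_distance(distance: float) -> float:
    """Certainty as the reciprocal of a feature-space distance (zero-guarded)."""
    return 1.0 / max(distance, 1e-9)


def select_starting_characters(
    candidates: List[Candidate],
    first_threshold: float,
    second_threshold: Optional[float] = None,
    excluded: FrozenSet[str] = frozenset(),
) -> List[Candidate]:
    """Keep candidates that pass the certainty test and, if configured,
    the complexity test and the excluded-character test."""
    selected = []
    for cand in candidates:
        if cand.certainty < first_threshold:
            continue  # not confident enough to be a starting character
        if second_threshold is not None and cand.complexity < second_threshold:
            continue  # shape too simple
        if cand.char in excluded:
            continue  # predetermined character that must not start a string
        selected.append(cand)
    return selected
```

Passing only `second_threshold` corresponds to the certainty-plus-complexity rule, and passing only `excluded` corresponds to the predetermined-character rule; several candidates may survive, in which case each can serve as a starting character.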

More specifically, the starting character search module 130A identifies connected pixel regions in each binarized image as pixel blocks by a known labeling process or the like, cuts each out as a circumscribed-rectangle image, and performs character recognition on it to obtain the character code of a character candidate and its certainty factor (accuracy). When the circumscribed-rectangle region also contains other pixel blocks (when circumscribed rectangles overlap), a circumscribed-rectangle image enclosing those pixel blocks together may be cut out and recognized.
In addition to the character code and its certainty factor, the number of strokes may be added to the character recognition output. The complexity of a character candidate may be taken from that stroke count, or from the number of alternations measured in the cut-out circumscribed rectangle. The number of alternations can be obtained by counting black/white transitions along several parallel lines and lines perpendicular to them across the circumscribed rectangle, and taking the largest count.
Instead of the complexity, a list of predetermined characters may be used: characters on the list are not selected by the starting character search module 130A. The list contains, for example, “１”, “ｌ”, “｜”, “一”, “□”, “Ｌ”, and “○”. A binarized scene image contains non-character images that resemble characters, and scene images contain natural objects and structures that are easily mistaken for the characters above (“１” and so on), so when character recognition results are used, characters with simple shapes are excluded from the starting characters. Moreover, character recognition often outputs a high certainty factor for such shapes, so a starting character cannot be determined from the certainty factor alone. There may be more than one starting character.
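The number of alternations can be sketched as follows: count value changes along a few evenly spaced rows and columns of the cut-out rectangle and keep the maximum. This is a minimal sketch; the function names, the choice of three scan lines per direction, and the even spacing are assumptions, since the text only says “several” lines.

```python
from typing import List


def transitions(line: List[int]) -> int:
    """Number of black/white changes between neighbouring pixels on one line."""
    return sum(1 for a, b in zip(line, line[1:]) if a != b)


def sample_indices(length: int, samples: int) -> List[int]:
    """Evenly spaced indices strictly inside 0..length-1 (duplicates removed)."""
    return sorted({(i + 1) * length // (samples + 1) for i in range(samples)})


def alternation_number(patch: List[List[int]], samples: int = 3) -> int:
    """Largest transition count over sampled rows and columns of a binary patch."""
    if not patch or not patch[0]:
        return 0
    height, width = len(patch), len(patch[0])
    counts = [transitions(patch[r]) for r in sample_indices(height, samples)]
    counts += [
        transitions([patch[r][c] for r in range(height)])
        for c in sample_indices(width, samples)
    ]
    return max(counts)
```

A shape such as a single vertical bar gives at most two transitions on any scan line, whereas a character with several strokes gives more, which is what makes this count usable as a complexity measure.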

文字列探索モジュール１４０Ａは、起点文字探索モジュール１３０Ａ、起点文字探索モジュール１３０Ｂ、起点文字探索モジュール１３０Ｚ、文字列合成モジュール１５０と接続されている。文字列探索モジュール１４０Ａは、起点文字探索モジュール１３０によって選択された画素塊から隣接する文字を探索して文字列を抽出する。起点文字探索モジュール１３０によって選択された第１の画素塊（以下、起点文字ともいう）の外接矩形の大きさに応じた矩形（以下、ウィンドウともいう）を設定し、その外接矩形の周辺にある各２値化画像のその矩形内を文字認識し、その文字認識した結果の確信度が第３の閾値より高い又は以上である画素塊の組み合わせを、その第１の文字の隣にある第２の文字とし、次にその第２の文字を第１の文字に設定して文字列を抽出する。概略として、起点文字の周辺文字を抽出して、その周辺文字を起点文字にして次々と文字列を辿っていくことによって文字列を抽出する。より具体的には、起点文字の外接矩形の大きさに応じたウィンドウを設定し、その外接矩形の周辺を全ての２値化画像に対して１周させた際に、ウィンドウ内に入る全画素塊の組み合わせを文字認識し、その中で第３の閾値より高い又は以上となった画素塊の組み合わせを隣の文字の候補とし、次に隣の文字候補を起点文字に設定し、同様の方法で文字列を探索していく。 The character string search module 140A is connected to the starting character search modules 130A, 130B, and 130Z and to the character string synthesis module 150. The character string search module 140A searches for adjacent characters starting from the pixel block selected by the starting character search module 130 and extracts a character string. It sets a rectangle (hereinafter also referred to as a window) whose size depends on the circumscribed rectangle of the first pixel block (hereinafter also referred to as the starting character) selected by the starting character search module 130, performs character recognition inside that window on each binarized image around the circumscribed rectangle, and treats a combination of pixel blocks whose recognition certainty is higher than (or equal to or higher than) the third threshold as the second character adjacent to the first character. The second character is then set as the first character, and the process repeats to extract the character string. In outline, the characters surrounding the starting character are extracted, and each of them in turn becomes the starting character, so that the character string is traced one character after another.
More specifically, a window corresponding to the size of the circumscribed rectangle of the starting character is set and moved once around the periphery of that circumscribed rectangle in every binarized image. Every combination of the pixel blocks that fall within the window is character-recognized, and a combination whose certainty is higher than (or equal to or higher than) the third threshold becomes a candidate for the adjacent character. That candidate is then set as the starting character, and the search for the character string continues in the same way.

また、文字列探索モジュール１４０Ａは、予め定められた条件に合致する画素塊の組み合わせを除外して前記文字認識を行うようにしてもよい。この予め定められた条件は、異質な画素塊を含む組み合わせを除外するための条件であり、具体的には、異なる色の画素塊を含む組み合わせであること、異なるストローク幅を含む組み合わせであること、前記第１の画素塊における文字のストローク幅と異なるストローク幅を含む組み合わせであることのいずれか、又はこれらの組み合わせとしてもよい。なお、ここでのストローク幅として、文字幅としてもよい。 Further, the character string search module 140A may perform the character recognition after excluding combinations of pixel blocks that match a predetermined condition. This condition serves to exclude combinations containing heterogeneous pixel blocks. Specifically, it may be any one of the following, or a combination of them: the combination contains pixel blocks of different colors; the combination contains different stroke widths; or the combination contains a stroke width different from that of the character in the first pixel block. The character width may be used as the stroke width here.

より具体的には、文字列探索モジュール１４０Ａによる隣接文字の探索は、起点文字の大きさに基づいたサイズのウィンドウを全ての２値化画像上で起点文字周辺でスライディングさせ、その中の画素塊の組み合わせを文字認識し、その確信度が第３の閾値より高い又は以上である画素塊の組み合わせを隣の文字とし、再帰的に隣の文字の探索を行う。この場合、ウィンドウ内で色の異なる画素塊の組み合わせやストローク幅（文字幅）の異なる組み合わせは除外する。
また、組み合わせた画素塊の外接矩形の横幅又は縦幅が起点文字のサイズにより決定される値の範囲を超えている組み合わせを除外してもよい。
隣の文字候補をウィンドウ内の画素塊に限定している。そして、１つの文字は同じ色で構成されていることと仮定し、異なる色の組み合わせを除外してもよい。起点文字とストローク幅の違う画素塊はノイズの可能性が大きいので除外してもよい。
また、画素塊の組み合わせでその外接矩形の横幅、縦幅が大きい場合（例えば、起点文字の外接矩形の横幅又は縦幅と比較して、対象としている外接矩形の横幅又は縦幅が予め定められた倍数より大きい又は以上である場合、対象としている外接矩形の横幅と縦幅の比率が予め定められた値より大きい又は以上である場合等）は、その外接矩形内の画素塊の組み合わせはノイズを含んでる可能性が大きいので除外してもよい。
また、文字列が同じ色で描かれてると想定できる場合は、ウィンドウ内の画素塊は起点文字と同じ色に限定してもよい。
文字列探索モジュール１４０によって、任意の文字列方向（曲線も含む）に対応し、また、縦横混在の文字列にも対応する。 More specifically, the character string search module 140A searches for adjacent characters by sliding a window, sized according to the starting character, around the starting character on all the binarized images, performing character recognition on the combinations of pixel blocks inside it, taking a combination whose certainty is higher than (or equal to or higher than) the third threshold as the adjacent character, and then searching recursively for the next adjacent character. In this search, combinations of pixel blocks with different colors or different stroke widths (character widths) within the window are excluded.
Combinations in which the width or height of the circumscribed rectangle of the combined pixel blocks falls outside the range of values determined by the size of the starting character may also be excluded.
Adjacent character candidates are limited to the pixel blocks inside the window. Assuming that a single character is drawn in a single color, combinations containing different colors may be excluded. A pixel block whose stroke width differs from that of the starting character is likely to be noise and may also be excluded.
Likewise, when the circumscribed rectangle of a combination of pixel blocks is too wide or too tall (for example, when its width or height is larger than, or equal to or larger than, a predetermined multiple of the width or height of the starting character's circumscribed rectangle, or when the ratio of its width to its height is larger than, or equal to or larger than, a predetermined value), the combination is likely to contain noise and may be excluded.
If the character string can be assumed to be drawn in a single color, the pixel blocks in the window may be limited to those with the same color as the starting character.
The character string search module 140 thus handles character strings running in any direction (including along curves), as well as text in which vertical and horizontal strings are mixed.
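The size-based exclusion described above can be sketched as follows. The box format `(x0, y0, x1, y1)`, the function names, and the limits `max_scale` and `max_aspect` are assumptions made for illustration; the text only says that the multiple and the ratio are predetermined values.

```python
def bbox_union(boxes):
    """Circumscribed rectangle (x0, y0, x1, y1) enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def is_oversized(combo_boxes, start_box, max_scale=1.5, max_aspect=3.0):
    """True if the combination's rectangle is too large or too elongated
    compared with the starting character's rectangle."""
    x0, y0, x1, y1 = bbox_union(combo_boxes)
    w, h = x1 - x0, y1 - y0
    start_w = start_box[2] - start_box[0]
    start_h = start_box[3] - start_box[1]
    # Width or height at or above a fixed multiple of the starting character.
    if w >= max_scale * start_w or h >= max_scale * start_h:
        return True
    # Ratio of the longer side to the shorter side at or above a fixed value.
    aspect = max(w, h) / max(1, min(w, h))
    return aspect >= max_aspect

start_box = (0, 0, 20, 20)
plausible = [(25, 0, 35, 20), (36, 2, 44, 18)]   # about one character wide
too_wide = [(25, 0, 35, 20), (36, 0, 80, 20)]    # probably includes noise
print(is_oversized(plausible, start_box), is_oversized(too_wide, start_box))
```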

文字列合成モジュール１５０は、文字列探索モジュール１４０Ａ、文字列探索モジュール１４０Ｂ、文字列探索モジュール１４０Ｚと接続されている。文字列合成モジュール１５０は、各文字列探索モジュール１４０によって抽出された文字列を合成する。文字列合成モジュール１５０は、各２値化画像における文字列画像を論理和合成して文字列画像を生成すること、又は、各２値化画像における文字列画像に対応する各文字の文字コード毎に計数し、その計数の結果に基づいて文字列画像を生成するようにしてもよい。 The character string synthesis module 150 is connected to the character string search modules 140A, 140B, and 140Z. The character string synthesis module 150 combines the character strings extracted by the character string search modules 140. It may generate a character string image by OR-combining (logical sum) the character string images from the binarized images, or it may count the occurrences of each character code for each character corresponding to the character string images in the binarized images and generate the character string image based on those counts.

図２は、本実施の形態による処理例を示すフローチャートである。
ステップＳ２０２では、画像受付モジュール１１０が、画像を受け付ける。
ステップＳ２０４では、各２値化モジュール１２０が、画像を２値化処理して、複数の２値化画像を生成する。
ステップＳ２０６では、各起点文字探索モジュール１３０が、各２値化画像から文字である可能性の高い画素塊（起点文字）を選択する。
ステップＳ２０８では、各文字列探索モジュール１４０が、選択された画素塊（起点文字）に隣接する文字を探索して、文字列を抽出する。
ステップＳ２１０では、文字列合成モジュール１５０が、抽出した文字列を合成する。
ステップＳ２１２では、文字列合成モジュール１５０が、文字列を出力する。 FIG. 2 is a flowchart showing an example of processing according to this embodiment.
In step S202, the image reception module 110 receives an image.
In step S204, each binarization module 120 binarizes the image to generate a plurality of binarized images.
In step S206, each starting character search module 130 selects a pixel block (starting character) that is highly likely to be a character from each binarized image.
In step S208, each character string search module 140 searches for a character adjacent to the selected pixel block (starting character) and extracts a character string.
In step S210, the character string synthesis module 150 synthesizes the extracted character strings.
In step S212, the character string synthesis module 150 outputs the character string.
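The flow of steps S204 to S212 can be sketched as a small driver function. Everything here is a hypothetical stand-in: `run_pipeline` and its parameters are names invented for illustration, and the toy callbacks passed to it only show how data moves between the stages, not how the modules actually work.

```python
def run_pipeline(image, binarizers, find_starts, trace_string, combine):
    """Binarize, pick starting characters, trace strings, then combine."""
    binarized = [binarize(image) for binarize in binarizers]        # S204
    strings = []
    for layer, img in enumerate(binarized):
        for start in find_starts(img):                              # S206
            strings.append(trace_string(binarized, layer, start))   # S208
    return combine(strings)                                         # S210, S212

def threshold_binarizer(t):
    """Toy binarizer: pixels darker than t become 1, others 0."""
    return lambda img: [[1 if px < t else 0 for px in row] for row in img]

image = [[10, 200, 90], [120, 250, 30]]
result = run_pipeline(
    image,
    binarizers=[threshold_binarizer(t) for t in (64, 128)],
    # Toy stand-in: the first black pixel plays the role of a starting character.
    find_starts=lambda img: [(r, c) for r, row in enumerate(img)
                             for c, px in enumerate(row) if px][:1],
    # Toy stand-in: the traced "string" is just the starting position.
    trace_string=lambda layers, layer, start: {start},
    combine=lambda strings: sorted(set().union(*strings)),
)
print(result)
```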

図３は、本実施の形態の起点文字探索モジュール１３０内の構成例についての概念的なモジュール構成図である。起点文字探索モジュール１３０は、文字認識モジュール３１０、複雑度算出モジュール３２０、起点文字探索処理モジュール３３０を有している。
文字認識モジュール３１０は、起点文字探索処理モジュール３３０と接続されている。文字認識モジュール３１０は、文字認識処理を行う。文字認識結果として、文字コード、その文字の確信度を出力する。
複雑度算出モジュール３２０は、起点文字探索処理モジュール３３０と接続されている。複雑度算出モジュール３２０は、前述の交番数を計測する。また、文字認識モジュール３１０によって文字認識された文字の字画数を出力する。例えば、文字コードと字画数を対応させて記憶しているテーブルを予め用意しておき、そのテーブルから文字コードを検索して、対応する字画数を出力すればよい。
起点文字探索処理モジュール３３０は、文字認識モジュール３１０、複雑度算出モジュール３２０と接続されている。起点文字探索処理モジュール３３０は、２値化画像内の画素塊が文字である可能性である文字認識モジュール３１０と複雑度算出モジュール３２０から出力した情報に基づいて、起点文字の画素塊を選択する。 FIG. 3 is a conceptual module configuration diagram of a configuration example in the starting character search module 130 of the present embodiment. The starting character search module 130 includes a character recognition module 310, a complexity calculation module 320, and a starting character search processing module 330.
The character recognition module 310 is connected to the starting character search processing module 330. The character recognition module 310 performs character recognition processing. As the character recognition result, the character code and the certainty factor of the character are output.
The complexity calculation module 320 is connected to the starting character search processing module 330. The complexity calculation module 320 measures the alternation count described above. It also outputs the stroke count of the character recognized by the character recognition module 310. For example, a table associating character codes with stroke counts may be prepared in advance; the module looks up the character code in the table and outputs the corresponding stroke count.
The starting character search processing module 330 is connected to the character recognition module 310 and the complexity calculation module 320. The starting character search processing module 330 selects the pixel block of the starting character based on the information output by the character recognition module 310 and the complexity calculation module 320, which indicates how likely a pixel block in the binarized image is to be a character.
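The two complexity measures handled by the complexity calculation module 320 can be sketched as follows. This is an illustrative sketch under assumptions: the glyph is a list of rows of 0/1 pixels, the scan lines are a few evenly spaced rows and columns (the text does not fix their number or placement), and `STROKE_TABLE` is a tiny hypothetical lookup table.

```python
def transitions(line):
    """Number of black/white inversions along one scan line."""
    return sum(1 for a, b in zip(line, line[1:]) if a != b)

def alternation_count(glyph, n_lines=3):
    """Maximum inversion count over a few evenly spaced rows and columns."""
    h, w = len(glyph), len(glyph[0])
    rows = [glyph[(i + 1) * h // (n_lines + 1)] for i in range(n_lines)]
    cols = [[row[(i + 1) * w // (n_lines + 1)] for row in glyph]
            for i in range(n_lines)]
    return max(transitions(line) for line in rows + cols)

# Hypothetical table mapping character codes to stroke counts.
STROKE_TABLE = {"商": 11, "店": 8, "た": 4}

vertical_bar = [[0, 0, 1, 0, 0] for _ in range(5)]       # shaped like "1"
three_bars = [[1] * 5, [0] * 5, [1] * 5, [0] * 5, [1] * 5]  # shaped like "三"

print(alternation_count(vertical_bar), alternation_count(three_bars))
print(STROKE_TABLE.get("商"))
```

The simple vertical bar yields a lower alternation count than the three-bar shape, which is the property that lets the count serve as a rough complexity measure.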

図４は、本実施の形態（起点文字探索モジュール１３０）による処理例を示すフローチャートである。
ステップＳ４０２では、文字認識モジュール３１０が、２値化画像に対して文字認識処理を行う。
ステップＳ４０４では、複雑度算出モジュール３２０が、その文字の複雑度を算出する。
ステップＳ４０６では、起点文字探索処理モジュール３３０が、文字列内の起点となる文字を選択する。 FIG. 4 is a flowchart showing an example of processing by the present embodiment (starting character search module 130).
In step S402, the character recognition module 310 performs character recognition processing on the binarized image.
In step S404, the complexity calculation module 320 calculates the complexity of the character.
In step S406, the starting character search processing module 330 selects a starting character in the character string.

図５は、本実施の形態の文字列探索モジュール１４０内の構成例についての概念的なモジュール構成図である。
文字列探索モジュール１４０は、ウィンドウ設定モジュール５１０、文字認識モジュール５２０、選択モジュール５３０、制御モジュール５４０を有している。
ウィンドウ設定モジュール５１０は、制御モジュール５４０と接続されている。ウィンドウ設定モジュール５１０は、起点文字探索モジュール１３０によって選択された起点文字の外接矩形の大きさに応じたウィンドウを設定する。起点文字の外接矩形の大きさと同じ大きさにしてもよいし、起点文字の外接矩形の縦長、横長に、予め定められた倍率を乗算して得た大きさにしてもよい。図１１の例を用いて説明する。起点文字探索モジュール１３０によって、図９の例に示した２値化画像９００内から「商」の文字画像が起点文字１１１０として選択されたとする。２値化画像９００の中でこの文字は複雑度が高い文字であるからである。ウィンドウ設定モジュール５１０は、起点文字１１１０の外接矩形の大きさに予め定められた倍率を乗算して得た大きさのウィンドウを設定する。例えば、検索窓１１３０であり、これは起点文字１１１０よりも大きい。具体的には、起点文字１１１０の外接矩形がＮ×Ｎの場合、検索窓１１３０をｔＮ×ｔＮに設定する。起点文字１１１０の外接矩形の周囲を１周するように、検索窓１１３０を走査する。走査として、例えば、起点文字１１１０の外接矩形の左上（検索範囲１１２０の左上角）、上、右上（検索範囲１１２０の右上角）、右、右下（検索範囲１１２０の右下角）、下、左下（検索範囲１１２０の左下角）、左の８個の検索窓１１３０を設定するようにしてもよいし、起点文字１１１０の外接矩形の左上（検索範囲１１２０の左上角）から予め定めた画素（例えば１画素）ずつ移動させて、起点文字１１１０の外接矩形を１周するように、検索窓１１３０を設定するようにしてもよい。したがって、探索範囲１１２０は、起点文字１１１０を中心とした「（２ｔ＋１）Ｎ×（２ｔ＋１）Ｎ」の矩形となる。なお、ｔは１以上が望ましい。さらには、１以上２未満が望ましい。 FIG. 5 is a conceptual module configuration diagram of a configuration example in the character string search module 140 of the present embodiment.
The character string search module 140 includes a window setting module 510, a character recognition module 520, a selection module 530, and a control module 540.
The window setting module 510 is connected to the control module 540. The window setting module 510 sets a window whose size depends on the circumscribed rectangle of the starting character selected by the starting character search module 130. The window may be the same size as the circumscribed rectangle of the starting character, or its size may be obtained by multiplying the height and width of that rectangle by a predetermined magnification. This is described using the example of FIG. 11. Assume that the starting character search module 130 has selected the character image 「商」 as the starting character 1110 from the binarized image 900 shown in the example of FIG. 9, because this character has a high complexity among those in the binarized image 900. The window setting module 510 sets a window whose size is obtained by multiplying the size of the circumscribed rectangle of the starting character 1110 by a predetermined magnification. An example is the search window 1130, which is larger than the starting character 1110. Specifically, when the circumscribed rectangle of the starting character 1110 is N × N, the search window 1130 is set to tN × tN. The search window 1130 is scanned so as to make one round of the circumscribed rectangle of the starting character 1110.
As the scan, for example, eight search windows 1130 may be set: at the upper left of the circumscribed rectangle of the starting character 1110 (the upper left corner of the search range 1120), above it, at the upper right (the upper right corner of the search range 1120), to the right, at the lower right (the lower right corner of the search range 1120), below, at the lower left (the lower left corner of the search range 1120), and to the left. Alternatively, the search window 1130 may be moved a predetermined number of pixels (for example, one pixel) at a time, starting from the upper left of the circumscribed rectangle of the starting character 1110 (the upper left corner of the search range 1120), so that it makes one round of the circumscribed rectangle. The search range 1120 is therefore a rectangle of (2t + 1)N × (2t + 1)N centered on the starting character 1110. The value of t is preferably 1 or more, and more preferably at least 1 and less than 2.
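The eight-window layout and the resulting search range can be sketched as follows. The coordinate convention (top-left origin, `(left, top, size)` tuples) and the exact placement of the edge windows, centered on each side of the starting character, are assumptions; the text only fixes the window size tN, the corner positions, and the (2t + 1)N search range.

```python
def search_windows(x, y, n, t=1.0):
    """Eight (left, top, size) windows around an n x n box whose top-left is (x, y).

    Order: upper left, top, upper right, right, lower right, bottom,
    lower left, left.
    """
    size = t * n
    near = -size            # window lies before the box on this axis
    mid = (n - size) / 2    # window is centered on the box on this axis
    far = n                 # window lies after the box on this axis
    offsets = [(near, near), (mid, near), (far, near), (far, mid),
               (far, far), (mid, far), (near, far), (near, mid)]
    return [(x + dx, y + dy, size) for dx, dy in offsets]

def search_range(x, y, n, t=1.0):
    """(left, top, side) of the square covered by all eight windows."""
    return (x - t * n, y - t * n, (2 * t + 1) * n)

windows = search_windows(100, 100, 20, t=1.5)
print(windows[0])                        # upper-left window
print(search_range(100, 100, 20, t=1.5))
```

With N = 20 and t = 1.5, each window is 30 pixels on a side and the search range is 80 pixels on a side, matching (2t + 1)N.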

文字認識モジュール５２０は、制御モジュール５４０と接続されている。文字認識モジュール５２０は、前述の文字認識モジュール３１０と同等のものであり、１つの文字認識モジュールを起点文字探索モジュール１３０、文字列探索モジュール１４０が用いるようにしてもよい。文字認識モジュール５２０は、ウィンドウ設定モジュール５１０によって設定されたウィンドウ（例えば、検索窓１１３０）内の画像を文字認識する。
制御モジュール５４０は、ウィンドウ設定モジュール５１０、文字認識モジュール５２０、選択モジュール５３０と接続されている。制御モジュール５４０は、検索窓１１３０を起点文字１１１０の周りに１周させ（例えば、探索方向１１４２、探索方向１１４４、探索方向１１４６、探索方向１１４８）、その過程で検索窓１１３０内に入った画素塊の組み合わせを算出する。 The character recognition module 520 is connected to the control module 540. The character recognition module 520 is equivalent to the character recognition module 310 described above, and one character recognition module may be used by the starting character search module 130 and the character string search module 140. The character recognition module 520 recognizes characters in an image in the window (for example, the search window 1130) set by the window setting module 510.
The control module 540 is connected to the window setting module 510, the character recognition module 520, and the selection module 530. The control module 540 moves the search window 1130 once around the starting character 1110 (for example, along search directions 1142, 1144, 1146, and 1148) and computes the combinations of pixel blocks that fall within the search window 1130 in the process.

なお、検索窓１１３０の移動量は、画素塊毎に移動するようにしてもよいし、予め定められた画素毎（例えば、検索窓１１３０の大きさ毎、１画素毎等）移動するようにしてもよい。
また、図１２の例に示すように、探索範囲１１２０は、他の２値化画像（例えば、全ての２値化画像）に設定してもよい。つまり、起点文字１１１０Ｂを選択した２値化画像９００Ｂに対してだけ探索範囲１１２０を設定するのではなく、２値化画像９００Ａ、２値化画像９００Ｚにも探索範囲１１２０を設置し、２値化画像９００Ａでは起点画像（起点文字１１１０Ｂに対応する位置の画像）の周囲を検索窓１１３０Ａでスキャンし、２値化画像９００Ｂでは起点文字１１１０Ｂの周囲を検索窓１１３０Ｂでスキャンし、２値化画像９００Ｚでは起点画像（起点文字１１１０Ｂに対応する位置の画像）の周囲を検索窓１１３０Ｚでスキャンする。 The search window 1130 may be moved one pixel block at a time, or by a predetermined number of pixels at a time (for example, by the size of the search window 1130, or by one pixel).
Further, as shown in the example of FIG. 12, the search range 1120 may also be set in other binarized images (for example, in all binarized images). That is, the search range 1120 is set not only in the binarized image 900B, from which the starting character 1110B was selected, but also in the binarized images 900A and 900Z. In the binarized image 900A, the area around the starting image (the image at the position corresponding to the starting character 1110B) is scanned with the search window 1130A; in the binarized image 900B, the area around the starting character 1110B is scanned with the search window 1130B; and in the binarized image 900Z, the area around the starting image (the image at the position corresponding to the starting character 1110B) is scanned with the search window 1130Z.

検索窓１１３０を移動させ、図１３の例に示すように「た」を含むところでは、検索窓１３３０内にはノイズ（非文字）９３０も含めて画素塊（連結領域）の数は４つある。具体的には、図１４（ａ）の例に示す検索窓１３３０では、図１４（ｂ）の例に示すように、認識対象１４１０、認識対象１４２０、認識対象１４３０、認識対象１４４０の４つとなる。したがって、全部で２^４−１個の組み合わせができる。なお、この式内の「−１」は、何もない場合（検索窓１３３０が空白の場合）は、対象外としたものである。これらの１つ１つの組み合わせの画像を、文字認識モジュール５２０で文字認識させる。 When the search window 1130 is moved to a position containing 「た」, as shown in the example of FIG. 13, the search window 1330 contains four pixel blocks (connected regions), including the noise (non-character) 930. Specifically, in the search window 1330 shown in the example of FIG. 14(a), there are four recognition targets, 1410, 1420, 1430, and 1440, as shown in the example of FIG. 14(b). A total of 2^4 − 1 = 15 combinations can therefore be formed. The “−1” in this expression excludes the case in which nothing is selected (the search window 1330 is empty). The image of each of these combinations is character-recognized by the character recognition module 520.
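The enumeration of pixel-block combinations can be sketched with the standard library. The function name is an assumption, and the reference numerals 1410 to 1440 are used here simply as labels for the four pixel blocks in the FIG. 14 example.

```python
from itertools import combinations

def nonempty_combinations(blocks):
    """Yield every non-empty subset of the pixel blocks in a search window."""
    for size in range(1, len(blocks) + 1):
        yield from combinations(blocks, size)

blocks = [1410, 1420, 1430, 1440]
combos = list(nonempty_combinations(blocks))
print(len(combos))  # 2**4 - 1 = 15
```

Because the number of combinations grows as 2^n − 1 with the number of pixel blocks n, excluding implausible combinations before recognition reduces the recognition workload.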

選択モジュール５３０は、制御モジュール５４０と接続されている。選択モジュール５３０は、文字認識モジュール５２０によって文字認識した結果の確信度が第３の閾値より高い又は以上である検索窓１１３０内の画素塊の組み合わせを、起点文字の隣にある第２の文字として選択する。ただし、複数の２値化画像において、同じ検索窓の位置で閾値を超える複数の文字候補があった場合は、もっとも確信度の高い文字を選択する。 The selection module 530 is connected to the control module 540. The selection module 530 selects, as the second character adjacent to the starting character, a combination of pixel blocks in the search window 1130 for which the certainty of the recognition result from the character recognition module 520 is higher than (or equal to or higher than) the third threshold. However, when several character candidates exceed the threshold at the same search window position across the binarized images, the character with the highest certainty is selected.
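The selection rule for one search window position can be sketched as follows. The function name and the `(character, certainty)` tuple format are assumptions, the sample candidates other than 「た」 are made up for illustration, and the default threshold of 0.990 is the third-threshold value from the FIG. 15 example.

```python
def select_adjacent(results, third_threshold=0.990):
    """Pick the adjacent character for one search window position.

    results holds (character, certainty) pairs gathered from all binarized
    images at that position. Returns the best pair, or None if none passes.
    """
    passing = [r for r in results if r[1] > third_threshold]
    return max(passing, key=lambda r: r[1]) if passing else None

results = [("た", 0.9953), ("だ", 0.9912), ("に", 0.9500)]
print(select_adjacent(results))
print(select_adjacent([("に", 0.9500)]))
```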

制御モジュール５４０が行う隣接する文字の探索について、図１５の例を用いて説明する。図１５は、本実施の形態（文字列探索モジュール１４０）による処理例を示す説明図である。この探索は、再帰的な繰り返しを用いている。この例では、起点文字を選択するために用いる第１の閾値と、隣接する文字を選択するために用いる第３の閾値は異なる。第１の閾値を第３の閾値よりも高くすることが望ましい。図１５の例では、第１の閾値を０．９９９とし、第３の閾値を０．９９０とする。なお、確信度は、０〜１までの値をとり、１に近いほど確信度が高い（その文字である可能性が高い）ことを示している。文字認識結果（商）１５１０は、確信度が０．９９９７であったので、起点文字として選択された。もちろんのことながら、他の条件（複雑度、予め定められた文字のリストにない）も合致しているものである。そして、文字認識結果（商）１５１０の周囲をスキャンした結果、文字認識結果（た）１５２０の確信度が０．９９５３であり、文字認識結果（店）１５５０の確信度が０．９９２１であった。つまり、第３の閾値よりも高いので、起点文字の次の文字として、文字認識結果（た）１５２０、文字認識結果（店）１５５０が選択される。ここでは、深さ優先（確信度が高いものを優先）で、探索を行う。したがって、文字認識結果（た）１５２０を起点文字として、文字認識結果（た）１５２０の周囲をスキャンした結果（なお、文字認識結果（商）１５１０は対象から外す）、文字認識結果（か）１５３０の確信度が０．９９３６であり、第３の閾値よりも高いので、文字認識結果（た）１５２０の次の文字として、文字認識結果（か）１５３０が選択される。次に、文字認識結果（か）１５３０を起点文字として、文字認識結果（か）１５３０の周囲をスキャンした結果（なお、文字認識結果（た）１５２０は対象から外す）、文字認識結果（な）１５４０の確信度が０．９９７４であり、第３の閾値よりも高いので、文字認識結果（か）１５３０の次の文字として、文字認識結果（な）１５４０が選択される。文字認識結果（な）１５４０の周囲をスキャンした結果（なお、文字認識結果（か）１５３０は対象から外す）、第３の閾値よりも高いものはなかったので、順に戻る（文字認識結果（な）１５４０、文字認識結果（か）１５３０、文字認識結果（た）１５２０、文字認識結果（商）１５１０の順）。次に、文字認識結果（店）１５５０の周囲をスキャンした結果（なお、文字認識結果（商）１５１０は対象から外す）、第３の閾値よりも高いものはなかったので終了する。この結果、探索順は、「商→た→か→な→店」となる。もちろんのことながら、文字の位置に基づいて並べると「なかた商店」となる。 The search for adjacent characters performed by the control module 540 will be described with reference to the example of FIG. FIG. 15 is an explanatory diagram showing a processing example according to the present exemplary embodiment (character string search module 140). This search uses recursive iterations. In this example, the first threshold value used for selecting the starting character is different from the third threshold value used for selecting the adjacent character. It is desirable to make the first threshold value higher than the third threshold value. In the example of FIG. 15, the first threshold value is 0.999, and the third threshold value is 0.990. The certainty factor is a value from 0 to 1, and the closer the value is to 1, the higher the certainty factor is (that is, the possibility of the character is high). 
The character recognition result (商) 1510 was selected as the starting character because its certainty was 0.9997. It naturally also satisfies the other conditions (complexity, and absence from the predetermined character list). Scanning around the character recognition result (商) 1510 gave a certainty of 0.9953 for the character recognition result (た) 1520 and 0.9921 for the character recognition result (店) 1550. Both exceed the third threshold, so the character recognition results (た) 1520 and (店) 1550 are selected as characters following the starting character. The search here is depth-first, giving priority to the candidate with the higher certainty. The character recognition result (た) 1520 therefore becomes the starting character, and its surroundings are scanned (with the character recognition result (商) 1510 excluded). The character recognition result (か) 1530 has a certainty of 0.9936, which exceeds the third threshold, so it is selected as the character following the character recognition result (た) 1520. Next, with the character recognition result (か) 1530 as the starting character, its surroundings are scanned (with the character recognition result (た) 1520 excluded). The character recognition result (な) 1540 has a certainty of 0.9974, which exceeds the third threshold, so it is selected as the character following the character recognition result (か) 1530.
Scanning around the character recognition result (な) 1540 (with the character recognition result (か) 1530 excluded) finds nothing above the third threshold, so the search backtracks in order (character recognition result (な) 1540, (か) 1530, (た) 1520, (商) 1510). Next, scanning around the character recognition result (店) 1550 (with the character recognition result (商) 1510 excluded) finds nothing above the third threshold, so the search ends. The resulting search order is 「商→た→か→な→店」. Arranged by character position, of course, the string reads 「なかた商店」.
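The recursive, depth-first trace of the FIG. 15 example can be reproduced with a small sketch. The certainties are the values given in the text; the `ADJACENT` and `POSITION` tables are assumptions that stand in for the window scan (they encode a single horizontal row reading なかた商店), and a visited list is used as a generalization of "exclude the character we came from".

```python
THIRD_THRESHOLD = 0.990

# Recognition certainties from the FIG. 15 example.
CERTAINTY = {"商": 0.9997, "た": 0.9953, "か": 0.9936, "な": 0.9974, "店": 0.9921}

# Assumed layout: which characters a window scan around each one would find.
ADJACENT = {"な": ["か"], "か": ["な", "た"], "た": ["か", "商"],
            "商": ["た", "店"], "店": ["商"]}
POSITION = {"な": 0, "か": 1, "た": 2, "商": 3, "店": 4}

def trace(start, visited=None):
    """Depth-first search from start, visiting higher-certainty neighbors first."""
    visited = visited if visited is not None else []
    visited.append(start)
    hits = [c for c in ADJACENT[start]
            if c not in visited and CERTAINTY[c] > THIRD_THRESHOLD]
    for c in sorted(hits, key=CERTAINTY.get, reverse=True):
        if c not in visited:  # may have been reached deeper in the recursion
            trace(c, visited)
    return visited

order = trace("商")
print("→".join(order))
print("".join(sorted(order, key=POSITION.get)))
```

The first line printed is the search order and the second is the string arranged by character position.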

図６は、本実施の形態（文字列探索モジュール１４０）による処理例を示すフローチャートである。
ステップＳ６０２では、ウィンドウ設定モジュール５１０が、起点文字に応じて検索窓を設定する。
ステップＳ６０４では、文字認識モジュール５２０が、各検索窓内の画像に対して文字認識処理を行う。
ステップＳ６０６では、選択モジュール５３０が、起点文字の隣にある文字を文字認識結果内から選択する。
ステップＳ６０８では、制御モジュール５４０が、終了したか否かを判断し、終了した場合は処理を終了し（ステップＳ６９９）、それ以外の場合はステップＳ６１０へ進む。
ステップＳ６１０では、選択した文字を起点文字とし、ステップＳ６０２へ戻る。 FIG. 6 is a flowchart illustrating a processing example according to the present exemplary embodiment (character string search module 140).
In step S602, the window setting module 510 sets a search window according to the starting character.
In step S604, the character recognition module 520 performs character recognition processing on the image in each search window.
In step S606, the selection module 530 selects a character next to the starting character from the character recognition result.
In step S608, the control module 540 determines whether or not the process has been completed. If the process has been completed, the process ends (step S699). Otherwise, the process proceeds to step S610.
In step S610, the selected character is set as the starting character, and the process returns to step S602.

図７は、本実施の形態の文字列探索モジュール１４０内の構成例についての概念的なモジュール構成図である。文字列探索モジュール１４０は、ウィンドウ設定モジュール５１０、除外モジュール７１０、文字認識モジュール５２０、選択モジュール５３０、制御モジュール５４０を有している。図５の例で示した文字列探索モジュール１４０に除外モジュール７１０を付加したものである。
除外モジュール７１０は、制御モジュール５４０と接続されている。除外モジュール７１０は、予め定められた条件に合致する画素塊の組み合わせを、文字認識モジュール５２０の処理対象から除外する。予め定められた条件として、例えば、異なる色の画素塊を含む組み合わせであること、異なるストローク幅を含む組み合わせであること、起点文字のストローク幅と異なるストローク幅を含む組み合わせであることのいずれか、又はこれらの組み合わせがある。画素塊の色は、画像受付モジュール１１０が受け付けた画像（カラー画像）から、その画素塊の位置に対応する画素の色を抽出すればよい。「異なる色の画素塊を含む組み合わせ」の条件は、１文字内で色が異なるものを除外するものである。「異なるストローク幅」とは、画素塊の組み合わせの中で、各画素塊の幅が異なることをいう。「起点文字のストローク幅と異なるストローク幅」とは、起点文字の外接矩形の幅と画素塊の組み合わせを囲む外接矩形の幅とが異なることをいう。ここで異なるとは、完全一致以外の場合だけでなく、画素塊の幅の最小と最大の差が予め定められた範囲に収まっていない場合としてもよい。したがって、その範囲に収まっている場合は異なってはいないと判断することになる。 FIG. 7 is a conceptual module configuration diagram of a configuration example in the character string search module 140 of the present embodiment. The character string search module 140 includes a window setting module 510, an exclusion module 710, a character recognition module 520, a selection module 530, and a control module 540. An exclusion module 710 is added to the character string search module 140 shown in the example of FIG.
The exclusion module 710 is connected to the control module 540. The exclusion module 710 excludes a pixel block combination that matches a predetermined condition from the processing target of the character recognition module 520. As the predetermined condition, for example, one of a combination including pixel blocks of different colors, a combination including a different stroke width, or a combination including a stroke width different from the stroke width of the starting character, Or there are combinations of these. As for the color of the pixel block, the color of the pixel corresponding to the position of the pixel block may be extracted from the image (color image) received by the image receiving module 110. The condition of “combination including pixel clusters of different colors” is to exclude those having different colors within one character. “Different stroke width” means that each pixel block has a different width in the combination of pixel blocks. “Stroke width different from the stroke width of the starting character” means that the width of the circumscribed rectangle of the starting character is different from the width of the circumscribed rectangle surrounding the combination of pixel blocks. Here, “difference” is not limited to a case other than a perfect match, but may be a case where the difference between the minimum and maximum pixel block widths is not within a predetermined range. Therefore, if it is within that range, it is determined that there is no difference.
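The heterogeneity test applied by the exclusion module 710 can be sketched as follows. The dictionary fields `color` and `stroke`, the function name, and the tolerance value are assumptions; the tolerance models the "predetermined range" within which widths are treated as not different.

```python
def is_heterogeneous(combo, start_stroke, stroke_tolerance=1):
    """True if a pixel-block combination should be skipped before recognition."""
    # Condition 1: pixel blocks of different colors within one character.
    if len({block["color"] for block in combo}) > 1:
        return True
    strokes = [block["stroke"] for block in combo]
    # Condition 2: stroke widths differ within the combination.
    if max(strokes) - min(strokes) > stroke_tolerance:
        return True
    # Condition 3: a stroke width differs from that of the starting character.
    return any(abs(s - start_stroke) > stroke_tolerance for s in strokes)

a = {"color": "white", "stroke": 4}
b = {"color": "white", "stroke": 5}
noise = {"color": "red", "stroke": 4}
thin = {"color": "white", "stroke": 1}

print(is_heterogeneous([a, b], start_stroke=4))      # kept for recognition
print(is_heterogeneous([a, noise], start_stroke=4))  # excluded: mixed colors
print(is_heterogeneous([a, thin], start_stroke=4))   # excluded: stroke mismatch
```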

図８は、本実施の形態による処理例を示すフローチャートである。
ステップＳ８０２では、ウィンドウ設定モジュール５１０が、起点文字に応じて検索窓を設定する。
ステップＳ８０４では、除外モジュール７１０が、異質な検索窓の画像は文字認識対象から除外する。
ステップＳ８０６では、文字認識モジュール５２０が、検索窓内の画像に対して文字認識処理を行う。
ステップＳ８０８では、選択モジュール５３０が、起点文字の隣にある文字を文字認識結果内から選択する。
ステップＳ８１０では、制御モジュール５４０が、終了したか否かを判断し、終了した場合は処理を終了し（ステップＳ８９９）、それ以外の場合はステップＳ８１２へ進む。
ステップＳ８１２では、選択した文字を起点文字とし、ステップＳ８０２へ戻る。
ステップＳ８０６では、全ての検索窓内の画像に対して文字認識処理を行うわけではなく、ステップＳ８０４で除外された検索窓以外の検索窓内の画像に対して文字認識処理を行う。 FIG. 8 is a flowchart showing an example of processing according to this embodiment.
In step S802, the window setting module 510 sets a search window according to the starting character.
In step S804, the exclusion module 710 excludes the images of heterogeneous search windows from the character recognition targets.
In step S806, the character recognition module 520 performs character recognition processing on the image in the search window.
In step S808, the selection module 530 selects a character next to the starting character from the character recognition result.
In step S810, the control module 540 determines whether or not the process has been completed. If the process has been completed, the process ends (step S899). Otherwise, the process proceeds to step S812.
In step S812, the selected character is used as a starting character, and the process returns to step S802.
In step S806, character recognition is not performed on the images in all the search windows; it is performed only on the images in the search windows that were not excluded in step S804.

図１６は、本実施の形態（文字列合成モジュール１５０）による処理例を示す説明図である。各２値化画像における文字列画像である文字列探索結果１６１０、文字列探索結果１６２０を合成して、文字列合成結果１６９０を生成している。図１６の例では、論理和（ＯＲ）合成した結果を示している。なお、文字列探索結果１６１０、文字列探索結果１６２０と２つの文字列の結果だけを示しているが、起点文字の数だけ発生することになる。また、各２値化画像における文字列画像に対応する各文字の文字コード毎に計数し、その計数の結果に基づいて文字列画像を生成するようにしてもよい。つまり、各文字の文字コードの多数決の結果、もっとも数の多かった文字コードに対応する画像を選択して文字列画像を生成している。 FIG. 16 is an explanatory diagram showing a processing example according to the present embodiment (character string synthesis module 150). The character string search results 1610 and 1620, which are character string images from the respective binarized images, are combined to generate the character string synthesis result 1690. The example of FIG. 16 shows the result of logical sum (OR) synthesis. Although only two results, the character string search results 1610 and 1620, are shown, as many results arise as there are starting characters. Alternatively, the occurrences of each character code may be counted for each character corresponding to the character string images in the binarized images, and the character string image may be generated based on the counts. That is, a majority vote is taken over the character codes of each character, and the image corresponding to the most frequent character code is selected to generate the character string image.
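The two combination strategies can be sketched as follows. Images are modeled as equally sized lists of rows of 0/1 pixels, and both function names are assumptions; the sample inputs are made up for illustration.

```python
from collections import Counter

def or_combine(images):
    """Pixel-wise logical OR of equally sized binary images."""
    return [[int(any(pixels)) for pixels in zip(*rows)]
            for rows in zip(*images)]

def majority_code(codes):
    """Character code recognized most often for one character position."""
    return Counter(codes).most_common(1)[0][0]

result_1610 = [[1, 0], [0, 0]]
result_1620 = [[0, 0], [0, 1]]
print(or_combine([result_1610, result_1620]))
print(majority_code(["商", "商", "高"]))
```

OR synthesis keeps every stroke found in any binarized image, whereas the majority vote keeps only the image whose character code was recognized most often.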

図１７を参照して、本実施の形態の画像処理装置のハードウェア構成例について説明する。図１７に示す構成は、例えばパーソナルコンピュータ（ＰＣ）などによって構成されるものであり、スキャナ等のデータ読み取り部１７１７と、プリンタなどのデータ出力部１７１８を備えたハードウェア構成例を示している。 A hardware configuration example of the image processing apparatus according to the present embodiment will be described with reference to FIG. The configuration shown in FIG. 17 is configured by a personal computer (PC), for example, and shows a hardware configuration example including a data reading unit 1717 such as a scanner and a data output unit 1718 such as a printer.

ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）１７０１は、前述の実施の形態において説明した各種のモジュール、すなわち、画像受付モジュール１１０、２値化モジュール１２０、起点文字探索モジュール１３０、文字列探索モジュール１４０、文字列合成モジュール１５０、文字認識モジュール３１０、複雑度算出モジュール３２０、起点文字探索処理モジュール３３０、ウィンドウ設定モジュール５１０、文字認識モジュール５２０、選択モジュール５３０、制御モジュール５４０、除外モジュール７１０等の各モジュールの実行シーケンスを記述したコンピュータ・プログラムにしたがった処理を実行する制御部である。 The CPU (Central Processing Unit) 1701 is a control unit that executes processing according to a computer program describing the execution sequence of the modules described in the above embodiment, namely the image reception module 110, the binarization module 120, the starting character search module 130, the character string search module 140, the character string synthesis module 150, the character recognition module 310, the complexity calculation module 320, the starting character search processing module 330, the window setting module 510, the character recognition module 520, the selection module 530, the control module 540, the exclusion module 710, and so on.

ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）１７０２は、ＣＰＵ１７０１が使用するプログラムや演算パラメータ等を格納する。ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）１７０３は、ＣＰＵ１７０１の実行において使用するプログラムや、その実行において適宜変化するパラメータ等を格納する。これらはＣＰＵバスなどから構成されるホストバス１７０４により相互に接続されている。 A ROM (Read Only Memory) 1702 stores programs used by the CPU 1701, calculation parameters, and the like. A RAM (Random Access Memory) 1703 stores programs used in the execution of the CPU 1701, parameters that change as appropriate in the execution, and the like. These are connected to each other by a host bus 1704 including a CPU bus.

The host bus 1704 is connected via a bridge 1705 to an external bus 1706 such as a PCI (Peripheral Component Interconnect/Interface) bus.

A keyboard 1708 and a pointing device 1709 such as a mouse are input devices operated by an operator. The display 1710 is, for example, a liquid crystal display device or a CRT (Cathode Ray Tube), and displays various types of information as text or image information.

An HDD (Hard Disk Drive) 1711 has a built-in hard disk, drives the hard disk, and records or reproduces programs executed by the CPU 1701 and information. The hard disk stores the image received by the image reception module 110, the binarized image layers 122, information about starting characters, character recognition results, the character string (character string image) that is the final processing result, and the like. It also stores various computer programs, such as other data processing programs.

The drive 1712 reads data or programs recorded on a mounted removable recording medium 1713, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and supplies the data or programs to the RAM 1703, which is connected via the interface 1707, the external bus 1706, the bridge 1705, and the host bus 1704. The removable recording medium 1713 can also be used as a data recording area in the same way as the hard disk.

The connection port 1714 is a port for connecting an external connection device 1715 and has a connection unit such as USB or IEEE 1394. The connection port 1714 is connected to the CPU 1701 and other components via the interface 1707, the external bus 1706, the bridge 1705, the host bus 1704, and the like. The communication unit 1716 is connected to a communication line and executes data communication processing with external devices. The data reading unit 1717 is, for example, a scanner, and executes document reading processing. The data output unit 1718 is, for example, a printer, and executes document data output processing.

Note that the hardware configuration of the image processing apparatus shown in FIG. 17 is only one configuration example. The present embodiment is not limited to the configuration shown in FIG. 17, and any configuration capable of executing the modules described in the present embodiment may be used. For example, some modules may be implemented in dedicated hardware (for example, an Application Specific Integrated Circuit (ASIC)), some modules may reside in an external system and be connected via a communication line, and a plurality of the systems shown in FIG. 17 may be connected to one another via communication lines so as to operate in cooperation. The apparatus may also be incorporated in a copying machine, a fax machine, a scanner, a printer, a multifunction machine (an image processing apparatus having two or more of the functions of a scanner, a printer, a copying machine, a fax machine, and the like), or similar equipment.

In the description of the above embodiment, where a comparison with a predetermined value is expressed as "greater than or equal to", "less than or equal to", "greater than", or "less than", it may instead be read as "greater than", "less than", "greater than or equal to", or "less than or equal to", respectively, provided that no contradiction arises in the combination.
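The interchangeability of inclusive and exclusive comparisons can be illustrated with a short Python sketch. The helper name `passes_threshold`, its `inclusive` flag, and the sample certainty values are hypothetical and serve only to show that the choice affects values lying exactly on the threshold.

```python
import operator


def passes_threshold(value, threshold, inclusive=True):
    """Return True if value clears the threshold.

    inclusive=True uses "greater than or equal to";
    inclusive=False uses strict "greater than".
    """
    compare = operator.ge if inclusive else operator.gt
    return compare(value, threshold)


# Hypothetical certainty factors compared against a threshold of 80.
certainties = [79, 80, 81]
inclusive_result = [passes_threshold(c, 80, inclusive=True) for c in certainties]
exclusive_result = [passes_threshold(c, 80, inclusive=False) for c in certainties]
```

Only the middle value, which equals the threshold, is treated differently by the two variants, so either convention can be adopted as long as it is applied consistently.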

The program described above may be provided stored on a recording medium, or may be provided by communication means. In that case, for example, the program described above may be regarded as an invention of a "computer-readable recording medium on which a program is recorded".
A "computer-readable recording medium on which a program is recorded" refers to a computer-readable recording medium on which a program is recorded and which is used for installing, executing, and distributing the program.
Examples of the recording medium include: digital versatile discs (DVD), such as "DVD-R, DVD-RW, DVD-RAM, etc.", which are standards established by the DVD Forum, and "DVD+R, DVD+RW, etc.", which are standards established by DVD+RW; compact discs (CD), such as read-only memory (CD-ROM), CD recordable (CD-R), and CD rewritable (CD-RW); Blu-ray (registered trademark) Disc; magneto-optical disk (MO); flexible disk (FD); magnetic tape; hard disk; read-only memory (ROM); electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); flash memory; random access memory (RAM); and SD (Secure Digital) memory card.
The program, or a part of it, may be recorded on the recording medium for storage, distribution, and the like. It may also be transmitted by communication, using a transmission medium such as a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, an extranet, or the like, a wireless communication network, or a combination of these, and it may be carried on a carrier wave.
Furthermore, the program may be a part of another program, or may be recorded on a recording medium together with a separate program. It may also be divided and recorded across a plurality of recording media. It may be recorded in any form, such as compressed or encrypted, as long as it can be restored.

DESCRIPTION OF SYMBOLS
100 ... Image processing apparatus
110 ... Image reception module
120 ... Binarization module
122 ... Binarized image layer
130 ... Starting character search module
140 ... Character string search module
150 ... Character string synthesis module

Claims

An image processing apparatus comprising:
a plurality of binarization means for binarizing an image;
selection means for selecting a pixel block based on the possibility that the pixel block in each binarized image binarized by the binarization means is a character;
extraction means for searching for adjacent characters from the pixel block selected by the selection means and extracting a character string; and
combining means for combining the character strings extracted by the extraction means.

The image processing apparatus according to claim 1, wherein the selection means selects a pixel block for which the certainty factor of the character, obtained as a result of character recognition on each of the binarized images, is higher than, or equal to or higher than, a predetermined first threshold, and for which the complexity of the character is higher than, or equal to or higher than, a second threshold.

The image processing apparatus according to claim 2, wherein the complexity is any one of the number of alternations in a circumscribed rectangle of the pixel block and the number of strokes of the character obtained as a character recognition result, or a combination thereof.

The image processing apparatus according to claim 1, wherein the selection means selects a pixel block for which the certainty factor of the character, obtained as a result of character recognition on each of the binarized images, is higher than, or equal to or higher than, a predetermined first threshold, and for which the character is not a predetermined character.

The image processing apparatus according to any one of claims 1 to 4, wherein the extraction means sets a rectangle corresponding to the size of a circumscribed rectangle of a first pixel block selected by the selection means, performs character recognition within the rectangle in each binarized image around the circumscribed rectangle, takes a combination of pixel blocks for which the certainty factor of the character recognition result is higher than, or equal to or higher than, a third threshold as a second character adjacent to the first character, and then extracts the character string by treating the second character as the first character.

The image processing apparatus according to claim 5, wherein the extraction means performs the character recognition after excluding combinations of pixel blocks that match a predetermined condition.

The image processing apparatus according to claim 6, wherein the predetermined condition is any one of, or a combination of, the following: a combination including pixel blocks of different colors; a combination including different stroke widths; and a combination including a stroke width different from the stroke width of the character in the first pixel block.

The image processing apparatus according to claim 1, wherein the combining means generates a character string image by logical sum (OR) synthesis of the character string images in the binarized images, or counts occurrences for each character code of each character corresponding to the character string images in the binarized images and generates a character string image based on the result of the counting.

The image processing apparatus according to any one of the above claims, wherein the binarization means performs binarization using a plurality of different thresholds, binarization using a plurality of different thresholds after inverting the image, or a combination thereof.

An image processing program for causing a computer to function as:
a plurality of binarization means for binarizing an image;
selection means for selecting a pixel block based on the possibility that the pixel block in each binarized image binarized by the binarization means is a character;
extraction means for searching for adjacent characters from the pixel block selected by the selection means and extracting a character string; and
combining means for combining the character strings extracted by the extraction means.