JP2018097707A - Information processor, character recognition method, computer program, and storage medium

Info

Publication number: JP2018097707A
Application number: JP2016242985A
Authority: JP
Inventors: Tomotoshi Kanatsu (金津 知俊)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-12-15
Filing date: 2016-12-15
Publication date: 2018-06-21

Abstract

PROBLEM TO BE SOLVED: To provide an information processor that performs character recognition of a character string included in a plurality of photographed images whose quality is partially deteriorated.

SOLUTION: An information processor 100 generates a first recognition graph from a first character pixel block extracted from a first image obtained by photographing a character string and a first character element classified from the first character pixel block. The information processor 100 generates a second recognition graph from a second character pixel block extracted from a second image obtained by photographing the character string and a second character element classified from the second character pixel block. On the basis of a determination as to whether the first character pixel block and the second character pixel block correspond to the same character string, the information processor 100 generates an integrated recognition graph by integrating the first recognition graph and the second recognition graph, and recognizes the character string on the basis of the integrated recognition graph.

SELECTED DRAWING: Figure 9

Description

The present invention relates to an information processing apparatus having a character recognition function for recognizing characters from a plurality of images obtained from a moving image or the like.

A handheld information processing apparatus with a camera function, such as a smartphone or a tablet terminal, can photograph a subject such as a sheet of paper or a signboard and, by character string recognition processing, recognize a character string printed on the subject from the captured image. In such photography, degradation of the captured image caused by camera shake, autofocus error, or the like affects the accuracy of character recognition. Patent Literature 1 discloses an image processing apparatus that captures a moving image of a subject on which characters are printed, performs character recognition on the same character string in a plurality of captured images, and integrates the results to obtain a character string recognition result. By adopting, for each target character, the recognition result that occurs most often, this image processing apparatus excludes recognition results from defective captured images and achieves highly accurate character recognition through the effect of majority voting. Patent Literature 2 discloses a character recognition method that uses context information about the characters to be recognized, such as N-grams, which express how frequently adjacent characters occur together, and a dictionary that defines the character string patterns to be recognized. This method achieves highly accurate character recognition by using the context information when selecting a character string recognition result from the plurality of classification candidates produced by classifying the image of a recognition target region as a character. The method can also obtain a recognition result for the character string by integrating sets of recognition target regions extracted from the input image by a plurality of segmentation methods and then applying the context information.

Patent Literature 1: JP 2010-218061 A
Patent Literature 2: US Publication No. 2015/0356365

When an information processing apparatus extracts the regions of the characters to be recognized from a low-quality captured image, it may extract several characters as a single region, or split one character into several regions. In that case, even if the image processing apparatus of Patent Literature 1 tries to collect per-character recognition results from the plurality of captured images obtained from a moving image, the inappropriate extraction of character regions makes it difficult to associate the recognition results of each character with one another. As a result, it cannot gather enough recognition results for majority voting to take effect, and highly accurate character recognition is not possible. Alternatively, the user has to keep shooting for a long time until a high-quality captured image is obtained. The character recognition method of Patent Literature 2, which uses context information, remains effective even when it is difficult to cut character regions out of a captured image one character at a time, but it processes only a single captured image. Consequently, when there is a set of character regions, including segmentation errors, taken from the plurality of captured images that make up a moving image, it is difficult to obtain a character string recognition result from the set as a whole by using context information.

The present invention has been made in view of the above problem, and its main object is to provide an information processing apparatus capable of performing, with high accuracy, character recognition of a character string included in a plurality of captured images that are partially of low quality.

The information processing apparatus of the present invention comprises: acquisition means for acquiring captured images obtained by photographing a character string printed on a subject; extraction means for extracting character pixel blocks from the character string portion of a captured image; classification means for classifying character elements on the basis of the character pixel blocks; generation means for generating a recognition graph from the character pixel blocks and the classification results of the character elements; and recognition means for recognizing the character string on the basis of the recognition graph. The generation means generates a first recognition graph from a first character pixel block extracted from a first image among the plurality of captured images and a first character element classified from the first character pixel block, and generates a second recognition graph from a second character pixel block extracted from a second image, different from the first image, among the plurality of captured images and a second character element classified from the second character pixel block. The recognition means generates an integrated recognition graph by integrating the first recognition graph and the second recognition graph on the basis of a determination as to whether the first character pixel block and the second character pixel block correspond to the same character string, and recognizes the character string on the basis of the integrated recognition graph.

According to the present invention, recognizing a character string on the basis of the integrated recognition graph makes it possible to perform highly accurate character recognition of a character string included in a plurality of captured images that are partially of low quality.

(a) and (b) are diagrams showing an example of the external appearance of the information processing apparatus.
Functional block diagram of the information processing apparatus.
Flowchart showing the flow of the character string recognition process.
Flowchart showing the flow of the process of S308.
Explanatory diagram of the character pixel block extraction process and the character element generation process.
Diagram illustrating an example of a recognition graph.
Diagram illustrating an example of a recognition graph to which context information is applied.
Flowchart showing the flow of the recognition graph update process.
Explanatory diagram of a specific example to which the character string recognition process is applied.
(a) to (f) are explanatory diagrams of specific examples of recognition graphs.
Explanatory diagram of a modification of the character string recognition process.

Hereinafter, embodiments will be described in detail with reference to the drawings.

(Configuration of information processing device)
FIG. 1 shows an example of the external appearance of the information processing apparatus of the present embodiment. The information processing apparatus 100 has a plate-like shape, with an image pickup device 101 on one surface (FIG. 1(a)) and a display device 102 and an operation button 103 on the opposite surface (FIG. 1(b)). The information processing apparatus 100 is a computer that internally includes a communication unit 104 and a control unit 105. It is realized by, for example, a smartphone, a tablet terminal, or a high-function camera. The control unit 105 includes a CPU (Central Processing Unit), a memory, and the like, and controls the operation of the information processing apparatus 100.

The image pickup device 101 captures a real-world scene and acquires image data representing the captured image. The image pickup device 101 includes, for example, a lens and an image sensor. The display device 102 is an output interface that lets the user view the image data acquired by the image pickup device 101. The display device 102 is, for example, a liquid crystal display. The operation button 103 is an input interface with which the user instructs the information processing apparatus 100 to start shooting, end shooting, and so on. The operation button 103 is a mechanical or pressure-sensitive physical button. When the display device 102 is a touch panel, the display device 102 may provide the function of the operation button 103. The communication unit 104 is a communication interface that performs wireless communication. The communication unit 104 controls communication over a communication network between the information processing apparatus 100 and an external apparatus such as a server.

The image pickup device 101 can shoot a moving image by photographing a subject a plurality of times in succession. The captured moving image is displayed on the display device 102 immediately, so the user of the information processing apparatus 100 can view the subject being photographed at any time. The information processing apparatus 100 has a function of performing character recognition of a character string included in a captured image. The information processing apparatus 100 may display the character string recognition result obtained by character recognition on the display device 102 in association with the captured image. The information processing apparatus 100 may also transmit the obtained character string recognition result to an external apparatus via the communication unit 104.

FIG. 2 is a functional block diagram of the information processing apparatus 100. The information processing apparatus 100 realizes each function by having, for example, the CPU provided in the control unit 105 execute a predetermined computer program. The information processing apparatus 100 functions as a captured image acquisition unit 201, a captured image processing unit 202, a display control unit 203, a character string detection unit 204, a character string recognition unit 205, and a character string information storage unit 206.

The captured image acquisition unit 201 acquires image data of captured images from the image pickup device 101 at a predetermined capturing interval. The capturing interval corresponds to, for example, 30 or 60 captures per second. The image data is sent to the captured image processing unit 202 and the display control unit 203.

The captured image processing unit 202 performs image processing on the image data of the captured images that it receives from the captured image acquisition unit 201 at the predetermined capturing interval. The captured image processing unit 202 of the present embodiment performs at least the following three processes.
(Process 1) Correct the image data into a form suitable for processing by the display control unit 203, the character string detection unit 204, and the character string recognition unit 205.
(Process 2) Sort the image data into images that are suitable and images that are unsuitable for processing by the display control unit 203, the character string detection unit 204, and the character string recognition unit 205.
(Process 3) If the same subject has moved between the selected images, calculate the amount of movement.

The display control unit 203 generates a display image to be displayed on the display device 102 and causes the display device 102 to display it. The display image is, for example, the captured image represented by the image data acquired from the captured image acquisition unit 201. By displaying the display image on the display device 102 at the same time interval as the capturing interval, the display control unit 203 lets the user check what is being shot and the shooting conditions. The display image may have been processed by the captured image processing unit 202 to make it easier for the user to view. A character string recognition result may also be added to or superimposed on the display image.

On the basis of the image data processed by the captured image processing unit 202, the character string detection unit 204 detects the regions of the character string images in the captured image that are to be subjected to character recognition. The character string recognition unit 205 performs known character string recognition processing within each character string image region detected by the character string detection unit 204 and generates a character string recognition result consisting of a sequence of character codes. The character string recognition unit 205 performs this processing on the basis of the image data processed by the captured image processing unit 202.

The character string information storage unit 206 stores the character string image regions detected by the character string detection unit 204 as character string information. A character string image region is represented by, for example, a rectangle, and the character string information storage unit 206 stores the coordinates of that rectangle as the region. When the character string detection unit 204 detects a plurality of character string image regions, the character string information storage unit 206 stores the coordinates of each region. The character string information storage unit 206 also stores the character string recognition result that the character string recognition unit 205 generates for a character string, adding it to the corresponding character string information. The contents of the character string information storage unit 206 may be transmitted from the communication unit 104 to an external apparatus as necessary.
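As a rough illustration of the storage described above, the following Python sketch keeps one record per detected region and appends recognition results to it. The class and field names (`StringInfo`, `StringInfoStore`, `detected_in_frame`) are hypothetical and are not taken from the patent; the four-corner representation is one possible way to hold the rectangle coordinates.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class StringInfo:
    """One detected character string region (hypothetical record layout)."""
    corners: List[Point]                      # four corners of the rectangle
    detected_in_frame: Optional[int] = None   # index t of the detection image, if tracked
    results: List[str] = field(default_factory=list)  # recognition results added over time


class StringInfoStore:
    """Minimal in-memory stand-in for the character string information storage unit."""

    def __init__(self) -> None:
        self.entries: List[StringInfo] = []

    def add_region(self, corners: List[Point], frame: Optional[int] = None) -> StringInfo:
        # Store the coordinates of a newly detected region.
        if len(corners) != 4:
            raise ValueError("a rectangle needs exactly four corners")
        entry = StringInfo(corners=list(corners), detected_in_frame=frame)
        self.entries.append(entry)
        return entry

    def add_result(self, entry: StringInfo, text: str) -> None:
        # Add a recognition result to the string information of an existing region.
        entry.results.append(text)
```

Keeping the recognition results inside the same record as the coordinates mirrors the statement that the recognition result is stored in addition to the character string information.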

(Character string recognition processing)
FIG. 3 is a flowchart showing the flow of character string recognition processing by the information processing apparatus 100 configured as described above. In what follows, the flowcharts are assumed to be realized by the CPU executing a control program. When instructed to start the character string recognition processing, the information processing apparatus 100 activates the image pickup device 101 and starts shooting at the predetermined capturing interval. The image pickup device 101 thereby shoots a moving image made up of successive captured images.

The captured image acquisition unit 201 successively acquires, at the predetermined capturing interval, image data representing the images captured by the image pickup device 101 (S301). In other words, the character string recognition processing of FIG. 3 is performed repeatedly on the captured images that arrive one after another at the predetermined capturing interval. Hereinafter, the image data representing the image captured at time t is referred to as image t. In the character string recognition processing, character recognition is performed on image t or on an image obtained by processing image t in the captured image processing unit 202.

The captured image processing unit 202 performs (Process 1) above on image t. Specifically, the captured image processing unit 202 applies image processing to image t, such as known filter processing (for example, blur correction) and resolution conversion (S302). This processing turns image t into image data suitable for display on the display device 102, character string detection, and character string recognition.

The captured image processing unit 202 performs (Process 2) above on the processed image t. That is, it determines whether the image quality of the processed image t is suitable for the processing performed by the character string detection unit 204 and the character string recognition unit 205 (S303). This determination uses an evaluation value such as the total amount or the density of edges contained in image t. If the evaluation value is lower than a predetermined threshold, image t is judged to be a blurred, low-quality captured image, that is, an unsuitable image from which accurate character string detection cannot be expected.
Alternatively, the sum or the average of the pixel differences from the immediately preceding captured images (image t-1, image t-2, and so on) may be used as the evaluation value. When this evaluation value is lower than a threshold, image t is judged to be a captured image with little shake and therefore suitable for character recognition. If the attitude of the information processing apparatus 100 can be detected by an acceleration sensor or the like, the detection result may also be used as the evaluation value. In this case, an image t captured while the information processing apparatus 100 is stationary is judged to be suitable for character recognition. The processes of S302 and S303 in the captured image processing unit 202 may be performed in either order.
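The two image-based evaluation values described for S303 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: images are grayscale lists of rows, the edge measure is a simple neighbour difference rather than a specific edge detector, and the threshold values and function names are hypothetical, not values from the patent.

```python
from typing import List, Optional

Image = List[List[int]]  # grayscale rows with values 0..255


def edge_density(img: Image) -> float:
    """Sum of absolute horizontal and vertical neighbour differences per pixel."""
    h, w = len(img), len(img[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(img[y][x + 1] - img[y][x])
            if y + 1 < h:
                total += abs(img[y + 1][x] - img[y][x])
    return total / (h * w)


def mean_frame_difference(cur: Image, prev: Image) -> float:
    """Average absolute pixel difference between two frames of equal size."""
    h, w = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - prev[y][x]) for y in range(h) for x in range(w))
    return total / (h * w)


def is_suitable(cur: Image, prev: Optional[Image] = None,
                edge_threshold: float = 10.0,      # assumed value
                motion_threshold: float = 8.0) -> bool:  # assumed value
    # Few edges: the frame is likely blurred, so it is unsuitable.
    if edge_density(cur) < edge_threshold:
        return False
    # Large difference from the previous frame: likely shake, so it is unsuitable.
    if prev is not None and mean_frame_difference(cur, prev) >= motion_threshold:
        return False
    return True
```

Note the opposite directions of the two tests: a low edge value marks a frame as unsuitable, whereas a low frame-difference value marks it as suitable.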

When the image quality of image t is suitable for the processing performed by the character string detection unit 204 and the character string recognition unit 205 (S303: Y), the captured image processing unit 202 performs (Process 3) above on image t. When the same character string that is a target of character recognition has moved between image t and an earlier image t-x (x is an integer of 1 or more), the captured image processing unit 202 calculates the amount of movement. The character strings that are targets of character recognition are, specifically, the character strings stored in the character string information storage unit 206. These are character strings that were detected in the past, by the character string recognition processing described later, in captured images preceding image t. The captured image processing unit 202 updates the character string information representing the character string image regions on the basis of the calculated amount of movement (S304). The image t-x used to calculate the amount of movement is, for example, always the immediately preceding captured image, with x = 1. In that case, S304 updates the coordinates of each character string every time according to the movement from image t-1 to image t, so that the coordinates can subsequently be treated as the position of the character string in image t. Alternatively, the amount of movement may be calculated between image t and an image t' on which the character string detection process described later was performed. In this case, the coordinates at the time of detection in image t' are kept in the character string information as the original character string coordinates, and the coordinates updated by the amount of movement are used as the character string coordinates in image t. Furthermore, the amount of movement may be calculated against a different detection image for each character string in order to update the character string coordinates.

The amount of movement can be calculated by a known method based on optical flow detection. For example, it is calculated using local feature point detection such as the known SIFT (Scale-Invariant Feature Transform) or ORB (Oriented FAST and Rotated BRIEF). Local feature point detection yields pairs of feature points common to the two images, and the distance between the points of those pairs that coincide with or lie near the target subject is taken as the amount of movement of the subject. Alternatively, corresponding-point matching is performed between feature points obtained from the whole of each of the two images, a homography transformation matrix between the images is estimated, and the amount of movement is calculated by applying the homography transformation to the coordinates of the target character string.
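The two ways of updating string coordinates described above can be sketched as follows, assuming that feature detection and matching (SIFT, ORB, or similar) and homography estimation have already been done elsewhere. The function names are hypothetical, and taking the median displacement of the matched pairs is an assumption made here for robustness; the text only says that the distance between matched pairs near the target is used.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]  # 3x3 homography, row-major


def translation_from_pairs(pairs: List[Tuple[Point, Point]]) -> Point:
    """Displacement (dx, dy) from matched (previous, current) feature points.

    The pairs are assumed to be those on or near the target string region.
    """
    dx = median(cur[0] - prev[0] for prev, cur in pairs)
    dy = median(cur[1] - prev[1] for prev, cur in pairs)
    return (dx, dy)


def shift_corners(corners: List[Point], shift: Point) -> List[Point]:
    """Move every corner of a string rectangle by the same displacement."""
    return [(x + shift[0], y + shift[1]) for x, y in corners]


def apply_homography(H: Matrix3, corners: List[Point]) -> List[Point]:
    """Map string corners from the earlier image into image t with a homography."""
    moved = []
    for x, y in corners:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        moved.append((u / w, v / w))  # perspective division
    return moved
```

When the original coordinates from the detection image t' are kept, `apply_homography` would be applied to those originals with the homography from t' to t, rather than accumulating frame-to-frame updates.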

The captured image processing unit 202 determines whether image t is to be subjected to the character string detection process (S305). For example, if n images are captured during the time the character string detection unit 204 needs for one detection process, then image n, image 2n, image 3n, and so on are judged to be targets of the detection process, and all other captured images are judged not to be targets. Alternatively, the processing state of the character string detection unit 204 is monitored in real time, and the current image t is judged to be a target of the detection process only when no previously acquired image is being processed. The evaluation value from S303 may also be added to the conditions for deciding whether an image is a target of the detection process.
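The two scheduling policies for S305 can be illustrated with the short sketch below. The names `is_detection_frame` and `DetectorGate` are hypothetical, and the gate is a single-threaded stand-in for monitoring the detector's processing state; a real implementation would need synchronization if detection runs on another thread.

```python
def is_detection_frame(t: int, n: int) -> bool:
    """Policy 1: one detection takes about n capture intervals,
    so only frames n, 2n, 3n, ... are sent to detection."""
    return t > 0 and t % n == 0


class DetectorGate:
    """Policy 2: accept a frame only while no earlier frame is being processed."""

    def __init__(self) -> None:
        self.busy = False

    def try_start(self) -> bool:
        # Returns True and marks the detector busy if it was idle.
        if self.busy:
            return False
        self.busy = True
        return True

    def finish(self) -> None:
        # Called when the detection of the accepted frame has completed.
        self.busy = False
```

For example, with n = 3 the first policy selects frames 3, 6, and 9 out of frames 1 to 9, while the second policy adapts automatically if a detection run takes longer than expected.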

When image t is a target of the character string detection process (S305: Y), the character string detection unit 204 detects the character strings present in image t (S306). The detection result is saved in the character string information storage unit 206 as character string coordinate information. The character string coordinate information is, for example, a list of the coordinates of the four corners of a rectangle enclosing the character string region. The character string coordinate information may be associated with the information that it was detected in image t. When the character string information storage unit 206 has no free space and holds character string information detected in past captured images, the past information may be erased so that only the character string information detected in image t is saved. Alternatively, the character string information detected in image t may be compared with the past character string information; if a matching entry exists, the coordinate information of that entry is updated, and if not, the detected information is saved as new character string information.
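The update-or-insert alternative described above can be sketched as follows. The text does not specify how a detected string is matched against past string information, so this sketch assumes, purely for illustration, axis-aligned rectangles and a match when the intersection over union (IoU) is at least 0.5. The dictionary keys and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def merge_detections(stored: List[Dict], detected: List[Rect],
                     frame: int, min_iou: float = 0.5) -> None:
    """Update matching stored entries in place; append unmatched detections."""
    existing = list(stored)  # match only against entries present before this call
    for rect in detected:
        best = max(existing, key=lambda e: iou(e["rect"], rect), default=None)
        if best is not None and iou(best["rect"], rect) >= min_iou:
            best["rect"] = rect      # same string: refresh its coordinates
            best["frame"] = frame
        else:
            stored.append({"rect": rect, "frame": frame, "results": []})
```

Updating an entry in place keeps any recognition results already accumulated for that string, which is what makes it possible to combine results across frames.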

The character string detection unit 204 detects the character strings in image t by a known technique. For example, the character string detection unit 204 binarizes image t and joins connected black pixels by labeling to extract connected pixel blocks (connected components). A "connected pixel block" is the region of a group of pixels that are connected to one another. Among the extracted connected pixel blocks, the character string detection unit 204 takes those judged likely to be characters from the size of their circumscribed rectangle and so on as character pixel blocks, and merges each with other character pixel blocks nearby to extract a character string region. Each extracted character string is given position information for the character string recognition processing described later.
Instead of binarizing image t, the character string detection unit 204 may extract connected pixel blocks by, for example, connecting pixels of similar luminance or color. Alternatively, the character string detection unit 204 may perform edge extraction and extract connected pixel blocks from connected edge pixels. To speed up the character string detection process, the character string detection unit 204 may also extract connected pixel blocks from a reduced version of image t and detect character strings from them.
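A minimal pure-Python sketch of the binarize, label, filter, and merge steps described above follows. It is one simple instance of a known technique, not the patent's specific detector: the fixed binarization threshold, 4-connectivity, size limits, gap limit, and all function names are assumptions chosen for illustration.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def connected_boxes(img: List[List[int]], threshold: int = 128) -> List[Box]:
    """Binarize (dark pixel = value < threshold) and return the circumscribed
    rectangle of every 4-connected block of dark pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or img[sy][sx] >= threshold:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            l, t, r, b = sx, sy, sx, sy
            while queue:  # breadth-first labeling of one block
                x, y = queue.popleft()
                l, t, r, b = min(l, x), min(t, y), max(r, x), max(b, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] \
                            and img[ny][nx] < threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((l, t, r, b))
    return boxes


def looks_like_character(box: Box, min_size: int = 2, max_size: int = 40) -> bool:
    """Keep blocks whose circumscribed rectangle has a plausible size."""
    bw, bh = box[2] - box[0] + 1, box[3] - box[1] + 1
    return min_size <= bw <= max_size and min_size <= bh <= max_size


def group_into_strings(boxes: List[Box], max_gap: int = 2) -> List[Box]:
    """Merge character boxes that overlap vertically and are horizontally close."""
    strings: List[Box] = []
    for box in sorted(boxes):  # left to right
        for i, s in enumerate(strings):
            overlaps = box[1] <= s[3] and s[1] <= box[3]
            if overlaps and box[0] - s[2] <= max_gap:
                strings[i] = (min(s[0], box[0]), min(s[1], box[1]),
                              max(s[2], box[2]), max(s[3], box[3]))
                break
        else:
            strings.append(box)
    return strings
```

A typical call chain would be `group_into_strings([b for b in connected_boxes(img) if looks_like_character(b)])`, which yields one rectangle per detected character string region.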

After character string detection, or when the image t is not a target of the character string detection process (S305: N), the character string recognition unit 205 determines whether the image t is a target of the character string recognition process (S307). As in S305, this determination is based on the time the character string recognition unit 205 needs for one recognition pass and the number of captured images acquired during that time. The character string recognition unit 205 may add, as a further condition, the evaluation value used in S303 that indicates whether the image t is suitable for the character string recognition process. The character string recognition unit 205 may also make this determination in conjunction with the determination in S305, synchronizing with the number of times the character string detection process has actually been performed so that n character string recognition passes (n > 1) are performed per character string detection pass. When the character string information storage unit 206 holds no detected character string information, the character string recognition unit 205 determines that the image t is not a target of the character string recognition process.

画像ｔが文字列認識処理の実行対象である場合（Ｓ３０７：Y）、文字列認識部２０５は、画像ｔについての文字列認識処理を実行する（Ｓ３０８）。処理対象は、文字列情報記憶部２０６に記憶された文字列情報であり、その処理結果は該当する文字列情報へ追加される。文字列認識処理の詳細については後述する。 When the image t is an execution target of the character string recognition process (S307: Y), the character string recognition unit 205 executes the character string recognition process for the image t (S308). The processing target is character string information stored in the character string information storage unit 206, and the processing result is added to the corresponding character string information. Details of the character string recognition process will be described later.

After character string recognition, the display on the display 102 is updated (S309). The display is also updated (S309) when the image quality of the image t is not suitable for the processing performed by the character string detection unit 204 and the character string recognition unit 205 (S303: Y), or when the image t is not a target of the character string recognition process (S307: N). The display control unit 203 updates the display on the display 102 with the captured image being processed.
At this time, the display control unit 203 may add the character string information stored in the character string information storage unit 206 to the display. For example, based on the detection result of each character string obtained in S306, the display control unit 203 may superimpose a frame on the captured image so that the user can see where each character string was detected. The display control unit 203 may also display the character string recognition result added in S308 alongside the frame or at a specific position on the display screen. At the display update timing, the information processing apparatus 100 may use the contents of the character string information storage unit 206 for purposes other than display. For example, the information processing apparatus 100 may make them available to another running application, or may provide the character string recognition result to a program running on another apparatus via the communication unit 104.

図４は、Ｓ３０８の文字列認識処理の流れを表すフローチャートである。ここでは、文字列認識部２０５が、文字列情報記憶部２０６に記憶された一つの文字列ｉを対象として、画像ｔを用いて行う文字列認識処理について説明する。実際には、文字列認識部２０５は、文字列情報記憶部２０６に記憶されるすべての文字列について、文字列認識処理を繰り返し実行する。 FIG. 4 is a flowchart showing the flow of character string recognition processing in S308. Here, a character string recognition process performed by the character string recognition unit 205 using the image t for one character string i stored in the character string information storage unit 206 will be described. Actually, the character string recognition unit 205 repeatedly executes the character string recognition process for all the character strings stored in the character string information storage unit 206.

文字列認識部２０５は、文字列ｉとして認識されるべき文字列画像を画像ｔから取得する（Ｓ４０１）。例えば文字列認識部２０５は、文字列を包含する矩形の４隅の座標情報から該４点に外接する矩形の範囲を導出し、該範囲の座標を取得する。なお、Ｓ３０４の処理により、文字列ｉの座標情報は画像ｔにおける座標に更新されている。 The character string recognition unit 205 acquires a character string image to be recognized as the character string i from the image t (S401). For example, the character string recognition unit 205 derives a rectangular range circumscribing the four points from the coordinate information of the four corners of the rectangle that includes the character string, and acquires the coordinates of the range. Note that the coordinate information of the character string i is updated to the coordinates in the image t by the process of S304.

文字列認識部２０５は、取得した文字列画像から少なくとも文字の一部を構成する文字画素塊を抽出する（Ｓ４０２）。文字列認識部２０５は、Ｓ３０６の処理と同様の方法により、文字画素塊を抽出することができる。文字列認識部２０５は、抽出した文字画素塊の集合から文字要素を生成する（Ｓ４０３）。「文字要素」は、１又は複数の隣接する文字画素塊をグループ化したものである。 The character string recognition unit 205 extracts a character pixel block constituting at least a part of the character from the acquired character string image (S402). The character string recognizing unit 205 can extract a character pixel block by the same method as the process of S306. The character string recognition unit 205 generates a character element from the set of extracted character pixel blocks (S403). A “character element” is a group of one or a plurality of adjacent character pixel blocks.

FIG. 5 is an explanatory diagram of the character pixel block extraction process and the character element generation process. FIG. 5 shows character pixel blocks 501 to 506 being extracted from a character string image 500, and character elements 511 to 521 being generated from the character pixel blocks 501 to 506. The character string image 500 is obtained from a captured image of a subject, such as paper, on which the character string is printed. In the example of FIG. 5, the character string image 500 is obtained from a captured image of the three characters 「文化的」 printed on the subject. Based on the character string image 500, the character string recognition unit 205 extracts six character pixel blocks 501 to 506 as blocks of connected black pixels. The character string recognition unit 205 generates character elements 511 to 516 as the character elements corresponding to the individual character pixel blocks 501 to 506. In addition, for each of the character pixel blocks 501 to 506, the character string recognition unit 205 groups it with one or more character pixel blocks that adjoin it to the right within a threshold distance, so that character elements 511 to 521 are generated in total. The width of each character element is limited so that it does not become too large. For example, the character string recognition unit 205 does not generate a character element whose width after grouping exceeds, say, 1.2 times the character string height. The "character string height" is, for example, a value estimated from the maximum height of the character pixel blocks.

図５の例では、文字画素塊５０２と文字画素塊５０３とをグループ化した文字要素５１７が生成される。文字画素塊５０３と文字画素塊５０４とをグループ化した文字要素５１８が生成される。文字画素塊５０４と文字画素塊５０５とをグループ化した文字要素５１９が生成される。文字画素塊５０４と、文字画素塊５０５及び文字画素塊５０６とをグループ化した文字要素５２０が生成される。文字画素塊５０５と文字画素塊５０６とをグループ化した文字要素５２１が生成される。 In the example of FIG. 5, a character element 517 in which the character pixel block 502 and the character pixel block 503 are grouped is generated. A character element 518 in which the character pixel block 503 and the character pixel block 504 are grouped is generated. A character element 519 in which the character pixel block 504 and the character pixel block 505 are grouped is generated. A character element 520 in which the character pixel block 504, the character pixel block 505, and the character pixel block 506 are grouped is generated. A character element 521 in which the character pixel block 505 and the character pixel block 506 are grouped is generated.
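The grouping rule of FIG. 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `generate_character_elements`, the `(id, left, right)` block tuples, the coordinate values, the gap threshold of 2, and the string height of 10 are all hypothetical and were chosen so that the result reproduces the groupings 517 to 521 listed above.

```python
def generate_character_elements(blocks, string_height, max_gap, max_width_ratio=1.2):
    """Group each block with its right-hand neighbours, subject to a width limit.

    blocks: list of (block_id, left, right) tuples (hypothetical format).
    Returns a list of tuples of block ids, one tuple per character element.
    """
    ordered = sorted(blocks, key=lambda b: b[1])  # ascending left-edge coordinate
    limit = max_width_ratio * string_height
    elements = []
    for i, (block_id, left, right) in enumerate(ordered):
        group = [block_id]
        elements.append(tuple(group))  # every single block is itself an element
        group_right = right
        for next_id, next_left, next_right in ordered[i + 1:]:
            if next_left - group_right > max_gap:
                break  # the next block is too far to the right
            new_right = max(group_right, next_right)
            if new_right - left > limit:
                break  # the grouped element would be too wide
            group.append(next_id)
            group_right = new_right
            elements.append(tuple(group))
    return elements

# Hypothetical coordinates for the six blocks of the string in FIG. 5.
blocks = [(501, 0, 10), (502, 12, 15), (503, 16, 22),
          (504, 24, 27), (505, 29, 32), (506, 33, 34)]
elements = generate_character_elements(blocks, string_height=10, max_gap=2)
grouped = {e for e in elements if len(e) > 1}
```

With these assumed coordinates, the width limit is what prevents blocks 501 and 502 from being grouped, even though the gap between them is within the threshold.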

The character string recognition unit 205 may at this stage exclude, from the character elements 511 to 521, those unlikely to correspond to a single character. For example, when one character pixel block is completely or mostly contained in another, the two blocks are unlikely to each correspond to a separate character. Likewise, when the character string image is known to be horizontally written Japanese, character pixel blocks that overlap in the vertical direction are unlikely to be separate characters. The character string recognition unit 205 may first merge two or more character pixel blocks in such a relationship into one character pixel block and then generate the character elements.

After generating the character elements, the character string recognition unit 205 classifies them (S404). For each generated character element, the character string recognition unit 205 performs classification by matching the pixel pattern within the range of the character element in the character string image against a character pattern dictionary learned in advance. The character string recognition unit 205 stores the classification result in association with the character element, as a character code corresponding to a category of the character pattern dictionary and a numerical value representing the reliability of the classification. The stored classification result for one character element may be a single character code or a plurality of character codes ranked by reliability.

The character string recognition unit 205 performs the classification by, for example, a known character recognition method. For example, the character string recognition unit 205 normalizes the size of the pixel pattern to be classified, divides it into blocks such as 7 × 7, and tallies the amount of black pixel edges in each block for each direction to generate a feature vector. When the tally is taken in four directions (vertical, horizontal, and the two diagonals), the 4 × 7 × 7 edge amounts form the feature vector of the pattern to be classified. The character string recognition unit 205 also stores a character pattern dictionary containing feature vectors obtained in the same way from the learning data of each character. During matching, the character string recognition unit 205 calculates the distance between the target feature vector and the feature vector of each character in the dictionary, and takes the single character with the smallest distance as the classification result. Alternatively, the character string recognition unit 205 produces a classification result with multiple candidates, namely the characters whose distance is at or below a threshold. Each classification result is given a reliability quantified from that distance. The character string recognition unit 205 may classify the character elements using a different feature vector extraction method or matching method. For example, the character string recognition unit 205 may perform the classification with a pattern learner such as a neural network.
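The matching step against the character pattern dictionary can be sketched as a nearest-neighbour search. The sketch below is an assumption-laden illustration, not the method of the character string recognition unit 205: the three-dimensional toy vectors stand in for the 4 × 7 × 7 edge features, and the mapping from distance to reliability, `1 / (1 + distance)`, is a hypothetical choice because the description does not specify one.

```python
import math

def classify(feature, dictionary, max_distance):
    """Return [(character, reliability), ...] for dictionary entries within max_distance.

    feature: feature vector of the character element (toy values here).
    dictionary: {character: reference feature vector}.
    Candidates are ranked by reliability, highest first.
    """
    candidates = []
    for char, reference in dictionary.items():
        distance = math.dist(feature, reference)  # Euclidean distance between vectors
        if distance <= max_distance:
            # Hypothetical reliability: closer vectors score nearer to 1.0.
            candidates.append((char, 1.0 / (1.0 + distance)))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates

# Toy dictionary with made-up three-dimensional feature vectors.
dictionary = {
    "文": [1.0, 0.0, 0.0],
    "交": [0.8, 0.3, 0.0],
    "化": [0.0, 1.0, 1.0],
}
result = classify([0.9, 0.1, 0.0], dictionary, max_distance=0.5)
```

Here `result` holds two ranked candidates, 「文」 followed by 「交」, while 「化」 is rejected because its distance exceeds the threshold.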

文字列認識部２０５は、現在の処理が文字列情報記憶部２０６に記憶される文字列ｉに対する初回の文字列認識処理であるか否かを判断する（Ｓ４０５）。文字列認識部２０５は、例えば後述の認識グラフの情報が対象の文字列情報に付加されているか否かによりこの判断を行う。付加されていない場合、文字列認識部２０５は、初回の文字列認識処理であると判断する。 The character string recognition unit 205 determines whether or not the current process is the first character string recognition process for the character string i stored in the character string information storage unit 206 (S405). The character string recognizing unit 205 makes this determination based on, for example, whether or not information of a recognition graph described later is added to the target character string information. If not added, the character string recognition unit 205 determines that this is the first character string recognition process.

In the case of the first character string recognition process (S405: Y), the character string recognition unit 205 newly creates a recognition graph for the character string i (S406). The recognition graph is created as a directed graph in which the character pixel blocks extracted from the character string image are the nodes and the classification results for the character elements are the weighted edges. The recognition graph is stored appended to the information of the character string i in the character string information storage unit 206.

The character string recognition unit 205 creates the recognition graph, for example, as follows. First, the character string recognition unit 205 generates a set {C} = {C1, ..., Cn} of n character pixel blocks from the character string i. Here, the character string i is assumed to be approximately horizontal, and C1, C2, ... are ordered by ascending left-edge coordinate of the character pixel blocks. The character string recognition unit 205 creates n graph nodes N1 to Nn corresponding to the character pixel blocks C1 to Cn, and a terminal node Ne corresponding to a virtual terminal character pixel block.

Next, the character string recognition unit 205 groups the character pixel blocks in the set {C} to generate a set of character elements {G} = {G1, ..., Gu}. For a character element Gt in the set {G}, let Ci be the character pixel block contained in Gt with the smallest left-edge coordinate, and let Cj be the character pixel block that lies to the right of Ci, is not contained in Gt, and has the smallest left-edge coordinate. The character string recognition unit 205 then generates, based on the classification result for the character element Gt, a weighted edge Eij from the node Ni to the node Nj, which correspond to Ci and Cj respectively. When the character element Gt has a plurality of classification candidates, the character string recognition unit 205 generates one edge Eij for each of them.

The character string recognition unit 205 sets, for each edge of the recognition graph, a weight based on the classification result of the corresponding character element. Consider the case where the character element Gt has three classification candidates: (character code A, reliability a), (character code B, reliability b), and (character code C, reliability c). In this case, the character string recognition unit 205 generates three distinct edges from the node Ni to the node Nj, denoted Eij(A, a), Eij(B, b), and Eij(C, c).

FIG. 6 shows an example of a recognition graph. This recognition graph is created with the character pixel blocks 501 to 506 of FIG. 5 as nodes and the character elements 511 to 521 as edges. The recognition graph has seven nodes in total: the nodes N1 to N6 corresponding to the character pixel blocks 501 to 506, and the terminal node Ne. Suppose that classifying the character element 511 yields three candidate characters, 「文」, 「交」, and 「父」, with reliabilities of 0.6, 0.3, and 0.2 respectively. As a result, three edges E12(文, 0.6), E12(交, 0.3), and E12(父, 0.2) are generated from the node N1 to the node N2. Edges based on the classification results of the character elements are generated in the same way to the right of the node N2. In FIG. 6, only one edge is drawn for each edge other than E12 to keep the figure simple; in practice, there may be multiple edges corresponding to multiple classification candidates.
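The edge-generation rule can be sketched in Python using the values of FIG. 6. The data layout (block indices 1 to n, edges as dictionaries, the string "e" for the terminal node Ne) and the function name `build_recognition_graph` are assumptions for illustration. The candidates for element 511 and the weight of E24 come from the description; the 「イ」 and 「的」 weights are made up. Sending an edge to Ne when no block remains to the right is inferred from the edge E4e of FIG. 7.

```python
def build_recognition_graph(n_blocks, classified_elements):
    """Build the weighted edges of a recognition graph.

    classified_elements: list of (block_indices, [(char, reliability), ...]),
    where block indices run from 1 to n_blocks in left-to-right order.
    Returns a list of edge dictionaries; "e" denotes the terminal node Ne.
    """
    edges = []
    for block_indices, candidates in classified_elements:
        members = set(block_indices)
        i = min(members)  # leftmost block Ci of the element
        # Cj: first block to the right of Ci that is not part of the element.
        j = next((k for k in range(i + 1, n_blocks + 1) if k not in members), "e")
        for char, reliability in candidates:
            # One edge per classification candidate.
            edges.append({"src": i, "dst": j, "char": char, "weight": reliability})
    return edges

classified_elements = [
    ((1,), [("文", 0.6), ("交", 0.3), ("父", 0.2)]),  # element 511, from FIG. 6
    ((2,), [("イ", 0.4)]),                            # hypothetical weight
    ((2, 3), [("化", 0.5)]),                          # element 517, edge E24
    ((4, 5, 6), [("的", 0.5)]),                       # element 520, hypothetical weight
]
edges = build_recognition_graph(6, classified_elements)
```

Only four of the eleven character elements are listed, to keep the example short.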

Having created the recognition graph, the character string recognition unit 205 applies context information to it (S407). "Applying context information" means changing the weight of each edge of the recognition graph based on the context information. "Context information" is information on the relationship between a character of interest in a character string and the characters before and after it. One example of context information is an N-gram, which defines the occurrence probability of N consecutive characters in a large body of text collected from books, Internet documents, and the like. The following describes an example in which a bigram (two-character frequency), a commonly used kind of N-gram, is applied to the recognition graph as the context information.

Suppose there are edges Eik(A, w1) and Ekj(B, w2) connected through a single node Nk, and the bigram information defines the consecutive occurrence probability of the character "A" followed by the character "B" as P. Then (Equation 1) changes the weights w1 and w2 of the two edges to w1' and w2' respectively.
w' = w + d  (when P ≥ th)  ... (Equation 1)
"d" is the value added to the weight. "th" is a constant serving as the threshold on the occurrence probability.

At every node of the recognition graph other than the two end nodes, the character string recognition unit 205 obtains from the bigram information, for every combination of an edge on the left of the node and an edge on its right, the consecutive occurrence probability of the two characters given by their classification results, and applies (Equation 1) to change the weights. When one edge can be combined with a plurality of edges through a given node, the character string recognition unit 205 selects only the pair of edges whose two characters have the highest occurrence probability in the bigram information and changes the weights of that pair. Alternatively, the character string recognition unit 205 may change the weights for all such combinations, or may distribute the added value among them in proportion to their occurrence probabilities.

An example of changing edge weights using the bigram information is described below for the recognition graph of FIG. 6. Suppose the bigram information defines the following consecutive occurrence probabilities for two-character sequences, and that in (Equation 1) the threshold is th = 0.001 and the correction value is d = 0.2.
「文化」 = 0.01
「文イ」 = 0.0001
「交化」 = 0.00005
「交イ」 = 0.0001
「父化」 = 0.003
「父イ」 = 0.0002

Here, the two-character sequences subject to change by (Equation 1), that is, those with a probability at or above the threshold th, are 「文化」 and 「父化」. The corresponding edges are combinations of E12 and E24, and E12 has edges for both the 「文」 and the 「父」 candidates. In this example, the character string recognition unit 205 changes only the two edges corresponding to 「文化」, which has the highest probability, so that only the combination E12(文, 0.6) and E24(化, 0.5) is changed to E12(文, 0.8) and E24(化, 0.7).
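The worked example above can be reproduced with a short Python sketch. It covers only the variant in which the single highest-probability pair at each inner node is boosted, and it simplifies that variant to one best pair per node. The edge dictionaries, the function name `apply_bigram`, and the weight of the 「イ」 edge are hypothetical; the bigram probabilities, th, and d are taken from the example.

```python
def apply_bigram(edges, bigram, th, d):
    """Apply w' = w + d to the best edge pair through each inner node when P >= th.

    edges: list of {"src", "dst", "char", "weight"} dictionaries.
    bigram: {two-character string: consecutive occurrence probability}.
    Returns a modified copy; the input edges are left untouched.
    """
    updated = [dict(e) for e in edges]
    # Inner nodes have at least one incoming and one outgoing edge.
    inner_nodes = {e["dst"] for e in updated} & {e["src"] for e in updated}
    for k in inner_nodes:
        best = None  # (probability, left edge, right edge)
        for left in (e for e in updated if e["dst"] == k):
            for right in (e for e in updated if e["src"] == k):
                p = bigram.get(left["char"] + right["char"], 0.0)
                if p >= th and (best is None or p > best[0]):
                    best = (p, left, right)
        if best is not None:
            best[1]["weight"] += d
            best[2]["weight"] += d
    return updated

bigram = {"文化": 0.01, "文イ": 0.0001, "交化": 0.00005,
          "交イ": 0.0001, "父化": 0.003, "父イ": 0.0002}
edges = [
    {"src": 1, "dst": 2, "char": "文", "weight": 0.6},
    {"src": 1, "dst": 2, "char": "交", "weight": 0.3},
    {"src": 1, "dst": 2, "char": "父", "weight": 0.2},
    {"src": 2, "dst": 3, "char": "イ", "weight": 0.4},  # hypothetical weight
    {"src": 2, "dst": 4, "char": "化", "weight": 0.5},
]
weights = {e["char"]: e["weight"] for e in apply_bigram(edges, bigram, th=0.001, d=0.2)}
```

After the call, the weights for 「文」 and 「化」 are approximately 0.8 and 0.7, and the other three edges keep their original weights. 「父化」 also exceeds the threshold but is not boosted because 「文化」 has the higher probability.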

FIG. 7 shows an example of the recognition graph after the context information has been applied. In FIG. 7, in addition to the correction for 「文化」, the same correction value is applied to E24 and E4e because the occurrence probability of 「化的」 in the bigram information is at or above the threshold. (Equation 1) is only an example. For instance, the weight may be reduced when the probability is at or below the threshold, or the context information may be applied by adding or multiplying a function of the probability, such as log(P) or exp(P).

After applying the context information to the recognition graph, the character string recognition unit 205 selects the optimal path in the resulting graph (S408). The "optimal path" is the path from the first node of the recognition graph to the terminal node for which concatenating the classification-result characters of the traversed edges gives a plausible recognition result for the whole character string i. To obtain the optimal path, the character string recognition unit 205, for example, sets up a plurality of paths covering all combinations of edges and selects the path that maximizes the sum of the edge weights divided by the number of edges traversed. Alternatively, the character string recognition unit 205 defines, as the penalty of an edge, a value based on, for example, the reciprocal of its weight, and selects as the optimal path the path with the smallest total penalty.
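The first selection criterion, the largest average edge weight, can be sketched by exhaustively enumerating the paths. This is an illustrative Python sketch only: the function name `best_path` and the edge dictionaries are assumptions, and every weight other than E12(文, 0.8) and E24(化, 0.7) is made up. Exhaustive enumeration is practical only for small graphs such as this one.

```python
def best_path(edges, start, end):
    """Return (average weight, text) of the start-to-end path with the highest average weight.

    edges: list of {"src", "dst", "char", "weight"} dictionaries that always
    point rightward, so the enumeration terminates.
    """
    outgoing = {}
    for e in edges:
        outgoing.setdefault(e["src"], []).append(e)
    best = (float("-inf"), "")
    stack = [(start, 0.0, 0, "")]  # (node, weight sum, edge count, text so far)
    while stack:
        node, total, count, text = stack.pop()
        if node == end:
            if count > 0 and total / count > best[0]:
                best = (total / count, text)
            continue
        for e in outgoing.get(node, []):
            stack.append((e["dst"], total + e["weight"], count + 1, text + e["char"]))
    return best

edges = [
    {"src": 1, "dst": 2, "char": "文", "weight": 0.8},
    {"src": 1, "dst": 2, "char": "交", "weight": 0.3},
    {"src": 1, "dst": 2, "char": "父", "weight": 0.2},
    {"src": 2, "dst": 3, "char": "イ", "weight": 0.4},    # hypothetical
    {"src": 3, "dst": 4, "char": "ヒ", "weight": 0.3},    # hypothetical
    {"src": 2, "dst": 4, "char": "化", "weight": 0.7},
    {"src": 4, "dst": "e", "char": "的", "weight": 0.7},  # hypothetical
]
score, text = best_path(edges, start=1, end="e")
```

With these assumed weights, the path through E12(文), E24(化), and E4e(的) has the highest average weight, so `text` is 「文化的」. Dividing by the number of edges keeps paths with many short elements from winning merely because they accumulate more terms.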

文字列認識部２０５は、選択した最適パスに基づいて、各々選択した辺の分類結果をつなげた文字コード列を、撮影画像に含まれる文字列ｉに対する文字列認識結果として出力する（Ｓ４０９）。文字列認識結果の出力により、文字列認識処理が終了する。出力された文字列認識結果は、文字列情報記憶部２０６の文字列ｉの情報に付加される。 The character string recognition unit 205 outputs, as a character string recognition result for the character string i included in the photographed image, a character code string obtained by connecting the classification results of the selected sides based on the selected optimal path (S409). With the output of the character string recognition result, the character string recognition process ends. The output character string recognition result is added to the information of the character string i in the character string information storage unit 206.

When the current process is not the first character string recognition process for the character string i stored in the character string information storage unit 206 (S405: N), the character string recognition unit 205 updates the recognition graph as follows (S410). In this case, the recognition graph generated in the previous character string recognition process is already attached to the character string i being processed in the character string information storage unit 206. The character string recognition unit 205 therefore updates that recognition graph based on the character pixel blocks, the character elements, and their classification results obtained from the character string image input in the current S401. The update includes at least one of adding a new node, adding a new edge, and changing the weight of an existing edge.

FIG. 8 is a flowchart showing the flow of the recognition graph update process. Here, it is assumed that a recognition graph S' has already been generated for the character string i, as a result of S406 (see FIG. 4), from a character string image obtained from a past image t'. At that time, a set of character pixel blocks {C'} was generated from that character string image in S402 and a set of character elements {G'} in S403. The recognition graph S' is assumed to have a corresponding set of nodes {N'} and set of edges {E'}. In the current process, a set of character pixel blocks {C} and a set of character elements {G} have already been generated for the same character string i from the character string image obtained from the image t, and the character string recognition unit 205 performs S410 with these as input.

The weight of each edge of the recognition graph S' may have been changed by the context information application process of S407. In the subsequent processing, the edge weights of the recognition graph S' should preferably be the original values based on the classification results. The character string recognition unit 205 therefore stores those original values as part of the recognition information of the character string i and resets the weights to them at the time of the processing of FIG. 8. Alternatively, the character string recognition unit 205 may leave the edge weights unchanged in S407 and instead take the context information into account in S408, while summing the weights or penalties of the edges along each path during optimal path selection.

From the character pixel blocks in the set {C} and the character elements in the set {G}, the character string recognition unit 205 creates a recognition graph S consisting of a set of nodes {N} and a set of edges {E} (S801). The character string recognition unit 205 creates this recognition graph in the same way as in S406.

The character string recognition unit 205 initializes to empty a pair node list Lp, which is prepared as information used in the recognition graph update process (S802). The "pair node list Lp" is a list whose elements are corresponding pairs (Nx, N'x) of a node Nx of the recognition graph S and a node N'x of the recognition graph S' stored in the character string information.

The character string recognition unit 205 performs a loop that processes the nodes Nx (= N1, N2, ...) in the node set {N} of the recognition graph S one at a time in order (S803, S804). The character string recognition unit 205 first determines whether a node N'y corresponding to the node Nx of the recognition graph S exists in the recognition graph S' (S803). If it exists (S803: Y), the character string recognition unit 205 regards the node Nx and the node N'y as a pair and adds (Nx, N'y) to the pair node list Lp (S804). If it does not exist (S803: N), the character string recognition unit 205 proceeds to the next Nx and repeats the loop.

An example of how to determine whether node Nx and node N′y correspond is described here. Node Nx of recognition graph S was generated from character pixel block Cx, extracted from the image t currently being processed. Node N′y of recognition graph S′ was generated from character pixel block C′y, extracted by an earlier run of the character string recognition processing, that is, from a past image t′. Nodes Nx and N′y are judged to correspond when character pixel blocks Cx and C′y satisfy the following condition.
|(left-edge coordinate of Cx) − (left-edge coordinate of C′y + ΔC′y)| < D
ΔC′y is the amount by which character pixel block C′y moved between image t′ and image t. It is obtained, for example, by the captured image processing unit 202 as the movement amount of the target character pixel block C′y, in the same way as in S304 of FIG. 3. D is a positive constant giving the tolerance for the positional deviation of a character pixel block after the movement amount has been taken into account. D is set to a value related to the actual size of character string i, for example 1/10 of the height of character string i.
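The correspondence test and the pairing loop of S803 and S804 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `Block`, `corresponds`, and `build_pair_list` are hypothetical, only left-edge coordinates are modeled, and each node is assumed to have at most one partner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A character pixel block; only its left-edge x coordinate is modeled."""
    left: float


def corresponds(cx: Block, cy_prev: Block, shift: float, d: float) -> bool:
    """True when |left(Cx) - (left(C'y) + shift)| < D."""
    return abs(cx.left - (cy_prev.left + shift)) < d


def build_pair_list(current, previous, shifts, d):
    """Collect index pairs (x, y) of corresponding blocks (S803, S804)."""
    pairs = []
    for x, cx in enumerate(current):
        for y, cy in enumerate(previous):
            if corresponds(cx, cy, shifts[y], d):
                pairs.append((x, y))
                break  # assumption: at most one partner per node
    return pairs


string_height = 40.0
d = string_height / 10                    # tolerance D = 1/10 of the string height
previous = [Block(10.0), Block(50.0), Block(90.0)]   # blocks from image t'
shifts = [5.0, 5.0, 5.0]                  # movement amount of each previous block
current = [Block(16.0), Block(70.0), Block(94.0)]    # blocks from image t
pairs = build_pair_list(current, previous, shifts, d)
```

In this toy input the first and third blocks land within the tolerance of a shifted previous block, while the second block has no partner and would later be added to S′ as a new node.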

The above formula can be used when the character string is printed horizontally, from left to right. When the character string is printed from top to bottom, the determination may instead compare the top-edge coordinates. The shape features of the two character pixel blocks may also be considered; for example, Nx and N′y may be judged to correspond only when the shapes match in part or in whole. In particular, when the shapes match only partially, correspondence may be limited to the case where the left side of character pixel block Cx matches the left side of character pixel block C′y, or vice versa.

The loop ends once S803 and S804 have been performed for every node Nx in the node set {N} of the recognition graph. The character string recognition unit 205 then creates an edge list Le whose initial contents are all the edges of recognition graph S (S805). When there are multiple edges between a pair of nodes Ni and Nj because a character element has several candidate classifications, each is added to the edge list Le as a separate edge Eij1, Eij2, and so on.

The character string recognition unit 205 loops over the edges Eij in the edge list Le, processing them one at a time (S806 to S811).

For the start and end nodes of edge Eij of recognition graph S, that is, nodes Ni and Nj, the character string recognition unit 205 checks whether each has a corresponding node in recognition graph S′ (S806). Specifically, it checks whether the paired-node list Lp contains an element pairing the node in question. Processing branches according to whether the number of paired nodes is 0, 1, or 2. When the number is 0 (S806: 0), that is, neither Ni nor Nj has a corresponding node in S′, the unit does nothing for the current edge Eij and continues the loop with the next edge in the edge list Le.

When the number is 1 (S806: 1), that is, only one of Ni and Nj has a paired node in recognition graph S′, the character string recognition unit 205 adds a new node to S′ for the node that has no counterpart (S807). For example, if node Nj has no corresponding node in S′, the unit adds a node N′j′ to S′, where j′ is a subscript that is unique within S′ after the addition. The unit also adds the pair (Nj, N′j′), consisting of node Nj and the newly added node N′j′, to the paired-node list Lp. From then on, both end nodes of edge Eij have corresponding nodes in S′.

The character string recognition unit 205 adds a new edge with the same weight as edge Eij between the two nodes of recognition graph S′ that correspond to the two end nodes of Eij (S808). For example, when nodes Ni and Nj of S correspond to nodes N′k and N′l of S′, and edge Eij has classification result A and classification confidence w, the unit adds edge E′kl(A, w) to S′. When there are multiple edges between Ni and Nj, all of them are added. The correspondence of nodes between S and S′ is easily looked up in the paired-node list Lp.

The character string recognition unit 205 removes edge Eij from the edge list Le so that it is no longer a candidate for addition to recognition graph S′ (S809). It then continues the loop with the next edge in the edge list Le.

When the number is 2 (S806: 2), that is, both Ni and Nj have paired nodes in recognition graph S′, both ends of edge Eij have counterparts in S′. In this case, letting N′k and N′l be the nodes of S′ corresponding to Ni and Nj, the character string recognition unit 205 checks whether an edge E′kl with the same classification result as Eij exists (S810). If not (S810: N), it adds a new edge with the same weight as Eij to S′ (S808).

If such an edge exists (S810: Y), the character string recognition unit 205 updates the weight of the edge E′kl that has the same classification result as Eij (S811). For example, it compares the two classification confidences and takes the higher value as the new weight. This is only one example; the weight may instead be updated to the average of the two. When there are multiple edges Eij, this loop processing is repeated for each of them.

When the loop of S806 to S811 has been completed for every edge in the edge list Le, the character string recognition unit 205 determines whether any edge that can still be added to recognition graph S′ remains in Le (S812). If so (S812: Y), it executes the loop of S806 to S811 again. If not (S812: N), it ends the recognition graph update processing for character string i.

Whether an addable edge remains in Le is determined by, for example, two conditions: (1) the edge list is empty, or (2) the number of elements in Le did not change during the immediately preceding loop of S806 to S811. If either condition holds, the character string recognition unit 205 determines that no edge that can be added to recognition graph S′ remains in Le.
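The update of S805 to S812 can be sketched as a repeated pass over the pending edges. This is an illustrative reading of the flow, assuming a dictionary-based graph; the function name `merge_graph`, the integer node identifiers, and the choice to drop an edge from the pending list after S811 as well as after S808 are assumptions, and the higher-weight rule is only one of the options mentioned for S811.

```python
def merge_graph(edges_s, pairs, nodes_prev, edges_prev):
    """Merge the edges of graph S into graph S' (S805 to S812).

    edges_s:    list of (i, j, label, weight) tuples for graph S
    pairs:      dict mapping a node of S to its partner node in S'
    nodes_prev: set of integer node ids of S' (must not be empty)
    edges_prev: dict mapping (k, l, label) -> weight for S'
    """
    pending = list(edges_s)                    # edge list Le (S805)
    while pending:
        remaining = []
        for i, j, label, w in pending:
            paired = (i in pairs) + (j in pairs)
            if paired == 0:                    # S806: 0, retry in a later pass
                remaining.append((i, j, label, w))
                continue
            for n in (i, j):                   # S807: create the missing partner
                if n not in pairs:
                    new_id = max(nodes_prev) + 1
                    nodes_prev.add(new_id)
                    pairs[n] = new_id
            key = (pairs[i], pairs[j], label)
            if key in edges_prev:              # S810: Y, S811 keeps the higher weight
                edges_prev[key] = max(edges_prev[key], w)
            else:                              # S808: add a new edge
                edges_prev[key] = w
        if len(remaining) == len(pending):     # S812: no progress was made, stop
            break
        pending = remaining
    return nodes_prev, edges_prev


nodes_prev = {0, 1, 2}
edges_prev = {(0, 1, "A"): 0.6, (1, 2, "B"): 0.5}
pairs = {10: 0, 11: 1}
edges_s = [(12, 13, "D", 0.4), (10, 11, "A", 0.8), (11, 12, "C", 0.7)]
nodes_prev, edges_prev = merge_graph(edges_s, pairs, nodes_prev, edges_prev)
```

The first edge has no paired end node in the first pass, so it is deferred; it becomes addable in the second pass once node 12 has received a partner, which is why the loop is repeated until the list is empty or stops shrinking.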

(Specific example of character string recognition)
FIG. 9 is an explanatory diagram of a specific example to which the character string recognition processing of FIGS. 4 and 8 is applied. Image 901 is captured image t′ taken at time t′, and image 902 is captured image t taken at a later time t. Images 901 and 902 contain character string images 910 and 920, respectively, which correspond to the same character string i on the same subject. Because the shooting angles differ, however, the position, angle, size, and so on of the character string differ. FIG. 10 is an explanatory diagram of specific examples of recognition graphs.

Taking image 901 as input, the character string recognition unit 205 performs the character string recognition processing of FIG. 4 on character string image 910. Through S401 and S402, it extracts character pixel blocks 911 to 917 from character string image 910. Since this is the first recognition of character string i, it generates recognition graph 1000 of FIG. 10(a) as recognition graph S′ through S403 to S406. Recognition graph 1000 contains nodes 1001 to 1007, corresponding to character pixel blocks 911 to 917, and a terminal node 1008.

Next, taking image 902 as input, the character string recognition unit 205 performs the character string recognition processing of FIG. 4 on character string image 920. Through S401 and S402, it extracts character pixel blocks 921 to 927 from character string image 920. Because character string i has already been through recognition processing, the unit proceeds from S403 to S405 to S410, that is, to the processing of FIG. 8. In S801, it generates recognition graph 1010 of FIG. 10(b) as recognition graph S. Recognition graph 1010 contains nodes 1011 to 1017, corresponding to character pixel blocks 921 to 927, and a terminal node 1018.

Through S802 to S804, the character string recognition unit 205 generates the paired-node list Lp between the nodes of recognition graph 1000 of FIG. 10(a) and the nodes of recognition graph 1010 of FIG. 10(b). Whether two nodes form a pair is decided with the condition described for S803. Here, the pairs are (1001, 1011), (1002, 1012), (1003, 1013), (1005, 1014), (1006, 1015), and (1007, 1016).

In S805 to S812, the character string recognition unit 205 adds edges from recognition graph 1010 of FIG. 10(b) to recognition graph 1000 of FIG. 10(a), or updates existing ones. FIG. 10(c) shows recognition graph 1020, the result of updating recognition graph 1000.

As described for S806, edges are added according to the number of paired nodes at the two ends of each edge. Take an edge whose number of paired nodes is 2, for example edge 1091 between nodes 1011 and 1012: there is no edge of the same classification between the paired nodes 1001 and 1002. The character string recognition unit 205 therefore adds, in S808, a new edge 1093 with the same classification and weight as edge 1091. For an edge whose number of paired nodes is 1, for example edge 1092 between nodes 1016 and 1017, the unit first adds a new node 1021 to recognition graph 1000 as the counterpart of the unpaired node 1017. It then adds edge 1094, with the same classification and weight as edge 1092, between nodes 1007 and 1021. The other edges are processed in the same way.

When the recognition graph update is complete, the character string recognition unit 205 applies context information to recognition graph 1020 of FIG. 10(c) in S407. Here, the weights of the relevant edges are updated using (Equation 1) described for S407. Of the bigram information used, the two-character sequences whose consecutive occurrence probability P is at or above the threshold th of (Equation 1) are listed below.
「塩化」「化カ」「ヒカ」「カリ」「切り」「リウ」「ウム」

FIG. 10(f) shows recognition graph 1020 after the edge weights have been corrected with (Equation 1) using these bigrams. The correction value in (Equation 1) is d = 0.1. As an example, the weight of edge 1094 was 0.7 before the update; 0.1 was added for each of the bigrams 「リウ」 and 「ウム」, giving a weight of 0.9.
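The arithmetic of this correction can be illustrated with a small sketch. The exact form of (Equation 1) is given with the description of S407 and is not reproduced here, so the rule below is an assumption: each edge gains d for every listed bigram that its label forms with an adjacent edge. The function name `apply_bigram_bonus` and the toy graph are hypothetical.

```python
BIGRAMS = {"塩化", "化カ", "ヒカ", "カリ", "切り", "リウ", "ウム"}


def apply_bigram_bonus(edges, bigrams, d=0.1):
    """Return a copy of edges with d added per matching bigram.

    edges: dict mapping (start, end, label) -> weight, one character per label.
    """
    corrected = {}
    for (start, end, label), weight in edges.items():
        matched = set()
        for (s2, e2, label2) in edges:
            # An edge ending where this one starts precedes it.
            if e2 == start and label2 + label in bigrams:
                matched.add(label2 + label)
            # An edge starting where this one ends follows it.
            if s2 == end and label + label2 in bigrams:
                matched.add(label + label2)
        corrected[(start, end, label)] = weight + d * len(matched)
    return corrected


edges = {
    (0, 1, "リ"): 0.8,
    (1, 2, "ウ"): 0.7,   # plays the role of edge 1094 in the text
    (1, 2, "広"): 0.6,
    (2, 3, "ム"): 0.8,
}
corrected = apply_bigram_bonus(edges, BIGRAMS)
```

Under this assumed rule, the edge labeled 「ウ」 matches both 「リウ」 and 「ウム」 and rises from 0.7 to about 0.9, while 「広」 matches nothing and keeps its weight.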

In S408, the character string recognition unit 205 selects the optimum path from the recognition graph to which the context information has been applied. Here, the optimum path is the one that maximizes the sum of its edge weights divided by the number of edges. In recognition graph 1020 of FIG. 10(f), 「塩化カリウム」 (potassium chloride), with a score of 0.90, is obtained as the optimum path. This is the correct recognition result.
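Selecting the path with the highest mean edge weight can be sketched by enumerating the paths of a small acyclic graph. This is only an illustration: the function name `best_path`, the dictionary representation, and the toy weights are assumptions, and exhaustive enumeration is practical only for graphs of this size.

```python
def best_path(edges, start, end):
    """Return (text, mean_weight) of the start-to-end path with the highest
    mean edge weight. Assumes the graph is acyclic.

    edges: dict mapping (start_node, end_node, label) -> weight
    """
    outgoing = {}
    for (s, e, label), w in edges.items():
        outgoing.setdefault(s, []).append((e, label, w))

    best = (None, float("-inf"))
    stack = [(start, "", 0.0, 0)]          # (node, text so far, weight sum, edge count)
    while stack:
        node, text, total, count = stack.pop()
        if node == end and count:
            mean = total / count
            if mean > best[1]:
                best = (text, mean)
            continue
        for nxt, label, w in outgoing.get(node, []):
            stack.append((nxt, text + label, total + w, count + 1))
    return best


edges = {
    (0, 1, "塩"): 0.9,
    (0, 1, "哲"): 0.6,
    (1, 2, "化"): 0.9,
    (1, 2, "イ"): 0.7,
}
text, score = best_path(edges, 0, 2)
```

Dividing by the number of edges keeps paths with different segmentations (and therefore different edge counts) comparable.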

FIGS. 10(d) and 10(e) show the results of applying the same context information to recognition graph 1000 of FIG. 10(a) and recognition graph 1010 of FIG. 10(b) individually, without the recognition graph update of FIG. 8. The optimum paths in recognition graph 1000 of FIG. 10(d) and recognition graph 1010 of FIG. 10(e) are 「哲化カリ広」 (score 0.65) and 「塩イ切りウム」 (score 0.70), respectively.

This shows that recognizing character string images 910 and 920 of images 901 and 902 in FIG. 9 separately, and applying context information to each, does not yield the correct recognition result. Nor is it easy to obtain the correct string by fusing the two results 「哲化カリ広」 and 「塩イ切りウム」, because the correspondence between their characters cannot be determined uniquely. By contrast, by performing an update that integrates the recognition graph obtained from character string image 920 into the one obtained from character string image 910, the character string recognition unit 205 obtains an integrated recognition graph containing the interpretations of both character string images. In the integrated graph, combinations of edges to which context information can be applied are available from both character string images at once, which raises the likelihood that optimum path selection yields the correct recognition result.

According to this embodiment, the information processing apparatus 100 takes as input a first image capturing the target character string and generates a first recognition graph whose nodes are the extracted character pixel blocks and whose edges are the classification results of character elements formed by combining character pixel blocks. The ways of combining character pixel blocks correspond to the ways of segmenting individual characters from the character string image. Next, the apparatus takes as input a second image capturing the same target character string, generates a second recognition graph from character pixel blocks and character elements extracted in the same way, and updates the first recognition graph according to the contents of the second.

The information processing apparatus 100 pairs the nodes of the first recognition graph with the nodes of the second recognition graph, taking into account the positional relationship of the corresponding character pixel blocks and the amount by which each character pixel block moved between the first image and the second image. When a node of the second recognition graph has no counterpart in the first recognition graph, the apparatus adds a new node to the first recognition graph to serve as its pair. The apparatus copies each edge between two adjacent nodes of the second recognition graph as an edge between the two paired nodes of the first recognition graph. It then applies context information to the updated first recognition graph, updates the edge weights, and selects the optimum path to obtain the character string recognition result.

Through this processing, the information processing apparatus 100 can apply context information to the whole set of character pixel blocks and character element classification results extracted from both the first image and the second image, and select a plausible character string recognition result. As a result, character string recognition is more accurate than when the first image and the second image are recognized individually.

In the above example, the first image and the second image capturing the target character string were processed, but more images of the target character string may be used. The information processing apparatus 100 may integrate recognition graphs obtained from a third, fourth, and further images of the target character string, in addition to the first and second images, before applying the context information.

The information processing apparatus 100 generates a recognition graph that contains all the character pixel blocks and character element classification results extracted for the same character string from the first image and one or more other images. It applies context information to this recognition graph and selects a plausible character string recognition result. As a result, character string recognition is more accurate than when each image is recognized individually. In addition, when the first image and the one or more other images are acquired in time series, the information processing apparatus 100 can perform character string recognition whose accuracy improves progressively.

(Modification)
FIG. 11 is an explanatory diagram of a modification of the character string recognition processing. Images 1100, 1110, and 1120 are captured images of the identification character string "26A489J" printed on a non-rigid material such as cloth. Because the cloth deforms differently at each moment of capture, each of images 1100, 1110, and 1120 is deformed to the point where some characters are unrecognizable. Such captured images arise easily when, for example, photographing marathon runners to recognize their bib numbers. The information processing apparatus 100 performs the character string recognition processing of FIGS. 3, 4, and 8 on images 1100, 1110, and 1120.

The information processing apparatus 100 extracts five character pixel blocks from image 1100, which it acquires first. It classifies the character pixel blocks as character elements and generates recognition graph 1101. The recognition graphs in FIG. 11 omit the character classification confidence values on the edges.

Next, the information processing apparatus 100 acquires image 1110, extracts six character pixel blocks, classifies them into character elements, and generates recognition graph 1111. To update recognition graph 1101, it generates a paired-node list between nodes a to e of recognition graph 1101 (excluding the terminal node f) and nodes g to l of recognition graph 1111 (excluding its terminal node).

In this example, the information processing apparatus 100 compares the image shapes of the character pixel blocks corresponding to the nodes and generates node pairs from pairs of character pixel blocks whose shapes are judged to match or be similar. To judge whether shapes match, a known image feature that is robust to image rotation and deformation, such as SIFT, can be used. In this example, (g, a) and (h, b) are judged to be node pairs, and nodes corresponding to the remaining nodes i, j, k, and l of recognition graph 1111 are added to recognition graph 1101. Each edge of recognition graph 1111 is copied as an edge between the corresponding nodes of recognition graph 1101.

The information processing apparatus 100 then acquires image 1120, extracts five character pixel blocks, classifies them into character elements, and generates recognition graph 1121. To update recognition graph 1101, it generates a paired-node list between nodes a to e and the added nodes i to l of recognition graph 1101 and nodes n to r of recognition graph 1121. In this example, it judges (o, j), (p, k), (q, d), and (r, e) to be node pairs and adds a node corresponding to node n of recognition graph 1121 to recognition graph 1101. Each edge of recognition graph 1121 is copied as an edge between the corresponding nodes of recognition graph 1101.

Recognition graph 1101 updated by the above processing becomes recognition graph 1131. In recognition graph 1131, a virtual start node "0" is placed at the beginning so that different nodes can serve as the starting node during optimum path detection.

The information processing apparatus 100 applies context information to recognition graph 1131 and selects the optimum path. In this example, the context information is a format definition, predetermined for the identification character strings to be recognized, that is used as a character string pattern dictionary. Specifically, the format definition specifies the number of characters and the character type at each character position. For example, it may state that the identification character string always consists of seven characters, that the third and seventh characters are uppercase letters, and that the others are digits. In recognition graph 1131, only one path, "26A489J", outputs a character string that satisfies this definition. The information processing apparatus 100 therefore outputs the character string "26A489J" as the recognition result and ends the character string recognition processing.
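The example format definition (seven characters, uppercase letters in the third and seventh positions, digits elsewhere) can be expressed as a pattern and used to filter candidate path strings. The sketch below is an illustration only; the candidate list is invented, and the use of a regular expression is an assumption rather than the mechanism of the embodiment.

```python
import re

# Two digits, one uppercase letter, three digits, one uppercase letter.
FORMAT = re.compile(r"[0-9]{2}[A-Z][0-9]{3}[A-Z]")


def matches_format(text: str) -> bool:
    """True when the whole string satisfies the format definition."""
    return FORMAT.fullmatch(text) is not None


# Hypothetical strings read off different paths of the integrated graph.
candidates = ["26A489J", "Z6A489J", "26A4B9J", "26A489"]
accepted = [c for c in candidates if matches_format(c)]
```

With a definition this strict, most alternative paths are rejected outright, which is why a single surviving path can be output without comparing weights.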

In this example as well, the number of captured images used for character recognition may be two, or four or more.

In the character string recognition processing described above, the information processing apparatus 100 generates, from each of two or more captured images, a recognition graph containing the character pixel blocks extracted for the same character string and the classification results of character elements, which are combinations of character pixel blocks, and integrates these graphs. It applies context information to the integrated recognition graph and selects a plausible character string recognition result. As a result, the recognition result is more accurate than when the two or more captured images are recognized individually. In addition, when the captured images are acquired in time series, the information processing apparatus 100 can perform character string recognition whose accuracy improves progressively.

The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Claims

Acquisition means for acquiring a photographed image obtained by photographing a character string printed on a subject;
Extracting means for extracting a character pixel block from a character string portion of the photographed image;
Classification means for classifying the character elements based on the character pixel block;
Generating means for generating a recognition graph from the character pixel block and the classification result of the character elements;
Recognizing means for recognizing the character string based on the recognition graph,
The generating means generates a first recognition graph from a first character pixel block extracted from a first image of the plurality of photographed images and a first character element classified from the first character pixel block, and generates a second recognition graph from a second character pixel block extracted from a second image, different from the first image, of the plurality of photographed images and a second character element classified from the second character pixel block;
The recognizing means generates an integrated recognition graph in which the first recognition graph and the second recognition graph are integrated, based on a determination as to whether or not the first character pixel block and the second character pixel block correspond to the same character string, and recognizes the character string based on the integrated recognition graph,
Information processing apparatus.

The recognizing means determines, based on an optical flow between the first image and the second image, whether or not the first character pixel block and the second character pixel block correspond to the same character string,
The information processing apparatus according to claim 1.

The recognizing means determines, based on a result of a homography transformation between the first image and the second image, whether or not the first character pixel block and the second character pixel block correspond to the same character string,
The information processing apparatus according to claim 1.
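One way such a homography-based correspondence check could look is sketched below. The claim does not specify the test, so everything here is an assumption made for illustration: the homography `H_SHIFT` is taken as already estimated, character pixel blocks are represented by axis-aligned bounding boxes `(x0, y0, x1, y1)`, and two blocks are treated as corresponding when the overlap ratio (intersection over union) of the first box and the warped second box reaches a hypothetical threshold `min_iou`.

```python
def apply_homography(H, x, y):
    """Map point (x, y) through the 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def warp_box(H, box):
    """Warp the four corners of a box and return their bounding box."""
    x0, y0, x1, y1 = box
    corners = ((x0, y0), (x1, y0), (x0, y1), (x1, y1))
    pts = [apply_homography(H, x, y) for x, y in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def same_block(H_2_to_1, box1, box2, min_iou=0.5):
    """Assumed criterion: the boxes overlap enough after warping box2."""
    return iou(box1, warp_box(H_2_to_1, box2)) >= min_iou


# Hypothetical homography: the second image is shifted 5 pixels to the left.
H_SHIFT = [[1, 0, 5], [0, 1, 0], [0, 0, 1]]
print(same_block(H_SHIFT, (4, 0, 14, 10), (0, 0, 10, 10)))
print(same_block(H_SHIFT, (30, 0, 40, 10), (0, 0, 10, 10)))
```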

The recognition graph has nodes based on the character pixel blocks and edges based on the classification results of the character elements, and the weight of each edge is set based on the classification reliability included in the classification result,
The recognizing means obtains the recognition result of the character string by selecting, from the integrated recognition graph, a path for which the sum of the edge weights is minimized or maximized,
The information processing apparatus according to claim 1.
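The path selection recited here can be illustrated with a small dynamic-programming sketch over a weighted graph. The edge list, the function name `best_path`, and the weights are hypothetical. The sketch assumes the minimizing variant, that lower weights correspond to higher classification reliability, and that nodes are integers numbered left to right so that every edge runs from a smaller to a larger node.

```python
def best_path(edges, start, end):
    """Return (total_weight, text) of the minimum-weight path, or None.

    edges: iterable of (src, dst, char, weight) with src < dst.
    """
    best = {start: (0.0, "")}
    # With src < dst for every edge, visiting edges in order of src
    # guarantees that best[src] is final before it is extended.
    for src, dst, ch, w in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue
        cost = best[src][0] + w
        if dst not in best or cost < best[dst][0]:
            best[dst] = (cost, best[src][1] + ch)
    return best.get(end)


# Hypothetical edges: (src, dst, char, weight).
EDGES = [
    (0, 1, "2", 0.1),
    (0, 1, "Z", 0.6),
    (1, 2, "6", 0.2),
    (1, 2, "G", 0.5),
    (0, 2, "W", 0.9),  # both pixel blocks read as one character element
]

total, text = best_path(EDGES, 0, 2)
print(text, round(total, 3))
```

The maximizing variant follows by flipping the comparison, provided the weights are defined so that larger means more reliable.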

The recognizing means obtains pairs between nodes of the first recognition graph and nodes of the second recognition graph and, based on the pairs, generates the integrated recognition graph by copying the edges of the second recognition graph to the first recognition graph,
The information processing apparatus according to claim 4.

When the first character pixel block and the second character pixel block are determined to correspond to the same character string, the recognizing means determines that the node based on the first character pixel block and the node based on the second character pixel block form a pair, and when they are determined not to correspond, the recognizing means adds a node based on the second character pixel block to the first recognition graph to form a pair,
The information processing apparatus according to claim 5.
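A minimal sketch of this pairing and edge-copying step is given below. It simplifies the claimed determination considerably: each node is assumed to carry a horizontal position already mapped into the coordinates of the first image, and two nodes are paired when their positions differ by at most a hypothetical tolerance `tol`. The function name `integrate` and the data layout are likewise assumptions.

```python
def integrate(nodes1, edges1, nodes2, edges2, tol=2.0):
    """Merge graph 2 into graph 1 and return (nodes, edges).

    nodes*: dict mapping node id -> x position (first-image coordinates).
    edges*: list of (src, dst, char, weight).
    """
    nodes = dict(nodes1)
    edges = list(edges1)
    pair = {}
    for n2, x2 in nodes2.items():
        # Look for a node of graph 1 at (nearly) the same position.
        match = next(
            (n1 for n1, x1 in nodes1.items() if abs(x1 - x2) <= tol), None
        )
        if match is None:
            # No counterpart: add a new node to graph 1 and pair with it.
            match = max(nodes) + 1
            nodes[match] = x2
        pair[n2] = match
    # Copy every edge of graph 2 using the paired node ids.
    for src, dst, ch, w in edges2:
        edges.append((pair[src], pair[dst], ch, w))
    return nodes, edges


# Hypothetical graphs built from two images of the same string.
NODES1 = {0: 0.0, 1: 10.0, 2: 20.0}
EDGES1 = [(0, 1, "2", 0.1), (1, 2, "6", 0.4)]
NODES2 = {0: 0.5, 1: 14.0, 2: 20.5}
EDGES2 = [(0, 1, "Z", 0.5), (1, 2, "G", 0.2)]

merged_nodes, merged_edges = integrate(NODES1, EDGES1, NODES2, EDGES2)
print(merged_nodes)
print(merged_edges)
```

In this example the middle node of the second graph has no counterpart, so it is added as node 3 and the copied edges pass through it.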

The recognizing means updates the weight of each edge of the integrated recognition graph using context information, selects the path from the updated integrated recognition graph, and thereby obtains the recognition result of the character string,
The information processing apparatus according to claim 4.

The recognizing means uses either an N-gram or a character string pattern dictionary as the context information,
The information processing apparatus according to claim 7.
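As an illustration of N-gram context, the sketch below adds a bigram penalty to the edge weights while searching for the minimum-weight path. It is a sketch under assumptions rather than the claimed procedure: the bigram table `BIGRAMS`, the negative-log-probability penalty, the floor probability for unseen pairs, and the mixing factor `lam` are all hypothetical, and nodes are assumed to be integers with every edge running from a smaller to a larger node.

```python
import math


def best_path_with_bigrams(edges, start, end, bigram_prob, floor=1e-3, lam=1.0):
    """Return (cost, text) of the cheapest path under edge + bigram cost."""
    # State (node, last_char) is needed because the bigram penalty of an
    # edge depends on the character chosen on the previous edge.
    best = {(start, ""): (0.0, "")}
    for src, dst, ch, w in sorted(edges, key=lambda e: e[0]):
        for (node, last), (cost, text) in list(best.items()):
            if node != src:
                continue
            if last == "":
                penalty = 0.0  # first character: no preceding context
            else:
                penalty = -math.log(bigram_prob.get(last + ch, floor))
            new_cost = cost + w + lam * penalty
            key = (dst, ch)
            if key not in best or new_cost < best[key][0]:
                best[key] = (new_cost, text + ch)
    finals = [v for (node, _), v in best.items() if node == end]
    return min(finals) if finals else None


# Hypothetical edges and bigram probabilities.
EDGES_CTX = [
    (0, 1, "Q", 0.2),
    (0, 1, "O", 0.3),
    (1, 2, "U", 0.4),
    (1, 2, "V", 0.3),
]
BIGRAMS = {"QU": 0.9, "OU": 0.2, "QV": 0.001, "OV": 0.05}

print(best_path_with_bigrams(EDGES_CTX, 0, 2, BIGRAMS)[1])
print(best_path_with_bigrams(EDGES_CTX, 0, 2, BIGRAMS, lam=0.0)[1])
```

With `lam=0.0` the context is ignored and the lowest raw edge weights win; with the bigram penalty enabled, the more plausible character sequence is preferred.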

A method executed by an information processing apparatus including predetermined imaging means,
A plurality of captured images obtained by capturing a character string printed on a predetermined subject are acquired from the imaging means, a character pixel block is extracted from a character string portion of the captured images, and character elements are classified based on the extracted character pixel block,
A first recognition graph is generated from a first character pixel block extracted from a first image of the plurality of captured images and a first character element classified from the first character pixel block, and a second recognition graph is generated from a second character pixel block extracted from a second image, different from the first image, of the plurality of captured images and a second character element classified from the second character pixel block,
An integrated recognition graph is generated by integrating the first recognition graph and the second recognition graph based on whether or not the first character pixel block and the second character pixel block correspond to the same character string, and the character string is recognized based on the integrated recognition graph,
Character recognition method.

In a computer provided with predetermined imaging means,
A process of acquiring, from the imaging means, a plurality of captured images obtained by capturing a character string printed on a predetermined subject, extracting a character pixel block from a character string portion of the captured images, and classifying character elements based on the extracted character pixel block;
A process of generating a first recognition graph from a first character pixel block extracted from a first image of the plurality of captured images and a first character element classified from the first character pixel block, and generating a second recognition graph from a second character pixel block extracted from a second image, different from the first image, of the plurality of captured images and a second character element classified from the second character pixel block;
A process of generating an integrated recognition graph by integrating the first recognition graph and the second recognition graph based on whether or not the first character pixel block and the second character pixel block correspond to the same character string, and recognizing the character string based on the integrated recognition graph;
A computer program for causing the computer to execute the above processes.

A computer-readable storage medium storing the computer program according to claim 10.