JP5835035B2

JP5835035B2 - Character recognition program and character recognition device

Info

Publication number: JP5835035B2
Application number: JP2012056210A
Authority: JP
Inventors: 直紀渋谷; 敬一玉井; 賢一鵜飼
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2012-03-13
Filing date: 2012-03-13
Publication date: 2015-12-24
Anticipated expiration: 2032-03-13
Also published as: JP2013190952A

Description

The present invention relates to a technique for processing a moving image generated by photographing a character string and recognizing the information represented by the characters in the image.

In general character recognition processing, the characters in an image are first extracted individually by projection processing. Each character is then collated with multiple character image models (hereinafter, "character models"), and the character indicated by a character model that yields a similarity at or above a predetermined value is taken as the recognized character. The reliability of the recognition result is calculated from the similarity to the character model used for this determination.
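The collation step described above can be sketched as follows. This is only a minimal illustration, assuming binary glyph images of equal size and a similarity defined as the fraction of agreeing pixels. The function names, the threshold of 0.8, and the toy 3x3 models are hypothetical and do not come from the source.

```python
def similarity(glyph, model):
    """Fraction of pixels that agree between two equal-sized binary images."""
    total = sum(len(row) for row in glyph)
    same = sum(
        1
        for g_row, m_row in zip(glyph, model)
        for g, m in zip(g_row, m_row)
        if g == m
    )
    return same / total


def recognize(glyph, models, threshold=0.8):
    """Return (character, similarity) for the best-matching model.

    When no model reaches the (assumed) threshold, the character is None.
    The similarity doubles as the reliability of the result.
    """
    best_char, best_sim = None, 0.0
    for char, model in models.items():
        sim = similarity(glyph, model)
        if sim > best_sim:
            best_char, best_sim = char, sim
    if best_sim < threshold:
        return None, best_sim
    return best_char, best_sim


# Toy 3x3 character models (hypothetical).
MODELS = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
```

A glyph that differs from the "I" model in a single pixel is recognized as "I" with a similarity of 8/9, while an empty glyph falls below the threshold and is rejected.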

To obtain more reliable recognition results, recognition processing that uses moving images has also been proposed. For example, Patent Document 1 describes extracting character regions from a plurality of frame images, each generated under different imaging conditions, matching the character regions against dictionary images, and outputting the recognition result with the highest degree of matching (see claim 1, paragraphs 0012 to 0031, etc.).

Patent Document 2 describes the following. While shooting is performed with control parameters being changed so that more appropriate shooting is possible, when the user performs a character recognition operation, an evaluation value for judging suitability for character recognition is obtained for the frame image at the time of the operation and for each of several frames before and after it, and recognition processing is performed on the image with the highest evaluation value (see claim 1, paragraphs 0018 to 0042, etc.).

Patent Document 1: JP 2003-242440 A
Patent Document 2: JP 2009-88944 A

OCR applications that use moving images have recently been introduced into portable information devices with a camera function (such as mobile phones). Because such devices are carried to and used in various places, the OCR application is likely to be used under widely varying brightness. In addition, because the user holds the device by hand and aligns the camera with the character string to be recognized, the state of the image also varies greatly with the alignment. To ensure character recognition accuracy under these circumstances, the various shooting parameters must be adjusted automatically so that an image of a quality suitable for recognition processing is obtained.

Patent Documents 1 and 2 describe deriving the most accurate recognition result from a plurality of images generated by moving image shooting while changing parameters such as illumination, shutter speed, and aperture, so they might appear to suggest a means of solving the above problem.

However, in the invention described in Patent Document 1, the camera is fixed during shooting and the above processing is performed after detecting that a character region has entered the image. The situation in which a person holds the camera by hand to photograph the target character string is not considered at all. Moreover, that invention merely performs the recognition processing once under each condition and selects the result with the best degree of matching. Even the result with the best degree of matching may contain errors, so the recognition accuracy cannot be said to be sufficient.

The invention described in Patent Document 2 selects, from a plurality of continuously generated frame images, the image whose quality is most suitable for character recognition and performs character recognition on it. However, shooting control is performed independently of the timing of character recognition, so an image suitable for character recognition may not be acquired. That invention also sets a maximum interval over which evaluation values are calculated, but images whose subject differs from that of the frame at the time of the operation are excluded from the calculation (see paragraphs 0034 to 0035, FIG. 5, etc.), so the number of frames used to calculate the evaluation value may be insufficient. Furthermore, it only evaluates images before recognition processing and gives no consideration to evaluating the character recognition results, so it is highly doubtful that such a method yields accurate recognition results.

Focusing on the above problems, an object of the present invention is to ensure that, when a user performs a reading operation with a hand-held camera, a mode suitable for recognition is automatically selected from a plurality of setting modes prepared for character recognition processing without any complicated operation by the user, and that highly accurate character recognition processing is executed in the selected setting mode.
A further object of the present invention is to enable the user to easily confirm whether the accuracy of the recognition result is ensured.

A program according to the present invention causes a computer, connected to a camera having a moving image shooting function and to a display unit that displays the moving image generated by the camera, to function as a character reading device that receives the moving image generated by the camera photographing a character string and reads the characters in that moving image.

The character reading device is provided with the following means: recognition processing means that, while a character string is being photographed, repeats a process of receiving the latest frame image and recognizing the characters in it and a process of calculating the reliability of the recognition result; mode control means that selects one of a plurality of setting modes, each defining at least one of the image quality of the frame image subject to character recognition and the operation of the recognition processing means; and display control means that displays character information based on each recognition result on the same screen as the moving image on the display unit.

The mode control means executes two steps. In a first step, it analyzes each recognition result while switching the plurality of setting modes in order, in step with the processing cycle of the recognition processing means. In a second step, once the setting modes have been cycled through a predetermined number of times while recognition results satisfying a predetermined consistency condition are obtained in every cycle, it selects the setting mode with the highest recognition reliability based on the analysis and maintains that selection over a plurality of subsequent cycles of the recognition processing means.

With the above configuration, while the camera is aligned with the target character string and character recognition is running, the setting modes are switched in order in step with the recognition processing cycle. The reliability of the recognition results varies with the setting mode during this period, but as long as the same character string continues to be photographed, the successive recognition results should be consistent to some degree. Focusing on this point, the present invention selects the setting mode that produced the most reliable recognition results once the setting modes have been cycled through a predetermined number of times while recognition results satisfying a given consistency condition are obtained, and then performs recognition several more times in that mode. The accuracy of the recognition result can therefore be judged by whether the results of those subsequent recognitions agree.

While the above processing is performed, the display unit shows each recognition result together with the moving image being shot. While the setting mode is being switched, the displayed recognition result is likely to fluctuate, but once processing moves to recognition with an appropriate setting mode maintained, the displayed result is less likely to change. A recognition result that stays fixed on the display has been obtained in common across a plurality of consecutive recognition passes, so the user can judge from the change from an unstable display to a fixed display that the recognition result has been settled. If the display remains unstable even after an appropriate setting mode has been selected, the user can recognize that an image suitable for recognition processing is not being obtained.

While the camera has not yet been aligned with the target character string, for example immediately after the reading operation starts, the successive recognition results do not satisfy the consistency condition, so processing does not move from the first step to the second step. An appropriate setting mode is therefore selected only after images of the recognition target are being acquired, which also helps ensure recognition accuracy.
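The two-step mode control described above can be sketched as a small state machine. This is an illustrative sketch rather than the patented implementation. It assumes that the "highest reliability" criterion is the per-mode average, that the consistency condition is supplied as a caller-provided predicate (plain equality by default), and that an inconsistent result restarts the first step. The class and attribute names are hypothetical.

```python
from collections import defaultdict


class ModeController:
    """Two-step mode control: cycle through the modes, then hold the best one."""

    def __init__(self, modes, rounds=2, is_consistent=lambda a, b: a == b):
        self.modes = list(modes)
        self.rounds = rounds                # full cycles required before selecting
        self.is_consistent = is_consistent  # assumed consistency predicate
        self.selected = None                # set when the second step begins
        self._reset()

    def _reset(self):
        self.count = 0                      # passes completed in the first step
        self.scores = defaultdict(list)     # mode -> list of reliabilities
        self.previous = None                # previous recognition result

    def current_mode(self):
        """Mode to use for the next recognition pass."""
        if self.selected is not None:
            return self.selected            # second step: selection is held
        return self.modes[self.count % len(self.modes)]

    def report(self, text, reliability):
        """Feed in the result of one pass that ran under current_mode()."""
        if self.selected is not None:
            return
        if self.previous is not None and not self.is_consistent(self.previous, text):
            self._reset()                   # inconsistent: restart the first step
            self.previous = text
            return
        self.scores[self.current_mode()].append(reliability)
        self.previous = text
        self.count += 1
        if self.count == self.rounds * len(self.modes):
            # First step complete: pick the mode with the highest average.
            self.selected = max(
                self.modes,
                key=lambda m: sum(self.scores[m]) / len(self.scores[m]),
            )
```

In use, the caller asks `current_mode()` before each pass, runs recognition under that mode, and passes the result to `report()`. As long as the results keep failing the consistency predicate, `selected` stays `None` and the controller never leaves the first step.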

In a first embodiment of the above character recognition device, the mode control means returns to the beginning of the first step once the recognition processing means has processed a predetermined number of frame images in the second step. That is, processing in the first step and processing in the second step alternate periodically.
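The periodic alternation of the two steps can be illustrated with a simple schedule. This sketch assumes the consistency condition is always met, so the first step always completes after a fixed number of passes. The function name, the labels "search" and "hold", and the pass counts in the usage note are hypothetical.

```python
def step_schedule(search_passes, hold_passes, total):
    """Label each of `total` recognition passes as belonging to the first
    step ("search") or the second step ("hold"), alternating periodically."""
    period = search_passes + hold_passes
    return [
        "search" if i % period < search_passes else "hold"
        for i in range(total)
    ]
```

For example, `step_schedule(6, 4, 12)` yields six "search" passes, four "hold" passes, and then two "search" passes as the first step begins again.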

In this way, if, for example, the setting mode selected from the result of the initial first step later becomes unsuitable for recognition processing, it is automatically changed to another setting mode. Likewise, when the initial alignment did not yield an accurate recognition result and the user changes the alignment of the camera with respect to the target character string, or changes the target character string itself, there is no need to stop shooting or repeat the operation to start reading. Processing automatically returns to the beginning of the first step, goes through the mode switching of that step, and sets a setting mode suited to the changed conditions.

In a second embodiment of the above character recognition device, the mode control means returns to the beginning of the first step when the latest recognition result from the recognition processing means becomes inconsistent with the immediately preceding recognition result.
With this configuration, when the target character string is changed, the search for the optimum setting mode for the new character string can start promptly, so processing proceeds efficiently. The user also does not need to perform troublesome operations such as stopping and restarting in order to change the target character string. Simply pointing the camera at a different target allows reading in a setting mode suited to the new character string.
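The restart condition of the second embodiment can be sketched as below. The source does not define how consistency between two recognition results is measured, so this sketch assumes a character-level similarity ratio from Python's standard `difflib` module and an arbitrary threshold of 0.5. Both function names are hypothetical.

```python
from difflib import SequenceMatcher


def is_consistent(previous, latest, min_ratio=0.5):
    """Treat two recognition results as consistent when their character-level
    similarity ratio reaches min_ratio (an assumed threshold)."""
    return SequenceMatcher(None, previous, latest).ratio() >= min_ratio


def should_restart_search(previous, latest):
    """Return True when the latest result no longer matches the one
    immediately before it, i.e. when the first step should start over."""
    return previous is not None and not is_consistent(previous, latest)
```

Under these assumptions, a single misread character (for example 「ロ」 read as 「口」) still counts as the same target, whereas a completely different string triggers a restart.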

In a third embodiment of the above character recognition device, the mode control means selects from a plurality of setting modes that differ in the settings of the camera's shooting parameters. This makes it possible to automatically set a shooting mode suited to the environment of the shooting location and to ensure the quality of the images subject to recognition processing.

In a fourth embodiment of the above character recognition device, the mode control means selects from a plurality of setting modes that differ in the type of character model used when the recognition processing means collates characters detected in the image. This makes it possible to automatically select a character model suited to the font or language of the characters to be recognized.

According to the present invention, when the user aligns the camera with the target character string while referring to the displayed moving image, a setting mode suitable for recognizing that character string is automatically selected, and recognition is performed a plurality of times in that mode. The recognition result displayed together with the moving image is unstable while the setting modes are being switched in order, but changes to a fixed display if the results of the recognitions performed after an appropriate setting mode has been selected agree. The user can therefore easily tell from the change from an unstable display to a fixed display that the recognition result has been settled. This improves operability, allows a highly accurate recognized character string to be displayed, and improves convenience.

FIG. 1 is a functional block diagram of an OCR application to which the present invention is applied.
FIG. 2 is a diagram showing an example of how the recognition result display screen changes.
FIG. 3 is a table associating the shooting mode types with their settings.
FIG. 4 is a table associating the shooting mode set for each recognition pass with the reliability of the recognition result obtained in that pass.
FIG. 5 is a table associating the shooting mode set for each recognition pass with the reliability of the recognition result obtained in that pass, for the case where recognition while switching the shooting mode and recognition in the optimum mode are repeated.
FIG. 6 is a flowchart showing the outline of processing in the OCR application when the shooting mode switching shown in the table of FIG. 5 is applied.
FIG. 7 is a flowchart showing the detailed procedure of the recognition result analysis.
FIG. 8 is a table associating the character model setting mode set for each recognition pass with the reliability of the recognition result obtained in that pass.

FIG. 1 is a functional block diagram showing a configuration example of an OCR application 1 incorporated in a portable information processing device.
The information processing device of this embodiment is a smartphone, which includes a camera 2 having a moving image shooting function and a touch panel 3 in which a display unit and an operation unit are integrated. In addition to a camera interface 13 and an input/output interface 14, the OCR application 1 includes a library 10 (a group of programs with character recognition functions) that causes the control unit of the smartphone to function as a character reading device.

The library 10 of this embodiment includes the functions of a character recognition processing unit 11 and a control unit 12.

The camera interface 13 causes the camera 2 to start moving image shooting when the OCR application 1 is launched and captures each frame image. The captured images are provided to the input/output interface 14 and the character recognition processing unit 11.

The character recognition processing unit 11 includes a dictionary (not shown) in which a plurality of types of character models are registered. On each frame image provided from the camera interface 13, it executes character segmentation, which extracts the individual characters in the image by image projection processing, and collation, which compares each extracted character image with the character models in the dictionary. It thereby recognizes the character code corresponding to each character and calculates the reliability of the recognition result.
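Character segmentation by projection can be sketched as follows. This is a generic vertical-projection sketch, not necessarily the exact procedure of the character recognition processing unit 11. It assumes a binarized image given as rows of 0 (background) and 1 (ink) with characters laid out horizontally, and the function name is hypothetical.

```python
def segment_columns(binary_rows):
    """Return (start, end) column ranges, end exclusive, that contain ink.

    Each range is a candidate character; gaps are columns whose vertical
    projection (count of ink pixels) is zero.
    """
    if not binary_rows:
        return []
    width = len(binary_rows[0])
    # Vertical projection: number of ink pixels in each column.
    projection = [sum(row[x] for row in binary_rows) for x in range(width)]
    spans, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                   # entering a character
        elif count == 0 and start is not None:
            spans.append((start, x))    # leaving a character
            start = None
    if start is not None:
        spans.append((start, width))    # character touches the right edge
    return spans
```

For a two-row image `[[0, 1, 1, 0, 0, 1, 0], [0, 1, 0, 0, 0, 1, 1]]` the projection is `[0, 2, 1, 0, 0, 2, 1]`, giving the two spans `(1, 3)` and `(5, 7)`.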

The above recognition processing and reliability calculation are executed repeatedly, each time on the latest frame image captured by the camera interface 13 at that point. The recognition result and reliability of each pass are stored in a buffer memory (not shown) and processed by the control unit 12.

The control unit 12 has a function of setting the shooting mode for recognition processing, a function of analyzing each recognition result and its reliability, and a function of generating display information that includes character information indicating the recognition result. The display information is passed to the input/output interface 14 and displayed on the touch panel 3 together with the moving image provided from the camera interface 13.

FIG. 2 shows examples of screens displayed on the touch panel 3.
In this example, the character string 「オムロン」 (Omron) is photographed and recognized. In the figure, the character string shown large within the central frame 100 is the recognition target, and the character strings M1 to M4 shown small below it are the recognition results.

In this embodiment, three shooting modes are provided: "standard mode", "outdoor mode 1", and "outdoor mode 2" (see FIG. 3). The standard mode is mainly suited to indoor shooting, outdoor mode 1 is suited to outdoor shooting in sunny weather, and outdoor mode 2 is suited to outdoor shooting in cloudy weather.

The screen shown in FIG. 2 displays the latest frame image input from the camera 2, and the upper left corner of the screen shows the shooting mode that was applied when the displayed image was shot. In this embodiment, the three shooting modes are selected in order, with the shooting mode switched at every recognition pass. After this switching has run for a predetermined number of cycles, the shooting mode that gave the most reliable recognition results is selected, and that optimum mode is maintained from then on. FIG. 2 (1), (2), and (3) show the screens during the final round of shooting mode switching, and FIG. 2 (4) shows a screen after the optimum mode has been selected.

The shooting modes are switched with a period matched to the character recognition processing cycle. In each shooting mode, character recognition is executed on one frame image obtained while that mode is set, and the recognition result is displayed before switching to the next mode. Each of the character strings M1 to M4 shown as recognition results was obtained during the shooting mode displayed on the same screen. For example, the character string M1 in FIG. 2 (1) is the recognition result for one frame image within the period in which the standard mode was set, the character string M2 in FIG. 2 (2) is the result for one frame image within the period in which outdoor mode 1 was set, and the character string M3 in FIG. 2 (3) is the result for one frame image within the period in which outdoor mode 2 was set.
The image in FIG. 2 (4) is a frame image at the stage where the standard mode has been selected as the optimum mode based on the reliability of each shooting mode and recognition has been performed several times in that mode. The character string M4 is the recognition result for an image generated before the image being displayed.

In the recognition result display, any recognized character whose similarity to its character model (the per-character reliability) falls below a predetermined reference value is shown in a high-luminance color (for example, red). In FIG. 2, low-reliability characters are indicated by a halftone-dot background instead of that color.
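The per-character flagging can be sketched as below. The reference value of 70, the example scores, and the use of asterisks as a plain-text stand-in for the colored display are all assumptions made for illustration, and the function names are hypothetical.

```python
def mark_low_confidence(chars, threshold=70):
    """Return (character, needs_highlight) pairs.

    `chars` is a list of (character, per-character reliability) tuples;
    a character is flagged when its reliability is below the threshold.
    """
    return [(ch, score < threshold) for ch, score in chars]


def render(chars, threshold=70):
    """Plain-text stand-in for the colored display: wrap flagged characters in '*'."""
    return "".join(
        f"*{ch}*" if low else ch
        for ch, low in mark_low_confidence(chars, threshold)
    )
```

With the hypothetical scores `[("オ", 92), ("ム", 88), ("口", 45), ("ン", 90)]`, `render` returns `オム*口*ン`, marking only the doubtful third character.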

In the example of FIG. 2, of the three shooting modes, the character string M1 recognized in the standard mode has every character recognized with a reliability at or above the reference value. In contrast, in the character string M3 recognized in outdoor mode 2, the character 「ロ」 of 「オ」「ム」「ロ」「ン」 is split and misrecognized as the two bracket symbols "[" and "]". In the character string M2 recognized in outdoor mode 1, 「ロ」 is misrecognized as the kanji 「口」 (kuchi, "mouth"). The reliability is therefore highest in the standard mode in this example, and the standard mode is set as the optimum mode, as shown in FIG. 2 (4).

FIG. 3 shows the specific settings of the three shooting modes.
Each shooting mode is assigned a unique mode number (the "No." column in the figure). When the shooting mode is switched, the modes are selected in ascending order of mode number.

Each shooting mode is a different combination of the degree of brightness adjustment at the time of shooting and whether contrast enhancement is performed. In FIG. 3, the brightness adjustment used in the standard mode is labeled "standard" and a stronger adjustment is labeled "strong". The actual settings these labels stand for are registered in memory as combinations of specific values of the control parameters.

The parameters for brightness adjustment include a numerical value for the shutter speed of the camera 2, a numerical value for the aperture opening, and a flag indicating whether to fire the strobe light source. These parameters are passed from the control unit 12 to the camera interface 13, and the operation of the camera 2 and the strobe light source is controlled accordingly.
Contrast enhancement is a standard function of the signal processing unit of the camera 2. Its parameter is a flag indicating whether the function is enabled. Like the brightness parameters, it is passed from the control unit 12 to the camera interface 13 to turn the function on or off.

In the standard mode, shooting is performed at standard brightness with no contrast enhancement. In outdoor mode 1, shooting is performed at standard brightness with contrast enhancement. In outdoor mode 2, shooting is performed with a stronger-than-standard brightness adjustment, again with contrast enhancement.
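The three shooting modes can be represented as a small lookup table. This sketch records only the labels given in the text ("standard" or "strong" brightness, contrast enhancement on or off). The concrete shutter speed, aperture, and strobe values are not given in the source and are omitted. The mode numbers 1 to 3 and all identifiers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShootingMode:
    number: int                 # mode number; modes are selected in ascending order
    name: str
    brightness: str             # "standard" or "strong"
    contrast_enhancement: bool  # whether the camera's enhancement is enabled


SHOOTING_MODES = [
    ShootingMode(1, "standard", "standard", False),
    ShootingMode(2, "outdoor 1", "standard", True),
    ShootingMode(3, "outdoor 2", "strong", True),
]


def next_mode(current):
    """Return the mode that follows `current`, wrapping around after the last."""
    ordered = sorted(SHOOTING_MODES, key=lambda m: m.number)
    index = ordered.index(current)
    return ordered[(index + 1) % len(ordered)]
```

Calling `next_mode` repeatedly from the standard mode walks through outdoor 1 and outdoor 2 and then returns to the standard mode, matching the switching order used during the search for the optimum mode.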

FIG. 4 is a table associating the shooting mode set for each recognition pass with the reliability of the recognition result obtained in that pass.
In this embodiment, the shooting mode is switched in the order standard mode → outdoor mode 1 → outdoor mode 2. The average reliability of the recognition results is calculated for each mode, and when the sixth recognition pass, which completes the second round of switching, has finished, the mode with the highest average reliability is selected as the optimum mode.

In the example of FIG. 4, the two recognition passes in the standard mode gave reliabilities of 85 and 90, respectively. In outdoor mode 1, the reliability was 75 in both passes. The two passes in outdoor mode 2 gave reliabilities of 70 and 60, respectively. As noted beside the table, the average reliability is therefore highest in the standard mode, so the standard mode is maintained from the seventh recognition pass onward.
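The selection in the FIG. 4 example can be reproduced with a short calculation. The sketch assumes that the two 75 values belong to outdoor mode 1 and the 70 and 60 values to outdoor mode 2, following the switching order standard → outdoor 1 → outdoor 2. The function and variable names are hypothetical.

```python
def select_optimum_mode(passes):
    """Pick the mode with the highest average reliability.

    `passes` is a list of (mode name, reliability) in processing order.
    Returns (best mode, {mode: average reliability}).
    """
    by_mode = {}
    for mode, reliability in passes:
        by_mode.setdefault(mode, []).append(reliability)
    averages = {mode: sum(v) / len(v) for mode, v in by_mode.items()}
    best = max(averages, key=averages.get)
    return best, averages


# The six passes of the FIG. 4 example (two rounds over three modes).
FIG4_PASSES = [
    ("standard", 85), ("outdoor 1", 75), ("outdoor 2", 70),
    ("standard", 90), ("outdoor 1", 75), ("outdoor 2", 60),
]
```

The averages come out as 87.5 for the standard mode, 75.0 for outdoor mode 1, and 65.0 for outdoor mode 2, so `select_optimum_mode(FIG4_PASSES)` selects the standard mode.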

However, the process of comparing the average reliability of each mode and selecting the optimum mode is carried out only when the recognition results of the six recognition processes show a degree of consistency that satisfies a predetermined criterion. If the recognition results do not satisfy this criterion, the count of processing cycles is reset and the processing starts over from the first cycle.

In this embodiment, the recognition result is displayed after every recognition process, but as illustrated in FIG. 2, a recognition result with low reliability is likely to contain errors.
In particular, while the shooting mode is being switched, the display is likely to fluctuate because the reliability varies from cycle to cycle. Once the optimum shooting mode is selected, however, every recognition process is performed with high accuracy, so the recognition results of successive cycles come to agree. The recognition result on the screen therefore shifts from an unstable display to a fixed display, and the user can take the arrival of the fixed display as a sign that the recognition result has been confirmed.

The user therefore understands that shooting needs to continue while the display is fluctuating, and keeps the smartphone pointed at the character string. This further improves the quality of the images captured after the optimum shooting mode is selected and makes a more accurate recognition result possible.

If, for example, the camera is not properly aligned with the character string, the reliability may remain low even under the optimum shooting mode, misrecognition may occur, and the unstable display may persist. The persistence of the unstable display, however, tells the user that recognition accuracy has not been secured, so the user can respond by changing the position or orientation of the camera.

FIG. 5 shows another embodiment of shooting mode selection.
As in the embodiment of FIG. 4, the shooting mode is switched at every recognition process in the order standard mode → outdoor mode 1 → outdoor mode 2, the optimum shooting mode is selected based on the average reliability of each mode once the switching has gone through two rounds, and recognition then continues in that optimum mode. In this embodiment, however, once recognition in the selected optimum mode has run 24 times, the processing returns to the state in which the shooting mode is switched. The optimum shooting mode is then selected again based on the average reliability of each mode after two rounds of this switching.

In the specific example of FIG. 5, when the shooting modes were switched in turn during the first to sixth recognition processes, the standard mode had the highest average reliability, so the standard mode was used for the 7th to 30th recognition processes. When the shooting modes were switched again during the 31st to 36th recognition processes, however, outdoor mode 1 had the highest average reliability instead of the standard mode. Accordingly, outdoor mode 1 was selected for the 37th to 60th recognition processes.

The same pattern continues thereafter: once recognition under the shooting mode selected as the optimum mode has been executed 24 times, the processing returns to switching the shooting mode, and the optimum shooting mode is reselected based on the per-mode average reliability of the six recognition processes performed during that switching. In other words, the algorithm repeats a block of 30 recognition processes in which the shooting mode is switched at each of the first six, the shooting mode with the highest average reliability over those six is taken as the optimum mode, and the remaining 24 recognition processes are executed in that mode.
This repetition, too, is conditional on every recognition result satisfying the consistency criterion. If the latest recognition result and the previous recognition result fail to meet the consistency criterion, the processing returns to the first cycle.

The specific procedure executed by the OCR application 1 is described below, assuming that the shooting mode is selected in the manner shown in FIG. 5.
FIG. 6 shows the overall flow of the processing, and FIG. 7 shows the detailed procedure of step S8 (the recognition result analysis process) in FIG. 6.

First, the procedure of FIG. 6 is described. This processing starts in response to an operation instructing the start of shooting for reading. In the first step S1, a counter n indicating the number of processing cycles is set to its initial value of 1, after which the loop of steps S2 to S9 is repeated until n reaches 30.

In this loop, while the value of n is 6 or less, step S2 is "YES" and the shooting mode corresponding to the value of n is selected (step S3). Specifically, when n is 1 to 3, the shooting mode whose mode number (see FIG. 3) corresponds to that value is selected, and when n is 4 to 6, the shooting mode whose mode number corresponds to n minus 3 is selected. After this selection, in step S4, the operation of the camera 2 and related components is controlled with the parameters for the selected shooting mode, and a frame image generated under that control is acquired.

When the value of n exceeds 6, on the other hand, step S2 is "NO" and the processing proceeds to step S5. In step S5, the operation of the camera 2 and related components is controlled based on the parameters of the optimum mode, and a frame image generated under that control is acquired.

By executing either steps S3 and S4 or step S5 according to the value of n in this way, shooting is controlled based on the parameter settings of the selected shooting mode, and the frame image generated by that shooting is acquired.
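The branch in steps S2 to S5 reduces to a small function of the counter n. The following is a minimal sketch, assuming the three modes are numbered 1 to 3 in switching order. The function name `select_mode_number` and its parameter names are hypothetical.

```python
NUM_MODES = 3          # standard, outdoor 1, outdoor 2
SWITCHING_CYCLES = 6   # two rounds through the three modes


def select_mode_number(n, optimum_mode):
    """Return the mode number used for the n-th recognition cycle (1-based)."""
    if n <= SWITCHING_CYCLES:                        # step S2 "YES" -> step S3
        return n if n <= NUM_MODES else n - NUM_MODES
    return optimum_mode                              # step S2 "NO" -> step S5


# Cycles 1-6 walk through modes 1, 2, 3 twice; later cycles use the optimum mode.
print([select_mode_number(n, optimum_mode=2) for n in range(1, 9)])
```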

In step S6, character recognition is performed on the frame image acquired through the above processing. Briefly, the image to be processed is binarized and projected along the x and y axes to detect the individual character images. Each detected character image is then collated in turn with the various character models registered in the dictionary to calculate a similarity, and the character indicated by the character model with the highest similarity is assigned to the character image. This maximum similarity serves as the reliability of that character. The recognized characters and their reliabilities are stored in a buffer memory in the order in which the characters appear.
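A toy version of the projection and collation in step S6 is sketched below for a single line of text held as rows of 0/1 pixels. The description does not specify how similarity is computed, so the pixel-agreement percentage used here is an assumption, as is the premise that each character image has already been scaled to the size of the models. All function names are illustrative.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary):
    """Split a binarized one-line image into (left, top, right, bottom) boxes."""
    column_profile = [sum(col) for col in zip(*binary)]           # projection onto x
    boxes = []
    for left, right in runs(column_profile):
        row_profile = [sum(row[left:right]) for row in binary]   # projection onto y
        rows = runs(row_profile)
        boxes.append((left, rows[0][0], right, rows[-1][1]))
    return boxes


def similarity(a, b):
    """Assumed measure: percentage of pixels that agree between equal-sized images."""
    total = sum(len(row) for row in a)
    agree = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return 100.0 * agree / total


def best_match(char_image, models):
    """Return (character, similarity) for the model that matches best."""
    scored = ((label, similarity(char_image, m)) for label, m in models.items())
    return max(scored, key=lambda pair: pair[1])


image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(segment_characters(image))
```

For the sample image, the column projection separates two characters, and the row projection trims the empty bottom row from each box.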

When the character recognition process is finished, step S7 reads out the similarity stored for each character in the current (n-th) character recognition process and calculates the reliability of this recognition result as the average of the per-character similarities. The calculated reliability is stored in the buffer memory in association with the recognition result.
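Step S7 is a plain arithmetic mean over the per-character similarities. A minimal sketch follows, in which the function name and the (character, similarity) tuple layout are hypothetical. The handling of an empty result is an assumption, since the description does not cover that case.

```python
def result_reliability(char_results):
    """Average the per-character similarities of one recognition result.

    char_results is a list of (character, similarity) pairs in reading order.
    """
    if not char_results:
        return 0.0  # assumption: an empty result is given reliability 0
    return sum(sim for _, sim in char_results) / len(char_results)


print(result_reliability([("A", 90), ("B", 80), ("C", 70)]))
```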

The recognition result analysis process (step S8) is performed next, after which the recognition result is displayed (step S9). If n is less than 30 at this point (step S10 is "YES"), the value of n is incremented by one in step S11 and the processing returns to step S2. If n = 30 (step S10 is "NO"), the processing returns to step S1, so the value of n returns to 1.

Next, the recognition result analysis process of step S8 is described in detail with reference to FIG. 7.
This process first checks the value of n. If n > 1 (step S12 is "YES"), the current (n-th) recognition result is collated with the previous ((n−1)-th) recognition result to check whether the two satisfy the consistency criterion (steps S13 and S14).

For example, the numbers of characters recognized in the two cycles are compared, and if the difference is no more than a predetermined value, the recognition results are compared character by character along the character sequence to obtain the number of matching characters. If this number of matches, or the ratio of the number of matches to the larger of the two recognized character counts, is at least a predetermined reference value, the current and previous recognition results are judged to satisfy the consistency criterion. If the consistency criterion is judged not to be satisfied (step S14 is "NO"), n is set to 0 in step S19 and the analysis process ends.
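The consistency test described above can be sketched as follows, using the ratio variant of the criterion. The threshold values `max_length_diff=1` and `min_match_ratio=0.8` are placeholders, since the description only calls them predetermined values. Treating two empty results as inconsistent is likewise an assumption.

```python
def is_consistent(current, previous, max_length_diff=1, min_match_ratio=0.8):
    """Check whether two recognized strings satisfy the consistency criterion."""
    # Compare the numbers of recognized characters first.
    if abs(len(current) - len(previous)) > max_length_diff:
        return False
    longer = max(len(current), len(previous))
    if longer == 0:
        return False  # assumption: empty results give no evidence of a stable target
    # Compare character by character along the sequence and count the matches.
    matches = sum(c == p for c, p in zip(current, previous))
    # Ratio of matches to the larger recognized character count.
    return matches / longer >= min_match_ratio


print(is_consistent("OMRON2012", "OMRON2O12"))  # one character differs out of nine
print(is_consistent("OMRON", "TOKYO"))          # no position matches
```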

If the consistency criterion is judged to be satisfied (step S14 is "YES"), the value of n is kept. If n is 3 or less, or 7 or more, step S15 is "NO" and the analysis process ends with this determination.

If n is 4, 5, or 6, step S15 is "YES" and the processing proceeds to step S16, where the average reliability of the shooting mode corresponding to that value of n is calculated. Specifically, the reliability calculated in the (n−3)-th cycle and the reliability calculated in the n-th cycle are averaged, and this average is set as the average reliability of the shooting mode whose mode number is (n−3). The calculated average reliability is stored in the buffer memory together with the corresponding mode number.

If the value of n is 4 or 5, step S17 is "NO" and the analysis process ends with this determination. If n = 6, step S17 is "YES" and the processing proceeds to step S18.
In step S18, the average reliability calculated for each mode is read from the buffer memory and compared, and the shooting mode with the highest average reliability is selected as the optimum mode. The analysis process then ends.
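Put together, steps S12 to S19 of FIG. 7 can be sketched as one function. This is an illustrative reading of the flowchart, not the patented implementation: the function signature, the dictionaries standing in for the buffer memory, and the tie-breaking behaviour of `max` (first mode wins) are all assumptions.

```python
NUM_MODES = 3


def analyze(n, history, consistent, mode_averages, current_optimum):
    """Sketch of the analysis process (step S8). Returns (n, optimum_mode).

    n               -- 1-based number of the current recognition cycle
    history         -- dict: cycle number -> reliability of that cycle's result
    consistent      -- True if the current result is consistent with the previous
                       one (ignored when n == 1, which has no previous result)
    mode_averages   -- dict: mode number -> average reliability (updated in place)
    current_optimum -- optimum mode number chosen so far, or None
    """
    if n > 1 and not consistent:        # S12 "YES", S14 "NO"
        return 0, current_optimum       # S19: reset the counter
    if not 4 <= n <= 6:                 # S15 "NO"
        return n, current_optimum
    mode = n - NUM_MODES                # S16: average of cycles n-3 and n
    mode_averages[mode] = (history[n - NUM_MODES] + history[n]) / 2
    if n < 6:                           # S17 "NO"
        return n, current_optimum
    best = max(mode_averages, key=mode_averages.get)   # S18
    return n, best


# Replaying the reliabilities of the FIG. 4 example (modes 1, 2, 3 in turn, twice).
history = {1: 85, 2: 75, 3: 70, 4: 90, 5: 75, 6: 60}
averages, optimum = {}, None
for n in range(1, 7):
    _, optimum = analyze(n, history, True, averages, optimum)
print(averages, optimum)
```

Replaying the FIG. 4 reliabilities in this way selects mode number 1, the standard mode.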

According to the procedure shown in FIGS. 6 and 7, as long as the same character string continues to be photographed, recognition while switching among the three shooting modes and recognition under the shooting mode selected as the optimum mode are repeated periodically. Each recognition process is carried out independently, using the latest frame image for that cycle. Consequently, even if the state of the image being processed changes somewhat, for example because the alignment of the camera 2 with the character string changes or the shooting mode is changed, the periodic repetition continues as long as the consistency condition on the recognition results is satisfied.

On the other hand, when a recognition result becomes inconsistent with the previous one, for example because a user who has finished reading one character string goes on to read another, n is set to 0 in the process of analyzing that recognition result (see steps S14 and S19 in FIG. 7). This value of n is then updated to 1 in step S11, after the analysis process (step S8) and the display of the recognition result (step S9) have finished, so the next recognition process is carried out as the first cycle.
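The way the counter n moves through steps S8 to S11, including the reset on an inconsistent result, can be traced with a few lines. This is a sketch of the counter logic only: the function names are hypothetical, and the boolean flags stand in for the outcome of the consistency check in each cycle.

```python
def next_counter(n, consistent):
    """Advance the cycle counter the way steps S8 to S11 do.

    consistent -- whether the result of cycle n is consistent with the previous
                  one (not checked when n == 1).
    """
    if n > 1 and not consistent:
        n = 0              # step S19 inside the analysis process
    if n < 30:             # step S10 "YES"
        return n + 1       # step S11
    return 1               # step S10 "NO": back to step S1


def trace(flags, n=1):
    """Return the counter value used in each cycle for a sequence of check outcomes."""
    counters = []
    for consistent in flags:
        counters.append(n)
        n = next_counter(n, consistent)
    return counters


# The third cycle is inconsistent with the second, so the fourth cycle runs as cycle 1.
print(trace([True, True, False, True, True]))
```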

The actions and effects of the processing in FIGS. 6 and 7 are summarized here.
First, in this embodiment, recognition with shooting mode switching and recognition in the optimum mode proceed only while every recognition result satisfies the consistency condition. This prevents the mode switching from advancing while the shooting position is still unsettled at the start of reading, which would otherwise let an unsuitable shooting mode be selected as the optimum mode. While the camera 2 has not yet been aligned with the character string to be recognized, and images that properly contain the character string are therefore not being generated, the recognition results of successive cycles do not agree even though recognition is performed. The value of n thus moves back and forth between 0 and 1, and in effect the first recognition cycle is carried out every time. Once the position of the camera 2 is settled, switching of the shooting mode starts from that point, so the optimum mode can be selected early.

At the same time, even if the state of the image changes somewhat, the periodic repetition can continue as long as the consistency condition on the recognition results is satisfied, which prevents processing delays. For example, even if the hand holding the smartphone moves slightly while the shooting modes are being switched, the processing can proceed to the selection of the optimum mode without the cycle count being cleared.

Furthermore, even when the surrounding environment changes partway through reading a character string, for example when the user is reading a document in a moving car that passes from a bright place into a dark one, the optimum mode is updated as needed to a shooting mode suited to the new environment, so the accuracy of the recognition process is maintained.

On the display screen, every recognition result is shown on the same screen as the moving image being captured. Once the camera 2 has been aligned with the character string, the user can therefore promptly judge whether the recognition result is appropriate while checking the current shooting state. As noted earlier, the display of the recognition result is likely to be unstable while the shooting mode is being switched, but when the display becomes fixed after the optimum mode is selected, the user can recognize from this that the recognition result has been confirmed.

If the unstable display of the recognition result persists for a long time, or if the fixed recognition result contains an error, the user can change the position or orientation of the camera 2 and continue reading without performing the reading start operation again. As described above, the change from an unstable display to a fixed display can be interpreted as confirmation of the recognition result. Based on this change and on the content of the fixed display, the user can decide whether and when to move the camera 2, and can adjust the shooting position and direction while referring to the real-time moving image on the screen. If this adjustment makes the previous optimum mode unsuitable, the optimum mode is updated after the shooting modes have been switched again. If the user's adjustment of the position and orientation of the camera 2 is suitable for recognition, the optimum mode set in response to that adjustment is likely to improve the image quality.
This increases the likelihood that a misrecognized portion will be recognized correctly.

When the character string to be recognized changes, the loss of consistency between recognition results returns the processing to the first of the 30 cycles, so a shooting mode suited to the new character string can be selected promptly. Thus, even when the camera 2 is pointed in a direction different from that used for the previous character string and the ambient brightness changes greatly, the optimum shooting mode is selected quickly and accurate recognition can be performed.

In this way, with the processing procedure shown in FIGS. 6 and 7, the user can refer to the moving image and the recognition result displayed on the touch panel 3 and keep shooting, changing the position and orientation of the camera 2, until a highly accurate recognition result is displayed, thereby obtaining the desired result. Accurate recognition thus becomes possible without complicating the user's operating procedure, which improves convenience.

Although not shown in FIG. 6, during this processing the input/output interface 14 can, as appropriate, accept an operation for outputting the recognized character string currently displayed to another application. When such an operation is accepted, the periodic repetition is ended or suspended, and text data representing the recognized character string is output to the designated application. The read character string can thereby be registered in an address book or used in a memo pad, a translation application, and the like.

In the above embodiments, the shooting mode is switched at every character recognition process, but the switching need not occur every cycle. For example, recognition in the same shooting mode may be performed twice, with the next shooting mode selected after the second process finishes. Likewise, when the shooting mode is switched every cycle, the optimum mode need not be selected after exactly two rounds of switching; it may be selected after a larger number of rounds.

The specific examples of FIGS. 3 to 5 use three shooting modes, but more shooting modes may be provided. The types of parameters adjusted in each shooting mode are also not limited to those described above. In addition, the recognition processing means may be given a function for performing preprocessing such as luminance adjustment and misalignment correction, and shooting modes may be defined that also specify whether this preprocessing is performed.

The above description assumes that a fixed display of the recognition result can be interpreted as confirmation of that result, but the confirmation may be indicated more explicitly. One possible method is to count the number of times the recognition result in the optimum mode exactly matches the previous recognition result, and to change the display form of the recognition result when the count reaches a predetermined value (for example, 10).
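The explicit confirmation described above can be sketched as a small counter object. The class name `ConfirmationTracker` is hypothetical, and restarting the count whenever a result differs from the previous one is an assumption, since the description only says that exact matches are counted.

```python
class ConfirmationTracker:
    """Counts exact matches between successive optimum-mode recognition results."""

    def __init__(self, required_matches=10):
        self.required_matches = required_matches  # e.g. 10, as in the description
        self.previous = None
        self.matches = 0

    def update(self, result):
        """Feed the latest result; return True once the display form should change."""
        if self.previous is not None and result == self.previous:
            self.matches += 1
        else:
            self.matches = 0  # assumption: a mismatch restarts the count
        self.previous = result
        return self.matches >= self.required_matches


tracker = ConfirmationTracker(required_matches=3)
print([tracker.update("OMRON") for _ in range(4)])
```

With a threshold of 3, the fourth identical result is the first to report confirmation, because the first result has no predecessor to match.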

The embodiments described so far select an optimum mode from among several shooting modes and achieve highly accurate character recognition by performing recognition in that mode, but shooting modes are not the only element of character recognition that can be selected. As one example, a case is described in which the setting for the type of character model used for collation with character images is switched.

FIG. 8 shows an example in which the settings of the character models used in character recognition are divided into three modes and these setting modes are switched in turn.
Specifically, this embodiment provides a first setting mode that uses Mincho and Gothic font data, a second setting mode that uses rounded pop-style font data, and a third setting mode that uses semi-cursive (gyosho) font data. These three setting modes are switched in turn at every recognition process while the average reliability is calculated for each mode, and when the switching has gone through two rounds, the setting mode with the highest average reliability is selected as the optimum mode. From the seventh recognition process onward, the selection of the optimum mode is maintained and recognition continues.

In the illustrated example, the second setting mode (rounded pop-style) has the highest average reliability among the three setting modes. Accordingly, the second setting mode is selected as the optimum mode and is used from the seventh recognition process onward.

With the above processing, suitable font data is selected automatically and character recognition is performed with character models based on that font data, without the user having to deliberately choose font data suited to the character string to be recognized. The accuracy of the recognition result can therefore be improved considerably. The classification of character models is not limited to font data. Multiple types of character models may be registered by language, and processing similar to that of FIG. 8 may be performed using a setting mode for each language.

The processing procedure shown in FIGS. 6 and 7 may also be applied to the example of FIG. 8, so that recognition while switching the setting mode and recognition in the optimum mode selected by analyzing the recognition results of the switching period are repeated periodically. In this case as well, the processing returns to the first cycle when consistency with the previous recognition result is lost. Even when recognition of one character string finishes and the recognition target changes to a character string in a different font, the processing can therefore respond quickly and perform recognition with the most suitable font data.

Furthermore, each combination of a character model setting mode and a shooting mode may be treated as a single setting mode, a plurality of such setting modes may be prepared, and the mode best suited to recognition may be selected from among them by the same method.
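Treating each pairing of a character model setting and a shooting mode as one setting mode amounts to taking a Cartesian product of the two lists. The sketch below is illustrative only: the mode labels are shorthand for the modes named in the embodiments, and cycling through the combined modes in product order is an assumption, since the description does not fix an order.

```python
from itertools import product

SHOOTING_MODES = ("standard", "outdoor 1", "outdoor 2")
CHARACTER_MODEL_MODES = ("Mincho/Gothic", "rounded pop", "semi-cursive")

# Each (character model mode, shooting mode) pair is one combined setting mode.
COMBINED_MODES = tuple(product(CHARACTER_MODEL_MODES, SHOOTING_MODES))


def combined_mode_for_cycle(n):
    """Return the combined mode tried in the n-th switching cycle (1-based)."""
    return COMBINED_MODES[(n - 1) % len(COMBINED_MODES)]


print(len(COMBINED_MODES))
print(combined_mode_for_cycle(1), combined_mode_for_cycle(4))
```

With three modes of each kind there are nine combined modes, so one round of switching takes nine recognition cycles instead of three.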

Each embodiment assumes that the result of reading the photographed character string is displayed as the recognition result as is. Instead, the words indicated by the read character string may be translated into another language and a character string showing the translation may be displayed.

The OCR application 1 to which the above embodiments apply is not limited to smartphones. It can also be incorporated into mobile phones other than smartphones, into information processing devices without a communication function (for example, digital video cameras), and into information processing devices connected to a general-purpose digital video camera (for example, personal computers).

1 OCR application
2 Camera
3 Touch panel
10 Library (program group for the character reading device)
11 Character recognition processing unit
12 Control unit
13 Camera interface
14 Input/output interface
M1 to M4 Character strings of recognition results

Claims

A character recognition program for causing a computer, which is connected to a camera having a video shooting function and to a display unit that displays a moving image generated by the camera, to function as a character recognition device that receives a moving image generated by photographing a character string and recognizes information indicated by characters in the moving image,
the program causing the computer to function as:
recognition processing means for repeating, while the character string is being photographed, a process of inputting the latest frame image obtained by the photographing and recognizing characters in the image, and a process of calculating the reliability of the recognition result;
mode control means for selecting one of a plurality of setting modes that define at least one of the image quality of the frame image subjected to character recognition processing and the operation of the recognition processing means; and
display control means for displaying character information based on each recognition result of the recognition processing means on the same screen as the moving image displayed on the display unit,
wherein the mode control means executes a first step of analyzing each recognition result while sequentially switching the plurality of setting modes in step with the processing cycle of the recognition processing means, and a second step of selecting, based on the result of the analysis, the setting mode with the highest recognition result reliability once the setting modes have been switched a predetermined number of times while a recognition result satisfying a predetermined consistency condition is obtained in every cycle, and maintaining that selection while the recognition processing means subsequently performs a plurality of cycles of processing.

2. The character recognition program according to claim 1, wherein the mode control means returns to the initial processing of the first step in response to the recognition processing means having processed a predetermined number of frame images in the second step.

3. The character recognition program according to claim 1, wherein the mode control means returns to the initial processing of the first step when the latest recognition result by the recognition processing means becomes inconsistent with the previous recognition result.

4. The character recognition program according to any one of claims 1 to 3, wherein the mode control means selects from among a plurality of setting modes that differ in the settings of the shooting parameters of the camera.

5. The character recognition program according to any one of claims 1 to 4, wherein the mode control means selects from among a plurality of setting modes that differ in the type of character model used when the recognition processing means collates characters detected from an image for recognition.

6. A character recognition device that receives a moving image of a character string generated by a camera having a video shooting function and recognizes information indicated by characters in the moving image, the device comprising:
a display unit for displaying the moving image generated by the camera;
recognition processing means for repeating, while the character string is being photographed, a process of inputting the latest frame image obtained by the photographing and recognizing characters in the image, and a process of obtaining the reliability of the recognition result;
mode control means for selecting one of a plurality of setting modes, each defining at least one of the image quality of the frame image subjected to character recognition processing and the operation of the recognition processing means; and
display control means for displaying character information based on each recognition result of the recognition processing means on the same screen as the moving image display of the display unit,
wherein the mode control means executes a first step of analyzing the reliability of each recognition result while sequentially switching the plurality of setting modes in step with the processing cycle of the recognition processing means, and a second step of, in response to the switching of the setting modes having circulated a predetermined number of times while a recognition result satisfying a predetermined consistency condition is obtained in every cycle, selecting, based on the result of the analysis, the setting mode with the highest reliability of the recognition result and maintaining that selection over a plurality of subsequent processing cycles of the recognition processing means.
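
As an informal illustration only, and not part of the claims, the two-step mode control recited in claims 1 to 3 might be sketched as follows. The sketch rests on several assumptions that the claims leave open: the class name `ModeController` and the parameters `rounds` and `hold_frames` are hypothetical, the "predetermined consistency condition" is taken to mean that the recognized string equals the previous one, and "highest reliability" is taken to mean the highest mean reliability observed per mode during the first step. Setting modes are treated as opaque labels; in a real system each might stand for a set of camera shooting parameters (claim 4) or a type of character model (claim 5).

```python
from collections import defaultdict
from statistics import mean


class ModeController:
    """Hypothetical sketch of the mode control means (not the patented implementation)."""

    def __init__(self, modes, rounds=2, hold_frames=30):
        self.modes = list(modes)        # candidate setting modes (opaque labels)
        self.rounds = rounds            # assumed "predetermined number" of circulations
        self.hold_frames = hold_frames  # assumed frame count before re-searching (claim 2)
        self._restart()

    def _restart(self):
        # Return to the initial processing of the first step.
        self.phase = "search"
        self.index = 0
        self.cycles = 0
        self.scores = defaultdict(list)  # mode index -> observed reliabilities
        self.last_text = None
        self.held = 0

    @property
    def current_mode(self):
        return self.modes[self.index]

    def update(self, text, reliability):
        """Feed one recognition result; return the mode to use for the next frame."""
        # Assumed consistency condition: same string as the previous cycle.
        if self.last_text is not None and text != self.last_text:
            self._restart()          # inconsistent result: start over (claim 3)
            self.last_text = text    # the new string becomes the baseline
            return self.current_mode
        self.last_text = text

        if self.phase == "search":   # first step: cycle modes, record reliability
            self.scores[self.index].append(reliability)
            self.cycles += 1
            if self.cycles >= self.rounds * len(self.modes):
                # Second step: pick the mode with the best mean reliability.
                self.index = max(self.scores, key=lambda i: mean(self.scores[i]))
                self.phase = "hold"
            else:
                self.index = (self.index + 1) % len(self.modes)
        else:                        # second step: keep the selection
            self.held += 1
            if self.held >= self.hold_frames:
                self._restart()      # periodic re-search (claim 2)
        return self.current_mode


# Usage with made-up reliabilities for three hypothetical modes.
reliability_by_mode = {"standard": 0.6, "high_contrast": 0.9, "alt_font": 0.7}
controller = ModeController(list(reliability_by_mode), rounds=2, hold_frames=3)
for _ in range(6):  # two full circulations over three modes
    chosen = controller.update("AB-1234", reliability_by_mode[controller.current_mode])
print(controller.phase, chosen)
```

In this sketch an inconsistent result discards the reliabilities gathered so far, so that a selection is made only after the modes have circulated the required number of times with consistent results in every cycle. Other readings of the consistency condition are equally possible.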