JP6598402B1

JP6598402B1 - Receipt and other form image automatic acquisition / reading method, program, and portable terminal device

Info

Publication number: JP6598402B1
Application number: JP2018159216A
Authority: JP
Inventors: 敏郎松村; 敬宇蓑和
Original assignee: 株式会社アイエスピー
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2019-10-30
Anticipated expiration: 2038-08-28
Also published as: JP2020035051A

Abstract

Problem to be solved: To provide a method, a program, and a portable terminal device for easily acquiring an image of a form such as a receipt in a state suitable for character reading processing. Solution: A method of acquiring, with a portable terminal device having a photographing means, an image of a photographed form suitable for character reading. The method includes, while the photographing means is capturing preview images: a step of determining, from each of a plurality of preview images, the edges that delimit the form; a step of comparing the edges determined for the respective images and determining whether the edges are stable; a step of determining, once the edges are determined to be stable, whether a predetermined region inside the edges is in focus; and a step of acquiring the form image for character reading once the region is determined to be in focus. Selected drawing: FIG. 13

Description

The present invention relates to a method of acquiring a form image for character reading using the photographing means of a portable terminal device. In particular, it relates to a method, a program, and a portable terminal device for easily acquiring a receipt image suitable for reading receipts, whose size is not fixed.

Various applications are known that photograph a receipt with a portable terminal device such as a smartphone and use the result of reading characters from the receipt image.

A user of such an application typically photographs a receipt by fixing it in place, for example by laying it on a table, starting the camera of the portable terminal device, holding the device so that the receipt fits in the viewfinder, and pressing the shutter button while avoiding camera shake and defocus. The result of OCR (Optical Character Recognition) processing of the photographed receipt image can be used for a variety of purposes.

For OCR processing, each character in the image preferably consists of at least a certain number of pixels (for example, 24 × 24 dots or more). If a character consists of too few pixels, it may not be readable.

When an entire receipt is photographed within the viewfinder of a smartphone or the like, a receipt whose aspect ratio matches the smartphone screen can fill the screen, so each character consists of many pixels. Receipts generally grow longer as more items are purchased, and fitting a long, narrow receipt on the screen reduces the number of pixels per character. For this reason, with small image sizes (for example, 2 MP or 3 MP), character recognition on long receipts has been difficult in practice.
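The relationship between receipt length and pixels per character can be checked with simple arithmetic. The sketch below is only a rough, optimistic bound: it assumes the receipt spans the whole long side of the frame and that each printed line is exactly one character tall with no line spacing or margins. The function names are hypothetical, and the 2048-pixel long side used for a 3 MP image (2048 × 1536) is an assumed typical value, not a figure from this document. The 24-pixel minimum is taken from the text above.

```python
def pixels_per_text_line(long_side_px: int, printed_lines: int) -> float:
    """Pixels available per printed line if the receipt spans the whole long side."""
    return long_side_px / printed_lines


def max_lines_at_min_height(long_side_px: int, min_char_px: int = 24) -> int:
    """Largest line count that still leaves min_char_px per line (optimistic bound)."""
    return long_side_px // min_char_px


# Assumed long sides: 2048 px for a 3 MP image, 3840 px for a 4K frame.
for label, long_side in (("3 MP", 2048), ("4K / 8 MP", 3840)):
    print(label, max_lines_at_min_height(long_side))
```

Real receipts have margins and line spacing and rarely fill the frame exactly, so the usable line count in practice is lower than this bound.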

The inventors investigated the relationship among the number of lines printed on a receipt, the image size, and the OCR misread rate. For receipts with fewer than about 50 printed lines (short receipts), the misread rate was low, ranging from 0 percent to around 10 to 20 percent, at image sizes of 3 MP (megapixels), 8 MP, and 12 MP alike. For receipts with about 50 or more lines (long receipts), the misread rate at 3 MP rose to around 30 percent and exceeded 50 percent at about 60 or more lines. At 8 MP and 12 MP, by contrast, the misread rate stayed low, at roughly 10 percent, even at about 50 or more lines; it tended to exceed 30 percent only at about 80 or more lines for 8 MP and about 90 or more lines for 12 MP. In recent years, portable terminal devices such as smartphones have increasingly supported so-called 4K (3840 × 2160, 8 MP) video, so it can be said that long receipts have become easier to read successfully.

In general, however, pressing the shutter button tends to cause camera shake, and OCR may fail to read characters from a blurred image. If the user holds the terminal in one hand to steady it and presses the shutter button with the other, both hands are occupied. The user then has no free hand to, for example, reposition the receipt so that it is not shadowed directly under a light, hold down a long receipt so that it does not curl, or lift the receipt into the air or support it against a wall while shooting. If the image is not captured in a state suitable for OCR processing, the application ultimately returns an error and the user is asked to shoot again. Such errors tend to recur, and after repeated errors the user gives up and abandons the application.

As an image processing apparatus capable of accurately recognizing desired information from an image captured while the recognition target (subject) or the camera is prone to shake, an apparatus has been proposed that includes a recognition target detection unit that detects a recognition target present in an image captured by a camera, a recognition target state determination unit that determines whether the image region of the recognition target is in a recognizable state, and other units (JP 2010-218061 A: Patent Document 1).

So that users of multifunctional portable terminals can enter various campaigns smoothly and quickly, it has been proposed to instruct the user, when photographing a long receipt, to photograph it in sections along its length, and to disable the shutter of the terminal when the terminal is not focused on the receipt (JP 2016-57676 A: Patent Document 2).

As an information processing apparatus, program, and control method that remove the need to re-shoot repeatedly when performing OCR with a camera that requires adjustment of shooting size and focus, it has been proposed to identify keyword portions in real time during continuous shooting and to present nearby candidates in a selectable form. The user can then easily select and import only the required text, which removes the repeated shooting that still-image capture entailed (JP 2017-117027 A: Patent Document 3).

As a method of automatically clipping (scrapping) portions of a page image obtained by photographing a printed page, it has been proposed to detect the sections of the page image based on sets of edges, each consisting of upper and lower horizontal edges extending horizontally and left and right vertical edges extending vertically, selected from a plurality of partition edges in the page image; to determine a map region based on the set of edges corresponding to each section; to generate a map image in which the pixels making up each map region are associated with a map number; and to display the page image divided into regions to be clipped and regions not to be clipped (JP 2018-97551 A: Patent Document 4).

Patent Document 1: JP 2010-218061 A
Patent Document 2: JP 2016-57676 A
Patent Document 3: JP 2017-117027 A
Patent Document 4: JP 2018-97551 A

The prior art determined the focus state and recognizability using a guide frame or the like for the form being photographed. Consequently, when an object of indeterminate overall size, such as a receipt, is photographed within the viewfinder, a guide cannot be set in advance and recognizability cannot be determined. Photographing a long receipt in sections is cumbersome and impractical. In addition, the prior art requires OCR processing of keywords during continuous shooting and cannot determine in advance, before running time-consuming OCR, whether an image is suitable for OCR.

In view of the above, an object of the present invention is to provide a method and the like for acquiring a form image suitable for character reading before character reading processing such as OCR is performed. In particular, an object is to provide a method, a program, and a device that automatically acquire a suitable image when a receipt of indeterminate size is photographed without being divided, without requiring the user to press a shutter button or judge the focus state. A further object is to provide a method, a program, and a device that improve operability by performing character recognition automatically when the user simply holds the camera of a portable terminal device such as a smartphone over a form such as a receipt.

One aspect of the present invention for solving the above problem is a method of acquiring, with a portable terminal device having a photographing means, an image of a photographed form suitable for character reading. The method includes, while the photographing means is capturing preview images: a step of determining edges of the form from each of a plurality of preview images; a step of comparing the edges determined for the respective images and determining whether the edges are stable; a step of determining, once the edges are determined to be stable, whether a predetermined region inside the edges is in focus; and a step of acquiring a form image for character reading once the region is determined to be in focus.
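The four steps above can be sketched as a loop over preview frames. This is a minimal illustration under stated assumptions, not the patented implementation: the names `acquire_form_image`, `edges_stable`, `find_edges`, and `in_focus` are hypothetical, the edge detector and focus test are passed in as callables, and the stability rule (the last `needed` rectangles agree within `tolerance_px` on every coordinate) is an assumed criterion that this section does not specify.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def edges_stable(history: Sequence[Rect], tolerance_px: int) -> bool:
    """True when every rect matches the first within tolerance_px per coordinate."""
    first = history[0]
    return all(
        abs(a - b) <= tolerance_px
        for rect in history[1:]
        for a, b in zip(rect, first)
    )


def acquire_form_image(
    frames: Iterable[Frame],
    find_edges: Callable[[Frame], Optional[Rect]],
    in_focus: Callable[[Frame, Rect], bool],
    needed: int = 3,          # assumed number of frames to compare
    tolerance_px: int = 8,    # assumed allowed jitter
) -> Optional[Tuple[Frame, Rect]]:
    """Return the first frame whose edges are stable and whose interior is in focus."""
    recent: List[Rect] = []
    for frame in frames:
        rect = find_edges(frame)           # step 1: determine edges per preview frame
        if rect is None:
            recent.clear()                 # no edges found: start over
            continue
        recent = (recent + [rect])[-needed:]
        if len(recent) < needed or not edges_stable(recent, tolerance_px):
            continue                       # step 2: edges not yet stable
        if in_focus(frame, rect):          # step 3: focus inside the edges
            return frame, rect             # step 4: acquire this frame
    return None
```

Checking stability before focus keeps the comparatively costly focus test off frames in which the form is still moving.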

If an image is acquired merely when the edges are stable, it may be out of focus. If an image is acquired merely when it is in focus, the target receipt may not be in the picture, for example on a portable terminal device that has image stabilization or a sports shooting function and focuses on moving objects. According to the present invention, an image is acquired after determining both the stability of the edges and the focus inside the edges, so an image suitable for character recognition can be acquired automatically.

The form includes a receipt. According to the present invention, even for a form of indeterminate shape such as a receipt, the edges can be determined and the focus judged, so that a form image suitable for character reading is acquired.

The edges of the form consist of upper and lower edges and left and right edges. These are determined by dividing an analysis image, which is used to analyze the preview image, into blocks in the horizontal and vertical directions and detecting up to eight white-background portions. In this way, the edges can be determined whatever the background behind the receipt or other form.

Whether the image is in focus is determined by embossing a grayscale version of the preview image and obtaining the number of pixels at each luminance value (a histogram) for the predetermined region. In this way, the focus state can be determined easily without placing a heavy load on the CPU.
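The emboss-and-histogram idea can be sketched in plain Python as follows. Several details are assumptions made for illustration: the emboss kernel (difference of diagonal neighbours offset to mid-gray 128), the decision rule (at least `min_fraction` of the pixels in the region deviate from 128 by `spread` or more), and both threshold values are hypothetical, since this passage does not give the kernel or the criterion applied to the histogram.

```python
from typing import List, Tuple

Gray = List[List[int]]            # rows of luminance values 0..255
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def emboss(gray: Gray) -> Gray:
    """Assumed emboss filter: diagonal difference, offset so flat areas map to 128."""
    h, w = len(gray), len(gray[0])
    out = [[128] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = gray[y + 1][x + 1] - gray[y - 1][x - 1] + 128
            out[y][x] = max(0, min(255, v))
    return out


def histogram(img: Gray, region: Rect) -> List[int]:
    """Number of pixels at each luminance value inside the region."""
    left, top, right, bottom = region
    counts = [0] * 256
    for y in range(top, bottom):
        for x in range(left, right):
            counts[img[y][x]] += 1
    return counts


def is_in_focus(gray: Gray, region: Rect, spread: int = 48,
                min_fraction: float = 0.05) -> bool:
    """Assumed rule: sharp text pushes many embossed pixels far from mid-gray."""
    counts = histogram(emboss(gray), region)
    total = sum(counts)
    strong = sum(c for value, c in enumerate(counts) if abs(value - 128) >= spread)
    return total > 0 and strong / total >= min_fraction
```

A blurred image has gentle gradients, so its embossed histogram clusters tightly around 128; a sharp image spreads toward 0 and 255. Only additions and comparisons are needed per pixel, which is consistent with the low CPU load mentioned above.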

The form image acquisition method according to the present invention further includes a step of requesting an autofocus (AF) operation of the photographing means when the photographing means is started, when no edge is determined, and/or when the image is determined to be out of focus. In the present invention, the focus timing is preferably controlled in order to acquire an image suitable for character reading. Executing one-shot AF at appropriate times is expected to make such an image easier to acquire.

When the edges have been determined, they are preferably displayed on the screen of the portable terminal device as a guide. This gives the user a reference for holding the camera, which is expected to make the edges stabilize more readily.

The form image acquisition method according to the present invention further includes a step of clipping, based on the edges and/or the white-background portions, the region of the acquired form image that is to be subjected to character reading, and reading the characters. In this way, background that is not a target of character reading is removed, so characters can be read with high accuracy without setting the reading range in the image beforehand. Everything from image acquisition to character reading is performed automatically, which makes a highly operable application possible.

Another aspect of the present invention is a program that causes the computer of a portable terminal device to execute any of the methods described above.

Yet another aspect of the present invention is a portable terminal device that is held over a form to acquire a form image. The device includes: a photographing means for photographing the form; an edge determination means that, while the photographing means is capturing preview images, determines edges of the form from each of a plurality of preview images; an edge stability determination means that compares the edges determined for the respective images and determines whether the edges are stable; a focus determination means that determines, once the edges are determined to be stable, whether a predetermined region inside the edges is in focus; and an image acquisition means that acquires the form image once the region is determined to be in focus.

According to the present invention, a preview image suitable for character reading can be acquired automatically from preview images of 4K, 8K, or similar size, without the user having to press a shutter button or check the focus state. Regardless of the length of the receipt, the user can acquire an image suitable for character reading such as OCR simply by holding the camera of a smartphone or the like over the receipt with one hand so that it fits on the screen, which improves the operability of the application. Users do not abandon the application, and the character reading results can be used in a wide variety of applications.

FIG. 1 is a functional block diagram of a portable terminal device according to the present invention.
FIG. 2 schematically shows the blocking of an image.
FIG. 3 schematically shows a preview image of a photographed receipt.
FIG. 4 schematically shows detection of vertical blanks by horizontal blocking according to the present invention.
FIG. 5 schematically shows detection of horizontal blanks by vertical blocking according to the present invention.
FIG. 6 schematically shows determination of a first edge and of the character reading target range according to the present invention.
FIG. 7 schematically shows the relationship among blocks, blanks, and rectangles in one embodiment of the present invention.
FIG. 8 schematically shows a preview image of a photographed receipt.
FIG. 9 schematically shows detection of horizontal spaces by horizontal blocking according to the present invention.
FIG. 10 schematically shows detection of vertical spaces by vertical blocking according to the present invention.
FIG. 11 schematically shows determination of a second edge and of the character reading target range according to the present invention.
FIG. 12 is a focus histogram according to one embodiment of the present invention.
FIG. 13 is a flowchart of the automatic form image acquisition and reading process according to one embodiment of the present invention.

Various features of the present invention are described below with reference to the drawings, together with preferred embodiments that are not intended to limit the invention. The drawings are simplified and schematic for purposes of explanation.

FIG. 1 schematically shows the configuration of a portable terminal device 100 equipped with the automatic form reading application program according to the present invention. The portable terminal device 100 may be a small computer device such as a smartphone with a built-in camera, a personal digital assistant (PDA), or a tablet PC. A known portable terminal device 100 incorporates a CPU, RAM, ROM, a hard disk, and the like, and can run programs under the control of a suitable operating system (OS) to perform various kinds of processing.

According to the present invention, the portable terminal device 100 includes a photographing means 1, an input/output means 2, a control means 3, a storage means 4, an edge determination means 5, an edge stability determination means 6, a focus determination means 7, an image acquisition means 8, and a reading means 9.

The photographing means 1 consists of a digital camera mounted in the portable terminal device 100 and can photograph a form such as a receipt and convert it into digital image data. Once started, the photographing means 1 can generally capture preview images continuously at 10 to 30 frames per second. The photographing means 1 can perform an autofocus (AF) operation. Preview images of a form are generally color images and can have various sizes.

The input/output means 2 consists of a known touch panel or the like. For example, a command to start the processing of the application program according to the present invention can be issued on the screen displayed on the touch panel. For example, tapping an icon on the screen starts the camera; everything from image acquisition to character reading is performed while the user holds the portable terminal device over the receipt; and buttons such as "Confirm", "Save", and "Retry" can be displayed as appropriate to accept user input. The input/output means 2 can also display the preview images continuously at a predetermined resolution, and the form edges detected as described below can be displayed superimposed on the preview image.

The control means 3 consists of electronic equipment, electronic circuits, and/or programs for controlling each means so as to execute the automatic form image acquisition and reading process according to the present invention. The control means 3 can start the photographing means 1 and cause it to perform a one-shot AF operation at appropriate times. It can also control when images captured continuously by the photographing means 1 are accepted as images for analysis such as edge detection. For example, preview images are skipped until the one-shot focus operation finishes, and a preview image is accepted for analysis once the focus operation has finished. The control means 3 can further cause analysis results and the like to be displayed on the input/output means 2 as appropriate.

The storage means 4 consists mainly of memory such as flash memory, a hard disk (HDD), RAM, and ROM. It stores the application program for the automatic form image acquisition and reading process according to the present invention and can store images being processed, analysis results, final character reading results, and the like.

The edge determination means 5 consists of electronic equipment, electronic circuits, and/or programs for detecting and determining the edges of the receipt contained in an accepted preview image.

Edge detection and determination in the present invention has the following advantages. A typical OCR engine provides a tool for setting, for each format, which parts of a form image are to be read. However, when characters are read from an image of a receipt of indeterminate shape, the range of the image to be subjected to OCR processing cannot be set in advance. If the entire image is passed to OCR, the background may occupy a large proportion of the image depending on the shape of the receipt, and the OCR may, for example, process the background as characters, exceed the number of characters it can handle, and raise an error. Performing edge detection and determination as in the present invention allows unnecessary background to be removed from an image of a receipt of indeterminate shape, so the receipt can be read successfully without setting the reading range in advance. Because the focus can be judged for the portion inside the edges, that is, the portion showing the receipt, an image suitable for OCR processing of the receipt can be acquired beforehand.

When a user photographs a receipt, the background may vary: it may be patterned (for example, the wood grain of a table) or white like the receipt itself. Edge detection according to the present invention can detect the edges of a receipt against such varied backgrounds. The form is sometimes described below as a receipt, but this is not intended as a limitation to receipts. The present invention is applicable to forms with black text on a white ground and a surrounding margin, for example a wide range of printed matter such as business cards, cards, tickets, and payment receipts printed in black and white, hospital consultation slips, and school handouts.

The determination of edges in one embodiment of the present invention is described with reference to FIGS. 2 to 11.

First, the edge determination means 5 can generate an analysis image I_A for edge detection by scaling, trimming, binarizing, and/or filtering the preview image as appropriate. To detect edges, the analysis image I_A is divided into blocks. FIG. 2(a) shows the blocking of I_A in the horizontal (width W) direction, and FIG. 2(b) shows the blocking of I_A in the vertical (height H) direction. The block sizes are denoted Δw and Δh, respectively. Δw and Δh are preferably integer numbers of pixels greater than one. Examples of Δw (Δh) include, without limitation, 2, 4, 6, 8, 10, and 15 pixels.

In horizontal blocking with, for example, Δw = 10 pixels, a block becomes white when all 10 horizontally adjacent pixels in the block are white; when one or more of the 10 pixels is black, every pixel in the block is set to black. Similarly, in vertical blocking with, for example, Δh = 10 pixels, a block becomes white when all 10 vertically adjacent pixels in the block are white; when one or more of the 10 pixels is black, every pixel in the block becomes black. This makes white areas that extend horizontally or vertically easier to detect. In this specification, a region of consecutive white pixels within the same block is called a blank or a rectangle, and a series of blanks or rectangles connected across different blocks is sometimes called a space.
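The blocking rule can be sketched as follows on a binarized image stored as rows of 0 (black) and 1 (white). The function names and the list-of-lists representation are assumptions made for illustration; the rule itself (a run of Δw or Δh pixels stays white only if every pixel in it is white) follows the description above.

```python
from typing import List

Binary = List[List[int]]  # rows of pixels: 0 = black, 1 = white


def block_horizontal(binary: Binary, dw: int) -> Binary:
    """In each row, every dw-pixel-wide group becomes all white or all black."""
    out = []
    for row in binary:
        new_row: List[int] = []
        for x0 in range(0, len(row), dw):
            chunk = row[x0:x0 + dw]
            value = 1 if all(chunk) else 0   # a single black pixel blackens the group
            new_row.extend([value] * len(chunk))
        out.append(new_row)
    return out


def block_vertical(binary: Binary, dh: int) -> Binary:
    """Same rule along columns, implemented by transposing the image."""
    transposed = [list(col) for col in zip(*binary)]
    blocked = block_horizontal(transposed, dh)
    return [list(row) for row in zip(*blocked)]


row = [[1, 1, 1, 0, 1, 1, 1, 1]]
print(block_horizontal(row, 4))  # [[0, 0, 0, 0, 1, 1, 1, 1]]
```

Because one black pixel blackens its whole group, isolated noise and text strokes are widened, while long uninterrupted white margins survive as clean strips.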

Detection of a first edge of the receipt 200 according to one embodiment is described with reference to FIGS. 3 to 6. FIG. 3 schematically shows a preview image I in which the receipt 200 is photographed against a white background.

FIG. 4 schematically shows the detection of blanks in the analysis image I_A of the preview image I. In the analysis image I_A, a suitable filter may enlarge black printed portions of the receipt beyond their actual extent so that they are easier to distinguish from the surrounding white (not shown). The region 200_A corresponding to the receipt is detected in the analysis image I_A. Horizontal blocking of I_A (FIG. 4(a)) yields vertical blanks, and among them the blanks B1 to B11 having at least a certain length (for example, half the image height or more) can be detected as vertical edge candidates (FIG. 4(b)).

Similarly, with reference to FIG. 5, by vertical blocking of the analysis image I_A (FIG. 5(a)), horizontal blanks are detected, and among the detected blanks, blanks B12 to B22 having at least a certain length (for example, at least half the image width) can be detected as horizontal edge candidates (FIG. 5(b)).
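By way of illustration only, the selection of vertical edge candidates might be sketched as follows. The sketch assumes an analysis image that has already been horizontally blocked, stored as a list of rows with 0 for white and 1 for black; the function name and the returned tuple layout are hypothetical. The threshold of half the image height is the example value given in the description.

```python
def vertical_edge_candidates(blocked, dw, min_ratio=0.5):
    """Return (x0, y_start, y_end) for each vertical white run in a block
    column whose length is at least min_ratio of the image height."""
    height = len(blocked)
    width = len(blocked[0])
    candidates = []
    for x0 in range(0, width, dw):
        run_start = None
        # Iterate one step past the last row so that a run touching the
        # bottom of the image is closed as well.
        for y in range(height + 1):
            white = y < height and blocked[y][x0] == 0
            if white and run_start is None:
                run_start = y
            elif not white and run_start is not None:
                if y - run_start >= min_ratio * height:
                    candidates.append((x0, run_start, y - 1))
                run_start = None
    return candidates
```

Horizontal edge candidates could be obtained in the same way by applying the function to the transposed, vertically blocked image.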

With reference to FIG. 6, the determination of the edges and the determination of the target range for character reading processing such as OCR are shown schematically. First, the receipt portion (200_A) in the analysis image I_A is assumed to include the image center C, and each blank is assigned to the upper, lower, left, or right side relative to the position of the center C. The assigned blanks are examined in order from the top, bottom, left, and right of the image, and the four blanks closest to the image center C (B5, B6, B18, and B19, shown by broken lines in the figure) are detected as the blanks representing the edges (FIG. 6(a)). The four sides e1, e2, e3, and e4 of the rectangle defined by these blanks can then be detected.
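By way of illustration only, choosing the blank closest to the image center C on each side might be sketched as follows. The sketch reduces each candidate blank to a single coordinate (an X position for vertical candidates, a Y position for horizontal candidates); this simplification and the function name are assumptions made for the example.

```python
def nearest_on_each_side(positions, center):
    """Split candidate positions into those before and after the center and
    return the one closest to the center on each side (None if absent)."""
    before = [p for p in positions if p < center]
    after = [p for p in positions if p > center]
    nearest_before = max(before) if before else None
    nearest_after = min(after) if after else None
    return nearest_before, nearest_after


# Vertical candidates give the left and right sides; horizontal candidates
# give the upper and lower sides.
left, right = nearest_on_each_side([10, 40, 160, 190], center=100)
```

Because at most one candidate is kept on each of the four sides, the result defines at most four sides of a rectangle.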

With reference to FIG. 6(b), the four sides e1, e2, e3, and e4 are mapped to the coordinate system of the original preview image I to determine the edges E1, E2, E3, and E4 of the receipt 200 (shown by broken lines). With the upper-left corner of the image as the origin (0, 0), the X and Y coordinates of the four vertices of the rectangle defined by the edges E1, E2, E3, and E4 are stored as the vertex coordinates of the edges and are used in the subsequent edge stability determination. A region R is also determined as the OCR reading target range. An exemplary region R is the interior of the rectangle defined by the edges E1, E2, E3, and E4. The region R corresponds to the character region of the receipt 200. When the preview image I is acquired as the image for character reading, an image may be generated in which the outside of the region R is, for example, filled with black (the gray shaded portion in FIG. 6(b)) so that it is excluded from OCR processing (clipping). In this way, the necessary range can be subjected to OCR even if the range of the receipt 200 has not been set in advance as the OCR reading range, and characters can be recognized accurately.
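By way of illustration only, the clipping described above might be sketched as follows. The sketch assumes a grayscale image stored as a list of rows and a region R given by inclusive pixel bounds; the function name, the parameter names, and the treatment of the bounds as inclusive are assumptions made for the example.

```python
def clip_outside(img, left, top, right, bottom, fill=0):
    """Return a copy of img in which every pixel outside the rectangle
    [left, right] x [top, bottom] is replaced by fill (0 = black)."""
    return [
        [
            px if (top <= y <= bottom and left <= x <= right) else fill
            for x, px in enumerate(row)
        ]
        for y, row in enumerate(img)
    ]
```

The image size is unchanged; only the pixels outside R are overwritten, so a subsequent OCR pass applied to the whole image encounters characters only inside R.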

Next, the detection of a second edge is described.

First, FIG. 7 schematically shows the relationship between blanks and rectangles in one block produced by horizontal blocking with a block size Δw (= 6 pixels). Here, a blank has a size of Δw × 1 pixels (1 × Δh pixels in vertical blocking), and a rectangle is formed by grouping consecutive blanks within a block. In the illustrated example, rectangle r1 consists of eight blanks (6 × 8 pixels) and rectangle r2 consists of four blanks (6 × 4 pixels). For each of these white rectangles, the upper-right and lower-left coordinates (raster data) may be stored in the storage means 4. Spaces are detected by examining how these rectangles connect across different blocks. Exemplary space detection is described in Patent Document 4, but this does not imply that the edge detection of the present invention is known.
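By way of illustration only, grouping consecutive blanks of one block into rectangles might be sketched as follows. The sketch assumes that each block is represented by a sequence of booleans, one per row, that is true where the Δw × 1 strip is white; the function name and this representation are hypothetical. Each rectangle is recorded by its upper-right and lower-left coordinates, as in the description.

```python
def rectangles_in_block(block_rows, x0, dw):
    """Group consecutive white blanks of one horizontal block into rectangles.

    block_rows[y] is True when the blank (a dw x 1 strip) at row y is white.
    Each rectangle is returned as (upper_right, lower_left) pixel coordinates.
    """
    rects = []
    start = None
    # A trailing False closes a run that reaches the bottom of the block.
    for y, white in enumerate(list(block_rows) + [False]):
        if white and start is None:
            start = y
        elif not white and start is not None:
            upper_right = (x0 + dw - 1, start)
            lower_left = (x0, y - 1)
            rects.append((upper_right, lower_left))
            start = None
    return rects
```

For a block with eight white rows, two black rows, and four white rows at Δw = 6, the function returns two rectangles of 6 × 8 and 6 × 4 pixels, matching the sizes of r1 and r2 in the example of FIG. 7.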

FIG. 8 schematically shows a preview image I in which the receipt 200 is photographed against a patterned (non-white) background.

FIG. 9 schematically shows the detection of spaces in the analysis image I_A of the preview image I. In the analysis image I_A, the black portions of the background and the black printed portions in the receipt may be expanded in advance by binarization or a suitable filter so that they are easier to distinguish from the white margin around the receipt (not shown). Horizontal blocking (FIG. 9(a)) detects the white rectangles in each block. By examining the connection between adjacent blocks of rectangles whose size falls within a certain range, series of rectangles having at least a certain length in the horizontal direction are detected (the rectangles drawn with solid lines in FIG. 9(b)). Spaces are determined based on the detected series of rectangles. In the illustrated example, the series of rectangles are used as they are as the spaces S1 and S2 (the gray shaded portions in the same figure). In general, the position and size of the rectangles in a series often vary between blocks, because the form may be tilted relative to the vertical and horizontal axes of the image (skew) and because print patterns vary. A space may therefore be determined as a substantially rectangular or substantially parallelogram-shaped region in which the position and size of each rectangle in the series have been adjusted so as to suit detection of the margin around the receipt.

Similarly, FIG. 10 schematically shows the detection of vertical spaces by vertical blocking (FIG. 10(a)). Rectangles whose size falls within a certain range are detected in each block, and by examining the vertical connection of the rectangles in adjacent blocks, series of rectangles having at least a certain length in the vertical direction are detected (the rectangles drawn with solid lines in FIG. 10(b)). As illustrated, the sizes of the rectangles in a series differ from block to block, and their center positions and the like are not necessarily constant. In the illustrated example, the range in which position and size are common to the rectangles of a series can be determined as the spaces S3 and S4 (the gray shaded portions in the same figure). A space need not be determined as the common part of a series of rectangles; it may be determined as any region relating to the series of rectangles that suits the purpose of detecting the margin around the receipt.
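By way of illustration only, determining a space as the range common to a series of rectangles might be sketched as follows. The sketch reduces each rectangle of the series to its extent (start, end) along the axis perpendicular to the direction of the space; this reduction and the function name are assumptions made for the example, and the description expressly allows a space to be determined in other ways.

```python
def common_span(extents):
    """Return the (start, end) range shared by every rectangle in a series,
    or None when the rectangles have no range in common."""
    start = max(lo for lo, _ in extents)
    end = min(hi for _, hi in extents)
    return (start, end) if start <= end else None


# Horizontal extents of the rectangles found in three vertically adjacent blocks.
space_x_range = common_span([(10, 20), (12, 22), (8, 18)])
```

Taking the intersection yields a narrow, straight strip even when individual rectangles are wider or offset from one another, which is one way of obtaining a stable estimate of the margin.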

FIG. 11 schematically shows the determination of the receipt edges based on the detected spaces. With reference to FIG. 11(a), the upper and lower edges of the receipt are detected based on, among other things, the position relative to the image center C of the spaces detected by horizontal blocking (shown by dotted lines in the figure). The space above the image center C and closest to it represents the upper edge of the receipt, and the space below the image center C and closest to it represents the lower edge of the receipt. Similarly, the left and right edges of the receipt are detected based on the position relative to the image center C of the spaces detected by vertical blocking (shown by dotted lines in the figure). That is, the space to the left of the image center C and closest to it represents the left edge of the receipt, and the space to the right of the image center C and closest to it represents the right edge of the receipt. In this way, an appropriate space can be selected from the many spaces that may be detected depending on the background. For each of the up to four detected spaces, lines e5, e6, e7, and e8 that can form the four sides of a rectangle are determined. The lines e5, e6, e7, and e8 may be horizontal and vertical straight lines that bisect the area of the respective spaces. The determination of e5, e6, e7, and e8 is not limited to this; each may instead be the straight line of the space closest to or farthest from the center C. The lines e5, e6, e7, and e8 may also be horizontal and vertical straight lines determined arbitrarily based on the spaces.

With reference to FIG. 11(b), the straight lines e5, e6, e7, and e8 determined from the spaces are mapped to the original image I so as to define a rectangle, whereby the edges E5, E6, E7, and E8 of the receipt 200 are determined. The coordinates of the four vertices of the rectangle whose sides are the edges E5, E6, E7, and E8 are stored in the storage means 4 as the vertex coordinates of the edges. Clipping can be performed based on the spaces. By filling in the portions corresponding to the outside of the spaces (or of the rectangles constituting the spaces) and the four corners of the image so that they are excluded from OCR processing (the gray shaded portions in FIG. 11(b)), a target region R' for character reading can be defined. In this way, even when the entire image I is subjected to OCR processing, the limit on the number of readable characters is not exceeded, and only the portion relating to the receipt 200, that is, the portion corresponding to the character region of the receipt 200, is successfully read.

Because the background varies when a user photographs a receipt, the present invention preferably combines the first detection and the second detection. That is, up to four blanks representing edges are detected by the first detection, up to four spaces representing edges are detected by the second detection, and the edges of the receipt are determined based on these. For example, when both an upper blank and an upper space are detected, the four edges can be determined by selecting the one closer to the center C, the longer one, or an arbitrary one, or by combining both. Depending on the background, only one of the blank and the space representing an edge may be detected, so detecting both allows the edges to be determined appropriately against any background. When no edge is detected by either the first or the second detection, the periphery of the image can be determined as the edge.
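By way of illustration only, combining the two detections for one side of the receipt might be sketched as follows. Of the selection rules listed above, the sketch uses only "closer to the center C", and it applies the fallback to the image periphery on a per-side basis; both choices, as well as the function and parameter names, are assumptions made for the example.

```python
def combine_side(blank_pos, space_pos, center, border):
    """Pick the edge position for one side from the first detection (blank)
    and the second detection (space); either may be None when not detected.
    Falls back to the image border when neither detection found an edge."""
    found = [p for p in (blank_pos, space_pos) if p is not None]
    if not found:
        return border
    return min(found, key=lambda p: abs(p - center))
```

Calling the function once for each of the four sides, with the border argument set to 0 or to the image width or height minus one, yields a complete rectangle even when the background defeats one of the two detections.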

The edge stability determination means 6 (FIG. 1) comprises an electronic device, an electronic circuit, and/or a program for determining whether the edges determined as described above have stabilized.

The edge stability determination means 6 determines whether the edge has stabilized by, for example, comparing the four vertex coordinates of the edge determined by analyzing a first preview image with the four vertex coordinates of the edge determined by analyzing a second preview image. For example, the minimum X coordinates, the maximum X coordinates, the minimum Y coordinates, and the maximum Y coordinates of the first and second sets of four vertex coordinates are compared with each other, and the edge can be determined to be stable when each difference is at or below a certain value. Detecting edge stability in this way makes it possible to detect a state in which the target receipt is settled in the viewfinder and there is little camera shake.

The edge stability determination means 6 may compare the two sets of edges detected from a first preview image captured at time (t-1) and a second preview image captured at time t. The edge stability determination means 6 may also compare the edges from any N (≥ 2) detections to determine whether the edge has stabilized. When the edge is determined to be stable, the latest of the N preview images (for example, the one captured at time t) is used in the subsequent focus determination processing.
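By way of illustration only, the stability determination over N detections might be sketched as follows. The sketch compares the minimum and maximum X and Y coordinates of consecutive vertex sets, as described above; the function names and the tolerance parameter (the "certain value", for which the description gives no number) are assumptions made for the example.

```python
def bounds(vertices):
    """Return (min_x, max_x, min_y, max_y) of four (x, y) vertex coordinates."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), max(xs), min(ys), max(ys))


def edges_stable(history, tolerance):
    """history holds the vertex sets from the last N preview images, oldest
    first. The edge is treated as stable when each of the four bounds changes
    by no more than tolerance between consecutive detections."""
    if len(history) < 2:
        return False  # nothing to compare against yet
    all_bounds = [bounds(v) for v in history]
    return all(
        abs(a - b) <= tolerance
        for prev, cur in zip(all_bounds, all_bounds[1:])
        for a, b in zip(prev, cur)
    )
```

With N = 2 the history contains the detections at times (t-1) and t, which corresponds to the two-image comparison described above.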

As described above, the receipt edges are determined based on the detection of white areas. Edges are detected and determined even when the camera is out of focus. An edge having been detected therefore does not guarantee that the image is in focus, and OCR processing of such an image may fail to read the characters.

The focus determination means 7 (FIG. 1) comprises an electronic device, an electronic circuit, and/or a program for determining, when the edge has been determined to be stable, whether the reading target range of the preview image in which the edge stabilized is in focus.

The focus determination means 7 generates an embossed image by embossing a grayscale image of the preview image. In embossing, "p = -s + p + 128" is calculated using a pixel (p) and the pixel to its upper left (s). If the difference in lightness between pixels (p) and (s) is small, the result approaches 128; if the difference is large, the result becomes 128 ± n according to its magnitude (n). In other words, embossing has the effect of bringing out differences in lightness and the effect of turning areas of equal lightness into a mid-tone. The inventors noted that, in a receipt image, many pixels are close to 128 when the image is out of focus and the number of 128 ± n pixels increases when it is in focus, and that the number of luminance values is not affected by the image size (number of pixels); the focus determination is configured to use these observations so that it can be performed easily.
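By way of illustration only, the embossing calculation "p = -s + p + 128" might be sketched as follows. The sketch assumes an 8-bit grayscale image stored as a list of rows. Clamping the result to the range 0 to 255 and assigning 128 to pixels in the first row and first column, which have no upper-left neighbor, are assumptions made for the example.

```python
def emboss(gray):
    """Emboss a grayscale image: each output pixel is p - s + 128, where s is
    the pixel to the upper left of p. Results are clamped to 0..255."""
    h, w = len(gray), len(gray[0])
    out = [[128] * w for _ in range(h)]  # border pixels default to mid-gray
    for y in range(1, h):
        for x in range(1, w):
            v = gray[y][x] - gray[y - 1][x - 1] + 128
            out[y][x] = max(0, min(255, v))
    return out
```

A uniform area maps to 128 everywhere, while a pixel 50 levels brighter than its upper-left neighbor maps to 178 and one 50 levels darker maps to 78, which corresponds to the "128 ± n" behavior described above.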

FIG. 12 is a focus histogram in which the vertical axis represents the number of pixels (0 to 300) and the horizontal axis represents luminance (0 to 255). The focus determination means 7 can determine whether the image is in focus and, if it is not, how far out of focus it is (the focus level) by, for example, examining the histogram of the embossed image for a rectangle inside the detected edge (rectangle) whose size is a fixed proportion of the size of the edge rectangle (for example, 300 to 400 × 300 to 400 pixels). The position, shape, and size of this proportional rectangle may be changed according to the characteristics of the form.

By way of example, at focus level F1 in FIG. 12 (shown by a dotted line), the luminance in the embossed image is concentrated at 128. For example, when the luminance range (difference) over which at least a certain number of pixels have luminance above and below 128 is less than 60, it can be determined that "the luminance is concentrated at 128", that is, that "the image is not in focus at all". At focus level F2 (shown by a two-dot chain line), the luminance is still concentrated around 128, and it can be determined that "the image is out of focus". Focus level F3 (shown by a one-dot chain line) shows a slight difference in lightness, and it can be determined that "the image is slightly in focus / there is slight camera shake". Focus level F4 (shown by a broken line) can be determined as "the focus is soft / there is motion". At focus level F5 (shown by a solid line), the luminance is sufficiently dispersed and there is a large difference in lightness. For example, when the luminance range (difference) over which at least a certain number of pixels have luminance above and below 128 is 180 or more, it can be determined that "the luminance is sufficiently dispersed", that is, that "the image is in focus".
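By way of illustration only, a focus determination based on the histogram of the embossed image might be sketched as follows. The sketch interprets the "luminance range" as the difference between the highest and lowest luminance values that are each held by at least min_count pixels; this interpretation, the min_count parameter, and the function names are assumptions made for the example. Only the two thresholds stated in the description (less than 60, and 180 or more) are used; the boundaries between F2, F3, and F4 are not given in the description and are therefore not distinguished.

```python
def luminance_spread(embossed, min_count):
    """Build a 256-bin histogram of the embossed region and return the width
    of the luminance range populated by at least min_count pixels per value."""
    hist = [0] * 256
    for row in embossed:
        for v in row:
            hist[v] += 1
    populated = [lum for lum, n in enumerate(hist) if n >= min_count]
    if not populated:
        return 0
    return max(populated) - min(populated)


def classify_focus(spread):
    """Map the spread to a focus level using only the stated thresholds."""
    if spread < 60:
        return "F1"      # luminance concentrated at 128: not in focus at all
    if spread >= 180:
        return "F5"      # luminance sufficiently dispersed: in focus
    return "F2-F4"       # intermediate levels; boundaries not specified
```

Because only a histogram of an already computed region is required, the check is inexpensive compared with running character recognition on every preview image.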

The image acquisition means 8 (FIG. 1) comprises an electronic device, an electronic circuit, and/or a program for acquiring, as the image for character reading, a preview image for which the edge has been determined to be stable and which has further been determined to be in focus.

Alternatively, when the image size of the preview image of the portable terminal device 100 is insufficient, the image acquisition means 8 can automatically perform a shutter operation when the edge has been determined to be stable and the image has been determined to be in focus, and acquire the captured image as the image for character reading. The image acquisition means 8 can perform the clipping processing described above on the acquired image.

The reading means 9 (FIG. 1) comprises an electronic device, an electronic circuit, and/or a program for performing character reading processing on the image acquired as the image for character reading. The reading means 9 may perform known OCR processing; for example, it can adjust the brightness and highlights of the acquired image, binarize it into black and white, analyze the layout, remove ruled lines, segment the characters, and read the characters using a pattern dictionary, a font dictionary, or the like. The reading result may be stored in the storage means 4.

With reference to FIG. 13, a flow for automatically acquiring and reading an image of a form such as a receipt according to one embodiment of the present invention is described. When a predetermined application according to the present invention is started on the portable terminal device 100 (START), the camera (photographing means 1) is activated and a one-shot autofocus (AF) operation is executed (S301). In the present invention, one-shot autofocus is preferably executed rather than the continuous (continuous focus) mode, so that the focus of the camera can be controlled and the image captured at an appropriate time.

When the autofocus operation finishes, the continuously captured preview images are received and analyzed (S303). If no edge is determined, for example because the receipt is not in the frame or camera shake is severe, the next preview image is analyzed. Once an edge is determined, it is stored in the storage means 4 (S305). The determined edge may be displayed superimposed on the preview image. This makes the user aware of the edge, so the edge stabilizes more readily.

When edge stability is determined based on N edge detections, it is checked whether the previous (N-1) edges to be compared have been determined; if they have not, further preview images are analyzed to determine edges. For example, when two sets of edges are compared, it is checked whether the change in position between the latest edge and the previous edge is small, and whether the edge has stabilized is determined (S307). If the edge is not stable, the analysis of preview images may be repeated from the beginning.

When the edge is determined to be stable, it is determined whether a predetermined range inside the edge of the preview image in which the edge stabilized is in focus (S309).

If the image is not in focus, a one-shot focus operation is requested of the camera according to the focus level (F1 to F4) (S311). For example, when the edge is stable but the focus level is F1 or F2, the user is likely to be still looking for the receipt with the camera after startup, or moving the camera up and down to fit the receipt on the screen; a one-shot AF is therefore requested of the camera on the assumption that preparation has finished, and the processing from preview image analysis onward is repeated. When the edge is stable and the focus level is F3 or F4, the user is in most cases making fine adjustments or the camera is shaking slightly, so the processing from preview image analysis onward is repeated without requesting AF. When the F3 or F4 state is repeated several times, the user is probably unaware of it, so AF may be requested of the camera.
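By way of illustration only, the choice of the next step according to the focus level might be sketched as follows. The function name, the action labels, and the max_soft_repeats parameter are hypothetical; the description says only that AF may be requested when the F3 or F4 state is repeated "several times", so the default of 3 is an assumption made for the example.

```python
def next_action(focus_level, soft_repeats, max_soft_repeats=3):
    """Decide what to do after the focus determination (S309).

    focus_level: one of "F1".."F5".
    soft_repeats: how many consecutive times F3 or F4 has been observed.
    Returns "acquire", "request_af", or "reanalyze". After "request_af" the
    preview image analysis is repeated as well.
    """
    if focus_level == "F5":
        return "acquire"          # in focus: acquire the image (S313)
    if focus_level in ("F1", "F2"):
        return "request_af"       # clearly out of focus: one-shot AF (S311)
    if soft_repeats >= max_soft_repeats:
        return "request_af"       # F3/F4 persists: the user may be unaware
    return "reanalyze"            # F3/F4: analyze the next preview image
```

Deferring the AF request at levels F3 and F4 avoids interrupting a user who is still making fine adjustments, while the repeat counter prevents the flow from waiting indefinitely.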

An image for which the edge is stable and which has further been determined to be in focus is acquired as an image suitable for character reading and stored in the storage means (S313). If the resolution of the preview image is low (2 MP, 3 MP, or the like), a long receipt cannot be read even if the preview image is captured as it is, so the shutter operation may be performed automatically instead. In this way, even on portable terminal device models whose preview image size is insufficient, an image suitable for character reading can be acquired by automatically shooting when the receipt is settled and in focus.
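By way of illustration only, steps S301 to S313 might be sketched as the following loop. All callables passed to the function (determine_edge, edges_stable, in_focus, request_af) are hypothetical placeholders for the means described above. The sketch is simplified in that it requests AF whenever the focus check fails, without distinguishing the focus levels, and in that it keeps the edge history after an unstable or out-of-focus result; both are assumptions made for the example.

```python
def acquire_form_image(frames, determine_edge, edges_stable, in_focus,
                       request_af, n=2):
    """Return (frame, edge) for the first preview frame whose edge is stable
    over the last n detections and whose interior is in focus, else None."""
    request_af()                              # S301: one-shot AF at startup
    history = []
    for frame in frames:                      # S303: analyze preview images
        edge = determine_edge(frame)
        if edge is None:
            continue                          # no edge: try the next frame
        history.append(edge)                  # S305: store the edge
        history = history[-n:]                # keep only the last n edges
        if len(history) < n or not edges_stable(history):
            continue                          # S307: not yet stable
        if in_focus(frame, edge):             # S309: focus determination
            return frame, edge                # S313: acquire this image
        request_af()                          # S311 (simplified)
    return None
```

Passing the individual determinations in as callables keeps the loop independent of how edges and focus are actually computed, so that each can be tested or replaced separately.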

The background is removed from the acquired image as appropriate and character reading processing is performed (S315); for example, a screen for confirming the reading result is displayed to the user, and the reading result may be saved when the "Save" button is pressed (S317). In this way, everything from image acquisition to character reading is performed automatically when the user simply holds the camera over the receipt. The character recognition result may be transmitted to a server via communication means (not shown) and processed further on the server side. Alternatively, the acquired image (or the clipped image) may be transmitted to the server by the communication means (not shown) and character recognition processing may be performed on the server side, which reduces the errors returned from the server to the user. If the character reading result differs from what the user intended, the series of processing from preview image analysis onward may be performed again in response to the "Redo" button being pressed.

According to the present invention, even for a receipt of irregular size or a long receipt, the user can obtain the character reading result automatically, without checking the focus or pressing the shutter button, simply by holding the camera of the portable terminal device over the receipt with one hand so that the receipt fits in the preview screen. With a simple focus determination, an image suitable for OCR processing can be acquired without performing OCR processing, and characters can be read successfully without placing a load on the CPU. According to the present invention, the operability of various applications that use the character reading result is improved, and user satisfaction can be improved.

Those skilled in the art will appreciate that many different modifications are possible without departing from the spirit and aspects of the present invention. Accordingly, the embodiments of the present invention are merely examples and do not limit the scope of the present invention.

Description of symbols
100 Portable terminal device
1 Photographing means
2 Input/output means
3 Control means
4 Storage means
5 Edge determination means
6 Edge stability determination means
7 Focus determination means
8 Image acquisition means
9 Reading means

Claims

A method of acquiring, using a portable terminal device having photographing means, a form image of a photographed form so as to be suitable for character reading, the method comprising:
determining an edge relating to the form from each of a plurality of preview images while preview images are being captured by the photographing means;
comparing the determined edges to determine whether the edge is stable;
determining, when the edge is determined to be stable, whether a predetermined area inside the edge is in focus; and
acquiring a form image for character reading when the predetermined area is determined to be in focus.

The form image acquisition method according to claim 1, wherein the form is a receipt.

The form image acquisition method according to claim 1, wherein the edge consists of upper and lower edges and left and right edges, and
the upper and lower edges and the left and right edges are determined by blocking, in a horizontal direction and a vertical direction, an analysis image for analyzing the preview image and detecting a maximum of eight white background portions.

The form image acquisition method according to claim 1, wherein whether the predetermined area is in focus is determined by embossing a grayscale image of the preview image and obtaining the number of pixels for each luminance value in the predetermined area.

The form image acquisition method according to claim 1, further comprising requesting an autofocus operation of the photographing means when the photographing means is activated, when the edge is not determined, and/or when it is determined that the predetermined area is not in focus.

The form image acquisition method according to claim 1, wherein, when the edge is determined, the determined edge is displayed on the screen of the portable terminal device as a guide.

The form image acquisition method according to claim 3, further comprising clipping a character reading target region in the acquired form image based on the edge and/or the white background portions and reading characters.

A program that causes a computer of the portable terminal device to execute the method according to any one of claims 1 to 7.

A portable terminal device for acquiring a form image by being held over a form, comprising:
photographing means for photographing the form;
edge determination means that determines an edge relating to the form from each of a plurality of preview images while preview images are being captured by the photographing means;
edge stability determination means that compares the determined edges and determines whether the edge is stable;
focus determination means that determines, when the edge is determined to be stable, whether a predetermined area inside the edge is in focus; and
image acquisition means that acquires a form image when the predetermined area is determined to be in focus.