JP2005018507A - Personal digital assistant and method of recognizing character using it

Info

Publication number: JP2005018507A
Application number: JP2003183736A
Authority: JP
Inventors: Atsuhiro Imaizumi (今泉 敦博); Teruyuki Yamaguchi (山口 輝幸)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-06-27
Filing date: 2003-06-27
Publication date: 2005-01-20

Abstract

PROBLEM TO BE SOLVED: To solve the problem with a method of recognizing characters using a mobile terminal wherein a user must make manual adjustments by holding a terminal body in hand and adjusting the position and direction of the body so that the subject of recognition fits inside the frame of a display part.

SOLUTION: A personal digital assistant or a cellphone comprises an image input part for inputting images; a display part for displaying inputted image data; an operating part for the user to operate buttons for inputs; a character recognition part for recognizing characters within the images; and a control part for controlling the whole terminal. After shooting, images of strings or rows of characters on the image are extracted. From the extracted images of strings or rows of characters, the image of the string or row of characters to be recognized is selected by the user's operation to automatically recognize the characters.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、画像入力手段、具体的にはカメラを持った携帯情報端末または携帯電話等において、撮影した画像中の文字列を選択し文字認識する技術に関する。
【０００２】
【従来の技術】
携帯電話にカメラを備えたような特許文献１が提案されている。一般的にこの携帯情報端末を用いて、画像の文字認識をする際には、利用者が端末本体の位置や向きを手動で調整することで、表示部に表示された認識対象フレームの中に、認識対象が収まるように調整する必要がある。
【０００３】
【特許文献１】
特開２００３−１６９１８７号
【０００４】
【発明が解決しようとする課題】
このような従来の携帯端末を用いて文字認識などの複雑な処理をするには、利用者が端末本体を手に取り、本体の位置や向きを調整することで認識対象が表示部のフレーム内に収まるように利用者自らが微調整しなければならず、操作性の面で以下のような課題があった。
（１）この際、表示部の画面が小さいため、フレーム内の認識対象文字の判別が難しい。
（２）本体を認識対象に向けるために、本体の位置や向きを調整すると同時に、表示部の位置や向きも変わってしまうため表示部の表示が見づらい。
（３）本体を認識対象に向けるために、表示部が覗き込めなくなる角度に向けることができない。
（４）画像取得時のシャッターボタンの押下動作で、端末本体が振動し、認識対象がフレームから外れてしまう。
（５）画像取得時に認識対象を決めるため、画像取得後には認識対象を変更するなどの操作ができず、一度の画像取得で複数箇所の文字認識ができない。
などの問題があった。
【０００５】
そこで、本発明では、以上のような点に鑑みてなされたもので、上記課題の一部又は全部を解決すると共に、特に、携帯情報端末または携帯電話等を用いた文字認識において、画像取得後に利用者の操作によって認識対象となる文字列画像を任意に選択することで、容易に対象となる文字列画像を選択できる文字列選択方法、および選択された文字列画像を認識する文字認識方法を提供することを目的としている。
【０００６】
【課題を解決するための手段】
上記目的を達成するため、画像を入力する画像入力部（又はカメラ、撮像部）と、この入力部によって入力した画像データを表示する表示部と、利用者がボタンなどの入力操作する操作部と、画像内の文字を認識する文字認識部と、全体を制御する制御部を備えた携帯情報端末あるいは携帯電話において、撮影後に画像上の文字列画像あるいは文字行画像を抽出し、抽出した文字列画像あるいは文字行画像の中から利用者の操作によって或る範囲の認識対象となる文字列画像あるいは文字行画像を選択するのに好適な携帯端末、又は文字認識方法を提供する。
【０００７】
また、携帯情報端末あるいは携帯電話、そしてその端末を用いた文字認識方法において、文字列画像内部或いは文字行画像内部を文字認識した後に、文字認識結果と表記辞書との照合によって認識対象文字列を抽出する装置、又は方法を提供する。
【０００８】
また、文字列画像あるいは文字行画像の抽出後に、文字列画像あるいは文字行画像の中ら単語を切り出し、切り出した単語の中から利用者の操作によって認識対象となる単語を選択し、選択した単語を文字認識する装置、方法を提供する。
【０００９】
また、利用者に予め操作部によって画像のどの部分を認識して欲しいのかを選択させることで、その選択された画像の位置を基準に簡単に画像の文字認識を行う装置、また文字認識方法を提供する。
【００１０】
【発明の実施の形態】
以下、本発明の実施形態を図１から図１６を用いて説明する。
図１は、本実施形態に係る画像入力手段を持つ携帯情報端末あるいは携帯電話１００（又は単に形態端末、形態装置とも言う）の概略を示す構成図である。名刺や雑誌、あるいは看板などの文字認識対象の画像が、画像入力部１１０から入力され、文字認識部１５０において行候補の抽出を行い、文字行の候補（画像）を表示部１２０に表示する。利用者が操作部１３０を操作することで抽出した文字行候補から認識対象となる文字行を選択し、選択した文字行を認識部１５０において文字認識する。
【００１１】
操作部１３０は、利用者が一般的に電話をかけるときなどに使用されるものであるが、他に利用者が表示部１２０に表示された画像の認識を実行する時に押下する認識実行ボタン１３１、表示部に表示された画像の認識対象である上の行あるいは文字列を選択する上ボタン１３２、右の文字列を選択するときに使用する右ボタン１３３、下の行あるいは文字列を選択するときに使用する下ボタン１３４、左の文字列を選択するときに使用する左ボタン１３５も有している。
【００１２】
文字認識部１５０は、画像入力部１１０から入力されるディジタル画像を二値化する画像二値化部１５１、黒画素のつながった連結成分を抽出する連結成分抽出部１５２、連結成分同士を融合して文字行候補を抽出する文字行抽出部１５３、文字行画像から文字を切り出す文字切り出し部１５４、一文字ごとに分けられた文字パターンを文字コードに変換する文字識別部１５５、文字認識結果と予め具備している表記辞書との照合を行う表記辞書照合部１５６を有し、この文字認識部１５０、操作部１３０、画像入力部１１０、表示部１２０などの各部、各ユニットの制御は、ＣＰＵ、メモリ等から構成される制御部１４０によってその機能が制御される。尚、上述及び以下に説明する各部は、手段、機構、ユニットとも表現でき、基本的にソフトウェア又はハード、又はソフトウェアとハードとの結合によって処理、制御される機能である。なお、撮影、取得、入力などされた画像は、後述する文字認識に用いるように、制御部等のメモリ又は携帯端末に備わるメモリカードに記憶しておくような態様が望ましい。また、本例では横書きの画像を使用しているため、各箇所の例では文字行として説明するものの、表示画像に対して垂直方向の縦書き画像、即ち、文字列としても良いことは言うまでもない。
【００１３】
図２は、図１の携帯端末を使用した第１の実施形態の処理手順を説明する図である。以下に示す処理、制御は制御部、制御手段１４０によって主に行われるが説明を省略する。
【００１４】
利用者は、携帯情報端末あるいは携帯電話１００が具備するＣＣＤやイメージセンサ等の画像入力手段１１０を用いて、文字認識対象となる名刺や雑誌、あるいは看板などの画像を撮影し、装置内の一時記憶メモリ又は制御部１４０に具備されるメモリ上にディジタル画像として取込む（ステップ２０１）。この撮影し、取込んだディジタル画像を文字認識部１５０の画像二値化部１５１によって二値化し（ステップ２０２）、連結成分抽出部１５２によって二値化された成分のうち黒画素の連結成分を抽出する（ステップ２０３）。
【００１５】
この抽出した連結成分の外接矩形のサイズや、外接矩形間の距離の比較を文字行抽出部１５３によって実行し、取得した画像に含まれる文字行の抽出を行う（ステップ２０４）。後述するが、この抽出された文字行は、表示部１２０に利用者が理解できるように表示され、それを見た利用者が操作部１３０のボタン操作によって、画像内の文字認識対象となる文字行を選択する（ステップ２０５）。
【００１６】
ステップ２０６では、利用者が文字認識対象となる文字行を選択し終えた後、認識実行ボタン１３１を押下することで、認識対象となる文字行を決定し、次ステップへ移る。
このステップ２０６で選択した文字行内に対して、１文字毎の文字切り出しを実施し（ステップ２０７）、切り出された一つ一つの文字パターンに対して文字識別部１５５によって文字識別を行う（ステップ２０８）。そして文字識別の結果として予め具備している表記辞書との照合を表記辞書照合部１５６によって行い（ステップ２０９）、ステップ２１０で照合一致となった文字列を認識結果として出力し、文字認識処理を終了する。この文字認識結果は、表示部１２０に表示され、それを保存するなどの操作を操作部１３０によって実行することで、利用者は画像入力部１１０によって入力した画像の特定領域における一部分を文字認識した形態（文字コード）でアドレス帳などと結合させて保存できる。
【００１７】
図３は、図２の文字行抽出ステップ２０４で主要な働きを行う文字行抽出部１５３の詳細な処理手順を示す図である。
【００１８】
上述したように、ディジタル画像は二値化処理により特に黒画素の連結成分を抽出されるが、その抽出された画像、特に黒画素の状態を分析することで画像の横方向の投影分布を求める（ステップ３０１）。そして投影分布から文字行の縦方向の範囲、具体的にはある基準を基に縦方向の座標位置を求め（ステップ３０２）、この文字行の範囲内で、ステップ２０３で求めた連結成分の外接矩形のサイズや、外接矩形間の距離の比較、即ち連結成分を融合する処理を実行することで（ステップ３０３）、最終的な文字行座標を求めることができ（ステップ３０４）、上述した文字行検出する。
【００１９】
図４は、ステップ３０２において、画像情報として含まれる文字行の投影の算出を説明する原理図を示す。図示するように文字行抽出領域４００はステップ２０１で取得した文字認識対象画像の全面あるいは一部分であり、名刺を例示し名前、会社名、会社の住所、電話及びＦＡＸ番号、ｅーｍａｉｌアドレスの情報を含んでいる。
【００２０】
上述した入力された画像を図面の横方向に投影し、投影分布を求める。投影を求める方法としては、図示するように横方向の画素数を加算する方法が簡素で望ましい。あるいは、ステップ２０３で抽出した連結成分の外接矩形の横辺の長さの加算値を採ってもよい。座標軸４１２は横方向投影分布であり、座標軸４１１は領域の縦方向の座標軸に相当する。この投影分布の算出処理によって、非零値の範囲４１３〜４１８が文字行が存在する縦方向座標の範囲に相当するものであることが携帯端末において把握することができる。
【００２１】
図５は、ステップ３０３において連結成分を融合し、ステップ３０４によって文字行座標を計算する処理を説明する図である。
１行抽出領域５００は、ステップ３０２において計算した文字行範囲の１つであり、具体的に図４の文字行範囲４１６の電話番号の項目を例示している。この１行抽出領域５００は、連結成分の複数の外接矩形５０１が含まれ、これら複数の外接矩形は、文字成分の外接矩形となる。そして次にこれらの外接矩形のサイズや、外接矩形間の距離を比較して、サイズが同様であるものや、距離が近い矩形を横方向に融合する。この連結成分の外接矩形５０１の融合によって、文字行左上座標５１１および文字行右下座標５１２を算出し、最終的なステップ３０４で説明した文字行５１０が生成される。
【００２２】
図６は、文字行抽出結果を説明する図である。文字行抽出領域６００は、図４の当初の画像を上述した特に図３のステップによって認識し、得られた文字行（画像）に展開したものを示し、図示するように６個の文字行６０１〜６０６が含まれた形式で装置上において自動認識できる。これら文字行の座標点はある基準位置から左上座標および右下座標が求められ、図７に示すように文字行毎の左上座標のＸ軸、Ｙ軸の座標点と、右下座標のＸ軸、Ｙ軸座標点が、制御部１４０内に設けられたメモリ上の文字行テーブル７００に格納される。
【００２３】
図８は、ステップ２０５における文字行選択操作の際の、表示部１２０の画面遷移の表示例を説明する図である。
表示部１２０に、文字行選択画面８００と現在選択行８０１を表示する。現在選択行８０１は、ステップ２０４にて抽出された文字行の１つである。利用者が上ボタン１３２あるいは下ボタン１３４を押下することによってそれを検知し、図７に示す文字行テーブル７００の参照ポインタをインクリメントあるいはデクリメントし、テーブル７００に一時記憶された文字行の左上、右下座標を読み出し、表示部１２０に画面として表示する。結果として、現在選択行８０１の表示が移動され認識対象となる新しい文字行が選択され表示される。尚、上ボタン押下では、現在選択行の１つ上の行を次の現在選択行とし、下ボタン押下では、現在選択行の１つ下の行を次の現在選択行とする。
【００２４】
利用者は、認識対象となる文字行が現在選択行となるようにボタン操作し（ステップ２０５）、認識実行ボタン１３１を押下することによって、認識対象となる文字行を決定する（ステップ２０６）。尚、カメラから取得した画像データから文字行認識の操作を利用者が行う際、現在選択行８０１の初期位置は、文字行６０１〜６０６の内、最も画像の中心に近いものとして表示部１２０に表示するのが望ましい。
【００２５】
図９は、文字行の文字認識をし、表記辞書との照合を説明する図である。
ステップ２０５で選択し、ステップ２０６で決定した認識対象文字行９００に対して、ステップ２０７では１文字毎に切り出し、ステップ２０８では各文字の文字識別処理を実行し、文字認識ネットワーク９１０を生成する。文字認識ネットワーク９１０は、文字識別の候補文字をネットワーク表現しメモリに格納したものである。文字認識ネットワーク９１０には、間違った文字識別の候補文字（図面の「｜」「／」など）や、認識対象文字列の前後に不要な文字列（図面の「Ｔ」「ｅ」「直」など）が含まれる場合がある。そこで、文字認識ネットワーク９１０と、予め具備する表記辞書との照合を行い（図２のステップ２０９）、表記と照合が一致した箇所の文字列を抽出することで、正しい認識結果を簡単に、短時間な処理で得ることができる。
【００２６】
また図９は、表記辞書９２０として電話番号の複数の表記パターンを予め記憶しておく例を示し、この表記辞書と上記抽出した文字列とを照合し、そのうち１つの表記９２１と一致した電話番号部分のみを認識結果文字列として抽出した例も示している。そして最終的に認識できた認識結果９３０は携帯端末の制御部１４０等にあるメモリに記憶、そして表示部１２０に表示され、利用者がその後に操作する内容に併せて、文字のテキストベースによる編集が可能となる。
【００２７】
以上説明した第１実施形態の特徴は、利用者が画像入力部１１０によって画像を撮影すると、その画像情報に含まれる文字行（画像情報の一部）が携帯端末１００で自動認識され、それが図８に示されるような枠８０１として表示部１２０に表示されることで、利用者が操作部１３０のボタン操作の選択によって自身が所望の文字行を選択できる点が大きな特徴である。更に、その文字行のうち、表記辞書９２０などを使用して必要な部分、例えば、図９の電話番号のみを自動認識し、必要でない部分（Ｔｅｌ：（直通）の部分）を認識しない点も特徴である。
【００２８】
図１０は、本発明に係る第２の実施形態の処理手順を説明する図である。尚、図１０の中の符号が図２と同一のステップは、同一機能を有するものとする為ここでの説明は省略する。
【００２９】
第２の実施形態において、上述した画像二値化（Ｓ２０２）や連結成分の抽出（Ｓ２０３）が行われた後、抽出した連結成分の外接矩形のサイズや、外接矩形間の距離の比較によって単語の抽出を行う（ステップ１００４）。続いて、ステップ１００５では、利用者が操作部１３０のボタン操作によって、画像内の文字認識対象となる単語を選択する。
【００３０】
図１１は、ステップ１００４において、文字行および単語を抽出し、単語座標を計算する処理を説明する図である。
図示する１行抽出領域１１００は、ステップ３０２と同様の方法で算出した文字行範囲の１つを示す画像の一部分、一領域であって、この１行抽出領域１１００は、連結成分の複数の外接矩形１１０１〜１１０８が含まれる。これらの外接矩形は、文字成分の外接矩形、具体的には１文字単位を示すものとなる。これらの外接矩形のサイズや、外接矩形間の距離を比較して、サイズが同様であるものや、距離が近い矩形を横方向に融合する。この矩形の融合の際には、単語と単語間の距離は、単語内の文字と文字間の距離と比べて大きいという原理を利用している。
【００３１】
続いて外接矩形１１０１〜１１０４の融合によって、単語左上座標１１１１および単語右下座標１１１２を算出し単語１１１０を得る。同様に、外接矩形１１０５〜１１０８の融合によって、単語左上座標１１２１および単語右下１１２２を算出し単語１１２０を得る。後述の例で明らかになるが、図示する１１１０，１１２０の単位が単語の単位を示し、１１０１〜１１０８の単位が文字の単位を示す。
【００３２】
図１２は、単語抽出結果を説明する図である。単語抽出領域１２００は上述からも明らかなように画像入力部１１０で入力された画像を示し、この領域内には複数の単語１２０１が含まれる。このうち、単語１２０２〜１２０４は、単語抽出領域１２００に含まれる単語の一例であり、図１２に示すＸ軸、Ｙ軸とある基準点を基に、図１３に示すように、各単語１２０２〜１２０４の左上座標および右下座標は、Ｘ軸とＹ軸に展開され、メモリ上の単語テーブル１３００に格納される。
【００３３】
図１４は、ステップ１００５における単語選択操作の際の、表示部１２０の画面遷移の表示例を説明する図である。
表示部１２０に、単語選択画面１４００と現在選択単語１４０１を表示する。現在選択単語１４０１は、ステップ１００４にて抽出された単語の１つで、この現在選択単語１４０１は図示するように単語の周りを枠で囲み、更に強調表示（赤色や緑色等）された方が望ましい。尚、上述した現在選択行８０１についても同様である。
【００３４】
利用者は表示部１２０に表示された画像１４００とその枠１４０１の表示を見ながら、操作部１３０の上ボタン１３２、右ボタン１３３、下ボタン１３４、左ボタン１３５を押下することによって、制御部１４０がそれを検知し、図１３に示す単語テーブル１３００の参照ポインタをインクリメントあるいはデクリメントし、単語の左上、右下座標を読み出し、表示部１２０の画面に表示する。
【００３５】
結果として、現在選択単語１４０１の表示が移動され認識対象となる新しい単語が選択され表示される。尚、上ボタン押下では、現在選択単語の１つ上の行に含まれる単語の内、現在選択単語からの座標値が最も近い単語を次の現在選択単語としてもよい。あるいは、現在選択単語の１つ上の行に含まれる単語の内、横方向の重なり合う部分が最も多い単語を次の現在選択単語としてもよい。下ボタン押下でも、同様の方法によって次の現在選択単語を決定する。左ボタン押下では、現在選択単語の１つ左にある単語を次の現在選択単語とし、右ボタン押下では、現在選択単語の１つ右にある単語を次の現在選択単語とする。
【００３６】
以上説明したように第２の実施形態によれば、利用者は表示部１２０に表示された画像のうち、認識対象となる単語が現在選択単語となるように操作部１３０のボタン操作し、認識実行ボタン１３１を押下することによって、認識対象となる単語を決定し、認識対象単語の文字認識処理を実行することが可能となる。
【００３７】
次に第３の実施形態について説明するが、第１，２実施形態が携帯端末１００において文字行又は単語を自動的に画像の中から自動的に抽出し、認識するのに対し、次に述べる例では利用者自身が画像の或る特定位置を選んで、その位置に基づいた認識を行う点で、第１，２実施形態に比較して安価で、簡便な装置を提供する点で優位である。
【００３８】
図１５は、本発明に係る第３の実施形態の処理手順を説明する図である。尚、図１５の中の符号が図２と同一のステップは、同一機能を有するものとする為ここでの説明は省略する。
【００３９】
ステップ１５０４では、取得した画像のある部分領域内部に含まれる文字行の抽出を行う。図１６は、部分領域内部の文字行の抽出を説明する図である。表示部１２０には、画像入力部１１０より取得した認識対象画像１６００と、予め携帯端末１００に備わるカーソル１６０１が表示されている。この認識対象画像１６００に対して、カーソル１６０１を中心とした一定の領域１６０３内部で、前記実施例と同様の処理によって文字行の抽出を行い、抽出した複数の文字行のうち、カーソルに最も近い文字行を現在選択文字行１６０２とする処理を制御部１４０によって実行する。
【００４０】
これにより、利用者が操作部１３０のボタン操作によって、表示部１２０に表示されたカーソル１６０１を、上下左右の一定の距離だけ移動させ（ステップ１５０５）、移動後のカーソル位置を中心として再び文字行抽出処理を実施し、カーソル１６０１に最も近い文字行を現在選択文字行とする。
【００４１】
そして、利用者が文字認識対象となる文字行を選択し終えた後、認識実行ボタン１３１を押下することで、認識対象となる文字行を決定し、次ステップへ移る。
【００４２】
尚、この実施例では文字行画像あるいは文字列画像選択後に文字認識を実行したが、予め画像に含まれる各画像の文字認識を行い、利用者がボタン操作によって選択した文字行画像あるいは文字列画像に対応する認識結果を、文字認識対象の認識結果とする形でもよく、また、上記例の文字行は、文字列と読み替えてもよい。
【００４３】
また、以上説明した第１〜３実施形態の各構成、制御、処理等は最適な形態にて組み合わせ可能であって、例えば、第１実施形態で制御部が抽出、認識した文字行に対し、第３実施形態で表示されたカーソルで指定することで、文字行のうちカーソルで指定された部分の単語（第２実施形態）を文字認識するような形態でも良い。
【００４４】
また、第１実施形態の文字行選択と、第２実施形態の単語選択とを携帯端末上の制御部にて判断させやすいように、表示部に予め文字行認識モードか或いは単語認識モードかをメニュー画面で表示し、利用者が操作部によってどちらのモードを判断するかを制御部で認識することで、上述した第１又は第２実施形態の何れか一方の処理、制御を行う形態であっても良い。
【００４５】
【発明の効果】
以上に説明したように、携帯端末の入力部より取得した画像を、その表示部の画像表示のみを見て、認識対象を選択することができる。また、画像撮影時に、表示部を覗き込めない状態で撮影しても、撮影後に表示部に表示された画像をみながら、認識対象を選択することができる。また、画像撮影の際に手ブレがあっても、取得した画像の範囲内にあっては画像撮影後に認識対象の位置を変更することができる。また、同じ画像に対して、再度認識対象を選択する操作を繰り返すことで、同一画像の複数箇所の認識をすることができる。
【図面の簡単な説明】
【図１】携帯端末の詳細構成図を示す。
【図２】第１の実施形態に係る文字認識方法のフロー図である。
【図３】第１の実施形態に係る文字行抽出処理を説明するフロー図である。
【図４】第１の実施形態に係る文字行抽出の投影分布を説明する図である。
【図５】第１の実施形態に係る文字行の座標計算の方法を説明する図である。
【図６】第１の実施形態に係る文字行抽出結果の例を説明する図である。
【図７】第１の実施形態に係る文字行テーブルを説明する図である。
【図８】第１の実施形態に係る文字行選択時の表示部における画面遷移の例を説明する図である。
【図９】第１の実施形態に係る文字行内の文字認識を説明する図である。
【図１０】第２の実施形態に係る文字認識方法の処理のフロー図である。
【図１１】第２の実施形態に係る単語の座標計算の方法を説明する図である。
【図１２】第２の実施形態に係る単語抽出結果の例を説明する図である。
【図１３】第２の実施形態に係る単語テーブルを説明する図である。
【図１４】第２の実施形態に係る単語選択時の表示部における画面遷移の例を説明する図である。
【図１５】第３の実施形態を説明する文字認識処理のフロー図を示す。
【図１６】第３の実施形態における表示部の画面選択操作及び画像の認識領域を説明する図である。
【符号の説明】
１００…携帯情報端末（携帯端末）、１１０…画像入力部、１２０…表示部、１３０…操作部、１４０…制御部、１５０…文字認識部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a technique for selecting a character string in a captured image and recognizing its characters on a portable information terminal, mobile phone, or similar device equipped with image input means, specifically a camera.
[0002]
[Prior art]
A mobile phone equipped with a camera has been proposed in Patent Document 1. In general, when characters in an image are recognized using such a portable information terminal, the user must manually adjust the position and orientation of the terminal body so that the recognition target fits inside the recognition target frame displayed on the display unit.
[0003]
[Patent Document 1]
JP 2003-169187 A
[0004]
[Problems to be solved by the invention]
To perform complex processing such as character recognition using such a conventional portable terminal, the user must hold the terminal body and fine-tune its position and orientation so that the recognition target fits within the frame on the display unit. This raises the following operability problems.
(1) Because the screen of the display unit is small, it is difficult to make out the recognition target characters within the frame.
(2) When the position and orientation of the main body are adjusted to point it at the recognition target, the position and orientation of the display unit change with it, making the display hard to see.
(3) The main body cannot be pointed at the recognition target from an angle at which the user can no longer look into the display unit.
(4) Pressing the shutter button to acquire the image shakes the terminal body, and the recognition target slips out of the frame.
(5) Because the recognition target is fixed at the time of image acquisition, it cannot be changed afterward, and characters at multiple locations cannot be recognized from a single image acquisition.
[0005]
The present invention has been made in view of the above points and aims to solve some or all of the above problems. In particular, for character recognition using a portable information terminal or a mobile phone, its object is to provide a character string selection method in which the user, by operation after image acquisition, freely selects the character string image to be recognized, so that the target character string image can be selected easily, and a character recognition method for recognizing the selected character string image.
[0006]
[Means for Solving the Problems]
To achieve the above object, the invention concerns a portable information terminal or mobile phone comprising an image input unit (or camera, or imaging unit) for inputting an image, a display unit for displaying the image data input by the input unit, an operation unit with which the user performs input operations such as button presses, a character recognition unit for recognizing characters in the image, and a control unit for controlling the whole. It provides a portable terminal, or a character recognition method, that extracts character string images or character line images from the image after shooting and is suited to selecting, by user operation, a character string image or character line image within a certain range as the recognition target from among the extracted images.
[0007]
Further, for a portable information terminal or mobile phone and a character recognition method using that terminal, an apparatus or method is provided that, after performing character recognition inside the character string image or character line image, extracts the recognition target character string by collating the character recognition result with a notation dictionary.
[0008]
In addition, an apparatus and method are provided that, after extracting a character string image or character line image, cut out words from it, let the user select by operation the word to be recognized from among the cut-out words, and perform character recognition on the selected word.
[0009]
In addition, an apparatus and a character recognition method are provided in which the user selects in advance, using the operation unit, which part of the image should be recognized, so that character recognition of the image is performed simply, based on the position of the selected part of the image.
[0010]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 16.
FIG. 1 is a configuration diagram showing an outline of a portable information terminal or mobile phone 100 (also referred to simply as a portable terminal or portable device) having image input means according to the present embodiment. An image of a character recognition target such as a business card, magazine, or signboard is input from the image input unit 110, line candidates are extracted by the character recognition unit 150, and the character line candidates (images) are displayed on the display unit 120. By operating the operation unit 130, the user selects the character line to be recognized from the extracted character line candidates, and the character recognition unit 150 performs character recognition on the selected line.
[0011]
The operation unit 130 is what the user ordinarily uses, for example, to make a call. It additionally has a recognition execution button 131, which the user presses to execute recognition on the image displayed on the display unit 120; an up button 132 for selecting the line or character string above the current recognition target in the displayed image; a right button 133 for selecting the character string to the right; a down button 134 for selecting the line or character string below; and a left button 135 for selecting the character string to the left.
[0012]
The character recognition unit 150 includes an image binarization unit 151 that binarizes the digital image input from the image input unit 110; a connected component extraction unit 152 that extracts connected components of black pixels; a character line extraction unit 153 that merges connected components to extract character line candidates; a character cutout unit 154 that cuts out characters from a character line image; a character identification unit 155 that converts each individual character pattern into a character code; and a notation dictionary collation unit 156 that collates the character recognition result with a notation dictionary provided in advance. The character recognition unit 150, the operation unit 130, the image input unit 110, the display unit 120, and the other units are controlled by the control unit 140, which comprises a CPU, memory, and the like. Each unit described above and below can also be expressed as a means, mechanism, or unit, and is basically a function processed and controlled by software, hardware, or a combination of the two. Images that have been captured, acquired, or input are preferably stored in memory in the control unit or the like, or on a memory card in the portable terminal, so that they can be used for the character recognition described later. Because this example uses horizontally written images, each example is described in terms of character lines; needless to say, vertically written images, running vertically with respect to the displayed image, that is, character strings, may be used instead.
[0013]
FIG. 2 is a diagram for explaining the processing procedure of the first embodiment using the portable terminal of FIG. 1. The processing and control described below are performed mainly by the control unit (control means) 140; this is not restated for each step.
[0014]
Using the image input means 110, such as a CCD or image sensor, provided in the portable information terminal or mobile phone 100, the user captures an image of a character recognition target such as a business card, magazine, or signboard, and the image is loaded as a digital image into temporary storage memory in the device or into memory provided in the control unit 140 (step 201). The captured digital image is binarized by the image binarization unit 151 of the character recognition unit 150 (step 202), and the connected component extraction unit 152 extracts the connected components of black pixels from the binarized image (step 203).
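Steps 202 and 203 above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the fixed threshold of 128, the use of 4-connectivity, the list-of-lists grayscale image, and the function names `binarize` and `connected_components` are all assumptions made for the example.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Step 202 sketch: pixels darker than the (assumed) threshold become black (1)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]

def connected_components(binary):
    """Step 203 sketch: return the circumscribed rectangle (left, top, right, bottom)
    of every 4-connected group of black pixels, in raster-scan order."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited black pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

The rectangles returned here are the kind of input that the later line and word extraction steps compare by size and spacing.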
[0015]
The character line extraction unit 153 compares the sizes of the circumscribed rectangles of the extracted connected components and the distances between the circumscribed rectangles, and extracts the character lines contained in the acquired image (step 204). As described later, the extracted character lines are displayed on the display unit 120 in a form the user can understand, and the user, viewing them, selects the character line in the image to be recognized by operating the buttons of the operation unit 130 (step 205).
[0016]
In step 206, once the user has finished selecting the character line to be recognized, pressing the recognition execution button 131 confirms that character line as the recognition target, and processing moves to the next step.
Within the character line selected in step 206, characters are cut out one by one (step 207), and the character identification unit 155 identifies each cut-out character pattern (step 208). The notation dictionary collation unit 156 then collates the character identification result with the notation dictionary provided in advance (step 209). In step 210, the character string that matched is output as the recognition result, and the character recognition process ends. The recognition result is displayed on the display unit 120. By performing an operation such as saving it with the operation unit 130, the user can store part of a specific area of the image input by the image input unit 110 in recognized form (character codes), linked to an address book or the like.
[0017]
FIG. 3 is a diagram showing the detailed processing procedure of the character line extraction unit 153, which does the main work in the character line extraction step 204 of FIG. 2.
[0018]
As described above, connected components of black pixels are extracted from the binarized digital image. By analyzing that image, in particular the state of the black pixels, the horizontal projection distribution of the image is obtained (step 301). From the projection distribution, the vertical range of each character line, specifically its vertical coordinate positions relative to a certain reference, is obtained (step 302). Within that range, the sizes of the circumscribed rectangles of the connected components obtained in step 203 and the distances between the rectangles are compared, that is, the connected components are merged (step 303). This yields the final character line coordinates (step 304), completing the character line detection described above.
[0019]
FIG. 4 is a principle diagram explaining how the projection of the character lines contained in the image information is calculated in step 302. As illustrated, the character line extraction area 400 is all or part of the character recognition target image acquired in step 201. The example shown is a business card containing a name, company name, company address, telephone and fax numbers, and e-mail address.
[0020]
The input image is projected in the horizontal direction of the drawing to obtain a projection distribution. A simple and preferable way to obtain the projection is to sum the number of pixels along the horizontal direction, as illustrated. Alternatively, the sum of the lengths of the horizontal sides of the circumscribed rectangles of the connected components extracted in step 203 may be used. Coordinate axis 412 represents the horizontal projection distribution, and coordinate axis 411 corresponds to the vertical coordinate axis of the area. From this projection distribution, the portable terminal can determine that the non-zero ranges 413 to 418 correspond to the vertical coordinate ranges in which character lines exist.
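The projection described above (steps 301 and 302) can be sketched in a few lines. The sketch assumes a binary image stored as a list of rows with black pixels equal to 1; the function names `horizontal_projection` and `line_ranges` are hypothetical.

```python
def horizontal_projection(binary):
    """Step 301 sketch: count the black pixels (value 1) in each image row."""
    return [sum(row) for row in binary]

def line_ranges(projection):
    """Step 302 sketch: return (top, bottom) row indices, inclusive,
    for each run of non-zero projection values."""
    ranges = []
    start = None
    for y, count in enumerate(projection):
        if count > 0 and start is None:
            start = y                      # a character line begins
        elif count == 0 and start is not None:
            ranges.append((start, y - 1))  # the line ended on the previous row
            start = None
    if start is not None:                  # a line touching the bottom edge
        ranges.append((start, len(projection) - 1))
    return ranges
```

Each returned range plays the role of one of the non-zero ranges 413 to 418. A real image would likely need a noise tolerance instead of a strict non-zero test, which the text does not specify.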
[0021]
FIG. 5 is a diagram for explaining the processing for fusing the connected components in step 303 and calculating the character line coordinates in step 304.
The one-line extraction area 500 is one of the character line ranges calculated in step 302; specifically, it illustrates the telephone number item in the character line range 416 of FIG. 4. The one-line extraction area 500 contains a plurality of circumscribed rectangles 501 of connected components, which are the circumscribed rectangles of character components. The sizes of these rectangles and the distances between them are then compared, and rectangles of similar size or in close proximity are merged in the horizontal direction. Merging the circumscribed rectangles 501 yields the character line upper-left coordinate 511 and the character line lower-right coordinate 512, and finally produces the character line 510 described for step 304.
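A minimal sketch of the horizontal merging in steps 303 and 304 follows. It merges only by distance; the size comparison mentioned in the text is omitted, and the pixel threshold `max_gap` and the function name `merge_boxes` are assumptions for illustration.

```python
def merge_boxes(boxes, max_gap):
    """Merge rectangles (left, top, right, bottom) from left to right whenever
    the horizontal gap to the running rectangle is at most max_gap pixels."""
    merged = []
    for left, top, right, bottom in sorted(boxes):
        if merged and left - merged[-1][2] <= max_gap:
            l, t, r, b = merged[-1]
            # Grow the running rectangle so it encloses the new one.
            merged[-1] = (l, min(t, top), max(r, right), max(b, bottom))
        else:
            merged.append((left, top, right, bottom))
    return merged
```

For a merged rectangle, the first two values correspond to an upper-left coordinate such as 511 and the last two to a lower-right coordinate such as 512.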
[0022]
FIG. 6 is a diagram explaining a character line extraction result. The character line extraction area 600 shows the original image of FIG. 4 after it has been processed by the steps described above, in particular those of FIG. 3, and resolved into the resulting character lines (images). As illustrated, the device automatically recognizes it as containing six character lines 601 to 606. For each character line, the upper-left and lower-right coordinates are obtained relative to a certain reference position and, as shown in FIG. 7, the X and Y coordinates of the upper-left point and the X and Y coordinates of the lower-right point are stored per character line in a character line table 700 in memory provided in the control unit 140.
[0023]
FIG. 8 is a diagram for explaining a display example of screen transition of the display unit 120 in the character line selection operation in step 205.
A character line selection screen 800 and a currently selected line 801 are displayed on the display unit 120. The currently selected line 801 is one of the character lines extracted in step 204. When the user presses the up button 132 or the down button 134, the press is detected, the reference pointer into the character line table 700 shown in FIG. 7 is incremented or decremented, and the upper-left and lower-right coordinates of the character line temporarily stored in the table 700 are read out and displayed on the screen of the display unit 120. As a result, the display of the currently selected line 801 moves, and a new character line to be recognized is selected and displayed. Pressing the up button makes the line immediately above the currently selected line the next currently selected line; pressing the down button makes the line immediately below it the next currently selected line.
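The table-and-pointer mechanism above might look like the following sketch. The class name `LineSelector`, the top-to-bottom ordering of the table, and the clamping at the first and last lines are assumptions; the text does not say what happens when the user presses up on the top line.

```python
from dataclasses import dataclass

@dataclass
class LineSelector:
    """Character line table plus reference pointer (a sketch of FIG. 7 and FIG. 8)."""
    table: list        # rows of (left, top, right, bottom), ordered top to bottom
    pointer: int = 0   # index of the currently selected line

    def press_up(self):
        # Move to the line above; stay on the first line if already there (assumed).
        self.pointer = max(self.pointer - 1, 0)
        return self.table[self.pointer]

    def press_down(self):
        # Move to the line below; stay on the last line if already there (assumed).
        self.pointer = min(self.pointer + 1, len(self.table) - 1)
        return self.table[self.pointer]
```

The coordinates returned by each press are what the display would use to redraw the frame around the currently selected line.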
[0024]
The user operates the buttons so that the character line to be recognized becomes the currently selected line (step 205), and presses the recognition execution button 131 to confirm that line as the recognition target (step 206). When the user performs the character line recognition operation on image data acquired from the camera, the initial position of the currently selected line 801 is preferably the one among the character lines 601 to 606 that is closest to the center of the image, and it is displayed as such on the display unit 120.
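One way to pick the initial selection described above is sketched below. Measuring "closest to the center" as the distance between the center of each line rectangle and the center of the image is an assumption, as is the function name `initial_pointer`.

```python
def initial_pointer(table, image_width, image_height):
    """Return the index of the line rectangle whose center is nearest the image center."""
    cx, cy = image_width / 2, image_height / 2

    def squared_distance(index):
        left, top, right, bottom = table[index]
        mx, my = (left + right) / 2, (top + bottom) / 2
        return (mx - cx) ** 2 + (my - cy) ** 2

    return min(range(len(table)), key=squared_distance)
```

Comparing squared distances avoids a square root without changing which line wins.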
[0025]
FIG. 9 is a diagram explaining character recognition of a character line and collation with the notation dictionary.
The recognition target character line 900, selected in step 205 and confirmed in step 206, is cut out character by character in step 207, and in step 208 character identification is executed for each character to generate a character recognition network 910. The character recognition network 910 is a network representation of the candidate characters from character identification, stored in memory. It may contain incorrect candidate characters (such as “|” and “/” in the drawing) and unnecessary character strings before and after the recognition target character string (such as “T”, “e”, and “直” in the drawing). The character recognition network 910 is therefore collated with the notation dictionary provided in advance (step 209 in FIG. 2), and the character string at the location where a notation matches is extracted, so that a correct recognition result is obtained simply and with short processing time.
[0026]
FIG. 9 also shows an example in which a plurality of telephone number notation patterns are stored in advance as the notation dictionary 920, and in which the notation dictionary is collated with the extracted character string so that only the telephone number portion matching one notation 921 is extracted as the recognition result character string. The final recognition result 930 is stored in memory in the control unit 140 or elsewhere in the portable terminal and displayed on the display unit 120, and can then be edited as text according to the user's subsequent operations.
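The collation of a character recognition network with notation patterns can be illustrated with a simplified sketch. Everything concrete here is an assumption for the example: the network is reduced to one ranked candidate list per character position (no alternative segmentations), patterns use "N" for any digit and any other symbol as a literal, and the sample number and patterns are invented rather than taken from FIG. 9.

```python
def match_pattern(lattice, pattern):
    """Slide the pattern over the lattice and return the first matching string, or None."""
    for start in range(len(lattice) - len(pattern) + 1):
        chars = []
        for symbol, candidates in zip(pattern, lattice[start:]):
            if symbol == "N":
                # Take the highest-ranked candidate that is a digit.
                hit = next((c for c in candidates if c.isdigit()), None)
            else:
                hit = symbol if symbol in candidates else None
            if hit is None:
                break
            chars.append(hit)
        else:
            return "".join(chars)  # every pattern symbol found a candidate
    return None

def collate(lattice, dictionary):
    """Try each notation pattern in turn and return the first match, or None."""
    for pattern in dictionary:
        result = match_pattern(lattice, pattern)
        if result is not None:
            return result
    return None

# Hypothetical candidates for a line reading "Tel:03-1234-5678".
lattice = [["T"], ["e"], ["l", "1"], [":"], ["0", "O"], ["3"], ["-"], ["1", "|"],
           ["2"], ["3"], ["4"], ["-"], ["5"], ["6"], ["7"], ["8"]]
dictionary = ["NNN-NNN-NNNN", "NN-NNNN-NNNN"]
print(collate(lattice, dictionary))
```

Because the pattern constrains each position, the wrong candidates ("O", "|") and the leading "Tel:" are discarded without a separate cleanup pass, which is the effect the text attributes to dictionary collation.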
[0027]
A major feature of the first embodiment described above is that, when the user captures an image with the image input unit 110, the character lines (parts of the image information) contained in it are automatically detected by the portable terminal 100 and displayed on the display unit 120 as a frame 801 as shown in FIG. 8, so that the user can select the desired character line by button operation on the operation unit 130. A further feature is that, within that character line, only the necessary part, for example the telephone number in FIG. 9, is automatically recognized using the notation dictionary 920 or the like, while the unnecessary part (the “Tel: (direct)” part) is not recognized.
[0028]
FIG. 10 is a diagram for explaining the processing procedure of the second embodiment according to the present invention. Note that steps having the same reference numerals in FIG. 10 as those in FIG. 2 have the same functions, and thus description thereof will be omitted.
[0029]
In the second embodiment, after the image binarization (S202) and connected component extraction (S203) described above, words are extracted by comparing the sizes of the circumscribed rectangles of the extracted connected components and the distances between the circumscribed rectangles (step 1004). Then, in step 1005, the user selects the word in the image to be recognized by operating the buttons of the operation unit 130.
[0030]
FIG. 11 is a diagram for explaining the process of extracting character lines and words and calculating word coordinates in step 1004.
The one-line extraction area 1100 shown in the figure is one region of the image, representing one of the character line ranges calculated in the same way as in step 302. It contains a plurality of circumscribed rectangles 1101 to 1108 of connected components. These are the circumscribed rectangles of character components; specifically, each represents one character. The sizes of these rectangles and the distances between them are compared, and rectangles of similar size or in close proximity are merged in the horizontal direction. The merging relies on the principle that the distance between words is larger than the distance between characters within a word.
[0031]
Subsequently, the word 1110 is obtained by merging the circumscribed rectangles 1101 to 1104 and calculating the word upper left coordinate 1111 and the word lower right coordinate 1112. Similarly, the word 1120 is obtained by merging the circumscribed rectangles 1105 to 1108 and calculating the word upper left coordinate 1121 and the word lower right coordinate 1122. As will be apparent from the example described later, 1110 and 1120 in the figure denote word units, and 1101 to 1108 denote character units.
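The merging of character rectangles into words can be sketched as follows. This is only an illustration under stated assumptions: the Rect type, the function merge_into_words, and the gap_ratio threshold (a gap wider than a fraction of the mean character height starts a new word) are hypothetical, since the text gives the principle but no concrete threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int


def merge_into_words(char_rects, gap_ratio=0.5):
    """Merge the character rectangles of one character line into word rectangles.

    Assumption: a horizontal gap larger than gap_ratio * mean character height
    separates two words; smaller gaps are gaps between characters of one word.
    """
    rects = sorted(char_rects, key=lambda r: r.left)
    if not rects:
        return []
    mean_height = sum(r.bottom - r.top for r in rects) / len(rects)
    threshold = gap_ratio * mean_height
    words = [rects[0]]
    for r in rects[1:]:
        cur = words[-1]
        if r.left - cur.right <= threshold:
            # Small gap: grow the current word to cover this character.
            words[-1] = Rect(cur.left, min(cur.top, r.top),
                             max(cur.right, r.right), max(cur.bottom, r.bottom))
        else:
            # Large gap: this character starts a new word.
            words.append(r)
    return words


if __name__ == "__main__":
    # Eight characters, 8 wide and 10 high: gaps of 2 inside a word, 12 between words.
    chars = [Rect(x, 0, x + 8, 10) for x in (0, 10, 20, 30, 50, 60, 70, 80)]
    for word in merge_into_words(chars):
        print((word.left, word.top), (word.right, word.bottom))
```

Each resulting rectangle carries the upper left and lower right coordinates of one word, corresponding to the coordinate pairs 1111/1112 and 1121/1122 in the figure.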
[0032]
FIG. 12 is a diagram for explaining a word extraction result. As is clear from the above, the word extraction area 1200 represents an image input by the image input unit 110, and a plurality of words 1201 are included in this area. Among these, the words 1202 to 1204 are examples of words included in the word extraction area 1200. Taking the X axis and the Y axis shown in FIG. 12 and a certain reference point as the basis, the upper left and lower right coordinates of the words 1202 to 1204 are expressed on the X axis and the Y axis and stored in the word table 1300 in memory, as shown in FIG. 13.
[0033]
FIG. 14 is a diagram for explaining a display example of screen transition of the display unit 120 at the time of the word selection operation in step 1005.
A word selection screen 1400 and a currently selected word 1401 are displayed on the display unit 120. The currently selected word 1401 is one of the words extracted in step 1004. It is desirable that the currently selected word 1401 be surrounded by a frame as shown in the figure and further highlighted (in red, green, or the like). The same applies to the currently selected character line 801 described above.
[0034]
The user presses the up button 132, the right button 133, the down button 134, or the left button 135 of the operation unit 130 while viewing the image 1400 and the frame 1401 displayed on the display unit 120. When the control unit 140 detects the press, it increments or decrements the reference pointer of the word table 1300 shown in FIG. 13, reads the upper left and lower right coordinates of the corresponding word, and displays them on the screen of the display unit 120.
[0035]
As a result, the display of the currently selected word 1401 moves, and a new word to be recognized is selected and displayed. When the up button is pressed, the word whose coordinate values are closest to the currently selected word, among the words in the line immediately above it, may be set as the next currently selected word. Alternatively, the word with the largest horizontal overlap with the currently selected word, among the words in the line immediately above it, may be set as the next currently selected word. When the down button is pressed, the next currently selected word is determined in the same way. When the left button is pressed, the word to the left of the currently selected word becomes the next currently selected word, and when the right button is pressed, the word to its right becomes the next currently selected word.
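The button-driven selection can be sketched as below, using the horizontal-overlap rule for the up and down buttons. The Word record (with an explicit line index), the function names, and the choice to keep the current word when no neighbor exists are assumptions made for illustration; the text does not specify the layout of the word table 1300 at this level of detail or the behavior at the edges.

```python
from typing import NamedTuple


class Word(NamedTuple):
    line: int    # index of the character line containing the word (assumed field)
    left: int
    top: int
    right: int
    bottom: int


def horizontal_overlap(a, b):
    """Length of the horizontal extent shared by two words (0 if none)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def next_word(words, current, button):
    """Return the index of the next currently selected word.

    button is one of "up", "down", "left", "right". If there is no word in
    the requested direction, the selection stays where it is (an assumption).
    """
    cur = words[current]
    if button in ("left", "right"):
        same_line = sorted((i for i, w in enumerate(words) if w.line == cur.line),
                           key=lambda i: words[i].left)
        pos = same_line.index(current) + (1 if button == "right" else -1)
        return same_line[pos] if 0 <= pos < len(same_line) else current
    target_line = cur.line + (1 if button == "down" else -1)
    candidates = [i for i, w in enumerate(words) if w.line == target_line]
    if not candidates:
        return current
    # Pick the word in the adjacent line with the largest horizontal overlap.
    return max(candidates, key=lambda i: horizontal_overlap(cur, words[i]))


if __name__ == "__main__":
    table = [Word(0, 0, 0, 40, 10), Word(0, 50, 0, 100, 10),
             Word(1, 0, 20, 30, 30), Word(1, 35, 20, 100, 30)]
    print(next_word(table, 3, "up"))
```

Replacing horizontal_overlap with a distance between coordinates would give the other variant described above, in which the closest word in the adjacent line is chosen.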
[0036]
As described above, according to the second embodiment, the user operates the buttons of the operation unit 130 so that the word to be recognized, among the words in the image displayed on the display unit 120, becomes the currently selected word, and then presses the recognition execution button 131. The word to be recognized is thereby determined, and character recognition processing can be executed on it.
[0037]
Next, a third embodiment will be described. In the first and second embodiments, the portable terminal 100 automatically extracts character lines or words from the image for selection and recognition. In the example described below, by contrast, the user selects a specific position in the image and recognition is performed based on that position, which is advantageous in that the apparatus can be made simpler and less expensive than in the first and second embodiments.
[0038]
FIG. 15 is a diagram for explaining the processing procedure of the third embodiment according to the present invention. Note that the steps in FIG. 15 having the same reference numerals as those in FIG. 2 have the same functions, and thus the description thereof is omitted here.
[0039]
In step 1504, a character line included in a partial area of the acquired image is extracted. FIG. 16 is a diagram for explaining extraction of a character line inside a partial area. The display unit 120 displays a recognition target image 1600 acquired from the image input unit 110 and a cursor 1601 provided in advance on the mobile terminal 100. Within a certain region 1603 of the recognition target image 1600 centered on the cursor 1601, character lines are extracted by a process similar to that of the above embodiments, and the control unit 140 sets the character line closest to the cursor, among the extracted character lines, as the currently selected character line 1602.
[0040]
When the user then moves the cursor 1601 displayed on the display unit 120 by a fixed distance up, down, left, or right through a button operation on the operation unit 130 (step 1505), the character line extraction processing is performed again around the new cursor position, and the character line closest to the cursor 1601 is set as the currently selected character line.
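The cursor-based selection can be sketched as follows. For brevity the sketch assumes that character line rectangles are already available as (left, top, right, bottom) tuples and merely keeps those that intersect a square region around the cursor, whereas the embodiment runs the extraction itself only inside the region 1603. The function name, the half_size parameter, and the distance measure are hypothetical.

```python
def select_line_near_cursor(lines, cursor, half_size=40):
    """Return the character line rectangle closest to the cursor, considering
    only lines that intersect the square region centered on the cursor.

    lines: list of (left, top, right, bottom); cursor: (x, y).
    Returns None when no line intersects the region.
    """
    cx, cy = cursor
    region = (cx - half_size, cy - half_size, cx + half_size, cy + half_size)

    def intersects(r):
        return (r[0] <= region[2] and r[2] >= region[0]
                and r[1] <= region[3] and r[3] >= region[1])

    def distance(r):
        # Distance from the cursor to the rectangle (0 if the cursor is inside).
        dx = max(r[0] - cx, 0, cx - r[2])
        dy = max(r[1] - cy, 0, cy - r[3])
        return (dx * dx + dy * dy) ** 0.5

    candidates = [r for r in lines if intersects(r)]
    return min(candidates, key=distance) if candidates else None


if __name__ == "__main__":
    lines = [(10, 10, 200, 30), (10, 50, 200, 70), (10, 200, 200, 220)]
    print(select_line_near_cursor(lines, (100, 45)))
    # Moving the cursor and calling the function again re-selects the line.
    print(select_line_near_cursor(lines, (100, 25)))
```

Calling the function again after each cursor movement corresponds to repeating the extraction around the moved cursor position in step 1505.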
[0041]
Then, after the user finishes selecting the character line to be recognized, the user presses the recognition execution button 131 to determine the character line to be recognized, and proceeds to the next step.
[0042]
In this embodiment, character recognition is executed after the character line image or character string image is selected. Alternatively, character recognition may be performed beforehand on each character line image or character string image contained in the image, and the recognition result corresponding to the image selected by the user may be used as the recognition result for the recognition target. The term "character line" in the above examples may also be read as "character string".
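The variation in which recognition is performed beforehand can be sketched as a small cache of results. The class name PreRecognizedSelection and the recognize callable are hypothetical stand-ins; the actual recognition would be carried out by the character recognition unit 150.

```python
class PreRecognizedSelection:
    """Recognize every extracted line once, then answer selections by lookup."""

    def __init__(self, line_images, recognize):
        # recognize is any callable mapping a line image to text (assumed interface).
        self._results = [recognize(image) for image in line_images]

    def result_for(self, selected_index):
        # No recognition runs here; the stored result is returned as is.
        return self._results[selected_index]


if __name__ == "__main__":
    # A stand-in recognizer that just upper-cases a string "image".
    selection = PreRecognizedSelection(["tel 03", "fax 04"], str.upper)
    print(selection.result_for(1))
```

The trade-off is that all lines are recognized up front, so selection itself is immediate, at the cost of recognizing lines the user may never select.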
[0043]
In addition, the configurations, controls, processes, and the like of the first to third embodiments described above can be combined as appropriate. For example, within a character line extracted and recognized by the control unit as in the first embodiment, the user may specify a part with the cursor displayed as in the third embodiment, and the word at the part specified by the cursor (as in the second embodiment) may then be recognized.
[0044]
In addition, so that the control unit of the portable terminal can easily distinguish between the character line selection of the first embodiment and the word selection of the second embodiment, a character line recognition mode and a word recognition mode may be offered in advance on a menu screen of the display unit. The control unit recognizes which mode the user has chosen with the operation unit and performs control so that the corresponding process of the first or second embodiment is carried out.
[0045]
[Effect of the invention]
As described above, it is possible to select a recognition target for an image acquired from the input unit of the mobile terminal by looking only at the image display on the display unit. Further, even when the image is shot in a state where the display unit cannot be looked into, the recognition target can be selected while viewing the image displayed on the display unit after shooting. Even if there is a camera shake at the time of image capturing, the position of the recognition target can be changed after image capturing within the range of the acquired image. Further, by repeating the operation of selecting a recognition target again for the same image, a plurality of locations of the same image can be recognized.
[Brief description of the drawings]
FIG. 1 shows a detailed configuration diagram of a mobile terminal.
FIG. 2 is a flowchart of a character recognition method according to the first embodiment.
FIG. 3 is a flowchart for explaining character line extraction processing according to the first embodiment.
FIG. 4 is a diagram for explaining a projected distribution of character line extraction according to the first embodiment.
FIG. 5 is a diagram for explaining a method of calculating coordinates of a character line according to the first embodiment.
FIG. 6 is a diagram illustrating an example of a character line extraction result according to the first embodiment.
FIG. 7 is a diagram illustrating a character line table according to the first embodiment.
FIG. 8 is a diagram illustrating an example of screen transition in the display unit when a character line is selected according to the first embodiment.
FIG. 9 is a diagram for explaining character recognition in a character line according to the first embodiment.
FIG. 10 is a flowchart of processing of a character recognition method according to a second embodiment.
FIG. 11 is a diagram illustrating a method for calculating word coordinates according to the second embodiment.
FIG. 12 is a diagram for explaining an example of a word extraction result according to the second embodiment.
FIG. 13 is a diagram for explaining a word table according to the second embodiment.
FIG. 14 is a diagram illustrating an example of screen transition on the display unit when a word is selected according to the second embodiment.
FIG. 15 is a flowchart of character recognition processing according to the third embodiment.
FIG. 16 is a diagram illustrating a screen selection operation and an image recognition area of a display unit according to the third embodiment.
[Explanation of symbols]
100 ... Portable information terminal (mobile terminal), 110 ... Image input unit, 120 ... Display unit, 130 ... Operation unit, 140 ... Control unit, 150 ... Character recognition unit

Claims

1. A portable information terminal comprising: an image input unit that captures, acquires, or inputs an image; a display unit that displays image data from the image input unit; an operation unit with which an input operation can be performed; a character recognition unit that recognizes characters included in the image data; and a control unit that controls each unit,
wherein the control unit automatically extracts a character string, a character line, or a word from the image data from the image input unit, displays the extracted image content on the display unit together with the image data as a character string, character line, or word, and recognizes a specific character string, character line, or word displayed on the display unit in response to a selection operation from the operation unit.

2. The portable information terminal according to claim 1, wherein the character recognition unit has a dictionary that stores a plurality of different pieces of notation information,
and the control unit recognizes only a specific image portion of the extracted character string or character line by comparison with the notation information of the dictionary.

3. The portable information terminal according to claim 1 or 2, wherein, for the character string or character line to be extracted, the control unit obtains a plurality of circumscribed rectangles of the characters included in the image data and performs a merging process that connects the plurality of circumscribed rectangles.

4. The portable information terminal described, wherein the character string, character line, or word displayed on the display unit is surrounded by a frame or highlighted relative to the image data displayed together with it.

5. A portable information terminal comprising: an image input unit that captures, acquires, or inputs an image; a display unit that displays image data from the image input unit; an operation unit with which an input operation can be performed; a character recognition unit that recognizes characters included in the image data; and a control unit that controls each unit,
wherein the control unit displays a cursor together with the image data displayed on the display unit, extracts an image of a specific area including the position designated by operating and selecting the cursor from the operation unit, and extracts a character string, character line, or word included in the image of that specific area and displays it on the display unit.

6. A character recognition method for a portable information terminal or mobile phone equipped with an input unit for inputting an image, a display unit for displaying image data input by the input unit, an operation unit operated by a user, a character recognition unit for recognizing characters in the image, and a control unit for controlling each unit,
the method comprising: extracting character string images or character line images from the image after shooting by the input unit; selecting, from the extracted character string images or character line images, the character string image or character line image to be recognized by an operation of the operation unit; and recognizing the characters inside the selected character string image or character line image.

7. The character recognition method according to claim 6, wherein, after the character recognition unit recognizes the character string image or the character line image, the character string to be recognized is extracted by collating the character recognition result with a notation dictionary.

8. The character recognition method according to claim 6, wherein, after extracting the character string image or the character line image, the control unit cuts out words from the character string image or the character line image, a word to be recognized is selected from the cut-out words by an operation of the operation unit, and the selected word is recognized.

9. The character recognition method according to claim 6, wherein, after the control unit extracts the character string images or the character line images, the character recognition unit performs character recognition within them in advance, and the recognition result corresponding to the character string image or character line image selected by an operation of the operation unit is taken as the final result.