JP4431335B2 - String reader

Info

Publication number: JP4431335B2
Application number: JP2003206391A
Authority: JP
Inventors: 広新庄; 昌史古賀
Original assignee: Hitachi Omron Terminal Solutions Corp
Current assignee: Hitachi Omron Terminal Solutions Corp
Priority date: 2003-08-07
Filing date: 2003-08-07
Publication date: 2010-03-10
Anticipated expiration: 2023-08-07
Also published as: JP2005055991A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像入力手段、具体的にはカメラを持った携帯情報端末または携帯電話等において、撮影した画像中の文字行および文字列を抽出する技術に関する。
【０００２】
【従来の技術】
従来より，紙に印刷ないし手書きされた文字を読取る装置はOCR（Optical Character Reader）として知られている。主な応用分野は，帳票処理，郵便物の区分，文書のテキスト化などである。典型的なOCRでは，以下のような手順で文字を読取る。まず紙面をスキャナを用いて光電変換して計算機に取り込み(画像入力)，読取りの対象である文字行の領域を切出し(文字行切出し)，文字行から個々の文字を切出し(文字切出し)，個々の文字が何であるかを識別し(文字識別)，言語情報などを利用して読取った文字群を文字列として解釈する(後処理)。
【０００３】
文字認識における後処理の一例として，認識結果の個々の文字と言語情報辞書に登録された単語との照合処理することにより文字認識結果の誤りを訂正する機能がある。英単語を例とすると，先頭の文字が誤認識している文字認識結果“oharacter”を辞書内の単語“character”とを照合することにより，誤った文字を修正して単語として正しい認識結果を得ることができる。このような後処理の従来技術としては，非特許文献１の方式や，文字切出し，認識，照合を一体化した非特許文献２の方式が提案されている。
【０００４】
後処理の対象として，単語だけでなく住所の丁目番地号部分の表記形式を照合対象とした特許文献１の方式が提案されている。特許文献１の方式では，N-NN-NN（ただし，-は区切り記号，Nは0から9の数字をあらわす）などの表記形式を辞書に格納し，文字認識結果と表記形式を照合することにより，丁目番地号部分の誤りを修正する。この技術を電話番号認識に適用した場合，「TEL」などの電話番号以外の文字が記載された行から，電話番号部分のみを抽出して認識結果を得ることができる。具体的には，0N-NNNN-NNNN（ただし，0は数字の0，Nは0〜9の数字，-は区切り記号をあらわす）などの電話番号の表記形式を辞書に格納し，文字認識結果と照合することにより，電話番号以外の文字を認識結果から削除することができる。
【０００５】
一方，近年は携帯電話やPDA(personal digital assistant)などの携帯情報端末に搭載されたカメラを画像入力の手段として，文書，看板，標識などの文字を読取る試みが現われている。携帯電話にカメラを備えた従来例として，特許文献２や特許文献３が提案されている。これらの機器での認識対象は，電話番号，メールアドレス，URL，単語などである。特に，特許文献３では，認識精度向上のため，認識対象を事前に利用者が選択し，この認識対象にしたがって字種限定や後処理の方式の切り替える方式が提案されている。
【０００６】
一般に，文字認識における課題の一つとして，読取り対象の文字領域を抽出することが挙げられる。帳票OCRでは，事前に読取り対象の文字が記載された領域を厳密に指定しておき，帳票全面の画像からその領域内の画像を切出して認識するという手法が一般的である。また，文書OCR等では，画面上に表示された文書に対し，認識対象の領域をマウスなどで指定するという手法もある。一方，特許文献２の携帯情報端末では，画面上に表示されたマーカに近い文字行領域を自動的に抽出して，認識するという手法が提案されている。
【０００７】
【特許文献１】
特開平11−207266号公報
【特許文献２】
特開2003−78640号公報
【特許文献３】
特開2002−152696号公報
【特許文献４】
特開2002−366463号公報
【特許文献５】
特開平11−203404号公報
【非特許文献１】
丸川勝美他，“手書き漢字住所認識のためのエラー修正アルゴリズム”、情報処理学会論文誌，Vol.35, No.6, 1994．
【非特許文献２】
O.R. Agazzi, et al., "Connected And Degraded Text Recognition using Planar Hidden Markov Models," Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp. V-113-V-116, 1993
【０００８】
【発明が解決しようとする課題】
紙面上の同一行内に複数の文字列もしくは単語があり，読取り対象となるものはその一部であることがある。このような場合，応用分野に応じて予め定められた規則に従い，自動的に装置が読取り対象となる文字列を判別することが必要である。
【０００９】
しかしながら，特許文献２，特許文献３，特許文献４などの携帯情報端末を用いた文字認識の従来例においては，同一行内に読取り対象となる文字列は1つだけであり，不要な文字列は画像中に存在しないという前提に基づいていた。
【００１０】
一方，現実には同一行内に認識対象以外の文字列が存在する。認識対象の文字列の部分のみを画像入力できる場合は問題ない。しかし，一般にカメラからの画像入力ではピントを調整するために利用者の所望の画角が得られないため，認識対象以外の文字列も画像中に含まれてしまうことが多い。このような場合，認識結果に不要な文字列が付加されてしまうため，利用者の意図とは異なる文字認識結果が得られてしまうという問題がある。
【００１１】
URLの認識を例とすると，図５(a)に示すようにURLの他に関係ない文字列がある場合，認識結果はURLと不要な文字列を一緒にした “お問合せhttp://www.xxxxx.co.jptel:012-3456-7890(ダイヤルイン)”になるため，目的のWebページに接続できない。e-mailアドレスの認識も同様である。
【００１２】
英単語を認識して翻訳結果を得たい場合を例とすると，行単位でしか文字認識できない仕様であれば，一単語のみを抽出して翻訳することが困難である。例として，行内に“This is a pen.”という文が記載されており，penのみを認識したい場合について説明する。単純な認識結果は“Thisisapen.”であり，penのみを抽出できない。単語照合によりpenのみを抽出することは可能であるが，単語間の区切りが明確でない場合には，単語照合の回数が増える（apen, sapen,isapenなどと照合）ため，CPUの性能が低い携帯情報端末での実装には不向きである。さらに，文字認識を誤った場合には，単語照合でも正しい結果が得られないという問題もある。例えばaをoと誤認識し，“a pen”を“open”と認識した場合は，単語照合によりopenを認識結果としてしまう場合がある。
【００１３】
上記の問題を解決するため，文字列間の空白を利用して文字列や単語を検出する手法がある。しかしながら，文字間隔と文字列間隔の空白の幅は必ずしも明確に区別できるとは限らないため，空白を利用した文字列抽出は誤りが発生する場合がある。この段階で読取り対象領域を誤って実際よりも短い文字列を選択すれば，決して正しい認識結果を得ることはできない。
【００１４】
そこで、本発明では、以上のような点に鑑みてなされたもので、上記課題の一部又は全部を解決すると共に、特に、携帯情報端末または携帯電話等を用いた文字認識において、利用者の操作によって認識対象となる文字列画像を任意に選択することで、容易に対象となる文字列画像を選択できる文字列選択方法、および選択された文字列画像を認識する文字認識方法を提供することを目的としている。
【００１５】
【課題を解決するための手段】
上記目的を達成するため、画像を撮影又は取得又は入力する画像入力部と、この画像入力部からの画像データを表示する表示部と、入力の操作が可能な操作部と、画像データに含まれる文字を認識する演算部と、前記各部を制御する制御部とを有する携帯情報端末において、画像入力部からの画像データのうち、前記操作部から入力された認識モードにより，読取り対象の文字列又は文字行又は単語の抽出方式を自動的に切り替え，文字列又は文字行又は単語を自動的に抽出し、前記表示部に画像データと共に抽出した画像内容を文字列又は文字行又は単語として表示し、前記操作部からの選択操作によって前記表示部に表示された特定の文字列又は文字行又は単語を認識するのに好適な携帯情報端末、又は文字認識方法を提供する。
【００１６】
具体的には，文字認識の対象が電話番号，e-mailもしくはURL，英単語で行抽出の方式を切り替える。この処理は，前述の後処理で読取り対象を限定できる場合は文字行全体を認識対象とし，限定できない場合は空白などを利用して行を分割して認識対象を限定とするという方針に基づく。より具体的には，電話番号であれば文字行全体を抽出し，e-mailやURLであれば行の左端からe-mailやURLの終了までを抽出し，英単語であれば英単語のみを抽出する。上記の処理の根拠は以下の通りである。電話番号の認識においては，前述の特許文献１を用いた手法により行中から電話番号の文字のみを判別できるため，行全体を認識対象とすることができる。e-mailやURLの認識においては，単語照合により“http://”や“e-mail:”などの文字列を検出できるため，認識対象の左側は限定する必要がない。ただし，前述の通り右側に不要な文字列が存在する場合にはアドレス認識を誤るため，右側の不要な文字列を削除して認識対象とする。英単語の認識においては，左右の空白などを検出して単語のみを認識対象とする。
【００１７】
【発明の実施の形態】
以下、本発明の実施形態を図１から図９を用いて説明する。
図１は、本実施形態に係る画像入力手段を持つ携帯情報端末あるいは携帯電話１０１（又は単に携帯端末、携帯装置とも言う）の概略を示す構成図である。
携帯情報端末１０１は，カメラなどの画像入力部１０２，演算部１０３，表示部１０４，操作部１０５，通信部１０６，記憶部１０７を有する。演算部１０３は記憶部１０７に格納されたプログラムで指定された手順に従い，各部を制御すると共に，文字行抽出や画像符号化などの処理を実行する。表示部１０４は，例えば液晶パネルなどの装置から構成されており，画像や文字などの情報を視覚的に表示する。入力部１０５は，例えばボタンなどの装置から構成されており，装置を操作している人間からの入力を受付ける。通信部１０６は，例えば無線LANのような装置から構成されており，外部との通信を行う。記憶部１０７は，例えばスタティックRAMのようなものであり，処理手順を格納するプログラムや，文字行抽出処理や文字認識などの画像符号化処理の過程で必要な変数値を記憶するものである。なお，PDAなどの携帯端末については，通信部１０６は必須ではない。
【００１８】
なお、上述及び以下に説明する各部は、手段、機構、ユニットとも表現でき、基本的にソフトウェア又はハード、又はソフトウェアとハードとの結合によって処理、制御される機能である。なお、撮影、取得、入力などされた画像は、後述する文字認識に用いるように、制御部等のメモリ又は携帯情報端末に備わるメモリカードに記憶しておくような態様が望ましい。
【００１９】
図２は、図１の携帯情報端末を使用した第１の実施形態の文字認識の処理手順を説明する図である。
【００２０】
利用者は、ステップ２００において操作部１０５を用いて，表示部１０４に表示されたメニューから文字認識対象の文字列の属性（認識モード）を選択する。認識モードの例としては，図３に示すように「電話番号」，「e-mail」，「URL」，「英単語」などがある。この他に，「住所」「氏名」など適宜認識モードを追加しても良い。
【００２１】
次に，ステップ２０２において，携帯情報端末あるいは携帯電話１００が具備するＣＣＤやイメージセンサ等の画像入力部１０２を用いて、文字認識対象となる名刺や雑誌、あるいは看板などの画像を撮影し、記憶部１０７にディジタル画像として取込む。この時，特許文献２や図５(b)に示すように表示部１０４上に画像とマーカを同時に表示し，利用者がマーカの位置が認識対象の文字行上なるようにして画像を採取すれば，読取り対象の文字行の位置を利用者が携帯機器に指示することができる。なお，図５(b)におけるマーカの形状は十字型であるが，マーカの形状には制限はない。なお、文字認識対象の文字列の属性を決定するステップ２００は、画像入力のステップ２０２の後に行ってもよい。
【００２２】
次に，ステップ２０４において，取込んだディジタル画像に対して，マーカの位置を基準に文字行抽出，文字列抽出，単語抽出のいずれかの処理を，認識モードに応じて切り替えて実行する。ステップ２０４の処理の詳細については，図４を用いて後述する。この抽出結果は，ステップ２０６において表示部１０４に利用者が理解できるように表示される。
【００２３】
次に，利用者が文字認識対象となる文字行を確認し終えた後、操作部１０５を用いて確認または認識実行の指示を入力することにより、ステップ２０８において，行もしくは文字列もしくは単語として抽出された領域内の１文字ごとの文字切り出しを実施し、切り出された個々の文字パターンに対して文字識別文字識別を実施する。さらに，ステップ２１０において，文字識別の結果として予め具備している表記辞書との照合を行い，認識結果として出力する。最後にステップ２１２において，文字認識結果を表示部１０４に表示する。
【００２４】
図４は，図２のステップ２０４における行／文字列／単語抽出の処理手順を説明する図である。なお，図４においては，電話番号，e-mail，URL，英単語の認識モードしか記載していないが，他の認識モードを追加しても良い。追加した認識モードの後段の処理についても適宜追加してよい。
【００２５】
まず，ステップ４００において，認識モードにより処理を分岐する。認識モードが電話番号の場合には，ステップ４０２において文字行抽出を行い，処理を終了する。文字行抽出の手法としては，特許文献５や特許文献２に記載された方式を用いることが可能である。特許文献５の方式では，入力画像を2値化した後，文字の記載方向に黒画素の射影をとって生成された周辺分布から，黒画素が多く分布する範囲を抽出することにより行の領域を特定する。特許文献２の方式では，入力画像を2値化した後，黒画素の塊である連結成分を生成し，文字の記載方向に近傍の連結成分を統合することにより行の領域を特定する。
【００２６】
図５は，電話番号認識モードにおける表示部１０４の表示例である。図５(a)は認識対象である。図５(b)は図２のステップ２０２において画像入力する際の表示例である。ここでは，認識対象の文字列として電話番号「012-3456-7890」を選択しているため，マーカをこの文字列上になるように調整した後，画像を入力する。図５(c)は図２のステップ２０４において抽出した文字行領域をステップ２０６で表示した例である。電話番号モードでは，前述の通り後処理で電話番号のみを抽出できるため，ステップ２０４において行抽出が選択されている（ステップ４０２）。
【００２７】
次に，認識モードがURLの場合について説明する。まず，ステップ４０２において行抽出を行う。次に，ステップ４０４において行内において右側に存在する空白を検出し，ステップ４０８においてこの空白で文字行を分割した領域を文字列抽出結果とする。空白の検出方法としては，行内の連結成分の間隔の分布から単語中の文字間隔と単語間隔の差異を判定する方式などが考えられる。
【００２８】
図６はURL認識モードにおける表示部１０４の表示例である。図６(a)は認識対象である。図６(b)は図２のステップ２０２において画像入力する際の表示例である。ここでは，認識対象の文字列としてURL「http://www.xxxxx.co.jp」を選択しているため，マーカをこの文字列上になるように調整した後，画像を入力する。図６(c)は図２のステップ２０４において抽出した文字列領域をステップ２０６で表示した例である。URLモードでは，前述の通り後処理で“http://”を抽出できるため，ステップ２０４において図４のステップ４０２−４０４−４０８の処理の流れが選択されている。e-mail認識モードも図６と同様である。
【００２９】
次に，認識モードが英単語の場合について説明する。まず，ステップ４０２において行抽出を行う。次に，ステップ４０６において行内において読取り対象となる単語の両側の空白を検出し，ステップ４１０においてこの空白で文字行を分割した領域を単語抽出結果とする。
【００３０】
図７は英単語認識モードにおける表示部１０４の表示例である。図７(a)は認識対象である。図７(b)は図２のステップ２０２において画像入力する際の表示例である。ここでは，認識対象の文字列として英単語「thousands」を選択しているため，マーカをこの単語上になるように調整した後，画像を入力している。図７(c)は図２のステップ２０４において抽出した単語領域をステップ２０６で表示した例である。英単語モードでは，ステップ２０４において図４のステップ４０２−４０６−４１０の処理の流れが選択されている。
【００３１】
以上説明した第１実施形態の特徴は、利用者が認識対象を選択して画像を撮影すると、認識モードに応じて認識対象の領域を抽出する方式を自動的に選択し，それが図５(c)，図６(c)，図７(c)に示されるに表示部に表示されることで、利用者が認識対象の領域を確認できる。
【００３２】
図８は、本発明に係る第２の実施形態の処理手順を説明する図である。なお、図８の中の符号が図２と同一のステップは、同一機能を有するものとする為ここでの説明は省略する。図８は，ステップ２０２において画像を入力した後に，ステップ２００において認識モードを選択する。以降の処理は図２と同様である。
【００３３】
なお，第１および第２の実施例において，文字認識（ステップ２０８）と後処理（ステップ２１０）は携帯機器内で行う必要は無い。例えば通信機能を用いてサーバに認識対象領域の画像を転送し，サーバで上記の処理を行うということも可能である。
【００３４】
次に，図５(b)，図６(b)，図７(b)のマーカの表示形式について補足する。第一のマーカの表示形式は，ディスプレイなどの表示デバイス上の固定位置に表示されているものである。この場合，形態機器を動かすことにより認識対象の文字行を選択することになる。第二の表示形式は，マーカの位置は固定ではなく，操作部からの指示、または入力画像中から文字列を切り出してその文字列位置にカーソルを表示すること、などにより移動できるものである。
【００３５】
図９は，本発明に係る第３の実施形態である，英単語認識モードにおいて英熟語を認識する場合の表示部１０４の表示例である。図９(a)は認識対象である。図９(b)は図２のステップ２０２において画像入力する際の表示例である。ここでは，認識対象の文字列として英熟語「thousands of」を選択する。ここで，thousands とofの間の空白にマーカ位置を調整した後，画像を入力している。図７(c)は図２のステップ２０４において抽出した単熟語領域をステップ２０６で表示した例である。この場合，ステップ２０４において図４のステップ４０２−４０６−４１０の処理の流れが選択されている。ただし，単語間の空白にマーカを置いているため，この空白をより左右の空白が区切り位置として検出されることになり，結果として２字熟語を選択することができる。
【００３６】
【発明の効果】
以上に説明したように、携帯情報端末の入力部より取得した画像から，読取り対象の文字列のみを自動抽出して認識することができる。
【図面の簡単な説明】
【図１】携帯情報端末の構成図を示す。
【図２】第１の実施形態に係る文字認識方法のフロー図である。
【図３】第１の実施形態に係る認識モード選択のためのメニューを説明する図である。
【図４】第１の実施形態に係る行／文字列／単語抽出方法のフロー図である。
【図５】第１の実施形態に係る電話番号モードにおける行抽出の表示を説明する図である。
【図６】第１の実施形態に係るURLモードにおける文字列抽出の表示を説明する図である。
【図７】第１の実施形態に係る英単語モードにおける単語抽出の表示を説明する図である。
【図８】第２の実施形態に係る文字認識方法のフロー図である。
【図９】第３の実施形態に係る英単語モードにおける英熟語抽出の表示を説明する図である。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a technique for extracting character lines and character strings from a photographed image in a device having image input means, specifically a portable information terminal or mobile phone equipped with a camera.
[0002]
[Prior art]
Conventionally, a device that reads characters printed or handwritten on paper is known as an OCR (Optical Character Reader). The main application fields are form processing, mail sorting, and conversion of documents to text. A typical OCR reads characters in the following procedure. First, the paper surface is photoelectrically converted with a scanner and loaded into a computer (image input); the region of the character line to be read is cut out (character line extraction); individual characters are cut out of the character line (character extraction); each character is identified (character identification); and the group of characters read is interpreted as a character string using linguistic information and the like (post-processing).
[0003]
As an example of post-processing in character recognition, there is a function that corrects errors in the character recognition result by collating the individual characters of the recognition result with words registered in a language information dictionary. Taking an English word as an example, the character recognition result “oharacter”, whose leading character is misrecognized, is collated with the dictionary word “character”, so that the erroneous character is corrected and a recognition result that is correct as a word is obtained. As conventional techniques for such post-processing, the method of Non-Patent Document 1 and the method of Non-Patent Document 2, which integrates character extraction, recognition, and collation, have been proposed.
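The dictionary collation described above can be sketched as follows. This is a minimal illustration and not the method of Non-Patent Document 1 or 2: it substitutes the similarity ratio of Python's standard `difflib` module for the collation step, and the word list `DICTIONARY`, the function name `correct_word`, and the cutoff of 0.8 are assumptions made for the example.

```python
import difflib

# Hypothetical word list standing in for the language information dictionary.
DICTIONARY = ["character", "charter", "reader", "string"]


def correct_word(recognized, dictionary=DICTIONARY, cutoff=0.8):
    """Return the most similar dictionary word, or the input unchanged if none is close enough."""
    matches = difflib.get_close_matches(recognized, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized


# The leading "c" was misread as "o"; the nearest dictionary entry replaces the result.
corrected = correct_word("oharacter")
print(corrected)
```

A result with no sufficiently similar dictionary entry is returned unchanged, which keeps the sketch from forcing an unrelated word onto the recognition result.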
[0004]
As a target of post-processing, the method of Patent Document 1 has been proposed, which collates not only words but also the notation format of the chome, block, and house number portion of an address. In the method of Patent Document 1, notation formats such as N-NN-NN (where - is a delimiter and N is a digit from 0 to 9) are stored in a dictionary, and errors in the chome, block, and house number portion are corrected by collating the character recognition result with the notation formats. When this technique is applied to telephone number recognition, only the telephone number portion can be extracted from a line that also contains characters other than the telephone number, such as “TEL”, to obtain the recognition result. Specifically, telephone number notation formats such as 0N-NNNN-NNNN (where 0 is the digit 0, N is a digit from 0 to 9, and - is a delimiter) are stored in the dictionary and collated with the character recognition result, so that characters other than the telephone number can be deleted from the recognition result.
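As a rough illustration of collation against a notation format, the format 0N-NNNN-NNNN can be written as a regular expression and searched for in the recognized line. This sketch covers only the removal of surrounding characters; it does not reproduce the error correction of Patent Document 1. The names `PHONE_FORMAT` and `extract_phone_number` and the sample input are assumptions.

```python
import re

# Notation format 0N-NNNN-NNNN: a literal 0, digits 0-9 for N, and "-" as the delimiter.
PHONE_FORMAT = re.compile(r"0[0-9]-[0-9]{4}-[0-9]{4}")


def extract_phone_number(recognized_line):
    """Return the first span of the recognized line that fits the format, or None."""
    match = PHONE_FORMAT.search(recognized_line)
    return match.group(0) if match else None


# Characters such as "TEL" around the number are dropped from the result.
print(extract_phone_number("TEL03-1234-5678(main)"))
```

A practical dictionary would hold several such formats, since the text gives 0N-NNNN-NNNN only as one example.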
[0005]
On the other hand, in recent years, attempts have been made to read characters on documents, signboards, signs, and the like by using a camera mounted on a portable information terminal, such as a mobile phone or a PDA (personal digital assistant), as the image input means. Patent Documents 2 and 3 have been proposed as conventional examples of a mobile phone equipped with a camera. The recognition targets of these devices are telephone numbers, e-mail addresses, URLs, words, and the like. In particular, Patent Document 3 proposes a method in which, to improve recognition accuracy, the user selects the recognition target in advance, and the restriction on character types and the post-processing method are switched according to that recognition target.
[0006]
In general, one of the problems in character recognition is extracting the character region to be read. In form OCR, the common method is to strictly specify in advance the region in which the characters to be read are written, and to cut out and recognize the image within that region from the image of the entire form. In document OCR and the like, there is also a method in which the region to be recognized is designated with a mouse or the like on a document displayed on the screen. On the other hand, for the portable information terminal of Patent Document 2, a method has been proposed that automatically extracts and recognizes the character line region close to a marker displayed on the screen.
[0007]
[Patent Document 1]
Japanese Patent Laid-Open No. 11-207266
[Patent Document 2]
JP 2003-78640 A
[Patent Document 3]
JP 2002-152696 A
[Patent Document 4]
JP 2002-366463 A
[Patent Document 5]
Japanese Patent Laid-Open No. 11-203404
[Non-Patent Document 1]
Katsumi Marukawa et al., “Error Correction Algorithm for Handwritten Kanji Address Recognition”, Transactions of Information Processing Society of Japan, Vol.35, No.6, 1994.
[Non-Patent Document 2]
O.R. Agazzi, et al., "Connected And Degraded Text Recognition using Planar Hidden Markov Models," Proceedings of International Conference on Acoustics, Speech, and Signal Processing, pp. V-113-V-116, 1993
[0008]
[Problems to be solved by the invention]
There may be a plurality of character strings or words in the same line on paper, and only some of them may be the target of reading. In such a case, the device needs to automatically determine the character string to be read according to rules predetermined for the application field.
[0009]
However, conventional examples of character recognition using portable information terminals, such as Patent Document 2, Patent Document 3, and Patent Document 4, were based on the premise that there is only one character string to be read in the same line and that no unnecessary character strings exist in the image.
[0010]
In reality, however, character strings other than the recognition target exist in the same line. There is no problem if only the portion of the character string to be recognized can be input as an image. In general, however, with image input from a camera the user cannot obtain the desired angle of view because the focus must be adjusted, so character strings other than the recognition target are often included in the image. In such a case, unnecessary character strings are appended to the recognition result, and a character recognition result different from the user's intention is obtained.
[0011]
Taking URL recognition as an example, if there are unrelated character strings in addition to the URL as shown in FIG. 5(a), the recognition result becomes “お問合せhttp://www.xxxxx.co.jptel:012-3456-7890(ダイヤルイン)” (the label “Inquiries”, the URL, the telephone number, and the note “direct dial” run together), so the device cannot connect to the intended Web page. The same applies to e-mail address recognition.
[0012]
Taking as an example the case where one wishes to recognize an English word and obtain its translation, if characters can be recognized only in units of lines, it is difficult to extract and translate a single word. As an example, consider a case where the sentence “This is a pen.” is written in a line and only “pen” is to be recognized. The simple recognition result is “Thisisapen.”, from which “pen” alone cannot be extracted. It is possible to extract only “pen” by word matching, but when the breaks between words are not clear, the number of word matching operations increases (matching against apen, sapen, isapen, and so on), which makes this approach unsuitable for implementation on portable information terminals with low CPU performance. Furthermore, if character recognition is wrong, word matching cannot produce the correct result either. For example, if “a” is misrecognized as “o” and “a pen” is recognized as “open”, word matching may output “open” as the recognition result.
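The growth in the number of matching operations can be illustrated with a small sketch that enumerates the trailing substrings of a recognition result that has lost its word breaks. The miniature dictionary `WORDS`, the function name `suffix_candidates`, and the limit of six characters are assumptions for illustration only.

```python
# Hypothetical miniature dictionary, for illustration only.
WORDS = {"this", "is", "a", "pen", "open"}


def suffix_candidates(recognized, max_len=6):
    """List the trailing substrings that would each need a dictionary lookup."""
    text = recognized.rstrip(".").lower()
    return [text[-n:] for n in range(1, min(max_len, len(text)) + 1)]


candidates = suffix_candidates("Thisisapen.")
print(candidates)                             # n, en, pen, apen, sapen, isapen
print([c for c in candidates if c in WORDS])  # only "pen" is a dictionary word

# With "a" misread as "o", both "pen" and "open" match, so the result is ambiguous.
print([c for c in suffix_candidates("Thisisopen.") if c in WORDS])
```

Every candidate costs one dictionary lookup, and a misrecognized character adds a second plausible match, which is the ambiguity described in the text.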
[0013]
To solve the above problems, there is a method of detecting character strings or words by using the blanks between character strings. However, the width of the blank between characters and that of the blank between character strings cannot always be clearly distinguished, so character string extraction using blanks may produce errors. If the reading target region is mistakenly selected at this stage as a character string shorter than the actual one, a correct recognition result can never be obtained.
[0014]
The present invention has been made in view of the above points. Its object is to solve some or all of the above problems and, in particular, to provide, for character recognition using a portable information terminal, mobile phone, or the like, a character string selection method with which the character string image to be recognized can be easily selected through the user's operation, and a character recognition method for recognizing the selected character string image.
[0015]
[Means for Solving the Problems]
To achieve the above object, there is provided a portable information terminal, or a character recognition method, for a terminal having an image input unit that captures, acquires, or inputs an image, a display unit that displays image data from the image input unit, an operation unit through which input operations can be performed, a calculation unit that recognizes characters contained in the image data, and a control unit that controls these units. According to the recognition mode input from the operation unit, the terminal automatically switches the method of extracting the character string, character line, or word to be read from the image data supplied by the image input unit, and automatically extracts the character string, character line, or word. It displays the extracted content on the display unit together with the image data as a character string, character line, or word, and recognizes a specific character string, character line, or word displayed on the display unit in response to a selection operation from the operation unit.
[0016]
Specifically, the line extraction method is switched depending on whether the character recognition target is a telephone number, an e-mail address or URL, or an English word. This processing follows the policy that, if the reading target can be narrowed down by the post-processing described above, the entire character line is taken as the recognition target, and if it cannot, the line is divided using blanks or the like to narrow down the recognition target. More specifically, for a telephone number the entire character line is extracted; for an e-mail address or URL the portion from the left end of the line to the end of the e-mail address or URL is extracted; and for an English word only the English word is extracted. The grounds for this processing are as follows. In telephone number recognition, only the characters of the telephone number can be distinguished within the line by the method using Patent Document 1 described above, so the entire line can be taken as the recognition target. In e-mail and URL recognition, character strings such as “http://” and “e-mail:” can be detected by word matching, so the left side of the recognition target need not be limited. However, as described above, an unnecessary character string on the right side causes the address to be misrecognized, so the unnecessary character string on the right side is removed before recognition. In English word recognition, the blanks on the left and right are detected and only the word is taken as the recognition target.
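The switching policy above amounts to deciding, for each recognition mode, on which sides the character line is cut at a blank. A minimal sketch follows; the enumeration `Mode` and the function `cut_sides` are hypothetical names introduced for the example.

```python
from enum import Enum


class Mode(Enum):
    PHONE = "telephone number"
    EMAIL = "e-mail"
    URL = "URL"
    WORD = "English word"


def cut_sides(mode):
    """Return (cut_left, cut_right): the sides on which the line is trimmed at a blank."""
    if mode is Mode.PHONE:
        return (False, False)  # whole line; post-processing isolates the number
    if mode in (Mode.EMAIL, Mode.URL):
        return (False, True)   # keep the left end, cut at the blank to the right
    if mode is Mode.WORD:
        return (True, True)    # cut at the blanks on both sides of the word
    raise ValueError(f"unsupported recognition mode: {mode}")


for mode in Mode:
    print(mode.value, cut_sides(mode))
```

Additional modes such as an address or a name would add one branch each, without changing the callers.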
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 9.
FIG. 1 is a configuration diagram showing an outline of a portable information terminal or a cellular phone 101 (or simply referred to as a portable terminal or a portable device) having an image input unit according to the present embodiment.
The portable information terminal 101 includes an image input unit 102 such as a camera, a calculation unit 103, a display unit 104, an operation unit 105, a communication unit 106, and a storage unit 107. The calculation unit 103 controls each unit according to the procedure specified by the program stored in the storage unit 107 and executes processing such as character line extraction and image encoding. The display unit 104 is composed of a device such as a liquid crystal panel and visually displays information such as images and characters. The operation unit 105 is composed of devices such as buttons and receives input from the person operating the device. The communication unit 106 is composed of a device such as a wireless LAN and communicates with the outside. The storage unit 107 is, for example, a static RAM, and stores the program that holds the processing procedure as well as the variable values needed during image encoding processing such as character line extraction and character recognition. Note that the communication unit 106 is not essential for a portable terminal such as a PDA.
[0018]
Each unit described above and below can also be expressed as a means, a mechanism, or a unit, and is basically a function processed and controlled by software, by hardware, or by a combination of software and hardware. It is desirable that a captured, acquired, or input image be stored in a memory of the control unit or the like, or in a memory card provided in the portable information terminal, so that it can be used for the character recognition described later.
[0019]
FIG. 2 is a diagram for explaining the character recognition processing procedure of the first embodiment using the portable information terminal of FIG. 1.
[0020]
In step 200, the user selects an attribute (recognition mode) of the character string to be recognized from the menu displayed on the display unit 104 using the operation unit 105. Examples of the recognition mode include “telephone number”, “e-mail”, “URL”, and “English word”, as shown in FIG. 3. In addition, recognition modes such as “address” and “name” may be added as appropriate.
[0021]
Next, in step 202, an image of a business card, magazine, signboard, or the like that is the target of character recognition is photographed using the image input unit 102, such as a CCD or other image sensor, provided in the portable information terminal or cellular phone 101, and is taken into the storage unit 107 as a digital image. At this time, if an image and a marker are displayed simultaneously on the display unit 104 as in Patent Document 2 and FIG. 5(b), and the user captures the image so that the marker lies on the character line to be recognized, the user can indicate to the portable device the position of the character line to be read. The marker in FIG. 5(b) is cross-shaped, but the shape of the marker is not limited. Step 200, which determines the attribute of the character string to be recognized, may be performed after the image input of step 202.
[0022]
Next, in step 204, character line extraction, character string extraction, or word extraction is executed on the captured digital image with the marker position as the reference, switching among them according to the recognition mode. Details of the processing in step 204 will be described later with reference to FIG. 4. The extraction result is displayed on the display unit 104 in step 206 in a form the user can understand.
[0023]
Next, after the user has checked the character line to be recognized, the user inputs a confirmation or a recognition execution instruction using the operation unit 105. In step 208, the region extracted as a line, character string, or word is then segmented into individual characters, and character identification is performed on each segmented character pattern. Further, in step 210, the character identification result is collated with a notation dictionary provided in advance, and the result is output as the recognition result. Finally, in step 212, the character recognition result is displayed on the display unit 104.
[0024]
FIG. 4 is a diagram for explaining the processing procedure of line / character string / word extraction in step 204 of FIG. 2. FIG. 4 shows only the telephone number, e-mail, URL, and English word recognition modes, but other recognition modes may be added. The downstream processing for an added recognition mode may also be added as appropriate.
[0025]
First, in step 400, the processing branches according to the recognition mode. If the recognition mode is telephone number, character line extraction is performed in step 402 and the processing ends. The methods described in Patent Document 5 and Patent Document 2 can be used for character line extraction. In the method of Patent Document 5, the input image is binarized, the black pixels are projected along the character writing direction to generate a marginal distribution (projection profile), and the line region is identified by extracting the ranges in which many black pixels are distributed. In the method of Patent Document 2, the input image is binarized, connected components (clusters of black pixels) are generated, and the line region is identified by merging neighboring connected components along the character writing direction.
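The projection-based approach can be sketched as follows for horizontal writing. This is a simplified illustration in the spirit of the projection method, not the procedure of Patent Document 5 itself: the image is assumed to be already binarized into rows of 0 (white) and 1 (black), and the function name `extract_line_ranges` and the threshold `min_black` are assumptions.

```python
def extract_line_ranges(binary_image, min_black=1):
    """Return inclusive (top, bottom) row ranges whose black-pixel count reaches min_black."""
    # Projection along the writing direction: one black-pixel count per image row.
    profile = [sum(row) for row in binary_image]
    ranges = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_black and start is None:
            start = y                      # a text line begins
        elif count < min_black and start is not None:
            ranges.append((start, y - 1))  # the text line ended on the previous row
            start = None
    if start is not None:
        ranges.append((start, len(profile) - 1))
    return ranges


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(extract_line_ranges(image))
```

On a real photograph the threshold would need to tolerate noise, and the line nearest the marker would then be chosen from the returned ranges.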
[0026]
FIG. 5 is a display example of the display unit 104 in the telephone number recognition mode. FIG. 5(a) shows the recognition target. FIG. 5(b) is a display example when the image is input in step 202 of FIG. 2. Here, the telephone number “012-3456-7890” is selected as the character string to be recognized, so the image is input after the marker has been adjusted to lie on this character string. FIG. 5(c) is an example in which the character line region extracted in step 204 of FIG. 2 is displayed in step 206. In the telephone number mode, only the telephone number can be extracted by post-processing as described above, so line extraction (step 402) is selected in step 204.
[0027]
Next, the case where the recognition mode is URL will be described. First, line extraction is performed in step 402. Next, in step 404, the blank on the right side within the line is detected, and in step 408, the region obtained by dividing the character line at this blank is taken as the character string extraction result. One conceivable method of detecting the blank is to determine the difference between the character spacing within a word and the spacing between words from the distribution of the intervals between connected components in the line.
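Steps 404 and 408 can be sketched as below, under stated assumptions. Each connected component is reduced to its horizontal extent `(left, right)`, sorted from left to right, and a gap counts as a word blank when it exceeds a multiple of the median gap. The median rule, the factor of 2.0, and the names `find_blanks` and `extract_url_region` are assumptions; the text only says that the distribution of intervals is used.

```python
from statistics import median


def find_blanks(components, factor=2.0):
    """Return each index i where the gap after components[i] is judged to be a word blank."""
    gaps = [components[i + 1][0] - components[i][1] for i in range(len(components) - 1)]
    if not gaps:
        return []
    threshold = factor * median(gaps)  # assumed rule: clearly wider than a typical gap
    return [i for i, gap in enumerate(gaps) if gap > threshold]


def extract_url_region(components, marker_x, factor=2.0):
    """Return the x-range from the left end of the line to the first blank right of the marker."""
    for i in find_blanks(components, factor):
        blank_start = components[i][1]
        if blank_start >= marker_x:
            return (components[0][0], blank_start)
    return (components[0][0], components[-1][1])  # no blank to the right: keep the whole line


# Three groups of characters: a label, the URL (marker at x=22), and a trailing string.
line = [(0, 4), (5, 9), (15, 19), (20, 24), (25, 29), (35, 39), (40, 44)]
print(find_blanks(line))
print(extract_url_region(line, marker_x=22))
```

The label to the left of the marker stays inside the extracted range, which is consistent with the text: the left side is left to post-processing, and only the right side is cut.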
[0028]
FIG. 6 is a display example of the display unit 104 in the URL recognition mode. FIG. 6(a) shows the recognition target. FIG. 6(b) is a display example when the image is input in step 202 of FIG. 2. Here, the URL “http://www.xxxxx.co.jp” is selected as the character string to be recognized, so the image is input after the marker has been adjusted to lie on this character string. FIG. 6(c) is an example in which the character string region extracted in step 204 of FIG. 2 is displayed in step 206. In the URL mode, “http://” can be extracted by post-processing as described above, so the processing flow of steps 402, 404, and 408 in FIG. 4 is selected in step 204. The e-mail recognition mode is handled in the same way as in FIG. 6.
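The reason the left side need not be cut in URL mode can be illustrated with a small post-processing sketch that looks for “http://” in the recognized string and discards what precedes it. The function name `trim_to_url` is hypothetical, and a plain substring search stands in for the word matching mentioned in the text.

```python
def trim_to_url(recognized, marker="http://"):
    """Return the recognized text from the URL marker onward, or None if it is absent."""
    position = recognized.find(marker)
    return recognized[position:] if position >= 0 else None


# The label printed to the left of the URL is removed after recognition.
print(trim_to_url("お問合せhttp://www.xxxxx.co.jp"))
```

Characters to the right of the URL cannot be removed this way, because nothing marks where the address ends, which is why the right side is cut at a blank before recognition.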
[0029]
Next, the case where the recognition mode is English word will be described. First, line extraction is performed in step 402. Next, in step 406, the blanks on both sides of the word to be read are detected within the line, and in step 410, the region obtained by dividing the character line at these blanks is taken as the word extraction result.
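Steps 406 and 410 can be sketched as follows, assuming the word blanks in the line have already been detected and are given as sorted horizontal intervals `(start, end)`. The function name `extract_word_region` and the sample coordinates are assumptions.

```python
def extract_word_region(line_range, blanks, marker_x):
    """Return the x-range bounded by the nearest blanks to the left and right of the marker."""
    left, right = line_range
    for start, end in blanks:
        if end <= marker_x:
            left = end     # a blank lying left of the marker; the last one found is the nearest
        elif start >= marker_x:
            right = start  # the first blank lying right of the marker closes the word
            break
    return (left, right)


line_range = (0, 44)
blanks = [(9, 15), (29, 35)]  # two word blanks inside the line
print(extract_word_region(line_range, blanks, marker_x=22))
```

When no blank exists on one side, the corresponding end of the line serves as the boundary.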
[0030]
FIG. 7 is a display example of the display unit 104 in the English word recognition mode. FIG. 7(a) shows the recognition target. FIG. 7(b) is a display example when the image is input in step 202 of FIG. 2. Here, the English word “thousands” is selected as the character string to be recognized, so the image is input after the marker has been adjusted to lie on this word. FIG. 7(c) is an example in which the word region extracted in step 204 of FIG. 2 is displayed in step 206. In the English word mode, the processing flow of steps 402, 406, and 410 in FIG. 4 is selected in step 204.
[0031]
The feature of the first embodiment described above is that, when the user selects a recognition target and captures an image, the method of extracting the recognition target region is automatically selected according to the recognition mode, and the result is displayed on the display unit as shown in FIG. 5(c), FIG. 6(c), and FIG. 7(c), so that the user can confirm the recognition target region.
[0032]
FIG. 8 is a diagram for explaining the processing procedure of the second embodiment according to the present invention. Steps in FIG. 8 that have the same reference numerals as in FIG. 2 have the same functions, so their description is omitted here. In FIG. 8, the recognition mode is selected in step 200 after the image is input in step 202. The subsequent processing is the same as in FIG. 2.
[0033]
In the first and second embodiments, character recognition (step 208) and post-processing (step 210) need not be performed in the portable device. For example, it is possible to transfer the image of the recognition target area to the server using the communication function and perform the above-described processing on the server.
[0034]
Next, the marker display formats in FIGS. 5(b), 6(b), and 7(b) will be supplemented. In the first display format, the marker is displayed at a fixed position on a display device such as a display. In this case, the character line to be recognized is selected by moving the portable device. In the second display format, the marker position is not fixed and can be moved by an instruction from the operation unit, or by cutting out a character string from the input image and displaying a cursor at the position of that character string.
[0035]
FIG. 9 shows the third embodiment of the present invention: a display example of the display unit 104 when an English phrase is recognized in the English word recognition mode. FIG. 9(a) shows the recognition target. FIG. 9(b) is a display example when the image is input in step 202 of FIG. 2. Here, the English phrase “thousands of” is selected as the character string to be recognized, and the image is input after the marker position has been adjusted to the blank between “thousands” and “of”. FIG. 9(c) is an example in which the phrase region extracted in step 204 of FIG. 2 is displayed in step 206. In this case, the processing flow of steps 402, 406, and 410 in FIG. 4 is selected in step 204. However, because the marker is placed in the blank between the words, the blanks further to the left and right of that blank are detected as the break positions, and as a result a two-word phrase can be selected.
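The effect of placing the marker in a blank can be sketched at the level of word regions. In this illustration each region carries a text label only so that the output is readable; in the device the selection is made on image regions before recognition. The function name `select_words` and the coordinates are assumptions.

```python
def select_words(words, marker_x):
    """Return the labels of the word regions selected by a marker at marker_x.

    words: list of (left, right, label) tuples sorted from left to right.
    """
    for i, (left, right, label) in enumerate(words):
        if left <= marker_x <= right:
            return [label]  # marker on a word: that word alone is selected
        if marker_x < left:
            # Marker in the blank before this word: the break positions move outward,
            # so the neighbouring words on both sides are selected together.
            return [w[2] for w in words[max(i - 1, 0):i + 1]]
    return [words[-1][2]] if words else []


words = [(0, 30, "tens"), (40, 60, "of"), (70, 150, "thousands"),
         (160, 180, "of"), (190, 250, "people")]
print(select_words(words, marker_x=100))  # marker on "thousands"
print(select_words(words, marker_x=155))  # marker in the blank after "thousands"
```

The same extraction flow therefore yields a single word or a two-word phrase depending only on where the user places the marker.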
[0036]
[Effect of the Invention]
As described above, only the character string to be read can be automatically extracted and recognized from the image acquired from the input unit of the portable information terminal.
[Brief description of the drawings]
FIG. 1 shows a configuration diagram of a portable information terminal.
FIG. 2 is a flowchart of a character recognition method according to the first embodiment.
FIG. 3 is a diagram illustrating a menu for selecting a recognition mode according to the first embodiment.
FIG. 4 is a flowchart of a line / character string / word extraction method according to the first embodiment.
FIG. 5 is a diagram for explaining display of line extraction in the telephone number mode according to the first embodiment.
FIG. 6 is a diagram for explaining display of character string extraction in the URL mode according to the first embodiment.
FIG. 7 is a diagram for explaining display of word extraction in the English word mode according to the first embodiment.
FIG. 8 is a flowchart of a character recognition method according to a second embodiment.
FIG. 9 is a diagram for explaining display of idiom extraction in English word mode according to the third embodiment.

Claims

A character string reading program that causes the calculation unit of a portable information terminal, the terminal having an image input unit for inputting an image, a display unit for displaying the image, an operation unit capable of input operation, and a calculation unit for controlling each unit and extracting a character line image from the image, to execute:
a first step of switching between at least two types of recognition modes, a word recognition mode and an e-mail/URL recognition mode, according to a selection input received by the operation unit;
a second step of extracting a range for character recognition from the character line image by an algorithm corresponding to the recognition mode; and
a third step of displaying the range for character recognition on the display unit,
wherein the algorithms for the two types of recognition modes use the positional relationship among the character line image, a marker shown in the image, and blanks in the character line image, and extract a part of the character line image as the range for character recognition,
wherein, when the recognition mode is the word recognition mode and the character line image extracted from the image is written horizontally, the algorithm of the second step detects the blanks present on both sides of the marker in the character line image and extracts the region between those blanks, and
wherein, when the recognition mode is the e-mail/URL recognition mode and the character line image extracted from the image is written horizontally, the algorithm of the second step detects the blank present immediately to the right of the marker in the character line image, divides the character line at that blank, and extracts the region from the left end of the character line image to immediately before the blank.