JP4109738B2

JP4109738B2 - Image processing method and apparatus and storage medium therefor

Info

Publication number: JP4109738B2
Application number: JP00399098A
Authority: JP
Inventors: 北洋金田; 知俊金津
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1998-01-12
Filing date: 1998-01-12
Publication date: 2008-07-02
Anticipated expiration: 2018-01-12
Also published as: JPH11203410A

Description

[0001]
[Technical field of the invention]
The present invention relates to an image processing method and apparatus and a storage medium therefor, and more particularly to processing for character recognition of an image of a document in which different languages are mixed.
[0002]
The present invention relates to an image processing method and apparatus and a storage medium therefor, and more particularly to processing for analyzing an image that can be divided into a plurality of blocks.
[0003]
[Prior art]
Conventional character recognition processing basically targets a single language: a dedicated Japanese character recognition device is used for Japanese, and a dedicated English character recognition device for English. A dedicated Japanese character recognition device often includes the alphabet among its target characters, so it can also recognize a language written in the alphabet, such as English. In that case, however, the recognition accuracy for the English portions is undeniably worse than with a dedicated English character recognition device.
[0004]
To eliminate this drawback, there are also character recognition devices in which the user visually judges whether each passage is English or Japanese, and then either designates the Japanese passages and instructs Japanese-specific character recognition, or designates the English passages and instructs English-specific character recognition.
[0005]
[Problems to be solved by the invention]
However, with the character recognition devices described above as conventional examples, recognizing a document in which different languages are mixed requires either forcing Japanese character recognition on both the Japanese and the English text, or repeating the designation of a text portion and the instruction of the recognition language twice, once for Japanese and once for English. Recognition rate and usability thus could not both be achieved, and one had to be sacrificed for the other. This has also been a major obstacle to the spread of character recognition devices.
[0006]
[Means for Solving the Problems]
In order to solve the above problems, an image processing method according to the present invention comprises: an extraction step of extracting text regions from an input image; a language type determination step of analyzing image characteristics of each text region extracted in the extraction step, thereby determining whether the language type of each text region is Japanese, English, or unknown, and setting the determined language type for each region; a resetting step of, when any of the text regions extracted from the input image is determined in the language type determination step to be of the Japanese language type, setting to Japanese the language type of each text region determined to be unknown in the language type determination step, and, when all of the text regions extracted from the input image are determined in the language type determination step not to be of the Japanese language type, setting to English the language type of each text region determined to be unknown in the language type determination step; and a character recognition step of performing character recognition on the image of each text region whose language type, as set by the language type determination step and the resetting step, is Japanese, using a Japanese character recognition function that is suited to Japanese character recognition and can also recognize the alphabet, and performing character recognition on the image of each text region whose language type so set is English, using an English character recognition function suited to English character recognition.
[0007]
In order to solve the above problems, an image processing apparatus according to the present invention comprises: extraction means for extracting text regions from an input image; language type determination means for analyzing image characteristics of each text region extracted by the extraction means, thereby determining whether the language type of each text region is Japanese, English, or unknown, and setting the determined language type for each region; resetting means for, when any of the text regions extracted from the input image is determined by the language type determination means to be of the Japanese language type, setting to Japanese the language type of each text region determined to be unknown by the language type determination means, and, when all of the text regions extracted from the input image are determined by the language type determination means not to be of the Japanese language type, setting to English the language type of each text region determined to be unknown by the language type determination means; and character recognition means for performing character recognition on the image of each text region whose language type, as set by the language type determination means and the resetting means, is Japanese, using a Japanese character recognition function that is suited to Japanese character recognition and can also recognize the alphabet, and performing character recognition on the image of each text region whose language type so set is English, using an English character recognition function suited to English character recognition.
[0008]
In order to solve the above problems, a computer-readable storage medium according to the present invention stores a control program for causing a computer to execute: an extraction step of extracting text regions from an input image; a language type determination step of analyzing image characteristics of each text region extracted in the extraction step, thereby determining whether the language type of each text region is Japanese, English, or unknown, and setting the determined language type for each region; a resetting step of, when any of the text regions extracted from the input image is determined in the language type determination step to be of the Japanese language type, setting to Japanese the language type of each text region determined to be unknown in the language type determination step, and, when all of the text regions extracted from the input image are determined in the language type determination step not to be of the Japanese language type, setting to English the language type of each text region determined to be unknown in the language type determination step; and a character recognition step of performing character recognition on the image of each text region whose language type, as set by the language type determination step and the resetting step, is Japanese, using a Japanese character recognition function that is suited to Japanese character recognition and can also recognize the alphabet, and performing character recognition on the image of each text region whose language type so set is English, using an English character recognition function suited to English character recognition.
[0023]
[Embodiments of the invention]
Embodiments of the present invention will be described below with reference to the drawings.
[0024]
FIG. 5 is a block diagram of an apparatus according to the present invention.
[0025]
Reference numeral 51 denotes a CPU (central processing unit), which controls the processing according to the present invention in accordance with a control program stored in the MEM 53. The processing shown in the flowcharts described later is also executed under the control of the CPU 51. Reference numeral 53 denotes a MEM (comprising a RAM and a ROM), which stores various data such as the control program for the processing executed by the CPU 51, the parameters used in that processing, input images, and dictionaries for character recognition. Reference numeral 54 denotes a display such as a CRT or LCD, which displays the input image, the text of processing results, operation instruction screens, recognition results read from a file according to document identification information specified through the input means 59, and the like. A character recognition result stored in the MEM 53 can be displayed on the display 54 and edited, for example by selecting the correct character from a plurality of candidate characters using the input means 59. Reference numeral 55 denotes a printer such as an LBP or BJ printer, which prints images, text, and the like. Reference numeral 56 denotes a scanner, which optically reads the image of a document and inputs it to the apparatus as an electrical signal. Reference numeral 57 denotes a communication I/F, which controls the transmission and reception of data via a public line, a LAN, or the like. Images according to the present invention, and the results of processing them, can also be exchanged with other terminals via this communication I/F. Reference numeral 58 denotes a storage medium that is detachable from the apparatus and can be read, and also written, by a computer, such as a CD-ROM, CD-R, or FD. An image according to the present invention may be read from the storage medium 58, and the processing result may be written to the storage medium 58. The control program stored in the MEM 53 may also be installed from another terminal via the communication I/F, or from the storage medium 58. Reference numeral 59 denotes input means such as a keyboard and a pointing device, through which the operator gives instructions. Reference numeral 50 denotes a bus, over which data is exchanged between the components.
[0026]
FIG. 1 is a process schematic diagram of an apparatus according to the present invention. In the present embodiment, the case where Japanese and English are the recognition targets is described as an example.
[0027]
In FIG. 1, reference numeral 2 denotes an image input unit that inputs a document image from the scanner 56, from another terminal via the communication I/F 57, or from the storage medium 58, and stores it in the MEM 53. Reference numeral 4 denotes a region identification unit for the input document image, and 6 denotes a Japanese/English discrimination unit that determines whether a region is Japanese or English. Reference numeral 8 denotes a Japanese/English discrimination control unit that controls the Japanese/English discrimination unit 6; in response to input through the input means 59 specifying whether automatic language type discrimination is to be performed, and the recognition mode to use when it is not, this unit generates a control signal representing that instruction. Reference numeral 10 denotes a character recognition unit that recognizes Japanese and English characters. Each of these processes is executed by the CPU 51 in accordance with the control program stored in the MEM 53, as shown in the flowcharts described later.
[0028]
Next, the operation will be described.
[0029]
The document image acquired by the image input unit 2 is divided by the region identification unit 4 into small regions (hereinafter, blocks) according to attribute, as shown in FIG. 2, and the block number, block attribute, block size, and block position of each block are identified. The region identification processing performed by the region identification unit 4 analyzes the arrangement of black pixels in the input document image, determines attributes such as text, figure, image, and separator, and identifies each contiguous group with the same attribute as one block. Even within text, portions whose line direction differs, or whose line spacing exceeds a specified value, are judged to belong to different columns and are identified as separate blocks.
[0030]
Once the blocks have been identified, a block number is automatically assigned to each block, for example in order from the top, and the width and height information representing the block size is combined with data defining the block position, for example the coordinates of the block's upper-left corner, and stored in the MEM 53 as block data. By referring to this block data, it is possible both to extract the image of any desired block from the input image and to reproduce the arrangement of text, figures, images, separators, and so on in the document. Meanwhile, the Japanese/English discrimination control unit 8 generates a control signal reflecting the user's setting of whether to perform automatic Japanese/English discrimination and, if automatic discrimination has not been selected, the recognition language mode (Japanese recognition or English recognition) that is set instead, and inputs the signal to the Japanese/English discrimination unit 6. The Japanese/English discrimination unit 6 judges whether each character region is Japanese or English based on the control signal generated by the Japanese/English discrimination control unit 8 and the region attributes set by the region identification unit 4.
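The block data described above can be pictured with a minimal Python sketch. The names `Block` and `crop_block`, the field names, and the use of a nested list as the page image are all assumptions made for illustration; the patent does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical record for one block produced by region identification."""
    number: int                      # block number, e.g. assigned top to bottom
    attribute: str                   # "text", "figure", "image", or "separator"
    x: int                           # x coordinate of the upper-left corner
    y: int                           # y coordinate of the upper-left corner
    width: int                       # block width in pixels
    height: int                      # block height in pixels
    language: Optional[str] = None   # filled in later by language discrimination


def crop_block(image: List[List[int]], block: Block) -> List[List[int]]:
    """Extract the pixels of one block using only its position and size."""
    rows = image[block.y:block.y + block.height]
    return [row[block.x:block.x + block.width] for row in rows]


# Toy 4x6 "page" whose pixel values encode row and column for easy checking.
page = [[r * 10 + c for c in range(6)] for r in range(4)]
text_block = Block(number=2, attribute="text", x=1, y=1, width=3, height=2)
patch = crop_block(page, text_block)
```

Because position and size are stored with each block, the same records are enough to rebuild the page layout later.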
[0031]
The image data of each character region, now carrying a Japanese or English attribute, is sent to the character recognition unit 10, where character recognition is performed according to the language specified by the attribute. The character recognition unit 10 performs the Japanese character recognition routine of S406 and the English character recognition routine of S408, described later. These may be implemented as separate character recognition units, or character image segmentation and recognition dictionaries may be provided for each language while the matching algorithm is shared.
[0032]
When the character recognition processing in the character recognition unit 10 is finished, the character recognition result is stored (filed) with document identification information attached. This document identification information may be entered by the user from the keyboard, or may be extracted from the character recognition result. Alternatively, the input date, time, user ID, or the like may be assigned automatically. It need only differ from document to document so that it can serve as an index when the result is later read from the file. The filed data may include not only the character recognition result but also the input image, a compressed version of the input image, and the layout information of the regions.
[0033]
Here, the Japanese/English discrimination unit 6 and the character recognition unit 10 will be described in detail.
[0034]
FIG. 3 is a flowchart showing the processing flow of the Japanese/English discrimination unit 6.
[0035]
S302 is a routine that reads the control signal generated by the Japanese/English discrimination control unit 8 in response to the user's instruction and stored in the MEM 53. When there are multiple document images to be processed, the control signal stored in the MEM 53 in association with the image currently specified as the processing target is selectively read. This control signal indicates whether automatic Japanese/English discrimination is to be performed and, if not, the recognition language mode.
[0036]
In S304, it is determined whether or not to perform Japanese-English automatic discrimination based on the control signal read in S302. If so, control is passed to S306, and if not, control is passed to S322.
[0037]
S306 is a routine for reading the attribute of each block divided as shown in FIG. 2 by the area identification unit 4, and reads the block data stored in the MEM 53 one by one in the order of the block numbers.
[0038]
In S308, based on the block attribute in the block data read in S306, it is determined whether the block is a text region and should therefore proceed to S310. If it is a text region, control passes to the Japanese/English discrimination processing for that block in S310 and the following steps. If it is determined in S308 that the block is not a text region and should not proceed to S310, control returns to S306 and the next block data is read from the MEM 53. In the example of FIG. 2, blocks 2, 3, and 4 have been given the text attribute, so they are determined to be text regions and become target blocks for the Japanese/English discrimination processing in S310 and the following steps.
[0039]
S310 is a routine that analyzes the image characteristics of the block to determine whether its language type is Japanese or English. The processing is applied to the image of the corresponding region of the input image in the MEM 53, which can be located from the block position and size data in the block data read in S306. This can be done by applying, for example, the techniques described in Japanese Patent Application Laid-Open Nos. 8-339424 and 8-305792 for automatically discriminating language types such as Japanese and English from features of the input image.
[0040]
In S312, based on the result determined in S310, a new attribute, that is, a language type, is set in the block data and stored in the MEM 53. Three types are set here: Japanese, English, and unknown. “Unknown” is set when the probability of being Japanese and the probability of being English are lower than a threshold in the Japanese-English discrimination routine in S310.
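The three-way labeling in S312 can be sketched as follows. This is only an illustration under stated assumptions: the function name `label_language`, the two likelihood scores as inputs, and the threshold value 0.6 are hypothetical, since the patent delegates the actual discrimination of S310 to the cited prior techniques and gives no numeric threshold.

```python
def label_language(score_ja: float, score_en: float, threshold: float = 0.6) -> str:
    """Map two likelihood scores to "japanese", "english", or "unknown".

    The scores stand in for the output of the S310 discrimination routine;
    the threshold value is an assumed placeholder.
    """
    # "Unknown" applies when neither likelihood reaches the threshold.
    if score_ja < threshold and score_en < threshold:
        return "unknown"
    # Otherwise take the more likely language.
    return "japanese" if score_ja >= score_en else "english"
```

For example, `label_language(0.4, 0.5)` falls below the assumed threshold on both scores and is left as "unknown" for the later resetting step to resolve.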
[0041]
In S314, it is determined whether the language type determination processing of S306 to S312 has been performed for all blocks of the document image stored in the MEM 53. If all processing is finished, control passes to S316; otherwise, control returns to S306.
[0042]
S316, S318, and S320 form a routine that decides the language type of unknown blocks: the language types of all blocks in the document are checked, and if even one Japanese block exists among them, the unknown blocks are set as Japanese blocks; otherwise, that is, when all the determined blocks are English blocks, the unknown blocks are set as English blocks. In this routine, S316 reads the language type attributes of all blocks stored in the MEM 53 and determines whether at least one block has the Japanese attribute. If so, each block whose attribute read in S316 is unknown has its language type set to Japanese and stored in the MEM 53 (S318). If S316 finds no Japanese block, each block whose attribute read in S316 is unknown has its language type set to English and stored in the MEM 53 (S320). The criterion used in S316 need only be fixed in advance; it may be a predetermined number of blocks of a specific language type (1, 2, ...) or a predetermined proportion (5%, 10%, ...).
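The resetting routine of S316 to S320 amounts to a single pass over the block labels. The sketch below assumes the labels are plain strings and uses a hypothetical `min_japanese` parameter to stand for the count-based criterion mentioned above; the name `resolve_unknown` is likewise an illustration, not taken from the patent.

```python
from typing import List


def resolve_unknown(languages: List[str], min_japanese: int = 1) -> List[str]:
    """Replace "unknown" labels following the S316-S320 rule.

    languages: per-block labels, each "japanese", "english", or "unknown".
    min_japanese: assumed count criterion for S316 (the text allows other
    criteria, such as a proportion of blocks).
    """
    japanese_count = sum(1 for lang in languages if lang == "japanese")   # S316
    fallback = "japanese" if japanese_count >= min_japanese else "english"
    # S318 (fallback is Japanese) or S320 (fallback is English)
    return [fallback if lang == "unknown" else lang for lang in languages]
```

With the default criterion, `resolve_unknown(["english", "unknown", "japanese"])` turns the unknown block into a Japanese block, while a page with no Japanese block resolves its unknown blocks to English.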
[0043]
The reason is as follows. Even if an unknown block that is actually an English block is set as a Japanese block in S318, the Japanese character recognition performed by the character recognition unit 10 also covers the alphabet (that is, in addition to Japanese kana, kanji, numerals, symbols, and so on, alphabetic characters are also recognition targets), so recognition does not fail outright. In the opposite case, where a Japanese block is mistakenly set as an English block, recognition does fail, so the judgment in that direction is made more strictly. In other words, the language type with the larger number of recognition target characters is made easier to select. More preferably, when the recognition target characters of one language type include those of the other, as in this embodiment, the including language type should be made easier to select.
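The preference for the language whose character set includes the other can be expressed as a superset test. The sketch below is an illustration only: `inclusive_language` and `TOY_CHARSETS` are hypothetical names, and the tiny character sets are stand-ins for real recognition dictionaries.

```python
from typing import Dict, Optional, Set


def inclusive_language(charsets: Dict[str, Set[str]]) -> Optional[str]:
    """Return the language whose character set contains every other set, if any."""
    for name, chars in charsets.items():
        if all(chars >= other for other in charsets.values()):
            return name
    return None


# Toy character sets for illustration only; real recognizers cover far more.
TOY_CHARSETS = {
    "english": set("ABCabc"),
    "japanese": set("ABCabc") | set("あいう漢字"),
}
```

Here `inclusive_language(TOY_CHARSETS)` selects "japanese", matching the choice made in S318; when no set contains all the others, the function returns `None` and some other criterion would be needed.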
[0044]
S322 is a routine for setting a recognition language mode when it is determined in S304 that automatic Japanese-English discrimination is not performed. This setting is performed based on a control signal generated by the Japanese / English discrimination control unit 8 according to the language type operated and instructed by the operator via the input means 59.
[0045]
S324 is a routine that, when the Japanese recognition mode is determined in S322, sets all text regions in the document to Japanese and stores them in the MEM 53.
[0046]
S326 is a routine for setting all the text areas in the document to English and storing them in the MEM 53 when the English recognition mode is determined in S322.
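The branch between automatic discrimination and a fixed recognition mode (S304, S322 to S326) can be sketched as below. The dictionary-based block representation, the function name `apply_control_signal`, and the `auto_assign` callback that stands in for S306 to S320 are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional


def apply_control_signal(
    blocks: List[Dict],
    auto: bool,
    mode: Optional[str],
    auto_assign: Callable[[List[Dict]], None],
) -> None:
    """Set the language of text blocks according to the control signal.

    auto: True when automatic Japanese/English discrimination is selected (S304).
    mode: "japanese" or "english"; used only when auto is False (S322).
    auto_assign: hypothetical callback implementing S306-S320.
    """
    if auto:
        auto_assign(blocks)
        return
    if mode not in ("japanese", "english"):
        raise ValueError("a recognition language mode is required when auto is False")
    for block in blocks:
        if block["attribute"] == "text":
            block["language"] = mode   # S324 (Japanese) or S326 (English)
```

Only blocks with the text attribute receive a language, so figures, images, and separators pass through untouched.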
[0047]
FIG. 4 is a flowchart showing the flow of character recognition processing performed by the character recognition unit 10.
[0048]
S402 is a routine for reading, from the block data stored in the MEM 53, block data of a sentence block whose attribute is set as text and image data in the block.
[0049]
S404 is a routine for determining the language type attribute of the block based on the attribute of the block data read in S402. Attributes determined by this routine are the attributes of the language type set in S310, S318, S320, S324, and S326.
[0050]
S406 is a routine for performing Japanese character recognition in the character recognition unit 10 when it is determined in S404 that the language is Japanese.
[0051]
S408 is a routine for performing English character recognition in the character recognition unit 10 when the language is determined to be English in S404.
[0052]
S410 determines whether the character recognition processing of S402 to S408 has been performed on all text blocks of the document stored in the MEM 53. If unprocessed blocks remain, control passes to S402 and the next text block is read.
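The loop of FIG. 4 reduces to dispatching each text block to the recognizer for its language. In the sketch below the recognizers are stub callables and the block dictionaries are an assumed representation; `recognize_document` is a hypothetical name, and no real OCR engine is implied.

```python
from typing import Callable, Dict, List


def recognize_document(
    blocks: List[Dict],
    recognizers: Dict[str, Callable[[object], str]],
) -> Dict[int, str]:
    """Run the recognizer matching each text block's language (S402-S410)."""
    results: Dict[int, str] = {}
    for block in blocks:                          # S402, repeated until S410 is satisfied
        if block["attribute"] != "text":
            continue                              # only text blocks are recognized
        engine = recognizers[block["language"]]   # S404: branch on the language attribute
        results[block["number"]] = engine(block["image"])   # S406 or S408
    return results


# Stub recognizers for illustration; they only tag the input with a language.
STUB_RECOGNIZERS = {
    "japanese": lambda image: f"ja:{image}",
    "english": lambda image: f"en:{image}",
}
```

Keeping the recognizers in a mapping keyed by language reflects the option described earlier of sharing the matching algorithm while swapping per-language dictionaries.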
[0053]
As described above, according to the present invention, a character recognition device can be built that has automatic Japanese/English discrimination at its core, provides a user-friendly interface for controlling it, and applies its own post-processing to unknown blocks. Recognition accuracy and usability can therefore both be improved even for documents in which Japanese and English are mixed, which contributes greatly to the spread of character recognition devices.
[0054]
In this embodiment, language type discrimination between Japanese and English has been described as an example, but the invention is not limited to this, and other languages may be discriminated. In that case, it is basically sufficient to replace the Japanese/English discrimination of S310 in FIG. 3 with a language type discrimination routine suited to those languages.
[0055]
[Effects of the invention]
As described above, according to the present invention, it is possible to construct a character recognition device that provides a user-friendly interface for the control based on automatic language type discrimination technology, and also considers post-processing at the time of erroneous determination. Even in a document in which different languages are mixed, recognition accuracy and usability can be improved at the same time, which can greatly contribute to the spread of the character recognition function.
[0056]
As described above, according to the present invention, for text divided into a plurality of regions in a document, the language type is determined for each region and character recognition suited to each region's language type is performed, so documents in which multiple languages are mixed can be recognized quickly and with high accuracy.
[Brief description of the drawings]
FIG. 1 is a process schematic diagram of a character recognition apparatus according to the present invention.
FIG. 2 is an exemplary diagram of region identification results.
FIG. 3 is a flowchart showing the processing flow of the Japanese/English discrimination unit 6.
FIG. 4 is a flowchart showing the processing flow of the character recognition unit 10.
FIG. 5 is a configuration diagram of an apparatus according to the present invention.

Claims

An image processing method comprising:
an extraction step of extracting text regions from an input image;
a language type determination step of analyzing image characteristics of each text region extracted in the extraction step, thereby determining whether the language type of each text region is Japanese, English, or unknown, and setting the determined language type for each region;
a resetting step of setting the language type of each text region determined to be unknown in the language type determination step to Japanese if the language type determination step determines that any of the text regions extracted from the input image is of the Japanese language type, and to English if the language type determination step determines that none of the text regions extracted from the input image is of the Japanese language type; and
a character recognition step of performing character recognition on the image of each text region whose language type, as set by the language type determination step and the resetting step, is Japanese by using a Japanese character recognition function that is suited to Japanese character recognition and can also recognize alphabetic characters, and performing character recognition on the image of each text region whose language type, as so set, is English by using an English character recognition function suited to English character recognition.

The image processing method according to claim 1, further comprising an automatic determination enable/disable step of determining whether or not automatic determination of the language type has been instructed,
wherein, if it is determined in the automatic determination enable/disable step that automatic determination of the language type has been instructed, the language type of each region is set in the language type determination step and the resetting step, and character recognition is performed in the character recognition step based on the language type thus set, and
wherein, if it is determined in the automatic determination enable/disable step that automatic determination of the language type has not been instructed and a language type has been instructed by the user, the image of each region is recognized in the character recognition step based on the language type instructed by the user.

An image processing apparatus comprising:
extraction means for extracting text regions from an input image;
language type determination means for analyzing image characteristics of each text region extracted by the extraction means, thereby determining whether the language type of each text region is Japanese, English, or unknown, and setting the determined language type for each region;
resetting means for setting the language type of each text region determined to be unknown by the language type determination means to Japanese if the language type determination means determines that any of the text regions extracted from the input image is of the Japanese language type, and to English if the language type determination means determines that none of the text regions extracted from the input image is of the Japanese language type; and
character recognition means for performing character recognition on the image of each text region whose language type, as set by the language type determination means and the resetting means, is Japanese by using a Japanese character recognition function that is suited to Japanese character recognition and can also recognize alphabetic characters, and performing character recognition on the image of each text region whose language type, as so set, is English by using an English character recognition function suited to English character recognition.

A computer-readable storage medium storing a control program for causing a computer to execute each of the following steps:
an extraction step of extracting text regions from an input image;
a language type determination step of analyzing image characteristics of each text region extracted in the extraction step, thereby determining whether the language type of each text region is Japanese, English, or unknown, and setting the determined language type for each region;
a resetting step of setting the language type of each text region determined to be unknown in the language type determination step to Japanese if the language type determination step determines that any of the text regions extracted from the input image is of the Japanese language type, and to English if the language type determination step determines that none of the text regions extracted from the input image is of the Japanese language type; and
a character recognition step of performing character recognition on the image of each text region whose language type, as set by the language type determination step and the resetting step, is Japanese by using a Japanese character recognition function that is suited to Japanese character recognition and can also recognize alphabetic characters, and performing character recognition on the image of each text region whose language type, as so set, is English by using an English character recognition function suited to English character recognition.