JP3445394B2

JP3445394B2 - Method for comparing at least two image sections

Info

Publication number: JP3445394B2
Application number: JP33297894A
Authority: JP
Inventors: Daniel P. Huttenlocher; Eric W. Jaquith
Original assignee: Xerox Corporation
Priority date: 1993-12-17
Filing date: 1994-12-14
Publication date: 2003-09-08
Anticipated expiration: 2018-09-08
Also published as: US5539841A; JPH07200745A

Description

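[Detailed Description of the Invention]

[0001] [Industrial Field of Application] The present invention relates to a method of comparing, by shape, image tokens that are represented in an array of image data and that form words, sets of connected components, or similar units of interpretable meaning, without the need to individually detect or identify the characters, symbols, glyphs, or other elements making up the tokens.

[0002] [Prior Art] Text in electronically encoded documents (electronic documents) tends to be found in one of two distinct formats. In the first format, the text is a bitmap: it is defined only as an array of image data or pixels and is essentially indistinguishable from adjacent images that are represented in the same way. In this format, text is rarely amenable to computer processing based on textual content alone; it must first be segmented into image units. In the second format, referred to below as the character code format, the text is represented as a string of character codes (for example, ASCII codes). In the character code format, the image or bitmap of the text is not needed.

[0003] Conversion from bitmap to character codes by optical character recognition (OCR) is very expensive in time and processing effort. The bitmap of each character must be distinguished from its neighbors, its appearance analyzed, and, through a decision-making process, identified as one character from a predetermined character set. U.S. Patent No. 4,956,869 to Miyatake et al. suggests a more efficient method of tracing contour lines.

[0004] However, when an original is scanned to derive an electronic document, the image quality and noise of the reproduction make the actual appearance of the bitmap uncertain. Degraded bitmap appearance is caused by a poor-quality original, scanning errors, or similar factors affecting the digitized representation of the image. The decision process for identifying a character therefore carries an inherent uncertainty. A particular problem is that characters in text tend to blur or merge. Most character identification processes start from the assumption that a character is an independent set of connected pixels. When the quality of the input image causes this assumption to fail, character identification fails as well.

[0005] The following patents specifically address approaches to improving character discrimination: U.S. Patent No. 4,926,490 to Mano; U.S. Patent No. 4,558,461 to Schlang; U.S. Patent No. 3,295,105 to Gray et al.; U.S. Patent No. 4,949,392 to Barski et al.; and U.S. Patent No. 5,142,589 to Lougheed et al.

[0006] OCR methods segment images in various ways; see, for example, U.S. Patent No. 4,558,461 to Schlang and U.S. Patent No. 4,809,344 to Peppers et al.

[0007] OCR methods have improved reliability by matching against dictionary words, as disclosed, for example, in U.S. Patent No. 4,010,445 to Hoshino.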
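"F6365 Japanese Document Reader," Fujitsu Scientific and Technical Journal, 26, 3, pp. 224-233 (October 1990), describes a character reader with the steps of block extraction, skew adjustment, block division, adjacent character segmentation, line extraction, and character recognition by pattern matching with dictionary checking and comparison.

[0008] It may be desirable to identify a set of characters forming a word or character string as a whole, reading it in the manner shown, for example, in U.S. Patent No. 2,905,927.

[0009] Using an entire word as the basic unit of recognition has been considered in signature recognition and is suggested in U.S. Patent No. 3,133,266 to Frishkopf. That approach, however, has no notion of retaining segmented characters.

[0010] [Problem to be Solved by the Invention] The present invention seeks to avoid the problems inherent in OCR techniques by exploiting fundamental characteristics of words and text strings. Because the spacing between words tends to be larger than the spacing between characters, the tokens that make up a character string can be isolated and identified more reliably than the individual characters within a token. OCR methods, by contrast, require several correct decisions about character form before a character can be correctly identified, including the identification of character portions such as ascenders, descenders, and curves, all of which are error-prone. The present invention instead enables more reliable recognition and identification of sets of connected components (hereinafter "tokens") such as words or strings of symbols or characters. In one embodiment, the invention uses word boundaries to first determine the characteristics of the text or symbols within an image. The tokens isolated within those boundaries are then compared with one another or with known tokens in a dictionary of token images. Tokens are thus not classified until the comparison stage, which eliminates the effect of invalid partial classifications that could otherwise cause erroneous comparisons and wrong decisions in subsequent processing.

[0011] In considering potential uses of computer-processed text, it was determined that, at least in certain cases, deriving each letter of a word is not a processing requirement. For example, in a keyword search of a text image, the computer need not convert each letter of each word via OCR and then decide from the possibly flawed character codes whether one or more keywords are present. Instead, it compares the shapes of the tokens in the text image with the shape of a token representing the keyword, and evaluates from that comparison whether the keyword is present. The output of such a system indicates the presence of the keyword with an accuracy acceptable to the user. Moreover, the novel method described here is believed to be faster than some methods designed for character recognition. The invention may also be applied to image editing systems and is therefore not limited to the embodiment described.

[0012] The probability that OCR determines a single character incorrectly may be relatively low, but under the product rule that probability accumulates multiplicatively over a whole word. Converting words to character code strings by OCR before searching for or recognizing those words may therefore introduce considerable error. The invention works with image data segmented at the token level, or at the word level in the text-recognition embodiment, and so enables recognition in a manner similar to that used by humans when reading or skimming a passage of text. The described token shape recognition process has several further advantages. First, the bitmap image data is not irretrievably lost, and a reasonable representation of the bitmap remains, so that the user can, if necessary, examine the reconstructed bitmap to determine a character, symbol, glyph, or word. Second, by using connected components (tokens), each symbolic element (i.e., character) carries the context of the whole token (i.e., word), which helps when the token is compared with other token shapes. For example, a malformed letter within a word token has little effect on the identification of the overall word shape; it only slightly reduces the probability of a match between the two compared tokens representing the words. In addition, whereas OCR methods are more likely to produce an incorrect result for words with many characters, the present invention is generally more capable of identifying such demanding words.

[0013] OCR methods convert a bitmap to a representative character code and thereby lose the informational content of the bitmap. In general, the process is not reversible: the original bitmap cannot be obtained from the character code. Identifying word tokens on the basis of shape, as described in accordance with one aspect of the invention, retains the bitmap information through the recognition process, so that the bitmap can be reconstructed.

[0014] [Means for Solving the Problem] According to the invention, there is provided a method of comparing at least two image sections, each representing a token and comprising a plurality of image signals, to identify similar tokens, comprising the steps of:
(a) storing image signals representing a first token in a first model memory;
(b) creating a dilated representation of the first token in a first image memory;
(c) storing image signals representing a second token in a second model memory;
(d) creating a dilated representation of the second token in a second image memory;
(e) comparing the image signals stored in the first model memory with the image signals stored in the second image memory to determine a first similarity distance;
(f) comparing the image signals stored in the second model memory with the image signals stored in the first image memory to determine a second similarity distance; and
(g) determining, in response to the first and second similarity distances, whether the first token and the second token are similar.
One aspect of the invention is a method of comparing at least two image sections, each image section representing a token and comprising a plurality of image signals, to identify similar tokens, the method comprising steps (a) through (g) as set out above.

[0015] [Embodiments] Reference is made below to the drawings, which are shown for the purpose of illustrating a preferred embodiment of the invention and not for limiting it. FIG. 1 shows an overview of a generalized image processing system covering many situations in which the invention may be used to advantage. In general, a source image is derived from a source image derivation system 2, such as a scanner, facsimile device, or storage system. The source image is passed to a computer processing device 4, which may be any of several known devices or an inventive device as described here. In response to commands entered at a user interface 6, processing device 4 produces output at an output device 8, which may be a printer, display, facsimile device, or other storage device. In essence, as shown in the upper part of FIG. 1, an input document is fed into the system and an output document is retrieved from it.

[0016] In the following, an image is described as an image bitmap, in which the image is represented by a plurality of rasterized image signals. These signals are commonly called pixels, and are typically shown in black where they represent a corresponding mark or active position on the document from which the document and marks were produced. These constructs are used to describe the invention but are not intended to limit its scope to black-and-white or binary images. Rather, the invention is generally applicable across a broad range of image representation techniques.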
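Furthermore, the invention is also directed to determining the similarity of tokens within an image or between images. In one embodiment the invention is suited to determining the similarity of word objects within word boundaries, but it may also be used for editing and compressing images and is thus not limited solely to the embodiment described below.

[0017] FIG. 2 shows a system, embodying the invention, for deriving, segmenting, and comparing words by their shapes. Each element of the system may be a number of devices, or may simply be a program within a single device. Likewise, although a preferred embodiment for recognizing word objects is described below, the comparison technique at the core of the invention does not require the rigorous preprocessing operations described in connection with that particular embodiment.

[0018] Processing begins with an input bitmap 10, whose source is not critical to, nor part of, the invention. The bitmap is first passed to a segmentation system 12, which determines the boundaries of tokens (words, character strings, or other units of interpretable meaning). First, the image bitmap passes through a deskewer 14, which determines the angle of orientation of the text in the image and corrects for it. Using the deskewed image produced by this operation, a word boxer 16 determines the boundaries of the word tokens and, along with them, the boundaries of the text lines within the image. In a word segmenter 18, the word token boundaries are applied to the image bitmap so that each word group in the image is isolated in reading order and thereafter treated as a single unit. Here, "word," "symbol string," or "character string" refers to a set of connected alphabetic or punctuation elements, or more broadly a set of tokens, that together form all or part of a unit of interpretable meaning. Such interpretable units are characterized in the image by being separated by spacing greater than that which separates the adjacent elements, signs, or symbols making up the unit itself. In this respect the invention has other applications; for example, in text or word editing systems, the shapes of isolated words may be used for subsequent manipulation of the image. The invention is therefore not limited to word recognition.

[0019] Next, a shape comparator 24 compares the shapes of the word tokens representing individual words in the image with known or previously identified word token shapes from a dictionary 26. Alternatively, shape comparator 24 may be used to compare the shapes of two or more word tokens determined from image 10. In the preferred embodiment, comparator 24 uses a variant of the Hausdorff distance to characterize the degree of similarity between the word token shapes being compared. More importantly, shape comparator 24 is not limited to comparing word token shapes from strings of unidentified characters with known word token shapes. In its simplest context, comparator 24 is merely a device for comparing one token shape with another; in this embodiment it produces a match indication output that gives a relative indication of the degree of similarity between the two token shapes.

[0020] To outline the method or apparatus for determining and comparing token shapes in the context of word images, each step of the shape comparison embodiment is described in more detail below. To further illustrate the process, FIG. 3 shows a sample image, taken from a public domain source, containing several lines of text. FIG. 3 shows roughly how the image would appear on a page of text, while FIGS. 4, 5, and 6 show portions of a scanned image of the page, with the bitmap enlarged to illustrate problems with known OCR techniques. In FIG. 3, for example, word image 50 in the second line of the text image is "formation" and word image 54 in the fourth line is "automobile," in which several characters appear to be merged.

[0021] Any of a number of known small-angle image rotation or skew correction methods may be used to obtain a deskewed representation of the image.

[0022] Once the image has been deskewed, tokens may be extracted in a number of ways, the choice depending on the primary application of the token comparison. The comparison technique illustrated here is applied to the component tokens represented within boundaries 58 to identify those that match one another or that match a key token. Once identified, the matching or known tokens within the larger document image are labeled or similarly identified for subsequent processing. Such subsequent processing includes, for example, identifying, accessing, and retrieving information in electronically represented documents, and also the compression technique disclosed in the publication of Peter B. Mark et al. (WO 93/12610), "Method and Apparatus for Compression of Images," published June 24, 1993. Although tokens representing portions of the document image are immediately usable, further image processing to produce word-based tokens is desirable for identifying words.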
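An embodiment that compares and recognizes tokens made up of words or related character strings is described below.

[0023] As shown in FIG. 2, word boxer 16 operates on the deskewed image according to the flowcharts of FIGS. 7 and 8. The following description of the process steps carried out by the word boxer is given in terms of operations executed on a programmable computer, but the invention is not limited to that embodiment. Beginning at step 80, the word boxer first reads the input image of FIG. 3, which has been deskewed by deskewer 14 if necessary. This function simply accesses an image stored in memory, for example on a hard disk or similar storage device, copies the image into a memory location allocated for it, and, if necessary, assigns a pointer to the image.

[0024] Once the image has been retrieved, step 82 finds the connected components in the image. This process simply locates a black pixel in the stored binary image. When a black pixel is found, an iterative process goes on to find neighboring black pixels, and their neighboring black pixels in turn, until the extent of the connected pixels is determined. More specifically, an eight-neighbor connection definition is used: if one pixel is adjacent to another in any of the eight compass directions, the two are considered neighbors and belong to the same connected component. The process is repeated until every black pixel in the image has been properly associated with other black pixels to form connected components. As shown in FIG. 4, once the connected pixels have been associated, a rectangular box or boundary 58 is identified that reflects the maximum extent of the connected pixels; the rectangular box is oriented along the x-y axes of the image.

[0025] Once bounding boxes have been defined around all groups of connected components in the image, as shown for a portion of the image in FIG. 4, the word boxer finds "bad" boxes (not shown) in the set of identified connected component boxes or boundaries. Bad boxes are characterized as: (a) tall boxes, having a height greater than about 20 percent of the total image height and greater than approximately the 90th percentile height of the boxes in the image; or (b) short boxes, having a height below approximately one-third of the 90th percentile height. After this analysis, the remaining boxes are projected onto the vertical or y-axis of the document (the y-axis being taken as the axis perpendicular to the direction of the deskewed text lines) to form a histogram that reflects the number of box boundaries as a function of position along the y-axis, as shown in FIG. 9 for the whole image of FIG. 3. In the preferred embodiment, Gaussian smoothing may be applied to the y-axis projection histogram data before the text line positions are determined. Next, from the histogram, preliminary line or row boundaries are identified as positions along the y-axis of the image; these boundaries lie at the valleys of the histogram. For example, as shown in FIG. 9, valleys or minima 120 are identified between adjacent peaks or maxima 122, and the valleys 120 give the positions of the interline spaces, shown by reference numeral 62 in FIG. 5. This operation is carried out in step 88. Finally, once the preliminary text lines or rows have been determined, a function is run to assign all connected component boxes to the defined rows.

[0026] Once the positions of the text lines or rows 62 have been preliminarily determined, a procedure is carried out to assign bounding boxes of connected components that straddle two rows to a particular row. As shown in steps 92, 94, and 96 of the flowchart, this procedure also checks that the preliminary text lines identified in the preceding step 88 are correct. First, a function is run to confirm that the separation of the text rows has not failed. In general, the projections of the connected components within a text row should not overlap much in the x-direction unless they also overlap heavily in the y-direction. If the projections do overlap, as identified in step 92, the identified row is likely to be two or more separate rows and must be split by finding further minima in the y-projection graph. Bounding boxes around small groups of connected components in the text image, such as the dot over an "i" or the underline of a word, must be ignored so that they do not falsely trigger further splitting of a text row.

[0027] Second, as in step 96, the remaining boxes that overlap one another along the x-axis are merged into a single box whose boundary encloses the merged components. In general, the merge process looks across the boxes within a row and identifies boxes that overlap in the x-direction and that also overlap by at least some minimum amount in the y-direction. This minimum y-direction overlap is preferably about 50 percent. For example, if the scanned image contains the word "fort," scanning may cause the right edge of the "f" box to overlap the left edge of the "o" box, so that merging the components of boxes overlapping along the x-axis merges the "f" and "o" boxes. A size test is also performed in this procedure, and boxes smaller than a predetermined size are not merged. These small boxes can subsequently be identified as noise in the image and deleted.

[0028] Third, once the text rows have been accurately located, the remaining boxes within a row are connected components or tokens, some of which need to be combined further to form words or similar units of interpretable meaning. To combine adjacent components into word-based tokens in the scanned image, step 98 goes on to histogram the separation distances between adjacent components within a text row. The resulting distribution for a typical text row is shown in FIG. 10, where the dashed curve shows the raw histogram data for the row and the solid curve a smoothed version. As expected, the resulting curve generally shows a bimodal distribution: the first set of peaks, 130 and 132, represents the distribution of inter-character spacing, whereas the second peak, broader and lower in frequency, reflects the separation between adjacent words. Under certain conditions a unimodal distribution may also appear. The two maxima of the bimodal distribution are used in step 100 first to identify a separation threshold, which is then used to distinguish inter-word separations from inter-character separations.

[0029] Using this separation threshold, the procedure of step 102 in FIG. 8 is then called to merge adjacent boxes within a text row whose x-direction separation is less than the threshold. This procedure simply merges all adjacent sets of connected components in each row that are separated by a distance shorter than the separation threshold. With the adjacent characters within words merged, the resulting box structure reflects the boundaries of the word tokens in each text row, as shown, for example, by the boxes 66 enclosing words in FIG. 6. At this point, an optional operation may be performed to recognize small unmerged boxes as noise in the image and remove them. A list of the boxes, arranged in reading order (top to bottom, and left to right within each text row), is then created in step 104.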
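The threshold-based merge of step 102 and the reading-order list of step 104 can be illustrated with a minimal Python sketch. The box layout `(x_min, y_min, x_max, y_max)`, the function names `merge_row` and `reading_order`, and the definition of the gap as the next box's left edge minus the previous box's right edge are assumptions made for this illustration; they are not taken from the patent.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates; layout assumed for this sketch.
Box = Tuple[int, int, int, int]


def merge_row(boxes: List[Box], gap_threshold: int) -> List[Box]:
    """Merge neighboring boxes of one text row whose x-gap is below the threshold."""
    merged: List[Box] = []
    for box in sorted(boxes):  # tuples sort by x_min first, i.e. left to right
        if merged and box[0] - merged[-1][2] < gap_threshold:
            last = merged[-1]
            # Grow the previous box so that it encloses both components.
            merged[-1] = (last[0], min(last[1], box[1]),
                          max(last[2], box[2]), max(last[3], box[3]))
        else:
            merged.append(box)
    return merged


def reading_order(rows: List[List[Box]], gap_threshold: int) -> List[Box]:
    """Return word boxes top to bottom, and left to right within each row."""
    non_empty = [row for row in rows if row]
    non_empty.sort(key=lambda row: min(box[1] for box in row))
    return [word for row in non_empty for word in merge_row(row, gap_threshold)]
```

With a threshold of 5 pixels, two character boxes 2 pixels apart end up in one word box, while a box 10 pixels away stays separate. How the threshold itself is derived from the bimodal gap histogram is left out of the sketch.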
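Each entry in the box list defines the bounding box 66 of one word token, picture, punctuation mark, or equivalent unit of interpretable meaning in the input image.

[0030] Returning to FIG. 2, once a list of tokens has been created, for example by word boxer 16 producing a box list that represents the boundaries of the word-based tokens in the image, the list and the bitmap image are passed to a token or word segmenter 18. In general, segmenter 18 is an image processing device capable of dividing the bitmap of input image 10 into a series of smaller bitmap images according to the word or token boundaries defined in the box list. The output of word segmenter 18 is a series of bitmap images, each containing the bitmap representation of a word token or equivalent unit of interpretable meaning identified by word boxer 16. In the preferred embodiment, word segmenter 18 does not actually generate a separate bitmap for each portion of the input image enclosed by a word box. Rather, the segmenter simply windows, or selects a portion of, the bitmap so as to give access to the portion of the image defined as lying within the boundary of a particular token box. As described earlier, the word tokens output by word segmenter 18 are passed to comparator 24, where each token is compared with other bitmap images from dictionary 26 to determine whether the token image output from segmenter 18 matches a word token supplied from the dictionary.

[0031] As described below, one preferred method of comparing word images uses a Hausdorff distance measurement technique related to that described by Huttenlocher et al. in "Comparing Images Using the Hausdorff Distance" (TR 91-1211), June 1991, and "A Multi-Resolution Technique for Comparing Images Using the Hausdorff Distance" (TR 92-1321), December 1992, both published by the Department of Computer Science, Cornell University.

[0032] In general, boxed tokens are compared using the process shown in FIGS. 11 and 12 for comparing the components identified within particular boxes. The simplified embodiment described below determines whether word-based tokens within an image are the same or different. The bitmap representation of each section or word token corresponds to the region defined by a predetermined bounding box. A common method of making such a comparison between bitmap sections is generally known as correlation, in which a logical AND of the two images is used to determine similarity. The present invention improves on correlation by using a dilation technique, which eliminates the effect of the quantization error inherent in the digitization process used to form the images.

[0033] As used below, the two token images being compared are referred to as box 1 and box 2. These image sections may be two sections from the same image, two sections from different images, or one section from an image and one section created electronically from an input string of symbols, a word, or a unit of interpretable meaning forming a token. Although shown in FIG. 2 as a "dictionary" of word images, the general purpose of block 26 is to supply a token image section (box 2) for comparison with another token image section (box 1). As shown in FIG. 6, the two word-based token representations of "automobile," 70 and 72, may be compared in accordance with the invention, where representation 72 may be derived from the "dictionary." Once sections 70 and 72 of box 1 and box 2 have been defined, the image in each is called a "model," and a dilated version of each model is produced, referred to below as an "image."

[0034] As outlined in FIGS. 11 and 12, the comparison method used in comparator 24 first compares the pixels in model 1 (150), i.e., the original pixels of the section enclosed by box 1, with the pixels in image 2 (156), the dilated representation of the pixels of box 2, and a first distance is generated from this comparison at block 160. The process is then reversed: the pixels in model 2 (152), the original pixels of the section enclosed by box 2, are compared with the pixels in image 1 (154), the dilated representation of the pixels of box 1, and a second distance is generated from this comparison at block 162. The two distances are then processed numerically in blocks 164, 166, and 168 to determine the degree of similarity of the two image sections enclosed by box 1 and box 2.

[0035] In more detail, comparator 24 first copies the pixels within the boundary of a word image (box 2) specified in the "dictionary" 26 of images into a memory location for the model. These pixels are referred to below as model 2. The comparator then copies model 2 into a second memory location and dilates it, as shown in FIG. 11, to produce image 2 (dilated image 156) in step 200. That is, for every "on" or black pixel stored in the model 2 memory, its surrounding neighbors in the image 2 memory are turned on or made black. The exact number of neighbors is defined by the dilation radius, which is predetermined. For example, a preferred dilation radius of 1.0 pixel turns on the four adjoining neighbors, and a radius of 1.4 pixels turns on all eight adjoining neighbors. A larger dilation radius increases the likelihood of erroneously matching words that are not the same.

[0036] The dilation radius is chosen to guard against quantization error, which arises mainly in the digitization process. In choosing the radius, it is desirable to suppress the errors that would be introduced by simple correlation (effectively a dilation radius of zero), while avoiding the confusion that results from comparing over-dilated images (for example, with a large dilation radius). A preferred dilation radius in the range of 1.0 to 1.4 pixels is therefore offered as an acceptable compromise between these limits.

[0037] The process is then repeated to produce the model and dilated image versions of the symbol string to be compared. For example, a copy of the entire input image 10 may be dilated as described above and, in step 202, the pixels within the dilated boundaries of every box defined in the box list are copied from this dilated input image. These sets of pixels represent the individual dilated "words" and are referred to below as image 1 (dilated portions of the input image, 154), whereas the original, undilated word segments of the input image are referred to below as model 1 (150). As with the box 2 image, the pixels representing the word in each image appear fatter and more filled in than in the corresponding model.

[0038] Once the models and dilated images for the input image and the "dictionary" images have been created and stored in memory, a pair consisting of an input image (box 1) and a dictionary image (box 2) is selected for comparison in step 204. Comparator 24 then tests, in step 206, whether the boxes are "reasonably" close in size, that is, whether their lengths and heights are within a predetermined range of one another. As shown in FIG. 11, the size difference ΔL for the two image sections is given by ΔL = |L1 − L2|. The size test of step 206 preferably also includes a height comparison (not shown), which can be carried out in the same way as the length comparison. A larger size difference may be tolerated so that the images can be shifted relative to one another within the boxes, which makes the comparison more robust. If the size difference ΔL is outside the predetermined range and step 208 determines that more boxes (image sections) are available, a different image pair (input and dictionary) is selected for comparison in step 204.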
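The dilation of a model into an image and the size pre-test of step 206 can be sketched as follows. The sketch treats a token as a set of "on" pixel coordinates and replaces the dilation radius by an explicit list of neighbor offsets: the four-neighbor list stands for the radius of 1.0 and the eight-neighbor list for the radius of 1.4 mentioned in the text. The names `dilate`, `sizes_compatible`, `OFFSETS_4`, `OFFSETS_8`, and the single `tolerance` parameter are hypothetical choices for this illustration.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]

# The pixel itself plus its horizontal and vertical neighbors (radius 1.0 in the text).
OFFSETS_4 = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
# The above plus the diagonal neighbors (radius 1.4 in the text).
OFFSETS_8 = OFFSETS_4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def dilate(model: Set[Pixel], offsets: Iterable[Pixel] = OFFSETS_8) -> Set[Pixel]:
    """Turn on every pixel reachable from an 'on' model pixel by one of the offsets."""
    offsets = list(offsets)
    return {(x + dx, y + dy) for (x, y) in model for (dx, dy) in offsets}


def sizes_compatible(length1: int, length2: int,
                     height1: int, height2: int, tolerance: int) -> bool:
    """Size pre-test: both the length and the height difference stay within tolerance."""
    return abs(length1 - length2) <= tolerance and abs(height1 - height2) <= tolerance
```

A single "on" pixel grows to 5 pixels under `OFFSETS_4` and to 9 pixels under `OFFSETS_8`. Using one tolerance for both length and height is a simplification; separate limits would work the same way.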
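When the size difference is within the predetermined range, the bounding boxes of the selected input and dictionary image pair are assumed to be about the same size, and the pair of word boxes is compared further to see whether they match. A binary image can be regarded as representing a finite set of points A, where the coordinates of each point of A are given by an "on" pixel in the binary image. The Hausdorff distance, a measure for comparing point sets, can therefore be applied to the comparison of binary images. Specifically, given two finite point sets A and B, the Hausdorff distance is defined as follows.

[0039]
\[
H(A,B) = \max\bigl(h(A,B),\; h(B,A)\bigr), \qquad h(A,B) = \max_{a \in A} \, \min_{b \in B} \, |a - b|
\]
where |a − b| is the distance between two given points a and b.

[0040] In effect, the function h(A,B) ranks each point of A by its distance to the nearest point of B and takes the distance of the highest-ranked point (the worst-matching point) as its value. Thus h(A,B) ≤ δ means that every point of A is within distance δ of some point of B. The function H(A,B) is the maximum of the two asymmetric distances, so H(A,B) ≤ δ means that every point of A is within δ of some point of B and vice versa. The Hausdorff distance thus provides a measure of the similarity of two binary images (or finite point sets): a large value of δ indicates low similarity between the images.

[0041] When comparing bitmap images of tokens, small values of δ are desirable: they compensate for the quantization noise of the digitization process (pixels randomly on or off at token boundaries) while still requiring the images to be relatively similar. A preferred way of computing the Hausdorff distance for small δ uses dilation together with a logical AND. In the dilation of a binary image A by a radius δ, each "on" or black pixel of A is replaced by a circle of radius δ. δ = 1.0 is used to represent the four nearest (horizontal and vertical) neighbors of a pixel, and δ = 1.4 the eight nearest (horizontal, vertical, and diagonal) neighbors. These are the preferred values for offsetting quantization noise.

[0042] Let B' be the dilation of B by δ. Then h(A,B) ≤ δ exactly when A ∧ B' = A, where ∧ denotes the logical AND of A and B'. That is, every black point of A must be within distance δ of some black point of B, in which case every black point of A must coincide with some black point of B'. Whether h(A,B) ≤ δ, and analogously whether H(A,B) ≤ δ, can therefore be determined simply by dilating B (respectively A) by δ and computing the logical AND with A (respectively B).

[0043] In general, some points of A may not be near any point of B, and vice versa. It is therefore common to generalize the Hausdorff distance by replacing the maximum with the computation of a quantile (for example, the median or some other percentile). This definition is as follows.

[0044]
\[
h'(A,B) = \underset{a \in A}{K^{\mathrm{th}}} \; \min_{b \in B} \, |a - b|
\]
where K^th denotes the K-th ranked value, with the distances ranked in increasing order. This computes the K-th ranked value, rather than the maximum (largest value), of the distances from each point of A to its nearest point of B. Some number or fraction of the points of A is thus ignored in computing the distance.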
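The ranked distance just defined can be computed directly for small point sets, as in the following brute-force Python sketch. It is meant only to make the definition concrete and is not the dilation-based evaluation the patent prefers. The function names, the parameter `tau` (the fraction of points of A that must be accounted for), and the choice to round K = tau × m up to the next integer are assumptions of this illustration; both point sets are assumed to be non-empty.

```python
import math
from typing import Collection, List, Tuple

Point = Tuple[float, float]


def nearest_distances(a: Collection[Point], b: Collection[Point]) -> List[float]:
    """Distance from each point of a to its nearest point of b, in increasing order."""
    return sorted(min(math.dist(p, q) for q in b) for p in a)


def ranked_directed(a: Collection[Point], b: Collection[Point], tau: float = 1.0) -> float:
    """K-th ranked nearest-point distance with K = ceil(tau * m); tau = 1 gives the maximum."""
    dists = nearest_distances(a, b)
    k = max(1, math.ceil(tau * len(dists)))
    return dists[k - 1]


def ranked_hausdorff(a: Collection[Point], b: Collection[Point], tau: float = 1.0) -> float:
    """Larger of the two directed ranked distances."""
    return max(ranked_directed(a, b, tau), ranked_directed(b, a, tau))
```

For A = {(0,0), (1,0), (2,0), (10,0)} and B = {(0,0), (1,0), (2,0)}, the outlier (10,0) dominates the plain directed distance, while a `tau` of 0.75 ignores it.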
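If set A contains m points, then with k = m the ranked definition is identical to h(A,B). In general, however, with k = τ × m for some value τ in the range 0 ≤ τ ≤ 1, m − k = (1 − τ) × m of the points of A are ignored (that is, they need not be near any point of B). The same applies to H(A,B). In the preferred embodiment, τ is set so that up to 4 percent of A may fail to match B' (τ ≥ 0.96), and likewise in the reverse direction.

[0045] For small values of δ this computation is again preferably carried out by dilation and a logical AND, except that some points of A may now fail to overlap B' (and vice versa). The fraction of such non-overlapping points must not exceed 1 − τ. h'(A,B) is therefore evaluated by comparing the number of black pixels in A with the number in A ∧ B'. If p is the number of black pixels in A and q the number of black pixels in A ∧ B', then for a given τ, h'(A,B) ≤ δ exactly when q/p ≥ τ.

[0046] The invention can also evaluate the Hausdorff distance as A and B are shifted relative to one another, in order to find the minimum Hausdorff distance (the best alignment). This is the same relative shifting as is used in the well-known correlation operation, but it differs markedly from correlation in the important respect that it tolerates quantization noise well. Correlation takes no account of the proximity between points of A and points of B (using δ = 0, for example, yields a restricted case that approaches correlation).

[0047] Using the following generalized procedure to carry out the technique described above, two image sections can be compared and a determination made as to whether they match.

[0048]
1) Model 1 is superimposed on image 2.
2) The number of black model 1 pixels that coincide with black image 2 pixels is counted and divided by the total number of black model 1 pixels (step 214).
3) If the percentage of matching black pixels is above a predetermined threshold percentage τ (preferably about 0.96), the boxes are determined to match in the first instance (step 216).
4) Model 2 is superimposed on image 1.
5) The two image sections are compared again as in step 2 above to determine a second percentage of matching black pixels (step 220).
6) If the second percentage is above the predetermined threshold percentage τ, the boxes are determined to match in the second instance (step 222).
7) If the image sections match in both instances, they are considered to be the same word, and a word match indication is output from comparator 24 of FIG. 2 (step 224).

[0049] In addition to the embodiment described above, the invention may be used as a comparison technique to form equivalence classes of word tokens within an image. It may be used as a preprocessing operation for an OCR system, improving the speed and accuracy of available OCR systems. As a further alternative, the invention may be used to determine that a word token occurs repeatedly; subsequent occurrences of the word can then be replaced by an icon, reducing the overall size of the data file needed to hold a large document.

[0050] In the preferred embodiment, a program is run that compares box sizes and builds a library for each existing class of boxes or image sections that are likely to match (for example, boxes in both the input image and the "dictionary" image that are believed to be the same token). For example, a data structure suited to classifying token image sections by their length (width) is built, which speeds up the comparison of token pairs. Although the description has been of comparing portions of an input image with known or "dictionary" tokens, for example image sections 70 and 72 of FIG. 6, the invention can equally compare tokens within the same or different images, and is not to be understood as limited to the examples given to illustrate its operation.

[0051] [Effects of the Invention] The invention is a method of comparing two image sections or tokens made up of image signals or pixels, each token being represented by one or more connected symbols, to identify them as the same token. The invention operates without individually detecting or identifying the symbols or characters making up the tokens. The method detects the components within an image, first determines the token boundaries, and then applies a two-stage process in which dilated images are compared with the models representing the tokens to determine the relative similarity between them.

[0052] It is therefore apparent that the invention provides a method of comparing tokens that define image regions containing components or similar symbols. The invention has been described with reference to a preferred embodiment as software designed for use on a computer system, employing one or more microprocessors or other processing devices capable of executing predetermined instructions to carry out the image data processing operations described above. The invention may also be implemented using specific hardware designed to perform the processing described here. Furthermore, the invention has been described as part of a larger word recognition system. As noted earlier, however, it can also be used in text or image editing or related systems. Indeed, it can be used in any system that needs to identify, classify, or group tokens or symbol strings. Finally, the invention has been described in terms of text images. It can equally be applied to images that include non-text portions.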
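The seven-step comparison procedure of paragraph [0048] can be summarized in a short Python sketch. It assumes each model and each dilated image is given as a non-empty set of "on" pixel coordinates in a common, already aligned coordinate frame; the relative shifting of paragraph [0046] is not covered. The function names and the use of "greater than or equal to τ" as the threshold test, following the relation q/p ≥ τ of paragraph [0045], are assumptions of this illustration.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def match_fraction(model: Set[Pixel], image: Set[Pixel]) -> float:
    """Fraction of the model's black pixels that coincide with black pixels of the image."""
    return len(model & image) / len(model)


def tokens_match(model1: Set[Pixel], image1: Set[Pixel],
                 model2: Set[Pixel], image2: Set[Pixel],
                 tau: float = 0.96) -> bool:
    """Two-way test: each model must be covered, up to the fraction tau, by the
    dilated image of the other token."""
    first = match_fraction(model1, image2) >= tau   # steps 1 to 3
    second = match_fraction(model2, image1) >= tau  # steps 4 to 6
    return first and second                          # step 7
```

Because the test runs in both directions, a token that merely contains another token as a part does not match it: the larger model has pixels that the smaller token's dilated image cannot cover.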
Words (words), multiple concatenated structures represented in an array
Components or similar units whose meaning can be interpreted.
For comparing image tokens that form
It is a law, for which characters (characters), symbols
Characters (symbols), glyphs (such as emoji), or tokens
Without individually detecting or identifying the elements that form
It's about a good way. [0002] Electronically encoded documents (electronic documents)
Text in two different formats
Often one of them. In the first format
Means that the text is a bitmap,
Text can be image data or an array of pixels
Basically defined with adjacent images of similar expressions defined
Indistinguishable. In this format, the text is a sentence
Computers that are based solely on the content of
Rarely split into image units for processing
Must. The second format is as follows
This is called the character code format.
Kist is character code (for example, ASCII code)
As a string. This character
-In code format, an image of text
Alternatively, no bitmap is required. [0003] By the optical character recognition process (OCR)
Conversion from bitmap to character code takes time
It is very expensive considering the time and labor involved in processing. Individual
Whether the character's bitmap is adjacent to it
And analyze their appearance, and furthermore,
In a group of characters preset by
Must be identified as a character. Miyatake
U.S. Pat. No. 4,956,869 to U.S. Pat.
Trace lines (equal concentration lines, contour lines)
A more efficient way to do this has been suggested. However, an original (original)
When you scan and extract an electronic document,
Depending on the image quality and noise, the actual bitmap
The appearance is uncertain. Bitmap appearance degraded
What we do is scan original documents with poor quality
Errors or affect the digital reproduction of the image
Due to similar factors. Therefore, the character
The decision-making process to identify
There is a certain uncertainty. This is of particular concern.
Means that the characters in the text are blurred,
Or it is easy to combine. Most character knowledge
Another process is one pixel connected by a certain character
Start by assuming that they are independent sets of
You. This assumption is made due to the quality of the input image.
Otherwise, character identification will fail.
You. The following patents improve character discrimination:
In particular, it demonstrates the techniques for different approaches. To Manon
U.S. Pat. No. 4,926,490 issued to Cherange
U.S. Patent No. 4,558,461 issued to Gray et al.
U.S. Pat. No. 3,295,105 to Birski
U.S. Pat. No. 4,949,392 to Long et al.
U.S. Patent No. 5,142,589 to Hood et al. The OCR method splits an image in various ways
I am trying to do it. For example, given to Chelange
US Patent No. 4,558,461 and Peppers et al.
There is U.S. Pat. No. 4,809,344 granted. [0007] In the OCR method, matching with words in a dictionary is performed.
Therefore, the reliability is improved. For example, given to Hishino
No. 4,010,445.
Fujitsu Science and Technology Journal 26,3, pages 224-2
33 (October 1990)
Is a block extraction, skew adjustment,
Split, adjacent character classification, line extraction, and
Perform pattern matching by checking and comparing with a dictionary
Show each step of character recognition by
I have. Form a string of words or characters
To identify multiple character sets,
Reading as in US Patent No. 2,905,927
Would be desirable. [0009] As a basic unit for recognition,
Using the entire code is not considered when recognizing the signature.
U.S. Pat. No. 3,1,1 issued to Frishkop
33,266. However, the split key
There is no idea to hold the character. [0010] The present invention is based on the OCR technology.
To prevent problems with
Word) and the fundamental properties of text strings
We are utilizing it. The space between words consists of letters and
Tend to be larger than the space between characters,
Separation and tokenization of tokens that make up character strings
Identify individual characters within the token
Can be improved as compared to OCR method
However, the character must be
Some correct decisions are required on the form of
Include keys such as ascenders, descenders, curves, etc.
It also includes the identification of the characters (characters),
It is easy to make mistakes. The invention, on the other hand, uses the word
Or a string of symbols or characters
The security of connected components (hereinafter referred to as tokens)
Is to be more reliably recognized and identifiable.
You. In one embodiment, the present invention provides for
Word boundaries to determine the characteristics of the list or symbol first.
Utilizes the world. Then, the toe separated within that boundary
In Kuhn's mutual or token image dictionary
A comparison with a known token is performed. So compare
Classification of tokens is not performed until the stage,
Mistakes in subsequent processing or incorrect decisions
The effects of invalid partial classification causing causation
Can be excluded. The potential of computerized text
And at least in some cases,
Processing requirements that can lead to each letter of the word
It was decided not to be imposed. So, for example
If you do a keyword search for a text image
At the time, each of the words
Instead of converting characters, one or more
Whether there is a keyword above may be defective
When continuously determining from character codes,
Instead of creating something, the pewter
Displays the shape of multiple tokens in the image and their keywords.
The token shape is compared with the token shape.
Evaluate whether the keyword exists from the ratio. This
Output to a user-friendly system
Can show some indication of the presence of a keyword with an accuracy of
Things. In addition, the new method described here
Than some methods designed to recognize lactors
It is considered that the processing speed is fast. Still further, the present invention
Can also be applied to image editing systems, and
Is not limited to the described embodiment. Characters cannot be correctly determined by the OCR method
Is likely to be relatively low, but the product
Applying the rule doubles the probability for every word
Is accumulated. Therefore, multiple words can be written using OCR.
When converted to a character code string, these
Before searching or recognizing the word
An error will occur. The present invention provides a token level
Or word level in embodiments that recognize text or text.
Pass through using divided image data
Similar to what humans use when reading and extracting text
Method enables continuous recognition. Further explanation
The process of recognizing the shape of a token
Has an effect. First, bitmap image data
Data is not lost in an irreversible condition,
Also, the reasonable display of the bitmap remains, so the user
-Characters, symbols, glyphs,
Or bitmap reproduced to determine word
You can check it. Second, use connected elements (tokens)
Each symbolic element by being
(Ie, the character (character))
Word) with its entire context and
Helps compare with token shape. For example,
Even if there are characters in the word token
Has little effect on identifying the whole word shape
And between two compared tokens representing those words
It only slightly reduces the probability of matching. Furthermore, OC
Compared to the ability of the R method, the OCR method has more characters.
It is easy to get wrong results for words you have
In contrast, the present invention generally identifies more arduous words.
Have the ability to The OCR method uses a key represented by a bitmap.
Character code, and then convert it to a bitmap
You may lose the content containing the information. Generally, this professional
Seth uses the original bitmap from the character code.
It is not a reversible thing to get a tip. However
When the word token is identified based on the shape,
As described in accordance with one of the
Can have bitmap information up to
This allows the bitmap to be reconstructed. According to the present invention, each of
Represents Tokun and has multiple image signals
Comparing at least two image sections
A method for determining similar tokens, comprising the following steps:
Can be provided. (A) An image representing the first token
Storing the signal in a first model memory;
The first token is expanded and represented in the image memory
And (c) an image sig representing the second token
(D) storing the null in the second model memory;
Inflated representation of the second token in the image memory
And (e) storing the image stored in the first model memory.
The image signal stored in the second image memory
Image signals to determine the first similarity distance.
(F) the image stored in the second model memory
The image signal stored in the first image memory.
Compare image signals to determine second similarity distance
And (g) the distance between the first and second similarities
In response, determine whether the first and second tokens are similar
judge. One aspect of the invention is that each image section is
Represents a coon, a few with multiple image signals
Compare at least two image sections to see similar
A method for identifying a token, comprising: (a) a first token;
The image signal representing the image in the first model memory
(B) storing the first talk in the first image memory;
(C) The second talk
The image signal representing the
(D) storing the second toe in a second image memory;
Creating an expanded representation of the kun, and (e) the first
The image signal stored in the model memory,
The image sig stored in the second image memory;
Comparing with null to determine a first similar distance;
The image signal stored in a second model memory;
The image stored in the first image memory.
A second similar distance is determined by comparing the
(G) corresponding to the first and second similar distances
Whether the first token is similar to the second token
The ratio of at least two image sections
It is a comparison method. BRIEF DESCRIPTION OF THE DRAWINGS FIG.
The drawings are intended to illustrate a preferred embodiment of the present invention and not to limit it. FIG. 1 outlines a generalized image processing system that covers many of the situations in which the invention can be used effectively. Generally, a source image is obtained from a source image extraction system 2, such as a facsimile machine or a recording system. The source image is sent to a computer processing device 4, which may be any of several known devices, including a device according to the invention as described herein. In response to commands entered at a user interface 6, the processing device 4 produces output at an output device 8, which may be a printer, a display, a facsimile machine or another recording device. In essence, as shown in the upper part of FIG. 1, an input document is fed into the system and an output document is retrieved from it.
In the following, an image is described as an image bitmap, in which the image is represented by a plurality of rasterized image signals (resolved into scan lines). These signals are usually called pixels and are commonly shown in black where they correspond to a mark or active position on the document from which they were produced. These constructs are used to describe the present invention, but the invention is not limited to black-and-white or binary images. Rather, it is generally applicable over a wide range of image representation techniques. The invention is also directed at determining the similarity of multiple tokens within an image or between images. In one embodiment it is useful for determining the similarity of word objects; it can also be used for editing and compressing images, and it is in no way limited to the embodiments described.
FIG. 2 shows a system, embodying the present invention, for determining, segmenting and comparing word shapes. Each element of the system may be a number of devices, or may simply be a program running within a single device. Likewise, although the following describes a preferred embodiment for recognizing word objects, the comparison technique that forms the basis of the invention does not require the exact preprocessing operations described in the embodiment.
Processing begins with an input bitmap 10, whose source is not determinative and forms no part of the invention. The bitmap is first sent to a segmentation system 12, in which multiple tokens (words, character strings or other units whose meaning can be interpreted) are determined. First, the image bitmap passes through a deskewer 14, which determines the angle at which the text is oriented in the image and corrects that orientation. Using the deskewed image produced by this operation, a word boxer 16 determines the boundaries of the word tokens; along with the token boundaries, the boundaries of the text lines within the image are also identified. In a word segmenter 18, the word token boundaries are applied to the bitmap so that each word group in the image is isolated in reading order and subsequently handled as a single unit.
Here "word", "symbol string" or "character string" means a set of connected alphanumeric or punctuation elements, or more broadly a set of tokens, that together form all or part of a unit whose meaning can be interpreted. Such interpretable units are characterized in the image by being separated from one another by a gap larger than the spacing that separates the adjacent elements, signs or symbols making up the unit itself. At this point the invention can be applied in different ways; for example, in text and word editing systems, isolated words can be used for subsequent manipulation of the image. Thus, although the invention is described in relation to word recognition, it is not limited to this use.
Next, a shape comparator 24 compares the shape of the word token representing each word in the image with the shapes of known or previously identified word tokens from a dictionary 26. Alternatively, the shape comparator 24 may be used to compare the shapes of two or more word tokens determined from image 10. In a preferred embodiment, the comparator 24 uses the Hausdorff distance as the variable that characterizes the similarity between the shapes of the compared tokens. More importantly, the shape comparator 24 is not limited to comparing word token shapes from unrecognized character strings with known word token shapes. In a simplified context, the comparator 24 is simply a device that compares the shape of one token with the shape of another, and the result is expressed by a match indication output that gives a relative indication of the similarity between the two token shapes.
Having outlined the method and apparatus by which token shapes are determined and compared in the context of word images, each step of an embodiment for comparing the shapes is now explained in more detail. To further explain the process of the present invention, FIG. 3 shows a sample image, taken from a work in the public domain, that contains several lines of text. FIG. 3 shows approximately how the image appears on a page of text, while FIGS. 4, 5 and 6 show portions of a scanned image of the page in which the bitmap image is enlarged to illustrate the problems with known OCR techniques. Referring to FIG. 3, for example, in word image 50, "formation", on the second line of the text image, and in word image 54, "automobile", on the fourth line, several characters appear to be joined together.
In addition, any of the many known methods of small-angle image rotation or skew correction can be used to obtain a deskewed representation of the image. Once the image has been deskewed, tokens can be extracted in many ways, and the choice depends on the intended application of the token comparison. The comparison technique illustrated in the present invention is applied to multi-component tokens represented within boundaries 58 to identify that they match one another or that they match a known token. Once identified, the matched or known tokens in the larger text image are labeled or similarly identified for subsequent processing. Such subsequent processing includes, for example, identifying, accessing and extracting information in protected documents, and also the compression technique disclosed in "Method and Apparatus for Compressing Images" by Peter B. Mark et al., published on June 24, 1993 (WO-93/12610). Tokens representing portions of a document can be generated and used in several ways, but with more advanced image processing it is desirable to generate word-based tokens for identifying words. An example of comparing and recognizing tokens formed from words and strings of related characters is described below.
As shown in FIG. 2, the word boxer 16 operates on the deskewed image according to the flowcharts shown in FIGS. 7 and 8. The process steps performed by the word boxer are described below in terms of operations performed on a computer, but the invention is not limited to this embodiment. Beginning at step 80, the word boxer first reads the input image of FIG. 3, which has been deskewed by the deskewer 14 if necessary. This function simply accesses an image stored in memory, for example on a hard disk or similar storage device, copies the image to a memory location allocated for it and, if necessary, assigns a pointer to the image.
Once the image has been retrieved, step 82 finds the connected components in the image. This process simply looks for black pixels inside the stored binary image. When a black pixel is found, an iterative process continues to find the black pixels adjacent to it, and the pixels adjacent to those, until the extent of the connected pixels has been determined. More specifically, an eight-neighbor connection definition is used: a pixel is considered to belong to the same connected component as another pixel if it is adjacent to it in any one of the eight compass directions. The process is repeated until every black pixel in the image has been properly associated with other black pixels to form connected components. As shown in FIG. 4, once the connected pixels have been associated, a rectangular box or boundary 58 is identified that reflects the maximum extent of the connected pixels, and the rectangular box is oriented along the x-y axes of the image.
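The connected-component search of step 82 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the image is assumed to be a list of rows with 1 for a black pixel, and the function name `connected_component_boxes` and the box layout `(x0, y0, x1, y1)` are hypothetical.

```python
from collections import deque


def connected_component_boxes(image):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected black regions.

    `image` is a list of rows; 1 is a black pixel, 0 is white (assumed layout).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over the eight compass directions.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Two diagonally touching pixels end up in one box because diagonal adjacency counts under the eight-neighbor definition.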
Once bounding boxes have been defined around all the groups of connected components in the image, a portion of which is shown in FIG. 4, the word boxer finds the bad boxes (not shown) within the set of connected-component boxes or boundaries. Bad boxes are characterized as follows: (a) tall boxes, whose height is greater than about 20 percent of the height of the whole image and greater than the 90th percentile of box heights on the page; or (b) short boxes, whose height is less than about one third of the 90th percentile height.
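The bad-box criteria can be sketched as a simple filter. This is a hedged illustration: the nearest-rank percentile, the function names and the parameter names (`tall_image_fraction`, `short_fraction`) are assumptions made for the example, and boxes are assumed to be `(x0, y0, x1, y1)` tuples with inclusive coordinates.

```python
import math


def percentile(values, fraction):
    """Nearest-rank percentile of a non-empty list (fraction in 0..1)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def remove_bad_boxes(boxes, image_height,
                     tall_image_fraction=0.20, short_fraction=1 / 3):
    """Drop boxes that are unusually tall or unusually short."""
    heights = [y1 - y0 + 1 for (_, y0, _, y1) in boxes]
    if not heights:
        return []
    p90 = percentile(heights, 0.90)
    kept = []
    for box, h in zip(boxes, heights):
        # (a) tall: above ~20% of the image height and above the 90th percentile
        too_tall = h > tall_image_fraction * image_height and h > p90
        # (b) short: below ~1/3 of the 90th percentile height
        too_short = h < short_fraction * p90
        if not (too_tall or too_short):
            kept.append(box)
    return kept
```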
After this analysis, the remaining boxes are projected onto the vertical or y-axis of the document (the y-axis being taken as the axis perpendicular to the direction of the text lines in the deskewed image) to form a histogram that reflects the number of box boundaries as a function of position along the y-axis, as shown in FIG. 9 for the whole image of FIG. 3. In a preferred embodiment, a smoothing operation may be applied to the histogram data projected onto the y-axis before the positions of the text lines are determined. Next, from the resulting histogram, preliminary line or row boundaries are identified at positions along the y-axis of the image; these positions are the valleys in the histogram. For example, as shown in FIG. 9, a plurality of valleys or minima 120 are identified between adjacent peaks or maxima 122, and the positions of the spaces between lines can be determined from the valleys 120; these are shown as reference numeral 62 in FIG. 5. This operation is performed by step 88. Finally, once the preliminary text lines or rows have been determined, a function operates to assign all connected-component boxes to a specific row.
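The projection and valley search can be sketched as below. This is an illustrative sketch only: the moving-average `smooth` is a stand-in for whatever smoothing the embodiment uses, and the function names and the plateau-centre rule in `valleys` are assumptions.

```python
def y_projection(boxes, image_height):
    """Count, for each y position, how many boxes (x0, y0, x1, y1) cover it."""
    hist = [0] * image_height
    for (_, y0, _, y1) in boxes:
        for y in range(y0, y1 + 1):
            hist[y] += 1
    return hist


def smooth(hist, radius=1):
    """Moving average (assumed smoothing; the source does not fix a kernel here)."""
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - radius), min(len(hist), i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def valleys(hist):
    """Return the centre of every plateau that is lower than both neighbours."""
    result = []
    n = len(hist)
    i = 1
    while i < n - 1:
        j = i
        while j + 1 < n and hist[j + 1] == hist[i]:
            j += 1
        if hist[i - 1] > hist[i] and j + 1 < n and hist[j + 1] > hist[i]:
            result.append((i + j) // 2)
        i = j + 1
    return result
```

Each returned y position is a candidate boundary between two text lines; positions at the very top or bottom of the page are not reported because they are not enclosed by peaks on both sides.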
Once the positions of the text lines or rows 62 have been preliminarily determined, the bounding boxes of the connected components are first assigned to particular rows. In flowchart steps 92, 94 and 96, the preliminary text lines are then checked to confirm that they are correct.
First, as explained below, a function confirms that the separation of the text lines has not failed. In general, looking at the connected components, the projections of the boxes within a text row do not overlap in the x direction unless they also overlap considerably in the y-axis direction. If overlapping projections are identified in step 92, the row may actually be two or more separate rows, which must be separated by finding the minimum of the y-projection graph. In addition, bounding boxes around small groups of connected components in the text image, such as the dot above an "i" or the underline of a word, must be ignored so that they do not mistakenly trigger a further separation of text lines.
Second, as in step 96, the remaining boxes that overlap one another along the x-axis direction are merged into a single box whose boundary surrounds the merged components. In general, this merging process looks over the boxes in a row and identifies boxes that overlap in the x direction and also overlap by at least some minimum amount in the y direction. The minimum overlap in the y direction is preferably about 50%. For example, if the scanned image contains the word "fort", scanning may cause the right edge of the "f" box to overlap the left edge of the "o" box, so that merging the box components that overlap along the x-axis merges the "f" and "o" boxes. A size test is also performed in this procedure, and boxes smaller than a specified size are not merged. These small boxes can then be identified and removed as noise in the image.
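The merging rule of step 96 can be sketched as follows. This is a minimal sketch under assumptions: boxes are `(x0, y0, x1, y1)` tuples from one row, the y overlap is measured relative to the shorter box (the text only says "about 50%"), and the size test for small boxes is omitted. All names are hypothetical.

```python
def y_overlap_fraction(a, b):
    """Vertical overlap as a fraction of the shorter box's height (assumed measure)."""
    overlap = min(a[3], b[3]) - max(a[1], b[1]) + 1
    shorter = min(a[3] - a[1], b[3] - b[1]) + 1
    return max(0, overlap) / shorter


def merge_overlapping(boxes, min_y_overlap=0.5):
    """Merge boxes in one row that overlap in x and sufficiently in y."""
    merged = []
    for box in sorted(boxes):          # left to right by x0
        if merged:
            last = merged[-1]
            x_overlaps = box[0] <= last[2]
            if x_overlaps and y_overlap_fraction(last, box) >= min_y_overlap:
                merged[-1] = (min(last[0], box[0]), min(last[1], box[1]),
                              max(last[2], box[2]), max(last[3], box[3]))
                continue
        merged.append(box)
    return merged
```

In the "fort" example, the "f" and "o" boxes share an x position and most of their height, so they collapse into one box while the "r" box stays separate.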
Third, once the text lines have been correctly detected, the remaining boxes in each row are connected components or tokens, some of which need to be joined further to form words or similar elements whose meaning can be interpreted. To join adjacent components further into word-based tokens in the scanned image, step 98 goes on to build a histogram of the separation distances between adjacent components within a text line. FIG. 10 shows the distribution for a typical text line, where the dashed curve is the raw histogram data for the row and the solid curve is a smoothed version. As expected, the resulting curve generally shows a bimodal distribution: the first set of peaks 130 and 132 represents the distribution of the separations between characters, whereas the second peak is broader and lower in frequency and reflects the separation between adjacent words. Under certain conditions a unimodal distribution may also occur. The two maxima of the bimodal distribution are first used to identify a separation threshold in step 100, which is subsequently used to distinguish separations between words from separations between characters.
Next, using this separation threshold, the procedure of step 102 of FIG. 8 is invoked to merge those adjacent boxes in a text line whose separation in the x direction is smaller than the threshold. This step simply joins, within each row, every set of adjacent components separated by a distance shorter than the separation threshold. Once the adjacent characters within words have been merged, the resulting box structure reflects the boundaries of the word tokens within each text line; for example, FIG. 6 shows a plurality of boxes 66 surrounding a plurality of words. At this point, an optional operation may be performed to recognize small unmerged boxes as noise in the image and remove them. Next, a list of the boxes arranged in reading order (from top to bottom, and from left to right within each text line) is created at step 104. Each entry in the box list defines the bounding box 66 of a single word token, picture, punctuation mark or equivalent interpretable unit in the image.
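Steps 98 to 102 can be sketched for a single text line as below. This is a hedged sketch: instead of locating the two histogram maxima, `separation_threshold` uses the midpoint of the widest jump between distinct gap sizes as a simple stand-in, and all function names are hypothetical.

```python
def gaps(row_boxes):
    """Horizontal gaps between neighbouring boxes (x0, y0, x1, y1) in one row."""
    ordered = sorted(row_boxes)
    return [max(0, b[0] - a[2] - 1) for a, b in zip(ordered, ordered[1:])]


def separation_threshold(gap_values):
    """Midpoint of the widest jump between distinct gap sizes.

    Stand-in for reading the threshold off a bimodal histogram; returns None
    when only one gap size exists (the unimodal case).
    """
    ordered = sorted(set(gap_values))
    if len(ordered) < 2:
        return None
    jumps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    return max(jumps)[1]


def merge_into_words(row_boxes, threshold):
    """Join neighbouring boxes whose gap is smaller than the threshold."""
    ordered = sorted(row_boxes)
    if not ordered or threshold is None:
        return ordered
    words = [ordered[0]]
    for box in ordered[1:]:
        last = words[-1]
        if box[0] - last[2] - 1 < threshold:
            words[-1] = (last[0], min(last[1], box[1]),
                         max(last[2], box[2]), max(last[3], box[3]))
        else:
            words.append(box)
    return words
```

Sorting the resulting word boxes first by row and then by x0 yields the reading-order box list of step 104.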
Returning to FIG. 2, once the word boxer 16 has created the box list representing, for example, the boundaries of the word-based tokens in the image, this list and the bitmap image are sent to the token or word segmenter 18. In general, the segmenter 18 is an image processing device that can divide the bitmap of the input image 10 into a series of smaller bitmap images according to the word or token boundaries specified in the box list. The output of the word segmenter 18 is a series of bitmap images, each containing the bitmap of a word token or equivalent unit identified by the word boxer 16. In a preferred embodiment, the word segmenter 18 does not actually generate a separate bitmap for each portion of the input image enclosed by a word box. Rather, the segmenter simply operates by windowing, that is, by selecting a portion of the bitmap, so as to access the part of the image specified as lying within the boundaries of a token box. As described above, the output of the word segmenter 18 is sent to the comparator 24, where the token is compared with another bitmap image from the dictionary 26 to determine whether the token image output from the segmenter 18 matches a word token supplied from the dictionary.
As described below, one preferred way of comparing word images uses a technique for measuring the Hausdorff distance, which is described in "Comparing Images Using the Hausdorff Distance" (TR 91-1211), 1991, and "A Multi-Resolution Technique for Comparing Images Using the Hausdorff Distance" (TR 92-1321), December 1992, both published by the Department of Computer Science at Cornell University. In general, boxed tokens are compared using the process shown in FIGS. 11 and 12, which compares the components identified within particular boxes. The simplified embodiment described below determines whether word-based tokens in an image are the same or different. The bitmap representation of each section or word token corresponds to the region defined by a predetermined boundary (bounding box). A common way of making such comparisons between bitmap sections is generally known as correlation, in which a logical AND operation on the two images is used to determine similarity. In the present invention, by contrast, this correlation is improved with a dilation technique, which removes the effect of the quantization error inherent in the digitization process used to form the image.
In what follows, the two token images being compared are referred to as box 1 and box 2. These image sections may be two sections from the same image, two sections from different images, or one section from an image and one section created electronically from an interpretable unit forming an input symbol string, word or token. Although block 26 is represented in FIG. 2 as a word image "dictionary", its general purpose is to supply a token image section (box 2) to be compared with another token image section (box 1). As shown in FIG. 6, the original word representations of two word-based tokens, "automobile" 70 and 72, may be compared as described here, with representation 72 taken as the dictionary image.
Once box 1 and box 2 have been defined for sections 70 and 72 respectively, each original image is called a "model", and a dilated version of the model is created and called an "image". As shown schematically in FIGS. 11 and 12, the comparison method used in the comparator 24 compares the pixels of model 1 (150), that is, the original pixels of the section bounded by box 1, with the pixels of image 2 (156), the dilated representation of the pixels bounded by box 2, and a first distance is generated. Similarly, the process is reversed: the pixels of model 2 (152), the original pixels of the section bounded by box 2, are compared with the pixels of image 1 (154), the dilated representation of the pixels bounded by box 1, and a second distance is generated at block 162. Subsequently, at blocks 164, 166 and 168, the two distances are processed numerically to determine the degree of similarity between the two image sections bounded by box 1 and box 2.
More specifically, the comparator 24 first copies the pixels within the boundary of a word image specified in the "dictionary" (box 2) into a memory location for the model. These pixels are model 2. The comparator then duplicates model 2 into the memory location for image 2, and image 2 (dilated image 156) is created in step 200. That is, for every "on" or black pixel stored in the model 2 memory, the neighboring pixels surrounding it in the image 2 memory are turned on, or black. The exact number of neighbors is defined by a dilation radius, which is predetermined. As an example, with a dilation radius of 1.0 pixel the four adjacent pixels are turned on, and with a radius of 1.4 pixels all eight neighboring pixels are turned on. As the dilation radius is increased further, the likelihood that words which are not the same will be matched incorrectly grows. The dilation radius is chosen to guard against quantization errors, which arise mainly in the digitization process. In choosing the dilation radius, it is desirable to remove the errors that would be introduced by computing a simple correlation (effectively a dilation radius of 0), while avoiding the confusion caused by over-dilated images (that is, a large dilation radius). A dilation radius in the range of 1.0 to 1.4 pixels is therefore presented as an acceptable compromise between these limits.
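The dilation of step 200 can be sketched as below. This is an illustration, not the patented code: the function name `dilate` is hypothetical, the grid size is left unchanged (a caller would pad the box first if the dilated border is needed), and pixel distances are rounded to one decimal so that a radius of 1.4 reaches the diagonal neighbours at distance about 1.414, matching the text's use of 1.4 for the eight-neighbour case.

```python
import math


def dilate(image, radius):
    """Turn on every pixel within `radius` of an on pixel.

    `image` is a list of rows with 1 for black; the result has the same size.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    reach = int(radius)
    # Distances are rounded to one decimal (assumption) so 1.4 includes diagonals.
    offsets = [(dx, dy)
               for dy in range(-reach, reach + 1)
               for dx in range(-reach, reach + 1)
               if round(math.hypot(dx, dy), 1) <= radius]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] == 1:
                for dx, dy in offsets:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        out[ny][nx] = 1
    return out
```

With a single black pixel in the middle of a 3 by 3 grid, radius 1.0 produces a plus shape and radius 1.4 fills the whole grid.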
Next, the process is repeated to create model and dilated-image versions for the symbol strings to be compared. For example, a copy of the whole input image 10 may be dilated as described above; then, at block 202, for every box specified in the box list, the pixels within the dilated boundary are copied from the dilated input image. These pixels represent an individual dilated "word" and are referred to as image 1 (the dilated portion of the input image, 154), whereas the original, undilated word segment of the input image is referred to as model 1 (150). As with the box 2 image, the pixels representing the word in each image look fatter and more filled in than those of the corresponding model.
Once the input image and the "dictionary" image, together with their associated models and dilated images, have been created and stored in memory, a pair of input (box 1) and dictionary (box 2) images is selected for comparison in step 204. Next, in step 206, the comparator 24 performs a test to determine whether the boxes are reasonably close in size, that is, whether their lengths and heights are within a predetermined range of each other. As shown in FIG. 11, the dimensional difference ΔL is given by ΔL = |L1 - L2|. The size test of step 206 preferably also includes a height comparison (not shown), which can be performed in the same way as described for the length comparison. Because the images may be shifted relative to each other within these boxes, a larger dimensional difference may be tolerated to further improve reliability. If the dimensional difference ΔL is not within the specified range, then, provided more boxes (image sections) are available at step 208, a different pair of images (input and dictionary) is selected for comparison at step 204. Otherwise, the boxes bounding the selected input and dictionary images are taken to be approximately the same size, and each pair of word boxes is compared further to see whether they match.
A binary image can be regarded as representing a finite set of points A, where the position of each point of A is given by an "on" pixel in the binary image. A measure for comparing point sets, the Hausdorff distance, can therefore be applied to compare binary images. In particular, given two finite point sets A and B, the Hausdorff distance is defined as follows:
$$H(A, B) = \max\bigl(h(A, B),\, h(B, A)\bigr), \qquad h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$$
where ‖a - b‖ is the distance between two given points a and b. In effect, the function h(A, B) ranks each point of A by its distance to the nearest point of B and takes the distance of the highest-ranked point as its value. Thus, if h(A, B) <= δ, every point of A lies within distance δ of some point of B. The function H(A, B) is the maximum of the two asymmetric distances, so if H(A, B) <= δ, every point of A lies within δ of some point of B, and vice versa. The Hausdorff distance therefore measures the degree of mismatch (dissimilarity) between two binary images (or finite point sets), and a large value of δ indicates low similarity. When bitmap images of tokens are compared, small values of δ allow for quantization noise (pixels randomly on or off at token boundaries) while still ensuring that the images are relatively similar.
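The definition above can be checked directly on small point sets. The sketch below is a brute-force illustration (quadratic in the number of points), not the dilation-based method the embodiment prefers; the function names are hypothetical and points are assumed to be `(x, y)` tuples.

```python
import math


def on_pixels(image):
    """Convert a binary image (rows of 0/1) into a list of (x, y) points."""
    return [(x, y) for y, row in enumerate(image)
            for x, value in enumerate(row) if value == 1]


def directed_hausdorff(a_points, b_points):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in b_points) for a in a_points)


def hausdorff(a_points, b_points):
    """H(A, B): the larger of the two directed distances."""
    return max(directed_hausdorff(a_points, b_points),
               directed_hausdorff(b_points, a_points))
```

The two directed distances generally differ, which is why the comparison described in this document is run in both directions.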
As a preferred way of calculating the Hausdorff distance for small values of δ, a dilation technique is used together with a logical AND operation. When a binary image A is dilated by some radius δ, each on or black pixel of A is replaced by a circle of radius δ. δ = 1.0 is used to represent the four nearest neighbors of a pixel (horizontal and vertical), while δ = 1.4 represents the eight nearest neighbors (horizontal, vertical and diagonal). These are preferred values for cancelling quantization noise. If B′ is obtained by dilating B by δ, then h(A, B) <= δ holds exactly when A ∧ B′ = A, where ∧ denotes the logical product (AND) of A and B′. That is, every black point of A must lie within distance δ of some black point of B, and so every black point of A must coincide with some black point of B′. Whether h(A, B) <= δ can therefore be determined simply by dilating B by δ and computing the logical AND with A. Similarly, whether H(A, B) <= δ can be determined by dilating A and B by δ and computing the logical AND of each dilated image with the other, undilated image.
In general, some points of A may not be close to any point of B, and vice versa. It is therefore common to replace the maximum in the Hausdorff distance with a rank-based calculation (for example, the median or some other percentile). This definition is:
$$h'(A, B) = \operatorname*{k^{\mathrm{th}}}_{a \in A} \min_{b \in B} \lVert a - b \rVert$$
This calculation takes the k-th ranked value, rather than the maximum (largest) value, of the distances from each point of A to the nearest point of B. Some number, or some fraction, of the points of A is thus ignored in calculating the distance. If set A has m points and k = m, this definition is the same as h(A, B). In general, however, for some value of τ in the range 0 <= τ <= 1, setting k = τ × m means that m - k = (1 - τ) × m points are ignored (that is, they need not be near points of B). The same applies to H(A, B). In the preferred embodiment, up to 4 percent of the points of A may fail to coincide with B′ (τ >= 0.96), and vice versa.
This calculation is preferably applied for small values of δ, in which case dilation and the logical AND are used as described above, although there may be points of A that do not overlap B′ (and vice versa). The fraction of points that do overlap must not be less than the specified fraction τ. h′(A, B) is therefore evaluated by comparing the number of black pixels in A with the number in A ∧ B′. If p is the number of black pixels in A and q is the number of black pixels in A ∧ B′, then for a given value of τ, h′(A, B) <= δ holds exactly when q / p >= τ.
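The q / p test can be sketched as a pixel count. In this minimal sketch the dilated image B′ is assumed to have been computed already and to have the same dimensions as the model; the names `match_fraction` and `within_partial_hausdorff` are hypothetical.

```python
def match_fraction(model, dilated_image):
    """Return q / p: the share of black model pixels that are also black in B'."""
    p = q = 0
    for model_row, image_row in zip(model, dilated_image):
        for m, d in zip(model_row, image_row):
            if m == 1:
                p += 1              # black pixel of A
                if d == 1:
                    q += 1          # black pixel of A AND B'
    return q / p if p else 0.0


def within_partial_hausdorff(model, dilated_image, tau=0.96):
    """True when at least a fraction tau of the model lies inside the dilated image."""
    return match_fraction(model, dilated_image) >= tau
```

Setting `tau` to 1.0 recovers the strict test A ∧ B′ = A.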
Furthermore, in the present invention the Hausdorff distance can also be evaluated while A and B are shifted relative to each other, in order to find the minimum value (best alignment) of the Hausdorff distance. This is similar to the relative shifting used in known correlation operations, but it differs clearly from finding a correlation in that it significantly reduces the effect of quantization noise. Finding a correlation requires the points of A to coincide exactly with points of B (for example, the case δ = 0 is close to a correlation).
To carry out the technique described above, the following generalized procedure can be used to compare two image sections and determine whether they match.
1) Model 1 is superimposed on image 2.
2) The number of black model 1 pixels that coincide with black image 2 pixels is counted and then divided by the total number of black model 1 pixels (step 214).
3) If the percentage of matching black pixels is above the predetermined threshold percentage τ (τ is preferably about 0.96), the boxes are determined to match in the first instance (step 216).
4) Model 2 is superimposed on image 1.
5) The two image sections are compared again as in step 2 above to determine a second percentage of matching black pixels (step 220).
6) If this second percentage is above the predetermined threshold percentage τ, the boxes are determined to match in the second instance (step 222).
7) If the image sections match in both tests, they are considered to be the same word, and a word match indication is output from the comparator 24 of FIG. 2 (step 224).
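The seven steps above can be sketched end to end as follows. This is a hedged sketch rather than the patented implementation: the two boxes are aligned at their top-left corners with no relative shifting, `max_size_diff` is a hypothetical value for the "predetermined range" of the size test, distances are rounded to one decimal so that a radius of 1.4 reaches diagonal neighbours, and all function names are assumptions.

```python
import math


def dilate(image, radius):
    """Turn on every pixel within `radius` of an on pixel (grid size unchanged)."""
    height, width = len(image), len(image[0])
    reach = int(radius)
    offsets = [(dx, dy)
               for dy in range(-reach, reach + 1)
               for dx in range(-reach, reach + 1)
               if round(math.hypot(dx, dy), 1) <= radius]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] == 1:
                for dx, dy in offsets:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        out[ny][nx] = 1
    return out


def pad(image, width, height):
    """Pad with white pixels on the right and bottom to a common size."""
    rows = [list(row) + [0] * (width - len(row)) for row in image]
    return rows + [[0] * width for _ in range(height - len(rows))]


def match_fraction(model, image):
    """Share of black model pixels that are also black in the dilated image."""
    pairs = [(m, d) for mr, ir in zip(model, image) for m, d in zip(mr, ir)]
    p = sum(1 for m, _ in pairs if m == 1)
    q = sum(1 for m, d in pairs if m == 1 and d == 1)
    return q / p if p else 0.0


def tokens_match(box1, box2, radius=1.0, tau=0.96, max_size_diff=2):
    """Two-way comparison: model 1 vs image 2, then model 2 vs image 1."""
    h1, w1 = len(box1), len(box1[0])
    h2, w2 = len(box2), len(box2[0])
    # Size test (step 206); max_size_diff is an illustrative tolerance.
    if abs(w1 - w2) > max_size_diff or abs(h1 - h2) > max_size_diff:
        return False
    width, height = max(w1, w2), max(h1, h2)
    model1, model2 = pad(box1, width, height), pad(box2, width, height)
    image1, image2 = dilate(model1, radius), dilate(model2, radius)
    first = match_fraction(model1, image2) >= tau    # steps 1 to 3
    second = match_fraction(model2, image1) >= tau   # steps 4 to 6
    return first and second                          # step 7
```

A single stray boundary pixel does not prevent a match because it falls inside the other token's dilated image, whereas a shape displaced by several pixels fails the first test.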
In addition to the embodiments described above, the present invention can be used as a comparison technique to create equivalence classes of the word tokens in an image. The invention can be used as a preprocessing operation for an OCR system, and this can improve the speed and accuracy of available OCR systems. As yet another alternative, the invention can be used to determine that word tokens occur repeatedly, so that the repeated tokens can be replaced by an icon of reduced size, which reduces the overall size of the data file needed to hold a large text image. In this preferred embodiment, a program is executed that compares box sizes and, for each of a number of boxes, builds a library of the existing classes of boxes or image sections that might match it (for example, the same token appearing in both images). For example, using a data structure suited to classifying token image sections by their length (width) can improve the speed at which pairs are compared.
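One data structure of the kind mentioned above is a set of buckets keyed by box width, so that only boxes of similar width are ever paired. The sketch below is illustrative: the function names and the tolerance `max_width_diff` are assumptions, and boxes are `(x0, y0, x1, y1)` tuples.

```python
from collections import defaultdict


def bucket_by_width(boxes):
    """Group box indices by width in pixels."""
    buckets = defaultdict(list)
    for index, (x0, _, x1, _) in enumerate(boxes):
        buckets[x1 - x0 + 1].append(index)
    return buckets


def candidate_pairs(boxes, max_width_diff=2):
    """Return index pairs whose widths differ by at most `max_width_diff`."""
    buckets = bucket_by_width(boxes)
    pairs = set()
    for width, members in buckets.items():
        for other in range(width, width + max_width_diff + 1):
            for i in members:
                for j in buckets.get(other, []):
                    if i != j:
                        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)
```

Only the returned pairs need to go through the full pixel comparison, which avoids comparing every token against every other token.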
Although the comparison of portions of an input image with known or "dictionary" tokens has been described, the invention can equally compare tokens within the same image or in different images, as with image sections 70 and 72 in FIG. 6, and it is not to be understood as limited to the examples described for the purpose of illustrating its operation.
The present invention is a method of comparing two image sections, or tokens, each composed of a plurality of image signals or pixels, in which each token is represented by one or more connected symbols, so that identical tokens are identified. The invention also operates without any need to detect or identify individually the symbols or characters that form a token. The method detects the components in an image, determines the token boundaries, and then applies a two-step process in which a dilated image is compared with a model representing a token and the relative similarity between them is determined. It is thus apparent that the invention provides a method of comparing multiple tokens that define image regions containing components or similar elements.
The invention has been described with reference to a preferred embodiment, as software means designed to be usable in a computer system; the operations described above for image data processing can be carried out using one or more microprocessors, or processors with computational capability, that are able to execute predefined commands. The invention can also be realized using specialized hardware designed to perform the processing described here. The invention has further been described as part of a larger word recognition system. As noted above, however, it can also be used in text or image editing or related systems. Indeed, the invention can be used in any system that needs to identify, classify or group tokens or symbol strings. Finally, the invention has been described with reference to images in text format, but it can equally be applied to images that contain non-text portions.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic system diagram of an image processing system in which the present invention can be used effectively.
FIG. 2 is a block diagram showing the combination of system elements that make up an embodiment of a novel token recognition system applicable to word images.
FIG. 3 is a diagram showing an image sample of extracted text used as an example to explain the novel process.
FIG. 4 shows a portion of a scanned image of the example text at an intermediate stage of the process of the present invention.
FIG. 5 shows a portion of a scanned image of the example text at an intermediate stage of the process of the present invention.
FIG. 6 shows a portion of a scanned image of the example text at an intermediate stage of the process of the present invention.
FIG. 7 is a flowchart showing a process for determining word boundaries in an image.
FIG. 8 is a flowchart showing a process for determining word boundaries in an image.
FIG. 9 is a graph showing the histogram data obtained in step 87 of FIG. 7.
FIG. 10 is a graph showing the histogram data obtained in step 98 of FIG. 8.
FIG. 11 is an illustrated flowchart showing the processing performed by the comparator of FIG. 2 in accordance with the present invention.
FIG. 12 is a flowchart outlining a process for comparing the images within the word boundaries shown in FIG. 11.
DESCRIPTION OF REFERENCE NUMERALS
2: source; 4: image processing; 6: user interface; 8: output destination; 10: input image; 14: deskewer; 16: word boxer; 18: word segmenter; 24: word comparator; 26: word image "dictionary"

Continuation of the front page
(56) References: JP-A-5-242298 (JP, A); JP-A-7-200732 (JP, A)
(58) Fields investigated (Int. Cl.7, DB name): G06K 9/00 - 9/82

Claims

(57) A method of comparing at least two image sections, each image section representing a token and comprising a plurality of image signals, to identify similar tokens, the method comprising:
(a) storing an image signal representing a first token in a first model memory;
(b) creating a dilated (expanded) representation of the first token in a first image memory;
(c) storing an image signal representing a second token in a second model memory;
(d) creating a dilated representation of the second token in a second image memory;
(e) comparing the image signal stored in the first model memory with the image signal stored in the second image memory to determine a first similarity distance;
(f) comparing the image signal stored in the second model memory with the image signal stored in the first image memory to determine a second similarity distance; and
(g) determining, according to the first and second similarity distances, whether the first token and the second token are similar.