JP3937687B2

JP3937687B2 - Image processing apparatus, image processing method, and recording medium

Info

Publication number: JP3937687B2
Application number: JP2000136158A
Authority: JP
Inventors: 正己久貝
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2000-05-09
Filing date: 2000-05-09
Publication date: 2007-06-27
Anticipated expiration: 2020-05-09
Also published as: JP2001319231A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像の処理に関し、特に、画像のファイリング及びファイリングされた画像の検索に関する。
【０００２】
【従来の技術】
従来、文書等を含む画像を蓄積するイメージファイリングシステムにおいては、イメージスキャナで取り込んだ画像に検索用のキーワードのインデックスを付加して蓄積し、検索時の便宜を図っていた。
【０００３】
【発明が解決しようとする課題】
しかし、従来のファイリングシステムにおいては、新たに登録しようとする画像が、既に登録、蓄積された登録画像と二重に登録されないようにするためには、蓄積してある登録画像を、キーワード検索したり、すべての登録画像を一覧表示するなどして手作業で探し出し、新たに登録しようとする画像との一致を人間が目でみて確認しなければならず、手間がかかった。このため、二重登録を許してしまうような事態も生じていた。
【０００４】
すなわち、従来のファイリングシステムにおいては、新たな画像と近似するか又は一致する登録画像の検索機能が、ユーザにおいて使い勝手のよいものではなかった。
【０００５】
従って、本発明の目的は、新たな画像と近似するか又は一致する画像を好適に検索し得る画像処理装置及び画像処理方法、記録媒体を提供することにある。
【０００７】
【課題を解決するための手段】
本発明によれば、予め登録された複数の登録画像の中から、入力された入力画像に近似又は一致する登録画像を検索する画像処理装置であって、前記登録画像に対して領域分割処理を実行することにより得られた領域の数と各領域の位置と各領域の大きさと各領域の領域種別とを、当該登録画像と共に記憶した記憶手段と、前記入力画像に対して領域分割処理を実行することにより、当該入力画像内に含まれる領域の数と各領域の位置と各領域の大きさと各領域の領域種別とを得る処理手段と、前記処理手段により得られた前記入力画像内に含まれる領域の数と、領域の数が一致する前記登録画像を前記記憶手段から第１の検索候補として検索する第１検索手段と、前記第１検索候補となった前記登録画像の中から、前記処理手段により得られた、前記入力画像内に含まれる各領域の位置及び大きさに類似する、領域の位置及び大きさを有する前記登録画像を第２検索候補として検索する第２検索手段と、前記第２検索候補となった前記登録画像を対象として、前記入力画像内の前記領域種別がテキストである領域を文字認識して得た文字認識結果と、前記対象の登録画像内の前記領域種別がテキストである領域を文字認識して得た文字認識結果とを比較することにより、テキストに関する類似度を算出するとともに、前記入力画像内の前記領域種別がイメージである領域から抽出した画像特徴量と前記対象の登録画像内の前記領域種別がイメージである領域から抽出した画像特徴量とを比較することにより、イメージに関する類似度を算出し、更に、当該算出されたテキストに関する類似度とイメージに関する類似度とに対して、予め定めた重み付けをおこなって加算することにより総合類似度を求め、当該求めた総合類似度に基づいて前記第２検索候補となった前記登録画像の中から前記入力画像に類似する登録画像を判定する第３検索手段と、を備えたことを特徴とする画像処理装置が提供される。
【０００９】
また、本発明によれば、予め登録された複数の登録画像の中から、入力された入力画像に近似又は一致する登録画像を検索する画像処理方法であって、前記登録画像に対して領域分割処理を実行することにより得られた領域の数と各領域の位置と各領域の大きさと各領域の領域種別とを、当該登録画像と共に記憶手段に記憶する記憶工程と、前記入力画像に対して領域分割処理を実行することにより、当該入力画像内に含まれる領域の数と各領域の位置と各領域の大きさと各領域の領域種別とを得る処理工程と、前記処理工程において得られた前記入力画像内に含まれる領域の数と、領域の数が一致する前記登録画像を前記記憶手段から第１の検索候補として検索する第１検索工程と、前記第１検索候補となった前記登録画像の中から、前記処理工程において得られた、前記入力画像内に含まれる各領域の位置及び大きさに類似する、領域の位置及び大きさを有する前記登録画像を第２検索候補として検索する第２検索工程と、前記第２検索候補となった前記登録画像を対象として、前記入力画像内の前記領域種別がテキストである領域を文字認識して得た文字認識結果と、前記対象の登録画像内の前記領域種別がテキストである領域を文字認識して得た文字認識結果とを比較することにより、テキストに関する類似度を算出するとともに、前記入力画像内の前記領域種別がイメージである領域から抽出した画像特徴量と前記対象の登録画像内の前記領域種別がイメージである領域から抽出した画像特徴量とを比較することにより、イメージに関する類似度を算出し、更に、当該算出されたテキストに関する類似度とイメージに関する類似度とに対して、予め定めた重み付けをおこなって加算することにより総合類似度を求め、当該求めた総合類似度に基づいて前記第２検索候補となった前記登録画像の中から前記入力画像に類似する登録画像を判定する第３検索工程と、を含むことを特徴とする画像処理方法が提供される。
【００１１】
また、本発明によれば、予め登録された複数の登録画像の中から、入力された入力画像に近似又は一致する登録画像を検索するために、コンピュータを、前記登録画像に対して領域分割処理を実行することにより得られた領域の数と各領域の位置と各領域の大きさと各領域の領域種別とを、当該登録画像と共に記憶した記憶手段、前記入力画像に対して領域分割処理を実行することにより、当該入力画像内に含まれる領域の数と各領域の位置と各領域の大きさと各領域の領域種別とを得る処理手段、前記処理手段により得られた前記入力画像内に含まれる領域の数と、領域の数が一致する前記登録画像を前記記憶手段から第１の検索候補として検索する第１検索手段、前記第１検索候補となった前記登録画像の中から、前記処理手段により得られた、前記入力画像内に含まれる各領域の位置及び大きさに類似する、領域の位置及び大きさを有する前記登録画像を第２検索候補として検索する第２検索手段、前記第２検索候補となった前記登録画像を対象として、前記入力画像内の前記領域種別がテキストである領域を文字認識して得た文字認識結果と、前記対象の登録画像内の前記領域種別がテキストである領域を文字認識して得た文字認識結果とを比較することにより、テキストに関する類似度を算出するとともに、前記入力画像内の前記領域種別がイメージである領域から抽出した画像特徴量と前記対象の登録画像内の前記領域種別がイメージである領域から抽出した画像特徴量とを比較することにより、イメージに関する類似度を算出し、更に、当該算出されたテキストに関する類似度とイメージに関する類似度とに対して、予め定めた重み付けをおこなって加算することにより総合類似度を求め、当該求めた総合類似度に基づいて前記第２検索候補となった前記登録画像の中から前記入力画像に類似する登録画像を判定する第３検索手段、として機能させるプログラムを記録した記録媒体が提供される。
【００１２】
【発明の実施の形態】
以下、本発明の好適な実施の形態について、添付図面を参照して説明する。
【００１３】
図６は、本発明の一実施形態に係るイメージファイリングシステムが実現されるハードウェアの一例を示した図である。図６に示すように、本システムは、一般的なコンピュータシステム上で実現可能である。
【００１４】
図６において、６００はアドレス信号を伝えたり、データを伝達させるバス、６０１は制御を行うＣＰＵ、６０２はＢＩＯＳやＯＳをブートするためのプログラムを記憶するＲＯＭ、６０３はＯＳや各種プログラムをロードしたり作業領域に使用するＲＡＭである。
【００１５】
また、６０４は画像データベースを蓄積したり、ＯＳや各種プログラムを記憶したり、あるいは作業データの一時ファイルを記憶する外部記憶装置、６０５は、文書画像や各種メッセージなどを表示するディスプレイ、６０６はイメージスキャナインターフェースであり、６０７は文書を読み込み文書画像にするイメージスキャナである。
【００１６】
図１は、新たな画像の登録処理を説明するフローチャートである。以下、本実施形態において、説明の便宜上、新たに登録するために与えられた画像を入力画像と称し、既に登録され、蓄積された画像を登録画像と称する。
【００１７】
入力画像は、例えば、イメージスキャナ６０７によって文書等が読み込まれ、カラー画像、または、白黒多値画像または二値画像として外部記憶装置６０４等に記憶される。
【００１８】
また、入力画像は、アプリケーションソフトで作られた文書データをビットマップ形式等の画像に変換することによっても得られる。
【００１９】
図８は、アプリケーションソフトで作られた文書データの場合、それをビットマップ形式の画像に変換するステップを説明している。
【００２０】
ステップＳ８０２では、入力された文書データを（ステップＳ８０１）、ワープロソフト等のアプリケーションソフト（例えばMicrosoft社のWord、一太郎：共に登録商標）がＧＤＩ形式８０３に変換する（ステップＳ８０３）。
【００２１】
そして、ステップＳ８０４では、プリンタドライバないしＦＡＸドライバなどが、ビットマップ形式の画像に変換する（ステップＳ８０５）。
【００２２】
次に、図１に戻り、ステップＳ１０１では、入力画像に対して領域分割処理を実行する。領域分割処理とは、画像を、その内容の種別に従ったブロック（領域）に分割する処理である。例えば、画像中のテキスト部分、イメージ部分、表部分等の種別に従ったブロックに分割する処理である。このような領域分割処理の具体的内容は種々提案されており、例えば、特開平０６−０６８３０１号公報等において開示されている。
【００２３】
図７は、領域分割処理を行った一例を示す図である。図７において、７０１は画像全体、７０２，７０３，７０４は、テキストブロック、７０５，７０６はイメージブロックを示している。なお、ブロック７０３と７０４の中の太線は文字列を簡略化して表したものである。図７の例では、ブロックの種類を、テキストとイメージとの二種類に分けているが、ブロックの種類をもっと多くの種類に細分してもよいことはいうまでもない。
【００２４】
入力画像を領域分割した結果は、図１の１１のブロック情報としてＲＡＭ６０３に記憶される。図４はこのブロック情報を説明した図である。
【００２５】
ブロック情報は、ブロック情報ヘッダと、領域分割された各ブロックのブロック情報データ１〜ブロック情報データｎからなる。
【００２６】
ブロック情報ヘッダには、例えば、総ブロック数、テキストブロック数、イメージブロック数、判別不能なブロック数、に係る情報が含まれる。また、各ブロック情報データには、例えば、ブロックＩＤ、ブロック種別、ブロックの座標情報、ブロックの横幅、ブロックの高さ、に係る情報が含まれる。なお、本実施形態では、ブロックの座標情報として、ブロックの中心の座標を用いるが、中心以外の座標（例えば、ブロック左上頂点の座標）を用いてもよい。図４の下方に示したプログラムは、ブロック情報ヘッダ及びブロック情報データの内容についてＣ言語で記述したプログラムの例を示している。
【００２７】
以下、領域分割してできる各ブロックを、Ｂ１，Ｂ２，Ｂ３，…，Ｂｎとする。上述した通り、ブロックは、テキストブロックとイメージブロックに区別され、また、ブロックＢiの中心座標をＣＸ(Ｂi)，ＣＹ(Ｂi)とする。
【００２８】
そして、テキストブロックを、その中心座標ＣＸ(Ｂi)を第１キー、その中心座標ＣＹ(Ｂi)を第２キーとしてソートする。ソートされた結果のテキストブロックをＴＢ１，ＴＢ２，…，ＴＢｍとする。
【００２９】
同様にイメージブロックをソートし、ソートされた結果を、ＩＢ１，ＩＢ２，…，ＩＢｋとする。図４のブロック情報データには、テキストブロックＴＢ１，ＴＢ２，…，ＴＢｍ、イメージブロックＩＢ１，ＩＢ２，…，ＩＢｋの順に記録される。
【００３０】
そして、外部記憶装置６０４に蓄積、構成されている画像データベースに、入力画像は登録画像としてブロック情報１１と対にして記憶保管されるとともに、その記憶位置を一時的にＲＡＭ６０３に記憶する。
【００３１】
画像データベースには、図３で示されているページテーブルが記憶されている。図３においてページＩＤは登録画像を一意的に決定できる番号であり、例えば登録画像を登録した順番につけた順序番号である。図３でｍ，ｋは、それぞれ各登録画像を領域分割して抽出したテキストブロックの個数とイメージブロックの個数とを示している。図において、インスタンスポインタは、対応する登録画像とブロック情報の対が記録されている外部記憶装置６０４内の記録位置を示している。
【００３２】
ステップＳ１０２では、まず、今回登録した画像についてのページＩＤと，テキストブロック及びイメージブロックの個数ｍ＋ｋ，イメージブロックの個数ｋ、インスタンスポインタをページテーブルに追加記録する。次に、ステップＳ１０３では、このテーブルを、ｍ＋ｋを第１キー、ｋを第２キーとしてソートする。
【００３３】
このようにして、入力画像は登録画像として保存される。しかし、入力画像が既存の登録画像と同一である場合には、二重登録を防止する必要がある。また、既存の登録画像と著しく近似する場合は、ユーザにおいてその登録を希望しない場合もある。そこで、本システムでは、図２のフローチャートに従って二重登録を防止する登録処理がなされる。
【００３４】
ステップＳ２０２では、入力画像に対して、領域分割処理を実行する。この処理は、図１の場合と同じである。その結果、図４に示すようなブロック情報が得られる。
【００３５】
ステップＳ２０３では、入力画像のテキストブロック数ｍ、イメージブロック数ｋとから、総ブロック数ｍ＋ｋ＝ｎを計算し、ページテーブルを参照して総ブロック数ｎと一致する登録画像を第１の検索候補として絞り込みを行う。なお、第１の検索候補が一つしかない場合等には、これを最終的な候補としてもよい。
【００３６】
ステップＳ２０４では、これら第１の検索候補の各登録画像と入力画像との間の一致度を求める。本実施形態では、各登録画像と入力画像との一致度として、各ブロックの大きさ、位置に基づき、両者の距離を求める。ここでは、登録画像と入力画像との距離を以下のようにして求める。
【００３７】
入力画像について、上述した方法でソートされたテキストブロックを、
ＴＢ’１，ＴＢ’２，…，ＴＢ’ｍ’
イメージブロックを
ＩＢ’１，ＩＢ’２，…，ＩＢ’ｋ’
また、登録画像のテキストブロックを、
ＴＢ１，ＴＢ２，…ＴＢｍ
イメージブロックを
ＩＢ１，ＩＢ２，…ＩＢｋ
とする。更に、テキストブロックＴＢiの幅、高さを、Ｗ（ＴＢi），Ｈ（ＴＢi）、イメージブロックＩＢjの幅、高さを、Ｗ（ＩＢj），Ｈ（ＩＢj）と表す。
【００３８】
２つの画像の距離Ｄは、以下のように計算される。
【００３９】
【数１】

【００４０】
ここで、級数の項数ｍtは、
ｍt＝ｍｉｎ（ｍ，ｍ’）
である。
【００４１】
【数２】

【００４２】
ここで、級数の項数ｋｉは、
ｋi＝ｍｉｎ（ｋ，ｋ’）
である。そして、
Ｄ＝Ｄt＋αＤi
として距離Ｄ（以下、第１識別関数と呼ぶ。）を計算する。ここで、αは画像の識別がもっともよくなるようにあらかじめ実験的に決めておいた定数である。一般に、テキストブロックよりイメージブロックのほうが精度よく求められると考えられるので、例えば、経験的に２ぐらいの値にしておいてもよい。つまり、ＤiのほうがＤtよりも識別に有効に働くわけである。
【００４３】
このようにして、第１の検索候補に係る各登録画像と入力画像との距離Ｄを求めたら、距離Ｄの小さいほうからいくつかの登録画像を選ぶことにより、検索候補を絞り込む。たとえば、距離Ｄの小さいほうから３つだけを選ぶ、あるいは、第１検索候補の数がある割合（たとえば１／５）に減るように距離Ｄの小さいほうから選ぶことにより絞込みを行う。こうして絞り込みを行い、第２の検索候補とする。この第２の検索候補の集合をＳ２とする。
【００４４】
なお、この段階で、距離Ｄが最小な登録画像を１つだけに絞り込み、これを最終候補としてもよい。
【００４５】
次にステップＳ２０５では次のようにさらに絞り込みを行う。
【００４６】
この集合Ｓ２のなかから、入力画像にもっとも近いものを以下のように選び出す。上記検索では、双方の対応するブロックについて、ブロック位置とブロックサイズを比較し、距離計算を行った。今度は、各対応するブロックの中身の比較をして、さらに距離計算を行う。それには、テキストブロック同士の比較とイメージブロック同士の比較がある。入力画像と登録画像との対応するテキストブロックをＴＸＴＢ２，ＴＸＴＢとする。対応付けは、ブロックの中心のＸ座標を第１キー、Ｙ座標を第２キーとしてソートした場合に同順位にあるブロックを対応させることで行う。これらのテキストブロックを２値化してＯＣＲ（光学的文字認識）を行えば文字列が得られる。そしてＴＸＴＢ２の文字列とＴＸＴＢの文字列をＤＰマッチング（Dynamic Programming：動的計画法）の手法で比較することにより、
ＴＸＴＢ２にあって、ＴＸＴＢにない文字の個数：ｎ１
ＴＸＴＢ２になくて、ＴＸＴＢにある文字の個数：ｎ２
ＴＸＴＢ２とＴＸＴＢと対応する文字列が異なっている文字の個数：ｎ３２，ｎ３（ｎ３２は、ＴＸＴＢ２のほうの文字数、ｎ３はＴＸＴＢのほうの文字数である）
が求められる。ＤＰマッチングは、例えば、情報科学講座「音声認識」（新美康永著、共立出版）の１０７ページにも開示されている公知の技術である。
【００４７】
図５は、ＴＸＴＢ２とＴＸＴＢの各文字列をＤＰマッチングした例の説明図である。図で各文字列は太線で表されている。Ｅの部分は、文字列が一致した部分、Ｘの部分は上記１（ＴＸＴＢ２にあって、ＴＸＴＢにない文字）の部分、Ｙの部分は上記２（ＴＸＴＢ２になくて、ＴＸＴＢにある文字）の部分、Ｚの部分は上記３（ＴＸＴＢ２とＴＸＴＢと対応する文字列が異なっている文字）の部分である。
【００４８】
この結果、２つのテキストブロックＴＸＴＢ２とＴＸＴＢの距離を次のように計算できる。
【００４９】
Ｄ（ＴＸＴＢ２，ＴＸＴＢ）＝（ｎ１＋ｎ２＋ｎ３２＋ｎ３）／ＮＣ
ここで、ＮＣはＴＸＴＢ２の文字数とＴＸＴＢの文字数の合計である。
【００５０】
このようにして、対応するテキストブロックについて距離が求まる。また、入力画像と登録画像との間で、テキストブロック数が一致しない場合も考えられる。たとえば、入力画像のほうがテキストブロック数が多くて、ＴＸＴＢ２に対応する登録画像のテキストブロックがないならば、距離は１となる。このようにして、すべてのテキストブロックについて求まった距離を合計したものをテキストブロック距離と呼ぶことにする。
【００５１】
今度は、イメージブロックについて入力画像と登録画像との比較である。入力画像のイメージブロックＩＭＧＢ２と登録画像のイメージブロックＩＭＧＢが対応するものとする。対応付けは、ブロックの中心のＸ座標を第１キー、Ｙ座標を第２キーとしてソートした場合に同順位にあるブロックを対応させることで行う。ＩＭＧＢ２を二値化してできる画像について、全画素数に対する黒画素数の比（すなわち、黒画素数÷全画素数）ratio（ＩＭＧＢ２）を求める。同様にして、ratio（ＩＭＧＢ）を求める。
【００５２】
｜ratio（ＩＭＧＢ２）−ratio（ＩＭＧＢ）｜
をＩＭＧＢ２とＩＭＧＢとの距離とする。対応するイメージブロックがない場合は、距離は最大値の１とする。そして、全イメージブロックについての距離の合計値をイメージブロック距離と呼ぶことにする。
【００５３】
さて、入力画像と登録画像との詳細識別距離ｄを
ｄ＝テキストブロック距離＋β×イメージブロック距離
で求める。ここで、βは前に述べたαと同様で、イメージブロック距離にかける重みづけファクターである。画像の識別がうまくいくように実験的にβをもとめるのが望ましいが、イメージブロックのほうがテキストブロックよりも精度良く抽出できる（つまり信頼性が高い）ので、おおまかに１より大きい値（たとえば２）にしてもよい。上記ｄを第２識別関数と呼ぶ。
【００５４】
さて、集合Ｓ２のすべての登録画像と入力画像との詳細識別距離ｄを求め、ステップＳ２０６では、最小の詳細識別距離ｄ０について、所定の値δと比較する。δよりｄ０が小さければ、このｄ０を与える集合Ｓ２の登録画像を入力画像と一致する登録画像だと判定する。
【００５５】
ここで、δはあらかじめの実験で求める値である。たとえば、ひとつの画像を条件を変えてイメージスキャナで何回も読み込んでできる１０００個の画像のものと、ある条件で読み込んだ画像との詳細距離（１０００個ある）をもとめ、この１０００個の数値の最大値をδとする。
【００５６】
ステップＳ２０６で一致するものがあれば入力画像は画像データベースには登録しない。一致するものがなければ、ステップＳ２０７へ進む。
【００５７】
ステップＳ２０７は、図１で説明したＳ１０２からＳ１０３までの文書登録処理とまったく同じである。
【００５８】
以上、本発明の好適な実施形態について説明したが、上述した第１及び第２識別関数は、登録画像と入力画像との距離を計算するものとしたが、距離ではなく類似度を計算してもよい。例えば、距離の逆数を計算すれば、すなわち類似度となることは明白である。識別関数を類似度とした場合は、候補の選択は、類似度の大きいものから順番に絞り込むことになる。
【００５９】
また、上記実施形態は、画像の二重登録を避けるためのものであったが、同様の処理を応用して他の用途、例えば、画像検索装置としても用いることができる。画像の検索を行う場合、入力画像に対してステップＳ２０２乃至Ｓ２０６の処理を施すことにより、一致する登録画像をデータベースから探し出すことができるので、その後、一致する登録画像を取り出す処理を行うことにより、画像検索が可能となる。
【００６０】
たとえば、文書画像を検索したい場合、手元に探し出したい文書画像とほとんど同じであるが、少し違っている文書画像があり、原本の文書画像をデータベースから取り出したいという用途がある。この場合、データベースが前記実施形態のように構成されていれば、手元にある文書画像ともっとも似通った文書画像の検索を、手作業によらず、行うことができる。このような用途としては、手元の文書画像は原本を何回もコピーしたものによるものであるために、印刷状態が悪くなったものであった場合、原本から再び印刷状態の良好な文書を取り出したいというケースがある。
【００６１】
なお、本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体（または記録媒体）を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはCPUやMPU）が記憶媒体に格納されたプログラムコードを読み出し実行することによっても、達成されることは言うまでもない。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム(OS)などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００６２】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張カードやコンピュータに接続された機能拡張ユニットに備わるメモリに書込まれた後、そのプログラムコードの指示に基づき、その機能拡張カードや機能拡張ユニットに備わるCPUなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００６３】
【発明の効果】
以上、本発明によれば、新たな画像と近似するか又は一致する画像を好適に検索することができる。
【図面の簡単な説明】
【図１】新たな画像の登録処理を説明するフローチャートである。
【図２】二重登録を防止しつつ新たな画像の登録処理を説明するフローチャートである。
【図３】ページテーブルの説明図である。
【図４】ブロック情報の説明図である。
【図５】テキストブロックのＤＰマッチングの説明図である。
【図６】本発明の一実施形態に係るイメージファイリングシステムが実現されるハードウェアの一例を示した図である。
【図７】領域分割処理を行った一例を示す図である。
【図８】文書データから入力画像を得る場合の処理のフローチャートである。
[0001]
[Technical field of the invention]
The present invention relates to image processing, and more particularly to the filing of images and the retrieval of filed images.
[0002]
[Prior art]
Conventionally, in an image filing system that stores images including documents and the like, an image captured by an image scanner is stored with an index of search keywords attached, for convenience at search time.
[0003]
[Problems to be solved by the invention]
However, in the conventional filing system, to prevent an image that is about to be newly registered from duplicating an image that has already been registered and stored, the stored registered images must be located manually, for example by keyword search or by displaying a list of all registered images, and a person must visually confirm whether any of them matches the image to be newly registered. This is laborious, and as a result double registration has sometimes been allowed to occur.
[0004]
That is, in the conventional filing system, the search function for a registered image that approximates or coincides with a new image is not user-friendly.
[0005]
Accordingly, an object of the present invention is to provide an image processing apparatus, an image processing method, and a recording medium that can suitably search for an image that approximates or matches a new image.
[0007]
[Means for Solving the Problems]
According to the present invention, there is provided an image processing apparatus for searching, from among a plurality of registered images registered in advance, for a registered image that approximates or matches an input image, the apparatus comprising: storage means storing, together with each registered image, the number of regions obtained by executing region division processing on the registered image, and the position, size, and region type of each region; processing means for executing region division processing on the input image to obtain the number of regions included in the input image and the position, size, and region type of each region; first search means for searching the storage means, as first search candidates, for registered images whose number of regions matches the number of regions included in the input image obtained by the processing means; second search means for searching the registered images that are the first search candidates, as second search candidates, for registered images having region positions and sizes similar to the position and size of each region included in the input image obtained by the processing means; and third search means for, with the registered images that are the second search candidates as targets, calculating a text similarity by comparing a character recognition result obtained by character recognition of a region of the input image whose region type is text with a character recognition result obtained by character recognition of a region of the target registered image whose region type is text, calculating an image similarity by comparing an image feature amount extracted from a region of the input image whose region type is image with an image feature amount extracted from a region of the target registered image whose region type is image, obtaining an overall similarity by applying predetermined weights to the calculated text similarity and image similarity and adding them, and determining, based on the obtained overall similarity, a registered image similar to the input image from among the registered images that are the second search candidates.
[0009]
Further, according to the present invention, there is provided an image processing method for searching, from among a plurality of registered images registered in advance, for a registered image that approximates or matches an input image, the method comprising: a storage step of storing in storage means, together with each registered image, the number of regions obtained by executing region division processing on the registered image, and the position, size, and region type of each region; a processing step of executing region division processing on the input image to obtain the number of regions included in the input image and the position, size, and region type of each region; a first search step of searching the storage means, as first search candidates, for registered images whose number of regions matches the number of regions included in the input image obtained in the processing step; a second search step of searching the registered images that are the first search candidates, as second search candidates, for registered images having region positions and sizes similar to the position and size of each region included in the input image obtained in the processing step; and a third search step of, with the registered images that are the second search candidates as targets, calculating a text similarity by comparing a character recognition result obtained by character recognition of a region of the input image whose region type is text with a character recognition result obtained by character recognition of a region of the target registered image whose region type is text, calculating an image similarity by comparing an image feature amount extracted from a region of the input image whose region type is image with an image feature amount extracted from a region of the target registered image whose region type is image, obtaining an overall similarity by applying predetermined weights to the calculated text similarity and image similarity and adding them, and determining, based on the obtained overall similarity, a registered image similar to the input image from among the registered images that are the second search candidates.
[0011]
Further, according to the present invention, there is provided a recording medium on which is recorded a program that, in order to search, from among a plurality of registered images registered in advance, for a registered image that approximates or matches an input image, causes a computer to function as: storage means storing, together with each registered image, the number of regions obtained by executing region division processing on the registered image, and the position, size, and region type of each region; processing means for executing region division processing on the input image to obtain the number of regions included in the input image and the position, size, and region type of each region; first search means for searching the storage means, as first search candidates, for registered images whose number of regions matches the number of regions included in the input image obtained by the processing means; second search means for searching the registered images that are the first search candidates, as second search candidates, for registered images having region positions and sizes similar to the position and size of each region included in the input image obtained by the processing means; and third search means for, with the registered images that are the second search candidates as targets, calculating a text similarity by comparing a character recognition result obtained by character recognition of a region of the input image whose region type is text with a character recognition result obtained by character recognition of a region of the target registered image whose region type is text, calculating an image similarity by comparing an image feature amount extracted from a region of the input image whose region type is image with an image feature amount extracted from a region of the target registered image whose region type is image, obtaining an overall similarity by applying predetermined weights to the calculated text similarity and image similarity and adding them, and determining, based on the obtained overall similarity, a registered image similar to the input image from among the registered images that are the second search candidates.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.
[0013]
FIG. 6 is a diagram illustrating an example of hardware in which an image filing system according to an embodiment of the present invention is realized. As shown in FIG. 6, this system can be realized on a general computer system.
[0014]
In FIG. 6, 600 is a bus that carries address signals and data, 601 is a CPU that performs control, 602 is a ROM that stores the BIOS and a program for booting the OS, and 603 is a RAM into which the OS and various programs are loaded and which is used as a work area.
[0015]
Reference numeral 604 denotes an external storage device that holds the image database, stores the OS and various programs, and stores temporary files of work data; 605 is a display that shows document images and various messages; 606 is an image scanner interface; and 607 is an image scanner that reads a document and converts it into a document image.
[0016]
FIG. 1 is a flowchart for explaining a new image registration process. Hereinafter, in this embodiment, for convenience of explanation, an image given for new registration is referred to as an input image, and an image that has already been registered and accumulated is referred to as a registered image.
[0017]
The input image is obtained, for example, by reading a document or the like with the image scanner 607, and is stored in the external storage device 604 or the like as a color image, a monochrome multi-value image, or a binary image.
[0018]
The input image can also be obtained by converting document data created by application software into an image of a bitmap format or the like.
[0019]
FIG. 8 illustrates a step of converting document data created by application software into a bitmap format image.
[0020]
In step S802, the input document data (step S801) is converted into GDI format 803 by application software such as word processing software (for example, Microsoft Word, Ichitaro: both are registered trademarks) (step S803).
[0021]
In step S804, a printer driver or a FAX driver converts the image into a bitmap format image (step S805).
[0022]
Next, returning to FIG. 1, in step S101, region division processing is executed on the input image. Region division processing divides an image into blocks (regions) according to the type of their content, for example into blocks corresponding to text portions, image portions, table portions, and so on. Various concrete methods for such region division processing have been proposed, and one is disclosed, for example, in Japanese Patent Laid-Open No. 06-068301.
[0023]
FIG. 7 is a diagram illustrating an example of the result of region division processing. In FIG. 7, reference numeral 701 denotes the entire image, 702, 703, and 704 denote text blocks, and 705 and 706 denote image blocks. The bold lines in blocks 703 and 704 are simplified representations of character strings. In the example of FIG. 7, blocks are classified into two types, text and image, but it goes without saying that the block types may be subdivided further.
[0024]
The result of region division of the input image is stored in the RAM 603 as the block information 11 in FIG. 1. FIG. 4 is a diagram illustrating this block information.
[0025]
The block information includes a block information header and block information data 1 to block information data n of each block divided into regions.
[0026]
The block information header includes, for example, information related to the total number of blocks, the number of text blocks, the number of image blocks, and the number of blocks that cannot be determined. Each block information data includes, for example, information related to a block ID, a block type, block coordinate information, a block width, and a block height. In this embodiment, the coordinates of the center of the block are used as the block coordinate information, but coordinates other than the center (for example, the coordinates of the upper left vertex of the block) may be used. The program shown in the lower part of FIG. 4 shows an example of a program in which the contents of the block information header and the block information data are described in C language.
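The C program of FIG. 4 is not reproduced in this text. As a rough illustration only, the block information described above could be modeled as follows in Python. All class and field names here are hypothetical, chosen to mirror the items listed in the paragraph, and the header counts are derived from the block records rather than stored separately.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class BlockType(Enum):
    TEXT = "text"
    IMAGE = "image"
    UNKNOWN = "unknown"  # blocks whose type could not be determined


@dataclass
class BlockInfoData:
    block_id: int
    block_type: BlockType
    cx: float      # x coordinate of the block center
    cy: float      # y coordinate of the block center
    width: float
    height: float


@dataclass
class BlockInfo:
    blocks: List[BlockInfoData] = field(default_factory=list)

    def header(self) -> Dict[str, int]:
        """Derive the header counts from the per-block records."""
        def count(t: BlockType) -> int:
            return sum(1 for b in self.blocks if b.block_type is t)
        return {
            "total": len(self.blocks),
            "text": count(BlockType.TEXT),
            "image": count(BlockType.IMAGE),
            "unknown": count(BlockType.UNKNOWN),
        }


# Hypothetical page with one text block and one image block.
info = BlockInfo([
    BlockInfoData(1, BlockType.TEXT, 100, 50, 180, 40),
    BlockInfoData(2, BlockType.IMAGE, 100, 200, 160, 120),
])
print(info.header())
```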
[0027]
In the following, the blocks produced by region division are referred to as B1, B2, B3, ..., Bn. As described above, blocks are classified into text blocks and image blocks, and the center coordinates of block Bi are denoted CX(Bi) and CY(Bi).
[0028]
The text blocks are sorted with the center coordinate CX(Bi) as the first key and the center coordinate CY(Bi) as the second key. The sorted text blocks are denoted TB1, TB2, ..., TBm.
[0029]
Similarly, the image blocks are sorted, and the sorted result is denoted IB1, IB2, ..., IBk. In the block information data of FIG. 4, the blocks are recorded in the order of the text blocks TB1, TB2, ..., TBm followed by the image blocks IB1, IB2, ..., IBk.
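The ordering described above can be sketched in Python as follows. The `Block` tuple and the `order_blocks` function are hypothetical names used only for illustration; the sort keys (center X first, center Y second) and the text-then-image order follow the description.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    kind: str   # "text" or "image"
    cx: float   # center x coordinate
    cy: float   # center y coordinate


def order_blocks(blocks: List[Block]) -> List[Block]:
    """Text blocks first, then image blocks, each sorted by (cx, cy)."""
    def key(b: Block):
        return (b.cx, b.cy)
    text = sorted((b for b in blocks if b.kind == "text"), key=key)
    image = sorted((b for b in blocks if b.kind == "image"), key=key)
    return text + image


blocks = [
    Block("image", 300, 80),
    Block("text", 120, 400),
    Block("text", 120, 60),
    Block("text", 40, 500),
]
ordered = order_blocks(blocks)
for b in ordered:
    print(b)
```

Because CX is the first key, the text block at (40, 500) precedes both blocks at x = 120 even though it lies lower on the page.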
[0030]
The input image is then stored, as a registered image paired with the block information 11, in the image database held in the external storage device 604, and its storage location is temporarily stored in the RAM 603.
[0031]
In the image database, the page table shown in FIG. 3 is stored. In FIG. 3, the page ID is a number that can uniquely determine a registered image, and is, for example, a sequence number given in the order in which the registered images are registered. In FIG. 3, m and k respectively indicate the number of text blocks and the number of image blocks extracted by dividing each registered image into regions. In the figure, the instance pointer indicates a recording position in the external storage device 604 where a pair of a corresponding registered image and block information is recorded.
[0032]
In step S102, first, the page ID, the number m + k of text blocks and image blocks, the number k of image blocks, and the instance pointer are additionally recorded in the page table for the currently registered image. In step S103, the table is sorted with m + k as the first key and k as the second key.
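As a minimal sketch of steps S102 and S103, the page table can be treated as a list that is appended to and then re-sorted. `PageEntry` and `register_page` are hypothetical names, and the string used for the instance pointer is a placeholder for whatever storage location the real system records.

```python
from typing import List, NamedTuple


class PageEntry(NamedTuple):
    page_id: int
    total_blocks: int       # m + k
    image_blocks: int       # k
    instance_pointer: str   # placeholder for the storage location


def register_page(table: List[PageEntry], entry: PageEntry) -> None:
    """Step S102: append the entry. Step S103: re-sort by (m + k, k)."""
    table.append(entry)
    table.sort(key=lambda e: (e.total_blocks, e.image_blocks))


table: List[PageEntry] = []
register_page(table, PageEntry(1, 5, 2, "img/0001"))
register_page(table, PageEntry(2, 3, 1, "img/0002"))
register_page(table, PageEntry(3, 5, 1, "img/0003"))
print([e.page_id for e in table])
```

Keeping the table sorted by total block count is what makes the later lookup by block count inexpensive.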
[0033]
In this way, the input image is saved as a registered image. However, when the input image is identical to an existing registered image, double registration must be prevented. In addition, when the input image closely approximates an existing registered image, the user may not wish to register it. Therefore, in this system, a registration process that prevents double registration is performed according to the flowchart of FIG. 2.
[0034]
In step S202, region division processing is executed on the input image. This processing is the same as in the case of FIG. 1. As a result, block information such as that shown in FIG. 4 is obtained.
[0035]
In step S203, the total number of blocks n = m + k is calculated from the number of text blocks m and the number of image blocks k of the input image, and the page table is consulted to narrow the registered images down to those whose total number of blocks equals n; these are the first search candidates. If there is only one first search candidate, for example, it may be taken as the final candidate.
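The narrowing in step S203 can be sketched as a range lookup on a page table that is already sorted by total block count. The row layout and the function name below are assumptions made for illustration; a binary search is used only because the table is described as sorted, and a linear scan would give the same result.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Each row: (total_blocks, image_blocks, page_id), kept sorted.
PageRow = Tuple[int, int, int]


def first_candidates(table: List[PageRow], m: int, k: int) -> List[int]:
    """Return page IDs whose total block count equals n = m + k."""
    n = m + k
    totals = [row[0] for row in table]
    lo, hi = bisect_left(totals, n), bisect_right(totals, n)
    return [row[2] for row in table[lo:hi]]


table = [(3, 1, 2), (5, 1, 3), (5, 2, 1), (7, 0, 4)]
print(first_candidates(table, m=3, k=2))
print(first_candidates(table, m=4, k=2))
```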
[0036]
In step S204, the degree of coincidence between each registered image of the first search candidate and the input image is obtained. In the present embodiment, as the degree of coincidence between each registered image and the input image, the distance between the two is obtained based on the size and position of each block. Here, the distance between the registered image and the input image is obtained as follows.
[0037]
For the input image, let the text blocks sorted by the method described above be
TB'1, TB'2, ..., TB'm'
and the image blocks be
IB'1, IB'2, ..., IB'k'
Likewise, let the text blocks of the registered image be
TB1, TB2, ..., TBm
and the image blocks be
IB1, IB2, ..., IBk
Further, the width and height of the text block TBi are denoted W(TBi) and H(TBi), and the width and height of the image block IBj are denoted W(IBj) and H(IBj).
[0038]
The distance D between the two images is calculated as follows.
[0039]
[Expression 1]

[0040]
Here, the number of terms mt in the series is
mt = min(m, m′)
[0041]
[Expression 2]

[0042]
Here, the number of terms ki in the series is
ki = min(k, k′)
Then the distance D (hereinafter referred to as the first discriminant function) is calculated as
D = Dt + αDi
Here, α is a constant determined experimentally in advance so that images are discriminated best. In general, image blocks are considered to be extracted more accurately than text blocks, so α may be set empirically to a value of about 2, for example. In other words, Di contributes more effectively to discrimination than Dt.
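Expressions 1 and 2 are not reproduced in this text, so the exact form of Dt and Di is unknown here. The following Python sketch assumes, purely for illustration, that each is a sum of absolute differences in center position, width, and height over the first min(m, m′) (or min(k, k′)) block pairs. Only the pairing by sort order, the number of terms, and the combination D = Dt + αDi come from the description; `Box`, `partial_distance`, and `layout_distance` are hypothetical names.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    cx: float
    cy: float
    w: float
    h: float


def partial_distance(a: List[Box], b: List[Box]) -> float:
    """Sum over the first min(len(a), len(b)) pairs (assumed L1 form)."""
    return sum(
        abs(p.cx - q.cx) + abs(p.cy - q.cy) + abs(p.w - q.w) + abs(p.h - q.h)
        for p, q in zip(a, b)  # zip stops at the shorter list
    )


def layout_distance(in_text: List[Box], in_img: List[Box],
                    reg_text: List[Box], reg_img: List[Box],
                    alpha: float = 2.0) -> float:
    """First discriminant function D = Dt + alpha * Di."""
    d_t = partial_distance(in_text, reg_text)
    d_i = partial_distance(in_img, reg_img)
    return d_t + alpha * d_i


in_text = [Box(100, 50, 180, 40)]
reg_text = [Box(102, 50, 180, 41)]
in_img = [Box(100, 200, 160, 120)]
reg_img = [Box(100, 205, 160, 120)]
D = layout_distance(in_text, in_img, reg_text, reg_img)
print(D)
```

In this example Dt is 3 and Di is 5, so with α = 2 the image blocks contribute most of the distance.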
[0043]
Once the distance D between the input image and each registered image among the first search candidates has been obtained in this way, the search candidates are narrowed down by selecting registered images in ascending order of distance D. For example, only the three with the smallest distance D are selected, or registered images are selected in ascending order of distance D until the number of candidates is reduced to a certain fraction (for example, 1/5) of the first search candidates. The registered images that remain are the second search candidates, and their set is denoted S2.
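Both narrowing rules mentioned above (a fixed count, or a fixed fraction of the first candidates) can be sketched as follows. The function name is hypothetical, and rounding the fraction up so that at least one candidate survives is an assumption; the description does not say how fractional counts are handled.

```python
import math
from typing import Dict, List, Optional


def second_candidates(distances: Dict[int, float],
                      keep: Optional[int] = None,
                      ratio: float = 0.2) -> List[int]:
    """Return page IDs with the smallest distances D.

    keep: fixed number to retain; if None, retain `ratio` of the candidates.
    """
    ranked = sorted(distances, key=distances.get)
    if keep is None:
        # Assumption: round up and never drop below one candidate.
        keep = max(1, math.ceil(len(ranked) * ratio))
    return ranked[:keep]


distances = {10: 13.0, 11: 4.5, 12: 40.0, 13: 7.0, 14: 90.0}
print(second_candidates(distances, keep=3))
print(second_candidates(distances, ratio=0.2))
```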
[0044]
At this stage, the registered image with the smallest distance D may be narrowed down to one, and this may be the final candidate.
[0045]
In step S205, further narrowing is performed as follows.
[0046]
From this set S2, the one closest to the input image is selected as follows. In the above search, for both corresponding blocks, the block position and the block size were compared, and the distance was calculated. This time, the contents of each corresponding block are compared, and further distance calculation is performed. There are comparisons between text blocks and image blocks. The text blocks corresponding to the input image and the registered image are denoted as TXTB2 and TXTB. The association is performed by associating blocks having the same rank when sorting with the X coordinate of the center of the block as the first key and the Y coordinate as the second key. A character string can be obtained by binarizing these text blocks and performing OCR (optical character recognition). Then, by comparing the TXTB2 character string and the TXTB character string using the DP matching (Dynamic Programming) technique,
Number of characters present in TXTB2 but not in TXTB: n1
Number of characters present in TXTB but not in TXTB2: n2
Number of characters in corresponding portions where TXTB2 and TXTB differ: n32, n3 (n32 is the character count on the TXTB2 side, and n3 is the character count on the TXTB side)
are obtained. DP matching is a known technique, disclosed for example on page 107 of "Speech Recognition" (Information Science Course series, by Yasunaga Niimi, Kyoritsu Shuppan).
[0047]
FIG. 5 is an explanatory diagram of an example in which the character strings of TXTB2 and TXTB are DP-matched. In the figure, each character string is represented by a bold line. The portions marked E are where the character strings match; the portions marked X correspond to the first item above (characters present in TXTB2 but not in TXTB); the portions marked Y correspond to the second item above (characters present in TXTB but not in TXTB2); and the portions marked Z correspond to the third item above (characters that differ between the corresponding portions of TXTB2 and TXTB).
[0048]
As a result, the distance between the two text blocks TXTB2 and TXTB can be calculated as follows.
[0049]
D (TXTB2, TXTB) = (n1 + n2 + n32 + n3) / NC
Here, NC is the total number of TXTB2 characters and TXTB characters.
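The text block distance can be sketched with a character-level, unit-cost edit-distance alignment, used here as a stand-in for the DP matching described in the text. Two points are assumptions of this sketch rather than statements from the description: at character level each mismatched pair contributes one character on each side, so n32 and n3 are equal, and two empty strings are given a distance of 0. The function names are hypothetical.

```python
from typing import Tuple


def align_counts(a: str, b: str) -> Tuple[int, int, int]:
    """Unit-cost DP alignment of a (TXTB2) and b (TXTB).

    Returns (n1, n2, n_sub): characters only in a, characters only in b,
    and mismatched character pairs.
    """
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            diag = cost[i - 1][j - 1] + int(a[i - 1] != b[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back through the table to count each kind of edit.
    i, j, n1, n2, n_sub = len(a), len(b), 0, 0, 0
    while i > 0 or j > 0:
        mismatch = int(i > 0 and j > 0 and a[i - 1] != b[j - 1])
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            n_sub += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            n1 += 1
            i -= 1
        else:
            n2 += 1
            j -= 1
    return n1, n2, n_sub


def text_block_distance(txtb2: str, txtb: str) -> float:
    """D(TXTB2, TXTB) = (n1 + n2 + n32 + n3) / NC."""
    nc = len(txtb2) + len(txtb)
    if nc == 0:
        return 0.0  # assumption: two empty OCR results are identical
    n1, n2, n_sub = align_counts(txtb2, txtb)
    return (n1 + n2 + 2 * n_sub) / nc  # n32 == n3 == n_sub here


print(text_block_distance("abcd", "abxd"))
print(text_block_distance("abc", "ab"))
```

Identical strings give a distance of 0, and strings with no characters in common and equal length give 1, which is consistent with using 1 as the distance for an unmatched block.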
[0050]
In this way, the distance is determined for the corresponding text block. Further, there may be a case where the number of text blocks does not match between the input image and the registered image. For example, if the input image has more text blocks and there is no registered image text block corresponding to TXTB2, the distance is 1. Thus, the sum of the distances obtained for all the text blocks is referred to as a text block distance.
[0051]
Next, the image blocks of the input image and the registered image are compared. Assume that the image block IMGB2 of the input image corresponds to the image block IMGB of the registered image. As before, blocks are paired when they have the same rank after sorting with the X coordinate of the block center as the first key and the Y coordinate as the second key. For the image obtained by binarizing IMGB2, the ratio of the number of black pixels to the total number of pixels (that is, number of black pixels / total number of pixels), ratio(IMGB2), is obtained. Similarly, ratio(IMGB) is obtained.
[0052]
|ratio(IMGB2) - ratio(IMGB)|
is the distance between IMGB2 and IMGB. If there is no corresponding image block, the distance is set to its maximum value, 1. The sum of the distances for all image blocks is called the image block distance.
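The image block comparison could be sketched as below. The `ImageBlock` type, its fields, and the representation of a binarized block as rows of 0/1 values (1 = black) are assumptions made for this example; how blocks are actually stored is not specified here.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import List, Sequence


@dataclass
class ImageBlock:
    cx: float                 # X coordinate of the block center
    cy: float                 # Y coordinate of the block center
    pixels: List[List[int]]   # binarized pixels: 1 = black, 0 = white


def black_ratio(block: ImageBlock) -> float:
    """Number of black pixels divided by the total number of pixels."""
    total = sum(len(row) for row in block.pixels)
    if total == 0:
        return 0.0  # assumption: an empty block has ratio 0
    return sum(sum(row) for row in block.pixels) / total


def image_block_distance(
    input_blocks: Sequence[ImageBlock],
    registered_blocks: Sequence[ImageBlock],
) -> float:
    """Pair blocks of equal rank (sorted by center X, then Y) and sum distances."""
    def by_center(blk: ImageBlock):
        return (blk.cx, blk.cy)

    a_sorted = sorted(input_blocks, key=by_center)
    b_sorted = sorted(registered_blocks, key=by_center)
    total = 0.0
    for a, b in zip_longest(a_sorted, b_sorted):
        if a is None or b is None:
            total += 1.0  # no corresponding image block: maximum distance
        else:
            total += abs(black_ratio(a) - black_ratio(b))
    return total
```

For instance, a block with one black pixel out of four compared with a block with two black pixels out of four gives a distance of 0.25, and each unpaired block adds 1.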
[0053]
The detailed identification distance d between the input image and the registered image is then obtained as d = text block distance + β × image block distance. Here, β, like α described above, is a weighting factor applied to the image block distance. It is desirable to determine β experimentally so that images are identified well. However, since image blocks can be extracted more accurately than text blocks (that is, they are more reliable), a value somewhat larger than 1 (for example, 2) may simply be used. This d is called the second discriminant function.
[0054]
The detailed identification distance d between the input image and every registered image in the set S2 is obtained, and in step S206 the minimum detailed identification distance d0 is compared with a predetermined value δ. If d0 is smaller than δ, the registered image in S2 that gives d0 is determined to be the registered image that matches the input image.
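The decision in step S206 might look like the following sketch. It assumes that the text block distance and image block distance have already been computed for every candidate in S2 and are supplied as a dictionary keyed by a hypothetical registered-image identifier; the default β of 2 is only the example value mentioned in the description, and δ must be supplied by the caller.

```python
from typing import Dict, Optional, Tuple


def detailed_distance(text_dist: float, image_dist: float, beta: float = 2.0) -> float:
    """Second discriminant function: d = text block distance + beta * image block distance."""
    return text_dist + beta * image_dist


def find_match(
    candidates: Dict[str, Tuple[float, float]],
    delta: float,
    beta: float = 2.0,
) -> Optional[str]:
    """Return the id of the candidate in S2 with minimum d if d0 < delta, else None.

    candidates maps an image id to (text block distance, image block distance).
    """
    if not candidates:
        return None
    best_id, d0 = min(
        ((rid, detailed_distance(t, i, beta)) for rid, (t, i) in candidates.items()),
        key=lambda pair: pair[1],
    )
    return best_id if d0 < delta else None


s2 = {"doc1": (0.4, 0.1), "doc2": (0.05, 0.02)}
matched = find_match(s2, delta=0.2)     # doc2 has the smallest d, below delta
unmatched = find_match(s2, delta=0.05)  # no candidate is close enough
```

A return value of None corresponds to the case in which no registered image matches and the input image would go on to be registered.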
[0055]
Here, δ is a value obtained by a prior experiment. For example, 1000 images are each read many times with an image scanner under varying conditions, the detailed distances (for the 1000 images) to the images read under a certain condition are obtained, and the maximum of these values is taken as δ.
[0056]
If there is a match in step S206, the input image is not registered in the image database. If there is no match, the process proceeds to step S207.
[0057]
Step S207 is exactly the same as the document registration process from S102 to S103 described in FIG.
[0058]
A preferred embodiment of the present invention has been described above. The first and second discriminant functions described above calculate the distance between the registered image and the input image, but they may instead calculate a similarity. For example, taking the reciprocal of the distance clearly yields a similarity. When the discriminant function is a similarity, candidates are narrowed down in descending order of similarity.
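A small sketch of the reciprocal variant follows. Treating a distance of 0 as infinite similarity is an assumption added here to avoid division by zero; the function names and the dictionary of per-image distances are likewise hypothetical.

```python
import math
from typing import Dict, List


def similarity_from_distance(d: float) -> float:
    """Reciprocal of the distance; a distance of 0 maps to infinity (assumption)."""
    return math.inf if d == 0 else 1.0 / d


def rank_by_similarity(distances: Dict[str, float]) -> List[str]:
    """Order registered-image ids from most similar to least similar."""
    return sorted(
        distances,
        key=lambda rid: similarity_from_distance(distances[rid]),
        reverse=True,
    )


ranking = rank_by_similarity({"a": 0.5, "b": 0.0, "c": 2.0})
```

Ranking in descending order of this similarity selects the same candidates as ranking in ascending order of the distance.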
[0059]
Although the above embodiment is intended to avoid double registration of images, the same processing can also be applied to other uses, for example an image search apparatus. When searching for an image, performing the processing of steps S202 to S206 on the input image makes it possible to retrieve a matching registered image from the database.
[0060]
For example, when searching for a document image, there may be a document image at hand that is almost the same as, but slightly different from, the one to be found, and the original document image is to be retrieved from the database. In this case, if the database is configured as in the above-described embodiment, the document image most similar to the one at hand can be retrieved without manual effort. Such a case arises, for example, when the document at hand is a copy many generations removed from the original and its print quality has deteriorated, so that one wishes to obtain a document with good print quality from the original again.
[0061]
Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a storage medium (or recording medium) on which program code of software implementing the functions of the above-described embodiment is recorded, and having a computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiment, and the storage medium storing the program code constitutes the present invention. It also goes without saying that the invention covers not only the case where the functions of the above-described embodiment are realized by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the above-described embodiment are realized by that processing.
[0062]
Furthermore, it goes without saying that the invention also covers the case where, after the program code read from the storage medium is written into a memory provided in a function expansion card inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like provided in the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are realized by that processing.
[0063]
[Effect of the invention]
As described above, according to the present invention, an image that approximates or matches a new image can be suitably retrieved.
[Brief description of the drawings]
FIG. 1 is a flowchart illustrating a new image registration process.
FIG. 2 is a flowchart illustrating a new image registration process while preventing double registration.
FIG. 3 is an explanatory diagram of a page table.
FIG. 4 is an explanatory diagram of block information.
FIG. 5 is an explanatory diagram of DP matching of a text block.
FIG. 6 is a diagram illustrating an example of hardware for realizing an image filing system according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example in which region division processing is performed.
FIG. 8 is a flowchart of processing when an input image is obtained from document data.

Claims

An image processing apparatus that searches a plurality of pre-registered registered images for a registered image that approximates or matches an input image, the apparatus comprising:
storage means that stores, together with each registered image, the number of regions, the position of each region, the size of each region, and the region type of each region obtained by performing region division processing on the registered image;
processing means for obtaining the number of regions included in the input image, the position of each region, the size of each region, and the region type of each region by performing region division processing on the input image;
first search means for retrieving from the storage means, as first search candidates, the registered images whose number of regions matches the number of regions included in the input image obtained by the processing means;
second search means for retrieving, as second search candidates from among the registered images that are the first search candidates, the registered images having region positions and sizes similar to the position and size of each region included in the input image obtained by the processing means; and
third search means for, with the registered images that are the second search candidates as targets, calculating a similarity regarding text by comparing a character recognition result obtained by performing character recognition on a region in the input image whose region type is text with a character recognition result obtained by performing character recognition on a region in the target registered image whose region type is text, calculating a similarity regarding images by comparing an image feature amount extracted from a region in the input image whose region type is image with an image feature amount extracted from a region in the target registered image whose region type is image, obtaining a total similarity by applying predetermined weights to the calculated similarity regarding text and the calculated similarity regarding images and adding them, and determining, based on the obtained total similarity, a registered image similar to the input image from among the registered images that are the second search candidates.

The image processing apparatus according to claim 1, wherein the image feature amount extracted from a region whose region type is image is the ratio of the number of black pixels to the total number of pixels in that region, calculated by binarizing the region.

The image processing apparatus according to claim 1, further comprising registration means that stores the input image in the storage means as a new registered image when no registered image is determined by the third search means to be similar to the input image, and does not store the input image in the storage means as a registered image when a registered image determined by the third search means to be similar to the input image exists.

The image processing apparatus according to claim 1, wherein, in the weighting, the weight applied to the similarity regarding images is larger than the weight applied to the similarity regarding text.

The image processing apparatus according to claim 1, wherein the second search means searches for the second search candidates with the weight of the similarity regarding the position and size of regions whose region type is image set larger than the weight of the similarity regarding the position and size of regions whose region type is text.

An image processing method for searching a plurality of pre-registered registered images for a registered image that approximates or matches an input image, the method comprising:
a storage step of storing in storage means, together with each registered image, the number of regions, the position of each region, the size of each region, and the region type of each region obtained by performing region division processing on the registered image;
a processing step of obtaining the number of regions included in the input image, the position of each region, the size of each region, and the region type of each region by performing region division processing on the input image;
a first search step of retrieving from the storage means, as first search candidates, the registered images whose number of regions matches the number of regions included in the input image obtained in the processing step;
a second search step of retrieving, as second search candidates from among the registered images that are the first search candidates, the registered images having region positions and sizes similar to the position and size of each region included in the input image obtained in the processing step; and
a third search step of, with the registered images that are the second search candidates as targets, calculating a similarity regarding text by comparing a character recognition result obtained by performing character recognition on a region in the input image whose region type is text with a character recognition result obtained by performing character recognition on a region in the target registered image whose region type is text, calculating a similarity regarding images by comparing an image feature amount extracted from a region in the input image whose region type is image with an image feature amount extracted from a region in the target registered image whose region type is image, obtaining a total similarity by applying predetermined weights to the calculated similarity regarding text and the calculated similarity regarding images and adding them, and determining, based on the obtained total similarity, a registered image similar to the input image from among the registered images that are the second search candidates.

The image processing method according to claim 6, wherein the image feature amount extracted from a region whose region type is image is the ratio of the number of black pixels to the total number of pixels in that region, calculated by binarizing the region.

The image processing method according to claim 6, further comprising a registration step of storing the input image in the storage means as a new registered image when no registered image is determined in the third search step to be similar to the input image, and not storing the input image in the storage means as a registered image when a registered image determined in the third search step to be similar to the input image exists.

The image processing method according to claim 6, wherein, in the weighting, the weight applied to the similarity regarding images is larger than the weight applied to the similarity regarding text.

The image processing method according to claim 6, wherein, in the second search step, the second search candidates are searched for with the weight of the similarity regarding the position and size of regions whose region type is image set larger than the weight of the similarity regarding the position and size of regions whose region type is text.

A recording medium on which is recorded a program that causes a computer, in order to search a plurality of pre-registered registered images for a registered image that approximates or matches an input image, to function as:
storage means that stores, together with each registered image, the number of regions, the position of each region, the size of each region, and the region type of each region obtained by performing region division processing on the registered image;
processing means for obtaining the number of regions included in the input image, the position of each region, the size of each region, and the region type of each region by performing region division processing on the input image;
first search means for retrieving from the storage means, as first search candidates, the registered images whose number of regions matches the number of regions included in the input image obtained by the processing means;
second search means for retrieving, as second search candidates from among the registered images that are the first search candidates, the registered images having region positions and sizes similar to the position and size of each region included in the input image obtained by the processing means; and
third search means for, with the registered images that are the second search candidates as targets, calculating a similarity regarding text by comparing a character recognition result obtained by performing character recognition on a region in the input image whose region type is text with a character recognition result obtained by performing character recognition on a region in the target registered image whose region type is text, calculating a similarity regarding images by comparing an image feature amount extracted from a region in the input image whose region type is image with an image feature amount extracted from a region in the target registered image whose region type is image, obtaining a total similarity by applying predetermined weights to the calculated similarity regarding text and the calculated similarity regarding images and adding them, and determining, based on the obtained total similarity, a registered image similar to the input image from among the registered images that are the second search candidates.

The recording medium according to claim 11, wherein the image feature amount extracted from a region whose region type is image is the ratio of the number of black pixels to the total number of pixels in that region, calculated by binarizing the region.

The recording medium according to claim 11, wherein the program further includes a program for causing the computer to function as registration means that stores the input image in the storage means as a new registered image when no registered image is determined by the third search means to be similar to the input image, and does not store the input image in the storage means as a registered image when a registered image determined by the third search means to be similar to the input image exists.

The recording medium according to claim 11, wherein, in the weighting, the weight applied to the similarity regarding images is larger than the weight applied to the similarity regarding text.

The recording medium according to claim 11, wherein the second search means searches for the second search candidates with the weight of the similarity regarding the position and size of regions whose region type is image set larger than the weight of the similarity regarding the position and size of regions whose region type is text.