JP4140221B2

JP4140221B2 - Image collation device and image collation program

Info

Publication number: JP4140221B2
Application number: JP2001284095A
Authority: JP
Inventors: 景則長尾
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2001-09-18
Filing date: 2001-09-18
Publication date: 2008-08-27
Anticipated expiration: 2021-09-18
Also published as: JP2003091730A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像データ同士を照合する画像照合装置に関する。
【０００２】
【従来の技術】
近年、記憶装置が大容量化し、ネットワークが高速化してきており、これに伴って、文書データや画像データをデータベースに登録して複数の利用者で共有する、文書管理システムが普及しつつある。このような文書管理システムを用いると、他の利用者が作成した文書データや画像データを別の利用者が閲覧、流用することができる。
【０００３】
文書管理システムの効率的な運用のためには、蓄積されている大量のデータの中から閲覧や流用の対象となる文書データ等が迅速に見つけ出せるよう、検索機能の充実が望まれている。そこで従来の文書管理システムでは、文書データや画像データをデータベースに登録する際に、登録者やデータベースの管理者が検索のためにキーワードや所定の分類コードを関連づけて登録するなど、検索に配慮した登録を行う必要があって管理者等の負担が大きくなる。
【０００４】
また、すでに膨大な文書データ等が蓄積されているような場合、またはデータベースシステムの交換などデータの再登録作業を要する場合、改めて全文書データや画像データに対してキーワード等を入力するのは現実的でない。
【０００５】
文書データならば、データベースへの登録処理を自動化するために、システムが文書データからキーワードを抽出する技術や、いわゆる全文検索技術などがある。しかし、紙文書をスキャナ等で読み込んだ文書画像データにはこれらの技術をそのまま適用することはできない。このような画像データに対しては、いわゆるＯＣＲ（Optical Character Recognition）技術を利用して画像データ内に含まれているテキスト部分をテキストデータ化し、このテキストデータに対して上記の全文検索等の処理を行なうことになる。
【０００６】
【発明が解決しようとする課題】
しかしながら、上記従来の文書管理システムのように、画像データをＯＣＲにて読み込む場合、紙文書に書き込みや押印などのノイズがあったり、文書のレイアウトが複雑である場合は文字認識精度が大幅に低下し、全文検索等の用途に供するには実用的でない。
【０００７】
具体的に通常の企業活動では、社内で回覧されてきた紙文書内の記述（例えばグラフなど）を流用して新たな書類を作成したいというような場合がある。この場合、回覧中になされた手書きの書き込みや押印がなければスキャナにより流用したい部分をデータ化して取り込むことも容易であるが、通常これらが存在するために、取り込みが困難になっていることも多い。そこで文書管理システムのデータベースから原本の画像データを取り出したい、という要望が生まれるのであるが、上記従来の文書管理システムでは、全文検索技術等を利用しているので、テキストデータの存在が前提であるうえ、テキストデータがあっても原本が画像データであるときには、レイアウトが複雑であったりすると（グラフなどではレイアウトは極めて複雑になる）、上述のようにＯＣＲの精度が落ち、テキストデータが正しく取り出されず、その結果、検索用のテキストデータが誤ったものとなり、全文検索の際に見落とされてしまうのである。
【０００８】
そこで、画像データ同士の比較により、所望の画像データの検索を行う技術（類似画像検索技術）が、例えば「コンピュータ画像処理：応用実践編３，総研出版，pp.187-227，1992」で紹介されている。しかし、この技術は自然画のように自由度の大きい画像データには適しているが、構造的に非常に類似した文書を識別するのは難しく、逆に識別能を高めると文書上の書き込みや押印等のノイズに対する耐性が大きく損なわれることが知られている。
【０００９】
また、特開2000-112995号公報には、罫線や線画などの情報のみを用いて画像データの照合を行う技術が開示されている。しかしこの技術ではテキストのみから構成されている文書の照合を行なうことができない。さらに特開平7-282088号公報には、単語長のパターンを照合してテキストを含んだ画像データ同士を照合する技術が開示されている。この技術は単語間が分離しているような言語、例えば英語には適しているが、単語間が必ずしも分離していない日本語にはＯＣＲを用いる必要があり、結局テキストデータを正しく抽出できないような場合には適用できない。
【００１０】
本発明は上記実情に鑑みて為されたもので、書き込みや押印等のノイズがあっても安定した結果を得ることができ、ＯＣＲ等テキストデータを抽出する必要がなく、テキストのみからなる文書データ同士をも照合することのできる画像照合装置を提供することを目的とする。
【００１１】
【課題を解決するための手段】
上記従来例の問題点を解決するための本発明は、画像照合装置であって、処理対象となった画像データに対し、複数の所定経路の各々について、経路上の画素値を累算する演算手段と、前記各経路上の画素値を累算した結果を経路の演算順に含んでなる累算値系列を用い、複数の画像データ間の照合処理を行う手段と、を含むことを特徴としている。
【００１２】
ここで前記経路は、互いに直交する２つの軸方向のそれぞれに沿って複数配列され、各軸方向ごとに累算値系列が演算され、前記照合処理を行う手段が、各軸方向ごとの累算値系列の比較により照合の処理を行うことも好ましい。さらに、前記照合処理を行う手段が、照合の対象となった各画像データについての累算値系列間の相似性に基づき、照合を行うこととするのも好ましい。
【００１３】
また、上記従来例の問題点を解決するための本発明は、前記相似性を検出するために、照合の対象となった各画像データについて、それぞれの累算値系列内での累算値の変化が特徴的となる系列上の位置を表す、系列位置情報を生成し、当該各累算値系列に対する系列位置情報を比較して、少なくとも平行移動量を含んでなる相似変換パラメータを演算し、当該相似変換パラメータを用いて、各累算値系列の相似性を検出することを特徴としている。ここで、相似変換パラメータには、さらに一次変換パラメータを含む。すなわち、相似変換パラメータは、アフィン変換の際に用いられる一次変換係数及び平行移動の量である。また、前記累算値の変化が特徴的となる系列位置情報は、累算値のピーク位置のリストであることが好ましい。
【００１４】
さらに、前記相似変換パラメータを決定するために、照合の対象となった各画像データのうち、相異なる画像データについてのピーク位置のリストを参照し、各ピーク位置同士の組み合わせについて相似変換パラメータを推定し、前記組み合わせごとに推定された相似変換パラメータを含んでなる推定相似変換パラメータ群を生成し、当該推定相似変換パラメータ群に含まれる各推定相似変換パラメータに基づき、尤度の高いパラメータとして前記相似変換パラメータを決定することが好ましい。
【００１５】
すなわち本発明による画像照合装置では、原本である画像データを検索するために、検索対象となる画像を入力し、該入力した画像の水平・垂直方向の少なくとも一方への投影波形を形成し、該水平または垂直方向の投影波形から特徴量を抽出し、前記入力した画像の投影波形、もしくは照合対象となる基準画像（原本の候補となる画像データ群）の投影波形の少なくとも一方に対して、相似変換処理を行なった変換投影波形を生成し、入力した画像の投影波形から抽出された前記特徴量と、前記基準画像の投影波形から抽出された特徴量との対応関係から、前記投影変換手段で適用する変換パラメータを求め、前記入力した画像の投影波形またはその変換投影波形と、前記基準画像の投影波形またはその変換投影波形との類似度を測定する。
【００１６】
具体的に上記抽出される特徴量は、投影波形の局所的ピークであり、該投影波形において検出された全ての局所的ピークの中から、所定の条件を満たすピークの位置のみを選別してピークリスト（系列位置情報）として取り出したものである。
【００１７】
さらに変換パラメータを得るための処理は次のようになる。すなわち、前記入力画像のピークリストにおける局所的ピーク位置と、照合対象となる基準画像のピークリストにおける局所的ピーク位置との対応付けを全てのピーク位置の組み合わせについて行ない、最も尤度の高いパラメータを決定するために、（スケーリング補正量）−（平行移動量）平面において該直線上への投票処理を行ない、該投票処理によって前記（スケーリング補正量）−（平行移動量）平面上に形成された投票値のピーク座標を検出し、該ピーク座標に対応するスケーリング補正量及び平行移動量を見いだす。これは、上記の対応付けにおいて対応付けられた２つのピーク位置を一致させるために必要なスケーリング補正量と平行移動量が直線関係を為すことに基づく。
【００１８】
これにより、画像上の所定経路に対する投影波形を形成し、比較に係る各画像データの投影波形を利用して照合するので、文書への押印、書き込み等があっても、大域的な投影波形の変化状態（波形そのものや微分波形間の相関係数）は一致したままとなり、かかるノイズ成分が類似度測定に与える影響は限定的となる。また、所定経路を水平・垂直方向への各経路とすることで、特に水平・垂直成分を多く含む文書画像に対して高い照合能力を発揮する。
【００１９】
また入力画像の投影波形とデータベース内画像の投影波形との照合に先立って、両波形が最も良く一致するようにアフィン変換による投影波形の相似変換処理を行なうため、入力画像の拡大率の違いや位置ずれを許容することができる。ここで相似変換のためのパラメータ（相似変換パラメータ）は、投票処理、換言すれば多数決原理に基づいて尤度の高いものが求められるため、局所的ピークの検出時に少数の誤検出や検出漏れがあっても補正量は正当なものを求めることができる。
そして、本発明に係る装置は、処理対象となった画像データ及び前記照合の対象となった各画像データについて、それぞれ前記累算値系列の重心位置を計算する手段、を更に備え、前記照合処理を行う手段は、前記処理対象となった画像データの累算値系列の重心位置と、前記照合の対象となった画像データの累算値系列の重心位置と、に基づき、前記相似変換パラメータ中の前記平行移動量の推定存在範囲を限定し、前記推定相似変換パラメータ群から前記平行移動量の推定存在範囲のなかで前記尤度の高いパラメータを前記相似変換パラメータとして決定する。
【００２０】
また、参考例の画像照合方法は、処理対象となった画像データに対し、複数の所定経路の各々について、経路上の画素値を累算する工程と、前記各経路上の画素値を累算した結果を経路の演算順に含んでなる累算値系列を用い、複数の画像データ間の照合処理を行う工程と、を含むことを特徴としている。
【００２１】
さらに、上記従来例の問題点を解決するための本発明は、画像照合プログラムであって、処理対象となった画像データに対し、複数の所定経路の各々について、経路上の画素値を累算する工程と、前記各経路上の画素値を累算した結果を経路の演算順に含んでなる累算値系列を用い、複数の画像データ間の照合処理を行う工程と、をコンピュータに実行させることを特徴としている。
【００２２】
【発明の実施の形態】
本発明の実施の形態について図面を参照しながら説明する。まず、本発明の実施の形態に係る画像照合装置がいかなる原理に基づき画像データ間の照合を行うかについて概括的に説明する。
【００２３】
本実施の形態の画像照合装置は、いわば画像データ自体をキーとして、類似する画像データをデータベースから検索するもので、キーとなる画像データは、スキャナ等の画像入力手段によって読み取られ、ラスタ画像データに変換される。そして、このラスタ画像データに対し、水平方向又は垂直方向の少なくとも一方に対する投影波形を形成する。例えば読み込まれる文書が縦書きであるか横書きであるかが予めわかっており、当該方向の投影波形のみで十分個々の文書の特徴を表すことができることがわかっている場合は、一方向のみへの投影を形成するだけでも良いが、画像データの特徴が不明瞭な場合は両方向への投影波形を形成するのが好ましい。
【００２４】
ここで投影とは、ラスタ形式の画像データに対し、ある経路に沿って、当該経路上の画素の画素値を累算する処理をいい、当該経路が複数ある場合に、その累算した結果の数列が投影波形をなす。コンピュータ処理による場合には画素値の累算結果が経路ごとに得られ、この累算結果の組（累算値系列）により投影波形をディジタル表現することになる。この場合、現実的には投影波形の形成処理前に公知の傾き補正技術を利用して読み取り時の軽微な傾きを補正しておくことが好ましい。
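As a minimal sketch of the accumulation defined above, assuming the raster image is held as a list of rows of numeric pixel values and the paths are the horizontal and vertical scan lines (the function name `projection_profiles` is hypothetical and not taken from the patent):

```python
def projection_profiles(image):
    """Return (horizontal, vertical) accumulated value series of a raster image.

    image: list of rows, each row a list of numeric pixel values
           (for example 0 = white, 1 = black for a bilevel image).
    """
    # One accumulated value per horizontal path (row), top to bottom.
    horizontal = [sum(row) for row in image]
    # One accumulated value per vertical path (column), left to right.
    vertical = [sum(col) for col in zip(*image)]
    return horizontal, vertical


rows, cols = projection_profiles([[0, 1, 1],
                                  [0, 0, 1]])
```

The order of the values in each list is the order in which the paths were scanned, which is what allows two series to be compared element by element.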
【００２５】
また、データベース内の画像データに対しては、事前に投影波形を作成しておき、画像データと関連づけておくことが好ましい。ここで登録されたデータがスキャナ等によって読み込まれた画像データの場合はそのまま投影波形を生成することができるが、ワープロ等で作成したテキストデータである場合は、これをラスタライズして、画像データに変換し、その後に投影波形の生成を行なう。
【００２６】
なお、投影波形は垂直、水平方向のみならず、任意の複数経路上で画素値を累算するものであって構わないが、比較を行うために、各経路は事前に決定しておく必要があり、また累算を行う順序（各経路の順序、すなわち累算値の演算の順序）を同じにしておく必要がある。累算値系列は順列で比較されるものだからである。
【００２７】
このようにして、キーとなる画像データの投影波形とデータベース内の画像データとの投影波形が求められ、これらの投影波形の類似度を演算して、類似度の高いものを検索結果としてユーザに提示する。
【００２８】
本実施の形態において特徴的なことは、キーとなる画像データに一致する画像データがある場合でも、両画像データの縮尺が変化していたり、読み取りを行う際に異なる解像度で読み取られる可能性があり、投影波形をそのまま比較しても類似度が適切な値とならない場合に配慮して各投影波形に基づき、スケーリングのパラメータを求め、スケーリングの処理を行うことである。また、画像データの読み取りの際の都合によっては画像全体が平行移動した状態で読み取られたり、上下反転したり、縦横が置き換えられたり、傾きをもって読み取られたりする場合もある。このような場合を想定して、投影波形に対し、アフィン変換を施して比較する。これにより相似度によって比較することとなり、検索の正確性を向上できる。
【００２９】
ここでアフィン変換の対象となる投影波形は、データベース内の画像データに関するものであってもよいし、キーとなる画像データであってもよい。データベース内の画像データに対してアフィン変換を行う場合、データベース内の画像データに対し、投影波形を事前に演算してあれば、キーとなる画像データを読み取るスキャナの解像度などに合わせて、この事前に演算して得た投影波形を所定のスケール（既定スケール）にスケーリングした、いわば規格化した投影波形を用意しておくことも好ましい。
【００３０】
次に、このアフィン変換のためのパラメータ（相似変換パラメータ）を求める方法について概説する。本実施の形態では、相似変換パラメータを求めるために、各投影波形の特徴的部分を抽出し、その位置関係を比較して相似変換パラメータを得る。ここで特徴的部分としては、極値（局所的ピーク）、変曲点などの微分波形特徴量であってもよいし、最大値や、所定値となる位置、重心位置、ピークの幅など投影波形自体から得られる特徴量であってもよい。データベース内の画像データに対しては、これらの特徴量を事前に抽出し、対応する画像データに関連づけておくことも好ましい。
【００３１】
例えば各投影波形の局所的ピークが求められた場合、この局所的ピークの位置を求める。通常投影波形内に複数の局所的ピークがあるので、この位置は、位置のリストとなる。この位置のリスト（ピークリスト）が本発明の「系列位置情報」に相当するものとなる。
【００３２】
そしてキーとなる画像データの投影波形に関するピークリストと、データベース内の画像データの投影波形に関するピークリストとを比較して相似変換パラメータを求める。この比較処理においては、各ピーク位置を対応付けして相似変換パラメータを求めるのであるが、例えば図１に概略を示すように、比較される２つの投影波形にそれぞれ３つのピーク（Ａ，Ｂ，Ｃ及びα，β，γ）があり、２つの画像データが同じ画像データである場合に、Ａとα、Ｂとβとを一致させる（正しい）対応付けを行った場合と、Ａとβ、Ｂとγを一致させる（誤った）対応付けをした場合とでは次のように事情が異なる。なお、以下の説明では簡単のため、比較される両画像データの縮尺は一致しているものとする（すなわちスケーリングについては考えない）。なお、図１では説明のため、ピーク以外を平坦で示した。
【００３３】
すなわち、Ａとαとの関係により求められる平行移動量と、Ｂとβとの関係により求められる平行移動量とはほぼ一致するのに対し、Ａとβとの関係により求められる平行移動量と、Ｂとγとの関係により求められる平行移動量は一般に異なるのである。
【００３４】
そこで、すべての対応付けを総当たり的に試行し、各結果同士を比較して多数決原理により最も尤もらしい値（最尤値）を求めることにより、適切な値が求められることになる。
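The brute-force pairing and majority vote described above can be sketched for the translation-only case as follows. The peak positions are hypothetical illustration values, not the values of Figure 1, and integer positions are assumed so that equal shifts fall into the same bin:

```python
from collections import Counter


def most_likely_shift(peaks_a, peaks_b):
    """Pair every peak in A with every peak in B and vote on the shift b - a."""
    votes = Counter(b - a for a in peaks_a for b in peaks_b)
    shift, count = votes.most_common(1)[0]
    return shift, count


# Hypothetical peak lists: B is A shifted by 5.
shift, count = most_likely_shift([10, 30, 60], [15, 35, 65])
```

Correct pairings all vote for the same shift, while incorrect pairings scatter over many different values, so the most frequent shift is taken as the most likely one.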
【００３５】
スケーリング補正を要する場合、それぞれのピークリスト内のピーク位置の対応関係からは、スケーリング量ｋと平行移動量ｓとの関係が求められる。すなわち、一方の画像データに関連するピーク位置ｘiと、他方の画像データに関連するピーク位置ξiとを対応づける場合、次の（１）式の関係が得られる。
【００３６】
【数１】
$$\xi_i = k\,x_i + s \tag{1}$$

【００３７】
そして各ピーク位置の総当たり的に対応付けて、（１）式による直線をｋ−ｓ平面（（スケーリング補正量）−（平行移動量）平面）に複数描画すると、理想的には多くの直線が交差する座標が求められ、この座標値（ｋ，ｓ）として正しいスケーリング補正量が求められることとなる。ここで「理想的には」というのは、演算の誤差等の要因により、各直線から求められるｋやｓは必ずしも一致しないからである。そこで多数決原理により最尤値を選んでｋ，ｓを決定することになる。このように正しく対応付けがされた場合に、多数のパラメータの候補がある値近傍に集まることに着目し、多数決原理を利用して最尤値を求めているのである。同様にして、例えば一方の画像データの水平方向に対する投影波形のピークリストと、他方の画像データの垂直方向に対する投影波形のピークリストとが正しく対応づけられる（つまりより多数のパラメータ候補があるパラメータ値近傍に集まる）ならば９０度回転したものと判別することができ、この場合に一方のピークリストの順序を入れ替える（要するに一方の投影波形の向きを逆転する）とよりよく対応付けられるとするならば、反転しているものと判別することができるなど、回転、反転、スケーリングなどを含んだ一次変換パラメータを同様の処理によって求めることができる。なお、以下では、説明を簡約にするため、スケーリング補正と平行移動とに限って説明をする。
【００３８】
また、ここまでの説明ではキーとなる画像データの投影波形と、データベース内の画像データに対する投影波形とを比較するとしたが、それぞれの波形を微分して得た、微分波形同士を比較しても良い。キーとなる画像データとして複写文書を用いる場合、画像データに写真やドローイングのような中間調領域を含んでいると、その濃度値は大きく変化している場合が多く、その投影波高値も安定しない場合が多い。このような場合でも投影波形の凹凸の様子は多くの場合維持されているので、比較の対象として微分投影波形を用いる方が好ましい場合も多いからである。
【００３９】
さらに、ここでは投影波形からピークリストを作成するにあたり、投影波形そのものを用いているが、文字列から生成される投影波形には高周波成分が多く、また書き込みや押印などによるノイズ成分も多く含んでいる一方、相似変換パラメータの推定の際には、行のピッチや各行の長さなどの情報を用いれば十分であり、これら行のピッチなどの形状は投影波形の低域成分に反映される。そこで、ノイズ除去や局所的ピーク検出処理の安定化の観点から、投影波形の低域濾波処理を行う。さらに、低域濾波された投影波形にも微小な凹凸が残っているために、数多くの局所的ピークが検出され、演算量が大きくなるので、相似変換パラメータの推定に必要となる大局的な凹凸を表す局所的ピークの位置のみをピークリストに含めることが好ましい。
【００４０】
そこで以下、大局的な凹凸を表す局所的ピークのみを検出する方法について概説する。図２は投影波形の大局的な凹凸の様子を検出するための局所的ピーク検出法を説明する図である。図２において、ｐ（x）は低域濾波後の投影波形の位置ｘにおける波高値、αは所定のウインドウ幅である。局所的ピーク位置では投影波形の傾きがゼロになるが、これを微分投影波形のゼロ交叉点として求めるかわりに、ウインドウの両端位置での投影波高値を結ぶ直線の傾きにより求める。位置Ｘaにおけるこの直線の傾きは次の（２）式のＳ（Ｘa）に比例する。
【００４１】
【数２】

【００４２】
このＳ（Ｘa）の正負が変化する点が局所的ピーク位置である。ここで当該点が正から負へ変化する凸ピークであるか、負から正へ変化する凹ピークであるかを表す情報をピーク属性情報としてピークリストとともに取得しておくことが好ましい。このように注目位置から一定の距離αだけ離れた点での投影波高値を参照することにより、投影波形の微小な凹凸に影響されることなく、投影波形の大局的な凹凸の様子を表すピーク候補を検出することができる。
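A minimal sketch of this window-based peak candidate detection follows. Equation (2) is not reproduced in this text, so the sketch assumes S(x) = p(x + a) - p(x - a) for an integer half-window `a`, a sign convention chosen so that a positive-to-negative change marks a convex peak as stated above. The function name and the midpoint rule for placing the peak between the two samples of opposite sign are also assumptions:

```python
def detect_peak_candidates(p, a):
    """Return [(position, 'convex' | 'concave')] from sign changes of S.

    p: low-pass filtered accumulated value series (list of numbers)
    a: half-window in samples (assumed form: S(i) = p[i + a] - p[i - a])
    """
    candidates = []
    last_i, last_sign = None, 0
    for i in range(a, len(p) - a):
        s = p[i + a] - p[i - a]
        sign = (s > 0) - (s < 0)
        if sign == 0:
            continue  # flat window: wait for the next non-zero slope
        if last_sign != 0 and sign != last_sign:
            # Place the peak midway between the two samples of opposite sign.
            position = (last_i + i) // 2
            kind = "convex" if last_sign > 0 else "concave"
            candidates.append((position, kind))
        last_i, last_sign = i, sign
    return candidates


wave = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
peaks = detect_peak_candidates(wave, 1)
```

A larger `a` ignores smaller ripples, at the cost of losing peaks that lie closer than `a` samples to either end of the series.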
【００４３】
そしてこれらピーク候補から、相似変換パラメータの検出に有効なピークのみを選別する。相似変換パラメータは投影波形の大局的な凹凸を対応付けることによって求められるため、微小なピークは取り除いた方がよい。
【００４４】
図３は局所的ピークを選別する手段の動作原理を示す図である。図３では図２に示した方法により得られた局所的ピークの候補の位置（ｘ1〜ｘ5までの、計５点）が検出されている。この図３の例では、例えばｘ2やｘ3における局所的ピークが微小な凹凸として除外したいピーク候補である。
【００４５】
微小な局所的ピークを除外するために、本実施の形態では、以下のような処理を行なう。すなわち、ある局所的ピーク位置Ｘnについて、その位置での投影波高値と、隣接する局所的ピーク位置での波高値との差に関する量Ｄnを次の（３）式のように求める。
【００４６】
【数３】

【００４７】
また、位置ｘ1やｘ5のような片側にのみ隣接する局所的ピークを持つものに関しては、その局所的ピークの波高値との差に対する絶対値をＤnとする。
【００４８】
このようにして求めたＤnと所定のしきい値Ｔdとが次の（４）式の条件を満たす場合に、位置Ｘnにおける局所的ピークを実際に比較対象となるピークとしてピークリストに加える。
【００４９】
【数４】

【００５０】
また、隣接する局所的ピーク位置における投影波高値の差ではなく、（５）式に示すような波高値の比Ｒnを用いてピークの選別を行なってもよい。
【００５１】
【数５】

【００５２】
そして、この（５）式により求めたＲnと所定のしきい値Ｔrとが（６）式の条件を満たす場合に、位置Ｘnにおける局所的ピークをピークリストに加える。
【００５３】
【数６】

【００５４】
さらにピークリスト間の比較に補助的に資する目的で、ピーク属性情報として、投影波形の幅と重心を求めておくことも好ましい。これらの情報は変換パラメータ検出に必須のものではないが、後述するように変換パラメータの検出の高速化や信頼性の向上に役立てることができる。
【００５５】
ここでピークの幅を求める場合、投影波形にノイズが含まれておらず、また紙の地色等によるオフセット分も含まれていない場合は、投影波形の非ゼロ区間の幅を以って投影波形の幅とすれば良い。しかし、このような理想的な例は極めて希であるため、実際には投影幅を求めるためには次のような処理を行なう必要がある。
【００５６】
図４は投影波形の幅を検出するための処理を説明する図である。実際の投影波形には図４のＡ，Ｂの領域のようにノイズや紙の地色等によるオフセット分（地色により画素が部分的に「白」でなくなっているために発生する非ゼロ領域）が含まれている。そこでこれらの成分を除去するためにしきい値処理を行なう。ただし、しきい値付近の波高値が多く存在する場合は、しきい値を越えるか否かの微妙な違いが検出される投影波形の幅に大きく影響するため、図４の例では投影波形に対して２つの相異なるしきい値、ＴhとＴlを設定する。これらのうち低い方のしきい値Ｔlは、ノイズやオフセット分のほとんどがこの値を下回るように設定され、高い方のしきい値Ｔhは本来の投影波形のほとんどがこの値を上回るように設定される。
【００５７】
このようなしきい値を設定した上で、投影波形の幅Ｗを次の（７）式により求める。
【００５８】
【数７】
$$W = \sum_{x \in P} w\bigl(p(x)\bigr) \tag{7}$$

【００５９】
ここで、ｗ（・）は、図５に示すように、０＜ｘ＜Ｔlであるようなｘで「０」、Ｔh＜ｘとなるようなｘで「１」となり、Ｔl＜ｘ＜Ｔhでは、「０」から「１」まで線形に変化する関数である。また、Ｐは投影波形全体を表し、従って（７）式では、投影波形全体に亘って総和することを意味する。このようにしきい値Ｔh、Ｔl間になだらかな勾配を持つしきい値処理関数を設けることにより、単一しきい値を用いた時に問題となるしきい値付近の投影波形が多く存在する場合でも、検出される投影波形の幅に大きく差が生じないようにすることができる。
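The two-threshold width measure described above can be sketched as follows, assuming equation (7) sums w(p(x)) over every position of the projection waveform, as the text indicates. The function names and the sample values are hypothetical:

```python
def soft_threshold(value, t_low, t_high):
    """w(.): 0 below t_low, 1 above t_high, linear in between."""
    if value <= t_low:
        return 0.0
    if value >= t_high:
        return 1.0
    return (value - t_low) / (t_high - t_low)


def projection_width(p, t_low, t_high):
    """Width W of a projection waveform as the sum of soft-thresholded heights."""
    return sum(soft_threshold(v, t_low, t_high) for v in p)


width = projection_width([0, 1, 4, 8, 8, 6, 2], t_low=2, t_high=6)
```

Because a sample near the thresholds contributes a fraction rather than 0 or 1, a small change in its height changes the width only slightly.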
【００６０】
一方、投影波形の重心位置Ｇは次の（８）式から求められる。
【００６１】
【数８】
$$G = \frac{\sum_{x \in P} x\,p(x)}{\sum_{x \in P} p(x)} \tag{8}$$

【００６２】
こうして求められたピークリスト及びピーク属性情報を利用して相似変換パラメータを求める。次に具体的に相似変換パラメータを求める手続きについて概説する。まずキーとなる画像データと、データベース内の比較対象となる画像データとのそれぞれのピークリストを参照し、ピーク位置の対応付けを行なう。ピーク位置の対応付けはピーク属性情報を参照して凸ピークと凹ピークについて別々に行なう。次に対応付けられた２つの局所的ピーク位置を一致させるために必要なスケーリング補正量と平行移動量が直線関係を為すことに基いて、（スケーリング補正量）−（平行移動量）平面において該直線上への投票処理を行なう。このような投票処理を全ての局所的ピーク位置の組合わせについて行なうと、（スケーリング補正量）−（平行移動量）平面上には投票値の累積値の分布が形成される。（スケーリング補正量）−（平行移動量）平面上において、この累積投票値が大きくなる点は、２つの局所的ピークリストが良く一致することを示しているため、その点の座標（スケーリング補正量，平行移動量）が求める相似変換パラメータとなる。
【００６３】
すなわち、図６に示すように、投影波形に対する局所的凸ピークリスト（Ａ1〜Ａ3）と、キーとなる画像データの投影波形から求められた局所的凸ピークリスト（Ｂ1〜Ｂ4）がある場合、上記２つの局所的ピークリストについてピーク位置の対応付けを行なう。例えば局所的ピークＡ1とＢ1の対応付けを行なう場合、両者のピーク位置が一致するために必要なスケーリング補正量をｋ、平行移動量をｓとすると、Ａ1の座標「１０」と、Ｂ1の座標「１９」とを用いて、次の（９）式が成り立つ。
【００６４】
【数９】
$$19 = 10k + s \tag{9}$$

【００６５】
この（９）式はｋとｓに関する一次式であるので、これに基づきｋ−ｓ平面、即ち（スケーリング補正量）−（平行移動量）平面において一本の直線を描画できる。（９）式の条件に拘束されるｋ，ｓを用いれば、比較に係る両画像データは、少なくとも局所的ピーク位置Ａ1とＢ1では重ね合わせることができる。そこで、補正量の候補としてこの直線上に「一票を投じる」。すなわち、この直線上の各点に投票を行う。後にこの投票数の多い点を最尤点として求めたいのである。
【００６６】
また同様にして図６に示した局所的ピークＡ1を他のＢ2〜Ｂ4にも対応付けることにより、（スケーリング補正量）−（平行移動量）平面において、Ａ1に関しては都合４本の直線上への投票を行なうことになる。一般にあるデータベース内の画像データの投影波形から求められた局所的ピークリストにおけるｉ番目のピークの位置をＡix、キーとなる画像データの投影波形に対する局所的ピークリストにおけるｊ番目のピークの位置をＢjxとすると、両者のピーク位置が一致するために必要なスケーリング補正量ｋ、および平行移動量ｓの関係は（１）式と同様に、（１０）式の関係になる。
【００６７】
【数１０】
$$B_{jx} = k\,A_{ix} + s \tag{10}$$

【００６８】
そして同様の投票処理を残りの局所的ピークＡ2およびＡ3についても行なうことにより、（スケーリング補正量）−（平行移動量）平面上には投票値の累積値の分布が形成される。
【００６９】
尚、投票値は上の例のように常に１票を投じる他にも、局所的ピークの波高値情報を反映した投票値を用いることもできる。局所的ピークについては、その波高値の大きなもの同士、あるいは小さなもの同士を対応付けるのが自然であるから、投票値も波高値に応じて変化させる。すなわち投影波形から求められた局所的ピークリストにおけるｉ番目のピークの波高値をＡis、キーとなる画像データの投影波形に対する局所的ピークリストにおけるｊ番目のピークの波高値をＢjsとして、例えば投票値Ｖを次の（１１）式のように定める。
【００７０】
【数１１】

【００７１】
なお、（１１）式の投票値を適用するためには、局所的ピークの波高値の最大値が所定の値になるようにするなどの正規化処理を前もって行なっておく必要がある。
【００７２】
このようにして（スケーリング補正量）−（平行移動量）平面上に形成された累積投票値は、２つの局所的ピークリストの一致度を反映している。従って、同平面上に形成された累積投票値のピーク点を探索すれば、その点の座標が求める（スケーリング補正量，平行移動量）を表す。この補正量を用いて一方の投影波形に補正を加えることにより、２つの投影波形が最も良く一致することになる。図６の例では（スケーリング補正量，平行移動量）＝（０．９，１０）において（累積投票値）＝３の最大累積投票値を持つことから、登録文書の投影波形を０．９倍し、さらに１０だけ平行移動することにより、２つの投影波形が最も良く一致する。
【００７３】
ところで、上記の補正量を適用すると、被照合文書側の局所的ピークＢ3のみがデータベース内の画像データに対応する局所的ピークを持たないことがわかる。このことから、局所的ピークＢ3は、文書上への書き込みや押印等のノイズによるピーク成分であることと推測される。本実施の形態の相似変換パラメータの検出手続きでは、このようなノイズによる局所的ピークの発生、あるいは検出漏れがあっても、投票処理という多数決原理に基づいて相似変換パラメータが推定されるため、推定される補正量の精度の低下を防止できる。
【００７４】
以上の説明では、（スケーリング補正量）−（平行移動量）平面上に形成された累積投票値のピーク点を探索することにより相似変換パラメータの検出を行なったが、これは「理想的」な例であり、実際には局所的ピーク検出時の誤差や、元々の文書の印刷精度などにより、投票を行なう直線群が厳密に一点で交わることは希である。そこで、相似変換パラメータを推定する手続きとして、（スケーリング補正量）−（平行移動量）平面を、図７のようにスケーリング補正量方向にｎ分割、平行移動量方向にｍ分割したセルに「量子化」して、各セルの投票値の累計を計算し、該累計が最大となるセルを求める方法も考えられる。そして当該累計が最大となったセルの中心座標（ｋc，ｓc）を以って相似変換パラメータの最尤値とする。このようにするとコンピュータ処理にも適合できる。
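A minimal sketch of the quantized voting follows. It assumes the line for a pair (a, b) is s = b - k·a, which agrees with the worked example in the text (A1 = 10, B1 = 19, result (0.9, 10)). Apart from that one pair, the peak lists are hypothetical values and not those of Figure 6, and the grid is sampled at cell centres rather than built from an explicit n × m subdivision:

```python
import math


def vote_scale_shift(ref_peaks, key_peaks, k_values, s_min, s_max, s_step=1.0):
    """Vote along s = b - k * a on a quantized (k, s) grid.

    Returns (k, s, votes) for the cell with the largest accumulated vote.
    """
    n_s = int(round((s_max - s_min) / s_step)) + 1
    acc = [[0] * n_s for _ in k_values]
    for a in ref_peaks:
        for b in key_peaks:
            for ki, k in enumerate(k_values):
                s = b - k * a
                si = math.floor((s - s_min) / s_step + 0.5)  # nearest s cell
                if 0 <= si < n_s:
                    acc[ki][si] += 1
    votes, ki, si = max(
        (acc[ki][si], ki, si)
        for ki in range(len(k_values))
        for si in range(n_s)
    )
    return k_values[ki], s_min + si * s_step, votes


ref = [10, 30, 50]        # hypothetical registered-document peaks
key = [19, 37, 46, 55]    # hypothetical key-image peaks; 46 plays the noise peak
ks = [i / 100 for i in range(50, 151, 5)]   # k cell centres 0.50 ... 1.50
best_k, best_s, best_votes = vote_scale_shift(ref, key, ks, -50.0, 50.0)
```

With these values the three consistent pairs meet in one cell, while the noise peak never collects more than two votes in any cell.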
【００７５】
さらに精度を高めるためには、図８に示すように、投票値の累計が最大となるセルを求めた後、投票を行なう直線群の中から該セルを通過する直線群のみを選び、それら直線群から最短距離に位置する点の座標を求め、当該座標を最尤値とする。このような点の座標は以下の手順により求めることができる。
【００７６】
まず投票値の累計が最大となるセルを通過する直線群がｈ本あるとして、それらが（１２）式で表されるとする。
【００７７】
【数１２】

【００７８】
ただしｉ＝０，１，…，ｈ−１とする。これらの直線の一つと（スケーリング補正量）−（平行移動量）平面上の一点（ｋ，ｓ）との距離ｄiの間には次の（１３）式の関係がある。
【００７９】
【数１３】

【００８０】
点（ｋ，ｓ）から各直線までの距離の２乗和Ｄは、次の（１４）式で示される。
【００８１】
【数１４】

【００８２】
従って、投票値の累計が最大となるセルを通過する直線群から最短距離に位置する点の座標（ｋl，ｓl）は、次の（１５）、（１６）式を連立して解くことによって求めることができる。
【００８３】
【数１５】

【００８４】
さらに、直線群から最短距離に位置する点の座標を求める際に、各直線からの距離を（１１）式に定めた投票値Ｖで重み付けしても良い。投票値Ｖによる重み付けを行なうことにより、局所的ピークの対応付けが正しく行われている可能性が高い直線を重視した補正量検出が行なわれる。この場合、（１４）式は、（１７）式のように変更される。
【００８５】
【数１６】

【００８６】
（１７）式において、Ｖiはｉ番目の直線上への投票値である。またピーク属性情報のうち、投影波形の幅と重心の情報は相似変換パラメータの推定における処理の高速化や信頼性の向上に役立てることができる。
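The refinement step can be sketched as a weighted least-squares problem. Equations (12) to (17) are not reproduced in this text, so the sketch assumes each line has the form a·k + s - b = 0 and minimizes the weighted sum of squared perpendicular distances in closed form. The function name is hypothetical:

```python
def refine_scale_shift(pairs, weights=None):
    """Point (k, s) closest, in the least-squares sense, to lines a*k + s = b.

    pairs:   list of (a, b) peak-position pairs whose lines cross the best cell
    weights: optional vote values V_i, one per line (defaults to 1)
    """
    if weights is None:
        weights = [1.0] * len(pairs)
    saa = sa = s1 = sab = sb = 0.0
    for (a, b), w in zip(pairs, weights):
        g = w / (a * a + 1.0)   # squared distance is (a*k + s - b)^2 / (a^2 + 1)
        saa += g * a * a
        sa += g * a
        s1 += g
        sab += g * a * b
        sb += g * b
    det = saa * s1 - sa * sa
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; no unique closest point")
    k = (sab * s1 - sa * sb) / det
    s = (saa * sb - sa * sab) / det
    return k, s


k_fit, s_fit = refine_scale_shift([(10, 19), (30, 37), (50, 55)])
```

Setting the two partial derivatives of the distance sum to zero gives a 2 × 2 linear system, which is solved here by Cramer's rule; an iterative solver is not needed under this assumption.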
【００８７】
すなわち、一般に投票処理によるパラメータ推定では、不適切な位置に比較的大きなピークを生じる、いわゆるフォールスピーク（false peak）問題を有している。この効果を軽減するためには、投票空間全体への投票を行なうのではなく、明らかに必要ないと思われる領域への投票を行なわないことが有効である。本実施形態においては投影波形の幅と重心の情報を用いて投票領域を限定することにより、変換パラメータ検出処理を高速化すると同時に、上記問題による効果も軽減することができる。
【００８８】
具体的に投影波形の幅を用いて、（スケーリング補正量）−（平行移動量）平面上におけるスケーリング補正量方向の投票領域を限定することができる。データベース内の画像データの投影波形幅をＷr、キーとなる画像データの投影波形幅をＷiとして次の（１８）式に示す投影幅比Ｒを求めると、相似変換パラメータにおけるスケーリング補正量はＲに近い値となるはずである。なぜなら２つの文書が同一の文書であるならば、それらの投影波形も相似形をなしているはずであり、スケーリングの相違は投影幅の比となるはずだからである。
【００８９】
【数１７】
$$R = \frac{W_i}{W_r} \tag{18}$$

【００９０】
投影波形の幅はノイズや紙の地色等によるオフセット分などに起因する誤差分を含んでいるが、それでもなお、真の補正量、すなわち投票値のピーク位置のスケーリング座標は、（１８）式のＲに近い値であるはずである。そこで（１８）式のＲを中心とした所定のスケーリングマージンβを設定し、そのスケーリングマージンβの範囲内に投票領域を限定する。この場合、（スケーリング補正量）−（平行移動量）平面上に描画した各直線のうち、当該スケーリングマージンβの範囲内に「投票した」こととし、当該範囲の外に対しては「投票」しない。
【００９１】
また、投影波形の重心位置情報からも投票領域の限定を行なうことができる。すなわち、もし特徴抽出手段により求められた重心位置が正確なものであれば、２つの投影波形の重心位置間には（１０）式と同様に次の（１９）式の関係が成立するはずである。
【００９２】
【数１８】
$$G_i = k\,G_r + s \tag{19}$$

【００９３】
ただし、（１９）式においてＧrとＧiはそれぞれ、データベース内の画像データの投影重心位置とキーとなる画像データの投影重心位置である。実際には求められた重心位置の情報にはノイズや紙の地色等によるオフセット分などに起因する誤差分を含んでいるが、それでもなお、真の補正量、すなわち投票値のピーク位置は（１９）式で表される直線の近傍に位置するはずである。そこで（１９）式の直線を中心として、適当な平行移動量マージンγを設定し、その範囲内に投票領域を限定する。
【００９４】
これにより、投票領域を図９に示した範囲に限定し、この領域外の直線上、あるいはそもそもこの領域を通過しない直線上への投票処理を省略し、処理を高速化するとともに、上記false peak問題による効果を低減できる。
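The restriction of the voting region can be sketched as a simple predicate on each (k, s) cell. The sketch assumes that the width ratio is R = W_i / W_r, that the centroid relation is G_i = k·G_r + s, and that the centroid is the first moment of the waveform; none of these equations is reproduced in this text, and all names are hypothetical:

```python
def centroid(p):
    """Centroid of a projection waveform (assumed first-moment form)."""
    total = sum(p)
    return sum(x * v for x, v in enumerate(p)) / total


def in_vote_region(k, s, w_ref, w_key, g_ref, g_key, beta, gamma):
    """True if cell (k, s) lies inside both the scaling and the centroid margins."""
    ratio = w_key / w_ref                 # expected scaling from the width ratio
    if abs(k - ratio) > beta:
        return False
    s_expected = g_key - k * g_ref        # expected shift from the centroid line
    return abs(s - s_expected) <= gamma


inside = in_vote_region(0.9, 10.0, w_ref=100.0, w_key=90.0,
                        g_ref=50.0, g_key=55.0, beta=0.1, gamma=5.0)
```

Cells for which the predicate is false are skipped during voting, which reduces the work and keeps false peaks outside the plausible region from being selected.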
【００９５】
なお、こうして求められた相似変換パラメータに基いて、投影波形の相似変換処理を行なうにあたり、非サンプリング位置での信号を要する場合など、補間処理を要するときには、広く知られている補間法として最近傍法、線形補間法、cubic convolution（３次畳み込み）法、スプライン補間法など様々なものを利用できる。本実施の形態の装置は特に特定の補間法に依存するものではないので、上に挙げた補間法のいずれを用いてもよい。上記の補間法は多くの文献で解説されているため、その説明を省略する。
【００９６】
次に、上述の技法により推定された相似変換パラメータを用いて相似変換した後の投影波形を比較する方法について概説する。理想的には、一方の投影波形を相似変換し、他方の投影波形と比較したとき、両画像データが一致する場合には、これらの投影波形は厳密に一致する。しかし、比較に係る両画像データが同一のものであっても、スキャナの読み取り精度や、キーとなる画像データに押印や書き込みが行われている場合、さらには相似変換パラメータの算出誤差など、様々な理由により両投影波形が厳密に一致することは少ない。また、キーとなる画像データとして複写した後の画像データを用いる場合、文書上にある写真やドローイングのような中間調領域の濃度値は大きく変化している場合が多く、その投影波高値も安定しない場合が多い。従ってノイズ等による投影波形の変形や相似変換パラメータの誤差、文書上の中間調領域に起因する投影波高値の変化などに対しても堅牢な方法で類似度の測定を行なう必要がある。
【００９７】
このような投影波形の照合方法として、本実施の形態では投影波形の微分波形間の相関係数を用いる。これは文書上にある中間調領域によって投影波形の一部にオフセットが加えられていても、その微分波形はほとんど影響を受けることがないことに着目したものである。投影波形の微分波形を生成する方法としては、本来の微分波形を用いる以外にも（２）式で示したＳ（ｘ）を用いることもできる。（２）式のＳ（ｘ）は投影波形上の微小な凹凸の影響を受けにくいため、投影波形照合にＳ（ｘ）を用いると良好な結果が得られることになる。
【００９８】
データベース内の画像データに対し、相似変換を行い、その相似変換後の微分投影波形をｑr（ｘ）、キーとなる画像データの微分投影波形をｑi（ｘ）とすると、両者の相関係数ｒは次の（２０）式のように求められる。
【００９９】
【数１９】

【０１００】
また、Ｌは波形の長さを表す。
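A minimal sketch of the comparison of differentiated waveforms follows. Equation (20) is not reproduced in this text, so the standard Pearson correlation coefficient over the common length L is assumed, with first differences standing in for the derivative. The function names are hypothetical:

```python
import math


def differences(p):
    """First differences of an accumulated value series."""
    return [b - a for a, b in zip(p, p[1:])]


def correlation(q_ref, q_key):
    """Pearson correlation of two difference sequences over their common length."""
    length = min(len(q_ref), len(q_key))
    x, y = q_ref[:length], q_key[:length]
    mean_x, mean_y = sum(x) / length, sum(y) / length
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0   # a flat sequence carries no shape information
    return cov / math.sqrt(var_x * var_y)


registered = [0, 3, 1, 4, 2, 5]
scanned = [v + 7 for v in registered]      # same shape with a constant offset
r = correlation(differences(registered), differences(scanned))
```

The constant offset vanishes when the differences are taken, which illustrates why an offset from a halftone region has little effect on the measured similarity.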
【０１０１】
このようにして水平・垂直の両方向に投影波形を形成した場合、両方向の投影波形に対する相関係数がそれぞれ求められる。出力する画像データの類似度の情報としては、これら２つの相関係数を加味したものとしても良いし、どちらか一方の方向に対して照合処理を行なうだけで十分であることがわかっている場合は、その方向の相関係数を類似度として扱ってもよい。
【０１０２】
両方向の相関係数を加味して画像データの類似度を求める例としては、水平・垂直方向について求められた相関係数の和を類似度として出力するものがある。また別の例として、２方向の相関係数の荷重和を類似度とする方法もある。この場合、水平方向の相関係数をｒh、垂直方向の相関係数をｒvとし、水平方向の相関係数の重みをｗh、垂直方向の相関係数の重みを１−ｗhとすれば、出力すべき類似度は次の（２１）式のように表される。
【０１０３】
【数２０】
$$\mathrm{Score} = w_h\,r_h + (1 - w_h)\,r_v \tag{21}$$

【０１０４】
どちらかの方向の相関係数が文書画像の違いをより強く反映している場合には、（２１）式によってその方向により大きな重み付けをして類似度の計算を行なうことにより、照合精度を向上させることができる。
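The weighted combination can be sketched in a few lines, assuming the form Score = w_h·r_h + (1 - w_h)·r_v implied by the weights defined in the text. The function name and the sample values are hypothetical:

```python
def combined_score(r_h, r_v, w_h=0.5):
    """Weighted sum of the horizontal and vertical correlation coefficients."""
    if not 0.0 <= w_h <= 1.0:
        raise ValueError("w_h must lie in [0, 1]")
    return w_h * r_h + (1.0 - w_h) * r_v


score = combined_score(r_h=0.8, r_v=0.4, w_h=0.75)
```

Setting `w_h` to 1 or 0 reduces the score to a single direction, which corresponds to the one-direction case described in the text.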
【０１０５】
さらに、相似変換パラメータの推定により得られた補正量を類似度に反映させることもできる。これは文書のマージン幅やスケールが大きく異なる場合に別文書として扱いたい場合などに適する。この場合、水平・垂直方向の相関係数やスケーリング補正量、平行移動量などの荷重和により定義される合成変量Ｚを作り、大量の文書サンプルを用いた判別分析により各重みを調整することで、各文書を判別する判別関数を求めることができる。判別分析については多くの文献に解説されているため、詳細は割愛する。
【０１０６】
一方、両方向の相関係数を用いることなく、どちらか一方向の相関係数を類似度として扱う場合について説明する。一般に横書きの文書の場合、横方向の投影には明確な凹凸が現れるが、縦方向の投影には明確な凹凸が生じない場合が多い。このような場合、縦方向の投影波形の照合結果からは文書を特定するための有益な情報は得られないため、対象となる文書が縦書きであるか横書きであるかが予めわかっている場合には、一方向のみの相関係数を用いて類似度の測定を行なった方が好ましい。また、水平・垂直方向の局所的ピークリストに含まれるピーク数の多い方向のみを類似度測定に使用するのも有効な方法である。
【０１０７】
これによりキーとなった画像データに対し、最も高い類似度を示す、データベース内に登録された画像データがキーとなった画像データと同一の文書であると判定されることになる。
【０１０８】
なお、ここまでの説明では、相似変換パラメータの推定時に（スケーリング補正量）−（平行移動量）平面上に形成された累積投票値が最大値を取るピーク点を探索し、そのピーク点の座標を（スケーリング補正量，平行移動量）とする唯一の補正量として用いてきた。しかし、上記平面上では大きな累積投票値を持つ点が複数点形成されることも希ではなく、局所的ピーク検出の過程で多少の誤りが含まれている場合は、累積投票値において２位以下のピーク点に対応する補正量が正しい補正量となる場合もある。このような場合を想定して、上記平面に形成された累積投票値の中で、所定複数のピーク点を求め、それぞれのピーク点に対応する複数通りの補正量で投影波形の変換処理を行なう方法も考えられる。そして、上記複数通りの補正量を用いて変換処理を行なった投影波形を用いて複数通りの類似度を測定し、その最大類似度を最終的な類似度とすることで、より確実な文書画像照合処理を行なうことが可能となる。
【０１０９】
また、（スケーリング補正量）−（平行移動量）平面上に形成された累積投票値が２つの投影波形の一致度を反映しているという点に着目すれば、この投票値自体を２つの投影波形の類似度と見なすことができる場合もある。もしも局所的ピークの誤検出、あるいは検出漏れが無いことが保証できる場合は、上記投票値自体を類似度として使用し、実際の投影波形を相似変換する必要がなく、また変換後の投影波形などを比較する処理もする必要がない。
【０１１０】
以上に、本実施の形態の画像照合装置の動作原理を述べたので、以下、本発明の実施の形態に係る画像照合装置の実装例について説明する。本実施の形態の画像照合装置１は、一般的なパーソナルコンピュータを利用して実現でき、図１０に示すように制御部１１と、記憶部１２と、ストレージ１３と、外部記憶部１４と、スキャナ部１５と、表示部１６とから基本的に構成されている。
【０１１１】
ストレージ１３は、制御部１１によって実行されるオペレーティングシステム等の基本プログラムのほか、画像照合プログラムがインストールされたプログラム格納領域１３ａと、比較対象となる画像データが蓄積したデータ格納領域１３ｂとを含んでいる。
【０１１２】
制御部１１は、ストレージ１３に格納されているプログラムを実行して各種処理を行う。本実施の形態においては、この制御部１１が、画像照合プログラムに従い、本発明の演算手段、及び画像データの照合処理を行う手段を実現している。すなわち、この制御部１１は、キーとなる画像データがスキャナ部１５から入力されると、これを処理対象の画像データとして記憶部１２に格納し、事前に設定された複数の経路に沿って画素値を累算し、演算した経路の順に累算値系列として記憶部１２に格納する処理（演算処理）を実行する。また、制御部１１は、この累算値系列を利用してストレージ１３に蓄積されている画像データと、処理対象の画像データとを照合する処理を実行する。これらの処理については、後により詳しく述べる。
【０１１３】
記憶部１２は、制御部１１のワークメモリとして動作する。外部記憶部１４は、画像照合プログラムなどを格納したコンピュータ読み取り可能な記録媒体（例えばＣＤ−ＲＯＭやＤＶＤ−ＲＯＭなど磁気的乃至光学的に情報を記録したもの）から当該画像照合プログラムなどを読み取って制御部１１に出力し、これによりストレージ１３にインストールさせる処理を行う。
【０１１４】
スキャナ部１５は、光学的に紙媒体の原稿を読み取り、これを画像データに変換して制御部１１に出力する。表示部１６は、制御部１１から入力される指示に従い、種々の情報を表示する。
【０１１５】
ここで制御部１１の画像照合プログラムの処理についてより詳しく説明する。なお、ストレージ１３に蓄積されている画像データについては、予め累算値系列の情報及びピーク位置テーブル（後にその演算方法を説明する）が関連づけられて格納されているものとする。また、以下の説明では累算値の演算の経路は水平方向及び垂直方向であるとし、順序は水平方向については紙面上側から下側へ、垂直方向については左側から右側へ走査した順序であるとする（図１１）。図１１では実線が演算経路を示し、破線が経路の演算順序を示すために用いられている。
【０１１６】
制御部１１は、スキャナ部１５からキーとなる画像データ（処理対象画像データ）の入力を受けてこれを記憶部１２に格納して図１２に示す処理を開始し、まず当該処理対象の画像データに対し図１１（ａ）に示すような水平方向に延びる複数の経路に沿って、当該経路上の画素値を累算し、各経路での累算結果を演算順に記憶部１２に格納する（Ｓ１）。同様に図１１（ｂ）に示したような垂直方向に延びる複数の経路に沿って、当該経路上の画素値を累算し、各経路での累算結果を演算順に記憶部１２に格納する（Ｓ２）。これら処理Ｓ１及びＳ２で累算した結果が本発明の累算値系列となる。ここでは第ｉ番目の演算順序の経路に対する累算値をｐiと表すこととする。
【０１１７】
この累算値を演算するにあたり、白黒画像データであれば、白の点の画素値を「０」、黒の点の画素値を「１」として累算すればよいし、中間調を含む多値画像データであれば、例えば白を「０」、黒を「２５５」等とすればよい。さらにカラー画像データの場合は、その濃度成分を分離して、該濃度画像を上記多値画像と見なして投影値を計算する。濃度成分の分離は、広く知られたカラー値の変換処理により行うことができる。
【０１１８】
制御部１１は、この累算値系列の情報に対し、その高周波成分を除去する低域濾波の処理を行って記憶部１２に格納する（Ｓ３）。この処理は、例えば隣接する経路の累算値の平均値を演算するなどの公知の処理によって行うことができる。そして低域濾波後の累算値系列について、所定のａ（ａは整数）の値を用い、（２）式に対応するＳi＝（ｐi-a）−（ｐi+a）についてｉを順次変化させながらその正負の変化点を見出し、変化点となったｉをピーク候補として取り出す。また、各変化点のｉの情報に、正から負へ変化した凸であるか、負から正へ変化した凹であるかを示すピーク属性の情報を関連づけて、記憶部１２に図１３に示すようなテーブルとして記憶する（Ｓ４）。そして（３）〜（６）式に例示したような処理により、隣接ピークとの大きさの差乃至比に基づき、顕著でないピーク候補を処理Ｓ４で生成したピーク候補のテーブルから削除し、ピーク位置テーブルを生成する（Ｓ５）。
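The low-pass step of S3 can be sketched as a moving average over neighbouring paths, which is one of the known filters the text allows. The window radius and the truncation of the window at both ends of the series are assumptions, and the function name is hypothetical:

```python
def low_pass(p, radius=1):
    """Moving-average filter over adjacent accumulated values."""
    smoothed = []
    for i in range(len(p)):
        # Truncate the window at the ends instead of padding (an assumption).
        window = p[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


smoothed = low_pass([0, 3, 6, 3, 0])
```

The filtered series is what the sign-change test of S4 would then be applied to.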
【０１１９】
また、ここで制御部１１は、累算値系列の幅と重心とを（７）及び（８）式により求めて、ピーク位置テーブルに関連づけて記憶部１２に格納する（Ｓ６）。この処理については、既に概説したものを離散的に扱うことによりコンピュータ処理として実現できるので、詳しい説明を省略する。
【０１２０】
ここまでの処理により、記憶部１２には、ピーク位置テーブルとして上記の局所的ピークリスト、すなわち本発明の系列位置情報が格納されているようになる。
【０１２１】
次に、制御部１１はストレージ１３から画像データを順次取り出して（Ｓ７）、記憶部１２上に所定のｎ×ｍの配列を記憶する領域を確保して、この領域の値を「０」に初期化し（Ｓ８）、当該取り出した画像データに関連するピーク位置テーブルと、処理Ｓ６にて記憶部１２に格納したピーク位置テーブルとを比較して、各ピーク位置の対応付け処理を実行する（Ｓ９）。この処理は、各ピーク位置テーブル上のピークについてその第１番目と第ｊ番目、第２番目と第ｊ＋１番目、…といったような対応付けをｊを１から順次設定することによって行う。この処理Ｓ９ではそのうちの対応付けの候補の一つを設定する。
【０１２２】
そして制御部１１は、処理Ｓ９で設定された対応付けの結果に基づき、（１）式（や（９）式）に基づきスケーリング補正量ｋと、平行移動量ｓとの間の関係式を見出して、処理Ｓ９にて確保した記憶領域上に「描画」する（Ｓ１０）。すなわち、スケーリング補正量の最大値がＫ、平行移動量の最大値がＳである場合、（ｘ，ｙ）の位置にある記憶領域がＫ・ｘ／ｎ、Ｓ・ｙ／ｍの値に対応しているから、この記憶領域が（Ｋ・ｘ／ｎ）×ｋ＋ｓ＝（Ｓ・ｙ／ｍ）の関係を満たす場合に、この記憶領域内の値をインクリメント（投票）することになる。これが上記の投票処理に相当する。またここではインクリメントするだけでなく、（１１）式により演算される投票値Ｖを当該記憶領域の値に加算（投票）して上書きしてもよい。
【０１２３】
制御部１１は、さらに処理Ｓ８で設定されていない他の組み合わせがあるか否かを判断し（Ｓ１１）、他の組み合わせがあれば（Ｙｅｓならば）処理Ｓ９に戻って処理を続ける。また、他の組み合わせがなければ（Ｎｏならば）、このｎ×ｍの記憶領域を参照し、値が最も大きい記憶領域上の位置（ｘmax，ｙmax）を検索する（Ｓ１２）。そして、当該（ｘmax，ｙmax）に基づき、スケーリング補正量ｋ、及び平行移動量ｓを決定し、処理Ｓ７で取り出した画像データについて補正処理を行う（Ｓ１３）。補正後の累算値系列λｊ（０≦ｊ＜ｎｋ）は、ｎ個の経路について得られた累算値系列ｐｉ（０≦ｉ＜ｎ）に対し、λｊ＝ｐｌとして求められる。ここでｌは、ｌ＝Ｒｏｕｎｄ（（ｊ−ｓ）／ｋ）となる整数である。Ｒｏｕｎｄ（ｘ）はｘを四捨五入した整数を表すものとする。また、ｌ＜０、またはｌ≧ｎとなるｊについては、λｊ＝０とする。
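The correction of S13 can be sketched as nearest-neighbour resampling, following the rule λj = p_l with l = Round((j - s) / k) and zero outside the source range. The function name is hypothetical, and the output length is taken as the integer part of n·k:

```python
import math


def apply_scale_shift(p, k, s):
    """Resample series p with scaling k and shift s (nearest neighbour)."""
    n = len(p)
    corrected = []
    for j in range(int(n * k)):
        src = math.floor((j - s) / k + 0.5)   # round half up, as in the text
        corrected.append(p[src] if 0 <= src < n else 0)
    return corrected


stretched = apply_scale_shift([0, 1, 2, 3], k=2, s=0)
shifted = apply_scale_shift([5, 6, 7], k=1, s=1)
```

`math.floor(x + 0.5)` is used instead of Python's built-in `round`, which rounds halves to the nearest even integer and would not match the rounding the text describes.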
【０１２４】
また、位置（ｘmax，ｙmax）に基づきスケーリング補正量ｋ及び平行移動量ｓを決定する場合には、その中心座標により、ｋ＝（Ｋ／ｎ）×（ｘmax＋１／２）、ｓ＝（Ｓ／ｍ）×（ｙmax＋１／２）としてもよいし、（１５）式、（１６）式を連立して解く方法によってもよい。なお、後者の場合、当該偏微分方程式を解く方法としては、ニュートン法のような数値解法を利用してもよいし、（１３）式や（１４）式を代入して事前に解析的な解を求めておくこととしてもよい。さらに、明らかに誤りとなる記憶領域上の位置（ｘ，ｙ）について、処理Ｓ１２に先だって当該記憶領域上の位置（ｘ，ｙ）に「０」を設定することも好適である。明らかに誤りであるか否かは、（１８）式に対応した累算値系列内の累算値の個数（系列長）の比によるスケーリング補正量ｋ方向への領域制限、（１９）式により求められる重心位置に基づく領域制限を利用することができる。またここでは処理Ｓ１３の補正処理を処理Ｓ７で取り出した画像データに対して行っているが、キーとなる画像データに対して行ってもよい。この場合処理Ｓ３１のときとはスケーリング補正量は上記ｋの逆数となり、平行移動量は、上記ｓと符号が反対のものとなる。
【０１２５】
そして、この新たな累算値系列λiについて、処理Ｓ１，Ｓ２で演算したキーとなる画像データに対する累算値系列との比較を行う、照合処理を実行する（Ｓ１４）。制御部１１は、当該照合結果を処理Ｓ７で取り出した画像データを識別する情報（例えば当該画像データのファイル名など）に関連づけて記憶部１２に格納する（Ｓ１５）。制御部１１は、まだ取り出していない画像データがあるか否かを調べ（Ｓ１６）、取り出していない画像データがあれば（Ｙｅｓならば）、処理Ｓ６に戻って次の画像データに対して処理Ｓ７〜処理１６を行い（図１２の（Ａ））、取り出していない画像データがなければ（すべての画像データに対して処理を行ったならば；Ｎｏならば）、記憶部１２に格納された照合処理の結果を表示部１６に表示するよう指示して（Ｓ１７）、処理を終了する。
【０１２６】
なお、処理Ｓ１７においては、照合処理の結果、一致度の高い順にソートした画像データの一覧を表示することが好ましい。また、各一覧には、画像データのファイル名を含め、画像データをプレビューしたり、取り出して外部記憶部１４に挿入された媒体にコピーを保存できるようにしておくことが好適である。
【０１２７】
また、制御部１１による照合処理（処理Ｓ１４，Ｓ１５）では、（２０）式で示したような相関係数を演算することによって求めることができる。つまり、水平、垂直の各方向に対し、次の演算を行う。すなわち、ηjについてその差分数列（隣接累算値間の差分からなる系列）に相当するｑr（j）と、キーとなる画像データについての累算値系列ｐiについてその差分数列に相当するｑi（i）を生成し、これについて系列長Ｌを用いて、（２０）式を演算する。こうして水平、垂直の各方向についての相関係数ｒh及びｒvが演算され、これらを利用して類似度Scoreをｒh＋ｒvとして、又は（２１）式のようにして求め、この類似度を照合処理の結果として処理Ｓ１５にて記憶部１２に格納することになる。
【０１２８】
本実施の形態の装置によると、制御部１１が、スキャナ部１５から入力される画像データを処理対象として所定の複数の経路に順次沿って画素値を累算し、当該累算（演算）順に累算値を配列した累算値系列を用いてストレージ１３に蓄積された画像データとの照合を行い、その結果を表示部１６に出力する。従って、これを利用すると、回覧された文書を再利用したい場合、これに書き込みや押印があっても、そのままスキャナ部１５で文書をスキャンして照合を行わせる。制御部１１の処理により、ストレージ１３に蓄積された画像データとの照合が行われて、その結果が表示部１６に表示され、利用者はこの結果を参照して示された文書をストレージ１３から取り出すことができる。
【０１２９】
【発明の効果】
本発明によると、処理対象となった画像データに対し、複数の所定経路の各々について、経路上の画素値を累算し、各経路上の画素値を累算した結果を経路の演算順に含んでなる累算値系列を用い、複数の画像データ間の照合処理を行う画像照合装置としているので、画像データそのものを用い、その累算値系列というような特徴的な量を比較することで照合を実行することにより、書き込みや押印等のノイズがあっても安定した結果を得ることができ、ＯＣＲ等テキストデータを抽出する必要がなく、テキストのみからなる文書データ同士をも照合することができる。
【図面の簡単な説明】
【図１】ピーク位置の対応付けの概要に関する説明図である。
【図２】投影波形の大局的な凹凸の様子を検出するための局所的ピーク検出方法を表す説明図である。
【図３】局所的ピークを選別する手段の動作原理を示す説明図である。
【図４】投影波形の幅を検出するための処理を表す説明図である。
【図５】投影波形の幅を求めるための関数を表す説明図である。
【図６】ピークリストの一例を表す説明図である。
【図７】相似変換パラメータを推定する手続きの概要を表す説明図である。
【図８】相似変換パラメータを推定する手続きの概要を表す説明図である。
【図９】投票領域の制限の一例を表す説明図である。
【図１０】本発明の実施の形態に係る画像照合装置の構成ブロック図である。
【図１１】累算値系列の演算方向を表す説明図である。
【図１２】画像照合装置の処理の一例を表すフローチャート図である。
【図１３】ピークリストとしてのピーク位置テーブルの内容の例を表す説明図である。
【符号の説明】
１画像照合装置、１１制御部、１２記憶部、１３ストレージ、１４外部記憶部、１５スキャナ部、１６表示部。
[0001]
[Technical field of the invention]
The present invention relates to an image collation apparatus that collates items of image data with one another.
[0002]
[Prior art]
In recent years, storage capacity and network speed have increased, and as a result document management systems, in which document data and image data are registered in a database and shared among multiple users, are becoming widespread. With such a system, one user can browse and reuse document data and image data created by another user.
[0003]
To operate a document management system efficiently, the search function needs to be good enough that the document data to be browsed or reused can be found quickly among a large amount of accumulated data. In conventional document management systems, however, the registrant or the database administrator has to register each item of document data or image data with search in mind, for example by associating keywords or predetermined classification codes with it, which places a heavy burden on the administrator and others.
[0004]
Moreover, when a huge amount of document data has already been accumulated, or when the data must be registered again, for example because the database system is being replaced, entering keywords again for all of the document data and image data is not realistic.
[0005]
For document data, registration in the database can be automated with techniques in which the system extracts keywords from the document data, or with so-called full-text search. These techniques, however, cannot be applied directly to document image data obtained by reading a paper document with a scanner or the like. For such image data, OCR (Optical Character Recognition) is used to convert the text portions of the image into text data, and full-text search and similar processing are then performed on that text data.
[0006]
[Problems to be solved by the invention]
However, when image data is read by OCR as in the conventional document management systems described above, character recognition accuracy drops sharply if the paper document carries noise such as handwritten notes or seal impressions, or if the document layout is complex, and the result is not practical for uses such as full-text search.
[0007]
Concretely, in ordinary business activity a person may want to create a new document by reusing material (a graph, for example) from a paper document that has been circulated within the company. If no handwritten notes or seal impressions had been added during circulation, the desired portion could easily be captured as data with a scanner. In practice such marks are usually present and often make capture difficult. This creates a demand for retrieving the original image data from the database of the document management system. The conventional document management systems described above, however, rely on full-text search and similar techniques and therefore presuppose the existence of text data. Even where text data exists, if the original is image data with a complex layout (layouts containing graphs and the like are extremely complex), OCR accuracy falls as described above and the text is not extracted correctly. The text data used for search is then wrong, and the document is missed by a full-text search.
[0008]
Techniques for searching for desired image data by comparing items of image data with one another (similar-image retrieval) are introduced in, for example, "Computer Image Processing: Applied Practice 3, Soken Publishing, pp. 187-227, 1992". These techniques suit image data with a high degree of freedom, such as natural images, but have difficulty distinguishing documents that are structurally very similar. Conversely, raising their discriminating power is known to greatly impair their tolerance to noise such as handwritten notes and seal impressions on the document.
[0009]
Japanese Laid-Open Patent Publication No. 2000-112995 discloses a technique for collating image data using only information such as ruled lines and line drawings. This technique, however, cannot collate documents composed only of text. Japanese Laid-Open Patent Publication No. 7-282088 discloses a technique for collating image data containing text by matching word-length patterns. That technique suits languages in which words are separated, such as English, but Japanese, in which words are not necessarily separated, requires OCR, so the technique cannot be applied when text data cannot be extracted correctly.
[0010]
The present invention has been made in view of the above circumstances. Its object is to provide an image collation apparatus that gives stable results even in the presence of noise such as handwritten notes and seal impressions, that does not need to extract text data by OCR or the like, and that can also collate documents consisting only of text.
[0011]
[Means for Solving the Problems]
To solve the problems of the conventional examples described above, the present invention provides an image collation apparatus comprising: calculation means for accumulating, for image data to be processed, the pixel values along each of a plurality of predetermined paths; and means for collating a plurality of items of image data using an accumulated value series that contains the accumulated results for the paths in the order in which the paths were calculated.
[0012]
Preferably, a plurality of the paths are arranged along each of two mutually orthogonal axis directions, an accumulated value series is calculated for each axis direction, and the collating means performs collation by comparing the accumulated value series for each axis direction. It is also preferable that the collating means perform collation based on the geometric similarity between the accumulated value series of the items of image data being collated.
[0013]
To detect this geometric similarity, the present invention generates, for each item of image data being collated, sequence position information representing the positions in its accumulated value series at which the change in the accumulated value is characteristic; compares the sequence position information of the accumulated value series; calculates a similarity transformation parameter that includes at least a translation amount; and uses that parameter to detect the similarity of the accumulated value series. The similarity transformation parameter further includes a linear transformation parameter. In other words, the similarity transformation parameter consists of the linear transformation coefficient and the translation amount used in an affine transformation. The sequence position information is preferably a list of the peak positions of the accumulated values.
[0014]
To determine the similarity transformation parameter, it is preferable to refer to the peak position lists of different items of image data among those being collated, estimate a similarity transformation parameter for each combination of peak positions, generate an estimated similarity transformation parameter group containing the parameters estimated for the combinations, and determine as the similarity transformation parameter the value with the highest likelihood based on the estimated parameters in that group.
[0015]
That is, in order to search for original image data, the image collating apparatus according to the present invention receives an image to be searched for as input, forms a projection waveform of the input image in at least one of the horizontal and vertical directions, and extracts a feature amount from the horizontal or vertical projection waveform. A similarity transformation is applied to at least one of the projection waveform of the input image and the projection waveform of a reference image to be collated (one of a group of image data that are candidates for the original). The transformation parameter to be applied is obtained from the correspondence between the feature amount extracted from the projection waveform of the input image and the feature amount extracted from the projection waveform of the reference image, and the similarity between the projection waveform of the input image (or its transformed waveform) and the projection waveform of the reference image (or its transformed waveform) is measured.
[0016]
Specifically, the extracted feature amount is a local peak of the projection waveform. Of all the local peaks detected in the projection waveform, only the positions of those satisfying a predetermined condition are selected and extracted as a list (the sequence position information).
[0017]
The transformation parameter is then obtained as follows. The local peak positions in the peak list of the input image are associated with those in the peak list of the reference image to be collated, for all combinations of peak positions. To determine the parameter with the highest likelihood, each combination casts votes along a straight line in the (scaling correction amount)-(parallel movement amount) plane; the coordinates at which the accumulated vote value peaks in that plane are detected, and the scaling correction amount and parallel movement amount corresponding to those coordinates are taken as the result. This relies on the fact that the scaling correction amount and the parallel movement amount needed to make two associated peak positions coincide are linearly related.
[0018]
As a result, a projection waveform is formed for predetermined paths on the image, and collation is performed using the projection waveforms of the image data being compared. Even when noise such as handwritten notes or seal impressions is present, the overall pattern of change in the waveform (as measured by the correlation coefficient of the waveform itself or of its differential waveform) is largely preserved, so the influence of such noise components on the similarity measurement is limited. In addition, by setting the predetermined paths to the horizontal and vertical directions, a high collation capability is obtained, particularly for document images that contain many horizontal and vertical components.
[0019]
Prior to comparing the projection waveform of the input image with that of an image in the database, a similarity transformation (an affine transformation) is applied to the projection waveform so that the two waveforms match as closely as possible; misalignment between the images can therefore be tolerated. Because the similarity transformation parameters are chosen as those with high likelihood through the voting process, in other words by majority rule, correct parameters can be obtained even when noise is present.
The apparatus according to the present invention further includes means for calculating the centroid position of the accumulated value series for the image data to be processed and for each image data to be collated. Based on the centroid position of the accumulated value series of the image data to be processed and that of the image data to be collated, the means for performing the collation processing limits the estimated range in which the parallel movement amount can lie, and determines as the similarity transformation parameter a parameter that has high likelihood within that range, selected from the estimated similarity transformation parameter group.
[0020]
Also, an image matching method according to a reference example includes a step of accumulating, for image data to be processed, the pixel values on each of a plurality of predetermined paths, and a step of performing collation processing between a plurality of image data using an accumulated value series containing the results of accumulating the pixel values on the paths, arranged in the order in which the paths were computed.
[0021]
Furthermore, the present invention for solving the problems of the conventional example is an image collation program characterized by including a step of accumulating, for image data to be processed, the pixel values on each of a plurality of predetermined paths, and a step of performing collation processing between a plurality of image data using an accumulated value series containing the results of accumulating the pixel values on the paths, arranged in the order in which the paths were computed.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described with reference to the drawings. First, the principle by which the image collation apparatus according to the embodiment of the present invention performs collation between image data will be described.
[0023]
That is, the image collating apparatus according to the present embodiment retrieves similar image data from a database using image data itself as the key. The image data serving as the key is read by image input means such as a scanner and converted into raster image data. A projection waveform in at least one of the horizontal and vertical directions is formed for the raster image data. For example, if it is known in advance whether the document to be read is written vertically or horizontally, and the characteristics of individual documents are known to be sufficiently expressed by the projection waveform in that direction alone, the projection may be formed in that one direction only; when the characteristics of the image data are unclear, however, it is preferable to form projection waveforms in both directions.
[0024]
Here, projection refers to the process of accumulating, along a given path over raster-format image data, the pixel values of the pixels on that path; when there are a plurality of such paths, the series of accumulated values forms the projection waveform. In computer processing, an accumulation result is obtained for each path, and the projection waveform is represented digitally as the set of accumulation results (the accumulated value series). In practice, it is preferable to correct any slight skew introduced at reading time with a known skew correction technique before forming the projection waveform.
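The accumulation described above can be illustrated with a minimal Python sketch. It assumes the raster image is a list of rows of numeric pixel values in which 0 stands for white, and it takes the rows and the columns as the two sets of paths; the function name `projection_waveforms` is hypothetical and chosen only for this illustration.

```python
def projection_waveforms(image):
    """Accumulate pixel values along every row and every column.

    `image` is assumed to be a list of equally long rows of numeric
    pixel values (0 = white). Returns (row_sums, column_sums), each an
    accumulated value series ordered by path index.
    """
    row_sums = [sum(row) for row in image]            # one path per row
    column_sums = [sum(col) for col in zip(*image)]   # one path per column
    return row_sums, column_sums


# Tiny 2 x 3 example image (hypothetical data).
rows, cols = projection_waveforms([[0, 1, 1],
                                   [0, 0, 1]])
print(rows, cols)
```

Because the series are later compared element by element, the paths are always visited in the same order (top to bottom, left to right) in this sketch.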
[0025]
For image data in the database, it is preferable to create the projection waveform beforehand and associate it with the image data. If the data being registered is image data read by a scanner or the like, a projection waveform can be generated from it directly; if it is text data created by a word processor or the like, it is first rasterized into image data and the projection waveform is then generated.
[0026]
The projection waveform need not be limited to the vertical and horizontal directions; pixel values may be accumulated along any plurality of paths. However, the paths must be determined in advance so that waveforms can be compared, and the order in which accumulation is performed (the order of the paths, that is, the order in which the accumulated values are computed) must also be the same, because the accumulated value series are compared as ordered sequences.
[0027]
In this way, the projection waveform of the key image data and the projection waveform of the image data in the database are obtained, the similarity between these projection waveforms is calculated, and image data with high similarity is presented to the user as the search result.
[0028]
A characteristic of this embodiment is that it takes the following case into account: even when the database contains image data that matches the key image data, the two may differ in scale, or may have been read at different resolutions, so that comparing the projection waveforms as they are does not yield an appropriate similarity. Scaling parameters are therefore obtained from the projection waveforms and scaling is applied. In addition, depending on how the image data was read, the entire image may be translated, turned upside down, transposed (vertical and horizontal axes exchanged), or skewed. To handle such cases, the projection waveform is subjected to an affine transformation before comparison. The comparison is then based on similarity of shape, which improves the accuracy of the search.
[0029]
Here, the projection waveform subjected to the affine transformation may be that of the image data in the database or that of the key image data. When the affine transformation is applied on the database side and projection waveforms have been calculated for the database image data in advance, it is also preferable to prepare so-called standardized projection waveforms, obtained by scaling the precomputed projection waveforms to a predetermined scale matched to, for example, the resolution of the scanner that reads the key image data.
[0030]
Next, an outline of the method for obtaining the parameters of the affine transformation (the similarity transformation parameters) will be described. In the present embodiment, characteristic portions of each projection waveform are extracted and their positional relationships are compared to obtain the similarity transformation parameters. Here, a characteristic portion may be a feature amount of the differential waveform, such as an extreme value (local peak) or an inflection point, or a feature amount obtained from the projection waveform itself, such as the maximum value, a position at which the waveform takes a predetermined value, the position of the center of gravity, or the peak width. For image data in the database, it is also preferable to extract these feature amounts in advance and associate them with the corresponding image data.
[0031]
For example, when the local peaks of each projection waveform are used, the positions of these local peaks are obtained. Since a projection waveform usually has multiple local peaks, the result is a list of positions. This list of positions (the peak list) corresponds to the “sequence position information” of the present invention.
[0032]
Then, the peak list for the projection waveform of the key image data is compared with the peak list for the projection waveform of the image data in the database to obtain the similarity transformation parameters. In this comparison, the parameters are obtained by associating individual peak positions with each other. For example, as schematically shown in FIG. 1, suppose each waveform has three peaks (A, B, C and α, β, γ, respectively) and the two image data are the same image. The situation then differs, as described below, between the case where A is associated with α and B with β (correct) and the case where A is associated with β and B with γ (incorrect). For simplicity, the following description assumes that the two image data being compared have the same scale (that is, scaling is not considered). In FIG. 1, the portions other than the peaks are drawn flat for the sake of explanation.
[0033]
That is, the translation amount obtained from the association of A with α and that obtained from the association of B with β substantially coincide, whereas the translation amount obtained from the association of A with β and that obtained from the association of B with γ generally differ.
[0034]
Therefore, all associations are tried exhaustively, the results are compared, and the most likely value (the maximum likelihood value) is selected by majority rule to obtain an appropriate value.
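The majority rule for the translation-only case can be sketched in a few lines of Python. The peak positions below are made-up values standing in for A, B, C and α, β, γ of FIG. 1, and the function name `most_likely_shift` is hypothetical; scaling is ignored, as in the explanation above.

```python
from collections import Counter


def most_likely_shift(ref_peaks, key_peaks):
    """Try every pairing of peaks and return (shift, number_of_votes).

    Each pairing (a, b) proposes the translation b - a; the translation
    proposed most often is taken as the maximum likelihood value.
    """
    proposals = Counter(b - a for a in ref_peaks for b in key_peaks)
    return proposals.most_common(1)[0]


ref_peaks = [10, 25, 40]   # hypothetical positions of A, B, C
key_peaks = [13, 28, 43]   # hypothetical positions of alpha, beta, gamma
print(most_likely_shift(ref_peaks, key_peaks))
```

With these values the three correct pairings all propose a shift of 3, while each incorrect pairing proposes a value shared by at most one other pairing.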
[0035]
When scaling correction is also required, the relationship between the scaling amount k and the parallel movement amount s is obtained from the correspondence between peak positions in the peak lists. That is, when a peak position xi of one image data is associated with a peak position ξi of the other image data, the relationship of the following equation (1) is obtained.
[0036]
[Expression 1]

[0037]
When a straight line according to equation (1) is drawn on the k-s plane (the (scaling correction amount)-(parallel movement amount) plane) for each association of peak positions, ideally a large number of the lines intersect at a single point, and the correct scaling correction amount and parallel movement amount are obtained as the coordinates (k, s) of that point. “Ideally” is stated because, owing to factors such as calculation error, the k and s obtained from the individual lines do not necessarily coincide. Therefore, k and s are determined by selecting the maximum likelihood value by majority rule, exploiting the fact that a large number of parameter candidates cluster near a certain value when the association is correct. Similarly, if, for example, the peak list of the horizontal projection waveform of one image data is found to be correctly associated with the peak list of the vertical projection waveform of the other (that is, a parameter value attracts a larger number of candidates), the image can be judged to have been rotated by 90 degrees; and if a correct association is obtained when the order of one peak list is reversed (in other words, when the direction of one projection waveform is reversed), the image can be judged to be inverted. In this way, linear transformation parameters covering rotation, inversion, scaling, and the like can be obtained by the same processing. In the following, only scaling correction and parallel movement are described, to simplify the description.
[0038]
In the description so far, the projection waveform of the key image data is compared with the projection waveform of the image data in the database; however, the differential waveforms obtained by differentiating each waveform may be compared instead. When a copied document is used as the key image data and the image contains a halftone area such as a photograph or drawing, the density values often change greatly and the peak values of the projection are frequently unstable. Even then, the pattern of rises and falls in the projection waveform is preserved in many cases, so it is often preferable to use the differential projection waveform as the object of comparison.
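As a rough illustration of comparing differential waveforms, the following Python sketch uses a first difference as the differential waveform and a correlation coefficient as the similarity measure. Modelling the density change of a copy as a slowly increasing offset is an assumption made only for this example, and the names `differential` and `correlation` are hypothetical.

```python
import math


def differential(p):
    """First difference of a projection waveform."""
    return [b - a for a, b in zip(p, p[1:])]


def correlation(u, v):
    """Pearson correlation coefficient of two equally long series."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # a constant series carries no shape information
    return cov / math.sqrt(var_u * var_v)


original = [0, 2, 8, 3, 1, 6, 9, 2, 0]                # hypothetical waveform
copied = [v + 2 * i for i, v in enumerate(original)]  # assumed density drift

print(correlation(original, copied))
print(correlation(differential(original), differential(copied)))
```

In this toy case the drift lowers the correlation of the raw waveforms noticeably, whereas the differential waveforms differ only by a constant and remain almost perfectly correlated.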
[0039]
Furthermore, in the description above the peak list is created from the projection waveform itself. However, a projection waveform generated from character strings has many high-frequency components and also contains many noise components due to handwritten notes and seal impressions. On the other hand, information such as the pitch of the lines of text and the length of each line is sufficient for estimating the similarity transformation parameters, and such shape information is reflected in the low-frequency components of the projection waveform. The projection waveform is therefore low-pass filtered, both to remove noise and to stabilize local peak detection. Because small irregularities remain even in the low-pass filtered waveform, a large number of local peaks would be detected and the amount of computation would increase; it is therefore preferable to include in the peak list only the positions of local peaks that represent the global unevenness needed for estimating the similarity transformation parameters.
[0040]
An outline of a method for detecting only the local peaks that represent global unevenness is therefore described below. FIG. 2 is a diagram explaining a local peak detection method for detecting the global unevenness of the projection waveform. In FIG. 2, p(x) is the wave height of the low-pass filtered projection waveform at position x, and α is a predetermined window width. At a local peak position the slope of the projection waveform is zero; instead of finding this as a zero crossing of the differential projection waveform, it is found from the slope of the straight line connecting the wave heights at the two ends of the window. The slope of this line at position Xa is proportional to S(Xa) in the following equation (2).
[0041]
[Expression 2]

[0042]
The point where the sign of S(Xa) changes is a local peak position. Here, it is preferable to record, together with the peak list, peak attribute information indicating whether the point is a convex peak (sign changing from positive to negative) or a concave peak (sign changing from negative to positive). By referring in this way to the wave height at points a fixed distance α from the position of interest, peak candidates representing the global unevenness of the projection waveform can be detected without being affected by its minute unevenness.
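The sign-change test can be sketched as follows in Python. Equation (2) is not reproduced in this text, so the sketch assumes S(x) = p(x + α) − p(x − α), that is, the difference of the wave heights at a distance α on either side of x; the function name `detect_peaks` and the rule of reporting the midpoint of the sign change are likewise choices made for this illustration only.

```python
def detect_peaks(p, alpha):
    """Return [(position, kind)] with kind 'convex' or 'concave'.

    Assumes S(x) = p[x + alpha] - p[x - alpha]; a peak candidate is
    reported where the sign of S changes, at the midpoint between the
    last position with the old sign and the first with the new sign.
    """
    peaks = []
    prev_sign, prev_x = 0, None
    for x in range(alpha, len(p) - alpha):
        s_val = p[x + alpha] - p[x - alpha]
        sign = (s_val > 0) - (s_val < 0)
        if sign == 0:
            continue  # flat spot: wait for the next non-zero sign
        if prev_sign != 0 and sign != prev_sign:
            kind = "convex" if prev_sign > 0 else "concave"
            peaks.append(((prev_x + x) // 2, kind))
        prev_sign, prev_x = sign, x
    return peaks


# A bump with small ripples (hypothetical data); alpha = 3 sees only the bump.
waveform = [0, 2, 1, 3, 5, 4, 6, 4, 5, 3, 1, 2, 0]
print(detect_peaks(waveform, 3))
```

The input is assumed to be already low-pass filtered; the window only adds further insensitivity to unevenness narrower than α.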
[0043]
From these peak candidates, only the peaks that are effective for detecting the similarity transformation parameters are selected. Since the similarity transformation parameters are obtained by associating the global unevenness of the projection waveforms, minute peaks should be removed.
[0044]
FIG. 3 is a diagram showing the operating principle of the means for selecting local peaks. FIG. 3 shows the positions of the local peak candidates (five points in total, x1 to x5) detected by the method shown in FIG. 2. In this example, the local peaks at x2 and x3 are candidates to be excluded as minute irregularities.
[0045]
In this embodiment, the following processing is performed to exclude a minute local peak. That is, for a certain local peak position Xn, a quantity Dn relating to the difference between the projected peak value at that position and the peak value at the adjacent local peak position is obtained as in the following equation (3).
[0046]
[Expression 3]

[0047]
For a local peak that has an adjacent local peak on one side only, such as the one at position x1 or x5, Dn is defined as the absolute value of the difference from the peak value of that single adjacent local peak.
[0048]
When the Dn thus obtained and the predetermined threshold value Td satisfy the condition of the following equation (4), the local peak at the position Xn is actually added to the peak list as a peak for comparison.
[0049]
[Expression 4]

[0050]
Further, peak selection may be performed using a ratio Rn of peak values as shown in the equation (5) instead of the difference between the projected peak values at adjacent local peak positions.
[0051]
[Expression 5]

[0052]
Then, when Rn obtained by the equation (5) and the predetermined threshold value Tr satisfy the condition of the equation (6), a local peak at the position Xn is added to the peak list.
[0053]
[Expression 6]

[0054]
Further, to assist the comparison between peak lists, it is also preferable to obtain the width and the center of gravity of the projection waveform as peak attribute information. This information is not essential for detecting the transformation parameters, but it can be used to speed up detection and improve its reliability, as described later.
[0055]
When calculating the width here, if the projection waveform contained no noise and no offset due to the ground color of the paper, the width of the non-zero section of the projection waveform could be used as the width of the projection waveform. Such an ideal case is extremely rare, however, so in practice the following processing is required to obtain the projection width.
[0056]
FIG. 4 is a diagram illustrating the process for detecting the width of the projection waveform. As shown by regions A and B in FIG. 4, an actual projection waveform contains noise and an offset due to the paper's ground color (a non-zero region that arises because the ground color makes pixels not completely “white”). Threshold processing is therefore performed to remove these components. However, when many wave height values lie near the threshold, slight differences in whether the threshold is exceeded greatly affect the detected width of the projection waveform. In the example of FIG. 4, therefore, two different thresholds, Th and Tl, are set for the projection waveform. The lower threshold Tl is set so that most of the noise and offset fall below it, and the higher threshold Th is set so that most of the true projection waveform exceeds it.
[0057]
After setting such a threshold value, the width W of the projection waveform is obtained by the following equation (7).
[0058]
[Expression 7]

[0059]
Here, as shown in FIG. 5, w(·) is a function that is “0” for x such that 0 < x < Tl, “1” for x such that Th < x, and changes linearly from “0” to “1” for Tl < x < Th. P denotes the entire projection waveform, so equation (7) means that the summation is taken over the entire projection waveform. Providing a threshold function with a gentle slope between the thresholds Tl and Th in this way prevents large differences in the detected width even when many wave height values lie near the threshold, which is the problematic case when a single threshold is used.
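The two-threshold width measurement can be sketched in Python as below. Equation (7) is not reproduced in this text, so the sketch assumes that W is the sum of w(·) applied to every wave height in the waveform, as the description of w(·) and P suggests; the names `soft_threshold` and `projection_width` are hypothetical.

```python
def soft_threshold(value, t_low, t_high):
    """w(.): 0 below t_low, 1 above t_high, linear ramp in between."""
    if value <= t_low:
        return 0.0
    if value >= t_high:
        return 1.0
    return (value - t_low) / (t_high - t_low)


def projection_width(p, t_low, t_high):
    """Assumed form of W: sum of w(p[x]) over the whole waveform."""
    return sum(soft_threshold(v, t_low, t_high) for v in p)


# Hypothetical waveform: values 3 lie between the thresholds and count as 0.5.
print(projection_width([0, 1, 2, 3, 4, 10, 10, 3, 0], t_low=2, t_high=4))
```

A wave height that hovers around a single threshold would flip between contributing 0 and 1; with the ramp it contributes a fraction that changes only slightly.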
[0060]
On the other hand, the center-of-gravity position G of the projection waveform is obtained from the following equation (8).
[0061]
[Expression 8]

[0062]
The similarity transformation parameters are obtained using the peak list and the peak attribute information obtained in this way. The procedure is outlined next. First, peak positions are associated with each other by referring to the peak lists of the key image data and of the image data in the database to be compared. The association is made separately for convex peaks and for concave peaks, with reference to the peak attribute information. Next, based on the fact that the scaling correction amount and the parallel movement amount needed to make two associated local peak positions coincide are linearly related, votes are cast along a straight line on the (scaling correction amount)-(parallel movement amount) plane. When this voting is performed for all combinations of local peak positions, a distribution of accumulated vote values is formed on the plane. A point on the plane where the accumulated vote value is large indicates that the two local peak lists agree well there, and the coordinates (scaling correction amount, parallel movement amount) of the point where it is largest are the similarity transformation parameters to be obtained.
[0063]
That is, as shown in FIG. 6, suppose there is a list of local convex peaks (A1 to A3) for the projection waveform of image data in the database and a list of local convex peaks (B1 to B4) obtained from the projection waveform of the key image data. Peak positions are associated between the two local peak lists. For example, when the local peaks A1 and B1 are associated, let k be the scaling correction amount and s the parallel movement amount needed to make the peak positions coincide; then, using the coordinate “10” of A1 and the coordinate “19” of B1, the following equation (9) holds.
[0064]
[Expression 9]
$$19 = 10k + s$$
[0065]
Since equation (9) is linear in k and s, it can be drawn as a straight line on the k-s plane, that is, the (scaling correction amount)-(parallel movement amount) plane. If k and s satisfying equation (9) are used, the two image data being compared can be superimposed at least at the local peak positions A1 and B1. Every point on this line is therefore a candidate for the correction amount, and a vote is cast for each point on the line. The point that eventually collects the most votes is taken as the maximum likelihood point.
[0066]
Similarly, associating the local peak A1 shown in FIG. 6 with each of B2 to B4 results in votes on a total of four straight lines for A1 on the (scaling correction amount)-(parallel movement amount) plane. In general, let Aix be the position of the i-th peak in the local peak list obtained from the projection waveform of image data in the database, and Bjx the position of the j-th peak in the local peak list for the projection waveform of the key image data. The scaling correction amount k and the parallel movement amount s needed to make the two peak positions coincide then satisfy equation (10), which has the same form as equation (1).
[0067]
[Expression 10]
$$B_{jx} = k\,A_{ix} + s$$
[0068]
By performing the same voting process for the remaining local peaks A2 and A3, a distribution of cumulative values of voting values is formed on the (scaling correction amount)-(parallel movement amount) plane.
[0069]
Instead of always casting exactly one vote as in the above example, a vote value reflecting the peak values of the local peaks can also be used. It is natural for local peaks with large peak values to correspond to each other, and likewise for those with small peak values, so the vote value is varied according to the peak values. That is, letting Ais be the peak value of the i-th peak in the local peak list obtained from the projection waveform of the image data in the database and Bjs the peak value of the j-th peak in the local peak list for the projection waveform of the key image data, a vote value V is defined, for example, as in the following equation (11).
[0070]
[Expression 11]

[0071]
Note that in order to apply the vote value of equation (11), it is necessary to perform normalization processing in advance such that the maximum value of the peak value of the local peak becomes a predetermined value.
[0072]
Thus, the accumulated vote value formed on the (scaling correction amount)-(parallel movement amount) plane reflects the degree of coincidence between the two local peak lists. Therefore, by searching for the peak of the accumulated vote value on the plane, the coordinates of that point give the (scaling correction amount, parallel movement amount). Correcting one projection waveform with these amounts brings the two projection waveforms into best agreement. In the example of FIG. 6, the accumulated vote value takes its maximum of 3 at (scaling correction amount, parallel movement amount) = (0.9, 10), so multiplying the projection waveform of the registered document by 0.9 and then translating it by 10 brings the two projection waveforms into best agreement.
[0073]
Incidentally, when the above correction is applied, only the local peak B3 on the side of the document being collated has no corresponding local peak in the image data in the database. From this it can be inferred that B3 is a peak component due to noise such as handwritten notes or a seal impression on the document. Because the procedure for detecting the similarity transformation parameters in this embodiment estimates them by voting, that is, by majority rule, a loss of accuracy in the correction amounts can be prevented even when local peaks due to noise, or missed detections, are present.
[0074]
In the above description, the similarity transformation parameters are detected by searching for the peak of the accumulated vote value formed on the (scaling correction amount)-(parallel movement amount) plane. In practice, however, the straight lines voted on rarely intersect at exactly one point, because of errors in local peak detection, the printing accuracy of the original document, and so on. As a practical procedure, therefore, the plane can be quantized into cells by dividing it into n parts in the scaling correction amount direction and m parts in the parallel movement amount direction, as shown in FIG. 7; the total vote value in each cell is calculated, and the cell with the largest total is found. The center coordinates (kc, sc) of that cell are then taken as the maximum likelihood value of the similarity transformation parameters. This makes the procedure suitable for computer processing.
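A minimal Python sketch of voting on a quantized (k, s) plane follows. It assumes the peak relation B = k·A + s implied by the worked example of FIG. 6 (A1 = 10, B1 = 19, result (0.9, 10)); all other peak positions, the grid of candidate k values, the cell size, and the function name `vote_scale_shift` are hypothetical choices for this illustration. Each pairing casts one vote per k column, in the cell containing the corresponding s.

```python
from collections import defaultdict


def vote_scale_shift(ref_peaks, key_peaks, k_values, s_step):
    """Return (k, s, votes) for the cell that collects the most votes.

    A pairing (a, b) constrains the parameters to the line b = k*a + s.
    For every candidate k, the matching s is quantized to the nearest
    multiple of s_step and that cell receives one vote.
    """
    votes = defaultdict(int)
    for a in ref_peaks:
        for b in key_peaks:
            for ki, k in enumerate(k_values):
                si = round((b - k * a) / s_step)
                votes[(ki, si)] += 1
    (ki, si), count = max(votes.items(), key=lambda item: item[1])
    return k_values[ki], si * s_step, count


# Only A1 = 10 and B1 = 19 come from the text; the rest is made up.
ref_peaks = [10, 20, 30]        # A1..A3, image data in the database
key_peaks = [19, 28, 32, 37]    # B1..B4, key image; 32 plays the noise peak B3
k_values = [i / 10 for i in range(5, 16)]   # candidate scalings 0.5 .. 1.5
print(vote_scale_shift(ref_peaks, key_peaks, k_values, 1.0))
```

With these values the winning cell is k = 0.9, s = 10 with three votes, which matches the outcome described for FIG. 6; the extra key peak at 32 never gathers more than two votes in any cell.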
[0075]
To further improve accuracy, as shown in FIG. 8, after the cell with the largest total vote value has been found, only the straight lines passing through that cell are selected from among the voted lines, the coordinates of the point at the shortest distance from those lines are obtained, and these coordinates are taken as the maximum likelihood value. The coordinates of such a point can be obtained by the following procedure.
[0076]
First, suppose that h straight lines pass through the cell with the largest total vote value, and that they are expressed by equation (12).
[0077]
[Expression 12]

[0078]
Here, i = 0, 1, ..., h−1. The distance di between the i-th of these straight lines and a point (k, s) on the (scaling correction amount)-(parallel movement amount) plane satisfies the following equation (13).
[0079]
[Expression 13]

[0080]
The square sum D of the distance from the point (k, s) to each straight line is expressed by the following equation (14).
[0081]
[Expression 14]

[0082]
Therefore, the coordinates (kl, sl) of the point located at the shortest distance from the straight lines passing through the cell with the largest total vote value can be obtained by solving the following equations (15) and (16) simultaneously.
[0083]
[Expression 15]

[0084]
Furthermore, when obtaining the coordinates of the point located at the shortest distance from the straight line group, the distance from each straight line may be weighted by the vote value V defined in the equation (11). By performing weighting with the voting value V, correction amount detection is performed with emphasis on a straight line that is highly likely to be correctly associated with a local peak. In this case, equation (14) is changed to equation (17).
[0085]
[Expression 16]

[0086]
In the equation (17), Vi is a vote value on the i-th straight line. Of the peak attribute information, the width and center of gravity information of the projected waveform can be used for speeding up the processing and improving the reliability in estimating the similarity transformation parameter.
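The refinement step (finding the point closest to a set of lines, optionally weighted by vote value) can be sketched in Python as a small weighted least-squares problem. Equations (12) to (17) are not reproduced in this text, so the sketch assumes each line is written as a·k + b·s + c = 0 and minimizes the weighted sum of squared perpendicular distances; the function name `closest_point_to_lines` is hypothetical.

```python
def closest_point_to_lines(lines, weights=None):
    """Return (k, s) minimizing sum_i w_i * d_i**2.

    `lines` holds (a, b, c) with a*k + b*s + c = 0, and d_i is the
    perpendicular distance |a*k + b*s + c| / sqrt(a*a + b*b).
    Setting both partial derivatives to zero gives a 2 x 2 linear system.
    """
    if weights is None:
        weights = [1.0] * len(lines)
    skk = sks = sss = sk = ss = 0.0
    for (a, b, c), w in zip(lines, weights):
        n = w / (a * a + b * b)
        skk += n * a * a
        sks += n * a * b
        sss += n * b * b
        sk += n * a * c
        ss += n * b * c
    det = skk * sss - sks * sks
    if det == 0:
        raise ValueError("lines are parallel; no unique closest point")
    k = (ss * sks - sk * sss) / det
    s = (sk * sks - ss * skk) / det
    return k, s


# Lines B = k*A + s for the pairings (10, 19), (20, 28), (30, 37),
# rewritten as A*k + 1*s - B = 0 (positions other than 10 and 19 are made up).
lines = [(10, 1, -19), (20, 1, -28), (30, 1, -37)]
print(closest_point_to_lines(lines))
```

Passing the vote values as `weights` corresponds to the weighted variant; with equal weights the result is the plain least-squares point.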
[0087]
That is, parameter estimation by voting generally suffers from the so-called false peak problem, in which a relatively large peak arises at an incorrect position. To reduce this effect, it is effective not to vote over the entire voting space but to exclude regions that are clearly unnecessary. In the present embodiment, limiting the voting area using the width and the center of gravity of the projection waveform both speeds up the transformation parameter detection and reduces the influence of this problem.
[0088]
Specifically, the voting area in the scaling correction amount direction on the (scaling correction amount)-(parallel movement amount) plane can be limited by using the width of the projection waveform. Let Wr be the projection waveform width of the image data in the database and Wi be the projection waveform width of the key image data, and let R be the projection width ratio given by the following equation (18). The scaling correction amount in the similarity transformation parameter should then be close to R. This is because, if the two documents are the same document, their projection waveforms should be similar in shape, and the difference in scale should equal the ratio of the projection widths.
[0089]
[Expression 17]

[0090]
Although the width of the projection waveform includes an error caused by offsets due to noise, the ground color of the paper, and the like, the true correction amount, that is, the scaling coordinate of the peak position of the vote value, should still be close to R in equation (18). Therefore, a predetermined scaling margin β centered on R is set, and the voting area is limited to the range of the scaling margin β. In this case, for each straight line drawn on the (scaling correction amount)-(parallel movement amount) plane, voting is performed only within the range of the scaling margin β and is not performed outside that range.
[0091]
The voting area can also be limited using the center of gravity position of the projection waveform. That is, if the centroid positions obtained by the feature extraction means are accurate, the relationship of the following equation (19) should hold between the centroid positions of the two projection waveforms, as in equation (10).
[0092]
[Expression 18]

[0093]
In equation (19), Gr and Gi are the projection centroid position of the image data in the database and the projection centroid position of the key image data, respectively. In practice, the obtained centroid position includes an error caused by offsets due to noise, the ground color of the paper, and the like. However, the true correction amount, that is, the peak position of the vote value, should still lie in the vicinity of the straight line represented by equation (19). Therefore, an appropriate parallel movement amount margin γ is set around the straight line of equation (19), and the voting area is limited to that range.
[0094]
As a result, the voting area is limited to the range shown in FIG. 9. Voting is omitted for the portions of straight lines outside this area and for straight lines that do not pass through this area, so that the processing speed is increased and the effect of the false peak problem is reduced.
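The restriction of the voting area by the margins β and γ can be sketched as a simple membership test. The exact forms of equations (18) and (19) are assumed here for illustration: R is taken as Wi / Wr, the centroid relation is taken as Gi = k·Gr + s, and each margin is treated as a half-width. The function name `in_voting_region` and its parameter names are hypothetical.

```python
def in_voting_region(k, s, w_r, w_i, g_r, g_i, beta, gamma):
    """Return True if the cell (k, s) lies inside the limited voting area.

    w_r, w_i -- projection widths of the database image and the key image
    g_r, g_i -- projection centroids of the database image and the key image
    beta     -- scaling margin (assumed half-width around R)
    gamma    -- parallel movement margin (assumed half-width around the line)
    """
    ratio = w_i / w_r                 # assumed form of R in equation (18)
    if abs(k - ratio) > beta:
        return False                  # outside the scaling margin
    s_expected = g_i - k * g_r        # assumed form of the line of equation (19)
    return abs(s - s_expected) <= gamma


if __name__ == "__main__":
    print(in_voting_region(1.2, 10.0, 100.0, 120.0, 50.0, 70.0, 0.1, 5.0))
```

A voting loop would call such a test before incrementing a cell, so that cells outside the limited area never accumulate votes.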
[0095]
In addition, when the projection waveform is subjected to the similarity transformation based on the similarity conversion parameters thus obtained, interpolation is required when, for example, a signal value at a non-sampling position is needed. Various known methods can be used for this interpolation, such as the nearest neighbor method, the linear interpolation method, the cubic convolution method, and the spline interpolation method. Since the apparatus of the present embodiment does not depend on a specific interpolation method, any of the methods listed above may be used. These interpolation methods are explained in many publications, so their explanation is omitted.
[0096]
Next, an outline is given of the method for comparing the projection waveforms after the similarity transformation using the similarity transformation parameters estimated by the above technique. Ideally, when one projection waveform is similarity-transformed and compared with the other, the two projection waveforms match exactly if the two image data match. In practice, however, even when the image data being compared are the same, the two projection waveforms rarely match exactly because of factors such as the reading accuracy of the scanner, stamps or writing on the key image data, and calculation errors in the similarity conversion parameter. Also, when copied image data is used as the key image data, the density values of halftone areas such as photographs and drawings on the document often change greatly, and the projection peak values are often unstable. It is therefore necessary to measure the similarity robustly against deformation of the projection waveform due to noise or the like, errors in the similarity conversion parameters, and changes in the projection peak values caused by halftone areas on the document.
[0097]
As a method for collating such a projected waveform, in this embodiment, a correlation coefficient between the differential waveforms of the projected waveform is used. This is because even if an offset is added to a part of the projected waveform by the halftone area on the document, the differential waveform is hardly affected. As a method for generating the differential waveform of the projected waveform, S (x) shown in the equation (2) can be used in addition to the original differential waveform. Since S (x) in equation (2) is not easily affected by minute irregularities on the projection waveform, good results can be obtained when S (x) is used for projection waveform matching.
[0098]
Let qr (x) be the differential projection waveform obtained after the similarity transformation is applied to the image data in the database, and let qi (x) be the differential projection waveform of the key image data. The correlation coefficient between them is obtained by the following equation (20).
[0099]
[Expression 19]

[0100]
L represents the length of the waveform.
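The comparison of differential waveforms can be illustrated with the sketch below. It assumes that equation (20) is an ordinary (Pearson) correlation coefficient over the L samples and that the differential waveform is the simple difference between adjacent samples; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def difference_waveform(p):
    """Differences between adjacent samples of a projection waveform."""
    return [p[x + 1] - p[x] for x in range(len(p) - 1)]


def correlation(qr, qi):
    """Pearson correlation of two waveforms over their common length L."""
    length = min(len(qr), len(qi))
    if length == 0:
        return 0.0
    qr, qi = qr[:length], qi[:length]
    mean_r = sum(qr) / length
    mean_i = sum(qi) / length
    cov = sum((a - mean_r) * (b - mean_i) for a, b in zip(qr, qi))
    var_r = sum((a - mean_r) ** 2 for a in qr)
    var_i = sum((b - mean_i) ** 2 for b in qi)
    if var_r == 0 or var_i == 0:
        return 0.0  # a flat waveform carries no shape information
    return cov / math.sqrt(var_r * var_i)


if __name__ == "__main__":
    p = [0, 1, 5, 2, 0, 3, 7, 3, 0]
    # Add a constant offset to part of the waveform (e.g. a halftone area).
    shifted = [v + 10 if 3 <= x <= 5 else v for x, v in enumerate(p)]
    print(correlation(difference_waveform(p), difference_waveform(shifted)))
```

A constant offset over part of the waveform changes the differential waveform only at the two edges of the offset region, which is why the differential form is less sensitive to halftone areas than the raw projection.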
[0101]
When projection waveforms are formed in both the horizontal and vertical directions, a correlation coefficient is obtained for the projection waveforms in each direction. The similarity of the image data to be output may be a combination of these two correlation coefficients, or, when it is known that collation in one direction is sufficient, the correlation coefficient in that direction may be used as the similarity.
[0102]
One example of obtaining the similarity of image data from the correlation coefficients in both directions is to output the sum of the correlation coefficients obtained in the horizontal and vertical directions as the similarity. Another example is to use a weighted sum of the correlation coefficients in the two directions as the similarity. In this case, if the correlation coefficient in the horizontal direction is rh, the correlation coefficient in the vertical direction is rv, the weight of the horizontal correlation coefficient is wh, and the weight of the vertical correlation coefficient is 1-wh, the similarity to be output is expressed by the following equation (21).
[0103]
[Expression 20]

[0104]
If the correlation coefficient in one direction reflects the difference between document images more strongly, the collation accuracy can be improved by weighting that direction in accordance with equation (21) when calculating the similarity.
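Written out with the symbols defined in paragraph [0102], the weighted sum would read as follows. This is a reconstruction from the stated definitions of rh, rv and wh; the symbol r for the output similarity is an assumed name.

```latex
r = w_h \, r_h + (1 - w_h) \, r_v
```

Setting wh = 1/2 gives half of the plain sum rh + rv, so the simple sum described first yields the same ranking as the equally weighted case.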
[0105]
Furthermore, the correction amount obtained by estimating the similarity conversion parameter can be reflected in the similarity. This is suitable when documents whose margin widths or scales differ greatly are to be treated as separate documents. In this case, a composite variable Z is defined as the weighted sum of the horizontal and vertical correlation coefficients, the scaling correction amount, the translation amount, and so on, and each weight is adjusted by discriminant analysis using a large number of document samples, whereby a discriminant function for discriminating each document can be obtained. Discriminant analysis is explained in many publications, so the details are omitted.
[0106]
On the other hand, a case will be described in which the correlation coefficient in one direction alone is treated as the similarity, without using the correlation coefficients in both directions. In general, for a horizontally written document, clear unevenness appears in the horizontal projection, but in many cases no clear unevenness appears in the vertical projection. In such a case, no useful information for identifying the document can be obtained from the collation result of the vertical projection waveform. Therefore, when it is known in advance whether the target document is written vertically or horizontally, it is preferable to measure the similarity using the correlation coefficient in only one direction. It is also effective to use, for similarity measurement, only the direction (horizontal or vertical) whose local peak list contains the larger number of peaks.
[0107]
As a result, it is determined that the image data registered in the database and having the highest degree of similarity with the key image data is the same document as the key image data.
[0108]
In the description so far, when the similarity conversion parameter is estimated, the peak point at which the cumulative vote value formed on the (scaling correction amount)-(parallel movement amount) plane takes its maximum is searched for, and the coordinates of that peak point are used as the sole correction amount (scaling correction amount, parallel movement amount). However, it is not rare for a plurality of points with large cumulative vote values to be formed on the plane, and if some errors occur in local peak detection, the correction amount corresponding to the peak point with the second-largest or a lower cumulative vote value may be the correct one. To handle such a case, a predetermined number of peak points may be obtained from the cumulative vote values formed on the plane, and the projection waveform may be converted with each of the correction amounts corresponding to those peak points. A similarity is measured for each projection waveform converted with these correction amounts, and the maximum similarity is taken as the final similarity, whereby more reliable document image collation can be performed.
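The multi-peak variant can be sketched as below. The vote array is assumed to be indexed as `votes[x][y]`, and `similarity_for_cell` is a hypothetical callback standing in for "convert the projection waveform with the correction amount of this cell and measure the similarity"; neither name comes from the original text.

```python
import heapq


def top_peak_cells(votes, count):
    """Return the (x, y) positions of the `count` largest vote values."""
    cells = (
        (value, x, y)
        for x, column in enumerate(votes)
        for y, value in enumerate(column)
    )
    return [(x, y) for value, x, y in heapq.nlargest(count, cells)]


def best_similarity(votes, count, similarity_for_cell):
    """Evaluate the similarity for each top cell and keep the maximum."""
    return max(similarity_for_cell(x, y) for x, y in top_peak_cells(votes, count))


if __name__ == "__main__":
    votes = [[0, 5, 1], [7, 0, 2], [3, 6, 0]]
    print(top_peak_cells(votes, 3))
```

Taking the maximum over a small, fixed number of candidates keeps the extra cost bounded while tolerating the case where the largest cumulative vote value is a false peak.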
[0109]
Further, noting that the cumulative vote value formed on the (scaling correction amount)-(parallel movement amount) plane reflects the degree of coincidence of the two projection waveforms, the vote value itself can in some cases be regarded as the similarity of the two projection waveforms. If it can be guaranteed that local peaks are neither falsely detected nor missed, the vote value itself may be used as the similarity. In that case there is no need to actually apply the similarity transformation to the projection waveform, nor to compare the transformed projection waveforms.
[0110]
The operation principle of the image collation apparatus according to the present embodiment has been described above. Hereinafter, an implementation example of the image collation apparatus according to the embodiment of the present invention will be described. The image collation apparatus 1 according to the present embodiment can be realized using a general personal computer and, as shown in FIG. 10, basically comprises a control unit 11, a storage unit 12, a storage 13, an external storage unit 14, a scanner unit 15, and a display unit 16.
[0111]
The storage 13 includes, in addition to a basic program such as an operating system executed by the control unit 11, a program storage area 13a in which an image collation program is installed and a data storage area 13b in which image data to be compared is accumulated.
[0112]
The control unit 11 executes programs stored in the storage 13 to perform various processes. In the present embodiment, the control unit 11 realizes the calculation means of the present invention and the means for performing the image data collation processing in accordance with the image collation program. That is, when image data serving as a key is input from the scanner unit 15, the control unit 11 stores it in the storage unit 12 as the image data to be processed, accumulates pixel values along a plurality of preset paths, and stores the results in the storage unit 12 as an accumulated value series arranged in the order in which the paths were calculated. Further, the control unit 11 uses the accumulated value series to collate the image data accumulated in the storage 13 with the image data to be processed. These processes will be described in detail later.
[0113]
The storage unit 12 operates as a work memory for the control unit 11. The external storage unit 14 reads the image collation program and the like from a computer-readable recording medium storing it (for example, a medium on which information is recorded magnetically or optically, such as a CD-ROM or DVD-ROM), outputs it to the control unit 11, and causes it to be installed in the storage 13.
[0114]
The scanner unit 15 optically reads a paper medium document, converts it into image data, and outputs the image data to the control unit 11. The display unit 16 displays various information according to instructions input from the control unit 11.
[0115]
Here, the processing of the image collation program of the control unit 11 will be described in more detail. Note that the image data stored in the storage 13 is stored in advance in association with accumulated value series information and a peak position table (the calculation method will be described later). Further, in the following description, it is assumed that the calculation path of the accumulated value is the horizontal direction and the vertical direction, and the order is the order in which the horizontal direction is scanned from the upper side to the lower side of the paper, and the vertical direction is the order scanned from the left side to the right side. (FIG. 11). In FIG. 11, a solid line indicates a calculation path, and a broken line is used to indicate the calculation order of the path.
[0116]
The control unit 11 receives image data serving as a key (processing target image data) from the scanner unit 15, stores it in the storage unit 12, and starts the processing shown in FIG. 12. First, for the processing target image data, the pixel values on each path are accumulated along a plurality of paths extending in the horizontal direction as shown in FIG. 11A, and the accumulated result for each path is stored in the storage unit 12 in the order of calculation (S1). Similarly, the pixel values on each path are accumulated along a plurality of paths extending in the vertical direction as shown in FIG. 11B, and the accumulated result for each path is stored in the storage unit 12 in the order of calculation (S2). The results of accumulation in processes S1 and S2 constitute the accumulated value series of the present invention. Here, the accumulated value for the path that is i-th in the calculation order is denoted pi.
[0117]
In calculating the accumulated value, for black-and-white (binary) image data, the pixel value of a white point may be accumulated as “0” and that of a black point as “1”. For multi-value image data including halftones, white may be taken as “0” and black as “255”, for example. Further, for color image data, the density component is separated, and the projection value is calculated by treating the density image as the multi-value image. The separation of density components can be performed by a widely known color value conversion process.
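Processes S1 and S2 amount to summing pixel values along rows and along columns. A minimal sketch, assuming the image is held as a list of rows of pixel values with white as 0 (the function name `accumulated_value_series` is hypothetical):

```python
def accumulated_value_series(image):
    """Return (horizontal, vertical) accumulated value series.

    image -- list of rows; each row is a list of pixel values (0 = white).
    horizontal[i] is the sum along the i-th horizontal path (top to bottom);
    vertical[i] is the sum along the i-th vertical path (left to right).
    """
    horizontal = [sum(row) for row in image]
    vertical = [sum(column) for column in zip(*image)]
    return horizontal, vertical


if __name__ == "__main__":
    binary_image = [
        [0, 1, 1],
        [0, 0, 1],
        [1, 1, 1],
    ]
    print(accumulated_value_series(binary_image))
```

For multi-value data the same function applies unchanged, with pixel values in the range 0 to 255 instead of 0 and 1.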
[0118]
The control unit 11 applies low-pass filtering to the accumulated value series to remove high-frequency components and stores the result in the storage unit 12 (S3). This can be done by a known process such as averaging the accumulated values of adjacent paths. For the accumulated value series after low-pass filtering, using a predetermined value a (a is an integer), S_i = p_{i−a} − p_{i+a}, which corresponds to equation (2), is evaluated while i is changed sequentially, the points at which its sign changes are found, and each i at such a change point is taken out as a peak candidate. Further, each change point i is associated with peak attribute information indicating whether it is a convex point, where the sign changes from positive to negative, or a concave point, where it changes from negative to positive, and a table such as that shown in the drawings is stored (S4). Then, by the processing illustrated in equations (3) to (6), insignificant peak candidates are deleted from the peak candidate table generated in process S4 on the basis of the difference or ratio in size relative to adjacent peaks, and the peak position table is generated (S5).
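Processes S3 and S4 can be sketched as follows. The 3-point moving average is one possible low-pass filter, chosen here only for illustration. The difference is taken as p[i + a] - p[i - a] so that a positive-to-negative change marks a convex point; this sign convention is an assumption, and with the opposite order of subtraction the two labels swap. The pruning of process S5 (equations (3) to (6)) is not shown, and the function names are hypothetical.

```python
def low_pass(p):
    """3-point moving average; edge samples average over what is available."""
    n = len(p)
    smoothed = []
    for i in range(n):
        window = p[max(0, i - 1):min(n, i + 2)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def peak_candidates(p, a):
    """Return [(i, kind)] where the sign of p[i+a] - p[i-a] changes.

    kind is "convex" for a positive-to-non-positive change and
    "concave" for a negative-to-non-negative change.
    An exact zero is counted as a crossing.
    """
    candidates = []
    previous = None
    for i in range(a, len(p) - a):
        slope = p[i + a] - p[i - a]
        if previous is not None:
            if previous > 0 and slope <= 0:
                candidates.append((i, "convex"))
            elif previous < 0 and slope >= 0:
                candidates.append((i, "concave"))
        previous = slope
    return candidates


if __name__ == "__main__":
    series = [0, 1, 3, 6, 3, 1, 0, 1, 4, 1, 0]
    print(peak_candidates(series, 1))
```

In use, `peak_candidates(low_pass(series), a)` would be applied to each of the horizontal and vertical accumulated value series; a larger `a` ignores finer irregularities of the waveform.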
[0119]
In addition, here, the control unit 11 obtains the width and the center of gravity of the accumulated value series by the equations (7) and (8), and stores them in the storage unit 12 in association with the peak position table (S6). Since this process can be realized as a computer process by discretely handling what has already been outlined, detailed description thereof will be omitted.
[0120]
Through the processing so far, the storage unit 12 stores the above-described local peak list as the peak position table, that is, the sequence position information of the present invention.
[0121]
Next, the control unit 11 sequentially takes out image data from the storage 13 (S7), secures an area for storing a predetermined n × m array in the storage unit 12, and initializes the values in this area to “0” (S8). It then compares the peak position table related to the extracted image data with the peak position table stored in the storage unit 12 in process S6, and executes a process of associating the peak positions (S9). This association is made for the peaks on the two peak position tables by successively setting j = 1, 2, ... and pairing the first peak with the j-th, the second with the (j+1)-th, and so on. In process S9, one of these association candidates is set.
[0122]
Then, the control unit 11 obtains the relational expression between the scaling correction amount k and the parallel movement amount s from equation (1) (or equation (9)) based on the association set in process S9, and “draws” it on the storage area secured in process S8 (S10). That is, when the maximum value of the scaling correction amount is K and the maximum value of the parallel movement amount is S, the storage area at position (x, y) corresponds to the values K · x / n and S · y / m. Therefore, for each storage area that satisfies the relationship (K · x / n) × k + s = (S · y / m), the value in that storage area is incremented (voted). This corresponds to the voting process described above. Instead of a simple increment, the vote value V calculated by equation (11) may be added to the value in the storage area.
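The drawing of one straight line onto the n × m array can be sketched as follows. The relation between an associated pair of peaks is assumed here to be x_i = k·x_r + s, where x_r is a peak position in the database image and x_i the associated peak position in the key image; the exact form of equations (1) and (9) may differ. The k value of each column is taken at the cell centre, s is assumed to lie in the range 0 to S, and the function names are hypothetical.

```python
import math


def new_vote_array(n, m):
    """n x m array initialised to 0, indexed as votes[x][y]."""
    return [[0] * m for _ in range(n)]


def vote_for_pair(votes, x_r, x_i, k_max, s_max, value=1):
    """Vote along the line s = x_i - k * x_r for one associated peak pair.

    value -- amount added per cell (1 for a plain increment; a weight such
             as the vote value V could be passed instead)
    """
    n, m = len(votes), len(votes[0])
    for x in range(n):
        k = k_max * (x + 0.5) / n          # scaling amount at the cell centre
        s = x_i - k * x_r                  # translation implied by this k
        y = math.floor(s * m / s_max)      # row index of the cell containing s
        if 0 <= y < m:
            votes[x][y] += value


if __name__ == "__main__":
    votes = new_vote_array(10, 10)
    # Three peak pairs consistent with k = 1.1, s = 5.
    for x_r, x_i in [(10, 16), (30, 38), (50, 60)]:
        vote_for_pair(votes, x_r, x_i, k_max=2.0, s_max=20.0)
    print(votes[5][2])
```

Each pair contributes at most one vote per column, so a cell can only reach a high count if many pairs agree on the same (k, s).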
[0123]
The control unit 11 then determines whether there is another combination that has not yet been set in process S9 (S11). If there is (Yes), the process returns to S9 and continues. If there is not (No), the n × m storage area is referred to and the position (xmax, ymax) of the storage area holding the largest value is searched for (S12). Based on (xmax, ymax), the scaling correction amount k and the parallel movement amount s are determined, and the correction process is performed on the image data extracted in process S7 (S13). The corrected accumulated value series λj (0 ≦ j < nk) is obtained from the accumulated value series pi (0 ≦ i < n) for the n paths as λj = pl, where l is the integer l = Round ((j − s) / k) and Round (x) denotes the integer obtained by rounding x to the nearest integer. For j for which l < 0 or l ≧ n, λj = 0.
[0124]
The scaling correction amount k and the parallel movement amount s may be determined from the position (xmax, ymax) either as k = (K / n) × (xmax + 1/2) and s = (S / m) × (ymax + 1/2), or by solving equations (15) and (16) simultaneously. In the latter case, the equations may be solved numerically, for example by Newton's method, or, preferably, an analytical solution may be derived in advance by substituting equations (13) and (14). It is also preferable, prior to process S12, to set “0” in any position (x, y) of the storage area that is clearly erroneous. Whether a position is clearly erroneous can be determined using the area limitation in the scaling correction amount k direction based on the ratio of the numbers of accumulated values (sequence lengths) in the accumulated value series, corresponding to equation (18), and the area limitation based on the center-of-gravity position obtained from equation (19). Further, although the correction process in process S13 is described as being performed on the image data extracted in process S7, it may instead be performed on the key image data. In that case, the scaling correction amount is the reciprocal of the k used in process S13, and the translation amount has the opposite sign to s.
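Processes S12 and S13 can be sketched as below, using the cell-centre formulas and the rounding rule given in the text. The output length is taken as the integer part of n·k, which is an assumption about how the range 0 ≦ j < nk is handled when n·k is not an integer; the function names are hypothetical.

```python
import math


def correction_from_votes(votes, k_max, s_max):
    """Locate the largest cell and return (k, s) at its centre."""
    n, m = len(votes), len(votes[0])
    x_max, y_max = max(
        ((x, y) for x in range(n) for y in range(m)),
        key=lambda cell: votes[cell[0]][cell[1]],
    )
    k = (k_max / n) * (x_max + 0.5)
    s = (s_max / m) * (y_max + 0.5)
    return k, s


def round_half_up(value):
    """Round to the nearest integer, with halves rounded upward."""
    return math.floor(value + 0.5)


def apply_correction(p, k, s):
    """Return the corrected series: lambda_j = p_l, l = Round((j - s) / k)."""
    n = len(p)
    corrected = []
    for j in range(int(n * k)):           # assumed handling of 0 <= j < n*k
        l = round_half_up((j - s) / k)
        corrected.append(p[l] if 0 <= l < n else 0)
    return corrected


if __name__ == "__main__":
    print(apply_correction([1, 2, 3, 4], k=2.0, s=2.0))
```

This is nearest-sample resampling; one of the interpolation methods mentioned earlier could be substituted where smoother corrected waveforms are wanted.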
[0125]
Then, the new accumulated value series λj is compared with the accumulated value series for the key image data calculated in processes S1 and S2, and the collation process is executed (S14). The control unit 11 stores the collation result in the storage unit 12 in association with information identifying the image data extracted in process S7 (for example, the file name of the image data) (S15). The control unit 11 then checks whether any image data has not yet been extracted (S16). If so (Yes), it returns to process S7 and performs processes S7 to S16 on the next image data ((A) in FIG. 12). If not, that is, if all image data have been processed (No), it instructs the display unit 16 to display the collation results stored in the storage unit 12 (S17), and the processing ends.
[0126]
In step S17, it is preferable to display a list of image data sorted in descending order of coincidence as a result of the matching process. Further, it is preferable that each list includes a file name of the image data so that the image data can be previewed or copied and stored in a medium inserted into the external storage unit 14.
[0127]
In the collation process (processes S14 and S15) by the control unit 11, the similarity can be obtained by calculating a correlation coefficient as shown in equation (20). Specifically, the following calculation is performed for each of the horizontal and vertical directions: qr (j), the difference sequence (the series of differences between adjacent accumulated values) of λj, and qi (i), the difference sequence of the accumulated value series pi for the key image data, are obtained, and equation (20) is evaluated using the sequence length L. The correlation coefficients rh and rv for the horizontal and vertical directions are thus calculated, the similarity is obtained from them as rh + rv or as shown in equation (21), and this similarity is stored in the storage unit 12 in process S15 as the result of the collation process.
[0128]
According to the apparatus of the present embodiment, the control unit 11 takes the image data input from the scanner unit 15 as the processing target, sequentially accumulates pixel values along a plurality of predetermined paths, collates the image data accumulated in the storage 13 using the accumulated value series in which the accumulated values are arranged in the order of accumulation (calculation), and outputs the result to the display unit 16. Accordingly, when a user wishes to reuse a circulated document, the document can be scanned as it is by the scanner unit 15 even if it bears writing or stamps. The control unit 11 then collates it with the image data stored in the storage 13 and displays the result on the display unit 16, and the user can refer to this result and take the desired image data out of the storage 13.
[0129]
[Effect of the invention]
According to the present invention, the image collation device accumulates, for image data to be processed, the pixel values on each of a plurality of predetermined paths, and collates a plurality of image data using an accumulated value series that contains the accumulated results for the paths in the order in which the paths were calculated. Because the collation is executed by comparing feature quantities, such as the accumulated value series, obtained from the image data itself, stable results can be obtained even in the presence of noise such as writing or stamps, text data need not be extracted by OCR or the like, and document data consisting only of text can also be collated.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram relating to an outline of peak position association.
FIG. 2 is an explanatory diagram illustrating a local peak detection method for detecting a general unevenness of a projection waveform.
FIG. 3 is an explanatory diagram showing an operation principle of a means for selecting a local peak.
FIG. 4 is an explanatory diagram showing a process for detecting the width of a projected waveform.
FIG. 5 is an explanatory diagram illustrating a function for obtaining a width of a projected waveform.
FIG. 6 is an explanatory diagram illustrating an example of a peak list.
FIG. 7 is an explanatory diagram showing an outline of a procedure for estimating a similarity transformation parameter.
FIG. 8 is an explanatory diagram showing an outline of a procedure for estimating a similarity transformation parameter.
FIG. 9 is an explanatory diagram illustrating an example of restriction of a voting area.
FIG. 10 is a block diagram showing the configuration of an image collating apparatus according to an embodiment of the present invention.
FIG. 11 is an explanatory diagram showing a calculation direction of an accumulated value series.
FIG. 12 is a flowchart illustrating an example of processing performed by the image collating apparatus.
FIG. 13 is an explanatory diagram illustrating an example of the contents of a peak position table as a peak list.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 Image collation apparatus, 11 Control part, 12 Storage part, 13 Storage, 14 External storage part, 15 Scanner part, 16 Display part

Claims

An arithmetic means for accumulating pixel values on the path for each of a plurality of predetermined paths for the image data to be processed;
Means for performing a collation process between a plurality of image data using an accumulated value series including the result of accumulating the pixel values on each path in the order of calculation of the path;
are included, wherein
The means for performing the collation process performs collation based on the similarity between the accumulated value series for the respective image data to be collated and, in order to detect the similarity, generates, for each image data to be collated, sequence position information representing the positions in the series at which the change in the accumulated value within the accumulated value series is characteristic, compares the sequence position information for the respective accumulated value series, calculates a similarity conversion parameter including at least a primary conversion coefficient and a translation amount, and uses the similarity conversion parameter to detect the similarity of the accumulated value series;
The sequence position information in which the change of the accumulated value is characteristic is a list of peak positions of the accumulated value,
In order to determine the similarity conversion parameter, the means for performing the collation processing refers to the lists of peak positions for different image data among the image data to be collated, estimates a similarity conversion parameter for each combination of the peak positions, generates an estimated similarity conversion parameter group including the similarity conversion parameters estimated for the respective combinations, and determines, based on the estimated similarity conversion parameters included in the estimated similarity conversion parameter group, a parameter with high likelihood as the similarity conversion parameter,
An image matching device,
Means for calculating the centroid position of the accumulated value series for each of the image data to be processed and each image data to be collated;
Further comprising
The means for performing the collation processing is based on the centroid position of the accumulated value series of the image data to be processed and the centroid position of the accumulated value series of the image data to be collated. Limiting the estimated existence range of the parallel movement amount in the conversion parameter, and determining the parameter having the highest likelihood as the similarity conversion parameter in the estimated existence range of the parallel movement amount from the estimated similarity conversion parameter group;
An image collating apparatus characterized by that.

  Means for obtaining a width of the accumulated value series in the forward direction of operation for each of the image data to be processed and each image data to be collated;
  Further comprising
  The means for performing the collation processing is based on the similarity conversion parameter based on the width of the accumulated value series of the image data to be processed and the width of the accumulated value series of the image data to be collated. Limiting the estimated existence range of the primary transformation coefficient in the parameter, and setting the parameter with the high likelihood in the intersection of the estimated existence range of the translation amount and the estimated existence range of the primary transformation from the estimated similarity transformation parameter group. Determined as the similarity transformation parameter,
  The image collating apparatus according to claim 1.

Computer
  An arithmetic means for accumulating pixel values on the path for each of a plurality of predetermined paths for the image data to be processed;
  Means for performing a collation process between a plurality of image data using an accumulated value series including a result of accumulating pixel values on each path in the order of calculation of the path;
  A program for functioning as
  The means for performing the collation process performs collation based on the similarity between the accumulated value series for the respective image data to be collated and, in order to detect the similarity, generates, for each image data to be collated, sequence position information representing the positions in the series at which the change in the accumulated value within the accumulated value series is characteristic, compares the sequence position information for the respective accumulated value series, calculates a similarity conversion parameter including at least a primary conversion coefficient and a translation amount, and uses the similarity conversion parameter to detect the similarity of the accumulated value series,
  The sequence position information in which the change of the accumulated value is characteristic is a list of peak positions of the accumulated value,
  The means for performing the collation processing, in order to determine the similarity transformation parameter, refers to the lists of peak positions of different image data among the image data to be collated, estimates a similarity transformation parameter for each combination of peak positions, generates an estimated similarity transformation parameter group consisting of the similarity transformation parameters estimated for the respective combinations, and, based on the estimated similarity transformation parameters included in the group, determines a parameter of high likelihood as the similarity transformation parameter,
  The program further causes the computer to function as
  Means for calculating the centroid position of the accumulated value series for each of the image data to be processed and each image data to be collated,
  The means for performing the collation processing limits the estimated existence range of the translation amount in the similarity transformation parameter based on the centroid position of the accumulated value series of the image data to be processed and the centroid position of the accumulated value series of the image data to be collated, and determines, as the similarity transformation parameter, the parameter having the highest likelihood within the estimated existence range of the translation amount from among the estimated similarity transformation parameter group,
  The program being characterized by the above features.
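
The processing recited in this claim can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: each path is assumed to be one image row, a peak is assumed to be a strict local maximum, the transformation is modeled as q = a * p + b (with a the primary transformation coefficient and b the translation amount), likelihood is approximated by a vote count over quantized parameters, and the translation range is assumed to be the centroid difference plus or minus a margin, which presumes a is close to 1. All function names and the values of `margin`, `a_step` and `b_step` are hypothetical.

```python
from itertools import combinations


def accumulate_rows(image):
    """Accumulate pixel values along each path (assumption: one path per row)."""
    return [sum(row) for row in image]


def peak_positions(series):
    """List of positions that are strict local maxima of the accumulated values."""
    return [i for i in range(1, len(series) - 1)
            if series[i - 1] < series[i] > series[i + 1]]


def centroid(series):
    """Centroid position of the accumulated value series (series must not sum to 0)."""
    return sum(i * v for i, v in enumerate(series)) / sum(series)


def translation_range(series_p, series_q, margin=2.0):
    """Estimated existence range of b (assumption: centroid shift +/- margin)."""
    d = centroid(series_q) - centroid(series_p)
    return d - margin, d + margin


def estimate_candidates(peaks_p, peaks_q):
    """Estimate (a, b) for every combination of two peaks in p and two in q."""
    candidates = []
    for p1, p2 in combinations(peaks_p, 2):
        for q1, q2 in combinations(peaks_q, 2):
            a = (q2 - q1) / (p2 - p1)   # solves q1 = a*p1 + b and q2 = a*p2 + b
            b = q1 - a * p1
            candidates.append((a, b))
    return candidates


def choose_parameter(candidates, b_range, a_step=0.05, b_step=1.0):
    """Most-voted quantized (a, b) among candidates whose b lies in b_range."""
    votes = {}
    for a, b in candidates:
        if b_range[0] <= b <= b_range[1]:
            key = (round(a / a_step), round(b / b_step))
            votes[key] = votes.get(key, 0) + 1
    if not votes:
        return None
    ka, kb = max(votes, key=votes.get)
    return ka * a_step, kb * b_step
```

Because the peak lists are sorted and pairs are taken in order, only order-preserving pairings (a > 0) are considered. Correct pairings all yield the same (a, b) and therefore accumulate votes, while incorrect pairings scatter across the parameter space.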

The program further causes the computer to function as
  Means for obtaining, for each of the image data to be processed and each image data to be collated, the width of the accumulated value series along the direction of the calculation order,
  The means for performing the collation processing limits the estimated existence range of the primary transformation coefficient in the similarity transformation parameter based on the width of the accumulated value series of the image data to be processed and the width of the accumulated value series of the image data to be collated, and determines, as the similarity transformation parameter, the parameter of high likelihood that lies, among the estimated similarity transformation parameter group, within the intersection of the estimated existence range of the translation amount and the estimated existence range of the primary transformation coefficient,
  The program according to claim 3.
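
Once a similarity transformation parameter has been determined, claim 3 uses it to detect the similarity between the accumulated value series. The claims do not specify the similarity measure, so the following Python sketch assumes one: the series of the image to be processed is remapped by position i -> a * i + b (nearest position) and compared with the other series by normalized correlation. The function names are hypothetical.

```python
import math


def transform_series(series, a, b, length):
    """Move the value at position i to position round(a * i + b) of a new series.

    Values mapped outside [0, length) are dropped. This forward mapping is a
    simplification; for a > 1 it leaves gaps rather than interpolating.
    """
    out = [0.0] * length
    for i, v in enumerate(series):
        j = round(a * i + b)
        if 0 <= j < length:
            out[j] += v
    return out


def normalized_correlation(x, y):
    """Cosine-style correlation in [0, 1] for non-negative series; 0 if either is all zero."""
    dot = sum(u * v for u, v in zip(x, y))
    norm = math.sqrt(sum(u * u for u in x)) * math.sqrt(sum(v * v for v in y))
    return dot / norm if norm else 0.0


def series_similarity(series_p, series_q, a, b):
    """Similarity of two accumulated value series after aligning p onto q with (a, b)."""
    aligned = transform_series(series_p, a, b, len(series_q))
    return normalized_correlation(aligned, series_q)
```

With a correctly estimated parameter the aligned peaks coincide and the score approaches 1, whereas a wrong parameter leaves the peaks misaligned and the score low.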