JP2017138743A

JP2017138743A - Image processing apparatus, image processing method, and program

Info

Publication number: JP2017138743A
Application number: JP2016018419A
Authority: JP
Inventors: Tomoyuki Shimizu (清水智之); Hirotaka Shiiyama (椎山弘隆)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-02-02
Filing date: 2016-02-02
Publication date: 2017-08-10

Abstract

PROBLEM TO BE SOLVED: To make it possible to accurately compare a comparison source image with a comparison destination image.
SOLUTION: The present invention estimates a difference area between a comparison source image and a comparison destination image, and executes OCR processing on the difference area.
SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique for comparing a comparison source image with a comparison destination image.

A method has been proposed for searching for similar images by using local feature amounts (local features) of an image. In this method, characteristic points (local feature points) are first extracted from an image (Non-Patent Document 1). Next, a feature amount corresponding to each local feature point (a local feature amount) is calculated from the local feature point and the image information around it (Non-Patent Document 2).

In methods that use local feature amounts, a local feature amount is defined as information consisting of a plurality of elements that are invariant to rotation and to enlargement/reduction. This allows a search to succeed even when the image has been rotated, enlarged, or reduced. A local feature amount is generally expressed as a vector. However, rotation and scale invariance hold only in theory: in an actual digital image, a local feature amount computed before rotation or enlargement/reduction differs slightly from the corresponding local feature amount computed after that processing.

To extract rotation-invariant local feature amounts, Non-Patent Document 2, for example, calculates a main direction from the pixel pattern of the local region around a local feature point and, when computing the local feature amount, rotates the local region with respect to that main direction to normalize its orientation. To compute local feature amounts that are invariant to enlargement/reduction, images at different scales are generated internally, and local feature points are extracted and local feature amounts calculated from the image at each scale. The internally generated set of images at different scales is generally called a scale space.
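The main-direction step can be illustrated with a simplified sketch, assuming a grayscale patch given as a list of rows of floats. The function name `dominant_orientation` and the 36-bin histogram are illustrative choices; the original SIFT procedure additionally weights gradients with a Gaussian window and interpolates the histogram peak, and both refinements are omitted here.

```python
import math


def dominant_orientation(patch, num_bins=36):
    """Estimate the main direction (radians, in [0, 2*pi)) of a grayscale patch.

    Builds a histogram of gradient directions weighted by gradient magnitude
    and returns the centre angle of the strongest bin.
    """
    hist = [0.0] * num_bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences give the local gradient.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * num_bins) % num_bins
            hist[b] += mag
    peak = max(range(num_bins), key=lambda i: hist[i])
    return (peak + 0.5) * 2 * math.pi / num_bins
```

Rotating the local region by minus this angle before computing the descriptor is what makes the descriptor rotation-invariant in the scheme described above.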

With the method described above, a plurality of local feature points are extracted from a single image. In image retrieval using local feature amounts, matching is performed by comparing the local feature amounts calculated at the respective local feature points. In the widely used voting method (Patent Document 1), for each feature point extracted from the search source image, a nearest-neighbor search looks for a feature point whose local feature amount is similar to it by at least a predetermined threshold; if one exists, one vote is cast for the image containing it, and images with more votes are regarded as more similar.
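A brute-force sketch of this voting scheme, assuming feature vectors are plain tuples of floats, that "similar by at least a threshold" means a Euclidean distance no greater than `threshold`, and that the database is a list of `(image_id, vector)` entries. All names are hypothetical; a practical system would replace the linear scan with an approximate nearest-neighbor index.

```python
import math


def vote(query_feats, db, threshold):
    """Count votes per registered image.

    query_feats: feature vectors extracted from the search source image.
    db: list of (image_id, feature_vector) for all registered images.
    Each query feature votes once for the image owning its nearest database
    feature, provided that distance is <= threshold.
    Returns (image_id, votes) sorted by descending vote count.
    """
    votes = {}
    for q in query_feats:
        best_id, best_d = None, math.inf
        for image_id, v in db:
            d = math.dist(q, v)
            if d < best_d:
                best_id, best_d = image_id, d
        if best_id is not None and best_d <= threshold:
            votes[best_id] = votes.get(best_id, 0) + 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
```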

There is also RANSAC processing, which obtains correspondences between feature points that are similar by at least a predetermined threshold and verifies whether the positions of the paired points satisfy a common geometric transformation. In this processing, two pairs are randomly selected from the pairs of feature points that are similar by at least the predetermined threshold, and an affine transformation matrix is obtained from them. It is then verified whether the positions of the remaining similar feature point pairs satisfy that affine transformation matrix, and a match is determined if at least a predetermined number of pairs satisfy it (Non-Patent Document 3).
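A minimal RANSAC sketch follows. Note that a general affine transform has six parameters and needs three point pairs, whereas two pairs fix the four parameters of a similarity transform (rotation, uniform scale, translation); this sketch therefore estimates a similarity transform, written in complex form as z' = a*z + b. The function names, tolerance, iteration count, and `min_inliers` are assumptions for illustration.

```python
import random


def fit_similarity(p1, p2, q1, q2):
    """Similarity transform z' = a*z + b (complex a, b) mapping p1->q1, p2->q2."""
    zp1, zp2 = complex(*p1), complex(*p2)
    zq1, zq2 = complex(*q1), complex(*q2)
    if zp1 == zp2:
        return None  # degenerate sample: both source points coincide
    a = (zq2 - zq1) / (zp2 - zp1)
    return a, zq1 - a * zp1


def ransac(pairs, tol=2.0, iterations=200, min_inliers=3, seed=0):
    """pairs: list of ((x, y), (x', y')) candidate correspondences.

    Returns ((a, b), inlier_indices) for the hypothesis with the most inliers,
    or (None, []) if no hypothesis reaches min_inliers.
    """
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iterations):
        (p1, q1), (p2, q2) = rng.sample(pairs, 2)
        model = fit_similarity(p1, p2, q1, q2)
        if model is None:
            continue
        a, b = model
        # A pair is an inlier if the transformed source point lands near its partner.
        inliers = [i for i, (p, q) in enumerate(pairs)
                   if abs(a * complex(*p) + b - complex(*q)) <= tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = model, inliers
    if len(best_inliers) < min_inliers:
        return None, []
    return best, best_inliers
```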

Patent Document 1: JP 2009-284084 A

Non-Patent Document 1: C. Harris and M. J. Stephens, "A combined corner and edge detector," In Alvey Vision Conference, pages 147-152, 1988.
Non-Patent Document 2: David G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
Non-Patent Document 3: M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381-395, June 1981.

In the method described in Patent Document 1, the degree of similarity between a comparison source image (search source image) and a comparison destination image (registered image) is compared and evaluated on the basis of feature points that correspond between the two images. Consequently, when the target images contain characters, for example, and fewer feature points are extracted from the character portions than from the natural images, background images, and background patterns in the images, differences in the character portions are absorbed. Specifically, when comparing images that share the same background and differ only in their text, as is common with printed matter such as flyers, pamphlets, and invitation letters, differences in the feature points of the character portions are hardly reflected, and a plurality of images may be retrieved as images similar to the comparison source image. An object of the present invention is therefore to enable accurate comparison between a comparison source image and a comparison destination image, even when the images are largely similar and differ only in part.

To solve the above problem, the present invention includes an input unit that inputs a comparison source image and a comparison destination image, an estimation unit that estimates a difference area between the input comparison source image and comparison destination image, and an OCR processing unit that executes OCR processing on the estimated difference area.

With the above configuration, the present invention enables a comparison source image and a comparison destination image to be compared accurately.

FIG. 1 is a block diagram showing the hardware configuration of an image processing apparatus according to a first embodiment.
FIG. 2 is a block diagram showing the software configuration of the image processing apparatus according to the first embodiment.
FIG. 3 is a flowchart of comparison processing in the image processing apparatus according to the first embodiment.
FIG. 4 is a diagram outlining the comparison processing according to the first embodiment.
FIG. 5 is a flowchart of geometric relationship calculation using RANSAC processing in the first embodiment.

[First Embodiment]
Hereinafter, a first embodiment of the present invention is described with reference to the drawings. First, the hardware configuration of the image processing apparatus according to the present embodiment is described with reference to the block diagram of FIG. 1. The image processing apparatus according to the present embodiment is implemented by a server apparatus or a client apparatus. Each of the server apparatus and the client apparatus may be realized by a single computer apparatus, or its functions may be distributed over a plurality of computer apparatuses as necessary. When a plurality of computer apparatuses are used, they are connected so as to communicate with each other via a Local Area Network (LAN) or the like. Each computer apparatus can be realized by an information processing apparatus such as a personal computer (PC) or a workstation (WS).

In FIG. 1, a CPU 101 is a Central Processing Unit that controls the entire computer apparatus 100. A ROM 102 is a Read Only Memory that stores programs and parameters that do not need to be changed. A RAM 103 is a Random Access Memory that temporarily stores programs and data supplied from an external device or the like. An external storage device 104 is a storage device, such as a hard disk or a memory card, that is fixedly installed in the computer apparatus 100. The external storage device 104 may also include media removable from the computer apparatus 100, such as a flexible disk (FD), an optical disk such as a Compact Disc (CD), a magnetic or optical card, an IC card, or a memory card. An input device interface 105 is an interface to input devices 109, such as a pointing device or a keyboard, that accept user operations and input data.

An output device interface 106 is an interface to a monitor 110 for displaying data held by, or supplied to, the computer apparatus 100. A communication interface 107 is an interface for connecting to a network line 111 such as the Internet, a digital camera 112, a digital video camera 113, a smartphone 114, and the like. A system bus 108 is a transmission path that communicably connects the units 101 to 107. The processing in each flowchart of the present embodiment described below is executed by the CPU 101 executing a program stored in a computer-readable storage medium such as the ROM 102.

FIG. 2 is a block diagram showing the software configuration of the image processing apparatus according to the present embodiment. In the figure, an image input unit 201 inputs a comparison source image (search source image) and a comparison destination image (registered image). An image feature amount group extraction unit 202 extracts groups of image feature amounts from the comparison source image and the comparison destination image input from the image input unit 201. That is, the image feature amount group extraction unit 202 extracts feature points from both images and, from the pixels near each feature point, extracts rotation- and scale-invariant feature amounts such as SIFT. The image feature amount group extraction unit 202 thus functions as a feature point extraction unit that extracts local feature points from each image and as a feature calculation unit (first calculation unit) that calculates local features for the extracted local feature points.

The present embodiment describes the case of comparing two images: a comparison source image and a comparison destination image. However, the invention is not limited to this. There may be a plurality of comparison destination images; by calculating feature amounts for the set of comparison destination images in advance, the feature amount group of the comparison source image may be compared with the precomputed feature amount groups. In that case, the processing described below may be applied as part of a similar image search in which, for each feature amount of the comparison source image, the set of nearest-neighbor points within a predetermined threshold is obtained, and a comparison destination image is regarded as a similar image candidate when those nearest-neighbor points include many feature amounts derived from that image.

The image feature amount comparison unit 203 performs RANSAC processing, further compares feature point coordinates, and checks whether a result is appropriate as a search result. Specifically, the image feature amount comparison unit 203 functions as a corresponding point determination unit (first determination unit) that determines local feature points in correspondence based on the similarity between the local feature amounts of the comparison source image and those of the comparison destination image. It also functions as a vote counting unit that performs affine transformation and counts how many feature points have transformed coordinates that match and how many do not, and as a stop unit that stops retrying when the number of votes exceeds a predetermined threshold. The image feature amount comparison unit 203 further functions as a common area estimation unit (first estimation unit) that estimates, from the voting result, the area common to the comparison source image and the comparison destination image. It also functions as a non-positive corresponding point extraction unit (second determination unit) that determines, within the common area, feature points that have no correspondence based on local feature amount similarity with the comparison destination image, and feature points whose coordinates after affine transformation do not match. Finally, using these results, it functions as a difference area estimation unit (second estimation unit) that obtains the areas in which the comparison source image and the comparison destination image differ.

The image erecting processing unit 204 applies transformation processing such as rotation and enlargement/reduction to the comparison source image, using geometric transformation coefficients such as the affine transformation obtained by the image feature amount comparison unit 203, and thereby aligns it with the comparison destination image. The present embodiment assumes a use case in which the comparison source image is scanned image data and the comparison destination image is rasterized document image data; the comparison destination image, being the original image of the printed matter, can be assumed to be upright. In that case, aligning with the comparison destination image makes the comparison source image upright even if it was rotated. This allows the OCR processing unit 205, described below, to perform OCR processing accurately.
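A minimal pure-Python sketch of the alignment step, assuming the geometric transformation is a similarity transform written in complex form as z_ref = a*z_src + b (a representation chosen for illustration, not taken from the patent) and using nearest-neighbor sampling. Each output pixel is mapped back into the source image through the inverse transform, which avoids holes in the output; `warp_to_reference` and the white `fill` value are hypothetical.

```python
def warp_to_reference(src, a, b, out_w, out_h, fill=255):
    """Resample grayscale image src (list of rows) into the reference frame.

    The forward mapping is z_ref = a*z_src + b with complex a (rotation and
    scale) and complex b (translation); pixels that map outside src get fill.
    """
    out = [[fill] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            z = (complex(x, y) - b) / a  # inverse mapping back into src
            sx, sy = round(z.real), round(z.imag)
            if 0 <= sy < len(src) and 0 <= sx < len(src[0]):
                out[y][x] = src[sy][sx]
    return out
```

For example, `warp_to_reference([[1, 2], [3, 4]], 1j, complex(1, 0), 2, 2)` applies a 90-degree rotation plus a one-pixel shift and yields `[[3, 1], [4, 2]]`.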

For each difference area obtained by the image feature amount comparison unit 203, the OCR processing unit 205 extracts a partial image from the comparison destination image. It also extracts a partial image of the same area from the transformed comparison source image aligned by the image erecting processing unit 204. It then performs OCR processing on each partial image to obtain its character recognition information.

The OCR result comparison unit 206 compares the character recognition information, obtained by the OCR processing unit 205, of the differing portions of the comparison source image and the comparison destination image, and examines in detail the differences in textual information within the areas that differed in the feature amount comparison. The output unit 207 outputs the comparison result of the OCR result comparison unit 206. A storage unit 208 is a memory, HDD, or the like that stores data during processing. Each of these components is centrally controlled by the CPU 101.
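The patent does not specify how the two recognition results are compared. One simple possibility, assuming the OCR engine returns plain strings, is a character-level diff with Python's standard `difflib.SequenceMatcher`; the function name `compare_ocr` is hypothetical.

```python
import difflib


def compare_ocr(src_text, dst_text):
    """Return (tag, src_fragment, dst_fragment) for each non-equal span
    between the OCR text of the source and destination partial images."""
    sm = difflib.SequenceMatcher(None, src_text, dst_text)
    return [(tag, src_text[i1:i2], dst_text[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]
```

An empty result means the two partial images read identically, so a difference found in the feature amount comparison did not change the text.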

Next, the comparison processing between the comparison source image and the comparison destination image in the present embodiment is described in detail. FIG. 3 is a flowchart illustrating an example of the comparison processing procedure in the image processing apparatus according to the present embodiment. First, in step S301, the image input unit 201 reads the comparison source image and the comparison destination image. FIG. 4 is a diagram outlining the comparison processing according to the present embodiment; in S301, it is assumed that the comparison source image and comparison destination image shown in FIG. 4(a) are read. The comparison destination image may instead be read before this processing flow, with its information registered in the storage unit 208.

In step S302, the image feature amount group extraction unit 202 calculates local features for both the comparison source image and the comparison destination image. First, it extracts local feature points that can be extracted robustly even when the image is rotated. Here, the Harris operator is used to extract the local feature points (see C. Harris and M. J. Stephens, "A combined corner and edge detector," In Alvey Vision Conference, pages 147-152, 1988).

Specifically, for each pixel of the output image H obtained by applying the Harris operator, the values of that pixel and its eight neighbors (nine pixels in total) are examined. A pixel that is a local maximum (that is, has the largest value among the nine pixels) is extracted as a local feature point. However, even a local maximum is not extracted as a local feature point if its value is less than or equal to a threshold. Any method capable of extracting local feature points may be used; the method is not limited to the Harris-operator-based extraction described above.
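The local-maximum rule above can be sketched as follows, assuming the Harris response H has already been computed and is given as a list of rows. Border pixels, which lack a full 8-neighborhood, are simply skipped, and ties are kept as maxima; both are illustrative choices, and `local_maxima` is a hypothetical name.

```python
def local_maxima(H, threshold):
    """Return (x, y) of pixels that are the maximum of their 3x3 neighborhood
    in the Harris response H and whose value exceeds threshold."""
    points = []
    h, w = len(H), len(H[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = H[y][x]
            if v <= threshold:
                continue  # weak responses are rejected even if locally maximal
            # The nine pixels: the centre and its eight neighbors.
            window = [H[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            if v >= max(window):
                points.append((x, y))
    return points
```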

The image feature amount group extraction unit 202 then calculates a feature amount (local feature) for each extracted local feature point. Known techniques such as SIFT, SURF, and FAST can be used for the local features. If the comparison destination image is registered in the storage unit 208 in advance, its local features may also be computed beforehand and stored in the storage unit 208.

In step S303, the image feature amount comparison unit 203 performs voting from the comparison source image against the comparison destination image to obtain the number of votes and the corresponding point pairs. Let Vq denote a local feature of the comparison destination image, Q the local feature point associated with it, and Q(x′, y′) its coordinates. Likewise, let Vs denote a local feature of the comparison source image, S its associated local feature point, and S(x, y) its coordinates. The image feature amount comparison unit 203 computes the distance between the feature vectors Vq and Vs for all combinations and creates a shortest-distance corresponding point list. It then extracts, as corresponding point pairs, the pairs of Vq and Vs whose feature vector distance is the shortest and is at most a threshold Tv.
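A sketch of the pair extraction in step S303, assuming local features are equal-length numeric tuples compared by Euclidean distance. `corresponding_pairs` and its `(source_index, destination_index, distance)` return format are hypothetical; the number of returned pairs is assumed here to serve as the vote count tested in step S304.

```python
import math


def corresponding_pairs(src_feats, dst_feats, tv):
    """For each source feature Vs, find the nearest destination feature Vq
    (brute force) and keep the pair if their distance is <= tv."""
    pairs = []
    if not dst_feats:
        return pairs
    for i, vs in enumerate(src_feats):
        # Distance from Vs to every Vq; keep only the nearest one.
        j, d = min(((j, math.dist(vs, vq)) for j, vq in enumerate(dst_feats)),
                   key=lambda t: t[1])
        if d <= tv:
            pairs.append((i, j, d))
    return pairs
```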

In step S304, the image feature amount comparison unit 203 determines whether the number of votes is at least a threshold. If the number of votes is below the threshold, the process proceeds to step S305, the comparison source image and the comparison destination image are judged not to be similar, and the processing ends.

If the number of votes is at least the threshold, the process proceeds to step S306. In step S306, RANSAC is applied to the extracted corresponding point pairs to obtain the geometric relationship between the feature point positions in the comparison source image and the comparison destination image, and the points whose geometric relationship is correct, that is, positive corresponding points (inliers), are determined. The image feature amount comparison unit 203 counts the positive corresponding points and uses this count as a pairing score. At the same time, it determines the non-positive corresponding points, that is, feature points in corresponding point pairs that do not satisfy the geometric relationship, and counts them. The image feature amount comparison unit 203 stores the non-positive corresponding points of the comparison source image and of the comparison destination image in the storage unit 208. The processing of step S306 is described in detail later.

In step S307, the image feature amount comparison unit 203 identifies a common area from the positive corresponding points on the comparison destination image. Specifically, based on the distribution of the positive corresponding points in the comparison destination image, an area enclosing them is obtained and identified as the common area. In the simplest case, this area is the bounding rectangle of the positive corresponding points. The shape of the area is not limited to a rectangle and may be a polygon, an irregular shape, or another form. In the example of FIG. 4, the comparison source image is part of a document image, whereas the comparison destination image is the whole document image, so the common area is identified on the comparison destination image. Which of the two images the common area is identified on may be fixed in advance; preferably, however, the inclusion relationship between the comparison source image and the comparison destination image is estimated, the image that covers the larger area is selected, and the common area is identified on that image.
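A sketch of the bounding-rectangle common area, plus one possible way to choose which image to work on. The selection rule is an assumption for illustration, not the patent's stated method: the image in which the common area covers the smaller fraction of the frame is taken to contain the other image. All names are hypothetical.

```python
def common_area(points):
    """Bounding rectangle (x0, y0, x1, y1) of the positive corresponding points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def coverage(points, size):
    """Fraction of an image of size (width, height) covered by the common area."""
    x0, y0, x1, y1 = common_area(points)
    return (x1 - x0) * (y1 - y0) / (size[0] * size[1])


def pick_reference(src_inliers, src_size, dst_inliers, dst_size):
    """Hypothetical heuristic: return "dst" or "src", whichever image the
    common area occupies less of (that image is assumed to cover more)."""
    if coverage(dst_inliers, dst_size) <= coverage(src_inliers, src_size):
        return "dst"
    return "src"
```

In the FIG. 4 scenario (a scanned fragment against a full page), the page-sized destination image would be chosen.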

FIG. 4(b) outlines the processing of steps S302 to S307. In the figure, the points marked ○ and × are feature points at which local features are obtained. Points whose local features are similar and that satisfy the geometric relationship in the RANSAC processing are marked ○ as positive corresponding points. Points that have no similar local feature, or whose local features are similar but do not satisfy the geometric relationship, are marked × as non-positive corresponding points. The dotted lines indicate correspondences. The common area obtained from the distribution of the positive corresponding points ○ by the processing up to step S307 is shown as a black rectangle on the comparison destination image in FIG. 4(b).

Subsequently, in step S308, the image feature amount comparison unit 203 obtains the group of non-positive corresponding points in the common area identified on the comparison destination image. First, it obtains the feature points in the common area of the comparison destination image that are non-positive corresponding points with respect to the comparison source image, or that did not form any corresponding point pair (feature points with no similar counterpart). Next, it applies the geometric transformation obtained in step S306 to the feature points of the comparison source image that are non-positive corresponding points with respect to the comparison destination image, or that did not form any corresponding point pair, and selects only those that are projected into the common area of the comparison destination image. In the present embodiment, feature points that did not form a corresponding point pair, that is, feature points with no corresponding point within the distance threshold Tvec, are also included in the non-positive corresponding points.

Taking the logical sum (union) of these results yields the group of feature points in the common area at which the comparison source image and the comparison destination image do not match. FIG. 4(c) shows the result of step S308: both of the above two types of feature points have been obtained within the common area indicated by the black frame.
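The union described above can be sketched as follows, assuming the geometric transformation from step S306 is a similarity transform in the complex form z_dst = a*z_src + b (an illustrative representation) and the common area is an axis-aligned rectangle. `mismatched_points` is a hypothetical name, and duplicates are not merged.

```python
def mismatched_points(dst_bad, src_bad, a, b, box):
    """Union of non-matching feature points inside the common area.

    dst_bad: (x, y) destination points that are non-positive corresponding
             points or have no pair.
    src_bad: the same kind of points from the source image.
    a, b:    complex coefficients of z_dst = a*z_src + b.
    box:     common area (x0, y0, x1, y1) on the destination image.
    """
    x0, y0, x1, y1 = box

    def inside(p):
        return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

    result = [p for p in dst_bad if inside(p)]
    for x, y in src_bad:
        z = a * complex(x, y) + b  # project the source point into the destination
        p = (z.real, z.imag)
        if inside(p):
            result.append(p)
    return result
```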

In the present embodiment, taking the union of these two types of feature points allows the difference area to be identified more broadly than when the feature points are obtained from only the comparison source image or only the comparison destination image, or when only non-positive corresponding points or only unpaired feature points are used. This matters particularly when the two images are similar overall and differ only in part, for example when the background image is identical and only the text differs. In such cases, few local feature points are extracted from characters, so the difference area may come out small; taking the union of the two types of feature points, as described above, enlarges it.

The difference area may be determined from the two types of non-positive corresponding points by, for example, adapting the image area separation technique used in OCR. Image area separation generates projection histograms of the black pixels of a binary image along the x and y directions and estimates text and figure regions from them; in the present embodiment, the difference area is instead determined from the x- and y-direction histograms of the non-positive corresponding points on the comparison destination image. In the simplest case, it is determined as the circumscribed rectangle enclosing the intervals in which the histogram value is at least a predetermined value in each of the x and y directions. In the present embodiment, as shown for example in FIGS. 4(d) and 4(e), one or more areas are determined as difference areas from the x- and y-direction histograms of the non-positive corresponding points.
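The simplified histogram rule can be sketched as follows. This is an illustrative sketch only: the bin size and the minimum count are hypothetical parameters, and combining every x interval with every y interval into a cell, then taking the circumscribed rectangle of the points in each non-empty cell, is one assumed way to obtain "one or more" difference areas.

```python
from collections import Counter


def runs_at_or_above(hist, lo, hi, min_count):
    """Return inclusive [start, end] bin intervals where hist[bin] >= min_count."""
    runs, start = [], None
    for i in range(lo, hi + 2):  # go one bin past the end to close the last run
        on = i <= hi and hist[i] >= min_count
        if on and start is None:
            start = i
        elif not on and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


def difference_rectangles(points, bin_size=10, min_count=1):
    """Estimate difference areas from x/y projection histograms of unmatched points."""
    if not points:
        return []
    xs = Counter(int(x // bin_size) for x, _ in points)
    ys = Counter(int(y // bin_size) for _, y in points)
    x_runs = runs_at_or_above(xs, min(xs), max(xs), min_count)
    y_runs = runs_at_or_above(ys, min(ys), max(ys), min_count)
    rects = []
    for x0, x1 in x_runs:
        for y0, y1 in y_runs:
            inside = [(x, y) for x, y in points
                      if x0 <= x // bin_size <= x1 and y0 <= y // bin_size <= y1]
            if inside:  # circumscribed rectangle of the points in this cell
                px = [p[0] for p in inside]
                py = [p[1] for p in inside]
                rects.append((min(px), min(py), max(px), max(py)))
    return rects
```

For two separated clusters of unmatched points, the sketch returns two rectangles, one per cluster.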

In step S309, the image feature amount comparison unit 203 counts the non-positive corresponding points in each difference area determined on the comparison destination image, and excludes from the difference areas any area in which this count is below a predetermined threshold. This eliminates the influence of feature points produced by irregularities, such as stains on the paper of a document, and limits the difference areas to regions of some size, such as characters. Instead of the count, the density of non-positive corresponding points in each difference area may be calculated, and areas whose density is below a predetermined value may be excluded.
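The exclusion rule of step S309 can be sketched as below. The thresholds `min_points` and `min_density` are hypothetical values; the sketch applies the count criterion and, if a density threshold is given, the density criterion as well. Rectangles are inclusive (x0, y0, x1, y1) tuples, as assumed in the earlier steps.

```python
def filter_regions(rects, points, min_points=3, min_density=None):
    """Keep only difference areas with enough non-positive corresponding points."""
    kept = []
    for x0, y0, x1, y1 in rects:
        n = sum(1 for x, y in points if x0 <= x <= x1 and y0 <= y <= y1)
        if n < min_points:
            continue  # too few points: likely an irregularity such as a paper stain
        if min_density is not None:
            area = max(x1 - x0, 1) * max(y1 - y0, 1)  # guard against zero-area boxes
            if n / area < min_density:
                continue  # points too sparse for a character-sized region
        kept.append((x0, y0, x1, y1))
    return kept
```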

In step S310, the image feature amount comparison unit 203 extracts (cuts out) the difference areas identified in step S309. In the present embodiment, because the comparison source image may be rotated or scaled, the comparison source image is first converted into an image aligned with the common area shared with the comparison destination image, and the difference areas are then obtained on that image. Specifically, each pixel of the comparison source image is geometrically transformed using the transformation coefficients obtained by the RANSAC processing in step S306 (the transformation matrices M and T in FIG. 5, described later). The difference areas are then cut out of both the transformed comparison source image and the comparison destination image. FIG. 4(f) shows the comparison source image being rotated by the processing of step S310 to fit the common area with the comparison destination image, and FIG. 4(g) shows the difference areas being cut out of the transformed comparison source image and the comparison destination image.
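A minimal sketch of the alignment and cut-out in step S310 follows, assuming a grayscale image stored as a list of pixel rows, the transform written as (x', y') = M(x, y) + T with M a 2x2 matrix and T a 2-vector, nearest-neighbour resampling, a white fill value for pixels that fall outside the source, and inclusive rectangles. The function names are hypothetical. Inverse mapping is used so that every pixel of the aligned image receives a value.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("transform is not invertible")
    return ((d / det, -b / det), (-c / det, a / det))


def align_source(src, M, T, out_w, out_h, fill=255):
    """Resample src into destination coordinates, where (x', y') = M @ (x, y) + T."""
    inv = invert_2x2(M)
    h, w = len(src), len(src[0])
    out = []
    for yq in range(out_h):
        row = []
        for xq in range(out_w):
            dx, dy = xq - T[0], yq - T[1]          # undo the translation
            xs = inv[0][0] * dx + inv[0][1] * dy   # then undo M
            ys = inv[1][0] * dx + inv[1][1] * dy
            xi, yi = round(xs), round(ys)          # nearest-neighbour sampling
            row.append(src[yi][xi] if 0 <= xi < w and 0 <= yi < h else fill)
        out.append(row)
    return out


def crop(img, rect):
    """Cut out an inclusive rectangle (x0, y0, x1, y1) from a row-major image."""
    x0, y0, x1, y1 = rect
    return [row[x0:x1 + 1] for row in img[y0:y1 + 1]]
```

The same `crop` call would then be applied with the same rectangle to both the aligned comparison source image and the comparison destination image.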

In step S311, the OCR processing unit 205 performs OCR processing on each difference area of the comparison source image and the comparison destination image obtained in step S310 and acquires character recognition information. In the present embodiment, the character recognition information is the "character" or "character string" obtained as the result of the processing. Any generally known technique may be used for the OCR processing.

In the present embodiment, when OCR processing is performed on each difference area of the comparison source image and the comparison destination image, the binarized character image used for OCR is extracted separately for each difference area. Typical OCR processing binarizes the entire image containing the characters by density in order to separate the character image from the rest. For example, if the characters are black on a light-colored pattern background, the frequency distribution of pixel densities shows two main peaks (parts whose frequency is high relative to neighboring densities): one for the background and one for the characters. Setting the threshold at a density value that separates these two peaks divides the character pixels from the other pixels, so the threshold is relatively easy to determine. Treating pixels whose density is at least the threshold as character pixels then yields a binary image of the characters.

However, in printed matter such as flyers, brochures, and invitations, the background may be a complex pattern or a natural image, and the character color may vary from place to place. Applying the same binarization in such cases makes it difficult to find an appropriate threshold and therefore to extract the character image accurately. In contrast, the OCR processing in step S311 of the present embodiment binarizes only the partial image of each difference area, which makes it more likely that the character pixels can be separated from the others. This is because, even in a printed image, the portion containing characters usually has contrast that a person can see, so a suitable threshold is likely to be found.
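The source does not say how the threshold between the two peaks is chosen. The sketch below swaps in Otsu's method, a standard way of picking such a threshold, and applies it only to the pixels of one difference area, as step S311 describes. It assumes integer densities 0 to 255 where higher means darker, and treats density at or above the threshold as a character pixel; the function names are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t maximizing the between-class variance of
    the classes {v < t} and {v >= t} (Otsu's method)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1                      # values must be ints in [0, levels)
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(1, levels):
        w0 += hist[t - 1]                 # number of pixels with value < t
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                      # one class is empty: no split here
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2    # proportional to between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize_region(density, rect):
    """Binarize only the pixels inside rect = (x0, y0, x1, y1), inclusive.

    Returns 1 for character pixels (density >= the threshold chosen for
    this region alone) and 0 otherwise.
    """
    x0, y0, x1, y1 = rect
    patch = [row[x0:x1 + 1] for row in density[y0:y1 + 1]]
    t = otsu_threshold([v for row in patch for v in row])
    return [[1 if v >= t else 0 for v in row] for row in patch]
```

Because the histogram is built from the cropped patch only, background colors elsewhere on the page cannot pull the threshold away from the two peaks of that region.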

As described above, the present embodiment obtains the difference areas between the comparison source image and the comparison destination image. Such difference areas are likely to be text portions in an image of printed matter such as a flyer, so accurate character recognition can be expected. FIG. 4(h) schematically shows character recognition information being acquired from each difference area of the comparison source image and the comparison destination image by the processing of step S311.

In step S312, the OCR result comparison unit 206 compares the "characters" or "character strings" obtained from the corresponding difference areas of the comparison source image and the comparison destination image. In step S313, the output unit 207 outputs information on the differences between the compared character recognition information. The present embodiment is not limited to any particular way of outputting this information; here, the difference between the recognized character strings is output. For the comparison source and comparison destination images of FIG. 4, information indicating a difference between "6.8" and "12.31" is output, as shown in FIG. 4(e), and this information is displayed, for example, on a display unit of the image processing apparatus. Other configurations are also possible; for example, the comparison result of the OCR result comparison unit 206 may be converted into a score and subtracted from the score obtained by voting on local feature amounts, for use in image search based on the similarity score.
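The source does not specify how the OCR comparison is scored. As one hypothetical illustration, the sketch below lists the differing string pairs and derives a penalty from Python's standard `difflib.SequenceMatcher.ratio()`, scaled by an assumed `weight`, which is subtracted from the local-feature voting score.

```python
from difflib import SequenceMatcher


def differing_strings(pairs):
    """Return the (source, destination) OCR strings that differ, per difference area."""
    return [(s, d) for s, d in pairs if s != d]


def adjusted_similarity(vote_score, pairs, weight=10.0):
    """Subtract an OCR-based penalty from the local-feature voting score.

    The penalty per difference area is weight * (1 - similarity ratio), so
    identical strings cost nothing and strings sharing no characters cost `weight`.
    """
    penalty = sum(weight * (1.0 - SequenceMatcher(None, s, d).ratio()) for s, d in pairs)
    return vote_score - penalty
```

For the FIG. 4 example, `differing_strings([("6.8", "12.31")])` reports that one pair, and the adjusted score drops below the raw vote count.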

Next, the processing of step S306 is described in detail with reference to FIG. 5. FIG. 5 is a flowchart of the calculation of the geometric relationship between the feature point positions of the comparison source image and the comparison destination image using RANSAC processing.

First, the symbols are defined. Let Vq be a local feature of the comparison destination image, Q the local feature point associated with it, and Q(x', y') its coordinates. Likewise, let Vs be a local feature of the comparison source image, S the local feature point associated with it, and S(x, y) its coordinates.

In step S501, the variable VoteMax, which represents the final number of votes, is initialized to 0. Next, in step S502, the shortest-distance corresponding point list produced in step S303 is read. To restate, this list contains the corresponding point pairs of Vq and Vs for which the calculated distance between the feature vectors is no greater than the threshold Tv and is the shortest.

Hereafter, for the k-th corresponding point registered in the shortest-distance corresponding point list, the local features of that pair are denoted Vq(k) and Vs(k), and the numbers of features whose feature-vector distance is no greater than the threshold Tv are denoted Nq(k) and Ns(k), respectively. The local feature points associated with Vq(k) and Vs(k) are denoted Qk and Sk, and their coordinates Qk(x'_k, y'_k) and Sk(x_k, y_k), with matching subscripts. The number of corresponding point pairs registered in the shortest-distance corresponding point list created in step S303 is denoted m.

In step S503, the variable Count, which represents the iteration count of the similarity calculation, is initialized to 0. In step S504, it is determined whether Count has exceeded the predetermined maximum number of iterations Rn. If it has, the process proceeds to step S521, where the final vote count VoteMax is output, and the processing ends.

If it is determined in step S504 that Count has not exceeded Rn, the process proceeds to step S505, where the variable Vote, which represents the number of votes, is initialized to 0. Next, in step S506, two corresponding point pairs are randomly extracted from the shortest-distance corresponding point list.

Their coordinates are denoted Q1(x'_1, y'_1), S1(x_1, y_1) and Q2(x'_2, y'_2), S2(x_2, y_2). Next, in step S507, assuming that the extracted Q1(x'_1, y'_1), S1(x_1, y_1), Q2(x'_2, y'_2), and S2(x_2, y_2) satisfy the transformation shown in Equation (1), the variables a to f in Equation (1) are obtained. In step S507 of FIG. 5, the matrix composed of the variables a to d is denoted M, and the matrix composed of the variables e and f is denoted T.
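Equation (1) did not survive extraction. Since the text builds M from the variables a to d and T from e and f, and uses the transform to map S(x, y) onto Q(x', y'), it is presumably the general affine form below. This is a reconstruction, not the original figure.

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\underbrace{\begin{pmatrix} a & b \\ c & d \end{pmatrix}}_{M}
\begin{pmatrix} x \\ y \end{pmatrix}
+
\underbrace{\begin{pmatrix} e \\ f \end{pmatrix}}_{T}
\tag{1}
```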

Here, in the first embodiment, only the similarity transformation is considered for simplicity, in which case Equation (1) can be rewritten as Equation (2).
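Equation (2) is likewise missing. One common way to restrict Equation (1) to a similarity transformation (rotation, uniform scaling, and translation) with only the four variables a, b, e, and f is to set c = -b and d = a; this sign convention is an assumption, not taken from the original figure.

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} a & b \\ -b & a \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+
\begin{pmatrix} e \\ f \end{pmatrix}
\tag{2}
```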

The variables a, b, e, and f are then expressed by Equations (3) to (6) in terms of x'_1, y'_1, x_1, y_1, x'_2, y'_2, x_2, and y_2.
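Equations (3) to (6) are not reproduced in this text. Assuming the similarity transformation has the form x' = a*x + b*y + e, y' = -b*x + a*y + f, subtracting the equations of the two pairs eliminates e and f, which gives one closed-form solution. The sketch below (function name hypothetical) computes it; the formulas are in the comments.

```python
def similarity_from_two_pairs(s1, q1, s2, q2):
    """Solve x' = a*x + b*y + e, y' = -b*x + a*y + f from S1 -> Q1 and S2 -> Q2.

    With dx = x_1 - x_2, dy = y_1 - y_2, dx' = x'_1 - x'_2, dy' = y'_1 - y'_2,
    and D = dx^2 + dy^2:
        a = (dx*dx' + dy*dy') / D
        b = (dy*dx' - dx*dy') / D
        e = x'_1 - a*x_1 - b*y_1
        f = y'_1 + b*x_1 - a*y_1
    """
    (x1, y1), (x2, y2) = s1, s2
    (xp1, yp1), (xp2, yp2) = q1, q2
    dx, dy = x1 - x2, y1 - y2
    dxp, dyp = xp1 - xp2, yp1 - yp2
    denom = dx * dx + dy * dy
    if denom == 0:
        raise ValueError("S1 and S2 coincide; the transform is undetermined")
    a = (dx * dxp + dy * dyp) / denom
    b = (dy * dxp - dx * dyp) / denom
    e = xp1 - a * x1 - b * y1
    f = yp1 + b * x1 - a * y1
    return a, b, e, f
```

For example, the pairs (1, 0) -> (5, -3) and (0, 1) -> (6, -2) give a = 0, b = 1, e = 5, f = -2, i.e. a 90-degree rotation followed by a translation.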

Next, the vote counting means is described using the processing of steps S508 to S517.

In step S508, the corresponding point selection variable k is initialized to 3 in order to select points other than the two pairs selected from the shortest-distance corresponding point list in step S506. In step S509, it is determined whether k exceeds the number m of corresponding point pairs registered in the list. If it does, the process proceeds to step S518, described later; otherwise, the process proceeds to step S510.

In step S510, a point other than the two points S1(x_1, y_1) and S2(x_2, y_2) randomly extracted from the shortest-distance corresponding point list in step S506 is extracted from that list. In the first embodiment, the extracted point is denoted Sk(x_k, y_k).

Next, in step S511, the coordinates Sk'(x'_k, y'_k) to which Sk(x_k, y_k) is mapped by Equation (2) are obtained. Then, in step S512, the geometric distance between Sk'(x'_k, y'_k) and Qk(x'_k, y'_k) is calculated as a Euclidean distance, and it is determined whether this distance is no greater than the threshold Td. If it is, the process proceeds to step S513, where the vote count Vote is incremented, and in step S514 the coordinates of Sk' are stored as a positive corresponding point. The process then proceeds to step S515.

If the Euclidean distance is greater than the threshold Td, then in step S516 NG_Vote, the number of feature points that were paired but did not satisfy the geometric relationship, is incremented, and in step S517 the coordinates of Sk' are stored as a non-positive corresponding point. The process then proceeds to step S515.

In step S515, the corresponding point selection variable k is incremented and the process returns to step S509, where the next unprocessed pair in the shortest-distance corresponding point list is extracted. This processing is repeated until k exceeds the number m of corresponding point pairs registered in the list.

Next, step S518, which is executed when k exceeds the number m of corresponding point pairs in step S509, is described. In step S518, the vote count Vote is compared with the final vote count VoteMax; if Vote is greater than VoteMax, the process proceeds to step S519.

In step S519, VoteMax is replaced with the value of Vote; then, in step S520, the iteration count Count is incremented and the process returns to step S504.

If Vote is no greater than VoteMax in step S518, the process proceeds to step S520, where Count is incremented, and the process returns to step S504.
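The loop of FIG. 5 (steps S501 to S521) can be condensed into the sketch below. It is a minimal illustration under several assumptions: the similarity transformation has the form x' = a*x + b*y + e, y' = -b*x + a*y + f; the shortest-distance list is given as (Sk, Qk) coordinate pairs; the inner loop skips the two sampled pairs instead of starting k at 3 as in step S508; the loop runs `rn` iterations; and the positive and non-positive points of the best iteration are returned. All names are hypothetical, and a seeded random generator keeps runs reproducible.

```python
import math
import random


def similarity_from_two_pairs(s1, q1, s2, q2):
    """Solve x' = a*x + b*y + e, y' = -b*x + a*y + f from S1 -> Q1 and S2 -> Q2."""
    dx, dy = s1[0] - s2[0], s1[1] - s2[1]
    dxp, dyp = q1[0] - q2[0], q1[1] - q2[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        raise ValueError("S1 and S2 coincide")
    a = (dx * dxp + dy * dyp) / denom
    b = (dy * dxp - dx * dyp) / denom
    return a, b, q1[0] - a * s1[0] - b * s1[1], q1[1] + b * s1[0] - a * s1[1]


def ransac_similarity(pairs, rn=100, td=3.0, seed=0):
    """pairs: list of (Sk, Qk) from the shortest-distance corresponding point list.

    Returns (vote_max, params, positive_points, non_positive_points).
    """
    rng = random.Random(seed)
    vote_max, best = 0, None                      # S501
    best_pos, best_neg = [], []
    m = len(pairs)
    if m < 2:
        return vote_max, best, best_pos, best_neg
    for _ in range(rn):                           # S503, S504, S520
        i, j = rng.sample(range(m), 2)            # S506: two random pairs
        try:
            a, b, e, f = similarity_from_two_pairs(
                pairs[i][0], pairs[i][1], pairs[j][0], pairs[j][1])  # S507
        except ValueError:
            continue                              # degenerate sample
        vote, pos, neg = 0, [], []                # S505; len(neg) plays the role of NG_Vote
        for k, (sk, qk) in enumerate(pairs):      # S508 to S515
            if k in (i, j):
                continue
            sk_p = (a * sk[0] + b * sk[1] + e, -b * sk[0] + a * sk[1] + f)  # S511
            if math.dist(sk_p, qk) <= td:         # S512
                vote += 1                         # S513
                pos.append(sk_p)                  # S514: positive corresponding point
            else:
                neg.append(sk_p)                  # S516, S517: non-positive point
        if vote > vote_max:                       # S518
            vote_max, best = vote, (a, b, e, f)   # S519
            best_pos, best_neg = pos, neg
    return vote_max, best, best_pos, best_neg     # S521
```

With five pairs generated by an exact similarity transform plus one mismatched pair, any sample of two clean pairs recovers the transform and votes for the other three clean points, while the mismatched pair ends up among the non-positive points.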

Although the score calculation method of the first embodiment has been described considering only the similarity transformation, other geometric transformations such as the affine transformation can be handled by obtaining the corresponding transformation matrix in step S507. For an affine transformation, for example, the number of corresponding point pairs extracted in step S506 is first set to three. Then, in step S507, Equation (1) is used instead of Equation (2), and the variables a to f are obtained from the three pairs of corresponding points (six points in total) selected in step S506.
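For the affine case, the six unknowns split into two 3x3 linear systems that share the same matrix of source coordinates. The sketch below solves them with Cramer's rule, assuming Equation (1) has the form x' = a*x + b*y + e, y' = c*x + d*y + f; the function names are hypothetical, and collinear source points (a zero determinant) are rejected because they do not determine an affine transform.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve m @ v = rhs for a 3x3 system using Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("source points are collinear")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]      # replace one column with the right-hand side
        sol.append(det3(mc) / d)
    return sol


def affine_from_three_pairs(src, dst):
    """src, dst: three (x, y) points each; returns (a, b, c, d, e, f)."""
    m = [[x, y, 1] for x, y in src]
    a, b, e = solve3(m, [p[0] for p in dst])   # x' = a*x + b*y + e
    c, d, f = solve3(m, [p[1] for p in dst])   # y' = c*x + d*y + f
    return a, b, c, d, e, f
```

In a RANSAC loop like the one in FIG. 5, this function would simply replace the two-pair similarity solver, with three pairs sampled per iteration.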

As described above, according to the present embodiment, when images that are similar overall are compared, applying OCR processing to the areas that differ within the images enables an accurate comparison.

Although the present embodiment has been described for the case where the difference area between the comparison source image and the comparison destination image contains "character" or "character string" information, it is also effective when the difference area contains something other than characters. In principle, similarity comparison based on rotation- and scale-invariant feature amounts such as SIFT, as used in this embodiment, has difficulty capturing differences in the line-segment information lying between feature points. OCR processing, by contrast, although specialized for characters, can be regarded as a similar-class determination process that captures line-segment information. Therefore, if the same OCR processing module is applied to the difference areas under the same conditions, it produces consistent output even for non-character areas, although that output may not be readable as text. Any line-segment difference between the partial images of a difference area then appears as a difference in the recognition results. In this case, for example, the character string lattices of the recognition results may be compared directly, and the size of their difference may be treated as the size of the difference in character-like information.

Alternatively, if the OCR processing yields a feature amount specific to character recognition, different from the feature amounts obtained by the image feature amount group extraction unit 202, similarity may be compared using that feature amount. For example, if a multidimensional vector is obtained as the feature amount, the distance between vectors may be used. In this way, according to the present embodiment, a comparison that captures differences in the character-like information of a difference area is possible even when the difference area is not necessarily a character area.

In the present embodiment, the image erecting processing unit 204 applies geometric transformation processing such as rotation and scaling to the comparison source image, using geometric transformation coefficients (for example, those of an affine transformation) obtained by the image feature amount comparison unit 203, and thereby aligns it with the comparison destination image. Even when the two images were produced under different conditions, for example when the comparison source image is scanned image data and the comparison destination image is rasterized document image data, aligning the source image to the destination image erects it. This allows the subsequent OCR processing by the OCR processing unit 205 to be performed accurately.

Also, in the present embodiment, the OCR processing unit 205 performs binarization only on the partial image of each difference area. Even in a printed image, the portion containing characters usually has contrast that a person can see, so the threshold is likely to be determined well and the character pixels are more likely to be separated from the others. As a result, characters can be recognized accurately even in text portions of images of printed matter such as flyers.

In the above description, SIFT and SURF computed from a luminance image were given as examples of local feature amounts, but local features computed from color-difference signals or RGB color channels can also be used. Also, although FIG. 4 shows an example with a single difference area between the comparison source image and the comparison destination image, there may be any number of difference areas. In that case, the areas are processed in association with one another so that it is clear which area of the comparison source corresponds to which area of the comparison destination. Specifically, after the group of areas with a high density of non-positive corresponding points is extracted in step S309, the processing of steps S311 and S312 is performed for each area. As a result, even when there are multiple difference areas between the comparison source image and the comparison destination image, for example when printed matter such as a flyer has several text areas with different backgrounds and character fonts, each text portion can be compared.

[Other Embodiments]
The present invention can also be realized by processing in which software (a program) that implements the functions of the above embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like) of the system or apparatus reads out and executes the program. The present invention may be applied to a system composed of a plurality of devices or to an apparatus composed of a single device. The present invention is not limited to the above embodiments; various modifications (including organic combinations of the embodiments) are possible based on the spirit of the present invention, and these are not excluded from the scope of the present invention. That is, all configurations that combine the above embodiments and their modifications are also included in the present invention.

DESCRIPTION OF SYMBOLS
201 Image input unit
202 Image feature amount group extraction unit
203 Image feature amount comparison unit
204 Image erecting processing unit
205 OCR processing unit
206 OCR result comparison unit
207 Output unit
208 Storage unit

Claims

An input means for inputting a comparison source image and a comparison destination image;
Estimating means for estimating a difference area between the input comparison source image and the comparison destination image;
OCR processing means for performing OCR processing on the estimated difference area;
An image processing apparatus comprising:

The estimation means includes
Extraction means for extracting local feature points from the input comparison source image and comparison destination image;
Calculating means for calculating local features for the extracted local feature points;
First determination means for determining the local feature point in a correspondence relationship between the comparison source image and the comparison target image based on the calculated local feature;
First estimation means for estimating a common area between the comparison source image and the comparison destination image based on the determined local feature points in the correspondence relationship;
Second determining means for determining local feature points having no correspondence relationship between the comparison source image and the comparison destination image in the estimated common region;
Second estimation means for estimating a difference area between the comparison source image and the comparison destination image based on the determined local feature points having no corresponding relationship;
The image processing apparatus according to claim 1, further comprising:

The image processing apparatus further includes an erecting processing unit that erects the comparison source image and the comparison destination image based on a geometric relationship between the local feature points in the correspondence relationship determined by the first determination unit. The image processing apparatus according to claim 2.

The image processing apparatus according to claim 1, wherein the OCR processing means performs OCR processing by performing binarization processing only on the estimated difference area.

Inputting a comparison source image and a comparison destination image;
Estimating a difference area between the input comparison source image and the comparison destination image;
Performing OCR processing on the estimated difference area;
An image processing method.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 5.