JP6598480B2

JP6598480B2 - Image processing apparatus, image processing method, and program

Info

Publication number: JP6598480B2
Application number: JP2015061682A
Authority: JP
Inventors: 日出来空門; 昌弘松下; 弘隆椎山
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-03-24
Filing date: 2015-03-24
Publication date: 2019-10-30
Anticipated expiration: 2035-03-24
Also published as: JP2016181181A

Description

The present invention relates to a technique for comparing local feature amounts of images.

A method has been proposed for searching for similar images using local feature amounts of images. In this method, characteristic points (local feature points) are first extracted from an image. Then, for each local feature point, a feature amount (local feature amount) is calculated from the point and the image information around it.

In methods that use local feature amounts, a local feature amount is defined as information made up of multiple elements that are invariant to rotation and to enlargement/reduction. This allows an image to be found even after it has been rotated, enlarged, or reduced. A local feature amount is generally expressed as a vector. However, rotation and scale invariance hold only in theory: in actual digital images, a local feature amount computed before rotation or enlargement/reduction differs slightly from the corresponding local feature amount computed afterward.

To extract rotation-invariant local feature amounts, a main direction is calculated from the pixel pattern of the local region around each local feature point, and when the local feature amount is calculated, the local region is rotated with respect to this main direction to normalize its orientation. To calculate local feature amounts that are invariant to enlargement/reduction, images at different scales are generated internally, and local feature points are extracted and local feature amounts calculated from the image at each scale. This internally generated set of images at different scales is generally called a scale space.

With the above method, multiple local feature points are extracted from one image. In image retrieval using local feature amounts, matching is performed by comparing the local feature amounts calculated at the respective local feature points. In a widely used voting method (Patent Document 1), for each feature point extracted from the input image, if a registered image contains a feature point whose feature amount is similar to that point's local feature amount, one vote is cast for that registered image. The more votes a registered image receives, the more similar it is considered to the comparison source image. The input image is also called the comparison source image, query image, or search source image; the registered image is also called the comparison destination image or search destination image.
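A minimal sketch of the image-level voting described above, assuming features are plain Python lists compared by Euclidean distance against a fixed threshold. The function names, the threshold, and the dict-of-images layout are illustrative assumptions, not taken from Patent Document 1.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vote_registered_images(query_feats, registered, threshold):
    """One vote per query feature for each registered image that contains at
    least one feature within `threshold` of it.

    registered: dict mapping a (hypothetical) image id to its feature vectors.
    Returns (image_id, votes) pairs, most votes first.
    """
    votes = {image_id: 0 for image_id in registered}
    for q in query_feats:
        for image_id, feats in registered.items():
            if any(euclidean(q, f) <= threshold for f in feats):
                votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


ranking = vote_registered_images(
    [[0.0, 0.0], [1.0, 1.0]],
    {"A": [[0.0, 0.1]], "B": [[0.0, 0.0], [1.0, 1.0]]},
    threshold=0.5,
)
print(ranking)  # [('B', 2), ('A', 1)]
```

Note that every matching feature counts equally here, whether it lies on the foreground object or in the background; this is the weakness discussed next.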

In addition, to perform image collation or image search with high accuracy, there is a method that cuts out the target object to be searched from an image in advance (Patent Document 2). In this method, the user inputs contour information of the target object on the image via a pen input unit, and the object image with the background removed is extracted.

Patent Document 1: JP 2009-284084 A
Patent Document 2: Japanese Patent No. 10-254901

However, in the voting method of Patent Document 1, when an image mixes feature amounts from the foreground and the background, which lie at different distances from the imaging position, its similarity to other images was determined without distinguishing feature amounts extracted from the foreground from those extracted from the background. For example, when computing the similarity with an image that, as in FIG. 1, has more feature points in the background than in the foreground, the local feature amounts of background feature points may reach a similarity equal to or greater than a predetermined value. In that case, the background feature amounts influence the vote count and can inflate it. As a result, an appropriate similarity between the images may not be obtained, which lowers collation accuracy. Likewise, in image search, where multiple similar images are sorted by similarity to produce the search result, search accuracy is lowered because appropriate similarities cannot be obtained.

In the method of Patent Document 2, the user must perform special processing, such as extraction by pen input, for every target object in the image, which places a heavy processing burden on the user.

The present invention has been made in view of the above problems, and its object is to perform image collation or image search focused on the feature points to be compared, without requiring the user to perform any special processing.

To solve the above problems, an image processing apparatus according to the present invention includes: extraction means for extracting a first local feature amount for each of a plurality of feature points of a first image and a second local feature amount for each of a plurality of feature points of a second image obtained by imaging an imaging target; holding means for holding depth information of the imaging target corresponding to each of the feature points of the second image; specifying means for specifying at least one distance range based on the depth information; and determination means for determining that the second image is similar to the first image if the similarity between the second local feature amounts and the first local feature amounts obtained for the distance range is equal to or greater than a predetermined value. When at least two of the similarities between the second local feature amounts and the first local feature amounts, obtained for each of a plurality of distance ranges, are equal to or greater than the predetermined value, the determination means determines whether the second image is similar to the first image based on the sum of the similarities that are equal to or greater than the predetermined value.

According to the present invention, the depth information of the comparison destination image is used to specify the feature amounts of that image for each distance range, so image collation or image search can be performed focusing on the feature points to be compared, without the user performing any special processing.

FIG. 1 is an example of an image processed by the image processing apparatus.
FIG. 2 is a diagram of a configuration example of the computer device in the first embodiment.
FIG. 3 is a diagram of a configuration example of the image collation apparatus in the first embodiment.
FIG. 4 is an explanatory diagram of an example of the voting data structure in the first embodiment.
FIG. 5 is a flowchart illustrating the flow of the image collation processing in the first embodiment.
FIG. 6 is a flowchart illustrating the flow of the local feature amount extraction processing in the first embodiment.
FIG. 7 is an explanatory diagram of reduced images.
FIG. 8 is a diagram of a configuration example of the image search apparatus in the second embodiment.
FIG. 9(a) is an explanatory diagram of an example of the data structures of the image index and the votes in the second embodiment; FIG. 9(b) is an explanatory diagram of an example of vote values generated for each cluster; FIG. 9(c) is an explanatory diagram of an example of obtaining a similarity from vote values.
FIG. 10 is a flowchart illustrating the flow of the registration processing in the second embodiment.
FIG. 11 is a flowchart illustrating the flow of the image search processing in the second embodiment.
FIG. 12 is an explanatory diagram of the quantization space in the second embodiment.
FIG. 13 is a flowchart illustrating the flow of the image search processing in the third embodiment.
FIG. 14 is an explanatory diagram of an example of obtaining a similarity from vote values.

[First Embodiment]
The configuration of the computer device constituting the server device and the client device of this embodiment is described with reference to the block diagram of FIG. 2. The server device and the client device may each be implemented by a single computer device, or their functions may be distributed across multiple computer devices as needed. When multiple computer devices are used, they are connected so that they can communicate with each other, for example over a local area network (LAN). The computer device can be implemented by an information processing device such as a personal computer (PC) or a workstation (WS).

In FIG. 2, the CPU 201 is a central processing unit that controls the entire computer device 200. The ROM 202 is a read-only memory that stores programs and parameters that do not need to be changed. The RAM 203 is a random access memory that temporarily stores programs and data supplied from an external device or the like. The storage device 204 is a storage device, such as a hard disk or a memory card, installed fixedly in the computer device 200. The storage device 204 may be a hard disk inside the computer device 200, or may include a flexible disk (FD), an optical disk such as a CD, a magnetic or optical card, an IC card, a memory card, or the like that is removable from the computer device 200. The input device interface 205 is an interface with an input device 209, such as a pointing device or a keyboard, that accepts user operations and inputs data. The output device interface 206 is an interface with a monitor 210 for displaying data held by the computer device 200 and data supplied to it. The communication interface 207 connects to a network line 211 such as the Internet, and to imaging devices such as a digital camera 212, a digital video camera 213, and a smartphone 214. The system bus 208 is a transmission path that communicably connects units 201 to 207.

The imaging device used in this embodiment includes a generation unit (not shown) that generates depth information of the imaging target. The depth information is a depth value, or distance information, indicating the distance from the generation unit to the imaging target.

Each operation described below is carried out by the CPU 201 executing a program stored in a computer-readable storage medium such as the ROM 202.

[Configuration of Image Collation Apparatus]
An example in which the image processing apparatus of this embodiment is used as an image collation apparatus is described. The image collation apparatus determines whether two images are similar. In this embodiment, the two images it processes are referred to as the comparison source image and the comparison destination image. The configuration of the image collation apparatus of this embodiment is described below with reference to FIG. 3. The image collation apparatus includes an image input unit 301, a feature amount extraction unit 302, a data holding unit 303, a feature amount specifying unit 305, a feature amount comparison unit 306, a similarity determination unit 307, and a collation result output unit 308.

The image input unit 301 receives a comparison source image (query image) from outside the image collation apparatus via the input device 209 or the like. The comparison destination image (registered image) is already stored in the storage device 204, so it does not need to be input.

The feature amount extraction unit 302 extracts characteristic points (local feature points) from the comparison source image and the comparison destination image and, based on each extracted local feature point and the surrounding image information, calculates the corresponding feature amount (local feature amount). The local feature amount extraction processing is described in detail later with reference to FIGS. 6 and 7.

Because the data holding unit 303 holds the depth information of the imaging target, the depth information for each feature amount calculation region is obtained from the data holding unit 303. When depth values are held, the depth value of the imaging target corresponding to the position of each feature point in the image is obtained. Alternatively, when the captured image and the distance image are stored as separate data, a mapping between the pixels of the captured image and those of the distance image is held in advance, and the depth value of the imaging target at each feature point position is obtained according to that mapping. If the distance image is noisy, it may first be smoothed to reduce the noise before the depth values are obtained. The method of obtaining depth information in this embodiment is not limited to these methods.
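A minimal sketch of looking up the depth at a feature point, assuming the distance image is a list of rows that covers the same field of view as the captured image, possibly at a different resolution, so the pixel mapping is a simple coordinate scaling. The 3x3 mean filter stands in for the unspecified smoothing step; both it and the function names `box_smooth` and `depth_at` are hypothetical.

```python
def box_smooth(depth):
    """3x3 mean filter over a distance image (a list of rows); border pixels
    average only the neighbours that exist. Used to reduce depth noise."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [depth[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out


def depth_at(point, depth, image_size):
    """Depth of the imaging target at a feature point.

    point:      (x, y) of the feature point in captured-image pixels.
    depth:      distance image as a list of rows.
    image_size: (width, height) of the captured image.
    Assumed mapping: scale captured-image coordinates to distance-image coordinates.
    """
    dh, dw = len(depth), len(depth[0])
    iw, ih = image_size
    x, y = point
    dx = min(dw - 1, int(x * dw / iw))
    dy = min(dh - 1, int(y * dh / ih))
    return depth[dy][dx]


depth = [[10, 20], [30, 40]]
print(depth_at((75, 25), depth, (100, 100)))               # 20
print(depth_at((75, 25), box_smooth(depth), (100, 100)))   # 25.0
```

Smoothing is kept outside `depth_at` so the distance image is filtered once rather than once per feature point.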

The feature amount specifying unit 305 first specifies a plurality of distance ranges based on the depth values of the imaging target at the feature point positions. For example, if the depth values of the imaging target are distributed over 10 to 100 meters, that range is divided into multiple 10-meter ranges such as 10 to 20 meters, 20 to 30 meters, and 30 to 40 meters. In this embodiment, this process is called depth value gradation. After the depth values are graded, the feature amounts are clustered based on them. Specifically, for each feature amount obtained by the feature amount extraction unit 302, a depth value is obtained from the data holding unit 303, and the feature amounts are clustered by grading these depth values. For example, a gradation width is determined in advance, and the quotient of each feature amount's depth value divided by this width is computed. The gradation width is the fixed interval distance, 10 meters in the example above. Feature amounts with the same quotient are then assigned to the same cluster.
As a result, feature amounts can be specified and separated for each predetermined depth width (distance range), so the feature amounts of the image that correspond to imaging targets in the foreground can be separated from those that correspond to imaging targets in the background.
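The quotient-based clustering above can be sketched as follows, assuming depth values in meters stored in a list indexed like the feature amounts. The function name `cluster_by_depth` is hypothetical.

```python
from collections import defaultdict


def cluster_by_depth(depths, width):
    """Grade depth values into fixed-width distance ranges.

    depths: depth value of the imaging target at each feature point,
            indexed like the feature amounts.
    width:  gradation width, e.g. 10 for 10-meter ranges.
    Returns {quotient: [feature indices]}; feature amounts whose depths have
    the same quotient depth // width share a cluster.
    """
    clusters = defaultdict(list)
    for idx, d in enumerate(depths):
        clusters[int(d // width)].append(idx)
    return dict(clusters)


print(cluster_by_depth([12.0, 15.5, 47.0, 95.0, 19.9], 10))
# {1: [0, 1, 4], 4: [2], 9: [3]}
```

Cluster 1 here is the 10 to 20 meter range, cluster 4 the 40 to 50 meter range, and so on.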

As another example of clustering, the clusters may be determined using a discriminant analysis method (Otsu's method) such as the one used for image binarization. Because this divides the feature amounts into two clusters, it separates them into foreground and background.
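A minimal sketch of a two-way split of depth values by discriminant analysis, assuming the between-class variance criterion of Otsu's method is evaluated directly on the raw depth values rather than on a gray-level histogram. The function name `otsu_split` is hypothetical.

```python
def otsu_split(depths):
    """Threshold t maximizing the between-class variance w0*w1*(m0-m1)**2,
    where class 0 is d <= t (near, foreground) and class 1 is d > t (far)."""
    values = sorted(set(depths))
    n = len(depths)
    best_t, best_var = values[0], -1.0
    for t in values[:-1]:  # the largest value would leave the far class empty
        near = [d for d in depths if d <= t]
        far = [d for d in depths if d > t]
        w0, w1 = len(near) / n, len(far) / n
        m0, m1 = sum(near) / len(near), sum(far) / len(far)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


depths = [2, 3, 3, 4, 40, 42, 45]
t = otsu_split(depths)
print(t)                              # 4
print([d for d in depths if d <= t])  # [2, 3, 3, 4]
```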

Clustering may also be performed using hierarchical clustering or the like. In this embodiment, hierarchical clustering is applied when clustering the feature amounts over a plurality of consecutive distance ranges has yielded a plurality of clusters. If the numbers of feature amounts assigned to the two clusters corresponding to two adjacent distance ranges are each equal to or greater than a predetermined value, the two clusters are merged into one. Repeating this process yields a cluster of feature amounts for each of a plurality of imaging targets with different depth values. Thus, when several objects with different depth values are present in the foreground, a feature amount cluster can be obtained for each object.
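The merging rule above can be sketched as a single left-to-right pass over the distance-range clusters, assuming clusters are keyed by their integer range index (as produced by depth gradation) so that "adjacent" means consecutive indices. Because a merged cluster only grows, repeated pairwise merging reduces to this pass. The names `merge_adjacent` and `min_count` are hypothetical.

```python
def merge_adjacent(clusters, min_count):
    """Merge clusters of adjacent distance ranges when each holds at least
    `min_count` feature amounts.

    clusters: {range_index: [feature indices]}.
    Returns a list of {"ranges": [...], "members": [...]} groups in depth order.
    """
    merged = []
    for rid in sorted(clusters):
        members = clusters[rid]
        if (merged
                and merged[-1]["ranges"][-1] == rid - 1        # adjacent range
                and len(merged[-1]["members"]) >= min_count
                and len(members) >= min_count):
            merged[-1]["ranges"].append(rid)
            merged[-1]["members"].extend(members)
        else:
            merged.append({"ranges": [rid], "members": list(members)})
    return merged


groups = merge_adjacent(
    {1: [0, 1, 2], 2: [3, 4, 5], 4: [6], 5: [7, 8, 9], 6: [10, 11, 12]},
    min_count=3,
)
print([g["ranges"] for g in groups])  # [[1, 2], [4], [5, 6]]
```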

In this embodiment, an imaging target (object) with a small depth value belongs to the foreground as seen from the viewpoint at the time of imaging, and an imaging target (object) with a large depth value belongs to the background as seen from that viewpoint.

The method of clustering feature amounts using depth values in this embodiment is not limited to these methods.

The feature amount comparison unit 306 compares the feature amounts of the comparison source image with those of the comparison destination image and finds, as pairs, feature amounts of the comparison destination image that are similar to feature amounts of the comparison source image. Specifically, it calculates the Euclidean distance between the vector of a comparison source feature amount and the vector of a comparison destination feature amount, both extracted by the feature amount extraction unit 302. If the calculated Euclidean distance is equal to or less than a predetermined value, the two feature amounts are judged to be similar, and the comparison source feature amount and the comparison destination feature amount are recorded as a pair.

The similarity determination unit 307 counts the number of pairs for each cluster of the comparison destination image. Specifically, one vote is cast for the "cluster" to which each paired feature amount belongs, so that the number of pairs is counted per cluster.

Conceptually, as shown in FIG. 4, a vote value is generated for each cluster ID. The number of pairs 401 represents the vote value of cluster ID 002, and the number of pairs 402 represents the vote value of cluster ID 005. In this figure, a smaller cluster ID represents a cluster nearer to the front in depth. It can therefore be seen that the number of pairs 401 for the foreground cluster and the number of pairs 402 for the background cluster are obtained separately. In this way, the number of pairs is counted for each predetermined distance range.
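The pairing and per-cluster voting described above can be sketched as follows, assuming feature vectors are plain lists and each comparison destination feature already carries the cluster ID from depth clustering. The function names, the threshold, and the sample data are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def find_pairs(src_feats, dst_feats, threshold):
    """(i, j) for every source/destination feature pair whose Euclidean
    distance is at most `threshold`; one source feature may pair with
    several destination features."""
    return [(i, j)
            for i, s in enumerate(src_feats)
            for j, d in enumerate(dst_feats)
            if math.dist(s, d) <= threshold]


def vote_per_cluster(pairs, dst_cluster_of):
    """Cast one vote for the depth cluster of the destination feature in each pair."""
    return Counter(dst_cluster_of[j] for _, j in pairs)


src = [[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]]
dst = [[0.0, 0.2], [5.0, 5.1], [9.0, 9.0], [9.1, 9.0]]
dst_cluster_of = [2, 5, 5, 5]  # cluster ID of each destination feature
pairs = find_pairs(src, dst, threshold=0.5)
votes = vote_per_cluster(pairs, dst_cluster_of)
print(pairs)  # [(0, 0), (1, 1), (2, 2), (2, 3)]
print(votes)  # Counter({5: 3, 2: 1})
```

In this toy data the background cluster 5 collects more votes than the foreground cluster 2, which is exactly the situation that per-cluster counting keeps separate instead of mixing into one image-level total.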

Based on the clusters of the comparison destination image, the similarity determination unit 307 identifies clusters with a given number of pairs and generates the similarity between the comparison source image and the comparison destination image from their pair counts. Specifically, the cluster with the largest number of pairs is identified, and its pair count is used as the similarity. Alternatively, the clusters whose pair counts are at least 80% of that maximum may be identified, and the sum of their pair counts used as the similarity. Or the clusters whose pair counts are equal to or greater than a predetermined threshold may be identified, and the sum of their pair counts used. If the similarity is equal to or greater than a predetermined value, the similarity determination unit 307 determines that the comparison destination image and the comparison source image are similar.
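The three similarity rules above can be sketched as one function over per-cluster vote counts (a dict mapping cluster ID to pair count). The rule names, the `decision_threshold` parameter, and the sample counts are hypothetical.

```python
def similarity(votes, rule="max", ratio=0.8, min_votes=None):
    """Image similarity from per-cluster pair counts.

    rule="max":       pair count of the single best cluster.
    rule="ratio":     sum over clusters with count >= ratio * max (80% by default).
    rule="threshold": sum over clusters with count >= min_votes.
    """
    if not votes:
        return 0
    best = max(votes.values())
    if rule == "max":
        return best
    if rule == "ratio":
        return sum(v for v in votes.values() if v >= ratio * best)
    if rule == "threshold":
        if min_votes is None:
            raise ValueError("min_votes is required for rule='threshold'")
        return sum(v for v in votes.values() if v >= min_votes)
    raise ValueError(f"unknown rule: {rule}")


def is_similar(votes, decision_threshold, **kwargs):
    """Images are judged similar when the similarity reaches the predetermined value."""
    return similarity(votes, **kwargs) >= decision_threshold


votes = {1: 4, 2: 10, 5: 9}
print(similarity(votes))                                # 10
print(similarity(votes, rule="ratio"))                  # 19
print(similarity(votes, rule="threshold", min_votes=4)) # 23
print(is_similar(votes, 15), is_similar(votes, 15, rule="ratio"))  # False True
```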

If the distance range containing the depth value of the target object to be checked is known in advance, the similarity determination unit 307 need not generate a vote value for every cluster ID; it can generate vote values only for the cluster IDs corresponding to that distance range. In the example of FIG. 4, if the distance range is known exactly, only the vote value for cluster ID 002 needs to be generated. If the distance range is not known exactly but the target object is known to be in the foreground, only the vote values for cluster IDs 001 to 003 need to be generated.

In this embodiment, the similarity between images is the pair count of the identified cluster or the sum of the pair counts of the identified clusters, but it is not limited to this. For example, instead of counting pairs, the similarity between images may be computed using as a parameter the Euclidean distances (each equal to or less than the predetermined value) between the paired feature vectors.

The collation result output unit 308 outputs the result determined by the similarity determination unit 307 to the monitor 210 or the like.

The flow of the image collation processing of this embodiment is described with reference to the flowchart in FIG. 5.

In step S501, local feature amounts are extracted from the comparison source image and the comparison destination image. The details are described separately as the local feature amount extraction processing with reference to FIGS. 6 and 7.

In step S502, the feature amounts of the comparison destination image (registered image) are clustered. Specifically, the depth value of the imaging target at the position of each feature point in the image is obtained from the data holding unit 303, and the feature amount specifying unit 305 divides the feature amounts of the feature points into clusters based on those depth values.

In steps S503 to S506, feature amounts of the comparison destination image that are similar to the local feature amounts extracted from the comparison source image are found, and one vote is cast for the "cluster" corresponding to each feature amount found. Conceptually, as shown in FIG. 4, this generates a vote value for each cluster ID. Each step is described below.

Step S503 is a loop that processes, in order, the local feature amounts of the comparison source image obtained in step S501; the feature amounts are numbered sequentially from 1. They are referenced by the variable i, which is first initialized to 1. While i is equal to or less than the number of local feature amounts, the process proceeds to step S504; otherwise, it exits the loop and proceeds to step S507.

In step S504, the feature amount comparison unit 306 finds the feature amounts of the comparison destination image that are similar to feature amount i of the comparison source image. Specifically, the Euclidean distance between feature amount i and every feature amount of the comparison destination image is computed, and the destination feature amounts whose Euclidean distance is equal to or less than a predetermined threshold are identified.

In step S505, the similarity determination unit 307 adds to the pair count of each "cluster". Specifically, to hold the information shown in FIG. 4, a list or similar structure holding cluster IDs and vote values is prepared in advance. Then, 1 is added to the vote value of the cluster ID corresponding to a similar feature amount obtained in step S504. This is done for every feature amount found in step S504.

Step S506 is the end of the loop: 1 is added to i, and the flow returns to step S503.

In step S507, the similarity determination unit 307 generates the similarity between the images. Specifically, the cluster with the largest vote value is identified, and its vote value is used as the similarity between the comparison source image and the comparison destination image. If the similarity is equal to or greater than a predetermined value, the similarity determination unit 307 determines that the comparison source image and the comparison destination image are similar.

An example of the "local feature amount extraction processing" used in this embodiment is described below.

[Local Feature Amount Extraction Processing]
An example of a method for extracting local feature amounts from an image is described with reference to FIG. 6.

In step S601, a luminance component is extracted from the input image, and a luminance component image is generated from the extracted luminance component.
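The text does not say how the luminance component is computed. As one common choice, this sketch assumes the ITU-R BT.601 luma weights applied to an RGB image given as rows of (R, G, B) tuples; the function name `luminance_image` is hypothetical.

```python
def luminance_image(rgb_rows):
    """Luminance component image from rows of (R, G, B) tuples.

    Uses the BT.601 weights 0.299, 0.587, 0.114 (an assumption; the
    document does not specify the conversion).
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_rows]


y = luminance_image([[(255, 0, 0), (0, 0, 255)], [(0, 0, 0), (255, 255, 255)]])
print(y)
```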

Next, in step S602, the luminance component image is repeatedly reduced by a magnification (reduction ratio) p, generating n images, including the original, that are reduced stepwise from the original size. The magnification p and the number of images n are assumed to be determined in advance.

FIG. 7 shows an example of the reduced images obtained by this reduced-image generation process. In this example, the magnification p is 2^(−1/4) and the number of images n is 9; p need not be 2^(−1/4), however. In FIG. 7, image 701 is the luminance component image generated in step S601. Reduced image 702 is obtained by recursively reducing the luminance component image 701 four times by the magnification p, and reduced image 703 is obtained by reducing it eight times by the magnification p.

In this example, reduced image 702 is the luminance component image 701 reduced to 1/2, and reduced image 703 is image 701 reduced to 1/4. Any reduction method may be used; in this embodiment, the reduced images are generated by linear interpolation.
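The scale of each level follows directly from p: the k-th image is p^k times the original size, so with p = 2^(−1/4), four reductions give 1/2 and eight give 1/4. The sketch below only computes scales and sizes; the actual resampling (linear interpolation in this embodiment) is not shown, and rounding sizes to the nearest pixel is an assumption.

```python
def pyramid_scales(p, n):
    """Scale of each of the n images relative to the original;
    index 0 is the original image itself (p ** 0 == 1)."""
    return [p ** k for k in range(n)]

def pyramid_sizes(width, height, p, n):
    """Pixel size of each level, rounded to the nearest integer (assumption)."""
    return [(round(width * s), round(height * s)) for s in pyramid_scales(p, n)]

p = 2 ** (-1 / 4)
sizes = pyramid_sizes(640, 480, p, 9)
print(sizes[4], sizes[8])  # (320, 240) (160, 120)
```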

Next, in step S603, local feature points that can be extracted robustly even when the image is rotated are extracted from each of the n reduced images. In this embodiment, the Harris operator is used to extract the local feature points.

Specifically, for each pixel of the output image H obtained by applying the Harris operator, the values of that pixel and its eight neighbors (nine pixels in total) are examined. A pixel that is a local maximum (its value is the largest of the nine) is extracted as a local feature point. However, even if a pixel is a local maximum, it is not extracted as a local feature point if its value is less than or equal to a threshold.
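The 3×3 local-maximum test with a threshold can be sketched as below. It assumes the Harris response H is already available as a 2D list; computing the Harris response itself is not shown. Skipping border pixels and keeping ties are choices of this sketch, not something the text specifies.

```python
def harris_local_maxima(H, threshold):
    """Return (x, y) of pixels that are the maximum of their 3x3 neighborhood
    in the Harris response H and whose value exceeds threshold.
    Border pixels (no full 8-neighborhood) are skipped; ties are kept."""
    points = []
    rows, cols = len(H), len(H[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            v = H[y][x]
            if v <= threshold:
                continue  # at or below threshold: never a feature point
            window = [H[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            if v == max(window):
                points.append((x, y))
    return points

H = [
    [0, 0, 0, 0, 0],
    [0, 5, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 0, 0, 0],
]
print(harris_local_maxima(H, threshold=2))  # [(1, 1), (3, 3)]
```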

Any feature point extraction method capable of extracting local feature points may be used; the method is not limited to the Harris operator described above.

Next, in step S604, for each local feature point extracted in step S603, a feature amount (local feature amount) defined to be invariant to image rotation is calculated. In this embodiment, the local feature amount is calculated using the Local Jet and combinations of its derivatives.

Specifically, the local feature amount V is calculated by the following equation (1).

The symbols on the right-hand side of equation (1) are defined by equations (2) to (7) below. In equation (2), G(x, y) is a Gaussian function, I(x, y) is the pixel value at image coordinates (x, y), and "*" denotes convolution. Equation (3) is the partial derivative with respect to x of the variable L defined in equation (2), and equation (4) is the partial derivative of L with respect to y. Equation (5) is the partial derivative with respect to y of the variable Lx defined in equation (3), equation (6) is the partial derivative of Lx with respect to x, and equation (7) is the partial derivative with respect to y of the variable Ly defined in equation (4).

$$L = G(x, y) * I(x, y) \qquad (2)$$
$$L_x = \frac{\partial L}{\partial x} \qquad (3)$$
$$L_y = \frac{\partial L}{\partial y} \qquad (4)$$
$$L_{xy} = \frac{\partial L_x}{\partial y} \qquad (5)$$
$$L_{xx} = \frac{\partial L_x}{\partial x} \qquad (6)$$
$$L_{yy} = \frac{\partial L_y}{\partial y} \qquad (7)$$
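Equation (1) itself is not reproduced in this text. The sketch below assumes the commonly used Local Jet rotation invariants (as popularized by Schmid and Mohr), built from L, Lx, Ly, Lxx, Lxy and Lyy of equations (2) to (7); it is an illustration of why such combinations are rotation invariant, not a confirmed copy of equation (1). Rotating the frame maps the gradient g to Rg and the Hessian H to RHR^T, and each combination below is unchanged by that map.

```python
import math

def local_jet_invariants(L, Lx, Ly, Lxx, Lxy, Lyy):
    """Assumed rotation-invariant combinations of Gaussian derivatives."""
    return (
        L,                                                   # smoothed intensity
        Lx * Lx + Ly * Ly,                                   # |g|^2
        Lxx * Lx * Lx + 2 * Lxy * Lx * Ly + Lyy * Ly * Ly,   # g^T H g
        Lxx + Lyy,                                           # trace(H), the Laplacian
        Lxx * Lxx + 2 * Lxy * Lxy + Lyy * Lyy,               # trace(H H)
    )

def rotate_jet(Lx, Ly, Lxx, Lxy, Lyy, theta):
    """Derivatives in a frame rotated by theta: g' = R g, H' = R H R^T.
    L at the feature point itself is unchanged (the Gaussian is isotropic)."""
    c, s = math.cos(theta), math.sin(theta)
    gx, gy = c * Lx - s * Ly, s * Lx + c * Ly
    hxx = c * c * Lxx - 2 * c * s * Lxy + s * s * Lyy
    hxy = c * s * (Lxx - Lyy) + (c * c - s * s) * Lxy
    hyy = s * s * Lxx + 2 * c * s * Lxy + c * c * Lyy
    return gx, gy, hxx, hxy, hyy

jet = (0.3, -0.2, 0.5, 0.1, -0.4)  # hypothetical Lx, Ly, Lxx, Lxy, Lyy at one point
v0 = local_jet_invariants(1.0, *jet)
v1 = local_jet_invariants(1.0, *rotate_jet(*jet, theta=0.7))
print(all(abs(a - b) < 1e-9 for a, b in zip(v0, v1)))  # True
```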

Any method capable of calculating local feature amounts may be used; the calculation is not limited to the method described above.

Local feature amounts can thus be extracted from the target image.

The image processing apparatus of this embodiment is configured as an image collation apparatus that obtains the similarity between two images and outputs the result of determining whether they are similar. With this configuration, clusters of feature amounts corresponding to the imaging targets can be generated based on the depth values of the imaging targets in the comparison destination image, so feature amounts belonging to different distance ranges can be distinguished and the collation accuracy improved. Here, collation accuracy means the accuracy of the determination of whether the comparison source image and the comparison destination image are similar.

[Second Embodiment]
An example in which the image processing apparatus of this embodiment is used as an image search apparatus will be described. The image processing apparatus registers images having depth information as registered images (search destination images) and searches the registered images for images similar to a query image (search source image). Whereas the image collation apparatus of the first embodiment has a single comparison destination image, the image search apparatus of this embodiment has a plurality of registered images. Description of the parts common to the image collation apparatus of the first embodiment is omitted; the differences are described below.

Specifically, when a registered image is registered, its feature amounts are clustered using depth information (depth values) and stored per cluster. At search time, the number of registered-image feature amounts that form pairs with query-image feature amounts is counted for each cluster. Next, clusters having a predetermined number of pairs are identified, and their pair counts are used to generate the similarity between the images. Outputting the list of registered images in descending order of similarity realizes an image search for the query image.

In this embodiment, registered images must have depth information, but the query image need not.

[Configuration of image search device]
The configuration of the image search apparatus of this embodiment is described below with reference to FIG. 8.

The image input unit 801, feature amount extraction unit 802, and data holding unit 803 of the image search apparatus in FIG. 8 operate in the same way as the image input unit 301, feature amount extraction unit 302, and data holding unit 303 of the image collation apparatus in FIG. 3. Likewise, the feature amount specifying unit 805, feature amount comparison unit 806, similarity determination unit 807, and processing result output unit 808 in FIG. 8 operate in the same way as the feature amount specifying unit 305, feature amount comparison unit 306, similarity determination unit 307, and processing result output unit 308 in FIG. 3. Their description is omitted here; the image index unit 804 is described below.

The image index unit 804 manages the feature amounts of registered images in association with their image IDs and cluster IDs.

FIG. 9(a) shows an example of the data structure managed by the image index unit 804 of this embodiment. Image IDs and cluster IDs are managed in association with quantized feature amounts, which are generated by the "local feature quantization processing" described later. When two quantized values are equal, the corresponding local feature amounts can be judged to have at least a predetermined degree of similarity. In the following description, each row in FIG. 9(a) is called a record.

In this data structure, multiple records with the same quantized value are created. To avoid this, a list of (image ID, cluster ID) pairs may be associated with each quantized value. Furthermore, when several feature amounts with the same quantized value lie in the same cluster of the same image, the same pair appears several times; to avoid this, a list of (image ID, cluster ID, count) triples may be associated with each quantized value. The data structure managed by the image index unit 804 is not limited to these examples. For example, the cluster IDs of the quantized feature amounts can further be managed in association with each of the distance ranges to which the depth values belong.
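The "(image ID, cluster ID, count)" variant can be sketched as an inverted index in which each quantized value maps to a counter keyed by (image ID, cluster ID). The class `ImageIndex` and its methods are hypothetical illustrations, not the apparatus's actual interface; the per-image cluster count is kept as well because a later passage notes that the index may manage it.

```python
from collections import Counter, defaultdict

class ImageIndex:
    """Hypothetical in-memory image index:
    quantized value -> Counter{(image_id, cluster_id): number of features}."""

    def __init__(self):
        self._table = defaultdict(Counter)
        self.cluster_count = {}  # image_id -> number of clusters

    def register(self, image_id, features):
        """features: iterable of (quantized_value, cluster_id) pairs."""
        clusters = set()
        for q, cluster_id in features:
            self._table[q][(image_id, cluster_id)] += 1  # merges duplicate pairs
            clusters.add(cluster_id)
        self.cluster_count[image_id] = len(clusters)

    def lookup(self, q):
        """All (image_id, cluster_id) -> count entries sharing value q."""
        return dict(self._table.get(q, {}))

idx = ImageIndex()
idx.register("001", [(17, 1), (17, 1), (42, 3)])
idx.register("002", [(17, 2)])
print(idx.lookup(17))  # {('001', 1): 2, ('002', 2): 1}
```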

Because the data managed by the image index unit 804 is held in the data holding unit 803, the image index unit 804 can be configured as part of the data holding unit 803.

The feature amount comparison unit 806 finds pairs consisting of a query-image feature amount and a similar registered-image feature amount. Specifically, it quantizes each query feature amount and identifies the records in the image index unit 804 that have the same quantized value. This identifies the registered-image clusters that contain feature amounts similar to those of the query image.

The similarity determination unit 807 counts, for each combination of image ID and cluster ID, the number of pairs obtained by the feature amount comparison unit 806. Specifically, the feature amount comparison unit 806 is used to find, in the image index unit 804, feature amounts similar to the local feature amounts extracted from the query image, and one vote is cast for the (image, cluster) combination corresponding to each feature amount found. In this way, the number of pairs is counted for each combination of image ID and cluster ID.

Conceptually, as shown in FIG. 9(b), a vote value is generated for each combination of image ID and cluster ID. Pair count 901 is the vote value of cluster ID 002 of image ID 001, and pair count 902 is the vote value of cluster ID 005 of image ID 001. In this figure, a smaller cluster ID denotes a cluster with smaller depth values (imaging targets closer to the camera). The example in FIG. 9 therefore shows that the pair count 901 of a foreground cluster and the pair count 902 of a background cluster are obtained separately. In this way, pairs are counted for each predetermined distance range.

The similarity determination unit 807 identifies, for each image ID, the clusters that have a predetermined number of pairs, and uses the pair counts of those clusters as the similarity between the query image and the registered image. Specifically, the cluster with the largest pair count is identified, and its pair count is used as the similarity. For example, when the pair counts per cluster are obtained as shown in FIG. 9(b), the maximum pair count for each image ID is generated as the similarity, as shown in FIG. 9(c).

Alternatively, clusters whose pair count is at least 80% of the maximum pair count may be identified, and the sum of their pair counts used as the similarity. As another alternative, clusters whose pair count is at least a predetermined threshold may be identified, and the sum of their pair counts used as the similarity.
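The three ways of turning per-cluster pair counts into one similarity per image (maximum, sum over clusters within 80% of the maximum, sum over clusters above a fixed threshold) can be sketched in one hypothetical helper. The vote values in the example are illustrative only and are not taken from FIG. 9.

```python
def image_similarity(cluster_votes, mode="max", ratio=0.8, threshold=None):
    """cluster_votes: {cluster_id: pair count} for one registered image."""
    if not cluster_votes:
        return 0
    votes = list(cluster_votes.values())
    if mode == "max":          # largest single cluster
        return max(votes)
    if mode == "ratio":        # clusters with at least ratio * maximum
        cutoff = ratio * max(votes)
        return sum(v for v in votes if v >= cutoff)
    if mode == "threshold":    # clusters with at least a fixed count
        return sum(v for v in votes if v >= threshold)
    raise ValueError(f"unknown mode: {mode}")

votes = {1: 100, 2: 90, 3: 10}  # illustrative values
print(image_similarity(votes), image_similarity(votes, "ratio"))  # 100 190
```

The summing variants let an image whose matching object spans, or is duplicated across, several distance ranges still score highly.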

When the similarity determination unit 807 has applied the above processing to every image ID, the processing result output unit 808 can output the similarity obtained for each image ID. Alternatively, the processing result output unit 808 may output the images or image IDs in descending order of similarity.

[Image registration process]
Next, the image registration process is described with reference to the flowchart of FIG. 10. This process is given a registered image and its image ID. The local feature amounts extracted from the registered image are divided into clusters and stored in the image index unit 804. The specific steps are as follows.

In step S1001, local feature amounts are extracted from the registered image: characteristic points (local feature points) are extracted from the image, and the feature amount (local feature amount) of each local feature point is calculated from that point and its surrounding image information. This local feature extraction processing is the same as in the first embodiment.

In step S1002, the local feature amounts are quantized so that similar feature amounts receive the same quantized value. The details are described later, with reference to FIG. 12, as the local feature quantization processing.

In step S1003, the feature amounts are clustered. Specifically, the distance information acquisition unit 702 obtains the depth value at the position of each feature point, and the feature amount specifying unit 805 uses these depth values to divide the feature amounts into clusters.

In step S1004, the feature amounts are registered in the image index unit 804. Specifically, an ID is assigned to each cluster obtained in step S1003; for example, consecutive integers starting from 1 are assigned as cluster IDs, beginning with the cluster of feature amounts in the distance range with the smallest depth values (imaging targets closest to the camera). The quantized value of each feature amount is then stored in the image index unit 804 together with its image ID and cluster ID.
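Steps S1003 and S1004 can be sketched as follows, under the hypothetical assumption that clusters are fixed depth bins separated by given boundary values; the text does not state how the clustering itself is done, and the function name and boundaries are illustrative. Occupied bins are numbered from 1 starting with the nearest, matching the ID assignment described above.

```python
import bisect

def assign_cluster_ids(depths, boundaries):
    """depths: depth value at each feature point (same order as the features).
    boundaries: sorted depth values separating the distance ranges (hypothetical).
    Returns one cluster ID per feature; IDs are consecutive integers from 1,
    with 1 given to the nearest occupied distance range."""
    bins = [bisect.bisect_right(boundaries, d) for d in depths]
    occupied = sorted(set(bins))
    bin_to_id = {b: i + 1 for i, b in enumerate(occupied)}
    return [bin_to_id[b] for b in bins]

print(assign_cluster_ids([0.8, 5.2, 1.1, 12.0], boundaries=[2.0, 4.0, 8.0]))  # [1, 2, 1, 3]
```

Each feature's quantized value, the image ID, and the returned cluster ID would then form one index record.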

[Image search processing]
Next, the image search process is described with reference to the flowchart of FIG. 11. This process is given a query image. Local feature amounts are extracted from the query image, and the number of registered-image feature amounts that pair with query-image feature amounts is counted for each cluster. Next, clusters having a predetermined number of pairs are identified, and their pair counts are used to generate the similarity between images. A list of (similarity, image ID) combinations for the images is generated as the search result. The specific steps are as follows.

In step S1101, local feature amounts are extracted from the query image. This local feature extraction processing is the same as in the first embodiment.

In step S1102, the local feature amounts are quantized. The details are described later, with reference to FIG. 12, as the local feature quantization processing.

In steps S1103 to S1106, feature amounts similar to the local feature amounts extracted from the query image are found in the image index unit 804, and one vote is cast for the (image, cluster) combination corresponding to each feature amount found. Conceptually, as shown in FIG. 9(b), a vote value is generated for each combination of image ID and cluster ID. Each step is described below.

Step S1103 begins a loop that processes, in order, the quantized local feature amounts of the query obtained in step S1102; the feature amounts are numbered from 1 and referenced by the variable i, which is first initialized to 1. If i is less than or equal to the number of local feature amounts, the process proceeds to step S1104; otherwise, it exits the loop and proceeds to step S1107.

In step S1104, the feature amount comparison unit 806 finds registered-image feature amounts similar to feature amount i of the query image. Specifically, it scans the quantized feature amounts in the image index unit 804 shown in FIG. 9(a) and identifies the records that match the quantized feature amount i of the query. If several records have the same quantized value, several records are obtained.

In step S1105, the similarity determination unit 807 adds to the pair count of the corresponding (image, cluster) combination. Specifically, to hold information such as that shown in FIG. 9(b), a list that stores image IDs, cluster IDs, and vote values is prepared in advance. Then, 1 is added to the vote value of the image ID and cluster ID combination of each record obtained in step S1104. This is done for all records found in step S1104.

Step S1106 is the end of the loop: 1 is added to i, and the flow returns to step S1103.

Steps S1107 to S1109 are performed by the similarity determination unit 807 and generate the similarity to the query image. Conceptually, as shown in FIG. 9(c), clusters having a predetermined vote value are identified, and their vote values are used to generate the similarity. Each step is described below.

Step S1107 begins a loop that processes, in order, the image IDs obtained in steps S1103 to S1106, which are numbered from 1 and referenced by the variable i, first initialized to 1. If i is less than or equal to the number of image IDs obtained in steps S1103 to S1106, the process proceeds to step S1108; otherwise, it exits the loop and proceeds to step S1110.

In step S1108, the similarity determination unit 807 generates the similarity for the i-th image ID. Specifically, the cluster with the largest vote value among the clusters of the i-th image ID is identified, and its vote value is used as the similarity, which is stored in association with the image ID.

Step S1109 is the end of the loop: 1 is added to i, and the flow returns to step S1107.

In step S1110, the image IDs stored in step S1108 are sorted in descending order of similarity, and the processing result output unit 808 outputs a list of the sorted images, or of information identifying them (image IDs or file names), to the monitor 210 or a similar device as the processing result. The image IDs of registered images similar to the query image are thus obtained in order of similarity. Instead of all the images identified by the image IDs stored in step S1108, the processing result output unit 808 may output only the registered images whose similarity is greater than or equal to a predetermined value.
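Steps S1103 to S1110 can be sketched end to end, assuming the index is a plain dictionary from quantized value to a list of (image ID, cluster ID) records, one per registered feature as in FIG. 9(a). The function `search` and the index contents are hypothetical; the per-image score is the maximum cluster vote, as in step S1108.

```python
from collections import Counter, defaultdict

def search(query_values, index, min_similarity=0):
    """query_values: quantized local features of the query image.
    index: {quantized_value: [(image_id, cluster_id), ...]}, one record per feature.
    Returns [(image_id, similarity), ...] sorted by similarity, descending."""
    votes = Counter()
    for q in query_values:                          # S1103-S1106
        for image_id, cluster_id in index.get(q, []):
            votes[(image_id, cluster_id)] += 1      # one vote per matching record
    per_image = defaultdict(int)
    for (image_id, _), v in votes.items():          # S1107-S1109: max over clusters
        per_image[image_id] = max(per_image[image_id], v)
    ranked = sorted(per_image.items(), key=lambda kv: kv[1], reverse=True)  # S1110
    return [(i, s) for i, s in ranked if s >= min_similarity]

index = {10: [("001", 1), ("002", 1)], 11: [("001", 1), ("001", 2)], 12: [("002", 2)]}
print(search([10, 11, 12, 10], index))  # [('001', 3), ('002', 2)]
```

The optional `min_similarity` corresponds to outputting only registered images whose similarity reaches a predetermined value.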

In step S1108, the cluster with the largest vote value was identified and its vote value used as the similarity. However, the objects shown in the query may appear in the registered image in distance ranges with different depth values (for example, the foreground and the background). Taking only the maximum vote value would then base the similarity on a match with only one of the foreground or background objects. Instead, clusters whose vote value is at least a predetermined value may be identified, and the sum of their vote values used as the similarity. For example, clusters whose vote value is at least 80% of the maximum can be identified, and the sum of their vote values used as the similarity.

This yields an appropriate similarity even when searching for images in which several similar objects appear in the foreground and in the background.

FIG. 9(b) also shows combinations of image ID and cluster ID whose vote value is 0. However, the list of image IDs, cluster IDs, and vote values used in step S1105 need not store entries whose vote value is 0, which reduces memory usage. Conversely, an associative array that allows the vote value of every combination of image ID and cluster ID to be referenced may be built in advance; vote values can then be located quickly, which is expected to speed up the voting process. Because this requires knowing the number of clusters of each image, the image index unit 804 may also manage the cluster counts, or they may be managed by other means.
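The dense alternative can be sketched as one flat array with a per-image offset computed from the cluster counts, so each vote is a single indexed increment. The class `DenseVoteTable` is hypothetical and assumes cluster IDs are consecutive integers starting at 1, as in the ID assignment of step S1004.

```python
class DenseVoteTable:
    """Flat array holding one vote counter per (image_id, cluster_id).
    Assumes cluster IDs are consecutive integers starting at 1."""

    def __init__(self, cluster_count):
        # cluster_count: {image_id: number of clusters}, e.g. kept by the image index
        self.offset = {}
        total = 0
        for image_id, n in cluster_count.items():
            self.offset[image_id] = total
            total += n
        self.votes = [0] * total  # zero entries are stored too (the memory trade-off)

    def vote(self, image_id, cluster_id):
        self.votes[self.offset[image_id] + cluster_id - 1] += 1

    def get(self, image_id, cluster_id):
        return self.votes[self.offset[image_id] + cluster_id - 1]

t = DenseVoteTable({"001": 3, "002": 2})
t.vote("002", 2); t.vote("002", 2); t.vote("001", 1)
print(t.votes)  # [1, 0, 0, 0, 2]
```

A sparse `dict` or `Counter` keyed by (image ID, cluster ID) is the memory-saving counterpart that never stores zero votes.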

An example of the "local feature quantization processing" used in the present embodiment is described below.

[Local feature quantization]
To make matching between local feature amounts easier, the local feature amounts described above are quantized.

For example, let the local feature amount be an N-dimensional vector V whose n-th dimension is denoted V_n. The n-th dimension can then be quantized to K_n levels by equation (8), where N and K_n are predetermined values:

$$Q_n = \frac{V_n \cdot K_n}{V_{n,\max} - V_{n,\min} + 1} \qquad (8)$$

Here, Q_n is the quantized value of the n-th feature dimension V_n, and V_{n,max} and V_{n,min} are the maximum and minimum values that the n-th dimension can take.
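Equation (8) can be sketched per dimension as below. Integer (floor) division and integer-valued features are assumptions of this sketch; with V_{n,min} = 0 and 0 ≤ V_n ≤ V_{n,max}, every Q_n lands in 0 to K_n − 1.

```python
def quantize(V, K, v_max, v_min):
    """Per-dimension quantization following equation (8):
    Q_n = floor(V_n * K_n / (Vmax_n - Vmin_n + 1)), integer inputs assumed."""
    return [
        (vn * kn) // (vmax - vmin + 1)
        for vn, kn, vmax, vmin in zip(V, K, v_max, v_min)
    ]

print(quantize([0, 127, 255], K=[4, 4, 4], v_max=[255] * 3, v_min=[0] * 3))  # [0, 1, 3]
```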

In the quantization above, the number of quantization levels is set per dimension, but a number of levels common to all dimensions may be used instead. As shown in FIG. 12, this quantization divides the feature space into a grid: the grid cells 1201 are quantization regions in the feature space, and the points 1202 represent individual features. FIG. 12 shows a two-dimensional feature space; in practice, the division is extended to as many dimensions as the local feature amount has.

Any method that can divide the feature space may be used; quantization is not limited to the rule-based method described above. For example, clustering rules may be created by machine learning on a set of images, and the feature space may be divided and quantized according to those rules.

After each dimension has been quantized, the group of quantized values can be labeled by equation (9) so that it can be treated essentially as a one-dimensional feature amount:

$$\mathrm{IDX} = Q_1 + Q_2 K_1 + Q_3 K_1 K_2 + \cdots + Q_N K_1 K_2 \cdots K_{N-1} \qquad (9)$$

When all dimensions share the same number of levels K, the group of quantized values can be labeled by equation (10):

$$\mathrm{IDX} = Q_1 + Q_2 K + Q_3 K^2 + \cdots + Q_N K^{N-1} \qquad (10)$$
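Equation (9) is a mixed-radix number in which dimension n has weight K_1 × ... × K_(n−1); equation (10) is the special case with every K_n = K. A minimal sketch, with the hypothetical helper name `label`:

```python
def label(Q, K):
    """Mixed-radix label of equation (9):
    IDX = Q_1 + Q_2*K_1 + Q_3*K_1*K_2 + ... + Q_N*K_1*...*K_(N-1)."""
    idx, weight = 0, 1
    for q, k in zip(Q, K):
        idx += q * weight
        weight *= k  # weight of the next dimension
    return idx

print(label([1, 2, 3], [4, 4, 4]))  # 57 = 1 + 2*4 + 3*16, i.e. equation (10) with K = 4
```

Because each Q_n lies in 0 to K_n − 1, distinct quantized vectors always receive distinct labels.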

Any labeling method may be used as long as it produces such a label; the calculation is not limited to the methods described above.

Building a database or similar structure that can look up image IDs and related data using the quantized value as a key allows local feature amounts to be matched quickly. This structure is called an image index.

With the image search apparatus of this embodiment, as shown in FIG. 9(c), the pair-count votes are obtained separately for each of the clusters corresponding to the distance ranges in the depth direction. Therefore, when the query image shows an object that appears in the foreground of a registered image, the number of matches (pairs) with foreground feature amounts and the number of chance matches with background feature amounts are obtained separately. Because chance matches are fewer than correct matches, identifying the cluster with the maximum value yields the vote value produced by the correct matches. This improves the accuracy of searching the registered images (search destination images) for images similar to the query image (search source image).

For example, in FIG. 9(c), the conventional method gives image ID 001 and image ID 002 the same similarity of 260. With the image search apparatus of this embodiment, however, the similarities of image ID 001 and image ID 002 differ, at 200 and 180 respectively, which shows that image ID 001 is the more similar of the two.

[Third Embodiment]
In the first or second embodiment, when depth values are quantized into gradations and the feature amounts are clustered, the feature amounts of a single object may not fit within one cluster. For example, when the query image shows an object with depth, such as a car, the car's feature amounts may be assigned to multiple clusters corresponding to two or more distance ranges. This embodiment therefore describes an image search apparatus with three steps. It adds the vote values of neighboring clusters, identifies the clusters whose summed vote values are equal to or greater than a predetermined value, and generates the similarity between the query image and a registered image from those clusters. As in the first and second embodiments, each of the clusters of feature amounts corresponds to a distance range.

The image search apparatus of this embodiment has the same configuration as that of FIG. 8 shown in the second embodiment, except for the operation of the similarity determination unit 807. The similarity determination unit 807 of this embodiment first adds the vote values of neighboring clusters for each image ID. It then identifies the clusters having a vote value equal to or greater than a predetermined value and generates the similarity between the query image and the registered image from those clusters.

For example, suppose the similarity determination unit 807 has obtained the vote values for the image and cluster combinations shown in the upper part of FIG. 14. It then adds the vote values of neighboring clusters as shown in the lower part of FIG. 14. In this example, the vote values of three clusters are summed: the cluster itself and its left and right neighbors. When no cluster exists on one side, as with cluster ID 001, which has no cluster to its left, that side is ignored and only the vote values of the existing clusters are summed. Then, as in the first embodiment, the clusters having a vote value equal to or greater than a predetermined value are identified, and the similarity between the query image and the registered image is generated from them. Specifically, as in the first embodiment, the maximum value among the identified clusters is used as the similarity. Alternatively, the clusters whose number of pairs is at least 80% of the maximum value may be identified, and the sum of their numbers of pairs used as the similarity. As a further alternative, the clusters whose number of pairs is equal to or greater than a predetermined threshold may be identified, and the sum of their numbers of pairs used as the similarity.
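The neighbor summation and the three similarity options can be sketched as follows for a single image. The sketch makes these assumptions:

- One image's vote values are given as a list ordered by cluster ID.
- `width` is the odd total number of clusters summed (3 in the FIG. 14 example, centered on the current cluster).
- Clusters past either end are skipped.
- `threshold` stands in for the predetermined value.
- The 80% and threshold rules are applied to clusters that already passed that value. The text leaves this ordering open.

The function names and the vote values in the usage example are hypothetical.

```python
from typing import List, Sequence


def neighbor_sums(votes: Sequence[int], width: int = 3) -> List[int]:
    """Add each cluster's vote value to those of its neighbors.

    `votes` is ordered by cluster ID. `width` counts the cluster itself plus
    its neighbors on both sides and is assumed to be odd. Clusters that do not
    exist past either end are ignored, as for cluster ID 001 in FIG. 14.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    n = len(votes)
    return [sum(votes[max(0, j - half):min(n, j + half + 1)]) for j in range(n)]


def similarity(summed: Sequence[int], threshold: int, mode: str = "max") -> int:
    """Similarity from the clusters whose summed vote value >= threshold."""
    selected = [v for v in summed if v >= threshold]
    if not selected:
        return 0  # no cluster reaches the predetermined value
    if mode == "max":        # maximum value of the identified clusters
        return max(selected)
    if mode == "ratio80":    # sum of clusters at or above 80% of the maximum
        peak = max(selected)
        return sum(v for v in selected if v >= 0.8 * peak)
    if mode == "threshold":  # sum of all clusters at or above the threshold
        return sum(selected)
    raise ValueError(f"unknown mode: {mode}")


summed = neighbor_sums([5, 10, 20, 0, 1])
print(summed)                               # [15, 35, 30, 21, 1]
print(similarity(summed, 10, "max"))        # 35
print(similarity(summed, 10, "ratio80"))    # 65
print(similarity(summed, 10, "threshold"))  # 101
```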

Next, the image search processing of this embodiment is described with reference to the flowchart of FIG. 13. The flowchart of FIG. 13 differs from the flowchart of FIG. 11 of the first embodiment in that processing is added between steps S1106 and S1107. Steps S1301 to S1306 are therefore the same as steps S1101 to S1106, and steps S1312 to S1315 are likewise the same as steps S1107 to S1110. The newly added steps S1307 to S1311 are described below, one step at a time.

Steps S1307 to S1311 add the vote values of neighboring clusters for each image ID. The number of neighbors is assumed to be given in advance. In this example, three clusters are treated as neighbors: the cluster itself and the clusters immediately before and after it. Each step is described below.

Step S1307 is a loop that processes, in order, the image IDs obtained in steps S1303 to S1306. These image IDs are assumed to be numbered sequentially from 1 and are referenced by the variable i, which is first initialized to 1. If i is less than or equal to the number of image IDs obtained in steps S1303 to S1306, the process proceeds to step S1308. Otherwise, it exits the loop and proceeds to step S1312.

Step S1308 is a loop that processes the clusters of image i in order. Cluster IDs are assumed to be numbered sequentially from 1 and are referenced by the variable j, which is first initialized to 1. If j is less than or equal to the number of clusters, the process proceeds to step S1309. Otherwise, it exits the loop and proceeds to step S1311.

In step S1309, the numbers of pairs of the clusters neighboring cluster j of image i are summed. For example, as in FIG. 14 described above, when cluster j is cluster ID 002, the three neighboring clusters are cluster j and the clusters before and after it. The numbers of pairs of cluster IDs 001 to 003 are therefore summed. The result is stored in a separately prepared list that holds the image ID, the cluster ID, and the number of pairs. This list is used in step S1312 and the subsequent steps.

Step S1310 is the end of the cluster loop: 1 is added to j, and the process returns to step S1308.

Step S1311 is the end of the image loop: 1 is added to i, and the process returns to step S1307.
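Steps S1307 to S1311 can be written as two nested loops that fill the list of (image ID, cluster ID, number of pairs) entries used from step S1312 onward. The sketch makes these assumptions:

- Vote values are held in a dictionary keyed by image number i.
- Each value is a list in which position j - 1 holds the number of pairs of cluster j.
- Image and cluster numbers start at 1, as in the text.
- `num_neighbors` is odd.

The function name `accumulate_neighbors` and the input values are hypothetical.

```python
from typing import Dict, List, Tuple


def accumulate_neighbors(
    votes: Dict[int, List[int]], num_neighbors: int = 3
) -> List[Tuple[int, int, int]]:
    """Build the (image ID, cluster ID, number of pairs) list of step S1309.

    `votes[i][j - 1]` is the number of pairs for cluster j of image i; image and
    cluster numbers start at 1, and `num_neighbors` is assumed to be odd.
    """
    half = num_neighbors // 2
    entries: List[Tuple[int, int, int]] = []
    for i in range(1, len(votes) + 1):          # S1307 / S1311: loop over images
        clusters = votes[i]
        for j in range(1, len(clusters) + 1):   # S1308 / S1310: loop over clusters
            # S1309: sum the pairs of the existing clusters j-half .. j+half.
            first = max(1, j - half)
            last = min(len(clusters), j + half)
            pairs = sum(clusters[c - 1] for c in range(first, last + 1))
            entries.append((i, j, pairs))
    return entries


print(accumulate_neighbors({1: [2, 4, 6], 2: [1, 0, 3]}))
# [(1, 1, 6), (1, 2, 12), (1, 3, 10), (2, 1, 1), (2, 2, 4), (2, 3, 3)]
```

The `for` loops replace the explicit initialize, compare, and increment steps of the flowchart. The step numbers in the comments show where each flowchart step sits.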

In this embodiment, the number of neighbors is assumed to be given in advance. However, it may instead be obtained from the query image. For example, when the query image also has depth values, those depth values may be clustered. The thickness of the object shown in the query image is then obtained from the distance range corresponding to the cluster of that object's feature amounts. Alternatively, the thickness of the cluster having the largest number of feature amounts may be used. Dividing the object thickness obtained in this way by the gradation width gives the number of neighbors. For example, suppose a car appears in the query image, the object thickness obtained by clustering is 3 m, and the gradation width is 1 m. The number of neighbors is then 3.
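The neighbor count calculation can be sketched as follows. The sketch makes these assumptions:

- The depth values, in metres, of the feature points in the query object's cluster are already available.
- The object thickness is the spread between the largest and smallest depth.
- The quotient is rounded up so that the window covers the whole thickness.

The text only says to divide the thickness by the gradation width, so the rounding rule and the function name `num_neighbors_from_depth` are hypothetical.

```python
import math
from typing import Sequence


def num_neighbors_from_depth(depths: Sequence[float], gradation_width: float) -> int:
    """Number of neighboring clusters = object thickness / gradation width."""
    if gradation_width <= 0:
        raise ValueError("gradation_width must be positive")
    thickness = max(depths) - min(depths)
    # Round up (assumption) and use at least one cluster.
    return max(1, math.ceil(thickness / gradation_width))


# Car example from the text: about 3 m thick, 1 m gradation width.
print(num_neighbors_from_depth([10.0, 11.2, 13.0], 1.0))  # 3
```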

Alternatively, the type of object in the query image may be obtained by a technique such as image recognition, and the number of neighbors may be derived from a thickness appropriate to that object type. For example, when human body detection or a similar technique detects a person in the query image, a typical human thickness prepared in advance could be used. Alternatively, the thickness may simply be entered by the user together with the query image. The method of determining the object thickness in this embodiment is not limited to these examples.

The image search apparatus of this embodiment handles the case where the gradation width used to quantize depth values is smaller than the thickness of the object. In that case, the feature amounts of the same object are split up and registered in different clusters. Summing the numbers of pairs of neighboring clusters yields an appropriate number of pairs from which the similarity can be generated. For example, suppose that the interval (gradation width) of the distance ranges corresponding to the clusters registered in the image index unit 804 is 1 m. An object such as a car is about 3 m wide in the depth direction, so the car's feature amounts are spread across several clusters. In the first or second embodiment, the similarity would be determined from only one of these split clusters. In this embodiment, however, the numbers of pairs of neighboring clusters are summed, so with three neighbors the number of pairs is counted over a depth range of roughly 3 m. As a result, the number of pairs that correctly match the feature amounts of the whole car is obtained.

[Other Embodiments]
The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium. One or more processors in a computer of that system or apparatus then read and execute the program. The invention can also be realized by a circuit, for example an ASIC, that implements one or more of those functions.

DESCRIPTION OF SYMBOLS
301 Image input unit
302 Feature amount extraction unit
303 Data holding unit
305 Feature amount specifying unit
306 Feature amount comparison unit
307 Similarity determination unit
308 Processing result output unit

Claims

An image processing apparatus comprising:
extraction means for extracting a first local feature amount for each of a plurality of feature points of a first image and a second local feature amount for each of a plurality of feature points of a second image obtained by imaging an imaging target;
holding means for holding depth information of the imaging target corresponding to each feature point of the second image;
specifying means for specifying at least one distance range based on the depth information; and
determination means for determining that the second image is similar to the first image if a similarity between the second local feature amount and the first local feature amount obtained for the distance range is equal to or greater than a predetermined value,
wherein, if two or more of the similarities between the second local feature amount and the first local feature amount obtained for each of a plurality of the distance ranges are equal to or greater than the predetermined value, the determination means determines whether the second image is similar to the first image based on a sum of the similarities that are equal to or greater than the predetermined value.

An image processing apparatus comprising:
extraction means for extracting a first local feature amount for each of a plurality of feature points of a first image and a second local feature amount for each of a plurality of feature points of a second image obtained by imaging an imaging target;
holding means for holding depth information of the imaging target corresponding to each feature point of the second image;
specifying means for specifying at least one distance range based on the depth information; and
determination means for determining that the second image is similar to the first image if a similarity between the second local feature amount and the first local feature amount obtained for the distance range is equal to or greater than a predetermined value,
wherein, if two or more of the similarities between the second local feature amount and the first local feature amount obtained for each of a plurality of the distance ranges are equal to or greater than the predetermined value, the determination means determines whether the second image is similar to the first image based on a sum of the similarities between the first local feature amount and the second local feature amount obtained for each of the two or more applicable distance ranges.

The image processing apparatus according to claim 1, wherein the one distance range specified by the specifying means corresponds to a foreground part or a background part of the second image, and the determination means determines that the second image is similar to the first image if the similarity between the foreground part or the background part and the first image is equal to or greater than the predetermined value.

The image processing apparatus according to claim 1, wherein the plurality of distance ranges specified by the specifying means correspond to the foreground part and the background part of the second image, and the determination means determines that the second image is similar to the first image if either the similarity between the foreground part and the first image or the similarity between the background part and the first image is equal to or greater than the predetermined value.

The image processing apparatus according to claim 1, wherein the plurality of distance ranges specified by the specifying means correspond to the foreground part and the background part of the second image, and the determination means determines that the second image is similar to the first image if both the similarity between the foreground part and the first image and the similarity between the background part and the first image are equal to or greater than the predetermined value.

The image processing apparatus according to claim 1, wherein, for each of the plurality of feature points of the second image extracted by the extraction means, the holding means holds the following three items in association with one another: the second local feature amount, information specifying the second image, and the depth information of the imaging target corresponding to that feature point.

The image processing apparatus according to claim 1, wherein the specifying means specifies the second local feature amount corresponding to the distance range, and the holding means holds the distance range and the second local feature amount corresponding to the distance range in association with each other.

The image processing apparatus according to any one of the above claims, wherein the similarity between the second local feature amount and the first local feature amount is the number of pairs, each consisting of a second local feature amount and a first local feature amount similar to it.

The image processing apparatus according to any one of the above claims, wherein the depth information is generated by a generation means outside the image processing apparatus and is a depth value indicating a distance from the generation means to the imaging target.

An image processing method comprising:
an extraction step of extracting a first local feature amount for each of a plurality of feature points of a first image and a second local feature amount for each of a plurality of feature points of a second image obtained by imaging an imaging target;
a holding step of holding, in a holding unit, depth information of the imaging target corresponding to each feature point of the second image;
a specifying step of specifying at least one distance range based on the depth information; and
a determination step of determining that the second image is similar to the first image if a similarity between the second local feature amount and the first local feature amount obtained for the distance range is equal to or greater than a predetermined value,
wherein, in the determination step, if two or more of the similarities between the second local feature amount and the first local feature amount obtained for each of a plurality of the distance ranges are equal to or greater than the predetermined value, whether the second image is similar to the first image is determined based on a sum of the similarities that are equal to or greater than the predetermined value.

An image processing method comprising:
an extraction step of extracting a first local feature amount for each of a plurality of feature points of a first image and a second local feature amount for each of a plurality of feature points of a second image obtained by imaging an imaging target;
a holding step of holding, in a holding unit, depth information of the imaging target corresponding to each feature point of the second image;
a specifying step of specifying at least one distance range based on the depth information; and
a determination step of determining that the second image is similar to the first image if a similarity between the second local feature amount and the first local feature amount obtained for the distance range is equal to or greater than a predetermined value,
wherein, in the determination step, if two or more of the similarities between the second local feature amount and the first local feature amount obtained for each of a plurality of the distance ranges are equal to or greater than the predetermined value, whether the second image is similar to the first image is determined based on a sum of the similarities between the first local feature amount and the second local feature amount obtained for each of the two or more applicable distance ranges.

A program for causing a computer to execute:
an extraction step of extracting a first local feature amount for each of a plurality of feature points of a first image and a second local feature amount for each of a plurality of feature points of a second image obtained by imaging an imaging target;
a holding step of holding, in a holding unit, depth information of the imaging target corresponding to each feature point of the second image;
a specifying step of specifying at least one distance range based on the depth information; and
a determination step of determining that the second image is similar to the first image if a similarity between the second local feature amount and the first local feature amount obtained for the distance range is equal to or greater than a predetermined value,
wherein, in the determination step, if two or more of the similarities between the second local feature amount and the first local feature amount obtained for each of a plurality of the distance ranges are equal to or greater than the predetermined value, whether the second image is similar to the first image is determined based on a sum of the similarities that are equal to or greater than the predetermined value.

A program for causing a computer to execute:
an extraction step of extracting a first local feature amount for each of a plurality of feature points of a first image and a second local feature amount for each of a plurality of feature points of a second image obtained by imaging an imaging target;
a holding step of holding, in a holding unit, depth information of the imaging target corresponding to each feature point of the second image;
a specifying step of specifying at least one distance range based on the depth information; and
a determination step of determining that the second image is similar to the first image if a similarity between the second local feature amount and the first local feature amount obtained for the distance range is equal to or greater than a predetermined value,
wherein, in the determination step, if two or more of the similarities between the second local feature amount and the first local feature amount obtained for each of a plurality of the distance ranges are equal to or greater than the predetermined value, whether the second image is similar to the first image is determined based on a sum of the similarities between the first local feature amount and the second local feature amount obtained for each of the two or more applicable distance ranges.