JP6055435B2

JP6055435B2 - Subject recognition apparatus, subject recognition method, and subject recognition program

Info

Publication number: JP6055435B2
Application number: JP2014080928A
Authority: JP
Inventors: 島村 潤; 吉田 大我; 谷口 行信
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-04-10
Filing date: 2014-04-10
Publication date: 2016-12-27
Anticipated expiration: 2034-04-10
Also published as: JP2015201123A

Description

The present invention relates to a subject recognition apparatus, a subject recognition method, and a subject recognition program that learn in advance a group of training images in which a plurality of subjects are photographed, and estimate which subject of the training image group appears in a test image photographed at a different time and from a different position and angle.

A technique for estimating which image in a group of training images, captured in advance of known subjects, is similar to a test image input from a camera capturing a space enables subject recognition and detection. For example, in the method described in Non-Patent Document 1, a plurality of local feature amounts called SIFT are extracted in advance from each training image of a subject and stored in a database (DB) in association with that training image. Next, a plurality of SIFT features are similarly extracted from a test image whose subject is unknown.

Then, for each SIFT feature of the test image, the local feature amount with the smallest inter-vector distance is identified among the local feature amounts recorded in association with a given training image. This is carried out over all SIFT features of the test image. After the provisional corresponding points have been determined in this way, erroneous corresponding points are removed and the number of remaining corresponding points is recorded. This process is performed for all training images stored in the database, and the training image with the largest number of corresponding points is taken as the image corresponding to the test image. In this way, the method estimates which subject the test image shows.

FIGS. 5 to 7 are explanatory diagrams showing an example of erroneous corresponding point removal according to the prior art. For speed, the removal of erroneous corresponding points after the provisional corresponding points are determined is performed as follows. First, consider the attribute information of the SIFT features shown in FIG. 5: position information (x1, y1), scale (s1), and rotation amount (r1). The change amounts (Δx, Δy, Δs, Δr) of this attribute information between the two SIFT features that make up a provisional corresponding point are assumed to be identical for all correct corresponding points.

Then, every provisional corresponding point casts a vote in a four-dimensional space partitioned into cells along position, scale, and rotation amount of the local region (the four-dimensional space defined by the axes of the change amounts Δx, Δy, Δs, Δr; see FIG. 6). As a result of this voting, only the corresponding points that obtained at least a certain number of votes are adopted, which removes erroneous correspondences (see FIG. 7). Next, the dominant two-dimensional affine transformation between the test image and the training image is calculated by the least squares method from the provisional corresponding points belonging to the cells that obtained at least the certain number of votes. Provisional corresponding points whose attribute information does not agree with the x, y, scale, and rotation amount of the calculated affine transformation are then further removed as erroneous correspondences, which completes the erroneous corresponding point removal.

D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, Vol. 60 Issue 2, pp.91-110, November 2004.

In the conventional method, however, the erroneous correspondence removal process assumes that the attribute information held by each local feature amount, such as position information (x, y coordinates), scale, and rotation amount, is correct, and removes provisional corresponding points whose attribute information does not agree with the x, y, scale, and rotation amount of the calculated affine transformation.

However, among the attribute information of local features, the scale and the rotation amount in particular are subject to errors caused by image noise and by the image distortion that arises when the subject is a three-dimensional object. As a result, erroneous correspondences can no longer be removed correctly.

The present invention has been made in view of these circumstances. Its object is to provide a subject recognition apparatus, a subject recognition method, and a subject recognition program that can remove erroneous corresponding points quickly and accurately and estimate which subject a test image shows, even when attribute information other than the position information of local features has not been obtained correctly.

The present invention is characterized by comprising: an image information storage unit that stores, for each training image, local feature amounts extracted from a plurality of training images of subjects in association with attribute information indicating attributes of the local feature amounts; a local feature extraction unit that extracts local feature amounts and the attribute information from a test image of a subject to be recognized; a provisional corresponding point determination unit that determines provisional corresponding points by comparing the local feature amounts extracted from the test image with the local feature amounts stored in the image information storage unit; a weak geometric verification unit that constructs a voting space partitioned into cells of a predetermined size and having the same number of dimensions as the number of types of the attribute information, calculates the change amount between the attribute information for every provisional corresponding point, and votes for the corresponding cell; a strong geometric verification unit that, based on the result of the voting, fits a predetermined geometric model to the provisional corresponding points of every cell whose number of votes is equal to or greater than a predetermined number and estimates geometric transformation parameters, thereby obtaining inlier corresponding points that are not outliers; a reliability estimation unit that calculates the reliability of the inlier corresponding points and rejects any inlier point group whose reliability value is lower than a predetermined value; and a scoring unit that outputs the total number of inlier corresponding points that have not been rejected as the score of the training image.

The present invention is characterized by further comprising a correct-correspondence 2D geometric transformation matrix estimation unit that obtains a geometric transformation matrix of an affine transformation or a homography transformation from the inlier corresponding points that have not been rejected, and thereby outputs information indicating which point on the test image a point on the training image corresponds to.

The present invention is characterized by further comprising a relative posture estimation unit that calculates a fundamental matrix from the inlier corresponding points that have not been rejected, estimates the focal lengths of the cameras that captured the test image and the training image, and calculates and outputs the relative translation and rotation of the cameras from the fundamental matrix and those focal lengths.

The present invention is a subject recognition method performed by a subject recognition apparatus comprising an image information storage unit that stores, for each training image, local feature amounts extracted from a plurality of training images of subjects in association with attribute information indicating attributes of the local feature amounts, the method being characterized by comprising: a local feature extraction step of extracting local feature amounts and the attribute information from a test image of a subject to be recognized; a provisional corresponding point determination step of determining provisional corresponding points by comparing the local feature amounts extracted from the test image with the local feature amounts stored in the image information storage unit; a weak geometric verification step of constructing a voting space partitioned into cells of a predetermined size and having the same number of dimensions as the number of types of the attribute information, calculating the change amount between the attribute information for every provisional corresponding point, and voting for the corresponding cell; a strong geometric verification step of, based on the result of the voting, fitting a predetermined geometric model to the provisional corresponding points of every cell whose number of votes is equal to or greater than a predetermined number and estimating geometric transformation parameters, thereby obtaining inlier corresponding points that are not outliers; a reliability estimation step of calculating the reliability of the inlier corresponding points and rejecting any inlier point group whose reliability value is lower than a predetermined value; and a scoring step of outputting the total number of inlier corresponding points that have not been rejected as the score of the training image.

The present invention is characterized by further comprising a correct-correspondence 2D geometric transformation matrix estimation step of obtaining a geometric transformation matrix of an affine transformation or a homography transformation from the inlier corresponding points that have not been rejected, and thereby outputting information indicating which point on the test image a point on the training image corresponds to.

The present invention is characterized by further comprising a relative posture estimation step of calculating a fundamental matrix from the inlier corresponding points that have not been rejected, estimating the focal lengths of the cameras that captured the test image and the training image, and calculating and outputting the relative translation and rotation of the cameras from the fundamental matrix and those focal lengths.

The present invention is a subject recognition program for causing a computer to function as the subject recognition apparatus.

According to the present invention, even when attribute information other than the position information of local features has not been obtained correctly, erroneous corresponding points can be removed quickly and accurately, and which subject the test image shows can be estimated with high accuracy.

FIG. 1 is a block diagram showing the configuration of the subject recognition apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the subject recognition apparatus shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of the subject recognition apparatus according to the second embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the subject recognition apparatus according to the third embodiment of the present invention.
FIG. 5 is an explanatory diagram showing an example of erroneous corresponding point removal according to the prior art.
FIG. 6 is an explanatory diagram showing an example of erroneous corresponding point removal according to the prior art.
FIG. 7 is an explanatory diagram showing an example of erroneous corresponding point removal according to the prior art.

<First Embodiment>
A subject recognition apparatus according to a first embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. As shown in FIG. 1, the subject recognition apparatus 1 includes a database (DB) 20, a training unit 30, and a recognition unit 40. The training unit 30 consists of a preprocessing unit 2, a local feature extraction unit 3, and a local feature amount / attribute information storage unit 4. The recognition unit 40 consists of the preprocessing unit 2, the local feature extraction unit 3, a provisional corresponding point determination unit 5, a weak geometric verification unit 6, a strong geometric verification unit 7, a reliability estimation unit 8, and a scoring unit 9. The preprocessing unit 2 and the local feature extraction unit 3 are shared by the training unit 30 and the recognition unit 40.

Next, the operation of the training unit 30 shown in FIG. 1 will be described with reference to FIG. 1. The training unit 30 receives as input a group of training images, each showing a different subject. When the training image group is input to the training unit 30, the following processing is performed on the images one at a time. First, the preprocessing unit 2 scales the input image up or down to a predetermined size. During this processing, smoothing, for example, may be applied to reduce image noise.

Next, the local feature extraction unit 3 extracts local feature amounts and attribute information from the scaled image. The local feature may be SIFT as in Non-Patent Document 1, or another local feature such as SURF, BRIEF, BRISK, ORB, or FREAK. The attribute information may be position information (x, y), scale, and rotation amount as in Non-Patent Document 1. Alternatively, the rotation direction may be acquired with three degrees of freedom by the method of Reference 1, giving position information (x, y), scale, and rotation (roll, pitch, yaw).
Reference 1: J. Morel, G. Yu, "ASIFT: A New Framework for Fully Affine Invariant Image Comparison", SIAM Journal on Imaging Sciences, Volume 2, Issue 2, pp. 438-469, April 2009

Next, the local feature amount / attribute information storage unit 4 records the set of extracted attribute information and local feature amounts in the database 20 in association with the training image ID uniquely assigned to each training image.

Next, the processing operation of the recognition unit 40 shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the processing operation of the recognition unit 40 shown in FIG. 1. The recognition unit 40 receives as input an image of an unknown subject. When the image is input, the preprocessing unit 2 reduces the image to a specified size (step S1). Subsequently, the local feature extraction unit 3 extracts local feature amounts and attribute information (step S2). The processing of the preprocessing unit 2 and the local feature extraction unit 3 is the same as in the training unit 30, so a detailed description is omitted.

Next, the provisional corresponding point determination unit 5 reads the groups of local feature amounts associated with the training image IDs from the database 20 (step S3). The following processing (steps S4 to S12) is then performed for each training image ID. First, for a given local feature Fi extracted from the test image, the provisional corresponding point determination unit 5 compares its local feature amount with all local feature amounts read for the target training image ID, determines the local feature Fj of the target training image ID for which the distance between the feature amounts is smallest, and sets the pair of Fi and Fj as a provisional corresponding point Cp{Fi, Fj} (step S5). This processing is performed for all local features extracted from the test image. When the test image has N local features, the number of provisional corresponding points is also N.
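Step S5 can be sketched as a brute-force nearest-neighbor search over descriptor vectors. The following is a minimal illustration, assuming the local feature amounts are real-valued vectors compared by Euclidean distance; the function name `provisional_correspondences` and the array layout are hypothetical and do not appear in the source.

```python
import numpy as np

def provisional_correspondences(test_desc, train_desc):
    """Pair every test feature Fi with its nearest training feature Fj.

    test_desc:  (N, D) array of test-image descriptors (assumed layout).
    train_desc: (M, D) array of descriptors of one training image.
    Returns a list of N index pairs (i, j).
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = test_desc[:, None, :] - train_desc[None, :, :]
    dist2 = np.sum(diff * diff, axis=2)
    # For each test feature, the training feature with the smallest distance.
    nearest = np.argmin(dist2, axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest)]
```

Because every test feature receives exactly one partner, the list always has N entries, which matches the statement that N local features yield N provisional corresponding points.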

Next, the weak geometric verification unit 6 constructs a voting space that has the same number of dimensions as the number of attribute information items of the local features and is partitioned into cells of a predetermined size. For example, if the attribute information of the local features is position information (x, y), scale, and rotation amount, a four-dimensional voting space is constructed. If it is position information (x, y), scale, and rotation (roll, pitch, yaw), a six-dimensional voting space is constructed. Alternatively, instead of using all six dimensions, a four-dimensional space of scale and rotation (roll, pitch, yaw) may be used. In the following, a four-dimensional voting space of position information (x, y), scale, and rotation amount is assumed.

Next, the weak geometric verification unit 6 calculates the change amount between the attribute information for every provisional corresponding point and votes for the corresponding cell (step S6). For example, for a provisional corresponding point Cp{Fi, Fj}, if the attribute information of Fi is {10, 0, 1, 90} in the order position information (x, y), scale, rotation amount, and the attribute information of Fj is likewise {30, 30, 3, 200}, the change amount of Fj with respect to Fi is {20, 30, 3, 110}. Here the position and rotation components are calculated as differences and the scale component as a ratio. From this change amount {20, 30, 3, 110}, the cell of the prepared voting space whose value is closest is identified, and the provisional corresponding point Cp{Fi, Fj} is stored in that cell. This processing is performed over all provisional corresponding points.
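The voting of step S6 can be illustrated with a small sketch that reproduces the numerical example above. The cell sizes, the function names, and the use of a dictionary keyed by cell index are assumptions made for illustration; the source only specifies that cells have a predetermined size. Wrap-around of the rotation angle is not handled here.

```python
import math
from collections import defaultdict

def attribute_change(fi, fj):
    """Change of Fj relative to Fi for attributes (x, y, scale, rotation).

    Position and rotation use differences, scale uses a ratio.
    """
    return (fj[0] - fi[0], fj[1] - fi[1], fj[2] / fi[2], fj[3] - fi[3])

def cell_index(change, cell_size):
    """Index of the cell whose center (a multiple of cell_size) is nearest."""
    return tuple(math.floor(c / s + 0.5) for c, s in zip(change, cell_size))

def vote(correspondences, cell_size=(10.0, 10.0, 0.5, 30.0)):
    """Store each provisional corresponding point in its voting cell.

    cell_size is a hypothetical choice, not a value from the source.
    """
    space = defaultdict(list)
    for fi, fj in correspondences:
        space[cell_index(attribute_change(fi, fj), cell_size)].append((fi, fj))
    return space
```

With Fi = (10, 0, 1, 90) and Fj = (30, 30, 3, 200), `attribute_change` gives (20, 30, 3.0, 110), in agreement with the example in the text.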

Next, the subsequent processing (steps S7 to S10) is performed for every cell whose number of votes is equal to or greater than a predetermined number. Here, the predetermined number is the minimum number of corresponding points required for the geometric model fitting described later. For example, when the fundamental matrix F, which represents the positional relationship between the camera that captured the training image and the camera that captured the test image, is used as the geometric model, the number is 7 or 8. When an affine transformation is used it is 3, and when a projective transformation (homography) is used it is 4.
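The cell selection can be written down directly from the minimum sample sizes listed above. This is a small sketch; the dictionary keys and the function name `cells_to_verify` are hypothetical labels chosen for illustration.

```python
# Minimum number of correspondences per geometric model, as stated in the text.
MIN_POINTS = {
    "fundamental_7pt": 7,
    "fundamental_8pt": 8,
    "affine": 3,
    "homography": 4,
}

def cells_to_verify(vote_space, model):
    """Keep only the cells that hold enough votes to fit the chosen model."""
    k = MIN_POINTS[model]
    return {cell: pts for cell, pts in vote_space.items() if len(pts) >= k}
```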

First, the strong geometric verification unit 7 fits a predetermined geometric model to the group of provisional corresponding points belonging to the cell being processed, using robust estimation. Robust estimation here means a method that, based on random sampling of the data, estimates reasonable parameters even when the data contain outliers. In this case, the data are the provisional corresponding points, the outliers are the erroneous corresponding points, and the parameters are those of the geometric model. The robust estimation performed by the strong geometric verification unit 7 ultimately yields the estimated parameters of the geometric model and, among the input provisional corresponding points, the corresponding points judged to be inliers rather than outliers. The corresponding points judged to be inliers are output as the inlier corresponding point group (step S8).
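A random-sampling robust estimator of this kind can be sketched as follows. The sketch uses a RANSAC-style loop with an affine model as one concrete choice; the source does not name a specific robust estimator, and the threshold, iteration count, and function names are assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A with dst ~ A @ [x, y, 1]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # shape (3, 2)
    return sol.T

def ransac_affine(src, dst, threshold=2.0, iterations=200, seed=0):
    """Return (affine matrix, boolean inlier mask) for point pairs src -> dst."""
    rng = np.random.default_rng(seed)
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])
    best = np.zeros(n, dtype=bool)
    for _ in range(iterations):
        # An affine transformation needs 3 correspondences.
        sample = rng.choice(n, size=3, replace=False)
        A = fit_affine(src[sample], dst[sample])
        # Reprojection error of every correspondence under this hypothesis.
        err = np.linalg.norm(X @ A.T - dst, axis=1)
        inliers = err < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        return None, best
    # Refit on all inliers of the best hypothesis.
    return fit_affine(src[best], dst[best]), best
```

Only the positions of the local features enter this fit, so errors in scale and rotation attributes do not affect which correspondences are kept as inliers.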

Next, the reliability estimation unit 8 calculates the reliability of the inlier corresponding point group and, if the reliability value is lower than a predetermined value, rejects that inlier point group (step S9). Conversely, if it is greater than the predetermined value, the group is accepted, the number of inlier corresponding points is recorded in memory, and the processing ends. The reliability can be calculated, for example, by the method described in Non-Patent Document 1: the test image is mapped onto the training image using the estimated geometric transformation parameters, and the probability of erroneous correspondences is calculated according to the number of local feature points within the mapped region.

When processing has been completed for all cells whose number of votes is equal to or greater than the predetermined number, the scoring unit 9 calculates the sum of the inlier corresponding point counts recorded in memory and records it in memory as the score of the target training image ID (step S11).

When processing has been completed for all training image IDs, the training image IDs and their scores are finally output in association with each other, and the processing of the recognition unit 40 ends (step S13). The subject can be recognized by regarding the training image with the highest score as the image of the subject shown in the test image.
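Steps S9 to S13 can be condensed into a short sketch. It assumes that each verified cell yields a pair of inlier count and reliability value; the function names are hypothetical, and treating a reliability exactly equal to the threshold as accepted is an assumption, since the text only covers the lower and greater cases.

```python
def score_training_image(cell_results, min_reliability):
    """Sum the inlier counts of all cells whose inlier group is not rejected.

    cell_results: iterable of (inlier_count, reliability), one per verified cell.
    """
    return sum(count for count, rel in cell_results if rel >= min_reliability)

def recognize(results_by_image, min_reliability):
    """Return (scores per training image ID, ID with the highest score)."""
    scores = {image_id: score_training_image(cells, min_reliability)
              for image_id, cells in results_by_image.items()}
    return scores, max(scores, key=scores.get)
```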

With this configuration, even when the attribute information of the local features is uncertain, erroneous corresponding points can be removed quickly and correctly, and which subject the test image shows can be estimated accurately.

<Second Embodiment>
Next, a subject recognition apparatus according to a second embodiment of the present invention will be described. In the first embodiment described above, geometric verification based on voting on the change amounts of the local feature attribute information is performed first, followed by geometric verification by robust estimation based only on the positions of the local features. This removes erroneous corresponding points quickly and correctly and makes it possible to estimate accurately which subject the test image shows.

However, it remains unknown where the region of the matching training image, that is, the region of the subject, lies in the test image. The second embodiment solves this problem.

FIG. 3 is a block diagram showing the configuration of the subject recognition apparatus according to this embodiment. In this figure, the same parts as those of the apparatus shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted. The apparatus shown in this figure differs from the apparatus shown in FIG. 1 in that it includes a correct-correspondence 2D geometric transformation matrix estimation unit 10.

Next, the operation of the subject recognition apparatus 1 shown in FIG. 3 will be described. The correct-correspondence 2D geometric transformation matrix estimation unit 10 receives as input the inlier corresponding points that the scoring unit 9 used for score calculation. The inlier corresponding points represent probable correspondences of local features between the test image and the corresponding training image. Using these corresponding points, the correct-correspondence 2D geometric transformation matrix estimation unit 10 estimates where in the test image the region corresponding to the training image lies. Specifically, the geometric transformation from the training image to the test image is modeled as an affine transformation or a homography transformation, and its parameters are estimated by the least squares method. The method of estimating the parameters of the affine transformation or homography transformation from the corresponding points, for example by the least squares method, is the same as the method described in Non-Patent Document 1 for the affine transformation and in Reference 2 for the homography, so a detailed description is omitted here.
Reference 2: Richard Hartley and Andrew Zisserman, "Multiple View Geometry in Computer Vision, Second Edition", Cambridge University Press, March 2004, pp. 88-91.

Finally, the correct-correspondence 2D geometric transformation matrix estimation unit 10 outputs the estimated geometric transformation matrix as the 2D geometric transformation matrix and ends the processing. The calculated affine transformation or homography transformation indicates where on the test image a given point on the training image lies. Therefore, if, for example, the four corners of the training image are used to delimit the subject region in the training image, converting each of those four corner points into coordinates on the test image with the estimated affine transformation matrix or homography transformation matrix determines the region where the subject is present in the test image.
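The corner mapping described above amounts to applying the estimated matrix to four points in homogeneous coordinates. The following sketch accepts either a 2x3 affine matrix or a 3x3 homography; the function name `map_corners` and the convention of using (0, 0) to (width, height) as the corner coordinates are assumptions.

```python
import numpy as np

def map_corners(T, width, height):
    """Map the four corners of a training image into test-image coordinates.

    T: 2x3 affine matrix or 3x3 homography from training to test image.
    Returns a (4, 2) array ordered top-left, top-right, bottom-right, bottom-left.
    """
    T = np.asarray(T, dtype=float)
    if T.shape == (2, 3):
        # Promote the affine matrix to a 3x3 homogeneous matrix.
        T = np.vstack([T, [0.0, 0.0, 1.0]])
    corners = np.array([[0, 0, 1], [width, 0, 1],
                        [width, height, 1], [0, height, 1]], dtype=float)
    mapped = corners @ T.T
    # Divide by the homogeneous coordinate (equal to 1 for affine matrices).
    return mapped[:, :2] / mapped[:, 2:3]
```

The resulting quadrilateral can be used directly as the outline of the subject region in the test image.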

With this configuration, it is possible to determine where the region of the matching training image, that is, the region of the subject, lies in the test image.

<Third Embodiment>
Next, a subject recognition apparatus according to a third embodiment of the present invention will be described. In the second embodiment, calculating an affine transformation or homography transformation matrix from the inlier corresponding points makes it possible to estimate where the region of the matching training image, that is, the region of the subject, lies in the test image.

However, a problem remains: it is unknown in what posture and at what position the subject in the test image was photographed relative to the subject in the corresponding training image. The third embodiment solves this problem.

FIG. 4 is a block diagram showing the configuration of the subject recognition apparatus according to this embodiment. In this figure, parts identical to those of the apparatus shown in FIG. 3 are denoted by the same reference numerals, and their description is omitted. The apparatus shown in FIG. 4 differs from the apparatus shown in FIG. 3 in that it includes a relative posture estimation unit 11.

Next, the operation of the subject recognition apparatus 1 shown in FIG. 4 will be described. The relative posture estimation unit 11 receives the inlier corresponding points that the scoring unit 9 used for score calculation. Using these inlier corresponding points, the relative posture estimation unit 11 first calculates a fundamental matrix. Next, it estimates the focal lengths of the cameras that captured the test image and the training image. It then calculates the relative translation and rotation of the two cameras from the fundamental matrix and the two focal lengths. The fundamental matrix, the focal lengths, and the relative translation and rotation can be estimated by, for example, the method described in Reference 3.
Reference 3: Yamada, Kanazawa, Kanaya, Sugaya, “Latest algorithm for three-dimensional reconstruction from two images”, IPSJ Research Report, CVIM, [Computer Vision and Image Media] 2009-CVIM-168 (15), 1-8, 2009-08-24

Finally, the relative posture estimation unit 11 outputs the estimated relative translation and rotation as the relative posture and ends the process. Once the relative posture from the camera that captured the training image to the camera that captured the test image is known, the three-dimensional coordinates of the inlier corresponding points can also be calculated. The relative posture estimation unit 11 may calculate these three-dimensional coordinates as well and output them in addition to the relative posture.

With this configuration, it is possible to estimate in what posture and at what position the subject in the test image was photographed relative to the subject in the corresponding training image.
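As a rough illustration of the last step of the third embodiment, the sketch below derives candidate relative rotations and translations from a fundamental matrix and two focal lengths. It is not the method of Reference 3; it uses the standard SVD decomposition of the essential matrix, and it assumes square pixels, a principal point at the image origin, and the convention x_test^T F x_train = 0. The function names are hypothetical.

```python
import numpy as np

def essential_from_fundamental(F, f_train, f_test):
    """E = K_test^T F K_train, assuming K = diag(f, f, 1) for each camera."""
    K_train = np.diag([f_train, f_train, 1.0])
    K_test = np.diag([f_test, f_test, 1.0])
    return K_test.T @ F @ K_train

def decompose_essential(E):
    """Return the four candidate (R, t) pairs; t is a unit vector (scale unknown)."""
    U, _, Vt = np.linalg.svd(E)
    # E is defined only up to sign, so force U and Vt to be proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

Only one of the four candidates places the triangulated inlier corresponding points in front of both cameras; selecting it requires the three-dimensional coordinate calculation mentioned above, which this sketch omits.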

As described above, all provisional corresponding points are voted into a voting space that represents the geometric transformation between the test image and the training image and is partitioned into cells of a predetermined size. A predetermined geometric model is then fitted to the group of provisional corresponding points belonging to each cell whose vote count is at or above a predetermined number, using only the position coordinates (x, y) of the corresponding points that constitute those provisional corresponding points. This removes erroneous corresponding points quickly and correctly, so that the subject captured in the test image can be estimated accurately.
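The voting stage summarized above can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute information is taken to be the scale and orientation of each local feature (the choice of attributes, the cell sizes, the vote threshold, and the function name `vote_weak_geometry` are all hypothetical), and the later model fitting on the (x, y) coordinates is not shown.

```python
from collections import defaultdict
import math

def vote_weak_geometry(matches, scale_cell=0.5, angle_cell=math.pi / 6, min_votes=3):
    """Vote provisional corresponding points into a 2D (scale, rotation) space.

    matches: sequence of (train_attr, test_attr) pairs, where each attr is
             (scale, orientation_in_radians). These attributes are assumed.
    Returns {cell: [match indices]} for cells with at least min_votes votes.
    """
    cells = defaultdict(list)
    for idx, ((s_train, o_train), (s_test, o_test)) in enumerate(matches):
        # Amount of change between the attribute information of the two features.
        d_scale = math.log2(s_test / s_train)
        d_angle = (o_test - o_train) % (2 * math.pi)
        cell = (math.floor(d_scale / scale_cell), math.floor(d_angle / angle_cell))
        cells[cell].append(idx)
    # Keep only the cells whose vote count reaches the threshold.
    return {c: idxs for c, idxs in cells.items() if len(idxs) >= min_votes}
```

Each surviving cell yields a small candidate set of provisional corresponding points, so the subsequent geometric model fitting runs on far fewer points than the full set of provisional corresponding points.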

The subject recognition apparatus of the embodiments described above may be realized by a computer. In that case, a program for realizing its functions may be recorded on a computer-readable recording medium, and the program recorded on that medium may be read into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, and to storage devices such as a hard disk built into a computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside the computer system acting as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

The embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential to remove erroneous corresponding points quickly and correctly and to estimate accurately which subject a test image captures, even when attribute information other than the position information of the local features has not been obtained correctly.

DESCRIPTION OF SYMBOLS: 1 ... subject recognition apparatus, 2 ... preprocessing unit, 3 ... local feature extraction unit, 4 ... local feature amount and attribute information storage unit, 5 ... provisional corresponding point determination unit, 6 ... weak geometric verification unit, 7 ... strong geometric verification unit, 8 ... reliability estimation unit, 9 ... scoring unit, 10 ... positive correspondence 2D geometric transformation matrix estimation unit, 11 ... relative posture estimation unit, 20 ... database (DB), 30 ... training unit, 40 ... recognition unit

Claims

An image information storage unit that stores local feature amounts extracted from a plurality of training images obtained by imaging a subject and attribute information indicating attributes of the local feature amounts in association with each training image;
A local feature extraction unit that extracts a local feature amount and the attribute information from a test image obtained by imaging a subject to be recognized;
A provisional corresponding point determining unit that compares the local feature amount extracted from the test image with the local feature amount stored in the image information storage unit to determine a provisional corresponding point;
A weak geometric verification unit that constructs a voting space having the same number of dimensions as the number of types of the attribute information and partitioned into cells of a predetermined size, and that, for every provisional corresponding point, calculates the amount of change between the attribute information and votes for the corresponding cell;
A strong geometric verification unit that, based on the result of the voting, for every cell having a predetermined number of votes or more, fits the provisional corresponding points to a predetermined geometric model and estimates geometric transformation parameters, thereby obtaining inlier corresponding points that are not outliers;
A reliability estimation unit that calculates a reliability of the inlier corresponding points and rejects any inlier corresponding point group whose reliability is lower than a predetermined value;
A scoring unit that outputs the sum of the inlier corresponding points not rejected as a score of the training image.

The subject recognition apparatus according to claim 1, further comprising a positive correspondence 2D geometric transformation matrix estimation unit that obtains a geometric transformation matrix of an affine transformation or a homography transformation from the inlier corresponding points that were not rejected, and outputs information indicating which point on the test image corresponds to a point on the training image.

The subject recognition apparatus according to claim 1, further comprising a relative posture estimation unit that calculates a fundamental matrix from the inlier corresponding points that were not rejected, estimates the focal lengths of the cameras that captured the test image and the training image, and calculates and outputs the relative translation and rotation of the cameras from the fundamental matrix and the focal lengths of the cameras that captured the test image and the training image.

A subject recognition method performed by a subject recognition apparatus including an image information storage unit that stores local feature amounts extracted from a plurality of training images obtained by imaging a subject and attribute information indicating attributes of the local feature amounts in association with each training image, the method comprising:
A local feature extraction step of extracting a local feature amount and the attribute information from a test image obtained by imaging a subject to be recognized;
A provisional corresponding point determination step of determining a provisional corresponding point by comparing the local feature amount extracted from the test image with the local feature amount stored in the image information storage unit;
A weak geometric verification step of constructing a voting space having the same number of dimensions as the number of types of the attribute information and partitioned into cells of a predetermined size, and, for every provisional corresponding point, calculating the amount of change between the attribute information and voting for the corresponding cell;
A strong geometric verification step of, based on the result of the voting, for every cell having a predetermined number of votes or more, fitting the provisional corresponding points to a predetermined geometric model and estimating geometric transformation parameters, thereby obtaining inlier corresponding points that are not outliers;
A reliability estimation step of calculating a reliability of the inlier corresponding points and rejecting any inlier corresponding point group whose reliability is lower than a predetermined value;
A scoring step of outputting the sum of the inlier corresponding points not rejected as a score of the training image.

The subject recognition method according to claim 4, further comprising a positive correspondence 2D geometric transformation matrix estimation step of obtaining a geometric transformation matrix of an affine transformation or a homography transformation from the inlier corresponding points that were not rejected, and outputting information indicating which point on the test image corresponds to a point on the training image.

The subject recognition method according to claim 4, further comprising a relative posture estimation step of calculating a fundamental matrix from the inlier corresponding points that were not rejected, estimating the focal lengths of the cameras that captured the test image and the training image, and calculating and outputting the relative translation and rotation of the cameras from the fundamental matrix and the focal lengths of the cameras that captured the test image and the training image.

A subject recognition program for causing a computer to function as the subject recognition device according to any one of claims 1 to 3.