JP6172432B2

JP6172432B2 - Subject identification device, subject identification method, and subject identification program

Info

Publication number: JP6172432B2
Application number: JP2013001062A
Authority: JP
Inventors: 太田 雅彦; 岩元 浩太
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2013-01-08
Filing date: 2013-01-08
Publication date: 2017-08-02
Anticipated expiration: 2033-01-08
Also published as: JP2014134856A

Description

The present invention relates to a technique for accurately identifying an object present in a captured image or video.

In the field of computer vision, a technique is known in which a plurality of feature points are extracted from an image acquired by a camera or the like, and subject recognition is performed by collating the extracted feature points with the feature points of subject images registered in a database. An example of such a technique will be described with reference to FIG. 6. An image acquisition unit 601 included in a device such as a camera device 610 inputs an acquired image to a feature amount extraction unit 602. The feature amount extraction unit 602 extracts a plurality of feature points from the input image. A collation unit 603 performs subject recognition by collating the feature points extracted by the feature amount extraction unit 602 with the feature points of the subject images registered in a database 604.

As feature amount extraction algorithms for such subject recognition, methods have been proposed that detect many characteristic points (feature points) in an image and extract the feature amount of the local region around each feature point (local feature amount), so that identification is robust to changes in shooting size and angle and to occlusion. SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features) are known as representative methods. Details of SIFT are described in, for example, Patent Document 1 and Non-Patent Document 1.

Patent Document 1: US Pat. No. 6,711,293

Non-Patent Document 1: David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision (USA), 60(2), 2004, pp. 91-110

However, a subject identification device that extracts feature amounts by the techniques described in the above documents usually extracts and collates feature amounts from the entire captured image. This makes the device usable without regard to where in the image the recognition target is located. On the other hand, even when the recognition target occupies only part of the captured image, feature amounts must be extracted from the entire image and collated. In this case, similar feature amounts are detected in regions where feature amount extraction is not actually needed, so recognition accuracy may deteriorate.

The present invention has been made to solve the above problem. An object of the present invention is to provide a technique for improving the recognition accuracy of subject identification.

A subject identification device according to the present invention comprises: depth image acquisition means for acquiring a depth image of an object; image acquisition means for acquiring an image of the object; surface detection means for detecting a surface from the depth image acquired by the depth image acquisition means; image processing means for cutting out, from the image acquired by the image acquisition means, an image region corresponding to the surface detected by the surface detection means; feature amount extraction means for extracting a feature amount from the image region cut out by the image processing means; and collation means for collating the feature amount extracted by the feature amount extraction means with a feature amount extracted from an image to be collated.

A subject identification method according to the present invention comprises: a depth image acquisition step of capturing an object to acquire a depth image; an image acquisition step of capturing the object to acquire an image; a surface detection step of detecting a surface from the depth image; an image processing step of, when a surface is detected in the surface detection step, cutting out an image region corresponding to the detected surface from the image acquired in the image acquisition step; a feature amount extraction step of extracting a feature amount from the image region cut out in the image processing step; a collation step of collating the feature amount extracted in the feature amount extraction step with a feature amount extracted from an image to be collated; and a step of outputting, when the collation determines that the feature amounts match, information identifying the subject of the image to be collated.

A program according to the present invention causes a computer to function as: depth image acquisition means for acquiring a depth image of an object; image acquisition means for acquiring an image of the object; surface detection means for detecting a surface from the depth image acquired by the depth image acquisition means; image processing means for cutting out, from the image acquired by the image acquisition means, an image region corresponding to the surface detected by the surface detection means; feature amount extraction means for extracting a feature amount from the image region cut out by the image processing means; and collation means for collating the feature amount extracted by the feature amount extraction means with a feature amount extracted from an image to be collated.

According to the present invention, the recognition accuracy of subject identification can be improved.

FIG. 1 is a block diagram showing the configuration of a subject identification device according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing the procedure of a subject identification method according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the processing procedure of an image processing unit according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the processing procedure of an image processing unit according to a second embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of a subject identification device according to a third embodiment of the present invention.
FIG. 6 is a diagram showing the principle of the prior art.

Embodiments of the present invention will be described below in detail by way of example with reference to the drawings. The constituent elements described in the following embodiments are merely examples and are not intended to limit the technical scope of the present invention to them alone.

[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of a subject identification device according to a first embodiment of the present invention. The subject identification device according to the present invention has, as its main functional components, a depth image acquisition unit 101, a surface detection unit 102, an image processing unit 103, an image acquisition unit 104, a feature amount extraction unit 105, a collation unit 106, and a feature amount (subject) database 107. The depth image acquisition unit 101 and the image acquisition unit 104 can be configured as part of a camera device 110, but can also be configured as independent components.

As its hardware configuration, the subject identification device has the components of a general computer, such as a control unit (a CPU (Central Processing Unit) or the like), memory (RAM (Random Access Memory), ROM (Read Only Memory), or the like), a storage unit (a hard disk drive or the like), a communication unit, a display unit, and an operation unit. Each of the functional components described above can be realized, for example, by the control unit of the subject identification device loading a program from the storage unit into the memory and executing it.

The depth image acquisition unit 101 acquires a depth image by imaging a subject (object) or by communicating with an external device. Here, a depth image is an image in which a distance zd from the camera device is given for each position (xd, yd) of a two-dimensional image. Each pixel of the depth image is a point in three-dimensional space and can be expressed by three-dimensional coordinates (xd, yd, zd). That is, the depth image may be treated as point cloud data representing a set of points in three-dimensional space. One method of acquiring a depth image is three-dimensional image acquisition by pattern projection, in which a patterned laser beam is projected onto the object, the distortion of the pattern in the reflected light is measured by triangulation, and the distance is thereby obtained. Alternatively, a three-dimensional image acquisition method based on the Time-of-Flight principle, which measures distance from the time light takes to travel to the object and back, may be used: pulse-modulated light is emitted across the angle of view and the phase delay of the pulse is measured on the image sensor side. Furthermore, a three-dimensional image acquisition method using a stereo camera may be used, in which the object is captured simultaneously from a plurality of different directions to reproduce binocular parallax so that information in the depth direction zd can also be recorded.
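The treatment of a depth image as point cloud data can be sketched as follows. This is only an illustrative sketch: the function name `depth_image_to_points` and the convention that a missing measurement is stored as `None` are assumptions, not part of the described device.

```python
def depth_image_to_points(depth):
    """Convert a depth image into a list of 3D points (xd, yd, zd).

    `depth` is a list of rows; depth[yd][xd] holds the distance zd,
    or None where the sensor returned no measurement (assumed convention).
    """
    points = []
    for yd, row in enumerate(depth):
        for xd, zd in enumerate(row):
            if zd is not None:
                points.append((xd, yd, zd))
    return points


if __name__ == "__main__":
    depth = [[1.0, None],
             [2.0, 3.0]]
    print(depth_image_to_points(depth))
```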

The image acquisition unit 104 acquires an ordinary two-dimensional image that is not a depth image (hereinafter also simply called an image) by imaging the subject (object) or by communicating with an external device. The image acquisition unit 104 acquires an image of the subject captured from approximately the same direction as the depth image acquisition unit 101. The correspondence between the acquired depth image and the image is assumed to be determined in advance on a pixel-by-pixel basis. For example, for an image position (x, y), the correspondence can be expressed by formulas such as xd = x + α and yd = y + β.
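The example correspondence xd = x + α, yd = y + β can be written out as a small sketch. The function names and the use of constant integer offsets `alpha` and `beta` are assumptions for illustration; an actual calibration between the two sensors may be more involved.

```python
def image_to_depth_coords(x, y, alpha=0, beta=0):
    """Map an image pixel (x, y) to depth-image coordinates (xd, yd)."""
    return x + alpha, y + beta


def depth_to_image_coords(xd, yd, alpha=0, beta=0):
    """Inverse mapping: depth-image pixel (xd, yd) back to image pixel (x, y)."""
    return xd - alpha, yd - beta


if __name__ == "__main__":
    xd, yd = image_to_depth_coords(10, 20, alpha=3, beta=-2)
    print(xd, yd)
    print(depth_to_image_coords(xd, yd, alpha=3, beta=-2))
```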

The depth image acquired by the depth image acquisition unit 101 is input to the surface detection unit 102, which detects surfaces.

The surface detection unit 102 performs processing to detect surface regions in the depth image. Here, a surface is a continuous region in three-dimensional space and may be, for example, a plane or a curved surface. The surface detection unit 102 may detect one surface region or a plurality of them. The detected surface regions are input to the image processing unit 103.

The surface detection unit 102 can detect surface regions by dividing (segmenting) the point cloud of the depth image into subsets of points that belong to the same surface. Specifically, for example, for each pixel of the depth image (each point of the point cloud), a parameter value such as a normal vector is calculated from the three-dimensional coordinate values of the surrounding pixels (points), and clustering is performed using the coordinate value and parameter value of each pixel (point) to segment the surface regions. In this way, a region in which the direction of the per-pixel normal vector varies continuously can be detected as a surface. If the condition is added that the normal vectors, in addition to varying continuously, are aligned in one fixed direction, planes in particular can be detected among the surfaces.

A method of obtaining a normal vector for each pixel of the depth image (each point of the point cloud) is now described. One method of calculating the normal vector from the coordinate information of the pixels surrounding each pixel is principal component analysis of a covariance matrix. Specifically, for a given pixel of interest, a covariance matrix (a 3 × 3 matrix) is calculated from the coordinate values (xd, yd, zd) of the surrounding pixels. The eigenvalues and eigenvectors of the covariance matrix are then calculated. The eigenvector with the smallest eigenvalue is taken as the normal vector of the pixel of interest. Another method of obtaining the normal vector is to form triangles from the coordinate values of the pixels surrounding the pixel of interest, compute the cross products for several of these triangles, and take their average as the normal vector.
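The second method, averaging the cross products of triangles formed with neighbouring pixels, can be sketched as follows. The sketch assumes an organized point cloud stored as `points[y][x] = (X, Y, Z)` and uses only the four direct neighbours; the function names and this choice of neighbourhood are illustrative assumptions.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def estimate_normal(points, x, y):
    """Estimate the unit normal at pixel (x, y) of an organized point cloud.

    Triangles are formed from the centre point and consecutive pairs of its
    four neighbours; their cross products are summed and normalized.
    Returns None if no valid triangle exists or the result is degenerate.
    """
    h, w = len(points), len(points[0])
    center = points[y][x]
    # Fixed rotational order so that all cross products point the same way.
    offsets = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    total = [0.0, 0.0, 0.0]
    for i in range(4):
        (dx1, dy1), (dx2, dy2) = offsets[i], offsets[(i + 1) % 4]
        x1, y1, x2, y2 = x + dx1, y + dy1, x + dx2, y + dy2
        if not (0 <= x1 < w and 0 <= y1 < h and 0 <= x2 < w and 0 <= y2 < h):
            continue  # skip triangles that fall outside the image
        n = cross(sub(points[y1][x1], center), sub(points[y2][x2], center))
        for k in range(3):
            total[k] += n[k]
    length = math.sqrt(sum(c * c for c in total))
    if length == 0.0:
        return None
    return tuple(c / length for c in total)


if __name__ == "__main__":
    flat = [[(float(x), float(y), 5.0) for x in range(3)] for y in range(3)]
    print(estimate_normal(flat, 1, 1))
```

Summing before normalizing weights each triangle by its area, which is one common design choice; averaging unit normals instead would weight all triangles equally.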

Next, a method of segmenting surface regions using the normal vectors obtained for each pixel of the depth image (each point of the point cloud) is described. One such method is region growing. In this method, surrounding pixels (points) whose normal vectors are similar (that is, whose inner product with the reference normal vector is large) are merged one after another to segment a surface. First, a certain pixel (point) is taken as the initial region, its normal vector is compared with those of the surrounding pixels (points), and the pixels (points) with similar normal vectors are added to the region. By repeating this step, a set of pixels (points) whose normal vectors vary continuously can be segmented as a surface region.
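A minimal sketch of region growing over a grid of per-pixel unit normals is given below. It assumes 4-connectivity, compares each candidate pixel with the already-merged pixel next to it (so that gradually changing normals can stay in one region), and uses a hypothetical similarity threshold `min_dot`; none of these choices is specified in the description.

```python
from collections import deque


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def region_growing(normals, min_dot=0.95):
    """Label connected regions whose neighbouring unit normals are similar.

    normals[y][x] is a unit normal tuple, or None where no normal exists.
    Returns (labels, region_count); unlabelled pixels keep the label -1.
    """
    h, w = len(normals), len(normals[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1 or normals[sy][sx] is None:
                continue
            # Start a new region from this seed pixel.
            labels[sy][sx] = next_label
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if not (0 <= nx < w and 0 <= ny < h):
                        continue
                    if labels[ny][nx] != -1 or normals[ny][nx] is None:
                        continue
                    # Merge the neighbour if its normal is similar enough.
                    if dot(normals[y][x], normals[ny][nx]) >= min_dot:
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
            next_label += 1
    return labels, next_label


if __name__ == "__main__":
    up, side = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
    grid = [[up, up, side, side],
            [up, up, side, side]]
    print(region_growing(grid))
```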

Alternatively, surface regions may be segmented by applying a cut method based on graph theory to the three-dimensional coordinate values and the parameter values, such as normal vectors, of the pixels (points of the point cloud). Specifically, for example, a similarity between pixels (points) is calculated from the distance between their coordinate values and the similarity of their normal vectors, a graph is formed from these similarities, and the set of pixels (points) is partitioned by a method such as Normalized Cut.

Other methods for detecting a plane include using a three-dimensional Hough transform, an extension to three dimensions of the Hough transform conventionally used for line recognition in two-dimensional image processing, to detect the three-dimensional counterpart of a straight line, that is, a plane. Passing the three-dimensional coordinate information obtained from the depth image through the three-dimensional Hough transform makes it possible to detect the position and orientation of a planar region. Besides the Hough transform, a robust estimation method such as RANSAC (RANdom SAmple Consensus) may be used. Estimation by the least squares method is also possible.
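As one illustration of the RANSAC alternative, the following sketch fits a single plane n·p + d = 0 to a list of 3D points. The distance threshold, iteration count, fixed seed, and function names are assumptions chosen for the example, not values given in the description.

```python
import math
import random


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_points(p, q, r):
    """Return (unit normal n, offset d) with n.x + d = 0, or None if collinear."""
    n = cross(sub(q, p), sub(r, p))
    length = math.sqrt(dot(n, n))
    if length < 1e-12:
        return None
    n = tuple(c / length for c in n)
    return n, -dot(n, p)


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Find the plane supported by the most points (inliers)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        # Inliers are points whose distance to the plane is within threshold.
        inliers = [p for p in points if abs(dot(n, p) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


if __name__ == "__main__":
    pts = [(float(x), float(y), 2.0) for x in range(5) for y in range(5)]
    pts += [(0.0, 0.0, 5.0), (1.0, 2.0, 7.0), (3.0, 1.0, 9.0)]
    plane, inliers = ransac_plane(pts)
    print(plane, len(inliers))
```

To detect several planes, the inliers of the best plane could be removed and the search repeated on the remaining points.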

By methods such as those described above, the surface detection unit 102 detects surface regions from the depth image. The surface detection unit 102 inputs information specifying each detected surface region to the image processing unit 103. The information specifying a surface region is, for example, the set of pixel values (set of points) of the depth image that make up the surface region. It may also be, for example, the contour of the surface region or a bounding box containing the surface region.

In addition to the information specifying a surface region, the surface detection unit 102 may also calculate and output information indicating the orientation of the detected surface. When the surface is a plane, this information may be, for example, its normal vector (a, b, c). The normal vector representing the orientation of the plane may be calculated, for example, as the average of the normal vectors of the pixels (points) in the pixel set (point set) that makes up the surface. The information indicating the orientation of the surface is used by the image processing unit 103 of the second embodiment described later. In the first embodiment, this information need not be output.

The image acquired by the image acquisition unit 104 is input to the image processing unit 103.
The image processing unit 103 cuts out, from the image acquired by the image acquisition unit 104, the image region corresponding to each surface region detected by the surface detection unit 102, and inputs the cut-out image region to the feature amount extraction unit 105. Instead of physically cutting out the region, information specifying the image region within the image may be supplied, for example the coordinate values of the four corners of the image region corresponding to the detected surface region.

When the surface region information input from the surface detection unit 102 is a set of pixel values (set of points) of the depth image, the image processing unit 103 may, for example, obtain the bounding box on the two-dimensional image that contains that set and use the bounding box as the image region to cut out. In that case, pixels inside the cut-out image region that do not correspond to the surface region detected by the surface detection unit 102 may, for example, be filled with a single color to remove their image information.
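The bounding-box cut-out with single-color fill can be sketched as below. The sketch assumes the surface region has already been mapped to image coordinates and is given as a set of `(x, y)` pixels; the function names and the default fill value are illustrative assumptions.

```python
def bounding_box(pixels):
    """Return (x_min, y_min, x_max, y_max) enclosing a set of (x, y) pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def crop_region(image, pixels, fill=0):
    """Cut out the bounding box of `pixels` from image[y][x].

    Pixels inside the box that do not belong to the surface region are
    replaced by `fill` so that they contribute no image information.
    """
    region = set(pixels)
    x0, y0, x1, y1 = bounding_box(region)
    return [[image[y][x] if (x, y) in region else fill
             for x in range(x0, x1 + 1)]
            for y in range(y0, y1 + 1)]


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    surface = {(1, 0), (1, 1), (2, 1)}
    print(crop_region(image, surface))
```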

The feature amount extraction unit 105 extracts feature amounts from the image region input from the image processing unit 103. For example, local feature amounts are extracted at the feature points of the image using SIFT features. The feature amounts of the image region extracted by the feature amount extraction unit 105 are input to the collation unit 106. Although local feature amounts such as SIFT features are given here as an example, the feature amounts are not limited to these, and global feature amounts may be extracted instead. For example, visual features specified in the international standard ISO/IEC 15938-3 (MPEG-7 Visual), such as Color Histogram, Color Layout, Edge Histogram, Image Signature, and Video Signature, may be used.

The feature amount (subject) database 107 stores feature amounts extracted in advance from images of the subjects to be identified. The feature amounts registered in advance in the feature amount database 107 are extracted by the same method as that used by the feature amount extraction unit 105. The registered feature amounts may be, for example, extracted from images in which the subjects to be identified are photographed from the front.

The collation unit 106 collates the feature amounts of the image region generated by the feature amount extraction unit 105 with the feature amounts of the subject images stored in advance in the feature amount (subject) database 107, and determines whether matching data exists (that is, whether the two images contain the same subject). When matching data is determined to exist, information specifying the recognized object is output as the collation result on the basis of that data.

When the feature amounts input from the feature amount extraction unit 105 are local feature amounts (such as SIFT features) extracted from many feature points, collation can be performed, for example, as follows. First, corresponding points are detected using the distances between the local feature amounts of the feature points of the input image region and those of the feature points in the feature amount data of the subject images registered in advance in the feature amount (subject) database 107. Next, geometric consistency is verified from the positional relationship of the corresponding points to determine whether the two images contain a common subject.

For detecting corresponding points, the following method may be used, for example: for a given feature point of the input image, the Euclidean distances between its local feature amount and those of all feature points of the reference image are calculated, and if the ratio of the smallest distance to the second smallest distance is at or below a threshold, the feature point at the smallest distance is judged to be the corresponding point. Using the ratio of the two distances means that only corresponding points whose local feature distance is sufficiently small compared with that of other feature points are detected, which suppresses false correspondences.
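The ratio test described above can be sketched as follows, with descriptors represented as plain tuples of numbers. The threshold value 0.8 and the function names are assumptions for illustration; the description does not fix a threshold.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_with_ratio_test(query_descs, ref_descs, max_ratio=0.8):
    """Return (query_index, ref_index) pairs that pass the ratio test.

    For each query descriptor, the nearest and second-nearest reference
    descriptors are found; the nearest one is accepted only if
    d1 / d2 <= max_ratio (written as d1 <= max_ratio * d2 to avoid
    dividing by zero).
    """
    matches = []
    if len(ref_descs) < 2:
        return matches  # the ratio is undefined with fewer than two candidates
    for qi, q in enumerate(query_descs):
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(ref_descs))
        (d1, r1), (d2, _) = dists[0], dists[1]
        if d1 <= max_ratio * d2:
            matches.append((qi, r1))
    return matches


if __name__ == "__main__":
    ref = [(0.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
    query = [(0.1, 0.0), (10.0, 0.5)]
    # The second query point is equally close to two references and is rejected.
    print(match_with_ratio_test(query, ref))
```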

In determining geometric consistency, the geometric transformation parameters between the images are estimated, for example, from the positional relationship of the corresponding points. A fundamental matrix, a projective transformation matrix (homography matrix), or the like is used as the geometric transformation model to be estimated, and one method of estimating the geometric transformation parameters uses RANSAC (RANdom SAmple Consensus), which permits robust estimation.

Corresponding points that are consistent with the geometric transformation parameters estimated by RANSAC are treated as inliers, and the remaining corresponding points as outliers. If the inliers sufficiently outnumber the outliers, the estimated geometric transformation parameters can be judged to be highly reliable, and the images are therefore determined to contain a common subject. In this way, a subject can be identified by using local feature amounts such as SIFT features together with RANSAC.
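The inlier/outlier decision can be illustrated with a sketch that takes an already estimated 3 × 3 homography `H` and a list of corresponding point pairs. The reprojection error tolerance, the minimum inlier count, the minimum inlier ratio, and the function names are all hypothetical values chosen for the example.

```python
import math


def apply_homography(H, pt):
    """Project a 2D point with a 3x3 homography; None if it maps to infinity."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        return None
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def count_inliers(H, correspondences, max_error=3.0):
    """Count pairs (src, dst) whose reprojection error is within max_error."""
    inliers = 0
    for src, dst in correspondences:
        projected = apply_homography(H, src)
        if projected is None:
            continue
        if math.hypot(projected[0] - dst[0], projected[1] - dst[1]) <= max_error:
            inliers += 1
    return inliers


def contains_common_subject(H, correspondences, min_inliers=4,
                            min_inlier_ratio=0.5):
    """Decide that a common subject exists if inliers are sufficiently many."""
    if not correspondences:
        return False
    inliers = count_inliers(H, correspondences)
    return (inliers >= min_inliers
            and inliers / len(correspondences) >= min_inlier_ratio)


if __name__ == "__main__":
    H = [[1, 0, 5], [0, 1, 0], [0, 0, 1]]  # pure translation by (5, 0)
    pairs = [((0, 0), (5, 0)), ((1, 1), (6, 1)), ((2, 0), (7, 0)),
             ((3, 4), (8, 4)), ((4, 2), (9, 2)), ((0, 0), (50, 50))]
    print(count_inliers(H, pairs), contains_common_subject(H, pairs))
```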

When identification finds an image containing the same subject, identification information specifying the identification target (subject) is output as the identification result. Moreover, when local feature amounts such as SIFT features are used, the verification of geometric consistency during collation yields geometric transformation information such as a homography matrix. Consequently, not only the identification information of the recognized target but also its position (the precise position of the subject) and orientation within the image region cut out by the image processing unit 103 can be obtained, and this information (for example, the homography matrix) can be output.

The subject identification processing described above will now be described with reference to the flowchart of FIG. 2. This processing is controlled, for example, by the control unit of the subject identification device loading a program stored in the storage unit into the memory and executing it.

First, the depth image acquisition unit 101 and the image acquisition unit 104 capture the surface of the subject (step S201). Next, the surface detection unit 102 performs surface detection processing on the depth image output from the depth image acquisition unit 101 (step S202). It is then determined whether the surface detection unit 102 has detected a surface (step S203); if not, the processing ends. If a surface has been detected, the image processing unit 103 cuts out, from the image acquired by the image acquisition unit 104, the image regions corresponding to all surfaces detected by the surface detection unit 102 (step S204).

Next, the feature amount extraction unit 105 extracts feature amounts from the largest of the image regions output from the image processing unit 103 (step S205). The collation unit 106 then collates the extracted feature amounts with the feature amounts registered in the feature amount database 107 and determines whether any feature amount extracted from the image region matches feature amount data registered in the feature amount database (step S206).

If a match is determined to exist, the collation unit 106 outputs, as the collation result, information specifying the recognized object (step S207); if not, the output of this information is skipped. It is then determined whether any image region remains from which feature amounts have not been extracted (step S208). If image regions remain, the processing returns to feature amount extraction for the largest remaining image region (step S205), and steps S205 to S208 are repeated until no uncollated image region remains. When no image region remains, the processing ends.
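The loop over steps S205 to S208 can be sketched as follows. The function `identify_subjects` and its callable parameters `extract_features` and `match` are hypothetical stand-ins for the feature amount extraction unit 105 and the collation unit 106; regions are assumed to be rectangular 2D lists, and "largest" is assumed to mean largest pixel area.

```python
def identify_subjects(regions, extract_features, match):
    """Process cut-out image regions from largest to smallest.

    `extract_features(region)` returns the features of one region (S205).
    `match(features)` returns an identifier of the recognized subject,
    or None when nothing in the database matches (S206).
    Returns the identifiers output in step S207, in processing order.
    """
    def area(region):
        return len(region) * len(region[0]) if region else 0

    results = []
    # Sorting once is equivalent to repeatedly picking the largest
    # remaining region until none is left (S208).
    for region in sorted(regions, key=area, reverse=True):
        features = extract_features(region)   # S205
        subject_id = match(features)          # S206
        if subject_id is not None:
            results.append(subject_id)        # S207
    return results


if __name__ == "__main__":
    database = {"a": "subject-A", "b": "subject-B"}
    regions = [[["a"]],
               [["b", "b"], ["b", "b"]],
               [["c", "c"]]]
    # Toy stand-ins: the "feature" is the first pixel value of the region.
    print(identify_subjects(regions, lambda r: r[0][0], database.get))
```

If no surface was detected (step S203), `regions` is empty and the function returns an empty list without performing any extraction.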

The above is the subject identification method of this embodiment. Although the description above covers a single still image, a moving image can be handled by repeating the same processing for every frame.
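The loop of steps S205 to S208 can be sketched as follows. This is an illustrative outline only: a region is modelled as a hypothetical `(label, width, height)` tuple, and `extract` and `match` are caller-supplied stand-ins for the feature amount extraction unit 105 and the collation unit 106.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical region model: (label, width, height). Real regions hold pixels.
Region = Tuple[str, int, int]


def identify_subjects(
    regions: Sequence[Region],
    extract: Callable[[Region], object],
    match: Callable[[object], Optional[str]],
) -> List[str]:
    """Process cropped regions from largest to smallest (steps S205 to S208)."""
    results: List[str] = []
    # With no detected surface (step S203) the list is empty and nothing is output.
    ordered = sorted(regions, key=lambda r: r[1] * r[2], reverse=True)
    for region in ordered:               # S205: largest unprocessed region first
        features = extract(region)       # S205: feature amount extraction
        subject_id = match(features)     # S206: collation against the database
        if subject_id is not None:       # S207: output only when a match exists
            results.append(subject_id)
    return results                       # S208: no regions remain
```

For a moving image, the same function would simply be called once per frame.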

The processing of the image processing unit 103 will be described with reference to the flowchart of FIG. 3. First, the image processing unit 103 obtains an image from the image acquisition unit 104 (step S301). Next, the image processing unit 103 acquires the surface region from the surface detection unit 102 (step S302). Finally, the image processing unit 103 cuts out the image region corresponding to the surface region (step S303) and ends the process.
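A minimal sketch of step S303 follows, assuming the surface region arrives as a boolean mask with the same dimensions as the image. The mask representation, the bounding-box crop, and the `fill` value for pixels outside the surface are assumptions made for illustration; the source does not prescribe them.

```python
def crop_to_surface(image, mask, fill=0):
    """Cut out the bounding box of the masked surface region.

    image: list of rows of pixel values.
    mask:  list of rows of booleans, True where the surface was detected.
    Pixels inside the bounding box but outside the surface are set to fill.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return []  # no surface pixels, so nothing to cut out
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [
        [image[y][x] if mask[y][x] else fill for x in range(left, right + 1)]
        for y in range(top, bottom + 1)
    ]
```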

According to this embodiment, feature amount extraction and collation are not performed on image regions corresponding to surfaces of the recognition target that the surface detection unit 102 has not detected; they are performed only on image regions corresponding to detected surfaces. This improves the recognition accuracy of subject identification.

Furthermore, except when the entire frame is detected as a surface, the image region subjected to feature amount extraction and collation is smaller than in the conventional approach of extracting and collating feature amounts over the entire image, so subject identification can be performed faster.

Step S208 of FIG. 2, which determines whether any image region remains from which feature amounts have not been extracted, may be replaced with a step that determines whether any image region of at least a certain size remains. A threshold is set on the image region size. For regions smaller than the threshold, for example regions so small that feature amount extraction or collation is difficult, steps S205 and S206 may yield no result or an inaccurate one. Skipping feature amount extraction and collation for such small regions can further improve identification accuracy, and because wasted processing is avoided, it also speeds up the process.
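The size-threshold variant of step S208 amounts to filtering the regions before the loop runs. A minimal sketch, assuming each region is summarized by a hypothetical `(width, height)` pair and the threshold is an area in pixels:

```python
def select_regions(regions, min_area):
    """Keep regions whose area is at least min_area, ordered largest first."""
    kept = [r for r in regions if r[0] * r[1] >= min_area]
    return sorted(kept, key=lambda r: r[0] * r[1], reverse=True)
```

The appropriate value of `min_area` depends on the feature extractor in use and is not given in the source.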

[Second Embodiment]
Next, a video processing apparatus according to the second embodiment of the present invention will be described. In the video processing apparatus according to this embodiment, the image processing unit 103 performs orientation correction on the cut-out image region. This differs from the first embodiment, in which the image processing unit 103 only cuts out the image region. The other configurations and operations are the same as in the first embodiment, and their description is omitted.

The image processing unit 103 cuts out the image region corresponding to the surface region detected by the surface detection unit 102 and geometrically corrects the cut-out image using information on the orientation of the detected surface. Here, geometric correction means correcting an image by a geometric transformation such as enlargement or reduction, affine transformation, or projective transformation. For example, the image processing unit 103 may geometrically correct the cut-out image so that the surface detected by the surface detection unit 102 faces front. Here, facing front means an orientation parallel to the image plane acquired by the image acquisition unit 104, that is, perpendicular to the optical axis (z-axis) of the camera.

A specific example in which the image processing unit 103 geometrically corrects an image using the orientation of the surface detected by the surface detection unit 102 will now be described. The orientation of a detected surface can be expressed, for example, as the normal vector (a, b, c) of that surface. The method of calculating the normal vector was described for the surface detection unit 102 in the first embodiment. In the second embodiment, the surface detection unit 102 also supplies information indicating the orientation of the surface, such as the normal vector (a, b, c), to the image processing unit 103.

As an example of the orientation correction method, the following describes the case where the surface detected by the surface detection unit 102 is a plane and the image processing unit 103 turns the image to face front.

Using the normal vector (a, b, c) output by the surface detection unit 102, which represents the orientation of the surface, the image of the image region corresponding to the surface is geometrically transformed into an image as if photographed from the front (geometric correction). This produces an image in which the surface faces front. An image corrected in this way is called a corrected image.

An example of a method for generating the corrected image will be described. First, an xyz coordinate system is defined relative to the camera device of the image acquisition unit 104. The x-axis is the horizontal direction of the image plane (the image acquired by the image acquisition unit 104), with rightward positive; the y-axis is the vertical direction of the image plane, with downward positive; and the z-axis is the optical axis of the camera device, with the direction toward the captured image plane positive. The optical center of the camera device is the origin 0, and the focal length is f.

First, consider rotating the z-axis of the xyz coordinate system to the direction of the normal vector (a, b, c) of the detected surface, that is, a rotation matrix R (a 3 × 3 matrix) that satisfies the following equation.
[Formula 1]
This yields a viewpoint that faces the detected surface head-on. The normal vector (a, b, c) is assumed to be normalized to a unit vector. The rotation is performed by first rotating by φ about the y-axis and then rotating by θ about the x-axis. The rotation angles θ and φ are then calculated from a, b, and c by the following equations.
[Formulas 2 and 3]

Accordingly, the rotation matrix R is obtained as the following equation.
[Formula 4]
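The equations themselves do not appear in this text. As one self-consistent reading of the description, and not necessarily the convention used in the original formulas, suppose the rotations act on column vectors about fixed axes, with the y-axis rotation applied first. The angles and the matrix then follow from requiring that the z-axis be carried onto the unit normal (a, b, c):

```latex
% Assumed convention: R acts on column vectors; R_y is applied first, then R_x.
\[
R = R_x(\theta)\,R_y(\phi), \quad
R_y(\phi) =
\begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix},
\quad
R_x(\theta) =
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}
\]

% Requirement: the z-axis is carried onto the unit normal (a, b, c).
\[
R \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} \sin\phi \\ -\sin\theta\cos\phi \\ \cos\theta\cos\phi \end{pmatrix}
= \begin{pmatrix} a \\ b \\ c \end{pmatrix}
\]

% Solving for the angles (valid when b^2 + c^2 > 0):
\[
\phi = \arcsin a, \qquad \theta = \operatorname{atan2}(-b,\, c)
\]

% Resulting rotation matrix; its third column equals (a, b, c).
\[
R =
\begin{pmatrix}
\cos\phi & 0 & \sin\phi \\
\sin\theta\sin\phi & \cos\theta & -\sin\theta\cos\phi \\
-\cos\theta\sin\phi & \sin\theta & \cos\theta\cos\phi
\end{pmatrix}
\]
```

A different axis or sign convention would change the signs of individual terms but not the overall construction.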

Next, consider transforming the image coordinates (x, y) onto the detected surface. That is, when the three-dimensional vector (x, y, f) from the origin 0 to a point on the image plane is extended, the point p at which it intersects the detected surface is taken as the image coordinates after coordinate transformation. Letting P be the three-dimensional vector corresponding to the point p in the xyz coordinate system, P is expressed by the following equation.
[Formula 5]

Here, k > 0 is a scaling factor representing the distance from the origin 0 to the surface. The three-dimensional vector P′ obtained when the detected plane is viewed head-on is found by transforming P with the rotation matrix R.
[Formula 6]

Taking the x and y coordinates of the three-dimensional vector P′ of Formula 6 as the transformed image coordinates of the point (x, y) gives the image coordinates (X, Y) after geometric correction. Rearranging yields the following equation.
[Formula 7]
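The mapping from (x, y) to (X, Y) can be sketched numerically. The code below is an illustration under stated assumptions, not a transcription of Formula 7: the plane is written as a·X + b·Y + c·Z = d with a hypothetical offset parameter `d`, the rotation uses fixed axes with the y-axis rotation applied first, and P′ is taken as the transpose of R applied to P so that the normal lands on the z-axis.

```python
import math


def rotation_from_normal(a, b, c):
    """Rotation R = R_x(theta) R_y(phi) whose third column is the unit normal.

    Assumed convention: fixed axes, y-axis rotation applied first.
    """
    phi = math.asin(max(-1.0, min(1.0, a)))
    theta = math.atan2(-b, c)
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [cp, 0.0, sp],
        [st * sp, ct, -st * cp],
        [-ct * sp, st, ct * cp],
    ]


def rectify_point(x, y, f, normal, d):
    """Map image coordinates (x, y) to front-facing plane coordinates (X, Y).

    normal: unit normal (a, b, c) of the detected plane.
    d:      hypothetical plane offset, so the plane is a*X + b*Y + c*Z = d.
    """
    a, b, c = normal
    denom = a * x + b * y + c * f
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    k = d / denom                       # scale at which the ray meets the plane
    p = (k * x, k * y, k * f)           # intersection point P = k (x, y, f)
    r = rotation_from_normal(a, b, c)
    # P' = R^T P: rotate so that the plane normal coincides with the z-axis.
    p_rot = [sum(r[row][col] * p[row] for row in range(3)) for col in range(3)]
    return p_rot[0], p_rot[1]
```

With this choice the z component of P′ equals `d` for every point on the plane, which is what makes the first two components usable as flat image coordinates.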

A corrected image can be generated by using Formula 7 to transform each coordinate (x, y) on the image plane into the coordinate (X, Y) of the corrected image. The pre-transformation image coordinates (x, y) that correspond to an image coordinate (X, Y) of the corrected image are generally not integers, so an interpolation method such as nearest-neighbor, bilinear, bicubic, or B-spline interpolation may be used.
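Of the interpolation methods listed, bilinear interpolation is simple to illustrate. The sketch below samples a grayscale image, stored as a list of rows, at a non-integer position; clamping coordinates at the image border is an assumption made here for brevity.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a non-integer position (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp to the valid range so that border lookups stay inside the image.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    tx, ty = x - x0, y - y0
    # Blend horizontally along the two neighboring rows, then vertically.
    top = image[y0][x0] * (1 - tx) + image[y0][x1] * tx
    bottom = image[y1][x0] * (1 - tx) + image[y1][x1] * tx
    return top * (1 - ty) + bottom * ty
```

When building the corrected image, each output pixel (X, Y) would be mapped back to its source position (x, y) and sampled this way.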

The image cut-out and orientation correction performed by the image processing unit 103 as described above will be explained with reference to the flowchart of FIG. 4. First, an image is obtained from the image acquisition unit 104 (step S401). Next, the surface region is acquired from the surface detection unit 102, and the three parameters (a, b, c) representing the orientation of the surface are obtained from the surface region (step S402). Next, the image region corresponding to the surface region is cut out (step S403). Next, the rotation matrix R is calculated from (a, b, c) by the procedure described above (step S404). Finally, an orientation-corrected image is created by applying the coordinate transformation of Formula 7 based on (a, b, c) and the rotation matrix R (step S405), and the process ends.

In the above example, the correction is described as being toward the front because the images from which the feature amounts in the feature amount (subject) database 107 were obtained are often registered facing front. However, the correction need not be toward the front; it suffices to correct the image to the same orientation as the database image.

According to this embodiment, the image processing unit 103 corrects the orientation of the image region, within the image acquired by the image acquisition unit 104, that corresponds to a surface of the recognition target detected by the surface detection unit 102, before feature amount extraction and collation are performed. This improves the recognition accuracy of subject identification.

[Third Embodiment]
Next, a video processing apparatus according to the third embodiment of the present invention will be described. The video processing apparatus according to this embodiment differs from the first and second embodiments in that, after subject identification, it uses the subject identification result and information indicating the position and surface orientation of the subject to project an image corresponding to the identification result in accordance with the identified position and surface orientation. The other configurations and operations are the same as in the first and second embodiments, and their description is omitted.

The collation unit 506 collates the feature amounts extracted by the feature amount extraction unit 505 with the feature amounts registered in the feature amount (subject) database 507. When the collation unit 506 determines that the feature amount (subject) database 507 contains matching data, it outputs, as the collation result, the identification information of that data (identification result (ID)) and information on the position and orientation of the subject within the image region cut out by the image processing unit 503 (identification result (position)). The identification result (ID) is input to the projection source image selection unit 508, and the identification result (position) is input to the projection image correction unit 511.

The projection source image selection unit 508 selects, from the projection source image database 509, the projection source image and projection relative position associated with the input identification result (ID), and outputs them to the projection image correction unit 511. A projection source image is an image registered in advance in the projection source image database 509 in association with a subject image. The projection relative position is the relative position at which the projection source image is projected by the image projection unit 512 described later.

The projection image correction unit 511 corrects the projection source image from the projection source image database 509 using the information indicating the surface orientation obtained from the surface detection unit 502 and the information on the position and orientation of the subject (identification result (position)) obtained by the collation unit 506. The projection position is determined from the identification result (position) from the collation unit 506 and the projection relative position from the projection source image database 509. The imaging area of the image acquisition unit 504 and the projection area of the image projection unit 512 are assumed to have been calibrated so that their pixel-by-pixel correspondence is determined in advance.

A specific processing method of the projection image correction unit 511 for subject identification using local feature amounts (such as SIFT feature amounts) will now be described. One method performs the correction using the homography matrix obtained when geometric consistency is verified during collation. The homography matrix expresses the position (the exact position of the subject) and orientation of the identification target within the image region.
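To make the role of the homography concrete, the sketch below applies a 3 × 3 homography to a point and counts how many feature correspondences agree with it, which is one simple way to check geometric consistency. How the homography is estimated is outside this sketch, and the tolerance `tol` is a hypothetical parameter.

```python
import math


def apply_homography(h, x, y):
    """Apply a 3x3 homography (list of rows) to the point (x, y)."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return u / w, v / w


def count_inliers(h, matches, tol=3.0):
    """Count correspondences ((x, y), (x2, y2)) that h maps to within tol pixels."""
    inliers = 0
    for (x, y), (x2, y2) in matches:
        px, py = apply_homography(h, x, y)
        if math.hypot(px - x2, py - y2) <= tol:
            inliers += 1
    return inliers
```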

First, each pixel (X, Y) of the projection source image is shifted by the projection relative position. Next, a geometric transformation by the homography matrix is applied to obtain (Xn, Yn). Then, using Formula 8, which is the inverse of Formula 7 and is obtained from the information indicating the surface orientation, the coordinates (Xn, Yn) of each pixel of the homography-transformed projection source image are transformed into coordinates (xn, yn) on the two-dimensional image, yielding the corrected image. The (Xn, Yn) needed to obtain a given (xn, yn) are generally not integers, so an interpolation method such as the nearest-neighbor, bilinear, bicubic, or B-spline interpolation mentioned in the second embodiment may be used.
The corrected image is output to the image projection unit 512.
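The three-stage correction (shift by the projection relative position, homography, then the inverse of the front-facing correction) can be sketched as a single composed 3 × 3 matrix. Representing the inverse correction as a matrix named `unrectify` is an assumption made for illustration; Formula 8 itself is not available in this text, and all function names below are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(dx, dy):
    """Homogeneous 3x3 matrix that shifts points by (dx, dy)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def apply_matrix(m, x, y):
    """Apply a 3x3 projective matrix to the point (x, y)."""
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return u / w, v / w


def projection_transform(relative_pos, homography, unrectify):
    """Compose shift, homography, and inverse front-facing correction.

    The rightmost factor acts first: shift, then homography, then unrectify.
    """
    dx, dy = relative_pos
    return mat_mul(unrectify, mat_mul(homography, translation(dx, dy)))
```

In practice the composed matrix would be inverted and each output pixel sampled from the projection source image with interpolation, rather than pushing source pixels forward.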

The image projection unit 512 projects the image corrected by the projection image correction unit 511. The image projection unit 512 is, for example, a projector.

According to this embodiment, a projection image appropriate to the subject can be selected by subject identification, and an image corrected to suit the detected surface can be projected. For example, this embodiment makes it possible to identify a subject image printed on paper and to project the corresponding projection image onto the same paper at the same angle as the paper.

[Other Embodiments]
The present invention has been described above with reference to several embodiments, but the present invention is not limited to those embodiments. Various changes that a person skilled in the art can understand may be made to the configuration and details of the present invention within its scope. Systems or apparatuses that combine, in any manner, the separate features included in the respective embodiments are also within the scope of the present invention.

The present invention may be applied to a system composed of a plurality of devices or to a single apparatus. It is also applicable where a control program that realizes the functions of the embodiments is supplied to a system or apparatus directly or remotely. Accordingly, a control program installed on a computer in order to realize the functions of the present invention on that computer, a medium storing that control program, and a WWW (World Wide Web) server from which that control program is downloaded are also within the scope of the present invention.

Some or all of the embodiments described above may also be described as in the following supplementary notes, but the present invention is not limited to the following.

(Supplementary note 1) A subject identification device comprising:
depth image acquisition means for acquiring a depth image of an object;
image acquisition means for acquiring an image of the object;
surface detection means for detecting a surface from the depth image acquired by the depth image acquisition means;
image processing means for cutting out, from the image acquired by the image acquisition means, an image region corresponding to the surface detected by the surface detection means;
feature amount extraction means for extracting a feature amount from the image region cut out by the image processing means; and
collation means for collating the feature amount extracted by the feature amount extraction means with a feature amount extracted from an image to be collated.
(Supplementary note 2) The subject identification device according to supplementary note 1, wherein the image processing means geometrically corrects the image of the cut-out image region using information on the orientation of the detected surface.
(Supplementary note 3) The subject identification device according to supplementary note 1, wherein the image processing means geometrically corrects the image of the cut-out image region so that the detected surface faces front.
(Supplementary note 4) The subject identification device according to supplementary note 1, further comprising:
projection source image selection means for selecting from a database, when the collation determines a match, a projection source image and a projection relative position associated with the corresponding collation target;
correction means for correcting the projection source image using the information on the orientation of the surface obtained by the detection of the surface, the position and orientation of the object in the cut-out image region obtained by the collation, and the selected projection relative position; and
projection means for projecting the corrected image.
(Supplementary note 5) The subject identification device according to supplementary note 4, wherein the correction means corrects the projection source image using a homography matrix obtained by the collation.
(Supplementary note 6) A subject identification method comprising:
a depth image acquisition step of photographing an object to acquire a depth image;
an image acquisition step of photographing the object to acquire an image;
a surface detection step of detecting a surface from the depth image;
an image processing step of cutting out, when a surface is detected in the surface detection step, an image region corresponding to the detected surface from the image acquired in the image acquisition step;
a feature amount extraction step of extracting a feature amount from the image region cut out in the image processing step;
a collation step of collating the feature amount extracted in the feature amount extraction step with a feature amount extracted from an image to be collated; and
a step of outputting, when the collation determines that the feature amounts match, information identifying the subject of the image to be collated.
(Supplementary note 7) The subject identification method according to supplementary note 6, wherein the feature amount extraction step extracts feature amounts from image regions of a predetermined size or larger among the image regions cut out in the image processing step.
(Supplementary note 8) A program for causing a computer to function as:
depth image acquisition means for acquiring a depth image of an object;
image acquisition means for acquiring an image of the object;
surface detection means for detecting a surface from the depth image acquired by the depth image acquisition means;
image processing means for cutting out, from the image acquired by the image acquisition means, an image region corresponding to the surface detected by the surface detection means;
feature amount extraction means for extracting a feature amount from the image region cut out by the image processing means; and
collation means for collating the feature amount extracted by the feature amount extraction means with a feature amount extracted from an image to be collated.

Claims

1. A subject identification device comprising:
depth image acquisition means for acquiring a depth image of an object;
image acquisition means for acquiring an image of the object;
surface detection means for detecting a surface from the depth image acquired by the depth image acquisition means;
image processing means for cutting out, from the image acquired by the image acquisition means, an image region corresponding to the surface detected by the surface detection means;
feature amount extraction means for extracting a feature amount from the image region cut out by the image processing means; and
collation means for collating the feature amount extracted by the feature amount extraction means with a feature amount extracted from an image to be collated.

2. The subject identification device according to claim 1, wherein the image processing means geometrically corrects the image of the cut-out image region using information on the orientation of the detected surface.

3. The subject identification device according to claim 1, wherein the image processing means geometrically corrects the image of the cut-out image region so that the detected surface faces front.

4. The subject identification device according to claim 1, wherein the feature amount extraction means extracts feature amounts from image regions larger than a predetermined size among the image regions cut out by the image processing means.

5. The subject identification device according to claim 1, further comprising:
projection source image selection means for selecting from a database, when the collation determines a match, a projection source image and a projection relative position associated with the corresponding collation target;
correction means for correcting the projection source image using the information on the orientation of the surface obtained by the detection of the surface, the position and orientation of the object in the cut-out image region obtained by the collation, and the projection relative position; and
projection means for projecting the image corrected by the correction means.

6. The subject identification device according to claim 5, wherein the correction means corrects the projection source image using a homography matrix obtained by the collation.

7. A subject identification method comprising:
a depth image acquisition step of acquiring a depth image of an object;
an image acquisition step of acquiring an image of the object;
a surface detection step of detecting a surface from the depth image acquired in the depth image acquisition step;
an image processing step of cutting out, from the image acquired in the image acquisition step, an image region corresponding to the surface detected in the surface detection step;
a feature amount extraction step of extracting a feature amount from the image region cut out in the image processing step;
a collation step of collating the feature amount extracted in the feature amount extraction step with a feature amount extracted from an image to be collated; and
a step of outputting, when the collation determines that the feature amounts match, information identifying the subject of the image to be collated.

8. A program for causing a computer to function as:
depth image acquisition means for acquiring a depth image of an object;
image acquisition means for acquiring an image of the object;
surface detection means for detecting a surface from the depth image acquired by the depth image acquisition means;
image processing means for cutting out, from the image acquired by the image acquisition means, an image region corresponding to the surface detected by the surface detection means;
feature amount extraction means for extracting a feature amount from the image region cut out by the image processing means; and
collation means for collating the feature amount extracted by the feature amount extraction means with a feature amount extracted from an image to be collated.