WO2017042852A1

WO2017042852A1 - Object recognition appratus, object recognition method and storage medium

Info

Publication number: WO2017042852A1
Application number: PCT/JP2015/004628
Authority: WO
Inventors: Ruihan BAO
Original assignee: Nec Corporation
Priority date: 2015-09-11
Filing date: 2015-09-11
Publication date: 2017-03-16
Also published as: JP2018526753A; JP6544482B2

Abstract

The provided is an object recognition apparatus and the like to improve an accuracy of object recognition. An object recognition apparatus according to an aspect of the present invention includes: extraction means for extracting a feature from an image; matching means for performing matching a first feature extracted from the image with second features extracted from model images representing an object; relation calculation means for calculating, based on the model images, relative camera poses representing geometric relations among the model images; voting means for calculating calibrated votes based on a result of the matching and the relative camera poses, the calibrated votes each representing a calibrated geometric relation between the first feature and a second feature of the second features, the calibrated geometric relation being a geometric relation from which an effect of the relative camera poses is canceled; clustering means for performing clustering the calibrated votes; and determination means for determining if the image represents the object based on a result of the clustering.

Description

OBJECT RECOGNITION APPRATUS, OBJECT RECOGNITION METHOD AND STORAGE MEDIUM

The present invention is related to a technology of recognizing objects in an image.

Recognizing objects from an image is an important task in computer vision.

Patent Literature PTL 1 discloses an object recognition method of detecting an object represented in a query image. In the object recognition method of the Patent Literature PTL 1, the object represented in the query image is detected by using a similarity score calculated on the basis of query feature vectors extracted from the query image and reference vectors extracted from images each of which is related with an object and which are stored in an image database.

Patent Literature PTL 2 discloses an object recognition apparatus that estimates an appearance of an input image of a three dimensional (3D) object. The Patent Literature PTL 2 generates, as a result of recognition, an appearance image that is similar to the input image by using areas which are extracted as similar areas to the input image from images stored in a database on the basis of a result of voting based on local features of corresponding feature points in feature points extracted from the input image and feature points extracted from the stored images.

[PTL 1] PCT International Application Publication No. WO 2011/021605
[PTL 2] Japanese Unexamined Patent Application Publication No. 2012-83855

Summary

In the method according to Patent Literature PTL 1, only one image is stored in the image database for each object. Therefore, it is difficult to detect an object accurately by the technology of Patent Literature PTL 1 when the query image is taken from a direction different from that of a database image, which is an image stored in the image database, of the same object as that of the query image.

When generating the appearance image, the object recognition apparatus according to Patent Literature PTL 2 extracts an area similar to the input image regardless of whether the object of the extracted area is corresponding to the object of the input image. For example, the object recognition apparatus may extract, as one of the areas used for generating the appearance image, an area of an object having a quite different appearance viewed in a direction different from a direction in which an image including the area is taken. The object recognition apparatus according to Patent Literature PTL 2 does not identify an object corresponding to the object of the input image. Therefore, it is difficult to detect an object accurately by the technology of Patent Literature PTL 2.

One of objects of the present invention is to provide an object recognition apparatus and the like to improve an accuracy of object recognition.

An object recognition apparatus according to an aspect of the present invention includes: extraction means for extracting a feature from an image; matching means for performing matching a first feature that is the feature extracted from the image with second features that are features extracted from model images being images representing an object; relation calculation means for calculating, based on the model images, relative camera poses representing geometric relations among the model images; voting means for calculating calibrated votes based on a result of the matching and the relative camera poses, the calibrated votes each representing a calibrated geometric relation between the first feature and a second feature of the second features, the calibrated geometric relation being a geometric relation from which an effect of the relative camera poses is canceled; clustering means for performing clustering the calibrated votes; and determination means for determining if the image represents the object based on a result of the clustering.

An object recognition method according to an aspect of the present invention includes: extracting a feature from an image; performing matching a first feature that is the feature extracted from the image with second features that are features extracted from model images being images representing an object; calculating, based on the model images, relative camera poses representing geometric relations among the model images; calculating calibrated votes based on a result of the matching and the relative camera poses, the calibrated votes each representing a calibrated geometric relation between the first feature and a second feature of the second features, the calibrated geometric relation being a geometric relation from which an effect of the relative camera poses is canceled; performing clustering the calibrated votes; and determining if the image represents the object based on a result of the clustering.

A computer readable medium according to an aspect of the present invention stores a program causing a computer to operate as: extraction means for extracting a feature from an image; matching means for performing matching a first feature that is the feature extracted from the image with second features that are features extracted from model images being images representing an object; relation calculation means for calculating, based on the model images, relative camera poses representing geometric relations among the model images; voting means for calculating calibrated votes based on a result of the matching and the relative camera poses, the calibrated votes each representing a calibrated geometric relation between the first feature and a second feature of the second features, the calibrated geometric relation being a geometric relation from which an effect of the relative camera poses is canceled; clustering means for performing clustering the calibrated votes; and determination means for determining if the image represents the object based on a result of the clustering.

According to the present invention, it is possible to improve an accuracy of object recognition.

Fig. 1A is a block diagram illustrating a first example of a structure of an object recognition apparatus according to a first related technology of the present invention. Fig. 1B is a block diagram illustrating a second example of a structure of the object recognition apparatus according to the first related technology of the present invention. Fig. 2 is a block diagram illustrating a first example of a structure of an object recognition apparatus according to a second related technology of the present invention. Fig. 3A is a block diagram illustrating a first example of a structure of an object recognition apparatus according to a first exemplary embodiment of the present invention. Fig. 3B is a block diagram illustrating a second example of a structure of the object recognition apparatus according to the first exemplary embodiment of the present invention. Fig. 3C is a block diagram illustrating a third example of a structure of the object recognition apparatus according to the first exemplary embodiment of the present invention. Fig. 4 is a block diagram illustrating an example of a configuration of voting unit according to the first exemplary embodiment of the present invention. Fig. 5 is a block diagram illustrating an example of a configuration of the voting unit according to the first exemplary embodiment of the present invention. Fig. 6 is a flowchart illustrating an example of an operation of the object recognition apparatus according to the first exemplary embodiment of the present invention. Fig. 7A is a block diagram illustrating a first example of a structure of an object recognition apparatus according to a second exemplary embodiment of the present invention. Fig. 7B is a block diagram illustrating a second example of a structure of an object recognition apparatus according to the second exemplary embodiment of the present invention. Fig. 7C is a block diagram illustrating a third example of a structure of an object recognition apparatus according to the second exemplary embodiment of the present invention. Fig. 8 is a block diagram illustrating an example of a configuration of a voting unit according to the second exemplary embodiment of the present invention. Fig. 9 is a block diagram illustrating an example of an alternative configuration of the voting unit according to the second exemplary embodiment of the present invention. Fig. 10 is a flow chart illustrating an operation of the object recognition apparatus according to the second exemplary embodiment of the present invention. Fig. 11 is a block diagram illustrating an example of a structure of an object recognition apparatus according to a third exemplary embodiment of the present invention. Fig. 12 is a block diagram illustrating an example of a structure of a computer which is capable of operating as each of the object recognition apparatuses according to the exemplary embodiments of the present invention. Fig. 13 is a block diagram illustrating an example of a structure of the object recognition apparatus according to the first exemplary embodiment of the present invention. Fig. 14 is a block diagram illustrating an example of a structure of the object recognition apparatus according to the second exemplary embodiment of the present invention. Fig. 15 is a block diagram illustrating an example of a structure of the object recognition apparatus according to the third exemplary embodiment of the present invention.

Hereinafter, an exemplary embodiment of the present invention will be described in detail.

<Related Art>
First, a related art of the present invention is described.
In a two-dimensional (2D) object recognition method that is one of object recognition methods, an object represented by an image (referred to as a "query image") is recognized by, for example, identifying an image similar to the query image among model images (also referred to as "reference images") including an image of an object to be recognized. More specifically, the 2D object recognition may include extracting local features form the query image and the model images, and performing matching the local features extracted from the query image and the local features extracted from each of the model images.

An example of the local features is local features called "Scale-Invariant Feature Transform" (SIFT). SIFT is disclosed by Lowe, David G. "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, Volume 60 Issue 2, November 2004, pp. 91-110 (hereinafter, referred to as "Lowe").

By the matching, feature matches are found. Each of the feature matches is, for example, a set of a local feature extracted from the query image and a local feature extracted from one of the model images. After feature matches are found, geometric verification is carried out using a method such as Hough voting between two images to vote for the relative translation, rotation and scaling change between the query image and a model image in the model images, using feature location, orientation and scales. Hough voting is disclosed by Iryna Gordon and David G. Lowe, "What and where: 3D object recognition with accurate pose", Toward Category-Level Object Recognition, Springer-Verlag, 2006, pp. 67-82 (hereinafter, referred to as "Gordon et al.").

In the 2D object recognition, each of the model images may be an image of a different object. A result of the object recognition is, for example, an image including an area similar to a part of the query image.

Unlike the 2D object recognition described-above, in a 3D object recognition method, object recognition is performed using a plurality of images (model images) around the object. In other words, the plurality of model images represents the object.

A class of methods to handle 3D object recognition is disclosed by Gordon et al. and Qiang Hao et al., "Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition", Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pp. 899-906.
An outline of 3D object recognition method is described below. First, 3D models are generated by applying structure-from-motion (SfM) on the model images. The output of SfM is a set of coordinates of points in three dimensional space (i.e. 3D points, referred to as "point cloud") recovered from the local features in the model images and camera poses of the model images. The camera poses represent relative positions of the model images concerning 3D objects. At the same time, the local features extracted from the model images are assigned to 3D points in the point cloud. When the query image is presented, local features are extracted from the query image and the extracted features are matched to the local features assigned to the point clouds. After feature matches are found by the matching, geometric verification is carried out using a method such as RANdom SAmple Consensus (RANSAC). However, a RANSAC based method often performs relatively slow and may fail to work when a query image includes a noise cluttered background.

As mentioned above, a RANSAC based 3D object recognition method suffers from slow processing speed and low accuracy when a query image includes noise cluttered background. A Hough voting based method is faster and relatively robust to noise and a background, but requires, when dealing with multi-view (i.e. images of the same object taken from various angles), calibration among model images; otherwise the estimated object centers will form different clusters in the query image and it is difficult to detect the object appearing in the query image.

Next, implementations of the above-described related arts are described.

<First Related Example>
Fig. 1A is a block diagram illustrating an example of a structure of an object recognition apparatus 1100 that is an embodiment (i.e. a first related example) of related art of 3D object recognition.

With reference to Fig. 1, the object recognition apparatus 1100 includes an extraction unit 1101, a matching unit 1102, a voting unit 1103, a clustering unit 1104, a determination unit 1105, a model image storage unit 1106, a reception unit 1107, an output unit 1108 and a model storage unit 1110.

The reception unit 1107 receives an image that is a recognition target (referred to as a "query image"), and a plurality of images representing an object (referred to as "model images"). The query image may or may not include an image of the object to be identified. The model images are taken from various angles around the object and the images are used as reference for the recognition purpose.

The reception unit 1107 sends the query image and the model images to the extraction unit 1101. The reception unit 1107 may store the model images in the model image storage unit 1106.

The reception unit 1107 may further receive coordinates of an object center of each of the model images. In this case, an operator of the object recognition apparatus 1100 may indicate the coordinates of the object center of each of the model images by an input device (not illustrated), such as a mouse or a touch panel. The reception unit 1107 may further send the coordinates of the object center of each of the model images to the extraction unit 1101. The reception unit 1107 may further store the coordinates of the object center of each of the model images in the model image storage unit 1106.

The model image storage unit 1106 stores the model images. The model image storage unit 1106 may further store the coordinates of the object center of each of the model images.

The extraction unit 1101 receives the query image, extracts local features from the query image, and outputs the extracted local features. The extraction unit 1101 receives the model images, extracts local features from the model images, and outputs the extracted local features. The extraction unit 1101 may read out the model images from the model image storage unit 1106. The extraction unit 1101 may store the local features extracted from the model images in the model storage unit 1110.

Each of the local features is a local measurement from an image, which includes but is not limited to vectors forming representation of the pixels at and around a location of the image (referred to as a "local descriptor"), a rotation invariant value (referred to as an "orientation") at the location and a scale invariant value (referred to as a "scale") at the location. One implementation of the local feature, including local descriptor, orientation and scale, is SIFT disclosed by Lowe.

The extraction unit 1101 may further read out the coordinates of the object center of each of the model images from the model image storage unit 1106. The extraction unit 1101 further calculates coordinates of an object center on the basis of the model images and/or the extracted local features extracted from each of the model images. For example, the extraction unit 1101 may calculate, as the coordinates of the object center of a model image in the model images, coordinates of a central point of the model image. The extraction unit 1101 may calculate, as the coordinates of the object center of a model image in the model images, a mean of coordinates of locations included in the local features extracted from the model image. The extraction unit 1101 may calculate the coordinates of the object center of a model image in the model images by another method.

The extraction unit 1101 may further send the coordinates of the object center of each of the model images as a part of the local features to the matching unit 1102. The extraction unit 1101 may store the coordinates of the object center of each of the model images in the model storage unit 1110. The extraction unit 1101 may further send the coordinates of the object center of each of the model images as a part of the local features to the voting unit 1103.

The model storage unit 1110 stores the local features extracted from the model images. The model storage unit 1110 further stores the coordinates of the object center of each of the model images.

The matching unit 1102 receives the local features extracted from the query image and the local features extracted from an image in the model images. The matching unit 1102 compares the local features extracted from the query image and the local features extracted from an image in the model images by calculating the similarity of local features between the query image and the image from the model images to generate feature matches on the basis of the calculated similarity. When the local features are represented by vectors, the similarity between the local features may be a vector distance between the local features. The similarity may be defined depending on the local features.

Each of the feature matches indicates two local features having high similarity (i.e. a measurement of the similarity between the two local features indicates higher similarity compared with a preset similarity threshold). One of the two local features is a local feature in the local features extracted from the query image. The other of the two local features is a local feature in the local features extracted from the image in the model images.
The matching unit 1102 may calculate, as the measurement of the similarity between two local features, a vector distance between the local descriptors included in the two local features. Each of the feature matches is represented by identifications of the two local features, by which the two local features are able to be easily identified and retrieved.

The matching unit 1102 outputs a set of the feature matches. The resultant feature matches output from the matching unit 1102 are sent to the voting unit 1103.

The voting unit 1103 receives the set of the feature matches of the query image and one image from the model images and the coordinates of the object centers of the image from the model images. The voting unit 1103 calculates Hough votes including a predicted location of the object centers, scaling changes and rotation. The voting unit 1103 sends the resultant Hough votes to the clustering unit 1104. One way to perform Hough vote calculation is described in Patent Literature PTL 2.

The clustering unit 1104 receives the Hough votes from the voting unit 1103. The clustering unit 1104 performs clustering on the Hough votes on the basis of similarity (e.g. a vector distance between two of the Hough votes) so that the Hough votes that are similar to each other are grouped together. The clustering unit 1104 sends the clustering results to the determination unit 1105. A clustering method used by the voting unit 1103 may be any one of mean-shift, bin voting or any other unsupervised clustering methods. The clustering unit 1104 may extract, from the feature matches, a subset of feature matches belonging to clusters satisfying a certain condition, that is, for example, clusters each of which includes elements (i.e. the Hough votes) the number of which exceeds a predefined threshold. The clustering unit 1104 sends the extracted feature matches (i.e. the subset of feature matches) to the determination unit 1105.

The determination unit 1105 receives the extracted feature matches (i.e. the subset of feature matches). The determination unit 1105 may determine if the object represented by model images is presented in the query image based on the number of feature matches in the subset. The determination unit 1105 outputs, as a result of recognition, a result of determining. The determination unit 1105 may further output an object pose including the object location, the rotation and the scaling change derived from the feature matches. The determination unit 1105 may use an absolute number of the feature matches in order to determine if the object of the model images is presented in the query image. As an alternative, the determination unit 1105 may use a normalized score, by calculating a ratio of the absolute number of the feature matches to a certain normalized factor (for instance, a total number of the feature matches calculated by the matching unit 1102). The determination unit 1105 may output, as the result of recognition, a binary result which indicates whether the object is presented in the query image. The determination unit 1105 may calculate and output a probability number indicating a confidence of the recognition result.

The output unit 1108 outputs the result of recognition from the object recognition apparatus 1100. The output unit 1108 may send the result of recognition to a display device (not illustrated). The display device may display the result of recognition. The output unit 1108 may send the result of recognition to a terminal apparatus (not illustrated) used by an operator of the object recognition apparatus 1100.

The object recognition apparatus 1100 that is an embodiment of the related art works fast and accurate compared to RANSAC based methods since the Hough votes generated from the model images may form clusters in the parametric space. However, when the model images have large variation in the perspective changes, the Hough votes generated from those model images may form clusters that are far apart. Therefore, further calibration for the Hough votes is required; otherwise object recognition results in failure.

Fig. 1B is a block diagram illustrating an example of a structure of an object recognition apparatus 1100B that is another embodiment of related art of 3D object recognition. The object recognition apparatus 1100B is the same as the object recognition apparatus 1100 in Fig. 1A except the following differences.

The object recognition apparatus 1100B illustrated in Fig. 1B includes extraction units 1101 each corresponding to the extraction unit 1101 in Fig. 1A, matching units 1102 each corresponding to the matching unit 1102 in Fig. 1A, voting units 1103 each corresponding to the voting unit 1103 in Fig. 1A, the clustering unit 1104, the determination unit 1105, the reception unit 1107 and the output unit 1108. The extraction units 1101 are able to operate in parallel. The matching units 1102 are able to operate in parallel. The voting units 1103 are able to operate in parallel.

One of the extraction units 1101 receives the query image, extracts the local features from the query image, and sends the local features to each of the matching units 1102. Each of the other extraction units receives a model image in the model images, extracts the local features from the received mode image, and sends the extracted local features to one of the matching units 1102.

Each of the matching units 1102 receives the local features extracted from the query image and the local features extracted from one of the model images, performs feature matching (i.e. compares the local features extracted from the query image and the local features extracted from one of the model images) to generate feature matches, and send the generated local matches to one of the voting units 1103.

Each of the voting units 1103 receives feature matches from one of the matching units 1102, calculate the Hough votes. Each of the voting units 1103 sends the result to the clustering unit 1104.

<Second Related Example>
Fig. 2 is a block diagram illustrating an example of a structure of an object recognition apparatus 1200 that is an alternative embodiment (i.e. a second related example) of related art of 3D object recognition using the technology of Gordon et al. With reference to Fig. 2, the object recognition apparatus 1200 includes the extraction unit 1101, a reconstruction unit 1201, a matching unit 1202, a verification unit 1203, a determination unit 1105, the reception unit 1107 and the output unit 1108. The object recognition apparatus 1200 may further include the model image storage unit 1106 and the model storage unit 1110. Each of the units to which a code that is assigned to a unit illustrated in Fig. 1A is similar to the unit to which the code is assigned except differences described-below.

The extraction unit 1101 sends the local features extracted from the model images to the reconstruction unit 1201.

The reconstruction unit 1201 receives the local features extracted from the model images, performs 3D reconstruction of the object of the model images to generate a 3D model of the object, and sends the reconstructed 3D model to the matching unit 1202. As an example of a 3D reconstruction technology of reconstructing a 3D model of an object represented in model images, structure-from-motion (SfM) is widely used. The resultant 3D model of the objects includes a set of 3D points recovered from 2D points in the model images, the local features including the local descriptors, the scale and the orientation, which are extracted at the location of the 2D points in the model images.

The matching unit 1202 receives the local features extracted from the query image and the 3D model reconstructed from model images. As described above, the 3D model includes the set of the 3D points recovered from the 2D points in the model images, the local features including the local descriptors, the scale and the orientation, which are extracted at the location of the 2D points in the model images. The matching unit 1202 performs feature matching to generate feature matches each including, for instance, an identification of a local feature in the query image and an identification of the matched local feature in the 3D model based on a similarity measurement of local features. The matching unit 1202 may calculate, as the similarity measurement, a vector distance of local descriptors included in local features. The matching unit 1202 sends the generated feature matches to the verification unit 1203.

The verification unit 1203 receives the feature matches. The verification unit 1203 performs geometric verification to extract a correct subset of feature matches, that is, a subset of feature matches that are consistent in a geometry model. The verification unit 1203 may use, as the geometry model, a projection model depicting the geometry relation shape between 3D points and 2D points, which is disclosed in Gordon et al. In order to extract the correct subset of feature matches, the verification unit 1203 may use RANSAC technology along with the projection model. The verification unit 1203 sends the extracted a subset of feature matches to the determination unit 1105.

The object recognition apparatus 1200 works without suffering from calibration issue, but takes time since the required number of iterations for RANSAC is proportional to the inverse of the ratio of the number of inliers (i.e. correct feature matches) to the number of total feature matches. In the case that an object is represented by SfM model, the above-described ratio is usually very low.

<First Exemplary Embodiment>
Next, a first exemplary embodiment according to the present invention is described with reference to drawings.

Fig. 3A is a block diagram illustrating a first example of a structure of an object recognition apparatus according to the first exemplary embodiment of the present invention. With reference to Fig. 3A, the object recognition apparatus 100A includes an extraction unit 101, a matching unit 102, a relation calculation unit 106, a voting unit 103, a clustering unit 104, a determination unit 105, a reception unit 107, and an output unit 108.

Fig. 3B is a block diagram illustrating a second example of a structure of an object recognition apparatus according to the first exemplary embodiment of the present invention. The object recognition apparatus 100B in Fig. 3B includes, in addition to the above-described units included in the object recognition apparatus 100A, a model image storage unit 109, a model storage unit 110 and a relation storage unit 111. In the object recognition apparatus 100B, the reception unit 107 stores the model images in the model image storage unit 109. The model image storage unit 109 stores the model images received and stored by the reception unit 107. The model storage unit 110 stores the local features extracted from the model images by the extraction unit 101. The relation calculation unit 106 stores the calculated relative camera poses in the relation storage unit 111. The relation storage unit 111 stores the relative camera poses calculated and stored by the relation calculation unit 106.

Fig. 3C is a block diagram illustrating a third example of a structure of an object recognition apparatus according to the first exemplary embodiment of the present invention. The object recognition apparatus 100C in Fig. 3C includes extraction units 101 each corresponding to the extraction unit 101 in Fig. 3A and Fig. 3B, and matching units 102 each corresponding to the matching unit 102 in Fig. 3A and Fig. 3B. In the object recognition apparatus 100C, one of the extraction units 101 receives the query image and extracts the local features form the query image. Each of the other extraction units 101 receives a model image in the model images, and extracts the local features from the received model image. Each of the extraction units 101 is able to operate in parallel. Each of the matching units 102 receives the local features extracted from the query image and the local features extracted from a model image in the model images. Each of the matching units performs matching the received local features extracted from the query image and the received local features extracted from the model image. Each of the matching units 102 is able to operate in parallel.

The object recognition apparatus 100A, the object recognition apparatus 100B and the object recognition apparatus 100C are the same except the difference described above. The object recognition apparatus 100B in Fig. 3B of the present exemplary embodiment is mainly described in detail. Detailed descriptions are omitted for the same functions and the same operation of the object recognition apparatus 100B as those of the object recognition apparatus 1100 in the following description.

The reception unit 107 receives the query image and sends the query image to the extraction unit 101. The reception unit 107 receives the model images and stores the model images in the model image storage unit 109. The reception unit 107 may send the model images to the extraction unit 101. The reception unit 107 may also send the model images to the relation calculation unit 106. The query image and the model images are the same as those of the first and second related example.

The model image storage unit 109 stores the model images. The model image storage unit 109 operates similarly as the model image storage unit 1106 according to the first related example.

The extraction unit 101 receives the query image and extracts the local features from the query image. The extraction unit 101 sends the local features extracted from the query image to the matching unit 102. The extraction unit 101 also receives the model images and extracts the local features from each of the model images. The extraction unit 101 may read out the model images from the model image storage unit 109. The extraction unit 101 sends the local features extracted from the model images to the matching unit 102. The extraction unit 101 stores the local features extracted from the model images in the model storage unit 110. The extraction unit 101 operates similarly to the extraction unit 1101 according to the first related example.

The model storage unit 110 stores the local features extracted from the model images. The model storage unit 110 operates similarly as the model storage unit 1110 according to the first related example.

The matching unit 102 receives the local features extracted from the query images and the local features extracted from each of the model images. The matching unit 102 may read out the local features extracted from the model images. The matching unit 102 matches the local features extracted from the query images and the local features extracted from each of the model images to generate the feature matches for each set of the query image and one of the model images. The matching unit 102 sends the feature matches to the voting unit 103. The matching unit 102 operates similarly as the matching unit 1102 according to the first related example.

The relation calculation unit 106 receives the model images. The relation calculation unit 106 calculates relative camera poses of the model images. The relation calculation unit 106 may store the calculated relative camera poses in the relation storage unit 110. The relation calculation unit 106 may be directly connected with the voting unit 103, and may send the calculated relative camera poses to the voting unit 103.

The relative camera poses include relative geometric relationship among the model images, such as transformation modeled by Homography, Affine or similarity relations, or camera pose based on epipolar geometry. The relative geometric relationship may be represented by each of relative geometric transformations of the model images. A relative geometric transformation, in the relative geometric transformations, for a model image in the model images may be a transformation transforming coordinates of each pixel of the model image to coordinates of a pixel of a reference image.

The relation calculation unit 106 may select the reference image from the model images. In order to calculate the relative camera poses, the relation calculation unit 106 may select an image from the model images as the reference image, and then calculate each of relative geometric transformations each transforming one of the model images other than the reference image to the reference image by using either Least Square method or RANSAC method.

The relation calculation unit 106 may calculate the relative camera poses by performing structure-from-motion. The relation calculation unit 106 may calculate transformations each transforming a coordinate system to an image coordinate systems of one of the model images, and calculate the relative camera pose by using the calculated transformations.

The relation calculation unit 106 may use, as the relative camera poses, the location, the rotation and the scale of a camera, which are included in the local features, at the time of photo shooting of each of the model images.

When coordinates of a pixel of an image is represented by 3D vector as in the field of projective geometry, each of the relative camera poses is represented by a 3 x 3 matrix. The relation calculation unit 106 may calculate a matrix representing a relative camera pose for each of the model images except the reference image. The relative camera pose for the reference image is represented by an identity matrix.

The relation calculation unit 106 may store the relative camera pose in the relation storage unit 111. In this case, the voting unit 103 may read out the relative camera pose from the relation storage unit 111.

The relation storage unit 111 stores the relative camera pose stored by the relation calculation unit 106.

The voting unit 103 receives the feature matches from the matching unit 102 and the relative camera pose. The voting unit 103 extracts a subset of feature matches that are consistent in voting space under the relative camera pose. The voting unit 103 sends the extracted subset of feature matches to the clustering unit 104. The purpose of the voting unit 103 is to perform Hough voting further functioning as geometric verification by taking geometric relationship among the model images into consideration so that Hough votes from different images are calibrated geometrically.

Fig 4 is a block diagram illustrating an example of a configuration of the voting unit 103 according to the present exemplary embodiment.
Referring to Fig.4, the voting unit 103 includes a vote calculation unit 1031 and a vote calibration unit 1032. A detailed explanation of the voting unit 103 is described below.

The vote calculation unit 1031 of the voting unit 103 receives the feature matches. The vote calculation unit 1031 calculates relative vote for each of the feature matches by using the scale, the orientation and the coordinates of the local features. The vote calculation unit 1031 may calculate the relative vote by using the scaling change (s₁₂), the rotation (q₁₂) and the translation (x₁₂ and y₁₂) between two images (i.e. the query image and one of the model images) according to the following equations:

Here, s₁ and s₂ are the scales of local features of the two images, q₁ and q₂ are the orientations of local features of the two images, and [x₁, y₁] and [x₂,y₂] are the 2D coordinates of local features of the two images. R(q₁₂) is a rotation matrix for q₁₂. C is a constant vector set in advance to offset the translation. The vote calculation unit 1031 calculates a relative vote including four elements (s₁₂, q₁₂, x₁₂ and y₁₂) for each of the feature matches. The vote calculation unit 1031 sends the relative votes and the relative camera pose to the vote calibration unit 1032.

The vote calibration unit 1032 of the voting unit 103 receives the relative votes of the feature matches and the relative camera pose of the model images. The vote calibration unit 1032 calculates a calibrated vote for each of the feature match by incorporating geometric relations among the model images, and sends the calibrated vote to the clustering unit 104. The vote calibration unit 1032 may calculate the calibration vote according to the following steps for each of the model images.

Step 0: Selecting a model image from the model images.

Step 1: Selecting a relative vote from the relative votes of the selected model images, and converting the selected relative vote to a similarity transformation matrix for a convenience of calculation. The similarity transformation matrix S is represented by the following equation:

Here, the scaling change (s₁₂), the rotation (q₁₂) and the translation (x₁₂ and y₁₂) are calculated by the vote calculation unit 1031.

Step 2: Calculating a matrix H representing a calibrated vote for the selected relative vote of the selected model image by a matrix multiplication according to the following equation:

where the relative camera pose of the model image is referred to as P. The calibrated vote is generated by excluding an effect due to a variation of relative camera pose from the relative vote.

Step 3: Iterating the processing from Step 1 to Step 2 until a calibrated vote is calculated for each of the relative votes of the selected model image.

Step 4: Iterating the processing from Step 0 to Step 3 until each of the model images is selected.

Step 5: Sending the calibrated votes calculated in the processing from Step 0 to Step 4 to the clustering unit 104.

The vote calibration unit 1032 may also further convert the calibrated votes to equivalent representation. For instance, the vote calibration unit 1032 may convert each of the calibrated votes to a form of [R|t], where R is a 3 by 3 rotation matrix, t is a 3 by 1 vector representing translation and [R|t] is a 3 by 4 matrix. The vote calibration unit 1032 may convert the rotation matrix including 9 elements to a quaternion form which includes 4 elements. Furthermore, the vote calibration unit 1032 may convert the calibrated votes by simply dropping one or more elements in the calibrated votes (or equivalent quaternion representation) according to a preset rule. For instance, when an original calibrated vote includes 12 elements, the vote calibration unit 1032 may generate, by using only a subset of those elements of the original calibrated vote, a calibrated vote for clustering by the clustering unit 104.

The clustering unit 104 receives the calibrated votes from the voting unit 103. The clustering unit 104 performs clustering on the received calibrated votes to generate groups (i.e. clusters) of the calibrated votes so that the calibrated votes included in each of the groups are similar among them. Each of the calibrated votes has four elements similarly to the relative votes described above, and may be represented by a vector having the four elements. The matrices representing the calibrated votes may be in a form of a vector having four elements similarly to the relative votes described above. In this case, the similarity of two of the calibrated votes may be a vector distance between vectors representing the two of the calibrated votes. The similarity of the two calibrated votes may be a distance between vectors that are generated by transforming the same vector (e.g. [1, 0, 0]^T) by the matrices representing the two calibrated votes.

The clustering unit 104 may extract, from the calibrated votes, a subset of calibrated votes belonging to clusters satisfying a certain condition, that is, for example, clusters each of which includes elements (i.e. the calibrated votes) the number of which exceeds a predefined threshold. The clustering unit 104 sends the extracted calibrated votes (i.e. the subset of calibrated votes) to the determination unit 105.

The determination unit 105 receives the extracted calibrated votes (i.e. the subset of calibrated votes). The determination unit 105 may determine if the object represented by model images is presented in the query image based on the number of calibrated votes in the subset. The determination unit 105 outputs, as a result of recognition, a result of determining. The determination unit 105 may output an object pose including the object location, the rotation and the scaling change derived from the feature matches related with the extracted calibrated votes. The determination unit 105 may use an absolute number of the calibrated votes in order to determine if the object of the model images is presented in the query image. As an alternative, the determination unit 105 may use a normalized score, by calculating a ratio of the absolute number of the calibrated votes to a certain normalized factor (for instance, a total number of the calibrated votes calculated by the voting unit 103). The determination unit 105 may output, as the result of recognition, a binary result which indicates whether the object is presented in the query image. The determination unit 105 may calculate and output a probability number indicating a confidence of the recognition result.

The output unit 108 outputs the result of recognition from the object recognition apparatus 100B. The output unit 108 may send the result of recognition to a display device (not illustrated). The display device may display the result of recognition. The output unit 108 may send the result of recognition to a terminal apparatus (not illustrated) used by an operator of the object recognition apparatus 100B.

Fig.5 is a block diagram of an example of a configuration of a voting unit 103A that is an example of modification of the voting unit 103 of the present exemplary embodiment. The voting unit 103A includes the vote calculation unit 1031, a second clustering unit 1033 and the vote calibration unit 1032. The second clustering unit 1033 is connected between the vote calculation unit 1031 and the vote calibration unit 1032. The second clustering unit 1033 performs clustering on the relative votes calculated by the vote calculation unit 1031 to generate clusters of relative votes. The second clustering unit 1033 selects, from the generated clusters, clusters including the relative votes whose number is more than or equal to a threshold experimentally set in advance so that clusters including false feature matches are not selected. In other words, the second clustering unit 1033 identifies an outlier cluster (i.e. a cluster including the relative votes whose number is less than the threshold), and removes an outlier (i.e. each of the relative votes included in the outlier cluster) from the relative votes calculated by the vote calculation unit 1031. The second clustering unit 1033 sends subsets of the relative votes (i.e. the relative votes included in the selected clusters) to the vote calibration unit 1032. The vote calibration unit 1032 receives the relative votes from the second clustering unit 1033 and operates in the same way as the vote calibration unit 1032 in Fig. 4. According to the configuration shown in Fig. 5, false feature matches can be effectively removed.

The second clustering unit 1033 is used for utilizing a view point constraint for each of the model images so that false feature matches may be removed by performing clustering on the relative votes. This will improve the accuracy and speed at the same time.

Fig.6 is a flow chart illustrating an example of an operation of the object recognition apparatus 100B. Before the operation illustrated in Fig. 6, the reception unit 107 receives the model images. The operation illustrated in Fig. 6 starts when the reception unit 107 receives the query image.

The extraction unit 101 extracts the local features from the query image (Step S101). The local features may be extracted from the model images in advance. The extraction unit 101 may extract the local features from the model images in Step S101. The matching unit 102 matches the local features extracted from the query image and the local features extracted from each of the model images by, for example, comparing vector distances between local descriptors included in matched local features (Step S102). The voting unit 103 (more specifically, the vote calculation unit 1031 of the voting unit 103) calculates the relative votes based on the feature matches (Step S103). The voting unit 103 (more specifically, the vote calibration unit 1032 of the voting unit 103) calculates the calibrated votes by using the relative votes and the relative camera poses (Step S104). The clustering unit 104 performs clustering on the calibrated votes to detect possible location of an object within an image (Step S105). The determination unit 105 determines if the query image includes an image of the object represented by the model images on the basis of the result of clustering (Step S106). Then the output unit 108 outputs the result of determining by the determination unit 105.

In the present exemplary embodiment, the voting unit 103 (more specifically, the vote calibration unit 1032) calibrates the relative votes (i.e. calculates the calibrated votes), so that correct feature matches form a single cluster in a parametric space. Therefore, an accuracy of object recognition improves according to the present exemplary embodiment.

<Second Exemplary Embodiment>
Next, an object recognition apparatus according to a second exemplary embodiment of the present invention is described with reference to drawings.

Fig.7A is a block diagram illustrating a first example of a structure of an object recognition apparatus according to the second exemplary embodiment of the present invention. With reference to Fig.7A, the object recognition apparatus 200A includes an extraction unit 101, a reconstruction unit 201, a matching unit 202, a relation calculation unit 106, a voting unit 203, a clustering unit 104, a determination unit 105, a reception unit 107 and an output unit 108.

The extraction unit 101 in Fig. 7A sends the model images to the reconstruction unit 201.

Fig.7B is a block diagram illustrating a second example of a structure of an object recognition apparatus according to the second exemplary embodiment of the present invention. The object recognition apparatus 200B in Fig. 7B further includes a model image storage unit 109, a model storage unit 110 and a relation storage unit 111. The model image storage unit 109, the model storage unit 110 and the relation storage unit 111 in Fig 7B are the same as those in Fig. 3B.

The reception unit 107 of the object recognition apparatus 200B stores the model images in the model image storage unit 109. The extraction unit 101 of the object recognition apparatus 200B reads out the model images from the model image storage unit 109. The extraction unit 101 of the object recognition apparatus 200B stores the local features extracted from the model images in the model storage unit 110. The relation calculation unit 106 of the object recognition apparatus 200B reads out the model images from the model image storage unit 109. The relation calculation unit 106 of the object recognition apparatus 200B stores the relative camera poses in the relation storage unit 111.

Fig.7C is a block diagram illustrating a third example of a structure of an object recognition apparatus according to the second exemplary embodiment of the present invention. The object recognition apparatus 200C in Fig. 7C includes extraction units 101. The reception unit 107 sends the query image to one of the extraction units 101. The reception unit 107 sends each of the model images to one of the other extraction units 101. The extraction unit 101 of the object recognition apparatus 200C is able to operate in parallel.

The object recognition apparatus 200A, the object recognition apparatus 200B and the object recognition apparatus 200C are the same except the difference described above. In the following, the object recognition apparatus 200B is mainly described.

The extraction unit 101, the clustering unit 104, the determination unit 105, the relation calculation unit 106 and the output unit 108 are the same as those of the object recognition apparatus according to the first exemplary embodiment of the present invention, except the following difference. Detailed description of the above-described units is omitted in the following.

The reconstruction unit 201 receives the local features extracted from the model images. The reconstruction unit 201 may read out the local features from the model storage unit 110. The reconstruction unit 201 performs 3D reconstruction of the object of the model images to generate a 3D model of the object, and sends the reconstructed 3D model to the matching unit 202. The reconstruction unit 201 operates in the same way as the reconstruction unit 1201 of the second related example described above. Similarly to the reconstruction unit 1201 of the second related example, the reconstruction unit 201 generates the 3D model including the set of the 3D points recovered from the 2D points in the model images, and the local features including the local descriptors, the scale and the orientation, which are extracted at location of the 2D points in the model images.

The matching unit 202 receives the local features extracted from the query image and the 3D model reconstructed from model images. As described above, the 3D model includes the set of the 3D points recovered from the 2D points in the model images, the local features including the local descriptors, the scale and the orientation, The matching unit 202 according to the present exemplary embodiment operates as the same way as the matching unit 1202 of the second related example. The matching unit 202 sends the generated feature matches to the voting unit 203.

The voting unit 203 receives the feature matches from the matching unit 202. The voting unit 203 receives the relative camera poses from the relation calculation unit 106. The voting unit 203 generates the relative vote for each set of the object translation, the rotation and the scaling change. The voting unit 203 calibrates the relative vote by using the relative camera pose. The voting unit 203 sends the calibrated votes to the clustering unit 104.

Fig. 8 is a block diagram illustrating an example of a configuration of the voting unit 203 according to the present exemplary embodiment. With reference to Fig.8, the common voting unit 203 includes a vote calculation unit 2031 and a vote calibration unit 2032.

The vote calculation unit 2031 receives the feature matches from the matching unit 202. The vote calculation unit 2031 calculates a relative vote for each set of the translation, the scale change and the rotation by using the local features extracted the query image and the local features extracted from the model images. The vote calculation unit 2031 calculates the translation, the scale changes and the rotation according to the equations in Math. 1, Math. 2 and Math. 3. As described above, the reconstructed 3D model includes 3D points. For a 3D point in the 3D points in the 3D model, the local features may be extracted from more than one of the model images.

When the local features for a 3D point are extracted from more than one of the model images, the vote calculation unit 2031 may select, as the local features for the 3D point, the local features extracted from one of the model images from which the local features are extracted for the 3D point. The method of selecting the local features is not limited. The vote calculation unit 2031 may compose, as the local features for the 3D point, local features by using the local features extracted from the model images for the 3D point. The composed local features may be average values of the local features extracted from the model images for the 3D point. The composed local features may be a normalized combination value of the local features extracted from the model images for the 3D point.

The vote calibration unit 2032 operates in the same way as the vote calibration unit 1032 according to the first exemplary embodiment.

Fig.9 is a block diagram illustrating an example of an alternative configuration of the voting unit according to the present exemplary embodiment. The voting unit 203A in Fig 9 is an example of modification of the voting unit 203 in Fig. 8. The voting unit 203A in Fig. 9 includes the vote calculation unit 2031, a second clustering unit 2033 and the vote calibration unit 2032. The second clustering unit 2033 is connected between the vote calculation unit 2031 and the vote calibration unit 2032. The second clustering unit 2033 performs clustering on the relative votes calculated by the vote calculation unit 2031 to generate clusters of relative votes, and selects, from the generated clusters, clusters including the relative votes whose number is more than a threshold experimentally set in advance so that clusters including false feature matches are not selected. The second clustering unit 2033 sends subsets of relative votes (i.e. the relative votes included in the selected clusters) to the vote calibration unit 2032. The vote calibration unit 2032 receives the relative votes from the second clustering unit 2033 and operates in the same way as the vote calibration unit 1032 according to the first exemplary embodiment. According to the configuration shown in Fig. 9, false feature matches can be effectively removed.

The second clustering unit 2033 is used for utilizing a view point constraint for each of the model images so that false feature matches may be removed by performing clustering on the relative votes. This will improve the accuracy and speed at the same time.

The clustering unit 104, the determination unit 105 and the output unit 108 operate in the same way as the clustering unit 104, the determination unit 105 and the output unit 108 according to the first exemplary embodiment, respectively. The detailed descriptions of the clustering unit 104, the determination unit 105 and the output unit 108 are omitted.

Fig.10 is a flow chart illustrating an operation of the object recognition apparatus 200B according to the second exemplary embodiment of the present invention. Before the operation illustrated in Fig. 10, the reception unit 107 receives the model images. The operation illustrated in Fig. 10 starts when the reception unit 107 receives the query image.

According to Fig. 10, the extraction unit 101 extracts the local features from the query image (Step S101). The local features may be extracted from the model images in advance. The extraction unit 101 may extract the local features from the model images in Step S101. The reconstruction unit 201 reconstructs the 3D model based on the local features extracted from the model images (Step S201). The reconstruction unit 201 may extract the 3D model in advance. In this case, the reconstruction unit 201 does not execute processing of Step S201 in Fig. 10. The matching unit 202 matches (i.e. performs matching) the local features extracted from the query image and the local features extracted from a model image in the model images (Step S102). The local features extracted from the model image in the model images are included in the 3D model. The matching unit 202 repeats the matching until the local features of each of the model images are matched with the local features extracted from the query image. The voting unit 203 (more specifically, the vote calculation unit 2031 of the voting unit 203) calculates the relative votes based on the feature matches that are a result of the matching (Step S103). The voting unit 203 (more specifically, the vote calibration unit 2032 of the voting unit 203) calibrates the relative votes to generate the calibrated votes (i.e. calculates the calibrated votes based on the relative votes) (Step S104). The clustering unit 104 performs clustering on the calibrated votes (Step S105). The determination unit 105 determines if the query image includes an image of the object represented by the model images on the basis of a result of the clustering (Step S106). Then the output unit 108 outputs the result of determining by the determination unit 105.

In the present exemplary embodiment, the voting unit 203 (more specifically, the vote calibration unit 2032) calibrates the relative votes (i.e. calculates the calibrated votes), so that correct feature matches form a single cluster in a parametric space. Therefore, an accuracy of object recognition improves according to the present exemplary embodiment. The voting unit 203 works much faster compared with processing by a 2D-3D RANSAC based method, because the non-iterative common voting method used by the voting unit 203 works much faster compared with a 2D-3D RANSAC based method. According to the present exemplary embodiment, it is possible to recover camera pose by using a result of the feature matches between 2D points from the query image and 3D points from the 3D model, because the reconstruction unit 201 reconstructs the 3D model and the matching unit 202 execute matching the local features extracted from the query image and the local features extracted from the model images.

<Third Exemplary Embodiment>
Next, a third exemplary embodiment of the present invention is described in detail.

Fig. 11 is a block diagram illustrating an example of a structure of an object recognition apparatus according to the third exemplary embodiment of the present invention. According to Fig. 11, the object recognition apparatus 300 of the present invention includes an extraction unit 101, a matching unit 102, a voting unit 103, a clustering unit 104, a determination unit 105 and a relation calculation unit 106.

The extraction unit 101 extracts a first feature that is a feature (i.e. the local features described above) from an image (i.e. the query image described above). The matching unit 102 performs matching the feature extracted from the image with second features that is features (each corresponding to the local features described above) extracted from model images that are images representing an object. The relation calculation unit 106 calculates, based on the model images, relative camera poses representing geometric relations among the model images. The voting unit 103 calculates calibrated votes based on a result of the matching and the relative camera poses. The calibrated votes each represent a calibrated geometric relation between the first feature and a second feature of the second features. The calibrated geometric relation is a geometric relation from which an effect of the relative camera poses is canceled. The clustering unit 104 performs clustering the calibrated votes. The determination unit 105 determines if the image represents the object based on a result of the clustering.

The present exemplary embodiment has the same effect as that of the first exemplary embodiment. The reason for the effect of the present exemplary embodiment is the same as that of the first exemplary embodiment.

<Other Exemplary Embodiments>
Each of the object recognition apparatuses according to the exemplary embodiments of the present invention may be implemented by circuitry such as dedicated hardware (e.g. a circuit or circuits), a computer including a processor and a memory, or a combination of the dedicated hardware and the computer.

The Fig. 12 is a block diagram illustrating an example of a structure of a computer which is capable of operating as each of the object recognition apparatuses according to the exemplary embodiments of the present invention.

According to Fig. 12, a computer 1000 in Fig. 12 includes a processor 1001, a memory 1002, a storage device 1003, and an I/O (Input/Output) interface 1004. The computer 1000 is able to access a storage medium 1005. The memory 1002 and the storage device 1003 are able to be implemented with such as a RAM (Random Access Memory) or a hard disk drive. The storage medium 1005 may be, for example, a RAM, a storage device such as a hard disk drive, a ROM (Read Only Memory), a portable recording medium or the like. The storage device 1003 may function as the storage medium 1005. The processor 1001 is able to read data and a program from the memory 1002 and the storage device 1003, and to write data and a program into the memory 1002 and the storage device 1003. The processor 1001 is able to access the input device (not illustrated), an apparatus providing the query image and the model images, an apparatus displaying the result of the determination through the I/O interface 1004. The processor 1001 is able to access the storage medium 1005. The storage medium 1005 stores a program causing the computer 1000 to operate as the object recognition apparatus according to any one of the exemplary embodiments of the present invention.

The processor 1001 loads the program stored in the storage medium 1005 in the memory 1002. The processor 1001 operates as the object recognition apparatus according to any one of the exemplary embodiment of the present invention by executing the program stored in the memory 1002. .

The extraction unit 101, the matching unit 102, the voting unit 103, the clustering unit 104, the determination unit 105, the relation calculation unit 106, the reception unit 107, the output unit 108, the reconstruction unit 201, the matching unit 202 and the voting unit 203 are able to be implemented with the processor 1001 controlled by the above-described program read out from the storage medium 1005 and loaded in the memory 1002.

The model image storage unit 109, the model storage unit 110 and the relation storage unit 111 are able to be implemented with the memory 1002 and/or the storage device 1003 such as a hard disk drive.

As described above at least one of the extraction unit 101, the matching unit 102, the voting unit 103, the clustering unit 104, the determination unit 105, the relation calculation unit 106, the reception unit 107, the output unit 108, the reconstruction unit 201, the matching unit 202, the voting unit 203, the model image storage unit 109, the model storage unit 110 and the relation storage unit 111 is able to be implemented with dedicated hardware.

Any one or more of units included in each of the exemplary embodiment of the present invention may be implemented as a dedicated hardware (e.g. circuitry). Any one or more of the units included in each of the exemplary embodiment of the present invention may be implemented using a computer including a memory in which a program is loaded and a processor controlled by the program loaded in the memory.

Fig. 13 is a block diagram illustrating an example of a structure of the object recognition apparatus according to the first exemplary embodiment of the present invention. According to Fig. 13, the object recognition apparatus 100B is implemented by including an extraction circuit 2101, a matching circuit 2102, a voting circuit 2103, a clustering circuit 2104, a determination circuit 2105, a relation calculation circuit 2106, a reception circuit 2107, an output circuit 2108, a model image storage device 2109, a model storage device 2110 and a relation storage device 2111.

The extraction circuit 2101, the matching circuit 2102, the voting circuit 2103, the clustering circuit 2104, the determination circuit 2105, the relation calculation circuit 2106, the reception circuit 2107, the output circuit 2108, the model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented as a circuit or a plurality of circuits. The extraction circuit 2101, the matching circuit 2102, the voting circuit 2103, the clustering circuit 2104, the determination circuit 2105, the relation calculation circuit 2106, the reception circuit 2107, the output circuit 2108, the model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented in one apparatus or a plurality of apparatuses.

The extraction circuit 2101 operates as the extraction unit 101. The matching circuit 2102 operates as the matching unit 102. The voting unit 2103 operates as the voting unit 103. The clustering unit 2104 operates as the clustering unit 104. The determination circuit 2105 operates as the determination unit 105. The relation calculation circuit 2106 operates as the relation calculation unit 106. The reception circuit 2107 operates as the reception unit 107. The output circuit 2108 operates as the output unit 108. The model image storage device 2109 operates as the model image storage unit 109. The model storage device 2110 operates as the model storage unit 110. The relation storage device 2111 operates as the relation storage unit 111. The model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented using a storage device such as a hard disk drive. The model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented using memory circuits.

Fig. 14 is a block diagram illustrating an example of a structure of the object recognition apparatus according to the second exemplary embodiment of the present invention. According to Fig. 14, the object recognition apparatus 200B is implemented by including an extraction circuit 2101, a reconstruction circuit 2201, a matching circuit 2202, a voting circuit 2203, a clustering circuit 2104, a determination circuit 2105, a relation calculation circuit 2106, a reception circuit 2107, an output circuit 2108, a model image storage device 2109, a model storage device 2110 and a relation storage device 2111.

The extraction circuit 2101, the reconstruction circuit 2201, the matching circuit 2202, the voting circuit 2203, the clustering circuit 2104, the determination circuit 2105, the relation calculation circuit 2106, the reception circuit 2107, the output circuit 2108, the model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented as a circuit or a plurality of circuits. The extraction circuit 2101, the reconstruction circuit 2201, the matching circuit 2202, the voting circuit 2203, the clustering circuit 2104, the determination circuit 2105, the relation calculation circuit 2106, the reception circuit 2107, the output circuit 2108, the model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented in one apparatus or a plurality of apparatuses.

The extraction circuit 2101 operates as the extraction unit 101. The reconstruction circuit 2201 operates as the reconstruction unit 201. The matching circuit 2202 operates as the matching unit 202. The voting circuit 2203 operates as the voting unit 203. The clustering circuit 2104 operates as the clustering unit 104. The determination circuit 2105 operates as the determination unit 105. The relation calculation circuit 2106 operates as the relation calculation unit 106. The reception circuit 2107 operates as the reception unit 107. The output circuit 2108 operates as the output unit 108. The model image storage device 2109 operates as the model image storage unit 109. The model storage device 2110 operates as the model storage unit 110. The relation storage device 2111 operates as the relation storage unit 111. The model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented using a storage device such as a hard disk drive. The model image storage device 2109, the model storage device 2110 and the relation storage device 2111 may be implemented using memory circuits.

Fig. 15 is a block diagram illustrating an example of a structure of the object recognition apparatus according to the third exemplary embodiment of the present invention. According to Fig. 15, the object recognition apparatus 300 is implemented by including an extraction circuit 2101, a matching circuit 2102, a voting circuit 2103, a clustering circuit 2104, a determination circuit 2105 and a relation calculation circuit 2106.

The extraction circuit 2101, the matching circuit 2102, the voting circuit 2103, the clustering circuit 2104, the determination circuit 2105 and the relation calculation circuit 2106 may be implemented as a circuit or a plurality of circuits. The extraction circuit 2101, the matching circuit 2102, the voting circuit 2103, the clustering circuit 2104, the determination circuit 2105 and the relation calculation circuit 2106 may be implemented in one apparatus or a plurality of apparatuses.

The extraction circuit 2101 operates as the extraction unit 101. The matching circuit 2102 operates as the matching unit 102. The voting unit 2103 operates as the voting unit 103. The clustering unit 2104 operates as the clustering unit 104. The determination circuit 2105 operates as the determination unit 105. The relation calculation circuit 2106 operates as the relation calculation unit 106.

While the invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the claims.

100A object recognition apparatus
100B object recognition apparatus
100C object recognition apparatus
101 extraction unit
102 matching unit
103 voting unit
103A voting unit
104 clustering unit
105 determination unit
106 relation calculation unit
107 reception unit
108 output unit
109 model image storage unit
110 model storage unit
111 relation storage unit
200A object recognition apparatus
200B object recognition apparatus
200C object recognition apparatus
201 reconstruction unit
202 matching unit
203 voting unit
203A voting unit
300 object recognition apparatus
1000 computer
1001 processor
1002 memory
1003 storage device
1004 I/O interface
1005 storage medium
1031 vote calculation unit
1032 vote calibration unit
1033 second clustering unit
1100 object recognition apparatus
1101 extraction unit
1102 matching unit
1103 voting unit
1104 clustering unit
1105 determination unit
1106 model image storage unit
1107 reception unit
1108 output unit
1110 model storage unit
1200 object recognition apparatus
1201 reconstruction unit
1202 matching unit
1203 voting unit
2031 vote calculation circuit
2032 vote calibration circuit
2033 second clustering circuit
2101 extraction circuit
2102 matching circuit
2103 voting circuit
2104 clustering circuit
2105 determination circuit
2106 relation calculation circuit
2107 reception circuit
2108 output circuit
2109 model image storage device
2110 model storage device
2111 relation storage device
2201 reconstruction circuit
2202 matching circuit
2203 voting circuit

Claims

An object recognition apparatus comprising:
extraction means for extracting a feature from an image;
matching means for performing matching a first feature that is the feature extracted from the image with second features that are features extracted from model images being images representing an object;
relation calculation means for calculating, based on the model images, relative camera poses representing geometric relations among the model images;
voting means for calculating calibrated votes based on a result of the matching and the relative camera poses, the calibrated votes each representing a calibrated geometric relation between the first feature and a second feature of the second features, the calibrated geometric relation being a geometric relation from which an effect of the relative camera poses is canceled;
clustering means for performing clustering the calibrated votes; and
determination means for determining if the image represents the object based on a result of the clustering.
The object recognition apparatus according to claim 1, further comprising:
reconstruction means for reconstructing a three dimensional model based on the model images, the three dimensional model including the second features at points in the model images, the points relating to three dimensional points whose three dimensional coordinates are reconstructed, wherein
the matching means performs matching the first feature with the second features in the three dimensional model.
The object recognition apparatus according to claim 1 or 2, wherein
the voting means calculates relative votes each representing a geometric relation between the first feature and the second feature of the second features, and calculates the calibrated votes based on the relative votes and the relative camera poses.
The object recognition apparatus according to claim 3, wherein
the voting means further performs clustering on the relative votes to remove an outlier of the relative votes, and calculates the calibrated votes based on the relative votes from which the outlier is removed.
An object recognition method comprising:
extracting a feature from an image;
performing matching a first feature that is the feature extracted from the image with second features that are features extracted from model images being images representing an object;
calculating, based on the model images, relative camera poses representing geometric relations among the model images;
calculating calibrated votes based on a result of the matching and the relative camera poses, the calibrated votes each representing a calibrated geometric relation between the first feature and a second feature of the second features, the calibrated geometric relation being a geometric relation from which an effect of the relative camera poses is canceled;
performing clustering the calibrated votes; and
determining if the image represents the object based on a result of the clustering.
The object recognition method according to claim 5, further comprising:
reconstructing a three dimensional model based on the model images, the three dimensional model including the second features at points in the model images, the points relating to three dimensional points whose three dimensional coordinates are reconstructed, and
performing matching the first feature with the second features in the three dimensional model.
The object recognition method according to claim 5 or 6, wherein
the calculating calibrated votes includes calculating relative votes each representing a geometric relation between the first feature and the second feature of the second features, and calculating the calibrated votes based on the relative votes and the relative camera poses.
A computer readable medium storing a program causing a computer to operate as:
extraction means for extracting a feature from an image;
matching means for performing matching a first feature that is the feature extracted from the image with second features that are features extracted from model images being images representing an object;
relation calculation means for calculating, based on the model images, relative camera poses representing geometric relations among the model images;
voting means for calculating calibrated votes based on a result of the matching and the relative camera poses, the calibrated votes each representing a calibrated geometric relation between the first feature and a second feature of the second features, the calibrated geometric relation being a geometric relation from which an effect of the relative camera poses is canceled;
clustering means for performing clustering the calibrated votes; and
determination means for determining if the image represents the object based on a result of the clustering.
The computer readable medium according to claim 8, storing the program causing a computer to operate as:
reconstruction means for reconstructing a three dimensional model based on the model images, the three dimensional model including the second features at points in the model images, the points relating to three dimensional points whose three dimensional coordinates are reconstructed, wherein
the matching means performs matching the first feature with the second features in the three dimensional model.
The computer readable medium according to claim 8 or 9, wherein
the voting means calculates relative votes each representing a geometric relation between the first feature and the second feature of the second features, and calculates the calibrated votes based on the relative votes and the relative camera poses.