WO2009069071A1 - Method and system for three-dimensional object recognition - Google Patents

Method and system for three-dimensional object recognition

Info

Publication number
WO2009069071A1
WO2009069071A1 (PCT/IB2008/054935)
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
corner
feature
dimensional feature
camera
Prior art date
Application number
PCT/IB2008/054935
Other languages
French (fr)
Inventor
Richard P. Kleihorst
Anthony Martiniere
Serafim Efstratiadis
Original Assignee
Nxp B.V.
Priority date
Filing date
Publication date
Application filed by Nxp B.V. filed Critical Nxp B.V.
Publication of WO2009069071A1 publication Critical patent/WO2009069071A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features

Abstract

Three-dimensional features are obtained from multiple views, i.e. by using multiple cameras that view a scene from different sides, and by having the cameras collaborate to establish a correct description of such a three-dimensional feature. In this way real three-dimensional features are obtained, i.e. not merely two-dimensional projections of three-dimensional features. A 3D feature descriptor is computed using 2D feature descriptors and the camera positions in space (known a priori).

Description

Method and system for three-dimensional object recognition
FIELD OF THE INVENTION
The invention relates to a video camera system, in particular to a method and a system for three-dimensional object recognition in images of scenes observed by said video camera system.
BACKGROUND OF THE INVENTION
Object recognition is a procedure to determine which of a set of objects is present in an image of a scene observed by a video camera system. Generally speaking, the first step in object recognition is to build a database of known objects. The database is populated with data that may be obtained in several ways, for example by controlled observation of known objects. The second step in object recognition is to match a new observation of a previously viewed object with its representation in the database.
Prior work in object recognition can be divided into two basic approaches: geometry-based approaches and appearance-based approaches. Broadly speaking, geometry-based approaches rely on matching the geometric structure of an object. Appearance-based approaches rely on using the intensity values of one or more spectral bands in the camera image; this may be grey-scale, color, or other image values.
The field of three-dimensional (3D) object recognition has been investigated extensively in recent years. While human beings can easily recognize arbitrary 3D objects in arbitrary situations, computer vision algorithms can only solve the object recognition problem in constrained conditions.
Most model-based 3D object recognition systems use information from a single view.
For example, see the object recognition systems described in: "Model-based recognition of 3D objects from single images", I. Weiss and M. Ray, IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(2), pages 116-128, 2001; "Recognition and Reconstruction of 3D Objects Using Model Based Perceptual Grouping", I.K. Park, K.M. Lee and S.U. Lee, Proc. 15th International Conference on Pattern Recognition, pages 720-724, 2000; "Towards True 3D Object Recognition", J. Ponce, S. Lazebnik, F. Rothganger and C. Schmid, Proc. CVPR, Vol. II, pages 272-277, 2003. Unfortunately, a single view may not contain sufficient information to recognize the object, because the detected features depend on the camera viewpoint and the viewing geometry. Features are defined as key points in images, for example corners, centers of areas, edges etc. These features are used to create an abstract description of parts of the image. This abstract description can be used for depth estimation and object recognition, for example. Object recognition is performed by comparing features detected in the image with a set of stored features from a model object in a database. A single-view approach may not be suitable for 3D object recognition.
To overcome this problem there has been some research on combining data from several views, i.e. from several cameras, in order to recognize the object of interest. For example, see: "Multi-view Technique For 3D Polyhedral Object Recognition Using Surface Representation", M.F.S. Farias and J.M. de Carvalho, Revista Controle & Automacao, 10(2), pages 107-117, 1999; "Integration of Multiple Feature Groups and Multiple Views into a 3D Object Recognition System", J. Mao, P.J. Flynn and A.K. Jain, Computer Vision and Image Understanding, 62(3), pages 309-325, 1995; "3D object recognition system using multiple views and cascaded multilayered perceptron network", M.K. Osman, M.Y. Mashor and M.R. Arshad, Cybernetics and Intelligent Systems, IEEE Conference, Vol. 2, pages 1011-1015, 2004.
However, these 3D object recognition systems are based on the combination of two-dimensional (2D) features detected from different views. A 3D object recognition based on such a combination of 2D features, detected from completely different angles of view, requires the use of reliable 2D features. Unfortunately, this is difficult to achieve. As a result of the 2D projection of the real 3D space, some key features are not found in the images because they become deformed in the projection process. This hinders object recognition and leads to unreliable results. In order to properly recognize 3D objects from 2D images, a large number of features must be used to describe such an object, which requires a lot of processing resources. Also, the model object is stored as multiple sets of features in the database, i.e. a set of features for each viewing direction, in order to find a match with the object to be recognized in the image. Hence, the database needs to have a relatively large size.
SUMMARY OF THE INVENTION
It is an object of the invention to perform three-dimensional (3D) object recognition in an accurate and efficient way. This object is achieved by the method according to claim 1 and by the system according to claim 6. According to the invention, three-dimensional features are obtained from multiple views, i.e. by using multiple cameras that view a scene from different sides, and by having the cameras collaborate to establish a correct description of such a three-dimensional feature. In this way real three-dimensional features are obtained, i.e. not merely two-dimensional projections of three-dimensional features. A 3D feature descriptor is computed using 2D feature descriptors and the camera positions in space (known a priori).
Up to now, object recognition or scene description has been done on the basis of 2D feature finding; these 2D features are mere projections of 3D features onto the image plane. As a result of this mismatch and feature deformation, object recognition has to be done by detecting an excessive number of features and by storing multiple views of the objects, captured under different angles, in the database. The invention introduces multiple cameras that see the object and scene from different sides. The network of calibrated cameras is able to see the scene in 3D. By collaborating, the cameras can establish the real 3D features and their location in space. This makes the description of the scene both simpler and more expressive, and it reduces the number of features needed for recognizing 3D objects. An example of such a 3D feature is a 3D corner. With collaborative cameras, 3D corners can be detected as follows. For example, one camera finds corners in its captured image and then compares its findings with corners found by the other cameras. If one or some of the other corners fall on the epipolar lines of the camera set-up, a 3D construction is found. By high-level reasoning it is checked whether the feature is a real 3D corner: each of the cameras should then see the corner at specific orientations. The result is that this 3D corner and its position in space can be used for an effective description of structures in space and for genuine 3D object recognition. If "orientation" is taken as the 2D feature descriptor, a certain combination of the orientations, seen from different views, results in a specific 3D feature descriptor. The combination depends on the 2D feature descriptors and is deduced by a high-level reasoning algorithm.
Advantageous embodiments of the invention are defined in the dependent claims. BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is described in more detail with reference to the drawings, in which:
Fig. 1 illustrates an example of corner orientation deduced after background subtraction;
Fig. 2 illustrates a general example of 3D corner detection with a camera set-up comprising four cameras;
Fig. 3 illustrates an example of an algorithm according to the invention;
Fig. 4 illustrates an example of a database containing for each object information about its 3D corners.
DESCRIPTION OF PREFERRED EMBODIMENTS
The invention is based on the idea to recognize objects in 3D space using a collaborative multi-view camera system. The system used for the experiments is the Wireless Camera (WiCa) platform, developed by NXP Research. Each camera is equipped with a Xetal 3D processor, dedicated to video processing, and a communication module using a ZigBee protocol. The Xetal 3D processor is combined with a 30 frames per second color VGA-format image sensor. The processor is fully programmable and therefore able to run a variety of computer vision algorithms. Xetal 3D is able to achieve high computational performance (up to 50 GOPS) with modest power consumption.
The aim of the method according to the invention is to detect 3D objects, using their 3D features and their center of mass. In order to achieve this, the method combines several 2D features, i.e. 2D feature descriptors, obtained from different views by different cameras, and then defines a new type of 3D feature. The Intersecting Line Technique, which has been described in "Embedded Object Recognition Using Smart Cameras and the Relative Position of Feature Points", D. Rankin, Master Thesis Report, University of Glasgow, pages 54-59, 2007, can be used to recognize the 3D object, if it is extended to 3D. The camera network is assumed to be calibrated in space.
According to a preferred embodiment of the invention, the particular 3D feature is a 3D corner. Each smart camera uses corner detection as its feature detector and corner orientation as its feature descriptor. In order to find the 3D features, the 2D corner descriptions from the collaborating cameras need to be combined into a 3D corner description; thus, the corner orientations computed from different views are compared. The 3D corners detected are used to find the center of mass of the object in 3D space. The new 3D corner is therefore defined by comparing the corner orientations computed from different views. It is assumed that a spatial calibration of the cameras has been performed beforehand. The following basic steps are performed:
1) Corner Detection
The aim is to define a 3D corner based on the real-time detection of 2D corners from each camera. A common corner detector can be used for corner detection, for example the Harris-Stephens corner operator ("A combined corner and edge detector", C. Harris and M.J. Stephens, Alvey Vision Conference, pages 147-152, 1988). The Harris-Stephens algorithm is not only sensitive to corners, but also to local image regions which have a high degree of variation in all directions. Therefore, all interest points are detected in the image, containing all corner orientations. The detected corners can be limited to those of the object itself by background subtraction.
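It is noted that the following minimal sketch of a Harris-Stephens-style corner response is given for illustration only; the function name, parameter values and the NumPy/SciPy-based implementation are assumptions of this sketch and not part of the application.

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def harris_corners(gray, k=0.04, sigma=1.5, threshold=1e-4):
    """Minimal Harris-Stephens corner detector (illustrative sketch).

    gray: 2D float array in [0, 1]. Returns a boolean mask of corner candidates.
    """
    # Image gradients
    ix = sobel(gray, axis=1)
    iy = sobel(gray, axis=0)

    # Elements of the local structure tensor, smoothed by a Gaussian window
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)

    # Harris response: det(M) - k * trace(M)^2
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    response = det - k * trace * trace

    # Keep strong responses; the operator also fires on highly textured regions,
    # so a subsequent background subtraction step restricts corners to the object.
    return response > threshold * response.max()
```

In practice the response map would also be non-maximum-suppressed before the background subtraction step described above.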
2) Corner Description
As soon as a 2D corner is detected, a descriptor needs to be assigned to this corner. For this purpose the corner orientation, defined as the direction from the corner towards the object, can be used. A relevant corner is detected on the object boundary. The orientation is computed by looking at the angle of the edges around the interest point and by performing a background subtraction algorithm on the current scene. Comparing the edge orientations alone only gives the direction of the corner, not its orientation, which depends on the position of the object.
Fig. 1 illustrates an example of corner orientation deduced after background subtraction. Fig. 1 shows a corner (C) and the corresponding edges (E1, E2). For each corner point, it needs to be defined on which side of the edges the object of interest is positioned. This is done by background subtraction, which defines the relative position of the object with respect to the position of the edges. The dotted vector represents the other orientation that would be detected if no background subtraction is applied. Alternatively, any of the object's grey-level values could be used instead of background subtraction, but this would be more sensitive to permanent luminosity changes.
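The orientation step can be sketched as follows, assuming a boolean foreground mask produced by background subtraction; the local-window averaging heuristic and all names are assumptions made for illustration, not the exact procedure of the application.

```python
import numpy as np

def corner_orientation(corner, fg_mask, radius=5):
    """Orientation of a 2D corner as the unit vector pointing from the corner
    into the object (illustrative sketch).

    corner:  (row, col) pixel position of a detected corner.
    fg_mask: boolean foreground mask obtained by background subtraction.
    """
    r, c = corner
    h, w = fg_mask.shape
    rows, cols = np.mgrid[max(r - radius, 0):min(r + radius + 1, h),
                          max(c - radius, 0):min(c + radius + 1, w)]
    fg = fg_mask[rows, cols]
    if not fg.any():
        return None  # no object pixels nearby: not a corner of the object
    # The mean offset of foreground pixels around the corner points into the object;
    # without the mask, the opposite direction (the dotted vector in Fig. 1)
    # would be equally plausible.
    dr = (rows[fg] - r).mean()
    dc = (cols[fg] - c).mean()
    norm = np.hypot(dr, dc)
    return (dr / norm, dc / norm) if norm > 0 else None
```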
Fig. 2 illustrates a general example of 3D corner detection with a camera set-up comprising four cameras. Fig. 2 shows how 2D corner orientations are combined in order to find a 3D feature. Assuming a system calibrated in space, the 2D vectors can be interpreted and the presence of a 3D corner can be deduced. In practice, for each corner detected in one camera, the pixels on the other cameras corresponding to the same position in space are considered. If a corner exists at this position, its orientation is used to establish the shape of the 3D corner. By applying the same process in each camera, the system can deal with occlusions.
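The cross-camera check can be sketched as follows: for a corner detected in camera i, corners in camera j that are consistent with the same position in space are selected by means of the epipolar line derived from the spatial calibration. The use of a fundamental matrix F_ij and the pixel tolerance are assumptions of this sketch.

```python
import numpy as np

def corners_on_epipolar_line(corner_i, corners_j, F_ij, tol=2.0):
    """Corners in camera j consistent with a corner seen in camera i (illustrative sketch).

    corner_i:  (x, y) pixel in camera i.
    corners_j: iterable of (x, y) pixels detected in camera j.
    F_ij:      3x3 fundamental matrix from camera i to camera j (known by calibration).
    """
    p_i = np.array([corner_i[0], corner_i[1], 1.0])
    line = F_ij @ p_i                           # epipolar line a*x + b*y + c = 0 in camera j
    a, b, c = line
    scale = np.hypot(a, b)
    matches = []
    for (x, y) in corners_j:
        dist = abs(a * x + b * y + c) / scale   # point-to-line distance in pixels
        if dist < tol:
            matches.append((x, y))
    return matches
```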
Fig. 3 illustrates an example of an algorithm according to the invention. Prior to executing this algorithm, a 3D background subtraction method (see for example "Nonstationary Background Removal via Multiple Camera Collaboration", H. Lee, C. Wu, and H. Aghajan, Proc. of 1st International Conference on Distributed Smart Cameras, Vienna, Austria, Sept 2007) may remove any detected corners which do not belong to the object.
In Fig. 3, 'Pi' and 'Pj' are pixels located on cameras i and j, respectively. The variable 'tab' is a table containing the corner orientations from each camera; its length is equal to the number of cameras used. The function 'Correspondence()' computes a correspondence rate between the 2D corner orientations. The goal is not to minimize the distance between the feature vectors, but to compare their difference relative to the positions of the cameras in space. The correspondence rate gets higher if the relation between the different orientations is verified. This rate depends on the number of distributed cameras and their layout in space. The positions of the cameras in space are assumed to be known by calibration.
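The application does not spell out the scoring used by 'Correspondence()'. The sketch below shows one plausible correspondence rate under stated assumptions (camera rotations known by calibration, a hypothesized 3D direction from the corner into the object); it is an illustration, not the claimed method.

```python
import numpy as np

def correspondence(orientations, rotations, hypothesis):
    """Correspondence rate between 2D corner orientations from several cameras
    (illustrative sketch; the scoring formula is an assumption).

    orientations: list of 2D unit vectors, the corner orientation in each camera image.
    rotations:    list of 3x3 world-to-camera rotation matrices (known by calibration).
    hypothesis:   3D unit vector, a hypothesized direction from the corner into the object.
    """
    scores = []
    for o2d, R in zip(orientations, rotations):
        cam_dir = R @ hypothesis                  # hypothesized direction in the camera frame
        proj = cam_dir[:2]                        # its projection onto the image plane
        n = np.linalg.norm(proj)
        if n < 1e-9:
            continue                              # direction points along the optical axis
        agreement = float(np.dot(proj / n, np.asarray(o2d)))  # cosine of predicted vs observed
        scores.append(max(agreement, 0.0))
    # A real 3D corner should yield a high rate for some hypothesis; unrelated
    # 2D corners seen from different cameras should not.
    return float(np.mean(scores)) if scores else 0.0
```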
The proposed 3D recognition process is based on the Intersecting Line Technique.
This method has the advantage of being simple, fast, scale-invariant and robust to occlusions. It uses a database of objects which contains, for each feature point, the type of the feature and its 3D line gradient to the object center of mass.
It is noted that the Intersecting Line Technique has been explained in European patent application EP07104583, titled "Object recognition method and device", filed by the applicant on 21 March 2007. Fig. 4 illustrates an example of a database containing for each object information about its 3D corners. Generally speaking, the database contains for each object the number of 3D features, the type of 3D features, and the line gradients. Figure 4 shows an example of how to create such a database for a simple cubic object. There are 8 different types of 3D corners and the corresponding 3D line gradients. The centre of mass is calculated by taking the average x, y and z values of the object feature point coordinates. So, for an object consisting of n 3D feature points, the centre of mass is expressed as:
CoMx = (x1 + ... + xn)/n
CoMy = (y1 + ... + yn)/n
CoMz = (z1 + ... + zn)/n
where CoMx, CoMy and CoMz are its coordinates.
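A short sketch of the centre-of-mass computation and of a possible database entry for the cubic object of Fig. 4 is given below; the dictionary layout and the unit-cube coordinates are assumptions made for illustration.

```python
import numpy as np

def centre_of_mass(points_3d):
    """Centre of mass of an object's n 3D feature points, as in the formulas above."""
    pts = np.asarray(points_3d, dtype=float)    # shape (n, 3)
    return pts.mean(axis=0)                     # (CoMx, CoMy, CoMz)

# Illustrative database entry for a unit cube: for each of the 8 corner types,
# the 3D line gradient from the corner towards the centre of mass is stored.
cube_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
               (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
com = centre_of_mass(cube_points)
cube_entry = {
    "num_features": len(cube_points),
    "features": [
        {"type": f"corner_{i}",                 # one of the 8 corner types
         "gradient": tuple(com - np.array(p))}  # 3D line gradient towards the CoM
        for i, p in enumerate(cube_points)
    ],
}
```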
Once the shape is defined in the database and the feature points have been extracted from the image, the two must be combined to recognize the object. Each feature point detected in the image is considered separately. When a feature from the image is processed, then all occurrences of the same feature type in the database are retrieved. From these retrieved database entries the associated line is drawn on the image, starting at where the feature point has been detected. More than one line can emerge from a specific image feature point if this type of feature is used multiple times in the object shape stored in the database. The newly drawn line emanates from the image feature point in the direction of the expected location of the centre of mass.
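The combination step can be sketched as a line-intersection vote, here carried out directly in 3D; the data layout, the pairwise closest-point test and the final averaging are assumptions of this sketch, since the technique is only described qualitatively above.

```python
import numpy as np
from itertools import combinations

def estimate_com_by_intersection(detected, database, tol=3.0):
    """Estimate the object centre of mass by intersecting the drawn lines (illustrative sketch).

    detected: list of (feature_type, position) pairs, position a 3D point.
    database: dict mapping feature_type -> list of 3D line gradients to the CoM.
    """
    # One line per (detected feature, matching database gradient); a feature type stored
    # several times in the object shape contributes several lines, as described above.
    lines = [(np.asarray(pos, float), np.asarray(g, float))
             for ftype, pos in detected
             for g in database.get(ftype, [])]

    votes = []
    for (p1, d1), (p2, d2) in combinations(lines, 2):
        # Closest points of the two 3D lines p1 + t*d1 and p2 + s*d2
        a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
        w0 = p1 - p2
        d, e = d1 @ w0, d2 @ w0
        denom = a * c - b * b
        if abs(denom) < 1e-9:
            continue                       # parallel lines give no useful intersection
        t = (b * e - c * d) / denom
        s = (a * e - b * d) / denom
        q1, q2 = p1 + t * d1, p2 + s * d2
        if np.linalg.norm(q1 - q2) < tol:
            votes.append((q1 + q2) / 2)    # near-intersection supports a CoM candidate

    # Lines drawn from the correct object shape converge on the centre of mass;
    # a real system would cluster the votes rather than simply averaging them.
    return np.mean(votes, axis=0) if votes else None
```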
It is noted that the computation tasks necessary to perform the method according to the invention may be performed by one or more processors which form part of the video camera system. The skilled person will be able to select appropriate processing means for these computation tasks in accordance with the amount of processing involved and the required functionality. For example, some tasks may be performed by a general-purpose processor, whereas other tasks may be performed by a dedicated microcontroller.
Furthermore, it is proposed to use color to obtain more information than grey-scale under certain conditions. This leads to a more accurate detection in certain cases. In the case of the algorithm described above, the goal is to decrease the number of lines in order to decrease the number of false detections. Therefore, if the feature color is known, the image can be segmented and only the interest region can be kept. In this way, a smaller number of corners is detected and fewer line gradients will be drawn on the screen. It is noted that 3D features do not have to be corners; they can be any 3D feature whose shape can be deduced from collaborating 2D views. It is remarked that the scope of protection of the invention is not restricted to the embodiments described herein. Neither is the scope of protection of the invention restricted by the reference symbols in the claims. The word 'comprising' does not exclude other parts than those mentioned in a claim. The word 'a(n)' preceding an element does not exclude a plurality of those elements. Means forming part of the invention may be implemented either in the form of dedicated hardware or in the form of a programmed general-purpose processor. The invention resides in each new feature or combination of features.

Claims

CLAIMS:
1. A method for three-dimensional (3D) object recognition using a collaborative camera network comprising at least two cameras,
(a) wherein each camera captures an image, detects at least one two-dimensional (2D) feature in the image, and assigns a two-dimensional feature descriptor to the two-dimensional feature;
(b) wherein a three-dimensional feature descriptor is derived from the two-dimensional feature descriptors and from the camera positions in space;
(c) wherein the three-dimensional feature descriptor is compared to three-dimensional feature information stored in a database, in order to recognize a three-dimensional object.
2. A method as claimed in claim 1, wherein the two-dimensional feature is a 2D corner, the 2D feature descriptor is the orientation of the 2D corner, the three-dimensional feature is a 3D corner, and the three-dimensional feature descriptor is the orientation, relative to a camera in the collaborative camera network, of the 3D corner.
3. A method as claimed in claim 2, wherein the 2D corner is detected using the Harris-Stephens corner operator.
4. A method as claimed in claim 2, wherein the orientation of the 2D corner is computed by performing a background subtraction algorithm on the current scene.
5. A method as claimed in claim 1, wherein the Intersecting Line Technique is used to recognize the three-dimensional object.
6. A system for three-dimensional (3D) object recognition in a collaborative camera network comprising at least two cameras,
(a) wherein each camera is arranged to capture an image, to detect at least one two-dimensional (2D) feature in the image, and to assign a two-dimensional feature descriptor to the two-dimensional feature;
(b) further comprising means for deriving a three-dimensional feature descriptor from the two-dimensional feature descriptors and from the camera positions in space;
(c) and means for comparing the three-dimensional feature descriptor to three-dimensional feature information stored in a database, in order to recognize a three-dimensional object.
PCT/IB2008/054935 2007-11-28 2008-11-25 Method and system for three-dimensional object recognition WO2009069071A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07121746.7 2007-11-28
EP07121746 2007-11-28

Publications (1)

Publication Number Publication Date
WO2009069071A1 (en) 2009-06-04

Family

ID=40527503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/054935 WO2009069071A1 (en) 2007-11-28 2008-11-25 Method and system for three-dimensional object recognition

Country Status (1)

Country Link
WO (1) WO2009069071A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006002320A2 (en) * 2004-06-23 2006-01-05 Strider Labs, Inc. System and method for 3d object recognition using range and intensity

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SEHGAL A ET AL: "3D object recognition using Bayesian geometric hashing and pose clustering", PATTERN RECOGNITION, ELSEVIER, GB, vol. 36, no. 3, 1 March 2003 (2003-03-01), pages 765 - 780, XP004393107, ISSN: 0031-3203 *
SEUNGDO JEONG ET AL: "Design of a Simultaneous Mobile Robot Localization and Spatial Context Recognition System", KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS; [LECTURE NOTES IN COMPUTER SCIENCE;LECTURE NOTES IN ARTIFICIAL INTELLIGENCE;LNCS], SPRINGER-VERLAG, BERLIN/HEIDELBERG, vol. 3683, 17 August 2005 (2005-08-17), pages 945 - 952, XP019015615, ISBN: 978-3-540-28896-1 *
STEIN F ET AL: "Structural hashing: efficient three dimensional object recognition", PROCEEDINGS OF THE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. LAHAINA, MAUI, HAWAII, JUNE 3 - 6, 1991; [PROCEEDINGS OF THE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION], LOS ALAMITOS, IEEE. COMP., vol. -, 3 June 1991 (1991-06-03), pages 244 - 250, XP010023215, ISBN: 978-0-8186-2148-2 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9508009B2 (en) 2013-07-19 2016-11-29 Nant Holdings Ip, Llc Fast recognition algorithm processing, systems and methods
US9690991B2 (en) 2013-07-19 2017-06-27 Nant Holdings Ip, Llc Fast recognition algorithm processing, systems and methods
US9904850B2 (en) 2013-07-19 2018-02-27 Nant Holdings Ip, Llc Fast recognition algorithm processing, systems and methods
US10628673B2 (en) 2013-07-19 2020-04-21 Nant Holdings Ip, Llc Fast recognition algorithm processing, systems and methods
US9646384B2 (en) 2013-09-11 2017-05-09 Google Technology Holdings LLC 3D feature descriptors with camera pose information
US9501498B2 (en) 2014-02-14 2016-11-22 Nant Holdings Ip, Llc Object ingestion through canonical shapes, systems and methods
US10095945B2 (en) 2014-02-14 2018-10-09 Nant Holdings Ip, Llc Object ingestion through canonical shapes, systems and methods
US10832075B2 (en) 2014-02-14 2020-11-10 Nant Holdings Ip, Llc Object ingestion through canonical shapes, systems and methods
US11380080B2 (en) 2014-02-14 2022-07-05 Nant Holdings Ip, Llc Object ingestion through canonical shapes, systems and methods
US11748990B2 (en) 2014-02-14 2023-09-05 Nant Holdings Ip, Llc Object ingestion and recognition systems and methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08853500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08853500

Country of ref document: EP

Kind code of ref document: A1