WO2008114206A2 - Object recognition method and device - Google Patents

Object recognition method and device

Info

Publication number
WO2008114206A2
WO2008114206A2 (PCT/IB2008/051001)
Authority
WO
WIPO (PCT)
Prior art keywords
local
features
feature
image
line
Application number
PCT/IB2008/051001
Other languages
French (fr)
Other versions
WO2008114206A3 (en)
Inventor
Richard P. Kleihorst
David A. Rankin
Robin T. A Van Rootseler
Original Assignee
Nxp B.V.
Application filed by Nxp B.V.
Publication of WO2008114206A2
Publication of WO2008114206A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/753 Transform-based matching, e.g. Hough transform

Abstract

An object recognition method comprises the steps of: receiving a digital representation of an image, detecting local features in the image, representing the local features as a set of coordinates indicating a position of the local features in the image, comparing the detected local features with the local object features in a model of at least one reference object, said model comprising for each of said local object features a type indication and an indication of an orientation with respect to a center of mass of the local object features, constructing a line through each local feature each time the local feature corresponds to a local object feature, the line having a direction equal to the direction from the local object feature to a centre of mass of the local object features, and determining whether at least a predetermined number of constructed lines coincides within a window of predetermined size.

Description

OBJECT RECOGNITION METHOD AND DEVICE
FIELD OF THE INVENTION
The invention relates to an object recognition method. The invention further relates to an object recognition device. The invention further relates to an object recognition system. The invention still further relates to a method for building a model for an object.
BACKGROUND OF THE INVENTION
For many years the area of road sign detection has interested scientists working in the domain of computer vision, e.g. C.-Y. Fang, S.-W. Chen, C.-S. Fuh, "Road-Sign Detection and Tracking", IEEE Trans. Vehicular Technology, Vol. 52 (5), pp. 1329-1341, 2003.
Visual recognition of road signs could have many applications, the most obvious of which would be an automotive autopilot. This non-invasive technology would be able to work with existing road setups, and with the standardization of road signs throughout Europe and similarities across continents, such a system would attract interest worldwide. To work as an automotive subsystem, the technique must lend itself to implementation as an embedded system and must offer a fast refresh rate if it is to operate while the car drives.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide for a relatively simple method and a system capable of position invariant recognition of objects in an image.
These and other objects are achieved by the method according to claim 1 and the object recognition system according to claim 6.
The method is based upon the detection of local features. These can be detected with simple means. Furthermore, the detection of such features can easily be carried out in parallel. As the detection method is based on the position of the detected local features relative to a center of mass shared by the features, the detection method is position invariant. Being able to cope with partial occlusion of an object to be detected in the image is a very valuable property. Occlusion can result from many situations; for example, the sign may be partially obscured by a tree, or by dirt on the sign. If the technique is able to cope with occlusion, it is also resilient to situations in which a feature point is not properly extracted from the image. This is achieved with the device according to the invention in that the constructed lines have a direction equal to the direction from the local feature in the model to the centre of mass in said model for the reference object under consideration. In other words, the centre of mass is not constructed directly from the features detected in the image, but from the intersection of the lines that are constructed through the detected features, the constructed lines having a direction corresponding to the direction of the lines of the corresponding features. Even if some of the features are occluded in the image, the centre of mass is still reliably reconstructed in this way.
For certain applications it is sufficient if the object recognition method detects objects at a particular scale. When used in a moving vehicle, such an object recognition method will detect the object when the vehicle is at a predetermined distance from the object. Preferably, however, the object detection method is substantially scale invariant, so that objects can be recognized from a large range of positions. This can easily be achieved by the measure of claim 2. Scale invariant features are, for example, corners in the image. Corners can be detected easily by a method using a first step that constructs an edge map, for example using a Sobel filter, and a second step that detects the corners from the edge map.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other aspects are described in more detail with reference to the drawing. Therein,
Fig. 1 schematically shows a method according to the invention,
Fig. 2 shows an example of a typical reference shape,
Fig. 3 shows the relation of the features to the center of mass in said shape,
Fig. 4 shows an intermediate result in the recognition of said shape in an image,
Fig. 5A shows a shape different from the reference shape,
Fig. 5B shows the detected local features in the shape of Fig. 5A,
Fig. 5C indicates the correspondence of these detected local features with the features stored in the database for the reference shape,
Fig. 5D shows the lines constructed on the basis of these correspondences,
Fig. 6 shows a set of bit-planes assigned to respective features of the reference shape,
Fig. 7 shows an embodiment of an object recognition system according to the invention.
DETAILED DESCRIPTION OF THE INVENTION
In the following detailed description of the invention, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, the invention may be practiced without these specific details. In other instances well known methods, procedures, and/or components have not been described in detail so as not to unnecessarily obscure aspects of the invention.
Fig. 1 shows an object recognition method according to the invention. In a first step S1 a digital representation of an image is received. In a second step S2 local features are detected in the image. Suitable local features are for example edges. Such local features can be detected with simple means and, furthermore, the detection can be carried out in parallel. Various filters are known to detect edges in an image. A well-known filter for this purpose is the Sobel filter. Preferably the detected features are scale invariant. Such features are, for example, corners. These features can be detected straightforwardly from the received digital representation or from a prefiltered image, e.g. one obtained by applying the Sobel filter. In particular, 90 degree corners are very practical features for recognizing human-designed objects.
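By way of illustration only, the following Python sketch shows one possible realisation of this two-step detector (Sobel edge map, then 90-degree corners read off the edge map). The function name detect_corners, the corner-type labels UL/UR/LL/LR and the threshold parameter are assumptions, not terminology from this publication; the classification rule only handles thin, axis-aligned outlines and is merely meant to make the step concrete.

```python
import numpy as np
from scipy import ndimage

# Illustrative corner-type labels, reused by the later sketches.
UL, UR, LL, LR = "UL", "UR", "LL", "LR"

def detect_corners(gray, edge_thresh=0.25):
    """Toy two-step detector: (1) Sobel edge map, (2) 90-degree corners read
    off the edge map.  Returns a list of (x, y, corner_type) tuples."""
    gx = ndimage.sobel(gray.astype(float), axis=1)
    gy = ndimage.sobel(gray.astype(float), axis=0)
    mag = np.hypot(gx, gy)
    edges = mag > edge_thresh * mag.max()      # binary edge map

    corners = []
    h, w = edges.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if not edges[y, x]:
                continue
            left, right = edges[y, x - 1], edges[y, x + 1]
            up, down = edges[y - 1, x], edges[y + 1, x]
            # a 90-degree corner of a thin outline continues in exactly one
            # horizontal and one vertical direction
            if right and down and not (left or up):
                corners.append((x, y, UL))
            elif left and down and not (right or up):
                corners.append((x, y, UR))
            elif right and up and not (left or down):
                corners.append((x, y, LL))
            elif left and up and not (right or down):
                corners.append((x, y, LR))
    return corners
```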
In a third step S3 the local features are represented as a set of coordinates indicating a position of the local features in the image. In a fourth step S4 the detected local features are compared with the local features in a model of at least one reference object.
In a fifth step S5 a line is constructed each time a detected local feature corresponds to a local feature in the model. The line has a direction equal to the direction from the local feature in the model to a centre of mass of the local features in said model, and coincides with the position of the local feature in the digital representation of the image. It is however not necessary that the constructed line actually extends through the position of the local feature in the digital representation of the image; it is sufficient if an extrapolated version of the constructed line intersects that position. In a sixth step S6 a center of mass is constructed as the position where a maximum number of constructed lines intersect within a window of predetermined size.
In the seventh step S7 it is decided whether said maximum number exceeds a predetermined threshold. If the threshold is exceeded, the object in the image is detected as the reference object from the database currently under consideration. If the threshold is not exceeded it is determined in step S8 whether a next reference object is still available. If no further reference objects are available no classification can be made. Otherwise the method continues with the next reference object in step S4.
The above method is illustrated in more detail with reference to an example:
Fig. 2 shows a T shaped object. The object shown has eight 90 degree corners, two of type Upper Left, two of type Lower Left, two of type Lower Right and two of type Upper Right. These eight features have a center of mass (CoM), which is defined as the average x,y coordinate of the set of eight features. As shown in Fig. 3, the object can also be represented by the collection of lines emanating from the center of mass CoM to the coordinates of the features. Accordingly the object can be represented by the following table, wherein each entry represents the type of feature as well as the direction of the line passing through that feature and the center of mass. In this table the direction is defined counter clockwise, wherein a horizontal direction to the right corresponds to an angle of 0°.
Table 1: representation of a 2D object
[Table 1 appears as an image in the original publication; for each of the eight features it lists the feature number, its corner type (UL, LL, LR or UR) and the direction of the line through that feature and the center of mass.]
In this way one or more objects can be stored as reference objects in a database.
It is noted that the information about the object may be stored differently. For example, instead of storing the direction of the line, the position of the object features relative to the center of mass may be stored. Alternatively the absolute position of the features may be stored together with the absolute position of the center of mass.
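A minimal sketch of how such a model (Table 1) could be built from the corners of a reference image is given below, assuming corners are available as (x, y, type) tuples, e.g. from the detect_corners sketch above. The angle convention follows the text: counter-clockwise, 0° pointing to the right, with the image y-axis pointing downwards; build_model and the dictionary keys are illustrative names only.

```python
import math

def build_model(corners):
    """Store, for each corner of the reference object, its type and the
    direction from that corner towards the common center of mass."""
    cx = sum(x for x, _, _ in corners) / len(corners)
    cy = sum(y for _, y, _ in corners) / len(corners)
    model = []
    for x, y, t in corners:
        # the image y-axis points down, so negate dy to obtain the usual
        # counter-clockwise angle with 0 degrees pointing to the right
        angle = math.degrees(math.atan2(-(cy - y), cx - x)) % 360.0
        model.append({"type": t, "angle": angle})
    return model
```

For the T-shaped reference object of Fig. 2 this would yield eight entries, one per corner, corresponding to the eight rows of Table 1.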
After the relevant features have been extracted in step S2 in a manner known per se, the extracted features are compared with those stored in the object database. If, for example, a feature of type Upper Left is found, this may be object feature 1 or object feature 6 of a T-shaped object. Presuming this, two lines are constructed through the detected Upper Left feature, having directions respectively corresponding to the directions in entries 1 and 6 of the table. This is repeated for each feature detected in the image.
The result is shown in Fig. 4. As in this example each type of feature appears twice in the reference object with which the set of detected features is compared, the resulting image comprises 2 * 8 = 16 lines. However, if the observed object corresponds to the object with which it is compared in the database, the resulting image will have a point, the Detected Center of Mass (DCoM), that is intersected by a line emanating from each of the features of the object from the database.
By way of comparison, an example is illustrated in Figs. 5A - 5D, where the observed object does not correspond to the reference object with which it is compared. In this example Fig. 5A shows the object as it is observed with a camera. Fig. 5B shows the result of the step of feature detection. The result is that 8 corners are detected: one corner of type upper left (UL), three corners of type lower left (LL), three corners of type upper right (UR), and one corner of type lower right (LR). Fig. 5C shows the feature references that match with the detected features, if the observed object were the T-shaped object from the database. I.e., a priori, each upper left corner could be feature F1 or feature F6 of the T-shaped reference object.
Fig. 5D shows the lines constructed from the identified features. The lines have a direction corresponding to the matching features of the reference object. For example, in the previous step it was found that each upper left corner could correspond to feature F1 or to feature F6. Accordingly, from each upper left corner a line is drawn in a direction of 315°, corresponding to feature F1, and one in a direction of 200°, which corresponds to feature F6. In Fig. 5D it is clearly visible that no center of mass can be detected: at most 3 lines intersect each other in the same point. From this observation it is concluded that the observed object is not the reference object.
Determining the number of intersections for each point may be realized in different ways. For example, the lines for each detected feature may be constructed serially. An image buffer could be used having pixel values initialized at 0. Subsequently, lines are constructed. Each line starts at the location of the detected feature and emanates in the direction indicated by the table for the reference object. Each pixel intersected by the constructed line then has its value increased by 1. After all lines have been drawn in the frame buffer, the pixel value indicates the number of intersections for that pixel.
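A sketch of this serial frame-buffer variant is shown below, again only as an illustration of the bookkeeping and not as the patented implementation. It assumes the detections and the model use the conventions of the earlier sketches; the crude step-wise rasterisation stands in for any proper line-drawing routine.

```python
import math
import numpy as np

def vote_lines(detections, model, shape, max_len=None):
    """Zero-initialised frame buffer; for every (detected corner, matching
    model entry) pair one line is drawn and every pixel it crosses is
    incremented.  The maximum value marks the candidate center of mass."""
    h, w = shape
    votes = np.zeros((h, w), dtype=np.uint16)
    max_len = max_len or int(math.hypot(h, w))
    for x0, y0, t in detections:
        for entry in model:
            if entry["type"] != t:
                continue
            a = math.radians(entry["angle"])
            dx, dy = math.cos(a), -math.sin(a)   # image y-axis points down
            last = None
            for step in range(max_len):          # crude rasterisation
                x, y = int(round(x0 + step * dx)), int(round(y0 + step * dy))
                if not (0 <= x < w and 0 <= y < h):
                    break
                if (x, y) != last:               # avoid double-counting a pixel
                    votes[y, x] += 1
                    last = (x, y)
    return votes
```

The maximum of the resulting buffer then plays the role of the detected center of mass (DCoM) of Fig. 4.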
Alternatively, separate bitplanes may be used for every feature in the object under consideration taken from the database.
This embodiment is shown in Fig. 6. For clarity only 4 bitplanes are shown, but any number of bitplanes, one for each feature of the object of the database under consideration, may be used. For example, the T-shaped object in the database would require 8 bitplanes. In each bitplane a line is constructed. For example, when attempting to recognize the object of Fig. 5A, one line is constructed for feature F1, as the object of Fig. 5A has only one feature (its upper-left corner) that could a priori correspond to said feature F1 of the reference object. The line is constructed from the location of said feature in the observed object in the direction indicated by the table for feature F1 of the reference object under consideration. Feature F2 of the reference object under consideration is detected three times. Accordingly, three lines are constructed in the second bitplane, emanating from the locations where the feature was detected and having the direction indicated for F2 for the considered reference object. By using separate bitplanes for every feature of the reference object the emanating lines can be constructed in parallel, thereby accelerating the detection process. After all lines are constructed, the number of intersections per pixel is determined by counting the number of bitplanes wherein the pixel is intersected by a line.
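The same bookkeeping with one bit-plane per model feature, as in Fig. 6, could look roughly as follows. The helper draw_line (which would set the pixels of a line starting at a given point along a given angle) is hypothetical and passed in rather than defined here; the per-pixel intersection count is simply the number of planes in which the pixel is set.

```python
import numpy as np

def vote_bitplanes(detections, model, shape, draw_line):
    """One boolean plane per model feature F1..Fn (as in Fig. 6); the planes
    are independent, so their lines can be rasterised in parallel."""
    planes = np.zeros((len(model),) + tuple(shape), dtype=bool)
    for i, entry in enumerate(model):
        for x0, y0, t in detections:
            if t == entry["type"]:
                # hypothetical helper: rasterise a line from (x0, y0) in the
                # direction entry["angle"] into plane i
                draw_line(planes[i], x0, y0, entry["angle"])
    counts = planes.sum(axis=0)     # number of planes intersecting each pixel
    return planes, counts
```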
In case the observed object exactly corresponds to the reference object under consideration, the maximum number of intersections found for a pixel is equal to the number of features F1, ..., Fn of the reference object.
The observation of objects may be hampered in practical circumstances. For example, the object may be partially occluded; a traffic sign may, for instance, be partially occluded by a tree or by another vehicle. In an embodiment, detection of partially occluded objects is improved by requiring that the maximum number of intersections found for a pixel is at least a predetermined fraction of the number of features of the reference object under consideration. The fraction may for example be in the range of 0.8 to 0.95. A fraction which is substantially lower than 0.8, e.g. 0.5, could lead to an unacceptable number of false detections.
Another cause that may hamper detection is the relative orientation of the observed object with respect to the observer. In an embodiment of the present invention a number of constructed lines is considered to substantially coincide if said lines cross a common window. This allows for a certain tolerance in the relative orientation. If separate bit-planes are used, such a window can be introduced by carrying out a logical OR operation: a constructed line is considered to intersect a pixel if either said pixel itself is intersected, or another pixel in a window of predetermined size around it. The window may have a size from e.g. 3x3 up to 7x7. The larger the size of the window, the more tolerant the detection becomes to deviations in the relative orientation. Nevertheless, the window should not be too large, as this could result in too high a number of false detections. In some cases a window could also be used wherein the height differs from the width.
Alternatively, the window could be applied at the stage wherein the lines are constructed. In that case, not only each pixel that is intersected by the constructed line is set to indicate intersection, but also the other pixels in a window around said pixel. If the lines are constructed in a common frame buffer, the number of intersections for a pixel may be estimated by integrating the values in a window around the pixel.
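As an illustration of the OR-window described above, the bit-planes could be dilated with a small structuring element before counting. This is only one possible way to realise the tolerance window and assumes the bit-plane layout of the previous sketch.

```python
import numpy as np
from scipy import ndimage

def windowed_counts(planes, win=5):
    """OR each bit-plane over a win x win window (a binary dilation applied
    per plane), so a line only has to pass near a pixel to count as
    intersecting it, then count the planes set at each pixel."""
    structure = np.ones((1, win, win), dtype=bool)   # dilate within each plane only
    dilated = ndimage.binary_dilation(planes, structure=structure)
    return dilated.sum(axis=0)
```

Larger windows tolerate larger deviations in relative orientation, at the cost of more false detections, as noted above.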
Alternatively, the maximum number of intersections in a point could be calculated analytically instead of numerically. In this alternative embodiment, instead of actually constructing the lines, intersections are calculated for each pair of lines. The coordinates of each calculated intersection may be stored in a frequency table, comprising the coordinates for which an intersection was found and the number of times that coordinate was intersected by a line. For each next calculated intersection it is first determined whether one or more intersections were already found at the same coordinates. If so, the frequency for those coordinates is increased by one. Otherwise a new entry with a frequency value of 1 is made in the frequency table for the new coordinates. This analytic calculation has the advantage that less memory is required, as the number of intersections is substantially smaller than the number of pixels. A disadvantage may be, however, that it is more difficult to carry out the calculation in parallel. (A small sketch of this analytic bookkeeping is given after the description of the recognition steps below.)

It is noted that the detection is not limited to 2D objects. The method according to the present invention can be applied analogously to higher-dimensional objects. For example, 3D objects may likewise be characterized by features arranged around a common center of mass for said features. In the same way as shown above, a table can be stored for said objects comprising a number of entries for each feature, wherein each entry contains a characterization of the shape of the feature, e.g. upper left front (ULF), lower left front (LLF), lower right front (LRF), upper right front (URF), upper left back (ULB), lower left back (LLB), lower right back (LRB), upper right back (URB), and further contains an indication of the orientation of the feature relative to the center of mass of the features. In the recognition phase the same steps are taken as in the two-dimensional version of the method.
A digital representation of an image is received, in this case a three- dimensional representation. The three-dimensional representation may be obtained for example by using a stereo camera, or by using images taken from different positions with the same camera, or using images from multiple cameras.
Subsequently local features are detected in the image, in this case the features by which the three dimensional objects are characterized.
Next, the local features are represented as a set of 3D coordinates indicating a position of the local features in the image. The detected local features are compared with the local object features in a model of at least one object, as stored in the table described above for the 3D object.
Now a line is constructed through each local feature, each time the local feature corresponds to a local object feature. The line has a direction equal to the direction from the local object feature to a centre of mass of the local object features. If, for example, an upper left front corner is detected in the image, a line is constructed for every feature in the table that is of type upper left front corner. The constructed lines emanate from the position of the detected feature in a direction corresponding to the direction stored in the table for the feature of the object under consideration.
Finally it is determined whether at least a predetermined number of constructed lines coincides within a window of predetermined size.
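Returning to the analytic alternative mentioned before the three-dimensional discussion, the frequency-table bookkeeping could be sketched as follows. Each constructed line is taken here as an (x, y, angle) triple in the conventions used earlier; note that this toy version counts intersecting line pairs, so k concurrent lines produce k*(k-1)/2 entries at one coordinate and the detection threshold would have to be adapted accordingly. The function names are assumptions.

```python
import math
from collections import Counter

def _intersect(l1, l2, eps=1e-9):
    # intersection of two infinite lines given as (x, y, angle in degrees,
    # counter-clockwise, 0 degrees to the right, image y-axis down)
    (x1, y1, a1), (x2, y2, a2) = l1, l2
    d1x, d1y = math.cos(math.radians(a1)), -math.sin(math.radians(a1))
    d2x, d2y = math.cos(math.radians(a2)), -math.sin(math.radians(a2))
    det = d2x * d1y - d1x * d2y
    if abs(det) < eps:                     # parallel lines: no intersection
        return None
    t = (d2x * (y2 - y1) - d2y * (x2 - x1)) / det
    return x1 + t * d1x, y1 + t * d1y

def intersection_table(lines, tol=1.0):
    """Frequency table of (rounded) pairwise intersection coordinates,
    avoiding a full frame buffer.  `tol` is the rounding grid in pixels."""
    freq = Counter()
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = _intersect(lines[i], lines[j])
            if p is not None:
                freq[(round(p[0] / tol), round(p[1] / tol))] += 1
    return freq                            # freq.most_common(1) gives the peak
```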
Fig. 7 shows an object recognition system comprising a database 10 with information for at least one reference object about the relative position of local features in an image, said local features having a center of mass. The database may store the information for example in the form of Table 1 shown above. Alternatively however, the at least one object may be characterized by other types of features, such as circle segments, textures, colors, etc. The relative position of the features with respect to the center of mass may be indicated by a pair of relative coordinates, instead of by an angle. Instead of relative coordinates absolute coordinates may be used. An image capturing device 20 is included for capturing a digital representation of the image.
A recognition facility 30 is further included for recognizing local features in the captured image. This facility may use known means. By way of example an edge detector 32 is used. The facility comprises a corner detector 34 that detects corners from the intermediary result provided by the edge detector 32. A line constructing facility 40 is further included for constructing a line through a local feature. This constructing facility 40 obtains from the corner detector 34 the position (x,y) and type (t) of the feature detected. The constructing facility 40 subsequently determines if a feature of type t occurs in the database 10 for the reference object (o) currently under consideration. The feature may occur multiple times. For each occurrence the constructing facility 40 constructs a line from the position of said feature in a direction indicated in the database 10 to the center of mass for the feature of the reference object with which it is compared. In the embodiment shown the constructing facility 40 constructs the lines in a frame buffer 50 having at least separate bit-planes for each feature of the reference object. A coincidence detecting facility 60 determines whether at least a predetermined number of lines coincide within a window of predetermined size. If the coincidence detecting facility 60 detects that at least a predetermined number of lines coincide within a window of predetermined size it determines that the observed object is the reference object o in the database 10 currently considered. Otherwise it indicates with signal nxt that the observed object has to be compared with a next reference object in the database 10.
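The overall object-by-object flow of Fig. 7 can then be summarised, purely as an illustrative composition of the earlier sketches; the function and parameter names are again assumptions rather than the terminology of this publication. The loop over the database plays the role of the nxt signal, and frac is the occlusion-tolerance fraction discussed above.

```python
def recognise(image, database, frac=0.85):
    """database maps an object name to its model (a list of {"type", "angle"}
    entries); returns the name of the first matching reference object."""
    detections = detect_corners(image)                        # recognition facility 30
    for name, model in database.items():
        votes = vote_lines(detections, model, image.shape)    # facility 40 + buffer 50
        if votes.max() >= frac * len(model):                  # coincidence detection 60
            return name
    return None                                               # no reference object matched
```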
It is remarked that the scope of protection of the invention is not restricted to the embodiments described herein. For example, the matching process does not have to take place on an object-by-object basis. Alternatively, the matching process is carried out for all objects in parallel. In that case the constructed lines are identified by the object they relate to. So, instead of putting the lines in bitplanes, which is essentially an identifier, each line departing from a feature can be assigned an identifier like (feature point number "k" from object "o"). If in the detection phase many feature points (e.g. 85% of all feature numbers) from object "o" give crossing lines in a certain spot, the object o is detected. For example, the values 1, ..., 255 of an 8-bit red channel may be used to identify the object to which the constructed lines belong, while the values 1, ..., 255 of a green channel indicate the type of detected feature from which the lines depart. Accordingly, each line carries an identifier for the object and for the feature of the object that it represents. In order to detect a particular object, a detection procedure searches whether a sufficient number of lines having the particular red value indicative of said object is concentrated within a window, and whether a sufficient number of features (green values) is represented within that window.
Parts of the system may be implemented in hardware, software or a combination thereof. Neither is the scope of protection of the invention restricted by the reference numerals in the claims. The word 'comprising' does not exclude other parts than those mentioned in a claim. The word 'a(n)' preceding an element does not exclude a plurality of those elements. Means forming part of the invention may be implemented either in the form of dedicated hardware or in the form of a programmed general purpose processor. The invention resides in each new feature or combination of features.

Claims

CLAIMS:
1. Object recognition method comprising the steps of:
- receiving a digital representation of an image,
- detecting local features in the image,
- representing the local features as a set of coordinates indicating a position of the local features in the image,
- comparing the detected local features with local object features in a model of at least one reference object, said model comprising for each of said local object features a type indication and an indication of a direction from said local object feature to a center of mass of the local object features,
- constructing a line through each local feature detected in the digital representation that corresponds to a local feature of the reference object, the line having a direction equal to the direction indicated in the model for said local feature of the reference object,
- determining whether at least a predetermined number of constructed lines coincides within a window of predetermined size.
2. Object recognition method according to claim 1, wherein the detected features are scale invariant features.
3. Object recognition method according to claim 1, wherein a bit-plane is assigned to each feature of the at least one reference object, and wherein the line to be constructed is represented in the bit-plane assigned to the local feature that corresponds to the detected local feature.
4. Object recognition method according to claim 3, wherein the center of mass is constructed as the position that is intersected by a constructed line in the highest number of bit-planes.
5. Object recognition method according to claim 1, wherein the number of constructed lines substantially coincide if said number of lines cross a common window.
6. Object recognition device comprising:
- a database (10) comprising a model of at least one reference object, said model comprising for each of a set of local object features a type indication and an indication of a direction from said local object feature to a center of mass of the local object features,
- a recognition facility (30) for recognizing local features in a captured image,
- a constructing facility (40) for constructing a line through a local feature when a local feature corresponds to a local feature for the at least one reference object, the line having a direction indicated by the model for said local feature for said at least one reference object,
- a coincidence detecting facility (60) for determining whether at least a predetermined number of lines coincide within a window of predetermined size.
7. Object recognition system comprising an image capturing device (20) for capturing a digital representation of an image, and an object recognition device according to claim 6 coupled to the image capturing device for recognizing objects in the captured digital representation.
8. Method for building a model for an object, comprising the steps of:
- receiving a reference image of the object,
- detecting local features in the reference image,
- storing the position and type of the local features in the reference image relative to a center of mass of said features.
PCT/IB2008/051001 2007-03-21 2008-03-17 Object recognition method and device WO2008114206A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07104583 2007-03-21
EP07104583.5 2007-03-21

Publications (2)

Publication Number Publication Date
WO2008114206A2 true WO2008114206A2 (en) 2008-09-25
WO2008114206A3 WO2008114206A3 (en) 2008-12-31

Family

ID=39766571

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/051001 WO2008114206A2 (en) 2007-03-21 2008-03-17 Object recognition method and device

Country Status (1)

Country Link
WO (1) WO2008114206A2 (en)


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ARTURO DE LA ESCALERA ET AL: "Road Traffic Sign Detection and Classification", IEEE Transactions on Industrial Electronics, vol. 44, no. 6, 1 December 1997, XP011023317, ISSN: 0278-0046 *
BALLARD D H: "Generalizing the Hough Transform to detect arbitrary shapes", Pattern Recognition, vol. 13, no. 2, 1981, pages 111-122, XP002195911, ISSN: 0031-3203 *
BEINGLASS A ET AL: "Articulated object recognition, or: how to generalize the generalized Hough transform", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Lahaina, Maui, Hawaii, 3-6 June 1991, pages 461-466, XP010023248, ISBN: 978-0-8186-2148-2 *
DAVID G LOWE: "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, vol. 60, no. 2, November 2004, pages 91-110, XP019216426, ISSN: 1573-1405 *
LOY G ET AL: "Fast shape-based road sign detection for a driver assistance system", Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), Sendai, Japan, 28 September - 2 October 2004, vol. 1, pages 70-75, XP010765785, ISBN: 978-0-7803-8463-7 *
WONG K C ET AL: "Recognition of two dimensional objects based on a novel generalized Hough transform method", Proceedings of the International Conference on Image Processing (ICIP), Washington, 23-26 October 1995, vol. 3, pages 376-379, XP010197200, ISBN: 978-0-7803-3122-8 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197538A (en) * 2017-12-21 2018-06-22 浙江银江研究院有限公司 A kind of bayonet vehicle searching system and method based on local feature and deep learning
CN108197538B (en) * 2017-12-21 2020-10-16 浙江银江研究院有限公司 Bayonet vehicle retrieval system and method based on local features and deep learning

Also Published As

Publication number Publication date
WO2008114206A3 (en) 2008-12-31

Similar Documents

Publication Publication Date Title
EP2811423B1 (en) Method and apparatus for detecting target
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
US7409092B2 (en) Method and apparatus for the surveillance of objects in images
US11288820B2 (en) System and method for transforming video data into directional object count
CN103390269B (en) Continuous lane segmentation object detecting method and device
US8406472B2 (en) Method and system for processing image data
EP2874097A2 (en) Automatic scene parsing
CN112102409B (en) Target detection method, device, equipment and storage medium
WO2017171659A1 (en) Signal light detection
JP2007527569A (en) Imminent collision detection based on stereoscopic vision
WO2013101800A1 (en) Camera calibration using feature identification
CN107396037B (en) Video monitoring method and device
US20100079453A1 (en) 3D Depth Generation by Vanishing Line Detection
CN103810475A (en) Target object recognition method and apparatus
WO2017206042A1 (en) Method and apparatus for seeing through obstruction using smart glasses
CN106803262A (en) The method that car speed is independently resolved using binocular vision
CN110619674A (en) Three-dimensional augmented reality equipment and method for accident and alarm scene restoration
Poostchi et al. Semantic depth map fusion for moving vehicle detection in aerial video
JP3577875B2 (en) Moving object extraction device
Kanhere et al. Vehicle segmentation and tracking in the presence of occlusions
EP3035242A1 (en) Method and electronic device for object tracking in a light-field capture
CN108399360A (en) A kind of continuous type obstacle detection method, device and terminal
Aarthi et al. Vehicle detection in static images using color and corner map
CN110673607A (en) Feature point extraction method and device in dynamic scene and terminal equipment
CN110675442A (en) Local stereo matching method and system combined with target identification technology

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08719736

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08719736

Country of ref document: EP

Kind code of ref document: A2