WO2015014882A1 - Method for detecting a target object by clustering of characteristic features of an image, camera system and motor vehicle

Info

Publication number: WO2015014882A1
Application number: PCT/EP2014/066349
Authority: WO
Inventors: James Mcdonald; John Mcdonald
Original assignee: Connaught Electronics Ltd.
Priority date: 2013-07-31
Filing date: 2014-07-30
Publication date: 2015-02-05
Also published as: DE102013012780A1

Abstract

The invention relates to a method for detecting a target object in an environmental region of a motor vehicle based on an image of the environmental region, which is provided by means of a camera of the motor vehicle, wherein characteristic features are identified in the image and the target object is detected depending on the characteristic features by means of an evaluation device of the motor vehicle, wherein a plurality of image nodes (11) is defined in the image, and a local feature density of the characteristic features around the respective image node (11) is determined to each image node (11), wherein the detection of the target object includes that several of the image nodes (11) are combined to a cluster (18) representing the target object depending on the respective feature density.

Description

Method for detecting a target object by clustering of characteristic features of an image, camera system and motor vehicle

The invention relates to a method for detecting a target object in an environmental region of a motor vehicle based on an image of the environmental region, which is provided by means of a camera of the motor vehicle, wherein characteristic features are identified in the image and the target object is detected depending on the characteristic features by means of an evaluation device of the motor vehicle. In addition, the invention relates to a camera system for performing such a method as well as to a motor vehicle with such a camera system.

Camera systems for motor vehicles are already known from the prior art. As is known, a camera system includes at least one camera, which is attached to the motor vehicle and captures an environmental region of the motor vehicle. Several such cameras can also be employed, which capture the entire environment around the motor vehicle. The camera mounted on the motor vehicle provides a temporal sequence of images of the environmental region, namely a plurality of images per second. This image sequence is then communicated to an electronic evaluation device, which processes the captured images and is able to provide a wide variety of functionalities in the motor vehicle based on the images. Presently, the interest is directed to the detection of target objects located in the depicted environmental region. If a target object is detected in the images, this target object can be tracked in the sequence of images. For this purpose, in the prior art, the optical flow method is usually used, in which characteristic features such as, for example, edges and/or corners are detected in the images and a flow vector is calculated for each characteristic feature, which specifies the direction of movement and the speed of movement of the characteristic feature in the sequence of images.

In the detection of target objects in the camera images, there is a great challenge in differentiating between characteristic features, which are associated with different target objects. In order to ensure this differentiation, usually, a so-called clustering is effected such that several characteristic features are combined to a cluster, which represents a detected target object. Therein, various algorithms are known, which serve for clustering. In this context, for example a hierarchical clustering and a so-called divisive clustering method as well as an agglomerative clustering method within the scope of the hierarchical clustering are known. In the divisive clustering method, all of the characteristic features are first considered as associated with one cluster and then the already formed clusters are incrementally divided into increasingly smaller clusters. The agglomerative clustering method in turn includes that first each characteristic feature constitutes a cluster and then the already formed clusters are incrementally combined to increasingly larger clusters. Apart from that, a plurality of other clustering methods is also known, which, however, all have in common that they require a relatively large computational effort and thus need computationally powerful evaluation devices. Namely, the known clustering methods work directly on the characteristic features and evaluate them in order to divide the features into corresponding clusters. However, since the number of the characteristic features can usually be relatively great, a correspondingly great computational power is required to allow the grouping of the characteristic features. However, in particular in embedded systems in motor vehicles, such computational power is available only to a limited extent. The known algorithms additionally have the disadvantage that they cannot be applied to all types of characteristic features. Namely, some types of characteristic features have descriptive data or so-called descriptors, on which the use of the known clustering methods is not possible. Some of the known algorithms additionally require a priori assumptions and thus are correspondingly imprecise.

It is an object of the invention to provide a method for detecting a target object improved over the prior art, a camera system as well as a motor vehicle.

According to the invention, this object is solved by a method, by a camera system as well as by a motor vehicle having the features according to the respective independent claims. Advantageous implementations of the invention are the subject matter of the dependent claims, of the description and of the figures.

A method according to the invention serves for detecting a target object in an environmental region of a motor vehicle based on an image of the environmental region. The image is generated by a camera of the motor vehicle and processed by means of an evaluation device. The evaluation device extracts characteristic features from the image. The evaluation device then detects the target object depending on the characteristic features. A plurality of image nodes is defined in the image, and a local feature density of the characteristic features around the respective image node is determined to each image node. The detection of the target object includes that several of the image nodes are combined to a cluster representing the target object depending on the respective feature density. Thus, according to the invention, it is provided that the characteristic features are only indirectly combined to the cluster such that not the characteristic features themselves, but the image nodes are combined to the cluster depending on the respective local feature density. Instead of combining the characteristic features themselves to a cluster, thus, clustering of the image nodes is effected. This considerably reduces the required computational power compared to the prior art. Namely, the clustering is effected in deterministic manner since the number and position of the image nodes can be fixedly preset. In addition, the accuracy of the clustering is increased thereby compared to stochastic methods. The reduction of the computational effort is furthermore favored in that the proposed method is a so-called "single pass" algorithm or "one pass" algorithm. Namely, in the method according to the invention, the detected characteristic features only have to be taken into account once in order to determine the feature density at the respective nodes. Intermediate storage of the characteristic features is therefore not required. The method according to the invention can thus be implemented particularly advantageously in an embedded system, which proves advantageous in particular in motor vehicles.

Generally, different methods can be applied in order to extract the characteristic features from the image. For example, the so-called Harris points can be detected, or further methods such as SIFT, SURF, ORB or the like can also be used.

Generally speaking, the distances between adjacent image nodes can be identical such that the image nodes are defined according to a unitary raster.

In an embodiment, it is provided that at least one image region of the image is divided in a plurality of image cells for defining the image nodes and the image nodes are defined by respective corners of the image cells. Therein, the image cells can be rectangular, in particular square cells. Thereby, image nodes can be provided, which are disposed in rows and columns, whereby the effort in clustering is further reduced. Preferably, all of the image cells have the same size, for example 10x10 pixels. If only a region of the image is used, thus, for example 32x32 image cells in total can be provided, which each have a size of 10x10 pixels. However, the invention is not restricted to this size, the number and the size of the image cells can basically be arbitrarily selected.
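The definition of the image nodes as corners of equally sized square image cells can be sketched as follows. This is a minimal illustration only: the function name `define_image_nodes`, the origin parameters `x0` and `y0` and the list-of-rows layout are assumptions made for the example and are not part of the disclosure. The default values reproduce the example of 32x32 image cells of 10x10 pixels each.

```python
def define_image_nodes(x0, y0, cells_x=32, cells_y=32, cell_size=10):
    """Return the node positions as rows of (x, y) pixel tuples.

    The nodes are the corners of the image cells, so a region divided
    into cells_x by cells_y cells has (cells_x + 1) by (cells_y + 1) nodes.
    (x0, y0) is the assumed top-left pixel of the image region.
    """
    return [
        [(x0 + col * cell_size, y0 + row * cell_size)
         for col in range(cells_x + 1)]
        for row in range(cells_y + 1)
    ]


nodes = define_image_nodes(x0=0, y0=0)
print(len(nodes), len(nodes[0]))    # number of node rows and node columns
print(nodes[0][0], nodes[-1][-1])   # first and last node position
```

With the default values, the region is covered by 33 rows and 33 columns of image nodes, since each row or column of cells contributes one more corner than it has cells.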

The respective feature density can be indicated to each image node by a number of characteristic features, which are detected in a predetermined area around the respective image node. In this manner, a parameter is defined to each image node, namely the number of characteristic features, which are associated with this image node. The evaluation of the image nodes is therefore particularly low-effort, since each image node is only defined by the number of the associated characteristic features besides its position in the image frame.

Very generally, it can be provided that the cluster is formed of image nodes adjacent to each other, the feature density of which is greater than a preset threshold value. Thus, the clustering can be performed without much effort. It is only sufficient that image nodes are found, the feature density of which is greater than the threshold value, and adjacent image nodes satisfying this criterion can be combined to a common cluster. The threshold value can for example be in a range of values from 1 to 5, which means that only those image nodes are considered, with which a number of at least 1 to 5 of characteristic features is associated.
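The general idea of combining adjacent image nodes above a threshold value can be illustrated with a generic grouping of neighbouring nodes. The sketch below uses a plain breadth-first flood fill over the four direct neighbours, which is a swapped-in standard technique chosen for brevity and is not the "snake" algorithm of the description. The function name `group_adjacent_nodes`, the 4-neighbour adjacency and the representation of the density map as a list of rows are assumptions.

```python
from collections import deque


def group_adjacent_nodes(density, threshold=1):
    """Group adjacent nodes whose feature density exceeds the threshold.

    density is a list of rows of feature counts, one count per image node.
    Returns a list of clusters, each a set of (row, col) node indices.
    """
    rows, cols = len(density), len(density[0])
    seen = set()
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if density[r][c] <= threshold or (r, c) in seen:
                continue
            # Grow a new cluster from this node over its direct neighbours.
            cluster, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                cluster.add((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and density[nr][nc] > threshold):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            clusters.append(cluster)
    return clusters


example = [
    [0, 3, 0, 0],
    [0, 2, 0, 4],
    [0, 0, 0, 5],
]
print(group_adjacent_nodes(example, threshold=1))
```

In this example two separate groups of nodes result, because the nodes in the second column and those in the fourth column are not adjacent to each other.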

It has proven particularly advantageous if a "snake" algorithm is used for clustering in order to find those image nodes, which are to be combined to a common cluster. Such a "snake" algorithm is described in more detail below:

The image nodes can be organized in rows and columns such that several strings of image nodes are respectively defined along two orthogonal image axes of the image - for example along the x axis as well as the y axis of the image. The combination of the several image nodes to the cluster can include that starting from a starting node, in the string thereof (in rows or in columns), first it is proceeded to further image nodes one after the other in a first examination direction in order to examine these image nodes one after the other to the effect if the feature density thereof satisfies a predetermined criterion. Those image nodes, which satisfy the criterion, are combined to the cluster. If an image node is detected, on which the criterion is not satisfied, it is proceeded to an adjacent string (adjacent row or column) in order to examine the image nodes one after the other for the predetermined criterion in this adjacent string in a second examination direction opposite to the first examination direction. In this manner, the image nodes are successively examined in meander-shaped manner to the effect whether or not they satisfy a preset criterion with respect to the feature density. Those image nodes, which satisfy the criterion, then constitute the cluster. Such an approach has the advantage that those image nodes satisfying the criterion can be found particularly fast. This algorithm thus proves for example advantageous if the depicted scene changes relatively fast and the detected target object for example moves relatively fast. Furthermore, the density of the image nodes in the image and thus also the number of the image nodes can be varied without influencing the reliability of the "snake" algorithm.

The mentioned criterion can for example include that the feature density of the respective image node is greater than a preset threshold value. Preferably, this can be the above already mentioned threshold value, which can be in a range of values from 1 to 5.

The starting node, at which the algorithm begins, can basically be arbitrarily selected. However, it proves advantageous if the starting node is selected depending on the feature density and/or depending on a position in the image. For example, the image node with the greatest feature density or the first or the last image node or else a randomly determined node can be selected as the starting node.

The first examination direction can also be determined depending on the feature density of the image nodes adjacent to the starting node. In particular, this means that it is determined depending on the feature density of the nodes adjacent to the starting node if the "snake" algorithm is to be performed in columns or in rows. In this manner, the search for the image node of a common cluster can be further optimized.

A camera system according to the invention for a motor vehicle includes a camera for providing an image of an environmental region of the motor vehicle as well as an electronic evaluation device formed for performing a method according to the invention.

A motor vehicle according to the invention, in particular a passenger car, includes a camera system according to the invention.

The preferred embodiments presented with respect to the method according to the invention and the advantages thereof correspondingly apply to the camera system according to the invention as well as to the motor vehicle according to the invention.

Further features of the invention are apparent from the claims, the figures and the description of figures. All of the features and feature combinations mentioned above in the description as well as the features and feature combinations mentioned below in the description of figures and/or shown in the figures alone are usable not only in the respectively specified combination, but also in other combinations or else alone. Now, the invention is explained in more detail based on a preferred embodiment as well as with reference to the attached drawings.

There show:

Fig. 1 in schematic illustration a motor vehicle with a camera system according to an embodiment of the invention;

Fig. 2 to 6 an exemplary image of an environmental region of the motor vehicle, wherein a method according to an embodiment of the invention is explained in more detail; and

Fig. 7 to 10 schematic illustrations for explaining the method.

In Fig. 1, a motor vehicle 1 according to an embodiment is shown in schematic illustration. The motor vehicle 1 is for example a passenger car. It includes a camera system 2 having a camera 3. The motor vehicle 1 is on a road 4. On the road 4, there is additionally a target object 5, that is an obstacle. The camera system 2 is for example a collision warning system and serves for warning the driver of the presence of the target object 5 in an environmental region 6 of the motor vehicle 1. The camera system 2 thus serves for detecting the target object 5 and preferably also for tracking the target object 5 in the environmental region 6. Therein, images are captured by means of the camera 3, which are then processed by means of an electronic evaluation device (signal processor) not illustrated in more detail. The evaluation device receives the captured images and processes them. The evaluation device can be integrated in the camera 3 or it can be a component of the camera system 2 separate from the camera 3. Optionally, the images can also be displayed on a display in the motor vehicle 1, wherein the detection of the target object 5 can for example be effected to the effect that the target object 5 is provided with a border in the images.

For example, the camera 3 has a capturing angle or opening angle, which may be in a range of values from 90° to 200°. The camera 3 can be a CMOS camera or a CCD camera or any image capturing device, which is formed for detecting light in the visible spectral range. The camera 3 is preferably a video camera continuously providing a sequence of images. The electronic evaluation device then processes the image sequence in real time and can detect and track the target object 5 based on this image sequence.

In the embodiment according to Fig. 1, the camera 3 is disposed in the rear region of the motor vehicle 1 and captures an environmental region 6 behind the motor vehicle 1. The camera 3 can for example be disposed on the bumper or else on a tailgate. However, the invention is not restricted to such an arrangement of the camera 3; the arrangement of the camera 3 can be different according to embodiment. For example, the camera 3 or an additional camera can also be disposed in a front area of the motor vehicle 1 and/or for example be integrated in a side-view mirror. Plural cameras 3 can also be employed, which each capture a separate environmental region 6.

In Fig. 2 an image region 7 of an image overall denoted by 8 is illustrated, which was provided by the camera 3. In the image 8, the environmental region 6 of the motor vehicle 1 is depicted. As is apparent from Fig. 2, two target objects 5a, 5b in total are located in the environmental region, which are to be detected by the evaluation device based on the image 8. For this purpose, a grid 9 is defined in the image region 7 according to a preset raster, as shown in Fig. 3. By this grid 9, the image region 7 is divided in a plurality of square image cells 10 of the same size such that image nodes 11 are defined, which are constituted by corners of the image cells 10. Thus, the image nodes 11 represent nodes of the grid 9. In the embodiment, for example, a grid with 32x32 image cells 10 is defined, which each can have a size of 10x10 pixels.

Thus, the image nodes 11 are organized in columns 12 as well as in rows 13. The rows 13 are defined along the x axis of the image 8; the columns 12 are defined along the y axis of the image 8. The columns 12 and rows 13 overall constitute strings of image nodes 11. Thus, there are strings of image nodes 11 both along the x axis and the y axis.

At least in the image region 7, the evaluation device determines so-called characteristic features, which are extracted from the image 8. For determining the characteristic features, the methods already known from the prior art can be used. Then, a local feature density of the characteristic features is determined to each image node 11.

The determination of the feature density to each image node 11 is now explained in more detail with reference to Fig. 7: In Fig. 7, four image nodes 11a, 11b, 11c, 11d in total are illustrated, which are located in the following positions of the image 8: (xA, yA), (xB, yB), (xC, yC) and (xD, yD). The image nodes 11a, 11b, 11c, 11d thus bound an individual image cell 10, the center point of which is in the position (xP, yP). By the evaluation device, four characteristic features 14 in total are detected in the image cell 10, namely in the following image positions: (x1, y1), (x2, y2), (x3, y3) and (x4, y4). In order to determine the local feature density, the image cell 10 is divided in four quadrants Q1, Q2, Q3 and Q4, i.e. four identical (here square) image regions. If a characteristic feature 14 is detected in the first quadrant Q1, this characteristic feature 14 is associated with the image node 11b. If a characteristic feature 14 is detected in the second quadrant Q2, this feature 14 is associated with the image node 11a. Those features 14 located in the third quadrant Q3 are associated with the image node 11c. Finally, features located in the fourth quadrant Q4 are associated with the image node 11d. The same method is performed for all of the image cells 10. In other words, those features 14 located in a predetermined area around the respective image node 11 are associated with this image node 11.

At each image node 11, the number of the associated characteristic features 14 is counted. This number of features 14 then represents the feature density at the respective image node 11. Thus, the number of associated characteristic features 14 is determined to each image node 11.
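The counting of features per image node can be sketched as a single pass over the detected features. The sketch assumes that each quadrant of an image cell is assigned to the cell corner it touches, so that every feature is counted at its nearest image node; the assignment of quadrant numbers to particular corners in Fig. 7 is not reproduced. The function name `feature_density_map` and its parameters are hypothetical.

```python
import math


def feature_density_map(features, x0, y0, cells_x, cells_y, cell_size):
    """Count, for every image node, the features in the quadrants around it.

    features is an iterable of (x, y) pixel positions; (x0, y0) is the
    assumed top-left pixel of the image region. Each feature is visited
    exactly once and is not stored.
    """
    density = [[0] * (cells_x + 1) for _ in range(cells_y + 1)]
    for x, y in features:
        # Rounding to the nearest node index selects the corner whose
        # quadrant contains the feature.
        col = math.floor((x - x0) / cell_size + 0.5)
        row = math.floor((y - y0) / cell_size + 0.5)
        if 0 <= row <= cells_y and 0 <= col <= cells_x:
            density[row][col] += 1
    return density


features = [(1, 1), (4, 4), (9, 1), (6, 6), (19, 19)]
for row in feature_density_map(features, 0, 0, cells_x=2, cells_y=2, cell_size=10):
    print(row)
```

Because each feature only increments one counter, the effort grows linearly with the number of features, which is in line with the single-pass property stated for the method.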

Thus, overall, a density map of the characteristic features 14 is provided, as it is schematically illustrated in Fig. 8. Here, the number of associated characteristic features 14 is indicated to each image node 11. In order to detect the two target objects 5a, 5b, now, clustering of the image nodes 11 is performed depending on the respective feature density. For this purpose, a so-called "snake" algorithm is used.

First, one or more starting nodes are selected for this algorithm. For example, this can be that image node 11, which has the greatest feature density in a certain image region. In the embodiment according to Fig. 8, three such starting nodes 15a, 15b as well as 15c in total are selected. The three starting nodes 15a, 15b, 15c locally each have the maximum feature density.
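One possible way of selecting starting nodes with locally maximum feature density is sketched below. The choice of the eight surrounding nodes as the local neighbourhood, the strict comparison and the lower bound `min_density` are assumptions for illustration; the description does not fix how the local region is delimited.

```python
def select_starting_nodes(density, min_density=1):
    """Return the (row, col) indices of nodes that are strict local maxima.

    A node qualifies if its feature count exceeds min_density and is
    greater than the count of every one of its up to eight neighbours.
    """
    rows, cols = len(density), len(density[0])
    starts = []
    for r in range(rows):
        for c in range(cols):
            value = density[r][c]
            if value <= min_density:
                continue
            neighbours = [
                density[nr][nc]
                for nr in range(max(r - 1, 0), min(r + 2, rows))
                for nc in range(max(c - 1, 0), min(c + 2, cols))
                if (nr, nc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                starts.append((r, c))
    return starts


density = [
    [0, 1, 0, 0, 0],
    [1, 8, 2, 0, 0],
    [0, 2, 1, 3, 9],
    [0, 0, 0, 2, 4],
]
print(select_starting_nodes(density))
```

Because of the strict comparison, a plateau of equal maximum values yields no starting node in this sketch; a practical variant would need a tie-breaking rule.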

The "snake" algorithm is now separately executed for each starting node 15a, 15b, 15c.

For general explanation of the algorithm, now, reference is made to Fig. 9: here, the selected starting node is denoted by 15, the other image nodes by 11. Starting from the starting node 15, first, a first examination direction 16 is determined, in which the examination of the image nodes 11 is to occur starting from the starting node 15. This examination can be effected in rows or in columns. In the current string of the starting node 15, the image nodes 11 are now examined one after the other (in the first examination direction) to the effect if the feature density thereof corresponds to a preset criterion, for example is greater than a preset threshold value such as greater than 1. As illustrated in Fig. 9, it is incrementally proceeded to an image node 11x. The image node 11x satisfies the above mentioned criterion. However, the evaluation device determines that the next image node 11 in the same row 13a does not satisfy the mentioned criterion. Thus, it is proceeded to the next adjacent row 13b. Therein, in the next row 13b, first an image node 11m is examined in the next row 13b, which is in a preset distance from the node 11x in the next row 13b. If this image node 11m does not satisfy the criterion, thus, it is proceeded in a second examination direction 17 opposite to the first examination direction 16, and all of the image nodes 11 of the next row 13b are examined one after the other for the criterion. In the embodiment according to Fig. 9, only an image node 11n satisfies the criterion. The examination in the second examination direction 17 is effected until an image node 11 is again found, which does not satisfy the criterion. As is apparent from Fig. 9, the image node 11y is the last node in the row 13b, which satisfies the criterion. Then, the algorithm moves to a further row 13c, in which the method is repeated.

Starting from the starting node 15, the "snake" algorithm is also performed in the opposite direction. Here too, the examination according to Fig. 9 is effected in rows.
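The meander-shaped examination can be sketched for the row-wise case as follows. This is a simplified interpretation under several assumptions and not a definitive implementation: the parameter `overshoot` stands in for the preset distance at which the next row is entered, failing nodes at the entry of a row are skipped only within a window spanning the previous run plus `overshoot`, and the pass in the opposite direction is modelled as a second sweep that starts with the reversed examination direction and proceeds to the rows on the other side of the starting node. All names are hypothetical.

```python
def snake_cluster(density, start, threshold=1, overshoot=1):
    """Collect nodes around start by a row-wise meander ("snake") sweep.

    density is a list of rows of feature counts; start is (row, col).
    Returns the set of (row, col) nodes whose count exceeds threshold
    and that were reached by the sweep.
    """
    n_rows, n_cols = len(density), len(density[0])

    def passes(r, c):
        return 0 <= r < n_rows and 0 <= c < n_cols and density[r][c] > threshold

    cluster = set()
    if not passes(*start):
        return cluster
    # First sweep: rightwards, then downwards; second sweep: mirrored.
    for row_step, first_direction in ((1, 1), (-1, -1)):
        r, c = start
        direction = first_direction
        lo = hi = c  # column window in which a run may begin in this row
        while 0 <= r < n_rows:
            # Skip failing nodes at the row entry, but only inside the window.
            while lo <= c <= hi and not passes(r, c):
                c += direction
            if not lo <= c <= hi:
                break  # no qualifying node near the previous run
            run_start = c
            while passes(r, c):
                cluster.add((r, c))
                c += direction
            run_end = c - direction  # last node that satisfied the criterion
            lo = min(run_start, run_end) - overshoot
            hi = max(run_start, run_end) + overshoot
            # Enter the adjacent row slightly beyond the last qualifying node
            # and examine it in the opposite direction.
            c = run_end + direction * overshoot
            r += row_step
            direction = -direction
    return cluster


density = [
    [0, 0, 0, 0, 0, 0],
    [0, 3, 4, 2, 0, 0],
    [0, 0, 5, 6, 3, 0],
    [0, 2, 3, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(sorted(snake_cluster(density, start=(2, 3), threshold=1)))
```

In this example all eight nodes with a feature count greater than 1 are reached from the starting node in row 2, column 3. Each sweep visits every row at most once, so the effort is bounded by the number of image nodes.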

The image nodes 11 illustrated in Fig. 9, which satisfy the criterion, are then combined to a cluster 18.

With reference again to Fig. 8, in which the scene corresponding to the image of Fig. 2 is shown, three clusters 18a, 18b, 18c in total are detected. For detecting the first cluster 18a, the image nodes 11 are examined in rows starting from the starting node 15a.

Therein, the selection of the first examination direction 16 can be performed depending on the feature density of those image nodes 11, which are adjacent to the starting node 15a. As is apparent from Fig. 8, the feature density is maximally equal to 7 at the image nodes 11 adjacent to the starting node 15a, namely at the right image node 11. For this reason, the first examination direction 16 is selected as horizontal. Thus, it is examined in rows if the image nodes 11 satisfy the above mentioned criterion.
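The choice between an examination in rows and an examination in columns can be sketched by comparing the four direct neighbours of the starting node. The function name `first_examination_direction`, the return values `"rows"` and `"columns"` and the preference for rows in case of a tie are assumptions made for this example.

```python
def first_examination_direction(density, start):
    """Choose row-wise or column-wise examination for a starting node.

    The direction follows the direct neighbour with the greatest feature
    count: a left or right neighbour selects "rows", an upper or lower
    neighbour selects "columns". Ties are resolved in favour of "rows".
    """
    rows, cols = len(density), len(density[0])
    r, c = start

    def value(nr, nc):
        # Neighbours outside the grid never win the comparison.
        return density[nr][nc] if 0 <= nr < rows and 0 <= nc < cols else -1

    horizontal = max(value(r, c - 1), value(r, c + 1))
    vertical = max(value(r - 1, c), value(r + 1, c))
    return "rows" if horizontal >= vertical else "columns"


density = [
    [0, 2, 0],
    [1, 9, 7],
    [0, 3, 0],
]
print(first_examination_direction(density, start=(1, 1)))
```

Here the right-hand neighbour has the greatest count of 7, so the examination is carried out in rows, analogous to the situation described for the starting node 15a.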

At the second starting node 15b, a vertical direction is selected as the first examination direction 16. With the second cluster 18b, thus, the image nodes 11 are examined in columns. With the third cluster 18c, the examination is again effected in rows. If plural clusters 18a, 18b, 18c are found, as shown in Fig. 8, thus, adjacent clusters can also be combined to an overall cluster if preset criteria are satisfied. These criteria can for example include the size of the clusters 18 and/or the feature density within the respective cluster 18 and/or the position within the image region 7 and/or a distance between two adjacent clusters 18 and/or the speed of the clusters 18, with which the respective cluster 18 moves over the image sequence, and/or the color of the pixels. In the embodiment according to Fig. 8, the two clusters 18a and 18b are combined to a common cluster. This overall cluster 18a, 18b then corresponds to the target object 5a according to Fig. 2. The cluster 18c corresponds to the target object 5b according to Fig. 2.
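The subsequent combination of adjacent clusters can be sketched using only one of the listed criteria, namely the distance between two clusters. The sketch measures the distance in node steps (Chebyshev distance between the closest pair of nodes); this measure, the parameter `max_gap` and the function name `merge_close_clusters` are assumptions. The other criteria, such as size, speed or color, are not modelled.

```python
def merge_close_clusters(clusters, max_gap=1):
    """Merge clusters whose closest nodes are at most max_gap steps apart.

    clusters is a list of non-empty sets of (row, col) node indices.
    Returns a new list of merged sets; the input is left unchanged.
    """
    merged = [set(cluster) for cluster in clusters]

    def gap(a, b):
        # Smallest Chebyshev distance between any node of a and any node of b.
        return min(max(abs(ra - rb), abs(ca - cb)) for ra, ca in a for rb, cb in b)

    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if gap(merged[i], merged[j]) <= max_gap:
                    merged[i] |= merged.pop(j)
                    changed = True
                    break
            if changed:
                break  # restart the scan because the list was modified
    return merged


clusters = [{(0, 0), (0, 1)}, {(1, 2)}, {(5, 5)}]
print(merge_close_clusters(clusters, max_gap=1))
```

In the example the first two clusters touch diagonally and are combined, while the third cluster remains separate, comparable to the combination of the clusters 18a and 18b with the cluster 18c staying on its own.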

The combination of the two clusters 18a, 18b to a common cluster is furthermore illustrated in Fig. 10. The image nodes 11 associated with the common cluster 18a, 18b are marked with "1". The image nodes 11 of the second cluster 18c are denoted by "2".

With reference again to Fig. 8, it is mentioned that the number of the found clusters 18a, 18b, 18c is also dependent on the above mentioned threshold value and thus on the criterion, which is taken as a basis for the "snake" algorithm. If the threshold value is set to zero, thus, the algorithm will find a single cluster instead of the separate clusters 18a, 18b, which corresponds to the overall cluster 18a, 18b. The selection of the threshold value thus represents a compromise between the size of the detected clusters 18 on the one hand and possible errors in the combination of clusters 18 on the other hand. The combination of several image clusters 18 to a common cluster is therefore preferably performed afterwards after the algorithm has found several clusters 18.

As is illustrated in Fig. 5, a border 19 (so-called "bounding box") can be defined around the common cluster 18a, 18b. This border 19 then represents the target object 5a in the digital range. A corresponding border 19 can also be defined for the cluster 18c and then represents the target object 5b. The two borders 19 are illustrated in Fig. 6 without the clusters 18.
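The border around a cluster can be derived from the extreme node positions of the cluster. The following sketch assumes that a cluster is given as a set of (row, col) node indices on a grid with square cells, and that the border is returned in pixel coordinates as (x_min, y_min, x_max, y_max); the function name `bounding_box` and the parameters `x0`, `y0` and `cell_size` are hypothetical.

```python
def bounding_box(cluster, x0=0, y0=0, cell_size=10):
    """Return the pixel border (x_min, y_min, x_max, y_max) of a cluster.

    cluster is a non-empty set of (row, col) node indices; (x0, y0) is the
    assumed top-left pixel of the image region covered by the grid.
    """
    rows = [r for r, _ in cluster]
    cols = [c for _, c in cluster]
    return (
        x0 + min(cols) * cell_size,
        y0 + min(rows) * cell_size,
        x0 + max(cols) * cell_size,
        y0 + max(rows) * cell_size,
    )


print(bounding_box({(1, 1), (1, 3), (3, 2)}))
```

The border is tight with respect to the node positions; whether it should be widened by half a cell to cover the area associated with the outermost nodes is left open here.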

Claims

1. Method for detecting a target object (5) in an environmental region (6) of a motor vehicle (1) based on an image (8) of the environmental region (6), which is provided by means of a camera (3) of the motor vehicle (1), wherein characteristic features (14) are identified in the image (8) and the target object (5) is detected depending on the characteristic features (14) by means of an evaluation device of the motor vehicle (1),

characterized in that

a plurality of image nodes (11) is defined in the image (8), and that a local feature density of the characteristic features (14) around the respective image node (11) is determined to each image node (11), wherein the detection of the target object (5) includes that several of the image nodes (11) are combined to a cluster (18) representing the target object (5) depending on the respective feature density.

2. Method according to claim 1,

characterized in that

the definition of the image nodes (11) includes that at least one image region (7) of the image (8) is divided in a plurality of, in particular rectangular, preferably square, image cells (10) and the image nodes (11) are defined by respective corners of the image cells (10).

3. Method according to claim 1 or 2,

characterized in that

to each image node (11), the respective feature density is indicated by a number of characteristic features (14), which are detected in a predetermined area around the respective image node (11).

4. Method according to any one of the preceding claims,

characterized in that

the cluster (18) is formed of image nodes (11) adjacent to each other, the feature density of which is greater than a preset threshold value.

5. Method according to any one of the preceding claims,

characterized in that

the image nodes (11) are organized in rows (13) and columns (12) such that several strings of image nodes (11) are respectively defined along two orthogonal image axes (x, y) of the image (8), wherein the combination of the several image nodes (11) to the cluster (18) includes that starting from a starting node (15) in the string (13a) thereof, first it is proceeded to further image nodes (11) one after the other in a first examination direction (16) and these image nodes (11) are examined one after the other to the effect if the feature density thereof satisfies a predetermined criterion, and after finding an image node (11), which does not satisfy the predetermined criterion with respect to the feature density, it is proceeded to an adjacent string (13b), in order to examine the image nodes (11) one after the other for the predetermined criterion in this adjacent string (13b) in a second examination direction (17) opposite to the first examination direction (16), wherein those image nodes (11) are combined to the cluster (18), which satisfy the predetermined criterion.

6. Method according to claim 5,

characterized in that

the predetermined criterion includes that the feature density of the respective image node (11) is greater than a preset threshold value.

7. Method according to claim 5 or 6,

characterized in that

the starting node (15) is selected depending on the feature density and/or depending on a position in the image (8).

8. Method according to any one of claims 5 to 7,

characterized in that

the first examination direction (16) is determined depending on the feature density of the image nodes (11) adjacent to the starting node (15).

9. Camera system (2) for a motor vehicle (1) including a camera (3) for providing an image (8) of an environmental region (6) of the motor vehicle (1) and including an evaluation device adapted to perform a method according to any one of the preceding claims.

10. Motor vehicle (1) with a camera system (2) according to claim 9.