WO2019110824A1 - Using silhouette for fast object recognition - Google Patents

Using silhouette for fast object recognition

Info

Publication number
WO2019110824A1
WO2019110824A1 · PCT/EP2018/084035 · EP2018084035W
Authority
WO
WIPO (PCT)
Prior art keywords
image
object recognition
silhouette
objects
separated
Prior art date
Application number
PCT/EP2018/084035
Other languages
French (fr)
Inventor
Sylvain Bougnoux
Original Assignee
Imra Europe S.A.S.
Priority date
Filing date
Publication date
Application filed by Imra Europe S.A.S. filed Critical Imra Europe S.A.S.
Priority to JP2020528326A priority Critical patent/JP7317009B2/en
Publication of WO2019110824A1 publication Critical patent/WO2019110824A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2323Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An object recognition method comprising the steps of obtaining an image from an image sensor and a 3D point cloud from a depth sensor; synchronizing the image and the 3D point cloud; 3D point clustering to separate objects from the 3D point cloud; extracting silhouettes by segmentation of the image using the 3D point clustering, and contour detection of the separated objects into the segmented image; recognizing silhouette by transforming each detected contour into a silhouette descriptor, and classifying these silhouette descriptors into recognized objects using a trained neural network for object recognition.

Description

USING SILHOUETTE FOR FAST OBJECT RECOGNITION
FIELD OF THE INVENTION
The present invention generally relates to machine learning techniques using neural networks. In particular, the present invention relates to an object recognition method using silhouette detection.
Such a method is especially useful in the field of human-assisted or autonomous vehicles using sensors for obstacle detection and avoidance, to navigate safely through their environment.
BACKGROUND OF THE INVENTION
According to Wikipedia, autonomous cars are being developed with deep learning, i.e. neural networks. The neural network depends on an extensive amount of data extracted from real-life driving scenarios. The neural network is activated and “learns” to perform the best course of action. In addition, sensors such as the LIDAR sensors already used in self-driving cars, cameras to detect the environment, and precise GPS navigation will be used in autonomous cars.
Despite all the recent improvements made to these new technologies for autonomous cars, several drawbacks remain, such as detection and behavior in the case of rare or unseen driving situations, as well as the necessary compromise between the increasing need for computing power on the one hand and the absolute need for high-speed processing of all the information collected by the vehicle sensors on the other.
SUMMARY OF THE INVENTION
The present invention aims to recognize objects using silhouette detection, i.e. the ability to recognize and group occlusion contours, which is a key component of human vision for avoiding dangers. The aim is to reproduce this key component and offer this ability to computer vision for many applications such as driver’s assistance, automatic driving or robotics in general. According to a first aspect, the invention relates to an object recognition method comprising the steps of:
- obtaining an image from an image sensor and a 3D point cloud from a depth sensor;
- synchronizing the image and the 3D point cloud;
- clustering the 3D points to separate objects from the 3D point cloud;
- extracting silhouettes by
o segmentation of the image using the 3D point clustering, and
o contour detection of the separated objects into the segmented image;
- recognizing silhouette by
o transforming each detected contour into a silhouette descriptor, and
o classifying these silhouette descriptors into recognized objects using a trained neural network for object recognition.
This method presents several advantages, among which robustness thanks to the combined use of the information contained in the image taken by the image sensor and the information contained in the 3D point cloud obtained by the depth sensor, even in bad conditions such as low light. This approach is also generic, as object recognition through silhouettes may be applied to all object categories (humans, poles, trees, animals, vehicles, etc.). Computational costs are low, and computation is fast: pixel distribution analysis requires a lot of processing, while silhouette recognition requires much less. Although silhouette processing does not provide a full description of a scene, it nevertheless provides a fundamental cue and a core technology for fast detection, with good performance for detecting potential danger within a scene.
Advantageously, the image taken by the image sensor is made of a plurality of pixels and the segmentation of the image step comprises the sub-steps of
- graph-cutting each separated object from the 3D point clustering step by
o projecting on the image all 3D points from the 3D point clustering step corresponding to the separated object under consideration;
o assessing the projected 3D points as belonging either to the separated object under consideration or to a background;
o assessing each pixel of the image as belonging either to the separated object under consideration, to a background, or to an unknown state, using a pixel weight based on color difference and/or distance between two neighboring pixels;
o adjusting the pixel weight for each pixel belonging to the unknown state based on its distance to the pixels belonging to the separated objects and its distance to the pixels belonging to the background;
- outputting for each separated object a black and white mask of pixels representative of the background and the separated object under consideration in the form of one or several blobs.
The extraction of the silhouette is done by using graph-cut technology to perform the segmentation of the image with the collaboration of the 3D point clouds. The introduction of 3D information proves to be a major advantage, as complex associations can be made in the image, a cluttered scene can be separated, and poles can be seen even in completely saturated locations.
Advantageously, the contour detection step comprises the sub-steps of:
- assessing for each blob a distance based on the 3D point clustering of the corresponding separated object;
- combining all the blobs in a single image by drawing them from furthest to closest and assigning them a different label for further identification, resulting in a superimposed blobs image;
- extracting the contour from the superimposed blobs image corresponding to separated objects;
- determining fake contour portions for each pixel of the contour assessed with a distance belonging to a closer blob.
Such contour detection using both 2D and 3D information makes it easy to separate the real contours from the fake ones due to occlusion, which is of major importance for object recognition.
Advantageously, the silhouette descriptor is a 1D descriptor using a constant description length, and preferably the descriptor has a reduced length of between 100 and 300 float numbers for an image of more than 1 million pixels. Using a constant and reduced length for the descriptor ensures fast recognition and allows the number of hidden layers of the neural network to be reduced, for instance to two layers with respectively 800 and 600 units.
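By way of illustration only, and without limiting the claimed method, a possible embodiment of such a constant-length descriptor and of the small classification network can be sketched in Python as follows; the arc-length resampling, the Fourier-magnitude signature, the length of 256 floats and the use of scikit-learn are assumptions rather than features disclosed above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def silhouette_descriptor(contour_xy, length=256):
    """Constant-length 1D descriptor of a closed contour (illustrative choice:
    magnitudes of the Fourier coefficients of the arc-length-resampled contour)."""
    pts = np.asarray(contour_xy, dtype=np.float64)        # (N, 2) pixel coordinates
    seg = np.linalg.norm(np.diff(pts, axis=0, append=pts[:1]), axis=1)
    perim = float(seg.sum())
    s = np.concatenate(([0.0], np.cumsum(seg)[:-1]))      # arc length at each vertex
    t = np.linspace(0.0, perim, length, endpoint=False)   # resample to a fixed size
    x = np.interp(t, s, pts[:, 0], period=perim)
    y = np.interp(t, s, pts[:, 1], period=perim)
    z = (x - x.mean()) + 1j * (y - y.mean())              # translation invariance
    mag = np.abs(np.fft.fft(z))
    return mag / (mag[1] + 1e-9)                          # rough scale invariance

# Hypothetical classifier mirroring the "two hidden layers of 800 and 600 units" above.
classifier = MLPClassifier(hidden_layer_sizes=(800, 600), max_iter=500)
# classifier.fit(np.stack([silhouette_descriptor(c) for c in training_contours]), labels)
# predicted = classifier.predict(np.stack([silhouette_descriptor(c) for c in contours]))
```

A 256-float signature stays within the 100 to 300 float range quoted above while being invariant to translation and, after normalization by its first harmonic, roughly invariant to scale.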
Advantageously, the method further comprises the step of combining the object recognition neural network with at least one other trained neural network for object prediction within the image, so as to form an end-to-end neural network for object recognition and prediction.
Using silhouette recognition gives flexibility for extending the method to at least one other neural network. As a core technology, silhouettes may be used for more elaborate tasks such as danger perception.
According to another aspect, the invention relates to an assisted or autonomous vehicle comprising:
- an image sensor unit arranged to capture an image;
- a depth sensor arranged to obtain a 3D point cloud;
- a synchronising unit to (temporally and/or spatially) synchronize the image and the 3D point cloud;
- a processing unit arranged to recognize objects within the image according to the object recognition method of any of claims 1 to 6;
- a control unit arranged to control the vehicle based on the recognized objects.
Advantageously, the assisted or autonomous vehicle further comprises a display unit arranged to display information related to the recognized objects and/or an assisted or autonomous driving unit arranged to plan a safe path depending on the recognized objects; and wherein the control unit is arranged to activate at least one of the display unit and the assisted or autonomous driving unit.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the present invention will appear more clearly from the following detailed description of particular non-limitative examples of the invention, illustrated by the appended drawings where:
- Figure 1 represents an object recognition method according to a first embodiment of the invention;
- Figure 2A represents a preferred embodiment for the image segmentation step;
- Figure 2B represents a preferred embodiment for the contour detection step;
- Figure 3 represents a vehicle equipped with the necessary units to implement the method according to the invention.
DETAILED DESCRIPTION OF THE INVENTION
Before describing in more detail the different embodiments of the present invention, here is a reminder of the definition of a silhouette, a term that will often be used, as well as some general considerations about the interest of using silhouettes in computer vision for autonomous cars or the like.
A silhouette is a contour turning around a set of pixels; it is the border running in between pixels. The silhouette quantum (smallest element, i.e. 1 pixel long) can only have 2 orientations (vertical or horizontal) and 4 directions (up, right, down, left). Note that a single silhouette can be made of many blobs, e.g. if split by an occlusion. We define the interior and the exterior by turning clockwise, i.e. the interior is on the right side of the run.
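Purely as an illustrative sketch of this definition, a silhouette made of such quanta can be stored as a start pixel followed by a clockwise sequence of 4-directional unit moves; the Python encoding below is a hypothetical choice, not a representation disclosed above.

```python
from typing import List, Tuple

# One silhouette quantum = a 1-pixel move in one of the 4 directions (image coordinates, y pointing down).
MOVES = {"up": (0, -1), "right": (1, 0), "down": (0, 1), "left": (-1, 0)}

def chain_to_points(start: Tuple[int, int], chain: List[str]) -> List[Tuple[int, int]]:
    """Replay a clockwise 4-directional chain code into the border points it visits."""
    points = [start]
    x, y = start
    for move in chain:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

# A small square run clockwise: with this convention the interior lies on the right of the run.
square = chain_to_points((0, 0), ["right", "right", "down", "down", "left", "left", "up", "up"])
```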
The main advantage of silhouettes is the global enhancement of perception in general. They are quite accurate in both 2D and 3D vision and, being generic, allow all objects to be seen as identified clusters and most of them to be recognized. Perceiving silhouettes, i.e. the ability to recognize and group occlusion contours, is a key component of human vision, giving the ability to avoid dangers as fast as possible. It is done through separation and recognition of the dangers. Our aim is to mimic this key component and offer this ability to computer vision for many applications (driver’s assistance, automatic driving, and robotics in general) instead of pixel distribution analysis (the classical approach), which is possible but requires a lot of processing time and power. Moreover, silhouette technology can also be the starting point of other functions (such as action prediction), because the pose of humans is much easier to understand as input to action recognition.
Using 2D and 3D image/depth sensors in collaboration provides robustness: while the camera (2D vision) is limited by some external conditions such as low light, the depth sensor (3D vision) is not able, for instance, to distinguish the status of a traffic light.
Figure 1 represents an object recognition method according to a first embodiment of the invention. The object recognition method comprises the steps of:
S1 : obtaining an image from an image sensor and a 3D point cloud from a depth sensor;
S2: synchronizing the image and the 3D point cloud;
S3: clustering 3D points to separate objects from the 3D point cloud;
S4: extracting silhouettes by segmentation of the image using the 3D point clustering (S41), and contour detection of the separated objects into the segmented image (S42);
S5: recognizing silhouette by transforming each detected contour into a silhouette descriptor (S51), and classifying these silhouette descriptors into recognized objects using a trained neural network for object recognition (S52).
The object separation task is made possible with the help of 3D information given by a depth sensor, such as a laser light scanning unit (LIDAR) taking a continuous series of 3D point clouds. Namely, silhouette extraction is done by using the 3D information to separate objects and by extracting their contours in an image. Then silhouette recognition is done via a descriptor and a classifier.
The sensors collaborate in the sense that the 3D information is used to indicate where to create and look for objects; further information is then taken from the image, with its dense pixel information. For instance, a usual depth sensor such as a 64-plane Lidar allows a pedestrian to be perceived with enough confidence up to 25 m, even without the image. When going farther, or when using a Lidar with fewer planes, it is the role of the image to take over.
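By way of illustration, the following minimal Python sketch shows how the clustered 3D points can be brought into the image for this collaboration, assuming a calibrated pinhole camera; the intrinsic matrix K and the Lidar-to-camera extrinsics (R, t) are assumed to come from the (spatial) synchronization step and are not values disclosed above.

```python
import numpy as np

def project_points(points_lidar: np.ndarray, K: np.ndarray,
                   R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project the (N, 3) points of one clustered object into pixel coordinates (u, v).

    K is the 3x3 camera intrinsic matrix; (R, t) are the Lidar-to-camera
    rotation and translation. Only points in front of the camera are returned,
    as an (M, 2) integer array.
    """
    pts_cam = points_lidar @ R.T + t        # move the points into the camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0.0]  # keep points in front of the image plane
    uvw = pts_cam @ K.T                      # pinhole projection
    return np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)
```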
For the 3D point clustering step (S3), one can use any of the known solutions, among which (see the sketch after this list):
- the point cloud library (www.pointclouds.org), which is a standalone, large scale, open project for 2D/3D image and point cloud processing;
- “On the Segmentation of 3D LIDAR Point Clouds”, by Douillard et al., presenting, in part III, segmentation algorithms with 3D clustering methods;
- “Shape-based recognition of 3D point clouds in urban environments”, by Golovinskiy et al., presenting a system for recognizing objects in 3D point clouds of urban environments.
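As one illustrative option consistent with the solutions cited above, and not the particular clustering prescribed by the method, a simple Euclidean clustering of the Lidar points can be sketched with DBSCAN; the eps and min_samples values are assumptions, and ground removal is left out.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_objects(points: np.ndarray, eps: float = 0.5, min_samples: int = 10):
    """Separate an (N, 3) point cloud into candidate objects by Euclidean proximity.

    Returns one (M_i, 3) array per cluster; DBSCAN noise points (label -1) are dropped.
    Ground points would normally be removed beforehand (not shown).
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return [points[labels == k] for k in np.unique(labels) if k != -1]
```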
For the silhouette extraction step (S4), one can use several existing solutions, among which:
- “Shape feature encoding via Fisher Vector for efficient fall detection in depth-videos”, by Adrian et al., presenting the use of Fisher Vectors for feature extraction.
Figure 2A represents a preferred embodiment for the image segmentation step. Such a segmentation step comprises the sub-steps of:
S411: graph-cutting each separated object from the 3D point clustering step by
S4111: projecting on the image all 3D points from the 3D point clustering step corresponding to the separated object under consideration;
S4112: assessing the projected 3D points as belonging either to the separated object under consideration, or to a background;
S4113: assessing each pixel of the image as belonging either to the separated object under consideration, to a background, or to an unknown state, using a pixel weight based on color difference and/or distance between two neighboring pixels;
S4114: adjusting the pixel weight for each pixel belonging to the unknown state based on its distance to the pixels belonging to the separated objects and its distance to the pixels belonging to the background; and
S412: outputting for each separated object, a black and white mask of pixels representative of the background and of the separated object under consideration in the form of one or several blobs.
The most important points in the use of this graph-cut technology are its efficiency in extracting complex shapes, its genericity (for any objects, but also in other contexts such as robotics), and its rapidity, due to a limited number of uncertain pixels. More specifically, this section describes the algorithm for extracting the silhouette of an object within an image. We start from the clustering of the objects from the 3D point clouds. The extraction is done using graph-cuts.
A cut is a segmentation of the image assigning each pixel either to the foreground (i.e. the object under consideration) or to the background (i.e. anything other than the object under consideration). To perform the segmentation, the edges of the graph (classically the n-links, i.e. the segments between pixels) are given weights according to the affinity of a pixel for a given label, or for neighboring pixels to be given the same label.
The specificity of the graph-cut here is to add some information from the 3D points. Concretely, a weight is added to each segment. For that purpose, all the 3D points of an object are selected (thanks to the clustering step). Then these 3D points are converted into a connex (connected) 2D area by projecting the 3D points onto the image.
As shown in Figure 4, the idea is to use 3 sets of pixels: the set of foreground pixels (i.e. known as belonging to the object - in green below), the set of background pixels (i.e. known as belonging to the background - in red), and the unknown set (i.e. pixels that can be either foreground or background - in yellow). We start from the 3D points ordered in lines. Basically, each line of 3D points constitutes a line in the image (in green), which we surround by a margin or, where the 3D points explicitly belong to another object, by red pixels. To make the yellow pixels, we interpolate between the found extremities, and we extrapolate on the top and on the bottom of the object. Overall, we call these 2D pixels (the 3 categories) the preselected pixels. To deal with synchronization issues, the red and green sets are slightly shrunk (minored).
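By way of illustration, a simplified construction of these three sets of preselected pixels can be sketched as follows, assuming OpenCV and pixel coordinates obtained, for instance, with the project_points helper sketched earlier; the fixed margin, the dilation and the label encoding are simplifying assumptions standing in for the per-line margins and the top/bottom extrapolation described above.

```python
import cv2
import numpy as np

FG, BG, UNKNOWN = 1, 0, 2   # "green", "red" and "yellow" seed labels (illustrative encoding)

def build_seed_mask(shape, uv_object, uv_others, margin=7):
    """Label each pixel as foreground, background or unknown from projected 3D points.

    shape: (H, W) of the image; uv_object: (N, 2) pixels of the object's projected points;
    uv_others: (M, 2) pixels of points known to belong to other objects.
    All pixel coordinates are assumed to already fall inside the image.
    """
    h, w = shape
    mask = np.full((h, w), UNKNOWN, dtype=np.uint8)
    # Certain foreground: the projected points of the object, dilated into a thin band.
    fg = np.zeros((h, w), dtype=np.uint8)
    fg[uv_object[:, 1], uv_object[:, 0]] = 1
    fg = cv2.dilate(fg, np.ones((3, 3), np.uint8))
    # Certain background: points of other objects, plus everything far outside the object.
    bg = np.zeros((h, w), dtype=np.uint8)
    if len(uv_others):
        bg[uv_others[:, 1], uv_others[:, 0]] = 1
    x0, y0 = np.maximum(uv_object.min(axis=0) - margin, 0)
    x1, y1 = uv_object.max(axis=0) + margin
    outside = np.ones((h, w), dtype=np.uint8)
    outside[y0:y1 + 1, x0:x1 + 1] = 0
    mask[(bg == 1) | (outside == 1)] = BG
    mask[fg == 1] = FG
    return mask
```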
Hereafter we refine the description of our weight model. The graph-cut separates 2 models, the background and the foreground. Classically, each pixel is a vertex of the graph. Neighboring pixels are linked by n-links, weighted by a distance on their respective colors; most classical distances can be used. Then, more importantly, each pixel is linked to the two terminal vertices by t-links. The weight of each t-link is composed of a term from the distance between the color of the pixel and a color model of the foreground, respectively of the background (classically a Gaussian Mixture Model - GMM - of each respective model), and a term from the distance to the closest pixel belonging to the foreground (the green pixels), respectively to the closest pixel belonging to the background (the red pixels). For the 1st terminal we directly take this distance (pixel, foreground) as a weight, whereas for the 2nd terminal we take a distance to the background (pixel, background). We can avoid computing the distance to the background by taking the inverse of the distance to the foreground instead, or the reverse. As a refinement for this 2nd term, the distance along each image direction (horizontal and vertical axis) is weighted by a factor resulting from the 3D point distribution. For instance for Lidar, the horizontal distribution is much denser than the vertical one; therefore the vertical distance is given less weight (minored) compared to the horizontal one.
Then the graph-cut is computed by a max-flow/min-cut algorithm.
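As an illustrative stand-in for this computation, the three-label seed mask sketched above can be passed to OpenCV's mask-initialised GrabCut, which is a GMM-based graph-cut solved by max-flow/min-cut; the anisotropic distance terms towards the green and red pixels described above are not reproduced here and would require building the t-links explicitly with a dedicated max-flow solver (e.g. the Boykov-Kolmogorov algorithm).

```python
import cv2
import numpy as np

FG, BG, UNKNOWN = 1, 0, 2   # same seed encoding as in build_seed_mask above

def graph_cut_object(image_bgr: np.ndarray, seed_mask: np.ndarray) -> np.ndarray:
    """Black-and-white mask of one object obtained by graph-cut from a 3-label seed mask.

    image_bgr is an 8-bit color image; seed_mask has the same height and width.
    """
    gc_mask = np.full(seed_mask.shape, cv2.GC_PR_BGD, dtype=np.uint8)
    gc_mask[seed_mask == UNKNOWN] = cv2.GC_PR_FGD   # yellow pixels: decided by the cut
    gc_mask[seed_mask == FG] = cv2.GC_FGD           # green pixels: certain foreground
    gc_mask[seed_mask == BG] = cv2.GC_BGD           # red pixels: certain background
    bgd_model = np.zeros((1, 65), np.float64)        # GMM parameters, filled in by OpenCV
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, gc_mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)
    obj = (gc_mask == cv2.GC_FGD) | (gc_mask == cv2.GC_PR_FGD)
    return obj.astype(np.uint8) * 255                # full resolution, white on the object
```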
The output of the graph-cut is a black and white mask, i.e. an image in full resolution, all black except for the pixels presumably assessed as belonging to the object. The mask has no reason to be connex (i.e. made of a single blob); indeed, in many situations a shape is made of several blobs. Now we have to turn this mask into a contour representation.
Figure 2B represents a preferred embodiment for the contour detection step. The contour detection step comprises the sub-steps of:
S421: assessing for each blob a distance based on the 3D point clustering of the corresponding separated object;
S422: combining all the blobs in a single image by drawing them from furthest to closest and assigning them a different label for further identification, resulting in a superimposed blobs image;
S423: extracting the contour from the superimposed blobs image corresponding to separated objects;
S424: determining fake contour portions for each pixel of the contour assessed with a distance belonging to a closer blob.
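By way of illustration, these sub-steps (S421 to S424) can be sketched as follows, assuming OpenCV 4 and one mask per object from the graph-cut above; taking the median cluster range as the blob distance and testing the 4-neighbours of each contour pixel are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_contours_and_occlusions(masks, depths):
    """masks: one black-and-white mask per separated object (from the graph-cut);
    depths: matching list of blob distances taken from the 3D clustering (e.g. median range).

    Returns, per blob, its contour pixels and a per-pixel flag marking fake
    (occlusion) contour portions, i.e. borders shared with a closer blob.
    """
    h, w = masks[0].shape
    label_img = np.zeros((h, w), dtype=np.int32)              # 0 means background
    depth_img = np.full((h, w), np.inf)
    for obj_id in np.argsort(depths)[::-1]:                   # draw from furthest to closest
        on = masks[obj_id] > 0
        label_img[on] = obj_id + 1
        depth_img[on] = depths[obj_id]
    results = []
    for obj_id, depth in enumerate(depths):
        contours, _ = cv2.findContours((label_img == obj_id + 1).astype(np.uint8),
                                       cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
        for cnt in contours:
            pts = cnt[:, 0, :]                                # (N, 2) contour pixels as (x, y)
            fake = np.zeros(len(pts), dtype=bool)
            for i, (x, y) in enumerate(pts):
                # If a 4-neighbour outside this blob belongs to a closer blob,
                # the corresponding contour portion is an occlusion (fake) border.
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < w and 0 <= ny < h
                            and label_img[ny, nx] not in (0, obj_id + 1)
                            and depth_img[ny, nx] < depth):
                        fake[i] = True
                        break
            results.append((obj_id, pts, fake))
    return results
```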
Indeed, it is really important to distinguish fake contours from real ones. Fake contours are artificial ones due to occlusions. In our method, this task becomes simple because each blob is assessed a distance based on the 3D points it holds. When extracting the contour, we also run along the exterior of the contour; when the exterior pixel is assessed as belonging to a closer blob, the corresponding frontier is marked as fake.
Figure 3 represents a vehicle 100 equipped with at least one camera 200 pointing at the road ahead or at the environment of the vehicle to take a video or a continuous series of images, and with a 360° scanning unit 210, such as a laser light scanning unit (LIDAR), to take a continuous series of 3D point clouds. The vehicle 100 also comprises a processing unit and an electronic control unit (300), a display unit and an autonomous driving unit (400, 410).
The electronic control unit 300 is connected to the autonomous driving unit, which comprises a steering unit 400 arranged to steer the vehicle and a movement control unit 410 comprising a power unit arranged to maintain or increase the vehicle speed and a braking unit arranged to stop the vehicle or to decrease the vehicle speed, so that the vehicle 100 may be driven with the method according to the present invention.
It will be understood that various modifications and/or improvements evident to those skilled in the art can be brought to the different embodiments of the invention described in the present description without departing from the scope of the invention defined by the accompanying claims.

Claims

CLAIMS
1. An object recognition method comprising the steps of:
- obtaining an image from an image sensor and a 3D point cloud from a depth sensor (S1);
- synchronizing the image and the 3D point cloud (S2);
- clustering 3D points to separate objects from the 3D point cloud (S3);
- extracting silhouettes (S4) by
o segmentation of the image using the 3D point clustering (S41), and
o contour detection of the separated objects into the segmented image (S42);
- recognizing silhouette (S5) by
o transforming each detected contour into a silhouette descriptor (S51), and
o classifying these silhouette descriptors into recognized objects using a trained neural network for object recognition (S52).
2. The object recognition method according to claim 1, wherein the image is made of a plurality of pixels and wherein the segmentation of the image step comprises the sub-steps of:
- graph-cutting (S411) each separated object from the 3D point clustering step by
o projecting on the image all 3D points from the 3D point clustering step corresponding to the separated object under consideration (S4111);
o assessing the projected 3D points as belonging either to the separated object under consideration, or to a background (S4112);
o assessing each pixel of the image as belonging either to the separated object under consideration, to a background, or to an unknown state, using a pixel weight based on color difference and/or distance between two neighboring pixels (S4113);
o adjusting the pixel weight for each pixel belonging to the unknown state based on its distance to the pixels belonging to the separated objects and its distance to the pixels belonging to the background (S4114);
- outputting for each separated object, a black and white mask of pixels representative of the background and of the separated object under consideration (S412) in the form of one or several blobs.
3. The object recognition method according to claim 2, wherein the contour detection step comprises the sub-steps of:
- assessing for each blob a distance based on the 3D point clustering of the corresponding separated object (S421);
- combining all the blobs in a single image by drawing them from furthest to closest and assigning them a different label for further identification, resulting in a superimposed blobs image (S422);
- extracting the contour from the superimposed blobs image corresponding to separated objects (S423);
- determining fake contour portions for each pixel of the contour assessed with a distance belonging to a closer blob (S424).
4. The object recognition method according to any of claims 1 to 3, wherein the silhouette descriptor is a 1D descriptor using a constant description length.
5. The object recognition method according to claim 4, wherein the silhouette descriptor has a reduced length.
6. The object recognition method according to any of claims 1 to 5, further comprising the step of:
- combining the object recognition neural network with at least another trained neural network for object prediction within the image so as to form an end-to-end neural network for object recognition and prediction.
7. An assisted or autonomous vehicle (100) comprising:
- an image sensor unit (200) arranged to capture an image;
- a depth sensor (210) arranged to obtain a 3D point cloud;
- a processing unit (300) arranged:
o to synchronize the image and the 3D point cloud;
o to recognize objects within the image according to the object recognition method of any of claims 1 to 6;
- a control unit arranged to control the vehicle (100) based on the recognized objects.
8. The assisted or autonomous vehicle (100) according to claim 7, further comprising:
- a display unit arranged to display an information related to the recognized objects; and / or
- an assisted or autonomous driving unit (400, 410) arranged to plan a safe path depending on recognized objects; and
wherein the control unit is arranged to activate at least one of the display unit and the assisted or autonomous driving unit.
PCT/EP2018/084035 2017-12-07 2018-12-07 Using silhouette for fast object recognition WO2019110824A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020528326A JP7317009B2 (en) 2017-12-07 2018-12-07 Using Silhouettes for Fast Object Recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1761804 2017-12-07
FR1761804A FR3074941B1 (en) 2017-12-07 2017-12-07 USE OF SILHOUETTES FOR FAST RECOGNITION OF OBJECTS

Publications (1)

Publication Number Publication Date
WO2019110824A1 true WO2019110824A1 (en) 2019-06-13

Family

ID=61655891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/084035 WO2019110824A1 (en) 2017-12-07 2018-12-07 Using silhouette for fast object recognition

Country Status (3)

Country Link
JP (1) JP7317009B2 (en)
FR (1) FR3074941B1 (en)
WO (1) WO2019110824A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570442A (en) * 2019-09-19 2019-12-13 厦门市美亚柏科信息股份有限公司 Contour detection method under complex background, terminal device and storage medium
CN111008560A (en) * 2019-10-31 2020-04-14 重庆小雨点小额贷款有限公司 Livestock weight determination method, device, terminal and computer storage medium
CN111198563A (en) * 2019-12-30 2020-05-26 广东省智能制造研究所 Terrain recognition method and system for dynamic motion of foot type robot
CN112445215A (en) * 2019-08-29 2021-03-05 阿里巴巴集团控股有限公司 Automatic guided vehicle driving control method, device and computer system
CN112561836A (en) * 2019-09-25 2021-03-26 北京地平线机器人技术研发有限公司 Method and device for acquiring point cloud set of target object
CN112975957A (en) * 2021-02-07 2021-06-18 深圳市广宁股份有限公司 Target extraction method, system, robot and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9036902B2 (en) * 2007-01-29 2015-05-19 Intellivision Technologies Corporation Detector for chemical, biological and/or radiological attacks
JP2017129543A (en) * 2016-01-22 2017-07-27 京セラ株式会社 Stereo camera device and vehicle

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHUNZHAO GUO ET AL: "Hierarchical road understanding for intelligent vehicles based on sensor fusion", INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2011 14TH INTERNATIONAL IEEE CONFERENCE ON, IEEE, 5 October 2011 (2011-10-05), pages 1672 - 1679, XP032023391, ISBN: 978-1-4577-2198-4, DOI: 10.1109/ITSC.2011.6082996 *
DARAEI M HOSSEIN ET AL: "Region segmentation using LiDAR and camera", 2017 IEEE 20TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), IEEE, 16 October 2017 (2017-10-16), pages 1 - 6, XP033330515, DOI: 10.1109/ITSC.2017.8317861 *
FREDRIK LARSSON ET AL: "Using Fourier Descriptors and Spatial Models for Traffic Sign Recognition - Poster", IMAGE ANALYSIS, 1 January 2011 (2011-01-01), pages 238, XP055496942 *
JELMER DE VRIES: "Object Recognition: A Shape-Based Approach using Artificial Neural Networks", MASTER THESIS UNIV. UTRECHT, 1 January 2006 (2006-01-01), XP055497147, Retrieved from the Internet <URL:http://www.ai.rug.nl/~mwiering/ObjectRecognition.pdf> [retrieved on 20180803] *
KESER TOMISLAV ET AL: "Traffic signs shape recognition based on contour descriptor analysis", 2016 INTERNATIONAL CONFERENCE ON SMART SYSTEMS AND TECHNOLOGIES (SST), IEEE, 12 October 2016 (2016-10-12), pages 199 - 204, XP033016417, ISBN: 978-1-5090-3718-6, [retrieved on 20161202], DOI: 10.1109/SST.2016.7765659 *
LARSSON FREDRIK ET AL: "Using Fourier Descriptors and Spatial Models for Traffic Sign Recognition", 2011, MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION - MICCAI 2015 : 18TH INTERNATIONAL CONFERENCE, MUNICH, GERMANY, OCTOBER 5-9, 2015; PROCEEDINGS; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER INTERNATIONAL PUBLISHING, CH, ISBN: 978-3-642-16065-3, ISSN: 0302-9743, XP047469530 *
WENQI HUANG ET AL: "Fusion Based Holistic Road Scene Understanding", 29 June 2014 (2014-06-29), XP055496875, Retrieved from the Internet <URL:https://arxiv.org/pdf/1406.7525.pdf> *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445215A (en) * 2019-08-29 2021-03-05 阿里巴巴集团控股有限公司 Automatic guided vehicle driving control method, device and computer system
CN110570442A (en) * 2019-09-19 2019-12-13 厦门市美亚柏科信息股份有限公司 Contour detection method under complex background, terminal device and storage medium
CN112561836A (en) * 2019-09-25 2021-03-26 北京地平线机器人技术研发有限公司 Method and device for acquiring point cloud set of target object
CN112561836B (en) * 2019-09-25 2024-04-16 北京地平线机器人技术研发有限公司 Method and device for acquiring point cloud set of target object
CN111008560A (en) * 2019-10-31 2020-04-14 重庆小雨点小额贷款有限公司 Livestock weight determination method, device, terminal and computer storage medium
CN111198563A (en) * 2019-12-30 2020-05-26 广东省智能制造研究所 Terrain recognition method and system for dynamic motion of foot type robot
CN111198563B (en) * 2019-12-30 2022-07-29 广东省智能制造研究所 Terrain identification method and system for dynamic motion of foot type robot
CN112975957A (en) * 2021-02-07 2021-06-18 深圳市广宁股份有限公司 Target extraction method, system, robot and storage medium

Also Published As

Publication number Publication date
JP7317009B2 (en) 2023-07-28
JP2021511556A (en) 2021-05-06
FR3074941A1 (en) 2019-06-14
FR3074941B1 (en) 2021-01-15

Similar Documents

Publication Publication Date Title
WO2019110824A1 (en) Using silhouette for fast object recognition
Wang et al. Appearance-based brake-lights recognition using deep learning and vehicle detection
Zhao et al. Stereo-and neural network-based pedestrian detection
Yahiaoui et al. Fisheyemodnet: Moving object detection on surround-view cameras for autonomous driving
EP3627446B1 (en) System, method and medium for generating a geometric model
Yoneyama et al. Robust vehicle and traffic information extraction for highway surveillance
Jebamikyous et al. Autonomous vehicles perception (avp) using deep learning: Modeling, assessment, and challenges
US11024042B2 (en) Moving object detection apparatus and moving object detection method
Deepika et al. Obstacle classification and detection for vision based navigation for autonomous driving
Premachandra et al. Detection and tracking of moving objects at road intersections using a 360-degree camera for driver assistance and automated driving
Elhousni et al. Automatic building and labeling of hd maps with deep learning
JP2019053625A (en) Moving object detection device, and moving object detection method
Poostchi et al. Semantic depth map fusion for moving vehicle detection in aerial video
US9558410B2 (en) Road environment recognizing apparatus
Rashed et al. Bev-modnet: Monocular camera based bird's eye view moving object detection for autonomous driving
Huu et al. Proposing Lane and Obstacle Detection Algorithm Using YOLO to Control Self‐Driving Cars on Advanced Networks
Sirbu et al. Real-time line matching based speed bump detection algorithm
Kazerouni et al. An intelligent modular real-time vision-based system for environment perception
Zhang et al. Night time vehicle detection and tracking by fusing sensor cues from autonomous vehicles
Omar et al. Detection and localization of traffic lights using YOLOv3 and Stereo Vision
US20230394680A1 (en) Method for determining a motion model of an object in the surroundings of a motor vehicle, computer program product, computer-readable storage medium, as well as assistance system
Iftikhar et al. Traffic Light Detection: A cost effective approach
Choe et al. HazardNet: Road Debris Detection by Augmentation of Synthetic Models
Silar et al. Objects Detection and Tracking on the Level Crossing
Lee et al. Dense disparity map-based pedestrian detection for intelligent vehicle

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18814607

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020528326

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18814607

Country of ref document: EP

Kind code of ref document: A1