WO2021228686A1 - Détection améliorée d'objets (Improved detection of objects)

Info

Publication number: WO2021228686A1
Authority: WO (WIPO, PCT)
Prior art keywords: image, surroundings, resolution, areas, environment
Application number: PCT/EP2021/062026
Other languages: German (de), English (en)
Inventors: Fabian BURGER, Philippe Lafon, Thomas Boulay, Diego Mendoza Barrenechea, Flora Dellinger, Prashanth Viswanath
Original Assignee: Valeo Schalter Und Sensoren Gmbh
Application filed by Valeo Schalter Und Sensoren Gmbh
Priority to EP21723981.3A (published as EP4150508A1)
Publication of WO2021228686A1

Classifications

    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06F 18/2414: Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F 18/254: Fusion techniques of classification results, e.g. of results related to same input data
    • G06V 10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/50: Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present invention relates to a method for recognizing objects in an image of the surroundings using a neural network, in particular a convolutional neural network using deep learning, for a driving support system of a vehicle.
  • the image of the surroundings is received and encoded to provide a two-dimensional grid with image information.
  • Object recognition is then carried out based on the image information.
  • The present invention also relates to a driving assistance system for a vehicle, in particular as an improved driver assistance system, with at least one camera-based environment sensor for providing an image of the surroundings and a control unit which receives the image of the surroundings from the at least one camera-based environment sensor, the driving assistance system being designed to carry out the above method.
  • An important system parameter is the grid size, i.e. the size of the cells defined by the grid.
  • This grid size defines the total number of objects that can be recognized and classified. In addition, it determines the spatial accuracy of the detection and classification of the objects.
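  • Purely for illustration, and assuming an image size that is not taken from the description, the following short calculation shows how strongly the cell size influences the number of cells and thus the number of cell-wise detection and regression steps:

```python
# Illustrative cell-count calculation for a uniform grid.
image_w, image_h = 1280, 800      # assumed image size, not taken from the description

for cell in (16, 32, 64):         # candidate cell edge lengths in pixels
    cols, rows = image_w // cell, image_h // cell
    print(f"{cell} x {cell} px cells: {cols} x {rows} = {cols * rows} cells")

# 16 x 16 px cells give 80 x 50 = 4000 cells, 32 x 32 px cells only 40 x 25 = 1000,
# i.e. a quarter of the cell-wise classification and regression work.
```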
  • FIG. 1a) shows an image of the surroundings 100 that was recorded with a camera-based surroundings sensor of a vehicle.
  • The image of the surroundings 100 shows a roadway 102 with two lateral footpaths 104.
  • Several objects 106, which are pedestrians here, can be seen in the image of the surroundings 100.
  • FIG. 1b) shows a uniform grid placed over the image of the surroundings 100 with a plurality of regular cells 108.
  • The cells 108 define the resolution at which the image of the surroundings 100 is encoded and image information is provided.
  • In FIG. 1b), the grid is selected to be fine, so that distant objects 106 can also be reliably detected and classified in the image of the surroundings 100. For nearby objects 106, however, the identification and classification of the objects 106 requires a comparatively large amount of processing.
  • FIG. 1c) also shows a uniform grid placed over the image of the surroundings 100 with a plurality of regular cells 108. In FIG. 1c), the grid is selected to be coarse compared to the representation in FIG. 1b), so that nearby objects can be recognized and classified very efficiently.
  • The fine grid of FIG. 1b) also enables an improved differentiation of objects that extend over several cells. However, this goes hand in hand with an increased processing effort, an increased number of regression steps typically being required.
  • The invention is therefore based on the object of specifying a method for recognizing objects in an image of the surroundings using a neural network, in particular a convolutional neural network using deep learning, for a driving support system of a vehicle, as well as a corresponding driving support system for carrying out the method, which enable a reliable and efficient detection of objects in images of the surroundings.
  • According to the invention, a method for recognizing objects in an image of the surroundings using a neural network, in particular a convolutional neural network using deep learning, for a driving support system of a vehicle is specified, comprising the steps of receiving the image of the surroundings, encoding the image of the surroundings to provide a two-dimensional grid, which has a first resolution, with image information, subdividing the image of the surroundings into a plurality of image areas with at least one first image area and at least one second image area, performing a decoding step in the at least one second image area to provide a two-dimensional grid, which has a second resolution that is lower than the first resolution, with image information, and performing an object recognition based on the image information of the plurality of image areas, wherein the at least one first image area has the first resolution and the at least one second image area has the second resolution.
  • A driving support system for a vehicle, in particular as an improved driver assistance system, with at least one camera-based environment sensor for providing an image of the surroundings and a control unit that receives the image of the surroundings from the at least one camera-based environment sensor is also specified, the driving assistance system being designed to carry out the above method.
  • the basic idea of the present invention is therefore to provide the image information of an image of the surroundings with a different degree of detail so that, on the one hand, the entire image of the surroundings can be efficiently processed and, on the other hand, no important detailed information is lost.
  • the image of the surroundings is first encoded in order to provide the image information with the first resolution.
  • the image information of the first resolution is preprocessed in the decoding step in order to provide the at least one second image area with a lower resolution of the image information.
  • At least one first image area remains with the resolution of the image information as it is present after encoding and can be decoded, without the need for additional preprocessing, in order to recognize and classify objects.
  • the image information is provided with a resolution depending on the respective image area, so that it can be processed efficiently in a correspondingly adapted network structure of the neural network.
  • The detection of the objects can be carried out optimally for each of the image areas, since the objects have comparable size ratios in relation to the cells of the grid.
  • Distant objects, which are relatively small in the image of the surroundings, can thus be recognized with high reliability. Close objects that are relatively large in the image of the surroundings can also be recognized well.
  • the training for recognizing the objects is made easier, since the objects are represented similarly in each of the image areas and are therefore easy to recognize.
  • meta-knowledge about the information to be expected in the image of the surroundings is preferably used in order to define the different image areas and to divide the image of the surroundings accordingly.
  • the meta-knowledge relates, for example, to knowledge about an assembly and / or alignment of the at least one camera-based environmental sensor on the vehicle.
  • the grid defines an arrangement of cells with image information.
  • The grid can, for example, define a cell size of 16 x 16 pixels or 32 x 32 pixels for the first resolution, striking a balance between a desired level of detail in the detection and classification of the objects and the processing speed.
  • the grid can, for example, define a corresponding cell size of 32 x 32 pixels or 64 x 64 pixels for the second resolution.
  • The grid can define cells with any dimensions, it not being necessary for the cells to comprise the same number of pixels in both directions of the image plane. This applies to each resolution independently.
  • the cells for one resolution can have a square shape, while the cells for another resolution have a rectangular shape.
  • the cells for another resolution can have a different rectangular shape.
  • the cells of the at least one second image area with the second resolution each combine a plurality of cells of the at least one image area with the first resolution.
  • the cells for the grid can be newly formed with the second resolution and, for example, comprise non-integer multiples of cells with the first resolution.
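  • A minimal sketch of how such a combination of cells could be realized, assuming that 2 x 2 blocks of cells with the first resolution are combined into one cell with the second resolution by average pooling of the encoded feature vectors; the function name, the rows x columns x channels layout and the restriction to integer multiples are illustrative assumptions, not taken from the description:

```python
import numpy as np

def pool_cells(features: np.ndarray, factor: int = 2) -> np.ndarray:
    """Average-pool a (rows, cols, channels) cell grid by `factor` in both
    directions, e.g. turning 16 x 16 pixel cells into 32 x 32 pixel cells."""
    rows, cols, channels = features.shape
    assert rows % factor == 0 and cols % factor == 0, "integer multiples assumed"
    blocks = features.reshape(rows // factor, factor, cols // factor, factor, channels)
    return blocks.mean(axis=(1, 3))

# Example: encoded grid of 50 x 80 cells with 64 feature channels (first resolution).
fine = np.random.rand(50, 80, 64).astype(np.float32)
coarse = pool_cells(fine)           # -> (25, 40, 64), the second (lower) resolution
print(fine.shape, "->", coarse.shape)
```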
  • the at least one second image area relates to an area of the environmental image that is defined by the second resolution.
  • the image of the surroundings can thus have a plurality of independent second image areas, which can be contiguous or non-contiguous. The same applies to the at least one first image area.
  • the subdivision of the surrounding image into the plurality of image areas can accordingly take place with a high degree of freedom.
  • the detection of objects in the image of the surroundings relates to a detection of the objects with their position and a classification of the object, for example as a pedestrian, car, truck, tree, house, dog or the like.
  • the neural network is designed in particular as a convolutional neural network using deep learning.
  • Convolutional Neural Networks are widespread in the field of object recognition and are highly reliable.
  • the driving assistance system is designed, for example, as an improved driver assistance system.
  • improved driver assistance systems are known, for example, as ADAS (Advanced Driver Assistance Systems) and can include various functions. These functions can include, for example, a blind spot assistant, a lane departure warning system and / or a collision warning and protection system.
  • the detection of objects is also relevant for other driving support functions through to the implementation of functions for autonomous driving of vehicles.
  • the environment image is an image provided by the camera-based environment sensor. It contains a matrix with image points (pixels) which at least partially reproduce the surroundings of the vehicle.
  • The image of the surroundings can include only brightness information for the individual pixels, i.e. an image in the manner of a black-and-white image, or brightness information for a plurality of colors, for example in the RGB or another format.
  • The camera-based environment sensor can be designed as a camera that only provides a brightness value for individual pixels, or the camera-based environment sensor is, for example, a camera for providing color information, i.e. a brightness value for each color that can be perceived by the camera.
  • the environment image can be provided by a single camera-based environment sensor alone or as a combination of several individual images from a plurality of camera-based environment sensors together.
  • the latter usually relates to a combination of the multiple individual images in a horizontal direction in order to create the image of the surroundings in the manner of a panorama image.
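  • As a hedged illustration of such a horizontal combination, individual images of equal height can simply be stacked side by side; a real system would additionally handle overlapping fields of view and distortion, which is omitted here:

```python
import numpy as np

# Assumed individual images from three cameras, all of the same height.
left = np.zeros((800, 1280, 3), dtype=np.uint8)
front = np.zeros((800, 1280, 3), dtype=np.uint8)
right = np.zeros((800, 1280, 3), dtype=np.uint8)

# Horizontal combination into one panorama-like image of the surroundings.
surround_image = np.concatenate([left, front, right], axis=1)
print(surround_image.shape)  # (800, 3840, 3)
```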
  • the camera-based environment sensor can be designed as an optical camera.
  • For example, wide-angle cameras through to cameras with fish-eye lenses are used as optical cameras in current vehicles for monitoring the surroundings.
  • the optical camera can be designed for visible light and / or for light in wavelengths that are not visible to humans, for example for ultraviolet light or for infrared light.
  • the encoding of the image of the surroundings to provide a two-dimensional grid with image information includes providing image information for each cell formed by the grid. Different image information can be provided for this purpose, as set out below.
  • the encoding of the image of the surroundings includes, in particular, an encoding of the image of the surroundings with a CNN encoder to provide the image information for the entire image of the surroundings in accordance with the grid with the first resolution.
  • The resolution of the image information relates to a level of detail in the image information. More image information means a higher resolution, less image information means a lower resolution. Accordingly, a higher resolution means providing a finer grid, i.e. with smaller cells, whereas a lower resolution means providing a coarser grid, i.e. with larger cells.
  • Carrying out the decoding step in the at least one second image area to provide a two-dimensional grid, which has a second resolution that is lower than the first resolution, with image information relates to processing the image information with the first resolution in the area of the at least one second image area in order to provide therefrom the two-dimensional grid with the image information with the second resolution.
  • the image information in the at least one first image area is adopted unchanged for the subsequent object recognition.
  • An object recognition is carried out based on the image information of the plurality of image areas.
  • Various approaches are known as such in the prior art to perform object recognition, as further specified below.
  • the image of the surroundings is provided by the at least one camera-based surroundings sensor and transmitted to the control unit.
  • After receiving the image of the surroundings, the control unit carries out the encoding of the image of the surroundings, the subdivision of the image of the surroundings into the image areas, the decoding step in the at least one second image area to provide the two-dimensional grid, which has a second resolution that is lower than the first resolution, with image information, as well as the object recognition based on the image information of the plurality of image areas.
  • the control unit is also referred to as an ECU (Electronic Control Unit).
  • the control unit is preferably designed as an embedded device and is provided in the vehicle.
  • subdividing the environmental image into a plurality of image areas includes subdividing the environmental image into a plurality of image areas with at least one third image area, and the method includes an additional step of performing a decoding step in the at least one third image area to provide a two-dimensional grid that has a third resolution that is lower than the first resolution and is different from the second resolution, with image information.
  • the surrounding image can therefore be divided into three image areas, whereby the same principles can be applied as with the division into only two image areas.
  • a division into four or more image areas with different resolutions is also conceivable.
  • the individual image areas can be arranged contiguously or distributed and not contiguous.
  • dividing the image of the surroundings into a plurality of image areas includes dividing the image of the surroundings into a plurality of image areas with at least one fourth image area, and the method includes an additional step for discarding image information in the at least one fourth image area.
  • The meta-knowledge relates to knowledge of the installation and alignment of the at least one camera-based environment sensor on the vehicle, whereby, for example, areas in the image of the surroundings can be identified that are occluded or that overlap with a field of view of another camera and therefore do not have to be processed twice.
  • areas with strong distortions can be excluded from further processing, as can sometimes occur when using wide-angle optics through to fish-eye lenses.
  • The subdivision of the environmental image into a plurality of image areas with the at least one fourth image area is preferably carried out as a static subdivision, in particular when the at least one fourth image area is based on meta-knowledge about the assembly and alignment of the at least one camera-based environmental sensor on the vehicle, i.e. on static information.
  • the fourth image area can also be defined dynamically, for example by determining the horizon or an area with sky in previous images of the surroundings.
  • the method comprises an additional step for identifying a horizon of the image of the surroundings, and the subdivision of the image of the surroundings into a plurality of image areas with at least one fourth image area is carried out based on the horizon.
  • objects are usually located below or only slightly above a horizon plane in the image of the surroundings.
  • Image information can be discarded from an upper edge of the image downwards, but at a distance above the horizon, since no relevant objects are to be expected there, i.e. no objects that are relevant for driving the vehicle on the road.
  • objects in the air are usually of little relevance.
  • For example, an upper row of cells with the encoded image information in the grid with the first resolution, lying above the horizon, can be discarded.
  • several rows with cells can also be discarded.
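  • A minimal sketch of such a row-wise discard, assuming that the horizon is given as a pixel row, that the encoded grid is laid out as rows x columns x channels, and that one cell row is kept as a margin above the horizon; all of these choices are illustrative assumptions:

```python
import numpy as np

def drop_sky_rows(cell_grid: np.ndarray, horizon_px: int,
                  cell_size: int = 16, margin_cells: int = 1) -> np.ndarray:
    """Discard cell rows lying entirely above the horizon (minus a margin),
    where no objects relevant for driving the vehicle are expected."""
    horizon_row = horizon_px // cell_size            # cell row containing the horizon
    first_kept = max(horizon_row - margin_cells, 0)  # keep a safety margin above it
    return cell_grid[first_kept:]                    # cell rows are counted from the top

grid = np.random.rand(50, 80, 64)       # 50 cell rows of encoded image information
trimmed = drop_sky_rows(grid, horizon_px=384)
print(grid.shape, "->", trimmed.shape)  # (50, 80, 64) -> (27, 80, 64)
```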
  • Dividing the image of the surroundings into a plurality of image areas comprises dividing the image of the surroundings into two image areas along at least one horizontal line, with the at least one second image area and/or the at least one third image area being arranged, based on an orientation of the image of the surroundings, below the at least one horizontal line.
  • the subdivision of the image of the surroundings to form the different image areas along the horizontal line or along a plurality of horizontal lines is based on a typical image division of the at least one camera-based environmental sensor. In particular when driving outside of built-up areas, closer objects are typically located below a horizontal line in the image of the surroundings compared to more distant objects.
  • The size of objects within the image of the surroundings typically depends on their vertical position in the image of the surroundings. This can be taken into account by dividing the image of the surroundings along the at least one horizontal line. Image areas below a horizontal line preferably have a grid with a lower resolution than image areas above the corresponding horizontal line.
  • performing an object recognition based on the image information of the plurality of image areas includes performing an independent object recognition in the plurality of image areas and merging the object recognition of the plurality of image areas for object recognition in the surrounding image.
  • The same principles can therefore be used for each of the image areas in order to detect and recognize objects.
  • The object recognition for the different image areas can also be carried out with the same decoder, since there are no or only minor differences in principle for the objects with regard to the resolution of the grid in the different image areas.
  • performing an independent object recognition in the plurality of image areas includes an independent object recognition using at least one regression layer of a deep neural network, YOLO and / or SSD.
  • Each image area of the respective environment image can be processed in the same way or in a different way.
  • the same deep neural network can also be used to process the image information of different image areas, since objects have the same properties regardless of their position.
  • YOLO is an abbreviation for "You Only Look Once".
  • SSD is an abbreviation for "Single Shot MultiBox Detector". Both YOLO and SSD are known as such in the prior art and are therefore not explained in detail at this point.
  • YOLO as well as SSD are well suited for real-time object recognition, especially in embedded systems.
  • the merging of the object recognition of the plurality of image areas for object recognition in the surrounding image includes providing a uniform resolution space for providing a list with merged object recognitions.
  • the recognized objects can be made available for further processing in a uniform manner.
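  • One way to picture this merging is to map every cell-wise detection back into a common coordinate space, for example pixels of the full image of the surroundings, and to concatenate the per-area lists. The sketch below assumes a detection tuple of (centre x, centre y, width, height, trustworthiness, class) expressed in the respective area's own cell units; this data layout is an assumption, not taken from the description:

```python
def merge_detections(areas):
    """areas: list of dicts with keys
         'detections': list of (cx, cy, w, h, conf, cls) in the area's own cell units,
         'cell_size' : pixel size of one cell in that area,
         'offset'    : (x0, y0) pixel offset of the area within the full image.
       Returns one list of detections in full-image pixel coordinates."""
    merged = []
    for area in areas:
        s = area["cell_size"]
        x0, y0 = area["offset"]
        for cx, cy, w, h, conf, cls in area["detections"]:
            merged.append((x0 + cx * s, y0 + cy * s, w * s, h * s, conf, cls))
    return merged

# One detection from the fine (16 px cells) upper area, one from the coarse (32 px cells) lower area.
upper = {"detections": [(10.0, 5.0, 3.0, 6.0, 0.9, 0)], "cell_size": 16, "offset": (0, 160)}
lower = {"detections": [(4.0, 2.0, 2.0, 4.0, 0.8, 0)], "cell_size": 32, "offset": (0, 400)}
print(merge_detections([upper, lower]))
```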
  • Encoding the image of the surroundings to provide a two-dimensional grid with image information and/or performing a decoding step in the at least one second image area to provide a two-dimensional grid with image information includes providing, for each cell defined by the grid and for each recognized object, an object trustworthiness, a position of a bounding box enclosing the object, dimensions of the bounding box, and an object class probability for each object class to be recognized.
  • The object trustworthiness specifies how high the trust in the existence of an object is.
  • The object class probability for each possible object class indicates the probability that the recognized object belongs to the corresponding object class. Further information is the position and the dimensions of the bounding box that encloses the object, which enables easy handling of the recognized object. Objects can also lie at the borders of cells and extend over several of these cells, in which case they are recognized objects of several cells.
  • the image information preferably includes information that relates not only to the respective cell, but also to neighboring cells or other cells located in the vicinity.
  • the objects can be recognized with a high degree of reliability, in particular in the case of objects that extend over more than a single cell.
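  • The cell-wise information described above can be decoded along the lines of YOLO- or SSD-style detectors. The following sketch assumes, purely for illustration, that each cell predicts a vector [objectness, x, y, w, h, class scores ...] with the centre offsets relative to the cell; the activation functions and the confidence threshold are assumptions, not taken from the description:

```python
import numpy as np

def decode_cell_predictions(pred, cell_size, conf_thresh=0.5):
    """pred: (rows, cols, 5 + num_classes) raw per-cell network output.
       Returns detections as (x_px, y_px, w_px, h_px, trustworthiness, class_id)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    detections = []
    rows, cols, _ = pred.shape
    for r in range(rows):
        for c in range(cols):
            v = pred[r, c]
            conf = sigmoid(v[0])                     # object trustworthiness
            if conf < conf_thresh:
                continue
            x = (c + sigmoid(v[1])) * cell_size      # bounding-box centre in pixels
            y = (r + sigmoid(v[2])) * cell_size
            w = np.exp(v[3]) * cell_size             # bounding-box dimensions
            h = np.exp(v[4]) * cell_size
            cls = int(np.argmax(v[5:]))              # most probable object class
            detections.append((x, y, w, h, conf, cls))
    return detections

pred = np.random.randn(25, 40, 5 + 3)   # coarse grid of 25 x 40 cells, 3 object classes
print(len(decode_cell_predictions(pred, cell_size=32)))
```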
  • FIG. 1 shows a view of an image of the surroundings with a road with lateral footpaths and a plurality of persons, alone as well as with a uniform fine grid and a uniform coarse grid,
  • FIG. 2 shows a view of a vehicle with a driving assistance system, in particular as an improved driver assistance system, with a camera-based environment sensor for providing an image of the environment and a control unit that receives the environmental image from the camera-based environmental sensor, according to a first, preferred embodiment,
  • FIG. 3 shows a view of an image of the surroundings with a road with lateral footpaths and a plurality of persons, alone and with a grid comprising an image area with a fine grid and an image area with a coarse grid, in accordance with the first embodiment,
  • FIG. 4 shows a system illustration of the driving assistance system from FIG. 2,
  • FIG. 5 shows a view of an image of the surroundings with a road with sidewalks and a person extending over several cells of an image area with a fine grid and several cells of an image area with a coarse grid, in accordance with the first embodiment, and
  • FIG. 6 shows a flow diagram of a method for recognizing objects in an image of the surroundings using a neural network in accordance with the first embodiment.
  • FIG. 2 shows a vehicle 10 with a driving support system 12 according to a first, preferred embodiment.
  • the driving assistance system 12 is designed, for example, as an improved driver assistance system.
  • improved driver assistance systems are known, for example, as ADAS (Advanced Driver Assistance Systems) and can include various functions. These functions can include, for example, a blind spot assistant, a lane departure warning system and / or a collision warning and protection system.
  • the driving support system 12 can support functions up to and including autonomous driving of the vehicle 10.
  • the driving assistance system 12 is shown by way of example in FIG. 2 with a camera-based environment sensor 14.
  • the camera-based environment sensor 14 is an optical camera in this exemplary embodiment.
  • the optical camera 14 has, for example, a resolution of approximately 2 megapixels.
  • the driving support system 12 also includes a control unit 16.
  • the control unit 16 is also referred to as an ECU (Electronic Control Unit) in the field of vehicles.
  • the control unit 16 is embodied as an embedded device and is provided in the vehicle 10.
  • the optical camera 14 is connected to the control unit 16 via a data bus 18.
  • the optical camera 14 detects the surroundings 20 of the vehicle 10 and records images of the surroundings 30, which are transmitted to the control unit 16 via the data bus 18.
  • the surroundings images 30 each contain a matrix with image points (pixels) which at least partially reproduce the surroundings 20 of the vehicle 10.
  • the image of the surroundings 30 comprises brightness information for a plurality of colors for each pixel, for example in the RGB or other format, which is provided by the optical camera 14.
  • With reference to FIGS. 3 to 6, a method for recognizing objects 36 in the image of the surroundings 30 using a neural network is described below.
  • As objects 36, pedestrians are represented in the image of the surroundings 30 by way of example.
  • the neural network is a convolutional neural network using deep learning. The method is carried out with the driving support system 12 described above.
  • The method starts with step S100, which relates to receiving the image of the surroundings 30.
  • the image of the surroundings 30 is recorded by the optical camera 14 and transmitted to the control unit 16 via the data bus 18.
  • Step S110 relates to an encoding of the image of the surroundings 30 in order to provide a two-dimensional grid 38, which has a first resolution, with image information.
  • a plurality of cells 40 is formed by the grid 38, image information being provided for each of the cells 40 by the encoding.
  • The grid 38 thus defines an arrangement of the cells 40 with image information, the cells 40 in the exemplary embodiment described having a cell size of 16 x 16 pixels for the first resolution.
  • the encoding of the image of the surroundings 30 includes an encoding of the image of the surroundings 30 with a CNN encoder 42, which is shown in FIG. 4, for providing the image information for the entire image of the surroundings 30 according to the grid 38 with the first resolution.
  • the control unit 16 includes the encoder 42 and carries out the encoding of the environmental image 30.
  • An object trustworthiness value, a position of a bounding box 44 which encloses the object 36, dimensions of the bounding box 44 and an object class probability for each object class to be recognized are determined as image information for each cell 40 defined by the grid 38 for each recognized object 36.
  • the object trustworthiness specifies how high the trust in the existence of an object 36 is.
  • the object class probability for each possible object class indicates the probability that the recognized object 36 belongs to the corresponding object class.
  • Further information is the position and dimensions of the bounding box 44 which encloses the object 36.
  • the image information includes information that relates not only to the respective cell 40, but also to neighboring cells 40 or other cells 40 located in the vicinity.
  • Step S120 relates to subdividing the image of the surroundings 30 into a plurality of image areas 46a, 46b, 46c.
  • the image of the surroundings is first divided into an upper half 50 and a lower half 52 of the image.
  • the upper half of the image 50 is then divided into a first image area 46a and a fourth image area 46c.
  • the fourth image area 46c is formed on the upper edge of the image 30 of the surroundings.
  • the lower half of the image 52 forms a second image area 46b.
  • the image of the surroundings 30 is divided along horizontal lines 48.
  • a horizon of the image of the surroundings 30 lies parallel to the two horizontal lines 48, as a result of which the subdivision of the image areas 46a, 46b, 46c and in particular the establishment of the fourth image area 46c takes place based on the horizon.
  • the subdivision of the surrounding image 30 into the image areas 46a, 46b, 46c takes place as a static subdivision.
  • Step S120 can also be carried out at an earlier point in time, for example during the configuration of the driving support system 12.
  • the subdivision of the environmental image into the image areas 46a, 46b, 46c is therefore identical for all environmental images 30 of the same type, i.e. for all environmental images 30 of the optical camera 14.
  • Step S130 relates to discarding image information in the fourth image area 46c.
  • The image information at the upper edge of the environmental image 30, i.e. from an upper image edge of the environmental image 30 downwards but at a distance above the horizon, is discarded, since no relevant objects 36 are to be expected there on the road 32.
  • Step S140 relates to carrying out a decoding step in the second image area 46b in order to provide a two-dimensional grid 38, which has a second resolution that is lower than the first resolution, with image information.
  • The image information is fed to a decoder 54, which is implemented in the control unit 16 and carries out the decoding step.
  • The decoding step comprises processing the image information with the first resolution of the surrounding image 30 in the area of the second image area 46b in order to provide therefrom the two-dimensional grid 38 with the image information with the second resolution, which is lower than the first resolution.
  • This is indicated in FIG. 4 by the fact that the lower half of the image 52 at the end of the decoding step, i.e. after passing through the decoder 54, has a smaller size than before passing through the decoder 54.
  • The grid 38 in the second image area 46b has a cell size of 32 x 32 pixels.
  • The image information of the image areas 46a, 46b, 46c is then combined and further processed together, as shown in FIG. 4. This results in a combination of the two different grids 38 for one image of the surroundings 30, as is also shown in FIG. 3b).
  • Step S150 relates to performing an object recognition based on the image information of the image areas 46a, 46b, 46c.
  • the image information is adopted unchanged in the first image area 46a.
  • the detection of objects 36 in the image of the surroundings 30 relates to a detection of the objects 36 with their position and a classification of the respective object 36, for example as a pedestrian, car, truck, tree, house, dog or the like.
  • An object recognition is carried out based on the image information of the plurality of image areas 46a, 46b, 46c. In this case, an independent object recognition is carried out in the first and second image areas 46a, 46b. The object recognition of the first and second image areas 46a, 46b is then merged in order to complete the object recognition in the environmental image 30. The same principles are used for the first and second image areas 46a, 46b in order to detect and recognize the objects 36. The object recognition for the first and second image areas 46a, 46b can in principle be carried out with the same decoder 54.
  • the object recognition is carried out in detail using at least one regression layer of a deep neural network, YOLO and / or SSD.
  • YOLO is an abbreviation for "You Only Look Once".
  • SSD is an abbreviation for "Single Shot MultiBox Detector".
  • the merging of the object recognition of the first and second image areas 46a, 46b for object recognition in the environment image 30 includes providing a uniform resolution space for providing a list with merged object recognitions. As a result, the recognized objects 36 are made available in a uniform manner for further processing.
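  • Read as a whole, steps S100 to S150 of this embodiment can be summarized as a schematic pipeline. The sketch below is a paraphrase under stated assumptions: placeholder functions stand in for the trained CNN encoder 42 and the detection head, and the image size and horizon position are illustrative:

```python
import numpy as np

CELL = 16   # first resolution: 16 x 16 pixel cells; second resolution: 32 x 32 pixel cells

def encode(image):                       # placeholder for the CNN encoder 42 (step S110)
    h, w, _ = image.shape
    return np.random.rand(h // CELL, w // CELL, 64)

def detect(cell_grid, cell_size):        # placeholder for the detection head (step S150)
    return []                            # would return (box, trustworthiness, class) tuples

def run_pipeline(image, horizon_px):
    grid = encode(image)                                           # S110: fine grid
    rows = grid.shape[0]
    top, mid = horizon_px // CELL - 1, rows // 2                   # S120: static subdivision
    grid = grid[top:]                                              # S130: discard sky area 46c
    upper = grid[: mid - top]                                      # first image area 46a, fine
    lower = grid[mid - top:]                                       # second image area 46b
    lower = lower.reshape(lower.shape[0] // 2, 2,
                          lower.shape[1] // 2, 2, -1).mean((1, 3)) # S140: pool to coarse grid
    return detect(upper, CELL) + detect(lower, 2 * CELL)           # S150: merged detections

print(run_pipeline(np.zeros((768, 1280, 3)), horizon_px=256))
```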


Abstract

The invention relates to a method for recognizing objects (36) in an image of the surroundings (30) using a neural network, in particular a convolutional neural network using deep learning, for a driving support system (12) of a vehicle (10), comprising the steps of: receiving the image of the surroundings (30); encoding the image of the surroundings (30) to provide a two-dimensional grid (38), which has a first resolution, with image information; subdividing the image of the surroundings (30) into a plurality of image areas (46a, 46b, 46c) with at least one first image area (46a) and at least one second image area (46b); performing a decoding step in the at least one second image area (46b) to provide a two-dimensional grid (38), which has a second resolution that is lower than the first resolution, with image information; and performing an object recognition based on the image information of the plurality of image areas (46a, 46b, 46c), wherein the at least one first image area (46a) has the first resolution and the at least one second image area (46b) has the second resolution. The invention also relates to a driving support system (12) for a vehicle (10) comprising at least one camera-based environment sensor (14) for providing an image of the surroundings (30) and a control unit (16) which receives the image of the surroundings (30) from the at least one camera-based environment sensor (14), the driving support system (12) being designed to carry out the above method.
PCT/EP2021/062026 2020-05-12 2021-05-06 Détection améliorée d'objets WO2021228686A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21723981.3A EP4150508A1 (fr) 2020-05-12 2021-05-06 Détection améliorée d'objets

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102020112860.6 2020-05-12
DE102020112860.6A DE102020112860A1 (de) 2020-05-12 2020-05-12 Verbesserte Detektion von Objekten

Publications (1)

Publication Number Publication Date
WO2021228686A1 (fr)

Family

ID=75850207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/062026 WO2021228686A1 (fr) 2020-05-12 2021-05-06 Détection améliorée d'objets

Country Status (3)

Country Link
EP (1) EP4150508A1 (fr)
DE (1) DE102020112860A1 (fr)
WO (1) WO2021228686A1 (fr)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102011121473A1 (de) 2011-12-17 2013-06-20 Valeo Schalter Und Sensoren Gmbh Verfahren zum Anzeigen von Bildern auf einer Anzeigeeinrichtung eines Kraftfahrzeugs,Fahrerassistenzeinrichtung, Kraftfahrzeug und Computerprogramm
EP2696310B1 (fr) 2012-08-10 2017-10-18 Delphi Technologies, Inc. Procédé destiné à identifier un bord de route
DE102013201545A1 (de) 2013-01-30 2014-07-31 Bayerische Motoren Werke Aktiengesellschaft Erstellen eines Umfeldmodells für ein Fahrzeug
DE102015212771A1 (de) 2015-07-08 2017-01-12 Bayerische Motoren Werke Aktiengesellschaft Vorrichtung zur Erkennung von teilverdeckten beweglichen Objekten für ein Umfelderfassungssystem eines Kraftfahrzeugs
DE102017130488A1 (de) 2017-12-19 2019-06-19 Valeo Schalter Und Sensoren Gmbh Verfahren zur Klassifizierung von Parklücken in einem Umgebungsbereich eines Fahrzeugs mit einem neuronalen Netzwerk
DE102018114229A1 (de) 2018-06-14 2019-12-19 Connaught Electronics Ltd. Verfahren zum Bestimmen eines Bewegungszustands eines Objekts in Abhängigkeit einer erzeugten Bewegungsmaske und eines erzeugten Begrenzungsrahmens, Fahrerassistenzsystem sowie Kraftfahrzeug

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9305214B1 (en) * 2013-10-29 2016-04-05 The United States Of America, As Represented By The Secretary Of The Navy Systems and methods for real-time horizon detection in images
US20190325263A1 (en) * 2018-04-23 2019-10-24 Intel Corporation Non-maximum suppression of features for object detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU WEI ET AL: "SSD: Single Shot MultiBox Detector", COMPUTER VISION - ECCV 2016, vol. 9905, 17 September 2016 (2016-09-17), Cham, pages 21 - 37, XP055821235, ISSN: 0302-9743, ISBN: 978-3-319-46447-3, DOI: 10.1007/978-3-319-46448-0 *
WU QIONG ET AL: "Single Shot MultiBox Detector for Vehicles and Pedestrians Detection and Classification", 2017 2ND INTERNATIONAL SEMINAR ON APPLIED PHYSICS, OPTOELECTRONICS AND PHOTONICS (APOP 2017), 21 February 2018 (2018-02-21), pages 22 - 28, XP055820738, ISSN: 2475-885X, ISBN: 978-1-60595-522-3, DOI: 10.12783/dtetr/apop2017/18705 *

Also Published As

Publication number Publication date
EP4150508A1 (fr) 2023-03-22
DE102020112860A1 (de) 2021-11-18

Similar Documents

Publication Publication Date Title
EP2394234B1 (fr) Procédé et dispositif de détermination d'un marquage de voie de circulation en vigueur
EP2179381B1 (fr) Procédé et dispositif servant à la reconnaissance de panneaux de signalisation routière
DE102013205950B4 (de) Verfahren zum Detektieren von Straßenrändern
DE69624980T2 (de) Objektüberwachungsverfahren und -gerät mit zwei oder mehreren Kameras
DE112013001858T5 (de) Mehrfachhinweis-Objekterkennung und -Analyse
DE19955919C1 (de) Verfahren zur Erkennung von Objekten in Bildern auf der Bildpixelebene
DE102017203276B4 (de) Verfahren und Vorrichtung zur Ermittlung einer Trajektorie in Off-road-Szenarien
EP2396746A2 (fr) Procédé de détection d'objets
DE102011111440A1 (de) Verfahren zur Umgebungsrepräsentation
WO2014032904A1 (fr) Procédé et dispositif de détection de la position d'un véhicule sur une voie de circulation
DE102016210534A1 (de) Verfahren zum Klassifizieren einer Umgebung eines Fahrzeugs
EP3520023B1 (fr) Détection et validation d'objets provenant d'images séquentielles d'une caméra
DE102017210112A1 (de) Verfahren und System zur Durchführung einer Kalibrierung eines Sensors
DE102018121008A1 (de) Kreuzverkehrserfassung unter verwendung von kameras
DE102013012930A1 (de) Verfahren zum Bestimmen eines aktuellen Abstands und/oder einer aktuellen Geschwindigkeit eines Zielobjekts anhand eines Referenzpunkts in einem Kamerabild, Kamerasystem und Kraftfahrzeug
DE102009022278A1 (de) Verfahren zur Ermittlung eines hindernisfreien Raums
WO2020020654A1 (fr) Procédé pour faire fonctionner un système d'aide à la coduite doté deux dispositifs de détection
DE102020204840A1 (de) Prozessierung von Mehrkanal-Bilddaten einer Bildaufnahmevorrichtung durch einen Bilddatenprozessor
DE102008050456B4 (de) Verfahren und Vorrichtung zur Fahrspurerkennung
WO2021228686A1 (fr) Détection améliorée d'objets
WO2019057252A1 (fr) Procédé et dispositif de détection de voies de circulation, système d'aide à la conduite et véhicule
DE102006007550A1 (de) Vorrichtung und Verfahren zur Erkennung einer Fahrbahnmarkierung für ein Kraftfahrzeug
DE102019132012B4 (de) Verfahren und System zur Detektion von kleinen unklassifizierten Hindernissen auf einer Straßenoberfläche
DE102015112389A1 (de) Verfahren zum Erfassen zumindest eines Objekts auf einer Straße in einem Umgebungsbereich eines Kraftfahrzeugs, Kamerasystem sowie Kraftfahrzeug
WO2015074915A1 (fr) Ensemble de filtres et procédé de fabrication d'un ensemble de filtres

Legal Events

121 - Ep: the epo has been informed by wipo that ep was designated in this application (ref document number: 21723981; country of ref document: EP; kind code of ref document: A1)
NENP - Non-entry into the national phase (ref country code: DE)
ENP - Entry into the national phase (ref document number: 2021723981; country of ref document: EP; effective date: 20221212)