WO2022042903A1 - Method for detecting three-dimensional objects, computer program, machine-readable storage medium, control unit, vehicle and video surveillance system - Google Patents

Publication number
WO2022042903A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
segment
determined
pixels
dimensional object
Application number
PCT/EP2021/068017
Other languages
German (de)
English (en)
Inventor
Emil Schreiber
Fabian Gigengack
Original Assignee
Robert Bosch GmbH
Priority date: 2020-08-27
Filing date: 2021-06-30
Publication date: 2022-03-03
Application filed by Robert Bosch GmbH
Publication of WO2022042903A1

Classifications

    • G06V20/64 Three-dimensional objects (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V20/00 Scenes; scene-specific elements › G06V20/60 Type of objects)
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V20/00 Scenes; scene-specific elements › G06V20/50 Context or environment of the image)

Definitions

  • The present invention relates to a method for detecting three-dimensional objects in a field of view of a camera.
  • The invention also relates to a computer program that is set up to carry out this method, and to a machine-readable storage medium on which the computer program is stored.
  • The invention further relates to a control unit that is set up to carry out the method according to the invention, to a vehicle with this control unit, and to a video surveillance system with this control unit.
  • The acquisition of camera images on a vehicle by means of a mono or stereo camera is known, with the vehicle camera capturing, for example, the rearward area or the area to the front in the direction of travel of the surroundings of the vehicle.
  • A learned machine recognition method can be used, for example, to carry out a semantic segmentation and/or an object recognition.
  • The recognized segments and/or objects in the surroundings of the vehicle are used in driver assistance methods, in the partially or fully autonomous guidance of a vehicle, and/or for display in a virtual three-dimensional environment model; for example, a driving maneuver is carried out depending on a recognized object.
  • Distance data is recorded for detected objects, for example in order to follow another vehicle, to avoid an object, or to be able to carry out a driving maneuver.
  • Machine recognition methods are, for example, neural networks, in particular those with a large number of layers, each of which comprises so-called neurons.
  • A neuron of one layer is typically connected to neurons of a previous layer and to neurons of a subsequent layer.
  • The links between the neurons each have associated weights.
  • Machine recognition methods are advantageously trained with a large amount of data; in particular, this data comprises a large number of images, each of which has an assigned label or an expected output of the recognition method for at least a partial area of the respective image.
  • A machine recognition method can thus be trained with data whose expected output is known. During training, at least the weights of the links are typically adjusted.
  • Each of the layers of the neural network advantageously represents an abstraction level of the image.
  • A machine recognition method can learn, in particular by adjusting the weights of the connections between the neurons, to distinguish a vehicle in an image from a person or a tree, or to recognize the vehicle, with the machine recognition method typically determining a probability for the presence of the object.
  • The result is a computationally efficient, trained machine recognition method.
  • Both the structure of the machine recognition method, that is, the number of layers and of neurons per layer, and the training or the training data have a major influence on the recognition quality.
  • The resulting operating principle of the machine recognition method in the application often remains unclear to a user.
  • A machine recognition method can therefore be described as a non-analytical method.
  • Semantic segmentation is known as a learned machine recognition method.
  • A method for semantic segmentation delivers as a result a classification of the pixels of a camera image into semantic categories (e.g. person, car, street, ...), with in particular, but not necessarily, all pixels of the image being classified. This corresponds in particular to a rough classification of the image content depicted by the pixels.
  • By semantic segmentation, an image can for example be divided into two classes or sub-areas, such as a sub-area that depicts a person and another sub-area that depicts the background of the depicted person.
  • Stereo cameras, comprising two cameras arranged at a fixed distance from one another, allow in a known manner the determination of distance data between surrounding objects in the field of view of a camera and the camera by means of a triangulation method.
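  • As an illustration of this triangulation principle, the following minimal Python sketch converts a per-pixel disparity map into depths via the classic pinhole relation Z = f·B/d; the function name, array shapes and parameter values are illustrative assumptions, not part of this publication:

      import numpy as np

      def disparity_to_depth(disparity, focal_px, baseline_m):
          """Classic stereo triangulation: depth Z = f * B / d, with the
          focal length f in pixels, the camera baseline B in metres and
          the per-pixel disparity d in pixels. Pixels without disparity
          (d <= 0) are mapped to an infinite distance."""
          disparity = np.asarray(disparity, dtype=float)
          with np.errstate(divide="ignore"):
              return np.where(disparity > 0,
                              focal_px * baseline_m / disparity,
                              np.inf)

      # toy usage: 2x2 disparity map, f = 800 px, B = 0.12 m
      print(disparity_to_depth([[8.0, 4.0], [0.0, 16.0]], 800.0, 0.12))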
  • Alternatively or additionally, distance data between objects and a vehicle or the camera can be recorded by means of an ultrasonic sensor, a radar sensor or a lidar sensor.
  • A sensor provided in addition to a camera, however, increases the cost of the overall system, for example of the vehicle, and in the case of a sensor data fusion requires a complex and possibly regular calibration between the sensor and the camera, as well as a more powerful computing unit for the sensor data fusion.
  • The object of the present invention is to improve the detection of static and/or dynamic objects compared to the prior art.
  • The present invention relates to a method for detecting three-dimensional objects in a field of view of a camera.
  • The camera is in particular a vehicle camera, which preferably captures at least part of the surroundings of the vehicle.
  • Such a vehicle camera is, for example, arranged on the vehicle at an elevated position behind the windshield.
  • In the method, at least one camera image is captured by means of at least the camera.
  • Provision can advantageously be made for multiple camera images to be captured approximately simultaneously by means of one camera each, with the cameras having different fields of view or perspectives, that is, capturing different partial areas of the environment.
  • Provision can also be made for multiple camera images to be captured by multiple vehicle cameras, which are each advantageously part of a surround view system of the vehicle.
  • The camera or vehicle camera is advantageously set up to capture an area of the surroundings of the vehicle that lies to the front in the direction of travel.
  • The camera or vehicle camera is preferably a mono camera, in particular for cost reasons, it being possible for the mono camera to have wide-angle optics.
  • Alternatively, the camera or vehicle camera is advantageously part of a stereo camera, in particular in order to achieve increased reliability or accuracy of the method.
  • A semantic segmentation of the at least one camera image is carried out by means of a first learned machine recognition method. At least one image region that depicts, for example, a static and/or moving object class is advantageously recognized in at least one partial area of the camera image; for example, other vehicles are detected in the camera image.
  • Segment information is assigned to the pixels of the camera image as a function of the semantic segmentation, with the respective pixels in particular depicting the recognized object.
  • For example, the pixels of the camera image that depict a vehicle are assigned a value as segment information that represents vehicles.
  • At least one image section of the camera image is then determined as a segment, which has adjacent pixels with the same assigned segment information; a sketch of this grouping follows below.
  • In other words, adjacent pixels of the camera image are grouped into a segment depending on the respectively assigned segment information. All pixels of a camera image that have a connection to the respective pixel through pixels with the same segment information are advantageously understood as neighboring pixels.
  • A single segment can consequently in particular include more than one vehicle or more than one person.
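  • To make this grouping step concrete, the following minimal Python sketch groups adjacent pixels with the same segment information into segments via connected-component labelling; the per-pixel class-id array and the class ids themselves are assumptions made for illustration only:

      import numpy as np
      from scipy import ndimage

      def extract_segments(seg_info, class_id):
          """Group adjacent pixels that carry the same segment information
          (here: one semantic class id) into segments. 8-connectivity is
          used, so pixels touching at corners also count as neighbors.
          Returns one boolean mask per connected segment."""
          mask = seg_info == class_id
          labels, n = ndimage.label(mask, structure=np.ones((3, 3), dtype=int))
          return [labels == i for i in range(1, n + 1)]

      # toy segmentation: class 1 = "moving object", two disjoint regions
      seg_info = np.array([[0, 1, 1, 0, 0, 0],
                           [0, 1, 1, 0, 1, 1],
                           [0, 0, 0, 0, 1, 1]])
      print(len(extract_segments(seg_info, class_id=1)))  # -> 2 segments

  • Note that, exactly as stated above, one such segment may still contain several mutually covering vehicles or people; the segments are only the starting point for determining the object hypotheses.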
  • Distance data is then determined between surrounding objects in the camera's field of view and the camera, in particular between objects in the vicinity of the vehicle and the vehicle.
  • The distance data is preferably determined as a function of the captured camera image.
  • The distance data is particularly preferably determined as a function of the captured camera image by a second learned machine recognition method; see the publications by C. Godard et al. and D. Eigen et al. cited below.
  • Alternatively or additionally, the distance data can be determined by a stereo vision method and/or a structure-from-motion method.
  • Alternatively or additionally, the distance data can be determined by an ultrasonic sensor, a radar sensor and/or a lidar sensor.
  • Distance information is then assigned to the pixels of at least part of the camera image as a function of the determined distance data.
  • The assigned distance information of a respective pixel advantageously represents the distance from the vehicle of the surrounding object that is depicted by the pixel.
  • Alternatively or additionally, an optical flow, that is, a relative movement, is determined for at least some of the pixels of the captured camera image.
  • In particular, an optical flow of the pixels of a determined segment is determined.
  • The optical flow for at least some of the pixels of the camera image is determined as a function of the captured camera image and at least one other previously and/or subsequently captured camera image.
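  • One common way to obtain such a dense optical flow from two consecutively captured camera images is the Farnebäck method as implemented in OpenCV; the following sketch, with illustrative parameter values, is one possible realization and not necessarily the method used in this publication:

      import cv2

      def dense_optical_flow(img_prev, img_next):
          """Per-pixel optical flow between a previously captured camera
          image and the current one (both BGR uint8 arrays). Returns an
          (H, W, 2) array of flow vectors (dx, dy) in pixels per frame."""
          g0 = cv2.cvtColor(img_prev, cv2.COLOR_BGR2GRAY)
          g1 = cv2.cvtColor(img_next, cv2.COLOR_BGR2GRAY)
          return cv2.calcOpticalFlowFarneback(
              g0, g1, None,
              pyr_scale=0.5, levels=3, winsize=15,
              iterations=3, poly_n=5, poly_sigma=1.2, flags=0)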
  • At least one three-dimensional object hypothesis is determined in a determined segment, with pixels of the determined segment advantageously being subgrouped or grouped into a segment excerpt depending on the respectively assigned distance information.
  • The grouping of the pixels in a segment to form a three-dimensional object hypothesis takes place in particular when a difference between the assigned distance information of these pixels, or of at least a predetermined number of these pixels, is less than or equal to a distance tolerance value.
  • A segment section of the segment is advantageously determined as a three-dimensional object hypothesis as a function of the assigned distance information of the pixels of the segment, this segment section advantageously having at least a defined number of pixels whose assigned distance information items each differ by no more than a distance tolerance value.
  • Alternatively or additionally, the at least one three-dimensional object hypothesis is determined as a segment section of a determined segment depending on the determined optical flow of the pixels of the segment.
  • The pixels of a determined segment are advantageously additionally or alternatively combined or grouped into a segment excerpt depending on the determined optical flow.
  • In particular, those pixels of the segment whose optical flow vectors are approximately the same, and/or whose changes in flow vectors are approximately the same, and/or whose optical flow vectors point in approximately the same direction, are additionally or alternatively combined to form the segment section.
  • In particular, the pixels grouped into a three-dimensional object hypothesis are adjacent; a sketch of this distance-based grouping follows below.
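  • A minimal sketch of this distance-based grouping, assuming a boolean segment mask, a per-pixel depth map in metres and illustrative values for the distance tolerance and the minimum pixel count (see the refinements below):

      import numpy as np
      from scipy import ndimage

      def object_hypotheses(segment_mask, depth_m, tol_m=1.5, min_pixels=50):
          """Split one semantic segment into three-dimensional object
          hypotheses: pixels are sub-grouped when their assigned distances
          fall into the same band of width tol_m (one simple realization
          of the distance-tolerance criterion), and a hypothesis is only
          kept if it is contiguous and covers at least min_pixels pixels."""
          hypotheses = []
          bands = np.floor(depth_m / tol_m).astype(int)  # coarse distance bands
          for band in np.unique(bands[segment_mask]):
              candidate = segment_mask & (bands == band)
              labels, n = ndimage.label(candidate)       # keep hypotheses adjacent
              for i in range(1, n + 1):
                  region = labels == i
                  if region.sum() >= min_pixels:         # suppress tiny hypotheses
                      hypotheses.append(region)
          return hypotheses

  • Banding the depth values is only one of several possible implementations of the tolerance criterion; region growing from seed pixels, for example, would satisfy the same condition.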
  • The method has the advantage that the object hypotheses are reliably determined, because the learned machine recognition method or methods are linked to a physical model. In other words, errors in the object recognition and/or in a determined object extent are avoided, especially when two objects cover each other, since mutually covering objects have a different distance from the camera and/or a different direction of movement and/or a different speed.
  • The physical model used states that, in an image section of the camera image or in a segment that depicts the same semantic content, there can be no significantly different distances to the camera or vehicle and/or no significantly different speeds or directions of movement if the segment represents only one object. In other words, different segment excerpts, which represent different three-dimensional object hypotheses, can advantageously be identified in a segment.
  • The first learned machine recognition method can advantageously be trained more robustly, since it can generate a more abstract output compared to classic object recognition methods; for example, static and moving objects or vehicle classes do not initially have to be differentiated.
  • The method is preferably carried out using only one camera or vehicle camera, that is, using a mono camera or a stereo camera, and additional active sensors which emit electromagnetic radiation or pressure or ultrasound are dispensed with. As a result, the method can be carried out in a cost-effective and very computationally efficient manner.
  • Optionally, the object hypothesis is only determined if the distance information assigned to the pixels of the segment excerpt is less than or equal to a distance threshold value for at least a predetermined number of pixels. This makes the method more computationally efficient and more reliable.
  • Optionally, the three-dimensional object hypothesis is only determined if the number of pixels in the segment section is greater than or equal to a minimum value. This avoids unrealistically small extents of object hypotheses as well as unimportant object hypotheses.
  • Optionally, the distance tolerance value used when determining the three-dimensional object hypothesis is adjusted as a function of the assigned segment information, the assigned distance information and/or a detected speed of the vehicle.
  • The distance tolerance value can advantageously be adapted to an expected extent of an object class, and/or to an expected orientation of an object class, for example vehicles or people that cover one another, and/or to an accuracy of the determined distance data that changes with the vehicle speed; a sketch of such an adaptation follows below.
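  • Such an adaptation could look like the following sketch; the class ids, coefficients and functional form are purely illustrative assumptions:

      def adaptive_distance_tolerance(class_id, mean_distance_m, ego_speed_mps,
                                      base_tol_m=1.0):
          """Adapt the distance tolerance value to the expected extent of
          the object class (assumed ids: 1 = vehicle, 2 = person) and to
          the decreasing accuracy of the distance data at larger object
          distances and higher ego speeds."""
          expected_extent = {1: 2.0, 2: 0.5}.get(class_id, 1.0)
          accuracy_factor = 1.0 + 0.05 * mean_distance_m + 0.02 * ego_speed_mps
          return base_tol_m * expected_extent * accuracy_factor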
  • In a further refinement, at least one object in the ascertained segment is recognized by a further learned machine recognition method; for example, a person's head or a license plate is recognized.
  • The object hypothesis in a segment is then determined as a function of the detected object; for example, the number of object hypotheses is determined as a function of the number of vehicles or people depicted in the segment.
  • For this purpose, object information is assigned, depending on the detected object, to those pixels of the determined segment which depict the detected object.
  • The at least one three-dimensional object hypothesis is then determined as a segment section of a determined segment additionally as a function of the object information assigned to at least some pixels in the segment section.
  • For example, a distance tolerance value can be adjusted as a function of the number of objects detected if the number of object hypotheses determined does not correspond to the number of objects detected.
  • In other words, the number of determined object hypotheses depends on the determined number of recognized objects.
  • An object hypothesis for an object located in the foreground in the camera's field of view is thus advantageously determined if a necessary condition, such as a number plate of a vehicle or a person's head, is detected.
  • The number of determined three-dimensional object hypotheses is advantageously checked and, if necessary, a parameter of the method is adjusted if the number of determined object hypotheses does not correspond to the number of recognized objects.
  • In a further refinement, texture information and/or color information of the pixels in the determined segment can be determined.
  • The determined texture information and/or the determined color information is then assigned to the respective pixels of the determined segment which depict it.
  • The at least one three-dimensional object hypothesis is then determined as a segment section of a determined segment additionally depending on the assigned texture information and/or the assigned color information.
  • In other words, the pixels of a determined segment are also combined or grouped into a three-dimensional object hypothesis or a segment section depending on the determined texture information and/or the determined color information, with in particular those pixels of the segment being combined whose assigned texture information and/or color information is approximately the same; a sketch of such a color cue follows below.
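  • A sketch of such a color cue, assuming an RGB camera image and two candidate segment excerpts given as boolean masks; the tolerance value is an illustrative assumption:

      import numpy as np

      def similar_color(image, region_a, region_b, color_tol=30.0):
          """Decide whether two segment excerpts have approximately the
          same mean color, as a cue for treating them as one object
          hypothesis; e.g. a green vehicle and a red vehicle in the same
          segment yield a large mean-color difference and stay separated."""
          mean_a = image[region_a].mean(axis=0)  # mean color over region pixels
          mean_b = image[region_b].mean(axis=0)
          return float(np.linalg.norm(mean_a - mean_b)) <= color_tol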
  • Optionally, the distance data between the surroundings of the vehicle and the vehicle determined by means of a vehicle camera is corrected by means of an ultrasonic sensor, a lidar sensor and/or a radar sensor.
  • A validation of the three-dimensional object hypothesis can preferably be carried out, the method being carried out repeatedly based on another camera image previously or subsequently captured by the camera or vehicle camera.
  • In other words, the determination of the object hypotheses is advantageously checked for temporal consistency.
  • For example, it is checked whether a person or a vehicle that has already been determined as an object hypothesis is also detected in the earlier and later camera images, since the person or the vehicle cannot suddenly disappear or appear; a sketch of such a check follows below.
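  • One simple way to realize such a temporal-consistency check is to require that a hypothesis overlaps sufficiently with a hypothesis from an earlier camera image; the overlap measure and threshold below are illustrative assumptions:

      import numpy as np

      def iou(mask_a, mask_b):
          """Intersection over union of two boolean pixel masks."""
          union = np.logical_or(mask_a, mask_b).sum()
          return np.logical_and(mask_a, mask_b).sum() / union if union else 0.0

      def temporally_consistent(hypothesis, earlier_hypotheses, iou_min=0.3):
          """Validate a three-dimensional object hypothesis: keep it only
          if it matches some hypothesis from a previously captured camera
          image, since an object cannot suddenly appear or disappear."""
          return any(iou(hypothesis, earlier) >= iou_min
                     for earlier in earlier_hypotheses)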
  • The three-dimensional object hypothesis can optionally also be validated by carrying out the method on the basis of another camera image captured earlier, later or at the same time by a different camera from a different perspective.
  • In other words, the determination of the object hypotheses is advantageously checked for perspective consistency.
  • Advantageously, the other camera and the camera or vehicle camera are in this embodiment part of a stereo camera, so that the distance data can additionally be precisely recorded or determined.
  • In this refinement, the method is particularly accurate and reliable.
  • Optionally, the at least one determined three-dimensional object hypothesis is then displayed in a virtual three-dimensional environment model.
  • The environment model is advantageously displayed or represented from a bird's-eye view.
  • The three-dimensional object hypothesis is displayed for the vehicle in particular as a function of the distance information assigned to the pixels that represent the respective determined object hypothesis.
  • Optionally, the three-dimensional object hypothesis is additionally displayed as a function of an orientation of the determined object hypothesis that is determined by means of a further learned machine recognition method.
  • The invention also relates to a computer program which is set up to carry out a method according to the invention for recognizing three-dimensional objects in a field of view of a camera.
  • The invention also relates to a machine-readable storage medium on which the computer program according to the invention is stored.
  • The invention further relates to a control unit. The control unit is set up to be connected to at least one camera, the camera being in particular a vehicle camera.
  • The control unit is also set up to carry out a method according to the invention for detecting three-dimensional objects in a field of view of a camera.
  • The invention moreover relates to a vehicle with a control unit according to the invention.
  • Finally, the invention relates to a video surveillance system with a control unit according to the invention.
  • FIG. 1 schematically shows a vehicle; FIG. 2 shows the sequence of the method as a block diagram; FIG. 3 shows a captured camera image; FIG. 4 shows a rough classification of the camera image; FIG. 5 shows determined distance data for the captured camera image.
  • A vehicle 100 is shown schematically in FIG. 1.
  • Vehicle 100 has a camera 111 or vehicle camera, which is advantageously designed as a mono camera for reasons of cost.
  • The camera 111 captures a partial area of the surroundings 190 which lies in the field of view 191 of the camera.
  • Camera 111 is set up to capture at least one camera image of the partial area in field of view 191 of surroundings 190 of vehicle 100, or a sequence of camera images of surroundings 190.
  • In this example, camera 111 captures a field of view 191 or partial area of surroundings 190 in the direction of travel of vehicle 100, that is, the surroundings 190 in front of vehicle 100.
  • Optionally, a camera 120 also captures a rear portion of surroundings 190 of vehicle 100, with each camera 111, 120 being able to be designed as a wide-angle camera. Furthermore, provision can be made, alternatively or in addition, for several wide-angle cameras 120 of a surround view camera system to be arranged on the vehicle as cameras 111.
  • Vehicle 100 optionally includes a stereo vision system 110, which comprises camera 111 or the vehicle camera and a further camera 112.
  • Camera 111 and the further camera 112 can be used to capture camera images or a sequence of camera images and, by means of a triangulation method based on simultaneously captured camera images from camera 111 and the further camera 112, distances or distance data between camera 111 or the vehicle and surroundings 190, or objects 180 in the surroundings 190 of the vehicle 100, can be determined.
  • Surrounding objects 180 are, for example, other or third-party vehicles that are driving ahead of or behind vehicle 100, for example on a common lane 182, or other vehicles that are approaching vehicle 100, for example on another lane 182, or people who, for example, move on a sidewalk 181 next to the roadway.
  • Optionally, vehicle 100 can have at least one radar sensor 130, a lidar sensor (not shown) and/or an ultrasonic sensor 140 as a sensor in addition to camera 111 for detecting or determining distance data.
  • The vehicle 100 also has a display device 150 which is set up to display to a user or driver of the vehicle 100 information that is based on the detected sensor data of the various sensors 111, 112, 120, 130, 140.
  • The vehicle 100 can optionally be set up, by means of a control unit, to support a guidance of the vehicle 100.
  • Vehicle 100 can also optionally be set up, by means of a control unit, to carry out some driving situations semi-autonomously or fully autonomously, for example a parking maneuver or driving on a freeway.
  • FIG. 2 shows, as a block diagram, a sequence of the method for detecting three-dimensional objects 180 in a field of view 191 of a camera 111.
  • The method begins with the acquisition 210 of at least one camera image by means of camera 111, 112 and/or 120, with camera 111, 112 and/or 120 being arranged in particular on a vehicle 100, that is, with camera 111, 112 and/or 120 being in particular the vehicle camera according to FIG. 1.
  • Alternatively, the camera 111, 112 and/or 120 can be part of a surveillance system, with the surveillance system being stationary in particular.
  • In step 220, a semantic segmentation of the camera image is carried out by means of a first learned machine recognition method.
  • By means of the first learned machine recognition method, the semantic segmentation detects subareas of the camera image that depict semantic categories, such as at least one person, a vehicle or a car, a road, and/or a background of the surroundings that is unimportant for driving or monitoring.
  • Advantageously, all pixels of the camera image are classified by the semantic segmentation 220, with the semantic segmentation 220 representing a rough classification of the camera image into the respective depicted categories; for example, the categories include a background of the camera image or moving objects.
  • All objects in the vicinity of the camera 111 that are in the field of view 191 of the camera are preferably classified by the semantic segmentation 220 and, in particular, a partial area of the camera image is also recognized or classified as background.
  • In step 221, segment information is assigned to those pixels of the respective partial area of the camera image for which a category was recognized.
  • The segment information assigned in step 221 to a respective pixel of the camera image, or of a representation of the camera image, represents the recognized semantic category which is depicted by that pixel.
  • In step 222, adjacent pixels of the camera image are grouped into a segment 410, 420 depending on the respectively assigned semantic segment information.
  • In other words, at least one image section is determined as a segment 410, 420 depending on the semantic segment information assigned to the pixels, with a segment 410, 420 preferably only having pixels that are adjacent to one another.
  • The neighborhood of pixels can be determined in a number of ways according to the prior art.
  • For example, pixels can be considered adjacent to one another if only pixels with the same assigned semantic segment information are arranged between the two pixels, or if a direct connection through pixels with the same assigned semantic segment information is possible between the two pixels.
  • A segment 410, 420 can therefore contain one or more objects that at least partially cover one another, for example a number of people or a number of vehicles.
  • In an optional step 230, optical flow vectors, or an optical flow, are determined for at least some of the pixels of the captured camera image; see below.
  • In particular, optical flow vectors for the pixels of each determined segment 410, 420 are determined as a function of the camera image and at least one other previously and/or subsequently captured camera image, in particular if the segment 410, 420 or the partial area of the camera image contains at least one moving and/or non-moving surrounding object. Furthermore, a determination 240 of at least one item of texture information and/or one item of color information of the pixels in the determined segment 410, 420 can optionally be carried out. In this optional refinement, an assignment 241 of the ascertained texture information and/or the ascertained color information to the respective pixels of the ascertained segment 410, 420 is then carried out.
  • In step 250 of the method, which is an alternative or in addition to step 230, distance data 501 to 507 between surrounding objects 180 in the camera field of view 191, or in the captured partial area of the surroundings, and camera 111, 112 and/or 120 are determined.
  • The distance data 501 to 507 are determined in step 250 preferably by means of a trained second machine recognition method based on the camera image 300 of a mono camera as camera 111, or by means of a stereo camera 110.
  • Alternatively or additionally, the distance data between surrounding objects 180 in the camera field of view 191 and the camera 111, 112 and/or 120 can be determined at least by means of an ultrasonic sensor, a lidar sensor and/or a radar sensor. It can be provided in an optional step 251 that the camera-based distance data from step 250 are corrected and/or validated by distance data determined by means of an ultrasonic sensor, lidar sensor and/or radar sensor. Then, in step 252, a respective item of distance information is assigned to the pixels of at least part of the camera image depending on the distance data determined in step 250 or in step 251.
  • In an optional step 260, at least one object or detail object in the determined segment 410, 420 is recognized by a further trained machine recognition method, with the detected detail object in the segment 410, 420 having a lower degree of abstraction than the assigned segment information or the recognized semantic category of the segment 410, 420.
  • For example, a number plate is detected within a determined segment of the category vehicle or moving object. In an optional step 261, not shown in Figure 2, the recognized detail object is then assigned to the respective pixels of the determined, associated or superordinate segment 410, 420.
  • In step 270, at least one three-dimensional object hypothesis is determined as a segment section of a determined segment 410, 420 depending on the distance information assigned to the pixels of the segment.
  • For example, an object hypothesis is determined in step 270 if the distance information assigned to neighboring pixels is approximately the same, that is, if the assigned distance information of the pixels differs by no more than a distance tolerance value.
  • The object hypothesis is advantageously determined in step 270 if the distance information assigned to the pixels of a segment section of segment 410, 420, in particular for at least a predefined number of pixels, is approximately the same, that is, if the assigned distance information items of the pixels in at least one segment section each differ from one another by no more than a distance tolerance value.
  • The determination 270 of the object hypothesis is thereby set up to separate two different objects that are depicted in the same segment 410, 420 and that in particular cover one another, since they are at different distances from the camera, which is represented by the distance information. It can optionally be provided in step 270 that the object hypothesis is only determined if the distance information assigned to the pixels of a segment section of segment 410, 420 is less than or equal to a distance threshold value for at least a predetermined number of pixels. In other words, three-dimensional object hypotheses are advantageously determined in step 270 only within a closer environment of the camera or vehicle, with this closer environment being defined by the distance threshold value.
  • Optionally, the three-dimensional object hypothesis is determined 270 only if the number of pixels of the segment section is greater than or equal to a minimum value. It can also be provided in step 270 that, when determining the three-dimensional object hypothesis, the distance tolerance value is adjusted depending on the segment information assigned to the pixels of the segment, on the distance information assigned to the pixels of the segment, and/or on a detected speed of the vehicle.
  • The at least one three-dimensional object hypothesis is determined 270 as a segment section of a determined segment additionally or alternatively depending on the determined optical flow.
  • Optionally, the determination 270 of the at least one three-dimensional object hypothesis as a segment section of a determined segment is additionally carried out as a function of the detected object or the detected detail object.
  • For example, a vehicle driving ahead is advantageously recognized when a number plate is recognized in the segment section.
  • Optionally, the determination 270 of the three-dimensional object hypothesis as a segment excerpt of a determined segment can also take place depending on the assigned texture information and/or the assigned color information, so that, for example, a green vehicle can be separated or differentiated more easily from a red vehicle.
  • In an optional step 280, the method is first carried out repeatedly on the basis of another camera image previously or subsequently captured by the vehicle camera; the consistency of the object hypothesis with object hypotheses determined earlier or later is then checked, that is, the determined object hypothesis is validated or discarded as a function of the object hypothesis determined at a different point in time. Furthermore, in another optional step 281, the method can be carried out repeatedly based on a camera image captured from a different perspective; the consistency of the determined three-dimensional object hypothesis with an object hypothesis determined from the different perspective is then checked, that is, the determined object hypothesis is validated or rejected depending on the object hypothesis determined from the different perspective. Finally, in an optional method step 290, the at least one determined three-dimensional object hypothesis can be displayed in a virtual three-dimensional environment model.
  • A captured camera image 300 is shown schematically in FIG. 3.
  • The camera image 300 depicts the partial area of the surroundings captured in the field of view 191 of the camera 111, 112 and/or 120.
  • Shown are a roadway or lane 182 with a vehicle 320 driving ahead as a moving object 180, pedestrians 310 partially covering one another on a sidewalk 181 as further moving objects 180, and a vehicle 330 parked on a sidewalk 181 as a stationary object 180 of the movable category, the parked vehicle 330 being partially covered by the vehicle 320 driving ahead.
  • FIG. 4 shows a categorized representation or rough classification 400 of the captured camera image 300 from FIG. 3, determined according to steps 220, 221 and 222.
  • In this example, moving objects 180, for example people and vehicles, are initially recognized as a semantic category by the first learned machine recognition method.
  • In addition, the semantic segmentation 220 recognizes a background in the camera image 300 that is not relevant to the driving of the vehicle. Provision can be made for recognizing further semantic categories, for example the roadway 182.
  • In step 221, the respective pixels which depict the vehicles and people are assigned the moving object category 180 as segment information.
  • In step 222, segments 410 and 420 are formed or determined by grouping adjacent pixels with the same assigned segment information, in particular of the moving object category 180.
  • The semantic segmentation 220 of the captured camera image consequently results, via steps 221 and 222, in the rough division 400 of the camera image 300 shown in Figure 4 into the segments 410, 420, 430 and 440, with this rough division 400 in particular separating adjacent pixels with different assigned segment information from one another.
  • A segment 410, 420 of the camera image can accordingly represent or include a number of people and/or vehicles.
  • In FIG. 5, distance data for the captured camera image 300 from FIG. 3, determined by means of the second learned machine recognition method, are shown schematically.
  • The areas 501 to 507, which partly but not necessarily run in the form of rings, each represent a different distance between the surroundings, with the surrounding objects 180, 310, 320, 330, and the camera 111, 112 and/or 120 or the vehicle 100. It can be seen that, based on the detected or determined distance data 501 to 507, determined distance information can be assigned to at least a large number of pixels of the camera image 300.
  • The distance data 501 to 507 are advantageously estimated or determined in a very computationally efficient manner by the second learned machine recognition method. Alternatively, not shown in FIG. 5, the distance data can be recorded or determined by an ultrasonic, radar or lidar sensor, which advantageously results in distance data of high quality or reliability.
  • The person 510 in the foreground, for example, can easily be determined as a separate three-dimensional object hypothesis in the segment 410, that is, distinguished from the people 511, 512 located behind, based on the determined distance data.
  • Likewise, vehicles driving ahead of one another that conceal one another can be separated well from one another as distinct three-dimensional object hypotheses (not shown).
  • Vehicles 320 and 330 depicted in camera image 300, however, cannot be clearly distinguished from each other based on the distance data alone, despite the different distances at their respective rears, since vehicles 320 and 330, due to their respective spatial depth, have different and in part the same distances to the camera.
  • The optical flow vectors for vehicles 320 and 330, by contrast, have very different magnitudes, because vehicle 320 is driving and vehicle 330 is parked or stationary.
  • Vehicles 320 and 330 can therefore advantageously be determined very reliably as different three-dimensional object hypotheses in the same segment 420 if the three-dimensional object hypothesis is determined as a function of the optical flow or of the optical flow vectors of the respective pixels of a segment; a sketch of this flow-based separation follows below.
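  • For the situation of FIG. 5, a minimal sketch of this flow-based separation, assuming a dense flow field as computed above and an illustrative magnitude threshold:

      import numpy as np

      def split_segment_by_flow(segment_mask, flow, mag_tol=2.0):
          """Separate a driving vehicle (such as 320) from a parked one
          (such as 330) within the same segment: pixels whose optical flow
          magnitude exceeds mag_tol (in pixels per frame) form a 'moving'
          hypothesis, the remaining segment pixels a 'static' one."""
          magnitude = np.linalg.norm(flow, axis=-1)  # per-pixel flow magnitude
          moving = segment_mask & (magnitude > mag_tol)
          static = segment_mask & (magnitude <= mag_tol)
          return moving, static

  • In practice, the ego motion of the vehicle also induces flow on static objects, so such a threshold would be applied after compensating the expected ego-motion flow; the fixed threshold here only illustrates the principle.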

Abstract

The invention relates to a method for identifying three-dimensional objects (180) in a field of view (191) of a camera (111, 112, 120), comprising the following steps: capturing (210) at least one camera image (300) by means of the at least one camera (111, 112, 120); semantically segmenting (220) the camera image (300) by a first learned machine recognition method; assigning (221) an item of segment information to the pixels of the camera image (300) as a function of the semantic segmentation; determining (222) at least one image detail as a segment (410, 420, 430, 440), adjacent pixels of the camera image (300) being grouped into a segment (410, 420, 430, 440) as a function of the assigned semantic segment information; determining (250) distance data (501 to 507) between surrounding objects (180) in the camera field of view (191) and the camera (111, 112, 120) and assigning (252) an item of distance information to the pixels of at least part of the camera image (300) as a function of the determined distance data (501 to 507), and/or determining (230) an optical flow for at least some of the pixels of the captured camera image as a function of the camera image (300) and of at least one further previously and/or subsequently captured camera image; and determining (270) at least one three-dimensional object hypothesis (510, 511, 512, 520, 530) as a segment section of a determined segment (410, 420, 430, 440) as a function of the distance information assigned to the pixels of the segment and/or as a function of the determined optical flow.
PCT/EP2021/068017 2020-08-27 2021-06-30 Method for identifying three-dimensional objects, computer program, machine-readable storage medium, control unit, vehicle and video surveillance system WO2022042903A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102020210816.1A DE102020210816A1 (de) 2020-08-27 2020-08-27 Method for detecting three-dimensional objects, computer program, machine-readable storage medium, control unit, vehicle and video surveillance system
DE102020210816.1 2020-08-27

Publications (1)

Publication Number Publication Date
WO2022042903A1 (fr) 2022-03-03

Family

ID=76859603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/068017 WO2022042903A1 (fr) 2020-08-27 2021-06-30 Method for identifying three-dimensional objects, computer program, machine-readable storage medium, control unit, vehicle and video surveillance system

Country Status (2)

Country Link
DE (1) DE102020210816A1 (fr)
WO (1) WO2022042903A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110311108A1 (en) * 2009-02-16 2011-12-22 Daimler Ag Method for detecting objects
US20160133054A1 (en) * 2014-11-12 2016-05-12 Canon Kabushiki Kaisha Information processing apparatus, information processing method, information processing system, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102018123518A1 (de) 2019-03-28 Learning affinity via a spatial propagation neural network
DE102018220024B3 (de) 2020-03-12 Method for fusing sensor data from multiple sensors and fusion device for fusing sensor data from multiple sensors
DE102018132805A1 (de) 2020-06-25 Method for improved object detection
DE102020003008A1 (de) 2020-07-16 Automatic visual perception by means of an environment sensor arrangement

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
C. Godard et al., "Unsupervised Monocular Depth Estimation with Left-Right Consistency", arXiv:1609.03677v3
D. Eigen et al., "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network"
Wei-Chiu Ma et al., "Deep Rigid Instance Scene Flow", 18 April 2019, pages 1-10, XP055848639, retrieved from the Internet: https://arxiv.org/pdf/1904.08913.pdf [retrieved on 2021-10-06] *

Also Published As

Publication number Publication date
DE102020210816A1 (de) 2022-03-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21739993

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21739993

Country of ref document: EP

Kind code of ref document: A1