WO2022243357A1 - Transfert d'informations sémantiques sur des nuages de points - Google Patents

Transfert d'informations sémantiques sur des nuages de points (Transfer of semantic information to point clouds)

Info

Publication number
WO2022243357A1
WO2022243357A1 (application PCT/EP2022/063407)
Authority
WO
WIPO (PCT)
Prior art keywords
information
point cloud
image
semantic
sensor
Prior art date
Application number
PCT/EP2022/063407
Other languages
German (de)
English (en)
Inventor
Jens HONER
Original Assignee
Valeo Schalter Und Sensoren Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Valeo Schalter Und Sensoren Gmbh filed Critical Valeo Schalter Und Sensoren Gmbh
Publication of WO2022243357A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present invention relates to a method for generating information about the surroundings of a vehicle with a driving support system that has at least one surroundings sensor and a detection device, for example a flash lidar or an optical camera.
  • The method comprises the steps of capturing image information around the vehicle with the optical camera, capturing a point cloud around the vehicle with a plurality of surrounding points using the at least one environment sensor, and generating a semantic image of the environment around the vehicle based on the image information, which is fed to a neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, with a reduced number of pixels compared to the image information.
  • The present invention also relates to a driving support system for generating information about the surroundings of a vehicle, with at least one surroundings sensor, an optical camera, a control unit, and a data connection via which the at least one surroundings sensor, the optical camera and the control unit are connected to one another, the driving support system being designed to carry out the above method.
  • Driving support systems are becoming more and more important in current vehicles in order to increase driving safety when driving the vehicle. This applies both to driver assistance systems that assist a human driver in driving the vehicle and to the provision of functionalities for carrying out autonomous or semi-autonomous driving functions.
  • a basis for this is a reliable detection of environmental information of an environment of a vehicle.
  • Geometric information relates to the localization of objects and structures in the environment, and semantic information to the assignment of different categories to these objects and structures.
  • Semantic information can be derived more easily from camera data. Accordingly, the effort involved in deriving semantics from point clouds is typically higher and the results are worse.
  • the invention described here attempts to use the sensors according to their respective strengths.
  • Environment sensors, such as LiDAR-based environment sensors or radar sensors, which can determine the geometric structure of the environment reliably and with a high level of accuracy, are known in the prior art.
  • These surroundings sensors typically provide a point cloud of the surroundings of the vehicle with a plurality of surrounding points.
  • Each of the environmental points is defined by its angular position with respect to the environmental sensor, its elevation angle, and an associated distance value.
  • the environmental points thus indicate the positions of the objects and structures in the area surrounding the vehicle.
  • With LiDAR-based environmental sensors, discrete laser pulses are emitted at an angular spacing of about 0.1 degrees in the horizontal direction.
  • Reflections of the emitted laser pulses are received by the LiDAR-based environmental sensor, and the corresponding distance value can be determined from a runtime from the emission of the laser pulse to the receipt of the associated reflection.
  • the LiDAR-based environmental sensor can emit the laser pulses in one or more scan planes, with the angular distance in the vertical direction being greater than in the horizontal direction when used on vehicles. The details regarding angular distances in horizontal and vertical directions as well as a total number of scan planes depend on the LiDAR-based environmental sensor used in each case.
  • Although semantic information about the environment can be determined from the point cloud recorded in this way, the semantic information obtained in this way is not very reliable due to the relatively large angular distances and the lack of detailed information in relation to the objects and structures, such as color information from a camera.
  • Additional information can be provided for each of the environmental points, for example as intensity values of the received reflections. This allows the determination of semantic information about the environment to be improved, but this information is still not reliable enough.
  • an optical camera provides image information as dense information with small angular distances between individual pixels.
  • The pixels are defined by a chip area of the image sensor, and the distance between the pixels on the image sensor is very small, so that gaps between the pixels tend towards zero.
  • A camera generates dense information in this sense, as the undetected area tends towards zero, while LiDAR-based environmental sensors emit discrete laser pulses that have a small expansion, leaving gaps between adjacent laser pulses that the LiDAR-based environmental sensor does not detect and that thus do not contribute to the detection of the environment.
  • Camera systems typically realize higher resolution, and the passiveness of a camera compared to the active exposure in lidar creates a more even distribution of information on a "pixel".
  • optical cameras can provide the image information with color information for the individual pixels, which represent additional information for the semantic processing. This higher amount of information to be processed and the density of the image information leads to a good performance of the semantic segmentation of the image information provided with the optical camera.
  • Depth estimates based on the image information of individual cameras are also known. However, these depth estimates are mostly derived either implicitly via the semantics, or via stereo or pseudo-stereo approaches by means of different poses of the moving vehicle.
  • Image information 100 with a resolution H * B is provided by an optical camera (not shown).
  • The image information 100 contains, in this exemplary embodiment, information from three color channels K.
  • the image information 100 is processed with a neural network 102 in a first processing step.
  • a semantic image 104 is generated, which has a lower resolution H/N * B/M than the image information 100 and contains semantic information for K classes for each pixel. Details on this are shown in FIG.
  • the neural network 102 is shown there, which has a plurality of layers 106 which process the image information 100 in stages.
  • the neural network 102 has been previously trained for semantic segmentation.
  • An upsampled semantic image 108 is generated from the semantic image 104 by bilinear upsampling, which has the resolution H * B of the image information 100, in order to provide a semantic mask for the entire image information 100.
  • Bilinear upsampling is an example here, but there are other methods as well.
  • the semantic image 104 as well as the upsampled semantic image 108 contain semantic information for 21 classes.
  • An exemplary representation 110 in which the respective class with the highest confidence value is transferred to the image information 100 allows individual objects 112 to be identified.
  • the upsampling is shown in FIG. 3 as an example.
  • The semantic image 104 with the resolution H/N * B/M is processed in an upsampling layer 114 and enlarged to the resolution H * B of the image information 100.
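  • As an illustration of this prior-art upsampling step, the following is a minimal sketch (assuming the semantic image is stored as an H/N x B/M x K array of class scores; the shapes and names are illustrative, not taken from the application):

```python
import numpy as np
from scipy.ndimage import zoom  # order=1 gives (bi)linear interpolation

def upsample_semantic_image(sem_small: np.ndarray, full_h: int, full_w: int) -> np.ndarray:
    """Bilinearly upsample an (h, w, K) semantic score map to (full_h, full_w, K)."""
    h, w, _ = sem_small.shape
    # per-axis zoom factors; the class axis K is left untouched (factor 1)
    return zoom(sem_small, (full_h / h, full_w / w, 1), order=1)

# Example: a 120 x 160 semantic image with 21 classes blown up to a 960 x 1280 camera image,
# which is exactly the resource-intensive step the invention avoids.
sem_small = np.random.rand(120, 160, 21).astype(np.float32)
sem_full = upsample_semantic_image(sem_small, 960, 1280)  # shape (960, 1280, 21)
```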
  • A point cloud 116 is provided by an environment sensor (not shown) in sensor coordinates of the environment sensor.
  • the surrounding points contained in the point cloud 116 are transformed into image coordinates, as a result of which the point cloud is provided in image coordinates 118 .
  • the point cloud in image coordinates 118 and the upsampled semantic image 108 are merged into the environment information 120 by pixel mapping.
  • the environmental information 120 is a semantic point cloud, i.e. each environmental point of the point cloud 116 is assigned the semantic information of the pixel of the upsampled semantic image 108, which corresponds to its position in image coordinates.
  • The invention is therefore based on the object of specifying a method for generating information about the surroundings of a vehicle with a driving assistance system that has at least one surroundings sensor and an optical camera, as well as such a driving assistance system, which enable an efficient and reliable generation of environment information with geometric and semantic information.
  • A method for generating information about the surroundings of a vehicle with a driving support system that has at least one surroundings sensor and an optical camera is specified, comprising the steps of: capturing image information of the surroundings of the vehicle with the optical camera; capturing a point cloud of the surroundings of the vehicle with a plurality of surrounding points with the at least one surroundings sensor; generating a semantic image of the surroundings of the vehicle based on the image information, which is supplied to a neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, with a reduced number of pixels compared to the image information; mapping the surrounding points of the point cloud directly to positions in the semantic image; and generating the environment information by assigning the semantic information for each surrounding point of the point cloud based on the mapping of the respective surrounding point to the corresponding position in the semantic image.
  • A driving support system for generating information about the surroundings of a vehicle is also specified, with at least one surroundings sensor, an optical camera, a control unit, and a data connection via which the at least one surroundings sensor, the optical camera and the control unit are connected to one another, the driving support system being designed to carry out the above method.
  • the basic idea of the present invention is therefore to carry out an efficient determination of the semantic information for all surrounding points of the point cloud by mapping the surrounding points to the positions in the semantic image, without upsampling of the semantic image being necessary.
  • This is advantageous because the image information typically contains significantly more picture elements (pixels) than the point cloud contains surrounding points, so that only the smaller number of surrounding points is processed instead of generating a larger number of picture elements for an upsampled semantic image.
  • the upsampling of the semantic image represents a resource-intensive processing step.
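  • To illustrate the order of magnitude (the figures below are illustrative assumptions, not values from the application): with a camera resolution of H * B = 960 * 1280, K = 21 classes and a point cloud of P = 50 000 surrounding points, the two approaches touch roughly

```latex
\underbrace{H \cdot B \cdot K}_{\text{upsampling}} = 960 \cdot 1280 \cdot 21 \approx 2.6 \cdot 10^{7}
\quad \text{values} \qquad \text{vs.} \qquad
\underbrace{4 \cdot P \cdot K}_{\text{direct mapping, bilinear}} = 4 \cdot 50\,000 \cdot 21 \approx 4.2 \cdot 10^{6}
\quad \text{values,}
```

  • i.e. in this example the direct mapping processes roughly a factor of six fewer values, and the advantage grows with the camera resolution.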
  • the entire captured image information can be used to generate the semantic information.
  • Parts of the image information that are not required for assigning the semantic information to the surrounding points need not be processed further, in order to save resources.
  • upsampling does not generate any additional information, but merely represents the existing information differently.
  • the environment information refers to information that defines the environment, in particular to discover obstacles or potential dangers for the vehicle.
  • the environmental information is formed with geometric information and with semantic information.
  • Geometric information relates to the localization of objects and structures in the environment, and semantic information to the assignment of different categories to these objects and structures.
  • the surroundings of the vehicle is an area that is captured by the optical camera and the at least one surroundings sensor.
  • The environment can be recorded in full, i.e. 360° around the vehicle, or only in a partial area, for example in a field of view of 90° to 180° in the driving direction.
  • the range typically extends to a distance of 100 or 200 meters from the vehicle, but can also be greater or lesser in extent. In particular, the extent is not greater than a range of the at least one environmental sensor or the optical camera.
  • the driving support system can be designed to provide any support functions for driving the vehicle or with the vehicle. This can involve driver assistance systems that assist a human driver in driving the vehicle, as well as the provision of functionalities for carrying out autonomous or semi-autonomous driving functions.
  • driver assistance systems are known, for example, under the term ADAS (Advanced Driver Assistance Systems).
  • Capturing the image information of the surroundings of the vehicle with the optical camera includes generating a two-dimensional matrix with image points, which are also referred to as pixels. Typical resolutions of optical cameras are in the range of one megapixel or more per image. Typically, the image information is continuously provided anew, for example in the manner of a video stream with consecutive frames, each of which forms the image information. The image information is preferably provided by the optical camera with color information for the individual pixels. Optical cameras typically have a viewing angle of less than 180°, so that, in order to monitor the surroundings over an angle of more than 180°, image information from a number of optical cameras must be processed together.
  • the driving support system can comprise a plurality of optical cameras, for example one camera being arranged on each side of the vehicle.
  • Capturing a point cloud of the surroundings of the vehicle provides the point cloud with a plurality of surrounding points.
  • Each of the environmental points is defined by its angular position with respect to the environmental sensor and an associated distance value.
  • the environmental points thus indicate the positions of objects or structures in the area surrounding the vehicle.
  • the point cloud is transmitted from the at least one environmental sensor to the control unit via the data connection.
  • With a LiDAR-based environmental sensor, discrete laser pulses are emitted at an angular spacing of, for example, approximately 0.1 degrees in the horizontal direction.
  • There are also lidar technologies that work with continuous illumination; the invention described here can be applied in both cases.
  • Reflections of the emitted laser pulses are received by the LiDAR-based environmental sensor, and the distance value for the respective environmental point can be determined from a transit time from the emission of the laser pulse to the receipt of the associated reflection.
  • the LiDAR-based environmental sensor can emit the laser pulses in one or more scan planes, with the angular distance in the vertical direction being greater than in the horizontal direction when used on vehicles. The details regarding angular distances in horizontal and vertical directions as well as a total number of scan planes depends on the respective LiDAR-based environmental sensor. With current LiDAR-based environmental sensors, additional information can be provided for each of the environmental points, for example as intensity values of the received reflections.
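  • As a minimal sketch (the coordinate convention below is an assumption for illustration, not specified in the application), the Cartesian position of a surrounding point in sensor coordinates can be recovered from its horizontal angle, elevation angle and distance value as follows:

```python
import math

def polar_to_cartesian(azimuth_deg: float, elevation_deg: float, distance_m: float):
    """Convert one lidar return (angles in degrees, range in metres) to sensor-frame x, y, z."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.cos(az)  # forward
    y = distance_m * math.cos(el) * math.sin(az)  # left
    z = distance_m * math.sin(el)                 # up
    return x, y, z

# Example: a return at 10 deg azimuth, -2 deg elevation and 35 m range
print(polar_to_cartesian(10.0, -2.0, 35.0))
```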
  • A semantic image of the surroundings of the vehicle is generated based on the image information that is supplied to the neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, with a reduced number of pixels compared to the image information.
  • the image information is transmitted from the optical camera to the control unit via the data connection and processed there automatically.
  • Corresponding implementations for the semantic segmentation of image information are known as such and can be implemented, for example, using the neural network, the neural network having to be trained accordingly in advance in order to recognize semantic information relevant for driving the vehicle and for the driving support by the driving support system.
  • the resolution is typically reduced, so that the semantic image has fewer pixels than the image information that is supplied to the neural network.
  • the semantic image typically includes confidence values for different classes of objects that are to be recognized in the image information, for example cars, trucks, pedestrians, bicycles, trees, traffic lights or the like.
  • the mapping of the surrounding points of the point cloud directly to positions in the semantic image takes place in the control unit.
  • a position in the semantic image is determined for each surrounding point.
  • the position is defined by a two-dimensional vector.
  • the surrounding points can in principle be assigned to individual pixels of the semantic image.
  • the position in the semantic image is preferably specified for each surrounding point with real values, i.e. positions between centers of the individual pixels of the semantic image are determined, for example as floating point values.
  • The environmental information is generated by assigning semantic information of the semantic image to each environmental point of the point cloud based on the mapping of the respective environmental point to the corresponding position in the semantic image. The semantic information at the determined position of each environmental point is therefore assigned to that environmental point.
  • the semantic information of a surrounding point can be formed by the semantic information of a single pixel of the semantic image or by the semantic information of a plurality of pixels of the semantic image in combination.
  • the control unit includes at least one processor and one memory in order to execute a program for carrying out a support function of the driving support system.
  • the control unit processes the point cloud captured by the at least one environment sensor and the image information captured by the optical camera and generates the environment information based thereon.
  • the data connection is designed, for example, in the manner of a bus system that is customary in the automotive sector.
  • bus systems such as CAN, FlexRay, LIN or others are known in this context.
  • For cameras, BR Ethernet or LVDS is usually used.
  • the method includes a step for temporally synchronizing the acquisition of the image information with the optical camera and the acquisition of the point cloud with the at least one environmental sensor.
  • The temporal synchronization ensures that the image information and the point cloud contain information that corresponds to one another, so that the semantic information can be correctly assigned for each surrounding point of the point cloud.
  • the temporal synchronization of the capturing of the image information with the optical camera and the capturing of the point cloud with the at least one environmental sensor can be implemented in different ways.
  • the optical camera and the at least one environmental sensor can be operated in a synchronized manner, so that the image information and the point cloud are generated essentially simultaneously.
  • the temporal synchronization can include providing a common time base, so that a time stamp can be assigned to the image information and the point cloud.
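  • A minimal sketch of how such a common time base can be used in software (assuming, for illustration only, that both sensors stamp their data with the same clock; the data structures are hypothetical): each point cloud is paired with the camera frame whose time stamp is closest.

```python
from bisect import bisect_left

def nearest_frame(frame_timestamps: list[float], cloud_timestamp: float) -> int:
    """Return the index of the camera frame whose time stamp is closest to that of the point cloud.

    frame_timestamps must be sorted in ascending order (frames arrive chronologically).
    """
    i = bisect_left(frame_timestamps, cloud_timestamp)
    if i == 0:
        return 0
    if i == len(frame_timestamps):
        return len(frame_timestamps) - 1
    before, after = frame_timestamps[i - 1], frame_timestamps[i]
    return i if after - cloud_timestamp < cloud_timestamp - before else i - 1
```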
  • Fundamental differences may have to be taken into account when capturing the image information with the optical camera and when capturing the point cloud with the at least one environmental sensor.
  • the optical camera captures the image information over a period of time that is classically referred to as the "exposure time" in order to capture a sufficient amount of light with its sensor.
  • the point cloud can be recorded in different ways.
  • LiDAR-based environmental sensors are known which, as flash LiDAR, enable the entire point cloud or areas of the point cloud to be recorded simultaneously.
  • LiDAR-based environmental sensors are known which allow the point cloud to be recorded in columns or rows.
  • LiDAR-based environmental sensors with an individual detection of each environmental point of the point cloud are known.
  • the acquisition of the point cloud can extend over time intervals of different lengths. The same applies in principle to radar sensors.
  • The method includes a step for determining a fixed mapping rule for the surrounding points of the point cloud to positions in the semantic image, and the mapping of the surrounding points of the point cloud directly to positions in the semantic image includes mapping the surrounding points of the point cloud to the positions in the semantic image with this fixed mapping rule.
  • the mapping of the surrounding points of the point cloud to the positions in the semantic image using the fixed mapping rule can be carried out very efficiently and with little computational effort compared to individually implemented mapping of the surrounding points of the point cloud to the positions in the semantic image. In addition, little storage space is required for mapping. It is only necessary to determine the fixed mapping rule once in advance.
  • The fixed mapping rule can be stored, for example, in the form of a look-up table (LUT) in the control unit of the driving support system.
  • the use of the look-up table enables the surrounding points of the point cloud to be mapped quickly and efficiently directly to the positions in the semantic image.
  • a fixed mapping rule is thus stored in the look-up table for all surrounding points.
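  • A minimal sketch of such a look-up table (under the assumption of a scanning LiDAR with fixed beam directions and, as described in the application, a camera mounted close to the sensor so that the mapping is independent of the measured distance; the projection function is a placeholder for the calibration-based transformation discussed further below):

```python
import numpy as np

def build_lut(beam_azimuths_deg, beam_elevations_deg, project_direction):
    """Precompute, for every fixed beam direction, its real-valued (u, v) position
    in the semantic image.

    project_direction(azimuth_deg, elevation_deg) -> (u, v) encapsulates the
    camera/lidar calibration; because the beam directions never change, this
    table is computed once (e.g. during calibration) and then only read at runtime.
    """
    lut = np.zeros((len(beam_elevations_deg), len(beam_azimuths_deg), 2), dtype=np.float32)
    for row, el in enumerate(beam_elevations_deg):
        for col, az in enumerate(beam_azimuths_deg):
            lut[row, col] = project_direction(az, el)
    return lut
```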
  • the acquisition of the image information with the optical camera and the acquisition of the point cloud with the at least one environmental sensor are preferably synchronized in time. This further reduces imaging errors.
  • the mapping of the surrounding points of the point cloud directly to positions in the semantic image includes an interpolation of the positions in the semantic image to pixels of the semantic image.
  • The interpolation enables an exact determination of the semantic information for each surrounding point.
  • Soft transitions are also made possible for the semantic information of neighboring surrounding points.
  • the interpolation of the positions in the semantic image to pixels of the semantic image includes bilinear interpolation, nearest neighbor classification, use of a support vector machine or application of a Gaussian process.
  • Corresponding methods for interpolation are known as such in the prior art.
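  • For the bilinear case, a minimal sketch (assuming the semantic image is stored as an H' x W' x K array of class confidences and that the position lies inside the image; the names are illustrative):

```python
import numpy as np

def sample_semantics_bilinear(sem_image: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly interpolate the K class confidences at the real-valued position (u, v),
    where u is the column and v the row coordinate of the semantic image."""
    h, w, _ = sem_image.shape
    u0 = int(np.clip(np.floor(u), 0, w - 1))
    v0 = int(np.clip(np.floor(v), 0, h - 1))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    # weighted combination of the four neighbouring pixels
    return ((1 - du) * (1 - dv) * sem_image[v0, u0]
            + du * (1 - dv) * sem_image[v0, u1]
            + (1 - du) * dv * sem_image[v1, u0]
            + du * dv * sem_image[v1, u1])
```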
  • The mapping of the surrounding points of the point cloud directly to positions in the semantic image includes mapping the surrounding points of the point cloud directly to positions in the semantic image based on one or more parameters from an extrinsic calibration of the optical camera, an intrinsic calibration of the optical camera, a position of the at least one environmental sensor, a pose of the at least one environmental sensor and a distance of the corresponding environmental points of the point cloud from the at least one environmental sensor.
  • The surrounding points, as detected by the surroundings sensor in sensor coordinates, can first be transformed into image coordinates, so that the surrounding points can then be simply mapped in image coordinates to the positions in the semantic image.
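  • A minimal sketch of this transformation based on a standard pinhole camera model (the extrinsic and intrinsic matrices are assumptions that would come from the actual calibration; points behind the camera are assumed to have been filtered out beforehand):

```python
import numpy as np

def project_to_image(points_sensor: np.ndarray, t_cam_from_sensor: np.ndarray,
                     k_intrinsic: np.ndarray) -> np.ndarray:
    """Project N x 3 points from sensor coordinates into real-valued pixel coordinates.

    t_cam_from_sensor: 4 x 4 extrinsic transform (sensor frame -> camera frame).
    k_intrinsic:       3 x 3 intrinsic camera matrix.
    Returns an N x 2 array of (u, v) image positions.
    """
    n = points_sensor.shape[0]
    pts_h = np.hstack([points_sensor, np.ones((n, 1))])   # homogeneous coordinates
    pts_cam = (t_cam_from_sensor @ pts_h.T).T[:, :3]      # into the camera frame
    uvw = (k_intrinsic @ pts_cam.T).T                     # apply the intrinsics
    return uvw[:, :2] / uvw[:, 2:3]                       # perspective division
```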
  • the method includes an additional step for generating an environment map based on the environment information with the semantic information for each environment point of the point cloud.
  • the environment map covers an area around the vehicle and can easily be used by various driving support functions of the vehicle.
  • the environment points are marked with the semantic information in the environment of the vehicle.
  • the surroundings map can be generated, for example, in the manner of a grid occupancy map, in which individual grid elements are covered with their semantic information based on the surrounding points.
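  • A minimal sketch of such a semantic grid map (cell size, map extent and the per-cell voting scheme are illustrative assumptions):

```python
import numpy as np

def semantic_grid_map(points_xy: np.ndarray, class_ids: np.ndarray,
                      cell_size: float = 0.5, extent: float = 100.0,
                      num_classes: int = 21) -> np.ndarray:
    """Accumulate per-point class votes into a 2D grid centred on the vehicle.

    points_xy: N x 2 surrounding-point positions in vehicle coordinates (metres).
    class_ids: N class indices taken from the semantic point cloud.
    Returns a (cells, cells, num_classes) histogram of class votes per grid cell.
    """
    cells = int(2 * extent / cell_size)
    grid = np.zeros((cells, cells, num_classes), dtype=np.int32)
    idx = np.floor((points_xy + extent) / cell_size).astype(int)
    valid = (idx >= 0).all(axis=1) & (idx < cells).all(axis=1)
    for (ix, iy), cls in zip(idx[valid], class_ids[valid]):
        grid[iy, ix, cls] += 1   # one vote per surrounding point
    return grid
```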
  • the at least one environmental sensor is designed as a LiDAR-based environmental sensor and/or as a radar sensor.
  • Corresponding environmental sensors are known as such and are already widely used.
  • a combination of a plurality of environment sensors and/or a plurality of optical cameras can also be carried out in order, for example, to capture a large area of the vehicle's surroundings.
  • Sensor-dependent parameters, such as the spatial uncertainty of radar sensors, can be taken into account when mapping the environmental points of the point cloud directly to positions in the semantic image.
  • The at least one environment sensor and the optical camera are designed as a sensor unit for joint attachment to the vehicle.
  • the joint provision of the at least one environmental sensor and the optical camera enables simple and quick assembly in one assembly step.
  • a compact provision of the at least one environmental sensor with the optical camera is made possible.
  • the provision of the at least one environment sensor and the optical camera as a sensor unit typically causes the at least one environment sensor and the optical camera to be arranged at a small distance from one another, which simplifies processing of the point cloud together with the image information.
  • The at least one environment sensor and the optical camera are designed to be attached to the vehicle at a small distance from one another.
  • The driving support system is designed to perform the above method with a step for determining a fixed mapping rule of the surrounding points of the point cloud to positions in the semantic image, and to perform the mapping of the surrounding points of the point cloud directly to positions in the semantic image using this fixed mapping rule.
  • The control unit has a plurality of data processing units and is designed to carry out, in parallel, the mapping of the surrounding points of the point cloud directly to positions in the semantic image and/or the generation of the environment information by assigning the semantic information for each surrounding point of the point cloud based on the mapping of the respective surrounding point to the corresponding position in the semantic image.
  • the individual surrounding points of the point cloud are independent of one another and can also be processed independently of one another.
  • Graphics processors (GPUs) with a number of parallel computing cores have proven their worth for parallel information processing. Since the mapping of the surrounding points of the point cloud directly to positions in the semantic image is also a graphics-like function, the mapping can be carried out particularly efficiently with graphics processors. The same applies, for example, to the transformation of the surrounding points from sensor coordinates into image coordinates.
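  • Because, as noted above, the surrounding points are independent of one another, the whole mapping can be written as one batched gather, which maps well onto such parallel hardware; below is a minimal vectorized sketch (NumPy, nearest-neighbour variant; the same pattern carries over directly to GPU tensor libraries):

```python
import numpy as np

def map_all_points(sem_image: np.ndarray, positions_uv: np.ndarray) -> np.ndarray:
    """Assign class confidences to all surrounding points at once.

    sem_image:    H' x W' x K semantic image.
    positions_uv: N x 2 real-valued image positions of the surrounding points.
    Returns an N x K array, one row of class confidences per surrounding point.
    """
    h, w, _ = sem_image.shape
    u = np.clip(np.round(positions_uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(positions_uv[:, 1]).astype(int), 0, h - 1)
    return sem_image[v, u]   # a single advanced-indexing gather, no Python loop
```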
  • FIG. 1 shows a view of a diagram for generating environmental information based on a combination of image information of the environment from an optical camera and a point cloud of the environment with a plurality of environmental points from an environment sensor from the prior art
  • FIG. 2 shows a schematic representation of processing of the image information for generating a semantic image with a neural network and for upsampling in accordance with FIG. 1 ,
  • FIG. 3 shows a schematic representation of a processing of the semantic image for upsampling in accordance with FIG. 1 ,
  • Fig. 4 is a schematic view of a vehicle with a driving support system with a sensor unit comprising a LiDAR-based environment sensor and an optical camera and with a control unit, which are connected to one another via a data connection, according to a first preferred embodiment,
  • FIG. 5 shows a view of a diagram for generating environmental information based on a combination of image information of the environment from the optical camera and a point cloud of the environment with a plurality of environmental points from the environmental sensor with the driving support system from FIG. 4,
  • FIG. 6 shows a schematic representation of processing of the image information to generate a semantic image in accordance with FIG. 5
  • FIG. 7 shows a schematic representation of image information with a depiction of surrounding points of the point cloud and a highlighting of identified objects in the image information
  • FIG. 8 shows a detailed view of a mapping of a point surrounding the point cloud from FIG. 7 onto a semantic image
  • FIG. 9 shows a flow chart for generating the environmental information based on a combination of image information of the environment from the optical camera and a point cloud of the environment with a plurality of environmental points from the environmental sensor in accordance with the illustration from FIG. 5.
  • FIG. 4 shows a vehicle 10 with a driving support system 12 according to a first preferred embodiment.
  • the driving support system 12 can be designed to provide any support functions for driving the vehicle 10 or with the vehicle 10 . This can involve driver assistance systems that assist a human driver in driving the vehicle 10, as well as the provision of functionalities for carrying out autonomous or semi-autonomous driving functions.
  • driver assistance systems are known, for example, under the term ADAS (Advanced Driver Assistance Systems).
  • the driving assistance system 12 includes a sensor unit 14 with a LiDAR-based surroundings sensor 16 and an optical camera 18.
  • the driving assistance system 12 also comprises a control unit 20 and a data connection 22 via which the LiDAR-based surroundings sensor 16, the optical camera 18 and the control unit 20 are connected to each other.
  • the sensor unit 14 is designed for joint attachment of the LiDAR-based surroundings sensor 16 and the optical camera 18 to the vehicle 10.
  • the LiDAR-based surroundings sensor 16 and the optical camera 18 are attached to the vehicle 10 at a small distance from one another.
  • the LiDAR-based environment sensor 16 is designed to capture an environment 24 of the vehicle 10 .
  • the surroundings 24 are recorded as a point cloud 26 with a plurality of surrounding points 28 which are arranged in a plurality of scan planes 30, as can be seen from FIG.
  • The surrounding points 28 are generated by emitting laser pulses and receiving reflections of the emitted laser pulses, so that a distance value can be determined from the resulting propagation time.
  • Each of the environmental points 28 is defined by its angular position in relation to the LiDAR-based environmental sensor 16 and the associated distance value.
  • the laser pulses are emitted with a uniform angular spacing.
  • Optical camera 18 is also designed to capture surroundings 24 of vehicle 10 .
  • the detection takes place in a known manner based on a dot matrix with individual image points, which are also referred to as pixels and each include brightness information and/or color information.
  • the optical camera 18 provides corresponding image information 32, as shown in FIGS. 6 and 7 by way of example.
  • the environment 24 of the vehicle 10 is an area captured by the optical camera 18 and the LiDAR-based environment sensor 16 .
  • The environment 24 here has, by way of example, a field of view of 90° to 180° in the direction of travel and extends up to a distance of 100 or 200 meters from the vehicle 10, in accordance with the illustration in FIG.
  • the control unit 20 includes a processor and a memory for executing a program for performing a support function of the driving support system 12 as well as for performing the method described below.
  • The control unit 20 controls the LiDAR-based surroundings sensor 16 and/or the optical camera 18 and receives and processes point clouds 26 provided by the LiDAR-based surroundings sensor 16 and image information 32 provided by the optical camera 18.
  • the data connection 22 is designed, for example, in the manner of a bus system that is customary in the automotive sector. Various bus systems such as CAN, FlexRay, LIN or others are known in this context.
  • FIG. 9 shows a flow chart of the method.
  • In step S100, the method starts with the determination of a fixed mapping rule of the surrounding points 28 of the point cloud 26 generated by the LiDAR-based surroundings sensor 16 to positions in a semantic image.
  • Alternatively, the mapping rule can be calculated dynamically, since the cameras and lidars are usually installed at different positions, which can result in imaging errors that are too large; this, however, also depends on the specific application.
  • The fixed mapping rule can be stored in the control unit 20 of the driving support system 12 in the form of a look-up table (LUT), for example.
  • the fixed mapping rule can be determined once for the driving support system 12, for example during installation or as part of a calibration. Details on the content of the look-up table result from step S150 described below.
  • Step S110 relates to capturing the image information 32 of the surroundings 24 of the vehicle 10 with the optical camera 18. Details on the function of the optical camera 18 have already been explained above.
  • the image information 32 is transmitted from the optical camera 18 to the control unit 20 via the data connection 22 .
  • Step S120 relates to capturing the point cloud 26 of the surroundings 24 of the vehicle 10 with the plurality of surrounding points 28 using the LiDAR-based surroundings sensor 16. Details on the function of the LiDAR-based surroundings sensor 16 have already been explained above.
  • the point cloud 26 is transmitted from the LiDAR-based environment sensor 16 to the control unit 20 via the data connection 22 .
  • Step S130 relates to a temporal synchronization of the acquisition of the image information 32 with the optical camera 18 and the acquisition of the point cloud 26 with the LiDAR-based environment sensor 16.
  • The temporal synchronization of the acquisition of the image information 32 and the point cloud 26 can be carried out, for example, by synchronized operation of the optical camera 18 and the LiDAR-based environment sensor 16, so that the image information 32 and the point cloud 26 are generated essentially simultaneously.
  • The temporal synchronization can include providing a common time base for the optical camera 18 and the LiDAR-based environment sensor 16, with which the image information 32 and the point cloud 26 are each assigned a time stamp.
  • A temporal interpolation of the detections in image space can achieve an approximate synchronization, as long as the time bases are synchronous.
  • Steps S110 and S120 can thus be carried out at the same time or with a slight time offset in any sequence.
  • Step S140 relates to generating the semantic image 34 of the surroundings 24 of the vehicle 10 based on the image information 32, which is fed to a neural network 36, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, with a number of pixels 38 reduced compared to the image information 32.
  • the semantic image 34 is shown schematically (in part) with a plurality of pixels 38 in FIG.
  • Semantic image 34 of environment 24 of vehicle 10 is generated based on processing of image information 32 captured by optical camera 18.
  • image information 32 is fed to neural network 36, which processes image information 32 in a plurality of layers 40 .
  • the semantic image 34 is created by semantic segmentation of image content.
  • The semantic image 34 thus includes the objects 42 shown by way of example as animals in the image information 32 in FIG. In the exemplary image information 32 from FIG., vehicles are recorded as objects 42.
  • the objects 42 are marked with an object frame in FIG. 7, but this is not necessary.
  • Starting from the image information 32 from FIG. 7, the semantic image 34 includes, for each of its pixels 38, confidence values for different classes of objects 42 that are to be recognized in the image information 32, for example cars, trucks, pedestrians, bicycles, trees, traffic lights or the like.
  • Step S150 relates to mapping the surrounding points 28 of the point cloud 26 directly onto positions 44 in the semantic image 34.
  • The surrounding points 28 of the point cloud 26 are first transformed, starting from their sensor coordinates, into surrounding points in image coordinates 46.
  • The transformation is based on one or more parameters from an extrinsic calibration of the optical camera 18, an intrinsic calibration of the optical camera 18, a position of the LiDAR-based environmental sensor 16, a pose of the LiDAR-based environmental sensor 16 and a distance of the corresponding environmental points 28 of the point cloud 26 from the LiDAR-based environment sensor 16.
  • The transformation of the point cloud 26 from sensor coordinates into image coordinates of the optical camera 18 is essentially independent of the distance measurement of the respective surrounding points 28, due to the joint design of the optical camera 18 and the LiDAR-based environment sensor 16 in the sensor unit 14 and their resulting close positioning to one another.
  • The surrounding points in image coordinates 46 are then mapped directly to positions 44 in the semantic image 34 for the point cloud 26 by determining an associated position 44 in the semantic image 34 for each surrounding point 28, as shown by way of example in FIG.
  • the position 44 is defined by a two-dimensional vector with real values, i.e. positions between centers of the individual pixels 38 of the semantic image 34 are determined, for example as floating point values.
  • The mapping of the surrounding points 28 directly onto the positions 44 in the semantic image 34 thus takes place independently of the distance value of each surrounding point.
  • The mapping described here is stored in the look-up table, so that the position 44 can be determined for each surrounding point 28 with the fixed mapping rule.
  • The position 44 of each surrounding point 28 is therefore independent of its distance value.
  • the mapping of the surrounding points 28 of the point cloud 26 directly to positions 44 in the semantic image 34 includes an interpolation of the positions 44 in the semantic image 34 to pixels 38 of the semantic image 34.
  • the interpolation is carried out as bilinear interpolation.
  • The local position 44 of the corresponding surrounding point 28 lies in a corner area of the pixel 38, so that the semantic information of the surrounding point 28 for this position 44 is formed from the semantic information of all the adjacent pixels 38, additionally denoted there by a, b, c, d, in combination.
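  • Written out for this situation (the notation is chosen here for illustration): with a, b, c, d the class confidences of the four neighbouring pixels 38 and (delta_u, delta_v) in [0, 1]^2 the offset of the position 44 within the pixel grid, the bilinearly interpolated semantic information is

```latex
s = (1-\delta_u)(1-\delta_v)\,a \;+\; \delta_u(1-\delta_v)\,b \;+\; (1-\delta_u)\,\delta_v\,c \;+\; \delta_u\,\delta_v\,d .
```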
  • the interpolation can also be defined in advance.
  • Step S160 relates to generating the environment information 48 by assigning the semantic information for each surrounding point 28 of the point cloud 26 based on the mapping of the respective surrounding point 28 to the corresponding position 44 in the semantic image 34. For each surrounding point 28 of the point cloud 26, the semantic information at the determined position 44 is thus assigned to that surrounding point, based on the mapping of the respective surrounding point 28 to the corresponding position 44 in the semantic image 34.
  • The control unit 20 has a plurality of data processing units (not shown individually) and is designed to carry out, in parallel, the mapping of the surrounding points 28 of the point cloud 26 directly to positions 44 in the semantic image 34 and the generation of the environment information 48 by assigning the semantic information for each surrounding point 28 of the point cloud 26 based on the mapping of the respective surrounding point 28 to the corresponding position 44 in the semantic image 34.
  • Graphics processors (GPU) with a number of parallel computing cores have proven their worth for parallel information processing.
  • Step S170 relates to generating an environment map based on the environment information 48 with the semantic information for each environment point 28 of the point cloud 26.
  • the environment map includes an area around the vehicle 10 for use by various driving support functions of the vehicle 10.
  • The environment map is generated, for example, in the manner of a grid occupancy map, in which individual grid elements are covered with the semantic information of the surrounding points 28.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a method for generating environment information (48) of an environment (24) of a vehicle (10) equipped with a driving support system (12) comprising at least one environment sensor (16) and an optical camera (18), said method comprising the steps of: capturing image information (32) of the environment (24) of the vehicle (10) by means of the optical camera (18); capturing a point cloud (26) of the environment (24) of the vehicle (10), comprising a plurality of environment points (28), by means of the at least one environment sensor (16); generating a semantic image (34) of the environment (24) of the vehicle (10) on the basis of the image information (32), which is supplied to a neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, with a reduced number of pixels (38) compared to the image information (32); mapping the environment points (28) of the point cloud (26) directly to positions (44) in the semantic image (34); and generating the environment information (48) by assigning the semantic information for each environment point (28) of the point cloud (26) on the basis of the mapping of the respective environment point (28) to the corresponding position (44) in the semantic image (34). The invention further relates to a driving support system (12) designed to carry out the above method.
PCT/EP2022/063407 2021-05-20 2022-05-18 Transfert d'informations sémantiques sur des nuages de points WO2022243357A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102021113052.2 2021-05-20
DE102021113052.2A DE102021113052A1 (de) 2021-05-20 2021-05-20 Übertragen von semantischen Informationen auf Punktwolken

Publications (1)

Publication Number Publication Date
WO2022243357A1 true WO2022243357A1 (fr) 2022-11-24

Family

ID=82067716

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/063407 WO2022243357A1 (fr) 2021-05-20 2022-05-18 Transfert d'informations sémantiques sur des nuages de points

Country Status (2)

Country Link
DE (1) DE102021113052A1 (fr)
WO (1) WO2022243357A1 (fr)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180232947A1 (en) * 2017-02-11 2018-08-16 Vayavision, Ltd. Method and system for generating multidimensional maps of a scene using a plurality of sensors of various types
US20200142421A1 (en) * 2018-11-05 2020-05-07 GM Global Technology Operations LLC Method and system for end-to-end learning of control commands for autonomous vehicle
CN112801124A (zh) * 2019-11-14 2021-05-14 动态Ad有限责任公司 用于3d对象检测的序贯融合
US20220080999A1 (en) * 2019-11-14 2022-03-17 Motional Ad Llc Sequential fusion for 3d object detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LONG JONATHAN ET AL: "Fully convolutional networks for semantic segmentation", 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 7 June 2015 (2015-06-07), pages 3431 - 3440, XP032793793, DOI: 10.1109/CVPR.2015.7298965 *

Also Published As

Publication number Publication date
DE102021113052A1 (de) 2022-11-24

Similar Documents

Publication Publication Date Title
DE102019115874A1 (de) Systeme und verfahren zur verbesserten entfernungsschätzung durch eine monokamera unter verwendung von radar- und bewegungsdaten
DE112018000899T5 (de) Gemeinsame 3D-Objekterfassung und Ausrichtungsabschätzung über multimodale Fusion
DE102016122190A1 (de) Verfahren und Systeme zur Stixel-Schätzung
DE102018121008B4 (de) System zum fahren eines autonomen fahrzeugs sowie damit ausgestattetes fahrzeugs
EP3117399B1 (fr) Procédé pour assembler des images élémentaires qui ont été prises par un système d'appareils photographiques à partir de différentes positions, pour former une image unique
DE102016208056A1 (de) Verfahren und Vorrichtung zur Verarbeitung von Bilddaten und Fahrerassistenzsystem für ein Fahrzeug
DE102021103151A1 (de) Systeme und verfahren zur bildunschärfeentfernung in einem fahrzeug
DE102021124986A1 (de) Bildeinfärbung für fahrzeugkamerabilder
DE112020004301T5 (de) Objekterkennungsvorrichtung
DE102018212049A1 (de) Verfahren zur dreidimensionalen bildlichen Rekonstruktion eines Fahrzeugs
DE10141055B4 (de) Verfahren zur Bestimmung von Bewegungsinformationen
DE102019131971A1 (de) Ein Bildverarbeitungsmodul
DE102020127000A1 (de) Erzeugung von zusammengesetzten bildern unter verwendung von zwischenbildflächen
DE102016203710A1 (de) Abstands- und Richtungsschätzung eines Zielpunkts von einem Fahrzeug unter Verwendung einer monokularen Videokamera
DE102016124978A1 (de) Virtuelle Repräsentation einer Umgebung eines Kraftfahrzeugs in einem Fahrerassistenzsystem mit mehreren Projektionsflächen
EP1460454A2 (fr) Procédé de traitement combiné d'images à haute définition et d'images vidéo
WO2020164671A1 (fr) Procédé modulaire d'inpainting
DE112015001088T5 (de) Fahrzeugumgebungsbildanzeigevorrichtung und Fahrzeugumgebungsbildanzeigeverfahren
DE102020134584A1 (de) RÄUMLICH UND ZEITLICH KOHÄRENTE MULTI-LiDAR-PUNKTWOLKEN-FUSION
DE112015000763T5 (de) Fahrzeugumgebungsbildanzeigevorrichtung undFahrzeugumgebungsbildanzeigeverfahren
DE102018108751A1 (de) Method, system and device of obtaining 3D-information of objects
EP3815044A1 (fr) Procédé de représentation d'un environnement basée sur capteur et mémoire, dispositif d'affichage et véhicule équipé d'un tel dispositif d'affichage
DE102011082881A1 (de) Darstellung der Umgebung eines Kraftfahrzeugs in einer bestimmten Ansicht unter Verwendung räumlicher Information
WO2022243357A1 (fr) Transfert d'informations sémantiques sur des nuages de points
DE102014201409A1 (de) Parkplatz - trackinggerät und verfahren desselben

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22730682

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22730682

Country of ref document: EP

Kind code of ref document: A1