WO2022243357A1

WO2022243357A1 - Transfer of semantic information to point clouds

Info

Publication number: WO2022243357A1
Application number: PCT/EP2022/063407
Authority: WO
Inventors: Jens HONER
Original assignee: Valeo Schalter Und Sensoren Gmbh
Priority date: 2021-05-20
Filing date: 2022-05-18
Publication date: 2022-11-24
Also published as: DE102021113052A1

Abstract

The invention relates to a method for generating surroundings information (48) of the surroundings (24) of a vehicle (10) comprising a driving assistance system (12) which has at least one surroundings sensor (16) and an optical camera (18), said method comprising the steps of: using the optical camera (18) to acquire image information (32) of the surroundings (24) of the vehicle (10); using the at least one surroundings sensor (16) to acquire a point cloud (26) of the surroundings (24) of the vehicle (10) having a plurality of surroundings points (28); generating a semantic image (34) of the surroundings (24) of the vehicle (10) based on the image information (32) which is fed to a neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, having a reduced number of pixels (38) compared with the image information (32); mapping the surroundings points (28) of the point cloud (26) directly onto positions (44) in the semantic image (34); and generating the surroundings information (48) by assigning the semantic information for each surroundings point (28) of the point cloud (26) to the corresponding position (44) in the semantic image (34) based on the mapping of the associated surroundings point (28). The invention also relates to a driving assistance system (12) which is designed to perform the above method.

Description

Transferring semantic information to point clouds

FIELD OF THE INVENTION The present invention relates to a method for generating environment information about the surroundings of a vehicle with a driving support system that has at least one surroundings sensor and a detection device, for example a flash lidar or an optical camera. The method comprises the steps of capturing image information of the surroundings of the vehicle with the optical camera, capturing a point cloud of the surroundings of the vehicle with a plurality of surrounding points using the at least one surroundings sensor, and generating a semantic image of the surroundings of the vehicle based on the image information, which is fed to a neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network (FCN), the semantic image having a reduced number of pixels compared to the image information.

The present invention also relates to a driving support system for generating environment information about the surroundings of a vehicle, with at least one surroundings sensor, an optical camera, a control unit, and a data connection via which the at least one surroundings sensor, the optical camera and the control unit are connected to one another, the driving support system being designed to carry out the above method.

Driving support systems are becoming more and more important in current vehicles in order to increase driving safety when driving the vehicle. This applies both to driver assistance systems that assist a human driver in driving the vehicle and to the provision of functionalities for carrying out autonomous or semi-autonomous driving functions.

A basis for this is a reliable detection of environment information about the surroundings of a vehicle. Both geometric and semantic information about the environment are essential. Geometric information relates to a description of objects and structures in the environment, and semantic information to an assignment of different categories to the objects and structures. There are approaches to classifying the semantics directly on point clouds. However, semantics can be derived more easily from camera data. Accordingly, the effort involved in deriving semantics from point clouds is typically higher and the results are worse. The invention described here attempts to use the sensors according to their respective strengths.

For example, environment sensors such as LiDAR-based environment sensors or radar sensors are known in the prior art, which can determine a geometric structure of the environment reliably and with a high level of accuracy. These surroundings sensors typically provide a point cloud of the surroundings of the vehicle with a plurality of surrounding points. Each of the environmental points is defined by its angular position with respect to the environmental sensor, its elevation angle, and an associated distance value. The environmental points thus indicate the positions of the objects and structures in the area surrounding the vehicle. For example, in LiDAR-based environmental sensors, discrete laser pulses are emitted at an angular spacing of about 0.1 degrees in the horizontal direction. Reflections of the emitted laser pulses are received by the LiDAR-based environmental sensor, and the corresponding distance value can be determined from a runtime from the emission of the laser pulse to the receipt of the associated reflection. The LiDAR-based environmental sensor can emit the laser pulses in one or more scan planes, with the angular distance in the vertical direction being greater than in the horizontal direction when used on vehicles. The details regarding angular distances in horizontal and vertical directions as well as a total number of scan planes depend on the LiDAR-based environmental sensor used in each case.
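
The relationship between runtime, angles and position described above can be illustrated with a minimal sketch. The function names, the axis convention (x forward, y left, z up) and the use of degrees are assumptions made for illustration only and are not taken from the application.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def distance_from_runtime(runtime_s):
    """Distance from the round-trip runtime of a laser pulse.

    The pulse travels to the object and back, hence the division by two.
    """
    return SPEED_OF_LIGHT_M_S * runtime_s / 2.0


def polar_to_cartesian(azimuth_deg, elevation_deg, distance_m):
    """Convert an environment point (angles plus distance) to x, y, z.

    Assumed convention: x forward, y left, z up, angles in degrees.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.cos(az)
    y = distance_m * math.cos(el) * math.sin(az)
    z = distance_m * math.sin(el)
    return (x, y, z)
```

A runtime of one microsecond, for example, corresponds to a distance of roughly 150 meters under this model.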

Although semantic information about the environment can be determined from the point cloud recorded in this way, the semantic information obtained is not very reliable due to the relatively large angular distances and the lack of detailed information about the objects and structures, such as the color information available from a camera. With current LiDAR-based environmental sensors, additional information can be provided for each of the environmental points, for example as intensity values of the received reflections. This improves the determination of semantic information about the environment, but the information is still not reliable enough.

In contrast, an optical camera provides image information as dense information with small angular distances between individual pixels. In addition, since the pixels are defined by a chip area of the image sensor and the distance between the pixels on the image sensor is very small, gaps between the pixels tend to zero. Correspondingly, a camera generates dense information in the sense that the undetected area tends to zero, while LiDAR-based environmental sensors emit discrete laser pulses with a small beam expansion, leaving gaps between adjacent laser pulses that are not detected by the LiDAR-based environmental sensor and thus do not contribute to the detection of the environment. Camera systems typically achieve a higher resolution, and the passive operation of a camera, compared to the active illumination in lidar, creates a more even distribution of information over a "pixel". In addition, optical cameras can provide the image information with color information for the individual pixels, which represents additional information for the semantic processing. This larger amount of information to be processed and the density of the image information lead to a good performance of the semantic segmentation of the image information provided by the optical camera. However, due to the lack of depth information, the geometric structure can only be determined poorly. Depth estimates based on the image information of individual cameras (mono cameras) are also known. However, these depth estimates are mostly derived either implicitly from semantics, or by stereo or pseudo-stereo methods using different poses of the moving vehicle.

Accordingly, a combination of the geometric information of the point cloud with the semantic information based on the image information is desirable in order to obtain environment information for an effective and reliable detection of the environment of the vehicle. Such a prior art combination is shown schematically in the diagram in Figure 1 with additional reference to Figures 2 and 3.

According to FIG. 1, image information 100 with a resolution H × B is provided by an optical camera (not shown). In this exemplary embodiment, the image information 100 contains information from three color channels K. The image information 100 is processed with a neural network 102 in a first processing step. As a result, a semantic image 104 is generated, which has a lower resolution H/N × B/M than the image information 100 and contains semantic information for K classes for each pixel. Details on this are shown in FIG. 2. The neural network 102 is shown there, which has a plurality of layers 106 that process the image information 100 in stages. The neural network 102 has been previously trained for semantic segmentation.

An upsampled semantic image 108, which has the resolution H × B of the image information 100, is generated from the semantic image 104 by bilinear upsampling in order to obtain a semantic mask for the entire image information 100. Bilinear upsampling is only an example here; other methods exist as well. In this exemplary embodiment, the semantic image 104 and the upsampled semantic image 108 contain semantic information for 21 classes. An exemplary representation 110, in which the class with the highest confidence value is transferred to the image information 100, allows individual objects 112 to be identified. The upsampling is shown by way of example in FIG. 3. There, the semantic image 104 with the resolution H/N × B/M is processed in an upsampling layer 114 and enlarged to the resolution H × B of the image information 100.

In addition, according to FIG. 1, a point cloud 116 is provided by an environment sensor (not shown) in sensor coordinates of the environment sensor. The surrounding points contained in the point cloud 116 are transformed into image coordinates, as a result of which the point cloud is provided in image coordinates 118.

The point cloud in image coordinates 118 and the upsampled semantic image 108 are merged into the environment information 120 by pixel mapping. The environmental information 120 is a semantic point cloud, i.e. each environmental point of the point cloud 116 is assigned the semantic information of the pixel of the upsampled semantic image 108, which corresponds to its position in image coordinates.

The upsampling is resource-intensive, since this step scales with the number of pixels (= H × B) of the image information 100. Furthermore, the transfer of large amounts of data is resource-intensive due to the many write operations and often causes undesirable latencies. Beyond the write operations, the data structure also has to be split into packets (see, for example, the TCP/IP protocol), decoded again on the receiving side, usually the ECU, and reassembled into the original data structure. Read operations and the network resources used can also limit the system.

Proceeding from the above-mentioned prior art, the invention is therefore based on the object of specifying a method for generating environment information about the surroundings of a vehicle with a driving assistance system that has at least one surroundings sensor and an optical camera, as well as such a driving assistance system, which enable an efficient and reliable generation of environment information with geometric and semantic information.

The object is achieved according to the invention by the features of the independent claims. Advantageous refinements of the invention are specified in the dependent claims.

According to the invention, a method for generating environment information about the surroundings of a vehicle with a driving support system that has at least one surroundings sensor and an optical camera is specified, comprising the steps of capturing image information of the surroundings of the vehicle with the optical camera, capturing a point cloud of the surroundings of the vehicle with a plurality of surrounding points with the at least one surroundings sensor, generating a semantic image of the surroundings of the vehicle based on the image information, which is supplied to a neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network (FCN), the semantic image having a reduced number of pixels compared to the image information, mapping the surrounding points of the point cloud directly to positions in the semantic image, and generating the environment information by assigning the semantic information to each surrounding point of the point cloud based on the mapping of the respective surrounding point to the corresponding position in the semantic image. According to the invention, a driving support system for generating environment information about the surroundings of a vehicle is also specified, with at least one surroundings sensor, an optical camera, a control unit, and a data connection via which the at least one surroundings sensor, the optical camera and the control unit are connected to one another, the driving support system being designed to carry out the above method.

The basic idea of the present invention is therefore to carry out an efficient determination of the semantic information for all surrounding points of the point cloud by mapping the surrounding points to the positions in the semantic image, without upsampling of the semantic image being necessary. On the one hand, this is advantageous because the image information typically contains significantly more picture elements (pixels) than surrounding points are contained in the point cloud, and thus only the small number of surrounding points is processed instead of generating a larger number of picture elements for a highly sampled semantic image. In addition, the upsampling of the semantic image represents a resource-intensive processing step.

In this case, the entire captured image information can be used to generate the semantic information. At the same time, parts of the image information that are not required for assigning the semantic information to the surrounding points need not be processed further, which saves resources. In this context, it must also be taken into account that upsampling, as carried out in the prior art, does not generate any additional information, but merely represents the existing information differently. By mapping the surrounding points of the point cloud directly to positions in the semantic image, the information contained in the semantic image can thus be fully used without any disadvantage compared to the mapping to the upsampled semantic image used in the prior art. Due to the angular distances between the surrounding points and the resulting small number of surrounding points compared to the pixels of the image information, most of the information in an upsampled semantic image is redundant and is not required for transferring the semantic information to the surrounding points of the point cloud. Moreover, the amount of data of the upsampled semantic image with the dimension (H × B × max(K)) is significantly larger than that of the "original" semantic image with the dimension (H/N × B/M × max(K)); according to the invention, the memory requirement is therefore reduced by a factor of N × M, i.e. to 1/(N × M) of that of the upsampled semantic image.
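
The memory comparison can be made concrete with a short calculation. The image size, the number of classes and the downsampling factors below are assumed example values, not figures from the application; only the relationship between the two dimensions follows the text.

```python
def semantic_image_elements(height, width, classes):
    """Number of stored confidence values for a semantic image."""
    return height * width * classes


# Assumed example values (not from the application):
H, B = 1024, 2048   # image height and width in pixels
K = 21              # number of semantic classes
N, M = 8, 8         # vertical and horizontal downsampling factors

upsampled = semantic_image_elements(H, B, K)            # H x B x K
reduced = semantic_image_elements(H // N, B // M, K)    # H/N x B/M x K
ratio = upsampled / reduced                             # equals N * M

print(upsampled, reduced, ratio)
```

With these assumed values the upsampled semantic image holds 44,040,192 values and the reduced semantic image 688,128 values, a ratio of N × M = 64.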

The environment information refers to information that defines the environment, in particular in order to discover obstacles or potential dangers for the vehicle. For this purpose, the environment information is formed with geometric information and with semantic information. Geometric information relates to a description of objects and structures in the environment, and semantic information to an assignment of different categories to the objects and structures.

The surroundings of the vehicle is an area that is captured by the optical camera and the at least one surroundings sensor. Depending on the type and number of optical cameras and environmental sensors attached to the vehicle and their orientation, the environment can be recorded in full, i.e. 360° around the vehicle, or only in a partial area, for example in a field of view with 90° to 180° in driving direction. The range typically extends to a distance of 100 or 200 meters from the vehicle, but can also be greater or lesser in extent. In particular, the extent is not greater than a range of the at least one environmental sensor or the optical camera.

The driving support system can be designed to provide any support functions for driving the vehicle or with the vehicle. This can involve driver assistance systems that assist a human driver in driving the vehicle, as well as the provision of functionalities for carrying out autonomous or semi-autonomous driving functions. Various driver assistance systems are known, for example, under the term ADAS (Advanced Driver Assistance Systems).

Capturing the image information of the surroundings of the vehicle with the optical camera includes generating a two-dimensional matrix of image points, which are also referred to as pixels. Typical resolutions of optical cameras are in the range of one megapixel or more per image. Typically, the image information is continuously provided anew, for example in the manner of a video stream with consecutive frames, each of which forms the image information. The image information is preferably provided by the optical camera with color information for the individual pixels. Optical cameras typically have a viewing angle of less than 180°, so that in order to monitor the surroundings over an angle of up to 360°, image information from a number of optical cameras must be processed together. Correspondingly, the driving support system can comprise a plurality of optical cameras, for example with one camera arranged on each side of the vehicle.

Capturing a point cloud of the surroundings of the vehicle provides the point cloud with a plurality of surrounding points. Each of the environmental points is defined by its angular position with respect to the environmental sensor and an associated distance value. The environmental points thus indicate the positions of objects or structures in the area surrounding the vehicle. The point cloud is transmitted from the at least one environmental sensor to the control unit via the data connection. In the case of LiDAR-based environmental sensors, for example, discrete laser pulses are emitted at an angular spacing of, for example, approximately 0.1 degrees in the horizontal direction. However, there are other lidar technologies that work with continuous illumination. The invention described here can be applied in both cases. Reflections of the emitted laser pulses are received by the LiDAR-based environmental sensor, and the distance value for the respective environmental point can be determined from a transit time from the emission of the laser pulse to the receipt of the associated reflection. The LiDAR-based environmental sensor can emit the laser pulses in one or more scan planes, with the angular distance in the vertical direction being greater than in the horizontal direction when used on vehicles. The details regarding angular distances in horizontal and vertical directions as well as the total number of scan planes depend on the respective LiDAR-based environmental sensor. With current LiDAR-based environmental sensors, additional information can be provided for each of the environmental points, for example as intensity values of the received reflections.

A semantic image of the surroundings of the vehicle is generated based on the image information that is supplied to the neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network (FCN), the semantic image having a reduced number of pixels compared to the image information. The image information is transmitted from the optical camera to the control unit via the data connection and processed there automatically. Corresponding implementations for the semantic segmentation of image information are known as such and can be realized, for example, using the neural network, which has to be trained accordingly in advance in order to recognize semantic information relevant for driving the vehicle and for the driving support provided by the driving support system. When processing the image information in the neural network, the resolution is typically reduced, so that the semantic image has fewer pixels than the image information that is supplied to the neural network.

For each pixel, the semantic image typically includes confidence values for different classes of objects that are to be recognized in the image information, for example cars, trucks, pedestrians, bicycles, trees, traffic lights or the like.

The mapping of the surrounding points of the point cloud directly to positions in the semantic image takes place in the control unit. A position in the semantic image is determined for each surrounding point. In the case of two-dimensional images, the position is defined by a two-dimensional vector. Depending on the resolution of the semantic image, the surrounding points can in principle be assigned to individual pixels of the semantic image. However, the position in the semantic image is preferably specified for each surrounding point with real values, i.e. positions between centers of the individual pixels of the semantic image are determined, for example as floating point values.
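
As an illustration of a real-valued position, the following sketch scales a position given in coordinates of the full-resolution image down to the semantic image. The function name and the pixel-center convention (pixel centers at integer coordinates) are assumptions; the application does not prescribe either.

```python
def to_semantic_position(u, v, n, m):
    """Scale a full-resolution image position (u, v) to the semantic image.

    u is the column and v the row in the image information; m and n are the
    horizontal and vertical downsampling factors (assumed names). Pixel
    centers are assumed to lie at integer coordinates, so one semantic
    pixel covers m x n image pixels. The result is a pair of floats that
    may fall between the centers of semantic pixels.
    """
    x = (u + 0.5) / m - 0.5
    y = (v + 0.5) / n - 0.5
    return (x, y)
```

Under this convention, the center of the first 8 × 8 block of image pixels, (3.5, 3.5), maps to the center of the first semantic pixel, (0.0, 0.0).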

The environmental information is generated by assigning semantic information of the semantic image to each environmental point of the point cloud based on the mapping of the respective environmental point to the corresponding position in the semantic image. A mapping of the semantic information is therefore carried out at the determined position of each environmental point relative to this environmental point. The semantic information of a surrounding point can be formed by the semantic information of a single pixel of the semantic image or by the semantic information of a plurality of pixels of the semantic image in combination. The control unit includes at least one processor and one memory in order to execute a program for carrying out a support function of the driving support system. The control unit processes the point cloud captured by the at least one environment sensor and the image information captured by the optical camera and generates the environment information based thereon.
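
The assignment step can be sketched as follows for the simplest case, in which the semantic information of a single pixel is used. The data layout (nested lists of confidence values), the function name and the handling of points outside the image are assumptions for illustration.

```python
import math


def label_points(points, positions, semantic, class_names):
    """Attach a class label to each environment point.

    points:      list of environment points (any representation)
    positions:   list of (x, y) positions in the semantic image, or None
    semantic:    semantic[row][col] is a list of confidence values per class
    class_names: class name per confidence index

    Assumption: the nearest semantic pixel is used and the class with the
    highest confidence wins. Points without a valid position get None.
    """
    rows, cols = len(semantic), len(semantic[0])
    labelled = []
    for point, position in zip(points, positions):
        label = None
        if position is not None:
            col = math.floor(position[0] + 0.5)
            row = math.floor(position[1] + 0.5)
            if 0 <= row < rows and 0 <= col < cols:
                confidences = semantic[row][col]
                best = max(range(len(confidences)), key=confidences.__getitem__)
                label = class_names[best]
        labelled.append((point, label))
    return labelled
```

The result is a semantic point cloud in list form: each entry pairs the original environment point with its assigned class, or with None if the point falls outside the semantic image.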

The data connection is designed, for example, in the manner of a bus system that is customary in the automotive sector. Various bus systems such as CAN, FlexRay, LIN or others are known in this context. For the typical amounts of data, however, BR Ethernet or LVDS for cameras is usually used.

In an advantageous embodiment of the invention, the method includes a step for temporally synchronizing the acquisition of the image information with the optical camera and the acquisition of the point cloud with the at least one environmental sensor. The temporal synchronization ensures that the image information and the point cloud contain information that corresponds to one another, so that the semantic information can be correctly assigned to each surrounding point of the point cloud. The temporal synchronization of the capturing of the image information with the optical camera and the capturing of the point cloud with the at least one environmental sensor can be implemented in different ways. For example, the optical camera and the at least one environmental sensor can be operated in a synchronized manner, so that the image information and the point cloud are generated essentially simultaneously. Alternatively or additionally, the temporal synchronization can include providing a common time base, so that a time stamp can be assigned to the image information and the point cloud. In this case, fundamental differences between capturing the image information with the optical camera and capturing the point cloud with the at least one environmental sensor may have to be taken into account. For instance, the optical camera captures the image information over a period of time that is classically referred to as the "exposure time" in order to capture a sufficient amount of light with its sensor. In contrast, the point cloud can be recorded in different ways. For example, LiDAR-based environmental sensors are known which, as flash LiDAR, enable the entire point cloud or areas of the point cloud to be recorded simultaneously. Furthermore, LiDAR-based environmental sensors are known which allow the point cloud to be recorded in columns or rows. In addition, LiDAR-based environmental sensors with an individual detection of each environmental point of the point cloud are known. Thus, the acquisition of the point cloud can extend over time intervals of different lengths. The same applies in principle to radar sensors.

In addition, it can be advantageous to carry out a temporal interpolation in order to calculate the image information and the point cloud back to a common point in time, starting from real times of the respective acquisition, and in this way to provide synchronous image information and point clouds.
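
One simple way to bring measurements to a common point in time is linear interpolation between two time-stamped samples. The sketch below is only one conceivable approach and assumes linear change between the samples; the function name and the tuple representation are hypothetical.

```python
def interpolate_to_time(t0, value0, t1, value1, t_common):
    """Linearly interpolate a measured quantity to a common time stamp.

    value0 and value1 are tuples measured at times t0 and t1 on a common
    time base, for example a position. Assumption: the quantity changes
    linearly between the two samples.
    """
    if t1 == t0:
        raise ValueError("time stamps must differ")
    w = (t_common - t0) / (t1 - t0)
    return tuple((1.0 - w) * a + w * b for a, b in zip(value0, value1))
```

For instance, a position sampled at 0.0 s and 0.1 s could be interpolated to the 0.05 s time stamp of the image information.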

In an advantageous embodiment of the invention, the method includes a step for determining a fixed mapping rule from the surrounding points of the point cloud to positions in the semantic image, and mapping the surrounding points of the point cloud directly to positions in the semantic image includes mapping them with this fixed mapping rule. Mapping with the fixed mapping rule can be carried out very efficiently and with little computational effort compared to an individually computed mapping of each surrounding point to its position in the semantic image. In addition, little storage space is required for the mapping. The fixed mapping rule only has to be determined once in advance. The smaller the distance between the at least one environment sensor and the optical camera, the smaller the mapping error of the fixed mapping rule typically is compared to an individually computed mapping of the surrounding points of the point cloud to the positions in the semantic image. It is therefore preferred that the at least one environment sensor and the optical camera are mounted at a small distance from one another. As a result, for example, the transformation of the point cloud from sensor coordinates into image coordinates of the optical camera can be assumed to be essentially independent of the measured distance of the respective surrounding points, so that the transformation is approximately constant for a typically static detection pattern of the at least one environment sensor. The fixed mapping rule can be stored in the form of a look-up table (LUT) in the control unit of the driving support system. The use of the look-up table enables the surrounding points of the point cloud to be mapped quickly and efficiently directly to the positions in the semantic image. A fixed mapping rule is thus stored in the look-up table for all surrounding points. The acquisition of the image information with the optical camera and the acquisition of the point cloud with the at least one environmental sensor are preferably synchronized in time. This further reduces mapping errors.
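
A look-up table of this kind could be sketched as follows. The sketch assumes a static detection pattern described by one direction vector per beam, a hypothetical `project` callable that maps a 3D point in sensor coordinates to a position in the semantic image, and an arbitrarily chosen reference distance; none of these details are specified in the application.

```python
def build_lookup_table(beam_directions, project, reference_distance_m=20.0):
    """Precompute one semantic image position per beam of the scan pattern.

    beam_directions: list of (dx, dy, dz) direction vectors, one per beam
    project:         hypothetical callable, 3D sensor point -> (x, y) or None
    reference_distance_m: assumed distance at which each beam is evaluated;
        with a small sensor-to-camera offset the result depends only
        weakly on this value.
    """
    table = []
    for dx, dy, dz in beam_directions:
        point = (dx * reference_distance_m,
                 dy * reference_distance_m,
                 dz * reference_distance_m)
        table.append(project(point))
    return table


def lookup_positions(table, beam_indices):
    """Map measured points to positions using only their beam index."""
    return [table[i] for i in beam_indices]
```

The table is built once in advance; at run time each surrounding point only needs its beam index, so no per-point coordinate transformation is computed.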

In an advantageous embodiment of the invention, the mapping of the surrounding points of the point cloud directly to positions in the semantic image includes an interpolation of the positions in the semantic image to pixels of the semantic image. The interpolation enables an exact determination of the semantic information for each point in the surrounding area. Soft transitions are also made possible for the semantic information of neighboring surrounding points.

In an advantageous embodiment of the invention, the interpolation of the positions in the semantic image to pixels of the semantic image includes bilinear interpolation, nearest neighbor classification, use of a support vector machine or application of a Gaussian process. Corresponding methods for interpolation are known as such in the prior art.
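
Two of the named interpolation methods, nearest neighbor and bilinear interpolation, can be sketched for a semantic image stored as nested lists of confidence values. The data layout, the function names and the clamping of positions to the image border are assumptions for illustration.

```python
import math


def sample_nearest(semantic, x, y):
    """Confidence values of the semantic pixel nearest to (x, y)."""
    rows, cols = len(semantic), len(semantic[0])
    col = min(max(math.floor(x + 0.5), 0), cols - 1)
    row = min(max(math.floor(y + 0.5), 0), rows - 1)
    return list(semantic[row][col])


def sample_bilinear(semantic, x, y):
    """Bilinearly interpolated confidence values at position (x, y).

    semantic[row][col] is a list of confidence values per class; x is the
    column and y the row coordinate, with pixel centers at integers.
    Positions outside the image are clamped to the border (assumption).
    """
    rows, cols = len(semantic), len(semantic[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    result = []
    for k in range(len(semantic[0][0])):
        top = (1 - fx) * semantic[y0][x0][k] + fx * semantic[y0][x1][k]
        bottom = (1 - fx) * semantic[y1][x0][k] + fx * semantic[y1][x1][k]
        result.append((1 - fy) * top + fy * bottom)
    return result
```

Only the four semantic pixels around each surrounding point are read, so the cost scales with the number of surrounding points rather than with the number of pixels of the image information.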

In an advantageous embodiment of the invention, the mapping of the surrounding points of the point cloud directly to positions in the semantic image is based on one or more parameters from an extrinsic calibration of the optical camera, an intrinsic calibration of the optical camera, a position of the at least one environmental sensor, a pose of the at least one environmental sensor and a distance of the corresponding environmental points of the point cloud from the at least one environmental sensor. For example, the surrounding points, as detected by the environmental sensor in sensor coordinates, can first be transformed into image coordinates, so that the surrounding points can then simply be mapped in image coordinates to the positions in the semantic image. By taking the various parameters into account, a reliable mapping of the surrounding points of the point cloud directly to positions in the semantic image can take place.
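
A transformation from sensor coordinates into image coordinates can be sketched with an ideal pinhole camera model. This model, the parameter names (`rotation`, `translation`, `fx`, `fy`, `cx`, `cy`) and the omission of lens distortion are assumptions; the application does not specify a camera model.

```python
def project_point(point_sensor, rotation, translation, fx, fy, cx, cy):
    """Project a 3D point from sensor coordinates into image coordinates.

    rotation (3x3 nested list) and translation (3 values) stand for the
    extrinsic relation between environment sensor and camera; fx, fy, cx,
    cy are intrinsic pinhole parameters in pixels. Assumptions: the camera
    looks along its z axis and lens distortion is ignored.
    Returns (u, v), or None for points behind the camera.
    """
    p = [
        sum(rotation[i][j] * point_sensor[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if p[2] <= 0.0:
        return None
    u = fx * p[0] / p[2] + cx
    v = fy * p[1] / p[2] + cy
    return (u, v)
```

The resulting image coordinates can then be scaled to the resolution of the semantic image. The division by the depth p[2] is where the distance of the environmental point enters the mapping.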

In an advantageous embodiment of the invention, the method includes an additional step for generating an environment map based on the environment information with the semantic information for each environment point of the point cloud. The environment map covers an area around the vehicle and can easily be used by various driving support functions of the vehicle. The environment points are marked with the semantic information in the environment of the vehicle. As a result, the surroundings map can be generated, for example, in the manner of a grid occupancy map, in which individual grid elements are covered with their semantic information based on the surrounding points.
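
A grid map with semantic information could be sketched as below. The cell size, the majority vote per cell and the dictionary representation are assumptions chosen for illustration; the application leaves the construction of the map open.

```python
import math
from collections import Counter, defaultdict


def build_semantic_grid(points, cell_size_m=0.5):
    """Build a sparse grid map from labelled environment points.

    points: iterable of (x, y, class_label) in vehicle coordinates (meters)
    Returns a dict mapping grid cell (ix, iy) to the most frequent class
    label among the points that fall into that cell (assumed voting rule).
    """
    votes = defaultdict(Counter)
    for x, y, class_label in points:
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        votes[cell][class_label] += 1
    return {cell: counter.most_common(1)[0][0] for cell, counter in votes.items()}
```

Cells that contain no surrounding point are simply absent from the dictionary, which keeps the map sparse.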

In an advantageous embodiment of the invention, the at least one environmental sensor is designed as a LiDAR-based environmental sensor and/or as a radar sensor. Corresponding environmental sensors are known as such and are already widely used. A combination of a plurality of environment sensors and/or a plurality of optical cameras can also be carried out in order, for example, to capture a large area of the vehicle's surroundings. Depending on the type of environmental sensors, it can be advantageous to take into account sensor-dependent parameters such as spatial uncertainty of radar sensors when mapping the environmental points of the point cloud directly to positions in the semantic image.

In an advantageous embodiment of the invention, the at least one environment sensor and the optical camera are designed as a sensor unit for joint attachment to the vehicle. The joint provision of the at least one environmental sensor and the optical camera enables simple and quick assembly in one assembly step. In addition, a compact provision of the at least one environmental sensor with the optical camera is made possible. The provision of the at least one environment sensor and the optical camera as a sensor unit typically causes the at least one environment sensor and the optical camera to be arranged at a small distance from one another, which simplifies processing of the point cloud together with the image information.

In an advantageous embodiment of the invention, the at least one environment sensor and the optical camera are designed to be attached to the vehicle at a small distance from one another, and the driving support system is designed to carry out the above method with a step for determining a fixed mapping rule from the surrounding points of the point cloud to positions in the semantic image, and to map the surrounding points of the point cloud directly to positions in the semantic image using this fixed mapping rule. The smaller the distance, the fewer mapping errors occur when using the fixed mapping rule for mapping the surrounding points of the point cloud directly onto positions in the semantic image.

In an advantageous embodiment of the invention, the control unit has a plurality of data processing units and is designed to carry out, in parallel, the mapping of the surrounding points of the point cloud directly to positions in the semantic image and/or the generation of the surrounding information by assigning the semantic information to each surrounding point of the point cloud based on the mapping of the respective surrounding point to the corresponding position in the semantic image. The individual surrounding points of the point cloud are independent of one another and can therefore also be processed independently of one another. Graphics processors (GPUs) with a number of parallel computing cores have proven their worth for parallel information processing. Since the mapping of the surrounding points of the point cloud directly to positions in the semantic image is also a graphics function, the mapping can be carried out particularly efficiently with graphics processors. The same applies in detail, for example, to a transformation of the surrounding points from sensor coordinates into image coordinates.

The invention is explained in more detail below with reference to the attached drawing based on preferred embodiments. The features shown can represent an aspect of the invention both individually and in combination. Features of different embodiments can be transferred from one embodiment to another.

In the drawings:

FIG. 1 shows a diagram for generating environmental information based on a combination of image information of the environment from an optical camera and a point cloud of the environment with a plurality of environmental points from an environment sensor, according to the prior art,

FIG. 2 shows a schematic representation of processing of the image information for generating a semantic image with a neural network and for upsampling in accordance with FIG. 1,

FIG. 3 shows a schematic representation of processing of the semantic image for upsampling in accordance with FIG. 1,

FIG. 4 shows a schematic view of a vehicle with a driving support system with a sensor unit comprising a LiDAR-based environment sensor and an optical camera and with a control unit, which are connected to one another via a data connection, according to a first preferred embodiment,

FIG. 5 shows a diagram for generating environmental information based on a combination of image information of the environment from the optical camera and a point cloud of the environment with a plurality of environmental points from the environmental sensor with the driving support system from FIG. 4,

FIG. 6 shows a schematic representation of processing of the image information to generate a semantic image in accordance with FIG. 5,

FIG. 7 shows a schematic representation of image information with a depiction of surrounding points of the point cloud and a highlighting of identified objects in the image information,

FIG. 8 shows a detailed view of a mapping of a surrounding point of the point cloud from FIG. 7 onto a semantic image, and

FIG. 9 shows a flow chart for generating the environmental information based on a combination of image information of the environment from the optical camera and a point cloud of the environment with a plurality of environmental points from the environmental sensor in accordance with the illustration from FIG. 5.

FIG. 4 shows a vehicle 10 with a driving support system 12 according to a first preferred embodiment.

The driving support system 12 can be designed to provide any support functions for driving the vehicle 10 or with the vehicle 10. This can involve driver assistance systems that assist a human driver in driving the vehicle 10, as well as the provision of functionalities for carrying out autonomous or semi-autonomous driving functions. Various driver assistance systems are known, for example, under the term ADAS (Advanced Driver Assistance Systems).

The driving assistance system 12 includes a sensor unit 14 with a LiDAR-based surroundings sensor 16 and an optical camera 18. The driving assistance system 12 also comprises a control unit 20 and a data connection 22 via which the LiDAR-based surroundings sensor 16, the optical camera 18 and the control unit 20 are connected to each other.

The sensor unit 14 is designed for joint attachment of the LiDAR-based surroundings sensor 16 and the optical camera 18 to the vehicle 10. As a result, the LiDAR-based surroundings sensor 16 and the optical camera 18 are attached to the vehicle 10 at a small distance from one another. The LiDAR-based environment sensor 16 is designed to capture an environment 24 of the vehicle 10. The surroundings 24 are recorded as a point cloud 26 with a plurality of surrounding points 28 which are arranged in a plurality of scan planes 30, as can be seen from FIG. The surrounding points 28 are generated by emitting laser pulses and receiving reflections of the emitted laser pulses, so that a distance value can be determined from the resulting propagation time. Each of the surrounding points 28 is defined by its angular position in relation to the LiDAR-based environmental sensor 16 and the associated distance value. In each scan plane 30, the laser pulses are emitted with a uniform angular spacing. However, there are also LiDAR-based environmental sensors with a central region of higher resolution, for example 0.125 degrees compared to 0.25 degrees in the outer regions.
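By way of illustration only, the determination of a distance value from the propagation time and the definition of a surrounding point by angular position and distance can be sketched as follows. This is a minimal sketch and not the sensor's actual processing: the function names, the axis convention (x forward, y left, z up) and the example values are assumptions.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact by definition of the metre


def range_from_time_of_flight(round_trip_s: float) -> float:
    """Distance to the reflecting surface; the pulse travels out and back."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def polar_to_cartesian(azimuth_deg: float, elevation_deg: float, distance_m: float):
    """Convert angular position plus distance into sensor coordinates.

    Assumed convention (not from the source): x forward, y left, z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.cos(az)
    y = distance_m * math.cos(el) * math.sin(az)
    z = distance_m * math.sin(el)
    return (x, y, z)


# A reflection received 400 ns after emission lies roughly 60 m away.
d = range_from_time_of_flight(400e-9)
point = polar_to_cartesian(azimuth_deg=0.25, elevation_deg=0.0, distance_m=d)
```

The division by two reflects the fact that the measured propagation time covers the path to the object and back.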

The optical camera 18 is also designed to capture the surroundings 24 of the vehicle 10. The capture takes place in a known manner based on a matrix of individual image points, which are also referred to as pixels and each include brightness information and/or color information. The optical camera 18 provides corresponding image information 32, as shown in FIGS. 6 and 7 by way of example.

The environment 24 of the vehicle 10 is an area captured by the optical camera 18 and the LiDAR-based environment sensor 16. The environment 24 should cover a field of view of 90° to 180° in the direction of travel and extend up to a distance of 100 or 200 meters from the vehicle 10, as is generally the case in accordance with the illustration in FIG.

The control unit 20 includes a processor and a memory for executing a program for performing a support function of the driving support system 12 as well as for performing the method described below.

The control unit 20 controls the LiDAR-based surroundings sensor 16 and/or the optical camera 18 and receives and processes point clouds 26 provided by the LiDAR-based surroundings sensor 16 and image information 32 provided by the optical camera 18. The data connection 22 is designed, for example, in the manner of a bus system that is customary in the automotive sector. Various bus systems such as CAN, FlexRay, LIN or others are known in this context.

A method for generating environmental information 48 of the environment 24 of the vehicle 10 with the driving support system 12 described above with reference to FIG. 4 according to the first specific embodiment is described below with reference to FIGS. FIG. 9 shows a flow chart of the method.

In step S100, the method starts with the determination of a fixed mapping rule of the surrounding points 28 of the point cloud 26 generated by the LiDAR-based surroundings sensor 16 onto positions in a semantic image. In general, the mapping rule is calculated dynamically, since cameras and LiDAR sensors are usually installed at different positions, where a fixed rule can result in mapping errors that are too large. Whether a fixed rule is acceptable therefore also depends on the specific application.

The fixed mapping rule can be stored in the control unit 20 of the driving support system 12, for example in the form of a look-up table (LUT). The fixed mapping rule can be determined once for the driving support system 12, for example during installation or as part of a calibration. Details on the content of the look-up table follow from step S150 described below.

Step S110 relates to capturing the image information 32 of the surroundings 24 of the vehicle 10 with the optical camera 18. Details on the function of the optical camera 18 have already been explained above. The image information 32 is transmitted from the optical camera 18 to the control unit 20 via the data connection 22 .

Step S120 relates to capturing the point cloud 26 of the surroundings 24 of the vehicle 10 with the plurality of surrounding points 28 using the LiDAR-based surroundings sensor 16. Details on the function of the LiDAR-based surroundings sensor 16 have already been explained above. The point cloud 26 is transmitted from the LiDAR-based environment sensor 16 to the control unit 20 via the data connection 22 .

Step S130 relates to a temporal synchronization of the acquisition of the image information 32 with the optical camera 18 and the acquisition of the point cloud 26 with the LiDAR-based environment sensor 16. The temporal synchronization can be carried out, for example, by synchronized operation of the optical camera 18 and the LiDAR-based environment sensor 16, so that the image information 32 and the point cloud 26 are generated essentially simultaneously. Alternatively or additionally, the temporal synchronization can include providing a common time base for the optical camera 18 and the LiDAR-based environment sensor 16, with which the image information 32 and the point cloud 26 are each assigned a time stamp. As a result, differences in time between capturing the image information 32 with the optical camera 18 and capturing the point cloud 26 with the LiDAR-based surroundings sensor 16 can be compensated for. Provided the time bases are synchronous, a temporal interpolation of the detections in image space can approximately compensate for the remaining offset.
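The use of time stamps from a common time base can be illustrated, by way of example, with the following minimal sketch, which pairs each point cloud with the image closest in time. The function names, the tolerance `max_offset_s` and the example time stamps are hypothetical and serve only for illustration.

```python
from bisect import bisect_left


def nearest_image_index(image_stamps, cloud_stamp):
    """Return the index of the image whose time stamp is closest to cloud_stamp.

    image_stamps must be sorted ascending and non-empty; both stamps are
    assumed to come from the same time base.
    """
    i = bisect_left(image_stamps, cloud_stamp)
    if i == 0:
        return 0
    if i == len(image_stamps):
        return len(image_stamps) - 1
    before, after = image_stamps[i - 1], image_stamps[i]
    return i if (after - cloud_stamp) < (cloud_stamp - before) else i - 1


def pair_frames(image_stamps, cloud_stamps, max_offset_s=0.02):
    """Pair each point cloud with the nearest image, dropping pairs too far apart."""
    pairs = []
    for c_idx, c_stamp in enumerate(cloud_stamps):
        i_idx = nearest_image_index(image_stamps, c_stamp)
        if abs(image_stamps[i_idx] - c_stamp) <= max_offset_s:
            pairs.append((i_idx, c_idx))
    return pairs


images = [0.000, 0.033, 0.066, 0.100]   # hypothetical camera stamps in seconds
clouds = [0.010, 0.060, 0.200]          # hypothetical LiDAR stamps in seconds
pairs = pair_frames(images, clouds)     # list of (image index, cloud index)
```

Point clouds without an image inside the tolerance are simply left unpaired in this sketch; an actual system might instead interpolate, as mentioned above.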

Steps S110 and S120 can thus be carried out at the same time or with a slight time offset in any sequence.

Step S140 relates to generating the semantic image 34 of the surroundings 24 of the vehicle 10 based on the image information 32, which is fed to a neural network 36, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, the semantic image 34 having a reduced number of pixels 38 compared with the image information 32. The semantic image 34 is shown schematically (in part) with a plurality of pixels 38 in FIG.

The semantic image 34 of the environment 24 of the vehicle 10 is generated by processing the image information 32 captured by the optical camera 18. In this exemplary embodiment, the image information 32 is fed to the neural network 36, which processes the image information 32 in a plurality of layers 40. In the course of this processing, the semantic image 34 is created by semantic segmentation of the image content. The semantic image 34 thus includes the objects 42 shown by way of example as animals in the image information 32 in FIG. Vehicles are recorded as objects 42 in the exemplary image information 32 from FIG. The objects 42 are marked with an object frame in FIG. 7, but this is not necessary. Starting from the image information 32 from FIG. 7, the semantic image 34 includes, for each of its pixels 38, confidence values for different classes of objects 42 that are to be recognized in the image information 32, for example cars, trucks, pedestrians, bicycles, trees, traffic lights or the like.

Step S150 relates to mapping the surrounding points 28 of the point cloud 26 directly onto positions 44 in the semantic image 34.

For this purpose, as shown in FIG. 5, the surrounding points 28 of the point cloud 26 are first transformed from their sensor coordinates into surrounding points in image coordinates 46. The transformation is based on one or more parameters from an extrinsic calibration of the optical camera 18, an intrinsic calibration of the optical camera 18, a position of the LiDAR-based environmental sensor 16, a pose of the LiDAR-based environmental sensor 16 and a distance of the corresponding surrounding points 28 of the point cloud 26 from the LiDAR-based environment sensor 16. In this exemplary embodiment, the transformation of the point cloud 26 from sensor coordinates into image coordinates of the optical camera 18 is essentially independent of the distance measurement of the respective surrounding points 28, because the optical camera 18 and the LiDAR-based environment sensor 16 are provided jointly in the sensor unit 14 and are therefore positioned close together.
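A transformation of this kind can be sketched, purely by way of example, with a pinhole camera model. The sketch is not taken from the source: the function name `sensor_to_image`, the camera frame (z along the optical axis), the omission of lens distortion and all numeric values are assumptions. The resulting image coordinates would additionally have to be scaled to the reduced resolution of the semantic image.

```python
def sensor_to_image(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3-D point from sensor coordinates to image coordinates.

    rotation (3x3) and translation (3) are the extrinsic parameters taking
    sensor coordinates into the camera frame (z along the optical axis);
    fx, fy, cx, cy are pinhole intrinsics in pixels. Lens distortion is
    ignored. Returns None for points behind the camera.
    """
    xc, yc, zc = (
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0.0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = dict(fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)  # hypothetical

direction = (0.1, 0.0, 1.0)                 # one beam direction
near = tuple(10.0 * c for c in direction)   # reflection at about 10 m
far = tuple(100.0 * c for c in direction)   # reflection at about 100 m

# Co-located sensors: the pixel depends only on the beam direction.
colocated = [sensor_to_image(p, IDENTITY, (0.0, 0.0, 0.0), **INTRINSICS)
             for p in (near, far)]

# 5 cm lateral offset: the pixel now shifts with distance (parallax).
offset = [sensor_to_image(p, IDENTITY, (0.05, 0.0, 0.0), **INTRINSICS)
          for p in (near, far)]
```

With zero offset both distances project to the same pixel, whereas the assumed 5 cm offset shifts the near point by several pixels relative to the far one, which illustrates why a small distance between the sensors makes the transformation essentially independent of the distance measurement.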

The surrounding points in image coordinates 46 are then mapped directly to positions 44 in the semantic image 34 by determining an associated position 44 in the semantic image 34 for each surrounding point 28, as shown by way of example in FIG. The position 44 is defined by a two-dimensional vector with real values, i.e. positions between the centers of the individual pixels 38 of the semantic image 34 are determined, for example as floating point values.

The mapping of the surrounding points 28 directly onto the positions 44 in the semantic image 34 thus takes place independently of the distance value of each surrounding point 28. As described here, the mapping is stored in the look-up table, so that the position 44 can be determined for each surrounding point 28 with the fixed mapping rule. For each surrounding point 28, the position 44 is therefore independent of its distance value.
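One conceivable layout of such a look-up table is sketched below, by way of example only. It stores one position per beam, indexed by scan plane and beam number, and assumes co-located sensors so that each beam direction maps to a single position. The function name, the indexing scheme, the camera frame and the intrinsic values are assumptions and are not specified in the source.

```python
import math


def build_lookup_table(elevations_deg, azimuths_deg, fx, fy, cx, cy):
    """Precompute one semantic-image position per LiDAR beam.

    Assumes co-located sensors, so each beam direction maps to one position
    regardless of distance. fx, fy, cx, cy are hypothetical pinhole
    intrinsics already scaled to the semantic image's resolution.
    """
    table = {}
    for plane, el_deg in enumerate(elevations_deg):
        for beam, az_deg in enumerate(azimuths_deg):
            az, el = math.radians(az_deg), math.radians(el_deg)
            # Unit direction in an assumed camera frame: x right, y down, z forward.
            x = math.cos(el) * math.sin(az)
            y = -math.sin(el)
            z = math.cos(el) * math.cos(az)
            if z > 0.0:
                table[(plane, beam)] = (fx * x / z + cx, fy * y / z + cy)
    return table


# Built once, for example during calibration.
lut = build_lookup_table(
    elevations_deg=[-1.0, 0.0, 1.0],
    azimuths_deg=[-0.25, 0.0, 0.25],
    fx=250.0, fy=250.0, cx=80.0, cy=45.0,
)

# At run time: one table access per surrounding point, no distance needed.
position = lut[(1, 1)]
```

Because the table is indexed by beam rather than by measured point, the per-point work at run time reduces to a single access.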

In this exemplary embodiment, the mapping of the surrounding points 28 of the point cloud 26 directly to positions 44 in the semantic image 34 includes an interpolation of the positions 44 in the semantic image 34 to pixels 38 of the semantic image 34. In this exemplary embodiment, the interpolation is carried out as bilinear interpolation. As can be seen from FIG. 8, for example, the position 44 of the corresponding surrounding point 28 lies in a corner region of a pixel 38, so that the semantic information of the surrounding point 28 for this position 44 is formed by combining the semantic information of all the adjacent pixels 38, which are additionally denoted there by a, b, c, d. Based on the position 44 specified by the look-up table, the interpolation can also be defined in advance.
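The bilinear interpolation of the semantic information of four adjacent pixels can be sketched as follows. This is a minimal sketch under stated assumptions: pixel centers are taken to lie at integer coordinates, positions outside the image are clamped to the border, and the assignment of the labels a, b, c, d to particular corners is chosen here for illustration, since FIG. 8 is not reproduced.

```python
import math


def bilinear_confidences(semantic, x, y):
    """Interpolate per-class confidences at a real-valued position (x, y).

    semantic[row][col] is a list of class confidences; pixel centres are
    assumed to lie at integer coordinates. Positions outside the image are
    clamped to the border.
    """
    height, width = len(semantic), len(semantic[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0
    a, b = semantic[y0][x0], semantic[y0][x1]   # upper neighbours
    c, d = semantic[y1][x0], semantic[y1][x1]   # lower neighbours
    return [
        (1 - wy) * ((1 - wx) * pa + wx * pb) + wy * ((1 - wx) * pc + wx * pd)
        for pa, pb, pc, pd in zip(a, b, c, d)
    ]


# Hypothetical 2x2 semantic image with two classes: [car, background].
semantic = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
conf = bilinear_confidences(semantic, 0.25, 0.25)  # position near pixel a
```

Since the weights depend only on the position 44, they could be precomputed together with the look-up table when a fixed mapping rule is used.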

Step S160 relates to generating the environmental information 48 by assigning the semantic information to each surrounding point 28 of the point cloud 26 based on the mapping of the respective surrounding point 28 to the corresponding position 44 in the semantic image 34. Each surrounding point 28 of the point cloud 26 is thus assigned the semantic information found at the position 44 determined for it in the semantic image 34.

In this exemplary embodiment, the control unit 20 has a plurality of data processing units (not shown individually) and is designed to carry out, in parallel, the mapping of the surrounding points 28 of the point cloud 26 directly to positions 44 in the semantic image 34 and the generation of the environmental information 48 by assigning the semantic information to each surrounding point 28 of the point cloud 26 based on the mapping of the respective surrounding point 28 to the corresponding position 44 in the semantic image 34. Graphics processors (GPUs) with a number of parallel computing cores have proven their worth for parallel information processing.

Step S170 relates to generating an environment map based on the environmental information 48 with the semantic information for each surrounding point 28 of the point cloud 26. The environment map covers an area around the vehicle 10 for use by various driving support functions of the vehicle 10. The environment map is generated, for example, in the manner of a grid occupancy map in which individual grid elements are occupied, based on the surrounding points 28, with their semantic information.
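A grid map occupied with semantic information can be sketched, by way of example, as follows. The source does not specify how several surrounding points falling into the same grid element are combined; the majority vote used here, the cell size and the function name are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_semantic_grid(points, cell_size_m=0.5):
    """Bin labelled points into grid cells and keep the majority label per cell.

    points is an iterable of (x, y, label) in vehicle coordinates (metres).
    Majority voting is an illustrative choice, not taken from the source.
    """
    votes = defaultdict(Counter)
    for x, y, label in points:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        votes[cell][label] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}


# Hypothetical surrounding points with their assigned class labels.
labelled_points = [
    (10.1, 0.2, "car"), (10.3, 0.4, "car"), (10.2, 0.1, "pedestrian"),
    (25.0, -3.2, "tree"),
]
grid = build_semantic_grid(labelled_points)
```

Grid elements that receive no surrounding point are simply absent from the resulting dictionary in this sketch.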

List of Reference Numbers

10 vehicle

12 driving support system

14 sensor unit

16 environmental sensor, LiDAR-based environmental sensor

18 optical camera

20 control unit

22 data connection

24 environment

26 point cloud

28 environment point

30 scan plane

32 image information

34 semantic image

36 neural network

38 pixels (semantic image)

40 layer

42 object

44 position

46 environment points in image coordinates

48 environment information

100 image information (prior art)

102 neural network (prior art)

104 semantic image (prior art)

106 layer (prior art)

108 upsampled semantic image (prior art)

110 representation (prior art)

112 object (prior art)

114 upsampling layer (prior art)

116 point cloud (prior art)

118 point cloud in image coordinates (prior art)

120 environment information (prior art)

Claims

1. A method for generating environmental information (48) of an environment (24) of a vehicle (10) with a driving support system (12) having at least one environmental sensor (16) and an optical camera (18), comprising the steps:

capturing image information (32) of the surroundings (24) of the vehicle (10) with the optical camera (18),

capturing a point cloud (26) of the surroundings (24) of the vehicle (10) with a plurality of surrounding points (28) with the at least one environmental sensor (16),

generating a semantic image (34) of the surroundings (24) of the vehicle (10) based on the image information (32), which is supplied to a neural network, in particular a convolutional neural network, particularly preferably a fully convolutional neural network, FCN, the semantic image (34) having a reduced number of pixels (38) compared with the image information (32),

mapping the surrounding points (28) of the point cloud (26) directly to positions (44) in the semantic image (34), and

generating the environmental information (48) by assigning the semantic information for each surrounding point (28) of the point cloud (26) based on the mapping of the respective surrounding point (28) to the corresponding position (44) in the semantic image (34).

2. The method according to claim 1, characterized in that the method comprises a step for temporally synchronizing the acquisition of the image information (32) with the optical camera (18) and the acquisition of the point cloud (26) with the at least one environment sensor (16).

3. The method according to any one of claims 1 or 2, characterized in that the method comprises a step for determining a fixed mapping rule of the surrounding points (28) of the point cloud (26) to positions in the semantic image (34), and the mapping of the surrounding points (28) of the point cloud (26) directly to positions (44) in the semantic image (34) comprises mapping the surrounding points (28) of the point cloud (26) to the positions (44) in the semantic image (34) with the fixed mapping rule.

4. The method according to any one of the preceding claims, characterized in that the mapping of the surrounding points (28) of the point cloud (26) directly to positions (44) in the semantic image (34) comprises an interpolation of the positions (44) in the semantic image (34) to pixels (38) of the semantic image (34).

5. The method according to claim 4, characterized in that the interpolation of the positions (44) in the semantic image (34) to pixels (38) of the semantic image (34) comprises bilinear interpolation, nearest-neighbor classification, use of a support vector machine or application of a Gaussian process.

6. The method according to any one of the preceding claims, characterized in that the mapping of the surrounding points (28) of the point cloud (26) directly to positions (44) in the semantic image (34) comprises a mapping of the surrounding points (28) of the point cloud (26) directly to positions (44) in the semantic image (34) based on one or more parameters from an extrinsic calibration of the optical camera (18), an intrinsic calibration of the optical camera (18), a position of the at least one environmental sensor (16), a pose of the at least one environmental sensor (16) and a distance of the corresponding surrounding points (28) of the point cloud (26) from the at least one environmental sensor (16).

7. The method according to any one of the preceding claims, characterized in that the method comprises an additional step for generating an environment map based on the environment information (48) with the semantic information for each environment point (28) of the point cloud (26).

8. Driving support system (12) for generating environmental information (48) of an environment (24) of a vehicle (10) with at least one environmental sensor (16), an optical camera (18), a control unit (20) and a data connection (22), via which the at least one environmental sensor (16), the optical camera (18) and the control unit (20) are connected to one another, characterized in that the driving support system (12) is designed to carry out the method according to one of claims 1 to 7.

9. Driving support system (12) according to claim 8, characterized in that the at least one environment sensor (16) is designed as a LiDAR-based environment sensor (16) and/or as a radar sensor.

10. Driving support system (12) according to one of claims 8 or 9, characterized in that the at least one environment sensor (16) and the optical camera (18) are designed as a sensor unit (14) for joint attachment to the vehicle (10).

11. Driving support system (12) according to one of claims 8 to 10, characterized in that the at least one environment sensor (16) and the optical camera (18) are designed to be attached to the vehicle (10) at a small distance from one another, and the driving support system (12) is designed to carry out the method according to claim 3.

12. Driving support system (12) according to one of claims 8 to 11, characterized in that the control unit (20) has a plurality of data processing units and is designed to carry out a parallel operation for mapping the surrounding points (28) of the point cloud (26) directly to positions (44) in the semantic image (34) and/or for generating the environmental information (48) by assigning the semantic information for each surrounding point (28) of the point cloud (26) based on the mapping of the respective surrounding point (28) to the corresponding position (44) in the semantic image (34).