WO2020104423A1 - Method and apparatus for data fusion of lidar data and image data - Google Patents

Method and apparatus for data fusion of lidar data and image data

Info

Publication number
WO2020104423A1
WO2020104423A1 (PCT/EP2019/081741)
Authority
WO
WIPO (PCT)
Prior art keywords
lidar
data
subframe
points
samples
Application number
PCT/EP2019/081741
Other languages
French (fr)
Inventor
Nijanthan BERINPANATHAN
Nils Kuepper
Jerramy GIPSON
Original Assignee
Volkswagen Aktiengesellschaft
Audi Ag
Dr. Ing. H.C.F. Porsche Ag
Application filed by Volkswagen Aktiengesellschaft, Audi Ag, Dr. Ing. H.C.F. Porsche Ag filed Critical Volkswagen Aktiengesellschaft
Publication of WO2020104423A1 publication Critical patent/WO2020104423A1/en

Classifications

    • G01S17/86: Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G01S17/42: Simultaneous measurement of distance and other co-ordinates
    • G01S17/58: Velocity or trajectory determination systems; Sense-of-movement determination systems
    • G01S17/66: Tracking systems using electromagnetic waves other than radio waves
    • G01S17/931: Lidar systems specially adapted for anti-collision purposes of land vehicles
    • G01S7/4802: Analysis of echo signal for target characterisation; target signature; target cross-section
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T2207/20221: Image fusion; image merging

Definitions

  • the present invention relates generally to a method and an apparatus for data fusion of LiDAR data and image data, and more particularly to sensor fusion in transportation vehicles for obtaining a consistent environment model for assisted or automated driving.
  • Autonomous driving allows vehicles such as self-driving cars to move safely with little or no human input based on a variety of sensors, which perceive their surroundings.
  • classification systems define different levels, typically ranging from fully manual to fully automated systems.
  • the standard SAE J3016 defines six such levels, wherein for levels 0 to 2 the human driver still monitors the driving environment with no automation at all, some driver assistance or partial automation and for levels 3 to 5 an automated driving system monitors the driving environment and performs the driving tasks with different levels of possible driver intervention.
  • LiDAR sensors and cameras are often used together in the fields of driver assistance systems and autonomous driving, as these sensors complement each other well thanks to their different sensor characteristics.
  • Cameras generate image data by capturing light intensities and typically sample the scene at once but only provide a 2D projection of the scene without 3D information.
  • As opposed to a camera, a LiDAR sensor directly determines 3D distances for each sample point of the generated LiDAR pointcloud, however with a lower spatial and temporal sampling frequency. Therefore, the combination of these two complementary sensors can lead to a better understanding of the environment.
  • a 6-dimensional position vector is constructed consisting of the color value (RGB), pixel position (u, v) and time.
  • the unknown range for a specific position is predicted as a weighted sum of neighboring range values. The weight is dependent on the 6-D distance between its neighbors. For low relative velocities, for example, with cars travelling on a highway in the same direction and at almost the same speed as the ego-vehicle, this algorithm appears to work well. However, if high relative velocities occur, that approach will lead to inconsistent matching of three-dimensional points and, in turn, will lead to a wrong upsampling.
  • Intelligent Transportation Systems Conference (ITSC), October 2017, and M. H. Daraei, et al.,“Velocity and shape from tightly-coupled lidar and camera,” 2017 IEEE Intelligent Vehicles Symposium, June 2017, focused on a similar approach by proposing a method for object tracking and reconstruction by fusing LiDAR and image data.
  • an intermediate surface was used for combining both modalities.
  • the image likelihood term was a function of the appearance model, the new image and the three dimensional velocity.
  • the image could then be related to the appearance model by shifting the model according to the projected three-dimensional velocity on the image.
  • a photometric error could be computed as a function of the velocity.
  • the LiDAR likelihood term was a function of the new incoming point cloud, the intermediate surface representation and the velocity. According to the velocity, the estimated surface could be transformed and an error of the alignment with the incoming point cloud could be computed. By minimizing this multi-modal cost function, one would get the optimal value for the velocity estimation. However, if the relative velocities exceed tolerable levels and the correct overlay of the point cloud and image could not be guaranteed, it leads to a wrong estimation of the appearance model and accordingly to inaccurate velocity estimation.
  • a method and apparatus addresses the problem of arbitrary relative velocities in order to enable sensor fusion processing in a wide range of configurations.
  • a method for data fusion of LiDAR data and image data comprising:
  • the method further comprises:
  • generating a LiDAR subframe including LiDAR samples covering a part of the scene during the frame period of an image sensor generating the image data;
  • the method further comprises:
  • propagating the projected LiDAR samples is based on an optical flow algorithm which determines the displacement information between consecutive camera images.
  • the method further comprises:
  • a correspondence qualifies as valid if projected LiDAR samples of a new LiDAR subframe fall within a defined radius around an existing point.
  • the method further comprises:
  • LiDAR samples from multiple LiDAR subframes are accumulated to cover the time interval of a full LiDAR frame, wherein accumulated data older than this time interval are deleted after the processing of an image frame and new data for the current LiDAR subframe are added to the accumulated data.
  • the method further comprises in a preferred embodiment:
  • computing LiDAR descriptors for dynamic LiDAR points in a LiDAR subframe point cloud, wherein a LiDAR descriptor describes a geometrical three dimensional structure of the local neighborhood of the dynamic points;
  • the method further comprises:
  • the method further comprises:
  • the estimated three-dimensional velocity of the dynamic objects is used for transportation vehicle path planning and/or trajectory management.
  • an apparatus for data fusion of LiDAR data and image data comprising:
  • a high framerate camera (3) generating the image data, wherein the image data are generated with a higher framerate as compared to the framerate of a complete scan of the scanning LiDAR sensor;
  • one or more processors coupled to the camera and the LiDAR sensor, wherein said one or more processors are configured to:
  • the apparatus further comprises:
  • one or more buffers coupled to the one or more processors for storing accumulated information related to the LiDAR data and the image data.
  • a vehicle performs a method and/or comprises an apparatus as described above.
  • Figure 1A illustrates a high resolution LiDAR sensor and a high framerate camera closely collocated for simultaneously capturing the environment from the same viewpoint.
  • Figure 1B illustrates an arrangement comprising a high resolution LiDAR system and a high framerate camera mounted on the roof of a vehicle.
  • Figure 2A illustrates an example of a transportation vehicle, which passes an ego vehicle with unknown velocity, and a silhouette surrounding distorted LiDAR samples.
  • Figure 2B illustrates schematically several images captured by a camera and a single full LiDAR frame completely mapped on one of the images.
  • Figure 3A illustrates the transportation vehicle with a silhouette surrounding LiDAR samples corresponding to a LiDAR subframe.
  • Figure 3B illustrates schematically the LiDAR subframe falling in a time interval Δt around one of the camera images.
  • Figure 4A illustrates an image of a transportation vehicle together with LiDAR samples, which correspond to the shape of the transportation vehicle but are displaced.
  • Figure 4B corresponds to Figure 2B.
  • Figure 5A illustrates that after application of the disclosed method the LiDAR samples are in better alignment with the image of the transportation vehicle.
  • Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame.
  • Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
  • the disclosed embodiments are directed at providing a technical solution for the problem of asynchronous fusion between LiDAR data and camera data, provided by a combination of a LiDAR sensor and camera such as shown in figure 1.
  • Figure 1A illustrates an arrangement 1 comprising a combination of a high resolution LiDAR system 2, such as a Luminar G2+ LiDAR sensor, and a high framerate camera 3, such as a FLIR Grasshopper camera, closely collocated above each other for simultaneously capturing the environment from nearly the same viewpoint.
  • Figure 1B illustrates the experimental arrangement 1 mounted on the roof of an ego vehicle 4.
  • the illustrated arrangement is only an example of a prototype; in a production vehicle, other arrangements of the LiDAR sensor and camera relative to each other, and at other locations on the body of the vehicle, are also possible.
  • Capturing three-dimensional information for a scene with a scanning LiDAR system is fundamentally different. This is because conventional scanning LiDAR systems continuously scan the scene within a defined time interval, e.g., for most conventional LiDAR systems within 100 ms corresponding to a 10 Hz scan rate. Thus, the scene is not captured for every LiDAR point in the same time interval as happens with a conventional image camera. More specifically, within 1/10th of a second, the LiDAR beam is steered into various locations to locally sample the scene and only after 1/10th of a second covers the same location again.
  • Figure 2A illustrates an example of a transportation vehicle 5, which passes an ego vehicle (not shown) with unknown velocity.
  • the individual LiDAR samples are not individually shown in the figure, but are located within silhouette 6.
  • the silhouette does not match with the actual appearance of the car in the image frame, meaning that the LiDAR samples are distorted. This distortion results from the different capturing mechanisms, as mentioned above and further illustrated in Figure 2B.
  • Figure 2B schematically shows six progressive full images 7, each captured by a camera within a short timespan at a particular point in time t. Over the same period of time during which the shown six full images are captured, a single full LiDAR frame 8 is determined by scanning the same scene. The determined LiDAR frame 8 is then mapped on the last captured image at time ti, as symbolized by the arrows.
  • artifacts are introduced by the unknown dynamic motion of objects in the scene. Such artifacts are called "dynamic motion artifacts" in the following and render the resulting data less useful for the generation of environment models.
  • Disclosed embodiments address this technical problem by applying a "dynamic motion compensation," which leverages the functionality of a high framerate camera, e.g. operating at a 100 Hz framerate, in combination with a LiDAR sensor operating at e.g. a 10 Hz repetition rate.
  • the high framerate camera has a frame rate at least 10 times faster than the LiDAR sensor’s frame-rate.
  • other ratios between the camera framerate and the LiDAR framerate are also possible as long as the camera framerate is significantly higher than the LiDAR framerate.
  • Figures 3A-3B illustrate how, by using a high framerate camera, the misalignments between camera data and LiDAR data become negligible. Instead of mapping all LiDAR samples corresponding to a complete scan on a particular image frame, for each of the image frames only a fraction of the total LiDAR samples is considered for the fusion of three dimensional samples and camera image data. That is, only LiDAR samples are considered that fall into the time interval of consecutive images, located in Figure 3A within silhouette 9, which, due to the scan pattern used, covers only a vertical fragment of the vehicle roughly parallel to the road surface. Correspondingly, at time ti in Figure 3B only the fraction of the LiDAR frame 8 that falls in the time interval Δt around ti is taken into account.
  • the length of the time interval Δt is dependent on the processing framerate and may in particular correspond to the inverse of the camera framerate.
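As an illustration of this subframe splitting, the following sketch bins the samples of a full LiDAR frame by comparing each sample's timestamp with the camera image timestamps, splitting at the midpoint between consecutive images. The types and names (LidarSample, splitIntoSubframes, imgTS) are hypothetical and not taken from the patent code; this is a sketch of the idea, not the appendix implementation.

```cpp
#include <cstdint>
#include <vector>

struct LidarSample { float x, y, z, reflectivity; int64_t timestamp; };

// Assign every LiDAR sample of a full frame to the camera image whose capture time
// is closest, i.e. split the frame at the midpoints between consecutive image timestamps.
std::vector<std::vector<int>> splitIntoSubframes(
    const std::vector<LidarSample>& fullFrame,
    const std::vector<int64_t>& imgTS)            // one timestamp per camera image
{
    std::vector<std::vector<int>> subframeIndices(imgTS.size());
    for (int i = 0; i < static_cast<int>(fullFrame.size()); ++i) {
        const int64_t t = fullFrame[i].timestamp;
        std::size_t best = 0;
        for (std::size_t k = 1; k < imgTS.size(); ++k) {
            // the midpoint between image k-1 and image k is the split boundary
            if (t >= (imgTS[k - 1] + imgTS[k]) / 2) best = k;
        }
        subframeIndices[best].push_back(i);       // store indices of each split
    }
    return subframeIndices;
}
```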
  • Disclosed embodiments further estimate the three dimensional velocity of the dynamic objects in order to perform the dynamic motion compensation.
  • estimation of the three dimensional velocity provides a valuable input for consecutive processing operations such as path planning and trajectory management.
  • disclosed embodiments enable creation of a consistent environment model, where LiDAR points and camera generated image data are consistent. In this way, pixel data in the camera generated image data and pointcloud data generated by the LiDAR sensor are in agreement at all times.
  • Figure 4A illustrates raw alignment of a dynamically moving car. As can be seen clearly, the points representing the LiDAR measurements correspond to the shape of the car but are displaced. Figure 4B, being identical to Figure 2B, illustrates this distortion.
  • Figure 5A illustrates that after application of the disclosed method the LiDAR samples are in better alignment with the image of the transportation vehicle.
  • the estimated motion of the vehicle is indicated in the figure with arrow 10.
  • the improved alignment is achieved by accumulating and updating data over several image frames, as schematically shown in Figure 5B.
  • Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame. In the accumulated data At-1, data older than this time interval are deleted at the end of the overall processing of a frame at time ti, and at the same time new data for the current subframe are added to the accumulated data.
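A minimal sketch of this sliding accumulation window, assuming subframes arrive with microsecond timestamps; the Subframe struct, the 100 ms window and the function name are illustrative assumptions rather than the patent's data structures.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct Subframe { int64_t timestamp_us; std::vector<int> pointIndices; };

// Keep only the subframes that fall within one full LiDAR frame period (here 100 ms),
// then append the newest subframe, mirroring the scheme of Figure 5B.
void updateAccumulation(std::deque<Subframe>& accumulated,
                        const Subframe& newest,
                        int64_t framePeriod_us = 100000)
{
    while (!accumulated.empty() &&
           newest.timestamp_us - accumulated.front().timestamp_us >= framePeriod_us) {
        accumulated.pop_front();   // delete data older than the accumulation window
    }
    accumulated.push_back(newest); // add the current subframe
}
```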
  • the displacements of brightness patterns may be calculated through a method called "optical flow."
  • optical flow provides information where intensity patterns have shifted in consecutive image frames; ideally, this corresponds to the actual projection of motion in the observed three-dimensional scene.
  • With this information, one can propagate LiDAR samples on the image plane and/or limit the search space to find corresponding LiDAR samples in consecutive LiDAR frames. This can be used to resolve ambiguities in the LiDAR pointcloud processing as well as to speed up algorithms for data association.
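The following sketch illustrates one way such a propagation could look, using OpenCV's dense Farneback optical flow. The choice of flow algorithm and all names here are assumptions; the method only requires displacement information between consecutive images.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Propagate already-projected LiDAR samples (pixel positions) to the newest image
// using dense optical flow between the two most recent grayscale frames.
void propagateByFlow(const cv::Mat& prevGray, const cv::Mat& currGray,
                     std::vector<cv::Point2f>& projectedSamples)
{
    cv::Mat flow;
    cv::calcOpticalFlowFarneback(prevGray, currGray, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);
    for (auto& p : projectedSamples) {
        const int x = cvRound(p.x), y = cvRound(p.y);
        if (x < 0 || y < 0 || x >= flow.cols || y >= flow.rows) continue;
        const cv::Point2f& d = flow.at<cv::Point2f>(y, x);  // displacement at this pixel
        p += d;  // shift the 2D point to where the intensity pattern moved
    }
}
```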
  • the image is not only the most dense and accurate representation of the environment, but is also a helpful construct for all LiDAR processing.
  • LiDAR samples are considered that fall into the time interval of consecutive images.
  • This subset of points is then projected on the image and propagated according to the optical flow of each new incoming image.
  • the raw subset of points may be accumulated until a certain number of LiDAR subframes is reached. This number may be defined by the number of points that go into a full LiDAR scan.
  • the propagated 2D points may be saved in a separate buffer, while the original indexing may be kept so the propagated points can be related to the raw three-dimensional points.
  • the new arriving subset of LiDAR points may be projected on the image.
  • the propagated two-dimensional pointcloud may be updated according to the optical flow generated by the new image. Then each point of the newly arrived subset may be checked for existing points in the current two-dimensional propagated pointcloud within a defined radius that might qualify as a valid correspondence.
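A brute-force sketch of this radius check, keeping the index link back to the raw 3D points as described above; the structures, the search strategy and the first-hit acceptance are illustrative assumptions.

```cpp
#include <opencv2/core.hpp>
#include <utility>
#include <vector>

struct Propagated2DPoint { cv::Point2f uv; int rawIndex; };  // keeps link to raw 3D point

// Return (newPointIdx, rawIndexOfExistingPoint) pairs for projected samples of the
// new subframe that fall within 'radius' pixels of an existing propagated point.
std::vector<std::pair<int, int>> findValidCorrespondences(
    const std::vector<cv::Point2f>& newProjected,
    const std::vector<Propagated2DPoint>& propagated,
    float radius)
{
    std::vector<std::pair<int, int>> matches;
    const float r2 = radius * radius;
    for (int i = 0; i < static_cast<int>(newProjected.size()); ++i) {
        for (const auto& q : propagated) {
            const cv::Point2f d = newProjected[i] - q.uv;
            if (d.dot(d) <= r2) {                  // within the defined radius -> valid
                matches.emplace_back(i, q.rawIndex);
                break;                             // accept the first hit for simplicity
            }
        }
    }
    return matches;
}
```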
  • the correspondences from the previous processing operation may be used as input for a RANSAC algorithm to reject outliers during the process.
  • a final transformation can be obtained.
  • the computed transformation can be related to a three-dimensional velocity.
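To make the chain from correspondences to a velocity concrete, here is a hedged sketch using PCL's sample-consensus correspondence rejection and SVD-based transformation estimation, followed by dividing the translation by the elapsed time. PCL is suggested by the appendix code, but the specific classes, thresholds and the simple translation-over-time conversion are assumptions, not the patent's exact implementation.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/correspondence.h>
#include <pcl/registration/correspondence_rejection_sample_consensus.h>
#include <pcl/registration/transformation_estimation_svd.h>
#include <Eigen/Core>

// Reject outlier correspondences with RANSAC, estimate the rigid transformation that
// aligns the new subframe to the accumulated cloud, and derive a 3D velocity from it.
Eigen::Vector3f estimateVelocity(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& subframe,     // source points
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& accumulated,  // target points
    const pcl::Correspondences& candidates,
    float dt_seconds)                                        // time between the matched points
{
    pcl::registration::CorrespondenceRejectorSampleConsensus<pcl::PointXYZ> ransac;
    ransac.setInputSource(subframe);
    ransac.setInputTarget(accumulated);
    ransac.setInlierThreshold(0.2);     // metres; hypothetical tolerance
    ransac.setMaximumIterations(1000);

    pcl::Correspondences inliers;
    ransac.getRemainingCorrespondences(candidates, inliers);

    Eigen::Matrix4f T = Eigen::Matrix4f::Identity();
    pcl::registration::TransformationEstimationSVD<pcl::PointXYZ, pcl::PointXYZ> svd;
    svd.estimateRigidTransformation(*subframe, *accumulated, inliers, T);

    // Relate the translational part of the transformation to the time difference
    // between the corresponding points to obtain a three-dimensional velocity.
    return T.block<3, 1>(0, 3) / dt_seconds;
}
```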
  • Each point in the subframe cloud may be checked against the motion hypothesis by transforming this point back according to the hypothesis so that it aligns to the accumulated raw data. Further, the points may be compared using certain criteria defined by the neighboring region in the accumulated three dimensional pointcloud.
  • the comparison operation may take into account the LiDAR reflectivity, color intensity, range value and the number of inlier correspondences within a defined radius. If a given point fulfills these criteria, the specific motion will be assigned to it. Subsequently, the processed subframe may be added to the three dimensional accumulated pointcloud containing the information of motion in the scene.
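One way the point-wise verification might look is sketched below; the thresholds and the simple pass/fail combination of reflectivity, intensity, range and inlier-neighbor count are purely illustrative assumptions.

```cpp
#include <cmath>

struct PointAttributes { float reflectivity, intensity, range; };

// Decide whether a back-transformed subframe point matches its neighborhood in the
// accumulated cloud well enough to be assigned the motion hypothesis.
bool followsMotionHypothesis(const PointAttributes& candidate,
                             const PointAttributes& neighborhoodMean,
                             int inlierNeighbors)
{
    const bool reflectivityOk = std::fabs(candidate.reflectivity - neighborhoodMean.reflectivity) < 0.1f;
    const bool intensityOk    = std::fabs(candidate.intensity    - neighborhoodMean.intensity)    < 20.0f;
    const bool rangeOk        = std::fabs(candidate.range        - neighborhoodMean.range)        < 0.5f;
    const bool supportOk      = inlierNeighbors >= 3;   // enough inlier correspondences nearby
    return reflectivityOk && intensityOk && rangeOk && supportOk;
}
```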
  • the moving points in the accumulated pointcloud may be propagated in three dimensional space to the current time according to the assigned motion and saved in a similar fashion as the two dimensional propagated pointcloud, with the property of keeping the indexing. This propagation operation results in a three dimensional propagated pointcloud that is temporally aligned to the current frame.
  • the two dimensional propagated pointcloud may be updated according to the latest data, as explained above.
  • the new subset of LiDAR points may be first compared to the three dimensional propagated pointcloud, again in a similar fashion as for the two dimensional propagated pointcloud.
  • each point in the new subset may be checked for an existing point in the three dimensional propagated pointcloud within a defined distance. If such a point exists, the connection to the raw accumulated pointcloud can be established by using the index information from the three dimensional propagated pointcloud.
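A hedged sketch of such a radius query on the three dimensional propagated pointcloud using a PCL kd-tree, where the returned indices also index the raw accumulated cloud thanks to the shared indexing; the function and variable names are assumptions. In practice the kd-tree would be built once per frame rather than per query; it is rebuilt here only to keep the sketch self-contained.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/kdtree/kdtree_flann.h>
#include <vector>

// Find points of the 3D propagated pointcloud within 'radius' of a new subframe point.
std::vector<int> findNeighborsInPropagated3D(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& propagated3D,
    const pcl::PointXYZ& newPoint,
    float radius)
{
    pcl::KdTreeFLANN<pcl::PointXYZ> kdtree;
    kdtree.setInputCloud(propagated3D);

    std::vector<int> indices;
    std::vector<float> squaredDistances;
    kdtree.radiusSearch(newPoint, radius, indices, squaredDistances);
    return indices;   // empty if no existing point lies within the defined distance
}
```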
  • each correspondence may be weighted by the similarity to the optical flow derived from the images until the current frame.
  • optical flow may be calculated for 10 frames captured at 100 Hz to weight the LiDAR samples generated at 10 Hz.
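A possible form of this weighting, comparing the projected displacement of a matched LiDAR point with the optical flow accumulated over the intervening camera frames; the Gaussian form and the pixel-level sigma are assumptions.

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Weight a LiDAR correspondence by how well its projected 2D displacement agrees
// with the optical flow accumulated over the camera frames since the point was sampled.
float flowSimilarityWeight(const cv::Point2f& projectedDisplacement,
                           const cv::Point2f& accumulatedFlow,
                           float sigma = 2.0f /* pixels, hypothetical */)
{
    const cv::Point2f diff = projectedDisplacement - accumulatedFlow;
    return std::exp(-diff.dot(diff) / (2.0f * sigma * sigma));  // 1 = perfect agreement
}
```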
  • the obtained transformation from the valid correspondences may provide a new motion hypothesis and each point may undergo the verification operation using the above mentioned criteria (i.e., LiDAR intensity, color intensity, range value and number of inlier correspondences in the neighborhood).
  • If a point passes this analysis, it is assigned the motion hypothesis, as explained above.
  • the points, which do not follow the motion hypothesis, are further processed in the same manner described above, wherein the correspondences are found on the image. After all of this processing is performed, the new subset of LiDAR data may again be added to the accumulated pointcloud with updated motion information.
  • Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
  • the processing works at the framerate of the camera, i.e. new data arrive for the algorithm with the framerate of the camera, e.g. at 100 Hz.
  • the process takes as inputs the current image frame It, a LiDAR subframe Pt containing a fraction of the points of the full LiDAR frame, and the accumulated data I*t-1 accumulated during the previous processing of the last 10 frames.
  • the output of the illustrated processing is a motion estimate for the incoming subframe cloud.
  • the information obtained in the current frame is saved in the accumulated data I*t.
  • the accumulated data I*t-1 contains data related to the optical flow at t-1, the raw accumulated cloud, the 2D propagated data and the 3D propagated pointcloud.
  • the raw accumulated cloud contains raw 3D points from multiple subframes Pt-10, Pt-9, ..., Pt-1 and covers the time interval of a full LiDAR frame, e.g. 0.1 seconds.
  • data older than 0.1 seconds are deleted at the end of the overall processing of the image frame at time t, and at this time the new raw subframe Pt is added to the accumulated cloud.
  • the 2D propagated data consists of 2D projected points and covers the same time interval as the raw accumulated cloud and contains the same number of points as well. In contrast to the raw accumulated cloud, the 2D points are temporally aligned to the last frame.
  • the 3D propagated pointcloud consists of the 3D points from multiple subframes, the same as the raw accumulated point cloud. But the points are propagated according to the velocity obtained from the previously processed frames, so each point of the propagated pointcloud is temporally aligned after the propagation.
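Collecting the pieces enumerated above, the accumulated data could be organized roughly as follows; this struct layout is an illustrative assumption, not the patent's data structure, with shared indices so that any propagated point can be traced back to its raw sample.

```cpp
#include <opencv2/core.hpp>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <Eigen/Core>
#include <vector>

// One possible layout for the accumulated information I*_{t-1} described above.
struct AccumulatedData {
    cv::Mat lastOpticalFlow;                              // dense flow at t-1 (2-channel float)
    pcl::PointCloud<pcl::PointXYZI>::Ptr rawAccumulated;  // raw 3D points of the last ~10 subframes
    std::vector<cv::Point2f>     propagated2D;            // same points, shifted to the last frame
    std::vector<int>             propagated2DIndex;       // index into rawAccumulated
    pcl::PointCloud<pcl::PointXYZI>::Ptr propagated3D;    // points moved by their assigned motion
    std::vector<int>             propagated3DIndex;       // index into rawAccumulated
    std::vector<Eigen::Vector3f> assignedVelocity;        // per-point motion, if already estimated
};
```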
  • the above mentioned input data provided by block 11 are supplied to the update block 12.
  • In update block 12, the optical flow is corrected to the newly acquired image It, and the accumulated points are propagated in 2D by using the optical flow and in 3D according to the computed motion, if any is available.
  • In the preprocessing step 13, first the ground points are removed to save computation time; then each sample of the new LiDAR subframe is checked as to whether it belongs to a dynamic or static object. Only dynamic points are processed further. Lastly, in the preprocessing step, LiDAR descriptors are computed, which are required for the matching step.
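A hedged sketch of such a preprocessing step using PCL: a RANSAC plane fit removes the ground points, and SHOT descriptors (which the appendix mentions) describe the local 3D structure of the remaining points. The dynamic/static classification is omitted here, and all thresholds and names are assumptions.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/shot.h>

// Remove ground points with a RANSAC plane fit, then compute SHOT descriptors for the
// remaining points. Output clouds must be allocated by the caller.
void preprocessSubframe(const pcl::PointCloud<pcl::PointXYZ>::Ptr& subframe,
                        pcl::PointCloud<pcl::PointXYZ>::Ptr& nonGround,
                        pcl::PointCloud<pcl::SHOT352>::Ptr& descriptors)
{
    // 1. Ground removal: fit the dominant plane and drop its inliers.
    pcl::ModelCoefficients::Ptr coeffs(new pcl::ModelCoefficients);
    pcl::PointIndices::Ptr ground(new pcl::PointIndices);
    pcl::SACSegmentation<pcl::PointXYZ> seg;
    seg.setModelType(pcl::SACMODEL_PLANE);
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.2);            // metres; hypothetical tolerance
    seg.setInputCloud(subframe);
    seg.segment(*ground, *coeffs);

    pcl::ExtractIndices<pcl::PointXYZ> extract;
    extract.setInputCloud(subframe);
    extract.setIndices(ground);
    extract.setNegative(true);                // keep everything that is NOT ground
    extract.filter(*nonGround);

    // 2. SHOT descriptors describing the local 3D structure of each remaining point.
    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(nonGround);
    ne.setRadiusSearch(0.3);
    ne.compute(*normals);

    pcl::SHOTEstimation<pcl::PointXYZ, pcl::Normal, pcl::SHOT352> shot;
    shot.setInputCloud(nonGround);
    shot.setInputNormals(normals);
    shot.setRadiusSearch(0.5);
    shot.compute(*descriptors);
}
```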
  • Next, matching points between the new subframe Pt and the accumulated data I*t-1 are determined. For that, the motion information obtained in previous processing frames is used to find correspondences directly in 3D space.
  • the optimal transformation is determined in the 3D velocity estimation block 15. Then, the obtained transformation is related to the time difference between the corresponding points to get a 3D velocity value.
  • the motion hypothesis is tested on each dynamic point in the subframe Pt.
  • This check of the motion hypothesis is performed since, even though the output of the correspondence finding block is a transformation Tt which best aligns the new subframe Pt to the accumulated pointcloud Pt-10:t-1, the subframe Pt may still contain some points which do not belong to the observed object. Therefore, each point is checked against certain criteria to determine whether it follows the estimated motion.
  • Each point is transformed back according to the hypothesis, so that it aligns with the accumulated raw data, and is compared with its neighbors.
  • the comparing step takes into account different properties of LiDAR and camera data and assigns the motion to the correct points.
  • the remaining points, which do not fulfill the specified criteria, are further processed in the 2D correspondence finding block 17.
  • the image space is used to find robust correspondences, without having any prior knowledge about the 3D motion.
  • the update block only computes a new optical flow update because the accumulated data I*t-1 is empty at the beginning.
  • no correspondences will be found, neither in 3D space nor in 2D space; this, in turn, leads to no velocity estimation, and only the raw subframe cloud is added to the accumulated information buffer.
  • the accumulated information At-1 will then no longer be empty. This time, the accumulated information can be propagated according to the optical flow in 2D space. But the 3D propagation part is still missing, since the points have no motion assigned yet.
  • the disclosed embodiments are a marked departure from conventional approaches to the technical problem.
  • algorithms for perception and sensor fusion were conventionally developed on datasets with limited framerates.
  • the most popular example for this is the KITTI dataset, which the above conventional approaches often refer to.
  • Camera and LiDAR operate at 10 Hz in this dataset.
  • the problem of dynamic objects has conventionally been assumed to be negligible in various implemented applications.
  • one reason the dynamic motion artifact has not been addressed explicitly is the conventional approach to sensor configuration and the low relative speeds in the dataset.
  • Disclosed embodiments use a sensor setup that generates significantly more camera images while one scan of LiDAR is performed.
  • conventional methods consider only one image per LiDAR frame.
  • disclosed embodiments do not rely on correct matching between the pointcloud and the image on a full LiDAR frame. Rather, splitting the data into smaller chunks makes it possible to cope with very high relative velocities because the matching error remains in the subpixel range.
  • Disclosed embodiments are not limited to any specific LiDAR sensor. All that is required is a scanning LiDAR that gives timestamps for each point, which is the case for all the conventional, commercially available LiDAR sensors.
  • temporally upsampled LiDAR data is generated at the frequency of the camera data and is consistent with the real environment.
  • Disclosed embodiments provide low level sensor fusion between asynchronous sensors that overcomes the problems of obtaining a consistent environment model by explicitly addressing the artifacts introduced by dynamic objects.
  • Disclosed embodiments leverage optical flow computations on a high framerate camera combined with pointcloud processing of LiDAR data.
  • the LiDAR data processing may be a combination of sub-frame processing and full LiDAR frame processing, which enables accurate matching of LiDAR points to any part of the environment as seen in the images. This may provide a consistent and complete three dimensional model of the environment at the framerate of the camera.
  • three dimensional velocity estimations may be derived from the pointcloud data. This is useful information for consecutive processing such as trajectory management. With an accurate estimate of where an object is located in three dimensional space, where it is heading and at what speed, plus the dense pixel information provided by an image, the presented approach extends current algorithms that do not explicitly take the dynamic motion artifacts into account.
  • Disclosed embodiments are based on a setup of a high framerate camera and a high resolution LiDAR. The setup can further be aided by Global Positioning System (GPS) and Inertial Measurement Unit (IMU) technology but this technology is not central to the inventive concepts.
  • the attached Appendix includes source code that provides insight into how the actual fusion of the 10 Hz LiDAR and 100 Hz camera may be performed in accordance with at least one embodiment.
  • the Appendix contains only code snippets of the main functions. Therefore, this code is not compilable.
  • control and cooperation of the above-described components may be provided using software instructions that may be stored in a tangible, non-transitory storage device such as a non-transitory computer readable storage device storing instructions which, when executed on one or more programmed processors, carry out the above-described method operations and resulting functionality.
  • non-transitory is intended to preclude transmitted signals and propagating waves, but not storage devices that are erasable or dependent upon power sources to retain information.
  • This source code provides insight into how the actual fusion of the 10 Hz LiDAR and 100 Hz camera is done. It contains only code snippets of the main functions; therefore, this code is not compilable.
  • pcl::console::setVerbosityLevel(pcl::console::L_ALWAYS);
  • This function splits up a full LiDAR frame into smaller subframes and outputs the indices of each split.
  • the goal of the splitting is to minimize the temporal difference between the sampling time of each point in the subframe and the capturing time of the corresponding image. The split is made in the middle of the time interval of consecutive images.
  • LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
  • imgTS[i] = srcData.cameraFrames[i].timestamp;
  • lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0); // Get timestamp of the (i,j)th lidarSample
  • int64_t dtime_img = imgTS[d[eye]+1] - imgTS[d[eye]];
  • LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
  • imgTS[i] = srcData.cameraFrames[i].timestamp;
  • lidarSensorStruct const& srcSensor = srcData.lidarFrame.sensors[0];
  • proj = cam * proj;
  • proj_vector.push_back(proj);
  • cam(0,0) = 0.5 * cam(0,0);
  • cam(1,1) = 0.5 * cam(1,1);
  • cam(0,2) = round(0.5 * cam(0,2));
  • cam(1,2) = round(0.5 * cam(1,2));
  • proj = cam * proj_copy;
  • cv::Mat tmp = cv::Mat(2048, 640, CV_8UC1);
  • lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0);
  • t.x_raw = pt.x;
  • t.z_raw = pt.z;
  • t.timestamp = dstTime;
  • Each existing point in the 2D propagated pointcloud is temporally aligned to the current frame. This can be achieved by shifting each 2D point according to the optical flow.
  • the 2D propagated pointcloud is later on needed to find correspondence in the image space.
  • cv::Point2f& fxy = m_OpticalFlow.curr_flow.at<cv::Point2f>(tmp.y_OF, tmp.x_OF);
  • tmp.x_OF = tmp.x_OF + fxy.x;
  • tmp.y_OF = tmp.y_OF + fxy.y;
  • tmp.flow = sqrt(fxy.x * fxy.x + fxy.y * fxy.y);
  • the raw accumulated pointcloud is the accumulation of raw 3D points from multiple subframes and covers the time interval of a full LiDAR frame.
  • new_pt.x = pt.x + d_t * vx;
  • each point of the new incoming subframe cloud is classified as static or dynamic. For the further processing only dynamic points are considered.
  • the remaining points which do not fulfill the motion hypothesis test are further processed in the 2D correspondence finding block. These remaining points are projected on the image and checked within a defined radius for existing points in the 2D propagated pointcloud. Through the connection between the 2D propagated points and the raw accumulated pointcloud, the correspondences can be related back to the raw 3D points.
  • the SHOT descriptor for each point in the new subframe is computed. Further, each point of the new incoming subframe cloud is classified as static or dynamic.
  • Poxel_N t1 = this->Fusion_data.subframe[i];
  • float l_v_of = sqrt(x_disp * x_disp + y_disp * y_disp);
  • the 3D propagated cloud is used to find matching points in LiDAR space, while each pair of points is weighted with the corresponding optical flow.
  • std::vector<int> pointIdxNKNSearch(K);
  • float range = sqrt(pt.x * pt.x + pt.y * pt.y + pt.z * pt.z);
  • curr_distance = shot.compare(index_cloud, checkpoints);
  • corr.index_match = j; // pointIdxNKNSearch[0];
  • a RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
  • Poxel_N t1 = this->Fusion_data.subframe[i];
  • cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
  • searchPoint.x = proj_point(0);
  • searchPoint.y = proj_point(1);
  • int neighbors_inlier = kdtree_inlier.radiusSearch(pt, d(range), pointIdxNKNSearch_inlier, pointNKNSquaredDistance_inlier, 10);
  • prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
  • prob = prob * inlier * 0.5;
  • weight = compute_OF_weight(sub_data);
  • a RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
  • Poxel_N t1 = this->Fusion_data.subframe[i];
  • cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
  • searchPoint.x = proj_point(0);
  • searchPoint.y = proj_point(1); if (!isnan(searchPoint.x) && !isnan(searchPoint.y) && !isnan(searchPoint.z))
  • prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
  • prob = prob * inlier * 0.5;

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Electromagnetism (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A system, methodologies and components apply dynamic motion compensation to enable asynchronous fusion between LiDAR data and camera data. This involves using a processor coupled to a high framerate camera and a LiDAR sensor system for receiving high framerate camera generated image data and LiDAR sensor generated pointcloud data. The processor runs software to estimate three dimensional velocity of dynamic objects included in a scene sensed by the high framerate camera and the LiDAR sensor system to perform the dynamic motion compensation. The estimated three dimensional velocity of the dynamic objects may be used for transportation vehicle path planning and/or trajectory management.

Description

METHOD AND APPARATUS FOR DATA FUSION OF LIDAR DATA AND IMAGE DATA
Field:
The present invention relates generally to a method and an apparatus for data fusion of LiDAR data and image data, and more particularly to sensor fusion in transportation vehicles for obtaining a consistent environment model for assisted or automated driving.
Background:
Autonomous driving allows vehicles such as self-driving cars to move safely with little or no human input based on a variety of sensors, which perceive their surroundings. Depending on the amount of driver intervention and attentiveness required, classification systems define different levels, typically ranging from fully manual to fully automated systems. For example, the standard SAE J3016 defines six such levels, wherein for levels 0 to 2 the human driver still monitors the driving environment with no automation at all, some driver assistance or partial automation and for levels 3 to 5 an automated driving system monitors the driving environment and performs the driving tasks with different levels of possible driver intervention.
LiDAR sensors and cameras are often used together in the fields of driver assistance systems and autonomous driving, as these sensors complement each other well thanks to their different sensor characteristics. Cameras generate image data by capturing light intensities and typically sample the scene at once but only provide a 2D projection of the scene without 3D information. As opposed to a camera, a LiDAR sensor directly determines 3D distances for each sample point of the generated LiDAR pointcloud, however with a lower spatial and temporal sampling frequency. Therefore, the combination of these two complementary sensors can lead to a better understanding of the environment.
Conventional approaches to fusing LiDAR pointcloud data and camera generated image data have been implemented in various works in the fields of driver assistance systems and autonomous driving. The underlying motivation for these efforts is to provide more accurate and complete environment models for use in transportation vehicles, allowing more reliable perception systems to ultimately ensure safer and more comfortable driving functionalities beyond the capabilities of systems that are currently used for level 2 functions in series development transportation vehicles. Current level 2 applications do not demand a highly accurate alignment of data from different sensors because the functionality does not require it.
For example, Sebastian Schneider et al., "Fusing vision and lidar - synchronization, correction and occlusion reasoning," 2010 IEEE Intelligent Vehicles Symposium, pp. 388-393, 2010, proposed a framework to create a color point cloud by fusing camera and LiDAR data. More specifically, the synchronization of a camera with a 360-degree rotating LiDAR is discussed, whereby all the samples which are in the field of view of the camera may be taken. Also the concept of ego-motion compensation using integrated IMU data is discussed. In addition, it is disclosed how occlusion is explicitly handled, similar to computer graphics, with a z-buffer concept.
Jennifer Dolson et al., "Upsampling range data in dynamic environments," 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1141-1148, 2010, addresses upsampling of range data in dynamic environments by taking image information into account. This method is similar to a bilateral filter, which is a widely used technique in image filtering. A 6-dimensional position vector is constructed consisting of the color value (RGB), pixel position (u, v) and time. The unknown range for a specific position is predicted as a weighted sum of neighboring range values. The weight is dependent on the 6-D distance between its neighbors. For low relative velocities, for example, with cars travelling on a highway in the same direction and at almost the same speed as the ego-vehicle, this algorithm appears to work well. However, if high relative velocities occur, that approach will lead to inconsistent matching of three-dimensional points and, in turn, will lead to a wrong upsampling.
David Held et al., "Precision tracking with sparse 3d and dense color 2d data," in 2013 IEEE International Conference on Robotics and Automation, May 2013, pp. 1138-1145, and David Held et al., "Robust Real-Time Tracking Combining 3D Shape, Color, and Motion," in The International Journal of Robotics Research, August 2015, pp. 30-49, are directed to a different approach. In particular, the latter proposed fusing laser and camera data to estimate velocities of dynamic objects. For each tracked object, colored points were accumulated. The data was associated in a probabilistic framework by maximizing the velocity posterior. For this, a grid-based sampling method was used to sample the state-space to align the accumulated colored point cloud with the next incoming frame. However, if the relative velocities got higher and the scanning pattern dependent distortion increased, the result was an overall distorted accumulated point cloud, which failed to correspond to the real form present in the scene. The dynamic artifacts introduced by the scanning nature of the LiDAR were not treated at all.
M. H. Daraei, et al.,“Region Segmentation Using LiDAR and Camera,” 2017 IEEE
Intelligent Transportation Systems Conference (ITSC), October 2017, and M. H. Daraei, et al., "Velocity and shape from tightly-coupled lidar and camera," 2017 IEEE Intelligent Vehicles Symposium, June 2017, focused on a similar approach by proposing a method for object tracking and reconstruction by fusing LiDAR and image data. In that approach, an intermediate surface was used for combining both modalities. The image likelihood term was a function of the appearance model, the new image and the three dimensional velocity. The image could then be related to the appearance model by shifting the model according to the projected three-dimensional velocity on the image. As a result, a photometric error could be computed as a function of the velocity. The LiDAR likelihood term was a function of the new incoming point cloud, the intermediate surface representation and the velocity. According to the velocity, the estimated surface could be transformed and an error of the alignment with the incoming point cloud could be computed. By minimizing this multi-modal cost function, one would get the optimal value for the velocity estimation. However, if the relative velocities exceeded tolerable levels and the correct overlay of the point cloud and image could not be guaranteed, this led to a wrong estimation of the appearance model and accordingly to an inaccurate velocity estimation.
R. Omar Chavez-Garcia et al., "Multiple Sensor Fusion and Classification for Moving Object Detection and Tracking," IEEE Transactions on Intelligent Transportation Systems, vol. PP, no. 99, pp. 1-10, 2015, addressed the problem of detecting and tracking dynamic objects in the environment by using radar, LiDAR and camera. Their approach detected and classified the objects in each sensor's modality separately and, at the end, fused the object list. However, that proposed fusion approach would not provide reliable estimation without the additional radar sensor. The approach of S. Wang et al., "Dynamic detection technology for moving objects using 3d lidar information and rgb camera," in 2017 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW), June 2017, pp. 37-38, fused LiDAR and RGB camera data as well for detecting and tracking moving objects. In that approach, the data was segmented in image and LiDAR space. Object proposals from the two dimensional image were used to cluster the three-dimensional space and to keep only the object points. Subsequently, in both sensor modalities, the objects were detected independently. In the image space, a Fast R-CNN was used to detect the objects. In LiDAR space, a trained linear Support Vector Machine was used to classify the Spin Image LiDAR descriptors. However, the weakness of this approach, even before detection, is the image dependent LiDAR clustering. If there was a wrong matching between LiDAR and camera data, object points were thrown away although they were a part of the object.
In X. Chen et al., "Multi-view 3d object detection network for autonomous driving," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6526-6534, 2017, Multi-View Three Dimensional (MV3D) networks were presented for detecting vehicles and outputting oriented three dimensional bounding boxes by fusing LiDAR and RGB images. The overall network could be split into two subnetworks. First, the three dimensional proposal network estimated three dimensional candidate boxes. Second, the region-based fusion network projected these candidates to different views and extracted features. In conclusion, this learning-based approach could detect three-dimensional bounding boxes, but the average precision on the KITTI dataset in a difficult scene was only 79.81%. Further, this network did not output the three dimensional LiDAR points aligned with the image, but instead an oriented bounding box, which approximately covered the contour of the vehicle. Additionally, no temporal information, like velocities, was taken into account.
All of the mentioned works fuse the data coming from camera and LiDAR without explicitly taking into account the problem of inconsistent overlay of three dimensional points and image information when high relative velocities are present. This assumption generally is not a direct limitation for those approaches, as they are only considering low relative velocities between the ego-vehicle and the dynamic objects in the scene.
E. Ilg et al., "Reconstruction of rigid body models from motion distorted laser range data using optical flow," in IEEE International Conference on Robotics and Automation (ICRA), 2014, has addressed the issue of motion distortion for a system using a single line laser scanner on a nodding, stationary platform.
Summary:
The following presents a simplified summary in order to provide a basic understanding of some aspects of various invention embodiments. The summary is not an extensive overview of the invention. It is neither intended to identify key or critical elements of the invention nor to delineate the scope of the invention. The following summary merely presents some concepts of the invention in a simplified form as a prelude to the more detailed description below.
According to the present disclosure, a method and apparatus is provided that addresses the problem of arbitrary relative velocities in order to enable sensor fusion processing in a wide range of configurations.
According to a general aspect of at least one embodiment, a method for data fusion of LiDAR data and image data is presented, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, comprising:
capturing three-dimensional LiDAR data of the scene with a scanning LiDAR sensor; capturing image data corresponding to a two-dimensional projection of the scene with an image sensor having a higher framerate as compared to the framerate of a complete scan of the LiDAR sensor;
estimating based on the captured image data and/or LiDAR data a three-dimensional velocity of a dynamic object present in the scene;
compensating the motion of the dynamic object; and
fusing the motion compensated LiDAR data and image data.
In one preferred embodiment, the method further comprises:
generating a LiDAR subframe including LiDAR samples covering a part of the scene during the frame period of an image sensor generating the image data; and
accumulating LiDAR samples of multiple LiDAR subframes to generate an accumulated LiDAR point cloud corresponding to a complete scan of the sensed scene at the frame rate of the LiDAR sensor. Furthermore, in one preferred embodiment, the method further comprises:
projecting LiDAR samples of a LiDAR subframe on image data of a camera image; and
propagating the projected LiDAR samples according to a displacement information to generate a propagated point cloud.
Advantageously, propagating the projected LiDAR samples is based on an optical flow algorithm which determines the displacement information between consecutive camera images.
Furthermore, in one preferred embodiment, the method further comprises:
updating the propagated point cloud according to the optical flow information generated for a new camera image; and
checking the projected LiDAR samples of a new LiDAR subframe for a correspondence with existing points in the current propagated point cloud.
For this embodiment, it is preferred that a correspondence qualifies as valid if projected LiDAR samples of a new LiDAR subframe fall within a defined radius around an existing point.
In a further preferred embodiment, the method further comprises:
assigning an index to the LiDAR samples; and
keeping the index of the LiDAR samples while propagating the projected LiDAR samples.
Advantageously, LiDAR samples from multiple LiDAR subframes are accumulated to cover the time interval of a full LiDAR frame, wherein accumulated data older than this time interval are deleted after the processing of an image frame and new data for the current LiDAR subframe are added to the accumulated data. In addition, the method further comprises in a preferred embodiment:
computing LiDAR descriptors for dynamic LiDAR points in a LiDAR subframe point cloud, wherein a LiDAR descriptor describes a geometrical three dimensional structure of the local neighborhood of the dynamic points;
matching LiDAR points of a new LiDAR subframe with LiDAR points in the accumulated LiDAR point cloud using the LiDAR descriptors; and
propagating the matched LiDAR points of the new LiDAR subframe according to the estimated three dimensional velocity to generate a three dimensional propagated point cloud.
Furthermore, in one preferred embodiment, the method further comprises:
checking whether LiDAR samples in a new LiDAR subframe correspond to a dynamic or static object; and
keeping only LiDAR samples corresponding to dynamic points for the further processing.
Similarly, it is preferred that the method further comprises:
detecting a ground plane in the scene;
identifying LiDAR samples in a new LiDAR subframe corresponding to the detected ground plane; and
removing the identified LiDAR samples from the further processing.
Advantageously, the estimated three-dimensional velocity of the dynamic objects is used for transportation vehicle path planning and/or trajectory management.
According to another general aspect of at least one embodiment, an apparatus for data fusion of LiDAR data and image data is presented, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, the apparatus comprising:
a scanning LiDAR sensor (2) generating the LiDAR data;
a high framerate camera (3) generating the image data, wherein the image data are generated with a higher framerate as compared to the framerate of a complete scan of the scanning LiDAR sensor;
one or more processors coupled to the camera and the LiDAR sensor, wherein said one or more processors are configured to:
receive camera generated image data and LiDAR sensor generated point cloud data; estimate a three dimensional velocity of a dynamic object present in the scene;
compensate the motion of the dynamic object; and
fuse the motion compensated LiDAR data and image data.
According to a preferred embodiment, the apparatus further comprises:
one or more buffers coupled to the one or more processors for storing accumulated information related to the LiDAR data and the image data.
Finally, in at least one preferred embodiment a vehicle performs a method and/or comprises an apparatus as described above.
Brief Description of Figures:
Further advantages, features and possibilities of using the presently disclosed embodiments emerge from the description below in conjunction with the figures.
Figure 1A illustrates a high resolution LiDAR sensor and a high framerate camera closely collocated for simultaneously capturing the environment from the same viewpoint.
Figure 1B illustrates an arrangement comprising a high resolution LiDAR system and a high framerate camera mounted on the roof of a vehicle.
Figure 2A illustrates an example of a transportation vehicle, which passes an ego vehicle with unknown velocity, and a silhouette surrounding distorted LiDAR samples.
Figure 2B illustrates schematically several images captured by a camera and a single full LiDAR frame completely mapped on one of the images.
Figure 3A illustrates the transportation vehicle with a silhouette surrounding LiDAR samples corresponding to a LiDAR subframe.
Figure 3B illustrates schematically the LiDAR subframe falling in a time interval Δt around one of the camera images. Figure 4A illustrates an image of a transportation vehicle together with LiDAR samples, which correspond to the shape of the transportation vehicle but are displaced.
Figure 4B corresponds to Figure 2B.
Figure 5A illustrates that after application of the disclosed method the LiDAR samples are in better alignment with the image of the transportation vehicle.
Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame.
Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
Detailed Description:
The disclosed embodiments are directed at providing a technical solution for the problem of asynchronous fusion between LiDAR data and camera data, provided by a combination of a LiDAR sensor and camera such as shown in figure 1.
Figure 1A illustrates an arrangement 1 comprising a combination of a high resolution LiDAR system 2, such as a Luminar G2+ LiDAR sensor, and a high framerate camera 3, such as a FLIR Grasshopper camera, closely collocated above each other for simultaneously capturing the environment from nearly the same viewpoint.
Figure 1B illustrates the experimental arrangement 1 mounted on the roof of an ego vehicle 4. However, the illustrated arrangement is only an example of a prototype; in a production vehicle, other arrangements of the LiDAR sensor and camera relative to each other, and at other locations on the body of the vehicle, are also possible.
More specifically, the above mentioned technical problem occurs because conventionally known global shutter cameras on the one hand and scanning LiDAR systems on the other hand capture their environment in different ways. Global shutter cameras capture a two dimensional projection of their environment, commonly known as an "image," in an interval in time defined by the shutter time of that particular camera. Thus, within that timespan - usually 1/30th of a second up to 1/2000th of a second (depending on the lighting conditions of the scene) - the entire scene is captured in image data wherein every pixel in the image data is exposed at the same time.
Capturing three-dimensional information for a scene with a scanning LiDAR system is fundamentally different. This is because conventional scanning LiDAR systems continuously scan the scene within a defined time interval, e.g., for most conventional LiDAR systems within 100 ms, corresponding to a 10 Hz scan rate. Thus, the scene is not captured for every LiDAR point in the same time interval as happens with a conventional image camera. More specifically, within 1/10th of a second, the LiDAR beam is steered into various locations to locally sample the scene, and only after 1/10th of a second does it cover the same location again.
While this effect is not of concern if the observed scene is completely stationary with no movement in it, it becomes a challenging problem if there are dynamic objects, i.e., moving objects, present. This is because the velocity of those objects is unknown. As a result, it becomes difficult, if not impossible, to correlate LiDAR-generated point cloud data with conventional camera image data when moving objects are present in a scene to be imaged.
This issue is a particular problem in the fields of driver assistance systems and autonomous driving. As explained above, conventional approaches to fusing LiDAR pointcloud data and camera-generated image data have been implemented in the hope of providing more accurate and complete environment models for use in transportation vehicles, allowing more reliable perception systems that ultimately ensure safer and more comfortable driving functionalities beyond the capabilities of systems currently used for level 2 functions in series-development transportation vehicles. Current level 2 applications do not demand a highly accurate alignment of data from different sensors because the functionality does not require it. However, for level 3+ automation, this is fundamentally different.
This issue of asynchronous sensor fusion is illustrated in Figures 2A-2B. Figure 2A illustrates an example of a transportation vehicle 5, which passes the ego vehicle (not shown) with unknown velocity. The individual LiDAR samples are not individually shown in the figure, but are located within silhouette 6. As can be seen, the silhouette does not match the actual appearance of the car in the image frame, meaning that the LiDAR samples are distorted. This distortion results from the different capturing mechanisms, as mentioned above and further illustrated in Figure 2B.
Figure 2B schematically shows six progressive full images 7, each captured by the camera within a short timespan at a particular point in time t. Over the same period of time in which the six full images are captured, a single full LiDAR frame 8 is determined by scanning the same scene. The determined LiDAR frame 8 is then mapped onto the last captured image at time t1, as symbolized by the arrows. In this way, artifacts are introduced by the unknown dynamic motion of objects in the scene. Such artifacts are called "dynamic motion artifacts" in the following and render the resulting data less useful for the generation of environment models.
Disclosed embodiments address this technical problem by applying a "dynamic motion compensation," which leverages the functionality of a high framerate camera, e.g., operating at a 100 Hz framerate, in combination with a LiDAR sensor operating at, e.g., a 10 Hz repetition rate. That is, in these embodiments the high framerate camera has a frame rate at least 10 times faster than the LiDAR sensor's frame rate. However, other ratios between the camera framerate and the LiDAR framerate are also possible as long as the camera framerate is significantly higher than the LiDAR framerate.
Figures 3A-3B illustrate how, by using a high framerate camera, the misalignment between camera data and LiDAR data becomes negligible. Instead of mapping all LiDAR samples corresponding to a complete scan onto a particular image frame, for each of the image frames only a fraction of the total LiDAR samples is considered for the fusion of three-dimensional samples and camera image data. That is, only LiDAR samples are considered that fall into the time interval of consecutive images; in Figure 3A these are located within silhouette 9, which, due to the scan pattern used, covers only a vertical fragment of the vehicle that is roughly parallel to the road surface. Correspondingly, at time t1 in Figure 3B only the fraction of the LiDAR frame 8 that falls in the time interval Δt around t1 is taken into account. The length of the time interval Δt depends on the processing framerate and may in particular correspond to the inverse of the camera framerate.
Disclosed embodiments further estimate the three-dimensional velocity of the dynamic objects in order to perform the dynamic motion compensation. Thus, the estimation of the three-dimensional velocity provides a valuable input for subsequent processing operations such as path planning and trajectory management. Accordingly, disclosed embodiments enable creation of a consistent environment model, in which LiDAR points and camera-generated image data are consistent. In this way, pixel data in the camera-generated image data and pointcloud data generated by the LiDAR sensor are in agreement at all times.
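The following sketch illustrates the underlying idea of bucketing LiDAR samples into subframes according to the nearest camera timestamp. It is a minimal illustration only; the sample type, the field names and the use of microsecond timestamps are assumptions made for this example and do not reproduce the appendix code.

#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Illustrative sample type: each LiDAR return carries its own timestamp (microseconds).
struct LidarSample { double x, y, z; uint64_t timestamp_us; };

// Assign every LiDAR sample of a full scan to the camera frame whose capture time is
// closest, yielding one subframe per camera image.
std::vector<std::vector<LidarSample>> splitIntoSubframes(
    const std::vector<LidarSample>& fullScan,
    const std::vector<uint64_t>& imageTimestamps_us)
{
    std::vector<std::vector<LidarSample>> subframes(imageTimestamps_us.size());
    for (const LidarSample& s : fullScan) {
        std::size_t best = 0;
        uint64_t bestDiff = std::numeric_limits<uint64_t>::max();
        for (std::size_t i = 0; i < imageTimestamps_us.size(); ++i) {
            uint64_t diff = s.timestamp_us > imageTimestamps_us[i]
                              ? s.timestamp_us - imageTimestamps_us[i]
                              : imageTimestamps_us[i] - s.timestamp_us;
            if (diff < bestDiff) { bestDiff = diff; best = i; }
        }
        subframes[best].push_back(s);   // sample falls into the interval Δt around image "best"
    }
    return subframes;
}

For a 10 Hz LiDAR and a 100 Hz camera, this yields ten subframes per full scan, each spanning roughly 10 ms.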
Figure 4A illustrates the raw alignment for a dynamically moving car. As can clearly be seen, the points representing the LiDAR measurements correspond to the shape of the car but are displaced. Figure 4B, which is identical to Figure 2B, illustrates this distortion.
Figure 5A illustrates that, after application of the disclosed method, the LiDAR samples are in better alignment with the image of the transportation vehicle. The estimated motion of the vehicle is indicated in the figure by arrow 10. The improved alignment is achieved by accumulating and updating data over several image frames, as schematically shown in Figure 5B.
Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame. In the accumulated data At-1, data older than this time interval are deleted at the end of the overall processing of a frame at time t, and at the same time new data for the current subframe are added to the accumulated data.
Between consecutive images, the displacements of brightness patterns may be calculated through a method called "optical flow." The output of optical flow provides information about where intensity patterns have shifted between consecutive image frames; ideally, this corresponds to the actual projection of motion in the observed three-dimensional scene. With this information, one can propagate LiDAR samples on the image plane and/or limit the search space for finding corresponding LiDAR samples in consecutive LiDAR frames. This can be used to resolve ambiguities in the LiDAR pointcloud processing as well as to speed up algorithms for data association. The image is not only the most dense and accurate representation of the environment, but also a helpful construct for all LiDAR processing.
Accordingly, at each incoming image frame, only LiDAR samples are considered that fall into the time interval of consecutive images. This subset of points is then projected onto the image and propagated according to the optical flow of each new incoming image. The raw subset of points may be accumulated until a certain number of LiDAR subframes is reached. This number may be defined by the number of points that go into a full LiDAR scan.
Additionally, the propagated 2D points may be saved in a separate buffer, while the original indexing may be kept so the propagated points can be related to the raw three-dimensional points.
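A minimal sketch of this propagation step is given below, assuming OpenCV's Farnebäck dense optical flow as the source of the displacement information; the structure ProjectedSample, its fields and the function name are illustrative and are not taken from the appendix.

#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Projected LiDAR sample on the image plane; "index" refers back to the raw 3D point
// in the accumulated cloud so the 2D/3D association is never lost (illustrative type).
struct ProjectedSample { float u, v; int index; };

// Compute dense optical flow between two consecutive grayscale frames and shift the
// projected LiDAR samples by the flow vector found at their pixel location.
void propagateByOpticalFlow(const cv::Mat& prevGray, const cv::Mat& currGray,
                            std::vector<ProjectedSample>& samples)
{
    cv::Mat flow;   // CV_32FC2, one (dx, dy) vector per pixel
    cv::calcOpticalFlowFarneback(prevGray, currGray, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);
    for (ProjectedSample& s : samples) {
        int x = cvRound(s.u), y = cvRound(s.v);
        if (x < 0 || y < 0 || x >= flow.cols || y >= flow.rows)
            continue;                        // sample left the image, keep it unchanged
        const cv::Point2f& d = flow.at<cv::Point2f>(y, x);
        s.u += d.x;                          // shift to the position expected in the new frame
        s.v += d.y;                          // the original index is kept, so the raw 3D link survives
    }
}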
In the next image frame, the newly arriving subset of LiDAR points may be projected onto the image. Before any further processing is done on the new subset of LiDAR points, the propagated two-dimensional pointcloud may be updated according to the optical flow generated by the new image. Then each point of the newly arrived subset may be checked for existing points in the current two-dimensional propagated pointcloud within a defined radius that might qualify as a valid correspondence.
Following that scheme, one can establish connections between valid points in the new subset and in the two-dimensional propagated pointcloud. Since the indexing in the propagated pointcloud is kept, the connection to the raw accumulated pointcloud is known as well: the i-th point in the propagated pointcloud originates from the i-th point in the raw accumulated pointcloud.
The relationship in three-dimensional space between the just-arrived subframe data and the accumulated data is thus always known. This process tremendously increases the performance of correspondence finding between LiDAR subframes, because only a limited number of points needs to be checked, as opposed to the whole accumulated pointcloud.
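The following sketch shows the idea of such a restricted correspondence search in image space; the brute-force radius search, the type names and the return convention are illustrative assumptions, and a k-d tree over the propagated 2D points would typically replace the linear scan.

#include <vector>

struct ProjectedSample { float u, v; int index; };   // index into the raw accumulated cloud

// For one newly projected subframe point, search the propagated 2D cloud within "radius"
// pixels and return the index of the closest matching raw 3D point, or -1 if none exists.
int findCorrespondence(const ProjectedSample& query,
                       const std::vector<ProjectedSample>& propagated2d,
                       float radius)
{
    int bestRawIndex = -1;
    float bestDist2 = radius * radius;
    for (const ProjectedSample& p : propagated2d) {
        float du = p.u - query.u, dv = p.v - query.v;
        float d2 = du * du + dv * dv;
        if (d2 <= bestDist2) {               // candidate within the search radius
            bestDist2 = d2;
            bestRawIndex = p.index;          // kept index links back to the raw 3D point
        }
    }
    return bestRawIndex;
}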
The correspondences from the previous processing operation may be used as input for a RANSAC algorithm to reject outliers during the process. With valid correspondences and their respective displacements, a final transformation can be obtained. Through the known time interval between LiDAR samples, the computed transformation can be related to a three-dimensional velocity. Each point in the subframe cloud may be checked against the motion hypothesis by transforming the point back according to the hypothesis so that it aligns with the accumulated raw data. Further, the points may be compared using certain criteria defined by the neighboring region in the accumulated three-dimensional pointcloud.
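A minimal sketch of relating the estimated transformation to a three-dimensional velocity, and of transforming a point back for the hypothesis check, is given below; the use of Eigen, the constant-velocity assumption over the interval dt and the function names are illustrative choices, not taken from the appendix.

#include <Eigen/Dense>

// Given the rigid transformation T (4x4, homogeneous) estimated between a new subframe and
// the accumulated cloud, and the time span dt between the corresponding samples, derive a
// 3D velocity for a point under a constant-velocity assumption.
Eigen::Vector3f velocityFromTransform(const Eigen::Matrix4f& T,
                                      const Eigen::Vector3f& p,
                                      float dt_seconds)
{
    Eigen::Vector4f ph(p.x(), p.y(), p.z(), 1.0f);
    Eigen::Vector4f q = T * ph;                    // where the point would move within dt
    return (q.head<3>() - p) / dt_seconds;         // metres per second
}

// Transform a subframe point back in time by the hypothesis so it can be compared against
// the (older) accumulated raw data in its neighbourhood.
Eigen::Vector3f backProjectByHypothesis(const Eigen::Matrix4f& T,
                                        const Eigen::Vector3f& p)
{
    Eigen::Vector4f ph(p.x(), p.y(), p.z(), 1.0f);
    Eigen::Vector4f q = T.inverse() * ph;
    return q.head<3>();
}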
The comparison operation may take into account the LiDAR reflectivity, the color intensity, the range value and the number of inlier correspondences within a defined radius. If a given point fulfills these criteria, the specific motion is assigned to it. Subsequently, the processed subframe may be added to the three-dimensional accumulated pointcloud containing the information about motion in the scene.
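The following sketch illustrates one possible form of such a per-point acceptance test; the thresholds, the attribute struct and the equal treatment of the criteria are assumptions for illustration and are not taken from the appendix.

#include <cmath>

// Per-point attributes compared between a back-transformed subframe point and its
// neighbourhood in the accumulated cloud (illustrative type).
struct PointAttributes { float reflectivity; float intensity; float range; };

// A point is taken to follow the motion hypothesis if its LiDAR reflectivity, image
// intensity and range are close to those of its neighbourhood and enough inlier
// correspondences lie within the search radius.
bool followsMotionHypothesis(const PointAttributes& point,
                             const PointAttributes& neighbourhoodMean,
                             int inliersInRadius)
{
    const float kReflectivityTol = 0.1f;   // assumed tolerance
    const float kIntensityTol    = 20.0f;  // grey values, assumed tolerance
    const float kRangeTol        = 0.5f;   // metres, assumed tolerance
    const int   kMinInliers      = 3;      // assumed minimum inlier count

    return std::fabs(point.reflectivity - neighbourhoodMean.reflectivity) < kReflectivityTol
        && std::fabs(point.intensity    - neighbourhoodMean.intensity)    < kIntensityTol
        && std::fabs(point.range        - neighbourhoodMean.range)        < kRangeTol
        && inliersInRadius >= kMinInliers;
}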
Thereafter, some points in the accumulated pointcloud may have a motion assigned.
Accordingly, as soon as a new data frame arrives, the moving points in the accumulated pointcloud may be propagated in three-dimensional space to the current time according to the assigned motion, and saved in a similar fashion as the two-dimensional propagated pointcloud, with the property of keeping the indexing. This propagation operation results in compensation of the dynamic motion, since the points corresponding to moving objects are shifted to the current time of the frame.
Next, the two-dimensional propagated pointcloud may be updated according to the latest data, as explained above. Subsequently, the new subset of LiDAR points may first be compared to the three-dimensional propagated pointcloud. Again, in a similar fashion as the correspondence finding on the image, each point in the new subset may be checked for an existing point in the three-dimensional propagated pointcloud within a defined distance. If such a point exists, the connection to the raw accumulated pointcloud can be established by using the index information from the three-dimensional propagated pointcloud.
Before the outlier rejection by the RANSAC algorithm is applied to the matching points from the previous iteration, each correspondence may be weighted by its similarity to the optical flow derived from the images up to the current frame. In this configuration, optical flow may be calculated over 10 frames captured at 100 Hz to weight the LiDAR samples generated at 10 Hz. The transformation obtained from the valid correspondences may provide a new motion hypothesis, and each point may undergo the verification operation using the above-mentioned criteria (i.e., LiDAR intensity, color intensity, range value and number of inlier correspondences in the neighborhood).
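A minimal sketch of such a flow-based weighting is given below; the cosine-similarity form and the normalization to the range [0, 1] are illustrative choices, not the weighting actually used in the appendix.

#include <cmath>

// Weight a correspondence by how well the displacement of its image projection agrees with
// the optical flow accumulated over the camera frames since the last LiDAR visit (e.g., 10
// frames at 100 Hz for a 10 Hz LiDAR). A weight near 1 means the LiDAR-derived motion and
// the image-derived motion point the same way; near 0 means they disagree.
float opticalFlowWeight(float lidarDispU, float lidarDispV,   // projected LiDAR displacement (px)
                        float flowU, float flowV)             // summed optical flow (px)
{
    float lenLidar = std::sqrt(lidarDispU * lidarDispU + lidarDispV * lidarDispV);
    float lenFlow  = std::sqrt(flowU * flowU + flowV * flowV);
    if (lenLidar < 1e-3f || lenFlow < 1e-3f)
        return 0.0f;                                          // no reliable direction
    float cosine = (lidarDispU * flowU + lidarDispV * flowV) / (lenLidar * lenFlow);
    return 0.5f * (cosine + 1.0f);                            // map [-1, 1] to [0, 1]
}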
If a point passes this analysis, it is assigned to the motion hypothesis, as explained above. Points that do not follow the motion hypothesis are further processed in the manner described above, wherein the correspondences are found on the image. After all of this processing is performed, the new subset of LiDAR data may again be added to the accumulated pointcloud with updated motion information.
Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
The processing works at the framerate of the camera, i.e., new data arrive for the algorithm at the framerate of the camera, e.g., at 100 Hz. The process takes as inputs the current image frame It, a LiDAR subframe Pt containing a fraction of the points of the full LiDAR frame, and the accumulated data I*t-1 gathered during the processing of the last 10 frames. The output of the illustrated processing is a motion estimate for the incoming subframe cloud. The information obtained in the current frame is saved in the accumulated data I*t.
The accumulated data I*t-1 contains data related to the optical flow at t-1, the raw accumulated point cloud, the 2D propagated data and the 3D propagated point cloud. The raw accumulated cloud contains raw 3D points from multiple subframes Pt-10, Pt-9, ..., Pt-1 and covers the time interval of a full LiDAR frame, e.g., 0.1 seconds. Data older than 0.1 seconds are deleted at the end of the overall processing of the image frame at time t, and at that time the new raw subframe Pt is added to the accumulated cloud. The 2D propagated data consists of 2D projected points, covers the same time interval as the raw accumulated cloud and contains the same number of points. In contrast to the raw accumulated cloud, the 2D points are temporally aligned to the last frame. The 3D propagated pointcloud consists of the 3D points from multiple subframes, the same as the raw accumulated point cloud, but the points are propagated according to the velocity obtained from the previously processed frames, so each point of the propagated pointcloud is temporally aligned after the propagation.
First of all, the above-mentioned input data provided by block 11 are supplied to the update block 12. In update block 12, the optical flow is updated for the newly acquired image It, and the accumulated points are propagated in 2D by using the optical flow process and in 3D according to the computed motion, if any is available.
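The following sketch illustrates the kind of accumulated information buffer described above, together with its sliding-window update; the struct layout, field names and microsecond timestamps are assumptions made for illustration, not the data structures of the appendix.

#include <cstdint>
#include <vector>

// Minimal sketch of the accumulated information buffer: raw 3D samples from the last full
// LiDAR frame (e.g., 0.1 s), their 2D projections propagated by optical flow, and their 3D
// propagated positions, all sharing one index.
struct AccumulatedSample {
    double x, y, z;           // raw 3D point
    float  u, v;              // 2D projection, propagated to the latest image
    double px, py, pz;        // 3D position, propagated by the assigned velocity (if any)
    bool   hasMotion;         // true once a velocity has been assigned
    double vx, vy, vz;        // assigned 3D velocity
    uint64_t timestamp_us;    // sampling time of the raw point
};

struct AccumulatedBuffer {
    std::vector<AccumulatedSample> samples;

    // At the end of each image frame: drop samples older than one full LiDAR frame and
    // append the freshly processed subframe, keeping the buffer one LiDAR period long.
    void update(uint64_t now_us, uint64_t lidarFramePeriod_us,
                const std::vector<AccumulatedSample>& newSubframe)
    {
        std::vector<AccumulatedSample> kept;
        kept.reserve(samples.size() + newSubframe.size());
        for (const AccumulatedSample& s : samples)
            if (now_us - s.timestamp_us <= lidarFramePeriod_us)
                kept.push_back(s);
        kept.insert(kept.end(), newSubframe.begin(), newSubframe.end());
        samples.swap(kept);
    }
};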
In the preprocessing step 13, first the ground points are removed to save computation time; then each sample of the new LiDAR subframe is checked as to whether it belongs to a dynamic or a static object. Only dynamic points are processed further. Lastly, in the preprocessing step the LiDAR descriptors required for the matching step are computed.
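A minimal sketch of the ground-removal part of this preprocessing is given below; the simple height threshold in the sensor frame, with an assumed known sensor height above the road and an upward-pointing z axis, stands in for whatever ground detection the embodiment actually uses.

#include <vector>

struct Sample3D { double x, y, z; };   // z assumed to point up in the sensor frame (assumption)

// Discard points near the road plane so that only potentially dynamic points reach the
// matching stage; a proper implementation may fit a ground plane per frame instead.
std::vector<Sample3D> removeGroundPoints(const std::vector<Sample3D>& subframe,
                                         double sensorHeightAboveRoad, // metres, assumed known
                                         double tolerance = 0.2)        // metres
{
    std::vector<Sample3D> kept;
    kept.reserve(subframe.size());
    for (const Sample3D& p : subframe)
        if (p.z > -sensorHeightAboveRoad + tolerance)   // keep points clearly above the road
            kept.push_back(p);
    return kept;
}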
In the 3D correspondence finding block 14, matching points between the new subframe Pt and the accumulated data I*t-1 are determined. For that, the motion information obtained in previous processing frames is used to find correspondences directly in 3D space.
Out of the found correspondences, the optimal transformation is determined in the 3D velocity estimation block 15. Then, the obtained transformation is related to the time difference between the corresponding points to get a 3D velocity value.
In the next block 16, the motion hypothesis is tested on each dynamic point in the subframe Pt. This check of the motion hypothesis is performed because, even though the output of the correspondence finding block is a transformation Tt which best aligns the new subframe Pt to the accumulated pointcloud Pt-10:t-1, the subframe Pt may still contain some points which do not belong to the observed object. Therefore, each point is checked against certain criteria to determine whether it follows the estimated motion.
Each point is transformed back according to the hypothesis, so that it aligns with the accumulated raw data, and is compared with its neighbors. The comparing step takes into account different properties of the LiDAR and camera data and assigns the motion to the correct points. The remaining points, which do not fulfill the specified criteria, are further processed in the 2D correspondence finding block 17. In this step, the image space is used to find robust correspondences without any prior knowledge about the 3D motion.
Then, similarly to before, the optimal transformation is extracted from the found correspondences and a 3D velocity is computed in block 18. Finally, in block 19, the remaining dynamic points of the subframe Pt are checked against certain criteria and, if these are met, added to the accumulated information data with the current motion information.
At initialization, the update block only computes a new optical flow update, because the accumulated data I*t-1 is empty at the beginning. In the further processing steps no correspondences will be found in either 3D space or 2D space; this, in turn, leads to no velocity estimation, and only the raw subframe cloud is added to the accumulated information buffer. At the next processing frame, the accumulated information is no longer empty. This time, the accumulated information can be propagated according to the optical flow in 2D space. But the 3D propagation part is still missing, since the points have no motion assigned.
Therefore, again, no correspondences in 3D space can be found, and most probably there will not be any correspondences in 2D space either, since the LiDAR scans at 10 Hz and it takes approximately 10 image frames until it scans the same part of the object again. After some frames, 2D corresponding points will be found and, for the first time, a velocity can be assigned to the points. From then on, the points can also be propagated in 3D space to directly find 3D correspondences. Every time a new obstacle is detected, there is a lag of one full LiDAR frame until the motion of the object can be detected.
The disclosed embodiments are a marked departure from conventional approaches to this technical problem. For a long time, algorithms for perception and sensor fusion were developed on datasets with limited framerates. The most popular example is the KITTI dataset, to which the conventional approaches mentioned above often refer. Camera and LiDAR operate at 10 Hz in this dataset. Moreover, the problem of dynamic objects has conventionally been assumed to be negligible in various implemented applications. Thus, one reason that dynamic motion artifacts have not been addressed explicitly is the conventional approach to sensor configuration and the low relative speeds in the dataset.
Further, it should be understood that the traditional frequency of the majority of sensors on a transportation vehicle, including the camera, is 25Hz. For a long time, research and development in the automotive space has been restricted to that framerate, which made explorations into high framerate cameras an exotic specialty. However, with more capable image sensors and improved processing, disclosed embodiments have been developed by challenging these conventional limitations.
Disclosed embodiments use a sensor setup that generates significantly more camera images while one LiDAR scan is performed. Conventional methods, by contrast, consider only one image per LiDAR frame. Moreover, disclosed embodiments do not rely on correct matching between the pointcloud and the image over a full LiDAR frame. Rather, splitting the data into smaller chunks makes it possible to cope with very high relative velocities, because the matching error remains in the subpixel range.
Disclosed embodiments are not limited to any specific LiDAR sensor. All that is required is a scanning LiDAR that gives timestamps for each point, which is the case for all the conventional, commercially available LiDAR sensors.
Further, through processing performed in accordance with disclosed embodiments, temporally upsampled LiDAR data is generated at the frequency of the camera data, which is consistent with the real environment.
Disclosed embodiments provide low level sensor fusion between asynchronous sensors that overcomes the problems of obtaining a consistent environment model by explicitly addressing the artifacts introduced by dynamic objects. Disclosed embodiments leverage optical flow computations on a high framerate camera combined with pointcloud processing of LiDAR data. The LiDAR data processing may be a combination of sub-frame processing and full LiDAR frame processing, which enables accurate matching of LiDAR points to any part of the environment as seen in the images. This may provide a consistent and complete three dimensional model of the environment at the framerate of the camera.
Further, to enable this functionality, three-dimensional velocity estimates may be derived from the pointcloud data. This is useful information for subsequent processing such as trajectory management. With an accurate estimate of where an object is located in three-dimensional space, where it is heading and at what speed, plus the dense pixel information provided by an image, the presented approach is an extension of current algorithms that do not explicitly take dynamic motion artifacts into account. Disclosed embodiments are based on a setup of a high framerate camera and a high resolution LiDAR. The setup can further be aided by Global Positioning System (GPS) and Inertial Measurement Unit (IMU) technology, but this technology is not central to the inventive concepts.
The attached Appendix includes source code that provides insight into how the actual fusion of the 10 Hz LiDAR and the 100 Hz camera may be performed in accordance with at least one embodiment. However, the Appendix contains only code snippets of the main functions. Therefore, this code is not compilable.
It should be understood that the operations explained herein may be implemented in conjunction with, or under the control of, one or more general purpose computers running software algorithms to provide the presently disclosed functionality and turning those computers into specific purpose computers.
Moreover, those skilled in the art will recognize, upon consideration of the above teachings, that the above exemplary embodiments may be based upon use of one or more programmed processors programmed with a suitable computer program. However, the disclosed embodiments could be implemented using hardware component equivalents such as special purpose hardware and/or dedicated processors. Similarly, general purpose computers, microprocessor based computers, micro-controllers, optical computers, analog computers, dedicated processors, application specific circuits and/or dedicated hard wired logic may be used to construct alternative equivalent embodiments.
Moreover, it should be understood that control and cooperation of the above-described components may be provided using software instructions that may be stored in a tangible, non-transitory storage device such as a non-transitory computer readable storage device storing instructions which, when executed on one or more programmed processors, carry out the above-described method operations and resulting functionality. In this case, the term non-transitory is intended to preclude transmitted signals and propagating waves, but not storage devices that are erasable or dependent upon power sources to retain information.
Those skilled in the art will appreciate, upon consideration of the above teachings, that the program operations and processes and associated data used to implement certain of the embodiments described above can be implemented using disc storage as well as other forms of storage devices including, but not limited to, non-transitory storage media (where non-transitory is intended only to preclude propagating signals and not signals which are transitory in that they are erased by removal of power or explicit acts of erasure) such as, for example, Read Only Memory (ROM) devices, Random Access Memory (RAM) devices, network memory devices, optical storage elements, magnetic storage elements, magneto-optical storage elements, flash memory, core memory and/or other equivalent volatile and non-volatile storage technologies without departing from certain embodiments. Such alternative storage devices should be considered equivalents.
While certain illustrative embodiments have been described, it is evident that many alternatives, modifications, permutations and variations will become apparent to those skilled in the art in light of the foregoing description. Accordingly, the various embodiments, as set forth above, are intended to be illustrative, not limiting.
Appendix
/* This source code provides insight into how the actual fusion of the 10 Hz LiDAR and 100 Hz camera is done. It contains only code snippets of the main functions. Therefore, this code is not compilable. */
class DynamicMotionEstimator
{
public:
Sceneframe Fusion_data;
ofFilter m_OpticalFlow;
clustering m_Object;
REGISTRATION::Transformation_Estimator T_Estimator;
DynamicMotionEstimator() {
auto & smSettings = SettingsManager<FusionSettings>::Current();
smSettings.Lock();
FusionSettings settingsFusion = smSettings.settings;
smSettings.Unlock();
pcl::console::setVerbosityLevel(pcl::console::L_ALWAYS);
_CamParams = settingsFusion._CameraParams;
_lidParams = settingsFusion._lidarParams;
}
~DynamicMotionEstimator() { }
/* This function splits up a full lidar frame into smaller subframes and outputs the indices of each split. The goal of the splitting is to minimize the temporal difference between
the sampling time of each point in the subframe and the capturing time of the corresponding image. Therefore the data is split in the middle of the time interval of consecutive images.
*/
void Lidardatasplit(lidarFusionFrame const& srcData )
{
//get the index which splits the data to the corresponding images
int numImages = srcData.cameraFrames.size();
LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
std::vector<uint64_t> imgTS;
imgTS.reserve(numImages + 1);
//get image timestamps
for(int i = 0; i < srcData.cameraFrames.size(); ++i)
{
    imgTS.push_back(srcData.cameraFrames[i].timestamp);
}
this->start_index.clear();
this->start_index.resize(2);
this->start_index[0].reserve(numImages + 2);
this->start_index[1].reserve(numImages + 2);
this->start_index[0].push_back(0);
this->start_index[1].push_back(0);
int d[2] = {0,0};
for (int j = 0; j < LidarSizeY; ++j)
{
for (int i = 0; i < LidarSizeX; ++i)
{
for(int eye = 0; eye<2; eye ++)
{
lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0);
//Get timestamp of the (i,j)-th lidarSample
auto dstTime = sampleData.timestamp;
//project points to specific image frame
if(srcData.cameraFrames.size()>d[eye])
{
//calculate delta time values
if(srcData.cameraFrames.size()-1 != d[eye])
{
int64_t dtime_img = imgTS[d[eye]+1] - imgTS[d[eye]];
int64_t dtime = dstTime - imgTS[d[eye]];
//decide if it belongs to current or next image
#ifdef Simulation
if(dtime > (double) dtime_img)
{
d[eye]++;
start_index[eye].push_back(j*LidarSizeX + i);
}
#else
if(dtime > 0.5*(double) dtime_img)
{
d[eye]++;
start_index[eye].push_back(j*LidarSizeX + i);
}
#endif
}
}
}
}
}
for(int j = 0; j< 2; j++)
{
while(this->start_index[j].size() != srcData.cameraFrames.size()+1)
{
this->start_index[j].push_back(LidarSizeX*LidarSizeY);
}
}
}
/* This function considers only the new incoming subframe and its corresponding image. The 3D lidar points are first transformed into the world coordinate system, which
in our case corresponds to the camera coordinate system. Then each point is projected on the image and the 2D locations and the color values are saved.
Further, the reflectivity values of the lidar samples are saved as well. Only points which do not belong are further considered.
All this data is needed to compute the new velocity update for the current frame.
*/
void project_subframe(lidarFusionFrame const& srcData, int it)
{
//Extract the one lidar frame attached to the struct
LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
PixelImage<uint8_t,1> const &image = srcData.cameraFrames.back().images[0];
int numImages = srcData.cameraFrames.size();
std::vector<uint64_t> imgTS;
imgTS.reserve(numImages + 1);
//get image timestamps
for(int i = 0; i < srcData.cameraFrames.size(); ++i)
{
    imgTS.push_back(srcData.cameraFrames[i].timestamp);
}
std::vector<cv::Matx34f> proj_vector;
std::vector<cv::Matx34f> world_proj;
std::vector<std::vector<cv::Matx34f>> mod_proj;
lidarSensorStruct const& srcSensor = srcData.lidarFrame.sensors[0];
//save the rotation matrix for each eye
for(int eye = 0; eye<2; eye ++)
{
cv::Matx33f rot;
cv::Matx31f trans;
//lidarSensorStruct const& srcSensor = srcData.lidarFrame.sensors[s];
cv::Matx34f proj;
cv::Matx34f world_cam;
cv::Matx34f proj_copy;
_lidParams.rt[eye](cv::Rect(0,0,3,3)).copyTo(rot);
_lidParams.rt[eye](cv::Rect(3,0,1,3)).copyTo(trans);
cv::Matx33f cam = _CamParams.K;
_lidParams.rt[eye](cv::Rect(0,0,4,3)).copyTo(proj);
_lidParams.rt[eye](cv::Rect(0,0,4,3)).copyTo(world_cam);
_lidParams.rt[eye](cv::Rect(0,0,4,3)).copyTo(proj_copy);
proj = cam * proj;
//save projection matrix
proj_vector.push_back(proj);
world_proj.push_back(world_cam);
std::vector<cv::Matx34f> tmp_vec;
for(int i = 0; i < 2; i++)
{
cam(0,0) = 0.5 * cam(0,0);
cam(1,1) = 0.5 * cam(1,1);
cam(0,2) = round(0.5 * cam(0,2));
cam(1,2) = round(0.5 * cam(1,2));
proj = cam * proj_copy;
tmp_vec.push_back(proj);
}
mod_proj.push_back(tmp_vec);
}
//create a vector of images
std::vector<cv::Mat> img_vector;
//undistort all the images
for(int n = 0; n< srcData.cameraFrames.size(); ++n )
{
cv::Mat raw;
cv::Mat undist;
cv::Mat img;
PixelImage<uint8_t,1> const &image1 = srcData.cameraFrames[n].images[0];
wrapPixelImage2Mat(const_cast<PixelImage<uint8_t,1>&>(image1), raw);
cv::undistort(raw, undist, _CamParams.K, _CamParams.D);
cv::cvtColor(undist, img, cv::COLOR_GRAY2BGR);
img_vector.push_back(img);
}
//get indices of subframe
int index = 0;
//get current image & and init the OF class
cv::Mat tmp = cv::Mat(2048, 640, CV_8UC1);
cv::cvtColor(img_vector[it], tmp, cv::COLOR_BGR2GRAY);
m_OpticalFlow.set_current_frame(tmp);
tmp.copyTo(this->Fusion_data.img);
// Go through lidarData
for(int eye = 0; eye<2; eye ++)
{
//only a fraction of the whole lidar frame is considered
for (int k = this->start_index[eye][it]; k < this->start_index[eye][it+l]; k++)
{
int j = k / LidarSizeX;
int i = k - j *LidarSizeX;
lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0);
//do transformation
cv::Matx41f coord = cv::Matx41f(sampleData.x, sampleData.y, sampleData.z, 1);
//map point on the image
cv::Matx31f pointH = proj_vector[eye] * coord;
//normalize coordinates
int x = round(pointH(0)/pointH(2));
int y = round(pointH(1)/pointH(2));
//compute world coordinate with camera as center of origin:
cv::Matx31f point3D = world_proj[eye] * coord;
//Get timestamp of the Lidar sample
auto dstTime = sampleData.timestamp;
//iterator for image
int d = it;
index = d;
//save all information of reflectivity, intensity, raw coordinates, pixel coordinates, timestamp
if(srcData.cameraFrames.size() > d)
{
//save points on correct image
if(x >= 1 && x < _CamParams.S.width && y >= 0 && y < _CamParams.S.height && sampleData.range < 250 && sampleData.segment != 0) //&& sampleData.range > 20)
{
int loc = (y * _CamParams.S.width + x);
//add points to pointcloud
pcl::PointCloud<pcl::PointXYZ>::PointType pt;
pt.x = point3D(0); pt.y = point3D(1); pt.z = point3D(2);
this->Fusion_data.subframe_cloud->push_back(pt);
Poxel_N t;
t.x = x;
t.y = y;
t.x_OF = x;
t.y_OF = y;
t.x_raw = pt.x;
t.y_raw = pt.y;
t.z_raw = pt.z;
t.ax = 0;
t.ay = 0;
t.az = 0;
t.flow = 0.0;
t.range = sampleData.range;
t.reflectivity = 1 -sampleData.reflectivity;
t.intensity = tmp.at<uchar>(y,x);
#ifdef Simulation_timestamp
t.timestamp = imgTS[it];
#else
t.timestamp = dstTime;
#endif
this->Fusion_data.subframe.push_back(t);
}
}
}
}
}
/* Each existing point in the 2D propagated pointcloud is temporally aligned to the current frame. This can be achieved by shifting each 2D point according to the optical flow
obtained in the current frame. The 2D propagated pointcloud is later on needed to find correspondence in the image space.
*/
void propagate_2D(){
if(this->Fusion_data.segements.size() == this->Fusion_data.accumulated_cloud->size())
{
//go through subframe cloud
for(int i = 0; i < this->Fusion_data.segements.size(); i++)
{
//consider only dynamic points
if(this->Fusion_data.segements[i] != 3)
{
    Poxel_Sf tmp = this->Fusion_data.accumulated_frame[i];
    if(0 <= tmp.x_OF && tmp.x_OF < 2048 && 0 <= tmp.y_OF && tmp.y_OF < 640)
    {
        cv::Point2f& fxy = m_OpticalFlow.curr_flow.at<cv::Point2f>(tmp.y_OF, tmp.x_OF);
        //propagate
        tmp.x_OF = tmp.x_OF + fxy.x;
        tmp.y_OF = tmp.y_OF + fxy.y;
        tmp.flow = sqrt(fxy.x * fxy.x + fxy.y * fxy.y);
    }
    //save the data
    this->Fusion_data.accumulated_frame[i] = tmp;
}
}
}
}
/* The raw accumulated pointcloud is the accumulated raw 3D points from multiple subframes and covers the time interval of a full lidar frame. In this step,
all the points in the raw accumulated pointcloud, which have a motion assigned, are propagated according to their 3D velocities. The 3D propagated pointcloud
is needed in the further processing to find correspondences in 3D space.
*/
void propagate_3D(uint64_t time_img, lidarFusionFrame const& srcData)
{
if(this->Fusion_data.accumulated_cloud->size()== this->Fusion_data.time.size())
for(int i = 0; i < this->Fusion_data.time.size(); i++)
{
//check if the a motion is assigned to the point
if(this->Fusion_data.segements[i] == 1)
{
//propagate point according to time
double factor = (double) this->Fusion_data.time[i];
Poxel_Sf motion = this->Fusion_data.accumulated_frame[i];
//compute the time difference of the current image and the sampling time of the specific lidar point
float delta_t = (time_img - motion.timestamp);
float d_t = delta_t * (float) 1e-6;
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.accumulated_cloud->points[i];
pcl::PointCloud<pcl::PointXYZ>::PointType new_pt;
//change to m/s
float vx = motion.vx * 100;
float vy = motion.vy * 100;
float vz = motion.vz * 100;
//propagate 3D
new_pt.x = pt.x + d_t * vx;
new_pt.y = pt.y + d_t * vy;
new_pt.z = pt.z + d_t * vz;
//save the values
this->Fusion_data.propagated_cloud->push_back(new_pt);
}
else
{
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.accumulated_cloud->points[i];
this->Fusion_data.propagated_cloud->push_back(pt);
}
}
}
/* First, each point of the new incoming subframe cloud is classified as static or dynamic. For the further processing only dynamic points are considered.
Then, the 3D propagated cloud is used to find matching points in lidar space, while each pair of points is weighted with the corresponding optical flow. A RANSAC
algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. Each point of the subframe cloud is then checked
as to whether it follows the motion or not. The remaining points, which do not fulfill the motion hypothesis test, are further processed in the 2D correspondence finding block. These remaining points are projected on the image and checked within a defined radius for existing points in the 2D propagated pointcloud. Through the connection between
the accumulated pointcloud and the 2D propagated pointcloud, correspondences can be found. Similar to the 3D correspondence block, a RANSAC algorithm is used to obtain
the optimal transformation. Finally the motion hypothesis test checks whether a motion is assigned to each remaining point.
*/
void compute_3D_velocity(lidarFusionFrame const& srcData, int d)
{
if(!this->Fusion_data.subframe_cloud->empty()){
/* In the preprocessing step the SHOT descriptor for each point in the new subframe is computed. Further, each point of the new incoming subframe cloud is
classified as static or dynamic. For the further processing only dynamic points are considered.
*/
/// start number 3 - Preprocessing
//compute normal vector - needed for descriptor computation
compute_normal();
if(!this->Fusion_data.subframe_cloud->empty()) {
//compute lidar descriptor
compute_SHOT();
if(this->Fusion_data.tot_features.size() > 9 && this->Fusion_data.subframe_keypoints.size() > 10 && this->Fusion_data.segements.size() == this->Fusion_data.accumulated_frame.size())
{
///start feature matching and velocity update
int counter = 0;
int tot_counter = 0;
int segment = 0;
pcl::Correspondences correspondences_propagated;
std::vector<float> weights;
//check for static points - go through all the points in the subframe cloud
for(int i=0; i< this->Fusion_data.subframe_cloud->size(); i++)
{
//get the 2d projection of the 3D subframe point
pcl::PointCloud<pcl::PointXYZ>::PointType pt_sub;
pt_sub = this->Fusion_data.subframe_cloud->points[i];
Poxel_N tl = this->Fusion_data.subframe[i];
//if there is ego-motion -> transform each point according to it's egomotion
// take difference between proj_point and proj_transformed_point
// multiply v_proj_lidar and v_OF and divide through their length and this should equal 1
float EPSILON = 0.5;
//get the flow at a specific position
cv::Point2f& fxy = m_OpticalFlow.curr_flow.at<cv::Point2f>(proj_point(1), proj_point(0));
float x_disp = fxy.x;
float y_disp = fxy.y;
float l_v_of = sqrt(x_disp * x_disp + y_disp * y_disp);
float l_v_3D = ego_motion; //if there is no ego-motion it corresponds to zero
//CHECK IF STATIC: no ego-motion is assumed
if(l_v_3D < EPSILON && l_v_of < EPSILON)
{
//3: static
this->Fusion_data.subframe_segments[i] = 3;
tl.vx = 0;
tl.vy = 0;
tl.vz = 0;
this->Fusion_data.subframe[i] = tl;
}
}
/* The 3D propagated cloud is used to find matching points in lidar space, while each pair of points is weighted with the corresponding optical flow.
*/
/// start 4 - 3D correspondence finding
//just a check if the dataset are still consistent
if(this->Fusion_data.segements.size() > 0 && !this->Fusion_data.propagated_cloud->empty() && enable_propagation)
{
//all the points are stored in trees, to get faster the nearest neighbor
create_tree();
//go through the keypoints
for(int i=0; i< this->Fusion_data.subframe_keypoints_cloud->size(); i++)
{
//take only dynamic points:
if(this->Fusion_data.subframe_segments[keypoints_indices->indices[i]] == 0)
{
//find nearest neighboor of query point
pcl::PointXYZ searchPoint;
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.subframe_keypoints_cloud->points[i];
// K nearest neighbor search
int K = 1;
std::vector<int> pointIdxNKNSearch(K);
std::vector<float> pointNKNSquaredDistance(K);
//check if a neighbor exists between the 3D propagated pointcloud and the point in the subframe cloud
if ( kdtree.nearestKSearch(searchPoint, K, pointIdxNKNSearch, pointNKNSquaredDistance) > 0 )
{
//check if it has already a label -> increase counter
if(this->Fusion_data.segements[pointIdxNKNSearch[0]] == 1 && pointNKNSquaredDistance[0] < 0.1)
{
//Through the propagated 3D cloud one knows the connection to the accumulated raw 3D cloud
pcl::PointXYZ match;
match = this->Fusion_data.accumulated_cloud->points[pointIdxNKNSearch[0]];
//find best nearest neighbor -- SHOT comparison
int set_counter = 1;
std::vector<int> index_cloud(K);
std::vector<float> distance_vector(K);
float range = sqrt(pt.x * pt.x + pt.y * pt.y + pt.z * pt.z);
float lim = range / 50.0;
//find nearest neighbors
int checkpoints = acc_cloud_tree.radiusSearch(match, lim, index_cloud, distance_vector, set_counter);
//compare SHOT features
curr_distance = shot.compare(index_cloud, checkpoints);
int j = shot.get_index();
match = this->Fusion_data.accumulated_cloud->points[j];
//compute weight:
weight = compute_OF_weight(match);
weights.push_back(weight);
//save correspondences
pcl::Correspondence corr;
corr.index_query = i;
corr.index_match = j; //pointIdxNKNSearch[0];
correspondences_propagated.push_back(corr);
}
}
}
}
}
/* A RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
///start 5 - 3D velocity estimation
//RANSAC: reject outliers
if(!correspondences_propagated.empty())
//do RANSAC with new correspondences
ransac.compute();
transform = ransac.transform();
correspondences = ransac.correspondences();
/* Each point of the subframe cloud is then checked if it follows the motion or not. The remaining points, which do not fulfill the motion hypothesis test,
are further processed in the 2D correspondence finding block.
*/
///start 6 - motion hypothesis check
//go through subframe_cloud
for(int i = 0; i< this->Fusion_data.subframe_cloud->size(); i++)
{
if(this->Fusion_data.subframe_segments[i] != 3 && this->Fusion_data.subframe_segments[i] != 4)
{
//check each point in the subframe whether it corresponds to the motion
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.subframe_cloud->points[i];
Eigen::MatrixXf point(4, 1);
point(0) = pt.x;
point(1) = pt.y;
point(2) = pt.z;
point(3) = 1;
//transform each point through the hypothesis
Eigen::MatrixXf res = transform * point;
Poxel_N tl = this->Fusion_data.subframe[i];
double prob = 0;
//find nearest neighbor of query point
pcl::PointXYZ searchPoint;
//project res into image:
cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
cv::Matx33f cam = _CamParams.K;
cv::Matx31f proj_point = cam * coord;
searchPoint.x = proj_point(0);
searchPoint.y = proj_point(1);
if(!isnan(searchPoint.x) && !isnan(searchPoint.y) && !isnan(searchPoint.z))
{
// K nearest neighbor search
int K = 8;
std::vector<int> pointIdxNKNSearch(K);
std::vector<float> pointNKNSquaredDistance(K);
//find the subset of points in the accumulated pointcloud
int neighbors = kdtree1.radiusSearch(searchPoint, d(range), pointIdxNKNSearch, pointNKNSquaredDistance, 8);
float reflectivity = tl.reflectivity;
float intensity = tl.intensity;
int inlier = 0;
//search for inlier:
if(!correspondence_cloud->empty())
{
//find valid correspondences
std::vector<int> pointIdxNKNSearch_inlier(10);
std::vector<float> pointNKNSquaredDistance_inlier(10);
int neighbors_inlier = kdtree_inlier.radiusSearch(pt, d(range), pointIdxNKNSearch_inlier, pointNKNSquaredDistance_inlier, 10);
inlier = neighbors_inlier;
}
prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
prob = prob * inlier * 0.5;
}
float limit = 3 ;
if(prob>limit)
{
//assign velocity
tl = compute_velocity();
}
this->Fusion_data.subframe[i] = tl;
}
this->Fusion_data.subframe_time = 1;
}
/* These remaining points are projected on the image and checked within a defined radius for existing points in the 2D propagated pointcloud. Through the connection between
the accumulated pointcloud and the 2D propagated pointcloud, correspondences can be found.
*/
///start 7 - 2D correspondence finding
//OF features
//propagate lidar samples through OF and use this to find corresponding point
pcl::CorrespondencesPtr correspondences(new pcl::Correspondences());
pcl::Correspondences corr;
for(int i = 0; i < this->Fusion_data.subframe_keypoints_cloud->size(); i++)
{
searchPoint.z = 0;
//check if there is any points without any velocities
bool check = false;
if(this->Fusion_data.subframe_segments[keypoints_indices->indices[i]] == 0 && !this->Fusion_data.accumulated_features->empty())
check = true;
if(check)
{
//find nearest neighbor between the 2D projected point of the subframe cloud and the 2D propagated cloud
//the index of the nearest neighbor can be again taken to determine the connection to the accumulated cloud
neighbors = OF_tree.radiusSearch(searchPoint, radius, pointIdxNKNSearch, pointNKNSquaredDistance, K);
if(neighbors >= 1)
{
float distance = shot.compare_shot(pointNKNSquaredDistance, neighbors);
int j = shot.get_index();
pt = this->Fusion_data.subframe_keypoints_cloud->points[i];
match = this->Fusion_data.accumulated_cloud->points[j];
time = this->Fusion_data.time[j];
index = j;
}
//compute OF weight
weight = compute_OF_weight(sub_data);
weights.push_back(weight);
pcl::Correspondence corr1;
corr1.index_query = i;
corr1.index_match = index;
correspondences->push_back(corr1);
}
}
}
/* A RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
///start 8 - 3D velocity estimation
//RANSAC: reject outliers
if(!correspondences->empty()) {
//RANSAC
ransac.compute();
transform = ransac.transform();
correspondences = ransac.correspondences();
this->Fusion_data.correspondence_new = *correspondences;
/* Each point of the subframe cloud is then checked if it follows the motion or not. The remaining points, which do not fulfill the motion hypothesis test,
are further processed in the 2D correspondence finding block.
*/
///start 9 - motion hypothesis check
//go through subframe_cloud
for(int i = 0; i< this->Fusion_data.subframe_cloud->size(); i++)
{
if((this->Fusion_data.subframe_segments[i] == 0 || this->Fusion_data.subframe_segments[i] == 2 || this->Fusion_data.subframe_segments[i] == 6) && !correspondences->empty())
{
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.subframe_cloud->points[i];
Eigen::MatrixXf point(4, 1);
point(0) = pt.x;
point(1) = pt.y;
point(2) = pt.z;
point(3) = 1;
//each point is projected back according its velocity
Eigen::MatrixXf res = transform * point;
Poxel_N tl = this->Fusion_data.subframe[i];
double prob = 0;
//find nearest neighboor of query point
pcl::PointXYZ searchPoint;
cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
cv::Matx33f cam = _CamParams.K;
cv::Matx31f proj_point = cam * coord;
searchPoint.x = proj_point(0);
searchPoint.y = proj_point(1);
if(!isnan(searchPoint.x) && !isnan(searchPoint.y) && !isnan(searchPoint.z))
{
// K nearest neighbor search
int K = 8;
std::vector<int> pointIdxNKNSearch(K);
std::vector<float> pointNKNSquaredDistance(K);
int neighbors = kdtree1.radiusSearch(searchPoint, 5, pointIdxNKNSearch, pointNKNSquaredDistance, 8);
int inlier = 0;
//search for inlier:
if(!correspondence_cloud1->empty())
{
std::vector<int> pointIdxNKNSearch_inlier(10);
std::vector<float> pointNKNSquaredDistance_inlier(10);
int neighbors_inlier = kdtree_inlier.radiusSearch(pt, 1, pointIdxNKNSearch_inlier, pointNKNSquaredDistance_inlier, 10);
inlier = neighbors_inlier;
}
float reflectivity = tl.reflectivity;
float intensity = tl.intensity;
prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
prob = prob * inlier * 0.5;
}
if(prob > 10)
{
//assign velocity
tl = compute_velocity();
this->Fusion_data.subframe[i] = tl ;
}
this->Fusion_data.subframe_time = 1;
}
}
}
}
}
}
}
//This is the main function. It calls all the subfunctions
void process(lidarFusionFrame const& srcData)
{
//Only a fraction of points are considered each frame
Lidardatasplit(srcData) ;
for(int i = 0; i < start_index[1].size()-1; i++)
{
//update optical flow
m_OpticalFlow.process();
//propagates projected 3D point in image space by the optical flow
propagate_2D();
//3D propagation of the lidar points
propagate_3D(srcData.cameraFrames[i]. timestamp, srcData);
//3D -> 2D coordinates
project_subframe(srcData, i);
//classification between static and dynamic points/ SHOT feature computation
//3D correspondence finding and 2D correspondence finding
compute_3D_velocity(srcData, i);
}
start_index.clear();
}
} ;
#endif // DYNAMICMOTIONESTIMATOR

Claims
1. A method for data fusion of LiDAR data and image data, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, comprising:
capturing three-dimensional LiDAR data of the scene with a scanning LiDAR sensor (2);
capturing image data corresponding to a two-dimensional projection of the scene with an image sensor (3) having a higher framerate as compared to the framerate of a complete scan of the LiDAR sensor;
estimating (15, 18) based on the captured image data and/or LiDAR data a three- dimensional velocity of a dynamic object (5) present in the scene;
compensating the motion of the dynamic object (5); and
fusing the motion compensated LiDAR data and image data.
2. The method of claim 1, further comprising:
generating a LiDAR subframe including LiDAR samples covering a part of the scene during the frame period of an image sensor generating the image data; and
accumulating LiDAR samples of multiple LiDAR subframes to generate an accumulated LiDAR point cloud corresponding to a complete scan of the sensed scene at the frame rate of the LiDAR sensor.
3. The method of claim 2, further comprising:
projecting LiDAR samples of a LiDAR subframe on image data of a camera image; and
propagating the projected LiDAR samples according to a displacement information to generate a propagated point cloud.
4. The method of claim 3, wherein propagating the projected LiDAR samples is based on an optical flow algorithm which determines the displacement information between consecutive camera images.
5. The method of claim 4, further comprising:
updating (12) the propagated point cloud according to the optical flow information generated for a new camera image; and
checking (14, 17) the projected LiDAR samples of a new LiDAR subframe for a correspondence with existing points in the current propagated point cloud.
6. The method of claim 5, wherein a correspondence qualifies as valid if projected LiDAR samples of a new LiDAR subframe fall within a defined radius around an existing point.
7. The method of any of claims 3 to 6, further comprising:
assigning an index to the LiDAR samples; and
keeping the index of the LiDAR samples while propagating the projected LiDAR samples.
8. The method of any of claims 3 to 7, wherein LiDAR samples from multiple LiDAR subframes are accumulated to cover the time interval of a full LiDAR frame, and wherein accumulated data older than this time interval are deleted after the processing of an image frame and new data for the current LiDAR subframe are added to the accumulated data.
9. The method of any of claims 3 to 8, further comprising:
computing LiDAR descriptors for dynamic LiDAR points in a LiDAR subframe point cloud, wherein a LiDAR descriptor describes a geometrical three dimensional structure of the local neighborhood of the dynamic points;
matching LiDAR points of a new LiDAR subframe with LiDAR points in the accumulated LiDAR point cloud using the LiDAR descriptors; and
propagating the matched LiDAR points of the new LiDAR subframe according to the estimated three dimensional velocity to generate a three dimensional propagated point cloud.
10. The method of any of claims 3 to 9, further comprising:
checking whether LiDAR samples in a new LiDAR subframe correspond to a dynamic or static object; and keeping only LiDAR samples corresponding to dynamic points for the further processing.
11. The method of any of claims 3 to 10, further comprising:
detecting a ground plane in the scene;
identifying LiDAR samples in a new LiDAR subframe corresponding to the detected ground plane; and
removing the identified LiDAR samples from the further processing.
12. The method of any of the preceding claims, wherein the estimated three dimensional velocity of the dynamic objects is used for transportation vehicle path planning and/or trajectory management.
13. An apparatus for data fusion of LiDAR data and image data, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, the apparatus comprising:
a scanning LiDAR sensor (2) generating the LiDAR data;
a high framerate camera (3) generating the image data, wherein the image data are generated with a higher framerate as compared to the framerate of a complete scan of the scanning LiDAR sensor;
one or more processors coupled to the camera and the LiDAR sensor, wherein said one or more processors are configured to:
receive camera generated image data and LiDAR sensor generated point cloud data;
estimate a three dimensional velocity of a dynamic object present in the scene;
compensate the motion of the dynamic object; and
fuse the motion compensated LiDAR data and image data.
14. The apparatus according to claim 13, further comprising:
one or more buffers coupled to the one or more processors for storing accumulated information related to the LiDAR data and the image data.
15. A vehicle performing a method according to any of claims 1 to 12 and/or comprising an apparatus according to any of claims 13 to 14.
PCT/EP2019/081741 2018-11-20 2019-11-19 Method and apparatus for data fusion of lidar data and image data WO2020104423A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862770147P 2018-11-20 2018-11-20
US62/770,147 2018-11-20

Publications (1)

Publication Number Publication Date
WO2020104423A1 true WO2020104423A1 (en) 2020-05-28

Family

ID=68655511

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/081741 WO2020104423A1 (en) 2018-11-20 2019-11-19 Method and apparatus for data fusion of lidar data and image data

Country Status (1)

Country Link
WO (1) WO2020104423A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932943A (en) * 2020-10-15 2020-11-13 深圳市速腾聚创科技有限公司 Dynamic target detection method and device, storage medium and roadbed monitoring equipment
CN113029137A (en) * 2021-04-01 2021-06-25 清华大学 Multi-source information self-adaptive fusion positioning method and system
EP3945349A1 (en) * 2020-07-31 2022-02-02 Continental Automotive GmbH Method and system for determining 3d image information
CN114018236A (en) * 2021-09-30 2022-02-08 哈尔滨工程大学 Laser vision strong coupling SLAM method based on adaptive factor graph
CN114333418A (en) * 2021-12-30 2022-04-12 深兰人工智能(深圳)有限公司 Data processing method for automatic driving and related device
CN114415123A (en) * 2022-04-01 2022-04-29 北京海兰信数据科技股份有限公司 Non-coherent neighborhood based weighting pulse accumulation processing method and system
CN114475650A (en) * 2021-12-01 2022-05-13 中铁十九局集团矿业投资有限公司北京信息技术分公司 Vehicle driving behavior determination method, device, equipment and medium
CN114494332A (en) * 2022-01-21 2022-05-13 四川大学 Unsupervised estimation method for scene flow from synthesis to real LiDAR point cloud
CN114577196A (en) * 2020-11-17 2022-06-03 沃尔沃卡车集团 Lidar positioning using optical flow
CN115407304A (en) * 2022-08-30 2022-11-29 中国第一汽车股份有限公司 Point cloud data processing method and device
WO2023062400A1 (en) * 2021-10-12 2023-04-20 日産自動車株式会社 Object recognition method and object recognition device
WO2023103198A1 (en) * 2021-12-08 2023-06-15 深圳奥锐达科技有限公司 Method and device for calculating relative extrinsic parameters of ranging system, and storage medium
CN116736322A (en) * 2023-08-15 2023-09-12 山东科技大学 Speed prediction method integrating camera image and airborne laser radar point cloud data
US20240080432A1 (en) * 2022-09-01 2024-03-07 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volemetric capture workflows
US12003695B2 (en) * 2022-09-01 2024-06-04 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volumetric capture workflows

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140064555A1 (en) * 2012-09-04 2014-03-06 Digital Singnal Corporation System and Method for Increasing Resolution of Images Obtained from a Three-Dimensional Measurement System

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
DAVID HELD ET AL.: "Precision tracking with sparse 3d and dense color 2d data", 2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, May 2013 (2013-05-01), pages 1138 - 1145, XP032506018, DOI: 10.1109/ICRA.2013.6630715
DAVID HELD ET AL.: "Robust Real-Time Tracking Combining 3D Shape, Color, and Motion", THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, August 2015 (2015-08-01), pages 30 - 49
E. ILG ET AL.: "Reconstruction of rigid body models from motion distorted laser range data using optical flow", IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA, 2014
ILG EDDY ET AL: "Reconstruction of rigid body models from motion distorted laser range data using optical flow", 2014 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE, 31 May 2014 (2014-05-31), pages 4627 - 4632, XP032650440, DOI: 10.1109/ICRA.2014.6907535 *
JENNIFER DOLSON ET AL.: "Upsampling range data in dynamic environments", 2010 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2010, pages 1141 - 1148, XP031725915
JINWOOK CHOI ET AL: "2D-plus-depth based resolution and frame-rate up-conversion technique for depth video", IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 56, no. 4, November 2010 (2010-11-01), pages 2489 - 2497, XP011341851, ISSN: 0098-3063, DOI: 10.1109/TCE.2010.5681132 *
M. H. DARAEI ET AL.: "Region Segmentation Using LiDAR and Camera", 2017 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), October 2017 (2017-10-01)
M. H. DARAEI ET AL.: "Velocity and shape from tightly-coupled lidar and camera", 2017 IEEE INTELLIGENT VEHICLES SYMPOSIUM, June 2017 (2017-06-01)
R. OMAR CHAVEZ-GARCIA ET AL.: "Multiple Sensor Fusion and Classification for Moving Object Detection and Tracking", IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, vol. PP, no. 99, 2015, pages 1 - 10
S. WANG ET AL.: "Dynamic detection technology for moving objects using 3d lidar information and rgb camera", 2017 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), June 2017 (2017-06-01), pages 37 - 38, XP033129944, DOI: 10.1109/ICCE-China.2017.7990983
SEBASTIAN SCHNEIDER ET AL.: "Fusing vision and lidar - synchronization, correction and occlusion reasoning", 2010 IEEE INTELLIGENT VEHICLES SYMPOSIUM, 2010, pages 388 - 393, XP031732240
X. CHEN ET AL.: "Multi-view 3d object detection network for autonomous driving", 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2017, pages 6526 - 6534, XP033250017, DOI: 10.1109/CVPR.2017.691

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3945349A1 (en) * 2020-07-31 2022-02-02 Continental Automotive GmbH Method and system for determining 3d image information
CN111932943B (en) * 2020-10-15 2021-05-14 深圳市速腾聚创科技有限公司 Dynamic target detection method and device, storage medium and roadbed monitoring equipment
CN111932943A (en) * 2020-10-15 2020-11-13 深圳市速腾聚创科技有限公司 Dynamic target detection method and device, storage medium and roadbed monitoring equipment
CN114577196B (en) * 2020-11-17 2024-05-24 沃尔沃卡车集团 Lidar positioning using optical flow
CN114577196A (en) * 2020-11-17 2022-06-03 沃尔沃卡车集团 Lidar positioning using optical flow
CN113029137A (en) * 2021-04-01 2021-06-25 清华大学 Multi-source information self-adaptive fusion positioning method and system
CN114018236A (en) * 2021-09-30 2022-02-08 哈尔滨工程大学 Laser vision strong coupling SLAM method based on adaptive factor graph
CN114018236B (en) * 2021-09-30 2023-11-03 哈尔滨工程大学 Laser vision strong coupling SLAM method based on self-adaptive factor graph
WO2023062400A1 (en) * 2021-10-12 2023-04-20 日産自動車株式会社 Object recognition method and object recognition device
CN114475650A (en) * 2021-12-01 2022-05-13 中铁十九局集团矿业投资有限公司北京信息技术分公司 Vehicle driving behavior determination method, device, equipment and medium
WO2023103198A1 (en) * 2021-12-08 2023-06-15 深圳奥锐达科技有限公司 Method and device for calculating relative extrinsic parameters of ranging system, and storage medium
CN114333418A (en) * 2021-12-30 2022-04-12 深兰人工智能(深圳)有限公司 Data processing method for automatic driving and related device
CN114333418B (en) * 2021-12-30 2022-11-01 深兰人工智能(深圳)有限公司 Data processing method for automatic driving and related device
CN114494332B (en) * 2022-01-21 2023-04-25 四川大学 Unsupervised synthesis to real LiDAR point cloud scene flow estimation method
CN114494332A (en) * 2022-01-21 2022-05-13 四川大学 Unsupervised estimation method for scene flow from synthesis to real LiDAR point cloud
CN114415123B (en) * 2022-04-01 2022-07-19 北京海兰信数据科技股份有限公司 Non-coherent neighborhood based weighting pulse accumulation processing method and system
CN114415123A (en) * 2022-04-01 2022-04-29 北京海兰信数据科技股份有限公司 Non-coherent neighborhood based weighting pulse accumulation processing method and system
CN115407304A (en) * 2022-08-30 2022-11-29 中国第一汽车股份有限公司 Point cloud data processing method and device
US20240080432A1 (en) * 2022-09-01 2024-03-07 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volumetric capture workflows
US12003695B2 (en) * 2022-09-01 2024-06-04 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volumetric capture workflows
CN116736322A (en) * 2023-08-15 2023-09-12 山东科技大学 Speed prediction method integrating camera image and airborne laser radar point cloud data
CN116736322B (en) * 2023-08-15 2023-10-20 山东科技大学 Speed prediction method integrating camera image and airborne laser radar point cloud data

Similar Documents

Publication Publication Date Title
WO2020104423A1 (en) Method and apparatus for data fusion of lidar data and image data
CN112292711B (en) Associating LIDAR data and image data
Jörgensen et al. Monocular 3d object detection and box fitting trained end-to-end using intersection-over-union loss
US10915793B2 (en) Method and system for converting point cloud data for use with 2D convolutional neural networks
Königshof et al. Realtime 3d object detection for automated driving using stereo vision and semantic information
US20170316569A1 (en) Robust Anytime Tracking Combining 3D Shape, Color, and Motion with Annealed Dynamic Histograms
Held et al. Precision tracking with sparse 3d and dense color 2d data
US11935250B2 (en) Method, device and computer-readable storage medium with instructions for processing sensor data
CN110832568B (en) Vehicle environment recognition device
JP2016081525A (en) Vehicular image recognition system and corresponding method
US11887336B2 (en) Method for estimating a relative position of an object in the surroundings of a vehicle and electronic control unit for a vehicle and vehicle
CN112997187A (en) Two-dimensional object bounding box information estimation based on aerial view point cloud
US20220398856A1 (en) Method for reconstruction of a feature in an environmental scene of a road
EP3416132A1 (en) Image processing device, object recognition device, device control system, and image processing method and program
JP7091686B2 (en) 3D object recognition device, image pickup device and vehicle
JP7209115B2 (en) Detection, 3D reconstruction and tracking of multiple rigid objects moving in relatively close proximity
EP4012653A1 (en) Depth-map prediction method; computer program, readable medium, system and mobile machine for implementing the method
Xiang et al. Vilivo: Virtual lidar-visual odometry for an autonomous vehicle with a multi-camera system
El Bouazzaoui et al. Enhancing rgb-d slam performances considering sensor specifications for indoor localization
Diego et al. Vision-based road detection via on-line video registration
US20210407117A1 (en) System and method for self-supervised monocular ground-plane extraction
Vatavu et al. Real-time modeling of dynamic environments in traffic scenarios using a stereo-vision system
WO2020118623A1 (en) Method and system for generating an environment model for positioning
CN116543143A (en) Training method of target detection model, target detection method and device
US11657506B2 (en) Systems and methods for autonomous robot navigation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19809014; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 19809014; Country of ref document: EP; Kind code of ref document: A1