WO2020104423A1 - Method and apparatus for data fusion of lidar data and image data - Google Patents

Method and apparatus for data fusion of lidar data and image data

Info

Publication number
WO2020104423A1
WO2020104423A1 (PCT/EP2019/081741)
Authority
WO
WIPO (PCT)
Prior art keywords
lidar
data
subframe
points
samples
Application number
PCT/EP2019/081741
Other languages
French (fr)
Inventor
Nijanthan BERINPANATHAN
Nils Kuepper
Jerramy GIPSON
Original Assignee
Volkswagen Aktiengesellschaft
Audi Ag
Dr. Ing. H.C.F. Porsche Ag
Application filed by Volkswagen Aktiengesellschaft, Audi Ag, Dr. Ing. H.C.F. Porsche Ag filed Critical Volkswagen Aktiengesellschaft
Publication of WO2020104423A1 publication Critical patent/WO2020104423A1/en

Classifications

    • G01S17/86: Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G01S17/42: Simultaneous measurement of distance and other co-ordinates
    • G01S17/58: Velocity or trajectory determination systems; Sense-of-movement determination systems
    • G01S17/66: Tracking systems using electromagnetic waves other than radio waves
    • G01S17/931: Lidar systems specially adapted for anti-collision purposes of land vehicles
    • G01S7/4802: Analysis of echo signal for target characterisation; target signature; target cross-section
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T2207/20221: Image fusion; image merging

Definitions

  • the present invention relates generally to a method and an apparatus for data fusion of LiDAR data and image data, and more particularly to sensor fusion in transportation vehicles for obtaining a consistent environment model for assisted or automated driving.
  • Autonomous driving allows vehicles such as self-driving cars to move safely with little or no human input based on a variety of sensors, which perceive their surroundings.
  • classification systems define different levels, typically ranging from fully manual to fully automated systems.
  • the standard SAE J3016 defines six such levels, wherein for levels 0 to 2 the human driver still monitors the driving environment with no automation at all, some driver assistance or partial automation and for levels 3 to 5 an automated driving system monitors the driving environment and performs the driving tasks with different levels of possible driver intervention.
  • LiDAR sensors and cameras are often used together in the fields of driver assistance systems and autonomous driving, as these sensors complement each other well thanks to their different sensor characteristics.
  • Cameras generate image data by capturing light intensities and typically sample the scene at once but only provide a 2D projection of the scene without 3D information.
  • As opposed to a camera, a LiDAR sensor directly determines 3D distances for each sample point of the generated LiDAR pointcloud, however with a lower spatial and temporal sampling frequency. Therefore, the combination of these two complementary sensors can lead to a better understanding of the environment.
  • a 6-dimensional position vector is constructed consisting of the color value (RGB), pixel position (u, v) and time.
  • the unknown range for a specific position is predicted as a weighted sum of neighboring range values. The weight is dependent on the 6-D distance between its neighbors. For low relative velocities, for example, with cars travelling on a highway in the same direction and at almost the same speed as the ego-vehicle, this algorithm appears to work well. However, if high relative velocities occur, that approach will lead to inconsistent matching of three-dimensional points and, in turn, will lead to a wrong upsampling.
  • Intelligent Transportation Systems Conference (ITSC), October 2017, and M. H. Daraei, et al.,“Velocity and shape from tightly-coupled lidar and camera,” 2017 IEEE Intelligent Vehicles Symposium, June 2017, focused on a similar approach by proposing a method for object tracking and reconstruction by fusing LiDAR and image data.
  • an intermediate surface was used for combining both modalities.
  • the image likelihood term was a function of the appearance model, the new image and the three dimensional velocity.
  • the image could then be related to the appearance model by shifting the model according to the projected three-dimensional velocity on the image.
  • a photometric error could be computed as a function of the velocity.
  • the LiDAR likelihood term was a function of the new incoming point cloud, the intermediate surface representation and the velocity. According to the velocity, the estimated surface could be transformed and an error of the alignment with the incoming point cloud could be computed. By minimizing this multi-modal cost function, one would get the optimal value for the velocity estimation. However, if the relative velocities exceed tolerable levels and the correct overlay of the point cloud and image could not be guaranteed, it leads to a wrong estimation of the appearance model and accordingly to inaccurate velocity estimation.
  • a method and apparatus addresses the problem of arbitrary relative velocities in order to enable sensor fusion processing in a wide range of configurations.
  • a method for data fusion of LiDAR data and image data comprising:
  • the method further comprises:
  • generating a LiDAR subframe including LiDAR samples covering a part of the scene during the frame period of an image sensor generating the image data;
  • the method further comprises:
  • propagating the projected LiDAR samples is based on an optical flow algorithm which determines the displacement information between consecutive camera images.
  • the method further comprises:
  • a correspondence qualifies as valid if projected LiDAR samples of a new LiDAR subframe fall within a defined radius around an existing point.
  • the method further comprises:
  • LiDAR samples from multiple LiDAR subframes are accumulated to cover the time interval of a full LiDAR frame, wherein accumulated data older than this time interval are deleted after the processing of an image frame and new data for the current LiDAR subframe are added to the accumulated data.
  • the method further comprises in a preferred embodiment:
  • computing LiDAR descriptors for dynamic LiDAR points in a LiDAR subframe point cloud, wherein a LiDAR descriptor describes a geometrical three dimensional structure of the local neighborhood of the dynamic points;
  • the method further comprises:
  • the method further comprises:
  • the estimated three-dimensional velocity of the dynamic objects is used for transportation vehicle path planning and/or trajectory management.
  • an apparatus for data fusion of LiDAR data and image data comprising:
  • a high framerate camera (3) generating the image data, wherein the image data are generated with a higher framerate as compared to the framerate of a complete scan of the scanning LiDAR sensor;
  • one or more processors coupled to the camera and the LiDAR sensor, wherein said one or more processors are configured to:
  • the apparatus further comprises:
  • one or more buffers coupled to the one or more processors for storing accumulated information related to the LiDAR data and the image data.
  • a vehicle performs a method and/or comprises an apparatus as described above.
  • Figure 1A illustrates a high resolution LiDAR sensor and a high framerate camera closely collocated for simultaneously capturing the environment from the same viewpoint.
  • Figure 1B illustrates an arrangement comprising a high resolution LiDAR system and a high framerate camera mounted on the roof of a vehicle.
  • Figure 2A illustrates an example of a transportation vehicle, which passes an ego vehicle with unknown velocity, and a silhouette surrounding distorted LiDAR samples.
  • Figure 2B illustrates schematically several images captured by a camera and a single full LiDAR frame completely mapped on one of the images.
  • Figure 3A illustrates the transportation vehicle with a silhouette surrounding LiDAR samples corresponding to a LiDAR subframe.
  • Figure 3B illustrates schematically the LiDAR subframe falling in a time interval Δt around one of the camera images.
  • Figure 4A illustrates an image of a transportation vehicle together with LiDAR samples, which correspond to the shape of the transportation vehicle but are displaced.
  • Figure 4B corresponds to Figure 2B.
  • Figure 5A illustrates that after application of the disclosed method the LiDAR samples are in better alignment with the image of the transportation vehicle.
  • Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame.
  • Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
  • the disclosed embodiments are directed at providing a technical solution for the problem of asynchronous fusion between LiDAR data and camera data, provided by a combination of a LiDAR sensor and camera such as shown in figure 1.
  • Figure 1A illustrates an arrangement 1 comprising a combination of a high resolution LiDAR system 2, such as a Luminar G2+ LiDAR sensor, and a high framerate camera 3, such as a FLIR Grasshopper camera, closely collocated above each other for simultaneously capturing the environment from nearly the same viewpoint.
  • Figure 1B illustrates the experimental arrangement 1 mounted on the roof of an ego vehicle 4.
  • the illustrated arrangement is only an example of a prototype; in a production vehicle, other arrangements of the LiDAR sensor and camera relative to each other, and at other locations on the body of the vehicle, are also possible.
  • Capturing three-dimensional information for a scene with a scanning LiDAR system is fundamentally different. This is because conventional scanning LiDAR systems continuously scan the scene within a defined time interval, e.g., for most conventional LiDAR systems within 100 ms corresponding to a 10 Hz scan rate. Thus, the scene is not captured for every LiDAR point in the same time interval as happens with a conventional image camera. More specifically, within 1/10th of a second, the LiDAR beam is steered into various locations to locally sample the scene and only after 1/10th of a second covers the same location again.
  • Figure 2A illustrates an example of a transportation vehicle 5, which passes an ego vehicle (not shown) with unknown velocity.
  • the individual LiDAR samples are not individually shown in the figure, but are located within silhouette 6.
  • the silhouette does not match with the actual appearance of the car in the image frame, meaning that the LiDAR samples are distorted. This distortion results from the different capturing mechanisms, as mentioned above and further illustrated in Figure 2B.
  • Figure 2B schematically shows six progressive full images 7, each captured by a camera within a short timespan at a particular point in time t. Over the same period of time during which the shown six full images are captured, a single full LiDAR frame 8 is determined by scanning the same scene. The determined LiDAR frame 8 is then mapped on the last captured image at time ti, as symbolized by the arrows.
  • artifacts are introduced by the unknown dynamic motion of objects in the scene. Such artifacts are called "dynamic motion artifacts" in the following and render the resulting data less useful for the generation of environment models.
  • Disclosed embodiments address this technical problem by applying a "dynamic motion compensation," which leverages the functionality of a high framerate camera, e.g. operating at a 100 Hz framerate, in combination with a LiDAR sensor operating at e.g. a 10 Hz repetition rate.
  • the high framerate camera has a frame rate at least 10 times faster than the LiDAR sensor’s frame-rate.
  • other ratios between the camera framerate and the LiDAR framerate are also possible as long as the camera framerate is significantly higher than the LiDAR framerate.
  • Figures 3A-3B illustrate how, by using a high framerate camera, the misalignments between camera data and LiDAR data become negligible. Instead of mapping all LiDAR samples corresponding to a complete scan on a particular image frame, for each of the image frames only a fraction of the total LiDAR samples is considered for the fusion of three dimensional samples and camera image data. That is, only LiDAR samples are considered that fall into the time interval of consecutive images, located in Figure 3A within silhouette 9, which, due to the scan pattern used, covers only a vertical fragment of the vehicle roughly parallel to the road surface. Correspondingly, at time ti in Figure 3B only the fraction of the LiDAR frame 8 that falls in the time interval Δt around ti is taken into account.
  • the length of the time interval Δt is dependent on the processing framerate and may in particular correspond to the inverse of the camera framerate.
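As an illustration of this subframe splitting, the following sketch bins the samples of a full LiDAR frame by comparing each sample's timestamp with the camera image timestamps, splitting at the midpoint between consecutive images. The types and names (LidarSample, splitIntoSubframes, imgTS) are hypothetical and not taken from the patent code; this is a sketch of the idea, not the appendix implementation.

```cpp
#include <cstdint>
#include <vector>

struct LidarSample { float x, y, z, reflectivity; int64_t timestamp; };

// Assign every LiDAR sample of a full frame to the camera image whose capture time
// is closest, i.e. split the frame at the midpoints between consecutive image timestamps.
std::vector<std::vector<int>> splitIntoSubframes(
    const std::vector<LidarSample>& fullFrame,
    const std::vector<int64_t>& imgTS)            // one timestamp per camera image
{
    std::vector<std::vector<int>> subframeIndices(imgTS.size());
    for (int i = 0; i < static_cast<int>(fullFrame.size()); ++i) {
        const int64_t t = fullFrame[i].timestamp;
        std::size_t best = 0;
        for (std::size_t k = 1; k < imgTS.size(); ++k) {
            // the midpoint between image k-1 and image k is the split boundary
            if (t >= (imgTS[k - 1] + imgTS[k]) / 2) best = k;
        }
        subframeIndices[best].push_back(i);       // store indices of each split
    }
    return subframeIndices;
}
```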
  • Disclosed embodiments further estimate the three dimensional velocity of the dynamic objects in order to perform the dynamic motion compensation.
  • estimation of the three dimensional velocity provides a valuable input for consecutive processing operations such as path planning and trajectory management.
  • disclosed embodiments enable creation of a consistent environment model, where LiDAR points and camera generated image data are consistent. In this way, pixel data in the camera generated image data and pointcloud data generated by the LiDAR sensor are in agreement at all times.
  • Figure 4A illustrates raw alignment of a dynamically moving car. As can be seen clearly, the points representing the LiDAR measurements correspond to the shape of the car but are displaced. Figure 4B, being identical to Figure 2B, illustrates this distortion.
  • Figure 5A illustrates that after application of the disclosed method the LiDAR samples are in better alignment with the image of the transportation vehicle.
  • the estimated motion of the vehicle is indicated in the figure with arrow 10.
  • the improved alignment is achieved by accumulating and updating data over several image frames, as schematically shown in Figure 5B.
  • Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame. In the accumulated data At-1, data older than this time interval are deleted at the end of the overall processing of a frame at time ti, and at the same time new data for the current subframe are added to the accumulated data.
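A minimal sketch of this sliding accumulation window, assuming subframes arrive with microsecond timestamps; the Subframe struct, the 100 ms window and the function name are illustrative assumptions rather than the patent's data structures.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct Subframe { int64_t timestamp_us; std::vector<int> pointIndices; };

// Keep only the subframes that fall within one full LiDAR frame period (here 100 ms),
// then append the newest subframe, mirroring the scheme of Figure 5B.
void updateAccumulation(std::deque<Subframe>& accumulated,
                        const Subframe& newest,
                        int64_t framePeriod_us = 100000)
{
    while (!accumulated.empty() &&
           newest.timestamp_us - accumulated.front().timestamp_us >= framePeriod_us) {
        accumulated.pop_front();   // delete data older than the accumulation window
    }
    accumulated.push_back(newest); // add the current subframe
}
```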
  • the displacements of brightness patterns may be calculated through a method called "optical flow."
  • optical flow provides information where intensity patterns have shifted in consecutive image frames; ideally, this corresponds to the actual projection of motion in the observed three-dimensional scene.
  • With this information, one can propagate LiDAR samples on the image plane and/or limit the search space to find corresponding LiDAR samples in consecutive LiDAR frames. This can be used to resolve ambiguities in the LiDAR pointcloud processing as well as to speed up algorithms for data association.
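The following sketch illustrates one way such a propagation could look, using OpenCV's dense Farneback optical flow. The choice of flow algorithm and all names here are assumptions; the method only requires displacement information between consecutive images.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Propagate already-projected LiDAR samples (pixel positions) to the newest image
// using dense optical flow between the two most recent grayscale frames.
void propagateByFlow(const cv::Mat& prevGray, const cv::Mat& currGray,
                     std::vector<cv::Point2f>& projectedSamples)
{
    cv::Mat flow;
    cv::calcOpticalFlowFarneback(prevGray, currGray, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);
    for (auto& p : projectedSamples) {
        const int x = cvRound(p.x), y = cvRound(p.y);
        if (x < 0 || y < 0 || x >= flow.cols || y >= flow.rows) continue;
        const cv::Point2f& d = flow.at<cv::Point2f>(y, x);  // displacement at this pixel
        p += d;  // shift the 2D point to where the intensity pattern moved
    }
}
```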
  • the image is not only the most dense and accurate representation of the environment, but is also a helpful construct for all LiDAR processing.
  • LiDAR samples are considered that fall into the time interval of consecutive images.
  • This subset of points is then projected on the image and propagated according to the optical flow of each new incoming image.
  • the raw subset of points may be accumulated until a certain number of LiDAR subframes is reached. This number may be defined by the number of points that go into a full LiDAR scan.
  • the propagated 2D points may be saved in a separate buffer, while the original indexing may be kept so the propagated points can be related to the raw three-dimensional points.
  • the new arriving subset of LiDAR points may be projected on the image.
  • the propagated two-dimensional pointcloud may be updated according to the optical flow generated by the new image. Then each point of the newly arrived subset may be checked for existing points in the current two-dimensional propagated pointcloud within a defined radius that might qualify as a valid correspondence.
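A brute-force sketch of this radius check, keeping the index link back to the raw 3D points as described above; the structures, the search strategy and the first-hit acceptance are illustrative assumptions.

```cpp
#include <opencv2/core.hpp>
#include <utility>
#include <vector>

struct Propagated2DPoint { cv::Point2f uv; int rawIndex; };  // keeps link to raw 3D point

// Return (newPointIdx, rawIndexOfExistingPoint) pairs for projected samples of the
// new subframe that fall within 'radius' pixels of an existing propagated point.
std::vector<std::pair<int, int>> findValidCorrespondences(
    const std::vector<cv::Point2f>& newProjected,
    const std::vector<Propagated2DPoint>& propagated,
    float radius)
{
    std::vector<std::pair<int, int>> matches;
    const float r2 = radius * radius;
    for (int i = 0; i < static_cast<int>(newProjected.size()); ++i) {
        for (const auto& q : propagated) {
            const cv::Point2f d = newProjected[i] - q.uv;
            if (d.dot(d) <= r2) {                  // within the defined radius -> valid
                matches.emplace_back(i, q.rawIndex);
                break;                             // accept the first hit for simplicity
            }
        }
    }
    return matches;
}
```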
  • the correspondences from the previous processing operation may be used as input for a RANSAC algorithm to reject outliers during the process.
  • a final transformation can be obtained.
  • the computed transformation can be related to a three-dimensional velocity.
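To make the chain from correspondences to a velocity concrete, here is a hedged sketch using PCL's sample-consensus correspondence rejection and SVD-based transformation estimation, followed by dividing the translation by the elapsed time. PCL is suggested by the appendix code, but the specific classes, thresholds and the simple translation-over-time conversion are assumptions, not the patent's exact implementation.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/correspondence.h>
#include <pcl/registration/correspondence_rejection_sample_consensus.h>
#include <pcl/registration/transformation_estimation_svd.h>
#include <Eigen/Core>

// Reject outlier correspondences with RANSAC, estimate the rigid transformation that
// aligns the new subframe to the accumulated cloud, and derive a 3D velocity from it.
Eigen::Vector3f estimateVelocity(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& subframe,     // source points
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& accumulated,  // target points
    const pcl::Correspondences& candidates,
    float dt_seconds)                                        // time between the matched points
{
    pcl::registration::CorrespondenceRejectorSampleConsensus<pcl::PointXYZ> ransac;
    ransac.setInputSource(subframe);
    ransac.setInputTarget(accumulated);
    ransac.setInlierThreshold(0.2);     // metres; hypothetical tolerance
    ransac.setMaximumIterations(1000);

    pcl::Correspondences inliers;
    ransac.getRemainingCorrespondences(candidates, inliers);

    Eigen::Matrix4f T = Eigen::Matrix4f::Identity();
    pcl::registration::TransformationEstimationSVD<pcl::PointXYZ, pcl::PointXYZ> svd;
    svd.estimateRigidTransformation(*subframe, *accumulated, inliers, T);

    // Relate the translational part of the transformation to the time difference
    // between the corresponding points to obtain a three-dimensional velocity.
    return T.block<3, 1>(0, 3) / dt_seconds;
}
```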
  • Each point in the subframe cloud may be checked against the motion hypothesis by transforming this point back according to the hypothesis so that it aligns to the accumulated raw data. Further, the points may be compared using certain criteria defined by the neighboring region in the accumulated three dimensional pointcloud.
  • the comparison operation may take into account the LiDAR reflectivity, color intensity, range value and the number of inlier correspondences within a defined radius. If a given point fulfills these criteria, the specific motion will be assigned to it. Subsequently, the processed subframe may be added to the three dimensional accumulated pointcloud containing the information of motion in the scene.
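One way the point-wise verification might look is sketched below; the thresholds and the simple pass/fail combination of reflectivity, intensity, range and inlier-neighbor count are purely illustrative assumptions.

```cpp
#include <cmath>

struct PointAttributes { float reflectivity, intensity, range; };

// Decide whether a back-transformed subframe point matches its neighborhood in the
// accumulated cloud well enough to be assigned the motion hypothesis.
bool followsMotionHypothesis(const PointAttributes& candidate,
                             const PointAttributes& neighborhoodMean,
                             int inlierNeighbors)
{
    const bool reflectivityOk = std::fabs(candidate.reflectivity - neighborhoodMean.reflectivity) < 0.1f;
    const bool intensityOk    = std::fabs(candidate.intensity    - neighborhoodMean.intensity)    < 20.0f;
    const bool rangeOk        = std::fabs(candidate.range        - neighborhoodMean.range)        < 0.5f;
    const bool supportOk      = inlierNeighbors >= 3;   // enough inlier correspondences nearby
    return reflectivityOk && intensityOk && rangeOk && supportOk;
}
```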
  • the moving points in the accumulated pointcloud may be propagated in three dimensional space to the current time according to the assigned motion and saved in a similar fashion as the two dimensional propagated pointcloud, with the property of keeping the indexing. This propagation operation results in a three dimensional propagated pointcloud that is temporally aligned to the current frame.
  • the two dimensional propagated pointcloud may be updated according to the latest data, as explained above.
  • the new subset of LiDAR points may be first compared to the three dimensional propagated pointcloud, again in a similar fashion as for the two dimensional propagated pointcloud.
  • each point in the new subset may be checked for an existing point in the three dimensional propagated pointcloud within a defined distance. If such a point exists, the connection to the raw accumulated pointcloud can be established by using the index information from the three dimensional propagated pointcloud.
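A hedged sketch of such a radius query on the three dimensional propagated pointcloud using a PCL kd-tree, where the returned indices also index the raw accumulated cloud thanks to the shared indexing; the function and variable names are assumptions. In practice the kd-tree would be built once per frame rather than per query; it is rebuilt here only to keep the sketch self-contained.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/kdtree/kdtree_flann.h>
#include <vector>

// Find points of the 3D propagated pointcloud within 'radius' of a new subframe point.
std::vector<int> findNeighborsInPropagated3D(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& propagated3D,
    const pcl::PointXYZ& newPoint,
    float radius)
{
    pcl::KdTreeFLANN<pcl::PointXYZ> kdtree;
    kdtree.setInputCloud(propagated3D);

    std::vector<int> indices;
    std::vector<float> squaredDistances;
    kdtree.radiusSearch(newPoint, radius, indices, squaredDistances);
    return indices;   // empty if no existing point lies within the defined distance
}
```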
  • each correspondence may be weighted by the similarity to the optical flow derived from the images until the current frame.
  • optical flow may be calculated for 10 frames captured at 100 Hz to weight the LiDAR samples generated at 10 Hz.
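A possible form of this weighting, comparing the projected displacement of a matched LiDAR point with the optical flow accumulated over the intervening camera frames; the Gaussian form and the pixel-level sigma are assumptions.

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Weight a LiDAR correspondence by how well its projected 2D displacement agrees
// with the optical flow accumulated over the camera frames since the point was sampled.
float flowSimilarityWeight(const cv::Point2f& projectedDisplacement,
                           const cv::Point2f& accumulatedFlow,
                           float sigma = 2.0f /* pixels, hypothetical */)
{
    const cv::Point2f diff = projectedDisplacement - accumulatedFlow;
    return std::exp(-diff.dot(diff) / (2.0f * sigma * sigma));  // 1 = perfect agreement
}
```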
  • the obtained transformation from the valid correspondences may provide a new motion hypothesis and each point may undergo the verification operation using the above mentioned criteria (i.e., LiDAR intensity, color intensity, range value and number of inlier correspondences in the neighborhood).
  • If a point passes this analysis, it is assigned the motion hypothesis, as explained above.
  • the points, which do not follow the motion hypothesis, are further processed in the same manner described above, wherein the correspondences are found on the image. After all of this processing is performed, the new subset of LiDAR data may again be added to the accumulated pointcloud with updated motion information.
  • Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
  • the processing works at the framerate of the camera, i.e. new data arrive for the algorithm with the framerate of the camera, e.g. at 100 Hz.
  • the process takes as inputs the current image frame It, a LiDAR subframe Pt containing a fraction of the points of the full LiDAR frame, and the accumulated data I*t-1 accumulated during the previous processing of the last 10 frames.
  • the output of the illustrated processing is a motion estimate for the incoming subframe cloud.
  • the information obtained in the current frame is saved in the accumulated data I*t.
  • the accumulated data I*t-1 contains data related to the optical flow at t-1, the raw accumulated cloud, the 2D propagated data and the 3D propagated pointcloud.
  • the raw accumulated cloud contains raw 3D points from multiple subframes Pt-10, Pt-9, ..., Pt-1 and covers the time interval of a full LiDAR frame, e.g. 0.1 seconds.
  • data older than 0.1 seconds are deleted at the end of the overall processing of the image frame at time t, and at this time the new raw subframe Pt is added to the accumulated cloud.
  • the 2D propagated data consists of 2D projected points and covers the same time interval as the raw accumulated cloud and contains the same number of points as well. In contrast to the raw accumulated cloud, the 2D points are temporally aligned to the last frame.
  • the 3D propagated pointcloud consists of the 3D points from multiple subframes, the same as the raw accumulated point cloud. But the points are propagated according to the velocity obtained from the previously processed frames, so each point of the propagated pointcloud is temporally aligned after the propagation.
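Collecting the pieces enumerated above, the accumulated data could be organized roughly as follows; this struct layout is an illustrative assumption, not the patent's data structure, with shared indices so that any propagated point can be traced back to its raw sample.

```cpp
#include <opencv2/core.hpp>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <Eigen/Core>
#include <vector>

// One possible layout for the accumulated information I*_{t-1} described above.
struct AccumulatedData {
    cv::Mat lastOpticalFlow;                              // dense flow at t-1 (2-channel float)
    pcl::PointCloud<pcl::PointXYZI>::Ptr rawAccumulated;  // raw 3D points of the last ~10 subframes
    std::vector<cv::Point2f>     propagated2D;            // same points, shifted to the last frame
    std::vector<int>             propagated2DIndex;       // index into rawAccumulated
    pcl::PointCloud<pcl::PointXYZI>::Ptr propagated3D;    // points moved by their assigned motion
    std::vector<int>             propagated3DIndex;       // index into rawAccumulated
    std::vector<Eigen::Vector3f> assignedVelocity;        // per-point motion, if already estimated
};
```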
  • the above mentioned input data provided by block 11 are supplied to the update block 12.
  • In update block 12, the optical flow is corrected to the newly acquired image It, and the accumulated points are propagated in 2D by using the optical flow and in 3D according to the computed motion, if any is available.
  • In the preprocessing step 13, first the ground points are removed to save computation time; then each sample of the new LiDAR subframe is checked as to whether it belongs to a dynamic or static object. Only dynamic points are processed further. Lastly, in the preprocessing step, LiDAR descriptors are computed, which are required for the matching step.
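A hedged sketch of such a preprocessing step using PCL: a RANSAC plane fit removes the ground points, and SHOT descriptors (which the appendix mentions) describe the local 3D structure of the remaining points. The dynamic/static classification is omitted here, and all thresholds and names are assumptions.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/shot.h>

// Remove ground points with a RANSAC plane fit, then compute SHOT descriptors for the
// remaining points. Output clouds must be allocated by the caller.
void preprocessSubframe(const pcl::PointCloud<pcl::PointXYZ>::Ptr& subframe,
                        pcl::PointCloud<pcl::PointXYZ>::Ptr& nonGround,
                        pcl::PointCloud<pcl::SHOT352>::Ptr& descriptors)
{
    // 1. Ground removal: fit the dominant plane and drop its inliers.
    pcl::ModelCoefficients::Ptr coeffs(new pcl::ModelCoefficients);
    pcl::PointIndices::Ptr ground(new pcl::PointIndices);
    pcl::SACSegmentation<pcl::PointXYZ> seg;
    seg.setModelType(pcl::SACMODEL_PLANE);
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.2);            // metres; hypothetical tolerance
    seg.setInputCloud(subframe);
    seg.segment(*ground, *coeffs);

    pcl::ExtractIndices<pcl::PointXYZ> extract;
    extract.setInputCloud(subframe);
    extract.setIndices(ground);
    extract.setNegative(true);                // keep everything that is NOT ground
    extract.filter(*nonGround);

    // 2. SHOT descriptors describing the local 3D structure of each remaining point.
    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(nonGround);
    ne.setRadiusSearch(0.3);
    ne.compute(*normals);

    pcl::SHOTEstimation<pcl::PointXYZ, pcl::Normal, pcl::SHOT352> shot;
    shot.setInputCloud(nonGround);
    shot.setInputNormals(normals);
    shot.setRadiusSearch(0.5);
    shot.compute(*descriptors);
}
```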
  • Next, matching points between the new subframe Pt and the accumulated data I*t-1 are determined. For that, the motion information obtained in previous processing frames is used to find correspondences directly in 3D space.
  • the optimal transformation is determined in the 3D velocity estimation block 15. Then, the obtained transformation is related to the time difference between the corresponding points to get a 3D velocity value.
  • the motion hypothesis is tested on each dynamic point in the subframe Pt.
  • This check of the motion hypothesis is performed since, even though the output of the correspondence finding block is a transformation Tt which best aligns the new subframe Pt to the accumulated pointcloud Pt-10:t-1, the subframe Pt may still contain some points which do not belong to the observed object. Therefore, each point is checked against certain criteria to determine whether it follows the estimated motion.
  • Each point is transformed back according to the hypothesis, so that it aligns with the accumulated raw data, and is compared with its neighbors.
  • the comparing step takes into account different properties of LiDAR and camera data and assigns the motion to the correct points.
  • the remaining points, which do not fulfill the specified criteria, are further processed in the 2D correspondence finding block 17.
  • the image space is used to find robust correspondences, without having any prior knowledge about the 3D motion.
  • the update block only computes a new optical flow update because the accumulated data I*t-1 is empty at the beginning.
  • no correspondences will be found, neither in 3D space nor in 2D space; this, in turn, leads to no velocity estimation, and only the raw subframe cloud is added to the accumulated information buffer.
  • the accumulated information At-1 will then no longer be empty. This time, the accumulated information can be propagated according to the optical flow in 2D space. But the 3D propagation part is still missing, since the points have no motion assigned yet.
  • the disclosed embodiments are a marked departure from conventional approaches to the technical problem.
  • algorithms for perception and sensor fusion were conventionally developed on datasets with limited framerates.
  • the most popular example for this is the KITTI dataset, which the above conventional approaches often refer to.
  • Camera and LiDAR operate at 10 Hz in this dataset.
  • the problem of dynamic objects has conventionally been assumed to be negligible in various implemented applications.
  • one reason the dynamic motion artifact has not been addressed explicitly is the conventional approach to sensor configuration and the low relative speeds in the dataset.
  • Disclosed embodiments use a sensor setup that generates significantly more camera images while one scan of LiDAR is performed.
  • conventional methods consider only one image per LiDAR frame.
  • disclosed embodiments do not rely on correct matching between the pointcloud and the image on a full LiDAR frame. Rather, splitting the data into smaller chunks makes it possible to cope with very high relative velocities because the matching error remains in the subpixel range.
  • Disclosed embodiments are not limited to any specific LiDAR sensor. All that is required is a scanning LiDAR that gives timestamps for each point, which is the case for all the conventional, commercially available LiDAR sensors.
  • temporally upsampled LiDAR data is generated at the frequency of the camera data and is consistent with the real environment.
  • Disclosed embodiments provide low level sensor fusion between asynchronous sensors that overcomes the problems of obtaining a consistent environment model by explicitly addressing the artifacts introduced by dynamic objects.
  • Disclosed embodiments leverage optical flow computations on a high framerate camera combined with pointcloud processing of LiDAR data.
  • the LiDAR data processing may be a combination of sub-frame processing and full LiDAR frame processing, which enables accurate matching of LiDAR points to any part of the environment as seen in the images. This may provide a consistent and complete three dimensional model of the environment at the framerate of the camera.
  • three dimensional velocity estimations may be derived from the pointcloud data. This is useful information for consecutive processing such as trajectory management. With an accurate estimate of where an object is located in three dimensional space, where it is heading and at what speed, plus the dense pixel information provided by an image, the presented approach extends current algorithms that do not explicitly take the dynamic motion artifacts into account.
  • Disclosed embodiments are based on a setup of a high framerate camera and a high resolution LiDAR. The setup can further be aided by Global Positioning System (GPS) and Inertial Measurement Unit (IMU) technology but this technology is not central to the inventive concepts.
  • the attached Appendix includes source code that provides insight into how the actual fusion of the 10 Hz LiDAR and 100 Hz camera may be performed in accordance with at least one embodiment.
  • the Appendix contains only code snippets of the main functions. Therefore, this code is not compilable.
  • control and cooperation of the above-described components may be provided using software instructions that may be stored in a tangible, non-transitory storage device such as a non-transitory computer readable storage device storing instructions which, when executed on one or more programmed processors, carry out the above-described method operations and resulting functionality.
  • non-transitory is intended to preclude transmitted signals and propagating waves, but not storage devices that are erasable or dependent upon power sources to retain information.
  • This source code provides insight into how the actual fusion of the 10 Hz LiDAR and 100 Hz camera is done. It contains only code snippets of the main functions; therefore, this code is not compilable.
  • pcl::console::setVerbosityLevel(pcl::console::L_ALWAYS);
  • This function splits up a full LiDAR frame into smaller subframes and outputs the indices of each split.
  • the goal of the splitting is to minimize the temporal difference between the sampling time of each point in the subframe and the capturing time of the corresponding image. The split is made in the middle of the time interval of consecutive images.
  • LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
  • imgTS[i] = srcData.cameraFrames[i].timestamp;
  • lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0); // Get timestamp of the (i,j)th lidarSample
  • int64_t dtime_img = imgTS[d[eye]+1] - imgTS[d[eye]];
  • LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
  • imgTS[i] = srcData.cameraFrames[i].timestamp;
  • lidarSensorStruct const& srcSensor = srcData.lidarFrame.sensors[0];
  • proj = cam * proj;
  • proj_vector.push_back(proj);
  • cam(0,0) = 0.5 * cam(0,0);
  • cam(1,1) = 0.5 * cam(1,1);
  • cam(0,2) = round(0.5 * cam(0,2));
  • cam(1,2) = round(0.5 * cam(1,2));
  • proj = cam * proj_copy;
  • cv::Mat tmp = cv::Mat(2048, 640, CV_8UC1);
  • lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0);
  • t.x_raw = pt.x;
  • t.z_raw = pt.z;
  • t.timestamp = dstTime;
  • Each existing point in the 2D propagated pointcloud is temporally aligned to the current frame. This can be achieved by shifting each 2D point according to the optical flow.
  • the 2D propagated pointcloud is later on needed to find correspondence in the image space.
  • cv::Point2f& fxy = m_OpticalFlow.curr_flow.at<cv::Point2f>(tmp.y_OF, tmp.x_OF);
  • tmp.x_OF = tmp.x_OF + fxy.x;
  • tmp.y_OF = tmp.y_OF + fxy.y;
  • tmp.flow = sqrt(fxy.x * fxy.x + fxy.y * fxy.y);
  • the raw accumulated pointcloud is the accumulation of raw 3D points from multiple subframes and covers the time interval of a full LiDAR frame.
  • new_pt.x = pt.x + d_t * vx;
  • each point of the new incoming subframe cloud is classified as static or dynamic. For the further processing only dynamic points are considered.
  • the remaining points which do not fulfill the motion hypothesis test are further processed in the 2D correspondence finding block. These remaining points are projected on the image and checked within a defined radius for existing points in the 2D propagated pointcloud. Through the connection between the 2D propagated points and the raw accumulated pointcloud, the correspondences can be related back to the raw 3D points.
  • the SHOT descriptor for each point in the new subframe is computed. Further, each point of the new incoming subframe cloud is classified as static or dynamic.
  • Poxel_N t1 = this->Fusion_data.subframe[i];
  • float l_v_of = sqrt(x_disp * x_disp + y_disp * y_disp);
  • the 3D propagated cloud is used to find matching points in LiDAR space, while each pair of points is weighted with the corresponding optical flow.
  • std::vector<int> pointIdxNKNSearch(K);
  • float range = sqrt(pt.x * pt.x + pt.y * pt.y + pt.z * pt.z);
  • curr_distance = shot.compare(index_cloud, checkpoints);
  • corr.index_match = j; // pointIdxNKNSearch[0];
  • a RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
  • Poxel_N t1 = this->Fusion_data.subframe[i];
  • cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
  • searchPoint.x = proj_point(0);
  • searchPoint.y = proj_point(1);
  • int neighbors_inlier = kdtree_inlier.radiusSearch(pt, d(range), pointIdxNKNSearch_inlier, pointNKNSquaredDistance_inlier, 10);
  • prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
  • prob = prob * inlier * 0.5;
  • weight = compute_OF_weight(sub_data);
  • a RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
  • Poxel_N t1 = this->Fusion_data.subframe[i];
  • cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
  • searchPoint.x = proj_point(0);
  • searchPoint.y = proj_point(1); if (!isnan(searchPoint.x) && !isnan(searchPoint.y) && !isnan(searchPoint.z))
  • prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
  • prob = prob * inlier * 0.5;

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Electromagnetism (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A system, methodologies and components apply dynamic motion compensation to enable asynchronous fusion between LiDAR data and camera data. This involves using a processor coupled to a high framerate camera and a LiDAR sensor system for receiving high framerate camera generated image data and LiDAR sensor generated pointcloud data. The processor runs software to estimate three dimensional velocity of dynamic objects included in a scene sensed by the high framerate camera and the LiDAR sensor system to perform the dynamic motion compensation. The estimated three dimensional velocity of the dynamic objects may be used for transportation vehicle path planning and/or trajectory management.

Description

METHOD AND APPARATUS FOR DATA FUSION OF LIDAR DATA AND IMAGE DATA
Field:
The present invention relates generally to a method and an apparatus for data fusion of LiDAR data and image data, and more particularly to sensor fusion in transportation vehicles for obtaining a consistent environment model for assisted or automated driving.
Background:
Autonomous driving allows vehicles such as self-driving cars to move safely with little or no human input based on a variety of sensors, which perceive their surroundings. Depending on the amount of driver intervention and attentiveness required, classification systems define different levels, typically ranging from fully manual to fully automated systems. For example, the standard SAE J3016 defines six such levels, wherein for levels 0 to 2 the human driver still monitors the driving environment with no automation at all, some driver assistance or partial automation and for levels 3 to 5 an automated driving system monitors the driving environment and performs the driving tasks with different levels of possible driver intervention.
LiDAR sensors and cameras are often used together in the fields of driver assistance systems and autonomous driving, as these sensors complement each other well thanks to their different sensor characteristics. Cameras generate image data by capturing light intensities and typically sample the scene at once but only provide a 2D projection of the scene without 3D information. As opposed to a camera, a LiDAR sensor directly determines 3D distances for each sample point of the generated LiDAR pointcloud, however with a lower spatial and temporal sampling frequency. Therefore, the combination of these two complementary sensors can lead to a better understanding of the environment.
Conventional approaches to fusing LiDAR pointcloud data and camera generated image data have been implemented in various works in the fields of driver assistance systems and autonomous driving. The underlying motivation for these efforts is to provide more accurate and complete environment models for use in transportation vehicles, allowing more reliable perception systems to ultimately ensure safer and more comfortable driving functionalities beyond the capabilities of systems that are currently used for level 2 functions in series development transportation vehicles. Current level 2 applications do not demand a highly accurate alignment of data from different sensors because the functionality does not require it.
For example, Sebastian Schneider et al., "Fusing vision and lidar - synchronization, correction and occlusion reasoning," 2010 IEEE Intelligent Vehicles Symposium, pp. 388-393, 2010, proposed a framework to create a color point cloud by fusing camera and LiDAR data. More specifically, the synchronization of a camera with a 360-degree rotating LiDAR is discussed, whereby all the samples which are in the field of view of the camera may be taken. Also the concept of ego-motion compensation using integrated IMU data is discussed. In addition, it is disclosed how occlusion is explicitly handled, similar to computer graphics, with a z-buffer concept.
Jennifer Dolson et al., "Upsampling range data in dynamic environments," 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1141-1148, 2010, addresses upsampling of range data in dynamic environments by taking image information into account. This method is similar to a bilateral filter, which is a widely used technique in image filtering. A 6-dimensional position vector is constructed consisting of the color value (RGB), pixel position (u, v) and time. The unknown range for a specific position is predicted as a weighted sum of neighboring range values. The weight is dependent on the 6-D distance between its neighbors. For low relative velocities, for example, with cars travelling on a highway in the same direction and at almost the same speed as the ego-vehicle, this algorithm appears to work well. However, if high relative velocities occur, that approach will lead to inconsistent matching of three-dimensional points and, in turn, will lead to a wrong upsampling.
David Held et al., "Precision tracking with sparse 3d and dense color 2d data," in 2013 IEEE International Conference on Robotics and Automation, May 2013, pp. 1138-1145, and David Held et al., "Robust Real-Time Tracking Combining 3D Shape, Color, and Motion," in The International Journal of Robotics Research, August 2015, pp. 30-49, are directed to a different approach. In particular, the latter proposed fusing laser and camera data to estimate velocities of dynamic objects. For each tracked object, colored points were accumulated. The data was associated in a probabilistic framework by maximizing the velocity posterior. For this, a grid-based sampling method was used to sample the state-space to align the accumulated colored point cloud with the next incoming frame. However, if the relative velocities got higher and the scanning pattern dependent distortion increased, the result was an overall distorted accumulated point cloud, which failed to correspond to the real form present in the scene. The dynamic artifacts introduced by the scanning nature of the LiDAR were not treated at all.
M. H. Daraei, et al.,“Region Segmentation Using LiDAR and Camera,” 2017 IEEE
Intelligent Transportation Systems Conference (ITSC), October 2017, and M. H. Daraei, et al., "Velocity and shape from tightly-coupled lidar and camera," 2017 IEEE Intelligent Vehicles Symposium, June 2017, focused on a similar approach by proposing a method for object tracking and reconstruction by fusing LiDAR and image data. In that approach, an intermediate surface was used for combining both modalities. The image likelihood term was a function of the appearance model, the new image and the three dimensional velocity. The image could then be related to the appearance model by shifting the model according to the projected three-dimensional velocity on the image. As a result, a photometric error could be computed as a function of the velocity. The LiDAR likelihood term was a function of the new incoming point cloud, the intermediate surface representation and the velocity. According to the velocity, the estimated surface could be transformed and an error of the alignment with the incoming point cloud could be computed. By minimizing this multi-modal cost function, one would get the optimal value for the velocity estimation. However, if the relative velocities exceeded tolerable levels and the correct overlay of the point cloud and image could not be guaranteed, this led to a wrong estimation of the appearance model and accordingly to an inaccurate velocity estimation.
R. Omar Chavez-Garcia et al., "Multiple Sensor Fusion and Classification for Moving Object Detection and Tracking," IEEE Transactions on Intelligent Transportation Systems, vol. PP, no. 99, pp. 1-10, 2015, addressed the problem of detecting and tracking dynamic objects in the environment by using radar, LiDAR and camera. Their approach detected and classified the objects in each sensor's modality separately and, at the end, fused the object list. However, that proposed fusion approach would not provide reliable estimation without the additional radar sensor. The approach of S. Wang et al., "Dynamic detection technology for moving objects using 3d lidar information and rgb camera," in 2017 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW), June 2017, pp. 37-38, fused LiDAR and RGB camera data as well for detecting and tracking moving objects. In that approach, the data was segmented in image and LiDAR space. Object proposals from the two dimensional image were used to cluster the three-dimensional space and to keep only the object points. Subsequently, in both sensor modalities, the objects were detected independently. In the image space, a Fast R-CNN was used to detect the objects. In LiDAR space, a trained linear Support Vector Machine was used to classify the Spin Image LiDAR descriptors. However, the weakness of this approach, even before detection, is the image dependent LiDAR clustering. If there was a wrong matching between LiDAR and camera data, object points were thrown away although they were a part of the object.
In X. Chen et al., "Multi-view 3d object detection network for autonomous driving," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6526-6534, 2017, Multi-View Three Dimensional (MV3D) networks were presented for detecting vehicles and outputting oriented three dimensional bounding boxes by fusing LiDAR and RGB images. The overall network could be split into two subnetworks. First, the three dimensional proposal network estimated three dimensional candidate boxes. Second, the region-based fusion network projected these candidates to different views and extracted features. In conclusion, this learning-based approach could detect three-dimensional bounding boxes, but the average precision on the KITTI dataset in a difficult scene was only 79.81%. Further, this network did not output the three dimensional LiDAR points aligned with the image, but instead an oriented bounding box, which approximately covered the contour of the vehicle. Additionally, no temporal information, like velocities, was taken into account.
All of the mentioned works fuse the data coming from camera and LiDAR without explicitly taking into account the problem of inconsistent overlay of three dimensional points and image information when high relative velocities are present. This assumption generally is not a direct limitation for those approaches, as they are only considering low relative velocities between the ego-vehicle and the dynamic objects in the scene.
E. Ilg et al., "Reconstruction of rigid body models from motion distorted laser range data using optical flow," in IEEE International Conference on Robotics and Automation (ICRA), 2014, has addressed the issue of motion distortion for a system using a single line laser scanner on a nodding, stationary platform.
Summary:
The following presents a simplified summary in order to provide a basic understanding of some aspects of various invention embodiments. The summary is not an extensive overview of the invention. It is neither intended to identify key or critical elements of the invention nor to delineate the scope of the invention. The following summary merely presents some concepts of the invention in a simplified form as a prelude to the more detailed description below.
According to the present disclosure, a method and apparatus is provided that addresses the problem of arbitrary relative velocities in order to enable sensor fusion processing in a wide range of configurations.
According to a general aspect of at least one embodiment, a method for data fusion of LiDAR data and image data is presented, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, comprising:
capturing three-dimensional LiDAR data of the scene with a scanning LiDAR sensor; capturing image data corresponding to a two-dimensional projection of the scene with an image sensor having a higher framerate as compared to the framerate of a complete scan of the LiDAR sensor;
estimating based on the captured image data and/or LiDAR data a three-dimensional velocity of a dynamic object present in the scene;
compensating the motion of the dynamic object; and
fusing the motion compensated LiDAR data and image data.
In one preferred embodiment, the method further comprises:
generating a LiDAR subframe including LiDAR samples covering a part of the scene during the frame period of an image sensor generating the image data; and
accumulating LiDAR samples of multiple LiDAR subframes to generate an accumulated LiDAR point cloud corresponding to a complete scan of the sensed scene at the frame rate of the LiDAR sensor. Furthermore, in one preferred embodiment, the method further comprises:
projecting LiDAR samples of a LiDAR subframe on image data of a camera image; and
propagating the projected LiDAR samples according to a displacement information to generate a propagated point cloud.
Advantageously, propagating the projected LiDAR samples is based on an optical flow algorithm which determines the displacement information between consecutive camera images.
Furthermore, in one preferred embodiment, the method further comprises:
updating the propagated point cloud according to the optical flow information generated for a new camera image; and
checking the projected LiDAR samples of a new LiDAR subframe for a correspondence with existing points in the current propagated point cloud.
For this embodiment, it is preferred that a correspondence qualifies as valid if projected LiDAR samples of a new LiDAR subframe fall within a defined radius around an existing point.
In a further preferred embodiment, the method further comprises:
assigning an index to the LiDAR samples; and
keeping the index of the LiDAR samples while propagating the projected LiDAR samples.
Advantageously, LiDAR samples from multiple LiDAR subframes are accumulated to cover the time interval of a full LiDAR frame, wherein accumulated data older than this time interval are deleted after the processing of an image frame and new data for the current LiDAR subframe are added to the accumulated data. In addition, the method further comprises in a preferred embodiment:
computing LiDAR descriptors for dynamic LiDAR points in a LiDAR subframe point cloud, wherein a LiDAR descriptor describes a geometrical three dimensional structure of the local neighborhood of the dynamic points;
matching LiDAR points of a new LiDAR subframe with LiDAR points in the accumulated LiDAR point cloud using the LiDAR descriptors; and
propagating the matched LiDAR points of the new LiDAR subframe according to the estimated three dimensional velocity to generate a three dimensional propagated point cloud.
Furthermore, in one preferred embodiment, the method further comprises:
checking whether LiDAR samples in a new LiDAR subframe correspond to a dynamic or static object; and
keeping only LiDAR samples corresponding to dynamic points for the further processing.
Similarly, it is preferred that the method further comprises:
detecting a ground plane in the scene;
identifying LiDAR samples in a new LiDAR subframe corresponding to the detected ground plane; and
removing the identified LiDAR samples from the further processing.
Advantageously, the estimated three-dimensional velocity of the dynamic objects is used for transportation vehicle path planning and/or trajectory management.
According to another general aspect of at least one embodiment, an apparatus for data fusion of LiDAR data and image data is presented, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, the apparatus comprising:
a scanning LiDAR sensor (2) generating the LiDAR data;
a high framerate camera (3) generating the image data, wherein the image data are generated with a higher framerate as compared to the framerate of a complete scan of the scanning LiDAR sensor;
one or more processors coupled to the camera and the LiDAR sensor, wherein said one or more processors are configured to:
receive camera generated image data and LiDAR sensor generated point cloud data; estimate a three dimensional velocity of a dynamic object present in the scene;
compensate the motion of the dynamic object; and
fuse the motion compensated LiDAR data and image data.
According to a preferred embodiment, the apparatus further comprises:
one or more buffers coupled to the one or more processors for storing accumulated information related to the LiDAR data and the image data.
Finally, in at least one preferred embodiment a vehicle performs a method and/or comprises an apparatus as described above.
Brief Description of Figures:
Further advantages, features and possibilities of using the presently disclosed embodiments emerge from the description below in conjunction with the figures.
Figure 1A illustrates a high resolution LiDAR sensor and a high framerate camera closely collocated for simultaneously capturing the environment from the same viewpoint.
Figure 1B illustrates an arrangement comprising a high resolution LiDAR system and a high framerate camera mounted on the roof of a vehicle.
Figure 2A illustrates an example of a transportation vehicle, which passes an ego vehicle with unknown velocity, and a silhouette surrounding distorted LiDAR samples.
Figure 2B illustrates schematically several images captured by a camera and a single full LiDAR frame completely mapped on one of the images.
Figure 3A illustrates the transportation vehicle with a silhouette surrounding LiDAR samples corresponding to a LiDAR subframe.
Figure 3B illustrates schematically the LiDAR subframe falling in a time interval Δt around one of the camera images. Figure 4A illustrates an image of a transportation vehicle together with LiDAR samples, which correspond to the shape of the transportation vehicle but are displaced.
Figure 4B corresponds to Figure 2B.
Figure 5A illustrates that after application of the disclosed method the LiDAR samples are in better alignment with the image of the transportation vehicle.
Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame.
Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
Detailed Description:
The disclosed embodiments are directed at providing a technical solution for the problem of asynchronous fusion between LiDAR data and camera data, provided by a combination of a LiDAR sensor and camera such as shown in figure 1.
Figure 1A illustrates an arrangement 1 comprising a combination of a high resolution LiDAR system 2, such as a Luminar G2+ LiDAR sensor, and a high framerate camera 3, such as a FLIR Grasshopper camera, closely collocated above each other for simultaneously capturing the environment from nearly the same viewpoint.
Figure 1B illustrates the experimental arrangement 1 mounted on the roof of an ego vehicle 4. However, the illustrated arrangement is only an example of a prototype; in a production vehicle, other arrangements of the LiDAR sensor and camera relative to each other, and at other locations on the body of the vehicle, are also possible.
More specifically, the above mentioned technical problem occurs because conventionally known global shutter cameras on the one hand and scanning LiDAR systems on the other hand capture their environment in different ways. Global shutter cameras capture a two dimensional projection of their environment, commonly known as an "image," in an interval in time defined by the shutter time of that particular camera. Thus, within that timespan - usually 1/30th of a second up to 1/2000th of a second (depending on the lighting conditions of the scene) - the entire scene is captured in image data wherein every pixel in the image data is exposed at the same time.
Capturing three-dimensional information for a scene with a scanning LiDAR system is fundamentally different. This is because conventional scanning LiDAR systems continuously scan the scene within a defined time interval, e.g., for most conventional LiDAR systems within 100 ms, corresponding to a 10 Hz scan rate. Thus, the scene is not captured for every LiDAR point in the same time interval as happens with a conventional image camera. More specifically, within 1/10th of a second, the LiDAR beam is steered into various locations to locally sample the scene, and only after 1/10th of a second does it cover the same location again.
While this effect is not of concern if the observed scene is completely stationary with no movement in it, it becomes a challenging problem if there are dynamic objects, i.e., moving objects, present. This is because the velocity of those objects is unknown. As a result, it becomes difficult, if not impossible, to correlate LiDAR-generated point cloud data with conventional camera image data when moving objects are present in a scene to be imaged.
This issue is a particular problem in the fields of driver assistance systems and autonomous driving. As explained above, conventional approaches to fusing LiDAR pointcloud data and camera-generated image data have been implemented in the hope of providing more accurate and complete environment models for use in transportation vehicles, allowing more reliable perception systems that ultimately ensure safer and more comfortable driving functionalities beyond the capabilities of systems currently used for level 2 functions in series-development transportation vehicles. Current level 2 applications do not demand a highly accurate alignment of data from different sensors because the functionality does not require it. However, for level 3+ automation, this is fundamentally different.
This issue of asynchronous sensor fusion is illustrated in Figures 2A-2B. Figure 2A illustrates an example of a transportation vehicle 5, which passes the ego vehicle (not shown) with unknown velocity. The individual LiDAR samples are not individually shown in the figure, but are located within silhouette 6. As can be seen, the silhouette does not match the actual appearance of the car in the image frame, meaning that the LiDAR samples are distorted. This distortion results from the different capturing mechanisms, as mentioned above and further illustrated in Figure 2B.
Figure 2B schematically shows six progressive full images 7, each captured by the camera within a short timespan at a particular point in time t. Over the same period of time in which the six full images are captured, a single full LiDAR frame 8 is determined by scanning the same scene. The determined LiDAR frame 8 is then mapped onto the last captured image at time t1, as symbolized by the arrows. In this way, artifacts are introduced by the unknown dynamic motion of objects in the scene. Such artifacts are called "dynamic motion artifacts" in the following and render the resulting data less useful for the generation of environment models.
Disclosed embodiments address this technical problem by applying a "dynamic motion compensation," which leverages the functionality of a high framerate camera, e.g., operating at a 100 Hz framerate, in combination with a LiDAR sensor operating at, e.g., a 10 Hz repetition rate. That is, in these embodiments the high framerate camera has a frame rate at least 10 times faster than the LiDAR sensor's frame rate. However, other ratios between the camera framerate and the LiDAR framerate are also possible as long as the camera framerate is significantly higher than the LiDAR framerate.
Figures 3A-3B illustrate how, by using a high framerate camera, the misalignment between camera data and LiDAR data becomes negligible. Instead of mapping all LiDAR samples corresponding to a complete scan onto a particular image frame, for each of the image frames only a fraction of the total LiDAR samples is considered for the fusion of three-dimensional samples and camera image data. That is, only LiDAR samples are considered that fall into the time interval of consecutive images; in Figure 3A these are located within silhouette 9, which, due to the scan pattern used, covers only a vertical fragment of the vehicle that is roughly parallel to the road surface. Correspondingly, at time t1 in Figure 3B only the fraction of the LiDAR frame 8 that falls in the time interval Δt around t1 is taken into account. The length of the time interval Δt depends on the processing framerate and may in particular correspond to the inverse of the camera framerate.
Disclosed embodiments further estimate the three-dimensional velocity of the dynamic objects in order to perform the dynamic motion compensation. Thus, the estimation of the three-dimensional velocity provides a valuable input for subsequent processing operations such as path planning and trajectory management. Accordingly, disclosed embodiments enable creation of a consistent environment model, in which LiDAR points and camera-generated image data are consistent. In this way, pixel data in the camera-generated image data and pointcloud data generated by the LiDAR sensor are in agreement at all times.
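The following sketch illustrates the underlying idea of bucketing LiDAR samples into subframes according to the nearest camera timestamp. It is a minimal illustration only; the sample type, the field names and the use of microsecond timestamps are assumptions made for this example and do not reproduce the appendix code.

#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Illustrative sample type: each LiDAR return carries its own timestamp (microseconds).
struct LidarSample { double x, y, z; uint64_t timestamp_us; };

// Assign every LiDAR sample of a full scan to the camera frame whose capture time is
// closest, yielding one subframe per camera image.
std::vector<std::vector<LidarSample>> splitIntoSubframes(
    const std::vector<LidarSample>& fullScan,
    const std::vector<uint64_t>& imageTimestamps_us)
{
    std::vector<std::vector<LidarSample>> subframes(imageTimestamps_us.size());
    for (const LidarSample& s : fullScan) {
        std::size_t best = 0;
        uint64_t bestDiff = std::numeric_limits<uint64_t>::max();
        for (std::size_t i = 0; i < imageTimestamps_us.size(); ++i) {
            uint64_t diff = s.timestamp_us > imageTimestamps_us[i]
                              ? s.timestamp_us - imageTimestamps_us[i]
                              : imageTimestamps_us[i] - s.timestamp_us;
            if (diff < bestDiff) { bestDiff = diff; best = i; }
        }
        subframes[best].push_back(s);   // sample falls into the interval Δt around image "best"
    }
    return subframes;
}

For a 10 Hz LiDAR and a 100 Hz camera, this yields ten subframes per full scan, each spanning roughly 10 ms.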
Figure 4A illustrates the raw alignment for a dynamically moving car. As can clearly be seen, the points representing the LiDAR measurements correspond to the shape of the car but are displaced. Figure 4B, which is identical to Figure 2B, illustrates this distortion.
Figure 5A illustrates that, after application of the disclosed method, the LiDAR samples are in better alignment with the image of the transportation vehicle. The estimated motion of the vehicle is indicated in the figure by arrow 10. The improved alignment is achieved by accumulating and updating data over several image frames, as schematically shown in Figure 5B.
Figure 5B illustrates schematically that LiDAR samples from multiple subframes are accumulated to cover the time interval of a full LiDAR frame. In the accumulated data At-1, data older than this time interval are deleted at the end of the overall processing of a frame at time t, and at the same time new data for the current subframe are added to the accumulated data.
Between consecutive images, the displacements of brightness patterns may be calculated through a method called "optical flow." The output of optical flow provides information about where intensity patterns have shifted between consecutive image frames; ideally, this corresponds to the actual projection of motion in the observed three-dimensional scene. With this information, one can propagate LiDAR samples on the image plane and/or limit the search space for finding corresponding LiDAR samples in consecutive LiDAR frames. This can be used to resolve ambiguities in the LiDAR pointcloud processing as well as to speed up algorithms for data association. The image is not only the most dense and accurate representation of the environment, but also a helpful construct for all LiDAR processing.
Accordingly, at each incoming image frame, only LiDAR samples are considered that fall into the time interval of consecutive images. This subset of points is then projected onto the image and propagated according to the optical flow of each new incoming image. The raw subset of points may be accumulated until a certain number of LiDAR subframes is reached. This number may be defined by the number of points that go into a full LiDAR scan.
Additionally, the propagated 2D points may be saved in a separate buffer, while the original indexing may be kept so the propagated points can be related to the raw three-dimensional points.
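A minimal sketch of this propagation step is given below, assuming OpenCV's Farnebäck dense optical flow as the source of the displacement information; the structure ProjectedSample, its fields and the function name are illustrative and are not taken from the appendix.

#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Projected LiDAR sample on the image plane; "index" refers back to the raw 3D point
// in the accumulated cloud so the 2D/3D association is never lost (illustrative type).
struct ProjectedSample { float u, v; int index; };

// Compute dense optical flow between two consecutive grayscale frames and shift the
// projected LiDAR samples by the flow vector found at their pixel location.
void propagateByOpticalFlow(const cv::Mat& prevGray, const cv::Mat& currGray,
                            std::vector<ProjectedSample>& samples)
{
    cv::Mat flow;   // CV_32FC2, one (dx, dy) vector per pixel
    cv::calcOpticalFlowFarneback(prevGray, currGray, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);
    for (ProjectedSample& s : samples) {
        int x = cvRound(s.u), y = cvRound(s.v);
        if (x < 0 || y < 0 || x >= flow.cols || y >= flow.rows)
            continue;                        // sample left the image, keep it unchanged
        const cv::Point2f& d = flow.at<cv::Point2f>(y, x);
        s.u += d.x;                          // shift to the position expected in the new frame
        s.v += d.y;                          // the original index is kept, so the raw 3D link survives
    }
}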
In the next image frame, the newly arriving subset of LiDAR points may be projected onto the image. Before any further processing is done on the new subset of LiDAR points, the propagated two-dimensional pointcloud may be updated according to the optical flow generated by the new image. Then each point of the newly arrived subset may be checked for existing points in the current two-dimensional propagated pointcloud within a defined radius that might qualify as a valid correspondence.
Following that scheme, one can establish connections between valid points in the new subset and in the two-dimensional propagated pointcloud. Since the indexing in the propagated pointcloud is kept, the connection to the raw accumulated pointcloud is known as well: the i-th point in the propagated pointcloud originates from the i-th point in the raw accumulated pointcloud.
The relationship in three-dimensional space between the just-arrived subframe data and the accumulated data is thus always known. This process tremendously increases the performance of correspondence finding between LiDAR subframes, because only a limited number of points needs to be checked, as opposed to the whole accumulated pointcloud.
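The following sketch shows the idea of such a restricted correspondence search in image space; the brute-force radius search, the type names and the return convention are illustrative assumptions, and a k-d tree over the propagated 2D points would typically replace the linear scan.

#include <vector>

struct ProjectedSample { float u, v; int index; };   // index into the raw accumulated cloud

// For one newly projected subframe point, search the propagated 2D cloud within "radius"
// pixels and return the index of the closest matching raw 3D point, or -1 if none exists.
int findCorrespondence(const ProjectedSample& query,
                       const std::vector<ProjectedSample>& propagated2d,
                       float radius)
{
    int bestRawIndex = -1;
    float bestDist2 = radius * radius;
    for (const ProjectedSample& p : propagated2d) {
        float du = p.u - query.u, dv = p.v - query.v;
        float d2 = du * du + dv * dv;
        if (d2 <= bestDist2) {               // candidate within the search radius
            bestDist2 = d2;
            bestRawIndex = p.index;          // kept index links back to the raw 3D point
        }
    }
    return bestRawIndex;
}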
The correspondences from the previous processing operation may be used as input for a RANSAC algorithm to reject outliers during the process. With valid correspondences and their respective displacements, a final transformation can be obtained. Through the known time interval between LiDAR samples, the computed transformation can be related to a three-dimensional velocity. Each point in the subframe cloud may be checked against the motion hypothesis by transforming the point back according to the hypothesis so that it aligns with the accumulated raw data. Further, the points may be compared using certain criteria defined by the neighboring region in the accumulated three-dimensional pointcloud.
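A minimal sketch of relating the estimated transformation to a three-dimensional velocity, and of transforming a point back for the hypothesis check, is given below; the use of Eigen, the constant-velocity assumption over the interval dt and the function names are illustrative choices, not taken from the appendix.

#include <Eigen/Dense>

// Given the rigid transformation T (4x4, homogeneous) estimated between a new subframe and
// the accumulated cloud, and the time span dt between the corresponding samples, derive a
// 3D velocity for a point under a constant-velocity assumption.
Eigen::Vector3f velocityFromTransform(const Eigen::Matrix4f& T,
                                      const Eigen::Vector3f& p,
                                      float dt_seconds)
{
    Eigen::Vector4f ph(p.x(), p.y(), p.z(), 1.0f);
    Eigen::Vector4f q = T * ph;                    // where the point would move within dt
    return (q.head<3>() - p) / dt_seconds;         // metres per second
}

// Transform a subframe point back in time by the hypothesis so it can be compared against
// the (older) accumulated raw data in its neighbourhood.
Eigen::Vector3f backProjectByHypothesis(const Eigen::Matrix4f& T,
                                        const Eigen::Vector3f& p)
{
    Eigen::Vector4f ph(p.x(), p.y(), p.z(), 1.0f);
    Eigen::Vector4f q = T.inverse() * ph;
    return q.head<3>();
}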
The comparison operation may take into account the LiDAR reflectivity, the color intensity, the range value and the number of inlier correspondences within a defined radius. If a given point fulfills these criteria, the specific motion is assigned to it. Subsequently, the processed subframe may be added to the three-dimensional accumulated pointcloud containing the information about motion in the scene.
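The following sketch illustrates one possible form of such a per-point acceptance test; the thresholds, the attribute struct and the equal treatment of the criteria are assumptions for illustration and are not taken from the appendix.

#include <cmath>

// Per-point attributes compared between a back-transformed subframe point and its
// neighbourhood in the accumulated cloud (illustrative type).
struct PointAttributes { float reflectivity; float intensity; float range; };

// A point is taken to follow the motion hypothesis if its LiDAR reflectivity, image
// intensity and range are close to those of its neighbourhood and enough inlier
// correspondences lie within the search radius.
bool followsMotionHypothesis(const PointAttributes& point,
                             const PointAttributes& neighbourhoodMean,
                             int inliersInRadius)
{
    const float kReflectivityTol = 0.1f;   // assumed tolerance
    const float kIntensityTol    = 20.0f;  // grey values, assumed tolerance
    const float kRangeTol        = 0.5f;   // metres, assumed tolerance
    const int   kMinInliers      = 3;      // assumed minimum inlier count

    return std::fabs(point.reflectivity - neighbourhoodMean.reflectivity) < kReflectivityTol
        && std::fabs(point.intensity    - neighbourhoodMean.intensity)    < kIntensityTol
        && std::fabs(point.range        - neighbourhoodMean.range)        < kRangeTol
        && inliersInRadius >= kMinInliers;
}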
Thereafter, some points in the accumulated pointcloud may have a motion assigned.
Accordingly, as soon as a new data frame arrives, the moving points in the accumulated pointcloud may be propagated in three-dimensional space to the current time according to the assigned motion, and saved in a similar fashion as the two-dimensional propagated pointcloud, with the property of keeping the indexing. This propagation operation results in compensation of the dynamic motion, since the points corresponding to moving objects are shifted to the current time of the frame.
Next, the two-dimensional propagated pointcloud may be updated according to the latest data, as explained above. Subsequently, the new subset of LiDAR points may first be compared to the three-dimensional propagated pointcloud. Again, in a similar fashion as the correspondence finding on the image, each point in the new subset may be checked for an existing point in the three-dimensional propagated pointcloud within a defined distance. If such a point exists, the connection to the raw accumulated pointcloud can be established by using the index information from the three-dimensional propagated pointcloud.
Before the outlier rejection by the RANSAC algorithm is applied to the matching points from the previous iteration, each correspondence may be weighted by its similarity to the optical flow derived from the images up to the current frame. In this configuration, optical flow may be calculated over 10 frames captured at 100 Hz to weight the LiDAR samples generated at 10 Hz. The transformation obtained from the valid correspondences may provide a new motion hypothesis, and each point may undergo the verification operation using the above-mentioned criteria (i.e., LiDAR intensity, color intensity, range value and number of inlier correspondences in the neighborhood).
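A minimal sketch of such a flow-based weighting is given below; the cosine-similarity form and the normalization to the range [0, 1] are illustrative choices, not the weighting actually used in the appendix.

#include <cmath>

// Weight a correspondence by how well the displacement of its image projection agrees with
// the optical flow accumulated over the camera frames since the last LiDAR visit (e.g., 10
// frames at 100 Hz for a 10 Hz LiDAR). A weight near 1 means the LiDAR-derived motion and
// the image-derived motion point the same way; near 0 means they disagree.
float opticalFlowWeight(float lidarDispU, float lidarDispV,   // projected LiDAR displacement (px)
                        float flowU, float flowV)             // summed optical flow (px)
{
    float lenLidar = std::sqrt(lidarDispU * lidarDispU + lidarDispV * lidarDispV);
    float lenFlow  = std::sqrt(flowU * flowU + flowV * flowV);
    if (lenLidar < 1e-3f || lenFlow < 1e-3f)
        return 0.0f;                                          // no reliable direction
    float cosine = (lidarDispU * flowU + lidarDispV * flowV) / (lenLidar * lenFlow);
    return 0.5f * (cosine + 1.0f);                            // map [-1, 1] to [0, 1]
}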
If a point passes this analysis, it is assigned to the motion hypothesis, as explained above. Points that do not follow the motion hypothesis are further processed in the manner described above, wherein the correspondences are found on the image. After all of this processing is performed, the new subset of LiDAR data may again be added to the accumulated pointcloud with updated motion information.
Figure 6 illustrates a method for performing a velocity estimation for an asynchronous fusion of LiDAR data and camera data in accordance with at least one embodiment.
The processing works at the framerate of the camera, i.e., new data arrive for the algorithm at the framerate of the camera, e.g., at 100 Hz. The process takes as inputs the current image frame It, a LiDAR subframe Pt containing a fraction of the points of the full LiDAR frame, and the accumulated data I*t-1 gathered during the processing of the last 10 frames. The output of the illustrated processing is a motion estimate for the incoming subframe cloud. The information obtained in the current frame is saved in the accumulated data I*t.
The accumulated data I*t-1 contains data related to the optical flow at t-1, the raw accumulated point cloud, the 2D propagated data and the 3D propagated point cloud. The raw accumulated cloud contains raw 3D points from multiple subframes Pt-10, Pt-9, ..., Pt-1 and covers the time interval of a full LiDAR frame, e.g., 0.1 seconds. Data older than 0.1 seconds are deleted at the end of the overall processing of the image frame at time t, and at that time the new raw subframe Pt is added to the accumulated cloud. The 2D propagated data consists of 2D projected points, covers the same time interval as the raw accumulated cloud and contains the same number of points. In contrast to the raw accumulated cloud, the 2D points are temporally aligned to the last frame. The 3D propagated pointcloud consists of the 3D points from multiple subframes, the same as the raw accumulated point cloud, but the points are propagated according to the velocity obtained from the previously processed frames, so each point of the propagated pointcloud is temporally aligned after the propagation.
First of all, the above-mentioned input data provided by block 11 are supplied to the update block 12. In update block 12, the optical flow is updated for the newly acquired image It, and the accumulated points are propagated in 2D by using the optical flow process and in 3D according to the computed motion, if any is available.
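The following sketch illustrates the kind of accumulated information buffer described above, together with its sliding-window update; the struct layout, field names and microsecond timestamps are assumptions made for illustration, not the data structures of the appendix.

#include <cstdint>
#include <vector>

// Minimal sketch of the accumulated information buffer: raw 3D samples from the last full
// LiDAR frame (e.g., 0.1 s), their 2D projections propagated by optical flow, and their 3D
// propagated positions, all sharing one index.
struct AccumulatedSample {
    double x, y, z;           // raw 3D point
    float  u, v;              // 2D projection, propagated to the latest image
    double px, py, pz;        // 3D position, propagated by the assigned velocity (if any)
    bool   hasMotion;         // true once a velocity has been assigned
    double vx, vy, vz;        // assigned 3D velocity
    uint64_t timestamp_us;    // sampling time of the raw point
};

struct AccumulatedBuffer {
    std::vector<AccumulatedSample> samples;

    // At the end of each image frame: drop samples older than one full LiDAR frame and
    // append the freshly processed subframe, keeping the buffer one LiDAR period long.
    void update(uint64_t now_us, uint64_t lidarFramePeriod_us,
                const std::vector<AccumulatedSample>& newSubframe)
    {
        std::vector<AccumulatedSample> kept;
        kept.reserve(samples.size() + newSubframe.size());
        for (const AccumulatedSample& s : samples)
            if (now_us - s.timestamp_us <= lidarFramePeriod_us)
                kept.push_back(s);
        kept.insert(kept.end(), newSubframe.begin(), newSubframe.end());
        samples.swap(kept);
    }
};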
In the preprocessing step 13, first the ground points are removed to save computation time; then each sample of the new LiDAR subframe is checked as to whether it belongs to a dynamic or a static object. Only dynamic points are processed further. Lastly, in the preprocessing step the LiDAR descriptors required for the matching step are computed.
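A minimal sketch of the ground-removal part of this preprocessing is given below; the simple height threshold in the sensor frame, with an assumed known sensor height above the road and an upward-pointing z axis, stands in for whatever ground detection the embodiment actually uses.

#include <vector>

struct Sample3D { double x, y, z; };   // z assumed to point up in the sensor frame (assumption)

// Discard points near the road plane so that only potentially dynamic points reach the
// matching stage; a proper implementation may fit a ground plane per frame instead.
std::vector<Sample3D> removeGroundPoints(const std::vector<Sample3D>& subframe,
                                         double sensorHeightAboveRoad, // metres, assumed known
                                         double tolerance = 0.2)        // metres
{
    std::vector<Sample3D> kept;
    kept.reserve(subframe.size());
    for (const Sample3D& p : subframe)
        if (p.z > -sensorHeightAboveRoad + tolerance)   // keep points clearly above the road
            kept.push_back(p);
    return kept;
}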
In the 3D correspondence finding block 14, matching points between the new subframe Pt and the accumulated data I*t-1 are determined. For that, the motion information obtained in previous processing frames is used to find correspondences directly in 3D space.
Out of the found correspondences, the optimal transformation is determined in the 3D velocity estimation block 15. Then, the obtained transformation is related to the time difference between the corresponding points to get a 3D velocity value.
In the next block 16, the motion hypothesis is tested on each dynamic point in the subframe Pt. This check of the motion hypothesis is performed because, even though the output of the correspondence finding block is a transformation Tt which best aligns the new subframe Pt to the accumulated pointcloud Pt-10:t-1, the subframe Pt may still contain some points which do not belong to the observed object. Therefore, each point is checked against certain criteria to determine whether it follows the estimated motion.
Each point is transformed back according to the hypothesis, so that it aligns with the accumulated raw data, and is compared with its neighbors. The comparing step takes into account different properties of the LiDAR and camera data and assigns the motion to the correct points. The remaining points, which do not fulfill the specified criteria, are further processed in the 2D correspondence finding block 17. In this step, the image space is used to find robust correspondences without any prior knowledge about the 3D motion.
Then, similarly to before, the optimal transformation is extracted from the found correspondences and a 3D velocity is computed in block 18. Finally, in block 19, the remaining dynamic points of the subframe Pt are checked against certain criteria and, if these are met, added to the accumulated information data with the current motion information.
At initialization, the update block only computes a new optical flow update, because the accumulated data I*t-1 is empty at the beginning. In the further processing steps no correspondences will be found in either 3D space or 2D space; this, in turn, leads to no velocity estimation, and only the raw subframe cloud is added to the accumulated information buffer. At the next processing frame, the accumulated information is no longer empty. This time, the accumulated information can be propagated according to the optical flow in 2D space. But the 3D propagation part is still missing, since the points have no motion assigned.
Therefore, again, no correspondences in 3D space can be found, and most probably there will not be any correspondences in 2D space either, since the LiDAR scans at 10 Hz and it takes approximately 10 image frames until it scans the same part of the object again. After some frames, 2D corresponding points will be found and, for the first time, a velocity can be assigned to the points. From then on, the points can also be propagated in 3D space to directly find 3D correspondences. Every time a new obstacle is detected, there is a lag of one full LiDAR frame until the motion of the object can be detected.
The disclosed embodiments are a marked departure from conventional approaches to this technical problem. For a long time, algorithms for perception and sensor fusion were developed on datasets with limited framerates. The most popular example is the KITTI dataset, to which the conventional approaches mentioned above often refer. Camera and LiDAR operate at 10 Hz in this dataset. Moreover, the problem of dynamic objects has conventionally been assumed to be negligible in various implemented applications. Thus, one reason that dynamic motion artifacts have not been addressed explicitly is the conventional approach to sensor configuration and the low relative speeds in the dataset.
Further, it should be understood that the traditional frequency of the majority of sensors on a transportation vehicle, including the camera, is 25Hz. For a long time, research and development in the automotive space has been restricted to that framerate, which made explorations into high framerate cameras an exotic specialty. However, with more capable image sensors and improved processing, disclosed embodiments have been developed by challenging these conventional limitations.
Disclosed embodiments use a sensor setup that generates significantly more camera images while one LiDAR scan is performed. Conventional methods, by contrast, consider only one image per LiDAR frame. Moreover, disclosed embodiments do not rely on correct matching between the pointcloud and the image over a full LiDAR frame. Rather, splitting the data into smaller chunks makes it possible to cope with very high relative velocities, because the matching error remains in the subpixel range.
Disclosed embodiments are not limited to any specific LiDAR sensor. All that is required is a scanning LiDAR that gives timestamps for each point, which is the case for all the conventional, commercially available LiDAR sensors.
Further, through processing performed in accordance with disclosed embodiments, temporally upsampled LiDAR data is generated at the frequency of the camera data, which is consistent with the real environment.
Disclosed embodiments provide low level sensor fusion between asynchronous sensors that overcomes the problems of obtaining a consistent environment model by explicitly addressing the artifacts introduced by dynamic objects. Disclosed embodiments leverage optical flow computations on a high framerate camera combined with pointcloud processing of LiDAR data. The LiDAR data processing may be a combination of sub-frame processing and full LiDAR frame processing, which enables accurate matching of LiDAR points to any part of the environment as seen in the images. This may provide a consistent and complete three dimensional model of the environment at the framerate of the camera.
Further, to enable this functionality, three-dimensional velocity estimates may be derived from the pointcloud data. This is useful information for subsequent processing such as trajectory management. With an accurate estimate of where an object is located in three-dimensional space, where it is heading and at what speed, plus the dense pixel information provided by an image, the presented approach is an extension of current algorithms that do not explicitly take dynamic motion artifacts into account. Disclosed embodiments are based on a setup of a high framerate camera and a high resolution LiDAR. The setup can further be aided by Global Positioning System (GPS) and Inertial Measurement Unit (IMU) technology, but this technology is not central to the inventive concepts.
The attached Appendix includes source code that provides insight into how the actual fusion of the 10 Hz LiDAR and the 100 Hz camera may be performed in accordance with at least one embodiment. However, the Appendix contains only code snippets of the main functions. Therefore, this code is not compilable.
It should be understood that the operations explained herein may be implemented in conjunction with, or under the control of, one or more general purpose computers running software algorithms to provide the presently disclosed functionality and turning those computers into specific purpose computers.
Moreover, those skilled in the art will recognize, upon consideration of the above teachings, that the above exemplary embodiments may be based upon use of one or more programmed processors programmed with a suitable computer program. However, the disclosed embodiments could be implemented using hardware component equivalents such as special purpose hardware and/or dedicated processors. Similarly, general purpose computers, microprocessor based computers, micro-controllers, optical computers, analog computers, dedicated processors, application specific circuits and/or dedicated hard wired logic may be used to construct alternative equivalent embodiments.
Moreover, it should be understood that control and cooperation of the above-described components may be provided using software instructions that may be stored in a tangible, non-transitory storage device such as a non-transitory computer readable storage device storing instructions which, when executed on one or more programmed processors, carry out the above-described method operations and resulting functionality. In this case, the term non-transitory is intended to preclude transmitted signals and propagating waves, but not storage devices that are erasable or dependent upon power sources to retain information.
Those skilled in the art will appreciate, upon consideration of the above teachings, that the program operations and processes and associated data used to implement certain of the embodiments described above can be implemented using disc storage as well as other forms of storage devices including, but not limited to, non-transitory storage media (where non-transitory is intended only to preclude propagating signals and not signals which are transitory in that they are erased by removal of power or explicit acts of erasure) such as, for example, Read Only Memory (ROM) devices, Random Access Memory (RAM) devices, network memory devices, optical storage elements, magnetic storage elements, magneto-optical storage elements, flash memory, core memory and/or other equivalent volatile and non-volatile storage technologies without departing from certain embodiments. Such alternative storage devices should be considered equivalents.
While certain illustrative embodiments have been described, it is evident that many alternatives, modifications, permutations and variations will become apparent to those skilled in the art in light of the foregoing description. Accordingly, the various embodiments, as set forth above, are intended to be illustrative, not limiting.
Appendix
/* This source code provides insight into how the actual fusion of the 10 Hz LiDAR and 100 Hz camera is done. It contains only code snippets of the main functions. Therefore, this code is not compilable. */
class DynamicMotionEstimator
{
public:
Sceneframe Fusion_data;
ofFilter m_OpticalFlow;
clustering m_Object;
REGISTRATION::Transformation_Estimator T_Estimator;
DynamicMotionEstimator() {
auto & smSettings = SettingsManager<FusionSettings>::Current();
smSettings.Lock();
FusionSettings settingsFusion = smSettings.settings;
smSettings.Unlock();
pcl::console::setVerbosityLevel(pcl::console::L_ALWAYS);
_CamParams = settingsFusion._CameraParams;
_lidParams = settingsFusion._lidarParams;
}
~DynamicMotionEstimator() { }
/* This function splits up a full lidar frame into smaller subframes and outputs the indices of each split. The goal of the splitting is to minimize the temporal difference between
the sampling time of each point in the subframe and the capturing time of the corresponding image. Therefore the data is split in the middle of the time interval of consecutive images.
*/
void Lidardatasplit(lidarFusionFrame const& srcData )
{
//get the index which splits the data to the corresponding images
int numImages = srcData.cameraFrames.size();
LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
std::vector<uint64_t> imgTS;
imgTS.reserve(numImages + 1);
//get image timestamps
for(int i = 0; i < srcData.cameraFrames.size(); ++i)
{
    imgTS.push_back(srcData.cameraFrames[i].timestamp);
}
this->start_index.clear();
this->start_index.resize(2);
this->start_index[0].reserve(numImages + 2);
this->start_index[1].reserve(numImages + 2);
this->start_index[0].push_back(0);
this->start_index[1].push_back(0);
int d[2] = {0,0};
for (int j = 0; j < LidarSizeY; ++j)
{
for (int i = 0; i < LidarSizeX; ++i)
{
for(int eye = 0; eye<2; eye ++)
{
lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0);
//Get timestamp of the (i,j)-th lidarSample
auto dstTime = sampleData.timestamp;
//project points to specific image frame
if(srcData.cameraFrames.size()>d[eye])
{
//calculate delta time values
if(srcData.cameraFrames.size()-1 != d[eye])
{
int64_t dtime_img = imgTS[d[eye]+1] - imgTS[d[eye]];
int64_t dtime = dstTime - imgTS[d[eye]];
//decide if it belongs to current or next image
#ifdef Simulation
if(dtime > (double) dtime_img)
{
d[eye]++;
start_index[eye].push_back(j*LidarSizeX + i);
}
#else
if(dtime > 0.5*(double) dtime_img)
{
d[eye]++;
start_index[eye].push_back(j*LidarSizeX + i);
}
#endif
}
}
}
}
}
for(int j = 0; j< 2; j++)
{
while(this->start_index[j].size() != srcData.cameraFrames.size()+1)
{
this->start_index[j].push_back(LidarSizeX*LidarSizeY);
}
}
}
/* This function considers only the new incoming subframe and its corresponding image. The 3D lidar points are first transformed into the world coordinate system, which
in our case corresponds to the camera coordinate system. Then each point is projected on the image and the 2D locations and the color values are saved.
Further, the reflectivity values of the lidar samples are saved as well. Only points which do not belong are further considered.
All this data is needed to compute the new velocity update for the current frame.
*/
void project_subframe(lidarFusionFrame const& srcData, int it)
{
//Extract the one lidar frame attached to the struct
LidarMotionFrame<LIDAR_SIZE> const& lidar = srcData.lidarFrame;
PixelImage<uint8_t,1> const &image = srcData.cameraFrames.back().images[0];
int numImages = srcData.cameraFrames.size();
std::vector<uint64_t> imgTS;
imgTS.reserve(numImages + 1);
//get image timestamps
for(int i = 0; i < srcData.cameraFrames.size(); ++i)
{
    imgTS.push_back(srcData.cameraFrames[i].timestamp);
}
std::vector<cv::Matx34f> proj_vector;
std::vector<cv::Matx34f> world_proj;
std::vector<std::vector<cv::Matx34f>> mod_proj;
lidarSensorStruct const& srcSensor = srcData.lidarFrame.sensors[0];
//save the rotation matrix for each eye
for(int eye = 0; eye<2; eye ++)
{
cv::Matx33f rot;
cv::Matx31f trans;
//lidarSensorStruct const& srcSensor = srcData.lidarFrame.sensors[s];
cv::Matx34f proj;
cv::Matx34f world_cam;
cv::Matx34f proj_copy;
_lidParams.rt[eye](cv::Rect(0,0,3,3)).copyTo(rot);
_lidParams.rt[eye](cv::Rect(3,0,1,3)).copyTo(trans);
cv::Matx33f cam = _CamParams.K;
_lidParams.rt[eye](cv::Rect(0,0,4,3)).copyTo(proj);
_lidParams.rt[eye](cv::Rect(0,0,4,3)).copyTo(world_cam);
_lidParams.rt[eye](cv::Rect(0,0,4,3)).copyTo(proj_copy);
proj = cam * proj;
//save projection matrix
proj_vector.push_back(proj);
world_proj.push_back(world_cam);
std::vector<cv::Matx34f> tmp_vec;
for(int i = 0; i < 2; i++)
{
cam(0,0) = 0.5 * cam(0,0);
cam(1,1) = 0.5 * cam(1,1);
cam(0,2) = round(0.5 * cam(0,2));
cam(1,2) = round(0.5 * cam(1,2));
proj = cam * proj_copy;
tmp_vec.push_back(proj);
}
mod_proj.push_back(tmp_vec);
}
//create a vector of images
std::vector<cv::Mat> img_vector;
//undistort all the images
for(int n = 0; n< srcData.cameraFrames.size(); ++n )
{
cv::Mat raw;
cv::Mat undist;
cv::Mat img;
PixelImage<uint8_t,1> const &image1 = srcData.cameraFrames[n].images[0];
wrapPixelImage2Mat(const_cast<PixelImage<uint8_t,1>&>(image1), raw);
cv::undistort(raw, undist, _CamParams.K, _CamParams.D);
cv::cvtColor(undist, img, cv::COLOR_GRAY2BGR);
img_vector.push_back(img);
}
//get indices of subframe
int index = 0;
//get current image & and init the OF class
cv::Mat tmp = cv::Mat(2048, 640, CV_8UC1);
cv::cvtColor(img_vector[it], tmp, cv::COLOR_BGR2GRAY);
m_OpticalFlow.set_current_frame(tmp);
tmp.copyTo(this->Fusion_data.img);
// Go through lidarData
for(int eye = 0; eye<2; eye ++)
{
//only a fraction of the whole lidar frame is considered
for (int k = this->start_index[eye][it]; k < this->start_index[eye][it+l]; k++)
{
int j = k / LidarSizeX;
int i = k - j *LidarSizeX;
lidarSampleData sampleData = cvt2LSD(lidar, eye, i, j, _USE_CAM_REF, 0);
//do transformation
cv::Matx41f coord = cv::Matx41f(sampleData.x, sampleData.y, sampleData.z, 1);
//map point on the image
cv::Matx31f pointH = proj_vector[eye] * coord;
//normalize coordinates
int x = round(pointH(0)/pointH(2));
int y = round(pointH(1)/pointH(2));
//compute world coordinate with camera as center of origin:
cv::Matx31f point3D = world_proj[eye] * coord;
//Get timestamp of the Lidar sample
auto dstTime = sampleData.timestamp;
//iterator for image
int d = it;
index = d;
//save all information of reflectivity, intensity, raw coordinates, pixel coordinates, timestamp
if(srcData.cameraFrames.size() > d)
{
//save points on correct image
if(x >= 1 && x < _CamParams.S.width && y >= 0 && y < _CamParams.S.height && sampleData.range < 250 && sampleData.segment != 0) //&& sampleData.range > 20)
{
int loc = (y * _CamParams.S.width + x);
//add points to pointcloud
pcl::PointCloud<pcl::PointXYZ>::PointType pt;
pt.x = point3D(0); pt.y = point3D(1); pt.z = point3D(2);
this->Fusion_data.subframe_cloud->push_back(pt);
Poxel_N t;
t.x = x;
t.y = y;
t.x_OF = x;
t.y_OF = y;
t.x_raw = pt.x;
t.y_raw = pt.y;
t.z_raw = pt.z;
t.ax = 0;
t.ay = 0;
t.az = 0;
t.flow = 0.0;
t.range = sampleData.range;
t.reflectivity = 1 -sampleData.reflectivity;
t.intensity = tmp.at<uchar>(y,x);
#ifdef Simulation_timestamp
t.timestamp = imgTS[it];
#else
t.timestamp = dstTime;
#endif
this->Fusion_data.subframe.push_back(t);
}
}
}
}
}
/* Each existing point in the 2D propagated pointcloud is temporally aligned to the current frame. This can be achieved by shifting each 2D point according to the optical flow
obtained in the current frame. The 2D propagated pointcloud is later on needed to find correspondence in the image space.
*/
void propagate_2D(){
if(this->Fusion_data.segements.size() == this->Fusion_data.accumulated_cloud->size())
{
//go through subframe cloud
for(int i = 0; i < this->Fusion_data.segements.size(); i++)
{
//consider only dynamic points
if(this->Fusion_data.segements[i] != 3)
{
    Poxel_Sf tmp = this->Fusion_data.accumulated_frame[i];
    if(0 <= tmp.x_OF && tmp.x_OF < 2048 && 0 <= tmp.y_OF && tmp.y_OF < 640)
    {
        cv::Point2f& fxy = m_OpticalFlow.curr_flow.at<cv::Point2f>(tmp.y_OF, tmp.x_OF);
        //propagate
        tmp.x_OF = tmp.x_OF + fxy.x;
        tmp.y_OF = tmp.y_OF + fxy.y;
        tmp.flow = sqrt(fxy.x * fxy.x + fxy.y * fxy.y);
    }
    //save the data
    this->Fusion_data.accumulated_frame[i] = tmp;
}
}
}
}
/* The raw accumulated pointcloud is the accumulated raw 3D points from multiple subframes and covers the time interval of a full lidar frame. In this step,
all the points in the raw accumulated pointcloud, which have a motion assigned, are propagated according to their 3D velocities. The 3D propagated pointcloud
is needed in the further processing to find correspondences in 3D space.
*/
void propagate_3D(uint64_t time_img, lidarFusionFrame const& srcData)
{
if(this->Fusion_data.accumulated_cloud->size()== this->Fusion_data.time.size())
for(int i = 0; i < this->Fusion_data.time.size(); i++)
{
//check if the a motion is assigned to the point
if(this->Fusion_data.segements[i] == 1)
{
//propagate point according to time
double factor = (double) this->Fusion_data.time[i];
Poxel_Sf motion = this->Fusion_data.accumulated_frame[i];
//compute the time difference of the current image and the sampling time of the specific lidar point
float delta_t = (time_img - motion.timestamp);
float d_t = delta_t * (float) 1e-6;
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.accumulated_cloud->points[i];
pcl::PointCloud<pcl::PointXYZ>::PointType new_pt;
//change to m/s
float vx = motion.vx * 100;
float vy = motion.vy * 100;
float vz = motion.vz * 100;
//propagate 3D
new_pt.x = pt.x + d_t * vx;
new_pt.y = pt.y + d_t * vy;
new_pt.z = pt.z + d_t * vz;
//save the values
this->Fusion_data.propagated_cloud->push_back(new_pt);
}
else
{
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.accumulated_cloud->points[i];
this->Fusion_data.propagated_cloud->push_back(pt);
}
}
}
/* First, each point of the new incoming subframe cloud is classified as static or dynamic. For the further processing only dynamic points are considered.
Then, the 3D propagated cloud is used to find matching points in lidar space, while each pair of points is weighted with the corresponding optical flow. A RANSAC
algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. Each point of the subframe cloud is then checked
as to whether it follows the motion or not. The remaining points, which do not fulfill the motion hypothesis test, are further processed in the 2D correspondence finding block. These remaining points are projected on the image and checked within a defined radius for existing points in the 2D propagated pointcloud. Through the connection between
the accumulated pointcloud and the 2D propagated pointcloud, correspondences can be found. Similar to the 3D correspondence block, a RANSAC algorithm is used to obtain
the optimal transformation. Finally the motion hypothesis test checks whether a motion is assigned to each remaining point.
*/
void compute_3D_velocity(lidarFusionFrame const& srcData, int d)
{
if(!this->Fusion_data.subframe_cloud->empty()){
/* In the preprocessing step the SHOT descriptor for each point in the new subframe is computed. Further, each point of the new incoming subframe cloud is
classified as static or dynamic. For the further processing only dynamic points are considered.
*/
/// start number 3 - Preprocessing
//compute normal vector - needed for descriptor computation
compute_normal();
if(!this->Fusion_data.subframe_cloud->empty()) {
//compute lidar descriptor
compute_SHOT();
if(this->Fusion_data.tot_features.size() > 9 && this->Fusion_data.subframe_keypoints.size() > 10 && this->Fusion_data.segements.size() == this->Fusion_data.accumulated_frame.size())
{
///start feature matching and velocity update
int counter = 0;
int tot_counter = 0;
int segment = 0;
pcl::Correspondences correspondences_propagated;
std::vector<float> weights;
//check for static points - go through all the points in the subframe cloud
for(int i=0; i< this->Fusion_data.subframe_cloud->size(); i++)
{
//get the 2d projection of the 3D subframe point
pcl::PointCloud<pcl::PointXYZ>::PointType pt_sub;
pt_sub = this->Fusion_data.subframe_cloud->points[i];
Poxel_N tl = this->Fusion_data.subframe[i];
//if there is ego-motion -> transform each point according to it's egomotion
// take difference between proj_point and proj_transformed_point
// multiply v_proj_lidar and v_OF and divide through their length and this should equal 1
float EPSILON = 0.5;
//get the flow at a specific position
cv::Point2f& fxy = m_OpticalFlow.curr_flow.at<cv::Point2f>(proj_point(1), proj_point(0));
float x_disp = fxy.x;
float y_disp = fxy.y;
float l_v_of = sqrt(x_disp * x_disp + y_disp * y_disp);
float l_v_3D = ego_motion; //if there is no ego-motion it corresponds to zero
//CHECK IF STATIC: no ego-motion is assumed
if(l_v_3D < EPSILON && l_v_of < EPSILON)
{
//3: static
this->Fusion_data.subframe_segments[i] = 3;
tl.vx = 0;
tl.vy = 0;
tl.vz = 0;
this->Fusion_data.subframe[i] = tl;
}
}
/* The 3D propagated cloud is used to find matching points in lidar space, while each pair of points is weighted with the corresponding optical flow.
*/
/// start 4 - 3D correspondence finding
//just a check if the dataset are still consistent
if(this->Fusion_data.segements.size() > 0 && !this->Fusion_data.propagated_cloud->empty() && enable_propagation)
{
//all the points are stored in trees, to get faster the nearest neighbor
create_tree();
//go through the keypoints
for(int i=0; i< this->Fusion_data.subframe_keypoints_cloud->size(); i++)
{
//take only dynamic points:
if(this->Fusion_data.subframe_segments[keypoints_indices->indices[i]] == 0)
{
//find nearest neighboor of query point
pcl::PointXYZ searchPoint;
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.subframe_keypoints_cloud->points[i];
// K nearest neighbor search
int K = 1;
std::vector<int> pointIdxNKNSearch(K);
std::vector<float> pointNKNSquaredDistance(K);
//check if a neighbor exists between the 3D propagated pointcloud and the point in the subframe cloud
if ( kdtree.nearestKSearch(searchPoint, K, pointIdxNKNSearch, pointNKNSquaredDistance) > 0 )
{
//check if it has already a label -> increase counter
if(this->Fusion_data.segements[pointIdxNKNSearch[0]] == 1 && pointNKNSquaredDistance[0] < 0.1)
{
//Through the propagated 3D cloud one knows the connection to the accumulated raw 3D cloud
pcl::PointXYZ match;
match = this->Fusion_data.accumulated_cloud->points[pointIdxNKNSearch[0]];
//find best nearest neighbor -- SHOT comparison
int set_counter = 1;
std::vector<int> index_cloud(K);
std::vector<float> distance_vector(K);
float range = sqrt(pt.x * pt.x + pt.y * pt.y + pt.z * pt.z);
float lim = range / 50.0;
//find nearest neighbors
int checkpoints = acc_cloud_tree.radiusSearch(match, lim, index_cloud, distance_vector, set_counter);
//compare SHOT features
curr_distance = shot.compare(index_cloud, checkpoints);
int j = shot.get_index();
match = this->Fusion_data.accumulated_cloud->points[j];
//compute weight:
weight = compute_OF_weight(match);
weights.push_back(weight);
//save correspondences
pcl::Correspondence corr;
corr.index_query = i;
corr.index_match = j; //pointIdxNKNSearch[0];
correspondences_propagated.push_back(corr);
}
}
}
}
}
/* A RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
///start 5 - 3D velocity estimation
//RANSAC: reject outliers
if(!correspondences_propagated.empty())
//do RANSAC with new correspondences
ransac.compute();
transform = ransac.transform();
correspondences = ransac.correspondences();
/* Each point of the subframe cloud is then checked if it follows the motion or not. The remaining points, which do not fulfill the motion hypothesis test,
are further processed in the 2D correspondence finding block.
*/
///start 6 - motion hypothesis check
//go through subframe_cloud
for(int i = 0; i< this->Fusion_data.subframe_cloud->size(); i++)
{
if(this->Fusion_data.subframe_segments[i] != 3 && this->Fusion_data.subframe_segments[i] != 4)
{
//check each point in the subframe whether it corresponds to the motion
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.subframe_cloud->points[i];
Eigen::MatrixXf point(4, 1);
point(0) = pt.x;
point(1) = pt.y;
point(2) = pt.z;
point(3) = 1;
//transform each point through the hypothesis
Eigen::MatrixXf res = transform * point;
Poxel_N tl = this->Fusion_data.subframe[i];
double prob = 0;
//find nearest neighbor of query point
pcl::PointXYZ searchPoint;
//project res into image:
cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
cv::Matx33f cam = _CamParams.K;
cv::Matx31f proj_point = cam * coord;
searchPoint.x = proj_point(0);
searchPoint.y = proj_point(1);
if(!isnan(searchPoint.x) && !isnan(searchPoint.y) && !isnan(searchPoint.z))
{
// K nearest neighbor search
int K = 8;
std::vector<int> pointIdxNKNSearch(K);
std::vector<float> pointNKNSquaredDistance(K);
//find the subset of points in the accumulated pointcloud
int neighbors = kdtree1.radiusSearch(searchPoint, d(range), pointIdxNKNSearch, pointNKNSquaredDistance, 8);
float reflectivity = tl.reflectivity;
float intensity = tl.intensity;
int inlier = 0;
//search for inlier:
if(!correspondence_cloud->empty())
{
//find valid correspondences
std::vector<int> pointIdxNKNSearch_inlier(10);
std::vector<float> pointNKNSquaredDistance_inlier(10);
int neighbors_inlier = kdtree_inlier.radiusSearch(pt, d(range), pointIdxNKNSearch_inlier, pointNKNSquaredDistance_inlier, 10);
inlier = neighbors_inlier;
}
prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
prob = prob * inlier * 0.5;
}
float limit = 3 ;
if(prob>limit)
{
//assign velocity
tl = compute_velocity();
}
this->Fusion_data.subframe[i] = tl;
}
this->Fusion_data.subframe_time = 1;
}
/* These remaining points are projected on the image and checked within a defined radius for existing points in the 2D propagated pointcloud. Through the connection between
the accumulated pointcloud and the 2D propagated pointcloud, correspondences can be found.
*/
///start 7 - 2D correspondence finding
//OF features
//propagate lidar samples through OF and use this to find corresponding point
pcl::CorrespondencesPtr correspondences(new pcl::Correspondences());
pcl::Correspondences corr;
for(int i = 0; i < this->Fusion_data.subframe_keypoints_cloud->size(); i++)
{
searchPoint.z = 0;
//check if there is any points without any velocities
bool check = false;
if(this->Fusion_data.subframe_segments[keypoints_indices->indices[i]] == 0 && !this->Fusion_data.accumulated_features->empty())
check = true;
if(check)
{
//find nearest neighbor between the 2D projected point of the subframe cloud and the 2D propagated cloud
//the index of the nearest neighbor can be again taken to determine the connection to the accumulated cloud
neighbors = OF_tree.radiusSearch(searchPoint, radius, pointIdxNKNSearch, pointNKNSquaredDistance, K);
if(neighbors >= 1)
{
float distance = shot.compare_shot(pointNKNSquaredDistance, neighbors);
int j = shot.get_index();
pt = this->Fusion_data.subframe_keypoints_cloud->points[i];
match = this->Fusion_data.accumulated_cloud->points[j];
time = this->Fusion_data.time[j];
index = j;
}
//compute OF weight
weight = compute_OF_weight(sub_data);
weights.push_back(weight);
pcl::Correspondence corr1;
corr1.index_query = i;
corr1.index_match = index;
correspondences->push_back(corr1);
}
}
}
/* A RANSAC algorithm is used to reject outliers. With the valid pairs of points (correspondences) a transformation is computed. */
///start 8 - 3D velocity estimation
//RANSAC: reject outliers
if(!correspondences->empty()) {
//RANSAC
ransac.compute();
transform = ransac.transform();
correspondences = ransac.correspondences();
this->Fusion_data.correspondence_new = *correspondences;
/* Each point of the subframe cloud is then checked if it follows the motion or not. The remaining points, which do not fulfill the motion hypothesis test,
are further processed in the 2D correspondence finding block.
*/
///start 9 - motion hypothesis check
//go through subframe_cloud
for(int i = 0; i< this->Fusion_data.subframe_cloud->size(); i++)
{
if((this->Fusion_data.subframe_segments[i] == 0 || this->Fusion_data.subframe_segments[i] == 2 || this->Fusion_data.subframe_segments[i] == 6) && !correspondences->empty())
{
pcl::PointCloud<pcl::PointXYZ>::PointType pt = this->Fusion_data.subframe_cloud->points[i];
Eigen::MatrixXf point(4, 1);
point(0) = pt.x;
point(1) = pt.y;
point(2) = pt.z;
point(3) = 1;
//each point is projected back according its velocity
Eigen::MatrixXf res = transform * point;
Poxel_N tl = this->Fusion_data.subframe[i];
double prob = 0;
//find nearest neighboor of query point
pcl::PointXYZ searchPoint;
cv::Matx31f coord = cv::Matx31f(res(0,0), res(1,0), res(2,0));
cv::Matx33f cam = _CamParams.K;
cv::Matx31f proj_point = cam * coord;
searchPoint.x = proj_point(0);
searchPoint.y = proj_point(1);
if(!isnan(searchPoint.x) && !isnan(searchPoint.y) && !isnan(searchPoint.z))
{
// K nearest neighbor search
int K = 8;
std::vector<int> pointIdxNKNSearch(K);
std::vector<float> pointNKNSquaredDistance(K);
int neighbors = kdtree1.radiusSearch(searchPoint, 5, pointIdxNKNSearch, pointNKNSquaredDistance, 8);
int inlier = 0;
//search for inlier:
if(!correspondence_cloud1->empty())
{
std::vector<int> pointIdxNKNSearch_inlier(10);
std::vector<float> pointNKNSquaredDistance_inlier(10);
int neighbors_inlier = kdtree_inlier.radiusSearch(pt, 1, pointIdxNKNSearch_inlier, pointNKNSquaredDistance_inlier, 10);
inlier = neighbors_inlier;
}
float reflectivity = tl.reflectivity;
float intensity = tl.intensity;
prob = compute_probability(point, res, neighbors, pointIdxNKNSearch, reflectivity, intensity);
prob = prob * inlier * 0.5;
}
if(prob > 10)
{
//assign velocity
tl = compute_velocity();
this->Fusion_data.subframe[i] = tl ;
}
this->Fusion_data.subframe_time = 1;
}
}
}
}
}
}
}
//This is the main function. It calls all the subfunctions
void process(lidarFusionFrame const& srcData)
{
//Only a fraction of points are considered each frame
Lidardatasplit(srcData) ;
for(int i = 0; i < start_index[1].size()-1; i++)
{
//update optical flow
m_OpticalFlow.process();
//propagates projected 3D point in image space by the optical flow
propagate_2D();
//3D propagation of the lidar points
propagate_3D(srcData.cameraFrames[i]. timestamp, srcData);
//3D -> 2D coordinates
project_subframe(srcData, i);
//classification between static and dynamic points/ SHOT feature computation
//3D correspondence finding and 2D correspondence finding
compute_3D_velocity(srcData, i);
}
start_index.clear();
}
} ;
#endif // DYNAMICMOTIONESTIMATOR

Claims
1. A method for data fusion of LiDAR data and image data, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, comprising:
capturing three-dimensional LiDAR data of the scene with a scanning LiDAR sensor (2);
capturing image data corresponding to a two-dimensional projection of the scene with an image sensor (3) having a higher framerate as compared to the framerate of a complete scan of the LiDAR sensor;
estimating (15, 18) based on the captured image data and/or LiDAR data a three- dimensional velocity of a dynamic object (5) present in the scene;
compensating the motion of the dynamic object (5); and
fusing the motion compensated LiDAR data and image data.
2. The method of claim 1, further comprising:
generating a LiDAR subframe including LiDAR samples covering a part of the scene during the frame period of an image sensor generating the image data; and
accumulating LiDAR samples of multiple LiDAR subframes to generate an accumulated LiDAR point cloud corresponding to a complete scan of the sensed scene at the frame rate of the LiDAR sensor.
3. The method of claim 2, further comprising:
projecting LiDAR samples of a LiDAR subframe on image data of a camera image; and
propagating the projected LiDAR samples according to a displacement information to generate a propagated point cloud.
4. The method of claim 3, wherein propagating the projected LiDAR samples is based on an optical flow algorithm which determines the displacement information between consecutive camera images.
5. The method of claim 4, further comprising:
updating (12) the propagated point cloud according to the optical flow information generated for a new camera image; and
checking (14, 17) the projected LiDAR samples of a new LiDAR subframe for a correspondence with existing points in the current propagated point cloud.
6. The method of claim 5, wherein a correspondence qualifies as valid if projected LiDAR samples of a new LiDAR subframe fall within a defined radius around an existing point.
7. The method of any of claims 3 to 6, further comprising:
assigning an index to the LiDAR samples; and
keeping the index of the LiDAR samples while propagating the projected LiDAR samples.
8. The method of any of claims 3 to 7, wherein LiDAR samples from multiple LiDAR subframes are accumulated to cover the time interval of a full LiDAR frame, and wherein accumulated data older than this time interval are deleted after the processing of an image frame and new data for the current LiDAR subframe are added to the accumulated data.
9. The method of any of claims 3 to 8, further comprising:
computing LiDAR descriptors for dynamic LiDAR points in a LiDAR subframe point cloud, wherein a LiDAR descriptor describes a geometrical three dimensional structure of the local neighborhood of the dynamic points;
matching LiDAR points of a new LiDAR subframe with LiDAR points in the accumulated LiDAR point cloud using the LiDAR descriptors; and
propagating the matched LiDAR points of the new LiDAR subframe according to the estimated three dimensional velocity to generate a three dimensional propagated point cloud.
10. The method of any of claims 3 to 9, further comprising:
checking whether LiDAR samples in a new LiDAR subframe correspond to a dynamic or static object; and keeping only LiDAR samples corresponding to dynamic points for the further processing.
11. The method of any of claims 3 to 10, further comprising:
detecting a ground plane in the scene;
identifying LiDAR samples in a new LiDAR subframe corresponding to the detected ground plane; and
removing the identified LiDAR samples from the further processing.
12. The method of any of the preceding claims, wherein the estimated three dimensional velocity of the dynamic objects is used for transportation vehicle path planning and/or trajectory management.
13. An apparatus for data fusion of LiDAR data and image data, wherein both the LiDAR data and the image data cover the same scene in the environment of a vehicle, the apparatus comprising:
a scanning LiDAR sensor (2) generating the LiDAR data;
a high framerate camera (3) generating the image data, wherein the image data are generated with a higher framerate as compared to the framerate of a complete scan of the scanning LiDAR sensor;
one or more processors coupled to the camera and the LiDAR sensor, wherein said one or more processors are configured to:
receive camera generated image data and LiDAR sensor generated point cloud data;
estimate a three dimensional velocity of a dynamic object present in the scene;
compensate the motion of the dynamic object; and
fuse the motion compensated LiDAR data and image data.
14. The apparatus according to claim 13, further comprising:
one or more buffers coupled to the one or more processors for storing accumulated information related to the LiDAR data and the image data.
15. A vehicle performing a method according to any of claims 1 to 12 and/or comprising an apparatus according to any of claims 13 to 14.
PCT/EP2019/081741 2018-11-20 2019-11-19 Method and apparatus for data fusion of lidar data and image data WO2020104423A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862770147P 2018-11-20 2018-11-20
US62/770,147 2018-11-20

Publications (1)

Publication Number Publication Date
WO2020104423A1 true WO2020104423A1 (en) 2020-05-28

Family

ID=68655511

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/081741 WO2020104423A1 (en) 2018-11-20 2019-11-19 Method and apparatus for data fusion of lidar data and image data

Country Status (1)

Country Link
WO (1) WO2020104423A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932943A (en) * 2020-10-15 2020-11-13 深圳市速腾聚创科技有限公司 Dynamic target detection method and device, storage medium and roadbed monitoring equipment
CN113029137A (en) * 2021-04-01 2021-06-25 清华大学 Multi-source information self-adaptive fusion positioning method and system
EP3945349A1 (en) * 2020-07-31 2022-02-02 Continental Automotive GmbH Method and system for determining 3d image information
CN114018236A (en) * 2021-09-30 2022-02-08 哈尔滨工程大学 Laser vision strong coupling SLAM method based on adaptive factor graph
CN114333418A (en) * 2021-12-30 2022-04-12 深兰人工智能(深圳)有限公司 Data processing method for automatic driving and related device
CN114415123A (en) * 2022-04-01 2022-04-29 北京海兰信数据科技股份有限公司 Non-coherent neighborhood based weighting pulse accumulation processing method and system
CN114475650A (en) * 2021-12-01 2022-05-13 中铁十九局集团矿业投资有限公司北京信息技术分公司 Vehicle driving behavior determination method, device, equipment and medium
CN114494332A (en) * 2022-01-21 2022-05-13 四川大学 Unsupervised estimation method for scene flow from synthesis to real LiDAR point cloud
CN114577196A (en) * 2020-11-17 2022-06-03 沃尔沃卡车集团 Lidar positioning using optical flow
CN115407304A (en) * 2022-08-30 2022-11-29 中国第一汽车股份有限公司 Point cloud data processing method and device
WO2023062400A1 (en) * 2021-10-12 2023-04-20 日産自動車株式会社 Object recognition method and object recognition device
WO2023103198A1 (en) * 2021-12-08 2023-06-15 深圳奥锐达科技有限公司 Method and device for calculating relative extrinsic parameters of ranging system, and storage medium
CN116736322A (en) * 2023-08-15 2023-09-12 山东科技大学 Speed prediction method integrating camera image and airborne laser radar point cloud data
US20240080432A1 (en) * 2022-09-01 2024-03-07 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volemetric capture workflows
US12003695B2 (en) * 2022-09-01 2024-06-04 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volumetric capture workflows

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140064555A1 (en) * 2012-09-04 2014-03-06 Digital Singnal Corporation System and Method for Increasing Resolution of Images Obtained from a Three-Dimensional Measurement System

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
DAVID HELD ET AL.: "Precision tracking with sparse 3d and dense color 2d data", 2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, May 2013 (2013-05-01), pages 1138 - 1145, XP032506018, DOI: 10.1109/ICRA.2013.6630715
DAVID HELD ET AL.: "Robust Real-Time Tracking Combining 3D Shape, Color, and Motion", THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, August 2015 (2015-08-01), pages 30 - 49
E. ILG ET AL.: "Reconstruction of rigid body models from motion distorted laser range data using optical flow", IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA, 2014
ILG EDDY ET AL: "Reconstruction of rigid body models from motion distorted laser range data using optical flow", 2014 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE, 31 May 2014 (2014-05-31), pages 4627 - 4632, XP032650440, DOI: 10.1109/ICRA.2014.6907535 *
JENNIFER DOLSON ET AL.: "Upsampling range data in dynamic environments", 2010 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2010, pages 1141 - 1148, XP031725915
JINWOOK CHOI ET AL: "2D-plus-depth based resolution and frame-rate up-conversion technique for depth video", IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 56, no. 4, November 2010 (2010-11-01), pages 2489 - 2497, XP011341851, ISSN: 0098-3063, DOI: 10.1109/TCE.2010.5681132 *
M. H. DARAEI ET AL.: "Region Segmentation Using LiDAR and Camera", 2017 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), October 2017 (2017-10-01)
M. H. DARAEI ET AL.: "Velocity and shape from tightly-coupled lidar and camera", 2017 IEEE INTELLIGENT VEHICLES SYMPOSIUM, June 2017 (2017-06-01)
R. OMAR CHAVEZ-GARCIA ET AL.: "Multiple Sensor Fusion and Classification for Moving Object Detection and Tracking", IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, vol. PP, no. 99, 2015, pages 1 - 10
S. WANG ET AL.: "Dynamic detection technology for moving objects using 3d lidar information and rgb camera", 2017 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), June 2017 (2017-06-01), pages 37 - 38, XP033129944, DOI: 10.1109/ICCE-China.2017.7990983
SEBASTIAN SCHNEIDER ET AL.: "Fusing vision and lidar - synchronization, correction and occlusion reasoning", 2010 IEEE INTELLIGENT VEHICLES SYMPOSIUM, 2010, pages 388 - 393, XP031732240
X. CHEN ET AL.: "Multi-view 3d object detection network for autonomous driving", 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2017, pages 6526 - 6534, XP033250017, DOI: 10.1109/CVPR.2017.691

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3945349A1 (en) * 2020-07-31 2022-02-02 Continental Automotive GmbH Method and system for determining 3d image information
CN111932943B (en) * 2020-10-15 2021-05-14 深圳市速腾聚创科技有限公司 Dynamic target detection method and device, storage medium and roadbed monitoring equipment
CN111932943A (en) * 2020-10-15 2020-11-13 深圳市速腾聚创科技有限公司 Dynamic target detection method and device, storage medium and roadbed monitoring equipment
CN114577196B (en) * 2020-11-17 2024-05-24 沃尔沃卡车集团 Lidar positioning using optical flow
CN114577196A (en) * 2020-11-17 2022-06-03 沃尔沃卡车集团 Lidar positioning using optical flow
CN113029137A (en) * 2021-04-01 2021-06-25 清华大学 Multi-source information self-adaptive fusion positioning method and system
CN114018236A (en) * 2021-09-30 2022-02-08 哈尔滨工程大学 Laser vision strong coupling SLAM method based on adaptive factor graph
CN114018236B (en) * 2021-09-30 2023-11-03 哈尔滨工程大学 Laser vision strong coupling SLAM method based on self-adaptive factor graph
WO2023062400A1 (en) * 2021-10-12 2023-04-20 日産自動車株式会社 Object recognition method and object recognition device
CN114475650A (en) * 2021-12-01 2022-05-13 中铁十九局集团矿业投资有限公司北京信息技术分公司 Vehicle driving behavior determination method, device, equipment and medium
WO2023103198A1 (en) * 2021-12-08 2023-06-15 深圳奥锐达科技有限公司 Method and device for calculating relative extrinsic parameters of ranging system, and storage medium
CN114333418A (en) * 2021-12-30 2022-04-12 深兰人工智能(深圳)有限公司 Data processing method for automatic driving and related device
CN114333418B (en) * 2021-12-30 2022-11-01 深兰人工智能(深圳)有限公司 Data processing method for automatic driving and related device
CN114494332B (en) * 2022-01-21 2023-04-25 四川大学 Unsupervised synthesis to real LiDAR point cloud scene flow estimation method
CN114494332A (en) * 2022-01-21 2022-05-13 四川大学 Unsupervised estimation method for scene flow from synthesis to real LiDAR point cloud
CN114415123B (en) * 2022-04-01 2022-07-19 北京海兰信数据科技股份有限公司 Non-coherent neighborhood based weighting pulse accumulation processing method and system
CN114415123A (en) * 2022-04-01 2022-04-29 北京海兰信数据科技股份有限公司 Non-coherent neighborhood based weighting pulse accumulation processing method and system
CN115407304A (en) * 2022-08-30 2022-11-29 中国第一汽车股份有限公司 Point cloud data processing method and device
US20240080432A1 (en) * 2022-09-01 2024-03-07 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volumetric capture workflows
US12003695B2 (en) * 2022-09-01 2024-06-04 Sony Group Corporation Self-organizing rolling shutter camera arrays for low-cost, accurate volumetric capture workflows
CN116736322A (en) * 2023-08-15 2023-09-12 山东科技大学 Speed prediction method integrating camera image and airborne laser radar point cloud data
CN116736322B (en) * 2023-08-15 2023-10-20 山东科技大学 Speed prediction method integrating camera image and airborne laser radar point cloud data

Similar Documents

Publication Publication Date Title
WO2020104423A1 (en) Method and apparatus for data fusion of lidar data and image data
CN112292711B (en) Associating LIDAR data and image data
Jörgensen et al. Monocular 3d object detection and box fitting trained end-to-end using intersection-over-union loss
US10915793B2 (en) Method and system for converting point cloud data for use with 2D convolutional neural networks
Königshof et al. Realtime 3d object detection for automated driving using stereo vision and semantic information
US20170316569A1 (en) Robust Anytime Tracking Combining 3D Shape, Color, and Motion with Annealed Dynamic Histograms
Held et al. Precision tracking with sparse 3d and dense color 2d data
US11935250B2 (en) Method, device and computer-readable storage medium with instructions for processing sensor data
CN110832568B (en) Vehicle environment recognition device
JP2016081525A (en) Vehicular image recognition system and corresponding method
US11887336B2 (en) Method for estimating a relative position of an object in the surroundings of a vehicle and electronic control unit for a vehicle and vehicle
CN112997187A (en) Two-dimensional object bounding box information estimation based on aerial view point cloud
US20220398856A1 (en) Method for reconstruction of a feature in an environmental scene of a road
EP3416132A1 (en) Image processing device, object recognition device, device control system, and image processing method and program
JP7091686B2 (en) 3D object recognition device, image pickup device and vehicle
JP7209115B2 (en) Detection, 3D reconstruction and tracking of multiple rigid objects moving in relatively close proximity
EP4012653A1 (en) Depth-map prediction method; computer program, readable medium, system and mobile machine for implementing the method
Xiang et al. Vilivo: Virtual lidar-visual odometry for an autonomous vehicle with a multi-camera system
El Bouazzaoui et al. Enhancing rgb-d slam performances considering sensor specifications for indoor localization
Diego et al. Vision-based road detection via on-line video registration
US20210407117A1 (en) System and method for self-supervised monocular ground-plane extraction
Vatavu et al. Real-time modeling of dynamic environments in traffic scenarios using a stereo-vision system
WO2020118623A1 (en) Method and system for generating an environment model for positioning
CN116543143A (en) Training method of target detection model, target detection method and device
US11657506B2 (en) Systems and methods for autonomous robot navigation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19809014; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 19809014; Country of ref document: EP; Kind code of ref document: A1