CN113409376A - Method for filtering laser radar point cloud based on depth estimation of camera - Google Patents

Method for filtering laser radar point cloud based on depth estimation of camera

Info

Publication number
CN113409376A
CN113409376A (application CN202110681027.1A)
Authority
CN
China
Prior art keywords
data
depth
point cloud
pixel
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110681027.1A
Other languages
Chinese (zh)
Inventor
张雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qingzhou Zhihang Technology Co ltd
Original Assignee
Beijing Qingzhou Zhihang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qingzhou Zhihang Technology Co ltd filed Critical Beijing Qingzhou Zhihang Technology Co ltd
Priority to CN202110681027.1A
Publication of CN113409376A
Legal status: Withdrawn (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 - Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 - Lidar systems specially adapted for specific applications
    • G01S17/93 - Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S17/931 - Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/48 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
    • G01S7/483 - Details of pulse systems
    • G01S7/486 - Receivers
    • G01S7/487 - Extracting wanted echo signals, e.g. pulse detection
    • G01S7/4876 - Extracting wanted echo signals, e.g. pulse detection by removing unwanted signals
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/48 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
    • G01S7/491 - Details of non-pulse systems
    • G01S7/493 - Extracting wanted echo signals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85 - Stereo camera calibration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Electromagnetism (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The embodiment of the invention relates to a method for filtering a laser radar point cloud based on depth estimation by a camera, which comprises the following steps: acquiring a first point cloud data set; acquiring first scene image data; performing depth image conversion on the first scene image data by using a depth estimation model to generate corresponding first depth image data; performing depth data conversion processing on the second pixel data of each second pixel point data to generate corresponding first pixel point depth data; marking the first point cloud data in the first point cloud data set that matches each second pixel point as corresponding first matching point cloud data; performing effective point cloud data marking processing on the first matching point cloud data corresponding to each second pixel point data; and filtering out the first point cloud data in the first point cloud data set that is not marked as valid point cloud data. In this way, invalid information in the point cloud set can be removed to the greatest extent, the point cloud accuracy can be improved, and the point cloud computation load can be reduced.

Description

Method for filtering laser radar point cloud based on depth estimation of camera
Technical Field
The invention relates to the technical field of data processing, in particular to a method for filtering laser radar point cloud based on depth estimation of a camera.
Background
Point cloud data records scanning information in the form of points, and each point cloud datum obtained by laser radar scanning contains depth data reflecting the depth of that point. As the most important sensor data, the laser radar point cloud plays a key role in an automatic driving system. However, the laser radar point cloud can produce false points in some scenes, such as rain and fog, smoke, and automobile exhaust. Sometimes, to increase the point cloud information, a laser radar dual return mode is used to acquire the point cloud, that is, the same laser radar ray returns at most two different echoes, so that two different laser radar points are obtained in the same emission direction. In this case, if a large number of false points are present, their impact is doubled.
Disclosure of Invention
The invention aims to provide a method for filtering a laser radar point cloud based on depth estimation by a camera, an electronic device and a computer readable storage medium, which overcome the defects of the prior art: depth image conversion is performed on the image data shot by the camera, the matching point clouds corresponding to the depth image pixel points are located in the point cloud set, valid point cloud marking is performed on the matching point clouds according to the depth difference between each depth image pixel point and its matching point cloud, and finally the false-point removal of the whole point cloud set is completed by filtering the invalid point clouds out of the point cloud set. In this way, invalid information in the point cloud set can be removed to the greatest extent, the point cloud recognition rate is improved, and the point cloud computation load is reduced.
In order to achieve the above object, a first aspect of the embodiments of the present invention provides a method for performing laser radar point cloud filtering based on depth estimation of a camera, where the method includes:
acquiring a first point cloud data set generated by scanning a first specified environment by a laser radar at a first time T; the first set of point cloud data comprises a plurality of first point cloud data; the first point cloud data comprises first point cloud depth data;
acquiring first scene image data generated by shooting the first specified environment by a camera at the first moment T; the first scene image data comprises a plurality of first pixel point data; the first pixel point data includes first pixel data;
performing depth image conversion processing on the first scene image data by using a well-trained depth estimation model based on an optical image to generate corresponding first depth image data; the first depth image data comprises a plurality of second pixel point data; the second pixel point data includes second pixel data; a resolution of the first depth image data is consistent with a resolution of the first scene image data; the second pixel point data corresponds to the image coordinates of the first pixel point data one by one;
according to the corresponding relation between the pixel data of the depth image and the depth data, performing depth data conversion processing on the second pixel data of each second pixel point data to generate corresponding first pixel point depth data;
according to the corresponding relation between the depth image coordinate system and the point cloud coordinate system, marking the first point cloud data matched with each second pixel point in the first point cloud data set as corresponding first matched point cloud data; if the preset scanning mode of the laser radar is a single-echo mode, the maximum number of the first matching point cloud data corresponding to each second pixel point data is 1; if the scanning mode is a double echo mode, the maximum number of the first matching point cloud data corresponding to each second pixel point data is 2;
effective point cloud data marking processing is carried out on the first matching point cloud data corresponding to each second pixel point data;
filtering out the first point cloud data which is not marked as the valid point cloud data in the first point cloud data set.
Preferably, the depth estimation model based on the optical image is a monocular depth estimation model;
the monocular depth estimation model comprises a first type monocular depth estimation model and a second type monocular depth estimation model;
the first type of monocular depth estimation model comprises a convolutional neural network depth map prediction module;
the second type of monocular depth estimation model comprises a convolutional neural network disparity map prediction module, a binocular image reconstruction module and a depth map generation module.
Preferably, the performing depth image conversion processing on the first scene image data by using a well-trained depth estimation model based on an optical image to generate corresponding first depth image data specifically includes:
when the depth estimation model based on the optical image is the first type monocular depth estimation model, acquiring an external calibration parameter M1 and an internal calibration parameter K1 of the current camera, and inputting M1 and K1 into the convolutional neural network depth map prediction module as calculation factors;
and inputting the first scene image data into the convolutional neural network depth map prediction module to perform depth map prediction operation, and generating corresponding first depth image data.
Preferably, the performing depth image conversion processing on the first scene image data by using a well-trained depth estimation model based on an optical image to generate corresponding first depth image data specifically includes:
when the used depth estimation model based on the optical image is the second type monocular depth estimation model, inputting the first scene image data into the convolutional neural network disparity map prediction module for disparity map prediction operation, and generating second left-to-right disparity map data and second right-to-left disparity map data; the second left-to-right disparity map data and the second right-to-left disparity map data both comprise a plurality of disparity map pixels, and pixel values of the disparity map pixels and corresponding disparity values are in a linear relationship;
inputting the second left-to-right disparity map data into the depth map generation module, and generating the first depth image data according to the depth conversion relation Z = f·B/d; wherein f is the focal length of the left and right cameras used in model training, B is the optical center distance of the left and right cameras used in model training, d is the disparity value, and Z is the depth value; the second pixel data of each second pixel point data of the first depth image data is in a linear relationship with the corresponding depth value.
Preferably, the effective point cloud data marking processing on the first matching point cloud data corresponding to each second pixel point data specifically includes:
when the scanning mode is a single echo mode, according to the first pixel point depth data corresponding to each second pixel point data and the first point cloud depth data of the corresponding first matching point cloud data, performing depth difference value calculation to generate a first depth difference value; if the first depth difference data conforms to a preset difference threshold range, marking the first matching point cloud data corresponding to the current second pixel point data as the effective point cloud data;
when the scanning mode is a double-echo mode, selecting the smaller first point cloud depth data as corresponding second matching point cloud data from 2 first matching point cloud data corresponding to each second pixel point data; performing depth difference calculation according to the first pixel point depth data corresponding to each second pixel point data and the first point cloud depth data of the corresponding second matching point cloud data to generate a second depth difference; and if the second depth difference data conforms to the difference threshold range, marking the second matching point cloud data corresponding to the current second pixel point data as the effective point cloud data.
A second aspect of an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;
the processor is configured to be coupled to the memory, read and execute instructions in the memory, so as to implement the method steps of the first aspect;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
A third aspect of embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.
The embodiment of the invention provides a method for filtering a laser radar point cloud based on depth estimation of a camera, electronic equipment and a computer readable storage medium, which are used for carrying out depth image conversion on image data shot by the camera, positioning a matching point cloud corresponding to a depth image pixel point in a point cloud set, carrying out effective point cloud marking on the matching point cloud according to a depth difference value of the depth image pixel point and the matching point cloud, and finally completing false point removing operation on the whole point cloud set in a manner of filtering invalid point clouds in the point cloud set. By the method, invalid information in the point cloud set can be removed to the maximum extent, the point cloud identification rate is improved, and the point cloud calculation amount is reduced.
Drawings
Fig. 1 is a schematic diagram of a method for performing laser radar point cloud filtering based on camera depth estimation according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for performing laser radar point cloud filtering based on depth estimation of a camera, as shown in fig. 1, which is a schematic diagram of the method for performing laser radar point cloud filtering based on depth estimation of a camera according to an embodiment of the present invention, the method mainly includes the following steps:
step 1, acquiring a first point cloud data set generated by scanning a first specified environment by a laser radar at a first time T;
wherein the first point cloud data set comprises a plurality of first point cloud data; the first point cloud data includes first point cloud depth data.
Here, the laser radar is mounted on the body of an autonomous unmanned vehicle; besides depth data, the point cloud data also comprises width data and height data, which are the width and height coordinates of the corresponding point in the three-dimensional point cloud coordinate system; the laser radar in the embodiment of the invention supports two point cloud scanning modes, single echo and double echo, where the single echo mode means that only one depth datum exists at a position with the same width and height, and the double echo mode means that up to two depth data, one near and one far, may appear at a position with the same width and height.
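As an illustration of the data involved in this step, the following is a minimal Python sketch of one possible representation of such point cloud data; the field names and the list-based container are assumptions made for illustration and are not prescribed by the embodiment.

```python
# Hypothetical representation of a single lidar return as described above:
# width/height coordinates in the point cloud frame plus the point cloud depth data.
# The `valid` flag is only used by the later valid-point marking step (step 6).
from dataclasses import dataclass
from typing import List

@dataclass
class PointCloudPoint:
    width: float          # width coordinate in the point cloud coordinate system
    height: float         # height coordinate in the point cloud coordinate system
    depth: float          # first point cloud depth data
    valid: bool = False   # set to True when marked as valid point cloud data

# A first point cloud data set is then simply a collection of such points.
FirstPointCloudDataSet = List[PointCloudPoint]
```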
Step 2, acquiring first scene image data generated by shooting a first specified environment by a camera at a first time T;
wherein the first scene image data comprises a plurality of first pixel point data; the first pixel point data includes first pixel data.
Here, the first scene image data is an ordinary color image without depth information that reflects the actual picture of the shot scene, and its pixels correspond one-to-one to the pixels of that actual picture; it should be noted that the camera in this step is mounted on the same vehicle body as the laser radar in step 1, and the two operate synchronously; the shooting scene of the camera is the same scene as the scanning scene of the laser radar in step 1, so of the first point cloud data set generated in step 1 and the first scene image data generated in step 2, one reflects the three-dimensional spatial pattern of the scene relative to the vehicle body and the other is a frontal two-dimensional image of the scene relative to the vehicle body.
Step 3, performing depth image conversion processing on the first scene image data by using a well-trained depth estimation model based on the optical image to generate corresponding first depth image data;
wherein the first depth image data comprises a plurality of second pixel point data; the second pixel point data includes second pixel data; the resolution of the first depth image data is consistent with the resolution of the first scene image data; the second pixel point data corresponds to the image coordinates of the first pixel point data one to one.
Here, a depth image (also called a range image) refers to an image whose pixel values are the depths (distances) from the camera to the points in the scene; the depth estimation model based on the optical image converts an ordinary two-dimensional image, namely the first scene image data, into a corresponding two-dimensional depth map, namely the first depth image data, and by the depth map principle the pixel value of each pixel point, that is, of each second pixel point data, is converted from a predicted depth value. The depth map conversion performed by the depth estimation model does not change the image resolution, so the resolution of the first depth image data is consistent with that of the first scene image data, and each second pixel point data in the first depth image can find the first pixel point data with the corresponding coordinate in the first scene image data.
It should be noted that, in the embodiment of the present invention, the essence of the depth estimation model based on the optical image is a monocular depth estimation model, that is, a model for performing depth estimation based on shot data of one camera;
the monocular depth estimation model provided by the embodiment of the invention is different in two types based on the learning principle: a first type monocular depth estimation model and a second type monocular depth estimation model; the first type of monocular depth estimation model comprises a convolutional neural network depth map prediction module; the second type of monocular depth estimation model comprises a convolutional neural network disparity map prediction module, a binocular image reconstruction module and a depth map generation module;
for the first type of monocular depth estimation model, the model structure is mainly composed of a convolutional neural network for depth map prediction, and the learning principle of the model is the known camera imaging principle, namely the pinhole imaging principle, so that the external and internal calibration parameters of a camera are required to be used for calculation in the operation process;
for the second type of monocular depth estimation model, the model structure of the model consists of a convolutional neural network for performing disparity map prediction, a binocular image reconstruction module for image reconstruction and a depth map generation module for generating a depth map, and the learning principle of the model is the known binocular disparity depth estimation principle.
Here, before the well-trained optical image-based depth estimation model can be used, the method of the embodiment of the present invention further includes training the two types of models.
(I) Performing model training on the first type monocular depth estimation model, which specifically comprises the following steps:
step A1, calibrating the internal and external parameters of the camera to obtain an external calibration parameter M and an internal calibration parameter K, and inputting the external calibration parameter M and the internal calibration parameter K into the convolutional neural network depth map prediction module as calculation factors;
step A2, shooting continuously with the camera while traveling through the same scene, generating three consecutive images at time points t-1, t and t+1: image data at time t-1, image data at time t and image data at time t+1;
step A3, inputting the image data at time t-1, time t and time t+1 into the convolutional neural network depth map prediction module for depth map prediction operation, and generating corresponding depth map data at time t-1, time t and time t+1;
step A4, performing depth map loss evaluation on the depth map data at time t-1, time t and time t+1 according to the mapping relation between the depth map coordinates and K·M, and back-modulating the convolutional neural network depth map prediction module according to the depth map loss evaluation result;
step A5, if the depth map loss evaluation result has converged to the preset optimal error range, the training of the convolutional neural network depth map prediction module is complete.
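The embodiment does not spell out the concrete form of the depth map loss used in steps A4 and A5. The Python sketch below shows one common way such a loss could be evaluated under the pinhole imaging principle, assuming the relative camera pose between two consecutive frames is available as a 4x4 matrix T_rel (for example derived from the calibration parameters and the vehicle motion); the function name, the nearest-neighbour sampling and the plain L1 error are illustrative simplifications, not the embodiment's prescribed loss.

```python
# Minimal sketch: warp a source frame into frame t using the predicted depth of
# frame t, the intrinsic matrix K and an assumed relative pose T_rel, then measure
# the photometric (L1) difference. A lower loss indicates a more consistent depth map.
import numpy as np

def photometric_loss(img_t, img_src, depth_t, K, T_rel):
    h, w = depth_t.shape
    K_inv = np.linalg.inv(K)

    # Homogeneous pixel grid of frame t, shape (3, h*w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])

    # Back-project to 3D points of frame t, then transform into the source frame.
    cam_t = (K_inv @ pix) * depth_t.ravel()
    cam_src = T_rel[:3, :3] @ cam_t + T_rel[:3, 3:4]

    # Project into the source image plane (nearest-neighbour rounding for brevity).
    proj = K @ cam_src
    z = np.clip(proj[2], 1e-6, None)
    u_src = np.round(proj[0] / z).astype(int)
    v_src = np.round(proj[1] / z).astype(int)

    inside = (u_src >= 0) & (u_src < w) & (v_src >= 0) & (v_src < h)
    src_flat = img_src.reshape(h * w, -1).astype(float)
    tgt_flat = img_t.reshape(h * w, -1).astype(float)
    warped = src_flat[v_src[inside] * w + u_src[inside]]

    # Mean absolute photometric error over the pixels that land inside the source image.
    return np.abs(warped - tgt_flat[inside]).mean()
```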
(II) Performing model training on the convolutional neural network disparity map prediction module of the second type monocular depth estimation model, which specifically comprises the following steps:
step B1, using left and right cameras to shoot the same specified environment, with the shooting center line coinciding with the vertical center line of the line connecting the left and right cameras, and generating first left camera image data and first right camera image data;
Here, the resolutions of the first left camera image data and the first right camera image data are consistent by default;
step B2, inputting the first left camera image data into the convolutional neural network disparity map prediction module for disparity map prediction operation, and generating first left-to-right disparity map data and first right-to-left disparity map data;
Here, a disparity map reflects the positional deviation between the pixels of the same scene as imaged by the two cameras, which generally appears in the horizontal direction because the two cameras of a binocular pair are generally arranged horizontally; the left-to-right disparity map is the disparity map of the left camera image relative to the right camera image, and the right-to-left disparity map is the disparity map of the right camera image relative to the left camera image; the pixel value of each pixel point of a disparity map is converted from the left-to-right or right-to-left disparity value of the corresponding point; as can be seen from the disparity map principle, the resolutions of the first left-to-right disparity map data and the first right-to-left disparity map data are consistent with those of the first left camera image data and the first right camera image data;
step B3, inputting the first left camera image data and the first right-to-left disparity map data into the binocular image reconstruction module for right eye image reconstruction processing to generate first right eye reconstructed image data; and inputting the first right camera image data and the first left-to-right disparity map data into the binocular image reconstruction module for left eye image reconstruction processing to generate first left eye reconstructed image data;
Here, the left eye and right eye images are reconstructed because the subsequent step compares them with the real left and right eye images, namely the first left camera image data and the first right camera image data: the closer a reconstructed image is to the real image, the more accurate the disparity maps generated by the convolutional neural network disparity map prediction module; the binocular image reconstruction module is only used for model training of the convolutional neural network disparity map prediction module;
step B4, performing right eye image loss evaluation on the first right eye reconstructed image data using the first right camera image data, performing left eye image loss evaluation on the first left eye reconstructed image data using the first left camera image data, and back-modulating the convolutional neural network disparity map prediction module according to the comprehensive evaluation result of the left eye and right eye images;
and step B5, if the comprehensive evaluation result of the left eye and right eye images converges to the preset optimal error range, the training of the convolutional neural network disparity map prediction module is complete.
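As an illustration of steps B3 and B4, the following Python sketch reconstructs the right eye image from the left camera image and the right-to-left disparity map and evaluates a simple L1 reconstruction loss. The horizontal sampling direction, the nearest-neighbour lookup and the plain L1 error are assumptions made for brevity; a practical binocular image reconstruction module would use differentiable bilinear sampling and richer loss terms.

```python
# Minimal sketch of right-eye image reconstruction from the left image plus a
# right-to-left disparity map, and of the image loss that drives back-modulation.
import numpy as np

def reconstruct_right_from_left(img_left, disp_right_to_left):
    """Rebuild the right-eye image by horizontally shifting left-image pixels."""
    h, w = disp_right_to_left.shape
    recon = np.zeros_like(img_left)
    cols = np.arange(w)
    for row in range(h):
        # For each right-image pixel x, sample the left image at x shifted by the disparity.
        src_cols = np.clip(np.round(cols + disp_right_to_left[row]).astype(int), 0, w - 1)
        recon[row] = img_left[row, src_cols]
    return recon

def reconstruction_loss(img_real, img_recon):
    """Simple L1 image loss between the real and the reconstructed eye image."""
    return np.abs(img_real.astype(float) - img_recon.astype(float)).mean()
```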
After the model training is completed, the embodiment of the present invention can perform depth image conversion processing on the first scene image data to generate the corresponding first depth image data by using either a well-trained first type monocular depth estimation model or a well-trained second type monocular depth estimation model; the specific operation steps are as follows.
(i) when the optical image-based depth estimation model used is the first type monocular depth estimation model,
step C1, obtaining the external calibration parameter M1 and the internal calibration parameter K1 of the current camera, and inputting M1 and K1 into the convolutional neural network depth map prediction module as calculation factors;
and step C2, inputting the first scene image data into a convolutional neural network depth map prediction module to perform depth map prediction operation, and generating corresponding first depth image data.
(ii) when the optical image-based depth estimation model used is a second type monocular depth estimation model,
step D1, inputting the first scene image data into a disparity map prediction module of a convolutional neural network for disparity map prediction operation, and generating second left-to-right disparity map data and second right-to-left disparity map data;
the second left-to-right parallax image data and the second right-to-left parallax image data comprise a plurality of parallax image pixel points, and pixel values of the parallax image pixel points and corresponding parallax values are in a linear relation;
here, the embodiment of the present invention defaults to taking the first scene image data as the left eye camera image;
step D2, inputting the second left-to-right disparity map data into the depth map generation module, and generating the depth map data according to the input result
Figure BDA0003122510730000101
Calculating the depth conversion relation to generate first depth image data;
wherein f is the focal length of the left camera and the right camera used in the model training, B is the optical center distance of the left camera and the right camera used in the model training, d is a parallax value, and Z is a depth value; the second pixel data of each second pixel point data of the first depth image data is in a linear relationship with the corresponding depth value.
Here, the depth map generation module polls all disparity map pixel points of the second left-to-right disparity map data in sequence; for each pixel polled, the pixel value of the current disparity map pixel point (pixel value 1) is first converted into the corresponding disparity value d, the depth value Z is then calculated from Z = f·B/d, another pixel value (pixel value 2) is obtained according to the conversion relation between depth values and pixel values, the pixel point (second pixel point) in the first depth image data whose two-dimensional image coordinate coincides with that of the current disparity map pixel point is located, and the pixel value of that second pixel point is set to pixel value 2; this is repeated until the polling is finished and the setting of the first depth image data is complete.
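A minimal Python sketch of this conversion is given below; it processes the whole disparity map at once rather than polling pixel by pixel, and it assumes linear mappings between disparity map pixel values and disparity values and between depth values and depth image pixel values. The two scale factors are illustrative assumptions, since the embodiment only states that those relationships are linear.

```python
# Sketch: convert second left-to-right disparity map pixel values into first depth
# image pixel values via the depth conversion relation Z = f*B/d.
import numpy as np

PIXEL_TO_DISPARITY = 0.25   # assumed: disparity d = pixel value 1 * 0.25 (in pixels)
DEPTH_TO_PIXEL = 2.0        # assumed: pixel value 2 = depth Z * 2.0

def disparity_map_to_depth_image(disp_map_pixels, f, B):
    d = disp_map_pixels.astype(float) * PIXEL_TO_DISPARITY   # pixel value 1 -> disparity d
    d = np.clip(d, 1e-6, None)                                # guard against division by zero
    Z = f * B / d                                             # depth conversion relation
    return Z * DEPTH_TO_PIXEL                                 # depth Z -> pixel value 2
```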
Step 4, performing depth data conversion processing on the second pixel data of each second pixel point data according to the corresponding relation between the pixel data of the depth image and the depth data, and generating corresponding first pixel point depth data.
Here, the depth data conversion is in fact the inverse of the depth-value-to-pixel-value conversion above; its purpose is to recover, for each pixel point in the first depth image data, the corresponding estimated depth value.
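Under the same assumed linear mapping as in the previous sketch, this inverse conversion is simply a rescaling:

```python
# Inverse of the depth-to-pixel-value mapping assumed above: recover the first pixel
# point depth data from the second pixel data of the first depth image.
def depth_pixel_to_depth(depth_image_pixels):
    return depth_image_pixels.astype(float) / DEPTH_TO_PIXEL
```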
Step 5, according to the corresponding relation between the depth image coordinate system and the point cloud coordinate system, marking the first point cloud data matched with each second pixel point in the first point cloud data set as corresponding first matched point cloud data;
if the preset scanning mode of the laser radar is a single-echo mode, the maximum number of the first matching point cloud data corresponding to each second pixel point data is 1; and if the scanning mode is a double echo mode, the maximum number of the first matching point cloud data corresponding to each second pixel point data is 2.
The two-dimensional image coordinate system and the point cloud coordinate system are different coordinate systems, but there is a fixed two-dimensional coordinate conversion between them: the X/Y axes of the two-dimensional image coordinates can be converted to the width and height coordinates of the point cloud coordinates, so the two-dimensional coordinate of each pixel point of the two-dimensional image can be mapped to one column or one row in the point cloud space;
based on the characteristics of radar scanning, for a single echo mode, only one depth data exists at a scanning position with the same width and height, that is, only one point cloud data exists in one row or one column of the point cloud space which is mapped with the two-dimensional coordinates of the pixel points, and the point cloud data is used as matching point cloud data of the corresponding pixel points in the embodiment of the invention; for the double-echo mode, at most two depth data exist at the same scanning position with width and height, that is, at most two point cloud data exist in one row or one column in the point cloud space which is mapped with the two-dimensional coordinates of the pixel points.
Step 6, carrying out effective point cloud data marking processing on the first matching point cloud data corresponding to each second pixel point data;
the method specifically comprises the following steps: step 61, when the scanning mode is a single echo mode, according to the first pixel point depth data corresponding to each second pixel point data and the first point cloud depth data of the corresponding first matching point cloud data, performing depth difference calculation to generate a first depth difference; if the first depth difference data accords with a preset difference threshold range, marking first matching point cloud data corresponding to the current second pixel point data as effective point cloud data;
here, when the scanning mode is the single echo mode, it means that each second pixel point data corresponds to only one first matching point cloud data, and the rule for marking the valid point cloud data according to the embodiment of the present invention is as follows: if the depth data of the current first matching point cloud data is consistent with the estimated depth (first pixel point depth data) of the current second pixel point data or the deviation is not large (the difference value is within the range of a preset difference value threshold), the current first matching point cloud data is considered as the visible real point cloud data in the camera shooting image and is marked as effective point cloud data;
step 62, when the scanning mode is the double-echo mode, selecting the smaller first point cloud depth data as the corresponding second matching point cloud data from the 2 first matching point cloud data corresponding to each second pixel point data; according to the depth data of the first pixel points corresponding to the second pixel point data and the depth data of the first point cloud of the corresponding second matching point cloud data, depth difference calculation is carried out to generate a second depth difference; and if the second depth difference data conforms to the difference threshold range, marking the second matching point cloud data corresponding to the current second pixel point data as effective point cloud data.
Here, when the scanning mode is the double echo mode, each second pixel point data may correspond to at most two first matching point cloud data, and the rule for marking valid point cloud data in the embodiment of the present invention is as follows: first, the farther of the two first matching point cloud data is treated as an invalid point cloud by default, and only the nearer first matching point cloud data is processed; then, as in step 61, if the depth data of the nearer first matching point cloud data is consistent with the estimated depth of the current second pixel point data (the first pixel point depth data), or deviates from it only slightly (the difference is within the preset difference threshold range), the nearer first matching point cloud data is considered to be real point cloud data visible in the camera image and is marked as effective point cloud data.
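The marking rules of steps 61 and 62 can be summarized by the following sketch, which reuses the structures from the previous sketches; the concrete threshold value is an assumption, since the embodiment only requires a preset difference threshold range.

```python
# Sketch: mark a matching point as valid when its depth agrees with the estimated
# pixel depth within the threshold; for double echo, only the nearer return is tested.
DIFF_THRESHOLD = 0.5  # metres, assumed value of the preset difference threshold range

def mark_valid_points(matches, pixel_depth):
    """matches: output of match_points_to_pixels; pixel_depth: {(u, v): estimated depth}."""
    for uv, pts in matches.items():
        if not pts or uv not in pixel_depth:
            continue
        candidate = min(pts, key=lambda p: p.depth)   # single echo: the only point;
                                                      # double echo: the nearer return
        if abs(pixel_depth[uv] - candidate.depth) <= DIFF_THRESHOLD:
            candidate.valid = True
```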
Step 7, filtering out the first point cloud data which are not marked as effective point cloud data in the first point cloud data set.
Here, step 6 only marks the valid point cloud data; the remaining unmarked point clouds are treated as invalid by default, and all invalid point clouds in the first point cloud data set are deleted in this step.
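Step 7 then reduces to keeping only the marked points; a one-line sketch consistent with the representation assumed above:

```python
# Sketch: delete all point cloud data not marked as valid in step 6.
def filter_point_cloud(points):
    return [p for p in points if p.valid]
```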
In summary, it can be seen from the operations of steps 1 to 7 that the embodiment of the present invention in effect assumes that the points visible in the image captured by the camera are the valid points, distinguishes valid from invalid point cloud data in the point cloud set accordingly, and finally completes the optimization of the point cloud data set by deleting the invalid point cloud data, thereby reducing the error rate of the point cloud data and improving its accuracy. For point cloud data in the double echo mode, the data volume after the optimization of steps 1 to 6 can be reduced to less than half of the original, and if the resolution of the camera is lower than that of the laser radar the data volume can be reduced even further, so that subsequent point cloud computation takes much less time and computation efficiency is improved.
Fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention. The electronic device may be the terminal device or the server, or may be a terminal device or a server connected to the terminal device or the server and implementing the method according to the embodiment of the present invention. As shown in fig. 2, the electronic device may include: a processor 301 (e.g., a CPU), a memory 302, a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transceiving operation of the transceiver 303. Various instructions may be stored in memory 302 for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device according to an embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripherals.
The system bus 305 mentioned in fig. 2 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The Memory may include a Random Access Memory (RAM) and may also include a Non-Volatile Memory (Non-Volatile Memory), such as at least one disk Memory.
The Processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It should be noted that the embodiment of the present invention also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the method and the processing procedure provided in the above-mentioned embodiment.
The embodiment of the present invention further provides a chip for executing the instructions, where the chip is configured to execute the processing steps described in the foregoing method embodiment.
The embodiment of the invention provides a method for filtering a laser radar point cloud based on depth estimation of a camera, electronic equipment and a computer readable storage medium, which are used for carrying out depth image conversion on image data shot by the camera, positioning a matching point cloud corresponding to a depth image pixel point in a point cloud set, carrying out effective point cloud marking on the matching point cloud according to a depth difference value of the depth image pixel point and the matching point cloud, and finally completing false point removing operation on the whole point cloud set in a manner of filtering invalid point clouds in the point cloud set. By the method, invalid information in the point cloud set can be removed to the maximum extent, the point cloud identification rate is improved, and the point cloud calculation amount is reduced.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A method for lidar point cloud filtering based on camera depth estimation, the method comprising:
acquiring a first point cloud data set generated by scanning a first specified environment by a laser radar at a first time T; the first set of point cloud data comprises a plurality of first point cloud data; the first point cloud data comprises first point cloud depth data;
acquiring first scene image data generated by shooting the first specified environment by a camera at the first moment T; the first scene image data comprises a plurality of first pixel point data; the first pixel point data includes first pixel data;
performing depth image conversion processing on the first scene image data by using a well-trained depth estimation model based on an optical image to generate corresponding first depth image data; the first depth image data comprises a plurality of second pixel point data; the second pixel point data includes second pixel data; a resolution of the first depth image data is consistent with a resolution of the first scene image data; the second pixel point data corresponds to the image coordinates of the first pixel point data one by one;
according to the corresponding relation between the pixel data of the depth image and the depth data, performing depth data conversion processing on the second pixel data of each second pixel point data to generate corresponding first pixel point depth data;
according to the corresponding relation between the depth image coordinate system and the point cloud coordinate system, marking the first point cloud data matched with each second pixel point in the first point cloud data set as corresponding first matched point cloud data; if the preset scanning mode of the laser radar is a single-echo mode, the maximum number of the first matching point cloud data corresponding to each second pixel point data is 1; if the scanning mode is a double echo mode, the maximum number of the first matching point cloud data corresponding to each second pixel point data is 2;
effective point cloud data marking processing is carried out on the first matching point cloud data corresponding to each second pixel point data;
filtering out the first point cloud data which is not marked as the valid point cloud data in the first point cloud data set.
2. The method for lidar point cloud filtering based on camera depth estimation of claim 1,
the depth estimation model based on the optical image is a monocular depth estimation model;
the monocular depth estimation model comprises a first type monocular depth estimation model and a second type monocular depth estimation model;
the first type of monocular depth estimation model comprises a convolutional neural network depth map prediction module;
the second type of monocular depth estimation model comprises a convolutional neural network disparity map prediction module, a binocular image reconstruction module and a depth map generation module.
3. The method of claim 2, wherein the performing depth image conversion processing on the first scene image data using a well-trained depth estimation model based on optical images to generate corresponding first depth image data comprises:
when the depth estimation model based on the optical image is the first type monocular depth estimation model, acquiring an external calibration parameter M1 and an internal calibration parameter K1 of the current camera, and inputting M1 and K1 into the convolutional neural network depth map prediction module as calculation factors;
and inputting the first scene image data into the convolutional neural network depth map prediction module to perform depth map prediction operation, and generating corresponding first depth image data.
4. The method of claim 2, wherein the performing depth image conversion processing on the first scene image data using a well-trained depth estimation model based on optical images to generate corresponding first depth image data comprises:
when the used depth estimation model based on the optical image is the second type monocular depth estimation model, inputting the first scene image data into the convolutional neural network disparity map prediction module for disparity map prediction operation, and generating second left-to-right disparity map data and second right-to-left disparity map data; the second left-to-right disparity map data and the second right-to-left disparity map data both comprise a plurality of disparity map pixels, and pixel values of the disparity map pixels and corresponding disparity values are in a linear relationship;
inputting the second left-to-right disparity map data into the depth map generation module, and generating the first depth image data according to the depth conversion relation Z = f·B/d; wherein f is the focal length of the left and right cameras used in model training, B is the optical center distance of the left and right cameras used in model training, d is the disparity value, and Z is the depth value; the second pixel data of each second pixel point data of the first depth image data is in a linear relationship with the corresponding depth value.
5. The method of claim 1, wherein the effective point cloud data labeling of the first matching point cloud data corresponding to each of the second pixel point data comprises:
when the scanning mode is a single echo mode, according to the first pixel point depth data corresponding to each second pixel point data and the first point cloud depth data of the corresponding first matching point cloud data, performing depth difference value calculation to generate a first depth difference value; if the first depth difference data conforms to a preset difference threshold range, marking the first matching point cloud data corresponding to the current second pixel point data as the effective point cloud data;
when the scanning mode is a double-echo mode, selecting the smaller first point cloud depth data as corresponding second matching point cloud data from 2 first matching point cloud data corresponding to each second pixel point data; performing depth difference calculation according to the first pixel point depth data corresponding to each second pixel point data and the first point cloud depth data of the corresponding second matching point cloud data to generate a second depth difference; and if the second depth difference data conforms to the difference threshold range, marking the second matching point cloud data corresponding to the current second pixel point data as the effective point cloud data.
6. An electronic device, comprising: a memory, a processor, and a transceiver;
the processor is used for being coupled with the memory, reading and executing the instructions in the memory to realize the method steps of any one of the claims 1-5;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
7. A computer-readable storage medium having stored thereon computer instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-5.
CN202110681027.1A 2021-06-18 2021-06-18 Method for filtering laser radar point cloud based on depth estimation of camera Withdrawn CN113409376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110681027.1A CN113409376A (en) 2021-06-18 2021-06-18 Method for filtering laser radar point cloud based on depth estimation of camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110681027.1A CN113409376A (en) 2021-06-18 2021-06-18 Method for filtering laser radar point cloud based on depth estimation of camera

Publications (1)

Publication Number Publication Date
CN113409376A true CN113409376A (en) 2021-09-17

Family

ID=77681705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110681027.1A Withdrawn CN113409376A (en) 2021-06-18 2021-06-18 Method for filtering laser radar point cloud based on depth estimation of camera

Country Status (1)

Country Link
CN (1) CN113409376A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113960572A (en) * 2021-10-20 2022-01-21 北京轻舟智航科技有限公司 Processing method and device for filtering noise point cloud of underground lamp
CN113960572B (en) * 2021-10-20 2024-05-03 北京轻舟智航科技有限公司 Processing method and device for filtering noise point cloud of buried lamp
WO2024183697A1 (en) * 2023-03-06 2024-09-12 先临三维科技股份有限公司 Storage method and apparatus for point cloud data, and device and medium

Similar Documents

Publication Publication Date Title
CN111223135B (en) System and method for enhancing range estimation by monocular cameras using radar and motion data
CN111553859A (en) Laser radar point cloud reflection intensity completion method and system
WO2021016854A1 (en) Calibration method and device, movable platform, and storage medium
CN115797454B (en) Multi-camera fusion sensing method and device under bird's eye view angle
WO2020237516A1 (en) Point cloud processing method, device, and computer readable storage medium
CN113409376A (en) Method for filtering laser radar point cloud based on depth estimation of camera
CN113111513B (en) Sensor configuration scheme determining method and device, computer equipment and storage medium
CN115147328A (en) Three-dimensional target detection method and device
CN113420637A (en) Laser radar detection method under multi-scale aerial view angle in automatic driving
CN115147333A (en) Target detection method and device
CN116740681B (en) Target detection method, device, vehicle and storage medium
WO2024045942A1 (en) Ambient information sensing method, apparatus, and system, computer device, and storage medium
CN117250956A (en) Mobile robot obstacle avoidance method and obstacle avoidance device with multiple observation sources fused
CN115546216B (en) Tray detection method, device, equipment and storage medium
CN116343165A (en) 3D target detection system, method, terminal equipment and storage medium
CN115601275A (en) Point cloud augmentation method and device, computer readable storage medium and terminal equipment
CN116580163B (en) Three-dimensional scene reconstruction method, electronic equipment and storage medium
CN115457284A (en) Processing method for filtering laser point cloud
CN114913213B (en) Method and device for learning aerial view characteristics
CN112329678B (en) Monocular pedestrian 3D positioning method based on information fusion
Berrio et al. Semantic sensor fusion: From camera to sparse LiDAR information
CN114897968B (en) Method and device for determining vehicle vision, computer equipment and storage medium
CN115147612B (en) Processing method for estimating vehicle size in real time based on accumulated point cloud
US20230386062A1 (en) Method for training depth estimation model, method for estimating depth, and electronic device
CN118191873A (en) Multi-sensor fusion ranging system and method based on light field image

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 20210917)