CN112233163A - Depth estimation method and device for laser radar stereo camera fusion and medium thereof - Google Patents
Depth estimation method and device for laser radar stereo camera fusion and medium thereof
- Publication number
- CN112233163A (application number CN202011464746.XA)
- Authority
- CN
- China
- Prior art keywords
- radar
- image
- left image
- right image
- current frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/521—Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a depth estimation method, device, and medium for laser radar and stereo camera fusion, wherein the method comprises the following steps: acquiring a current frame left image and a current frame right image of a stereo camera; acquiring a radar left image and a radar right image; fusing the current frame left image with the radar left image to obtain a first left image; fusing the current frame right image with the radar right image to obtain a first right image; inputting the first left image into a binary neural network for feature extraction and aggregating to obtain a first feature left image; inputting the first right image into a binary neural network for feature extraction and aggregating to obtain a first feature right image; acquiring the initial matching cost between the first feature left image and the first feature right image; optimizing the initial matching cost and extracting a disparity map based on cross-based radar trust aggregation and a semi-global stereo matching algorithm; and performing depth estimation according to the disparity map. The method can obtain accurate and reliable depth prediction and can be widely applied in the technical field of image processing.
Description
Technical Field
The invention relates to the field of image processing and computer vision, and in particular to a depth estimation method and device for laser radar and stereo camera fusion, and a medium thereof.
Background
The laser radar is one of the important sensors for environment perception in mobile robots and autonomous driving, and is well suited to perceiving complex traffic environments. The depth measurements it provides are highly accurate, but its resolution is low: the resulting depth map is very sparse and small targets are easily missed. Binocular stereo vision is an important branch of computer vision and is widely applied in unmanned driving technology, but because it is strongly affected by environmental factors such as field of view and illumination, the accuracy of the depth map it produces is low. Existing methods based on deep neural networks cannot meet the requirement of real-time and accurate depth estimation, and a suitable solution for fusing radar measurements with a stereo matching algorithm is lacking.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a depth estimation method and device for laser radar stereo camera fusion and a medium thereof.
The technical scheme adopted by the invention is as follows:
in one aspect, an embodiment of the present invention includes a depth estimation method for laser radar stereo camera fusion, including:
acquiring a current frame left image and a current frame right image of a stereo camera;
acquiring a radar left image and a radar right image, wherein the radar left image and the current frame left image correspond to images of the same part of the same object, and the radar right image and the current frame right image correspond to images of the same part of the same object;
fusing the current frame left image and the radar left image to obtain a first left image;
fusing the current frame right image and the radar right image to obtain a first right image;
inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image;
inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
acquiring an initial matching cost between the first characteristic left image and the first characteristic right image;
optimizing the initial matching cost and extracting a disparity map based on cross-based radar trust aggregation and a semi-global stereo matching algorithm;
and performing depth estimation according to the disparity map.
Further, the method further comprises:
and simultaneously shooting calibration objects in different postures and different positions by using the stereo camera and the laser radar.
Further, the stereo camera includes a left camera and a right camera, and after acquiring a current frame left image and a current frame right image of the stereo camera, the method further includes:
carrying out deformation correction on the current frame left image according to the distortion parameter of the left camera;
and carrying out deformation correction on the right image of the current frame according to the distortion parameter of the right camera.
Further, the step of acquiring a radar left image and a radar right image specifically includes:
acquiring a mapping chart shot by the laser radar;
compressing the map and dividing the map into a radar left image and a radar right image.
Further, the fusing the current frame left image and the radar left image to obtain a first left image specifically includes:
and fusing the current frame left image and the radar left image along a fusion channel according to the image size to obtain a first left image.
Further, the fusing the current frame right image and the radar right image to obtain a first right image specifically includes:
and fusing the current frame right image and the radar right image along a fusion channel according to the image size to obtain a first right image.
Further, the step of obtaining an initial matching cost between the first feature left image and the first feature right image specifically includes:
calculating a similarity measure between the first feature left image and the first feature right image by a weighted hamming distance method;
and acquiring an initial matching cost between the first characteristic left image and the first characteristic right image according to the similarity measurement.
Further, the step of optimizing the initial matching cost based on cross-based radar trust aggregation and a semi-global stereo matching algorithm includes:
determining a first target point in the radar left image, and drawing a cross-shaped graph through the first target point, wherein the first target point is any effective point in the radar left image, and the effective point is a point of which the point value is greater than zero;
acquiring a first distance through a first formula, wherein the first distance is the longest distance extending from a second target point in the vertical direction or the horizontal direction, and the second target point is the point in the current frame left image corresponding to the first target point; the first formula is: r* = max_{r∈[1,R]} ( r · ∏_{i∈[1,r]} δ(p, p_i) ), where r* represents the first distance, p represents the coordinates of the second target point, p_i represents the coordinates of the i-th point in the vertical or horizontal direction from the second target point, R is the maximum search range, and δ(p, p_i) is an indicator function indicating whether the difference in pixel intensity between coordinates p and p_i is less than a threshold; wherein δ is calculated by a second formula: δ(p_1, p_2) = 1 if |I(p_1) − I(p_2)| ≤ τ, and δ(p_1, p_2) = 0 otherwise; in the formula, I(p_1) represents the pixel intensity at coordinate p_1, I(p_2) represents the pixel intensity at coordinate p_2, |I(p_1) − I(p_2)| represents the absolute difference between the pixel intensities at coordinates p_1 and p_2, and τ represents the threshold on the pixel intensity difference;
optimizing the initial matching cost through a third formula according to the first distance r*, wherein the third formula is: C*(q, d) = 0 if |q − p| ≤ r*, δ(p, q) = 1 and d = d_p, and C*(q, d) = C(q, d) otherwise; in the formula, q represents a point coordinate, d_p represents the disparity at coordinate p in the radar left image, the coordinate p of the first target point is numerically identical to the coordinate of the second target point, |q − p| represents the distance between point coordinate q and coordinate p in the vertical or horizontal direction, δ(p, q) is the indicator function indicating whether the difference in pixel intensity between coordinate p and point coordinate q is less than the threshold, C*(q, d) represents the optimized matching cost of point coordinate q at disparity d, and C(q, d) represents the initial matching cost of point coordinate q at disparity d obtained by the weighted Hamming distance method;
and extracting the disparity map by a semi-global stereo matching algorithm according to the optimized matching cost.
On the other hand, the embodiment of the invention also comprises a depth estimation device for the fusion of the laser radar stereo camera, which comprises the following steps:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the depth estimation method.
In another aspect, the embodiments of the present invention further include a computer-readable storage medium on which a program executable by a processor is stored, the program executable by the processor being used for implementing the depth estimation method when being executed by the processor.
The invention has the beneficial effects that:
(1) by effectively fusing the laser radar and the stereo camera, accurate and reliable depth prediction can be obtained;
(2) by utilizing the binary neural network to extract the features of the two images simultaneously, accuracy is ensured while the speed is greatly improved;
(3) by means of cross-based radar trust aggregation, the depth information obtained by laser radar shooting is utilized to the maximum extent, thereby achieving a good improvement in accuracy.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating steps of a method for depth estimation with laser radar stereo camera fusion according to an embodiment of the present invention;
fig. 2 is a block diagram of a depth estimation method for lidar stereo camera fusion according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a binary neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of cross-based radar trust aggregation according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a depth estimation device with a laser radar stereo camera fused according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more, "a plurality" means two or more, and terms such as "greater than", "less than" and "exceeding" are understood as excluding the stated number, while terms such as "above", "below" and "within" are understood as including the stated number. If "first" and "second" are used, they are only for the purpose of distinguishing technical features and are not to be understood as indicating or implying relative importance, implicitly indicating the number of technical features indicated, or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
The embodiments of the present application will be further explained with reference to the drawings.
Referring to fig. 1, an embodiment of the present invention includes a method for depth estimation of lidar stereo camera fusion, including but not limited to the following steps:
s1, acquiring a current frame left image and a current frame right image of a stereo camera;
s2, acquiring a radar left image and a radar right image, wherein the radar left image and the current frame left image correspond to images of the same part of the same object, and the radar right image and the current frame right image correspond to images of the same part of the same object;
s3, fusing the current frame left image and the radar left image to obtain a first left image;
s4, fusing the current frame right image and the radar right image to obtain a first right image;
s5, inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image;
s6, inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
s7, acquiring initial matching cost between the first characteristic left image and the first characteristic right image;
s8, optimizing the initial matching cost and extracting a disparity map based on cross-based radar trust aggregation and a semi-global stereo matching algorithm;
and S9, carrying out depth estimation according to the disparity map.
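As a minimal illustration of steps S8-S9, the sketch below takes an already-aggregated cost volume, extracts a disparity map by winner-take-all, and converts it to depth with the standard stereo relation Z = f * B / d. The focal length, baseline, and the random cost volume are placeholders, not values from the patent.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to metric depth using Z = f * B / d."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > eps                      # zero disparity is treated as "unknown"
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Winner-take-all disparity from an H x W x D cost volume (stand-in for the optimized cost),
# followed by depth estimation.
H, W, D = 8, 10, 64
cost_volume = np.random.rand(H, W, D).astype(np.float32)
disparity_map = np.argmin(cost_volume, axis=2).astype(np.float32)
depth_map = disparity_to_depth(disparity_map, focal_length_px=720.0, baseline_m=0.54)
```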
As an optional embodiment, the method further comprises:
s0. uses the stereo camera and laser radar to shoot the calibration objects with different postures and positions at the same time.
As an optional implementation manner, the stereo camera includes a left camera and a right camera, and after acquiring the current frame left image and the current frame right image of the stereo camera, the method further includes:
carrying out deformation correction on the current frame left image according to the distortion parameter of the left camera;
and carrying out deformation correction on the right image of the current frame according to the distortion parameter of the right camera.
In this embodiment, the stereo camera is a binocular camera. A binocular camera generally includes two monocular cameras used for imaging, referred to as the left camera and the right camera; the two monocular cameras are arranged on the same plane of the binocular camera, with a distance between them greater than a certain value. In practical applications, binocular cameras are widely used in fields such as robotics, unmanned vehicles, and security monitoring. Specifically, the binocular camera can capture images at certain time intervals, and the images captured at a given moment comprise a left image and a right image respectively captured by the left camera and the right camera of the binocular camera, namely the left image and the right image of a certain frame.
In this embodiment, after the left image and the right image are obtained by shooting, correction processing needs to be performed respectively according to distortion parameters of the camera.
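As an illustration of this correction step, the sketch below applies OpenCV's cv2.undistort using a camera matrix and distortion coefficients obtained from calibration; the numeric values and the array shape are hypothetical placeholders, not the patent's calibration results.

```python
import cv2
import numpy as np

def correct_distortion(image, camera_matrix, dist_coeffs):
    """Remove lens distortion from one camera's image using its calibration parameters."""
    return cv2.undistort(image, camera_matrix, dist_coeffs)

# Hypothetical calibration values for illustration only.
K_left = np.array([[720.0,   0.0, 640.0],
                   [  0.0, 720.0, 360.0],
                   [  0.0,   0.0,   1.0]])
dist_left = np.array([-0.15, 0.05, 0.0, 0.0, 0.0])     # k1, k2, p1, p2, k3

left_raw = np.zeros((720, 1280, 3), dtype=np.uint8)    # placeholder for the captured left frame
left_corrected = correct_distortion(left_raw, K_left, dist_left)
```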
In step S2, that is, the step of acquiring the radar left image and the radar right image specifically includes:
s201, obtaining a mapping map shot by the laser radar;
s202, compressing the mapping map, and dividing the mapping map into a radar left image and a radar right image.
In this embodiment, the laser radar and the stereo camera are used to simultaneously shoot calibration objects in different postures and at different positions. The laser radar is a radar system that detects characteristic quantities of a target, such as its position and velocity, by emitting a laser beam. In terms of working principle, there is no fundamental difference from microwave radar: a detection signal (laser beam) is transmitted toward the target, the received signal (target echo) reflected from the target is compared with the transmitted signal and, after appropriate processing, relevant information about the target can be obtained, such as the target's distance, azimuth, height, speed, attitude and even shape, so that targets such as aircraft and missiles can be detected, tracked and identified. Specifically, the laser radar consists of a laser transmitter, an optical receiver, a turntable, an information processing system, and the like; the laser converts electric pulses into optical pulses for transmission, and the optical receiver restores the optical pulses reflected from the target into electric pulses that are sent to a display.
In the embodiment, an image of the same part of the same marker, which is shot by the laser radar and corresponds to the image shot by the left camera of the binocular camera, is selected as a radar left image; selecting an image of the same part of the same marker, which is shot by the laser radar and corresponds to the image shot by the right camera of the binocular camera, as a radar right image; in this embodiment, the image captured by the laser radar is a sparse map.
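The patent does not detail how the lidar map is split into the radar left and right images. A common way to obtain such per-camera sparse maps, sketched below under that assumption, is to project the lidar point cloud into each camera view using the camera intrinsics K and the lidar-to-camera extrinsic transform from calibration; both matrices here are hypothetical.

```python
import numpy as np

def project_lidar_to_image(points_xyz, K, T_cam_from_lidar, image_shape):
    """Project an N x 3 lidar point cloud into one camera view, producing a sparse depth image."""
    h, w = image_shape
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])   # homogeneous coordinates
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]                      # points in the camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]                               # keep points in front of the camera
    uvw = (K @ pts_cam.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = np.zeros((h, w), dtype=np.float32)                           # 0 marks invalid points
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[valid], u[valid]] = pts_cam[valid, 2]
    return depth
```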
As an optional implementation manner, step S3, that is, fusing the current frame left image and the radar left image to obtain a first left image, is specifically:
and fusing the current frame left image and the radar left image along a fusion channel according to the image size to obtain a first left image.
As an optional implementation manner, step S4, that is, fusing the current frame right image and the radar right image to obtain a first right image, is specifically:
and fusing the current frame right image and the radar right image along a fusion channel according to the image size to obtain a first right image.
In this embodiment, steps S3 and S4 are each performed along the fusion channel according to the image size: the current frame left image and the radar left image are fused to obtain the first left image, and the current frame right image and the radar right image are fused to obtain the first right image.
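A minimal sketch of this channel-wise fusion, assuming the radar image is stored as a single-channel sparse depth map of the same height and width as the RGB frame (the array shapes are placeholders):

```python
import numpy as np

def fuse_along_channels(rgb_image, radar_map):
    """Concatenate an H x W x 3 RGB image and an H x W sparse radar map into an H x W x 4 input."""
    assert rgb_image.shape[:2] == radar_map.shape, "the two images must have the same size"
    return np.concatenate([rgb_image.astype(np.float32),
                           radar_map[..., None].astype(np.float32)], axis=-1)

left_rgb = np.zeros((360, 640, 3), dtype=np.uint8)       # placeholder current frame left image
radar_left = np.zeros((360, 640), dtype=np.float32)      # placeholder radar left image
first_left = fuse_along_channels(left_rgb, radar_left)   # 4-channel first left image
```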
As an optional implementation manner, step S7, that is, the step of obtaining the initial matching cost between the first feature left image and the first feature right image, specifically includes:
s701, calculating similarity measurement between the first characteristic left image and the first characteristic right image through a weighted Hamming distance method;
s702, according to the similarity measurement, obtaining an initial matching cost between the first characteristic left image and the first characteristic right image.
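A sketch of how such a cost volume could be built from binarized feature maps is shown below. The per-bit weights are an assumption (the patent does not specify how they are chosen); setting them all to 1 reduces the measure to the plain Hamming distance.

```python
import numpy as np

def weighted_hamming_cost_volume(feat_left, feat_right, weights, max_disp):
    """Build an H x W x D cost volume from binary (0/1) feature maps.

    feat_left, feat_right: H x W x C arrays of binarized features.
    weights: length-C per-bit weights; all-ones gives the plain Hamming distance.
    """
    h, w, c = feat_left.shape
    cost = np.full((h, w, max_disp), np.inf, dtype=np.float32)  # columns with no match stay infinite
    for d in range(max_disp):
        diff = feat_left[:, d:, :] != feat_right[:, :w - d, :]
        cost[:, d:, d] = (diff * weights).sum(axis=-1)
    return cost

# Example: 64 binary feature channels with uniform weights.
feat_l = np.random.rand(48, 64, 64) > 0.5
feat_r = np.random.rand(48, 64, 64) > 0.5
cost = weighted_hamming_cost_volume(feat_l, feat_r, weights=np.ones(64), max_disp=32)
```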
Specifically, referring to fig. 2, which is a block diagram of the depth estimation method for lidar and stereo camera fusion, the specific process comprises the following steps:
(1) the RGB left image captured by the binocular camera and the corresponding radar left image are taken as the first input, and the RGB right image captured by the binocular camera and the corresponding radar right image are taken as the parallel second input;
(2) the RGB left image and the corresponding radar left image are fused along the channel dimension to obtain a first left image, and the RGB right image and the corresponding radar right image are fused along the channel dimension to obtain a first right image;
(3) inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image; inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
(4) calculating a similarity matrix of the first characteristic left image and the first characteristic right image through a weighted Hamming distance;
(5) continuing to perform cost aggregation processing, and refining and aggregating the result by using the depth information of the images shot by the laser radar in the cost aggregation process;
(6) finally, after SGM (the semi-global stereo matching algorithm), the refined disparity map is obtained.
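For reference, the sketch below shows a common textbook form of the SGM recurrence along a single path direction (left to right); the penalty values p1 and p2 are assumptions, and a full implementation aggregates over several directions before the winner-take-all step.

```python
import numpy as np

def sgm_aggregate_left_to_right(cost, p1=1.0, p2=8.0):
    """Aggregate an H x W x D matching-cost volume along one SGM path (left to right).

    Full SGM repeats this along several directions (horizontal, vertical, diagonal),
    sums the per-direction results, and extracts the disparity by winner-take-all.
    """
    h, w, d = cost.shape
    agg = np.empty_like(cost, dtype=np.float32)
    agg[:, 0, :] = cost[:, 0, :]
    for x in range(1, w):
        prev = agg[:, x - 1, :]                               # H x D costs of the previous column
        prev_min = prev.min(axis=1, keepdims=True)            # H x 1 best cost per row
        same = prev                                           # same disparity, no penalty
        plus_one = np.concatenate([prev[:, 1:], np.full((h, 1), np.inf)], axis=1) + p1
        minus_one = np.concatenate([np.full((h, 1), np.inf), prev[:, :-1]], axis=1) + p1
        jump = np.broadcast_to(prev_min + p2, prev.shape)     # any larger disparity change
        agg[:, x, :] = cost[:, x, :] + np.minimum.reduce([same, plus_one, minus_one, jump]) - prev_min
    return agg
```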
Referring to fig. 3, in this process the binary neural network is a highly quantized network in which the floating point weights are represented as +1 or -1, so as to achieve maximum model compression. The binary neural network comprises a floating point convolution layer, binary convolution layers, scaling layers, normalization layers, binarized neurons and Hardtanh. The feature extraction network comprises four groups of layers, which are respectively: the first group, comprising a floating point convolution layer, a normalization layer, a binarized neuron and Hardtanh; the second group, comprising a binary convolution layer, a scaling layer, a normalization layer and a binarized neuron; the third group, which has the same structure as the second group (a binary convolution layer, a scaling layer, a normalization layer and a binarized neuron); and the fourth group, comprising a binary convolution layer, a scaling layer and a normalization layer. The first group has no binary convolution layer in order to ensure that accuracy is not excessively degraded.
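A hedged PyTorch sketch of such a four-group extractor is shown below. The channel widths, kernel sizes and the exact ordering of operations inside each group are assumptions for illustration; the scaling layer is folded into the affine parameters of BatchNorm2d, and the binarized neuron is modeled as a sign function with a straight-through gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized to +1 / -1 in the forward pass."""
    def forward(self, x):
        w_bin = torch.sign(self.weight) + (self.weight - self.weight.detach())  # straight-through estimator
        return F.conv2d(x, w_bin, self.bias, self.stride, self.padding, self.dilation, self.groups)

def binarize(x):
    """Binarized neuron: sign activation with a straight-through gradient."""
    return torch.sign(x) + (x - x.detach())

class BinaryFeatureExtractor(nn.Module):
    """Four groups of layers; only the first group keeps a floating-point convolution."""
    def __init__(self, in_ch=4, ch=32):
        super().__init__()
        self.group1 = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.group2 = nn.Sequential(BinaryConv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch))
        self.group3 = nn.Sequential(BinaryConv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch))
        self.group4 = nn.Sequential(BinaryConv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch))
        self.hardtanh = nn.Hardtanh()

    def forward(self, x):
        x = binarize(self.hardtanh(self.group1(x)))  # group 1: float conv + norm, then Hardtanh and binarization
        x = binarize(self.group2(x))                 # group 2: binary conv + scaling/norm, then binarization
        x = binarize(self.group3(x))                 # group 3: same structure as group 2
        return self.group4(x)                        # group 4: binary conv + scaling/norm, no binarization
```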
In this embodiment, the binary neural network is equivalent to a binary feature extractor that can jointly represent multidimensional information as a high-level bitwise feature vector. By also encoding the depth information captured by the lidar, more accurate feature information can be obtained than by relying on optical appearance alone.
Referring to fig. 4, in step S8, namely, regarding the method of cross-based radar trust aggregation, the purpose is to better utilize accurate depth information obtained by lidar shooting; the method does not need to establish a local region for each pixel and aggregate all candidate disparities, but only needs to update a small amount of the specific disparities of the pixels at the vertical intersection of sparse keypoints (e.g., radar points). Then, after the aggregation, the influence of the key points is automatically expanded to the neighbors. The method can improve the accuracy of depth estimation.
Specifically, a first target point is determined in the radar left image, a cross-shaped graph is drawn through the first target point, the first target point is any effective point in the radar left image, and the effective point is a point of which the point value is greater than zero;
acquiring a first distance through a first formula, wherein the first distance is the longest distance extending from a second target point in the vertical direction or the horizontal direction, and the second target point is the point in the current frame left image corresponding to the first target point; the first formula is: r* = max_{r∈[1,R]} ( r · ∏_{i∈[1,r]} δ(p, p_i) ), where r* represents the first distance, p represents the coordinates of the second target point, p_i represents the coordinates of the i-th point in the vertical or horizontal direction from the second target point, R is the maximum search range, and δ(p, p_i) is an indicator function indicating whether the difference in pixel intensity between coordinates p and p_i is less than a threshold; wherein δ is calculated by a second formula: δ(p_1, p_2) = 1 if |I(p_1) − I(p_2)| ≤ τ, and δ(p_1, p_2) = 0 otherwise; in the formula, I(p_1) represents the pixel intensity at coordinate p_1, I(p_2) represents the pixel intensity at coordinate p_2, |I(p_1) − I(p_2)| represents the absolute difference between the pixel intensities at coordinates p_1 and p_2, and τ represents the threshold on the pixel intensity difference;
optimizing the initial matching cost through a third formula according to the first distance, wherein the third formula is: C*(q, d) = 0 if |q − p| ≤ r*, δ(p, q) = 1 and d = d_p, and C*(q, d) = C(q, d) otherwise; in the formula, q represents a point coordinate, d_p represents the disparity at coordinate p in the radar left image, the coordinate p of the first target point is numerically identical to the coordinate of the second target point, |q − p| represents the distance between point coordinate q and coordinate p in the vertical or horizontal direction, δ(p, q) is the indicator function indicating whether the difference in pixel intensity between coordinate p and point coordinate q is less than the threshold, C*(q, d) represents the optimized matching cost of point coordinate q at disparity d, and C(q, d) represents the initial matching cost of point coordinate q at disparity d obtained by the weighted Hamming distance method;
and extracting the disparity map by a semi-global stereo matching algorithm according to the optimized matching cost.
In this embodiment, the radar left image contains sparse radar points; points with a value greater than 0 are valid points, and the values of most points are 0, which are invalid points. All valid points in the radar left image can be traversed, and a cross-shaped graph is drawn with each valid point as its center. Because the current frame left image obtained by the stereo camera corresponds exactly to the radar left image, a point corresponding to each valid point in the radar left image also exists in the current frame left image, and the coordinate values of the corresponding points are identical; therefore, the point corresponding to the valid point (the second target point) can be obtained in the current frame left image, and a cross-shaped graph can be drawn with this point as the center. A first distance is then calculated. The formula for the first distance can be understood as follows: search left, right, up and down from the point corresponding to the valid point (the second target point), and find the longest distance such that the pixel intensity difference between every point on the path and the second target point is smaller than the threshold.
The cost of any point on an arm of the cross whose pixel intensity difference with respect to the radar point is not larger than the threshold is set to 0 at the radar disparity; otherwise, the matching cost obtained by the weighted Hamming distance method is used. In this way, the sparse radar points can be used to effectively update the costs of surrounding points, avoiding the situation in which only the cost of the radar point itself is updated and the radar point, differing too much from the surrounding pixels, is treated as an outlier and ignored or repeatedly updated during cost aggregation. The cross-based radar trust aggregation method described in this embodiment thus spreads the information of the key points into the whole area.
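The following Python sketch illustrates the mechanism just described; the threshold tau and the maximum arm length max_arm are assumed values, and each arm pixel is compared against the centre (radar-point) pixel as in the description above.

```python
import numpy as np

def arm_length(intensity, y, x, step, tau=10.0, max_arm=17):
    """Length of one cross arm: walk from (y, x) in direction `step` while the
    intensity difference to the centre pixel stays below the threshold tau."""
    h, w = intensity.shape
    r = 0
    while r < max_arm:
        yy, xx = y + (r + 1) * step[0], x + (r + 1) * step[1]
        if not (0 <= yy < h and 0 <= xx < w):
            break
        if abs(float(intensity[yy, xx]) - float(intensity[y, x])) >= tau:
            break
        r += 1
    return r

def radar_trust_aggregation(cost, intensity, radar_disp, tau=10.0, max_arm=17):
    """Zero the matching cost at the radar disparity along the cross arms of each
    valid radar point (value > 0); all other entries keep their initial cost."""
    out = cost.copy()
    ys, xs = np.nonzero(radar_disp > 0)
    for y, x in zip(ys, xs):
        d = int(round(radar_disp[y, x]))
        if d >= cost.shape[2]:
            continue
        for step in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
            r = arm_length(intensity, y, x, step, tau, max_arm)
            for k in range(r + 1):
                out[y + k * step[0], x + k * step[1], d] = 0.0
    return out
```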
In summary, the depth estimation method for laser radar stereo camera fusion described in this embodiment has the following advantages:
(1) by effectively fusing the laser radar and the stereo camera, accurate and reliable depth prediction can be obtained;
(2) by utilizing the binary neural network to extract the features of the two images simultaneously, accuracy is ensured while the speed is greatly improved;
(3) by means of cross-based radar trust aggregation, the depth information obtained by laser radar shooting is utilized to the maximum extent, thereby achieving a good improvement in accuracy.
Referring to fig. 5, an embodiment of the present invention further provides a depth estimation apparatus 200 for laser radar stereo camera fusion, which specifically includes:
at least one processor 210;
at least one memory 220 for storing at least one program;
the at least one program, when executed by the at least one processor 210, causes the at least one processor 210 to implement the method as shown in fig. 1.
The memory 220, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs. The memory 220 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 220 may optionally include remote memory located remotely from processor 210, and such remote memory may be connected to processor 210 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be understood that the device structure shown in fig. 5 is not intended to be limiting of device 200, and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
In the apparatus 200 shown in fig. 5, the processor 210 may retrieve the program stored in the memory 220 and execute, but is not limited to, the steps of the embodiment shown in fig. 1.
The above-described embodiments of the apparatus 200 are merely illustrative, and the units illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purposes of the embodiments.
Embodiments of the present invention also provide a computer-readable storage medium, which stores a program executable by a processor, and the program executable by the processor is used for implementing the method shown in fig. 1 when being executed by the processor.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
It will be understood that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (10)
1. A depth estimation method for laser radar stereo camera fusion is characterized by comprising the following steps:
acquiring a current frame left image and a current frame right image of a stereo camera;
acquiring a radar left image and a radar right image, wherein the radar left image and the current frame left image correspond to images of the same part of the same object, and the radar right image and the current frame right image correspond to images of the same part of the same object;
fusing the current frame left image and the radar left image to obtain a first left image;
fusing the current frame right image and the radar right image to obtain a first right image;
inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image;
inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
acquiring an initial matching cost between the first characteristic left image and the first characteristic right image;
optimizing the initial matching cost and extracting a disparity map based on cross-based radar trust aggregation and a semi-global stereo matching algorithm;
and performing depth estimation according to the disparity map.
2. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the method further comprises:
and simultaneously shooting calibration objects in different postures and different positions by using the stereo camera and the laser radar.
3. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that, after acquiring a current frame left image and a current frame right image of the stereo camera, the method further comprises:
carrying out deformation correction on the current frame left image according to the distortion parameter of the left camera;
and carrying out deformation correction on the right image of the current frame according to the distortion parameter of the right camera.
4. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the step of acquiring the radar left image and the radar right image specifically comprises:
acquiring a mapping chart shot by the laser radar;
compressing the map and dividing the map into a radar left image and a radar right image.
5. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that fusing the current frame left image and the radar left image to obtain a first left image specifically comprises:
and fusing the current frame left image and the radar left image along a fusion channel according to the image size to obtain a first left image.
6. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that fusing the current frame right image and the radar right image to obtain a first right image specifically comprises:
and fusing the current frame right image and the radar right image along a fusion channel according to the image size to obtain a first right image.
7. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the step of obtaining an initial matching cost between the first feature left image and the first feature right image specifically comprises:
calculating a similarity measure between the first feature left image and the first feature right image by a weighted hamming distance method;
and acquiring an initial matching cost between the first characteristic left image and the first characteristic right image according to the similarity measurement.
8. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the step of optimizing the initial matching cost based on cross-based radar trust aggregation and a semi-global stereo matching algorithm comprises:
determining a first target point in the radar left image, and drawing a cross-shaped graph through the first target point, wherein the first target point is any effective point in the radar left image, and the effective point is a point of which the point value is greater than zero;
acquiring a first distance through a first formula, wherein the first distance is the longest distance extending from a second target point in the vertical direction or the horizontal direction, and the second target point is the point in the current frame left image corresponding to the first target point; the first formula is: r* = max_{r∈[1,R]} ( r · ∏_{i∈[1,r]} δ(p, p_i) ), where r* represents the first distance, p represents the coordinates of the second target point, p_i represents the coordinates of the i-th point in the vertical or horizontal direction from the second target point, R is the maximum search range, and δ(p, p_i) is an indicator function indicating whether the difference in pixel intensity between coordinates p and p_i is less than a threshold; wherein δ is calculated by a second formula: δ(p_1, p_2) = 1 if |I(p_1) − I(p_2)| ≤ τ, and δ(p_1, p_2) = 0 otherwise; in the formula, I(p_1) represents the pixel intensity at coordinate p_1, I(p_2) represents the pixel intensity at coordinate p_2, |I(p_1) − I(p_2)| represents the absolute difference between the pixel intensities at coordinates p_1 and p_2, and τ represents the threshold on the pixel intensity difference;
optimizing the initial matching cost through a third formula according to the first distance, wherein the third formula is: C*(q, d) = 0 if |q − p| ≤ r*, δ(p, q) = 1 and d = d_p, and C*(q, d) = C(q, d) otherwise; in the formula, q represents a point coordinate, d_p represents the disparity at coordinate p in the radar left image, the coordinate p of the first target point is numerically identical to the coordinate of the second target point, |q − p| represents the distance between point coordinate q and coordinate p in the vertical or horizontal direction, δ(p, q) is the indicator function indicating whether the difference in pixel intensity between coordinate p and point coordinate q is less than the threshold, C*(q, d) represents the optimized matching cost of point coordinate q at disparity d, and C(q, d) represents the initial matching cost of point coordinate q at disparity d obtained by the weighted Hamming distance method;
and extracting the disparity map by a semi-global stereo matching algorithm according to the optimized matching cost.
9. A depth estimation device for laser radar stereo camera fusion is characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that a program executable by a processor is stored thereon, the processor-executable program being for implementing the method of any one of claims 1 to 8 when executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011464746.XA CN112233163B (en) | 2020-12-14 | 2020-12-14 | Depth estimation method and device for laser radar stereo camera fusion and medium thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011464746.XA CN112233163B (en) | 2020-12-14 | 2020-12-14 | Depth estimation method and device for laser radar stereo camera fusion and medium thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112233163A true CN112233163A (en) | 2021-01-15 |
CN112233163B CN112233163B (en) | 2021-03-30 |
Family
ID=74124881
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011464746.XA Active CN112233163B (en) | 2020-12-14 | 2020-12-14 | Depth estimation method and device for laser radar stereo camera fusion and medium thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112233163B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113281779A (en) * | 2021-05-20 | 2021-08-20 | 中山大学 | 3D object rapid detection method, device, equipment and medium |
CN114140507A (en) * | 2021-10-28 | 2022-03-04 | 中国科学院自动化研究所 | Depth estimation method, device and equipment integrating laser radar and binocular camera |
CN114862931A (en) * | 2022-05-31 | 2022-08-05 | 小米汽车科技有限公司 | Depth distance determination method and device, vehicle, storage medium and chip |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255811A (en) * | 2018-07-18 | 2019-01-22 | 南京航空航天大学 | A kind of solid matching method based on the optimization of confidence level figure parallax |
CN110517309A (en) * | 2019-07-19 | 2019-11-29 | 沈阳工业大学 | A kind of monocular depth information acquisition method based on convolutional neural networks |
CN110517303A (en) * | 2019-08-30 | 2019-11-29 | 的卢技术有限公司 | A kind of fusion SLAM method and system based on binocular camera and millimetre-wave radar |
CN110942477A (en) * | 2019-11-21 | 2020-03-31 | 大连理工大学 | Method for depth map fusion by using binocular camera and laser radar |
CN111028285A (en) * | 2019-12-03 | 2020-04-17 | 浙江大学 | Depth estimation method based on binocular vision and laser radar fusion |
US20200175315A1 (en) * | 2018-11-30 | 2020-06-04 | Qualcomm Incorporated | Early fusion of camera and radar frames |
CN111415305A (en) * | 2020-03-10 | 2020-07-14 | 桂林电子科技大学 | Method for recovering three-dimensional scene, computer-readable storage medium and unmanned aerial vehicle |
CN111563442A (en) * | 2020-04-29 | 2020-08-21 | 上海交通大学 | Slam method and system for fusing point cloud and camera image data based on laser radar |
-
2020
- 2020-12-14 CN CN202011464746.XA patent/CN112233163B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255811A (en) * | 2018-07-18 | 2019-01-22 | 南京航空航天大学 | A kind of solid matching method based on the optimization of confidence level figure parallax |
US20200175315A1 (en) * | 2018-11-30 | 2020-06-04 | Qualcomm Incorporated | Early fusion of camera and radar frames |
CN110517309A (en) * | 2019-07-19 | 2019-11-29 | 沈阳工业大学 | A kind of monocular depth information acquisition method based on convolutional neural networks |
CN110517303A (en) * | 2019-08-30 | 2019-11-29 | 的卢技术有限公司 | A kind of fusion SLAM method and system based on binocular camera and millimetre-wave radar |
CN110942477A (en) * | 2019-11-21 | 2020-03-31 | 大连理工大学 | Method for depth map fusion by using binocular camera and laser radar |
CN111028285A (en) * | 2019-12-03 | 2020-04-17 | 浙江大学 | Depth estimation method based on binocular vision and laser radar fusion |
CN111415305A (en) * | 2020-03-10 | 2020-07-14 | 桂林电子科技大学 | Method for recovering three-dimensional scene, computer-readable storage medium and unmanned aerial vehicle |
CN111563442A (en) * | 2020-04-29 | 2020-08-21 | 上海交通大学 | Slam method and system for fusing point cloud and camera image data based on laser radar |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113281779A (en) * | 2021-05-20 | 2021-08-20 | 中山大学 | 3D object rapid detection method, device, equipment and medium |
CN114140507A (en) * | 2021-10-28 | 2022-03-04 | 中国科学院自动化研究所 | Depth estimation method, device and equipment integrating laser radar and binocular camera |
CN114862931A (en) * | 2022-05-31 | 2022-08-05 | 小米汽车科技有限公司 | Depth distance determination method and device, vehicle, storage medium and chip |
Also Published As
Publication number | Publication date |
---|---|
CN112233163B (en) | 2021-03-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||