CN112233163A - Depth estimation method and device for laser radar stereo camera fusion and medium thereof - Google Patents

Depth estimation method and device for laser radar stereo camera fusion and medium thereof

Info

Publication number
CN112233163A
Authority
CN
China
Prior art keywords
radar
image
left image
right image
current frame
Prior art date
Legal status
Granted
Application number
CN202011464746.XA
Other languages
Chinese (zh)
Other versions
CN112233163B (en)
Inventor
陈刚
仲崇豪
孟海涛
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011464746.XA priority Critical patent/CN112233163B/en
Publication of CN112233163A publication Critical patent/CN112233163A/en
Application granted granted Critical
Publication of CN112233163B publication Critical patent/CN112233163B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/521 Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/10044 Radar image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Optics & Photonics (AREA)
  • Health & Medical Sciences (AREA)
  • Measurement Of Optical Distance (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a depth estimation method, a device and a medium for the fusion of a laser radar and a stereo camera, wherein the method comprises the following steps: acquiring a current frame left image and a current frame right image of a stereo camera; acquiring a radar left image and a radar right image; fusing the current frame left image and the radar left image to obtain a first left image; fusing the current frame right image and the radar right image to obtain a first right image; inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image; inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image; acquiring an initial matching cost between the first feature left image and the first feature right image; optimizing the initial matching cost and extracting a disparity map based on cross-based radar trust aggregation and a semi-global stereo matching algorithm; and performing depth estimation according to the disparity map. The method can obtain accurate and reliable depth prediction and can be widely applied in the technical field of image processing.

Description

Depth estimation method and device for laser radar stereo camera fusion and medium thereof
Technical Field
The invention relates to the field of image processing and computer vision, in particular to a depth estimation method and device for laser radar stereo camera fusion and a medium thereof.
Background
Laser radar (lidar) is one of the important sensors for environment perception in mobile robots and autonomous driving, and is well suited to perceiving complex traffic environments. The depth map it provides has high precision but low resolution: it is very sparse, so small targets are easily missed. Binocular stereo vision is an important branch of computer vision and is widely applied in autonomous driving, but it is strongly affected by environmental factors such as viewpoint and illumination, so the accuracy of the resulting depth map is low. Existing methods based on deep neural networks cannot meet the requirement of real-time, accurate depth estimation, and a suitable solution for fusing radar measurements with a stereo matching algorithm is lacking.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a depth estimation method and device for laser radar stereo camera fusion and a medium thereof.
The technical scheme adopted by the invention is as follows:
in one aspect, an embodiment of the present invention includes a depth estimation method for laser radar stereo camera fusion, including:
acquiring a current frame left image and a current frame right image of a stereo camera;
acquiring a radar left image and a radar right image, wherein the radar left image and the current frame left image correspond to images of the same part of the same object, and the radar right image and the current frame right image correspond to images of the same part of the same object;
fusing the current frame left image and the radar left image to obtain a first left image;
fusing the current frame right image and the radar right image to obtain a first right image;
inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image;
inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
acquiring an initial matching cost between the first characteristic left image and the first characteristic right image;
optimizing the initial matching cost and extracting a disparity map based on a crossed radar trust aggregation and semi-global stereo matching algorithm;
and performing depth estimation according to the disparity map.
Further, the method further comprises:
and simultaneously shooting calibration objects in different postures and different positions by using the stereo camera and the laser radar.
Further, the stereo camera includes a left camera and a right camera, and after acquiring a current frame left image and a current frame right image of the stereo camera, the method further includes:
carrying out deformation correction on the current frame left image according to the distortion parameter of the left camera;
and carrying out deformation correction on the right image of the current frame according to the distortion parameter of the right camera.
Further, the step of acquiring a radar left image and a radar right image specifically includes:
acquiring a mapping chart shot by the laser radar;
compressing the map and dividing the map into a radar left image and a radar right image.
Further, the fusing the current frame left image and the radar left image to obtain a first left image specifically includes:
and fusing the current frame left image and the radar left image along a fusion channel according to the image size to obtain a first left image.
Further, the fusing the current frame right image and the radar right image to obtain a first right image specifically includes:
and fusing the current frame right image and the radar right image along a fusion channel according to the image size to obtain a first right image.
Further, the step of obtaining an initial matching cost between the first feature left image and the first feature right image specifically includes:
calculating a similarity measure between the first feature left image and the first feature right image by a weighted hamming distance method;
and acquiring an initial matching cost between the first characteristic left image and the first characteristic right image according to the similarity measurement.
Further, the step of optimizing the initial matching cost based on a cross radar trust aggregation and semi-global stereo matching algorithm includes:
determining a first target point in the radar left image, and drawing a cross-shaped graph through the first target point, wherein the first target point is any effective point in the radar left image, and the effective point is a point of which the point value is greater than zero;
acquiring a first distance through a first formula, wherein the first distance is the longest distance from a second target point in the vertical direction or the horizontal direction, and the second target point is a point corresponding to the first target point in the current frame left image; the first formula is:

r* = max{ r ≥ 1 : δ(p, p_i) = 1 for all i ∈ [1, r] }

wherein r* represents the first distance, p represents the coordinates of the second target point, p_i represents the coordinates of the point at distance i from p in the vertical direction or the horizontal direction, and δ(p, p_i) is an indicator function indicating whether the difference in pixel intensity between the coordinates p and p_i is less than a threshold; wherein δ is calculated by a second formula, and the second formula is:

δ(p_1, p_2) = 1 if |I(p_1) − I(p_2)| ≤ τ, and δ(p_1, p_2) = 0 otherwise

wherein I(p_1) represents the pixel intensity at the coordinates p_1, I(p_2) represents the pixel intensity at the coordinates p_2, |I(p_1) − I(p_2)| represents the absolute difference between the pixel intensity at p_1 and the pixel intensity at p_2, and τ represents the threshold on the pixel-intensity difference;

optimizing the initial matching cost through a third formula according to the first distance, wherein the third formula is:

C'(q, d) = 0 if d = d_L(p), dist(q, p) ≤ r* and δ(p, q) = 1; otherwise C'(q, d) = C(q, d)

wherein q represents a point coordinate, d_L(p) represents the disparity at the coordinates p in the radar left image, p represents the coordinates corresponding to the first target point (the numerical values being identical), dist(q, p) represents the distance between q and p in the vertical direction or the horizontal direction, δ(p, q) is the indicator function indicating whether the difference in pixel intensity between p and q is less than the threshold, C'(q, d) represents the optimized matching cost at the point coordinate q and the disparity d, and C(q, d) represents the initial matching cost at the point coordinate q and the disparity d obtained by the weighted Hamming distance method;
and extracting the disparity map by a semi-global stereo matching algorithm according to the optimized matching cost.
On the other hand, the embodiment of the invention also comprises a depth estimation device for the fusion of the laser radar stereo camera, which comprises the following steps:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the depth estimation method.
In another aspect, the embodiments of the present invention further include a computer-readable storage medium on which a program executable by a processor is stored, the program executable by the processor being used for implementing the depth estimation method when being executed by the processor.
The invention has the beneficial effects that:
(1) by effectively fusing the laser radar and the stereo camera, accurate and reliable depth prediction can be obtained;
(2) by using the binary neural network to extract features from the two images simultaneously, the speed is greatly improved while the accuracy is maintained;
(3) by means of cross-based radar trust aggregation, the depth information obtained by laser radar shooting is utilized to the maximum extent, thereby achieving a substantial improvement in accuracy.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating steps of a method for depth estimation with laser radar stereo camera fusion according to an embodiment of the present invention;
fig. 2 is a block diagram of a depth estimation method for lidar stereo camera fusion according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a binary neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of cross-based radar trust aggregation according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a depth estimation device with a laser radar stereo camera fused according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, 'several' means one or more, 'a plurality of' means two or more, terms such as 'greater than', 'less than' and 'exceeding' are understood as excluding the stated number, and terms such as 'above', 'below' and 'within' are understood as including the stated number. Where 'first' and 'second' are used only for the purpose of distinguishing technical features, they are not to be understood as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the precedence of the indicated technical features.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
The embodiments of the present application will be further explained with reference to the drawings.
Referring to fig. 1, an embodiment of the present invention includes a method for depth estimation of lidar stereo camera fusion, including but not limited to the following steps:
s1, acquiring a current frame left image and a current frame right image of a stereo camera;
s2, acquiring a radar left image and a radar right image, wherein the radar left image and the current frame left image correspond to images of the same part of the same object, and the radar right image and the current frame right image correspond to images of the same part of the same object;
s3, fusing the current frame left image and the radar left image to obtain a first left image;
s4, fusing the current frame right image and the radar right image to obtain a first right image;
s5, inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image;
s6, inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
s7, acquiring initial matching cost between the first characteristic left image and the first characteristic right image;
s8, optimizing the initial matching cost and extracting a disparity map based on a crossed radar trust aggregation and semi-global stereo matching algorithm;
and S9, carrying out depth estimation according to the disparity map.
As an optional embodiment, the method further comprises:
S0, using the stereo camera and the laser radar to simultaneously capture calibration objects in different postures and at different positions.
As an optional implementation manner, the stereo camera includes a left camera and a right camera, and after acquiring the current frame left image and the current frame right image of the stereo camera, the method further includes:
carrying out deformation correction on the current frame left image according to the distortion parameter of the left camera;
and carrying out deformation correction on the right image of the current frame according to the distortion parameter of the right camera.
In this embodiment, the stereo camera is a binocular camera. A binocular camera generally comprises two monocular cameras used for imaging, referred to as the left camera and the right camera; the two cameras are mounted on the same plane of the binocular camera, and the distance between them is greater than a certain value. In practical applications, binocular cameras are widely used in robotics, unmanned vehicles, security monitoring and other fields. Specifically, the binocular camera captures images at a certain time interval; the images captured at a given moment include a left image and a right image captured respectively by the left camera and the right camera, namely the current frame left image and the current frame right image.
In this embodiment, after the left image and the right image are captured, each image is corrected according to the distortion parameters of the corresponding camera.
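For illustration, the following is a minimal sketch of this correction step, assuming OpenCV is used and that the calibration step has already produced the intrinsic matrices and distortion coefficient vectors of the two cameras; the variable names are illustrative and the patent does not prescribe a particular library.

```python
import cv2

def undistort_pair(left_img, right_img, K_left, dist_left, K_right, dist_right):
    """Deformation-correct the current frame left/right images.

    K_left, K_right       : 3x3 intrinsic matrices from calibration (assumed available)
    dist_left, dist_right : distortion coefficient vectors, e.g. (k1, k2, p1, p2, k3)
    """
    left_corrected = cv2.undistort(left_img, K_left, dist_left)      # correct with left camera parameters
    right_corrected = cv2.undistort(right_img, K_right, dist_right)  # correct with right camera parameters
    return left_corrected, right_corrected
```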
In step S2, that is, the step of acquiring the radar left image and the radar right image specifically includes:
s201, obtaining a mapping map shot by the laser radar;
s202, compressing the mapping map, and dividing the mapping map into a radar left image and a radar right image.
In this embodiment, the laser radar and the stereo camera simultaneously capture calibration objects in different postures and at different positions. A laser radar is a radar system that detects characteristic quantities of a target, such as its position and velocity, by emitting a laser beam. In terms of working principle there is no fundamental difference from microwave radar: a detection signal (a laser beam) is transmitted toward the target, the received signal (the target echo) reflected from the target is compared with the transmitted signal, and after appropriate processing, relevant information about the target can be obtained, such as distance, azimuth, height, speed, attitude and even shape, so that targets such as aircraft and missiles can be detected, tracked and identified. Specifically, the laser radar consists of a laser transmitter, an optical receiver, a turntable, an information processing system and the like; the laser converts electrical pulses into optical pulses and transmits them, and the optical receiver restores the optical pulses reflected from the target into electrical pulses and sends them to the display.
In this embodiment, the image captured by the laser radar that corresponds to the same part of the same object as the image captured by the left camera of the binocular camera is selected as the radar left image, and the image captured by the laser radar that corresponds to the same part of the same object as the image captured by the right camera of the binocular camera is selected as the radar right image. In this embodiment, the image captured by the laser radar is a sparse map.
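The patent does not detail how the map captured by the laser radar is compressed and divided; one common construction, sketched below under the assumption that the lidar-to-camera extrinsics (R, t) and the camera intrinsics K are available from the joint calibration, projects the lidar points into one camera view and rasterizes them into a sparse, image-sized map (the analogous call with the right camera's parameters would yield the radar right image). Whether depth or disparity is stored at each valid pixel is an implementation choice; the aggregation step described later uses the radar disparity at each valid point.

```python
import numpy as np

def lidar_to_sparse_map(points_xyz, K, R, t, height, width):
    """Project Nx3 lidar points into one camera view and build a sparse HxW map."""
    cam = (R @ points_xyz.T + t.reshape(3, 1)).T            # lidar frame -> camera frame
    cam = cam[cam[:, 2] > 0]                                 # keep points in front of the camera
    uvw = (K @ cam.T).T                                      # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    sparse = np.zeros((height, width), dtype=np.float32)     # zero marks an invalid point
    sparse[v[ok], u[ok]] = cam[ok, 2]                        # store the range value at the projected pixel
    return sparse
```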
As an optional implementation manner, step S3, namely fusing the current frame left image and the radar left image to obtain a first left image, is specifically:
and fusing the current frame left image and the radar left image along a fusion channel according to the image size to obtain a first left image.
As an optional implementation manner, step S4, namely fusing the current frame right image and the radar right image to obtain a first right image, is specifically:
and fusing the current frame right image and the radar right image along a fusion channel according to the image size to obtain a first right image.
In this embodiment, steps S3 and S4 are each executed along the fusion channel according to the image size: the current frame left image and the radar left image are fused to obtain the first left image, and the current frame right image and the radar right image are fused to obtain the first right image.
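A minimal sketch of this channel-wise fusion, assuming the RGB image and the sparse radar map have already been brought to the same height and width:

```python
import numpy as np

def fuse_along_channel(rgb_image, radar_map):
    """rgb_image: HxWx3 array; radar_map: HxW sparse lidar map; returns an HxWx4 fused input."""
    assert rgb_image.shape[:2] == radar_map.shape, "the two inputs must share the same image size"
    return np.concatenate([rgb_image, radar_map[..., None]], axis=-1)  # stack along the channel axis
```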
As an optional implementation manner, step S7, that is, the step of obtaining the initial matching cost between the first feature left image and the first feature right image, specifically includes:
s701, calculating similarity measurement between the first characteristic left image and the first characteristic right image through a weighted Hamming distance method;
s702, according to the similarity measurement, obtaining an initial matching cost between the first characteristic left image and the first characteristic right image.
Specifically, referring to fig. 2, fig. 2 is a frame diagram of the depth estimation method for lidar stereo camera fusion; the specific process comprises the following steps:
(1) the RGB left image captured by the binocular camera and the corresponding radar left image are taken as the first input, and the RGB right image captured by the binocular camera and the corresponding radar right image are taken as a parallel second input;
(2) the RGB left image and the corresponding radar left image are fused along the channel dimension to obtain the first left image; the RGB right image and the corresponding radar right image are fused along the channel dimension to obtain the first right image;
(3) inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image; inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
(4) calculating a similarity matrix of the first characteristic left image and the first characteristic right image through a weighted Hamming distance;
(5) continuing to perform cost aggregation processing, and refining and aggregating the result by using the depth information of the images shot by the laser radar in the cost aggregation process;
(6) finally, a refined disparity map is obtained after SGM (the semi-global stereo matching algorithm).
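Once the refined disparity map is available, step S9 converts it to depth by the standard stereo relation Z = f·B/d, where f is the focal length in pixels and B is the baseline of the stereo camera; a minimal sketch follows (the guard against zero disparity is an implementation choice).

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline, eps=1e-6):
    """Z = f * B / d for every pixel with a positive disparity; invalid pixels stay 0."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > eps
    depth[valid] = focal_px * baseline / disparity[valid]
    return depth
```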
Referring to fig. 3, the binary neural network used in this process is a highly quantized network in which the floating-point weights are represented as +1 or -1, so as to achieve maximum model compression. The binary neural network comprises floating-point convolution layers, binary convolution layers, scaling layers, normalization layers, binarization neurons and Hardtanh activations. The feature extraction network comprises four groups of layers: the first group comprises a floating-point convolution layer, a normalization layer, a binarization neuron and a Hardtanh; the second group comprises a binary convolution layer, a scaling layer, a normalization layer and a binarization neuron; the third group has the same composition as the second group; and the fourth group comprises a binary convolution layer, a scaling layer and a normalization layer. The first group has no binary convolution layer in order to ensure that the accuracy is not excessively degraded.
In this embodiment, the binary neural network is equivalent to a binary feature extractor that can jointly represent multidimensional information as a high-level bitwise feature vector. By also encoding the image information captured by the laser radar, more accurate feature information can be obtained than by relying on optical appearance alone.
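A rough PyTorch-style sketch of the four layer groups described above is given below; the channel counts, kernel sizes and the run-time sign binarization are illustrative assumptions (the straight-through gradient estimator usually used to train such networks is omitted), and the per-channel scaling layer is folded into the normalization layer for brevity.

```python
import torch
import torch.nn as nn

class BinarizeSign(nn.Module):
    """Binarization neuron: maps activations to +1 / -1."""
    def forward(self, x):
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

class BinaryConv2d(nn.Conv2d):
    """Convolution whose floating-point weights are represented as +1 / -1 at run time."""
    def forward(self, x):
        binary_w = torch.where(self.weight >= 0, torch.ones_like(self.weight), -torch.ones_like(self.weight))
        return nn.functional.conv2d(x, binary_w, self.bias, self.stride, self.padding)

def make_feature_extractor(in_channels=4, channels=32):
    return nn.Sequential(
        # group 1: floating-point convolution, normalization, binarization neuron, Hardtanh
        nn.Conv2d(in_channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
        BinarizeSign(), nn.Hardtanh(),
        # group 2: binary convolution, (scaling +) normalization, binarization neuron
        BinaryConv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), BinarizeSign(),
        # group 3: same composition as group 2
        BinaryConv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), BinarizeSign(),
        # group 4: binary convolution, (scaling +) normalization
        BinaryConv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
    )
```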
Referring to fig. 4, the purpose of the cross-based radar trust aggregation in step S8 is to make better use of the accurate depth information obtained by laser radar shooting. The method does not need to build a local region for every pixel and aggregate all candidate disparities; instead, it only updates, for the small set of pixels lying on the crossing arms of each sparse key point (e.g., radar point), the cost at that key point's specific disparity. After aggregation, the influence of the key points automatically spreads to their neighbours. This method can improve the accuracy of depth estimation.
Specifically, a first target point is determined in the radar left image, a cross-shaped graph is drawn through the first target point, the first target point is any effective point in the radar left image, and the effective point is a point of which the point value is greater than zero;
acquiring a first distance through a first formula, wherein the first distance is the longest distance from a second target point in the vertical direction or the horizontal direction, and the second target point is a point corresponding to the first target point in the current frame left image; the first formula is:

r* = max{ r ≥ 1 : δ(p, p_i) = 1 for all i ∈ [1, r] }

wherein r* represents the first distance, p represents the coordinates of the second target point, p_i represents the coordinates of the point at distance i from p in the vertical direction or the horizontal direction, and δ(p, p_i) is an indicator function indicating whether the difference in pixel intensity between the coordinates p and p_i is less than a threshold; wherein δ is calculated by a second formula, and the second formula is:

δ(p_1, p_2) = 1 if |I(p_1) − I(p_2)| ≤ τ, and δ(p_1, p_2) = 0 otherwise

wherein I(p_1) represents the pixel intensity at the coordinates p_1, I(p_2) represents the pixel intensity at the coordinates p_2, |I(p_1) − I(p_2)| represents the absolute difference between the pixel intensity at p_1 and the pixel intensity at p_2, and τ represents the threshold on the pixel-intensity difference;

optimizing the initial matching cost through a third formula according to the first distance, wherein the third formula is:

C'(q, d) = 0 if d = d_L(p), dist(q, p) ≤ r* and δ(p, q) = 1; otherwise C'(q, d) = C(q, d)

wherein q represents a point coordinate, d_L(p) represents the disparity at the coordinates p in the radar left image, p represents the coordinates corresponding to the first target point (the numerical values being identical), dist(q, p) represents the distance between q and p in the vertical direction or the horizontal direction, δ(p, q) is the indicator function indicating whether the difference in pixel intensity between p and q is less than the threshold, C'(q, d) represents the optimized matching cost at the point coordinate q and the disparity d, and C(q, d) represents the initial matching cost at the point coordinate q and the disparity d obtained by the weighted Hamming distance method;
and extracting the disparity map by a semi-global stereo matching algorithm according to the optimized matching cost.
In this embodiment, the radar left image contains sparse radar points: points whose value is greater than 0 are valid points, while most points have a value of 0 and are invalid points. All valid points in the radar left image can be traversed, and a cross-shaped graph is drawn with each valid point as its center. Because the current frame left image obtained by the stereo camera corresponds exactly to the radar left image, a point corresponding to each valid point also exists in the current frame left image, and the coordinate values of the corresponding points are identical; therefore the point corresponding to the valid point (the second target point) can be found in the current frame left image, and a cross-shaped graph can be drawn with this point as the center. The first distance is then calculated. The first formula can be understood as follows: search left, right, up and down from the point corresponding to the valid point (the second target point), and find the longest distance such that the pixel-intensity difference between every point on the path and the second target point remains smaller than the threshold.
The cost of any point on an arm of the cross whose pixel-value difference from the radar point is not larger than the threshold is set to 0 at the radar point's disparity; otherwise, the matching cost obtained by the weighted Hamming distance method is used. In this way, the sparse radar points can effectively update the costs around them, which avoids the situation in which only the cost at the radar point itself is updated and the radar point, differing too much from the surrounding pixels, is regarded as an outlier and is ignored or repeatedly overwritten during cost aggregation. The cross-based radar trust aggregation described in this embodiment spreads the information of the key points into the whole surrounding area.
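A minimal sketch of this cross-based radar trust aggregation, following the first, second and third formulas above; the intensity threshold tau and the cap on the arm length are illustrative assumptions.

```python
import numpy as np

def cross_radar_aggregation(cost, gray_left, radar_disp, tau=10.0, max_arm=20):
    """cost: HxWxD initial cost volume; gray_left: HxW intensities of the current frame left
    image; radar_disp: HxW sparse radar disparities, where 0 marks an invalid point."""
    H, W, D = cost.shape
    ys, xs = np.nonzero(radar_disp > 0)                      # valid points (first target points)
    for y, x in zip(ys, xs):
        d = int(round(float(radar_disp[y, x])))              # disparity trusted from the radar
        if not 0 <= d < D:
            continue
        cost[y, x, d] = 0.0                                  # the radar point itself
        for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):    # the four arms of the cross
            for r in range(1, max_arm + 1):                  # grow the arm while delta(p, q) = 1
                qy, qx = y + r * dy, x + r * dx
                if not (0 <= qy < H and 0 <= qx < W):
                    break
                if abs(float(gray_left[qy, qx]) - float(gray_left[y, x])) > tau:
                    break                                    # intensity differs too much: stop the arm
                cost[qy, qx, d] = 0.0                        # third formula: zero the cost at disparity d
    return cost
```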
In summary, the depth estimation method for laser radar stereo camera fusion described in this embodiment has the following advantages:
(1) by effectively fusing the laser radar and the stereo camera, accurate and reliable depth prediction can be obtained;
(2) by using the binary neural network to extract features from the two images simultaneously, the speed is greatly improved while the accuracy is maintained;
(3) by means of cross-based radar trust aggregation, the depth information obtained by laser radar shooting is utilized to the maximum extent, thereby achieving a substantial improvement in accuracy.
Referring to fig. 5, an embodiment of the present invention further provides a depth estimation apparatus 200 for laser radar stereo camera fusion, which specifically includes:
at least one processor 210;
at least one memory 220 for storing at least one program;
when the at least one program is executed by the at least one processor 210, the at least one processor 210 is caused to implement the method shown in fig. 1.
The memory 220, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs. The memory 220 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 220 may optionally include remote memory located remotely from processor 210, and such remote memory may be connected to processor 210 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It will be understood that the device structure shown in fig. 5 does not limit the device 200, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
In the apparatus 200 shown in fig. 5, the processor 210 may retrieve the program stored in the memory 220 and execute, but is not limited to, the steps of the embodiment shown in fig. 1.
The above-described embodiments of the apparatus 200 are merely illustrative, and the units illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purposes of the embodiments.
Embodiments of the present invention also provide a computer-readable storage medium, which stores a program executable by a processor, and the program executable by the processor is used for implementing the method shown in fig. 1 when being executed by the processor.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
It will be understood that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, as is known to those skilled in the art, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

1. A depth estimation method for laser radar stereo camera fusion is characterized by comprising the following steps:
acquiring a current frame left image and a current frame right image of a stereo camera;
acquiring a radar left image and a radar right image, wherein the radar left image and the current frame left image correspond to images of the same part of the same object, and the radar right image and the current frame right image correspond to images of the same part of the same object;
fusing the current frame left image and the radar left image to obtain a first left image;
fusing the current frame right image and the radar right image to obtain a first right image;
inputting the first left image into a binary neural network for feature extraction, and aggregating to obtain a first feature left image;
inputting the first right image into a binary neural network for feature extraction, and aggregating to obtain a first feature right image;
acquiring an initial matching cost between the first characteristic left image and the first characteristic right image;
optimizing the initial matching cost and extracting a disparity map based on a crossed radar trust aggregation and semi-global stereo matching algorithm;
and performing depth estimation according to the disparity map.
2. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the method further comprises:
and simultaneously shooting calibration objects in different postures and different positions by using the stereo camera and the laser radar.
3. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that, after acquiring a current frame left image and a current frame right image of the stereo camera, the method further comprises:
carrying out deformation correction on the current frame left image according to the distortion parameter of the left camera;
and carrying out deformation correction on the right image of the current frame according to the distortion parameter of the right camera.
4. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the step of acquiring the radar left image and the radar right image specifically comprises:
acquiring a mapping chart shot by the laser radar;
compressing the map and dividing the map into a radar left image and a radar right image.
5. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the fusing of the current frame left image and the radar left image to obtain a first left image specifically comprises:
and fusing the current frame left image and the radar left image along a fusion channel according to the image size to obtain a first left image.
6. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the fusing of the current frame right image and the radar right image to obtain a first right image specifically comprises:
and fusing the current frame right image and the radar right image along a fusion channel according to the image size to obtain a first right image.
7. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the step of obtaining an initial matching cost between the first characteristic left image and the first characteristic right image specifically comprises:
calculating a similarity measure between the first feature left image and the first feature right image by a weighted hamming distance method;
and acquiring an initial matching cost between the first characteristic left image and the first characteristic right image according to the similarity measurement.
8. The lidar stereo camera fused depth estimation method according to claim 1, characterized in that the step of optimizing the initial matching cost based on the cross-based radar trust aggregation and semi-global stereo matching algorithm comprises:
determining a first target point in the radar left image, and drawing a cross-shaped graph through the first target point, wherein the first target point is any effective point in the radar left image, and the effective point is a point of which the point value is greater than zero;
acquiring a first distance through a first formula, wherein the first distance is the longest distance from a second target point in the vertical direction or the horizontal direction, and the second target point is a point corresponding to the first target point in the current frame left image; the first formula is:

r* = max{ r ≥ 1 : δ(p, p_i) = 1 for all i ∈ [1, r] }

wherein r* represents the first distance, p represents the coordinates of the second target point, p_i represents the coordinates of the point at distance i from p in the vertical direction or the horizontal direction, and δ(p, p_i) is an indicator function indicating whether the difference in pixel intensity between the coordinates p and p_i is less than a threshold; wherein δ is calculated by a second formula, and the second formula is:

δ(p_1, p_2) = 1 if |I(p_1) − I(p_2)| ≤ τ, and δ(p_1, p_2) = 0 otherwise

wherein I(p_1) represents the pixel intensity at the coordinates p_1, I(p_2) represents the pixel intensity at the coordinates p_2, |I(p_1) − I(p_2)| represents the absolute difference between the pixel intensity at p_1 and the pixel intensity at p_2, and τ represents the threshold on the pixel-intensity difference;

optimizing the initial matching cost through a third formula according to the first distance, wherein the third formula is:

C'(q, d) = 0 if d = d_L(p), dist(q, p) ≤ r* and δ(p, q) = 1; otherwise C'(q, d) = C(q, d)

wherein q represents a point coordinate, d_L(p) represents the disparity at the coordinates p in the radar left image, p represents the coordinates corresponding to the first target point (the numerical values being identical), dist(q, p) represents the distance between q and p in the vertical direction or the horizontal direction, δ(p, q) is the indicator function indicating whether the difference in pixel intensity between p and q is less than the threshold, C'(q, d) represents the optimized matching cost at the point coordinate q and the disparity d, and C(q, d) represents the initial matching cost at the point coordinate q and the disparity d obtained by the weighted Hamming distance method;
and extracting the disparity map by a semi-global stereo matching algorithm according to the optimized matching cost.
9. A depth estimation device for laser radar stereo camera fusion is characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that a program executable by a processor is stored thereon, and the processor-executable program, when executed by the processor, is used for implementing the method according to any one of claims 1 to 8.
CN202011464746.XA 2020-12-14 2020-12-14 Depth estimation method and device for laser radar stereo camera fusion and medium thereof Active CN112233163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011464746.XA CN112233163B (en) 2020-12-14 2020-12-14 Depth estimation method and device for laser radar stereo camera fusion and medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011464746.XA CN112233163B (en) 2020-12-14 2020-12-14 Depth estimation method and device for laser radar stereo camera fusion and medium thereof

Publications (2)

Publication Number Publication Date
CN112233163A true CN112233163A (en) 2021-01-15
CN112233163B CN112233163B (en) 2021-03-30

Family

ID=74124881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011464746.XA Active CN112233163B (en) 2020-12-14 2020-12-14 Depth estimation method and device for laser radar stereo camera fusion and medium thereof

Country Status (1)

Country Link
CN (1) CN112233163B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255811A (en) * 2018-07-18 2019-01-22 南京航空航天大学 A kind of solid matching method based on the optimization of confidence level figure parallax
US20200175315A1 (en) * 2018-11-30 2020-06-04 Qualcomm Incorporated Early fusion of camera and radar frames
CN110517309A (en) * 2019-07-19 2019-11-29 沈阳工业大学 A kind of monocular depth information acquisition method based on convolutional neural networks
CN110517303A (en) * 2019-08-30 2019-11-29 的卢技术有限公司 A kind of fusion SLAM method and system based on binocular camera and millimetre-wave radar
CN110942477A (en) * 2019-11-21 2020-03-31 大连理工大学 Method for depth map fusion by using binocular camera and laser radar
CN111028285A (en) * 2019-12-03 2020-04-17 浙江大学 Depth estimation method based on binocular vision and laser radar fusion
CN111415305A (en) * 2020-03-10 2020-07-14 桂林电子科技大学 Method for recovering three-dimensional scene, computer-readable storage medium and unmanned aerial vehicle
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113281779A (en) * 2021-05-20 2021-08-20 中山大学 3D object rapid detection method, device, equipment and medium
CN114140507A (en) * 2021-10-28 2022-03-04 中国科学院自动化研究所 Depth estimation method, device and equipment integrating laser radar and binocular camera
CN114862931A (en) * 2022-05-31 2022-08-05 小米汽车科技有限公司 Depth distance determination method and device, vehicle, storage medium and chip

Also Published As

Publication number Publication date
CN112233163B (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112233163B (en) Depth estimation method and device for laser radar stereo camera fusion and medium thereof
CN111436216B (en) Method and system for color point cloud generation
JP2020525809A (en) System and method for updating high resolution maps based on binocular images
WO2021016854A1 (en) Calibration method and device, movable platform, and storage medium
EP3792660A1 (en) Method, apparatus and system for measuring distance
CN105069804B (en) Threedimensional model scan rebuilding method based on smart mobile phone
CN109885053B (en) Obstacle detection method and device and unmanned aerial vehicle
CN114049382B (en) Target fusion tracking method, system and medium in intelligent network connection environment
CN113160327A (en) Method and system for realizing point cloud completion
CN111983603A (en) Motion trajectory relay method, system and device and central processing equipment
CN113920183A (en) Monocular vision-based vehicle front obstacle distance measurement method
CN112207821A (en) Target searching method of visual robot and robot
CN114662587B (en) Three-dimensional target perception method, device and system based on laser radar
CN116612059B (en) Image processing method and device, electronic equipment and storage medium
CN111399014B (en) Local stereoscopic vision infrared camera system and method for monitoring wild animals
CN110864670B (en) Method and system for acquiring position of target obstacle
CN117406234A (en) Target ranging and tracking method based on single-line laser radar and vision fusion
CN116994225A (en) Target detection method, device, computer equipment and storage medium
CN116342677A (en) Depth estimation method, device, vehicle and computer program product
CN111986248B (en) Multi-vision sensing method and device and automatic driving automobile
CN114782496A (en) Object tracking method and device, storage medium and electronic device
CN115436927A (en) Road monitoring fusion tracking and positioning speed measuring method of camera and millimeter wave radar
CN114581889A (en) Fusion method, device, equipment, medium and product
CN112001970A (en) Monocular vision odometer method based on point-line characteristics
Berrio et al. Semantic sensor fusion: From camera to sparse LiDAR information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant