CN115861401B - Binocular and point cloud fusion depth recovery method, device and medium - Google Patents


Info

Publication number
CN115861401B
CN115861401B
Authority
CN
China
Prior art keywords
point cloud
depth
image
binocular
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310170221.2A
Other languages
Chinese (zh)
Other versions
CN115861401A (en)
Inventor
许振宇
李月华
朱世强
邢琰
姜甜甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Control Engineering
Zhejiang Lab
Original Assignee
Beijing Institute of Control Engineering
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Control Engineering, Zhejiang Lab filed Critical Beijing Institute of Control Engineering
Priority to CN202310170221.2A priority Critical patent/CN115861401B/en
Publication of CN115861401A publication Critical patent/CN115861401A/en
Application granted granted Critical
Publication of CN115861401B publication Critical patent/CN115861401B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a binocular and point cloud fusion depth recovery method, device and medium. The method constructs a depth recovery neural network comprising a sparse expansion module, a multi-scale feature extraction and fusion module, a variable-weight Gaussian modulation module and a cascaded three-dimensional convolutional neural network module. On the basis of a binocular stereo matching network, sparse point clouds are introduced, the density of guide points is improved by a neighborhood expansion method, and Gaussian modulation and multi-scale feature extraction and fusion are comprehensively adopted, so that the accuracy and robustness of depth recovery are improved, providing an effective method for dense depth recovery in real applications.

Description

Binocular and point cloud fusion depth recovery method, device and medium
Technical Field
The invention relates to the field of computer vision, in particular to a binocular and point cloud fusion depth recovery method, a binocular and point cloud fusion depth recovery device and a binocular and point cloud fusion depth recovery medium.
Background
Depth recovery is a very important task in computer vision and is widely applied in fields such as robotics, autonomous driving and three-dimensional reconstruction.
Compared with the traditional binocular stereo matching depth recovery method, a depth recovery algorithm fusing binocular images and sparse point clouds introduces high-precision sparse point clouds derived from sensors such as lidar and TOF cameras as prior information, which plays a guiding role in depth recovery. Especially in scenes with weak texture features, heavy occlusion or large domain changes, the depth information provided by the sparse point cloud can effectively improve the accuracy and robustness of depth recovery.
Existing binocular and point cloud fusion depth recovery algorithms are mainly divided into two categories, point-cloud-guided cost aggregation and point cloud information fusion, and both directly use the original sparse point cloud for fusion or guidance. However, due to the sparsity of the input point cloud data, the point-cloud-guided cost aggregation methods carry limited actual guidance information, and the modulation guidance only acts on the depth range, so sufficient prior information cannot be provided in the image dimension. For the point cloud information fusion methods, direct fusion or feature fusion suffers from the discontinuity of the sparse data, so the extracted fusion features are weak.
Disclosure of Invention
The invention aims to provide a binocular and point cloud fusion depth recovery method, a binocular and point cloud fusion depth recovery device and a binocular and point cloud fusion depth recovery medium aiming at the defects of the prior art.
The aim of the invention is realized by the following technical scheme: the first aspect of the embodiment of the invention provides a binocular and point cloud fusion depth recovery method, which comprises the following steps:
(1) The method comprises the steps of constructing a depth recovery network, wherein the depth recovery network comprises a sparse expansion module, a multi-scale feature extraction and fusion module, a variable weight Gaussian modulation module and a cascade three-dimensional convolutional neural network module; the input of the depth recovery network is binocular image and sparse point cloud data, and the output of the depth recovery network is dense depth image;
(2) Training the depth recovery network constructed in the step (1), inputting binocular images and sparse point cloud data by using a binocular data set, projecting the sparse point cloud data to a left-eye camera coordinate system to generate a sparse depth image, comparing a depth truth image, carrying out data enhancement on the binocular images and the sparse depth image, calculating and outputting loss values of dense depth images, and iteratively updating network weights by using a counter-propagation network;
(3) And (3) inputting the binocular image to be tested and sparse point cloud data into the depth recovery network obtained by training in the step (2), and projecting the sparse point cloud data to a left-eye camera coordinate system to generate a sparse depth image by utilizing sensor calibration parameters so as to output a dense depth image.
Further, the sparse expansion module specifically includes: and taking the multi-channel information of the image as a guide, improving the density of sparse point cloud data by a neighborhood expansion method, and outputting a semi-dense depth map.
Further, constructing the sparse expansion module includes the sub-steps of:
(a1) Acquiring a sparse depth map according to the pose relation between the point cloud data and the left-eye camera image, and respectively extracting pixel coordinates of effective points in the sparse depth map, corresponding image multichannel values and image multichannel values of points in the neighborhood of the image multichannel values;
(a2) Calculating average image numerical deviation according to the image multi-channel numerical value corresponding to the pixel coordinates of the effective points and the image multi-channel numerical value of the adjacent points;
(a3) And expanding the sparse depth map into a semi-dense depth map according to the average image numerical deviation of the effective points and a set fixed threshold, and outputting the semi-dense depth map.
Further, the multi-scale feature extraction and fusion module specifically comprises: taking the semi-dense depth map output by the sparse expansion module and the binocular image as input, adopting a Unet encoder-decoder structure combined with a spatial pyramid pooling method to extract point cloud features, left-eye image features and right-eye image features, and further fusing the left-eye image features and the point cloud features in a cascading manner at the feature layer to obtain fusion features.
Further, constructing the multi-scale feature extraction and fusion module includes the sub-steps of:
(b1) Respectively carrying out multi-layer downsampling coding on the semi-dense depth map and the binocular image which are output by the sparse expansion module so as to obtain left-eye image characteristics, right-eye image characteristics and point cloud characteristics after downsampling coding of a plurality of scales;
(b2) Respectively carrying out spatial pyramid pooling treatment on the left eye image characteristic, the right eye image characteristic and the point cloud characteristic which are subjected to downsampling coding with the lowest resolution so as to obtain a pooling treatment result;
(b3) Respectively carrying out multi-layer up-sampling decoding on the results obtained after the pooling treatment of the left-eye image features, the right-eye image features and the point cloud features so as to obtain left-eye image features, right-eye image features and point cloud features obtained after up-sampling decoding of a plurality of scales;
(b4) And cascading the up-sampled and decoded left-eye image features and the point cloud features in feature dimensions to obtain fusion features of the left-eye image features and the point cloud features.
Further, the variable-weight gaussian modulation module specifically comprises: based on the data reliability of the semi-dense depth map, generating Gaussian modulation functions with different weights, and modulating the depth dimension at different pixel positions of the cost volume.
Further, constructing the variable weight gaussian modulation module comprises the sub-steps of:
(c1) Constructing a cost volume in a cascading mode according to the fusion characteristics and the right-eye image characteristics;
(c2) According to the reliability of the sparse point cloud, gaussian modulation functions with different weights are respectively constructed;
(c3) Modulating the cost roll according to the constructed Gaussian modulation function to obtain the modulated cost roll.
Further, constructing the cascaded three-dimensional convolutional neural network module includes the substeps of:
(d1) Carrying out cost volume fusion and cost volume aggregation on the low-resolution cost volumes through a three-dimensional convolutional neural network so as to obtain aggregated cost volumes;
(d2) Acquiring softmax values of all depth values on each pixel coordinate by adopting a softmax function so as to obtain a low-resolution depth map;
(d3) And up-sampling is carried out according to the low-resolution depth map so as to obtain a prediction result of the high-resolution depth map, and three cascading iteration processes are carried out so as to obtain a dense depth map under the complete resolution.
The second aspect of the embodiment of the invention provides a binocular and point cloud fusion depth recovery device, which comprises one or more processors and is used for realizing the binocular and point cloud fusion depth recovery method.
A third aspect of the embodiments of the present invention provides a computer readable storage medium having a program stored thereon, which when executed by a processor, is configured to implement the binocular and point cloud fusion depth restoration method described above.
The method has the beneficial effects that dense depth is recovered based on the fusion of point cloud and binocular data: sparse point cloud data and binocular images are taken as input, a semi-dense depth map is obtained through neighborhood expansion, feature extraction and feature fusion are performed on the depth map and the binocular images, a cost volume is constructed and modulated with variable-weight Gaussian modulation functions, and cost aggregation is carried out through a deep learning network, so that the recovery of dense depth information is achieved. On the basis of a binocular stereo matching depth recovery network, the invention introduces sparse point clouds, improves the density of guide points by a neighborhood expansion method, and on this basis adopts a Gaussian modulation guidance method and a multi-scale feature extraction and fusion method to improve the accuracy and robustness of depth recovery. The invention relies on sensor equipment capable of providing binocular image data and sparse point cloud data, is beneficial to improving accuracy and robustness, and is an effective method for recovering dense depth in real applications.
Drawings
FIG. 1 is a diagram of a network architecture as a whole;
FIG. 2 is a sparse expansion schematic;
FIG. 3 is a schematic diagram of variable weight Gaussian modulation;
FIG. 4 is a schematic diagram showing the effect of the present invention; wherein a in fig. 4 is an input left-eye image, b in fig. 4 is an input right-eye image, c in fig. 4 is a picture of input sparse point cloud re-projected under a left-eye coordinate system, and d in fig. 4 is a depth picture obtained by restoration;
fig. 5 is a schematic structural diagram of the binocular and point cloud fusion depth restoration device of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The binocular and point cloud fusion depth recovery method of the invention, as shown in figure 1, comprises the following steps:
(1) And constructing a deep recovery network.
The overall network architecture design is based on the open-source deep learning framework PyTorch and is modified from the publicly available binocular stereo matching network architecture CF-NET to construct four parts: a sparse expansion module, a multi-scale feature extraction and fusion module, a variable-weight Gaussian modulation module and a cascaded three-dimensional convolutional neural network module. The input of the depth recovery network is the binocular image and sparse point cloud data, and the output of the depth recovery network is a dense depth map.
(1.1) constructing a sparse expansion module.
The whole processing flow of the module is shown in fig. 2, the multi-channel information of the image is used as a guide, the density of sparse point cloud data can be improved by a neighborhood expansion method, and a semi-dense depth map is output.
(a1) According to the pose relation between the point cloud data and the left-eye camera image, the input sparse point cloud data are projected to the camera coordinate system using an OpenCV reprojection function to obtain a sparse depth map $D$. Points with depth value $D(u,v) > 0$ are defined as valid points. For each valid point, the pixel coordinates $(u,v)$, the corresponding image multi-channel values $I_c(u,v)$ and the image multi-channel values $I_c(u+\alpha, v+\beta)$ of the points in its neighborhood are extracted, where $D(u,v)$ denotes the depth value at the coordinate position $(u,v)$ of the sparse depth map $D$ re-projected to the left-eye image, $W$ and $H$ denote the width and height of the image (in this embodiment $W = 960$, $H = 512$), $I_c(u,v)$ denotes the image multi-channel value of channel $c$ at pixel coordinates $(u,v)$, $C$ is the number of channels ($C = 3$ for the RGB image), $\alpha$ and $\beta$ denote the offsets on the abscissa and the ordinate of a point in the neighborhood, $\alpha, \beta \in [-r, r]$, and $r$ denotes the neighborhood distance (in this embodiment $r = 2$). It should be understood that $C$ may take other values as well, for example $C = 4$ for RGBA images.
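For illustration, the following Python sketch shows one way to re-project sparse point cloud data into the left camera to obtain the sparse depth map of step (a1), using OpenCV's projectPoints; the function name, the (R, t) calibration inputs and the rounding-to-nearest-pixel choice are assumptions of the sketch, not the disclosed implementation.

```python
import numpy as np
import cv2

def point_cloud_to_sparse_depth(points_xyz, K, R, t, W=960, H=512):
    """points_xyz: (N, 3) points in the sensor frame; K: 3x3 intrinsics; R, t: extrinsics."""
    rvec, _ = cv2.Rodrigues(R.astype(np.float64))            # rotation matrix -> Rodrigues vector
    uv, _ = cv2.projectPoints(points_xyz.astype(np.float64),
                              rvec, t.astype(np.float64),
                              K.astype(np.float64), np.zeros(5))
    uv = uv.reshape(-1, 2)
    cam_pts = (R @ points_xyz.T + t.reshape(3, 1)).T          # points in the left-camera frame
    z = cam_pts[:, 2]                                         # depth of each point
    D = np.zeros((H, W), dtype=np.float32)                    # 0 marks invalid pixels
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    keep = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)  # in front of camera and inside image
    D[v[keep], u[keep]] = z[keep]
    return D
```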
(a2) According to the image multi-channel values $I_c(u,v)$ corresponding to the pixel coordinates $(u,v)$ of the valid point and the image multi-channel values $I_c(u+\alpha, v+\beta)$ of the points in its neighborhood, the average image value deviation $E_{\alpha,\beta}(u,v)$ is calculated. The expression of the average image value deviation is:

$$E_{\alpha,\beta}(u,v) = \frac{1}{C}\sum_{c=1}^{C}\big|\,I_c(u,v) - I_c(u+\alpha, v+\beta)\,\big|,$$

where $C$ is the number of channels, $I_c(u,v)$ denotes the image multi-channel value of channel $c$ at pixel coordinates $(u,v)$, $I_c(u+\alpha, v+\beta)$ denotes the image value of channel $c$ at the neighborhood point, $\alpha$ and $\beta$ denote the offsets on the abscissa and the ordinate of the neighborhood point, $\alpha, \beta \in [-r, r]$, and $r$ denotes the neighborhood distance.
(a3) For each valid point with pixel coordinates $(u,v)$, the average image value deviation $E_{\alpha,\beta}(u,v)$ is compared with a fixed threshold $T$, which characterizes how easily a pixel is extended and can be adjusted according to the accuracy of the final depth recovery; in this embodiment the fixed threshold is set to $T = 8$. The sparse depth map $D$ can then be extended to a semi-dense depth map $D_{exp}$ by:

$$D_{exp}(u+\alpha, v+\beta) =
\begin{cases}
D(u,v), & E_{\alpha,\beta}(u,v) < T,\\
D(u+\alpha, v+\beta), & \text{otherwise},
\end{cases}$$

where $D(u,v)$ denotes the depth value at the coordinate position $(u,v)$ of the sparse depth map $D$ re-projected to the left-eye image, $D(u+\alpha, v+\beta)$ denotes the depth value at the coordinate position $(u+\alpha, v+\beta)$, $\alpha$ and $\beta$ denote the offsets on the abscissa and the ordinate of the neighborhood point, $\alpha, \beta \in [-r, r]$, and $r$ denotes the neighborhood distance. After the neighborhood expansion of all valid points is completed, the final semi-dense depth map is obtained and output.
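A minimal NumPy sketch of the neighborhood expansion in steps (a2)-(a3) is given below; it assumes an (H, W) depth array with 0 marking invalid pixels and an (H, W, C) left image, and the function name and default threshold are illustrative rather than taken from the disclosed implementation.

```python
import numpy as np

def sparse_expansion(D, img, r=2, threshold=8.0):
    """D: (H, W) sparse depth map, 0 = invalid; img: (H, W, C) left image."""
    H, W, C = img.shape
    D_exp = D.copy()
    img = img.astype(np.float32)
    vs, us = np.nonzero(D > 0)                    # pixel coordinates of valid points
    for v, u in zip(vs, us):
        for a in range(-r, r + 1):                # horizontal offset alpha
            for b in range(-r, r + 1):            # vertical offset beta
                vn, un = v + b, u + a
                if not (0 <= vn < H and 0 <= un < W):
                    continue
                # average image value deviation over the C channels
                e = np.abs(img[v, u] - img[vn, un]).mean()
                if e < threshold:
                    D_exp[vn, un] = D[v, u]       # propagate the valid depth to the neighbor
    return D_exp
```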
(1.2) constructing a multi-scale feature extraction and fusion module.
The module takes the semi-dense depth map output by the sparse expansion module and the binocular image as input, adopts a Unet encoder-decoder structure for each, combined with a spatial pyramid pooling method, to extract the point cloud features, left-eye image features and right-eye image features, and further fuses the left-eye image features and the point cloud features in a cascading manner at the feature layer, so as to obtain the fusion features.
(b1) Multi-layer down-sampling encoding is performed on the semi-dense depth map and the binocular image respectively, so that the down-sample-encoded left-eye image features $f_L^{\,i}$, right-eye image features $f_R^{\,i}$ and point cloud features $f_P^{\,i}$ at multiple scales can be obtained, where $i$ indexes the layers and $F_i$ denotes the feature dimension of the $i$-th layer after down-sampling encoding. In this embodiment, the semi-dense depth map and the binocular image are down-sampled and encoded by 5-level residual blocks, so that $i = 1, \ldots, 5$, with $W = 960$ and $H = 512$.
(b2) Spatial pyramid pooling is performed on the lowest-resolution down-sample-encoded left-eye image features, right-eye image features and point cloud features respectively, so as to obtain the pooled results, which are expressed as:

$$\hat f_L = \mathrm{SPP}\big(f_L^{\,N}\big), \qquad \hat f_R = \mathrm{SPP}\big(f_R^{\,N}\big), \qquad \hat f_P = \mathrm{SPP}\big(f_P^{\,N}\big),$$

where $\mathrm{SPP}(\cdot)$ denotes the pooling function, i.e. spatial pyramid pooling applied to the down-sample-encoded features, $N$ denotes the maximum number of down-sampling encoding layers, $\hat f_L$ denotes the left-eye image feature pooling result, $\hat f_R$ denotes the right-eye image feature pooling result, and $\hat f_P$ denotes the point cloud feature pooling result.

In this embodiment, a spatial pyramid pooling method similar to that of the public network HSMNet is adopted: 4-level average pooling is performed on the lowest-resolution ($N = 5$) down-sample-encoded left-eye image features, right-eye image features and point cloud features, with a fixed pooling size at each level, and the pooled results take the form given above.
(b3) Multi-layer up-sampling decoding is performed on the pooled left-eye image features, right-eye image features and point cloud features respectively, so that the up-sample-decoded left-eye image features $\tilde f_L^{\,i}$, right-eye image features $\tilde f_R^{\,i}$ and point cloud features $\tilde f_P^{\,i}$ at multiple scales are obtained, where $F_i$ denotes the feature dimension of the $i$-th up-sampling layer. Correspondingly, the results of up-sampling and decoding the left-eye image features, the right-eye image features and the point cloud features are expressed as:

$$\tilde f_X^{\,i} = U\!\big(\mathrm{Concat}\big(\tilde f_X^{\,i+1},\, f_X^{\,i}\big)\big), \qquad X \in \{L, R, P\}, \quad i = N-1, \ldots, 1, \qquad \tilde f_X^{\,N} = \hat f_X,$$

where $\mathrm{Concat}(\cdot)$ denotes the vector concatenation function, $U(\cdot)$ denotes the processing function of the up-sampling decoding module, and $N$ denotes the maximum number of up-sampling decoding layers.

In this embodiment, up-sampling decoding is performed by a 5-stage corresponding up-sampling decoding module, with $F_1 = 64$, $F_2 = 128$, $F_3 = 192$, $F_4 = 256$, $F_5 = 512$, and the up-sample-decoded results follow the expression above.
(b4) The up-sample-decoded left-eye image features and point cloud features are concatenated in the feature dimension to obtain the fusion features of the left-eye image features and the point cloud features:

$$f_{fuse}^{\,i} = \mathrm{Concat}\big(\tilde f_L^{\,i},\, \tilde f_P^{\,i}\big),$$

where $\mathrm{Concat}(\cdot)$ denotes the vector concatenation function, $\tilde f_L^{\,i}$ denotes the up-sample-decoded left-eye image features, $\tilde f_P^{\,i}$ denotes the up-sample-decoded point cloud features, and $i$ denotes the $i$-th feature layer.
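A hedged PyTorch sketch of the multi-scale feature extraction and fusion described in (b1)-(b4) follows: a 5-level encoder, spatial pyramid pooling on the lowest-resolution features, a decoder with skip connections, and channel-wise concatenation of the decoded left-image and point-cloud features. The channel widths follow the embodiment (64, 128, 192, 256, 512); the pooling sizes, module layout and all names are illustrative assumptions rather than the exact CF-NET-based implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=2):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride, 1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class SPP(nn.Module):
    """4-level average pooling; each branch is resized back and concatenated."""
    def __init__(self, c, sizes=(1, 2, 4, 8)):        # pooling sizes are an assumption
        super().__init__()
        self.sizes = sizes
        self.proj = nn.Conv2d(c * (len(sizes) + 1), c, 1)
    def forward(self, x):
        h, w = x.shape[-2:]
        branches = [x] + [F.interpolate(F.adaptive_avg_pool2d(x, s), (h, w),
                                        mode='bilinear', align_corners=False)
                          for s in self.sizes]
        return self.proj(torch.cat(branches, dim=1))

class UNetBranch(nn.Module):
    """Encoder + SPP + decoder with skip connections for one input stream."""
    def __init__(self, cin, widths=(64, 128, 192, 256, 512)):
        super().__init__()
        chans = [cin] + list(widths)
        self.enc = nn.ModuleList(conv_bn_relu(chans[i], chans[i + 1])
                                 for i in range(len(widths)))
        self.spp = SPP(widths[-1])
        self.dec = nn.ModuleList(
            nn.Conv2d(widths[i] + widths[i + 1], widths[i], 3, 1, 1)
            for i in range(len(widths) - 1))
    def forward(self, x):
        skips = []
        for e in self.enc:                             # (b1) multi-scale encoding
            x = e(x)
            skips.append(x)
        feats = [self.spp(x)]                          # (b2) pooling at the lowest resolution
        for i in reversed(range(len(self.dec))):       # (b3) up-sampling decoding
            up = F.interpolate(feats[0], size=skips[i].shape[-2:],
                               mode='bilinear', align_corners=False)
            feats.insert(0, self.dec[i](torch.cat([up, skips[i]], dim=1)))
        return feats                                   # fine-to-coarse decoded features

def fuse(left_feats, pc_feats):
    """(b4) concatenate left-image and point-cloud features per scale."""
    return [torch.cat([l, p], dim=1) for l, p in zip(left_feats, pc_feats)]
```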
(1.3) constructing a variable weight Gaussian modulation module.
Based on the data reliability of the semi-dense depth image, generating Gaussian modulation functions with different weights, and modulating the depth dimension at different pixel positions of the cost volume.
(c1) A cost volume $V \in \mathbb{R}^{D_{\max} \times C_V \times H \times W}$ is constructed in a cascading manner from the fusion features and the right-eye image features, where $D_{\max}$ denotes the maximum disparity search range (256 in this embodiment), $C_V$ denotes the feature dimension of the cost volume, $F_i$ denotes the feature dimension of the $i$-th up-sample-decoded layer, and $W = 960$, $H = 512$.
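As a sketch of how a concatenation-style cost volume can be assembled from the fusion features and the right-eye features (the exact layout of the patented cost volume may differ), the following PyTorch function shifts the right feature map by each candidate disparity and concatenates it with the fused left feature along the channel dimension.

```python
import torch

def build_concat_cost_volume(fused_left, feat_right, max_disp):
    """fused_left: (B, C1, H, W); feat_right: (B, C2, H, W) -> (B, C1+C2, D, H, W)."""
    B, C1, H, W = fused_left.shape
    C2 = feat_right.shape[1]
    volume = fused_left.new_zeros(B, C1 + C2, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            volume[:, :C1, d] = fused_left
            volume[:, C1:, d] = feat_right
        else:
            # at disparity d, pair left pixel x with right pixel x - d
            volume[:, :C1, d, :, d:] = fused_left[:, :, :, d:]
            volume[:, C1:, d, :, d:] = feat_right[:, :, :, :-d]
    return volume
```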
(c2) According to the reliability of the sparse point cloud, Gaussian modulation functions with different weights are constructed respectively, with the expressions:

$$G_1(d,u,v) = 1 - v_1(u,v) + v_1(u,v)\, k_1 \exp\!\Big(-\frac{\big(d - D(u,v)\big)^2}{2 c_1^{2}}\Big),$$

$$G_2(d,u,v) = 1 - v_2(u,v) + v_2(u,v)\, k_2 \exp\!\Big(-\frac{\big(d - D_{exp}(u,v)\big)^2}{2 c_2^{2}}\Big),$$

where $k_1$ and $c_1$ denote the weight and variance of the modulation function corresponding to the original sparse point cloud, $k_2$ and $c_2$ denote the weight and variance of the modulation function corresponding to the expanded point cloud (in this embodiment $k_1 = 10$, $c_1 = 1$, $k_2 = 2$, $c_2 = 8$), $D(u,v)$ denotes the depth value at the coordinate position $(u,v)$ of the sparse depth map $D$ re-projected to the left-eye image, $D_{exp}(u,v)$ denotes the depth value at $(u,v)$ of the semi-dense depth map $D_{exp}$, $v_1(u,v)$ and $v_2(u,v)$ are the validity flags of $D(u,v)$ and $D_{exp}(u,v)$ respectively, set to 1 when the corresponding point is valid (depth greater than 0) and to 0 otherwise, and $d$ denotes the coordinate in the depth dimension.
(c3) The cost volume is modulated with the constructed Gaussian modulation functions to obtain the modulated cost volume $\hat V$. Specifically, for every feature value $V(d, c, u, v)$ of the cost volume, the modulated feature value is expressed as:

$$\hat V(d, c, u, v) = G_1(d, u, v)\, G_2(d, u, v)\, V(d, c, u, v).$$
the overall flow diagram of the variable-weight gaussian modulation module is shown in fig. 3, and the corresponding sparse point cloud can be divided into an invalid point, an original point and a point obtained by neighborhood expansion.
In particular, at the point of invalidity
Figure SMS_97
、/>
Figure SMS_101
Therefore->
Figure SMS_104
、/>
Figure SMS_96
Therefore, the cost volume of the corresponding position of the invalid point remains unchanged; original +.>
Figure SMS_100
、/>
Figure SMS_103
Therefore, it is
Figure SMS_106
、/>
Figure SMS_95
Therefore, the original point corresponding to the position generation of the price volume uses a high weight and a low variance k 1 =10,c 1 A gaussian modulation function of =1; neighborhood extension derived point +.>
Figure SMS_99
Figure SMS_102
Therefore->
Figure SMS_105
、/>
Figure SMS_98
Therefore, the neighborhood expansion derived point reliability bias, using low weight high variance k 2 =2,c 2 A gaussian modulation function of 8.
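The following PyTorch sketch illustrates variable-weight Gaussian modulation of a cost volume consistent with the description above; the "1 − v + v·k·exp(…)" functional form, the tensor shapes, and the definition of v2 as covering only the expanded-but-not-original positions are reconstruction assumptions.

```python
import torch

def gaussian_modulation(cost, D_sparse, D_exp, k1=10.0, c1=1.0, k2=2.0, c2=8.0):
    """cost: (B, C, D, H, W); D_sparse, D_exp: (B, H, W) hints in disparity/depth-index units."""
    B, C, Dmax, H, W = cost.shape
    d = torch.arange(Dmax, device=cost.device, dtype=cost.dtype).view(1, Dmax, 1, 1)
    v1 = (D_sparse > 0).to(cost.dtype).unsqueeze(1)                    # original sparse points
    v2 = ((D_exp > 0) & (D_sparse == 0)).to(cost.dtype).unsqueeze(1)   # expanded-only points
    # high-weight, low-variance Gaussian around the original hints
    g1 = 1 - v1 + v1 * k1 * torch.exp(-(d - D_sparse.unsqueeze(1)) ** 2 / (2 * c1 ** 2))
    # low-weight, high-variance Gaussian around the expanded hints
    g2 = 1 - v2 + v2 * k2 * torch.exp(-(d - D_exp.unsqueeze(1)) ** 2 / (2 * c2 ** 2))
    return cost * (g1 * g2).unsqueeze(1)                               # broadcast over channels
```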
(1.4) constructing a cascaded three-dimensional convolutional neural network module.
(d1) Adopting the cascaded three-dimensional convolutional neural network method of the public network CF-NET, the low-resolution modulated cost volume $\hat V$ undergoes cost volume fusion and cost volume aggregation through an hourglass-type three-dimensional convolutional neural network, so that the aggregated cost volume $V_{agg}$ is obtained.

(d2) A softmax function is used to obtain the softmax values of all depth values at each pixel coordinate, so that a low-resolution depth map can be obtained.

(d3) Up-sampling is performed on the low-resolution depth map, so that a prediction result of the high-resolution depth map can be obtained. Based on the reliability of the prediction result, the range of the actually predicted depth is defined around this prediction result as the depth distribution range for the aggregation of the high-resolution cost volume. This distribution range is recursively fed into the cost aggregation process of the high-resolution cost volume, and cost aggregation is carried out through the hourglass-type three-dimensional convolutional neural network, so that the aggregated cost volume at the next higher resolution is obtained; the current depth layer index within this range determines the corresponding actual depth value. Likewise, the softmax function is used to obtain the softmax values of all depth values at each pixel coordinate, so that the depth map at this resolution is obtained. Through the above process, after 3 cascaded iterations a dense depth map at full resolution is finally obtained. The architecture of the cascaded three-dimensional convolutional neural network is shown as the cascaded three-dimensional convolutional neural network part in fig. 1.
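A minimal sketch of the softmax (soft-argmin) depth regression used in step (d2) is given below: the aggregated cost volume is converted into a per-pixel probability over candidate depth values, whose expectation gives the depth map. Tensor shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def softmax_depth_regression(cost_agg, depth_values):
    """cost_agg: (B, D, H, W) aggregated cost; depth_values: (D,) or (B, D, H, W) candidates."""
    prob = F.softmax(cost_agg, dim=1)                 # softmax over the depth dimension
    if depth_values.dim() == 1:                       # uniform candidate depths per pixel
        depth_values = depth_values.view(1, -1, 1, 1)
    return torch.sum(prob * depth_values, dim=1)      # (B, H, W) expected depth
```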
(2) Training the depth recovery network constructed in the step (1), inputting binocular images and sparse point cloud data by using a binocular data set, projecting the sparse point cloud data to a left-eye camera coordinate system to generate a sparse depth image, comparing a depth truth image, carrying out data enhancement on the binocular images and the sparse depth image, calculating and outputting loss values of dense depth images, and iteratively updating network weights by using a counter-propagation network.
In this embodiment, the open-source SceneFlow binocular dataset may be selected as the task sample; the dataset contains 35,454 pairs of binocular images with depth ground truth for training and 7,349 pairs for testing. During training, 5% of the points are randomly sampled from the depth ground truth to obtain a sparse depth map, which simulates the sparse depth map produced by point cloud re-projection and is used as the sparse depth input of the network.
The binocular images are sequentially subjected to data enhancement by random occlusion, asymmetric color transformation and random cropping. Random occlusion is realized by randomly generating a rectangular region and replacing the image data at all coordinates of the corresponding region in the right image with the average image value. The asymmetric color transformation applies different brightness, contrast and gamma transformations to the left-eye and right-eye images; the corresponding processing functions, such as adjust_brightness under torchvision.transforms, can be called directly. Random cropping is realized by randomly generating a rectangular region of fixed size and cropping away the image information of the remaining regions. The sparse depth map is likewise subjected to random occlusion and random cropping, where the random occlusion positions are generated independently of those of the binocular images, while the random cropping region is kept consistent with the cropping position of the binocular images to ensure the correspondence between the binocular image information and the depth information.
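For illustration, a possible implementation of the described data enhancement is sketched below with torchvision's functional transforms (adjust_brightness, adjust_contrast, adjust_gamma); the occlusion sizes, color-jitter ranges and crop size are illustrative assumptions.

```python
import random
import torchvision.transforms.functional as TF

def augment(left, right, sparse_depth, crop_hw=(256, 512)):
    """left, right: (3, H, W) float tensors in [0, 1]; sparse_depth: (1, H, W)."""
    # asymmetric color transform applied only to the right image
    right = TF.adjust_brightness(right, random.uniform(0.8, 1.2))
    right = TF.adjust_contrast(right, random.uniform(0.8, 1.2))
    right = TF.adjust_gamma(right, random.uniform(0.8, 1.2))
    _, H, W = right.shape
    # random occlusion on the right image, filled with its mean value
    oh, ow = random.randint(30, 80), random.randint(50, 150)
    oy, ox = random.randint(0, H - oh), random.randint(0, W - ow)
    right[:, oy:oy + oh, ox:ox + ow] = right.mean()
    # independent random occlusion on the sparse depth map (position not tied to the image)
    dh, dw = random.randint(30, 80), random.randint(50, 150)
    dy, dx = random.randint(0, H - dh), random.randint(0, W - dw)
    sparse_depth = sparse_depth.clone()
    sparse_depth[:, dy:dy + dh, dx:dx + dw] = 0
    # consistent random crop for left/right images and the sparse depth map
    ch, cw = crop_hw
    cy, cx = random.randint(0, H - ch), random.randint(0, W - cw)
    crop = lambda t: t[:, cy:cy + ch, cx:cx + cw]
    return crop(left), crop(right), crop(sparse_depth)
```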
The binocular images and the sparse depth map after data enhancement are fed as input into the depth recovery network of step (1). An Adam optimizer is used for end-to-end network training, and an L1 loss function is used to evaluate the loss between the recovered depth map and the depth ground truth; iterative training is realized through the usual forward propagation and backward propagation processes of a neural network. An initial learning rate is set for training, 20 rounds of iteration are carried out in total, and from the 16th round to the 18th round the learning rate is reduced to half of its original value. The learning rate and the iteration parameters can be adjusted according to the actual depth recovery accuracy.
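A hedged sketch of the training loop follows: Adam optimization, an L1 loss on valid ground-truth pixels, 20 epochs, and the learning rate halved in the later epochs. The initial learning rate (1e-3 here) and the exact halving schedule are assumptions, since the embodiment's value is set separately; the model and dataloader interfaces are also illustrative.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=20, base_lr=1e-3, device='cuda'):
    optim = torch.optim.Adam(model.parameters(), lr=base_lr)
    # assumed schedule: halve the learning rate at epochs 16 and 18
    sched = torch.optim.lr_scheduler.MultiStepLR(optim, milestones=[16, 18], gamma=0.5)
    model.to(device).train()
    for epoch in range(epochs):
        for left, right, sparse_depth, gt_depth in loader:
            left, right = left.to(device), right.to(device)
            sparse_depth, gt_depth = sparse_depth.to(device), gt_depth.to(device)
            pred = model(left, right, sparse_depth)          # dense depth prediction
            valid = gt_depth > 0                             # supervise valid pixels only
            loss = F.l1_loss(pred[valid], gt_depth[valid])
            optim.zero_grad()
            loss.backward()
            optim.step()
        sched.step()
```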
(3) In the task verification process, as shown in fig. 4, a binocular image to be tested (shown as a in fig. 4 and b in fig. 4) and sparse point cloud data are input into the depth recovery network obtained by training in the step (2), the sparse point cloud data are projected to a left-eye camera coordinate system to generate a sparse depth image (shown as c in fig. 4) by using sensor calibration parameters, and finally a dense depth image (shown as d in fig. 4) is output, so that the visualization process is completed.
Corresponding to the embodiment of the binocular and point cloud fusion depth recovery method, the invention also provides an embodiment of a binocular and point cloud fusion depth recovery device.
Referring to fig. 5, the binocular and point cloud fusion depth restoration device provided by the embodiment of the invention includes one or more processors, and is used for implementing the binocular and point cloud fusion depth restoration method in the above embodiment.
The embodiment of the binocular and point cloud fusion depth restoration device can be applied to any device with data processing capability, such as a computer. The device embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device in the logical sense is formed by the processor of the device with data processing capability reading the corresponding computer program instructions from a non-volatile memory into memory and running them. In terms of hardware, fig. 5 shows a hardware structure diagram of the device with data processing capability where the binocular and point cloud fusion depth restoration device of the present invention is located; in addition to the processor, memory, network interface and non-volatile memory shown in fig. 5, the device with data processing capability in the embodiment may further include other hardware according to its actual functions, which is not described here again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solution of the present invention. Those of ordinary skill in the art can understand and implement it without creative effort.
The embodiment of the invention also provides a computer readable storage medium, on which a program is stored, which when executed by a processor, implements the binocular and point cloud fusion depth recovery method in the above embodiment.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. The computer readable storage medium may also be an external storage device of any device with data processing capability, for example a plug-in hard disk, a Smart Media Card (SMC), an SD card or a Flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer readable storage medium is used to store the computer program and other programs and data required by the device with data processing capability, and may also be used to temporarily store data that has been output or is to be output.
The above embodiments are merely for illustrating the design concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications according to the principles and design ideas of the present invention are within the scope of the present invention.

Claims (8)

1. The binocular and point cloud fusion depth recovery method is characterized by comprising the following steps of:
(1) The method comprises the steps of constructing a depth recovery network, wherein the depth recovery network comprises a sparse expansion module, a multi-scale feature extraction and fusion module, a variable weight Gaussian modulation module and a cascade three-dimensional convolutional neural network module; the input of the depth recovery network is binocular image and sparse point cloud data, and the output of the depth recovery network is dense depth image; the sparse expansion module specifically comprises: taking multi-channel information of the image as a guide, improving the density of sparse point cloud data by a neighborhood expansion method, and outputting a semi-dense depth map; the variable-weight Gaussian modulation module specifically comprises the following components: generating Gaussian modulation functions with different weights according to the data reliability of the semi-dense depth map, and modulating the depth dimensions at different pixel positions of the cost volume;
(2) Training the depth recovery network constructed in the step (1), inputting binocular images and sparse point cloud data by using a binocular data set, projecting the sparse point cloud data to a left-eye camera coordinate system to generate a sparse depth image, comparing a depth truth image, carrying out data enhancement on the binocular images and the sparse depth image, calculating and outputting loss values of dense depth images, and iteratively updating network weights by using a counter-propagation network;
(3) And (3) inputting the binocular image to be tested and sparse point cloud data into the depth recovery network obtained by training in the step (2), and projecting the sparse point cloud data to a left-eye camera coordinate system to generate a sparse depth image by utilizing sensor calibration parameters so as to output a dense depth image.
2. The binocular and point cloud fusion depth restoration method according to claim 1, wherein constructing the sparse expansion module comprises the sub-steps of:
(a1) Acquiring a sparse depth map according to the pose relation between the point cloud data and the left-eye camera image, and respectively extracting pixel coordinates of effective points in the sparse depth map, corresponding image multichannel values and image multichannel values of points in the neighborhood of the image multichannel values;
(a2) Calculating average image numerical deviation according to the image multi-channel numerical value corresponding to the pixel coordinates of the effective points and the image multi-channel numerical value of the adjacent points;
(a3) And expanding the sparse depth map into a semi-dense depth map according to the average image numerical deviation of the effective points and a set fixed threshold, and outputting the semi-dense depth map.
3. The binocular and point cloud fusion depth restoration method according to claim 1, wherein the multi-scale feature extraction and fusion module specifically comprises: taking the semi-dense depth map output by the sparse expansion module and the binocular image as input, adopting a Unet encoder-decoder structure combined with a spatial pyramid pooling method to extract point cloud features, left-eye image features and right-eye image features, and further fusing the left-eye image features and the point cloud features in a cascading manner at the feature layer to obtain fusion features.
4. A binocular and point cloud fusion depth restoration method according to claim 3, wherein constructing the multi-scale feature extraction and fusion module comprises the sub-steps of:
(b1) Respectively carrying out multi-layer downsampling coding on the semi-dense depth map and the binocular image which are output by the sparse expansion module so as to obtain left-eye image characteristics, right-eye image characteristics and point cloud characteristics after downsampling coding of a plurality of scales;
(b2) Respectively carrying out spatial pyramid pooling treatment on the left eye image characteristic, the right eye image characteristic and the point cloud characteristic which are subjected to downsampling coding with the lowest resolution so as to obtain a pooling treatment result;
(b3) Respectively carrying out multi-layer up-sampling decoding on the results obtained after the pooling treatment of the left-eye image features, the right-eye image features and the point cloud features so as to obtain left-eye image features, right-eye image features and point cloud features obtained after up-sampling decoding of a plurality of scales;
(b4) And cascading the up-sampled and decoded left-eye image features and the point cloud features in feature dimensions to obtain fusion features of the left-eye image features and the point cloud features.
5. The binocular and point cloud fusion depth restoration method according to claim 1, wherein constructing the variable weight gaussian modulation module comprises the sub-steps of:
(c1) Constructing a cost volume in a cascading mode according to the fusion characteristics and the right-eye image characteristics;
(c2) According to the reliability of the sparse point cloud, gaussian modulation functions with different weights are respectively constructed;
(c3) Modulating the cost roll according to the constructed Gaussian modulation function to obtain the modulated cost roll.
6. The binocular and point cloud fusion depth restoration method of claim 1, wherein constructing the cascaded three-dimensional convolutional neural network module comprises the sub-steps of:
(d1) Carrying out cost volume fusion and cost volume aggregation on the low-resolution cost volumes through a three-dimensional convolutional neural network so as to obtain aggregated cost volumes;
(d2) Acquiring softmax values of all depth values on each pixel coordinate by adopting a softmax function so as to obtain a low-resolution depth map;
(d3) And up-sampling is carried out according to the low-resolution depth map so as to obtain a prediction result of the high-resolution depth map, and three cascading iteration processes are carried out so as to obtain a dense depth map under the complete resolution.
7. A binocular and point cloud fusion depth restoration apparatus comprising one or more processors configured to implement the binocular and point cloud fusion depth restoration method of any one of claims 1-6.
8. A computer readable storage medium, having stored thereon a program which, when executed by a processor, is adapted to implement the binocular and point cloud fusion depth restoration method of any one of claims 1-6.
CN202310170221.2A 2023-02-27 2023-02-27 Binocular and point cloud fusion depth recovery method, device and medium Active CN115861401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310170221.2A CN115861401B (en) 2023-02-27 2023-02-27 Binocular and point cloud fusion depth recovery method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310170221.2A CN115861401B (en) 2023-02-27 2023-02-27 Binocular and point cloud fusion depth recovery method, device and medium

Publications (2)

Publication Number Publication Date
CN115861401A CN115861401A (en) 2023-03-28
CN115861401B true CN115861401B (en) 2023-06-09

Family

ID=85659135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310170221.2A Active CN115861401B (en) 2023-02-27 2023-02-27 Binocular and point cloud fusion depth recovery method, device and medium

Country Status (1)

Country Link
CN (1) CN115861401B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435325A (en) * 2020-09-29 2021-03-02 北京航空航天大学 VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346608B (en) * 2013-07-26 2017-09-08 株式会社理光 Sparse depth figure denseization method and apparatus
CN109685842B (en) * 2018-12-14 2023-03-21 电子科技大学 Sparse depth densification method based on multi-scale network
US10984543B1 (en) * 2019-05-09 2021-04-20 Zoox, Inc. Image-based depth data and relative depth data
US10937178B1 (en) * 2019-05-09 2021-03-02 Zoox, Inc. Image-based depth data and bounding boxes
CN110738731B (en) * 2019-10-16 2023-09-22 光沦科技(深圳)有限公司 3D reconstruction method and system for binocular vision
CN111028285A (en) * 2019-12-03 2020-04-17 浙江大学 Depth estimation method based on binocular vision and laser radar fusion
CN111563923B (en) * 2020-07-15 2020-11-10 浙江大华技术股份有限公司 Method for obtaining dense depth map and related device
CN112102472B (en) * 2020-09-01 2022-04-29 北京航空航天大学 Sparse three-dimensional point cloud densification method
CN114004754B (en) * 2021-09-13 2022-07-26 北京航空航天大学 Scene depth completion system and method based on deep learning
CN114519772A (en) * 2022-01-25 2022-05-20 武汉图科智能科技有限公司 Three-dimensional reconstruction method and system based on sparse point cloud and cost aggregation
CN115512042A (en) * 2022-09-15 2022-12-23 网易(杭州)网络有限公司 Network training and scene reconstruction method, device, machine, system and equipment
CN115511759A (en) * 2022-09-23 2022-12-23 西北工业大学 Point cloud image depth completion method based on cascade feature interaction


Also Published As

Publication number Publication date
CN115861401A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
Jaritz et al. Sparse and dense data with cnns: Depth completion and semantic segmentation
Tang et al. Learning guided convolutional network for depth completion
US20200117906A1 (en) Space-time memory network for locating target object in video content
CN114782691A (en) Robot target identification and motion detection method based on deep learning, storage medium and equipment
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN116310076A (en) Three-dimensional reconstruction method, device, equipment and storage medium based on nerve radiation field
CN113850900B (en) Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction
CN116797768A (en) Method and device for reducing reality of panoramic image
CN117095132B (en) Three-dimensional reconstruction method and system based on implicit function
CN117576292A (en) Three-dimensional scene rendering method and device, electronic equipment and storage medium
Polasek et al. Vision UFormer: Long-range monocular absolute depth estimation
Lyu et al. Learning a room with the occ-sdf hybrid: Signed distance function mingled with occupancy aids scene representation
CN112686830A (en) Super-resolution method of single depth map based on image decomposition
CN115861401B (en) Binocular and point cloud fusion depth recovery method, device and medium
US20230104702A1 (en) Transformer-based shape models
Wu et al. Non‐uniform image blind deblurring by two‐stage fully convolution network
CN113066165B (en) Three-dimensional reconstruction method and device for multi-stage unsupervised learning and electronic equipment
US20230145498A1 (en) Image reprojection and multi-image inpainting based on geometric depth parameters
CN115423697A (en) Image restoration method, terminal and computer storage medium
JP2024521816A (en) Unrestricted image stabilization
Deng et al. Cformer: An underwater image enhancement hybrid network combining convolution and transformer
KR102648938B1 (en) Method and apparatus for 3D image reconstruction based on few-shot neural radiance fields using geometric consistency
Du et al. Dehazing Network: Asymmetric Unet Based on Physical Model
CN117274066B (en) Image synthesis model, method, device and storage medium
CN117934733B (en) Full-open vocabulary 3D scene graph generation method, device, equipment and medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant