CN109685842B - Sparse depth densification method based on multi-scale network - Google Patents

Sparse depth densification method based on multi-scale network

Info

Publication number
CN109685842B
Authority
CN
China
Prior art keywords
layer
convolution
input
block
channels
Prior art date
Legal status
Active
Application number
CN201811531022.5A
Other languages
Chinese (zh)
Other versions
CN109685842A (en)
Inventor
刘光辉
朱志鹏
孙铁成
李茹
徐增荣
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201811531022.5A
Publication of CN109685842A
Application granted
Publication of CN109685842B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a sparse depth densification method based on a multi-scale network, belonging to the technical field of depth estimation in computer vision. The method uses a multi-scale convolutional neural network to effectively fuse RGB image data with sparse point cloud data and finally obtains a dense depth image. The sparse point cloud is mapped onto a two-dimensional plane to generate a sparse depth map, which is aligned with the RGB image and concatenated with it to form an RGBD image; the RGBD image is fed into the multi-scale convolutional neural network for training and testing, and a dense depth map is finally estimated. Estimating depth from the combination of the RGB image and the sparse point cloud lets the distance information contained in the point cloud guide the conversion of the RGB image into a depth map. The multi-scale network exploits information from the original data at different resolutions: on the one hand it enlarges the receptive field, and on the other hand the input depth map at the smaller resolution is denser, so higher accuracy can be obtained.

Description

Sparse depth densification method based on multi-scale network
Technical Field
The invention belongs to the field of depth estimation of computer vision, and particularly relates to a sparse depth densification method based on a multi-scale convolutional neural network.
Background
In autonomous driving, a perception system based on computer vision technology is the most fundamental component. At present, visible-light cameras are the sensors most commonly used in such perception systems; they are inexpensive and the related technology is mature. However, visible-light cameras also have significant drawbacks. First, the RGB images they capture contain only color information, so the perception system is prone to misjudgment when the texture of a target is complex. Second, visible-light cameras fail in some environments; for example, at night, when the light is insufficient, a camera can hardly work normally. Lidar is another sensor often used in autonomous-driving perception systems. It is not easily affected by illumination conditions, and the collected point cloud data is inherently three-dimensional, so a depth image can be obtained directly from it: the point cloud is mapped onto a two-dimensional plane, and the value of each pixel represents the distance from that point to the sensor. Compared with an RGB image, the distance information contained in a depth image is more helpful for tasks such as object recognition and segmentation. However, lidar is expensive, the collected point cloud is sparse, and the generated depth map is therefore also sparse, which limits the usefulness of the sensor to a certain degree.
Disclosure of Invention
The aim of the invention is, in view of the above problems, to provide a method for densifying sparse depth using a multi-scale network.
The sparse depth densification method based on the multi-scale network comprises the following steps:
constructing a multi-scale network model:
the multi-scale network model comprises L (L ≥ 2) input branches; the outputs of the L branches are added element-wise and then fed into an information fusion layer, which is followed by an upsampling layer serving as the output layer of the multi-scale network model;
among the L input branches, one branch takes the original image as input; the remaining L-1 branches take as input images obtained by downsampling the original image by different factors; the output image of the output layer of the multi-scale network model has the same size as the original image;
the input data of the L input branches comprises an RGB image and a sparse depth map; the sparse depth map of the original image is downsampled as follows: based on a preset downsampling factor K, the sparse depth map is divided into grids of pixels, each grid containing K×K original input pixels; a flag value s_i is set for each original input pixel according to its depth value: if the depth value of the current original input pixel is 0, then s_i = 0, otherwise s_i = 1, where i indexes the K×K original input pixels contained in each grid; the depth value p_new of each grid is then obtained according to the formula

$$p_{new} = \frac{\sum_{i=1}^{K\times K} s_i\, p_i}{\sum_{i=1}^{K\times K} s_i},$$

where p_i denotes the depth value of original input pixel i;
the network structure of the branch whose input is the original image is a first network structure;
the network structure of a branch whose input is a downsampled image of the original image is as follows: K/2 upsampling convolution blocks D with 16 channels are appended after the first network structure, where K denotes the downsampling factor of the original image;
the first network structure includes fourteen layers, which are respectively:
the first layer is an input layer and a pooling layer; the input layer has kernel size 7×7, 64 channels, and convolution stride 2; the pooling layer uses max pooling with kernel size 3×3 and a pooling stride of 2;
the second layer and the third layer have the same structure; each is a 64-channel R₁ residual convolution block;
the fourth layer is a 128-channel R₂ residual convolution block;
the fifth layer is a 128-channel R₁ residual convolution block;
the sixth layer is a 256-channel R₂ residual convolution block;
the seventh layer is a 256-channel R₁ residual convolution block;
the eighth layer is a 512-channel R₂ residual convolution block;
the ninth layer is a 512-channel R₁ residual convolution block;
the tenth layer is a convolution layer with kernel size 3×3, 256 channels, and convolution stride 1;
the eleventh layer is a 128-channel upsampling convolution block D; the output of the eleventh layer and the output of the seventh layer are concatenated along the channel dimension and then input into the twelfth layer;
the twelfth layer is a 64-channel upsampling convolution block D; the output of the twelfth layer and the output of the fifth layer are concatenated along the channel dimension and then input into the thirteenth layer;
the thirteenth layer is a 32-channel upsampling convolution block D; the output of the thirteenth layer and the output of the third layer are concatenated along the channel dimension and then input into the fourteenth layer;
the fourteenth layer is a 16-channel upsampling convolution block D;
the R is 1 The residual convolution block comprises two layers of convolution layers with the same structure, the convolution kernel size is 3*3, the convolution step length is 1, and the number of channels is adjustable; and will input R 1 Adding the input data of the residual volume block and the output corresponding point of the second layer to access a ReLU activation function as R 1 An output layer of the residual convolution block;
the R is 2 The residual convolution block includes first, second and third convolution layers, and an input R 2 The input data of the residual convolution block respectively enters two branches, and then the output corresponding points of the two branches are added to be connected with a ReLU activation function as R 2 An output layer of the residual convolution block; one branch is a first convolution layer and a second convolution layer which are connected in sequence, and the other branch is a third convolution layer;
the first convolution layer and the second convolution layer are identical in structure, the convolution kernel size is 3*3, the convolution step length is 2, and the number of channels can be adjusted; the third convolution layer has the convolution kernel size of 3*3, the convolution step length is 1, and the number of channels is adjustable;
the up-sampling convolution block D comprises two amplification modules and a convolution layer, wherein input data input into the up-sampling convolution block D respectively enter two branches, and output corresponding points of the two branches are added to be connected with a ReLU activation function to serve as an output layer of the up-sampling convolution block D; one branch is a first amplification module and a convolution layer which are connected in sequence, and the other branch is a second amplification module;
wherein, the convolution layer of the up-sampling convolution block D is: the convolution kernel size is 3*3, the convolution step is 1, and the number of channels is adjustable;
the amplification module of the up-sampling convolution block D comprises four parallel convolution layers, the number of channels of the four convolution layers is set to be the same, and the sizes of convolution kernels are respectively as follows: 3, 2, 3 and 2*2, the convolution step length is 1, and the input data of the input amplification module passes through the four convolution layers and then is spliced together to be used as the output of the amplification module;
the information fusion module is a convolution layer with the convolution kernel size of 3*3, the channel number of 1 and the convolution step length of 1;
and performing deep learning training on the constructed multi-scale network model, and obtaining a densification processing result of the image to be processed through the trained multi-scale network model.
In summary, owing to the adoption of the above technical scheme, the invention has the following beneficial effects: depth is estimated by combining the sparse point cloud with the image, the sparse depth guides the RGB image and the RGB image supplements the sparse depth, so the advantages of the two data forms are combined; depth estimation is further performed at multiple scales by the multi-scale network model, which improves the accuracy of depth estimation.
Drawings
FIG. 1 is a schematic down-sampling illustration of the present invention in an embodiment;
FIG. 2 is a schematic diagram of the residual convolution blocks in an embodiment, where FIG. 2-a is the type-one residual convolution block and FIG. 2-b is the type-two residual convolution block;
FIG. 3 is a schematic diagram of the upsampling convolution block in an embodiment, where FIG. 3-a shows the amplification module and FIG. 3-b shows the entire upsampling convolution block;
FIG. 4 is a diagram illustrating the multi-scale network architecture used in an exemplary embodiment;
FIG. 5 shows results of the present invention and of an existing processing method in an embodiment, where FIG. 5-a is the input RGB image, FIG. 5-b is the sparse depth map, FIG. 5-c is the depth estimation of FIG. 5-b using an existing method, and FIG. 5-d is the depth estimation result of FIG. 5-b using the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
To meet the demand of specific scenarios (such as autonomous driving) for high-quality depth images, the invention provides a method for densifying sparse depth using a multi-scale network. Existing depth estimation methods mainly use the RGB image alone to obtain dense depth directly, but a depth map estimated directly from a two-dimensional image suffers from inherent ambiguity. To solve this problem, the invention estimates depth by combining the sparse point cloud with the image: the sparse depth guides the RGB image and the RGB image supplements the sparse depth, so the advantages of the two data forms are combined; depth estimation is carried out at multiple scales, which improves its accuracy.
The invention uses a multi-scale convolutional neural network to effectively fuse RGB image data with sparse point cloud data and finally obtains a dense depth image. The sparse point cloud is mapped onto a two-dimensional plane to generate a sparse depth map, which is aligned with the RGB image and concatenated with it to form an RGBD (RGB + Depth Map) image; the RGBD image is fed into the multi-scale convolutional neural network for training and testing, and a dense depth map is finally estimated. Estimating depth from the combination of the RGB image and the sparse point cloud lets the distance information contained in the point cloud guide the conversion of the RGB image into a depth map. The multi-scale network exploits information from the original data at different resolutions: on the one hand it enlarges the receptive field, and on the other hand the input depth map at the smaller resolution is denser, so higher accuracy can be obtained.
The specific implementation of the sparse depth densification method based on the multi-scale network is as follows:
(1) Input data downsampling:
the feasible down-sampling multiples have a large relation to the size of the input data. For an input image of size M x N, a range of possible downsampling multiples is [2,min (M, N) × 2% -5 ]。
The sampling method is as follows: let K denote the selected downsampling factor. The input sparse depth map is divided into grids of pixels, each grid containing K × K original input pixels, so the input image is divided into

$$\frac{M}{K} \times \frac{N}{K}$$

grids. FIG. 1 is a schematic diagram for a downsampling factor of 2. The K × K pixels in a grid are represented as a pixel set P = {p₁, p₂, ..., p_{K×K}}.
The sparse depth map contains pixels whose depth is zero; these are referred to as invalid values. A flag value s is constructed to mark invalid values: if the depth value of a pixel is not equal to 0, the pixel is valid and s = 1; otherwise the value is invalid and s = 0. The set of flag values corresponding to the pixel set P is thus S = {s₁, s₂, ..., s_{K×K}}.
The new depth value after downsampling is:

$$p_{new} = \frac{\sum_{n=1}^{K\times K} s_n\, p_n}{\sum_{n=1}^{K\times K} s_n},$$

where p_n denotes the depth value of original pixel n and s_n denotes its flag value.
This operation is performed on every grid, producing a new depth map with lower resolution and higher density (referred to as the small-resolution depth map for short). Compared with a conventional downsampling method, the small-resolution depth map obtained in this way is denser, and its depth values are more accurate because the influence of invalid values is eliminated. The RGB image is downsampled with conventional bilinear interpolation. The result is a small-resolution image together with its sparse depth map.
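For illustration only, the following is a minimal NumPy sketch of this validity-masked downsampling; the function name and the convention that a grid with no valid pixel keeps depth 0 are assumptions, not taken from the patent.

```python
import numpy as np

def downsample_sparse_depth(depth: np.ndarray, K: int) -> np.ndarray:
    """Validity-masked KxK downsampling of a sparse depth map (H, W assumed divisible by K)."""
    H, W = depth.shape
    # Split the map into (H/K) x (W/K) grids of K x K pixels.
    grids = depth.reshape(H // K, K, W // K, K)
    valid = (grids > 0).astype(depth.dtype)        # flag values s_i
    num_valid = valid.sum(axis=(1, 3))             # sum of s_i per grid
    depth_sum = (grids * valid).sum(axis=(1, 3))   # sum of s_i * p_i per grid
    # Average over valid pixels only; grids with no valid pixel stay 0 (assumption).
    return np.where(num_valid > 0, depth_sum / np.maximum(num_valid, 1), 0.0)

# Example: a 4x4 sparse map downsampled by K = 2.
d = np.array([[0, 2, 0, 0],
              [4, 0, 0, 6],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=np.float32)
print(downsample_sparse_depth(d, 2))   # [[3. 6.] [1. 0.]]
```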
(2) Constructing the residual convolution blocks:
The residual convolution block is an important component of the multi-scale network of the present invention; it is used to extract features from the input data and comes in two types.
Type one: residual convolution block R₁. The construction process is as follows: as shown in FIG. 2-a, the first layer of the residual convolution block is a convolution layer with kernel size 3×3, n channels, and convolution stride 1. The second layer has the same structure as the first layer. The input data is then added element-wise to the output of the second layer, and finally a ReLU activation function is applied. The structure of the residual convolution block is fixed, but the number of channels of its convolution layers is variable; different residual convolution blocks are obtained by adjusting the number of channels, so a type-one residual convolution block is named an n-channel R₁. The input and output sizes of R₁ are identical, since it contains no downsampling operation.
Type two: residual convolution block R₂. The construction process is as follows: as shown in FIG. 2-b, the first layer of the residual convolution block is a convolution layer with kernel size 3×3, n channels, and convolution stride 2. The second layer is also a convolution layer, with kernel size 3×3, n channels, and convolution stride 1. The input data is passed through a separate convolution layer with kernel size 1×1, n channels, and convolution stride 2, and its output is added element-wise to the output of the second layer. Finally a ReLU activation function is applied. Named in the same way as R₁, a type-two residual convolution block is called an n-channel R₂. The input size of R₂ is twice its output size; the purpose of this operation is to enlarge the receptive field of the convolution kernels and better extract global features.
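For illustration, here is a minimal PyTorch sketch of the two residual block types described above (FIG. 2-a, FIG. 2-b); the class names, the intermediate ReLU between the two convolutions, and the separate in_channels argument of R₂ are assumptions not spelled out in the patent.

```python
import torch.nn as nn

class ResBlockR1(nn.Module):
    """Type-one residual block R1: two 3x3 stride-1 convs + identity skip; size-preserving."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))   # intermediate ReLU is an assumption
        return self.relu(out + x)                    # element-wise add, then ReLU

class ResBlockR2(nn.Module):
    """Type-two residual block R2: 3x3 stride-2 conv + 3x3 stride-1 conv,
    with a 1x1 stride-2 conv on the shortcut; halves the spatial size."""
    def __init__(self, in_channels: int, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.shortcut = nn.Conv2d(in_channels, channels, kernel_size=1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.shortcut(x))
```

For example, ResBlockR2(64, 128) would correspond to the 128-channel R₂ used as the fourth layer of the first network structure.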
(3) Constructing the upsampling convolution block:
The upsampling convolution block is another important part of the multi-scale network; its role is to enlarge its input, and each upsampling convolution block doubles the spatial size of the input. The construction process is as follows. The basic module of the upsampling convolution block is the amplification module: as shown in FIG. 3-a, it consists of four parallel convolution layers, each with n channels, whose kernel sizes are 3×3, 3×2, 2×3 and 2×2 respectively; the input passes through the four convolution layers and the outputs are spliced together, so the output is twice the size of the input. As shown in FIG. 3-b, the upsampling convolution block consists of two branches. The first layer of branch one is an n-channel amplification module followed by a ReLU activation function; its second layer is a convolution layer with kernel size 3×3 and n channels. Branch two has only one layer, an n-channel amplification module. The outputs of the two branches are added element-wise, and finally a ReLU activation function is applied. Named in the same way as R₁ and R₂, the upsampling convolution block is called an n-channel D.
(4) Constructing a multi-scale convolution network:
the multi-scale network can construct multiple scales, namely, multiple branches can be constructed, the constructed number of the branches is influenced by the size of the input image as the down-sampling multiple, and the size of the branches isFor M x N images, the upper limit of the number of branches is log 2 (min(M,N)*2 -5 ) +1. The construction method takes two branches as an example, and two branches are required to be established, wherein the input of one branch is the original resolution, the input of the other branch is the 1/K original resolution, and K is the downsampling multiple of an input image. And finally, carrying out information fusion on the two branches.
The first branch, i.e. the branch whose input is the original resolution, is constructed as follows:
the first layer is an input layer and a pooling layer, the convolution kernel size of the input layer is 7*7, the number of channels is 64, and the convolution step size is 2. The pooling layer adopts maximum pooling, the convolution kernel size is 3*3, and the pooling constant is 2. The original input size is M x N4, and the size is changed after passing through the first layer
Figure BDA0001905651140000061
That is, the size is 1/4 of the original size, and the number of channels is 64.
The second layer is a 64-channel R₁ residual convolution block, denoted R₁¹.
The third layer has the same structure as the second layer and is denoted R₁².
The fourth layer is a 128-channel R₂ residual convolution block, denoted R₂¹.
The fifth layer is a 128-channel R₁ residual convolution block, denoted R₁³.
The sixth layer is a 256-channel R₂ residual convolution block, denoted R₂².
The seventh layer is a 256-channel R₁ residual convolution block, denoted R₁⁴.
The eighth layer is a 512-channel R₂ residual convolution block, denoted R₂³.
The ninth layer is a 512-channel R₁ residual convolution block, denoted R₁⁵.
The tenth layer is a convolution layer with kernel size 3×3, 256 channels, and convolution stride 1.
The eleventh layer is a 128-channel upsampling convolution block D, denoted D₁.
The output of D₁ and the output of the seventh layer R₁⁴ are then concatenated along the channel dimension, where the output size of R₁⁴ is

$$\frac{M}{16} \times \frac{N}{16} \times 256,$$

the output size of D₁ is

$$\frac{M}{16} \times \frac{N}{16} \times 128,$$

and the size after concatenation becomes

$$\frac{M}{16} \times \frac{N}{16} \times 384.$$

The significance of this concatenation is that it recovers some of the original information lost during convolution, making the result more accurate.
The twelfth layer is a 64-channel upsampling convolution block D, denoted D₂; the output of D₂ and the output of R₁³ are concatenated along the channel dimension.
The thirteenth layer is a 32-channel upsampling convolution block D, denoted D₃; the output of D₃ and the output of R₁² are concatenated along the channel dimension.
The fourteenth layer is a 16-channel upsampling convolution block D, denoted D₄.
At this point, the network structure of the branch whose input is the original resolution is complete.
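Putting the pieces together, a condensed PyTorch sketch of this fourteen-layer branch is shown below, reusing the ResBlockR1, ResBlockR2 and UpBlockD classes sketched earlier; the class name, the ReLU in the stem, and the exact padding values are assumptions, while the channel widths and skip concatenations follow the sizes derived above.

```python
import torch
import torch.nn as nn

class BranchNet(nn.Module):
    """First network structure: ResNet-style encoder plus four upsampling blocks with skips.
    Input: B x 4 x M x N RGBD tensor; output: B x 16 x M/2 x N/2 feature map."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(                      # layer 1: 7x7 conv s2 + 3x3 maxpool s2
            nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        self.r1_1, self.r1_2 = ResBlockR1(64), ResBlockR1(64)         # layers 2-3
        self.r2_1, self.r1_3 = ResBlockR2(64, 128), ResBlockR1(128)   # layers 4-5
        self.r2_2, self.r1_4 = ResBlockR2(128, 256), ResBlockR1(256)  # layers 6-7
        self.r2_3, self.r1_5 = ResBlockR2(256, 512), ResBlockR1(512)  # layers 8-9
        self.conv10 = nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1)  # layer 10
        self.d1 = UpBlockD(256, 128)          # layer 11
        self.d2 = UpBlockD(128 + 256, 64)     # layer 12 (input: D1 output ++ R1^4 output)
        self.d3 = UpBlockD(64 + 128, 32)      # layer 13 (input: D2 output ++ R1^3 output)
        self.d4 = UpBlockD(32 + 64, 16)       # layer 14 (input: D3 output ++ R1^2 output)

    def forward(self, x):
        s1 = self.r1_2(self.r1_1(self.stem(x)))     # M/4,  64 ch  (R1^2)
        s2 = self.r1_3(self.r2_1(s1))               # M/8,  128 ch (R1^3)
        s3 = self.r1_4(self.r2_2(s2))               # M/16, 256 ch (R1^4)
        y  = self.conv10(self.r1_5(self.r2_3(s3)))  # M/32, 256 ch
        y = torch.cat([self.d1(y), s3], dim=1)      # M/16, 128 + 256 ch
        y = torch.cat([self.d2(y), s2], dim=1)      # M/8,  64 + 128 ch
        y = torch.cat([self.d3(y), s1], dim=1)      # M/4,  32 + 64 ch
        return self.d4(y)                           # M/2,  16 ch
```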
The second branch, the branch with the input of 1/K original resolution, is constructed as follows:
the first fourteen layers have the same structure as the branch with the original resolution, and then the corresponding number of 16-channel upsampling volume blocks D are added according to the input size of the branch. For a tributary with an input of 1/K original resolution (downsampling multiple K), K/2 upsampled convolution blocks are added. Fig. 4 shows an example of a two-branch case where the second branch is input at 1/2 of the original resolution (the down-sampling multiple is 2), and the number of the up-sampling convolution blocks D to be added by the second branch is 1. The multi-resolution case is similar, if the input is 1/4 of the original resolution, two 16-channel upsampled volume blocks are added, and so on.
After the branches are constructed, the information of the two branches needs to be fused. The structure of the information fusion is as follows: the output of the first branch and the output of the second branch are added element-wise and used as the input of the information fusion module. The network structure of the information fusion module is a convolution layer with kernel size 3×3 and 1 channel; finally, the output of this layer is linearly upsampled to obtain the final result, whose size is the same as that of the original input.
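Under the same assumptions, a minimal sketch of the two-branch model with this fusion step follows, reusing BranchNet and UpBlockD from above; bilinear interpolation is assumed for the final "linear upsampling".

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleNet(nn.Module):
    """Two-branch multi-scale model: element-wise sum of branch outputs,
    3x3 single-channel fusion conv, then upsampling back to the input size."""
    def __init__(self):
        super().__init__()
        self.branch_full = BranchNet()                      # original resolution
        self.branch_half = nn.Sequential(BranchNet(),       # 1/2 resolution (K = 2)
                                         UpBlockD(16, 16))  # K/2 = 1 extra 16-channel D block
        self.fuse = nn.Conv2d(16, 1, kernel_size=3, stride=1, padding=1)

    def forward(self, x_full, x_half):
        y = self.branch_full(x_full) + self.branch_half(x_half)   # both M/2 x N/2 x 16
        y = self.fuse(y)
        # Upsample the fused single-channel map back to the original input size.
        return F.interpolate(y, size=x_full.shape[-2:], mode='bilinear', align_corners=False)
```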
Information fusion in the case of more than two branches is analogous: the outputs of all L branches are added element-wise and fed into the information fusion module.
(5) Setting of the loss function:
In the present embodiment, the loss function is the Smooth L1 loss, i.e.,

$$L = \frac{1}{N}\sum_{j=1}^{N} \operatorname{smooth}_{L1}\!\left(d_j - d_j^{g}\right),\qquad \operatorname{smooth}_{L1}(x)=\begin{cases}0.5\,x^{2}, & |x|<1,\\ |x|-0.5, & \text{otherwise},\end{cases}$$

where d_j denotes the depth value estimated by the convolutional neural network at pixel j, d_j^g denotes the corresponding standard (ground-truth) depth value, and N denotes the total number of pixels in a depth map.
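For reference, a short sketch of this loss (assuming the standard Smooth L1 form reconstructed above; PyTorch's built-in torch.nn.SmoothL1Loss with its default beta of 1 computes the same function):

```python
import torch

def smooth_l1_depth_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean Smooth L1 loss between predicted and ground-truth depth maps."""
    diff = pred - target
    abs_diff = diff.abs()
    loss = torch.where(abs_diff < 1, 0.5 * diff ** 2, abs_diff - 0.5)
    return loss.mean()   # average over all N pixels

# Equivalent built-in: torch.nn.SmoothL1Loss(reduction='mean')
```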
(6) Training and testing of the model:
In the present embodiment, the training data comes from the public NYU-Depth-v2 dataset, which contains RGB images and dense depth maps of size 640 × 480. For training, 48000 RGB images and their corresponding dense depth maps are selected; for testing, 654 RGB images and their corresponding dense depth maps are selected. The input of the network is an RGB image and a sparse depth map. The dataset does not contain sparse depth maps; a sparse depth map is obtained by randomly sampling 1000 points from the dense depth map, and it is combined with the RGB image to form the RGBD input.
During training, the RGBD image is downsampled to 320 × 240 and then center-cropped to 304 × 228 (this is the original image fed to the multi-scale network model); this image is the input of the first branch. It is then downsampled by a factor of two according to the method described in step (1) to obtain a 152 × 114 RGBD image, which is the input of the second branch. Eight images are trained at a time (batch size 8), so one pass over the entire dataset takes 6000 iterations; the dataset is trained for 15 epochs, 90000 iterations in total. A varying learning rate is used: the initial learning rate is set to 0.01 and is divided by 10 every 5 epochs, giving a final learning rate of 0.0001. After training, the parameters of the model are saved.
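A skeletal training loop consistent with these settings might look as follows; the SGD optimizer with momentum, the hypothetical NYUSparseDepthDataset loader, and the model constructor are assumptions, while the batch size, epoch count, and step learning-rate schedule follow the text above.

```python
import torch
from torch.utils.data import DataLoader

# Assumed objects: MultiScaleNet and smooth_l1_depth_loss from the sketches above,
# and a hypothetical NYUSparseDepthDataset yielding (rgbd_full, rgbd_half, dense_gt).
model = MultiScaleNet().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)           # optimizer assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)   # /10 every 5 epochs
loader = DataLoader(NYUSparseDepthDataset('train'), batch_size=8, shuffle=True)  # 6000 iters/epoch

for epoch in range(15):                       # 15 epochs = 90000 iterations in total
    for rgbd_full, rgbd_half, dense_gt in loader:
        pred = model(rgbd_full.cuda(), rgbd_half.cuda())
        loss = smooth_l1_depth_loss(pred, dense_gt.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()

torch.save(model.state_dict(), 'multiscale_depth.pth')   # save the model parameters
```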
During testing, the model parameters are loaded, the test data are processed in the same way as during training and fed to the model, and the final result is output. FIG. 5 shows comparisons between the outputs of the present invention and those of an existing deep-learning method. The results of the invention are clearer overall, and the comparison of the regions in the black boxes shows that the invention recovers details better.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (3)

1. A sparse depth densification method based on a multi-scale network is characterized by comprising the following steps:
constructing a multi-scale network model:
the multi-scale network model comprises L input branches; the outputs of the L branches are added element-wise and then fed into an information fusion layer, which is followed by an upsampling layer serving as the output layer of the multi-scale network model;
among the L input branches, one branch takes the original image as input; the remaining L-1 branches take as input images obtained by downsampling the original image by different factors; the output image of the output layer of the multi-scale network model has the same size as the original image;
the input data of the L input branches comprises an RGB image and a sparse depth map; the sparse depth map of the original image is downsampled as follows: based on a preset downsampling factor K, the sparse depth map is divided into grids of pixels, each grid containing K×K original input pixels; a flag value s_i is set for each original input pixel according to its depth value: if the depth value of the current original input pixel is 0, then s_i = 0, otherwise s_i = 1, where i indexes the K×K original input pixels contained in each grid; the depth value p_new of each grid is then obtained according to the formula

$$p_{new} = \frac{\sum_{i=1}^{K\times K} s_i\, p_i}{\sum_{i=1}^{K\times K} s_i},$$

where p_i denotes the depth value of original input pixel i;
the network structure of the branch whose input is the original image is a first network structure;
the network structure of a branch whose input is a downsampled image of the original image is as follows: K/2 upsampling convolution blocks D with 16 channels are appended after the first network structure, where K denotes the downsampling factor of the original image;
the first network structure includes fourteen layers, which are respectively:
the first layer is an input layer and a pooling layer; the input layer has kernel size 7×7, 64 channels, and convolution stride 2; the pooling layer uses max pooling with kernel size 3×3 and a pooling stride of 2;
the second layer and the third layer have the same structure; each is a 64-channel R₁ residual convolution block;
the fourth layer is a 128-channel R₂ residual convolution block;
the fifth layer is a 128-channel R₁ residual convolution block;
the sixth layer is a 256-channel R₂ residual convolution block;
the seventh layer is a 256-channel R₁ residual convolution block;
the eighth layer is a 512-channel R₂ residual convolution block;
the ninth layer is a 512-channel R₁ residual convolution block;
the tenth layer is a convolution layer with kernel size 3×3, 256 channels, and convolution stride 1;
the eleventh layer is a 128-channel upsampling convolution block D; the output of the eleventh layer and the output of the seventh layer are concatenated along the channel dimension and then input into the twelfth layer;
the twelfth layer is a 64-channel upsampling convolution block D; the output of the twelfth layer and the output of the fifth layer are concatenated along the channel dimension and then input into the thirteenth layer;
the thirteenth layer is a 32-channel upsampling convolution block D; the output of the thirteenth layer and the output of the third layer are concatenated along the channel dimension and then input into the fourteenth layer;
the fourteenth layer is a 16-channel upsampling convolution block D;
the R is 1 The residual convolution block comprises two layers of convolution layers with the same structure, the convolution kernel size is 3*3, the convolution step length is 1, and the number of channels is adjustable; and will input R 1 Adding the input data of the residual volume block and the output corresponding point of the second layer to access a ReLU activation function as R 1 An output layer of the residual convolution block;
the R is 2 The residual convolution block includes first, second and third convolution layers and an input R 2 The input data of the residual convolution block respectively enters two branches, and then the output corresponding points of the two branches are added to be connected with a ReLU activation function as R 2 An output layer of the residual convolution block; one branch is a first convolution layer and a second convolution layer which are connected in sequence, and the other branch is a third convolution layer;
the first convolution layer and the second convolution layer have the same structure, the convolution kernel size is 3*3, the convolution step length is 2, and the number of channels is adjustable; the third convolution layer has the convolution kernel size of 3*3, the convolution step length is 1, and the number of channels is adjustable;
the up-sampling convolution block D comprises two amplification modules and a convolution layer, wherein input data input into the up-sampling convolution block D respectively enter two branches, and output corresponding points of the two branches are added and connected into a ReLU activation function to serve as an output layer of the up-sampling convolution block D; one branch is a first amplification module and a convolution layer which are connected in sequence, and the other branch is a second amplification module;
wherein, the convolution layer of the up-sampling convolution block D is: the convolution kernel size is 3*3, the convolution step is 1, and the number of channels is adjustable;
the amplification module of the up-sampling convolution block D comprises four parallel convolution layers, the number of channels of the four convolution layers is set to be the same, and the sizes of convolution kernels are respectively as follows: 3, 2, 3 and 2*2, the convolution step length is 1, and the input data of the input amplification module passes through the four convolution layers and then is spliced together to be used as the output of the amplification module;
the information fusion module is a convolution layer with the convolution kernel size of 3*3, the channel number of 1 and the convolution step length of 1;
and carrying out deep learning training on the constructed multi-scale network model, and obtaining a densification processing result of the image to be processed through the trained multi-scale network model.
2. The method of claim 1, wherein the RGB image of the original image is downsampled using bilinear interpolation.
3. The method of claim 1, wherein the loss function used in the deep learning training of the multi-scale network model is

$$L = \frac{1}{N}\sum_{j=1}^{N} \operatorname{smooth}_{L1}\!\left(d_j - d_j^{g}\right),\qquad \operatorname{smooth}_{L1}(x)=\begin{cases}0.5\,x^{2}, & |x|<1,\\ |x|-0.5, & \text{otherwise},\end{cases}$$

where d_j denotes the depth value of pixel j output by the multi-scale network model, i.e., the estimated value, j is the pixel index, d_j^g denotes the standard depth value of pixel j, i.e., the label value of the training sample, and N denotes the total number of pixels in one sparse depth map.
CN201811531022.5A 2018-12-14 2018-12-14 Sparse depth densification method based on multi-scale network Active CN109685842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811531022.5A CN109685842B (en) 2018-12-14 2018-12-14 Sparse depth densification method based on multi-scale network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811531022.5A CN109685842B (en) 2018-12-14 2018-12-14 Sparse depth densification method based on multi-scale network

Publications (2)

Publication Number Publication Date
CN109685842A CN109685842A (en) 2019-04-26
CN109685842B (en) 2023-03-21

Family

ID=66187804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811531022.5A Active CN109685842B (en) 2018-12-14 2018-12-14 Sparse depth densification method based on multi-scale network

Country Status (1)

Country Link
CN (1) CN109685842B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490118A (en) * 2019-08-14 2019-11-22 厦门美图之家科技有限公司 Image processing method and device
CN110796105A (en) * 2019-11-04 2020-02-14 中国矿业大学 Remote sensing image semantic segmentation method based on multi-modal data fusion
CN113034562B (en) * 2019-12-09 2023-05-12 百度在线网络技术(北京)有限公司 Method and apparatus for optimizing depth information
CN111062981B (en) * 2019-12-13 2023-05-05 腾讯科技(深圳)有限公司 Image processing method, device and storage medium
CN111079683B (en) * 2019-12-24 2023-12-12 天津大学 Remote sensing image cloud and snow detection method based on convolutional neural network
CN111199516B (en) * 2019-12-30 2023-05-05 深圳大学 Image processing method, system and storage medium based on image generation network model
CN111179331B (en) * 2019-12-31 2023-09-08 智车优行科技(上海)有限公司 Depth estimation method, depth estimation device, electronic equipment and computer readable storage medium
CN110992271B (en) * 2020-03-04 2020-07-07 腾讯科技(深圳)有限公司 Image processing method, path planning method, device, equipment and storage medium
CN111667522A (en) * 2020-06-04 2020-09-15 上海眼控科技股份有限公司 Three-dimensional laser point cloud densification method and equipment
CN112001914B (en) * 2020-08-31 2024-03-01 三星(中国)半导体有限公司 Depth image complement method and device
CN112102472B (en) * 2020-09-01 2022-04-29 北京航空航天大学 Sparse three-dimensional point cloud densification method
CN112258626A (en) * 2020-09-18 2021-01-22 山东师范大学 Three-dimensional model generation method and system for generating dense point cloud based on image cascade
CN112837262B (en) * 2020-12-04 2023-04-07 国网宁夏电力有限公司检修公司 Method, medium and system for detecting opening and closing states of disconnecting link
CN112861729B (en) * 2021-02-08 2022-07-08 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
CN113256546A (en) * 2021-05-24 2021-08-13 浙江大学 Depth map completion method based on color map guidance
EP4156085A4 (en) * 2021-08-06 2023-04-26 Shenzhen Goodix Technology Co., Ltd. Depth image collection apparatus, depth image fusion method and terminal device
CN113344839B (en) * 2021-08-06 2022-01-07 深圳市汇顶科技股份有限公司 Depth image acquisition device, fusion method and terminal equipment
CN113807417B (en) * 2021-08-31 2023-05-30 中国人民解放军战略支援部队信息工程大学 Dense matching method and system based on deep learning visual field self-selection network
CN114627351B (en) * 2022-02-18 2023-05-16 电子科技大学 Fusion depth estimation method based on vision and millimeter wave radar
CN114494023B (en) * 2022-04-06 2022-07-29 电子科技大学 Video super-resolution implementation method based on motion compensation and sparse enhancement
CN116152066B (en) * 2023-02-14 2023-07-04 苏州赫芯科技有限公司 Point cloud detection method, system, equipment and medium for complete appearance of element
CN115861401B (en) * 2023-02-27 2023-06-09 之江实验室 Binocular and point cloud fusion depth recovery method, device and medium
CN115908531B (en) * 2023-03-09 2023-06-13 深圳市灵明光子科技有限公司 Vehicle-mounted ranging method and device, vehicle-mounted terminal and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108535675A (en) * 2018-04-08 2018-09-14 朱高杰 A kind of magnetic resonance multichannel method for reconstructing being in harmony certainly based on deep learning and data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9519972B2 (en) * 2013-03-13 2016-12-13 Kip Peli P1 Lp Systems and methods for synthesizing images from image data captured by an array camera using restricted depth of field depth maps in which depth estimation precision varies
US9412172B2 (en) * 2013-05-06 2016-08-09 Disney Enterprises, Inc. Sparse light field representation
CN106408015A (en) * 2016-09-13 2017-02-15 电子科技大学成都研究院 Road fork identification and depth estimation method based on convolutional neural network
CN107767413B (en) * 2017-09-20 2020-02-18 华南理工大学 Image depth estimation method based on convolutional neural network
CN107944459A (en) * 2017-12-09 2018-04-20 天津大学 A kind of RGB D object identification methods

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108535675A (en) * 2018-04-08 2018-09-14 朱高杰 A kind of magnetic resonance multichannel method for reconstructing being in harmony certainly based on deep learning and data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A survey of deep network model compression; Lei Jie et al.; Journal of Software (Issue 02); pp. 31-46 *

Also Published As

Publication number Publication date
CN109685842A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN109685842B (en) Sparse depth densification method based on multi-scale network
CN111563923B (en) Method for obtaining dense depth map and related device
JP6745328B2 (en) Method and apparatus for recovering point cloud data
CN110674829B (en) Three-dimensional target detection method based on graph convolution attention network
CN109598754B (en) Binocular depth estimation method based on depth convolution network
WO2018119889A1 (en) Three-dimensional scene positioning method and device
CN108801274B (en) Landmark map generation method integrating binocular vision and differential satellite positioning
CN111127538B (en) Multi-view image three-dimensional reconstruction method based on convolution cyclic coding-decoding structure
CN111160214A (en) 3D target detection method based on data fusion
CN114254696A (en) Visible light, infrared and radar fusion target detection method based on deep learning
DE112017003815T5 (en) IMAGE PROCESSING DEVICE AND IMAGE PROCESSING METHOD
AU2021103300A4 (en) Unsupervised Monocular Depth Estimation Method Based On Multi- Scale Unification
CN111325782A (en) Unsupervised monocular view depth estimation method based on multi-scale unification
WO2021056516A1 (en) Method and device for target detection, and movable platform
CN115035235A (en) Three-dimensional reconstruction method and device
CN112907573A (en) Depth completion method based on 3D convolution
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN115457354A (en) Fusion method, 3D target detection method, vehicle-mounted device and storage medium
CN116778288A (en) Multi-mode fusion target detection system and method
CN113592015B (en) Method and device for positioning and training feature matching network
CN117132737B (en) Three-dimensional building model construction method, system and equipment
CN113902802A (en) Visual positioning method and related device, electronic equipment and storage medium
CN115965961B (en) Local-global multi-mode fusion method, system, equipment and storage medium
CN109657556B (en) Method and system for classifying road and surrounding ground objects thereof
CN116630528A (en) Static scene reconstruction method based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant