CN109685842A - Sparse depth densification method based on a multi-scale network - Google Patents
Sparse depth densification method based on a multi-scale network
- Publication number: CN109685842A
- Application number: CN201811531022.5A
- Authority
- CN
- China
- Prior art keywords
- layer
- convolution
- depth
- branch
- convolution block
- Prior art date: 2018-12-14
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a sparse depth densification method based on a multi-scale network, belonging to the field of depth estimation in computer vision. The invention uses a multi-scale convolutional neural network to effectively fuse RGB image data with sparse point cloud data, finally obtaining a dense depth image. A sparse point cloud is mapped onto a two-dimensional plane to generate a sparse depth map, which is aligned with the RGB image; the sparse depth map and the RGB image are then concatenated into an RGBD image, which is input to the multi-scale convolutional neural network for training and testing, finally estimating a dense depth map. Estimating depth from the combination of an RGB image and a sparse point cloud allows the range information contained in the point cloud to guide the conversion of the RGB image into a depth map. The multi-scale network exploits the information of the raw data at different resolutions: on the one hand this enlarges the field of view, and on the other hand the input depth map at the smaller resolution is denser, so a higher accuracy can be obtained.
Description
Technical field
The invention belongs to the field of depth estimation in computer vision, and in particular relates to a sparse depth densification method based on a multi-scale convolutional neural network.
Background art
In autonomous driving, the perception system based on computer vision technology is the most fundamental component. Currently, the sensor most commonly used in autonomous-driving perception systems is the visible-light camera, which is inexpensive and relies on mature technology. However, visible-light cameras also have clear drawbacks. First, the RGB images captured by a camera carry only color information; if the target texture is complex, the perception system is prone to misjudgment. Second, visible-light cameras fail in certain environments, for example at night with insufficient illumination, where a camera can hardly work normally. Lidar is another sensor commonly used in autonomous-driving perception systems. Lidar is not easily affected by illumination conditions, and the point cloud data it acquires are three-dimensional; a depth image can be obtained directly from the point cloud data. A depth image is the image formed by mapping the point cloud onto a two-dimensional plane, where the value of each pixel represents the distance from that point to the sensor. Compared with RGB images, the range information contained in a depth image is more helpful for tasks such as object recognition and segmentation. However, lidar is expensive, the point clouds it acquires are too sparse, and the depth maps generated from them are likewise too sparse, which limits their usefulness to some degree.
Summary of the invention
The object of the invention is to address the above problems by providing a method that densifies sparse depth using a multi-scale network.
The sparse depth densification method based on a multi-scale network of the invention comprises the following steps:
Constructing a multi-scale network model:
The multi-scale network model comprises L (L >= 2) input branches; the outputs of the L branches are added pointwise and fed into an information fusion layer, and the information fusion layer is followed by an upsampling layer that serves as the output layer of the multi-scale network model.
Among the L input branches, one branch takes the original image as input, and the remaining L-1 branches take as input the downsampled images obtained from the original image under different downsampling factors; the output image of the output layer of the multi-scale network model has the same size as the original image.
The input data of the L input branches comprises an RGB image and a sparse depth map. The sparse depth map of the original image is downsampled as follows: given a preset downsampling factor K, the sparse depth map is divided into a grid of cells according to pixels, each cell containing K × K original input pixels. A mark value s_i is set for each original input pixel according to its depth value: if the depth value of the current original input pixel is 0, then s_i = 0; otherwise s_i = 1, where i indexes the K × K original input pixels contained in each cell. The depth value of each cell is then obtained according to the formula

p_new = (Σ_{i=1}^{K×K} p_i · s_i) / (Σ_{i=1}^{K×K} s_i),

where p_i denotes the depth value of original input pixel i.
The network structure of the branch whose input is the original image is the first network structure.
The network structure of a branch whose input is a downsampled image of the original image is the first network structure followed by K/2 appended 16-channel upsampling convolution blocks D, where K denotes the downsampling factor applied to the original image.
The first network structure comprises 14 layers, as follows:
The first layer consists of an input layer and a pooling layer; the convolution kernel size of the input layer is 7×7, the number of channels is 64, and the convolution stride is 2; the pooling layer uses max pooling with a 3×3 kernel and a stride of 2.
The second and third layers have the same structure; each is a 64-channel R1 residual convolution block.
The fourth layer is a 128-channel R2 residual convolution block.
The fifth layer is a 128-channel R1 residual convolution block.
The sixth layer is a 256-channel R2 residual convolution block.
The seventh layer is a 256-channel R1 residual convolution block.
The eighth layer is a 512-channel R2 residual convolution block.
The ninth layer is a 512-channel R1 residual convolution block.
The tenth layer is a convolutional layer with a 3×3 kernel, 256 channels, and a stride of 1.
The eleventh layer is a 128-channel upsampling convolution block D, and the output of the eleventh layer is concatenated channel-wise with the output of the seventh layer before being fed into the twelfth layer.
The twelfth layer is a 64-channel upsampling convolution block D, and the output of the twelfth layer is concatenated channel-wise with the output of the fifth layer before being fed into the thirteenth layer.
The thirteenth layer is a 32-channel upsampling convolution block D, and the output of the thirteenth layer is concatenated channel-wise with the output of the third layer before being fed into the fourteenth layer.
The fourteenth layer is a 16-channel upsampling convolution block D.
The R1 residual convolution block comprises two convolutional layers of identical structure, each with a 3×3 kernel and a stride of 1; the number of channels is adjustable. The input of the R1 block is added pointwise to the output of its second layer and passed through a ReLU activation function, which serves as the output layer of the R1 block.
The R2 residual convolution block comprises first, second, and third convolutional layers. The input of the R2 block enters two branches, and the outputs of the two branches are added pointwise and passed through a ReLU activation function, which serves as the output layer of the R2 block; one branch connects the first and second convolutional layers in sequence, and the other branch is the third convolutional layer.
The first convolutional layer has a 3×3 kernel and a stride of 2; the second convolutional layer has a 3×3 kernel and a stride of 1; the third convolutional layer has a 1×1 kernel and a stride of 2; the number of channels of each is adjustable.
The upsampling convolution block D comprises two amplification modules and one convolutional layer. The input of block D enters two branches, and the outputs of the two branches are added pointwise and passed through a ReLU activation function, which serves as the output layer of block D; one branch connects the first amplification module and the convolutional layer in sequence, and the other branch is the second amplification module.
The convolutional layer of the upsampling convolution block D has a 3×3 kernel, a stride of 1, and an adjustable number of channels.
The amplification module of the upsampling convolution block D comprises four parallel convolutional layers whose channel counts are identical, whose kernel sizes are 3×3, 3×2, 2×3, and 2×2, respectively, and whose stride is 1; the input of the amplification module passes through its four convolutional layers, and their outputs are stitched together to form the output of the amplification module.
The information fusion module is a convolutional layer with a 3×3 kernel, 1 channel, and a stride of 1.
Deep learning training is performed on the constructed multi-scale network model, and the trained multi-scale network model is then used to obtain the densified result of an image to be processed.
In conclusion by adopting the above-described technical solution, the beneficial effects of the present invention are: the present invention utilizes sparse cloud
The mode estimating depth combined with image, sparse depth instruct RGB image, and RGB image mends sparse depth
It fills, in conjunction with the advantages of two kinds of data modes, multiple scale network modelings in conjunction with set by the present invention carry out estimation of Depth, improve
The accuracy rate of estimation of Depth.
Brief description of the drawings
Fig. 1 is a schematic diagram of the downsampling in the specific embodiment.
Fig. 2 is a schematic diagram of the residual convolution blocks in the specific embodiment, where Fig. 2-a shows the type-one residual convolution block and Fig. 2-b shows the type-two residual convolution block.
Fig. 3 is a schematic diagram of the upsampling convolution block in the specific embodiment, where Fig. 3-a shows the amplification module and Fig. 3-b shows the complete upsampling convolution block.
Fig. 4 is a schematic diagram of the multi-scale network structure used in the specific embodiment.
Fig. 5 compares the results of the invention with those of an existing processing method in the specific embodiment, where Fig. 5-a is the input RGB image, Fig. 5-b is the sparse depth map, Fig. 5-c is the depth estimate of the existing method for Fig. 5-b, and Fig. 5-d is the depth estimation result of the invention for Fig. 5-b.
Specific embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings.
To meet the higher demands on depth-image quality in special scenarios (such as autonomous driving), the present invention proposes a method that densifies sparse depth using a multi-scale network. Existing depth estimation methods mainly obtain a dense depth map directly from an RGB image, but estimating a depth image directly from a two-dimensional image is inherently ambiguous. To solve this problem, the present invention estimates depth from the combination of a sparse point cloud and an image: the sparse depth guides the RGB image and the RGB image supplements the sparse depth, combining the advantages of the two data modalities, while depth estimation is performed at multiple scales, which improves the accuracy of depth estimation.
The present invention uses a multi-scale convolutional neural network to effectively fuse RGB image data with sparse point cloud data, finally obtaining a dense depth image. A sparse point cloud is mapped onto a two-dimensional plane to generate a sparse depth map, which is aligned with the RGB image; the sparse depth map and the RGB image are then concatenated into an RGBD (RGB + Depth Map) image, which is input to the multi-scale convolutional neural network for training and testing, finally estimating a dense depth map. Estimating depth from the combination of an RGB image and a sparse point cloud allows the range information contained in the point cloud to guide the conversion of the RGB image into a depth map. The multi-scale network exploits the information of the raw data at different resolutions: on the one hand this enlarges the field of view, and on the other hand the input depth map at the smaller resolution is denser, so a higher accuracy can be obtained.
The specific implementation of the multi-scale sparse depth densification method proposed by the present invention is as follows:
(1) Downsampling of the input data:
The feasible downsampling factor depends strongly on the size of the input data. For an input image of size M×N, the feasible range of the downsampling factor is [2, min(M, N) · 2^-5].
The sampling procedure is as follows. Let K denote the selected downsampling factor. The input sparse depth map is divided into a grid of cells according to pixels, each cell containing K×K original input pixels, so the input image is divided into (M/K) × (N/K) cells. Fig. 1 is a schematic diagram of downsampling with a factor of 2. The K×K pixels in a cell are expressed as the pixel set P = {p_1, p_2, …, p_{K×K}}.
A sparse depth map contains pixels whose depth value is zero; these values are called invalid values. A mark value s is constructed to flag the invalid values: if the depth value of a pixel is not equal to 0, the pixel is considered valid and s is set to 1; otherwise it is an invalid value and s is set to 0. The set of mark values corresponding to the pixel set P is thus S = {s_1, s_2, …, s_{K×K}}.
The new depth value after the above downsampling is

p_new = (Σ_{n=1}^{K×K} p_n · s_n) / (Σ_{n=1}^{K×K} s_n),

where p_n denotes the depth value of an original pixel and s_n denotes the mark value of that pixel.
The above operation is applied to every cell of the grid, yielding a depth map of smaller resolution and higher density (referred to as the small-resolution depth map). Compared with traditional downsampling methods, the small-resolution depth map obtained in this way is denser, and since the influence of the invalid values is eliminated, its depth values are also more accurate. The RGB image is downsampled with the traditional bilinear-interpolation method. The result is a small-resolution image and a small-resolution sparse depth map.
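To make the cell computation concrete, the following is a minimal NumPy sketch of the validity-masked downsampling described above. The function name `downsample_sparse_depth` is introduced here for illustration, and returning 0 for a cell that contains no valid pixel is an assumed convention; the text does not specify that case.

```python
import numpy as np

def downsample_sparse_depth(depth, K):
    """Downsample a sparse depth map by K, averaging only valid (non-zero)
    depths in each K x K cell, as in the formula for p_new above."""
    M, N = depth.shape
    assert M % K == 0 and N % K == 0, "image size must be divisible by K"
    cells = depth.reshape(M // K, K, N // K, K)   # one K x K cell per entry
    s = (cells != 0).astype(depth.dtype)          # mark values s_n
    num = (cells * s).sum(axis=(1, 3))            # sum of p_n * s_n per cell
    den = s.sum(axis=(1, 3))                      # sum of s_n (valid count)
    # Cells with no valid pixel yield 0 (assumed convention).
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Example: factor-2 downsampling of a 480 x 640 sparse depth map
rng = np.random.default_rng(0)
sparse = np.zeros((480, 640), dtype=np.float32)
idx = rng.choice(sparse.size, size=1000, replace=False)
sparse.ravel()[idx] = rng.uniform(0.5, 10.0, size=1000).astype(np.float32)
small = downsample_sparse_depth(sparse, 2)        # 240 x 320, denser
```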
(2) Constructing the residual convolution blocks:
The residual convolution block is an important component of the multi-scale network of the invention; it extracts features from the input data and comes in two types.
Type one: the residual convolution block R1 is built as follows. As shown in Fig. 2-a, the first layer of the block is a convolutional layer with a 3×3 kernel, n channels, and a stride of 1; the second layer has the same structure as the first. The input of the block is then added pointwise to the output of the second layer, and finally a ReLU activation function is applied. The structure of the residual convolution block is fixed, but the number of channels of its convolutional layers is variable; adjusting the channel count yields different residual convolution blocks, so a type-one residual convolution block is accordingly named an n-channel R1. The input and output of R1 have the same size; no downsampling is performed.
Type two: the residual convolution block R2 is built as follows. As shown in Fig. 2-b, the first layer of the block is a convolutional layer with a 3×3 kernel, n channels, and a stride of 2; the second layer is also a convolutional layer, with a 3×3 kernel, n channels, and a stride of 1. The input of the block additionally passes through a convolutional layer with a 1×1 kernel, n channels, and a stride of 2, whose output is added pointwise to the output of the second layer; finally a ReLU activation function is applied. Analogous to the naming of R1, a type-two residual convolution block is named an n-channel R2. The input of R2 is twice the size of its output; the purpose of this operation is to enlarge the receptive field of the convolution kernels and better extract global features.
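As a concrete illustration, here is a minimal PyTorch sketch of the two block types. The `padding=1` on the 3×3 convolutions (needed to preserve the stated sizes) and the ReLU between the two stacked convolutions are common-practice assumptions that the text does not state explicitly.

```python
import torch
import torch.nn as nn

class R1(nn.Module):
    """Type-one residual block: two 3x3 stride-1 convs plus identity skip;
    input and output sizes are equal."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # intermediate ReLU assumed
        return self.relu(x + out)                   # pointwise add, then ReLU

class R2(nn.Module):
    """Type-two residual block: 3x3 stride-2 conv then 3x3 stride-1 conv,
    with a 1x1 stride-2 conv on the shortcut; halves the spatial size."""
    def __init__(self, in_channels, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.shortcut = nn.Conv2d(in_channels, channels, 1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(self.shortcut(x) + out)
```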
(3) Constructing the upsampling convolution block:
The upsampling convolution block is another important part of the multi-scale network; its role is to enlarge its input, and each upsampling convolution block doubles the input size. It is built as follows. The basic module of the upsampling convolution block is the amplification module. As shown in Fig. 3-a, the amplification module consists of four parallel convolutional layers, all with n channels, whose kernel sizes are 3×3, 3×2, 2×3, and 2×2, respectively; the input passes through these four convolutional layers, their outputs are stitched together, and the resulting output is twice as large as the input. As shown in Fig. 3-b, the upsampling convolution block consists of two branches. The first layer of branch one is an n-channel amplification module followed by a ReLU activation function, and its second layer is a convolutional layer with a 3×3 kernel and n channels. Branch two has only one layer, an n-channel amplification module. The output of branch one is added pointwise to the output of branch two, and finally a ReLU activation function is applied. Analogous to the naming of R1 and R2, the upsampling convolution block is named an n-channel D.
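The text says only that the four outputs are stitched together; the sketch below realizes this as a 2×2 spatial interleaving with asymmetric zero-padding so that each convolution output matches the input size, in the spirit of the fast up-convolution of Laina et al. Both the interleaving order and the padding scheme are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class Amplify(nn.Module):
    """Amplification module: four parallel convs (3x3, 3x2, 2x3, 2x2)
    whose outputs are interleaved into a map of twice the input size."""
    def __init__(self, in_channels, channels):
        super().__init__()
        self.c33 = nn.Conv2d(in_channels, channels, (3, 3))
        self.c32 = nn.Conv2d(in_channels, channels, (3, 2))
        self.c23 = nn.Conv2d(in_channels, channels, (2, 3))
        self.c22 = nn.Conv2d(in_channels, channels, (2, 2))

    def forward(self, x):
        # Asymmetric zero-padding (left, right, top, bottom) keeps each
        # conv output at the input's H x W despite the different kernels.
        a = self.c33(F.pad(x, (1, 1, 1, 1)))
        b = self.c32(F.pad(x, (0, 1, 1, 1)))
        c = self.c23(F.pad(x, (1, 1, 0, 1)))
        d = self.c22(F.pad(x, (0, 1, 0, 1)))
        n, ch, h, w = a.shape
        out = a.new_zeros(n, ch, 2 * h, 2 * w)
        out[:, :, 0::2, 0::2] = a       # assumed 2x2 interleaving order
        out[:, :, 0::2, 1::2] = b
        out[:, :, 1::2, 0::2] = c
        out[:, :, 1::2, 1::2] = d
        return out

class D(nn.Module):
    """n-channel upsampling block D: two amplification branches, one with an
    extra 3x3 conv, added pointwise and passed through a final ReLU."""
    def __init__(self, in_channels, channels):
        super().__init__()
        self.amp1 = Amplify(in_channels, channels)
        self.conv = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.amp2 = Amplify(in_channels, channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(self.relu(self.amp1(x))) + self.amp2(x))
```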
(4) Constructing the multi-scale convolutional network:
The multi-scale network can be built with multiple scales, that is, with multiple branches. Like the downsampling factor, the number of branches that can be built is constrained by the input image size: for an image of size M×N, the upper limit on the number of branches is log2(min(M, N) · 2^-5) + 1. The construction method is illustrated with two branches. Two branches are established: the input of one branch is at the original resolution, and the input of the other branch is at 1/K of the original resolution, where K is the downsampling factor of the input image. Finally, the information of the two branches is fused.
The first branch, whose input is at the original resolution, is built as follows:
The first layer consists of an input layer and a pooling layer. The convolution kernel size of the input layer is 7×7, the number of channels is 64, and the convolution stride is 2; the pooling layer uses max pooling with a 3×3 kernel and a stride of 2. The original input has size M×N×4; after the first layer the size becomes (M/4) × (N/4) × 64, i.e., the spatial size becomes 1/4 of the original and the number of channels becomes 64.
The second layer is a 64-channel R1 residual convolution block, denoted R1-1.
The third layer has the same structure as the second layer and is denoted R1-2.
The fourth layer is a 128-channel R2 residual convolution block, denoted R2-1.
The fifth layer is a 128-channel R1 residual convolution block, denoted R1-3.
The sixth layer is a 256-channel R2 residual convolution block, denoted R2-2.
The seventh layer is a 256-channel R1 residual convolution block, denoted R1-4.
The eighth layer is a 512-channel R2 residual convolution block, denoted R2-3.
The ninth layer is a 512-channel R1 residual convolution block, denoted R1-5.
The tenth layer is a convolutional layer with a 3×3 kernel, 256 channels, and a stride of 1.
The eleventh layer is a 128-channel upsampling convolution block D, denoted D1.
The output of D1 is then concatenated channel-wise with the output of the seventh layer R1-4, where the output of R1-4 has size (M/16) × (N/16) × 256 and the output of D1 has size (M/16) × (N/16) × 128, so the concatenated size becomes (M/16) × (N/16) × 384. The purpose of the concatenation is to recover some of the raw information lost during convolution, making the result more accurate.
The twelfth layer is a 64-channel upsampling convolution block D, denoted D2; the output of D2 is concatenated channel-wise with the output of R1-3.
The thirteenth layer is a 32-channel upsampling convolution block D, denoted D3; the output of D3 is concatenated channel-wise with the output of R1-2.
The fourteenth layer is a 16-channel upsampling convolution block D, denoted D4.
This completes the construction of the branch whose input is at the original resolution; a code sketch of this branch is given below.
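Continuing the sketches above, a possible PyTorch assembly of this 14-layer branch, reusing the `R1`, `R2`, and `D` modules already defined. The input channel counts of `D2`, `D3`, and `D4` (384, 192, 96) follow from the channel-wise concatenations just described; the class name `Branch` is introduced here for illustration.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """First-branch backbone: RGBD input (4 channels) at M x N,
    16-channel output at M/2 x N/2."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, 64, 7, stride=2, padding=3),  # input layer
            nn.MaxPool2d(3, stride=2, padding=1),      # max pooling
        )
        self.r1_1, self.r1_2 = R1(64), R1(64)
        self.r2_1, self.r1_3 = R2(64, 128), R1(128)
        self.r2_2, self.r1_4 = R2(128, 256), R1(256)
        self.r2_3, self.r1_5 = R2(256, 512), R1(512)
        self.conv10 = nn.Conv2d(512, 256, 3, stride=1, padding=1)
        self.d1 = D(256, 128)   # concat with R1-4 output -> 384 channels
        self.d2 = D(384, 64)    # concat with R1-3 output -> 192 channels
        self.d3 = D(192, 32)    # concat with R1-2 output -> 96 channels
        self.d4 = D(96, 16)

    def forward(self, x):
        x = self.stem(x)                        # (M/4, N/4, 64)
        s3 = self.r1_2(self.r1_1(x))            # third-layer output (skip)
        s5 = self.r1_3(self.r2_1(s3))           # fifth-layer output (skip)
        s7 = self.r1_4(self.r2_2(s5))           # seventh-layer output (skip)
        x = self.conv10(self.r1_5(self.r2_3(s7)))
        x = torch.cat([self.d1(x), s7], dim=1)  # (M/16, N/16, 384)
        x = torch.cat([self.d2(x), s5], dim=1)  # (M/8,  N/8,  192)
        x = torch.cat([self.d3(x), s3], dim=1)  # (M/4,  N/4,  96)
        return self.d4(x)                       # (M/2,  N/2,  16)
```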
The second branch, whose input is at 1/K of the original resolution, is built as follows:
Its first fourteen layers are identical to those of the branch whose input is at the original resolution; after them, 16-channel upsampling convolution blocks D are appended in a number that depends on the input size of the branch. For a branch whose input is at 1/K of the original resolution (downsampling factor K), K/2 upsampling convolution blocks are appended. Fig. 4 shows the two-branch case, in which the input of the second branch is, as an example, at 1/2 of the original resolution (downsampling factor 2), so the number of upsampling convolution blocks D to be appended to the second branch is exactly 1. The case of further resolutions is analogous: if the input is at 1/4 of the original resolution, two 16-channel upsampling convolution blocks are appended, and so on.
After the branches have been built, the information of the two branches needs to be fused. The information fusion structure is as follows: the output of the first branch is added pointwise to the output of the second branch, and the sum serves as the input of the information fusion module. The network structure of the information fusion module is a convolutional layer with a 3×3 kernel and 1 channel; its output finally passes through bilinear upsampling to obtain the final result at the size of the original input.
Information fusion in the case of more than two branches proceeds in the same way: the outputs of all branches are added pointwise before being fed into the information fusion module. A sketch of the two-branch assembly follows.
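This is a minimal sketch of the two-branch model and its fusion stage, reusing `Branch` and `D` from above; `MultiScaleNet` is a name introduced here for illustration.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleNet(nn.Module):
    """Two-branch multi-scale model: one branch at the original resolution,
    one at 1/2 resolution with K/2 = 1 extra 16-channel D block appended."""
    def __init__(self):
        super().__init__()
        self.branch1 = Branch()
        self.branch2 = nn.Sequential(Branch(), D(16, 16))  # appended D block
        self.fuse = nn.Conv2d(16, 1, 3, stride=1, padding=1)

    def forward(self, full, half):
        y = self.branch1(full) + self.branch2(half)  # pointwise addition
        y = self.fuse(y)                             # information fusion
        return F.interpolate(y, scale_factor=2, mode='bilinear',
                             align_corners=False)    # back to M x N
```

Both branch outputs are 16-channel maps at half the original resolution, so they can be added directly; the final bilinear ×2 upsampling restores the original input size.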
(5) Setting of the loss function:
In this embodiment, the Smooth L1 loss function is used, i.e.

L = (1/N) Σ_j smoothL1(d_j - d_j^g), with smoothL1(x) = 0.5 x^2 if |x| < 1 and |x| - 0.5 otherwise,

where d denotes the depth value estimated by the convolutional neural network, d^g denotes the standard (ground-truth) depth value, and N denotes the total number of pixels in a depth map.
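A minimal PyTorch version of the Smooth L1 loss above; the threshold of 1 is the usual convention for this loss, assumed here.

```python
import torch

def smooth_l1_loss(d, d_gt):
    """Smooth L1 loss averaged over the N pixels of the depth map."""
    x = (d - d_gt).abs()
    return torch.where(x < 1, 0.5 * x ** 2, x - 0.5).mean()
```

With its default settings, `torch.nn.SmoothL1Loss()` computes the same quantity.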
(6) Training and testing of the model:
In this embodiment, the training data come from the public dataset NYU-Depth-v2. The dataset contains RGB images and dense depth maps of size 640×480. For the training process, 48,000 RGB images with their corresponding dense depth maps were selected as training data; for the test process, 654 RGB images with their corresponding dense depth maps were selected as test data. The input of the network is an RGB image and a sparse depth map; since the dataset contains no sparse depth maps, a sparse depth map is obtained by randomly sampling 1,000 points from each dense depth map, and it is combined with the RGB image into an RGBD image as input.
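A minimal NumPy sketch of this input preparation; `make_rgbd` is a helper name introduced here for illustration.

```python
import numpy as np

def make_rgbd(rgb, dense_depth, n_samples=1000, seed=None):
    """Simulate a sparse depth map by sampling n_samples points from a
    dense one, then stack it with the RGB image into an H x W x 4 input."""
    rng = np.random.default_rng(seed)
    sparse = np.zeros_like(dense_depth)
    idx = rng.choice(dense_depth.size, size=n_samples, replace=False)
    sparse.ravel()[idx] = dense_depth.ravel()[idx]   # keep sampled depths
    return np.concatenate([rgb, sparse[..., None]], axis=-1)

# Example with random stand-in data at the NYU-Depth-v2 size
rgb = np.random.default_rng(0).random((480, 640, 3), dtype=np.float32)
depth = np.random.default_rng(1).random((480, 640), dtype=np.float32) * 10
rgbd = make_rgbd(rgb, depth, seed=2)                 # 480 x 640 x 4
```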
For training, the RGBD image is downsampled to 320×240 and then center-cropped to obtain an RGBD image of size 304×228 (the original image fed to the multi-scale network model); this image serves as the input of the first branch. The image is then downsampled by a factor of two using the method described in step (1) to obtain an RGBD image of size 152×114, which serves as the input of the second branch. Eight images are trained at a time, so one pass over the full dataset takes 6,000 iterations; the entire dataset is trained 15 times, for 90,000 iterations in total. A varying learning rate is used during training: the initial learning rate is set to 0.01, and after every 5 passes over the dataset the learning rate is divided by 10, so the final learning rate is 0.0001. The parameters of the model are saved after training.
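A sketch of this training schedule, reusing `MultiScaleNet` and `smooth_l1_loss` from above. The SGD optimizer is an assumption (the text does not name one), and the stand-in batch uses a 32-divisible size so the sketch's skip concatenations align exactly; the embodiment's 304×228 crop would require size alignment (e.g., padding) at the skip connections, a detail the text does not cover.

```python
import torch

model = MultiScaleNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # optimizer assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

loader = [(torch.randn(8, 4, 256, 320),   # first-branch RGBD batch (stand-in)
           torch.randn(8, 4, 128, 160),   # second-branch input at 1/2 res
           torch.randn(8, 1, 256, 320))]  # dense ground-truth depth

for epoch in range(15):                   # 15 passes over the dataset
    for full, half, target in loader:     # 6,000 iterations per pass on NYU
        optimizer.zero_grad()
        loss = smooth_l1_loss(model(full, half), target)
        loss.backward()
        optimizer.step()
    scheduler.step()                      # lr / 10 every 5 epochs -> 0.0001

torch.save(model.state_dict(), "multiscale_depth.pt")  # save trained weights
```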
For testing, the model parameters are loaded, the data are processed in the same way as in the training process, and the processed data are input into the model, which outputs the final result. Fig. 5 shows some of the output results of the invention compared with an existing deep learning method. On the whole, the results of the invention are clearer, and a comparison of the regions inside the black boxes shows that the invention renders details better.
The above description is merely a specific embodiment of the invention. Any feature disclosed in this specification may, unless specifically stated otherwise, be replaced by an alternative feature that is equivalent or serves a similar purpose; and all of the disclosed features, or all of the steps of any method or process, may be combined in any way, except for mutually exclusive features and/or steps.
Claims (3)
1. A sparse depth densification method based on a multi-scale network, characterized in that it comprises the following steps:
constructing a multi-scale network model:
the multi-scale network model comprises L input branches; the outputs of the L branches are added pointwise and fed into an information fusion layer, and the information fusion layer is followed by an upsampling layer that serves as the output layer of the multi-scale network model;
wherein, among the L input branches, one branch takes the original image as input; the remaining L-1 branches take as input the downsampled images obtained from the original image under different downsampling factors; and the output image of the output layer of the multi-scale network model has the same size as the original image;
and the input data of the L input branches comprises an RGB image and a sparse depth map; wherein the sparse depth map of the original image is downsampled as follows: given a preset downsampling factor K, the sparse depth map is divided into a grid of cells according to pixels, each cell containing K × K original input pixels; a mark value s_i is set for each original input pixel according to its depth value: if the depth value of the current original input pixel is 0, then s_i = 0, otherwise s_i = 1, where i indexes the K × K original input pixels contained in each cell; and the depth value of each cell is obtained according to the formula p_new = (Σ_{i=1}^{K×K} p_i · s_i) / (Σ_{i=1}^{K×K} s_i), where p_i denotes the depth value of original input pixel i;
the network structure of the branch whose input is the original image is a first network structure;
the network structure of a branch whose input is a downsampled image of the original image is the first network structure followed by K/2 appended 16-channel upsampling convolution blocks D, where K denotes the downsampling factor applied to the original image;
the first network structure comprises 14 layers, as follows:
the first layer consists of an input layer and a pooling layer; the convolution kernel size of the input layer is 7×7, the number of channels is 64, and the convolution stride is 2; the pooling layer uses max pooling with a 3×3 kernel and a stride of 2;
the second and third layers have the same structure, each being a 64-channel R1 residual convolution block;
the fourth layer is a 128-channel R2 residual convolution block;
the fifth layer is a 128-channel R1 residual convolution block;
the sixth layer is a 256-channel R2 residual convolution block;
the seventh layer is a 256-channel R1 residual convolution block;
the eighth layer is a 512-channel R2 residual convolution block;
the ninth layer is a 512-channel R1 residual convolution block;
the tenth layer is a convolutional layer with a 3×3 kernel, 256 channels, and a stride of 1;
the eleventh layer is a 128-channel upsampling convolution block D, and the output of the eleventh layer is concatenated channel-wise with the output of the seventh layer before being fed into the twelfth layer;
the twelfth layer is a 64-channel upsampling convolution block D, and the output of the twelfth layer is concatenated channel-wise with the output of the fifth layer before being fed into the thirteenth layer;
the thirteenth layer is a 32-channel upsampling convolution block D, and the output of the thirteenth layer is concatenated channel-wise with the output of the third layer before being fed into the fourteenth layer;
the fourteenth layer is a 16-channel upsampling convolution block D;
the R1 residual convolution block comprises two convolutional layers of identical structure, each with a 3×3 kernel and a stride of 1, the number of channels being adjustable; the input of the R1 block is added pointwise to the output of its second layer and passed through a ReLU activation function, which serves as the output layer of the R1 block;
the R2 residual convolution block comprises first, second, and third convolutional layers; the input of the R2 block enters two branches, and the outputs of the two branches are added pointwise and passed through a ReLU activation function, which serves as the output layer of the R2 block; one branch connects the first and second convolutional layers in sequence, and the other branch is the third convolutional layer;
the first convolutional layer has a 3×3 kernel and a stride of 2; the second convolutional layer has a 3×3 kernel and a stride of 1; the third convolutional layer has a 1×1 kernel and a stride of 2; the number of channels of each is adjustable;
the upsampling convolution block D comprises two amplification modules and one convolutional layer, wherein the input of block D enters two branches, and the outputs of the two branches are added pointwise and passed through a ReLU activation function, which serves as the output layer of block D; one branch connects the first amplification module and the convolutional layer in sequence, and the other branch is the second amplification module;
wherein the convolutional layer of the upsampling convolution block D has a 3×3 kernel, a stride of 1, and an adjustable number of channels;
the amplification module of the upsampling convolution block D comprises four parallel convolutional layers whose channel counts are identical, whose kernel sizes are 3×3, 3×2, 2×3, and 2×2, respectively, and whose stride is 1; the input of the amplification module passes through its four convolutional layers, and their outputs are stitched together to form the output of the amplification module;
the information fusion module is a convolutional layer with a 3×3 kernel, 1 channel, and a stride of 1;
performing deep learning training on the constructed multi-scale network model, and obtaining the densified result of an image to be processed through the trained multi-scale network model.
2. The method according to claim 1, characterized in that the RGB image of the original image is downsampled using the bilinear-interpolation downsampling method.
3. The method according to claim 1, characterized in that, when deep learning training is performed on the multi-scale network model, the loss function used is L = (1/N) Σ_j smoothL1(d_j - d_j^g), with smoothL1(x) = 0.5 x^2 if |x| < 1 and |x| - 0.5 otherwise, where d_j denotes the depth value of each pixel output by the multi-scale network model, i.e., the estimated depth value, j indexes the pixels, d_j^g denotes the standard depth value of the pixel, i.e., the label value corresponding to the training sample, and N denotes the total number of pixels of a sparse depth map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811531022.5A CN109685842B (en) | 2018-12-14 | 2018-12-14 | Sparse depth densification method based on multi-scale network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109685842A true CN109685842A (en) | 2019-04-26 |
CN109685842B CN109685842B (en) | 2023-03-21 |
Family
ID=66187804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811531022.5A Active CN109685842B (en) | 2018-12-14 | 2018-12-14 | Sparse depth densification method based on multi-scale network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109685842B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140267243A1 (en) * | 2013-03-13 | 2014-09-18 | Pelican Imaging Corporation | Systems and Methods for Synthesizing Images from Image Data Captured by an Array Camera Using Restricted Depth of Field Depth Maps in which Depth Estimation Precision Varies |
US20140328535A1 (en) * | 2013-05-06 | 2014-11-06 | Disney Enterprises, Inc. | Sparse light field representation |
CN106408015A (en) * | 2016-09-13 | 2017-02-15 | 电子科技大学成都研究院 | Road fork identification and depth estimation method based on convolutional neural network |
CN107767413A (en) * | 2017-09-20 | 2018-03-06 | 华南理工大学 | A kind of image depth estimation method based on convolutional neural networks |
CN107944459A (en) * | 2017-12-09 | 2018-04-20 | 天津大学 | A kind of RGB D object identification methods |
CN108535675A (en) * | 2018-04-08 | 2018-09-14 | 朱高杰 | A kind of magnetic resonance multichannel method for reconstructing being in harmony certainly based on deep learning and data |
Non-Patent Citations (5)
Title |
---|
Chen, Zhao: "Estimating depth from RGB and sparse sensing", Computer Vision - ECCV 2018 |
Ma, Fangchang: "Sparse-to-dense: depth prediction from sparse depth samples and a single image", 2018 IEEE International Conference on Robotics and Automation (ICRA) |
Yang, Shuyuan: "Deep sparse tensor filtering network for synthetic aperture radar images classification", IEEE Transactions on Neural Networks and Learning Systems |
Zeng Xiangfeng: "Dynamic target detection and tracking with vehicle-mounted multi-sensor fusion", China Master's Theses Full-text Database, Engineering Science and Technology II |
Lei Jie et al.: "A survey of deep network model compression", Journal of Software |
Cited By (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490118A (en) * | 2019-08-14 | 2019-11-22 | 厦门美图之家科技有限公司 | Image processing method and device |
CN110796105A (en) * | 2019-11-04 | 2020-02-14 | 中国矿业大学 | Remote sensing image semantic segmentation method based on multi-modal data fusion |
CN113034562B (en) * | 2019-12-09 | 2023-05-12 | 百度在线网络技术(北京)有限公司 | Method and apparatus for optimizing depth information |
CN113034562A (en) * | 2019-12-09 | 2021-06-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus for optimizing depth information |
CN111062981A (en) * | 2019-12-13 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Image processing method, device and storage medium |
CN111062981B (en) * | 2019-12-13 | 2023-05-05 | 腾讯科技(深圳)有限公司 | Image processing method, device and storage medium |
CN111079683B (en) * | 2019-12-24 | 2023-12-12 | 天津大学 | Remote sensing image cloud and snow detection method based on convolutional neural network |
CN111079683A (en) * | 2019-12-24 | 2020-04-28 | 天津大学 | Remote sensing image cloud and snow detection method based on convolutional neural network |
CN111199516A (en) * | 2019-12-30 | 2020-05-26 | 深圳大学 | Image processing method, system and storage medium based on image generation network model |
CN111199516B (en) * | 2019-12-30 | 2023-05-05 | 深圳大学 | Image processing method, system and storage medium based on image generation network model |
CN111179331A (en) * | 2019-12-31 | 2020-05-19 | 智车优行科技(上海)有限公司 | Depth estimation method, depth estimation device, electronic equipment and computer-readable storage medium |
CN111179331B (en) * | 2019-12-31 | 2023-09-08 | 智车优行科技(上海)有限公司 | Depth estimation method, depth estimation device, electronic equipment and computer readable storage medium |
CN110992271B (en) * | 2020-03-04 | 2020-07-07 | 腾讯科技(深圳)有限公司 | Image processing method, path planning method, device, equipment and storage medium |
WO2021174904A1 (en) * | 2020-03-04 | 2021-09-10 | 腾讯科技(深圳)有限公司 | Image processing method, path planning method, apparatus, device, and storage medium |
CN110992271A (en) * | 2020-03-04 | 2020-04-10 | 腾讯科技(深圳)有限公司 | Image processing method, path planning method, device, equipment and storage medium |
CN113496138A (en) * | 2020-03-18 | 2021-10-12 | 广州极飞科技股份有限公司 | Dense point cloud data generation method and device, computer equipment and storage medium |
CN111667522A (en) * | 2020-06-04 | 2020-09-15 | 上海眼控科技股份有限公司 | Three-dimensional laser point cloud densification method and equipment |
CN111815766A (en) * | 2020-07-28 | 2020-10-23 | 复旦大学附属华山医院 | Processing method and system for reconstructing blood vessel three-dimensional model based on 2D-DSA image |
CN111815766B (en) * | 2020-07-28 | 2024-04-30 | 复影(上海)医疗科技有限公司 | Processing method and system for reconstructing three-dimensional model of blood vessel based on 2D-DSA image |
CN114078149A (en) * | 2020-08-21 | 2022-02-22 | 深圳市万普拉斯科技有限公司 | Image estimation method, electronic equipment and storage medium |
CN112001914A (en) * | 2020-08-31 | 2020-11-27 | 三星(中国)半导体有限公司 | Depth image completion method and device |
CN112001914B (en) * | 2020-08-31 | 2024-03-01 | 三星(中国)半导体有限公司 | Depth image complement method and device |
CN112102472B (en) * | 2020-09-01 | 2022-04-29 | 北京航空航天大学 | Sparse three-dimensional point cloud densification method |
CN112102472A (en) * | 2020-09-01 | 2020-12-18 | 北京航空航天大学 | Sparse three-dimensional point cloud densification method |
CN112132880A (en) * | 2020-09-02 | 2020-12-25 | 东南大学 | Real-time dense depth estimation method based on sparse measurement and monocular RGB (red, green and blue) image |
CN112132880B (en) * | 2020-09-02 | 2024-05-03 | 东南大学 | Real-time dense depth estimation method based on sparse measurement and monocular RGB image |
CN112258626A (en) * | 2020-09-18 | 2021-01-22 | 山东师范大学 | Three-dimensional model generation method and system for generating dense point cloud based on image cascade |
CN112837262A (en) * | 2020-12-04 | 2021-05-25 | 国网宁夏电力有限公司检修公司 | Method, medium and system for detecting opening and closing states of disconnecting link |
CN112837262B (en) * | 2020-12-04 | 2023-04-07 | 国网宁夏电力有限公司检修公司 | Method, medium and system for detecting opening and closing states of disconnecting link |
CN112861729B (en) * | 2021-02-08 | 2022-07-08 | 浙江大学 | Real-time depth completion method based on pseudo-depth map guidance |
CN112861729A (en) * | 2021-02-08 | 2021-05-28 | 浙江大学 | Real-time depth completion method based on pseudo-depth map guidance |
CN113256546A (en) * | 2021-05-24 | 2021-08-13 | 浙江大学 | Depth map completion method based on color map guidance |
CN113344839A (en) * | 2021-08-06 | 2021-09-03 | 深圳市汇顶科技股份有限公司 | Depth image acquisition device, fusion method and terminal equipment |
US11928802B2 (en) | 2021-08-06 | 2024-03-12 | Shenzhen GOODIX Technology Co., Ltd. | Apparatus for acquiring depth image, method for fusing depth images, and terminal device |
WO2023010559A1 (en) * | 2021-08-06 | 2023-02-09 | 深圳市汇顶科技股份有限公司 | Depth image collection apparatus, depth image fusion method and terminal device |
CN113344839B (en) * | 2021-08-06 | 2022-01-07 | 深圳市汇顶科技股份有限公司 | Depth image acquisition device, fusion method and terminal equipment |
EP4156085A4 (en) * | 2021-08-06 | 2023-04-26 | Shenzhen Goodix Technology Co., Ltd. | Depth image collection apparatus, depth image fusion method and terminal device |
CN113807417B (en) * | 2021-08-31 | 2023-05-30 | 中国人民解放军战略支援部队信息工程大学 | Dense matching method and system based on deep learning visual field self-selection network |
CN113807417A (en) * | 2021-08-31 | 2021-12-17 | 中国人民解放军战略支援部队信息工程大学 | Dense matching method and system based on deep learning view self-selection network |
CN114627351A (en) * | 2022-02-18 | 2022-06-14 | 电子科技大学 | Fusion depth estimation method based on vision and millimeter wave radar |
CN114494023A (en) * | 2022-04-06 | 2022-05-13 | 电子科技大学 | Video super-resolution implementation method based on motion compensation and sparse enhancement |
CN116152066A (en) * | 2023-02-14 | 2023-05-23 | 苏州赫芯科技有限公司 | Point cloud detection method, system, equipment and medium for complete appearance of element |
CN115861401A (en) * | 2023-02-27 | 2023-03-28 | 之江实验室 | Binocular and point cloud fusion depth recovery method, device and medium |
CN115908531A (en) * | 2023-03-09 | 2023-04-04 | 深圳市灵明光子科技有限公司 | Vehicle-mounted distance measuring method and device, vehicle-mounted terminal and readable storage medium |
CN116503460A (en) * | 2023-04-07 | 2023-07-28 | 北京鉴智科技有限公司 | Depth map acquisition method and device, electronic equipment and storage medium |
CN116503460B (en) * | 2023-04-07 | 2024-11-05 | 北京鉴智科技有限公司 | Depth map acquisition method and device, electronic equipment and storage medium |
CN117953029A (en) * | 2024-03-27 | 2024-04-30 | 北京科技大学 | General depth map completion method and device based on depth information propagation |
CN117953029B (en) * | 2024-03-27 | 2024-06-07 | 北京科技大学 | General depth map completion method and device based on depth information propagation |
Also Published As
Publication number | Publication date |
---|---|
CN109685842B (en) | 2023-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109685842A (en) | A kind of thick densification method of sparse depth based on multiple dimensioned network | |
CN107204010B (en) | A kind of monocular image depth estimation method and system | |
CN111625608B (en) | Method and system for generating electronic map according to remote sensing image based on GAN model | |
CN110009674B (en) | Monocular image depth of field real-time calculation method based on unsupervised depth learning | |
CN110246181B (en) | Anchor point-based attitude estimation model training method, attitude estimation method and system | |
Tian et al. | Depth estimation using a self-supervised network based on cross-layer feature fusion and the quadtree constraint | |
CN108460403A (en) | The object detection method and system of multi-scale feature fusion in a kind of image | |
CN109598754A (en) | A kind of binocular depth estimation method based on depth convolutional network | |
CN110310317A (en) | A method of the monocular vision scene depth estimation based on deep learning | |
CN110570522A (en) | Multi-view three-dimensional reconstruction method | |
CN111143489B (en) | Image-based positioning method and device, computer equipment and readable storage medium | |
CN110110578A (en) | A kind of indoor scene semanteme marking method | |
CN117351363A (en) | Remote sensing image building extraction method based on transducer | |
CN108629763A (en) | A kind of evaluation method of disparity map, device and terminal | |
CN112767467A (en) | Double-image depth estimation method based on self-supervision deep learning | |
CN109345581A (en) | Augmented reality method, apparatus and system based on more mesh cameras | |
CN116519106A (en) | Method, device, storage medium and equipment for determining weight of live pigs | |
CN117078753A (en) | Progressive feature distribution sampling 6D pose estimation method and system based on camera | |
CN107909565A (en) | Stereo-picture Comfort Evaluation method based on convolutional neural networks | |
CN114494611A (en) | Intelligent three-dimensional reconstruction method, device, equipment and medium based on nerve basis function | |
CN112116646B (en) | Depth estimation method for light field image based on depth convolution neural network | |
CN110060212A (en) | A kind of multispectral photometric stereo surface normal restoration methods based on deep learning | |
CN107680070A (en) | A kind of layering weight image interfusion method based on original image content | |
CN111898607A (en) | Point cloud semantic segmentation method for color difference guided convolution | |
Li et al. | Design of the 3D Digital Reconstruction System of an Urban Landscape Spatial Pattern Based on the Internet of Things |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |