CN108765333B - Depth map completion method based on a deep convolutional neural network

Info

Publication number
CN108765333B
CN108765333B
Authority
CN
China
Prior art keywords
depth
rgb
neural network
depth map
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810505428.XA
Other languages
Chinese (zh)
Other versions
CN108765333A (en)
Inventor
袁书聪
青春美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201810505428.XA
Publication of CN108765333A
Application granted
Publication of CN108765333B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a depth map completion method based on a deep convolutional neural network, which comprises the following steps: 1) extracting samples and labels from the depth pictures and RGB pictures in the training data, and cropping square picture blocks; 2) performing data augmentation, including rotation and distortion operations, on the square picture-block samples extracted from the training data; 3) training a deep convolutional neural network on the augmented training data; 4) preprocessing the depth map and RGB picture to be processed; 5) passing the preprocessed depth map and RGB picture through the trained neural network to complete the depth. The method makes full use of the structural information in the RGB picture and the mutual relationship between the left and right depths, and uses the powerful feature-extraction capability of the neural network to solve the problem of the low quality of depth maps acquired by devices, so that they can be better applied in industrial and everyday fields.

Description

Depth map completion method based on a deep convolutional neural network
Technical Field
The invention relates to the technical fields of autonomous driving and depth reconstruction, and in particular to a depth map completion method based on a deep convolutional neural network.
Background
With the development of science and technology, depth cameras are gradually entering people's lives. An ordinary camera captures visible light and images it on a planar picture, where the value of each pixel is the intensity of its red, green, and blue components; in a picture taken by a depth camera, the value of each pixel is the distance from the camera's imaging plane to that point.
The use of and demand for high-quality depth maps has increased in both industry and entertainment. In the industrial field, the depth map is a necessary input to unmanned-vehicle navigation systems: without it, the surrounding environment cannot be perceived. In robotics, the depth map provides positioning guidance for the operation of robots and mechanical arms. In the smart home, gesture-based human-computer interaction is gradually replacing the traditional key-press interaction. In gaming, motion-sensing games, virtual reality, and augmented reality all need the depth pictures acquired by a depth camera. One day, the depth camera may well be as standard as the visible-light camera.
The depth cameras currently on the market can be roughly divided into two types. One type is based on infrared light, such as Kinect, Kinect2, LeapMotion, and RealSense; these can be further subdivided into devices based on coded light and on TOF techniques. The other type is based on binocular matching, whose principle is similar to the binocular vision of human eyes: a depth map is computed from two visible-light pictures of the same scene. However, each method has serious drawbacks. Depth cameras based on infrared light are only practical in indoor environments; outdoors, excessive noise renders the device unusable, and even indoors noise remains a problem. With binocular cameras, occlusion makes the depth of some regions unobtainable.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a depth map completion method based on a deep convolutional neural network. The method combines the depth map with the RGB picture and, through the feature-extraction capability of deep learning, distinguishes continuously smooth regions from steep edge regions according to the RGB picture, thereby guiding the smoothing and completion of the depth map. It solves the problem of the low quality of depth maps acquired by devices, so that they can be better applied in industrial and everyday fields.
The technical solution provided by the invention is as follows: a depth map completion method based on a deep convolutional neural network, comprising the following steps:
1) extracting samples and labels from the depth pictures and RGB pictures in the training data, and cropping square picture blocks;
2) performing data augmentation, including rotation and distortion operations, on the square picture-block samples extracted from the training data;
3) training a deep convolutional neural network on the augmented training data;
4) preprocessing the depth map and RGB picture to be processed;
5) passing the preprocessed depth map and RGB picture through the trained neural network to complete the depth.
In step 1), to improve the training efficiency of the deep convolutional neural network, the known accurate training-label depth map with complete edges, the RGB map, and the depth map to be completed are cut into square picture blocks of fixed size; this processing does not affect the learning effect of the neural network.
In step 2), the RGB image, the depth map to be completed, and the training-label depth map are subjected to the same set of transformations, including rotation, scaling up or down, and flipping, which improves robustness and avoids overfitting.
In step 3), a neural network is constructed and trained. For a depth camera based on infrared light there is only one group of RGB and to-be-completed depth data, so the training input of the network comprises a group of RGB square picture blocks and a group of square picture blocks of the depth map to be completed, and the label is the completed square picture block; the input data passes through feature-extraction convolutions to obtain rich features, feature screening is performed by a multi-scale receptive-field residual network, and finally MSE is adopted as the cost function. For a depth camera with a binocular-matching structure, RGB and depth each have a left group and a right group; the input is the left and right RGB rectangular picture blocks and the rectangular picture blocks of the depth maps to be completed, and the label is the completed depth map of the left or right view; the input data passes through feature-extraction convolutional layers to obtain rich features, feature screening is performed by a multi-scale receptive-field residual network, and finally MSE is adopted as the cost function. The neural network is then trained by back-propagation. The multi-scale receptive-field residual network is a neural-network submodule: its input is a rectangular picture block, which is convolved with kernels of different sizes; the picture is edge-padded according to each kernel size so that the feature maps produced by the convolutions have the same scale; the feature matrices are then superposed or averaged per channel, and the input of the module is also cascaded directly to the output of the module.
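As a concrete illustration, the following is a minimal PyTorch sketch of such a multi-scale receptive-field residual submodule; the class name, channel count, and kernel sizes are illustrative assumptions rather than values fixed by the patent, and the channel-wise average could equally be replaced by channel-wise concatenation, as the text allows.

```python
import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    """Convolve the input with kernels of several sizes, pad each so the
    spatial size is unchanged, merge the responses per channel, and cascade
    the block input directly to its output."""
    def __init__(self, channels, kernel_sizes=(3, 5, 9, 11)):
        super().__init__()
        # One convolution per kernel size; padding k//2 keeps the feature
        # maps at the same spatial scale, as the description requires.
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2)
             for k in kernel_sizes])

    def forward(self, x):
        # Channel-wise average of the multi-scale responses (the patent also
        # allows channel-wise superposition instead of averaging).
        merged = torch.stack([b(x) for b in self.branches]).mean(dim=0)
        return merged + x  # residual: input cascaded directly to the output
```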
In step 4), the completed depth map is no longer needed when the network is used; the inputs are the depth map to be completed and the RGB map. Both are given the same preprocessing, with the pixel values of the pictures normalized to between 0 and 1.
In step 5), the preprocessed depth map to be completed and the RGB picture are propagated forward through the trained neural network to obtain the completed depth map.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention is the first to realize RGB-guided depth map completion based on deep learning, overcoming the blurred edges, information loss, and low precision of depth maps produced by traditional methods.
2. The invention uses a multi-scale receptive-field residual network, which makes the neural network easier to train.
3. The invention is the first to realize a deep-learning-based left-right depth map cross-checking function for binocular-matching depth cameras, combining left and right depth and RGB information. This overcomes the limitation of traditional methods, which are too simple, cannot exploit global information, and complete the depth using only the left and right depth maps.
4. The invention is the first to realize a deep-learning-based occlusion-completion function for binocular-matching depth cameras, solving the depth-map holes caused by viewpoint occlusion in binocular-matching cameras.
5. The invention is simple to use, fast, and widely applicable in industry, robotics, entertainment, and other fields.
Drawings
FIG. 1 is the deep network architecture diagram for the infrared depth camera of the present invention.
FIG. 2 is the architecture diagram of the multi-scale receptive-field residual network.
FIG. 3 is the deep network architecture diagram for the binocular-matching depth camera of the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The depth map completion method based on a deep convolutional neural network provided by this embodiment proceeds as follows:
1) acquiring input data and enhancing
Training data is obtained, and data sets such as SUN3D, Middlebury data sets and the like with depth maps are obtained on the web at the same time. In order to improve the universality of the deep learning network and prevent overfitting, data enhancement operation is carried out on the obtained data. And for pictures in the same group, the depth map to be completed, the RGB map and the completed depth map are included. The three pictures are subjected to the same random transformation, such as scaling, small-angle rotation, picture brightness enhancement and the like, and then RGB and depth of the pictures are normalized to be between 0 and 1.
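A minimal augmentation sketch of the above, assuming OpenCV and NumPy; the function name, parameter ranges, and max-based depth normalization are illustrative assumptions, not values taken from the patent.

```python
import random
import cv2
import numpy as np

def augment_group(rgb, depth_in, depth_gt, max_angle=5.0):
    """Apply the same random rotation/scaling to all three pictures in a
    group, jitter the RGB brightness, then normalize everything to [0, 1]."""
    h, w = rgb.shape[:2]
    angle = random.uniform(-max_angle, max_angle)   # small-angle rotation
    scale = random.uniform(0.9, 1.1)                # mild scaling
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    rgb = cv2.warpAffine(rgb, m, (w, h))
    depth_in = cv2.warpAffine(depth_in, m, (w, h))
    depth_gt = cv2.warpAffine(depth_gt, m, (w, h))
    # Brightness jitter applies to the RGB picture only.
    rgb = np.clip(rgb.astype(np.float32) * random.uniform(0.8, 1.2), 0, 255)
    # Normalize RGB and depth to [0, 1] (dividing depth by its max is one
    # possible normalization; the patent only states the target range).
    rgb = rgb / 255.0
    depth_in = depth_in.astype(np.float32) / max(float(depth_in.max()), 1e-6)
    depth_gt = depth_gt.astype(np.float32) / max(float(depth_gt.max()), 1e-6)
    return rgb, depth_in, depth_gt
```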
2) Converting the complete picture into square or rectangular picture blocks
For a depth camera based on infrared light, the complete picture is divided into groups of square picture blocks with identical center positions and identical sizes; the input of each group comprises an RGB square picture block and a square picture block of the depth map to be completed, and the output is the completed square picture block of the depth map. For a depth camera with a basic binocular-matching structure, the complete picture is divided into long strip-shaped rectangular picture blocks whose horizontal center positions are the same. Dividing the complete picture into square or rectangular picture blocks accelerates the training of the network.
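The block extraction could look like the following hypothetical sketch; the block size and stride are assumptions, since the patent fixes neither.

```python
def extract_square_blocks(img, size=64, stride=32):
    """Yield square blocks on a regular grid; applying this with the same
    size/stride to RGB, input depth, and label depth gives blocks whose
    center positions coincide across the three pictures."""
    h, w = img.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield img[y:y + size, x:x + size]

def extract_strips(img, height=64, stride=32):
    """Yield full-width horizontal strips for the binocular-matching case,
    so every strip shares the same horizontal center position."""
    h = img.shape[0]
    for y in range(0, h - height + 1, stride):
        yield img[y:y + height, :]
```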
3) Training of deep convolutional neural networks
3.1) For an infrared-light-based depth camera, the structure of the network is shown in FIG. 1. Each group of data is propagated forward through the network: the RGB square picture block and the square picture block of the depth map to be completed pass through their respective feature-extraction networks, where rich features are extracted by deep convolutional layers. The outputs of the RGB and depth feature-extraction networks then pass through the multi-scale receptive-field residual network, which screens the extracted features. The multi-scale receptive-field residual network is a neural-network submodule: its input, a rectangular picture block, is convolved with kernels of different sizes (such as 3×3, 5×5, 9×9, and 11×11); the picture edges are padded according to each kernel size so that the feature maps produced by the convolutions have the same scale; the feature matrices are then superposed or averaged per channel, and the module input is also cascaded directly to the module output, by channel-wise superposition or direct addition, as shown in FIG. 2. Completing the depth map requires detail information at small scales as well as structural information at large scales, so multi-scale receptive fields perform better, while the residual training scheme makes training more accurate. The screened features are then fused, through a fully connected layer or by superposing the two channels, and further convolutional layers produce the network output, which is compared with the ground-truth depth-map square picture block to compute the MSE error. The network parameters are then adjusted with the back-propagation algorithm using a stochastic gradient descent optimizer.
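Building on the MultiScaleResidualBlock sketched earlier, a minimal PyTorch rendering of this FIG. 1 pipeline might look as follows; the layer widths and depths, and the choice of channel-superposition fusion over a fully connected layer, are assumptions the patent leaves open.

```python
import torch
import torch.nn as nn
# Uses MultiScaleResidualBlock from the sketch above.

class InfraredCompletionNet(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        # Separate feature-extraction branches for RGB and the input depth.
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Multi-scale receptive-field residual screening of each branch.
        self.rgb_screen = MultiScaleResidualBlock(feat)
        self.depth_screen = MultiScaleResidualBlock(feat)
        # Fuse the two branches by channel superposition, then predict depth.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 1, 3, padding=1))

    def forward(self, rgb, depth):
        f_rgb = self.rgb_screen(self.rgb_branch(rgb))
        f_depth = self.depth_screen(self.depth_branch(depth))
        return self.fuse(torch.cat([f_rgb, f_depth], dim=1))
```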
3.2) For binocular-matching depth cameras, the network structure is shown in FIG. 3. Each group of data is first propagated forward through the network: the rectangular picture blocks of the left-view RGB image and depth map pass through their respective feature-extraction convolutional networks to obtain feature matrices, and then through their respective multi-scale receptive-field residual networks for feature screening, yielding feature matrix 1 and feature matrix 2; the rectangular picture blocks of the right-view RGB image and depth map likewise yield feature matrix 3 and feature matrix 4. Feature matrices 1-4 are then passed through a fully connected layer to obtain the network output, which is compared with the rectangular picture block at the corresponding position of the ground-truth left-view depth map to compute the MSE error. The network parameters are then adjusted with the back-propagation algorithm using a stochastic gradient descent optimizer.
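A compact sketch of this FIG. 3 pipeline, again reusing the MultiScaleResidualBlock from above. The strip dimensions and channel counts are assumptions, and a practical system would downsample the feature matrices before the fully connected layer; the flat dimension here is only workable for small strips.

```python
import torch
import torch.nn as nn
# Uses MultiScaleResidualBlock from the sketch above.

class BinocularCompletionNet(nn.Module):
    def __init__(self, feat=8, strip_h=32, strip_w=128):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
                MultiScaleResidualBlock(feat))
        self.left_rgb, self.left_depth = branch(3), branch(1)
        self.right_rgb, self.right_depth = branch(3), branch(1)
        # Fully connected fusion of the four flattened feature matrices.
        self.head = nn.Linear(4 * feat * strip_h * strip_w, strip_h * strip_w)
        self.out_shape = (strip_h, strip_w)

    def forward(self, lrgb, ldepth, rrgb, rdepth):
        feats = [self.left_rgb(lrgb), self.left_depth(ldepth),
                 self.right_rgb(rrgb), self.right_depth(rdepth)]  # matrices 1-4
        flat = torch.cat([f.flatten(1) for f in feats], dim=1)
        h, w = self.out_shape
        return self.head(flat).view(-1, 1, h, w)  # completed left-view strip
```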
3.3) One forward propagation and one backward propagation over all the data constitutes an epoch, and the training process of the network can be accelerated with a GPU. In each epoch, N groups of data are first randomly drawn from the data set, and the network parameters are then optimized with the forward- and back-propagation procedures of steps 3.1) and 3.2). This continues until the error between the network output and the label no longer decreases significantly, or a specified number of iterations is reached.
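An assumed per-epoch training loop for the infrared variant might read as follows; the hyper-parameters, the stopping test, and the data layout (a list of tensor triples) are illustrative, not prescribed by the patent.

```python
import random
import torch
import torch.nn as nn

def train(net, dataset, epochs=100, n_groups=32, lr=1e-3, device="cuda"):
    """dataset: list of (rgb, depth_in, depth_gt) tensor triples."""
    net = net.to(device)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    mse = nn.MSELoss()
    for epoch in range(epochs):
        batch = random.sample(dataset, n_groups)   # N random groups per epoch
        total = 0.0
        for rgb, depth_in, depth_gt in batch:
            rgb, depth_in, depth_gt = (
                t.to(device) for t in (rgb, depth_in, depth_gt))
            pred = net(rgb, depth_in)              # forward propagation
            loss = mse(pred, depth_gt)             # MSE against the label block
            opt.zero_grad()
            loss.backward()                        # back-propagation
            opt.step()                             # stochastic gradient descent
            total += loss.item()
        # Stop early once the loss no longer decreases appreciably (omitted).
```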
4) Preprocessing of the depth map to be completed
When the network is used, the completed depth map is no longer needed; the inputs are the depth map to be completed and the RGB map. Both are given the same preprocessing, with the pixel values of the pictures normalized to between 0 and 1.
5) Completion by the neural network
The preprocessed depth map to be completed and the RGB picture are propagated forward through the trained neural network to obtain the completed depth map.
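A hypothetical end-to-end inference sketch, combining the normalization of step 4) with the forward pass of step 5); the tensor layouts and the max-based depth normalization are assumptions.

```python
import torch

def complete_depth(net, rgb, depth, device="cuda"):
    """rgb: HxWx3 uint8 array; depth: HxW array. Returns the completed map."""
    rgb = torch.as_tensor(rgb, dtype=torch.float32).permute(2, 0, 1) / 255.0
    depth = torch.as_tensor(depth, dtype=torch.float32).unsqueeze(0)
    depth = depth / max(float(depth.max()), 1e-6)   # normalize to [0, 1]
    with torch.no_grad():                           # single forward pass
        out = net(rgb.unsqueeze(0).to(device), depth.unsqueeze(0).to(device))
    return out.squeeze().cpu().numpy()
```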
The above-described embodiments are merely preferred embodiments of the present invention, and the scope of the invention is not limited thereto; changes made according to the form and principle of the present invention shall all fall within the protection scope of the invention.

Claims (2)

1. A depth map completion method based on a deep convolutional neural network, characterized by comprising the following steps:
1) extracting samples and labels from the depth maps and RGB maps in the training data, and cropping square picture blocks;
2) performing data augmentation, including rotation and distortion operations, on the square picture-block samples extracted from the training data;
3) training a deep convolutional neural network on the augmented training data;
constructing and training a neural network, wherein for a depth camera based on infrared light there is only one group of RGB images and depth maps to be completed, so the training input of the network comprises a group of square picture blocks of the RGB image and a group of square picture blocks of the depth map to be completed, the label is the completed square picture block, the input data passes through feature-extraction convolutions to obtain rich features, feature screening is then performed by a multi-scale receptive-field residual network, and finally MSE is adopted as the cost function; for a depth camera with a binocular-matching structure, the RGB images and depth maps each have a left group and a right group, the input is the rectangular picture blocks of the RGB images and the rectangular picture blocks of the depth maps to be completed, the label is the completed depth map of the left or right view, the input data passes through feature-extraction convolutional layers to obtain rich features, feature screening is performed by a multi-scale receptive-field residual network, and finally MSE is adopted as the cost function; the neural network is then trained by back-propagation; the multi-scale receptive-field residual network is a neural-network submodule, the input of the module is a rectangular picture block, which is convolved with kernels of different sizes, the picture is edge-padded according to each kernel size so that the feature scales obtained by the convolutions are consistent, the feature matrices are then superposed or averaged per channel, and the input of the module is also cascaded directly to the output of the module;
4) preprocessing the depth map and RGB map to be processed;
5) completing the depth by passing the preprocessed depth map and RGB map through the trained neural network.
2. The depth map completion method based on a deep convolutional neural network according to claim 1, characterized in that: in step 4), the completed depth map is no longer needed when the network is used; the inputs are the depth map to be completed and the RGB map, both are given the same preprocessing, and the pixel values of the pictures are normalized to between 0 and 1.
CN201810505428.XA 2018-05-24 2018-05-24 Depth map completion method based on a deep convolutional neural network Active CN108765333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810505428.XA CN108765333B (en) 2018-05-24 2018-05-24 Depth map completion method based on a deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810505428.XA CN108765333B (en) 2018-05-24 2018-05-24 Depth map completion method based on a deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN108765333A CN108765333A (en) 2018-11-06
CN108765333B true CN108765333B (en) 2021-08-10

Family

ID=64005505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810505428.XA Active CN108765333B (en) 2018-05-24 2018-05-24 Depth map completion method based on a deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN108765333B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109493308B (en) * 2018-11-14 2021-10-26 吉林大学 Medical image synthesis and classification method for generating confrontation network based on condition multi-discrimination
CN109658352B (en) * 2018-12-14 2021-09-14 深圳市商汤科技有限公司 Image information optimization method and device, electronic equipment and storage medium
CN109829863B (en) * 2019-01-22 2021-06-25 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN109993169A (en) * 2019-04-11 2019-07-09 山东浪潮云信息技术有限公司 One kind is based on character type method for recognizing verification code end to end
CN113111909B (en) * 2021-03-04 2024-03-12 西北工业大学 Self-learning method for SAR target recognition with incomplete training target visual angle
CN115457101B (en) * 2022-11-10 2023-03-24 武汉图科智能科技有限公司 Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825484A (en) * 2016-03-23 2016-08-03 华南理工大学 Depth image denoising and enhancing method based on deep learning
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107808131A (en) * 2017-10-23 2018-03-16 华南理工大学 Dynamic gesture identification method based on binary channel depth convolutional neural networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825484A (en) * 2016-03-23 2016-08-03 华南理工大学 Depth image denoising and enhancing method based on deep learning
CN107204010A (en) * 2017-04-28 2017-09-26 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107808131A (en) * 2017-10-23 2018-03-16 华南理工大学 Dynamic gesture identification method based on binary channel depth convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Deep Joint Image Filtering; Yijun Li; IEEE; 2016-07-26; pp. 1-14 *

Also Published As

Publication number Publication date
CN108765333A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108765333B (en) Depth map completion method based on a deep convolutional neural network
CN109508681B (en) Method and device for generating human body key point detection model
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
WO2019223382A1 (en) Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN107274445B (en) Image depth estimation method and system
CN110570371A (en) image defogging method based on multi-scale residual error learning
CN109003297B (en) Monocular depth estimation method, device, terminal and storage medium
CN109509156B (en) Image defogging processing method based on generation countermeasure model
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN110060230B (en) Three-dimensional scene analysis method, device, medium and equipment
CN112907573B (en) Depth completion method based on 3D convolution
CN111768452A (en) Non-contact automatic mapping method based on deep learning
US11361534B2 (en) Method for glass detection in real scenes
CN113822951A (en) Image processing method, image processing device, electronic equipment and storage medium
CN108259764A (en) Video camera, image processing method and device applied to video camera
WO2023065665A1 (en) Image processing method and apparatus, device, storage medium and computer program product
CN117197388A (en) Live-action three-dimensional virtual reality scene construction method and system based on generation of antagonistic neural network and oblique photography
CN113076953A (en) Black car detection method, system, device and storage medium
CN114792354B (en) Model processing method and device, storage medium and electronic equipment
CN115294453A (en) Saliency detection method based on RGB-D multichannel information fusion
CN115496788A (en) Deep completion method using airspace propagation post-processing module
CN114648604A (en) Image rendering method, electronic device, storage medium and program product
CN115482285A (en) Image alignment method, device, equipment and storage medium
CN108268533A (en) A kind of Image Feature Matching method for image retrieval
TWI814503B (en) Method for training depth identification model, identifying depth of image and related devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant