CN110349087B - RGB-D image high-quality grid generation method based on adaptive convolution


Info

Publication number
CN110349087B
CN110349087B
Authority
CN
China
Prior art keywords
network
convolution
image
resolution
adaptive
Prior art date
Legal status
Active
Application number
CN201910609314.4A
Other languages
Chinese (zh)
Other versions
CN110349087A (en)
Inventor
张东九 (Zhang Dongjiu)
冼楚华 (Xian Chuhua)
杨煜 (Yang Yu)
钱锟 (Qian Kun)
李桂清 (Li Guiqing)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910609314.4A priority Critical patent/CN110349087B/en
Publication of CN110349087A publication Critical patent/CN110349087A/en
Application granted granted Critical
Publication of CN110349087B publication Critical patent/CN110349087B/en

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T3/00 Geometric image transformations in the plane of the image
            • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
              • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
          • G06T5/00 Image enhancement or restoration
            • G06T5/20 Image enhancement or restoration using local operators
            • G06T5/30 Erosion or dilatation, e.g. thinning
            • G06T5/70 Denoising; Smoothing
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/10 Image acquisition modality
              • G06T2207/10024 Color image
              • G06T2207/10028 Range image; Depth image; 3D point clouds
            • G06T2207/20 Special algorithmic details
              • G06T2207/20081 Training; Learning
              • G06T2207/20084 Artificial neural networks [ANN]


Abstract

The invention discloses a method for generating a high-quality mesh from an RGB-D image based on adaptive convolution, comprising the following steps: 1) constructing a training data set; 2) data augmentation and normalization; 3) constructing an adaptive convolution layer; 4) constructing and training a depth-image completion network and a super-resolution network; 5) feeding the test data through the two trained networks in sequence, outputting the repaired high-resolution image, and further converting it into a high-quality mesh. The data set constructed by the method addresses the current lack of a high-quality, large-scale data set in the field of depth-image completion. The encoder-decoder structure with cross-layer connections effectively fuses low-level and high-level features in the data while avoiding parameter redundancy, and the adaptive convolution structure effectively addresses the difficulty current methods have in generating complete, high-quality depth images. The invention can cope with the low precision and large missing regions of current Kinect data.

Description

RGB-D image high-quality grid generation method based on adaptive convolution
Technical Field
The invention relates to the technical field of high-quality three-dimensional mesh generation, and in particular to a method for generating a high-quality mesh from an RGB-D image based on adaptive convolution.
Background
With the wide application of depth sensors in fields such as autonomous driving, augmented reality, indoor navigation, secure payment and scene reconstruction, the demand for high-precision depth information and high-quality three-dimensional reconstruction results has become increasingly important. Although depth-sensing technology has recently made great progress, on the one hand, commercial-grade RGB-D cameras such as the Microsoft Kinect, Intel RealSense and Google Tango devices still suffer from missing depth data: when the captured surface is too smooth, too specular, too fine, or too close to or too far from the camera, holes often appear in the collected depth image. These situations are frequently encountered in large rooms, around bar-shaped objects, and in scenes with intense light. Even in home settings, depth images typically lack more than 50% of their pixels. On the other hand, limited by the low resolution of the depth camera, the point cloud reconstructed from the sensor data is too sparse. The raw data from such depth-sensor scans is therefore poorly suited to the three-dimensional reconstruction applications described above.
Fast generation of high-quality mesh data has two key parts: first, data completion, i.e., recovering the depth data missing due to various adverse factors; then, data super-resolution, i.e., generating high-resolution point-cloud data from the low-resolution but complete data of the previous step. Finally, mesh data is generated from the point-cloud data.
Many indoor RGB-D data completion and super-resolution methods based on conventional techniques are unsatisfactory in effect. Recently, a few deep-learning-based methods have shown some effect, but they have the following main disadvantages: 1) non-end-to-end learning prevents the methods from running in real time; 2) the large receptive field of convolution destroys edge information.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an RGB-D image high-quality grid generation method based on adaptive convolution.
To achieve this purpose, the technical solution provided by the invention is as follows. The method for generating a high-quality mesh from an RGB-D image based on adaptive convolution comprises the following steps:
1) constructing a training data set;
2) data augmentation and normalization;
3) constructing an adaptive convolution layer;
4) constructing and training a depth-image completion network and a super-resolution network;
5) feeding the test data through the two trained networks in sequence, outputting the repaired high-resolution image, and further converting it into a high-quality mesh.
In step 1), the base data includes the RGBD data sets NYU-DATASET and RGBD-SCENE-DATASET, both of which comprise indoor-scene color images I_RGB captured with a Kinect v1 and the corresponding depth images I_Dinc with missing regions; the incomplete depth images are repaired with a method based on the Poisson equation and sparse matrices, obtaining complete depth images I_Dc for training.
In step 2), data augmentation includes horizontal flipping and dilation of the missing regions of the depth image, so as to obtain training data with different missing ratios; data normalization scales all pixel values of the color image to between 0 and 1, and processes the depth image as follows:

$$I_D^{norm} = \frac{I_D - d_{min}}{d_{max} - d_{min}}$$

where d_min and d_max represent the minimum and maximum pixel values, respectively, of the depth image before normalization.
In step 3), an adaptive convolution layer is constructed; the adaptive convolution operates as follows:

$$x_i' = \omega^T (X_i \odot M_i)\,\Psi(M_i) + b$$

$$\Psi(M_i) = \frac{|M_i|}{\sum_{m_j \in M_i} m_j}$$

where x_i is a point in the tensor, X_i and M_i are the feature window centered at x_i and its corresponding mask, x_j is a neighborhood point of x_i, m_j is the mask value corresponding to x_j, ω denotes the convolution kernel weights, b is a bias, ⊙ denotes element-wise multiplication, and Ψ(M_i) is a weight regularization term;
the adaptive convolution layer assigns different weights to different regions of the image, so that the deep network can better learn the effective features in the image; the mask m_j is computed differently for the semantic filling network Net_fill, the edge-enhancement network Net_refine and the super-resolution network Net_sr, specifically as follows:
For the semantic filling network Net_fill:

$$m_j = \begin{cases} 1, & x_j \neq 0 \\ 0, & x_j = 0 \end{cases}$$

where x_j is judged valid if its pixel value in the current feature map is nonzero;
For the edge-enhancement network Net_refine:

$$m_j = \begin{cases} 1, & |x_j - x_c| < 5 \\ 0, & \text{otherwise} \end{cases}$$

where x_j is judged valid if, in the current RGB image, its difference from the pixel x_c at the center of the corresponding sliding window is less than 5 pixel values;
For the super-resolution network Net_sr, all neighborhood points are valid:

$$m_j = 1$$
In step 4), a depth-image completion network for the completion task and a super-resolution network for the super-resolution task are respectively constructed and trained, specifically as follows:
a. Constructing the depth-image completion network
The completion network adopts a multi-scale encoder-decoder architecture, consisting of the semantic filling network Net_fill and the edge-enhancement network Net_refine connected in sequence, where all standard convolutions except the last layer are replaced with adaptive convolutions;
The semantic filling network Net_fill takes the incomplete depth image I_Dinc as input, a tensor of shape H × W × 1, where H is the image height and W is the image width; Net_fill performs semantic completion to obtain a complete depth image I_Dout, and I_Dout and the color image I_RGB are then fed together into the edge-enhancement network Net_refine for refinement, finally yielding the repair result I_repair, whose output tensor has the same size as the input image; the network loss function is composed of the loss on the missing region and the loss on the non-missing region, with a weight ratio of 10:1;
semantic filling network NetfillThe method comprises the following steps that a U-shaped neural network (U-Net) is adopted as a basic structure, the network comprises an encoder and a decoder, the encoder is used for encoding image information and converting a feature space, the decoder is used for decoding high-order information, and the two parts adopt a 5-layer convolutional neural network architecture;
the encoder adopts a five-layer structure, each layer respectively comprises two operations of adaptive convolution and batch regularization, a leak-relu is used as an activation function, the sizes of convolution kernels are respectively 7 × 7,5 × 5,3 × 3 and 3 × 3, the convolution step length is all 2, the height and width of each layer of features are reduced to half of the original height and width, and 0 complementing processing is carried out on the boundary of an input image; the number of convolution kernels is 16,32,64,128 and 128 respectively; all the missing areas are finally repaired by continuously extracting features in different sizes to fill the missing areas;
the decoder is 5 layers of structures equally, and every layer contains four operations of upsampling, feature splicing, adaptive convolution and batch regularization, adopts leak relu as the activation function, carries out cross-layer connection between encoder and the decoder, and the output of every encoder all copies the concatenation with the same size's after the decoder upsampling signature graph output promptly to as the input of decoder, specifically be: after the input of the previous layer is sampled and spliced with the corresponding encoder features with the same size, the current adaptive convolution is input for feature learning, the sizes of convolution kernels are all 3 x 3, the step lengths are all 1 x 1, and the number of the convolution kernels is 128,64,32,16 and 1 respectively;
the last layer of the network is a convolution layer with the convolution kernel size of 1 x 1 and is used for channel transformation and numerical value interval mapping of features;
b. Constructing the super-resolution network
The super-resolution task adopts a method that fuses global and local features, and uses the Manhattan distance as the loss function for optimization;
The super-resolution network Net_sr adopts dense blocks (dense connection blocks) as its basic structure, upsamples via sub-pixel convolution, replaces all standard convolutions with adaptive convolutions, and uses a 1×1 convolution in the last layer for channel adjustment; the network extracts features with five dense blocks; each dense block applies adaptive convolution twice, with kernel size 3×3, stride 1, and zero padding around the input so that the input and output feature sizes match; the number of kernels is 64; the input and output of each dense block are connected across layers, i.e., concatenated along the feature dimension and then used as the input of the next dense block; by continually fusing features from low to high dimensions, the network learns richer information; the upsampling factor of the sub-pixel convolution is 4, and at the end of the network a standard convolution with kernel size 1×1 uses ReLU as the activation function;
c. Training the constructed networks
Corresponding loss functions are designed for the constructed networks and optimized with the Adam method, finally yielding the trained networks.
In step 5), the high-resolution point-cloud data repaired by the neural networks is converted into high-quality indoor-scene mesh data using the Ball Pivoting method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. For the current lack of a public high-quality RGB-D data set, a method for constructing a paired high/low-quality indoor-scene RGB-D image data set is provided.
2. For the incomplete depth maps captured by consumer-grade or mobile-device depth cameras, a depth-map repair algorithm combining RGB color-image features with fused convolution operations is provided.
3. For low-quality, noisy depth images, a method for denoising and feature enhancement using RGB color-image semantic information is provided.
4. A method for super-resolution reconstruction of depth images by a point-cloud-based convolutional network, guided by RGB color-image semantic features, is provided.
Drawings
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is an architecture diagram of a semantic filling network.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in FIG. 1, the method for generating a high-quality mesh from an RGB-D image based on adaptive convolution according to this embodiment comprises the following steps:
step 1, constructing a training data set
1.1 Constructing the indoor-scene completion data set Database_complete
The New York University open data set contains over one hundred thousand sets of indoor-scene RGBD images, collected from https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html; from these, N_RGBD image sets (9000 ≤ N_RGBD ≤ 10000) at 640 × 480 resolution are used, where each RGBD image set comprises a color image I_RGB and an incomplete depth image I_Dinc. The incomplete depth images are repaired with Anat Levin's colorization method to obtain complete depth images I_Dc. First, the depth images and RGB images are aligned and denoised, and the RGBD data is then cropped around the borders, yielding images of 557 × 423 resolution. Subsequently, masks are extracted from all data in the data set to construct the mask image set Mask_Depth:

$$M_D(p) = \begin{cases} 0, & I_{Dinc}(p) \text{ is missing} \\ 1, & \text{otherwise} \end{cases}$$

where M_D is the mask marking the missing information of the corresponding depth image;
The generated I_Dinc, together with the corresponding I_RGB and I_Dc, forms new training data, from which the completion-task training data set is constructed.
1.2 Constructing the indoor-scene super-resolution data set Database_SR
Pairs of depth maps with the same content are collected from http://rgbd-dataset.cs.washington.edu/dataset/rgbd-scenes-v2, with a downsampling factor of 4× for each image pair. One set consists of low-resolution depth maps I_LR, the other of high-resolution depth maps I_HR; the image pairs are augmented by horizontal flipping, finally yielding 24000 to 25000 data pairs.
Step 2, data augmentation and normalization
Data augmentation includes horizontal flipping and dilation of the missing regions of the depth image, so as to obtain training data with different missing ratios.
For the color image I_RGB, the pixel values are divided by 255 to scale them to 0-1. The depth image I_Dinc is first denoised: depth values below 200 and above 40000 are set to zero. The minimum value d_min and the maximum value d_max are computed for each image separately, and the following normalization is applied:

$$I_D^{norm} = \frac{I_D - d_{min}}{d_{max} - d_{min}}$$
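A short sketch of the augmentation and normalization just described; the dilation kernel size and the use of OpenCV are illustrative assumptions, not specifics from the patent:

```python
import cv2
import numpy as np

def augment_and_normalize(rgb, depth, dilate_px=3):
    """Sketch: flip both modalities, enlarge depth holes, then normalize."""
    rgb, depth = rgb[:, ::-1].copy(), depth[:, ::-1].copy()
    # Dilate the missing region to vary the missing ratio (assumed kernel size).
    hole = (depth == 0).astype(np.uint8)
    depth = depth * (1 - cv2.dilate(hole, np.ones((dilate_px, dilate_px), np.uint8)))
    # Denoise: implausible depth readings become "missing".
    depth = depth.astype(np.float32)
    depth[(depth < 200) | (depth > 40000)] = 0
    # RGB to [0, 1]; depth min-max normalized per image over valid pixels.
    rgb = rgb.astype(np.float32) / 255.0
    valid = depth > 0
    d_min, d_max = depth[valid].min(), depth[valid].max()
    depth = np.where(valid, (depth - d_min) / (d_max - d_min), 0.0)
    return rgb, depth
```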
step 3, constructing an adaptive convolutional layer
The core here is to construct a suitable adaptive convolution layer to replace standard convolution, so as to better serve the task. The adaptive convolution layer operates as follows:

$$x_i' = \omega^T (X_i \odot M_i)\,\Psi(M_i) + b$$

$$\Psi(M_i) = \frac{|M_i|}{\sum_{m_j \in M_i} m_j}$$

where x_i is a point in the tensor, X_i and M_i are the feature window centered at x_i and its corresponding mask, x_j is a neighborhood point of x_i, m_j is the mask value corresponding to x_j, ω denotes the convolution kernel weights, b is a bias, ⊙ denotes element-wise multiplication, and Ψ(M_i) is a weight regularization term. The adaptive convolution assigns different weights to different regions; compared with conventional convolution, the network can learn effective features better. The mask m_j is computed differently for the semantic filling network Net_fill, the edge-enhancement network Net_refine and the super-resolution network Net_sr, as follows.
For the semantic filling network Net_fill:

$$m_j = \begin{cases} 1, & x_j \neq 0 \\ 0, & x_j = 0 \end{cases}$$

where x_j is judged valid if its value in the current feature map is nonzero.
For the edge-enhancement network Net_refine:

$$m_j = \begin{cases} 1, & |x_j - x_c| < 5 \\ 0, & \text{otherwise} \end{cases}$$

where x_j is judged valid if, in the current RGB image, its difference from the pixel x_c at the center of the corresponding sliding window is less than 5 pixel values.
For the super-resolution network Net_sr, all neighborhood points are valid:

$$m_j = 1$$
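A minimal PyTorch-style sketch of such a mask-weighted adaptive convolution layer is given below. It follows the operations defined above; the mask-update rule (a window becomes valid once it contains any valid point) is an assumption of this sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConv2d(nn.Module):
    """Mask-weighted convolution: features are multiplied element-wise by the
    validity mask, convolved, then rescaled by Psi(M_i) = |M_i| / sum(m_j)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        # Fixed all-ones kernel, used only to count valid entries per window.
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.window = kernel_size * kernel_size
        self.stride, self.padding = stride, padding

    def forward(self, x, mask):
        # mask: (B, 1, H, W), 1 = valid, 0 = invalid.
        out = self.conv(x * mask)
        valid = F.conv2d(mask, self.ones, stride=self.stride, padding=self.padding)
        psi = self.window / valid.clamp(min=1.0)   # Psi(M_i)
        b = self.conv.bias.view(1, -1, 1, 1)
        out = (out - b) * psi + b                  # rescale the conv term only
        new_mask = (valid > 0).float()             # assumed mask update
        return out, new_mask
```

The masks m_j for Net_fill, Net_refine and Net_sr would be supplied according to the rules above (nonzero depth, RGB similarity to the window center, or all ones, respectively).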
Step 4, constructing and training the depth-image completion network and the super-resolution network
a. Constructing the depth-image completion network
The depth-image completion network adopts a multi-scale encoder-decoder architecture, consisting of the semantic filling network Net_fill and the edge-enhancement network Net_refine connected in sequence, where all standard convolutions except the last layer are replaced with adaptive convolutions.
The semantic filling network Net_fill takes the incomplete depth image I_Dinc as input, a tensor of shape H × W × 1, where H is the image height and W is the image width; Net_fill performs semantic completion to obtain a complete depth image I_Dout. I_Dout and the color image I_RGB are then fed together into Net_refine for refinement, finally yielding the hole-filling result I_Fill, whose output tensor is likewise of shape H × W × 1. The network loss function is composed of the loss on the missing region and the loss on the non-missing region, with a weight ratio of 10:1.
As shown in FIG. 2, Net_fill adopts U-Net as its basic structure, and both the encoder and the decoder adopt a 5-layer convolutional neural network architecture;
the encoder adopts a five-layer structure, each layer respectively comprises two operations of adaptive convolution and batch regularization, a leak-relu is used as an activation function, the sizes of convolution kernels are respectively 7 × 7,5 × 5,3 × 3 and 3 × 3, the convolution step sizes are all 2, the characteristic height and width of each coding layer are reduced to half of the original height and width, and 0 complementing processing is carried out on the boundary of an input image. The number of convolution kernels is 16,32,64,128 and 128 respectively. By continuously extracting features in different sizes to fill the missing regions, all the missing regions are finally repaired.
The decoder likewise has a 5-layer structure; each layer comprises four operations: upsampling, feature concatenation, adaptive convolution and batch normalization, also with leaky ReLU as the activation function. Each decoder layer first upsamples the input from the previous layer, concatenates it with the corresponding encoder features of the same size, and then feeds the result into the current adaptive convolution for feature learning. The kernel sizes are all 3×3, the strides are all 1×1, and the numbers of kernels are 128, 64, 32, 16 and 1, respectively.
The last layer of the network is a plain convolution layer with kernel size 1×1, used for channel transformation of the features and for mapping values to the target interval.
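Under the layer counts, kernel sizes, strides and channel widths listed above, Net_fill could be sketched as follows. The fifth encoder kernel size (3×3), the all-valid decoder masks, the final skip from the input depth, and the leaky-ReLU slope are assumptions of this sketch:

```python
class NetFill(nn.Module):
    """Hypothetical sketch of the 5-level U-Net-style semantic filling network."""
    def __init__(self):
        super().__init__()
        ch, ks = [1, 16, 32, 64, 128, 128], [7, 5, 3, 3, 3]  # 5th kernel assumed 3x3
        self.enc = nn.ModuleList(
            AdaptiveConv2d(ch[i], ch[i + 1], ks[i], stride=2, padding=ks[i] // 2)
            for i in range(5))
        self.enc_bn = nn.ModuleList(nn.BatchNorm2d(c) for c in ch[1:])
        dec_in, dec_out = [256, 192, 96, 48, 17], [128, 64, 32, 16, 1]
        self.dec = nn.ModuleList(
            AdaptiveConv2d(i, o, 3, stride=1, padding=1)
            for i, o in zip(dec_in, dec_out))
        self.dec_bn = nn.ModuleList(nn.BatchNorm2d(o) for o in dec_out)
        self.final = nn.Conv2d(1, 1, 1)   # 1x1 conv: value-interval mapping

    def forward(self, depth):
        x, m, skips = depth, (depth > 0).float(), []
        for conv, bn in zip(self.enc, self.enc_bn):
            x, m = conv(x, m)
            x = F.leaky_relu(bn(x), 0.2)
            skips.append(x)
        # Decoder: upsample, concatenate the same-size encoder feature
        # (the last stage reuses the input depth as the skip), conv, BN.
        for conv, bn, skip in zip(self.dec, self.dec_bn,
                                  [skips[3], skips[2], skips[1], skips[0], depth]):
            x = F.interpolate(x, size=skip.shape[2:], mode="nearest")
            x = torch.cat([x, skip], dim=1)
            x, _ = conv(x, torch.ones_like(x[:, :1]))  # all-valid mask assumed
            x = F.leaky_relu(bn(x), 0.2)
        return self.final(x)
```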
b. Constructing the super-resolution network
The super-resolution task adopts a method that fuses global and local features, and uses the Manhattan distance as the loss function for optimization.
The super-resolution network Net_sr takes the result I_Fill obtained from the completion network as input; through feature extraction by the dense blocks and upsampling by sub-pixel convolution, a complete high-resolution depth image is obtained and finally converted into a mesh.
Net_sr adopts dense blocks as its basic structure and upsamples via sub-pixel convolution. Likewise, all standard convolutions are replaced with adaptive convolutions, and the last layer of the network uses a 1×1 convolution for channel adjustment. The model extracts features with five dense blocks; each dense block contains two adaptive convolutions with kernel size 3×3 and stride 1, with zero padding around the input so that the input and output feature sizes match, and the number of kernels is 64. The input and output of each dense block are connected across layers, i.e., concatenated along the feature dimension and then used as the input of the next dense block. By continually fusing features from low to high dimensions, the network learns richer information. The upsampling factor of the sub-pixel convolution is 4, so the features become 4× higher and wider after this layer. At the end of the network there is a standard convolution with kernel size 1×1 and ReLU as the activation function.
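A sketch of the dense blocks and the sub-pixel upsampling under the stated configuration; the channel width of the convolution feeding the pixel shuffle is an assumption:

```python
class DenseBlockSR(nn.Module):
    """One dense block: two adaptive convolutions (64 kernels, 3x3, stride 1);
    input and output are concatenated along the channel dimension."""
    def __init__(self, in_ch):
        super().__init__()
        self.conv1 = AdaptiveConv2d(in_ch, 64, 3, stride=1, padding=1)
        self.conv2 = AdaptiveConv2d(64, 64, 3, stride=1, padding=1)
        self.out_ch = in_ch + 64

    def forward(self, x):
        ones = torch.ones_like(x[:, :1])   # SR mask: every position is valid
        y, _ = self.conv1(x, ones)
        y, _ = self.conv2(F.relu(y), ones)
        return torch.cat([x, y], dim=1)    # cross-layer (dense) connection

class NetSR(nn.Module):
    """Five dense blocks, 4x sub-pixel upsampling, 1x1 output convolution."""
    def __init__(self):
        super().__init__()
        blocks, ch = [], 1
        for _ in range(5):
            blocks.append(DenseBlockSR(ch))
            ch = blocks[-1].out_ch
        self.blocks = nn.Sequential(*blocks)
        self.pre = nn.Conv2d(ch, 16, 3, padding=1)  # 16 = 1 * 4**2 (assumed width)
        self.shuffle = nn.PixelShuffle(4)           # sub-pixel convolution, 4x
        self.final = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = self.shuffle(self.pre(self.blocks(x)))
        return F.relu(self.final(x))
```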
Training the neural networks: the data set is divided into training, validation and test sets at a ratio of 7:2:1, and the completion network and the super-resolution network are trained separately. The validation set is used to evaluate the model in real time and compute evaluation metrics, and the test set is used to measure the performance of the trained networks. The equipment used has an Intel i7-7700 processor and an NVIDIA 1080 Ti graphics card;
for the completion task, NetfillInput as a depth map IinTraining is carried out for one day by using the batch size of 4 and the learning rate of 0.001, then the training is continued by reducing the learning rate to 0.0001, and the whole process takes three days. The training process takes the mean square error between the network output and the true value as the loss function. NetrefineWill be input into IrgbExtracted weight sum NetfillAnd the corresponding input is multiplied by the corresponding element, and is convolved by a standard convolution kernel with fixed parameters, so that the method has no trainable parameters and is quick to execute.
For the super-resolution task, NetsrIs input aslrIn a batch size of 8Training, the learning rate is 0.0001. Training takes 200 batches of models to converge.
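A condensed sketch of one completion-network training step; applying the 10:1 weighting inside an MSE loss and the existence of a `loader` yielding (incomplete, ground-truth) depth pairs are assumptions:

```python
def completion_loss(pred, target, mask, w_missing=10.0, w_valid=1.0):
    """MSE split over missing (mask == 0) and non-missing regions, weighted 10:1."""
    se = (pred - target) ** 2
    missing = 1 - mask
    loss_missing = (se * missing).sum() / missing.sum().clamp(min=1)
    loss_valid = (se * mask).sum() / mask.sum().clamp(min=1)
    return w_missing * loss_missing + w_valid * loss_valid

model = NetFill()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # later lowered to 1e-4
for depth_in, depth_gt in loader:                    # assumed data loader
    pred = model(depth_in)
    loss = completion_loss(pred, depth_gt, (depth_in > 0).float())
    opt.zero_grad()
    loss.backward()
    opt.step()
```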
Step 5, feeding the test data through the two trained networks in sequence, outputting the repaired high-resolution image, and further converting it into a high-quality mesh, specifically as follows:
The high-resolution point-cloud data repaired by the neural networks is converted into high-quality indoor-scene mesh data using the Ball Pivoting method.
The above-described embodiments are merely preferred embodiments of the present invention, and the scope of the invention is not limited thereto; changes made according to the shape and principle of the present invention should all be covered within the protection scope of the present invention.

Claims (4)

1. A method for generating a high-quality mesh from an RGB-D image based on adaptive convolution, characterized by comprising the following steps:
1) constructing a training data set;
2) data augmentation and normalization;
3) constructing an adaptive convolution layer, wherein the adaptive convolution operates as follows:

$$x_i' = \omega^T (X_i \odot M_i)\,\Psi(M_i) + b$$

$$\Psi(M_i) = \frac{|M_i|}{\sum_{m_j \in M_i} m_j}$$

wherein x_i is a point in the tensor, X_i and M_i are the feature window centered at x_i and its corresponding mask, x_j is a neighborhood point of x_i, m_j is the mask value corresponding to x_j, ω denotes the convolution kernel weights, b is a bias, ⊙ denotes element-wise multiplication, and Ψ(M_i) is a weight regularization term;
the adaptive convolution assigns different weights to different regions of the image, so that the deep network can better learn the effective features in the image; the mask m_j is computed differently for the semantic filling network Net_fill, the edge-enhancement network Net_refine and the super-resolution network Net_sr, specifically as follows:
for the semantic filling network Net_fill:

$$m_j = \begin{cases} 1, & x_j \neq 0 \\ 0, & x_j = 0 \end{cases}$$

wherein x_j is judged valid if its pixel value in the current feature map is nonzero;
for the edge-enhancement network Net_refine:

$$m_j = \begin{cases} 1, & |x_j - x_c| < 5 \\ 0, & \text{otherwise} \end{cases}$$

wherein x_j is judged valid if, in the current RGB image, its difference from the pixel x_c at the center of the corresponding sliding window is less than 5 pixel values;
for the super-resolution network Net_sr, all neighborhood points are valid:

$$m_j = 1;$$
4) constructing and training a depth-image completion network and a super-resolution network;
5) feeding the test data through the two trained networks in sequence, outputting the repaired high-resolution image, and further converting it into a high-quality mesh.
2. The method for generating a high-quality mesh from an RGB-D image based on adaptive convolution according to claim 1, characterized in that: in step 1), the base data includes the RGBD data sets NYU-DATASET and RGBD-SCENE-DATASET, both of which comprise indoor-scene color images I_RGB captured with a Kinect v1 and the corresponding depth images I_Dinc with missing regions; the incomplete depth images are repaired with a method based on the Poisson equation and sparse matrices, obtaining complete depth images I_Dc for training.
3. The method for generating a high-quality mesh from an RGB-D image based on adaptive convolution according to claim 1, characterized in that: in step 4), a depth-image completion network for the completion task and a super-resolution network for the super-resolution task are respectively constructed and trained, specifically as follows:
a. constructing the depth-image completion network
the completion network adopts a multi-scale encoder-decoder architecture, consisting of the semantic filling network Net_fill and the edge-enhancement network Net_refine connected in sequence, wherein all convolutions except the last layer are replaced with adaptive convolutions;
the semantic filling network Net_fill takes the incomplete depth image I_Dinc as input, a tensor of shape H × W × 1, wherein H is the image height and W is the image width; Net_fill performs semantic completion to obtain a complete depth image I_Dout, and I_Dout and the color image I_RGB are then fed together into the edge-enhancement network Net_refine for refinement, finally yielding the repair result I_repair, whose output tensor has the same size as the input image; the network loss function is composed of the loss on the missing region and the loss on the non-missing region, with a weight ratio of 10:1;
the semantic filling network Net_fill adopts a U-shaped neural network (U-Net) as its basic structure; the network comprises an encoder, which encodes image information and transforms the feature space, and a decoder, which decodes high-order information; both parts adopt a 5-layer convolutional neural network architecture;
the encoder has a five-layer structure; each layer consists of an adaptive convolution and a batch normalization operation, with leaky ReLU as the activation function; the kernel sizes are 7×7, 5×5, 3×3 and 3×3, all with stride 2, so the feature height and width are halved after each adaptive convolution, and the boundary of the input image is zero-padded to eliminate differences in the convolution border regions; the numbers of convolution kernels are 16, 32, 64, 128 and 128, respectively; by continually extracting features at different scales to fill the missing regions, all missing regions are finally repaired;
the decoder likewise has a 5-layer structure; each layer comprises four operations: upsampling, feature concatenation, adaptive convolution and batch normalization, also with leaky ReLU as the activation function; the encoder and decoder are connected across layers, i.e., the output of each encoder layer is copied and concatenated with the decoder feature map of the same size after upsampling and serves as the decoder input, specifically: the input from the previous layer is upsampled, concatenated with the corresponding encoder features of the same size, and then fed into the current adaptive convolution for feature learning; the kernel sizes are all 3×3, the strides are all 1×1, and the numbers of kernels are 128, 64, 32, 16 and 1, respectively;
the last layer of the network is a standard convolution with kernel size 1×1, used for channel transformation of the features and for mapping values to the target interval;
b. constructing the super-resolution network
the super-resolution task adopts a method that fuses global and local features, and uses the Manhattan distance as the loss function for optimization;
the super-resolution network Net_sr adopts dense blocks as its basic structure, upsamples via sub-pixel convolution, replaces all standard convolutions with adaptive convolutions, and uses a 1×1 convolution in the last layer for channel adjustment; the network extracts features with five dense blocks; each dense block contains two adaptive convolutions with kernel size 3×3 and stride 1, with zero padding around the input so that the input and output feature sizes match, and the number of kernels is 64; the input and output of each dense block are connected across layers, i.e., concatenated along the feature dimension and then used as the input of the next dense block; by continually fusing features from low to high dimensions, the network learns richer information; the upsampling factor of the sub-pixel convolution is 4, and at the end of the network a standard convolution with kernel size 1×1 uses ReLU as the activation function;
c. training the constructed networks
corresponding loss functions are designed for the constructed networks and optimized with the Adam method, finally yielding the trained networks.
4. The method for generating a high-quality mesh from an RGB-D image based on adaptive convolution according to claim 1, characterized in that: in step 5), the high-resolution point-cloud data repaired by the neural network is converted into high-quality indoor-scene mesh data using the Ball Pivoting method.
CN201910609314.4A 2019-07-08 2019-07-08 RGB-D image high-quality grid generation method based on adaptive convolution Active CN110349087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910609314.4A CN110349087B (en) 2019-07-08 2019-07-08 RGB-D image high-quality grid generation method based on adaptive convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910609314.4A CN110349087B (en) 2019-07-08 2019-07-08 RGB-D image high-quality grid generation method based on adaptive convolution

Publications (2)

Publication Number Publication Date
CN110349087A CN110349087A (en) 2019-10-18
CN110349087B true CN110349087B (en) 2021-02-12

Family

ID=68178224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910609314.4A Active CN110349087B (en) 2019-07-08 2019-07-08 RGB-D image high-quality grid generation method based on adaptive convolution

Country Status (1)

Country Link
CN (1) CN110349087B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091548B (en) * 2019-12-12 2020-08-21 哈尔滨市科佳通用机电股份有限公司 Railway wagon adapter dislocation fault image identification method and system based on deep learning
CN111626929B (en) * 2020-04-28 2023-08-08 Oppo广东移动通信有限公司 Depth image generation method and device, computer readable medium and electronic equipment
CN111915619A (en) * 2020-06-05 2020-11-10 华南理工大学 Full convolution network semantic segmentation method for dual-feature extraction and fusion
CN112734825A (en) * 2020-12-31 2021-04-30 深兰人工智能(深圳)有限公司 Depth completion method and device for 3D point cloud data
CN113033645A (en) * 2021-03-18 2021-06-25 南京大学 Multi-scale fusion depth image enhancement method and device for RGB-D image
CN114004754B (en) * 2021-09-13 2022-07-26 北京航空航天大学 Scene depth completion system and method based on deep learning
CN117420209B (en) * 2023-12-18 2024-05-07 中国机械总院集团沈阳铸造研究所有限公司 Deep learning-based full-focus phased array ultrasonic rapid high-resolution imaging method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971409A (en) * 2014-05-22 2014-08-06 福州大学 Measuring method for foot three-dimensional foot-type information and three-dimensional reconstruction model by means of RGB-D camera
CN109087375A (en) * 2018-06-22 2018-12-25 华东师范大学 Image cavity fill method based on deep learning
CN109272447A (en) * 2018-08-03 2019-01-25 天津大学 A kind of depth map super-resolution method
CN109903372A (en) * 2019-01-28 2019-06-18 中国科学院自动化研究所 Depth map super-resolution complementing method and high quality three-dimensional rebuilding method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10192347B2 (en) * 2016-05-17 2019-01-29 Vangogh Imaging, Inc. 3D photogrammetry
US10122994B2 (en) * 2016-11-11 2018-11-06 Disney Enterprises, Inc. Object reconstruction from dense light fields via depth from gradients
US10049463B1 (en) * 2017-02-14 2018-08-14 Pinnacle Imaging Corporation Method for accurately aligning and correcting images in high dynamic range video and image processing
US10497084B2 (en) * 2017-04-24 2019-12-03 Intel Corporation Efficient sharing and compression expansion of data across processing systems
CN108932550B (en) * 2018-06-26 2020-04-24 湖北工业大学 Method for classifying images based on fuzzy dense sparse dense algorithm
CN109064406A (en) * 2018-08-26 2018-12-21 东南大学 A kind of rarefaction representation image rebuilding method that regularization parameter is adaptive

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971409A (en) * 2014-05-22 2014-08-06 福州大学 Measuring method for foot three-dimensional foot-type information and three-dimensional reconstruction model by means of RGB-D camera
CN109087375A (en) * 2018-06-22 2018-12-25 华东师范大学 Image cavity fill method based on deep learning
CN109272447A (en) * 2018-08-03 2019-01-25 天津大学 A kind of depth map super-resolution method
CN109903372A (en) * 2019-01-28 2019-06-18 中国科学院自动化研究所 Depth map super-resolution complementing method and high quality three-dimensional rebuilding method and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Real-time scene reconstruction and triangle mesh generation using multiple RGB-D cameras; Siim Meerits et al.; J Real-Time Image Proc; 2017-11-18; pp. 2247-2259 *
Feature-vector watermark embedding for three-dimensional mesh models; Li Shiqun et al.; Journal of Graphics; 2017-04-15; Vol. 38, No. 2; pp. 155-161 *
Research on key problems of three-dimensional reconstruction based on RGBD images; Guo Qinghui; China Masters' Theses Full-text Database, Information Science and Technology; 2014-08-15; No. 08; pp. I138-1360 *
Point normal estimation for a single image based on multi-scale convolutional networks; Xian Chuhua et al.; Journal of South China University of Technology (Natural Science Edition); 2018-12-15; Vol. 46, No. 12; pp. 1-9 *
Research on facial expression recognition based on deep learning; Niu Xinya; China Masters' Theses Full-text Database, Information Science and Technology; 2017-02-15; No. 02; pp. I138-3911 *
An improved image super-resolution algorithm based on convolutional neural networks; Xiao Jinsheng et al.; Acta Optica Sinica; 2017-03-31; Vol. 37, No. 3; pp. 103-111 *

Also Published As

Publication number Publication date
CN110349087A (en) 2019-10-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant