CN110349087B - RGB-D image high-quality grid generation method based on adaptive convolution - Google Patents
- Publication number: CN110349087B (application CN201910609314.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- convolution
- image
- resolution
- adaptive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T5/20 — Image enhancement or restoration using local operators
- G06T5/30 — Erosion or dilatation, e.g. thinning
- G06T5/70 — Denoising; Smoothing
- G06T2207/10024 — Image acquisition modality: Color image
- G06T2207/10028 — Image acquisition modality: Range image; Depth image; 3D point clouds
- G06T2207/20081 — Special algorithmic details: Training; Learning
- G06T2207/20084 — Special algorithmic details: Artificial neural networks [ANN]
Abstract
The invention discloses a method for generating a high-quality grid from an RGB-D image based on adaptive convolution, comprising the following steps: 1) constructing a training data set; 2) data augmentation and normalization; 3) constructing an adaptive convolution layer; 4) constructing and training a depth image completion network and a super-resolution network; 5) feeding the test data through the two trained networks in sequence, outputting the repaired high-resolution image and further converting it into a high-quality grid. The data set constructed by the method addresses the current lack of a high-quality large-scale data set in the field of depth image completion; the encoder-decoder structure with cross-layer (skip) connections effectively fuses low-level and high-level features in the data while avoiding parameter redundancy; and the adaptive convolution structure effectively addresses the difficulty current methods have in generating complete, high-quality depth images. The invention mitigates the low precision and large missing regions of current Kinect data.
Description
Technical Field
The invention relates to the technical field of high-quality three-dimensional grid generation, in particular to a method for generating a high-quality grid of an RGB-D image based on adaptive convolution.
Background
With the wide application of depth sensors in fields such as autonomous driving, augmented reality, indoor navigation, secure payment and scene reconstruction, obtaining high-precision depth information and, subsequently, high-quality three-dimensional reconstruction results has become increasingly important. Although depth-sensing technology has recently made great progress, on the one hand, commercial-grade RGB-D cameras such as the Microsoft Kinect, Intel RealSense and Google Tango devices still produce depth images with missing data when the captured surface is too smooth, too glossy, too fine, or too close to or too far from the camera. These situations occur frequently in large rooms, around thin bar-like objects, and in scenes with intense light. Even in home settings, depth images typically lack more than 50% of their pixels. On the other hand, limited by the low resolution of the depth camera, the point cloud reconstructed from the sensor data is too sparse. The raw data from these depth-sensor scans is therefore poorly suited to the three-dimensional reconstruction applications described above.
Fast generation of high-quality grid data has two key stages: first, data completion, i.e., recovering the depth data missing due to the various adverse factors above; then, data super-resolution, i.e., generating high-resolution point cloud data from the complete but low-resolution data of the previous stage. Finally, grid data is generated from the point cloud data.
Many indoor RGB-D data completion and super-resolution methods based on conventional techniques give unsatisfactory results. Recently, a few deep-learning-based methods have shown some effect, but they have the following main disadvantages: 1) non-end-to-end learning prevents the methods from running in real time; 2) the large receptive field of the convolutions destroys edge information.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an RGB-D image high-quality grid generation method based on adaptive convolution.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: the method for generating the RGB-D image high-quality grid based on the adaptive convolution comprises the following steps:
1) constructing a training data set;
2) data augmentation and normalization;
3) constructing an adaptive convolutional layer;
4) constructing a depth image completion network and a super-resolution network and training;
5) inputting the test data into the two trained networks in sequence, outputting the repaired high-resolution image and further converting it into a high-quality grid.
In step 1), the basic data comprises the RGBD data sets NYU-DATASET and RGBD-SCENE-DATASET; both contain indoor-scene color images I_RGB acquired with a Kinect v1 and the corresponding depth images with missing regions, I_Dinc. The missing depth images are repaired with a method based on the Poisson equation and sparse matrices to obtain complete depth images I_Dc for training.
In step 2), data augmentation comprises horizontal flipping and dilation of the missing regions of the depth image, so as to obtain training data with different missing proportions; data normalization scales all pixel values of the color image to between 0 and 1, and processes the depth image as follows:

I_D = (I_Dinc − I_min) / (I_max − I_min)

where I_min and I_max denote, respectively, the minimum and maximum pixel values of the depth image before normalization.
In step 3), an adaptive convolution layer is constructed; the adaptive convolution operates as follows:

x'_i = Ψ(M_i) · ( ω(X_i ⊙ M_i) ) + b

where x_i is a point in the tensor, x_j a neighborhood point of x_i inside the sliding window X_i, m_j the mask value corresponding to x_j (M_i denotes the mask values of the whole window), ω the standard convolution operation, b a bias, ⊙ element-wise multiplication, and Ψ(M_i) a weight regularization term;
the adaptive convolutional layer gives different weights to different regions of the image, so that the depth network can better learn effective features in the image; network Net for semantic fillingfillEdge enhanced network NetrefineAnd super-resolution network Netsr,mjThe calculation methods of (a) are different, and specifically, the following are as follows:
network Net for semantic fillingfill:
Wherein x is judgedjThe basis for whether it is valid is that in the current feature, xjWhether the pixel value of (a) is 0;
for edge enhanced network Netrefine:
Wherein x is judgedjWhether the criterion is valid is that in the current RGB image, xjWhether the pixel difference from the center of the corresponding sliding window is less than 5 pixel values;
for super-resolution network Netsr:
In step 4), a depth image completion network for the completion task and a super-resolution network for the super-resolution task are constructed and trained, as follows:
a. depth image completion network
The completion network adopts a multi-scale encoder-decoder structure and consists of two parts connected in sequence, the semantic filling network Net_fill and the edge enhancement network Net_refine, in which all standard convolutions except the last layer are replaced by adaptive convolutions;

The input of the semantic filling network Net_fill is the depth image with missing regions I_Dinc, whose tensor form is H × W × 1, where H is the image height and W the image width; after semantic completion by Net_fill, a complete depth image I_Dout is obtained; then I_Dout and the color image I_RGB are input together into the edge enhancement network Net_refine for refinement, finally yielding the repair result I_repair, whose output tensor has the same size as the input image; the network loss function is composed of the loss on the missing region and the loss on the non-missing region, with a weight ratio of 10:1;
semantic filling network NetfillThe method comprises the following steps that a U-shaped neural network (U-Net) is adopted as a basic structure, the network comprises an encoder and a decoder, the encoder is used for encoding image information and converting a feature space, the decoder is used for decoding high-order information, and the two parts adopt a 5-layer convolutional neural network architecture;
the encoder adopts a five-layer structure, each layer respectively comprises two operations of adaptive convolution and batch regularization, a leak-relu is used as an activation function, the sizes of convolution kernels are respectively 7 × 7,5 × 5,3 × 3 and 3 × 3, the convolution step length is all 2, the height and width of each layer of features are reduced to half of the original height and width, and 0 complementing processing is carried out on the boundary of an input image; the number of convolution kernels is 16,32,64,128 and 128 respectively; all the missing areas are finally repaired by continuously extracting features in different sizes to fill the missing areas;
the decoder is 5 layers of structures equally, and every layer contains four operations of upsampling, feature splicing, adaptive convolution and batch regularization, adopts leak relu as the activation function, carries out cross-layer connection between encoder and the decoder, and the output of every encoder all copies the concatenation with the same size's after the decoder upsampling signature graph output promptly to as the input of decoder, specifically be: after the input of the previous layer is sampled and spliced with the corresponding encoder features with the same size, the current adaptive convolution is input for feature learning, the sizes of convolution kernels are all 3 x 3, the step lengths are all 1 x 1, and the number of the convolution kernels is 128,64,32,16 and 1 respectively;
the last layer of the network is a convolution layer with the convolution kernel size of 1 x 1 and is used for channel transformation and numerical value interval mapping of features;
b. constructing super-resolution networks
The super-resolution task adopts a method of fusing global features and local features, and uses Manhattan distance as a loss function for optimization;
for super-resolution network NetsrAdopting a dense connection block (dense block) as a basic structure, performing up-sampling through sub-pixel convolution, replacing all standard convolutions with adaptive convolutions, and using 1 × 1 convolution for channel adjustment in the last layer of the network; the network uses five dense connection blocks to extract features, each dense connection block uses two times of adaptive convolution, the sizes of convolution kernels are all 3 x 3, the step length is 1, 0 is supplemented to the periphery of input to keep the feature sizes of input and output consistent, and the number of the convolution kernels is 64; the input and the output of the dense connection block are connected in a cross-layer way, namely the input and the output of the dense connection block are connected inSplicing the line characteristic dimensions and then taking the spliced line characteristic dimensions as the input of the next dense connecting block; the network learns richer information by continuously fusing features from low dimension to high dimension; the up-sampling factor of the sub-pixel convolution is 4, and at the end of the network, a standard convolution with the convolution kernel size of 1 x 1 takes relu as an activation function;
c. training the constructed network
Designing a corresponding loss function for the constructed network, and optimizing the loss function by using an Adam method to finally obtain the trained network.
In step 5), the high-resolution point cloud data repaired by the neural networks is turned into high-quality indoor-scene grid data using the Ball Pivoting algorithm.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. aiming at the condition that a high-quality RGB-D data set is not disclosed in the prior art, a method for constructing a high-low quality indoor scene RGB-D image data set is provided.
2. Aiming at a consumption level or a missing depth information map acquired by a depth camera of mobile equipment, a depth information map repairing algorithm combining RGB color image features and fusion convolution operation is provided.
3. Aiming at a low-quality and high-noise depth image, a method for denoising and enhancing features by utilizing RGB color image semantic information is provided.
4. A method for reconstructing super-resolution of depth images by a point cloud-based convolution network is provided, wherein the method is based on RGB color image semantic features.
Drawings
FIG. 1 is a logic flow diagram of the present invention.
FIG. 2 is an architecture diagram of a semantic filling network.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1, the method for generating a high-quality mesh of an RGB-D image based on adaptive convolution according to this embodiment includes the following steps:
step 1, constructing a training data set
1.1 Construction of the indoor scene completion dataset Database_complete
The New York University public data set contains more than one hundred thousand RGBD image sets of indoor scenes, collected from https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html; here 9000 ≤ N_RGBD ≤ 10000 images at 640 × 480 resolution are used, where each RGBD sample comprises a color image I_RGB and an incomplete depth image I_Dinc. The incomplete depth images are repaired with Anat Levin's colorization method to obtain complete depth images I_Dc. First, the depth image and the RGB image are aligned and denoised, and the RGBD data is cropped around the borders, resulting in images of 557 × 423 resolution. Subsequently, masks are extracted from all data in the data set to construct a mask image set Mask_Depth:
where each mask in Mask_Depth marks the pixels of the corresponding depth image whose depth information is missing;
the above-mentioned generated IDincAnd corresponding to IRGBAnd IDcNew training data is formed, and a completion task training data set is further constructed.
1.2 Construction of the indoor scene super-resolution dataset Database_SR
Two sets of depth-map pairs M with the same content are collected from http://rgbd-dataset.cs.washington.edu/dataset/rgbd-scenes-v2, with a downsampling factor of 4× per image: one set contains the low-resolution depth maps I_LR, the other the high-resolution depth maps I_HR. Data augmentation by horizontal flipping of the image pairs finally yields 24000–25000 data pairs.
Step 2, data augmentation and normalization
Data augmentation comprises horizontal flipping and dilation of the missing regions of the depth image, so as to obtain training data with different missing proportions.
The color image I_RGB is scaled to 0–1 by dividing its pixel values by 255. The depth image I_Dinc is first denoised: depth values at or below 200 and at or above 40000 are set to zero. The minimum value I_min and maximum value I_max are then computed for each image separately, and the following processing is carried out:

I_D = (I_Dinc − I_min) / (I_max − I_min)
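The depth pre-processing just described can be sketched as follows (assumptions: the zeroed-out values are excluded when computing I_min and I_max, and a small epsilon guards the division; the patent does not specify either detail):

```python
import numpy as np

def normalize_depth(depth, low=200, high=40000):
    """Denoise then min-max scale a depth image to [0, 1]:
    implausible depths (<= low or >= high) are zeroed as missing,
    and the remaining values are scaled by the per-image min/max."""
    d = depth.astype(float)
    d[(d <= low) | (d >= high)] = 0.0     # noise-reduction step
    valid = d > 0
    if not valid.any():
        return d
    dmin, dmax = d[valid].min(), d[valid].max()
    out = np.zeros_like(d)
    out[valid] = (d[valid] - dmin) / (dmax - dmin + 1e-8)
    return out
```

Applied to a toy 2 × 2 depth map with values 0, 1000, 2000 and 3000, the valid range 1000–3000 maps to 0, 0.5 and 1 while the missing pixel stays 0.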
step 3, constructing an adaptive convolutional layer
The core here is to construct a suitable adaptive convolution layer to replace the standard convolution and better serve the task. The adaptive convolution proceeds as follows:

x'_i = Ψ(M_i) · ( ω(X_i ⊙ M_i) ) + b

where x_i is a point in the tensor, x_j a neighborhood point of x_i inside the sliding window X_i, m_j the mask value corresponding to x_j (M_i denotes the mask values of the whole window), ω the standard convolution operation, b a bias, ⊙ element-wise multiplication, and Ψ(M_i) a weight regularization term. The adaptive convolution gives different weights to different regions, so that compared with conventional convolution the network learns effective features better. For the semantic filling network Net_fill, the edge enhancement network Net_refine and the super-resolution network Net_sr, m_j is computed differently, as follows.
For the semantic filling network Net_fill:

m_j = 1 if x_j is valid, and m_j = 0 otherwise,

where x_j is judged valid if its pixel value in the current feature is not 0.
For the edge enhancement network Net_refine:

m_j = 1 if x_j is valid, and m_j = 0 otherwise,

where x_j is judged valid if, in the current RGB image, its pixel difference from the center of the corresponding sliding window is less than 5 pixel values.
For the super-resolution network Net_sr, the input is already complete, so m_j = 1 for every neighborhood point.
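A minimal single-channel numpy sketch of the masked ("adaptive") convolution above. The weight regularization term Ψ(M_i) is assumed here to be the standard partial-convolution scaling — window size divided by the number of valid points — which the patent does not spell out:

```python
import numpy as np

def adaptive_conv2d(x, mask, kernel, bias=0.0):
    """Apply the kernel only to valid neighborhood points (mask == 1),
    rescaling each response by psi = window_size / (# valid points)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))       # zero-pad the input
    mp = np.pad(mask, ((ph, ph), (pw, pw)))    # padded pixels count as invalid
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            win = xp[i:i + kh, j:j + kw]
            mwin = mp[i:i + kh, j:j + kw]
            valid = mwin.sum()
            if valid > 0:
                psi = (kh * kw) / valid        # weight regularization term
                out[i, j] = psi * np.sum(kernel * win * mwin) + bias
            # windows with no valid point stay 0
    return out
```

Because the response is renormalized by the number of valid points, constant regions stay constant even next to holes and image borders, which is exactly why invalid zero pixels do not leak into the output.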
Step 4, constructing a depth image completion network and a super-resolution network and training
a. Constructing a depth image completion network
The depth image completion network adopts a multi-scale encoder-decoder structure and consists of two parts connected in sequence, the semantic filling network Net_fill and the edge enhancement network Net_refine, in which all standard convolutions except the last layer are replaced by adaptive convolutions.

The input of the semantic filling network Net_fill is the depth image with missing regions I_Dinc, whose tensor form is H × W × 1, where H is the image height and W the image width; after semantic completion by Net_fill, a complete depth image I_Dout is obtained. Then I_Dout and the color image I_RGB are input together into Net_refine for refinement, finally yielding the hole-filling result I_Fill, whose output tensor form is also H × W × 1. The network loss function is composed of the loss on the missing region and the loss on the non-missing region, with a weight ratio of 10:1.
As shown in FIG. 2, Net_fill adopts U-Net as its basic structure, with both the encoder and the decoder using a 5-layer convolutional neural network architecture.

The encoder adopts a five-layer structure; each layer comprises an adaptive convolution and a batch normalization operation, with Leaky ReLU as the activation function. The convolution kernel sizes are 7 × 7, 5 × 5, 3 × 3 and 3 × 3, all with stride 2, so the feature height and width of each encoding layer are halved, and the boundary of the input image is zero-padded. The numbers of convolution kernels are 16, 32, 64, 128 and 128, respectively. By continuously extracting features at different scales to fill the missing regions, all missing regions are finally repaired.

The decoder likewise has a 5-layer structure; each layer comprises four operations: upsampling, feature concatenation, adaptive convolution and batch normalization, also with Leaky ReLU as the activation function. Each decoder layer first upsamples the input of the previous layer, concatenates it with the corresponding encoder features of the same size, and then feeds the result into the current adaptive convolution for feature learning. The convolution kernel sizes are all 3 × 3, the strides are all 1 × 1, and the numbers of convolution kernels are 128, 64, 32, 16 and 1, respectively.

The last layer of the network is an ordinary convolution layer with kernel size 1 × 1, likewise used for channel transformation and mapping the features to the target value interval.
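The layer dimensions above can be traced with a small helper (an illustrative sketch; the skip source for the final full-resolution decoder layer is assumed to be the 1-channel network input, which the patent leaves implicit):

```python
def unet_shapes(h=512, w=512):
    """Trace (H, W, C) feature shapes through the 5-layer encoder/decoder:
    each encoder layer is a stride-2 adaptive conv that halves H and W;
    each decoder layer upsamples x2, concatenates the same-sized skip
    feature, then convolves down to the listed channel count."""
    enc_ch = [16, 32, 64, 128, 128]
    enc, (ch, cw) = [], (h, w)
    for c in enc_ch:
        ch, cw = ch // 2, cw // 2
        enc.append((ch, cw, c))
    dec_ch = [128, 64, 32, 16, 1]
    skip_ch = [s[2] for s in enc[-2::-1]] + [1]   # encoder skips, then the input
    dec, prev = [], enc[-1][2]
    for c, sk in zip(dec_ch, skip_ch):
        ch, cw = ch * 2, cw * 2                   # upsampling
        dec.append((ch, cw, c))                   # conv maps (prev + sk) -> c channels
        prev = c
    return enc, dec
```

For a 512 × 512 input the bottleneck is 16 × 16 × 128, and the decoder returns exactly to 512 × 512 × 1, matching the H × W × 1 output tensor stated above.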
c. Constructing super-resolution networks
The super-resolution task adopts a method of fusing global features and local features and uses Manhattan distance as a loss function for optimization.
The input of the super-resolution network Net_sr is the result I_Fill obtained from the completion network; through semantic extraction by the dense blocks and upsampling by sub-pixel convolution, a complete high-resolution depth image is obtained and finally converted into a grid.
The super-resolution network Net_sr adopts dense blocks as its basic structure and upsamples through sub-pixel convolution. Likewise, all standard convolutions are replaced with adaptive convolutions, and the last layer of the network uses a 1 × 1 convolution for channel adjustment. The model extracts features with five dense blocks; each dense block applies adaptive convolution twice, with kernel size 3 × 3, stride 1, and 64 kernels, zero-padding the input so that the input and output feature sizes stay consistent. The input and output of each dense block are connected across layers, i.e., concatenated along the feature dimension and then used as the input of the next dense block. By continuously fusing features from low to high dimensions, the network learns richer information. The upsampling factor of the sub-pixel convolution is 4, so the features become 4 times higher and wider after this layer. At the end of the network there is a standard convolution with kernel size 1 × 1 and ReLU as the activation function.
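The sub-pixel upsampling step corresponds to the standard pixel-shuffle rearrangement, sketched here channels-last in numpy (a sketch, not the patent's code; the full layer would first apply a convolution producing the C·r² channels):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel (pixel shuffle) rearrangement used for the x4 upsampling:
    (H, W, C*r^2) -> (H*r, W*r, C)."""
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)       # interleave the r x r sub-grids
    return x.reshape(h * r, w * r, c)
```

With r = 4, a 2 × 2 × 16 feature map becomes an 8 × 8 × 1 image: each group of 16 channels fills the 4 × 4 block of output pixels around its source location.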
Training the neural networks: the data set is divided into training, validation and test sets in a 7:2:1 ratio, and the completion network and the super-resolution network are trained separately. The validation set is used to evaluate the model in real time and compute the evaluation metrics, and the test set is used for performance testing of the trained networks. The processor of the equipment used is an Intel i7-7700 and the graphics card an Nvidia 1080 Ti;
for the completion task, NetfillInput as a depth map IinTraining is carried out for one day by using the batch size of 4 and the learning rate of 0.001, then the training is continued by reducing the learning rate to 0.0001, and the whole process takes three days. The training process takes the mean square error between the network output and the true value as the loss function. NetrefineWill be input into IrgbExtracted weight sum NetfillAnd the corresponding input is multiplied by the corresponding element, and is convolved by a standard convolution kernel with fixed parameters, so that the method has no trainable parameters and is quick to execute.
For the super-resolution task, the input of Net_sr is I_LR, trained with a batch size of 8 and a learning rate of 0.0001. Training takes 200 epochs for the model to converge.
Step 5, inputting the test data into the two trained networks in sequence, outputting the repaired high-resolution picture and further converting the high-resolution picture into a high-quality grid, wherein the method specifically comprises the following steps:
and generating high-quality indoor scene grid data by using the high-resolution point cloud data repaired by the neural network through a Ball Pivoting method.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that the changes in the shape and principle of the present invention should be covered within the protection scope of the present invention.
Claims (4)
1. The method for generating the RGB-D image high-quality grid based on the adaptive convolution is characterized by comprising the following steps of:
1) constructing a training data set;
2) data augmentation and normalization;
3) constructing an adaptive convolution layer, wherein the adaptive convolution operates as follows:

x'_i = Ψ(M_i) · ( ω(X_i ⊙ M_i) ) + b

where x_i is a point in the tensor, x_j a neighborhood point of x_i inside the sliding window X_i, m_j the mask value corresponding to x_j (M_i denotes the mask values of the whole window), ω the standard convolution operation, b a bias, ⊙ element-wise multiplication, and Ψ(M_i) a weight regularization term;
different weights are given to different regions of the image by the adaptive convolution, so that the effective features in the image can be better learned by the depth network; network Net for semantic fillingfillEdge enhanced network NetrefineAnd super-resolution meshNet (Net)sr,mjThe calculation methods of (a) are different, and specifically, the following are as follows:
network Net for semantic fillingfill:
Wherein x is judgedjThe basis for whether it is valid is that in the current feature, xjWhether the pixel value of (a) is 0;
for edge enhanced network Netrefine:
Wherein x is judgedjWhether the criterion is valid is that in the current RGB image, xjWhether the pixel difference from the center of the corresponding sliding window is less than 5 pixel values;
for super-resolution network Netsr:
4) Constructing a depth image completion network and a super-resolution network and training;
5) inputting the test data into the two trained networks in sequence, outputting the repaired high-resolution image and further converting it into a high-quality grid.
2. The method for generating a high-quality grid from RGB-D images based on adaptive convolution of claim 1, wherein: in step 1), the basic data comprises the RGBD data sets NYU-DATASET and RGBD-SCENE-DATASET; both contain indoor-scene color images I_RGB acquired with a Kinect v1 and the corresponding depth images with missing regions, I_Dinc; the missing depth images are repaired with a method based on the Poisson equation and sparse matrices to obtain complete depth images I_Dc for training.
3. The method for generating a high-quality grid from RGB-D images based on adaptive convolution of claim 1, wherein: in step 4), a depth image completion network for the completion task and a super-resolution network for the super-resolution task are respectively constructed and trained, specifically as follows:
a. depth image completion network
The completion network adopts a multi-scale encoder-decoder architecture and consists of the semantic filling network Net_fill and the edge enhancement network Net_refine, connected in sequence, wherein all convolutions except the last layer are replaced by adaptive convolutions;
the semantic filling network Net_fill takes as input the depth image I_Dinc with missing regions, in tensor form H × W × 1, where H is the image height and W is the image width; Net_fill produces a semantically completed depth image I_Dout; I_Dout and the color image I_RGB are then fed together into the edge enhancement network Net_refine for refinement, finally yielding the repair result I_repair, whose output tensor has the same size as the input image; the network loss function consists of the loss over the missing region and the loss over the non-missing region, with a weight ratio of 10:1;
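The 10:1 region weighting of the completion loss can be sketched as follows (illustrative stdlib Python; the per-pixel L1 distance is an assumption here — the text states only the region weight ratio, not the distance used):

```python
# Sketch of the completion loss: per-pixel error weighted 10:1 between
# missing and non-missing regions. L1 distance is assumed.

def completion_loss(pred, target, missing):
    """pred, target: flat value lists; missing: 1 inside the hole, else 0."""
    loss = 0.0
    for p, t, m in zip(pred, target, missing):
        w = 10.0 if m else 1.0       # 10:1 weighting from the claim
        loss += w * abs(p - t)
    return loss / len(pred)
```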
the semantic filling network Net_fill adopts the U-shaped neural network U-Net as its basic structure; the network comprises an encoder and a decoder, the encoder encoding image information and transforming the feature space, the decoder decoding high-order information; both parts adopt a 5-layer convolutional neural network architecture;
the encoder adopts a five-layer structure; each layer comprises an adaptive convolution and a batch normalization operation, with leaky ReLU as the activation function; the convolution kernel sizes are 7 × 7, 5 × 5, 3 × 3 and 3 × 3, all with stride 2, so the height and width of the features are halved after each adaptive convolution; the boundary of the input image is zero-padded to eliminate differences at the convolution edge regions; the numbers of convolution kernels are 16, 32, 64, 128 and 128 respectively; by continually extracting features at different scales to fill the missing regions, all missing regions are finally repaired;
the decoder likewise has a 5-layer structure; each layer comprises four operations: upsampling, feature concatenation, adaptive convolution and batch normalization, with leaky ReLU as the activation function; cross-layer connections are made between the encoder and the decoder, i.e. the output of each encoder layer is copied and concatenated with the decoder feature map of the same size after upsampling, and serves as input to the decoder; specifically: the input of the previous layer is upsampled and concatenated with the corresponding encoder features of the same size, then fed into the current adaptive convolution for feature learning; the convolution kernel sizes are all 3 × 3, the strides are all 1 × 1, and the numbers of convolution kernels are 128, 64, 32, 16 and 1 respectively;
the last layer of the network is a standard convolution with kernel size 1 × 1, used for channel transformation and mapping of the features to the target numerical interval;
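The shape bookkeeping of the Net_fill encoder-decoder described above can be traced as follows (illustrative stdlib Python — shape arithmetic only, not an implementation; the channel counts and stride-2/upsample-by-2 behaviour are taken from the text):

```python
# Trace the (H, W, C) feature shapes through the Net_fill U-Net:
# five stride-2 encoder layers halve H and W; five decoder layers
# upsample by 2 (skip concatenation does not change H or W).

def unet_shapes(h, w):
    shapes = []
    for c in [16, 32, 64, 128, 128]:   # encoder: stride-2 adaptive convs
        h, w = h // 2, w // 2
        shapes.append(("enc", h, w, c))
    for c in [128, 64, 32, 16, 1]:     # decoder: x2 upsample + skip concat
        h, w = h * 2, w * 2
        shapes.append(("dec", h, w, c))
    return shapes
```

For a 256 × 256 input the bottleneck is 8 × 8 × 128 and the final decoder output returns to 256 × 256 × 1, matching the claim that the output tensor has the same size as the input image.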
b. constructing super-resolution networks
The super-resolution task adopts a method fusing global features and local features, and uses the Manhattan distance as the loss function for optimization;
the super-resolution network Net_sr adopts dense connection blocks as its basic structure and upsamples by sub-pixel convolution; all standard convolutions are replaced by adaptive convolutions, and the last layer of the network uses a 1 × 1 convolution for channel adjustment; the network extracts features with five dense connection blocks, each containing two adaptive convolutions with kernel size 3 × 3 and stride 1; the input is zero-padded around its border to keep the input and output feature sizes consistent, and the number of convolution kernels is 64; the input and output of each dense connection block are cross-layer connected, i.e. concatenated along the feature dimension and used as the input of the next dense connection block; by continually fusing features from low to high dimensions, the network can learn richer information; the upsampling factor of the sub-pixel convolution is 4, and at the end of the network a standard convolution with kernel size 1 × 1 uses ReLU as the activation function;
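The sub-pixel convolution upsampling used by Net_sr rearranges r² feature channels of size H × W into one channel of size rH × rW (pixel shuffle). A minimal stdlib Python sketch, with the channel-index convention an assumption of this sketch (the patent specifies only the factor r = 4):

```python
# Pixel shuffle (sub-pixel rearrangement): r*r channels of H x W
# become one channel of rH x rW.

def pixel_shuffle(channels, r):
    """channels: list of r*r feature maps, each an H x W nested list."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for i in range(h * r):
        for j in range(w * r):
            c = (i % r) * r + (j % r)      # which input channel feeds (i, j)
            out[i][j] = channels[c][i // r][j // r]
    return out
```

Because the rearrangement is a pure reshuffle, every input value appears exactly once in the output, which is why sub-pixel convolution upsamples without interpolation artifacts.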
c. training the constructed network
Designing corresponding loss functions for the constructed networks and optimizing them with the Adam method to finally obtain the trained networks.
4. The adaptive-convolution-based RGB-D image high-quality mesh generation method of claim 1, wherein: in step 5), high-quality indoor scene mesh data are generated from the high-resolution point cloud data repaired by the neural network using the rolling-ball (ball-pivoting) method.
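Before the rolling-ball meshing of claim 4, the repaired high-resolution depth image must be back-projected to a point cloud. A minimal pinhole-camera sketch (stdlib Python; the intrinsic parameters fx, fy, cx, cy are hypothetical placeholders, not values from the patent, and the ball-pivoting step itself is omitted as it is far more involved):

```python
# Back-project a depth image to a 3D point cloud with a pinhole model.
# fx, fy: focal lengths in pixels; cx, cy: principal point (assumed).

def depth_to_points(depth, fx, fy, cx, cy):
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:                      # skip still-missing pixels
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points
```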
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910609314.4A CN110349087B (en) | 2019-07-08 | 2019-07-08 | RGB-D image high-quality grid generation method based on adaptive convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910609314.4A CN110349087B (en) | 2019-07-08 | 2019-07-08 | RGB-D image high-quality grid generation method based on adaptive convolution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110349087A CN110349087A (en) | 2019-10-18 |
CN110349087B true CN110349087B (en) | 2021-02-12 |
Family
ID=68178224
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910609314.4A Active CN110349087B (en) | 2019-07-08 | 2019-07-08 | RGB-D image high-quality grid generation method based on adaptive convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110349087B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111091548B (en) * | 2019-12-12 | 2020-08-21 | 哈尔滨市科佳通用机电股份有限公司 | Railway wagon adapter dislocation fault image identification method and system based on deep learning |
CN111626929B (en) * | 2020-04-28 | 2023-08-08 | Oppo广东移动通信有限公司 | Depth image generation method and device, computer readable medium and electronic equipment |
CN111915619A (en) * | 2020-06-05 | 2020-11-10 | 华南理工大学 | Full convolution network semantic segmentation method for dual-feature extraction and fusion |
CN112734825A (en) * | 2020-12-31 | 2021-04-30 | 深兰人工智能(深圳)有限公司 | Depth completion method and device for 3D point cloud data |
CN113033645A (en) * | 2021-03-18 | 2021-06-25 | 南京大学 | Multi-scale fusion depth image enhancement method and device for RGB-D image |
CN114004754B (en) * | 2021-09-13 | 2022-07-26 | 北京航空航天大学 | Scene depth completion system and method based on deep learning |
CN117420209B (en) * | 2023-12-18 | 2024-05-07 | 中国机械总院集团沈阳铸造研究所有限公司 | Deep learning-based full-focus phased array ultrasonic rapid high-resolution imaging method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971409A (en) * | 2014-05-22 | 2014-08-06 | 福州大学 | Measuring method for foot three-dimensional foot-type information and three-dimensional reconstruction model by means of RGB-D camera |
CN109087375A (en) * | 2018-06-22 | 2018-12-25 | 华东师范大学 | Image cavity fill method based on deep learning |
CN109272447A (en) * | 2018-08-03 | 2019-01-25 | 天津大学 | A kind of depth map super-resolution method |
CN109903372A (en) * | 2019-01-28 | 2019-06-18 | 中国科学院自动化研究所 | Depth map super-resolution complementing method and high quality three-dimensional rebuilding method and system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10192347B2 (en) * | 2016-05-17 | 2019-01-29 | Vangogh Imaging, Inc. | 3D photogrammetry |
US10122994B2 (en) * | 2016-11-11 | 2018-11-06 | Disney Enterprises, Inc. | Object reconstruction from dense light fields via depth from gradients |
US10049463B1 (en) * | 2017-02-14 | 2018-08-14 | Pinnacle Imaging Corporation | Method for accurately aligning and correcting images in high dynamic range video and image processing |
US10497084B2 (en) * | 2017-04-24 | 2019-12-03 | Intel Corporation | Efficient sharing and compression expansion of data across processing systems |
CN108932550B (en) * | 2018-06-26 | 2020-04-24 | 湖北工业大学 | Method for classifying images based on fuzzy dense sparse dense algorithm |
CN109064406A (en) * | 2018-08-26 | 2018-12-21 | 东南大学 | A kind of rarefaction representation image rebuilding method that regularization parameter is adaptive |
- 2019-07-08 CN CN201910609314.4A patent/CN110349087B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103971409A (en) * | 2014-05-22 | 2014-08-06 | 福州大学 | Measuring method for foot three-dimensional foot-type information and three-dimensional reconstruction model by means of RGB-D camera |
CN109087375A (en) * | 2018-06-22 | 2018-12-25 | 华东师范大学 | Image cavity fill method based on deep learning |
CN109272447A (en) * | 2018-08-03 | 2019-01-25 | 天津大学 | A kind of depth map super-resolution method |
CN109903372A (en) * | 2019-01-28 | 2019-06-18 | 中国科学院自动化研究所 | Depth map super-resolution complementing method and high quality three-dimensional rebuilding method and system |
Non-Patent Citations (6)
Title |
---|
Real-time scene reconstruction and triangle mesh generation using multiple RGB-D cameras;Siim Meerits 等;《J Real-Time Image Proc》;20171118;第2247-2259页 * |
三维网格模型特征向量水印嵌入;李世群 等;《图学学报》;20170415;第38卷(第2期);第155-161页 * |
基于RGBD图像的三维重建关键问题研究;郭庆慧;《中国优秀硕士学位论文全文数据库 信息科技辑》;20140815(第08期);第I138-1360页 * |
基于多尺度卷积网络的单幅图像的点法向估计;冼楚华 等;《华南理工大学学报(自然科学版)》;20181215;第46卷(第12期);第1-9页 * |
基于深度学习的人脸表情识别研究;牛新亚;《中国优秀硕士学位论文全文数据库 信息科技辑》;20170215(第02期);第I138-3911页 * |
改进的基于卷积神经网络的图像超分辨率算法;肖进胜 等;《光学学报》;20170331;第37卷(第3期);第103-111页 * |
Also Published As
Publication number | Publication date |
---|---|
CN110349087A (en) | 2019-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110349087B (en) | RGB-D image high-quality grid generation method based on adaptive convolution | |
CN108765296B (en) | Image super-resolution reconstruction method based on recursive residual attention network | |
CN111915484B (en) | Reference image guiding super-resolution method based on dense matching and self-adaptive fusion | |
CN111784602B (en) | Method for generating countermeasure network for image restoration | |
CN108230278B (en) | Image raindrop removing method based on generation countermeasure network | |
CN110136062B (en) | Super-resolution reconstruction method combining semantic segmentation | |
CN111539887B (en) | Channel attention mechanism and layered learning neural network image defogging method based on mixed convolution | |
CN108596841B (en) | Method for realizing image super-resolution and deblurring in parallel | |
CN112001914A (en) | Depth image completion method and device | |
CN110599401A (en) | Remote sensing image super-resolution reconstruction method, processing device and readable storage medium | |
CN108734661B (en) | High-resolution image prediction method for constructing loss function based on image texture information | |
WO2020015330A1 (en) | Enhanced neural network-based image restoration method, storage medium, and system | |
CN109785279B (en) | Image fusion reconstruction method based on deep learning | |
CN113870124B (en) | Weak supervision-based double-network mutual excitation learning shadow removing method | |
CN111861886B (en) | Image super-resolution reconstruction method based on multi-scale feedback network | |
CN112241939B (en) | Multi-scale and non-local-based light rain removal method | |
Wei et al. | Improving resolution of medical images with deep dense convolutional neural network | |
CN105488759B (en) | A kind of image super-resolution rebuilding method based on local regression model | |
CN104899835A (en) | Super-resolution processing method for image based on blind fuzzy estimation and anchoring space mapping | |
Guan et al. | Srdgan: learning the noise prior for super resolution with dual generative adversarial networks | |
CN116486074A (en) | Medical image segmentation method based on local and global context information coding | |
CN116777764A (en) | Diffusion model-based cloud and mist removing method and system for optical remote sensing image | |
CN110633706B (en) | Semantic segmentation method based on pyramid network | |
Yang et al. | Image super-resolution reconstruction based on improved Dirac residual network | |
CN111553856A (en) | Image defogging method based on depth estimation assistance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||