CN107403415B - Compressed depth map quality enhancement method and device based on full convolution neural network - Google Patents


Info

Publication number
CN107403415B
CN107403415B
Authority
CN
China
Prior art keywords
neural network
depth map
full convolution neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710602293.4A
Other languages
Chinese (zh)
Other versions
CN107403415A (en)
Inventor
金枝
周长源
邹文斌
李霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201710602293.4A priority Critical patent/CN107403415B/en
Publication of CN107403415A publication Critical patent/CN107403415A/en
Application granted granted Critical
Publication of CN107403415B publication Critical patent/CN107403415B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Generation (AREA)

Abstract

The invention is applicable to the technical field of image processing and provides a method and a device for enhancing the quality of a compressed depth map based on a full convolution neural network, wherein the full convolution neural network comprises a plurality of cascaded FCN units. The method comprises the following steps: training network parameters in a preset full convolution neural network with a large number of compressed texture maps and a preset first loss function to obtain an optimized full convolution neural network; training the network parameters in the optimized full convolution neural network with a small number of compressed depth maps and a preset second loss function to obtain a target full convolution neural network; and processing the compressed depth map to be enhanced sequentially through the cascaded FCN units in the target full convolution neural network to obtain a quality-enhanced depth map. The method first trains the network to determine the optimal parameters and then processes the depth map to be enhanced with the cascaded FCN units, thereby markedly improving the quality of the compressed depth map.

Description

Compressed depth map quality enhancement method and device based on full convolution neural network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method and a device for enhancing the quality of a compressed depth map based on a cascaded full convolution neural network.
Background
Transmitting a depth map together with its corresponding texture map enables a variety of 3D applications at the receiving end. Since each pixel of a depth map carries the geometric information of the corresponding 3D scene, compressing the depth map for transmission introduces compression distortion that causes severe geometric distortion and degraded visual perception. Recently, many novel approaches have focused on removing the noise introduced when acquiring and generating depth maps and on super-resolution enhancement of low-resolution depth maps. These methods can be broadly classified into three categories in common use: filter-based, model-based, and learning-based.
Among filter-based methods, a typical representative is Joint Bilateral Upsampling (JBU), in which the bilateral weights are determined from the corresponding texture map. More complex filters have been proposed on this basis, such as the Adaptive Bilateral Filter (ABF), which has adaptive filter parameters while preserving edges, and the Joint Trilateral Filter (JTF), which considers not only the spatial correlation between the depth map and the texture map but also their luminance similarity. Exploiting the structural similarity between the texture map and the depth map, filter-based approaches transfer salient structures from the texture map to the enhanced depth map. In model-based methods, model dependencies between texture maps and depth maps play an important role; examples include the Markov Random Field (MRF) model, the Non-Local Means (NLM) model, the Total Generalized Variation (TGV) model, and adaptive autoregressive models.
Among learning-based methods, the Super-Resolution Convolutional Neural Network (SRCNN) model is typical; it can reduce compression distortion in texture maps and achieve super-resolution enhancement of images. There is also the Denoise and Enhance Convolutional Neural Network (DE-CNN) model, whose main principle is to remove depth map noise with the aid of a texture map.
However, filter-based and model-based approaches typically need to be designed for a specific image type; they are highly targeted but generalize poorly and are limited by their complex dependencies. Learning-based methods help to extract and map hidden information in compressed depth maps, but most learning-based depth map enhancement methods not only require the assistance of texture maps, they also require the texture map and depth map to be aligned point-to-point, a requirement that is not always met.
In addition, when enhancing depth map quality with a learning-based method, the network must first be trained before the depth map can be enhanced. Generally speaking, the deeper the network structure, i.e., the more convolutional layers it contains, the higher the quality of the reconstructed depth map; at the same time, however, the demand for training samples, i.e., depth map samples, also grows, so the network easily saturates or fails to converge, and the quality of the reconstructed depth map cannot be improved.
Lossy compression during transmission inevitably reduces image quality. For depth maps, which carry the geometric information of a 3D scene, heavy lossy compression produces obvious geometric distortion and visual degradation in the discontinuous regions of the depth map; this affects not only the quality of the depth map itself but also the quality of synthesized views in binocular stereo applications. Therefore, after the receiving end receives the compressed depth map, an enhancement method is needed to improve the depth map quality at the receiving end.
Disclosure of Invention
The invention provides a method and a device for enhancing the quality of a compressed depth map based on a full convolution neural network, and aims to enhance the quality of a depth map that has been compressed in a lossy manner.
The invention provides a method for enhancing the quality of a compressed depth map based on a full convolution neural network, wherein the full convolution neural network comprises a plurality of cascaded Full Convolution Network (FCN) units, each FCN unit comprises 4 convolution layers which are sequentially connected, and the method comprises the following steps:
training network parameters in a preset full convolution neural network by using a large number of compressed texture maps and a preset first loss function to obtain optimized network parameters;
training the optimized network parameters by using a small number of compressed depth maps and a preset second loss function to obtain a target full convolution neural network;
and processing the compressed depth map to be enhanced sequentially through the cascading FCN unit in the target full-convolution neural network to obtain the depth map with enhanced quality.
Further, the processing the compressed depth map to be enhanced sequentially through the cascaded FCN units in the target fully convolutional neural network includes:
in each FCN unit, performing feature extraction on the compressed depth map to be enhanced through a 1 st convolutional layer to obtain 64 first feature maps;
performing feature fusion on the 64 first feature maps through a 2 nd convolutional layer to obtain 32 second feature maps;
carrying out nonlinear mapping on the 32 second feature maps through a 3 rd convolutional layer to obtain 16 third feature maps;
and reconstructing the 16 third feature maps through the 4 th convolution layer to obtain a depth map with enhanced quality.
Further, the 1 st convolutional layer is composed of 64 convolution kernels of 9 × 9 size, the 2 nd convolutional layer is composed of 32 convolution kernels of 7 × 7 size, the 3 rd convolutional layer is composed of 16 convolution kernels of 1 × 1 size, and the 4 th convolutional layer is composed of 1 convolution kernel of 5 × 5 size.
Further, the training of the network parameters in the preset full convolution neural network by using a large number of compressed texture maps and a preset first loss function to obtain optimized network parameters includes:
training network parameters in a preset full convolution neural network by using a large number of compressed texture maps, and adjusting the network parameters in the full convolution neural network by using a texture map reconstructed by the network and a preset first loss function until the full convolution neural network reaches stable convergence to obtain optimized network parameters;
the training of the optimized network parameters by using a small amount of compressed depth maps and a preset second loss function to obtain the target full-convolution neural network comprises the following steps:
and training the optimized network parameters by using a small amount of compressed depth maps, and adjusting the network parameters in the full convolution neural network by using the depth maps reconstructed by the network and a preset second loss function until the full convolution neural network reaches stable convergence to obtain a target full convolution neural network.
Further, the first loss function is:

Loss_1(Θ) = ‖F(I_T; Θ) − Ĩ_T‖²

where Loss_1 represents the point-to-point Euclidean distance between the reconstructed texture map and the texture map pixels before compression, Θ represents all parameters in the full convolution neural network, F(I_T; Θ) represents the quality-enhanced texture map reconstructed by the full convolution neural network, I_T represents the compressed texture map, and Ĩ_T represents the texture map before compression;

the second loss function is:

Loss_2(Θ) = ‖I_M ⊙ (F(I_D; Θ) − Ĩ_D)‖²

where Loss_2 represents the point-to-point Euclidean distance between the reconstructed depth map and the depth map pixels before compression, Θ represents all parameters in the full convolution neural network, I_M represents an edge profile map extracted from the depth map before compression and acting as a pixel-wise weight, ⊙ denotes element-wise multiplication, F(I_D; Θ) represents the quality-enhanced depth map reconstructed by the full convolution neural network, I_D represents the compressed depth map, and Ĩ_D represents the depth map before compression.
The invention also provides a device for enhancing the quality of a compressed depth map based on a full convolution neural network, wherein the full convolution neural network comprises a plurality of cascaded Full Convolution Network (FCN) units, each FCN unit comprises 4 convolution layers which are sequentially connected, and the device comprises:
the first training module is used for training the network parameters in the preset full convolution neural network by utilizing a large number of compressed texture maps and a preset first loss function to obtain optimized network parameters;
the second training module is used for training the optimized network parameters by using a small amount of compressed depth maps and a preset second loss function to obtain a target full-convolution neural network;
and the quality enhancement module is used for processing the compressed depth map to be enhanced sequentially through the cascading FCN unit in the target full convolution neural network to obtain the depth map with enhanced quality.
Further, each of the FCN units includes: the device comprises a feature extraction submodule, a fusion submodule, a nonlinear mapping submodule and a reconstruction submodule;
the feature extraction submodule is used for performing feature extraction on the compressed depth map to be enhanced through the 1 st convolutional layer to obtain 64 first feature maps;
the fusion submodule is used for performing feature fusion on the 64 first feature maps through the 2 nd convolutional layer to obtain 32 second feature maps;
the nonlinear mapping submodule is used for carrying out nonlinear mapping on the 32 second feature maps through a 3 rd convolution layer to obtain 16 third feature maps;
and the reconstruction submodule is used for reconstructing the 16 third feature maps through the 4 th convolution layer to obtain the depth map with enhanced quality.
Further, the 1 st convolutional layer is composed of 64 convolution kernels of 9 × 9 size, the 2 nd convolutional layer is composed of 32 convolution kernels of 7 × 7 size, the 3 rd convolutional layer is composed of 16 convolution kernels of 1 × 1 size, and the 4 th convolutional layer is composed of 1 convolution kernel of 5 × 5 size.
Further, the first training module is specifically configured to train network parameters in a preset full convolution neural network by using a large number of compressed texture maps, and adjust the network parameters in the full convolution neural network by using a texture map reconstructed by the network and a preset first loss function until the full convolution neural network reaches stable convergence, so as to obtain optimized network parameters;
the second training module is specifically configured to train the optimized network parameters by using a small amount of compressed depth maps, and adjust the network parameters in the full convolutional neural network by using the depth maps reconstructed by the network and a preset second loss function until the full convolutional neural network reaches stable convergence, so as to obtain a target full convolutional neural network.
Further, the first loss function is:

Loss_1(Θ) = ‖F(I_T; Θ) − Ĩ_T‖²

where Loss_1 represents the point-to-point Euclidean distance between the reconstructed texture map and the texture map pixels before compression, Θ represents all parameters in the full convolution neural network, F(I_T; Θ) represents the quality-enhanced texture map reconstructed by the full convolution neural network, I_T represents the compressed texture map, and Ĩ_T represents the texture map before compression;

the second loss function is:

Loss_2(Θ) = ‖I_M ⊙ (F(I_D; Θ) − Ĩ_D)‖²

where Loss_2 represents the point-to-point Euclidean distance between the reconstructed depth map and the depth map pixels before compression, Θ represents all parameters in the full convolution neural network, I_M represents an edge profile map extracted from the depth map before compression and acting as a pixel-wise weight, ⊙ denotes element-wise multiplication, F(I_D; Θ) represents the quality-enhanced depth map reconstructed by the full convolution neural network, I_D represents the compressed depth map, and Ĩ_D represents the depth map before compression.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a method and a device for enhancing the quality of a compressed depth map based on a full convolution neural network. The method comprises: training network parameters in a preset full convolution neural network with a large number of compressed texture maps and a preset first loss function to obtain optimized network parameters; training the optimized network parameters with a small number of compressed depth maps and a preset second loss function to obtain a target full convolution neural network; and processing the compressed depth map to be enhanced sequentially through the cascaded FCN units in the target full convolution neural network to obtain a quality-enhanced depth map. Because the network parameters of the cascaded full convolution neural network are first trained with a large number of compressed texture maps and the first loss function, and the optimized parameters are then trained a second time with a small number of compressed depth maps and the second loss function, the network can be trained well even when only a few depth map training samples are available, converging without saturating, which preserves its generalization capability. Compared with a non-cascaded network of the same depth, the cascaded network is also easier to train. After the compressed depth map to be enhanced is processed by a full convolution neural network trained in this way, its quality is markedly improved.
Drawings
Fig. 1 is a schematic flowchart of a method for enhancing quality of a compressed depth map based on a full convolution neural network according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a full convolution neural network provided in an embodiment of the present invention;
fig. 3 is a block diagram of a compressed depth map quality enhancement apparatus based on a full convolution neural network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the prior art, the image quality of a depth map is degraded after the depth map is compressed in a lossy manner.
To solve this technical problem, the invention provides a compressed depth map quality enhancement method and device based on a full convolution neural network. The method comprises two processes, a training process and an enhancement process. The training process is as follows: first, the full convolution neural network is trained with a large number of compressed texture maps to obtain optimized network parameters; then the optimized network parameters are trained a second time with a small number of compressed depth maps to obtain the target full convolution neural network. The enhancement process is as follows: the compressed depth map to be enhanced is processed sequentially by the cascaded Fully Convolutional Network (FCN) units in the target full convolution neural network to obtain the quality-enhanced depth map.
As shown in fig. 1, an embodiment of the present invention provides a method for enhancing the quality of a compressed depth map based on a full convolution neural network, where the full convolution neural network includes a plurality of cascaded FCN units and each FCN unit includes 4 convolutional layers connected in sequence; the specific structure is shown in fig. 2. The method includes:
step S101, training network parameters in a preset full convolution neural network by using a large number of compressed texture maps and a preset first loss function to obtain optimized network parameters;
specifically, the technical scheme provided by the embodiment of the invention comprises two processes: a training process and an enhancement process.
Regarding the training process, two problems are typically encountered when training a network for such a task. One is data starvation, because the number of depth maps available for training is very limited; the other is that the network may fail to converge. It has been shown that, in low-level vision tasks, even when a sufficient number of training images are provided, a network with more than 4 layers is still prone to non-convergence. In view of these problems, the embodiment of the present invention divides training into two processes based on the provided full convolution neural network. The first training process is: training the network parameters in the preset full convolution neural network with a large number of compressed texture maps, combined with a preset first loss function, to obtain optimized network parameters.
Wherein the first loss function is:

Loss_1(Θ) = ‖F(I_T; Θ) − Ĩ_T‖²

where Loss_1 represents the point-to-point Euclidean distance between the reconstructed texture map and the texture map pixels before compression, Θ represents all parameters in the full convolution neural network, F(I_T; Θ) represents the quality-enhanced texture map reconstructed by the full convolution neural network, I_T represents the compressed texture map, and Ĩ_T represents the texture map before compression.
Specifically, in the first training process, a large number of compressed texture maps are input into the full convolution neural network, and the network parameters are continuously adjusted using the texture maps generated by the network together with the value fed back by the first loss function, until the network converges, that is, until the value of the first loss function has decreased sufficiently and no longer decreases; the first training process then ends. The network parameters obtained at this point are taken as the parameters of the full convolution neural network trained on texture maps.
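As an illustration of this first training stage, the following is a minimal PyTorch sketch of a single parameter update (the framework, the function name, and the use of mean squared error as the per-batch form of the Euclidean distance are assumptions, not taken from the patent):

```python
import torch.nn.functional as F

def pretrain_step(model, optimizer, compressed_texture, original_texture):
    """One stage-1 update on a batch of compressed texture patches.

    Loss_1 is the point-to-point squared Euclidean distance between the
    network reconstruction F(I_T; Θ) and the texture patches before
    compression; mse_loss computes the same quantity up to a constant
    scale (the mean instead of the sum).
    """
    optimizer.zero_grad()
    reconstructed = model(compressed_texture)        # F(I_T; Θ)
    loss = F.mse_loss(reconstructed, original_texture)
    loss.backward()                                  # back-propagate Loss_1
    optimizer.step()                                 # adjust network parameters
    return loss.item()
```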
Step S102, training the optimized network parameters by using a small amount of compressed depth maps and a preset second loss function to obtain a target full-convolution neural network;
specifically, after the first training process is finished, initializing network parameters in the full convolution neural network trained by the depth map in a transfer learning mode, namely assigning the network parameters obtained in the first training process to the network parameters in the full convolution neural network trained by the depth map to serve as initial values of the network parameters; then a second training process is performed: training the network parameters in the initialized full convolution neural network by utilizing a small number of compressed depth maps and combining a preset second loss function to obtain a target full convolution neural network; specifically, a small amount of compressed depth maps are used for training network parameters in the initialized full convolution neural network, and the depth maps output by training and a preset second loss function are used for adjusting the network parameters in the full convolution neural network until the full convolution neural network converges, so that the target full convolution neural network is obtained.
Wherein the second loss function is:

Loss_2(Θ) = ‖I_M ⊙ (F(I_D; Θ) − Ĩ_D)‖²

where Loss_2 represents the point-to-point Euclidean distance between the reconstructed depth map and the depth map pixels before compression, Θ represents all parameters in the full convolution neural network, I_M represents an edge profile map extracted from the depth map before compression and acting as a pixel-wise weight, ⊙ denotes element-wise multiplication, F(I_D; Θ) represents the quality-enhanced depth map reconstructed by the full convolution neural network, I_D represents the compressed depth map, and Ĩ_D represents the depth map before compression.
For example, suppose the initial value of the network parameters of the texture-map-trained full convolution neural network is A0; after the first training process, the value becomes A1. The initial value of the depth map network is B0; setting B0 = A1 is equivalent to initializing the full convolution neural network for depth maps. Finally, after the second training process, the network parameters of the depth map network become B1, and B1 is the set of parameters ultimately used in the enhancement process.
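In code, this initialization amounts to copying the stage-1 parameters into the depth map network before fine-tuning; a minimal sketch, assuming PyTorch and two structurally identical networks (the function and variable names are illustrative):

```python
def transfer_parameters(texture_net, depth_net):
    """Initialize the depth map network with the parameters learned on
    texture maps, i.e. set B0 = A1 before the second training process."""
    depth_net.load_state_dict(texture_net.state_dict())
```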
In the training process provided by the embodiment of the invention, because the training benefits from the cascade structure of the full convolution neural network and from the transfer learning with texture maps, the full convolution neural network converges quickly even with a limited amount of training data, compared with a non-cascaded network of the same depth. In addition, introducing the edge profile map into the second loss function makes it possible to remove the compression distortion of the depth map to be enhanced without blurring its edges.
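The second loss can be sketched as follows (a hedged PyTorch illustration: the patent defines the symbols of Loss_2 but reproduces the formula itself only as an image, so treating I_M as a pixel-wise weight on the squared error, and the suggested construction of that weight, are assumptions):

```python
import torch

def loss2(reconstructed: torch.Tensor, original: torch.Tensor,
          edge_weight_map: torch.Tensor) -> torch.Tensor:
    """Edge-weighted squared Euclidean distance between the reconstructed
    depth map F(I_D; Θ) and the depth map before compression.

    edge_weight_map plays the role of I_M, the edge profile map extracted
    from the pre-compression depth map; it could, for instance, be built
    as 1 + alpha * binary_edge_mask so that every pixel contributes while
    pixels at depth discontinuities are emphasized (this construction is
    an assumption).
    """
    return (edge_weight_map * (reconstructed - original) ** 2).sum()
```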
Furthermore, training the full convolution neural network with small training patches (24 × 24) instead of entire texture or depth maps has at least three benefits: first, it shortens the convolution and back-propagation time in the full convolution neural network; second, it generates more training samples, which leads the network to better results; third, the network can 'see' more boundary information, which helps eliminate boundary distortion. Therefore, in actual training, the texture maps or depth maps are divided into 24 × 24 patches and fed into the full convolution neural network for training, as in the sketch below.
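A minimal sketch of the patch extraction (the stride, set equal to the patch size here, is an assumption; the patent states only that 24 × 24 patches are used):

```python
def extract_patches(image, size=24, stride=24):
    """Cut an image (a ... x H x W tensor or array) into size x size
    training patches, scanning top-to-bottom and left-to-right."""
    h, w = image.shape[-2:]
    return [image[..., r:r + size, c:c + size]
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]
```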
And step S103, sequentially processing the compressed depth map to be enhanced by the cascading FCN unit in the target full convolution neural network to obtain the depth map with enhanced quality.
Specifically, each FCN unit includes 4 convolutional layers connected in sequence, and the enhancement process in each FCN unit is as follows:
performing feature extraction on the compressed depth map to be enhanced through a 1 st convolution layer to obtain 64 first feature maps;
performing feature fusion on the 64 first feature maps through a 2 nd convolutional layer to obtain 32 second feature maps;
carrying out nonlinear mapping on the 32 second feature maps through a 3 rd convolutional layer to obtain 16 third feature maps;
and reconstructing the 16 third feature maps through the 4 th convolution layer to obtain a depth map with enhanced quality.
Wherein the 1 st convolutional layer consists of 64 convolution kernels of size 9 × 9, the 2 nd convolutional layer consists of 32 convolution kernels of size 7 × 7, the 3 rd convolutional layer consists of 16 convolution kernels of size 1 × 1, and the 4 th convolutional layer consists of 1 convolution kernel of size 5 × 5.
Specifically, as the compressed depth map to be enhanced passes sequentially through the cascaded FCN units, its quality is progressively enhanced; the final quality-enhanced depth map is output after processing by the last FCN unit.
Specifically, in each FCN unit provided by the embodiment of the present invention, each of the first three convolutional layers is followed by a Parametric Rectified Linear Unit (PReLU), corresponding to the activation layers shown in fig. 2, which introduces nonlinearity into the full convolution neural network and strengthens the nonlinear relationship between the convolutional layers.
Specifically, the full convolution neural network provided by the embodiment of the present invention includes 2 FCN units, i.e., 8 convolutional layers in total. Experiments show that the network performs best when two FCN units are cascaded; increasing the network depth further improves the quality of the restored image only slightly. Considering computational efficiency and storage cost, 2 FCN units were finally selected.
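To make the structure concrete, the following is a minimal PyTorch sketch of the cascaded network (the framework, the class names, and the zero padding used to keep the spatial size constant across the cascade are assumptions; the kernel counts, kernel sizes, PReLU activations, and the cascade of two units are as described above):

```python
import torch
import torch.nn as nn

class FCNUnit(nn.Module):
    """One FCN unit: 4 convolutional layers connected in sequence, with a
    PReLU after each of the first three layers."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4),   # feature extraction: 64 first feature maps
            nn.PReLU(),
            nn.Conv2d(64, 32, kernel_size=7, padding=3),  # feature fusion: 32 second feature maps
            nn.PReLU(),
            nn.Conv2d(32, 16, kernel_size=1),             # nonlinear mapping: 16 third feature maps
            nn.PReLU(),
            nn.Conv2d(16, 1, kernel_size=5, padding=2),   # reconstruction: one enhanced map
        )

    def forward(self, x):
        return self.layers(x)

class CascadedFCN(nn.Module):
    """Two cascaded FCN units, 8 convolutional layers in total; each unit
    further enhances the output of the previous one."""
    def __init__(self, num_units=2):
        super().__init__()
        self.units = nn.Sequential(*[FCNUnit() for _ in range(num_units)])

    def forward(self, x):
        return self.units(x)

# Usage: enhance a single-channel compressed depth map patch.
model = CascadedFCN()
enhanced = model(torch.rand(1, 1, 24, 24))
```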
The embodiment of the invention provides a method for enhancing the quality of a compressed depth map based on a full convolution neural network. The designed full convolution neural network adopts cascaded FCN units, each consisting of 4 sequentially connected convolutional layers. During training and quality enhancement, the compressed depth map is the only input; no pre-processing or post-processing is involved, all computation is encapsulated in the end-to-end network, and the cascaded design greatly simplifies training compared with a non-cascaded network of equivalent depth. During training, a first training pass uses a large number of texture maps, after which a small number of depth maps are used to train the network a second time to obtain the optimal network parameters; in this way the network can be trained well even when only a few depth map training samples are available, converging without saturating, which preserves its generalization capability. In addition, introducing the edge profile map into the second loss function effectively reduces boundary distortion and markedly improves depth map quality. During quality enhancement, the cascaded FCN units enhance the depth map, and objective and visual results show that, compared with current state-of-the-art methods, the method provided by the embodiment of the invention achieves superior performance in improving depth map quality. The method can be applied to quality enhancement of 3D video; in the future it can also improve the precision of depth information and be used in fields such as human-computer interaction and 3D reconstruction.
As shown in fig. 3, an embodiment of the present invention provides a compressed depth map quality enhancement apparatus based on a full convolution neural network, where the full convolution neural network includes a plurality of cascaded FCN units and each FCN unit includes 4 convolutional layers connected in sequence; the specific structure is shown in fig. 2. The apparatus includes:
a first training module 201, configured to train network parameters in a preset full convolution neural network by using a large amount of compressed texture maps and a preset first loss function, so as to obtain optimized network parameters;
specifically, the first loss function is:
Figure BDA0001357450260000111
therein, Loss1Representing the Euclidean distance between the reconstructed texture map and the texel point-to-point before compression, theta represents all the parameters in the full convolution neural network, F (I)T(ii) a Θ) represents the quality enhanced texture map after reconstruction of the full convolution neural network, ITA compressed texture map is represented that is,
Figure BDA0001357450260000112
showing the texture map before compression.
Specifically, in the first training module 201, a large number of compressed texture maps are input into the full convolution neural network, and the network parameters are continuously adjusted using the texture maps generated by the network together with the value fed back by the first loss function, until the network converges, that is, until the value of the first loss function has decreased sufficiently and no longer decreases; the first training process then ends. The network parameters obtained at this point are taken as the parameters of the full convolution neural network trained on texture maps.
The second training module 202 is configured to train the optimized network parameters by using a small amount of compressed depth maps and a preset second loss function, so as to obtain a target full-convolution neural network;
specifically, after the first training process is finished, initializing network parameters in the full convolution neural network trained by the depth map in a transfer learning mode, namely assigning the network parameters obtained in the first training process to the network parameters in the full convolution neural network trained by the depth map to serve as initial values of the network parameters; in a second training module, training the initialized network parameters in the fully convolutional neural network by using a small number of compressed depth maps and combining a preset second loss function to obtain a target fully convolutional neural network; specifically, a small amount of compressed depth maps are used for training network parameters in the initialized full convolution neural network, and the depth maps output by training and a preset second loss function are used for adjusting the network parameters in the full convolution neural network until the full convolution neural network converges, so that the target full convolution neural network is obtained.
Wherein the second loss function is:

Loss_2(Θ) = ‖I_M ⊙ (F(I_D; Θ) − Ĩ_D)‖²

where Loss_2 represents the point-to-point Euclidean distance between the reconstructed depth map and the depth map pixels before compression, Θ represents all parameters in the full convolution neural network, I_M represents an edge profile map extracted from the depth map before compression and acting as a pixel-wise weight, ⊙ denotes element-wise multiplication, F(I_D; Θ) represents the quality-enhanced depth map reconstructed by the full convolution neural network, I_D represents the compressed depth map, and Ĩ_D represents the depth map before compression.
The first training module 201 and the second training module 202 provided in the embodiment of the present invention benefit from the cascade structure of the full convolution neural network and from the transfer learning with texture maps, so the full convolution neural network converges quickly even with a limited amount of training data, compared with a non-cascaded network of the same depth. In addition, introducing the edge profile map into the second loss function makes it possible to remove the compression distortion of the depth map to be enhanced without blurring its edges.
Furthermore, training the full convolution neural network with small training patches (24 × 24) instead of entire texture or depth maps has at least three benefits: first, it shortens the convolution and back-propagation time in the full convolution neural network; second, it generates more training samples, which leads the network to better results; third, the network can 'see' more boundary information, which helps eliminate boundary distortion. Therefore, in actual training, the texture maps or depth maps are divided into 24 × 24 patches and fed into the full convolution neural network for training.
And the quality enhancement module 203 is configured to sequentially process the compressed depth map to be enhanced by the cascaded FCN unit in the target full convolutional neural network to obtain a depth map with enhanced quality.
Specifically, each of the FCN units includes: the device comprises a feature extraction submodule, a fusion submodule, a nonlinear mapping submodule and a reconstruction submodule;
the feature extraction submodule is used for performing feature extraction on the compressed depth map to be enhanced through a 1 st convolution layer to obtain 64 first feature maps;
the fusion submodule is used for performing feature fusion on the 64 first feature maps through the 2 nd convolutional layer to obtain 32 second feature maps;
the nonlinear mapping submodule is used for carrying out nonlinear mapping on the 32 second feature maps through a 3 rd convolution layer to obtain 16 third feature maps;
and the reconstruction submodule is used for reconstructing the 16 third feature maps through the 4 th convolution layer to obtain the depth map with enhanced quality.
Wherein the 1 st convolutional layer consists of 64 convolution kernels of size 9 × 9, the 2 nd convolutional layer consists of 32 convolution kernels of size 7 × 7, the 3 rd convolutional layer consists of 16 convolution kernels of size 1 × 1, and the 4 th convolutional layer consists of 1 convolution kernel of size 5 × 5.
Specifically, as the compressed depth map to be enhanced passes sequentially through the cascaded FCN units, its quality is progressively enhanced; the final quality-enhanced depth map is output after processing by the last FCN unit.
Specifically, in each FCN unit provided by the embodiment of the present invention, each of the first three convolutional layers is followed by a Parametric Rectified Linear Unit (PReLU), corresponding to the activation layers shown in fig. 2, which introduces nonlinearity into the full convolution neural network and strengthens the nonlinear relationship between the convolutional layers.
Specifically, the full convolution neural network provided by the embodiment of the present invention includes 2 FCN units, i.e., 8 convolutional layers in total. Experiments show that the network performs best when two FCN units are cascaded; increasing the network depth further improves the quality of the restored image only slightly. Considering computational efficiency and storage cost, 2 FCN units were finally selected.
The embodiment of the invention provides a compressed depth map quality enhancement apparatus based on a full convolution neural network. The designed full convolution neural network adopts cascaded FCN units, each consisting of 4 sequentially connected convolutional layers. During training and quality enhancement, the compressed depth map is the only input; no pre-processing or post-processing is involved, all computation is encapsulated in the end-to-end network, and the cascaded design greatly simplifies training compared with a non-cascaded network of equivalent depth. During training, a first training pass uses a large number of texture maps, after which a small number of depth maps are used to train the network a second time to obtain the optimal network parameters; in this way the network can be trained well even when only a few depth map training samples are available, converging without saturating, which preserves its generalization capability. In addition, introducing the edge profile map into the second loss function effectively reduces boundary distortion and markedly improves depth map quality. During quality enhancement, the cascaded FCN units enhance the depth map, and objective and visual results show that, compared with current state-of-the-art methods, the apparatus provided by the embodiment of the invention achieves superior performance in improving depth map quality. The apparatus can be applied to quality enhancement of 3D video; in the future it can also improve the precision of depth information and be used in fields such as human-computer interaction and 3D reconstruction.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A compressed depth map quality enhancement method based on a full convolution neural network, characterized in that the full convolution neural network comprises two cascaded FCN units, each FCN unit comprises 4 convolutional layers connected in sequence, and each of the first three convolutional layers of each FCN unit is followed by a parametric rectified linear unit, the method comprising the following steps:
training network parameters in a preset full convolution neural network by using a first quantity of compressed texture maps, and adjusting the network parameters in the full convolution neural network by using the texture maps reconstructed by the network and a preset first loss function until the full convolution neural network reaches stable convergence to obtain optimized network parameters;
wherein the first loss function is:

Loss_1(Θ) = ‖F(I_T; Θ) − Ĩ_T‖²

where Loss_1 represents the point-to-point Euclidean distance between the reconstructed texture map and the texture map pixels before compression, Θ represents all parameters in the full convolution neural network, F(I_T; Θ) represents the quality-enhanced texture map reconstructed by the full convolution neural network, I_T represents the compressed texture map, and Ĩ_T represents the texture map before compression;
training the optimized network parameters by using a second number of compressed depth maps and a preset second loss function to obtain a target full convolution neural network, specifically, training the optimized network parameters by using a second number of compressed depth maps, and adjusting the network parameters in the full convolution neural network by using the depth maps reconstructed by the network and the preset second loss function until the full convolution neural network reaches stable convergence to obtain the target full convolution neural network; wherein the first number is greater than the second number;
wherein the second loss function is:

Loss_2(Θ) = ‖I_M ⊙ (F(I_D; Θ) − Ĩ_D)‖²

where Loss_2 represents the point-to-point Euclidean distance between the reconstructed depth map and the depth map pixels before compression, Θ represents all parameters in the full convolution neural network, I_M represents an edge profile map extracted from the depth map before compression and acting as a pixel-wise weight, ⊙ denotes element-wise multiplication, F(I_D; Θ) represents the quality-enhanced depth map reconstructed by the full convolution neural network, I_D represents the compressed depth map, and Ĩ_D represents the depth map before compression;
and processing the compressed depth map to be enhanced sequentially through two cascaded FCN units in the target full-convolution neural network to obtain the depth map with enhanced quality.
2. The method for enhancing the quality of the compressed depth map according to claim 1, wherein the processing the compressed depth map to be enhanced sequentially through two cascaded FCN units in the target full convolutional neural network comprises:
in each FCN unit, performing feature extraction on the compressed depth map to be enhanced through a 1 st convolutional layer to obtain 64 first feature maps;
performing feature fusion on the 64 first feature maps through a 2 nd convolutional layer to obtain 32 second feature maps;
carrying out nonlinear mapping on the 32 second feature maps through a 3 rd convolutional layer to obtain 16 third feature maps;
and reconstructing the 16 third feature maps through the 4 th convolution layer to obtain a depth map with enhanced quality.
3. The method of claim 2, wherein said 1 st convolutional layer consists of 64 convolution kernels of size 9 x 9, said 2 nd convolutional layer consists of 32 convolution kernels of size 7 x 7, said 3 rd convolutional layer consists of 16 convolution kernels of size 1 x 1, and said 4 th convolutional layer consists of 1 convolution kernel of size 5 x 5.
4. A compressed depth map quality enhancement device based on a full convolution neural network, characterized in that the full convolution neural network comprises two cascaded FCN units, each FCN unit comprises 4 convolutional layers connected in sequence, and each of the first three convolutional layers of each FCN unit is followed by a parametric rectified linear unit, the device comprising:
a first training module, configured to train network parameters in a preset full convolution neural network using a first number of compressed texture maps and a preset first loss function to obtain optimized network parameters, specifically,
training network parameters in a preset full convolution neural network by using a first quantity of compressed texture maps, and adjusting the network parameters in the full convolution neural network by using the texture maps reconstructed by the network and a preset first loss function until the full convolution neural network reaches stable convergence to obtain optimized network parameters;
wherein the first loss function is:

Loss_1(Θ) = ‖F(I_T; Θ) − Ĩ_T‖²

where Loss_1 represents the point-to-point Euclidean distance between the reconstructed texture map and the texture map pixels before compression, Θ represents all parameters in the full convolution neural network, F(I_T; Θ) represents the quality-enhanced texture map reconstructed by the full convolution neural network, I_T represents the compressed texture map, and Ĩ_T represents the texture map before compression;
a second training module, configured to train the optimized network parameters by using a second number of compressed depth maps and a preset second loss function to obtain a target full convolution neural network, specifically,
training the optimized network parameters by using a second number of compressed depth maps, and adjusting the network parameters in the full convolution neural network by using the depth maps reconstructed by the network and a preset second loss function until the full convolution neural network reaches stable convergence to obtain a target full convolution neural network; wherein the first number is greater than the second number;
wherein the second loss function is:

Loss_2(Θ) = ‖I_M ⊙ (F(I_D; Θ) − Ĩ_D)‖²

where Loss_2 represents the point-to-point Euclidean distance between the reconstructed depth map and the depth map pixels before compression, Θ represents all parameters in the full convolution neural network, I_M represents an edge profile map extracted from the depth map before compression and acting as a pixel-wise weight, ⊙ denotes element-wise multiplication, F(I_D; Θ) represents the quality-enhanced depth map reconstructed by the full convolution neural network, I_D represents the compressed depth map, and Ĩ_D represents the depth map before compression;
and the quality enhancement module is used for processing and correcting the compressed depth map to be enhanced sequentially through the two cascaded FCN units in the target full convolutional neural network to obtain the depth map with enhanced quality.
5. The compressed depth map quality enhancement apparatus of claim 4, wherein each of the FCN units comprises: the device comprises a feature extraction submodule, a fusion submodule, a nonlinear mapping submodule and a reconstruction submodule;
the feature extraction submodule is used for performing feature extraction on the compressed depth map to be enhanced through the 1 st convolutional layer to obtain 64 first feature maps;
the fusion submodule is used for performing feature fusion on the 64 first feature maps through the 2 nd convolutional layer to obtain 32 second feature maps;
the nonlinear mapping submodule is used for carrying out nonlinear mapping on the 32 second feature maps through a 3 rd convolution layer to obtain 16 third feature maps;
and the reconstruction submodule is used for reconstructing the 16 third feature maps through the 4 th convolution layer to obtain the depth map with enhanced quality.
6. The compressed depth map quality enhancement device of claim 5, wherein the 1 st convolutional layer is comprised of 64 convolution kernels of size 9 x 9, the 2 nd convolutional layer is comprised of 32 convolution kernels of size 7 x 7, the 3 rd convolutional layer is comprised of 16 convolution kernels of size 1 x 1, and the 4 th convolutional layer is comprised of 1 convolution kernel of size 5 x 5.
CN201710602293.4A 2017-07-21 2017-07-21 Compressed depth map quality enhancement method and device based on full convolution neural network Active CN107403415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710602293.4A CN107403415B (en) 2017-07-21 2017-07-21 Compressed depth map quality enhancement method and device based on full convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710602293.4A CN107403415B (en) 2017-07-21 2017-07-21 Compressed depth map quality enhancement method and device based on full convolution neural network

Publications (2)

Publication Number Publication Date
CN107403415A CN107403415A (en) 2017-11-28
CN107403415B (en) 2021-04-09

Family

ID=60401268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710602293.4A Active CN107403415B (en) 2017-07-21 2017-07-21 Compressed depth map quality enhancement method and device based on full convolution neural network

Country Status (1)

Country Link
CN (1) CN107403415B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903350B (en) * 2017-12-07 2021-08-06 上海寒武纪信息科技有限公司 Image compression method and related device
WO2019109336A1 (en) 2017-12-08 2019-06-13 Baidu.Com Times Technology (Beijing) Co., Ltd. Stereo camera depth determination using hardware accelerator
CN108416748A (en) * 2018-02-26 2018-08-17 阿博茨德(北京)科技有限公司 The image pre-processing method and device of JPEG compression document
CN108520220B (en) * 2018-03-30 2021-07-09 百度在线网络技术(北京)有限公司 Model generation method and device
CN109102468B (en) * 2018-06-27 2021-06-01 广州视源电子科技股份有限公司 Image enhancement method and device, terminal equipment and storage medium
CN109003239B (en) * 2018-07-04 2022-03-29 华南理工大学 Multispectral image sharpening method based on transfer learning neural network
CN109255758B (en) * 2018-07-13 2021-09-21 杭州电子科技大学 Image enhancement method based on all 1 x 1 convolution neural network
CN109003272B (en) * 2018-07-26 2021-02-09 北京小米移动软件有限公司 Image processing method, device and system
CN110766152B (en) * 2018-07-27 2023-08-04 富士通株式会社 Method and apparatus for training deep neural networks
CN109410289B (en) * 2018-11-09 2021-11-12 中国科学院精密测量科学与技术创新研究院 Deep learning high undersampling hyperpolarized gas lung MRI reconstruction method
CN111382772B (en) * 2018-12-29 2024-01-26 Tcl科技集团股份有限公司 Image processing method and device and terminal equipment
CN109949224B (en) * 2019-02-26 2023-06-30 北京悦图遥感科技发展有限公司 Deep learning-based cascade super-resolution reconstruction method and device
CN109903219B (en) * 2019-02-28 2023-06-30 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN110135582B (en) * 2019-05-09 2022-09-27 北京市商汤科技开发有限公司 Neural network training method, neural network training device, image processing method, image processing device and storage medium
CN110399881B (en) * 2019-07-11 2021-06-01 深圳大学 End-to-end quality enhancement method and device based on binocular stereo image
CN111415311B (en) * 2020-03-27 2023-03-14 北京航空航天大学杭州创新研究院 Resource-saving image quality enhancement model
CN115278246B (en) * 2022-08-01 2024-04-16 天津大学 Depth map end-to-end intelligent compression coding method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9514522B2 (en) * 2012-08-24 2016-12-06 Microsoft Technology Licensing, Llc Depth data processing and compression
CN105825484B (en) * 2016-03-23 2018-06-22 华南理工大学 A kind of depth image denoising and Enhancement Method based on deep learning
CN106709875B (en) * 2016-12-30 2020-02-18 北京工业大学 Compressed low-resolution image restoration method based on joint depth network

Also Published As

Publication number Publication date
CN107403415A (en) 2017-11-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant