CN111107377A

CN111107377A - Depth image compression method, device, equipment and storage medium

Info

Publication number: CN111107377A
Application number: CN201811258164.9A
Authority: CN
Inventors: 胡强; 石志儒
Original assignee: Yaoke Intelligent Technology Shanghai Co ltd
Current assignee: Yaoke Intelligent Technology Shanghai Co ltd
Priority date: 2018-10-26
Filing date: 2018-10-26
Publication date: 2020-05-05

Abstract

The invention provides a depth image compression method, apparatus, device, and storage medium. A feature coefficient matrix is obtained by applying a forward transform to a depth image; the quantized feature coefficient matrix is entropy coded using a Gaussian probability model to obtain a corresponding feature coefficient code stream; the meta-information of the depth image is bypass entropy coded to obtain a corresponding meta-information code stream; and finally the feature coefficient code stream and the meta-information code stream are merged to form the compressed data of the depth image. The invention can solve the problem of synthesized-view distortion caused by depth image compression, and it outperforms the conventional coding standards JPEG and BPG in compression performance.

Description

Depth image compression method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of depth image processing, and more particularly to a depth image compression method, apparatus, device, and storage medium.

Background

Light field video is a novel digital medium. By providing video data from multiple viewpoints, it lets a user freely select a viewpoint and watch a three-dimensional scene from many angles. Its distinctive stereoscopic effect and inter-viewpoint interaction have led to wide application in fields such as three-dimensional television, free-viewpoint television, and light field monitoring. Light field video contains a large amount of data, which places heavy pressure on storage and transmission, so efficient compression has become a bottleneck restricting its development. Virtual view synthesis based on depth image rendering is a key technology in light field video applications: it synthesizes a view at an arbitrary viewpoint from the video data of reference viewpoints and the corresponding depth images, and the quality of the synthesized view depends largely on the quality of the depth images.

Over the past decades, a series of image coding standards have come into wide use, including JPEG and JPEG2000, established by the Joint Photographic Experts Group, and PNG, released by the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). However, these standards target conventional images rather than depth images. Unlike ordinary video images, depth images are grayscale images containing many uniform regions, more spatial redundancy, and sharp boundaries. Compressing depth images with conventional image coding methods can therefore produce severe distortion at the boundaries, which degrades the quality of the synthesized view.

Therefore, given the key role of depth images in image processing and machine vision tasks, a depth image compression method that reduces or eliminates this distortion is needed.

Disclosure of Invention

In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide a depth image compression method and apparatus, an electronic device and a storage medium thereof, which are used to solve the problem of serious distortion generated by compressing a depth image in the prior art.

To achieve the above and other related objects, the present invention provides a depth image compression method, including: carrying out forward transformation processing on the depth image based on a self-coding network to obtain a characteristic coefficient matrix representing information of the depth image; quantizing the characteristic coefficient matrix, and entropy coding the quantized characteristic coefficient matrix through a Gaussian probability model to obtain a corresponding characteristic coefficient code stream; performing bypass entropy coding on the meta-information of the depth image to obtain a corresponding meta-information code stream; and merging the characteristic coefficient code stream and the meta-information code stream to be used as compressed data of the depth image.

In an embodiment of the invention, the feature coefficient matrix may be inverse transformed by a self-coding network to reconstruct pixel values of the depth image.

In an embodiment of the present invention, the forward transform and the inverse transform are structurally symmetric, and each consists of 6 convolution layers and 5 normalization layers.

In an embodiment of the present invention, the quantization process is scalar quantization, including: rounding quantization processing is performed on the input value, and the integer closest to the input value is selected as the output value.

In an embodiment of the present invention, when the self-coding network is trained, the quantization process is approximated by adding random uniform noise, so that the encoding and decoding process becomes differentiable.

In an embodiment of the invention, the gaussian probability model is obtained by performing probability modeling based on gaussian distribution on the feature coefficients of the depth image.

In an embodiment of the present invention, performing bypass entropy coding on the meta-information of the depth image to obtain a corresponding meta-information code stream includes: binarizing the length and the width of the depth image as two 16-bit integers and coding them with bypass binary arithmetic coding to obtain binary code streams; binarizing the index of the Gaussian probability model as an 8-bit integer and coding it with bypass binary arithmetic coding to obtain a binary code stream; and combining the binary code streams for the length and the width of the depth image with the binary code stream for the index of the Gaussian probability model to obtain the meta-information code stream.

To achieve the above and other related objects, the present invention provides a depth image compression apparatus, comprising: a forward and inverse transform module, configured to perform a forward transform on the depth image to obtain a characteristic coefficient matrix representing the information of the depth image, and/or to perform an inverse transform on the characteristic coefficient matrix to reconstruct pixel values of the depth image; a quantizer, configured to quantize the characteristic coefficient matrix; an entropy coder, configured to entropy code the quantized characteristic coefficient matrix through a Gaussian probability model to obtain a corresponding characteristic coefficient code stream, and to perform bypass entropy coding on the meta-information of the depth image to obtain a corresponding meta-information code stream; and a synthesizer, configured to merge the characteristic coefficient code stream and the meta-information code stream as the compressed data of the depth image.

To achieve the above and other related objects, the present invention provides an electronic device, comprising: a processor, and a memory; the memory is used for storing programs; the processor runs a program to realize the depth image compression method.

To achieve the above and other related objects, the present invention provides a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the depth image compression method described above.

As described above, according to the depth image compression method, apparatus, device, and storage medium provided by the present invention, a feature coefficient matrix is obtained by performing a forward transform on a depth image; the quantized feature coefficient matrix is entropy coded using a Gaussian probability model to obtain a corresponding feature coefficient code stream; the meta-information of the depth image is bypass entropy coded to obtain a corresponding meta-information code stream; and finally the feature coefficient code stream and the meta-information code stream are merged as the compressed data of the depth image. The invention has the following beneficial effects:

It can solve the problem of synthesized-view distortion caused by depth image compression, and it outperforms the conventional coding standards JPEG and BPG in compression performance.

Drawings

Fig. 1 is a flowchart illustrating a depth image compression method according to an embodiment of the invention.

Fig. 2 is a block diagram of a depth image compression apparatus according to an embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a depth image compression apparatus according to an embodiment of the invention.

Description of the element reference numerals

Method steps S101 to S104

200 depth image compression device

201 forward and inverse transform module

202 quantizer

203 entropy coder

204 synthesizer

300 depth image compression device

301 processor

302 memory

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

Fig. 1 shows a flowchart of a depth image compression method according to an embodiment of the invention. As shown, the method comprises:

Step S101: a forward transform is applied to the depth image based on a self-coding network to obtain a characteristic coefficient matrix representing the information of the depth image.

In an embodiment of the present invention, the forward transform based on the self-coding network is a convolutional transform performed by a convolutional neural network, and the subsequent inverse transform is likewise a deconvolutional transform performed by a convolutional neural network.

A convolutional neural network (CNN) is a locally connected network. Compared with a fully connected network, its main characteristics are local connectivity and weight sharing. Local connectivity reflects the fact that, for a given pixel p in an image, pixels closer to p generally have a larger influence on it. Weight sharing reflects the statistics of natural images: the weights used for one region can also be used for another region. Weight sharing amounts to sharing a convolution kernel: convolving one kernel with a given image extracts one kind of image feature, and different kernels extract different features. In summary, a convolutional layer is computed as conv = σ(imgMat ∘ W + b), where σ denotes the activation function, imgMat denotes the grayscale image matrix, W denotes the convolution kernel, ∘ denotes the convolution operation, and b denotes the bias.
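The convolutional layer formula above can be illustrated with a minimal sketch. The sigmoid activation, the absence of padding, the stride of 1, and the names `conv_layer` and `edge_kernel` are assumptions made for illustration; the source does not specify them.

```python
import math

def sigmoid(x):
    """Logistic activation, standing in for the unspecified activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def conv_layer(img_mat, kernel, bias, activation=sigmoid):
    """Compute activation(img_mat (conv) kernel + bias) with no padding and stride 1.

    Like most CNN libraries, this computes cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img_mat) - kh + 1
    out_w = len(img_mat[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            # Local connectivity: only the kh x kw neighbourhood contributes.
            for u in range(kh):
                for v in range(kw):
                    acc += img_mat[i + u][j + v] * kernel[u][v]
            row.append(activation(acc))
        out.append(row)
    return out

# Weight sharing: the same 2x2 kernel is reused at every position.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]
feature_map = conv_layer(img, edge_kernel, bias=0.0)
```

In this toy depth map the response is largest in the middle column, where the kernel straddles the boundary between the two uniform regions.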

In an embodiment of the present invention, the forward transform and the reverse transform are necessary for compressing the image, and are used for extracting and restoring image features.

Because the characteristic coefficient matrix is obtained by applying the forward transform to the depth image, the pixel value of each pixel in the depth image can be recovered by applying the deconvolution transform to the characteristic coefficient matrix.

Unlike the forward and inverse transforms used in conventional image compression, in this embodiment the forward transform and the inverse transform are structurally symmetric, and each consists of 6 convolution layers and 5 normalization layers.

For example, the forward transform network starts with convolution layers, each followed by a normalization layer. Each convolution layer has a 7 × 7 kernel and 256 kernels, and halves the spatial length and width of the features. The inverse transform network starts with deconvolution layers, each followed by an inverse normalization layer. Each deconvolution layer has a 7 × 7 kernel and 256 kernels, and doubles the spatial length and width of the features.
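As a rough check of the layer geometry described above, the following sketch tracks only the spatial size of the features through six 7 × 7 layers. The stride of 2, the padding of 3, and the output padding of 1 are assumptions chosen so that each layer halves or doubles the size exactly; the source states only the kernel size, the kernel count, and the halving and doubling behaviour.

```python
def conv_out_size(size, kernel=7, stride=2, padding=3):
    """Output length of a strided convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out_size(size, kernel=7, stride=2, padding=3, output_padding=1):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def forward_shapes(length, width, layers=6):
    """Spatial sizes after each convolution layer of the forward transform."""
    shapes = [(length, width)]
    for _ in range(layers):
        l, w = shapes[-1]
        shapes.append((conv_out_size(l), conv_out_size(w)))
    return shapes

def inverse_shapes(length, width, layers=6):
    """Spatial sizes after each deconvolution layer of the inverse transform."""
    shapes = [(length, width)]
    for _ in range(layers):
        l, w = shapes[-1]
        shapes.append((deconv_out_size(l), deconv_out_size(w)))
    return shapes

encoded = forward_shapes(512, 768)[-1]   # size of the coefficient matrix per channel
decoded = inverse_shapes(*encoded)[-1]   # size restored by the symmetric inverse transform
```

Under these assumptions, six halving layers reduce each spatial dimension by a factor of 64, and the symmetric inverse transform restores the original size.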

Step S102: the characteristic coefficient matrix is quantized, and the quantized characteristic coefficient matrix is entropy coded through a Gaussian probability model to obtain a corresponding characteristic coefficient code stream.

In an embodiment of the present invention, the quantization process used is different from the quantization process used in the prior art. The quantization process described in this embodiment is a scalar quantization, and includes: rounding quantization processing is performed on the input value, and the integer closest to the input value is selected as the output value.
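The rounding quantization described above can be sketched as follows. The source does not say how exact ties such as 2.5 are handled; rounding ties upward is an assumption here, and the function names are illustrative.

```python
import math

def quantize(value):
    """Scalar quantization: pick the integer closest to the input value.

    Exact ties (x.5) are rounded up, which is an assumption. Python's built-in
    round() is avoided because it rounds ties to the nearest even integer.
    """
    return math.floor(value + 0.5)

def quantize_matrix(coeffs):
    """Apply the scalar quantizer to every entry of a coefficient matrix."""
    return [[quantize(v) for v in row] for row in coeffs]

coeffs = [[0.2, 1.7, -0.4],
          [-1.6, 2.5, 3.49]]
quantized = quantize_matrix(coeffs)
```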

In addition, since the self-coding network in the invention is constructed based on the convolutional neural network, the related network parameters need to be trained, so that the whole self-coding network can be optimally trained end to end.

Specifically, when the self-coding network is trained, the quantization process is approximated by adding random uniform noise, so that the encoding and decoding process becomes differentiable.
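The reason for the noise approximation can be seen numerically. Rounding is flat almost everywhere, so its derivative is zero and carries no training signal, whereas adding a noise sample leaves the derivative with respect to the input at one. The sketch below assumes the noise is drawn from the interval [-0.5, 0.5], which matches the width of one rounding bin; the source says only "random uniform noise".

```python
import math
import random

def hard_quantize(value):
    """Inference-time quantizer: nearest integer (ties rounded up, an assumption)."""
    return math.floor(value + 0.5)

def noisy_quantize(value, noise):
    """Training-time stand-in: add a noise sample drawn from U(-0.5, 0.5)."""
    return value + noise

def finite_difference(fn, x, eps=1e-3):
    """Central-difference estimate of the derivative of fn at x."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

rng = random.Random(0)            # fixed seed so the sketch is repeatable
noise = rng.uniform(-0.5, 0.5)    # one noise sample, held fixed while differentiating

grad_hard = finite_difference(hard_quantize, 1.3)
grad_noisy = finite_difference(lambda v: noisy_quantize(v, noise), 1.3)
```

Here `grad_hard` is zero, while `grad_noisy` is approximately one, so gradients can pass through the noisy stand-in during training.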

Once the encoding and decoding process is differentiable, gradients can be back-propagated through it; this achieves an effect similar to bilateral filtering and better preserves high-frequency information such as image edges.

It should be noted that gradient back-propagation is the most important technique in training or optimizing an intelligent system. Its function is to make the model finally converge by searching for a minimum, controlling the variance, and updating the model parameters. In a convolutional neural network model it is mainly used for weight updates, that is, adjusting the model parameters in the direction that minimizes the loss function.
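The weight update described above can be illustrated with a one-parameter example. The quadratic loss, the learning rate, and the iteration count are arbitrary choices for illustration and do not come from the source.

```python
def loss(weight):
    """Toy loss with its minimum at weight = 3."""
    return (weight - 3.0) ** 2

def loss_gradient(weight):
    """Derivative of the toy loss with respect to the weight."""
    return 2.0 * (weight - 3.0)

def gradient_step(weight, gradient, learning_rate):
    """Move the weight against the gradient, the direction that lowers the loss."""
    return weight - learning_rate * gradient

weight = 0.0
history = [loss(weight)]
for _ in range(100):
    weight = gradient_step(weight, loss_gradient(weight), learning_rate=0.1)
    history.append(loss(weight))
```

In a real network the gradient of each weight is obtained by back-propagating the loss through the layers; the update rule itself has the same form.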

Compared with the prior art, the quantization process here is trained together with the convolutional neural network, so that the encoding and decoding process becomes differentiable, high-frequency information such as image edges is better preserved, and the coding loss is reduced.

It should be noted that entropy coding is key to improving the compression efficiency of an image. In the prior art, the quantized coefficients are generally modeled with a Laplacian or a non-parametric model. The present invention instead uses a parameterized Gaussian model, which effectively reduces the spatial redundancy of the image. The only learned parameters are the mean and the variance, so fewer parameters need to be learned, which simplifies implementation and improves coding speed. The compression performance of the invention is superior to that of the conventional coding standards JPEG and BPG.
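One common way to turn a Gaussian model with a mean and a variance into symbol probabilities for an entropy coder is to integrate the density over each rounding bin. The source does not state how the probabilities are derived, so the sketch below is an assumption, as are the function names.

```python
import math

def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian with the given mean and std."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def symbol_probability(q, mean, std):
    """Probability mass the Gaussian assigns to the rounding bin [q - 0.5, q + 0.5)."""
    return gaussian_cdf(q + 0.5, mean, std) - gaussian_cdf(q - 0.5, mean, std)

def estimated_bits(symbols, mean, std):
    """Ideal code length in bits for the symbols: the sum of -log2 p(q)."""
    return sum(-math.log2(symbol_probability(q, mean, std)) for q in symbols)

# Coefficients near the mean are cheap to code; outliers cost more bits.
bits_near_mean = estimated_bits([0, 0, 1, -1], mean=0.0, std=1.0)
bits_outliers = estimated_bits([3, -3, 4, -4], mean=0.0, std=1.0)
total_mass = sum(symbol_probability(q, 0.0, 2.0) for q in range(-50, 51))
```

Only two numbers (mean and standard deviation) describe the whole probability table in this sketch, which is consistent with the small parameter count noted above.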

For example, the self-coding network is trained with the objective of minimizing the loss function J = R + λD. D is the coding distortion, measured with PSNR. R is the coding rate, approximated by the information entropy; the information entropy depends on the probability density function of the coefficients, and this probability density is modeled with a Gaussian distribution.
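The rate-distortion objective J = R + λD can be sketched as below. The source names PSNR as the distortion measure; because PSNR rises as quality improves, this sketch assumes that the quantity actually minimized is the mean squared error from which PSNR is derived. The peak value of 255 and the sample numbers are likewise assumptions.

```python
import math

def mse(original, reconstructed):
    """Mean squared error between two equally long pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    error = mse(original, reconstructed)
    if error == 0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / error)

def rd_loss(rate_bits, distortion, lam):
    """Rate-distortion objective J = R + lambda * D."""
    return rate_bits + lam * distortion

original = [10, 20, 30]
reconstructed = [12, 20, 27]
distortion = mse(original, reconstructed)
quality_db = psnr(original, reconstructed)
objective = rd_loss(rate_bits=100.0, distortion=distortion, lam=0.5)
```

A larger λ weights distortion more heavily, which trades a higher bit rate for a more faithful reconstruction.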

In an embodiment of the present invention, the feature coefficient code stream corresponding to the feature coefficient of the depth image is a binary code stream.

Step S103: bypass entropy coding is performed on the meta-information of the depth image to obtain a corresponding meta-information code stream.

Bypass entropy coding skips the estimation and updating of a probability model and splits the coding interval into equal parts, which speeds up encoding and decoding.
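The equal split of the coding interval can be shown with a simplified integer sketch. It is not the coder of any particular standard and omits renormalization; it only illustrates that, with a fixed 50/50 split and no probability model, each binary symbol costs exactly one bit.

```python
def bypass_encode(bits):
    """Encode bits by repeatedly halving an integer interval, with no probability model."""
    low, width = 0, 1 << len(bits)   # current interval is [low, low + width)
    for bit in bits:
        width //= 2                  # equal split: no probability lookup or update
        if bit:
            low += width             # the upper half encodes a 1
    return low

def bypass_decode(code, n_bits):
    """Recover the bits by checking which half of the interval contains the code."""
    low, width = 0, 1 << n_bits
    bits = []
    for _ in range(n_bits):
        width //= 2
        bit = 1 if code >= low + width else 0
        if bit:
            low += width
        bits.append(bit)
    return bits

code = bypass_encode([1, 0, 1, 1])
decoded_bits = bypass_decode(code, 4)
```

The resulting code equals the integer whose binary digits are the input bits, so bypass coding adds no modeling cost beyond writing the bits themselves.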

In an embodiment of the present invention, in addition to the extracted image features, some header information or meta-information must also be compressed and transmitted. This header information or meta-information is needed for correct decoding; if only the feature code stream were transmitted, the image could not be decoded.
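The meta-information described in this document consists of the image length and width as two 16-bit integers and the index of the Gaussian probability model as an 8-bit integer, 40 bits in total. A minimal sketch of such a header follows; the field order, the big-endian byte order, and the use of unsigned integers are assumptions.

```python
import struct

# Assumed layout: 16-bit length, 16-bit width, 8-bit model index, big-endian, unsigned.
META_FORMAT = ">HHB"
META_SIZE = struct.calcsize(META_FORMAT)

def encode_meta(length, width, model_index):
    """Pack the meta-information into a fixed-size header."""
    return struct.pack(META_FORMAT, length, width, model_index)

def decode_meta(stream):
    """Read the meta-information back from the start of a stream."""
    return struct.unpack(META_FORMAT, stream[:META_SIZE])

header = encode_meta(480, 640, 3)
restored = decode_meta(header)
```

Because every bit of the header is written as is, this corresponds to coding the fields in bypass mode, where each bit costs exactly one bit of output.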

Step S104: the characteristic coefficient code stream and the meta-information code stream are merged as the compressed data of the depth image.

In an embodiment of the present invention, the quantized feature coefficient matrix is entropy coded through a Gaussian probability model to obtain a corresponding feature coefficient code stream, the meta-information of the depth image is bypass entropy coded to obtain a corresponding meta-information code stream, and the two code streams are merged to obtain the compressed data of the depth image produced by the convolutional neural network.
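Merging the two code streams can be as simple as concatenation when the meta-information has a fixed size. The sketch below assumes the meta-information comes first and occupies 5 bytes (two 16-bit integers plus one 8-bit integer); the ordering and the placeholder byte values are assumptions, not details from the source.

```python
def merge_streams(meta_stream, coeff_stream):
    """Concatenate the meta-information stream and the coefficient stream."""
    return meta_stream + coeff_stream

def split_streams(compressed, meta_size=5):
    """Split compressed data back into its two streams using the fixed header size."""
    return compressed[:meta_size], compressed[meta_size:]

meta_stream = bytes([0x01, 0xE0, 0x02, 0x80, 0x03])   # placeholder header bytes
coeff_stream = bytes([0xA5, 0x3C, 0x7F])              # placeholder coefficient bytes
compressed = merge_streams(meta_stream, coeff_stream)
meta_back, coeff_back = split_streams(compressed)
```

A decoder would read the header first, since it needs the image size and the model index before it can decode the coefficients.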

By constructing the convolutional neural network and optimizing all of its parameters end to end through unsupervised training, the network parameters can be updated using gradient back-propagation. This effectively reduces the spatial redundancy of the image, better preserves high-frequency information such as image edges, and helps simplify implementation and improve coding speed.

Referring to Fig. 2, a block diagram of a depth image compression apparatus in an embodiment of the invention is shown. As shown, the depth image compression apparatus 200 includes:

a forward and inverse transform module 201, configured to perform a forward transform on a depth image to obtain a characteristic coefficient matrix representing the information of the depth image, and/or to perform an inverse transform on the characteristic coefficient matrix to reconstruct pixel values of the depth image;

a quantizer 202, configured to quantize the characteristic coefficient matrix;

an entropy encoder 203, configured to entropy code the quantized characteristic coefficient matrix through a Gaussian probability model to obtain a corresponding characteristic coefficient code stream, and to perform bypass entropy coding on the meta-information of the depth image to obtain a corresponding meta-information code stream;

a synthesizer 204, configured to merge the characteristic coefficient code stream and the meta-information code stream as the compressed data of the depth image.

In an embodiment of the invention, the modules operate together to implement the steps of the depth image compression method described with reference to Fig. 1.

In an embodiment of the present invention, the entropy encoder 203 may also be an autoencoder, a kind of neural network that is trained to copy its input to its output. The autoencoder has a hidden layer h that produces a code representing the input. The network can be viewed as two parts: an encoder represented by the function h = f(x), and a decoder r = g(h) that produces the reconstruction.
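The encoder and decoder structure h = f(x), r = g(h) can be illustrated with hand-written stand-ins. A real autoencoder learns f and g from data; the pair-averaging encoder and the repeating decoder below are fixed functions chosen only to make the structure concrete, and they require an input of even length.

```python
def f(x):
    """Encoder: average neighbouring pairs to produce a half-length code h."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def g(h):
    """Decoder: repeat each code value twice to restore the original length."""
    return [value for value in h for _ in range(2)]

def reconstruct(x):
    """Full pass through the network: r = g(f(x))."""
    return g(f(x))

# A piecewise-constant signal, like the uniform regions of a depth image.
x = [4, 4, 10, 10]
h = f(x)
r = reconstruct(x)
```

For this piecewise-constant input the reconstruction equals the input even though the code is half as long, which hints at why depth images with large uniform regions compress well.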

It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the forward/backward conversion module 201 may be a processing element separately set up, or may be implemented by being integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and a processing element of the apparatus calls and executes the functions of the forward/backward conversion module 201. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). As another example, when one of the above modules is implemented in the form of program code invoked by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).

Fig. 3 is a schematic structural diagram of a depth image compression apparatus according to an embodiment of the invention. As shown, the depth image compression apparatus 300 includes: a processor 301, and a memory 302; the memory 302 is used for storing programs; the processor 301 runs a program to implement the depth image compression method as described in fig. 1.

The processor 301 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The memory 302 may include a Random Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory.

To achieve the above and other related objects, the present invention provides a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, implements a depth image compression method as described in fig. 1.

As will be appreciated by one of ordinary skill in the art, all or part of the steps of the above method embodiments may be performed by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.

In summary, according to the depth image compression method and apparatus, the electronic device and the storage medium provided by the present invention, the depth image is subjected to forward transformation to obtain the feature coefficient matrix, the quantized feature coefficient matrix is subjected to entropy coding by using the gaussian probability model to obtain the corresponding feature coefficient code stream, the meta information of the depth image is subjected to bypass entropy coding to obtain the corresponding meta information code stream, and finally, the feature coefficient code stream and the meta information code stream are combined to be used as the compressed data of the depth image.

The invention can solve the problem of synthetic view distortion caused by depth image compression and surpass the traditional coding standards JPEG and BPG in compression performance.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A depth image compression method, the method comprising:

carrying out forward transformation processing on the depth image based on a self-coding network to obtain a characteristic coefficient matrix representing information of the depth image;

quantizing the characteristic coefficient matrix, and entropy coding the quantized characteristic coefficient matrix through a Gaussian probability model to obtain a corresponding characteristic coefficient code stream;

performing bypass entropy coding on the meta-information of the depth image to obtain a corresponding meta-information code stream;

and merging the characteristic coefficient code stream and the meta-information code stream to be used as compressed data of the depth image.

2. The method of claim 1, wherein the characteristic coefficient matrix is inverse transformed by the self-coding network to reconstruct pixel values of the depth image.

3. The method of claim 2, wherein the forward transform and the inverse transform have a symmetrical structure and are each composed of 6 convolutional layers and 5 normalization layers.

4. The depth image compression method according to claim 1, wherein the quantization process is scalar quantization, including: rounding quantization processing is performed on the input value, and the integer closest to the input value is selected as the output value.

5. The method of claim 1, wherein the quantization process is approximated by adding random uniform noise when the self-coding network is trained, such that the encoding and decoding process becomes differentiable.

6. The depth image compression method according to claim 1, wherein the gaussian probability model is obtained by performing probability modeling based on a gaussian distribution on the feature coefficients of the depth image.

7. The method of claim 1, wherein the method of bypass entropy coding of the meta-information of the depth image to obtain a corresponding meta-information code stream comprises:

binarizing the length and the width of the depth image as two 16-bit integers, and obtaining binary code streams by bypass binary arithmetic coding;

binarizing the index of the Gaussian probability model as an 8-bit integer, and obtaining a binary code stream by bypass binary arithmetic coding;

and summarizing the binary code streams corresponding to the length and the width of the depth image and the binary code streams corresponding to the sequence numbers corresponding to the Gaussian probability model to obtain the meta-information code stream.

8. A depth image compression apparatus, comprising:

a forward and inverse transform module, configured to perform a forward transform on the depth image to obtain a characteristic coefficient matrix representing the information of the depth image, and/or to perform an inverse transform on the characteristic coefficient matrix to reconstruct pixel values of the depth image;

the quantizer is used for performing quantization processing on the characteristic coefficient matrix;

the entropy coder is used for entropy coding the quantized characteristic coefficient matrix through a Gaussian probability model to obtain a corresponding characteristic coefficient code stream, and for performing bypass entropy coding on the meta-information of the depth image to obtain a corresponding meta-information code stream;

and the synthesizer is used for merging the characteristic coefficient code stream and the meta-information code stream to be used as the compressed data of the depth image.

9. A depth image compression apparatus, comprising: a processor, and a memory;

the memory is used for storing programs; the processor runs a program to implement the depth image compression method of any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the depth image compression method of any one of claims 1 to 7.