CN111432211B - Residual error information compression method for video coding - Google Patents

Residual error information compression method for video coding

Info

Publication number
CN111432211B
CN111432211B (application CN202010247702.5A; published as CN111432211A)
Authority
CN
China
Prior art keywords
coding
quantization
entropy
data
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010247702.5A
Other languages
Chinese (zh)
Other versions
CN111432211A (en)
Inventor
段强 (Duan Qiang)
汝佩哲 (Ru Peizhe)
李锐 (Li Rui)
金长新 (Jin Changxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202010247702.5A priority Critical patent/CN111432211B/en
Publication of CN111432211A publication Critical patent/CN111432211A/en
Application granted granted Critical
Publication of CN111432211B publication Critical patent/CN111432211B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/184 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence

Abstract

The invention provides a residual information compression method for video coding, relating to the fields of information compression and encoding/decoding. When the residual information is decoded, the stored entropy-coded data is decoded and inversely quantized through the reverse of the encoding flow, and a decoder with the mirrored structure restores the three-channel residual information from the feature map. By compressing, or secondarily compressing, the existing residual information, the required storage space is reduced severalfold and the storage cost is lowered.

Description

Residual error information compression method for video coding
Technical Field
The invention relates to the field of information compression, coding and decoding, in particular to a residual error information compression method for video coding.
Background
In the digital media era, huge amounts of image and video data are generated and stored across daily life, social networking, public-security surveillance, industrial production and other fields, consuming a great deal of storage space. The compression ratio of H.264, currently the mainstream video compression format, still leaves room for improvement, and its block-based motion estimation also introduces color artifacts; H.265, meanwhile, has yet to be popularized and is not well regarded, owing to its low compression efficiency and various patent disputes.
Motion compensation is an effective way to reduce redundant information in a frame sequence: the current local image is predicted and compensated from a preceding local image. The prediction usually differs from the real video information by a residual, and this residual information can restore what is lost during motion compensation.
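As a toy illustration with hypothetical pixel values (not data from the patent), the residual is simply the elementwise difference between the true frame and its motion-compensated prediction, and adding it back recovers the frame exactly:

```python
import numpy as np

# Hypothetical 2x2 luma patches; real frames would be full images.
frame = np.array([[52, 55], [61, 59]], dtype=np.int16)       # true pixels
prediction = np.array([[50, 54], [60, 60]], dtype=np.int16)  # motion-compensated guess

residual = frame - prediction          # this is what the method compresses
reconstructed = prediction + residual  # residual restores the lost detail

assert np.array_equal(reconstructed, frame)
print(residual)
```

The residual is typically small and sparse compared with the frame itself, which is why it compresses well.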
In view of the large-scale application of neural networks and deep learning techniques to tasks in the field of artificial intelligence, it is very promising to compress data by means of neural networks.
Disclosure of Invention
In view of the above technical problems, the present invention provides a residual information compression method for video coding that obtains compressed residual information at a low bit rate, for storing and compressing the residual information produced after motion estimation in video compression.
The method is based on an autoencoder neural network structure, uses the GDN (Generalized Divisive Normalization) activation function, and combines quantization with entropy coding to compress the residual information.
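As a minimal sketch of what a GDN-style activation computes (an illustrative assumption with fixed parameters, not the patent's trained layer), each channel is divided by a learned combination of all channels' energies:

```python
import numpy as np

def gdn(x, beta, gamma):
    """Generalized Divisive Normalization across channels:
    y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j**2).
    x: (C,) activations; beta: (C,) offsets; gamma: (C, C) weights."""
    return x / np.sqrt(beta + gamma @ (x ** 2))

C = 4
x = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical channel activations
beta = np.ones(C)                    # learned offsets (fixed at 1 here)
gamma = np.full((C, C), 0.1)         # learned cross-channel weights

y = gdn(x, beta, gamma)
# With beta = 1 and non-negative gamma the denominator exceeds 1,
# so the normalization shrinks every activation.
assert np.all(np.abs(y) <= np.abs(x))
```

In a real compression network beta and gamma are trained jointly with the convolution weights; GDN tends to gaussianize activations, which helps the later quantization and entropy model.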
An autoencoder is an artificial neural network that learns an efficient representation of its input data through unsupervised learning. The training data need no special labels: the loss is computed from the difference between input and output. The network's internal representation of the input can be regarded as a code, and since its dimensionality is usually smaller than that of the input, it achieves compression and dimensionality reduction. Naively training the network to reproduce its input exactly is of little value, so it is forced to learn an efficient representation either by imposing internal size constraints, such as a bottleneck layer, or by adding noise to the training data and training the autoencoder to recover the original data.
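A minimal linear autoencoder sketch (an illustrative assumption, far simpler than the patent's convolutional network): an 8-dimensional input is squeezed through a 3-dimensional bottleneck, the target is the input itself, and the loss is the input/output mean squared error:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                 # toy "residual" vectors
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder: 8 -> 3 (bottleneck)
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder: 3 -> 8

def loss(X, W_enc, W_dec):
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

initial = loss(X, W_enc, W_dec)
lr = 0.05
for _ in range(2000):
    Z = X @ W_enc                      # the code (compressed representation)
    err = Z @ W_dec - X                # reconstruction error
    grad_dec = Z.T @ err * 2 / len(X)  # dMSE/dW_dec
    grad_enc = X.T @ (err @ W_dec.T) * 2 / len(X)  # dMSE/dW_enc
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final = loss(X, W_enc, W_dec)
assert final < initial  # the bottleneck learned a useful 3-dim code
```

Because the bottleneck has fewer dimensions than the input, the loss cannot reach zero; the network instead learns the most informative 3-dimensional projection, which is exactly the compression effect described above.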
Once an efficient representation is obtained, it can be quantized for further compression: high-precision floating-point numbers occupy considerable storage space, yet the extra bits after the decimal point contribute little to the actual task. In the back-propagation of a neural network, however, optimization proceeds by gradient descent, while quantization is a non-differentiable operation and cannot take part in the gradient computation. Several surrogates can replace direct quantization, such as adding uniform noise or soft quantization.
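The uniform-noise surrogate can be sketched in a few lines (toy values, assuming a unit quantization step): during training, noise in [-0.5, 0.5) stands in for rounding, because the two perturbations have nearly identical statistics:

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.normal(scale=4.0, size=10_000)  # toy feature-map values

# Training-time surrogate: additive uniform noise is differentiable
# (the gradient passes straight through the addition).
noisy = features + rng.uniform(-0.5, 0.5, size=features.shape)

# Inference-time quantization: hard rounding to integers.
quantized = np.round(features)

# Rounding error never exceeds half a step ...
assert np.max(np.abs(quantized - features)) <= 0.5
# ... and both error distributions have variance close to 1/12.
print(np.var(noisy - features), np.var(quantized - features))
```

This is why the description says the difference before and after quantization "resembles uniform noise": for smooth input distributions the rounding error is approximately uniform on [-0.5, 0.5].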
The quantized feature values are then further compressed by entropy coding; for the common entropy coders, such as arithmetic coding, Huffman coding and Shannon coding, designing an efficient probability model is the key.
Entropy coding is lossless data compression: it reduces bits by identifying and eliminating statistical redundancy, so no information is lost. The goal is to represent discrete data with fewer bits than the original representation requires, without any loss of information.
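As one compact illustration of lossless entropy coding (the patent itself uses arithmetic coding; Huffman coding is shown here only because it is short to sketch), a prefix code gives frequent symbols shorter bit strings, and decoding recovers the input exactly:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix code from symbol frequencies (shorter codes for
    more probable symbols). Assumes at least two distinct symbols."""
    freq = Counter(data)
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)  # unique tiebreaker so dicts are never compared
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        code = {s: "0" + c for s, c in lo[2].items()}
        code.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], counter, code])
        counter += 1
    return heap[0][2]

data = "AAAABBBCCD"  # skewed statistics -> statistical redundancy
code = huffman_code(data)
encoded = "".join(code[s] for s in data)

# Lossless: walking the bitstream with the inverse table recovers
# the input exactly, because no codeword is a prefix of another.
inverse = {v: k for k, v in code.items()}
decoded, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in inverse:
        decoded.append(inverse[buf])
        buf = ""
assert "".join(decoded) == data
print(len(encoded))  # 19 bits vs 20 for a fixed 2-bit/symbol code
```

The better the probability model matches the true symbol statistics, the closer the bit count gets to the Shannon entropy, which is why the description stresses the probability model.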
The method for compressing the residual information based on the self-encoder and the entropy coding can obtain the compressed residual information under the condition of low bit rate, and is used for storing and compressing the residual information after motion estimation of video compression.
Using the autoencoder idea, the residual features are used to train the autoencoder network. A trained Encoder network then extracts features and generates a Feature Map; Quantization reduces the storage footprint of the data, and Entropy Coding further compresses the quantized data. When decoding the residual information, the stored entropy-coded data is decoded and dequantized through the reverse flow and passed to a Decoder with the mirrored structure, which restores the residual information from the feature map.
The implementation steps comprise: building the neural network architecture, encoding, quantization, entropy coding, storing the generated file, and entropy decoding. Specifically:
1) Build the neural network architecture, specifying the number of convolution layers, the convolution kernel sizes, the padding method, and the number of threads required for coding. In general, the design principle is: kernel sizes go from large to small; the number of kernels goes from small to large, or stays constant; and strides > 1 are placed at certain layers to reduce the feature-map size;
2) Train with a training set, where each residual sample serves as its own label; construct the loss function from MSE and bpp (bits per pixel), and optimize with the Adam optimizer. After many iterations, a trained neural network model is obtained;
3) The encoding process inputs the existing residual information into the Encoder part of the trained neural network and obtains a Feature Map through multi-step convolution, where the activation function of each convolutional layer is ReLU or GDN;
4) Two quantization approaches are in common use: adding uniform noise and soft quantization. Adding uniform noise replaces quantization during training; because the difference before and after quantization resembles uniform noise, it is simulated by adding noise artificially;
5) Entropy coding begins with binarization: non-binary numbers must be converted to binary before arithmetic coding. The probability density functions of all binary symbols are then estimated by counting, and each bit of the binary symbols is arithmetically coded according to those statistics;
6) The encoded file is stored in serialized form and can be handled with a serialization package such as pickle;
7) For entropy decoding, read the serialized file and convert it to a binary fraction, i.e., place a radix point in front of the most significant bit, then decode according to the known probability density function;
8) After entropy decoding, a feature map identical in size to the one before entropy encoding is obtained. A neural network mirroring the encoding network is then constructed, with the convolution layers replaced by deconvolution layers; it restores the feature map to three-channel residual information, applying a final rounding quantization at storage time.
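Steps 6) and 7) can be sketched end to end (a hypothetical 4-bit stream stands in for the arithmetic-coded output): the bitstream is serialized with pickle, read back, and interpreted as a binary fraction in [0, 1), which is the form an arithmetic decoder consumes:

```python
import pickle

# Stand-in for the arithmetic-coded bitstream (hypothetical value).
blob = pickle.dumps("1011")   # step 6: store in serialized form
bits = pickle.loads(blob)     # step 7: read the serialized file back

# Place an implicit radix point before the most significant bit:
# "1011" is read as the binary fraction 0.1011.
value = sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(bits))
print(value)  # 0.1011 (binary) = 0.5 + 0.125 + 0.0625 = 0.6875
```

The arithmetic decoder then repeatedly compares this fraction against the cumulative probability intervals of the symbol model to emit the decoded bits.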
The invention has the advantages that
The method performs well on image compression and super-resolution tasks.
It can be applied to video encoding/decoding and compression: by compressing, or secondarily compressing, the existing residual information, the storage space and storage cost are reduced severalfold. The compressed residual information mainly supplements the information lost in video compression and improves the picture quality of the compressed video.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention;
fig. 2 is an exemplary diagram of a neural network structure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of the present invention.
Using the autoencoder idea, residual information is passed through a trained Encoder network to generate a Feature Map; Quantization reduces the storage footprint of the data, and entropy coding further compresses the quantized data. When decoding the residual information, the stored entropy-coded data is decoded and dequantized through the reverse flow and passed to a Decoder with the mirrored structure, which recovers the three-channel residual information from the feature map.
The specific steps are: building the neural network architecture, encoding, quantization, entropy coding, storing the generated file, and entropy decoding. Specifically:
1) Build the neural network architecture, specifying the number of convolution layers, the convolution kernel sizes, the padding method, and the number of threads required for coding. In general, the design principle is: kernel sizes go from large to small; the number of kernels goes from small to large, or stays constant; and strides > 1 are placed at certain layers to reduce the feature-map size;
2) Train with a training set, where each residual sample serves as its own label; construct the loss function from MSE and bpp (bits per pixel), and optimize with the Adam optimizer. After many iterations, a trained neural network model is obtained;
3) The encoding process inputs the existing residual information into the Encoder part of the trained neural network and obtains a Feature Map through multi-step convolution, where the activation function of each convolutional layer is ReLU or GDN;
4) Two quantization approaches are in common use: adding uniform noise and soft quantization. Adding uniform noise replaces quantization during training; because the difference before and after quantization resembles uniform noise, it is simulated by adding noise artificially;
5) Entropy coding begins with binarization: non-binary numbers must be converted to binary before arithmetic coding. The probability density functions of all binary symbols are then estimated by counting, and each bit of the binary symbols is arithmetically coded according to those statistics;
6) The encoded file is stored in serialized form and can be handled with a serialization package such as pickle;
7) For entropy decoding, read the serialized file and convert it to a binary fraction, i.e., place a radix point in front of the most significant bit, then decode according to the known probability density function;
8) After entropy decoding, a feature map identical in size to the one before entropy encoding is obtained. A neural network mirroring the encoding network is then constructed, with the convolution layers replaced by deconvolution layers; it restores the feature map to three-channel residual information, applying a final rounding quantization at storage time.
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (1)

1. A residual information compression method for video coding,
based on a neural network structure of an autoencoder, GDN activation function is used, and residual error information compression is carried out by combining quantization and entropy coding;
by adding internal size constraints, or by adding noise to the training data and training the autoencoder to restore the original data, the network is forced to learn an efficient representation of the data;
after the efficient representation is obtained, the efficient representation is quantized to achieve the effect of further compression;
the quantized feature values need entropy coding for further compression;
entropy coding belongs to lossless compression of data, reducing bits by identifying and eliminating portions of statistical redundancy, which makes it possible to perform compression without losing information;
using residual features to train an auto-encoder network by using the idea of an auto-encoder; then, extracting by using a trained encoder network to generate a characteristic diagram, then reducing the storage space of the data through quantization, and further compressing the quantized data by entropy coding; when residual information is decoded, the stored entropy coding data is decoded and dequantized by using an opposite flow, and is decoded by a decoder with an opposite structure, and the residual information is recovered from the characteristic diagram;
the method comprises the following steps: building a neural network architecture, coding, quantizing, entropy coding, storing a generated file, and decoding entropy;
wherein, the network structure at least comprises a group of convolution layers for downsampling by setting Strides, a group of deconvolution layers for upsampling by setting Strides and a group of layers for quantization and entropy coding;
the convolution kernel size and number of convolution layers are combined by experiment, and the activation function of the convolution layer uses GDN (generalized differentiated simulation) or ReLU;
the method comprises the following specific steps:
1) building a neural network architecture, and specifying the number of layers of convolution layers, the size of convolution kernels, a padding method and the number of threads required by coding;
2) training by using a training set, wherein each residual sample serves as its own label, constructing a loss function from mse and bpp, and optimizing with an Adam optimizer; after several iterations, a trained neural network model is obtained;
3) the coding process is a process of inputting the existing residual error information into the Encoder part of the trained neural network and obtaining a characteristic diagram through multi-step convolution;
4) the quantization commonly uses two modes of adding uniform noise and soft quantization; adding uniform noise is the process of adding noise to replace quantization in training;
5) starting entropy coding, namely carrying out binarization firstly and coding a binary number; the non-binary number must be binarized or converted to a binary number before arithmetic coding; counting probability density functions of all binary symbols, and carrying out arithmetic coding on each bit of the binary symbols according to the probability density functions obtained by counting;
6) serializing and storing the encoded file, and processing by using a serialized packet;
7) performing entropy decoding: reading the serialized file and converting it into a binary fraction, i.e., placing a radix point in front of the most significant bit, then decoding according to the existing probability density function;
8) after entropy decoding, a feature map identical in size to the one before entropy encoding is obtained; a neural network mirroring the encoding network is then constructed, with the convolution layers replaced by deconvolution layers; it restores the feature map to three-channel residual information, applying a final rounding quantization at storage time.
CN202010247702.5A 2020-04-01 2020-04-01 Residual error information compression method for video coding Active CN111432211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010247702.5A CN111432211B (en) 2020-04-01 2020-04-01 Residual error information compression method for video coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010247702.5A CN111432211B (en) 2020-04-01 2020-04-01 Residual error information compression method for video coding

Publications (2)

Publication Number Publication Date
CN111432211A CN111432211A (en) 2020-07-17
CN111432211B 2021-11-12

Family

ID=71550390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010247702.5A Active CN111432211B (en) 2020-04-01 2020-04-01 Residual error information compression method for video coding

Country Status (1)

Country Link
CN (1) CN111432211B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115118972A (en) * 2021-03-17 2022-09-27 华为技术有限公司 Video image coding and decoding method and related equipment
WO2023160717A1 (en) * 2022-02-28 2023-08-31 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107018422A (en) * 2017-04-27 2017-08-04 四川大学 Still image compression method based on depth convolutional neural networks
CN107396124A (en) * 2017-08-29 2017-11-24 南京大学 Video-frequency compression method based on deep neural network
WO2019009447A1 (en) * 2017-07-06 2019-01-10 Samsung Electronics Co., Ltd. Method for encoding/decoding image and device therefor
TW201924342A * 2017-10-12 2019-06-16 MediaTek Inc. Method and apparatus of neural network for video coding
CN110753225A (en) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 Video compression method and device and terminal equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2213101A4 (en) * 2007-11-20 2011-08-10 Ubstream Ltd A method and system for compressing digital video streams
CN110472483B (en) * 2019-07-02 2022-11-15 五邑大学 SAR image-oriented small sample semantic feature enhancement method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107018422A (en) * 2017-04-27 2017-08-04 四川大学 Still image compression method based on depth convolutional neural networks
WO2019009447A1 (en) * 2017-07-06 2019-01-10 Samsung Electronics Co., Ltd. Method for encoding/decoding image and device therefor
CN107396124A (en) * 2017-08-29 2017-11-24 南京大学 Video-frequency compression method based on deep neural network
TW201924342A * 2017-10-12 2019-06-16 MediaTek Inc. Method and apparatus of neural network for video coding
CN111133756A (en) * 2017-10-12 2020-05-08 联发科技股份有限公司 Neural network method and apparatus for video coding
CN110753225A (en) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 Video compression method and device and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DeepCoder: A Deep Neural Network Based Video Compression; Tong Chen; IEEE; 2018-03-01; full text *

Also Published As

Publication number Publication date
CN111432211A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN111246206B (en) Optical flow information compression method and device based on self-encoder
US20200160565A1 (en) Methods And Apparatuses For Learned Image Compression
CN100403801C (en) Adaptive entropy coding/decoding method based on context
CN109889839B (en) Region-of-interest image coding and decoding system and method based on deep learning
CN110248190B (en) Multilayer residual coefficient image coding method based on compressed sensing
CN111432211B (en) Residual error information compression method for video coding
CN107046646B (en) Video coding and decoding device and method based on depth automatic encoder
CN111147862B (en) End-to-end image compression method based on target coding
CN103067022A (en) Nondestructive compressing method, uncompressing method, compressing device and uncompressing device for integer data
JPH07504546A (en) Method for encoding image data
CN113747163B (en) Image coding and decoding method and compression method based on context recombination modeling
CN103188494A (en) Apparatus and method for encoding depth image by skipping discrete cosine transform (DCT), and apparatus and method for decoding depth image by skipping DCT
CN115882866A (en) Data compression method based on data difference characteristic
Akbari et al. Learned multi-resolution variable-rate image compression with octave-based residual blocks
CN110930408A (en) Semantic image compression method based on knowledge reorganization
Kabir et al. Edge-based transformation and entropy coding for lossless image compression
Karthikeyan et al. An efficient image compression method by using optimized discrete wavelet transform and Huffman encoder
CN111343458B (en) Sparse gray image coding and decoding method and system based on reconstructed residual
CN111080729B (en) Training picture compression network construction method and system based on Attention mechanism
CN110677624B (en) Monitoring video-oriented foreground and background parallel compression method based on deep learning
Barman et al. A quantization based codebook formation method of vector quantization algorithm to improve the compression ratio while preserving the visual quality of the decompressed image
Shah et al. Vector quantization with codebook and index compression
CN112950729A (en) Image compression method based on self-encoder and entropy coding
CN110191341A (en) A kind of coding method of depth data and coding/decoding method
CN112887722A (en) Lossless image compression method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211026

Address after: 250100 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant