WO2019056898A1 - Encoding and decoding method and device - Google Patents

Encoding and decoding method and device Download PDF

Info

Publication number
WO2019056898A1
WO2019056898A1 (PCT/CN2018/101040)
Authority
WO
WIPO (PCT)
Prior art keywords
resolution image
image
full
encoder
residual
Prior art date
Application number
PCT/CN2018/101040
Other languages
French (fr)
Chinese (zh)
Inventor
徐威 (Xu Wei)
葛新宇 (Ge Xinyu)
周力 (Zhou Li)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2019056898A1


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/117 - Filters, e.g. for pre-processing or post-processing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 - Data rate or code amount at the encoder output
    • H04N 19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object
    • H04N 19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/59 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/85 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N 19/86 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/90 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N 19/91 - Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • the present application relates to image processing technologies, and in particular, to a coding and decoding method and apparatus.
  • As user demand for 4K video grows, 4K TVs are reaching ordinary households and providing users with ultra-high-definition viewing.
  • A consulting report by a UK market research organization projected that global 4K TV shipments would reach 100 million units in 2018, with the Chinese market accounting for 70% of global 4K TV demand.
  • Whereas 1080P video needs only about 8 to 10 Mbps for good picture quality, 4K video places a far greater demand on transmission bandwidth.
  • Basic 4K60P video requires a bit rate of 30 to 50 Mbps to guarantee a good viewing experience.
  • According to statistics, China's current access-network bandwidth is only 20 Mbps, with a peak rate of 18.4 Mbps and an average speed of merely 3.4 Mbps.
  • Under these conditions, conventional general-purpose video codec technology cannot guarantee the transmission of high-quality 4K video in the existing bandwidth environment.
  • In view of this, the present application provides an encoding method and apparatus that enable transmission of high-quality video under the existing bandwidth environment.
  • In a first aspect, an encoding apparatus is provided, including a downsampler, an up-converter, a calculator, a self-encoder, a primary encoder, and a secondary encoder, wherein:
  • the downsampler is configured to receive an original-resolution image, downsample the original-resolution image to obtain a low-resolution image, and send the obtained low-resolution image to the up-converter and the primary encoder;
  • the primary encoder is connected to the downsampler and is configured to encode the received low-resolution image to obtain primary coding information;
  • the up-converter is connected to the downsampler and is configured to convert the low-resolution image sent by the downsampler into a first full-resolution image and send the first full-resolution image to the calculator;
  • the calculator is configured to receive the original-resolution image and the first full-resolution image, compute a residual image from the original-resolution image and the first full-resolution image, and send the residual image to the self-encoder for processing;
  • the self-encoder is connected to the calculator and is configured to encode the residual image according to a self-encoding algorithm to obtain residual image coding information and send the residual image coding information to the secondary encoder;
  • the secondary encoder is connected to the self-encoder and is configured to entropy-encode the residual image coding information from the self-encoder to obtain secondary coding information.
  • In an optional manner, the encoding apparatus further includes a filter enhancer connected to the up-converter, configured to obtain the first full-resolution image from the up-converter and perform filtering enhancement on the first full-resolution image to obtain a second full-resolution image; the calculator is further configured to compute the residual image from the original-resolution image and the second full-resolution image.
  • In an optional manner, the filter enhancer is specifically configured to perform filtering enhancement on the first full-resolution image by using a bilateral filtering algorithm to obtain the second full-resolution image.
  • The calculator computing the residual image from the original-resolution image and the second full-resolution image includes:
  • subtracting the pixel values of the second full-resolution image from those of the original-resolution image at the same positions to obtain the residual image.
  • The self-encoder encoding the residual image according to the self-encoding algorithm to obtain residual image coding information includes:
  • dividing the residual image into a plurality of image blocks, and encoding each image block separately according to preset self-encoder network parameters to obtain the residual image coding information.
  • In a second aspect, a decoding apparatus is provided, including a primary decoder, an up-converter, a secondary decoder, a self-decoder, and a synthesizer, wherein:
  • the primary decoder is configured to obtain primary coding information and decode the primary coding information to obtain a low-resolution image;
  • the up-converter is connected to the primary decoder and is configured to receive the low-resolution image from the primary decoder and convert the low-resolution image to obtain a first full-resolution image;
  • the secondary decoder is configured to obtain secondary coding information and entropy-decode the secondary coding information to obtain residual image coding information;
  • the self-decoder is connected to the secondary decoder and is configured to self-decode the residual image coding information from the secondary decoder to obtain a residual image;
  • the synthesizer is connected to the up-converter and the self-decoder and is configured to synthesize the first full-resolution image from the up-converter with the residual image from the self-decoder to obtain a first original-resolution image.
  • In an optional manner, the decoding apparatus further includes a filter enhancer connected to the up-converter, configured to perform filtering enhancement on the first full-resolution image to obtain a second full-resolution image; the synthesizer is further configured to synthesize the residual image with the second full-resolution image to obtain a second original-resolution image.
  • In an optional manner, the synthesizer is specifically configured to add the first or second full-resolution image to the residual image at the pixel values in the same positions to obtain the first or second original-resolution image.
  • In a third aspect, an encoding method is provided, including: inputting an original-resolution image into a downsampler for downsampling to obtain a low-resolution image; inputting the low-resolution image into a primary encoder for encoding to obtain primary coding information; inputting the low-resolution image into an up-converter for conversion to obtain a first full-resolution image; computing a residual image from the original-resolution image and the first full-resolution image; inputting the residual image into a self-encoder for encoding to obtain residual image coding information; and inputting the residual image coding information into a secondary encoder for entropy coding to obtain secondary coding information.
  • In a fourth aspect, a decoding method is provided, including: obtaining primary coding information and decoding it to obtain a low-resolution image; inputting the low-resolution image into an up-converter for up-conversion to obtain a first full-resolution image; obtaining secondary coding information and inputting it into a secondary decoder for entropy decoding to obtain residual image coding information; inputting the residual image coding information into a self-decoder for self-decoding to obtain a residual image; and synthesizing the residual image with the first full-resolution image to obtain a first original-resolution image.
  • In a fifth aspect, an encoding apparatus is provided that has the function of implementing the encoding apparatus in the method described in the first aspect or the method described in the second aspect.
  • This function may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules (or units) corresponding to the functions described above.
  • In a sixth aspect, a decoding apparatus is provided that has the function of implementing the decoding apparatus in the method described in the second aspect.
  • This function may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules (or units) corresponding to the functions described above.
  • In a seventh aspect, a computer program product is provided, comprising executable program code, where the program code comprises instructions that, when executed by a processor, cause an encoding apparatus to perform the encoding method described in the above aspects.
  • In an eighth aspect, a computer program product is provided, comprising executable program code, where the program code comprises instructions that, when executed by a processor, cause a decoding apparatus to perform the decoding method described in the above aspects.
  • embodiments of the present application provide a computer storage medium for storing computer software instructions for use in an encoding apparatus as described above, including a program designed to perform the above aspects.
  • embodiments of the present application provide a computer storage medium for storing computer software instructions for use in a decoding apparatus as described above, including a program designed to perform the above aspects.
  • a chip system comprising a processor for supporting an encoding device or a decoding device as described above to implement an encoding method or a decoding method as referred to in the above aspect.
  • the chip system further includes a memory for holding program instructions and data necessary for the communication device.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • FIG. 1 is a schematic structural diagram of an encoding apparatus according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a decoding apparatus according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an encoding method according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a decoding method according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a self-encoder according to an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a self-encoding method according to an embodiment of the present application.
  • As shown in FIG. 1, the encoding apparatus includes a downsampler 101, an up-converter 102, a calculator 104, a self-encoder 105, a secondary encoder 106, and a primary encoder 107.
  • The downsampler 101 is configured to receive an original-resolution image, downsample it to obtain a low-resolution image, and send the obtained low-resolution image to the up-converter 102 and the primary encoder 107 for separate processing.
  • The primary encoder 107 is connected to the downsampler 101 and is configured to encode the received low-resolution image and output primary coding information.
  • The up-converter 102 is connected to the downsampler 101 and is configured to convert the low-resolution image sent by the downsampler 101 into a first full-resolution image and send the first full-resolution image to the calculator 104.
  • The calculator 104 is configured to receive the original-resolution image and the first full-resolution image, compute a residual image from them, and send the residual image to the self-encoder 105 for processing.
  • The self-encoder 105 is connected to the calculator 104 and is configured to encode the residual image according to a self-encoding algorithm to obtain residual image coding information and send the residual image coding information to the secondary encoder 106.
  • The secondary encoder 106 is connected to the self-encoder 105 and is configured to entropy-encode the residual image coding information from the self-encoder 105 to obtain secondary coding information.
  • Optionally, the encoding apparatus further includes a filter enhancer 103 connected to the up-converter 102, configured to obtain the first full-resolution image from the up-converter 102 and perform filtering enhancement on it to obtain a second full-resolution image.
  • FIG. 2 is a schematic structural diagram of a decoding apparatus according to an embodiment of the present application.
  • As shown in FIG. 2, the decoding apparatus includes a primary decoder 201, an up-converter 202, a synthesizer 204, a self-decoder 205, and a secondary decoder 206.
  • The primary decoder 201 is configured to obtain primary coding information and decode it to obtain a low-resolution image.
  • The up-converter 202 is connected to the primary decoder 201 and is configured to receive the low-resolution image from the primary decoder 201 and convert it to obtain a first full-resolution image.
  • The secondary decoder 206 is configured to obtain secondary coding information, entropy-decode it to obtain residual image coding information, and output the residual image coding information to the self-decoder 205 for processing.
  • The self-decoder 205 is connected to the secondary decoder 206 and is configured to self-decode the residual image coding information from the secondary decoder 206 to obtain a residual image.
  • The synthesizer 204 is connected to the up-converter 202 and the self-decoder 205, respectively, and is configured to synthesize the first full-resolution image from the up-converter 202 with the residual image from the self-decoder 205 to obtain a first original-resolution image.
  • Optionally, the decoding apparatus further includes a filter enhancer 203 connected to the up-converter 202, configured to perform filtering enhancement on the first full-resolution image to obtain a second full-resolution image.
  • Note that, for ease of description and distinction, this embodiment uses the notions of a self-encoder and a self-decoder; in practical applications, the self-encoder and the self-decoder may belong to one module or device, such as an autoencoder (Auto Encoder, AE), which itself contains both an encoding part and a decoding part: the self-encoder corresponds to the encoding part of the autoencoder, and the self-decoder corresponds to its decoding part.
  • Note also that the modules of the above encoding apparatus and decoding apparatus may be implemented by hardware or software. In one example, the encoding of the primary coding information and the secondary coding information by the encoding apparatus, or the decoding of the primary coding information and the secondary coding information by the decoding apparatus, may be implemented by a software program executing on a programmable device and/or other hardware device, such as a graphics processing unit (GPU), a field-programmable gate array (FPGA), or a central processing unit (CPU), or by a software program executing on a computing device.
  • In another example, the encoding of the primary coding information and the secondary coding information by the encoding apparatus, or the decoding of the primary coding information and the secondary coding information by the decoding apparatus, may be implemented at least partially by hardware and/or by code embedded in an application-specific integrated circuit (ASIC).
  • the encoding method of an embodiment of the present application is described below with reference to FIG. 3. As shown in FIG. 3, the encoding method can be implemented by the encoding apparatus shown in FIG. 1, and includes the following steps:
  • The downsampler reduces the image, i.e., lowers its resolution.
  • For an input original-resolution image of size M×N, downsampling by a factor of s yields an image of size (M/s)×(N/s); of course, s must be a common divisor of M and N.
  • One optional method is to turn each s×s window of the original-resolution image into a single pixel whose value is the mean of all pixels in the window.
  • Another method is to take one pixel from the original-resolution image every (s-1) rows and every (s-1) columns to form a new image.
  • For example, reducing a 4K (3840×2160) image to 1080P (1920×1080) is in fact 2-fold downsampling of the 4K image: each 2×2 window in the 4K image can be turned into a single pixel represented by the mean of all pixels in the window, or one pixel can be taken from every other row and every other column of the original 4K image to form a reduced 1080P image.
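As an illustration only, the following is a minimal sketch of the two downsampling strategies described above (mean pooling over s×s windows, or decimation by keeping every s-th pixel), written with NumPy; the function and variable names are illustrative and are not taken from the patent.

```python
import numpy as np

def downsample_mean(img: np.ndarray, s: int) -> np.ndarray:
    """Downsample by averaging each s x s window (height and width must be divisible by s)."""
    h, w = img.shape[:2]
    assert h % s == 0 and w % s == 0, "s must divide both image dimensions"
    # Reshape so each s x s window gets its own pair of axes, then average over them.
    return img.reshape(h // s, s, w // s, s, *img.shape[2:]).mean(axis=(1, 3))

def downsample_decimate(img: np.ndarray, s: int) -> np.ndarray:
    """Downsample by keeping one pixel every s rows and columns."""
    return img[::s, ::s]

# Example: reduce a synthetic 4K frame (3840 x 2160) to 1080P (1920 x 1080) with s = 2.
frame_4k = np.random.randint(0, 256, size=(2160, 3840), dtype=np.uint8)
low_res = downsample_mean(frame_4k.astype(np.float32), 2).astype(np.uint8)
print(low_res.shape)  # (1080, 1920)
```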
  • The primary encoder may use any coding standard, such as H.264, H.265, or VP9.
  • The low-resolution image is sequentially subjected to predictive coding, transform coding, quantization, loop post-processing, and entropy coding to obtain a binary code-stream file.
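The patent does not mandate a particular encoder implementation; as one hedged example, the primary coding information could be produced by invoking a standard H.265 encoder, here sketched with the ffmpeg command-line tool and libx265 (assumed to be installed; file names and the 8 Mbps target bit rate are illustrative).

```python
import subprocess

# A minimal sketch of the "primary encoder" step, assuming ffmpeg with libx265 is available.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "low_res_1080p.y4m",      # the downsampled (low-resolution) sequence
        "-c:v", "libx265",              # H.265/HEVC, one of the standards mentioned above
        "-b:v", "8M",                   # illustrative target bit rate for the primary stream
        "main_stream.hevc",             # raw HEVC bitstream = primary coding information
    ],
    check=True,
)
```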
  • The role of the up-converter is to convert a low-resolution image into a high-resolution image, which is also known as upsampling, for example upsampling a 1080P image into a 4K image.
  • Interpolation methods are generally used: new elements are inserted between pixels using a suitable interpolation algorithm based on the original image pixels; for example, the classical bicubic interpolation algorithm may be employed.
  • The full-resolution image obtained at this point is of low quality, so it is called a low-quality full-resolution image.
  • The higher-quality full-resolution image is the one whose image quality has been improved by the filtering enhancement processing, relative to the low-quality full-resolution image described above.
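For illustration, the bicubic up-conversion mentioned above can be sketched with OpenCV's resize routine; the cv2 package and the file names are assumptions of this sketch, not part of the patent.

```python
import cv2

# Up-convert a 1080P image to 4K with bicubic interpolation (a sketch, not the patent's implementation).
low_res = cv2.imread("low_res_1080p.png")
full_res_low_quality = cv2.resize(
    low_res, (3840, 2160), interpolation=cv2.INTER_CUBIC  # dsize is (width, height)
)
cv2.imwrite("full_res_low_quality.png", full_res_low_quality)
```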
  • The filter enhancer uses a bilateral filtering algorithm to enhance the low-quality full-resolution image obtained in step 303, in order to remove image grain noise while preserving image edges and detail texture; because the human visual attention mechanism is sensitive to information such as edge texture, the bilateral filtering algorithm can improve the subjective visual quality of the image.
  • The bilateral filtering algorithm is a classical filtering enhancement algorithm and a nonlinear filtering method; it is a compromise that combines the spatial proximity of the image with the similarity of pixel values, considering spatial-domain information and grayscale similarity at the same time to achieve edge-preserving denoising, and it is simple, non-iterative, and local.
  • Step 304 is an optional step.
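A minimal sketch of this optional filtering-enhancement step using OpenCV's bilateral filter follows; the filter diameter and the two sigma values are illustrative choices, not parameters specified in the patent.

```python
import cv2

# Edge-preserving denoising of the low-quality full-resolution image (step 304, optional).
full_res_low_quality = cv2.imread("full_res_low_quality.png")
full_res_high_quality = cv2.bilateralFilter(
    full_res_low_quality,
    d=9,              # diameter of the pixel neighbourhood
    sigmaColor=75,    # how strongly differing intensities are mixed
    sigmaSpace=75,    # how far away pixels still influence each other
)
cv2.imwrite("full_res_high_quality.png", full_res_high_quality)
```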
  • Step 305: perform difference processing on the original-resolution image and the higher-quality full-resolution image obtained in step 304 to obtain a residual image.
  • Denote the original-resolution image as A and the higher-quality full-resolution image as B.
  • Subtracting the pixel values of B from those of A at all corresponding positions yields the residual image.
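Expressed as code, the residual computation is a per-pixel subtraction; a signed integer type is used in this sketch because residual values can be negative (file names are illustrative).

```python
import numpy as np
import cv2

# A: original-resolution image, B: higher-quality full-resolution image from step 304.
A = cv2.imread("original_4k.png").astype(np.int16)
B = cv2.imread("full_res_high_quality.png").astype(np.int16)
residual = A - B   # residual image; values can be negative, range [-255, 255]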
  • the network structure of the constructed self-encoder encodes the plurality of image blocks of the residual image separately. For the specific implementation process, reference may be made to the self-encoding method as shown in FIG. 6.
  • So-called entropy coding is a lossless coding method based on the principle of information entropy, which encodes the input information into a binary stream file.
  • The entropy coding method used here may be context-based adaptive binary arithmetic coding (CABAC), which is also widely used by the H.265 coding standard.
  • The self-encoded output corresponding to each image block in step 306 is input to the entropy encoder and encoded with CABAC into a binary code-stream file, yielding the secondary coding information for each image block.
  • A decoding method according to an embodiment of the present application is described below based on FIG. 4. As shown in FIG. 4, the decoding method includes the following steps:
  • The primary coding information is input to the primary decoder for decoding; the primary decoder may use any decoding standard, for example H.264, H.265, or VP9.
  • The binary code-stream file obtained during encoding is sequentially subjected to entropy decoding, inverse quantization, inverse transform, and other operations to decode the low-resolution image.
  • This step may use the same method as step 303 of FIG. 3, for example the classical bicubic interpolation algorithm, to convert the low-resolution image into a low-quality full-resolution image.
  • The same method as step 304 of FIG. 3 may then be used, for example a bilateral filtering algorithm, to enhance the low-quality full-resolution image so as to remove image grain noise while preserving image edges and detail texture.
  • The secondary coding information is input to the secondary decoder, and the residual image coding information is obtained after entropy decoding.
  • This step is the inverse of step 307 of FIG. 3: the binary stream file is decoded by the CABAC entropy decoding algorithm, each image block is decoded in turn, and the self-encoder L2-layer output corresponding to each image block is obtained.
  • the resolution of the residual image obtained in step 405 is the same as the higher quality full resolution image obtained in step 403.
  • Denote the residual image recovered in step 405 as A and the higher-quality full-resolution image obtained in step 403 as B.
  • Adding the pixel values of A and B at all corresponding positions yields the reconstructed original-resolution image.
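Correspondingly, the synthesis at the decoder is a per-pixel addition; the sketch below also clips back to the 8-bit pixel range, an implementation detail assumed here rather than spelled out in the text.

```python
import numpy as np

def synthesize(full_res: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Add the recovered residual to the (higher-quality) full-resolution image."""
    reconstructed = full_res.astype(np.int16) + residual.astype(np.int16)
    # Clip back to the valid 8-bit pixel range before converting to uint8.
    return np.clip(reconstructed, 0, 255).astype(np.uint8)
```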
  • the decoding method of this embodiment is the inverse of the encoding method of the embodiment of FIG. 3, and the related details may be referred to and applied to each other, and details are not described herein again.
  • the so-called self-encoder is a fully connected neural network, setting the target output value of the network equal to the input value.
  • a self-encoder with 6 nodes is shown in Figure 5:
  • The self-encoder includes three layers.
  • The input layer L1 has six ordinary nodes and one bias node.
  • The circle labeled "+1" is called a bias node.
  • The L2 layer contains three ordinary nodes and one bias node; the output layer L3 contains six ordinary nodes.
  • Each ordinary node of L2 is connected by an edge to every node of the L1 layer, and each such connection carries a weight parameter w.
  • Similarly, each ordinary node of L3 is connected by an edge to every node of the L2 layer, and each such connection also carries a weight parameter w. Taking the output of the L2 layer as an example, the output of the first node of the L2 layer is $a_1^{(2)} = f\big(W_{11}^{(1)}x_1 + W_{12}^{(1)}x_2 + \cdots + W_{16}^{(1)}x_6 + b_1^{(1)}\big)$,
  • where $W_{11}^{(1)}$ is the weight parameter between the first node of the L1 layer and the first node of the L2 layer, $W_{12}^{(1)}$ is the weight parameter between the second node of the L1 layer and the first node of the L2 layer, $W_{13}^{(1)}$ is the weight parameter between the third node of the L1 layer and the first node of the L2 layer, and so on, and $b_1^{(1)}$ is the bias term of the first node of the L2 layer.
  • f(.) represents the activation function, here the sigmoid function is used.
  • In general, the output of layer l can be written as $a^{(l)} = f\big(W^{(l-1)}a^{(l-1)} + b^{(l-1)}\big)$, where $W^{(l-1)}$ represents the set of weight parameters between layer l-1 and layer l,
  • and $b^{(l-1)}$ represents the set of bias-term parameters between layer l-1 and layer l.
  • the key is the calculation of the weight w of the edge connected between each node, which is generally divided into the following steps:
  • the training samples are generally selected to be more than 10,000.
  • The loss function J(W,b), i.e., the degree to which the self-encoder's output deviates from the original input, is generally expressed as the square of the difference between the two, e.g. $J(W,b) = \tfrac{1}{2}\lVert h_{W,b}(x) - x \rVert^2$ for an input x and network output $h_{W,b}(x)$.
  • the back propagation algorithm (BP) is used to calculate the weight w layer by layer.
  • The output of the L3 layer is calculated according to equation (3); this yields a recovery of the original 6-dimensional input vector, which is the decoding process of the self-encoder.
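The structure in FIG. 5 can be sketched as a tiny 6-3-6 fully connected network; the NumPy code below, with randomly initialized parameters standing in for trained ones, shows the encode step (L1 to L2) and the decode step (L2 to L3) just described. It is an illustrative sketch, not the patent's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# 6-3-6 self-encoder: W1/b1 map L1 -> L2 (encoding part), W2/b2 map L2 -> L3 (decoding part).
W1, b1 = rng.normal(scale=0.1, size=(3, 6)), np.zeros(3)
W2, b2 = rng.normal(scale=0.1, size=(6, 3)), np.zeros(6)

x = rng.random(6)              # a 6-dimensional input vector (normalized to [0, 1))
a2 = sigmoid(W1 @ x + b1)      # L2 output: the self-encoded representation (3 values)
a3 = sigmoid(W2 @ a2 + b2)     # L3 output: approximate recovery of the original input
```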
  • the self-encoding method in the embodiment of the present application is further introduced in the following with reference to FIG. 6.
  • the self-encoding method includes the following steps:
  • Each image block contains N pixels, so the corresponding input layer L1 has N ordinary nodes, one per pixel.
  • The output layer L3 contains N ordinary nodes.
  • Each ordinary node of L2 is connected by an edge to every node of the L1 layer, and each such connection carries a weight parameter w; similarly, each ordinary node of L3 is connected by an edge to every node of the L2 layer, and each such connection also carries a weight parameter w.
  • The output of the first node of the L2 layer is $a_1^{(2)} = f\big(W_{11}^{(1)}x_1 + W_{12}^{(1)}x_2 + \cdots + W_{1N}^{(1)}x_N + b_1^{(1)}\big)$,
  • where $W_{1N}^{(1)}$ is the weight parameter between the N-th node of the L1 layer and the first node of the L2 layer, and $b_1^{(1)}$ is the bias-node weight parameter between the L1 layer and the first node of the L2 layer.
  • f(·) represents the activation function; the sigmoid function is used here.
  • The L3 layer has a total of N nodes; similarly, the outputs of the first, second, ..., and N-th nodes of the L3 layer are $a_i^{(3)} = f\big(W_{i1}^{(2)}a_1^{(2)} + W_{i2}^{(2)}a_2^{(2)} + \cdots + b_i^{(2)}\big)$ for i = 1, 2, ..., N.
  • The N outputs of the L3 layer together form the recovery that is intended to equal the original input image block of N pixels.
  • Another possible way to construct the loss function is to add a sparsity constraint on the intermediate layer L2.
  • The so-called sparsity constraint means that when the output of an L2 node is close to 1 we consider it activated, and when its output is close to 0 we consider it suppressed; the sparsity constraint is satisfied when as many L2 nodes as possible are suppressed.
  • The average activation $\hat{\rho}_j$ of the j-th output of the L2 layer, taken with respect to an image block containing N pixels ($x_1$ to $x_N$), is computed, and the constraint $\hat{\rho}_j = \rho$ is imposed, where $\rho$ is the sparsity coefficient, usually a small value close to zero, such as 0.05.
  • To enforce this, an additional penalty factor is added to the constructed loss function that penalizes those $\hat{\rho}_j$ that differ significantly from $\rho$, thereby keeping the average activation of each L2 node within a small range; a common choice for such a penalty factor is the Kullback-Leibler divergence $\sum_j \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$.
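As an illustration of the sparsity-penalty idea (assuming, as is common for sparse autoencoders, a KL-divergence penalty; the patent's exact penalty formula is not reproduced here), the sketch below computes the average L2 activations over a batch of inputs and the corresponding penalty term.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparsity_penalty(W1, b1, X, rho=0.05):
    """KL-divergence sparsity penalty for the hidden (L2) layer of a sigmoid self-encoder.

    X has one training sample (image block) per row; rho is the sparsity coefficient.
    """
    A2 = sigmoid(X @ W1.T + b1)          # L2 activations for every sample, shape (m, hidden)
    rho_hat = A2.mean(axis=0)            # average activation of each L2 node
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return kl.sum()
```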
  • The goal is to determine the optimal parameters W and b that minimize J(W,b), which is achieved by the gradient descent method.
  • The network parameters of the self-encoder are initialized with random numbers, and the training samples are fed into the self-encoder in turn; the loss function J(W,b) is then obtained from the difference between the output of the self-encoder and the original input. Denoting the parameters of layer l (there are three layers in total) as $W^{(l)}$ and the bias terms of layer l as $b^{(l)}$, the partial derivatives are computed and the parameters are updated as $W^{(l)} := W^{(l)} - \alpha\,\partial J(W,b)/\partial W^{(l)}$ and $b^{(l)} := b^{(l)} - \alpha\,\partial J(W,b)/\partial b^{(l)}$.
  • Here $\alpha$ is called the learning rate; its value generally lies in [0.01, 0.1], a relatively small real number.
  • The key to the gradient descent method is computing the partial derivatives of the cost function J(W,b) with respect to the parameters of each layer, which is done with the back-propagation (BP) algorithm (a classical algorithm that is not described in detail here).
  • In this way, the final optimal weight parameters $W^{(l)}$ and bias-term parameters $b^{(l)}$ of each layer of the self-encoder are obtained.
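To make the training procedure concrete, here is a minimal sketch of one gradient-descent step with back-propagation for a three-layer sigmoid self-encoder, using the squared-difference loss mentioned above (NumPy; the layer sizes, learning rate, and the omission of the sparsity penalty are illustrative simplifications, not the patent's settings).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W1, b1, W2, b2, x, alpha=0.05):
    """One gradient-descent step on J = 0.5 * ||a3 - x||^2 for a single training sample x."""
    # Forward pass (L1 -> L2 -> L3).
    a2 = sigmoid(W1 @ x + b1)
    a3 = sigmoid(W2 @ a2 + b2)
    # Back-propagation of the error through the sigmoid layers.
    delta3 = (a3 - x) * a3 * (1 - a3)
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)
    # Gradient-descent parameter updates.
    W2 -= alpha * np.outer(delta3, a2)
    b2 -= alpha * delta3
    W1 -= alpha * np.outer(delta2, x)
    b1 -= alpha * delta2
    return 0.5 * np.sum((a3 - x) ** 2)   # current loss for this sample

# Illustrative use: a 16-pixel image block compressed to 8 hidden nodes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(8, 16)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(16, 8)), np.zeros(16)
x = rng.random(16)
for _ in range(1000):
    loss = train_step(W1, b1, W2, b2, x)
print(f"loss after training: {loss:.6f}")
```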
  • each image block of the residual image is input to an encoding portion of the self-encoder to be encoded.
  • The encoding portion of the self-encoder refers to the input layer L1, the intermediate layer L2, and the network structure between them.
  • the decoding process of the self-encoder corresponds to the encoding process inversely. Specifically, as shown in step 405 of FIG. 4, the entropy decoded result is input to the decoding part of the self-encoder, and the residual image is restored.
  • the specific process is as follows:
  • The decoding portion of the self-encoder refers to the intermediate layer L2, the output layer L3, and the network structure between them. After the network structure of the self-encoder has been constructed and the network parameters trained, all that needs to be done is to take the entropy-decoded result, i.e., the L2-layer output of the self-encoder, feed it into the L3 layer, and compute according to formula (6); the resulting L3-layer output is the decoding result of the self-encoder, that is, an approximate recovery of the original image block containing N pixels.
  • The entropy-decoded results of all k image blocks m_1, ..., m_k of the original residual image are sequentially input to the decoding portion of the self-encoder, and after decoding the approximate restorations m'_1, ..., m'_k of the image blocks are obtained.
  • These approximately restored image blocks are then combined according to the positions of m_1, ..., m_k in the original residual image to obtain the restored residual image.
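A sketch of the block splitting used before self-encoding and the corresponding reassembly after self-decoding is given below; the block size and helper names are illustrative assumptions, since the patent does not fix a block size.

```python
import numpy as np

def split_into_blocks(image: np.ndarray, block: int):
    """Split a residual image (H x W, both divisible by `block`) into a list of flattened blocks."""
    h, w = image.shape
    return [
        image[r:r + block, c:c + block].ravel()
        for r in range(0, h, block)
        for c in range(0, w, block)
    ]

def merge_blocks(blocks, image_shape, block: int) -> np.ndarray:
    """Reassemble decoded blocks back into an image, in the same scan order used for splitting."""
    h, w = image_shape
    out = np.empty(image_shape, dtype=blocks[0].dtype)
    idx = 0
    for r in range(0, h, block):
        for c in range(0, w, block):
            out[r:r + block, c:c + block] = blocks[idx].reshape(block, block)
            idx += 1
    return out
```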
  • Because the self-encoder-based codec method is nonlinear, the original residual image suffers less distortion during encoding and decoding, so the reconstructed video image has better subjective and objective quality.
  • Moreover, whereas schemes in which both a pre-processing module and a post-processing module need to store a reference image matrix involve a large amount of data, the self-encoder-based encoding and decoding method of this embodiment only needs to store the parameters of the encoding part of the self-encoder at the encoder (the parameters between the input layer L1 and the intermediate layer L2) and the parameters of the decoding part at the decoder (the parameters between the intermediate layer L2 and the output layer L3); the amount of data is small, which saves storage space.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division into units is only a logical function division; in actual implementation there may be other division manners.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, or an electrical, mechanical or other form of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • The technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product stored in a storage medium.
  • The software product includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Abstract

Provided in an embodiment of the present invention are an encoding and decoding method and a device. The encoding method comprises: first, performing a down sampling on an original resolution image to obtain a low resolution image, and then inputting the low resolution image into a main encoder for encoding to obtain main coding information; then, inputting the low resolution image to an up converter for up conversion processing to obtain a full resolution image; calculating a residual image according to the original resolution image and the full resolution image; inputting the residual image into an auto-encoder for encoding, and acquiring residual image encoding information; and inputting the residual image encoding information to a secondary encoder for entropy coding to obtain secondary coding information. In this way, 4K or higher quality images can be split into main coded information and auxiliary coded information to be transmitted separately. With this method, 4K or higher quality video transmission can be realized under the current network environment, and the reconstructed video image quality can be guaranteed.

Description

Encoding and decoding method and device
This application claims priority to Chinese Patent Application No. 201710861871.6, filed with the Chinese Patent Office on September 21, 2017 and entitled "Encoding and decoding method and device", which is incorporated herein by reference in its entirety.
Technical field
The present application relates to image processing technologies, and in particular, to an encoding and decoding method and apparatus.
Background
As user demand for 4K video grows, 4K TVs are reaching ordinary households and providing users with ultra-high-definition viewing. A consulting report by a UK market research organization projected that global 4K TV shipments would reach 100 million units in 2018, with the Chinese market accounting for 70% of global 4K TV demand. However, whereas 1080P video needs only 8 to 10 Mbps for good picture quality, 4K video demands far more transmission bandwidth: basic 4K60P video requires a bit rate of 30 to 50 Mbps to guarantee a good viewing experience. According to statistics, China's current access-network bandwidth is only 20 Mbps, with a peak rate of 18.4 Mbps and an average speed of merely 3.4 Mbps. Under these conditions, conventional general-purpose video codec technology cannot guarantee the transmission of high-quality 4K video in the existing bandwidth environment.
Summary of the invention
In view of this, the present application provides an encoding method and apparatus that enable transmission of high-quality video under the existing bandwidth environment.
In a first aspect, an encoding apparatus is provided, including a downsampler, an up-converter, a calculator, a self-encoder, a primary encoder, and a secondary encoder, wherein:
the downsampler is configured to receive an original-resolution image, downsample the original-resolution image to obtain a low-resolution image, and send the obtained low-resolution image to the up-converter and the primary encoder;
the primary encoder is connected to the downsampler and is configured to encode the received low-resolution image to obtain primary coding information;
the up-converter is connected to the downsampler and is configured to convert the low-resolution image sent by the downsampler into a first full-resolution image and send the first full-resolution image to the calculator;
the calculator is configured to receive the original-resolution image and the first full-resolution image, compute a residual image from the original-resolution image and the first full-resolution image, and send the residual image to the self-encoder for processing;
the self-encoder is connected to the calculator and is configured to encode the residual image according to a self-encoding algorithm to obtain residual image coding information and send the residual image coding information to the secondary encoder;
the secondary encoder is connected to the self-encoder and is configured to entropy-encode the residual image coding information from the self-encoder to obtain secondary coding information.
In an optional manner of the first aspect, the encoding apparatus further includes a filter enhancer connected to the up-converter, configured to obtain the first full-resolution image from the up-converter and perform filtering enhancement on the first full-resolution image to obtain a second full-resolution image; the calculator is further configured to compute the residual image from the original-resolution image and the second full-resolution image.
In an optional manner of the first aspect, the filter enhancer is specifically configured to perform filtering enhancement on the first full-resolution image by using a bilateral filtering algorithm to obtain the second full-resolution image.
In an optional manner of the first aspect, the calculator computing the residual image from the original-resolution image and the second full-resolution image includes:
subtracting the pixel values of the second full-resolution image from those of the original-resolution image at the same positions to obtain the residual image.
In an optional manner of the first aspect, the self-encoder encoding the residual image according to the self-encoding algorithm to obtain residual image coding information includes:
dividing the residual image into a plurality of image blocks;
encoding each image block separately according to preset self-encoder network parameters to obtain the residual image coding information.
In a second aspect, a decoding apparatus is provided, including a primary decoder, an up-converter, a secondary decoder, a self-decoder, and a synthesizer, wherein:
the primary decoder is configured to obtain primary coding information and decode the primary coding information to obtain a low-resolution image;
the up-converter is connected to the primary decoder and is configured to receive the low-resolution image from the primary decoder and convert the low-resolution image to obtain a first full-resolution image;
the secondary decoder is configured to obtain secondary coding information and entropy-decode the secondary coding information to obtain residual image coding information;
the self-decoder is connected to the secondary decoder and is configured to self-decode the residual image coding information from the secondary decoder to obtain a residual image;
the synthesizer is connected to the up-converter and the self-decoder and is configured to synthesize the first full-resolution image from the up-converter with the residual image from the self-decoder to obtain a first original-resolution image.
In an optional manner of the second aspect, the decoding apparatus further includes a filter enhancer connected to the up-converter, configured to perform filtering enhancement on the first full-resolution image to obtain a second full-resolution image; the synthesizer is further configured to synthesize the residual image with the second full-resolution image to obtain a second original-resolution image.
In an optional manner of the second aspect, the synthesizer is specifically configured to:
add the first or second full-resolution image to the residual image at the pixel values in the same positions to obtain the first or second original-resolution image.
In a third aspect, an encoding method is provided, including:
inputting an original-resolution image into a downsampler for downsampling to obtain a low-resolution image;
inputting the low-resolution image into a primary encoder for encoding to obtain primary coding information;
inputting the low-resolution image into an up-converter for conversion to obtain a first full-resolution image;
computing a residual image from the original-resolution image and the first full-resolution image;
inputting the residual image into a self-encoder for encoding to obtain residual image coding information;
inputting the residual image coding information into a secondary encoder for entropy coding to obtain secondary coding information.
In a fourth aspect, a decoding method is provided, including:
obtaining primary coding information and decoding the primary coding information to obtain a low-resolution image;
inputting the low-resolution image into an up-converter for up-conversion to obtain a first full-resolution image;
obtaining secondary coding information and inputting the secondary coding information into a secondary decoder for entropy decoding to obtain residual image coding information;
inputting the residual image coding information into a self-decoder for self-decoding to obtain a residual image;
synthesizing the residual image with the first full-resolution image to obtain a first original-resolution image.
In a fifth aspect, an encoding apparatus is provided that has the function of implementing the encoding apparatus in the method described in the first aspect or the method described in the second aspect. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules (or units) corresponding to the functions described above.
In a sixth aspect, a decoding apparatus is provided that has the function of implementing the decoding apparatus in the method described in the second aspect. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules (or units) corresponding to the functions described above.
In a seventh aspect, a computer program product is provided, comprising executable program code, where the program code comprises instructions that, when executed by a processor, cause an encoding apparatus to perform the encoding method described in the above aspects.
In an eighth aspect, a computer program product is provided, comprising executable program code, where the program code comprises instructions that, when executed by a processor, cause a decoding apparatus to perform the decoding method described in the above aspects.
In a ninth aspect, embodiments of the present application provide a computer storage medium for storing computer software instructions used by the encoding apparatus described above, including a program designed to perform the above aspects.
In a tenth aspect, embodiments of the present application provide a computer storage medium for storing computer software instructions used by the decoding apparatus described above, including a program designed to perform the above aspects.
In an eleventh aspect, a chip system is provided, the chip system including a processor configured to support the encoding apparatus or decoding apparatus described above in implementing the encoding method or decoding method referred to in the above aspects. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the communication device. The chip system may consist of chips, or may include chips and other discrete devices.
Through the above aspects, 4K or higher-quality video transmission can be achieved under the current network environment while guaranteeing the quality of the reconstructed video images.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of an encoding apparatus according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a decoding apparatus according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an encoding method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a decoding method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a self-encoder according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a self-encoding method according to an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application.
FIG. 1 is a schematic structural diagram of an encoding apparatus according to an embodiment of the present application. As shown in FIG. 1, the encoding apparatus includes a downsampler 101, an up-converter 102, a calculator 104, a self-encoder 105, a secondary encoder 106, and a primary encoder 107. The downsampler 101 is configured to receive an original-resolution image, downsample it to obtain a low-resolution image, and send the obtained low-resolution image to the up-converter 102 and the primary encoder 107 for separate processing. The primary encoder 107 is connected to the downsampler 101 and is configured to encode the received low-resolution image and output primary coding information. The up-converter 102 is connected to the downsampler 101 and is configured to convert the low-resolution image sent by the downsampler 101 into a first full-resolution image and send the first full-resolution image to the calculator 104. The calculator 104 is configured to receive the original-resolution image and the first full-resolution image, compute a residual image from them, and send the residual image to the self-encoder 105 for processing. The self-encoder 105 is connected to the calculator 104 and is configured to encode the residual image according to a self-encoding algorithm to obtain residual image coding information and send the residual image coding information to the secondary encoder 106. The secondary encoder 106 is connected to the self-encoder 105 and is configured to entropy-encode the residual image coding information from the self-encoder 105 to obtain secondary coding information.
Optionally, the encoding apparatus further includes a filter enhancer 103 connected to the up-converter 102, configured to obtain the first full-resolution image from the up-converter 102 and perform filtering enhancement on it to obtain a second full-resolution image.
FIG. 2 is a schematic structural diagram of a decoding apparatus according to an embodiment of the present application. As shown in FIG. 2, the decoding apparatus includes a primary decoder 201, an up-converter 202, a synthesizer 204, a self-decoder 205, and a secondary decoder 206. The primary decoder 201 is configured to obtain primary coding information and decode it to obtain a low-resolution image. The up-converter 202 is connected to the primary decoder 201 and is configured to receive the low-resolution image from the primary decoder 201 and convert it to obtain a first full-resolution image. The secondary decoder 206 is configured to obtain secondary coding information, entropy-decode it to obtain residual image coding information, and output the residual image coding information to the self-decoder 205 for processing. The self-decoder 205 is connected to the secondary decoder 206 and is configured to self-decode the residual image coding information from the secondary decoder 206 to obtain a residual image. The synthesizer 204 is connected to the up-converter 202 and the self-decoder 205, respectively, and is configured to synthesize the first full-resolution image from the up-converter 202 with the residual image from the self-decoder 205 to obtain a first original-resolution image.
Optionally, the decoding apparatus further includes a filter enhancer 203 connected to the up-converter 202, configured to perform filtering enhancement on the first full-resolution image to obtain a second full-resolution image.
It should be noted that, for ease of description and distinction, this embodiment uses the notions of a self-encoder and a self-decoder; in practical applications, the self-encoder and the self-decoder may belong to one module or device, such as an autoencoder (Auto Encoder, AE), which itself contains both an encoding part and a decoding part: the self-encoder corresponds to the encoding part of the autoencoder, and the self-decoder corresponds to its decoding part.
It should also be noted that the modules of the above encoding apparatus and decoding apparatus may be implemented by hardware or software. In one example, the encoding of the primary coding information and the secondary coding information by the encoding apparatus, or the decoding of the primary coding information and the secondary coding information by the decoding apparatus, may be implemented by a software program executing on a programmable device and/or other hardware device, such as a graphics processing unit (GPU), a field-programmable gate array (FPGA), or a central processing unit (CPU), or by a software program executing on a computing device. In another example, the encoding of the primary coding information and the secondary coding information by the encoding apparatus, or the decoding of the primary coding information and the secondary coding information by the decoding apparatus, may be implemented at least partially by hardware and/or by code embedded in an application-specific integrated circuit (ASIC).
An encoding method according to an embodiment of the present application is described below with reference to FIG. 3. As shown in FIG. 3, the method may be carried out by the encoding apparatus shown in FIG. 1 and includes the following steps.
301. Input the original-resolution image into the downsampler to compute a low-resolution image.
Specifically, the downsampler shrinks the image and reduces its resolution. For an input original-resolution image of size M×N, downsampling by a factor s yields an image of size (M/s)×(N/s); s must of course be a common divisor of M and N.
One option is to map each s×s window of the original-resolution image to a single pixel whose value is the mean of all pixels in the window. Another option is to take one pixel every (s-1) rows and every (s-1) columns of the original-resolution image, i.e. every s-th row and column, to form a new image. For example, shrinking a 4K (3840×2160) image to 1080P (1920×1080) is a 2× downsampling of the 4K source: each 2×2 window of the 4K image can be replaced by one pixel equal to the mean of the pixels in the window, or one pixel can be taken from every other row and every other column of the original 4K image to form the reduced 1080P image.
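The two variants can be illustrated with a short NumPy sketch; the function names and the synthetic single-channel test frame are assumptions made for this example and do not appear in the application.

```python
import numpy as np

def downsample_mean(img, s):
    # Replace each s x s window by the mean of its pixels; s must divide both sides.
    m, n = img.shape
    return img.reshape(m // s, s, n // s, s).mean(axis=(1, 3))

def downsample_stride(img, s):
    # Keep one pixel every s rows and every s columns.
    return img[::s, ::s]

frame_4k = np.random.randint(0, 256, (2160, 3840)).astype(np.float32)
low_res = downsample_mean(frame_4k, 2)   # 2x downsampling: 4K -> 1080P
print(low_res.shape)                     # (1080, 1920)
```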
302. Input the low-resolution image into the primary encoder for encoding and output the primary coding information.
Optionally, the primary encoder may use any coding standard, for example H.264, H.265 or VP9. Taking the H.265 standard as an example, the low-resolution image is subjected in turn to predictive coding, transform coding, quantization, in-loop post-processing and entropy coding to obtain a binary bit-stream file.
303. Feed the low-resolution image into the up-converter to obtain a low-quality full-resolution image.
The up-converter converts a low-resolution image into a high-resolution image, which is also known as upsampling, e.g. upsampling a 1080P image to a 4K image. Interpolation is generally used, i.e. new samples are inserted between the original pixels with a suitable interpolation algorithm, for example the classical bicubic interpolation algorithm. The full-resolution image obtained at this point is of relatively low quality and is therefore called a low-quality full-resolution image.
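As one possible realisation of the up-converter, the sketch below uses OpenCV's bicubic interpolation; OpenCV and the synthetic input are illustrative choices, not part of the application.

```python
import cv2
import numpy as np

low_res = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)  # 1080P luma plane
# Bicubic up-conversion back to 4K; cv2.resize expects the target size as (width, height).
full_res = cv2.resize(low_res, (3840, 2160), interpolation=cv2.INTER_CUBIC)
print(full_res.shape)  # (2160, 3840)
```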
304. Apply filter enhancement to the low-quality full-resolution image to obtain a higher-quality full-resolution image.
The higher-quality full-resolution image is so called relative to the low-quality full-resolution image described above: after filter enhancement the image quality is improved.
Specifically, the filter enhancer applies a bilateral filter to the low-quality full-resolution image obtained in step 303 in order to remove grain noise while preserving edges and fine textures; since the human visual attention mechanism is sensitive to such edge and texture information, bilateral filtering improves the subjective visual quality of the image. The bilateral filter is a classical, non-linear enhancement filter that trades off the spatial proximity of the image against pixel-value similarity, taking both spatial information and grey-level similarity into account to achieve edge-preserving denoising; it is simple, non-iterative and local.
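A minimal sketch of this enhancement step with OpenCV's bilateral filter follows; the neighbourhood diameter and the two sigma values are assumed example settings rather than values specified in the application.

```python
import cv2
import numpy as np

low_quality = np.random.randint(0, 256, (2160, 3840), dtype=np.uint8)  # low-quality full-resolution image
# d: pixel neighbourhood diameter; sigmaColor / sigmaSpace control the range and spatial kernels.
enhanced = cv2.bilateralFilter(low_quality, 9, 75, 75)
```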
It should be noted that step 304 is optional.
305. Compute the difference between the original-resolution image and the higher-quality full-resolution image obtained in step 304 to obtain a residual image.
Specifically, let the original-resolution image be A and the higher-quality full-resolution image be B; subtracting the pixel values of B from those of A at every position yields the residual image.
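A sketch of this pixel-wise subtraction is given below; casting to a signed type, because residual values can be negative, is an implementation detail assumed here rather than stated in the application.

```python
import numpy as np

def compute_residual(original, predicted):
    # Pixel-wise A - B at every position; cast to a signed type first.
    return original.astype(np.int16) - predicted.astype(np.int16)

A = np.random.randint(0, 256, (2160, 3840), dtype=np.uint8)  # original-resolution image
B = np.random.randint(0, 256, (2160, 3840), dtype=np.uint8)  # enhanced full-resolution image
residual = compute_residual(A, B)
```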
306. Encode the residual image with the auto-encoder to obtain residual-image coding information.
Specifically, the residual image is first divided into a number of n×n image blocks, each containing N = n×n pixels; next, the network structure of the auto-encoder is constructed (see FIG. 5); the constructed auto-encoder network is then used to encode each image block of the residual image. For the detailed procedure, refer to the auto-encoding method shown in FIG. 6.
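The block partitioning can be sketched as follows; the block size n = 8 and the helper name are assumptions chosen only so the example runs.

```python
import numpy as np

def split_into_blocks(residual, n):
    # Cut an image whose sides are multiples of n into flattened n x n blocks;
    # each row of the result holds the N = n*n values fed to the auto-encoder input layer.
    h, w = residual.shape
    blocks = residual.reshape(h // n, n, w // n, n).swapaxes(1, 2)
    return blocks.reshape(-1, n * n)

residual = np.random.randint(-255, 256, (2160, 3840)).astype(np.float32)
X = split_into_blocks(residual, 8)
print(X.shape)  # (129600, 64): 129600 blocks of N = 64 pixels each
```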
307. Feed the residual-image coding information produced by the auto-encoder into the secondary encoder for entropy coding to generate the final secondary coding information.
Specifically, entropy coding is lossless coding based on the information-entropy principle; it encodes the input information into a binary bit-stream file. The entropy coding method used here may be Context-based Adaptive Binary Arithmetic Coding (CABAC), which is also widely used in the H.265 coding standard. Concretely, the auto-encoded output corresponding to each image block in step 306 is fed into the entropy encoder and CABAC-coded into a binary bit-stream file, yielding the secondary coding information of each image block.
A decoding method according to an embodiment of the present application is described below with reference to FIG. 4. As shown in FIG. 4, the decoding method includes the following steps.
401. Obtain the primary coding information and decode it to obtain a low-resolution image.
Specifically, the primary coding information is fed into the primary decoder for decoding; the primary decoder may use any decoding standard, for example H.264, H.265 or VP9. Taking the H.265 standard as an example, the binary bit-stream file produced by encoding is subjected in turn to entropy decoding, inverse quantization, inverse transform and so on to decode the low-resolution image.
402. Feed the low-resolution image into the up-converter for up-conversion to obtain a low-quality full-resolution image.
Optionally, this step may use the same method as step 303 in FIG. 3, for example the classical bicubic interpolation algorithm, to convert the low-resolution image into a low-quality full-resolution image.
403. Apply filter enhancement to the low-quality full-resolution image to obtain a higher-quality full-resolution image.
Optionally, this step may use the same method as step 304 in FIG. 3, for example a bilateral filter, to enhance the low-quality full-resolution image so as to remove grain noise while preserving edges and fine textures.
404. Obtain the secondary coding information and entropy-decode it to obtain the residual-image coding information.
Specifically, the secondary coding information is fed into the secondary decoder and entropy-decoded to obtain the residual-image coding information.
This step is the inverse of step 307 in FIG. 3: the binary bit-stream file is decoded with the CABAC entropy-decoding algorithm, and each image block is decoded in turn, yielding the auto-encoded L2-layer outputs $a_1^{(2)}$ to $a_M^{(2)}$ corresponding to each image block.
405. Feed the entropy-decoded residual-image coding information into the decoding part of the auto-encoder to recover the residual image.
For the detailed implementation, refer to the decoding process of the auto-encoder described below.
It should be noted that the resolution of the residual image obtained in step 405 is the same as that of the higher-quality full-resolution image obtained in step 403.
406. Synthesize the residual image recovered in step 405 with the higher-quality full-resolution image obtained in step 403 to obtain the reconstructed original-resolution image.
Specifically, let the residual image recovered in step 405 be A and the higher-quality full-resolution image obtained in step 403 be B; adding the pixel values of A and B at every position yields the reconstructed original-resolution image.
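The synthesis step amounts to a clipped pixel-wise addition, sketched below; clipping to the 8-bit range is an assumption made for the example.

```python
import numpy as np

full_res = np.random.randint(0, 256, (2160, 3840), dtype=np.uint8)    # higher-quality full-resolution image B
residual = np.random.randint(-30, 31, (2160, 3840)).astype(np.int16)  # recovered residual image A
# Add A and B at every position and clip back to the valid 8-bit pixel range.
reconstructed = np.clip(full_res.astype(np.int16) + residual, 0, 255).astype(np.uint8)
```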
The decoding method of this embodiment is the inverse of the encoding method of the embodiment of FIG. 3; the related details may be referenced and applied to each other and are not repeated here.
The auto-encoder used in the embodiments of the present application is further described below with reference to FIG. 5.
An auto-encoder is a fully connected neural network whose target output is set equal to its input. An auto-encoder with 6 input nodes is shown in FIG. 5.
This auto-encoder self-encodes a 6-dimensional vector x = [x1, x2, …, x6] such that the output satisfies $h_{W,b}(x) = x$. It has three layers. The input layer L1 has 6 ordinary nodes and one bias node (the circles marked "+1" are bias nodes); layer L2 has 3 ordinary nodes and one bias node; the output layer L3 has 6 ordinary nodes. Every ordinary node of L2 is connected by an edge to every node of L1, with a weight parameter w on each edge; every ordinary node of L3 is likewise connected by an edge to every node of L2, again with a weight parameter w on each edge. Taking the output of layer L2 as an example, the output $a_1^{(2)}$ of the first node of L2 is:

$$a_1^{(2)} = f\left(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + W_{14}^{(1)} x_4 + W_{15}^{(1)} x_5 + W_{16}^{(1)} x_6 + b_1^{(1)}\right) \qquad (1)$$

where $W_{11}^{(1)}$ is the weight parameter between the first node of L1 and the first node of L2, $W_{12}^{(1)}$ is the weight parameter between the second node of L1 and the first node of L2, $W_{13}^{(1)}$ is the weight parameter between the third node of L1 and the first node of L2, and so on; $b_1^{(1)}$ is the weight parameter between the bias node of L1 and the first node of L2. f(·) denotes the activation function, here the sigmoid function. By analogy, the outputs $a_2^{(2)}$ and $a_3^{(2)}$ of the second and third nodes of layer L2 are:

$$a_i^{(2)} = f\left(\sum_{j=1}^{6} W_{ij}^{(1)} x_j + b_i^{(1)}\right), \quad i = 2, 3 \qquad (2)$$

If $W^{l-1}$ denotes the set of weight parameters between layer l-1 and layer l, and $b^{l-1}$ denotes the set of bias parameters between layer l-1 and layer l, the output of the l-th fully connected layer can be written simply as $X^{l} = W^{l-1} X^{l-1} + b^{l-1}$.

By analogy with layer L2, the output of each node of the output layer L3 is:

$$a_i^{(3)} = f\left(\sum_{j=1}^{3} W_{ij}^{(2)} a_j^{(2)} + b_i^{(2)}\right), \quad i = 1, \ldots, 6 \qquad (3)$$
The output values equal the input values, and this is what forms an auto-encoder. The key is computing the weight w of every edge between connected nodes, which generally proceeds in the following steps.
First, a large number of 6-dimensional training samples are collected as input to the auto-encoder; to ensure accurate parameter training, more than 10,000 training samples are generally selected.
Second, a loss function J(W,b) is constructed, i.e. a measure of how much the output of the auto-encoder deviates from the original input, generally expressed as the square of the difference between the two.
Finally, based on the loss function, the weights w are computed layer by layer using the back-propagation (BP) algorithm.
Once the parameters of the auto-encoder have been trained, an actual 6-dimensional input vector x = [x1, x2, …, x6] can be auto-encoded: the 3-dimensional L2 output a = [a1, a2, a3] is computed from the parameters between layers L1 and L2 according to formulas (1) and (2). This is the encoding process of the auto-encoder, and it compresses the original data well. If a sparsity constraint is added to layer L2 so that as many L2 node outputs as possible are 0, the amount of data can be compressed further.
From the parameters between layers L2 and L3, the L3 output is computed according to formula (3), recovering the original actual 6-dimensional input; this is the decoding process of the auto-encoder.
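The encoding and decoding passes of the 6-3-6 auto-encoder of FIG. 5 can be sketched as below; the random parameter values stand in for the trained weights, an assumption made only so the example runs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 6)), np.zeros(3)   # L1 -> L2 weights and bias (encoding part)
W2, b2 = rng.normal(size=(6, 3)), np.zeros(6)   # L2 -> L3 weights and bias (decoding part)

x = rng.random(6)                # a 6-dimensional input vector
a2 = sigmoid(W1 @ x + b1)        # encoding: 3-dimensional L2 output, cf. formulas (1)-(2)
x_hat = sigmoid(W2 @ a2 + b2)    # decoding: 6-dimensional L3 output, cf. formula (3)
```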
The auto-encoding method used in the embodiments of the present application is further described below with reference to FIG. 6; it includes the following steps.
601. Divide the residual image into a plurality of image blocks.
Specifically, the residual image may be divided into n×n image blocks, each containing N = n×n pixels.
602. Construct the network structure of the auto-encoder.
In one implementation, the three-layer network shown in FIG. 5 can be used, but with a different number of nodes: since each image block contains N pixels, the corresponding input layer L1 has N ordinary nodes (one per pixel) and one bias node; layer L2 has M = N/2 ordinary nodes (or some other number smaller than N) and one bias node; the output layer L3 has N ordinary nodes. Every ordinary node of L2 is connected by an edge to every node of L1, with a weight parameter w on each edge; likewise, every ordinary node of L3 is connected by an edge to every node of L2, again with a weight parameter w on each edge. Taking the output of layer L2 as an example, the output $a_1^{(2)}$ of the first node of L2 is:

$$a_1^{(2)} = f\left(\sum_{j=1}^{N} W_{1j}^{(1)} x_j + b_1^{(1)}\right) \qquad (4)$$

where $W_{11}^{(1)}$ is the weight parameter between the first node of L1 and the first node of L2, $W_{12}^{(1)}$ is the weight parameter between the second node of L1 and the first node of L2, and so on up to $W_{1N}^{(1)}$, the weight parameter between the N-th node of L1 and the first node of L2; $b_1^{(1)}$ is the weight parameter between the bias node of L1 and the first node of L2. f(·) denotes the activation function, here the sigmoid function. By analogy, the outputs of the second through M-th nodes of layer L2 are:

$$a_i^{(2)} = f\left(\sum_{j=1}^{N} W_{ij}^{(1)} x_j + b_i^{(1)}\right), \quad i = 2, \ldots, M \qquad (5)$$

Layer L3 has N nodes in total; similarly, the outputs of the first through N-th nodes of L3 are:

$$a_i^{(3)} = f\left(\sum_{j=1}^{M} W_{ij}^{(2)} a_j^{(2)} + b_i^{(2)}\right), \quad i = 1, \ldots, N \qquad (6)$$
The goal of the auto-encoder is to make the output equal to the input, here to make the N outputs of layer L3 equal to the original image-block input of N pixels.
603. Train the network parameters of the auto-encoder, i.e. the weight w of every edge between connected nodes.
First, select video images of multiple types, including cartoons, indoor scenes, outdoor scenes and so on, and divide them into image blocks of N = n×n pixels as training samples. To ensure that the network parameters are learned adequately, it is best to select a sufficiently large number of training samples, generally more than 10,000.
Second, construct the loss function J(W,b), i.e. the degree to which the output of the auto-encoder deviates from the original input, expressed as the squared difference between the two:

$$J(W,b) = \frac{1}{2}\left\| h_{W,b}(x) - x \right\|^{2}$$
Another feasible way to construct the loss function is to add a sparsity constraint on the intermediate layer L2. The sparsity constraint means that an L2 node is regarded as activated when its output is close to 1 and as suppressed when its output is close to 0; the sparsity constraint is satisfied by making as many L2 nodes as possible suppressed. Specifically, the average activity of the j-th output $a_j^{(2)}$ of layer L2 over an image block containing N pixels ($x_1$ to $x_N$) is:

$$\hat{\rho}_j = \frac{1}{N}\sum_{i=1}^{N} a_j^{(2)}(x_i)$$

The constraint $\hat{\rho}_j = \rho$ is imposed, where ρ is the sparsity coefficient, usually a small value close to 0 such as 0.05. To enforce this constraint, an extra penalty factor is added to the constructed loss function; it penalizes values of $\hat{\rho}_j$ that differ significantly from ρ, so that the average activity of each L2 node stays within a small range. The penalty factor is:

$$\sum_{j=1}^{M}\left[\rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right]$$
where M is the number of L2 nodes. The penalty factor can be expressed in the form of a relative entropy (KL divergence):

$$\sum_{j=1}^{M}\mathrm{KL}\left(\rho \,\Big\|\, \hat{\rho}_j\right)$$

The loss function with the sparsity constraint added is then:

$$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta\sum_{j=1}^{M}\mathrm{KL}\left(\rho \,\Big\|\, \hat{\rho}_j\right)$$

where β is the weight controlling the sparsity penalty. The loss function is now fully constructed.
The objective is to determine the optimal parameters W and b that minimize J(W,b), which is achieved by gradient descent. First, the network parameters of the auto-encoder are initialized with random numbers and the training samples are fed into the auto-encoder one by one; then the loss function J(W,b) is obtained from the difference between the auto-encoder output and the original input. Denoting the parameters of the l-th layer (three layers in total) by $W^{l}$ and the bias terms of the l-th layer by $b^{l}$, the partial derivatives

$$\frac{\partial}{\partial W^{l}} J(W,b), \qquad \frac{\partial}{\partial b^{l}} J(W,b)$$

are computed, and $W^{l}$ and $b^{l}$ are then updated using these partial derivatives:

$$W^{l} := W^{l} - \alpha \frac{\partial}{\partial W^{l}} J(W,b)$$

$$b^{l} := b^{l} - \alpha \frac{\partial}{\partial b^{l}} J(W,b)$$

Here α is the learning rate, generally a small real number in the range [0.01, 0.1]. The key to the gradient-descent solution is computing the partial derivatives of the cost function J(W,b) with respect to the parameters of each layer, which is done with the back-propagation (BP) algorithm (a classical algorithm that is not described further here).
After all input training samples have been processed in turn, the final optimal weight parameters $W^{l}$ and bias parameters $b^{l}$ of every layer of the auto-encoder are obtained.
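A compact sketch of evaluating the sparsity-constrained loss of step 603 is given below. The average activity is computed here over a batch of training blocks, and the values of ρ and β are illustrative assumptions; the gradients themselves would be obtained with back-propagation as described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_loss(W1, b1, W2, b2, X, rho=0.05, beta=3.0):
    A2 = sigmoid(X @ W1.T + b1)          # hidden-layer (L2) activations, one row per block
    X_hat = sigmoid(A2 @ W2.T + b2)      # reconstructions at the output layer L3
    recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    rho_hat = A2.mean(axis=0)            # average activity of each L2 node
    kl = np.sum(rho * np.log(rho / rho_hat) +
                (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + beta * kl             # J(W,b) plus the weighted KL sparsity penalty

N, M = 64, 32                            # block of 8x8 pixels, M = N/2 hidden nodes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(M, N)), np.zeros(M)
W2, b2 = rng.normal(scale=0.1, size=(N, M)), np.zeros(N)
X = rng.random((1000, N))                # 1000 training blocks of N pixels each
print(sparse_loss(W1, b1, W2, b2, X))
```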
604. Using the trained auto-encoder network parameters, feed each image block of the residual image into the encoding part of the auto-encoder for encoding.
The encoding part of the auto-encoder refers to the input layer L1, the intermediate layer L2 and the network structure between them. An image block contains N = n×n pixels, so the L1 layer of the auto-encoder has N ordinary nodes and one bias node, and the L2 layer has M = N/2 ordinary nodes (or some other number smaller than N) and one bias node. Once the edge weights w between the nodes have been trained as in step 603, the outputs $a_1^{(2)}$ to $a_M^{(2)}$ of the L2 nodes can be computed according to formulas (4) and (5). Because M < N, the L2 outputs $a_1^{(2)}$ to $a_M^{(2)}$ compress the original data, i.e. they realize the encoding process; moreover, $a_1^{(2)}$ to $a_M^{(2)}$ better reflect an intrinsic feature of the original data. Furthermore, when the sparsity constraint described in step 603 is added, so that as many L2 node outputs as possible are close to 0, the amount of data can be compressed further in the subsequent entropy-coding step.
The decoding process of the auto-encoder is the inverse of its encoding process. Specifically, in step 405 shown in FIG. 4, the entropy-decoded result is fed into the decoding part of the auto-encoder to recover the residual image, as follows.
The decoding part of the auto-encoder refers to the intermediate layer L2, the output layer L3 and the network structure between them. With the auto-encoder network structure constructed and the network parameters trained, all that is required is to connect the entropy-decoded "L2-layer outputs $a_1^{(2)}$ to $a_M^{(2)}$ of the auto-encoder" to layer L3 and compute according to formula (6); the L3 output is the decoding result of the auto-encoder, i.e. an approximate recovery of the original image block of N pixels.
The entropy-decoded results of all k image blocks $m_1, \ldots, m_k$ of the original residual image are fed in turn into the decoding part of the auto-encoder; decoding yields their approximate recoveries $m'_1, \ldots, m'_k$. Combining these approximately recovered blocks according to the positions of $m_1, \ldots, m_k$ in the original residual image gives the recovered residual image.
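Reassembling the recovered blocks is the inverse of the block split used at the encoder, as sketched below; the image size and block size are again assumptions for the example.

```python
import numpy as np

def merge_blocks(blocks, h, w, n):
    # Place k flattened n x n blocks back at their original positions in an h x w image.
    grid = blocks.reshape(h // n, w // n, n, n).swapaxes(1, 2)
    return grid.reshape(h, w)

decoded_blocks = np.zeros((129600, 64))          # approximate recoveries m'_1 ... m'_k
residual = merge_blocks(decoded_blocks, 2160, 3840, 8)
print(residual.shape)  # (2160, 3840)
```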
With the coding and decoding method described above, the auto-encoder-based codec is non-linear and introduces less distortion relative to the original residual image during encoding and decoding, so the reconstructed video image has better subjective and objective quality. In addition, whereas a conventional Reconstructive Video Coding (RVC) method needs to store a reference-image matrix with a large amount of data in both the pre-processing module and the post-processing module, the auto-encoder-based method of this embodiment only needs to store the parameters of the encoding part of the auto-encoder (the parameters between the input layer L1 and the intermediate layer L2) and the decoding parameters of the auto-encoder (the parameters between the intermediate layer L2 and the output layer L3), which involves less data and saves storage space.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (17)

  1. An encoding apparatus, comprising: a downsampler, an up-converter, a calculator, an auto-encoder, a primary encoder and a secondary encoder, wherein
    the downsampler is configured to receive an original-resolution image, downsample the original-resolution image to obtain a low-resolution image, and send the obtained low-resolution image to the up-converter and the primary encoder;
    the primary encoder, connected to the downsampler, is configured to encode the received low-resolution image to obtain primary coding information;
    the up-converter, connected to the downsampler, is configured to convert the low-resolution image sent by the downsampler into a first full-resolution image and send the first full-resolution image to the calculator;
    the calculator is configured to receive the original-resolution image and the first full-resolution image, compute a residual image from the original-resolution image and the first full-resolution image, and send the residual image to the auto-encoder for processing;
    the auto-encoder, connected to the calculator, is configured to encode the residual image according to an auto-encoding algorithm to obtain residual-image coding information and send the residual-image coding information to the secondary encoder;
    the secondary encoder, connected to the auto-encoder, is configured to entropy-encode the residual-image coding information from the auto-encoder to obtain secondary coding information.
  2. The apparatus according to claim 1, further comprising:
    a filter enhancer connected to the up-converter and configured to obtain the first full-resolution image from the up-converter and perform filter enhancement on the first full-resolution image to obtain a second full-resolution image;
    wherein the calculator is further configured to compute the residual image from the original-resolution image and the second full-resolution image.
  3. The apparatus according to claim 2, wherein the filter enhancer is specifically configured to perform filter enhancement on the first full-resolution image using a bilateral filtering algorithm to obtain the second full-resolution image.
  4. The apparatus according to claim 3, wherein the calculator computing the residual image from the original-resolution image and the second full-resolution image comprises:
    subtracting the pixel values of the second full-resolution image from those of the original-resolution image at the same positions to obtain the residual image.
  5. The apparatus according to any one of claims 1 to 4, wherein the auto-encoder encoding the residual image according to the auto-encoding algorithm to obtain the residual-image coding information comprises:
    dividing the residual image into a plurality of image blocks;
    encoding each image block separately according to preset auto-encoder network parameters to obtain the residual-image coding information.
  6. A decoding apparatus, comprising: a primary decoder, an up-converter, a secondary decoder, an auto-decoder and a synthesizer, wherein
    the primary decoder is configured to obtain primary coding information and decode the primary coding information to obtain a low-resolution image;
    the up-converter, connected to the primary decoder, is configured to receive the low-resolution image from the primary decoder and up-convert the low-resolution image to obtain a first full-resolution image;
    the secondary decoder is configured to obtain secondary coding information and entropy-decode the secondary coding information to obtain residual-image coding information;
    the auto-decoder, connected to the secondary decoder, is configured to decode the residual-image coding information from the secondary decoder to obtain a residual image;
    the synthesizer, connected to the up-converter and the auto-decoder, is configured to synthesize the first full-resolution image from the up-converter with the residual image from the auto-decoder to obtain a first original-resolution image.
  7. The apparatus according to claim 6, further comprising:
    a filter enhancer, connected to the up-converter, configured to perform filter enhancement on the first full-resolution image to obtain a second full-resolution image;
    wherein the synthesizer is further configured to synthesize the residual image with the second full-resolution image to obtain a second original-resolution image.
  8. The apparatus according to claim 6 or 7, wherein the synthesizer is specifically configured to:
    add the pixel values of the first or second full-resolution image and the residual image at the same positions to obtain the first or second original-resolution image.
  9. An encoding method, comprising:
    inputting an original-resolution image into a downsampler for downsampling to obtain a low-resolution image;
    inputting the low-resolution image into a primary encoder for encoding to obtain primary coding information;
    inputting the low-resolution image into an up-converter for up-conversion to obtain a first full-resolution image;
    computing a residual image from the original-resolution image and the first full-resolution image;
    inputting the residual image into an auto-encoder for encoding to obtain residual-image coding information;
    inputting the residual-image coding information into a secondary encoder for entropy coding to obtain secondary coding information.
  10. The method according to claim 9, further comprising:
    inputting the first full-resolution image into a filter enhancer for filter enhancement to obtain a second full-resolution image;
    computing the residual image from the original-resolution image and the second full-resolution image.
  11. The method according to claim 9 or 10, wherein inputting the first full-resolution image into the filter enhancer for filter enhancement to obtain the second full-resolution image comprises:
    the filter enhancer performing filter enhancement on the first full-resolution image using a bilateral filtering algorithm to obtain the second full-resolution image.
  12. The method according to claim 10, wherein computing the residual image from the original-resolution image and the second full-resolution image comprises:
    subtracting the pixel values of the second full-resolution image from those of the original-resolution image at the same positions to obtain the residual image.
  13. The method according to any one of claims 9 to 12, wherein inputting the residual image into the auto-encoder for encoding to obtain the residual-image coding information comprises:
    dividing the residual image into a plurality of image blocks;
    encoding each image block separately according to preset auto-encoder network parameters to obtain the residual-image coding information.
  14. A decoding method, comprising:
    obtaining primary coding information and decoding the primary coding information to obtain a low-resolution image;
    inputting the low-resolution image into an up-converter for up-conversion to obtain a first full-resolution image;
    obtaining secondary coding information and inputting the secondary coding information into a secondary decoder for entropy decoding to obtain residual-image coding information;
    inputting the residual-image coding information into an auto-decoder for decoding to obtain a residual image;
    synthesizing the residual image with the first full-resolution image to obtain a first original-resolution image.
  15. The method according to claim 14, further comprising:
    inputting the first full-resolution image into a filter enhancer for filter enhancement to obtain a second full-resolution image;
    synthesizing the second full-resolution image with the residual image to obtain a second original-resolution image.
  16. The method according to claim 14 or 15, wherein synthesizing the first or second full-resolution image with the residual image to obtain the first or second original-resolution image comprises:
    adding the pixel values of the first or second full-resolution image and the residual image at the same positions to obtain the first or second original-resolution image.
  17. A coding and decoding system, comprising:
    the encoding apparatus according to any one of claims 1 to 5, and
    the decoding apparatus according to any one of claims 6 to 8.