WO2021042270A1 - Compression artifacts reduction method based on dual-stream multi-path recursive residual network - Google Patents

Compression artifacts reduction method based on dual-stream multi-path recursive residual network

Info

Publication number: WO2021042270A1
Authority: WO (WIPO, PCT)
Prior art keywords: network, residual, dual, layer, path
Application number: PCT/CN2019/104234
Other languages: French (fr), Chinese (zh)
Inventors: 金枝, 齐银鹤, 谭晓军
Original assignee: 中山大学 (Sun Yat-sen University)
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2021-03-11
Application filed by 中山大学; priority to PCT/CN2019/104234; publication of WO2021042270A1.

Classifications

    • G06T5/77
    • G06T5/60
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
      (the G06T2207 entries fall under G Physics > G06 Computing; Calculating or Counting > G06T Image data processing or generation, in general > G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/20 Special algorithmic details)

Abstract

Disclosed in the present invention is a compression artifacts reduction method based on a dual-stream multi-path recursive residual network. On the basis of a dual-stream multi-path recursive residual network architecture, the high-frequency (HF) and low-frequency (LF) components of an image are used to reduce compression artifacts in a targeted manner. First, the network decomposes a compression-distorted image into a texture layer (containing the HF components) and a structure layer (containing the LF components); second, a multi-path recursive residual network is used to enhance the texture information and structure information, respectively; finally, the structure and texture information are combined and fed into a regression network to generate the final reconstructed image. In the process of reducing the artifacts of a lossy compressed image, the present invention designs a dual-stream multi-path recursive residual network architecture in which the two streams each reduce a specific type of artifact associated with the high-frequency or low-frequency components of the image, and the two outputs are combined by a nonlinear regression network, such that the proposed network can significantly suppress compression artifacts and reduce distortion, further enhancing the reconstructed image.

Description

Compression artifact removal method based on dual-stream multi-path recursive residual network
Technical field
The invention relates to a method for removing the compression artifacts that arise when media files such as images and videos are compressed for storage and transmission.
Background art
Media files such as images and videos are usually compressed to reduce storage and transmission costs. From the perspective of information theory, image compression algorithms fall into two categories: lossy (e.g., JPEG) and lossless (e.g., PNG). Compared with lossless compression, lossy compression achieves a higher compression ratio. However, lossy compression often causes irreversible information loss and compression artifacts such as ringing, blocking, and blurring, especially at low bit rates. These artifacts not only degrade the user experience but also adversely affect many downstream image processing tasks. JPEG, currently the most widely used image compression standard, combines the block-based discrete cosine transform (BDCT) with coarse quantization to reduce statistical redundancy between pixels and achieve a high compression ratio. However, because the correlation between adjacent blocks is not taken into account, intensity discontinuities readily appear at block boundaries. In addition, the truncation of high-frequency DCT coefficients causes ringing and blurring artifacts. Compression artifacts caused by lossy compression can therefore generally be regarded as hybrid artifacts.
Early compression artifacts reduction (CAR) relied mainly on filtering: a hand-designed filter operating on block boundaries was used to reduce blocking. Other approaches employ the wavelet transform, the shape-adaptive discrete cosine transform (SA-DCT), or sparse coding. Artifact-free images recovered from compressed images by these methods are usually accompanied by noisy edges and unnaturally smooth regions. Beyond these traditional methods, deep-learning-based methods achieve the best results by learning the nonlinear mapping between the compressed image and the original image. Constrained by shallow networks, however, the reconstructed images appear over-smoothed. Conversely, although deeper networks offer better performance, they have too many parameters, are difficult to train, and their models cost more to store than traditional methods.
In addition, previous work either focused on removing one specific type of compression artifact, or relied on an end-to-end network to reduce all artifact types indiscriminately, which may reduce one kind of artifact while unintentionally amplifying another (for example, removing blocking may aggravate blurring). Generally speaking, CAR can be divided into three subtasks: deblocking, deringing, and deblurring. Deblocking and deringing require suppressing interfering high-frequency information, whereas deblurring requires enhancing sharp edges and useful high-frequency information.
Because the deblocking and deringing tasks counteract the edge-sharpening (deblurring) task, an end-to-end convolutional network that does not distinguish among these artifacts will enhance one type of artifact while reducing another.
Summary of the invention
The object of the present invention is to provide a compression artifact removal method based on a dual-stream multi-path recursive residual network that can significantly suppress compression artifacts, reduce distortion, and further enhance the reconstructed image.
The present invention achieves the above object as follows:
A compression artifact removal method based on a dual-stream multi-path recursive residual network: on the basis of a dual-stream multi-path recursive residual network architecture, the high-frequency (HF) and low-frequency (LF) components of the image are used to remove compression artifacts in a targeted manner. First, the network decomposes the compression-distorted image into a texture layer (containing the HF component) and a structure layer (containing the LF component); second, a multi-path recursive residual network enhances the texture and structure information separately; finally, the structure and texture information are merged and fed into a regression network to generate the final reconstructed image.
Further, a weight-sharing strategy is adopted within every residual unit in the same stream of the dual stream, so the number of training parameters of each stream is fixed, equal to that of a 4-layer convolutional neural network.
Further, the desired structure layer is first obtained by minimizing the $L_0$ norm; the difference between the compression-distorted image and the structure layer is then computed and taken as the corresponding texture layer, expressed as:

$$I_{lq} = I_s + I_t$$

where $I_{lq}$ denotes the compression-distorted image, $I_s$ denotes the structure layer corresponding to the coarse information of the distorted image, and $I_t$ denotes the texture layer corresponding to the fine information of the distorted image.
Further, the network architecture is composed of several recursive residual units (RRU, Recursive Residual Unit) and intermediate residual blocks (IRB, Intermediate Residual Block), and integrates three kinds of residual learning: global residual learning, local residual learning, and multi-path intermediate residual learning. Recursion means that the same weights are used across the feature maps, i.e., parameter sharing. As the network deepens, the number of learned parameters would otherwise grow linearly; it is bounded by sharing weights among the recursive units.
Further, the RRU consists mainly of two convolutional layers and a ReLU activation layer, and the RRU is expressed as:

$$X_u' = \mathcal{F}_u(X_u) = X_u + f_u^{2}\big(\sigma\big(f_u^{1}(\sigma(X_u))\big)\big)$$

where $X_u$ and $X_u'$ denote the input and output of the $u$-th RRU, $\mathcal{F}_u$ denotes the mapping function of the RRU, $f_u^{i}$ is the mapping function of the $i$-th convolutional layer in the $u$-th RRU, $W_u^{i}$ denotes the weights of the $i$-th convolutional layer in the $u$-th RRU, and the function $\sigma$ is the ReLU activation function.
Further, within each IRB, skip connections pass low-level features to the later layers of the residual block. Let $\mathcal{H}_b$ denote the representation function of each IRB, let $X_b$ and $X_b'$ denote the input and output of the $b$-th IRB, and let $W_b^{i}$ denote the weights of the $i$-th residual unit of the $b$-th recursive block. An IRB composed of two RRUs is then expressed as:

$$X_b' = \mathcal{H}_b(X_b) = X_b + \mathcal{F}\big(\mathcal{F}(X_b; W_b^{1}); W_b^{2}\big)$$

Accordingly, a network with two IRBs is expressed as:

$$X' = \mathcal{G}(X) = f_{rec}\Big(\mathcal{H}_2\big(\mathcal{H}_1(f(X))\big)\Big) + X$$

where $X$ and $X'$ denote the input and output of the network, $f$ and $f_{rec}$ denote the mapping functions of the first and last convolutional layers of the entire dual-stream network, and $\mathcal{G}$ denotes the overall mapping function of the proposed network.
The beneficial effects of the present invention are as follows: in removing the artifacts of lossy compressed images, the present invention designs a dual-stream multi-path recursive residual network architecture in which the two streams reduce the specific types of artifacts associated with the high-frequency and low-frequency image components, respectively, and a nonlinear regression network combines the two outputs, so that the proposed network can significantly suppress compression artifacts, reduce distortion, and further enhance the reconstructed image.
Description of the drawings
The present invention is further described below with reference to the drawings and embodiments:
Figure 1 is a schematic diagram of the structure of the dual-stream multi-path recursive residual network of the present invention;
Figure 2 is a schematic diagram of the structure of the recursive residual unit of the present invention.
Detailed description
Because the deblocking and deringing tasks counteract the edge-sharpening (deblurring) task, an end-to-end convolutional network that does not distinguish among these artifacts will enhance one type of artifact while reducing another. The present invention therefore designs a dual-stream multi-path recursive residual network architecture that uses the high-frequency (HF) and low-frequency (LF) components of an image to remove compression artifacts in a targeted manner. First, the network decomposes the compression-distorted image into a texture layer (containing the HF component) and a structure layer (containing the LF component); second, a multi-path recursive residual network enhances the texture and structure information separately; finally, the structure and texture information are merged and fed into a regression network to generate the final reconstructed image. In addition, to ease network training and reduce the number of training parameters, a weight-sharing strategy is adopted within every residual unit in the same stream. The number of training parameters of each stream is therefore fixed, equal to that of a 4-layer convolutional neural network.
The specific method and steps are as follows:
1. Structure-texture decomposition
First, the desired structure layer is obtained by minimizing the $L_0$ norm. Second, the difference between the compression-distorted image and the structure layer is computed and taken as the corresponding texture layer.
The compression-distorted image is thus decomposed into a structure layer (containing the LF component) and a texture layer (containing the HF component), expressed as:

$$I_{lq} = I_s + I_t$$

where $I_{lq}$ denotes the compression-distorted image, $I_s$ denotes the structure layer corresponding to the coarse information of the distorted image, and $I_t$ denotes the texture layer corresponding to the fine information of the distorted image.
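As a minimal illustrative sketch (not the patented solver), the decomposition can be prototyped in Python as follows; a Gaussian low-pass filter stands in for the $L_0$-norm minimization so the example stays self-contained, and the function name is hypothetical:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_texture_split(i_lq: np.ndarray, sigma: float = 2.0):
    """Split a compression-distorted image I_lq into I_s + I_t.

    The patent obtains I_s by L0-norm minimization; a Gaussian
    low-pass is used here purely as an illustrative stand-in.
    """
    i_lq = i_lq.astype(np.float32)
    i_s = gaussian_filter(i_lq, sigma=sigma)  # structure layer (LF, coarse)
    i_t = i_lq - i_s                          # texture layer (HF, fine)
    return i_s, i_t

# By construction the decomposition is exact: I_lq == I_s + I_t.
img = np.random.rand(64, 64).astype(np.float32)
s, t = structure_texture_split(img)
assert np.allclose(img, s + t, atol=1e-5)
```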
2. Structure stream
To enhance the detail information of the structure layer of the compressed image, a multi-path recursive residual network is designed. The entire network is composed of several recursive residual units (RRUs) and intermediate residual blocks (IRBs), and integrates three kinds of residual learning: global residual learning, local residual learning, and multi-path intermediate residual learning. Such a combined structure not only aids the propagation of gradients and low-level features, but also significantly reduces the number of network parameters.
1) Recursive residual unit
Recursion means that the same weights are used across the feature maps, i.e., parameter sharing. As the network deepens, the number of learned parameters would otherwise grow linearly; it can therefore be bounded by sharing weights among the recursive units. The RRU consists mainly of two convolutional layers and ReLU activation layers; its structure is shown in Figure 2. It has the same structure as the residual unit in ResNet, differing only in the activation order: in each residual unit of ResNet the activation function is applied after the convolutional layer, whereas the RRU applies the activation layers (BN and ReLU) before the convolutional layers. The RRU is therefore expressed as:

$$X_u' = \mathcal{F}_u(X_u) = X_u + f_u^{2}\big(\sigma\big(f_u^{1}(\sigma(X_u))\big)\big)$$

where $X_u$ and $X_u'$ denote the input and output of the $u$-th RRU, $\mathcal{F}_u$ denotes the mapping function of the RRU, $f_u^{i}$ is the mapping function of the $i$-th convolutional layer in the $u$-th RRU, $W_u^{i}$ denotes the weights of the $i$-th convolutional layer in the $u$-th RRU, and the function $\sigma$ is the ReLU activation function.
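A minimal PyTorch sketch of such a pre-activation unit is given below; the 3×3 kernels and the 64-channel width are assumptions, since the text does not fix them:

```python
import torch
import torch.nn as nn

class RRU(nn.Module):
    """Recursive residual unit: BN and ReLU are applied before each of
    the two convolutional layers (pre-activation), and the input is
    added back through a local skip connection."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # local residual learning
```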
2) Intermediate residual block
Within each IRB, skip connections pass low-level features to the later layers of the residual block. There are therefore multiple skip connections between the global connection and the local connections, as indicated by the curved arrows at the lower middle of Figure 1. Let $\mathcal{H}_b$ denote the representation function of each IRB, let $X_b$ and $X_b'$ denote the input and output of the $b$-th IRB, and let $W_b^{i}$ denote the weights of the $i$-th residual unit of the $b$-th recursive block. An IRB composed of two RRUs is then expressed as:

$$X_b' = \mathcal{H}_b(X_b) = X_b + \mathcal{F}\big(\mathcal{F}(X_b; W_b^{1}); W_b^{2}\big)$$

Accordingly, a network with two IRBs is expressed as:

$$X' = \mathcal{G}(X) = f_{rec}\Big(\mathcal{H}_2\big(\mathcal{H}_1(f(X))\big)\Big) + X$$

where $X$ and $X'$ denote the input and output of the network, $f$ and $f_{rec}$ denote the mapping functions of the first and last convolutional layers of the entire dual-stream network, and $\mathcal{G}$ denotes the overall mapping function of the proposed network.
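Continuing the sketch, an IRB can be modeled as U recursive applications of a single RRU instance; passing the same instance to every IRB in a stream also realizes the stream-wide weight sharing described earlier (U defaults to an assumed 3, and the class reuses the RRU sketch above):

```python
class IRB(nn.Module):
    """Intermediate residual block: U recursive passes through one
    shared RRU, plus an intermediate skip connection that carries
    the block input forward."""

    def __init__(self, rru: RRU, num_rru: int = 3):
        super().__init__()
        self.rru = rru            # one shared module => shared weights
        self.num_rru = num_rru

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = x
        for _ in range(self.num_rru):
            out = self.rru(out)   # recursion: same weights on every pass
        return x + out            # multi-path intermediate residual

# Example: one IRB recursing 3 times over a shared unit.
irb = IRB(RRU(channels=64), num_rru=3)
```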
3) Network structure
Owing to the three kinds of residual learning and the design of the RRU and IRB, the proposed network has a flexible combinatorial structure. Given a specific number of network layers, the numbers of RRUs and IRBs can be adjusted freely. Denoting the number of RRUs by U and the number of IRBs by B, the number of network layers d is computed as:

d = 2 + 2 × U × B

Assuming the designed number of network layers is d = 20, the network structure can take the following three forms:
A. 1B9U: a single IRB containing 9 RRUs;
B. 3B3U: 3 IRBs in total, each containing 3 RRUs;
C. 9B1U: 9 IRBs in total, each containing a single RRU.
Because the 3B3U configuration incorporates all three kinds of residual learning, it is applied to both the structure stream and the texture stream designed in the present invention. In particular, as the number of network layers increases, the combinatorial structure becomes more flexible, i.e., more distinct combinations exist.
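The layer-count formula and the three d = 20 configurations can be sanity-checked in a few lines (continuing the Python sketch):

```python
def num_layers(b: int, u: int) -> int:
    # d = 2 + 2*U*B: two convolutions per RRU, U RRUs per IRB,
    # B IRBs, plus the first and last convolutional layers.
    return 2 + 2 * u * b

for name, (b, u) in {"1B9U": (1, 9), "3B3U": (3, 3), "9B1U": (9, 1)}.items():
    assert num_layers(b, u) == 20, name  # each configuration yields d = 20
```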
3. Texture stream, structure stream, and regression network
The purpose of the structure layer is to restore the high-frequency information lost from the image. Conversely, the processing of the texture layer aims to remove compression artifacts while preserving detail such as the edges of the original image. Supervised learning is performed against the texture layer of the original ground-truth image, and the designed network structure, comprising recursive residual units and intermediate residual blocks, can greatly suppress the strong blocking and ringing artifacts in the texture layer.
In the network structure designed by the present invention, the structure stream and the texture stream operate in parallel and respectively output the enhanced structure layer $\hat{I}_s$ and the enhanced texture layer $\hat{I}_t$. The corresponding outputs of the two streams are then added pixel by pixel to obtain the enhanced image, i.e., $\hat{I} = \hat{I}_s + \hat{I}_t$. Finally, the enhanced image $\hat{I}$ is fed into a nonlinear regression network to further improve the reconstructed image; the structure of the regression network is the same as that of the network applied to the structure or texture stream.
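Putting the pieces together, one plausible sketch of the full dual-stream pipeline follows; it reuses the RRU and IRB classes above, and the grayscale input, 64-channel width, and 3B3U depth are assumptions. As stated, the regression network reuses the same stream architecture:

```python
class Stream(nn.Module):
    """One stream: first conv (f), B IRBs built around one shared RRU,
    last conv (f_rec), and a global residual connection. Sharing the
    RRU stream-wide leaves roughly a 4-layer CNN's worth of trainable
    convolutions: head, the RRU's two layers, and tail."""

    def __init__(self, channels: int = 64, num_irb: int = 3, num_rru: int = 3):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)      # f
        shared_rru = RRU(channels)
        self.blocks = nn.Sequential(
            *[IRB(shared_rru, num_rru) for _ in range(num_irb)])
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)      # f_rec

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.tail(self.blocks(self.head(x)))       # global residual

class DualStreamCAR(nn.Module):
    """Structure and texture streams run in parallel; their outputs are
    added pixel by pixel and refined by a regression stream."""

    def __init__(self):
        super().__init__()
        self.structure_stream = Stream()  # enhances the structure layer
        self.texture_stream = Stream()    # enhances the texture layer
        self.regression = Stream()        # same architecture, as stated

    def forward(self, i_s: torch.Tensor, i_t: torch.Tensor) -> torch.Tensor:
        enhanced = self.structure_stream(i_s) + self.texture_stream(i_t)
        return self.regression(enhanced)

# Shape check on a dummy 64x64 grayscale structure/texture pair.
model = DualStreamCAR()
out = model(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```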

Claims (6)

  1. A compression artifact removal method based on a dual-stream multi-path recursive residual network, characterized in that: on the basis of a dual-stream multi-path recursive residual network architecture, the high-frequency (HF) and low-frequency (LF) components of the image are used to remove compression artifacts in a targeted manner; first, the network decomposes the compression-distorted image into a texture layer (containing the HF component) and a structure layer (containing the LF component); second, a multi-path recursive residual network enhances the texture and structure information separately; finally, the structure and texture information are merged and fed into a regression network to generate the final reconstructed image.
  2. The compression artifact removal method based on a dual-stream multi-path recursive residual network according to claim 1, characterized in that: a weight-sharing strategy is adopted within every residual unit in the same stream of the dual stream, so that the number of training parameters of each stream is fixed, equal to that of a 4-layer convolutional neural network.
  3. The compression artifact removal method based on a dual-stream multi-path recursive residual network according to claim 1, characterized in that: the desired structure layer is first obtained by minimizing the $L_0$ norm; the difference between the compression-distorted image and the structure layer is then computed and taken as the corresponding texture layer, expressed as:

    $$I_{lq} = I_s + I_t$$

    where $I_{lq}$ denotes the compression-distorted image, $I_s$ denotes the structure layer corresponding to the coarse information of the distorted image, and $I_t$ denotes the texture layer corresponding to the fine information of the distorted image.
  4. The compression artifact removal method based on a dual-stream multi-path recursive residual network according to claim 1, characterized in that: the network architecture is composed of several recursive residual units (RRU, Recursive Residual Unit) and intermediate residual blocks (IRB, Intermediate Residual Block), and integrates three kinds of residual learning: global residual learning, local residual learning, and multi-path intermediate residual learning; recursion means that the same weights are used across the feature maps, i.e., parameter sharing; as the network deepens, the number of learned parameters would otherwise grow linearly, and it is bounded by sharing weights among the recursive units.
  5. The compression artifact removal method based on a dual-stream multi-path recursive residual network according to claim 4, characterized in that: the RRU consists mainly of two convolutional layers and a ReLU activation layer, and the RRU is expressed as:

    $$X_u' = \mathcal{F}_u(X_u) = X_u + f_u^{2}\big(\sigma\big(f_u^{1}(\sigma(X_u))\big)\big)$$

    where $X_u$ and $X_u'$ denote the input and output of the $u$-th RRU, $\mathcal{F}_u$ denotes the mapping function of the RRU, $f_u^{i}$ is the mapping function of the $i$-th convolutional layer in the $u$-th RRU, $W_u^{i}$ denotes the weights of the $i$-th convolutional layer in the $u$-th RRU, and the function $\sigma$ is the ReLU activation function.
  6. The compression artifact removal method based on a dual-stream multi-path recursive residual network according to claim 4, characterized in that: within each IRB, skip connections pass low-level features to the later layers of the residual block; let $\mathcal{H}_b$ denote the representation function of each IRB, let $X_b$ and $X_b'$ denote the input and output of the $b$-th IRB, and let $W_b^{i}$ denote the weights of the $i$-th residual unit of the $b$-th recursive block; an IRB composed of two RRUs is then expressed as:

    $$X_b' = \mathcal{H}_b(X_b) = X_b + \mathcal{F}\big(\mathcal{F}(X_b; W_b^{1}); W_b^{2}\big)$$

    and a network with two IRBs is accordingly expressed as:

    $$X' = \mathcal{G}(X) = f_{rec}\Big(\mathcal{H}_2\big(\mathcal{H}_1(f(X))\big)\Big) + X$$

    where $X$ and $X'$ denote the input and output of the network, $f$ and $f_{rec}$ denote the mapping functions of the first and last convolutional layers of the entire dual-stream network, and $\mathcal{G}$ denotes the overall mapping function of the proposed network.

Priority Applications (1)

Application Number: PCT/CN2019/104234 (WO2021042270A1, en)
Priority Date: 2019-09-03
Filing Date: 2019-09-03
Title: Compression artifacts reduction method based on dual-stream multi-path recursive residual network

Applications Claiming Priority (1)

Application Number: PCT/CN2019/104234 (WO2021042270A1, en)
Priority Date: 2019-09-03
Filing Date: 2019-09-03
Title: Compression artifacts reduction method based on dual-stream multi-path recursive residual network

Publications (1)

Publication Number: WO2021042270A1 (en)
Publication Date: 2021-03-11

Family

ID=74852002

Family Applications (1)

Application Number: PCT/CN2019/104234 (WO2021042270A1, en)
Priority Date: 2019-09-03
Filing Date: 2019-09-03
Title: Compression artifacts reduction method based on dual-stream multi-path recursive residual network

Country Status (1)

Country Link
WO (1) WO2021042270A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107798665A (en) * 2017-11-07 2018-03-13 天津大学 Underwater picture Enhancement Method based on structural texture layering
CN108460726A (en) * 2018-03-26 2018-08-28 厦门大学 A kind of magnetic resonance image super-resolution reconstruction method based on enhancing recurrence residual error network
CN108921789A (en) * 2018-06-20 2018-11-30 华北电力大学 Super-resolution image reconstruction method based on recurrence residual error network
CN109509152A (en) * 2018-12-29 2019-03-22 大连海事大学 A kind of image super-resolution rebuilding method of the generation confrontation network based on Fusion Features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU DENG-WEN, ZHAO LI-JUAN, DUAN RAN, CHAI XIAO-LIAN: "Image Super-resolution Based on Recursive Residual Networks", ACTA AUTOMATICA SINICA, vol. 45, no. 6, 30 June 2019 (2019-06-30), pages 1157-1165, XP055787602, ISSN: 0254-4156, DOI: 10.16383/j.aas.c180334 *


Legal Events

Date Code Title Description

121: Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 19944171
     Country of ref document: EP
     Kind code of ref document: A1

NENP: Non-entry into the national phase
     Ref country code: DE

122: Ep: pct application non-entry in european phase
     Ref document number: 19944171
     Country of ref document: EP
     Kind code of ref document: A1

32PN: Ep: public notification in the ep bulletin as address of the addressee cannot be established
     Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.10.2022)