CN114119789B - Lightweight HEVC chrominance image quality enhancement method based on online learning - Google Patents

Lightweight HEVC chrominance image quality enhancement method based on online learning

Info

Publication number: CN114119789B
Authority: CN (China)
Prior art keywords: hevc, quality enhancement, layer, image quality, lightweight
Legal status: Active
Application number: CN202210097819.9A
Other languages: Chinese (zh)
Other versions: CN114119789A (en)
Inventors: 曾兵, 杨仁威, 刘何为, 朱树元
Current Assignee: University of Electronic Science and Technology of China
Original Assignee: University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China; priority to CN202210097819.9A; published as CN114119789A; granted and published as CN114119789B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G06T 9/002 - Image coding using neural networks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 - Quantisation
    • H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the field of video-compression quality enhancement and provides a lightweight HEVC chrominance image quality enhancement method based on online learning, used to improve the quality of compressed chrominance images. The main process of the invention is as follows: the same lightweight HEVC chrominance image quality enhancement model (Dec-CEN) is configured at the encoding end and the decoding end; the encoder-side Dec-CEN model undergoes online learning, the updated network parameters are transmitted to the decoding end after learning completes, and the parameters are loaded into the decoder-side Dec-CEN model, so the decoding end obtains a high-performance network model without learning itself. The method greatly improves HEVC chrominance image quality enhancement performance while keeping a small model size, low computational complexity, and extremely low demands on device computing power; in particular, when processing high-resolution pictures it shows a clear performance and speed advantage, matching the current trend toward high-definition video content.

Description

Lightweight HEVC chrominance image quality enhancement method based on online learning
Technical Field
The invention belongs to the field of video-compression quality enhancement and specifically provides a lightweight HEVC chrominance image quality enhancement method based on online learning, aimed at video frames compressed under the HEVC video coding standard and applied to chrominance quality enhancement.
Background
The amount of video data has grown continuously over the last decades, but under limited network bandwidth raw video is far too large to distribute, so video coding and decoding technology was proposed to compress it. High Efficiency Video Coding (HEVC) is one of the internationally adopted codec standards; on the other hand, highly efficient compression inevitably degrades image quality, and improving compressed low-quality images has become a focus in both academia and industry.
With the rapid development of deep-learning-based artificial neural network technology in recent years, a large number of convolutional neural network (CNN) based methods have been proposed to address this problem, achieving excellent quality-enhancement results. For example, the article "Frame-wise CNN-based filtering for intra-frame quality enhancement of HEVC videos" by Huang et al. proposes an image quality enhancement model named FQE-CNN, which takes a low-quality image and HEVC coding information as input and outputs a higher-quality image through a U-shaped backbone built from convolutional layers.
However, most deep learning schemes suffer from several problems:
1) Large model size: the more models, the more device storage space is needed, and a scheme often requires several models, which further increases the storage burden and is extremely unfriendly to portable devices such as mobile phones;
2) High model complexity: increasingly complex models demand higher computing power, which substantially reduces the models' running speed; this is likewise very unfriendly to devices with limited computing resources (such as mobile phones);
3) Poor suitability for high-definition pictures: as technology progresses, video content keeps moving toward high definition, but current quality-enhancement schemes handle high-resolution pictures poorly: on one hand the quality gain drops on high-definition pictures, and on the other hand the network's running speed falls sharply when processing them;
4) The characteristics of video codecs are ignored: the prior art mostly adopts an offline-learning framework in which the neural network model is no longer updated after the training stage, i.e. training is frozen during actual use; in video coding and decoding, however, this offline-learning framework greatly limits the achievable performance.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a lightweight HEVC chrominance image quality enhancement model (Dec-CEN) together with an online-learning framework built on top of it, yielding the lightweight online-learning-based HEVC chrominance image quality enhancement method used to improve the quality of compressed chrominance images. The method greatly improves HEVC chrominance image quality enhancement performance while keeping a small model size, low computational complexity, and extremely low demands on device computing power; in particular, when processing high-resolution pictures it shows a clear performance and speed advantage, matching the current trend toward high-definition video content.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the method for enhancing the quality of the lightweight HEVC chrominance image based on online learning is characterized by comprising the following steps:
configuring the same light-weight HEVC chroma image quality enhancement model (Dec-CEN) at an encoding end and a decoding end respectively, wherein the light-weight HEVC chroma image quality enhancement model comprises an Adaptive Layer (AL);
At the encoding end, an original image is input and HEVC-compressed to obtain a compressed image; the compressed image is taken as the input of the lightweight HEVC chrominance image quality enhancement model, a loss function and a learning rate are set, and online learning is performed on the adaptive layers of the model to obtain updated adaptive-layer parameters; the difference between the updated and the initial adaptive-layer parameters gives the adaptive-layer parameter residual Δ_param; Δ_param is compressed into a binary code stream and transmitted to the decoding end together with the compressed image;
At the decoding end, the adaptive-layer parameters of the lightweight HEVC chrominance image quality enhancement model are updated based on the adaptive-layer parameter residual Δ_param; the compressed image is input into the updated model, which outputs the quality-enhanced chrominance image.
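For illustration only, the parameter-residual exchange between the two ends can be sketched in a few lines of Python. This is a hypothetical sketch: the patent does not specify how Δ_param is entropy-coded, so plain lists of floats stand in for the adaptive-layer parameters here.

```python
def parameter_residual(updated, initial):
    # Encoder side: Delta_param = updated minus initial adaptive-layer parameters.
    return [u - i for u, i in zip(updated, initial)]

def apply_residual(initial, delta):
    # Decoder side: recover the online-learned parameters without any learning.
    return [i + d for i, d in zip(initial, delta)]

initial = [1.0, 2.0, 3.0]    # pre-trained adaptive-layer parameters (illustrative)
updated = [1.5, 2.5, 2.75]   # parameters after encoder-side online learning
delta = parameter_residual(updated, initial)
recovered = apply_residual(initial, delta)
```

Because only the small residual is transmitted, the decoder obtains the online-learned parameters at negligible bit-rate cost.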
Further, the encoding end performs online learning once every fixed number of frames and stops learning when the number of online-learning iterations reaches a preset threshold; the fixed interval is 50 frames.
Further, the loss function adopts an L1 norm loss function, which specifically includes:
loss = (1 / (W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} | ŷ(i, j) - y(i, j) |

where ŷ denotes the output of the lightweight HEVC chroma image quality enhancement model, y denotes the original input image, (i, j) denotes the pixel coordinates, and W and H respectively denote the width and height of the image.
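The per-pixel L1 loss can be sketched in plain Python (a simplified single-channel version on nested lists; an actual implementation would use a tensor framework):

```python
def l1_loss(pred, orig):
    # Mean absolute pixel difference over a W x H image:
    # the sum of |pred - orig| over all pixels, divided by W * H.
    H = len(pred)
    W = len(pred[0])
    total = 0.0
    for i in range(H):
        for j in range(W):
            total += abs(pred[i][j] - orig[i][j])
    return total / (W * H)
```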
Further, the lightweight HEVC chroma enhancement model (Dec-CEN) is specifically:
compressing a chrominance image as inputx chroma The convolution layer conv (3 × 3,128), three Recursive Blocks (RB), and convolution layer conv (3 × 3,1) are obtainedx chroma , 1x chroma , 1Andx chroma adding to obtain a preliminary quality-enhanced intermediate outputx chroma , 2
Compressing a luminance image as inputx luma The features of the luminance image are obtained by a feature extraction module (FEB), a convolution layer conv (3 x 3,64) with a step length of 2, and an adaptive layer AL (64)x luma,1
Preliminary quality enhancement intermediate output x chroma , 2After convolution layer conv (3 × 3,64) and adaptive layer AL (64) are obtainedx chroma , 3x chroma , 3Andx luma,1 after the addition, the quality-enhanced chrominance image is obtained through the convolution layer conv (3 × 3,64), the adaptive layer AL (64), the convolution layer conv (3 × 3,32), the convolution layer conv (3 × 3,16), the adaptive layer AL (16), and the convolution layer conv (3 × 3, 1).
Still further, the recursive block comprises an AU_1 unit, 3 parameter-sharing AU_2 units, and a convolution layer conv(3×3, 64), specifically:

x_RB,1 = AU_1(x_RB)
x_RB,2 = AU_2(x_RB,1)
x_RB,3 = AU_2(x_RB,2)
x_RB,4 = AU_2(x_RB,3)
y_RB = β·conv(x_RB,4) + x_RB

where x_RB is the input of the recursive block RB; x_RB,1 is the output of the AU_1 unit; x_RB,2, x_RB,3, and x_RB,4 are the outputs of the first, second, and third AU_2 units; y_RB is the output of the recursive block RB; β denotes the gate coefficient (GC); and conv denotes a convolution operation;
the AU _1 unit has the same structure as the AU _2 unit, and comprises: 2 convolutional layers conv (1 × 1,128) and 1 convolutional layer conv (3 × 3,128), specifically:
Figure 34069DEST_PATH_IMAGE006
wherein the content of the first and second substances,x AU is an input to the AU unit and,y AU is the output of the AU unit and,Avgrepresents the 3 × 3 averaging operation.
Still further, the feature extraction block (FEB) comprises 4 convolution layers conv(3×3, 64), specifically:

[equation given as an image in the original: the four conv(3×3, 64) layers produce the intermediate features x_FEB,1 through x_FEB,4 from the input x_FEB and yield the output y_FEB]

where x_FEB is the input of the feature extraction block FEB, y_FEB is the output of the feature extraction block FEB, and x_FEB,1, x_FEB,2, x_FEB,3, and x_FEB,4 are the outputs of the first, second, third, and fourth convolution layers conv(3×3, 64), respectively.
Further, the adaptive layer (AL) is a special grouped convolution layer with a kernel size of 1×1, specifically:
For an adaptive layer with n channels, let the input be x = (x_1, x_2, …, x_n) and its n parameters be w = (w_1, w_2, …, w_n); the adaptive layer output is then

y = (w_1·x_1, w_2·x_2, …, w_n·x_n).
In terms of working principle:
The adaptive layer occupies only a tiny fraction of the parameters (volume), allowing the Dec-CEN model to stay small. Specifically, for an n-channel input, each convolution kernel of a conventional convolutional layer conv(3×3, n) has size 3×3 with depth n (the number of input channels), and there are n such kernels (the number of output channels), so a conventional conv(3×3, n) requires 3 × 3 × n × n = 9n² parameters in total; the adaptive layer of the invention needs only n parameters in total, greatly reducing the parameter count.
On the other hand, the complexity of the adaptive layer is extremely low, which also guarantees the running speed of the Dec-CEN model. Specifically, each 3×3 kernel of a conventional conv(3×3, n) performs 9 multiplications at every position; a picture of width W and height H has W × H positions, and with n kernels each of depth n, the total number of multiplications required by conv(3×3, n) is

9 × n × n × W × H = 9n²WH;

the adaptive layer of the invention multiplies the value on each channel only once, that is,

n × W × H

multiplications in total, so the computational complexity of the adaptive layer is greatly reduced.
On the premise of extremely small volume and extremely low complexity, the adaptive layer under the online-learning framework can still deliver excellent performance.
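The parameter and multiplication counts above are easy to verify with a few lines of arithmetic (illustrative only):

```python
def conv3x3_params(n):
    # A conventional conv(3x3, n): n kernels, each of size 3 x 3 x n.
    return 3 * 3 * n * n

def conv3x3_mults(n, w, h):
    # 9 multiplications per position, n kernels of depth n, w*h positions.
    return 9 * n * n * w * h

def al_params(n):
    # The adaptive layer: one scalar coefficient per channel.
    return n

def al_mults(n, w, h):
    # One multiplication per channel value.
    return n * w * h
```

For n = 64 this gives 36,864 parameters for the conventional layer versus 64 for the adaptive layer, a 576-fold reduction; the multiplication count shrinks by the same factor of 9n.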
The invention has the beneficial effects that:
the invention provides a lightweight HEVC chrominance image quality enhancement method based on online learning, which has the following advantages:
1) Small size: on one hand, a single lightweight HEVC chroma image quality enhancement model (Dec-CEN) has few parameters; on the other hand, online learning lets the model adapt to different types of input, so only one model is needed to handle various kinds of image input, whereas the prior art often has to train several models for different image types; the invention therefore greatly reduces the total model parameter count and effectively saves storage space on encoder- and decoder-side devices;
2) Fast running speed: the Dec-CEN model removes many redundant convolution operations, greatly reducing the computation and achieving a faster running speed; in other words, the invention demands less device computing power and is therefore more practical;
3) Suited to high-definition pictures: high definition is currently the overall trend of the video industry; compared with the prior art, the method achieves better quality-enhancement performance and faster running speed on high-resolution images, better matching practical application scenarios;
4) An efficient online-learning framework: the invention fully exploits the fact that high-quality frames are available during video encoding to design an online-learning framework, so the performance of the decoder-side model improves markedly without any extra computational burden at the decoding end.
Drawings
Fig. 1 is a network structure diagram of a lightweight HEVC chrominance image quality enhancement model (Dec-CEN) according to the present invention.
Fig. 2 is a network structure diagram of a Recursive Block (RB) in the lightweight HEVC chroma quality enhancement model shown in fig. 1.
Fig. 3 is a network structure diagram of AU _1 units and AU _2 units in the recursive block shown in fig. 2.
Fig. 4 is a network structure diagram of a feature extraction module (FEB) in the lightweight HEVC chroma image quality enhancement model shown in fig. 1.
Fig. 5 is a schematic flow diagram of a lightweight HEVC chrominance image quality enhancement method based on online learning according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The embodiment provides a method for enhancing the quality of a lightweight HEVC chrominance image based on online learning, and a schematic flow diagram of the method is shown in fig. 5; the core of the method is as follows: the method comprises the steps that an HEVC chroma image quality enhancement model (Dec-CEN) at a coding end is subjected to online learning, updated network parameters are transmitted to a decoding end after the learning is finished, the parameters are loaded to the HEVC chroma image quality enhancement model (Dec-CEN) at the decoding end, and the decoding end can obtain a high-performance network model without learning; the method specifically comprises the following steps:
configuring the same light-weight HEVC chroma image quality enhancement model (Dec-CEN) at an encoding end and a decoding end respectively, wherein the light-weight HEVC chroma image quality enhancement model comprises an Adaptive Layer (AL);
At the encoding end, an original image is input and HEVC-compressed to obtain a compressed image (a compressed luminance image and a compressed chrominance image); the compressed image is taken as the input of the lightweight HEVC chrominance image quality enhancement model, a loss function and a learning rate are set, and online learning is performed on the adaptive layers of the model to obtain updated adaptive-layer parameters; the difference between the updated and the initial adaptive-layer parameters gives the adaptive-layer parameter residual Δ_param; Δ_param is compressed into a binary code stream and transmitted to the decoding end together with the compressed image;
At the decoding end, the adaptive-layer parameters of the lightweight HEVC chrominance image quality enhancement model are updated based on the adaptive-layer parameter residual Δ_param; the compressed image is input into the updated model, which outputs the quality-enhanced chrominance image.
Furthermore, the lightweight online-learning-based HEVC chrominance image quality enhancement method adopts a multi-frame shared-learning strategy: the encoding end performs online learning once every fixed number of frames. In this embodiment the interval is 50 frames, i.e. online learning is performed only once per 50 video frames, and the parameters obtained from one learning session are shared by those 50 frames; this strategy amortizes both the time consumed by online learning and the extra data volume caused by Δ_param over every frame, effectively saving encoder-side running time and bit rate.
Meanwhile, a "fast learning" strategy is adopted: learning stops once the number of online-learning iterations reaches a preset threshold. Considering that HEVC compression time is proportional to picture size, large pictures are given a longer online-learning budget (a larger preset threshold), obtaining better image-enhancement performance without noticeably increasing HEVC encoder time; conversely, small pictures are given a shorter online-learning budget (a smaller preset threshold).
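The multi-frame shared-learning schedule can be sketched as follows. This is a hypothetical helper, and zero-based frame indexing is an assumption; the learned parameters from each session are shared by the following `interval` frames.

```python
def online_learning_frames(num_frames, interval=50):
    # Frame indices at which the encoder runs one online-learning session;
    # the patent's embodiment uses an interval of 50 frames.
    return [f for f in range(num_frames) if f % interval == 0]
```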
The lightweight HEVC chroma enhancement model (Dec-CEN) is shown in fig. 1, where conv(k×k, n) denotes a convolutional layer with kernel size k×k and n output channels, with a default stride of 1; for example, conv(3×3, 128) denotes a convolutional layer with stride 1, kernel size 3×3, and 128 output channels; AL(n) denotes an adaptive layer (AL) with n channels. Specifically:
quality-impaired compressed chrominance image as inputx chroma The convolution layer conv (3 × 3,128), three Recursive Blocks (RB), and convolution layer conv (3 × 3,1) are obtainedx chroma , 1x chroma , 1Plus withx chroma Obtaining a preliminary quality-enhanced intermediate output x chroma , 2
Compressing a luminance image as inputx luma The features of the luminance image are obtained by a feature extraction module (FEB), a convolution layer conv (3 × 3,64) having a step size of 2, and an adaptive layer AL (64)x luma,1
Preliminary quality enhancement intermediate output x chroma , 2After convolution layer conv (3 × 3,64) and adaptive layer AL (64) are obtainedx chroma , 3x chroma , 3Andx luma,1 adding the convolution layer conv (3 × 3,64), the adaptive layer AL (64), the convolution layer conv (3 × 3,32), the convolution layer conv (3 × 3,16), the adaptive layer AL (16) and the convolution layer conv (3 × 3,1) to obtain a quality-enhanced chrominance image;
It should be noted that a stride of 2 means a convolution is computed at every other pixel, so the final output's width and height are half those of the input; since YUV420, the format commonly used with HEVC, specifies that the chrominance image is half the width and half the height of the luminance image, the invention uses a stride-2 convolution to align the image sizes.
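The size-halving of a stride-2 convolution can be checked with the standard output-size formula. A padding of 1 for the 3×3 kernel is an assumption made here to keep "same"-style spatial alignment; the patent does not state the padding.

```python
def conv_out_size(size, kernel=3, stride=2, pad=1):
    # Standard convolution output-size formula:
    # floor((size + 2*pad - kernel) / stride) + 1.
    return (size + 2 * pad - kernel) // stride + 1
```

A 1920×1080 luminance image then maps to 960×540, matching the YUV420 chrominance dimensions.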
More specifically:
the Recursion Block (RB) employs a "recursion" technique, which uses completely consistent parameters for a plurality of identical modules, and based on the technique, the network depth can be deepened without adding additional parameters, thereby improving the model effect, such as the recursion Block RB of 3 shared parameters shown in fig. 1;
the concrete structure of the recursive block RB is shown in fig. 2, and the "recursive" technique is also adopted, that is, 3 parameter-sharing AU _2 units are shown in fig. 2; the recursive block RB includes: AU _1 unit, 3 shared parameter AU _2 units, convolutional layer conv (3 × 3,64), specifically:
Figure 688648DEST_PATH_IMAGE005
wherein the content of the first and second substances,x RB for the input of the recursive block RB,x RB,1is the output of the AU _1 unit,x RB,2is the output of the first AU _2 unit,x RB,3is the output of the second AU _2 unit,x RB,4is the output of the third AU _2 unit,y RB for the output of the recursive block RB,βrepresents a Gate Coefficient (GC),convrepresents a convolution operation;
coefficient of the above doorβWhich is a parameter that can be learned, it should be noted that,βis the only unshared parameter in the recursive block RB of 3 shared parameters, i.e. in three RBsβThe values are different.
The AU_1 unit has the same structure as the AU_2 unit, as shown in fig. 3, and comprises 2 convolution layers conv(1×1, 128) and 1 convolution layer conv(3×3, 128), specifically:

[equation given as an image in the original: it maps the unit input x_AU to the unit output y_AU through the two conv(1×1, 128) layers, the conv(3×3, 128) layer, and the averaging operation Avg]

where x_AU is the input of the AU unit, y_AU is the output of the AU unit, and Avg denotes the 3×3 averaging operation, i.e. computing the average of the input values within a 3×3 window.
The kernel size of conv(1×1, 128) in the AU unit is only 1×1, which not only greatly saves model parameters but also keeps the computation small.
The feature extraction block (FEB) is shown in fig. 4 and comprises 4 convolution layers conv(3×3, 64), specifically:

[equation given as an image in the original: the four conv(3×3, 64) layers produce the intermediate features x_FEB,1 through x_FEB,4 from the input x_FEB and yield the output y_FEB]

where x_FEB is the input of the feature extraction block FEB, y_FEB is the output of the feature extraction block FEB, and x_FEB,1, x_FEB,2, x_FEB,3, and x_FEB,4 are the outputs of the first, second, third, and fourth convolution layers conv(3×3, 64), respectively.
Meanwhile, the invention uses two feature extraction blocks connected in sequence to achieve better performance.
The adaptive layer (AL) is a special grouped convolution layer with a kernel size of 1×1, used to multiply each input channel by a coefficient, specifically:

For an adaptive layer with n channels, let the input be x = (x_1, x_2, …, x_n) and its n parameters be w = (w_1, w_2, …, w_n); the adaptive layer output is then

y = (w_1·x_1, w_2·x_2, …, w_n·x_n).

In the HEVC chroma image quality enhancement model (Dec-CEN) of the invention, only the adaptive layers participate in online learning.
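A per-channel scaling layer of this kind (equivalent to a 1×1 convolution with groups equal to the channel count) can be sketched in plain Python on nested lists; a real implementation would use a grouped convolution primitive, but the arithmetic is simply:

```python
def adaptive_layer(channels, weights):
    # channels: list of n feature maps (2-D lists); weights: n scalars.
    # Each channel is multiplied element-wise by its single coefficient.
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(channels, weights)]
```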
Before the lightweight HEVC chroma image quality enhancement model (Dec-CEN) is configured to a coding end and a decoding end, offline pre-training needs to be performed, wherein the offline pre-training specifically comprises the following steps:
Constructing the training data set: in this embodiment, pictures from the Flickr2K picture set are compressed with the HEVC official reference software HM-16.7; the quantization parameter (QP) controls the quantization step size, with larger QP values giving larger picture quality loss and correspondingly smaller compressed size (QP = 27 is used in this embodiment); HEVC's All Intra mode is used, i.e. all pictures are intra-compressed. The compressed luminance and chrominance images are then cut into non-overlapping image blocks, luminance blocks of size 64×64 and chrominance blocks of size 32×32; the compressed luminance and chrominance blocks serve as the input and the original input image as the learning target, forming the training samples and hence the training data set;
Offline pre-training: in this embodiment, the PyTorch deep learning framework is used for training on an NVIDIA GeForce GTX 1080 Ti GPU; the batch size is set to 64 training samples and the loss function is the L1 norm:
loss = (1 / (W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} | ŷ(i, j) - y(i, j) |

where ŷ denotes the output of the lightweight HEVC chroma image quality enhancement model, y denotes the original input image, (i, j) denotes the pixel coordinates, and W and H respectively denote the width and height of the image;
setting the learning rate to be 1e-4, and training 40 periods to obtain the lightweight HEVC chroma image quality enhancement model after offline pre-training.
The lightweight HEVC chrominance image quality enhancement method based on online learning provided in this embodiment is tested as follows, and the test set is:
in this embodiment, a video test sequence in YUV420 format recommended by the official of HEVC is used, where the video test sequence includes 15 video sequences and is divided into four categories, and each category corresponds to a resolution, specifically:
class a, resolution is: 2560 × 1600, sequence name: A1.Traffic、A2.PeopleOnStreet
class B, resolution is: 1920 × 1080, sequence name: B1.Traffic、B2. PeopleOnStreet、B3. Cactus、B4. BQTerrace、B5. BasketballDrive
class C, resolution is: 832 × 480, sequence name: C1.RaceHorses、C2.BQMall、C3.PartyScene、C4.BasketballDrill
class D, resolution is: 416 × 240, sequence name: D1.RaceHorses、D2.BQSquare、D3.BlowingBubbles、D4.BasketballPass
the embodiment performs video compression by using HEVC under 4 QPs (22, 27, 32, 37), and takes a compressed video sequence as a test input of a model;
it should be noted that: in this embodiment, because the GPU has limited video memory, too large pictures cannot be directly sent to the model for testing; therefore, during online learning, a class A diagram with the resolution of 2560 × 1600 is divided into 16 parts of small diagrams (the width and the height of the small diagrams are respectively cut into 1/4 of the original diagram), a class B diagram with the resolution of 1920 × 1080 is divided into 4 parts of small diagrams, and the small diagrams are sequentially sent into a model to be tested, sequentially subjected to model processing and then spliced into the original diagram; in addition, in the present embodiment, the number of online learning times for A, B, C, D-class pictures is set to 2000, 1000, 100, and 100 times, respectively.
This embodiment adopts FQE-CNN ("Frame-wise CNN-based filtering for intra-frame quality enhancement of HEVC videos"), a recent scheme with excellent performance, as the comparative example, and compares the quality-enhancement effect, model parameter count, encoder/decoder-side running time, and other dimensions; the test results are as follows:
1. quality enhancement performance
According to the method, the BD-rate popular in the video coding industry is used as an evaluation index, the BD-rate can calculate the code rate which can be saved by a scheme compared with HEVC according to the image quality improvement result under 4 QP, in other words, if the BD-rate is a negative value, the scheme can bring gain; for example, the experimental result of a certain scheme is BD-rate = -10%, which indicates that the data amount required by the scheme under the same image quality is 10% less than that of HEVC;
Table 1 lists the average BD-rate for each of the four classes. The table shows that the present invention has better overall performance, saving 26.9% bitrate on average. It is particularly superior to FQE-CNN on the largest pictures, Class A; 2K video at 2560 × 1600 resolution is becoming ever more popular, with a growing user base. The present scheme is also clearly better on the second-largest Class B pictures; their 1080p resolution, i.e. 1920 × 1080, is currently the most heavily used size on phones and computers;
Table 1: average BD-rate results for each class
Meanwhile, Table 2 details the individual gain on each sequence in the high-resolution Classes A and B. As the table shows, the present invention brings a performance gain on every video sequence, demonstrating its robustness; the largest gain, a BD-rate of -38.9% (i.e. a 38.9% bitrate saving), is achieved on the sequence "PeopleOnStreet";
table 2: performance test of each video sequence BD-rate results
In conclusion, the Dec-CEN model of the present invention not only has better overall performance but also yields a more pronounced gain on higher-resolution video sequences, in line with the video industry's overall trend toward high-definition content.
2. Number of model parameters
Table 3 lists the parameter count of a single model and the total parameter count. The table shows that the parameter count of a single Dec-CEN model proposed by the present invention is less than half that of FQE-CNN. Moreover, to handle multiple types of picture input, most existing schemes must train several models offline: FQE-CNN, for example, requires a separate model for each QP and each resolution, i.e. 4 × 4 = 16 models in total for 4 QPs (22, 27, 32, 37) and 4 resolutions (2560 × 1600, 1920 × 1080, 832 × 480, 416 × 240). In the present invention, by contrast, the adaptive capability brought by online learning allows a single Dec-CEN model to handle all of these inputs. The present invention therefore reduces the total model parameter count by 96.98%, greatly relieving the storage pressure on decoding-end devices; it is very decoder-friendly and highly practical.
Table 3: number of model parameters (Unit: thousand k)
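The 96.98% figure follows from the model-count arithmetic above. A small consistency check, with the single-model parameter ratio assumed (the exact parameter counts appear in Table 3 of the original, which is reproduced there only as an image):

```python
# Assumed ratio of Dec-CEN parameters to one FQE-CNN model's parameters;
# any value below 0.5 matches "less than half" in the description.
r = 0.4832                      # hypothetical, chosen to reproduce 96.98%
fqe_models = 4 * 4              # one FQE-CNN model per (QP, resolution) pair
reduction = 1 - r / fqe_models  # one Dec-CEN model vs. 16 FQE-CNN models
print(f"{reduction:.2%}")       # 96.98%
```

Note the reduction comes from two multiplicative effects: the smaller single model (factor r) and replacing 16 offline-trained models with one online-adapted model (factor 1/16).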
3. Decoding-end running time
Table 4 lists the model running time required at the decoding end; the average running time of the present invention is clearly better than that of FQE-CNN. Moreover, FQE-CNN is more sensitive to resolution, slowing down markedly on large pictures.
Table 4: decoding end running time (unit: ms)
4. Encoding-end running time
Table 5 lists the running time Δt of the present invention at the encoding end relative to HEVC, defined as Δt = t0 / t, where t0 is the running time of the Dec-CEN model at the encoding end and t is the time HEVC originally requires for video encoding. As the table shows, the present invention adds only a small amount of encoding-end time and is superior to the comparison scheme FQE-CNN.
Table 5: relative running time (unit: percentage%) of encoding end
In summary, the method effectively enhances the quality of video frames, which in effect saves bitrate: measured by BD-rate, it saves 26.9% bitrate on average, reaching up to 31.3% at 2K resolution (2560 × 1600). Compared with the prior art, it reduces the model parameters by 96.98% and shortens the decoding-end running time by 85.7% on average, and by as much as 94.3% at 1080p resolution (1920 × 1080).
In conclusion, the present invention makes full use of the characteristics of the video coding/decoding task: under the online-learning framework, the decoding end obtains the updated network model directly, without training, and thereby achieves a better quality enhancement effect. The enhancement is stronger on higher-definition video, matching the current industry trend toward high-definition content. At the same time, on top of its better performance, the Dec-CEN model is small and of low complexity, making it very decoder-friendly and highly practical.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (7)

1. A lightweight HEVC chrominance image quality enhancement method based on online learning, characterized by comprising the following steps:
configuring the same lightweight HEVC chrominance image quality enhancement model at an encoding end and a decoding end respectively, wherein the lightweight HEVC chrominance image quality enhancement model comprises an adaptive layer;
at the encoding end, inputting an original image and obtaining a compressed image after HEVC compression; taking the compressed image as the input of the lightweight HEVC chrominance image quality enhancement model, setting a loss function and a learning rate, and performing online learning on the adaptive layer in the lightweight HEVC chrominance image quality enhancement model to obtain updated adaptive layer parameters; subtracting the initial adaptive layer parameters from the updated adaptive layer parameters to obtain the adaptive layer parameter residual Δparam; compressing the adaptive layer parameter residual Δparam into a binary code stream and transmitting it to the decoding end together with the compressed image;
at the decoding end, updating the adaptive layer parameters in the lightweight HEVC chrominance image quality enhancement model based on the adaptive layer parameter residual Δparam, inputting the compressed image into the updated lightweight HEVC chrominance image quality enhancement model, and outputting the quality-enhanced chrominance image from the updated model.
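The parameter-residual exchange of claim 1 can be illustrated with a minimal NumPy sketch. Here the online-learning update is simulated by a small perturbation and the entropy coding of Δparam into a binary code stream is omitted; layer size and values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Both ends start from identical adaptive-layer parameters (hypothetical 64-channel layer).
initial = rng.standard_normal(64).astype(np.float32)

# --- Encoding end: online learning updates the adaptive layer (simulated) ---
updated = initial + 0.01 * rng.standard_normal(64).astype(np.float32)
delta_param = updated - initial       # only this residual goes into the bitstream

# --- Decoding end: reconstruct the updated layer without any training ---
decoder_params = initial + delta_param

assert np.allclose(decoder_params, updated)
print(delta_param.nbytes, "bytes of side information")  # 64 float32 values = 256 bytes
```

Because only the adaptive-layer residual is transmitted, the side information is a few hundred bytes per update rather than the full model.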
2. The on-line learning-based lightweight HEVC chrominance image quality enhancement method as claimed in claim 1, wherein the encoding end performs one round of online learning at fixed frame intervals, and learning stops when the number of online-learning rounds reaches a preset threshold.
3. The on-line learning-based lightweight HEVC chroma image quality enhancement method of claim 1, wherein said loss function is an L1 norm loss function.
4. The on-line learning-based lightweight HEVC chrominance image quality enhancement method of claim 1, wherein said lightweight HEVC chrominance image quality enhancement model is specifically:
compressing a chrominance image as inputx chroma The convolution layer conv (3 × 3,128), three recursive blocks, and convolution layer conv (3 × 3,1) are obtainedx chroma , 1x chroma , 1Andx chroma adding to obtain a preliminary quality-enhanced intermediate output x chroma , 2
Compressing a luminance image as inputx luma The features of the luminance image are obtained by the feature extraction module, the convolution layer conv (3 × 3,64) with the step length of 2 and the adaptive layer AL (64)x luma,1
Preliminary quality enhancement intermediate output x chroma , 2After convolutional layer conv (3 × 3,64) and adaptive layer AL (64) are processed to obtainx chroma , 3x chroma , 3Andx luma,1after addition, obtaining a quality-enhanced chrominance image through a convolution layer conv (3 × 3,64), an adaptive layer AL (64), a convolution layer conv (3 × 3,32), a convolution layer conv (3 × 3,16), an adaptive layer AL (16) and a convolution layer conv (3 × 3, 1);
here, convolutional layer conv (k × k, n) represents a convolutional layer having a convolutional kernel size of k × k and the number of output channels of n, and adaptive layer al (n) represents an adaptive layer having the number of channels of n.
5. The on-line learning-based lightweight HEVC chroma image quality enhancement method of claim 4, wherein said recursive block comprises: an AU_1 unit, three parameter-sharing AU_2 units, and a convolutional layer conv(3 × 3, 64), specifically:
x_RB,1 = AU_1(x_RB)
x_RB,2 = AU_2(x_RB,1)
x_RB,3 = AU_2(x_RB,2)
x_RB,4 = AU_2(x_RB,3)
y_RB = x_RB + β · conv(x_RB,4)
wherein x_RB is the input of the recursive block RB, x_RB,1 is the output of the AU_1 unit, x_RB,2 is the output of the first AU_2 unit, x_RB,3 is the output of the second AU_2 unit, x_RB,4 is the output of the third AU_2 unit, y_RB is the output of the recursive block RB, β denotes the gate coefficient, and conv denotes the convolution operation;
the AU_1 unit and the AU_2 unit have the same structure, each comprising 2 convolutional layers conv(1 × 1, 128) and 1 convolutional layer conv(3 × 3, 128), specifically:
(AU unit equations given as an image in the original document)
wherein x_AU is the input of the AU unit, y_AU is the output of the AU unit, and Avg denotes the 3 × 3 averaging operation.
6. The on-line learning-based lightweight HEVC chroma image quality enhancement method of claim 4, wherein said feature extraction module comprises: 4 convolutional layers conv (3 × 3,64), specifically:
(feature extraction module equations given as an image in the original document)
wherein x_FEB is the input of the feature extraction module FEB, y_FEB is the output of the feature extraction module FEB, and x_FEB,1, x_FEB,2, x_FEB,3 and x_FEB,4 are the outputs of the first, second, third and fourth convolutional layers conv(3 × 3, 64), respectively.
7. The on-line learning-based lightweight HEVC chrominance image quality enhancement method of claim 4, wherein said adaptive layer is a special grouped convolutional layer with a 1 × 1 convolution kernel, specifically:
for an adaptation layer with n channels, the input is
Figure 285194DEST_PATH_IMAGE004
N parameters thereof are
Figure DEST_PATH_IMAGE005
Then the adaptive layer output is:
Figure 861669DEST_PATH_IMAGE006
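Interpreting claim 7's grouped 1 × 1 convolution as per-channel scaling — one group and one parameter per channel, an assumption consistent with the claim stating n parameters for n channels — the adaptive layer can be sketched as:

```python
import numpy as np

def adaptive_layer(x, w):
    """Adaptive layer AL(n): grouped 1x1 convolution with one group per
    channel, i.e. each of the n feature maps is scaled by its own parameter."""
    # x: (n, H, W) feature maps; w: (n,) per-channel parameters
    return w[:, None, None] * x

x = np.ones((4, 2, 2)) * np.arange(1, 5)[:, None, None]  # channels valued 1..4
w = np.array([0.5, 1.0, 2.0, 0.0])
y = adaptive_layer(x, w)
assert y.shape == x.shape
assert np.allclose(y[0], 0.5) and np.allclose(y[2], 6.0) and np.allclose(y[3], 0.0)
```

With only n scalar parameters per layer, the residual Δparam transmitted to the decoder stays tiny, which is what makes the online-learning scheme lightweight.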
CN202210097819.9A 2022-01-27 2022-01-27 Lightweight HEVC chrominance image quality enhancement method based on online learning Active CN114119789B (en)

Publications (2)

Publication Number Publication Date
CN114119789A CN114119789A (en) 2022-03-01
CN114119789B 2022-05-03




