CN114119789B - Lightweight HEVC chrominance image quality enhancement method based on online learning - Google Patents

Lightweight HEVC chrominance image quality enhancement method based on online learning

Info

Publication number: CN114119789B
Authority: CN (China)
Prior art keywords: hevc, quality enhancement, layer, image quality, lightweight
Legal status: Active
Application number: CN202210097819.9A
Other languages: Chinese (zh)
Other versions: CN114119789A (en)
Inventors: 曾兵, 杨仁威, 刘何为, 朱树元
Current Assignee: University of Electronic Science and Technology of China
Original Assignee: University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China; priority to CN202210097819.9A; published as CN114119789A; granted and published as CN114119789B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G06T 9/002 - Image coding using neural networks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/124 - Quantisation
    • H04N 19/42 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the field of video-compression quality enhancement and provides a lightweight HEVC chrominance image quality enhancement method based on online learning, used to improve the quality of compressed chrominance images. The main process of the invention is as follows: the same lightweight HEVC chrominance image quality enhancement model (Dec-CEN) is configured at the encoding end and the decoding end; the encoder-side Dec-CEN model undergoes online learning, the updated network parameters are transmitted to the decoding end after learning completes, and the parameters are loaded into the decoder-side Dec-CEN model, so the decoding end obtains a high-performance network model without learning itself. The method greatly improves HEVC chrominance image quality enhancement performance while keeping a small model size, low computational complexity, and extremely low demands on device computing power; in particular, when processing high-resolution pictures it shows a clear performance and speed advantage, matching the current trend toward high-definition video content.

Description

Lightweight HEVC chrominance image quality enhancement method based on online learning
Technical Field
The invention belongs to the field of video-compression quality enhancement and specifically provides a lightweight HEVC chrominance image quality enhancement method based on online learning, aimed at video frames compressed under the HEVC video coding standard and applied to chrominance quality enhancement.
Background
The amount of video data has grown continuously over the last decades, but under limited network bandwidth raw video is far too large to distribute, so video coding and decoding technology was proposed to compress it. High Efficiency Video Coding (HEVC) is one of the internationally adopted codec standards; on the other hand, highly efficient compression inevitably degrades image quality, and improving compressed low-quality images has become a focus in both academia and industry.
With the rapid development of deep-learning-based artificial neural network technology in recent years, a large number of convolutional neural network (CNN) based methods have been proposed to address this problem, achieving excellent quality-enhancement results. For example, the article "Frame-wise CNN-based filtering for intra-frame quality enhancement of HEVC videos" by Huang et al. proposes an image quality enhancement model named FQE-CNN, which takes a low-quality image and HEVC coding information as input and outputs a higher-quality image through a U-shaped backbone built from convolutional layers.
However, most deep learning schemes suffer from several problems:
1) Large model size: the more models, the more device storage space is needed, and a scheme often requires several models, which further increases the storage burden and is extremely unfriendly to portable devices such as mobile phones;
2) High model complexity: increasingly complex models demand higher computing power, which substantially reduces the models' running speed; this is likewise very unfriendly to devices with limited computing resources (such as mobile phones);
3) Poor suitability for high-definition pictures: as technology progresses, video content keeps moving toward high definition, but current quality-enhancement schemes handle high-resolution pictures poorly: on one hand the quality gain drops on high-definition pictures, and on the other hand the network's running speed falls sharply when processing them;
4) The characteristics of video codecs are ignored: the prior art mostly adopts an offline-learning framework in which the neural network model is no longer updated after the training stage, i.e. training is frozen during actual use; in video coding and decoding, however, this offline-learning framework greatly limits the achievable performance.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a lightweight HEVC chrominance image quality enhancement model (Dec-CEN) together with an online-learning framework built on top of it, yielding the lightweight online-learning-based HEVC chrominance image quality enhancement method used to improve the quality of compressed chrominance images. The method greatly improves HEVC chrominance image quality enhancement performance while keeping a small model size, low computational complexity, and extremely low demands on device computing power; in particular, when processing high-resolution pictures it shows a clear performance and speed advantage, matching the current trend toward high-definition video content.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
the method for enhancing the quality of the lightweight HEVC chrominance image based on online learning is characterized by comprising the following steps:
configuring the same light-weight HEVC chroma image quality enhancement model (Dec-CEN) at an encoding end and a decoding end respectively, wherein the light-weight HEVC chroma image quality enhancement model comprises an Adaptive Layer (AL);
At the encoding end, an original image is input and HEVC-compressed to obtain a compressed image; the compressed image is taken as the input of the lightweight HEVC chrominance image quality enhancement model, a loss function and a learning rate are set, and online learning is performed on the adaptive layers of the model to obtain updated adaptive-layer parameters; the difference between the updated and the initial adaptive-layer parameters gives the adaptive-layer parameter residual Δ_param; Δ_param is compressed into a binary code stream and transmitted to the decoding end together with the compressed image;
At the decoding end, the adaptive-layer parameters of the lightweight HEVC chrominance image quality enhancement model are updated based on the adaptive-layer parameter residual Δ_param; the compressed image is input into the updated model, which outputs the quality-enhanced chrominance image.
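For illustration only, the parameter-residual exchange between the two ends can be sketched in a few lines of Python. This is a hypothetical sketch: the patent does not specify how Δ_param is entropy-coded, so plain lists of floats stand in for the adaptive-layer parameters here.

```python
def parameter_residual(updated, initial):
    # Encoder side: Delta_param = updated minus initial adaptive-layer parameters.
    return [u - i for u, i in zip(updated, initial)]

def apply_residual(initial, delta):
    # Decoder side: recover the online-learned parameters without any learning.
    return [i + d for i, d in zip(initial, delta)]

initial = [1.0, 2.0, 3.0]    # pre-trained adaptive-layer parameters (illustrative)
updated = [1.5, 2.5, 2.75]   # parameters after encoder-side online learning
delta = parameter_residual(updated, initial)
recovered = apply_residual(initial, delta)
```

Because only the small residual is transmitted, the decoder obtains the online-learned parameters at negligible bit-rate cost.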
Further, the encoding end performs online learning once every fixed number of frames and stops learning when the number of online-learning iterations reaches a preset threshold; the fixed interval is 50 frames.
Further, the loss function adopts an L1 norm loss function, which specifically includes:
loss = (1 / (W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} | ŷ(i, j) - y(i, j) |

where ŷ denotes the output of the lightweight HEVC chroma image quality enhancement model, y denotes the original input image, (i, j) denotes the pixel coordinates, and W and H respectively denote the width and height of the image.
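The per-pixel L1 loss can be sketched in plain Python (a simplified single-channel version on nested lists; an actual implementation would use a tensor framework):

```python
def l1_loss(pred, orig):
    # Mean absolute pixel difference over a W x H image:
    # the sum of |pred - orig| over all pixels, divided by W * H.
    H = len(pred)
    W = len(pred[0])
    total = 0.0
    for i in range(H):
        for j in range(W):
            total += abs(pred[i][j] - orig[i][j])
    return total / (W * H)
```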
Further, the lightweight HEVC chroma enhancement model (Dec-CEN) is specifically:
compressing a chrominance image as inputx chroma The convolution layer conv (3 × 3,128), three Recursive Blocks (RB), and convolution layer conv (3 × 3,1) are obtainedx chroma , 1x chroma , 1Andx chroma adding to obtain a preliminary quality-enhanced intermediate outputx chroma , 2
Compressing a luminance image as inputx luma The features of the luminance image are obtained by a feature extraction module (FEB), a convolution layer conv (3 x 3,64) with a step length of 2, and an adaptive layer AL (64)x luma,1
Preliminary quality enhancement intermediate output x chroma , 2After convolution layer conv (3 × 3,64) and adaptive layer AL (64) are obtainedx chroma , 3x chroma , 3Andx luma,1 after the addition, the quality-enhanced chrominance image is obtained through the convolution layer conv (3 × 3,64), the adaptive layer AL (64), the convolution layer conv (3 × 3,32), the convolution layer conv (3 × 3,16), the adaptive layer AL (16), and the convolution layer conv (3 × 3, 1).
Still further, the recursive block comprises an AU_1 unit, 3 parameter-sharing AU_2 units, and a convolution layer conv(3×3, 64), specifically:

x_RB,1 = AU_1(x_RB)
x_RB,2 = AU_2(x_RB,1)
x_RB,3 = AU_2(x_RB,2)
x_RB,4 = AU_2(x_RB,3)
y_RB = β·conv(x_RB,4) + x_RB

where x_RB is the input of the recursive block RB; x_RB,1 is the output of the AU_1 unit; x_RB,2, x_RB,3, and x_RB,4 are the outputs of the first, second, and third AU_2 units; y_RB is the output of the recursive block RB; β denotes the gate coefficient (GC); and conv denotes a convolution operation;
the AU _1 unit has the same structure as the AU _2 unit, and comprises: 2 convolutional layers conv (1 × 1,128) and 1 convolutional layer conv (3 × 3,128), specifically:
Figure 34069DEST_PATH_IMAGE006
wherein the content of the first and second substances,x AU is an input to the AU unit and,y AU is the output of the AU unit and,Avgrepresents the 3 × 3 averaging operation.
Still further, the feature extraction block (FEB) comprises 4 convolution layers conv(3×3, 64), specifically:

[equation given as an image in the original: the four conv(3×3, 64) layers produce the intermediate features x_FEB,1 through x_FEB,4 from the input x_FEB and yield the output y_FEB]

where x_FEB is the input of the feature extraction block FEB, y_FEB is the output of the feature extraction block FEB, and x_FEB,1, x_FEB,2, x_FEB,3, and x_FEB,4 are the outputs of the first, second, third, and fourth convolution layers conv(3×3, 64), respectively.
Further, the adaptive layer (AL) is a special grouped convolution layer with a kernel size of 1×1, specifically:
For an adaptive layer with n channels, let the input be x = (x_1, x_2, …, x_n) and its n parameters be w = (w_1, w_2, …, w_n); the adaptive layer output is then

y = (w_1·x_1, w_2·x_2, …, w_n·x_n).
In terms of working principle:
The adaptive layer occupies only a tiny fraction of the parameters (volume), allowing the Dec-CEN model to stay small. Specifically, for an n-channel input, each convolution kernel of a conventional convolutional layer conv(3×3, n) has size 3×3 with depth n (the number of input channels), and there are n such kernels (the number of output channels), so a conventional conv(3×3, n) requires 3 × 3 × n × n = 9n² parameters in total; the adaptive layer of the invention needs only n parameters in total, greatly reducing the parameter count.
On the other hand, the complexity of the adaptive layer is extremely low, which also guarantees the running speed of the Dec-CEN model. Specifically, each 3×3 kernel of a conventional conv(3×3, n) performs 9 multiplications at every position; a picture of width W and height H has W × H positions, and with n kernels each of depth n, the total number of multiplications required by conv(3×3, n) is

9 × n × n × W × H = 9n²WH;

the adaptive layer of the invention multiplies the value on each channel only once, that is,

n × W × H

multiplications in total, so the computational complexity of the adaptive layer is greatly reduced.
On the premise of extremely small volume and extremely low complexity, the adaptive layer under the online-learning framework can still deliver excellent performance.
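The parameter and multiplication counts above are easy to verify with a few lines of arithmetic (illustrative only):

```python
def conv3x3_params(n):
    # A conventional conv(3x3, n): n kernels, each of size 3 x 3 x n.
    return 3 * 3 * n * n

def conv3x3_mults(n, w, h):
    # 9 multiplications per position, n kernels of depth n, w*h positions.
    return 9 * n * n * w * h

def al_params(n):
    # The adaptive layer: one scalar coefficient per channel.
    return n

def al_mults(n, w, h):
    # One multiplication per channel value.
    return n * w * h
```

For n = 64 this gives 36,864 parameters for the conventional layer versus 64 for the adaptive layer, a 576-fold reduction; the multiplication count shrinks by the same factor of 9n.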
The invention has the beneficial effects that:
the invention provides a lightweight HEVC chrominance image quality enhancement method based on online learning, which has the following advantages:
1) Small size: on one hand, a single lightweight HEVC chroma image quality enhancement model (Dec-CEN) has few parameters; on the other hand, online learning lets the model adapt to different types of input, so only one model is needed to handle various kinds of image input, whereas the prior art often has to train several models for different image types; the invention therefore greatly reduces the total model parameter count and effectively saves storage space on encoder- and decoder-side devices;
2) Fast running speed: the Dec-CEN model removes many redundant convolution operations, greatly reducing the computation and achieving a faster running speed; in other words, the invention demands less device computing power and is therefore more practical;
3) Suited to high-definition pictures: high definition is currently the overall trend of the video industry; compared with the prior art, the method achieves better quality-enhancement performance and faster running speed on high-resolution images, better matching practical application scenarios;
4) An efficient online-learning framework: the invention fully exploits the fact that high-quality frames are available during video encoding to design an online-learning framework, so the performance of the decoder-side model improves markedly without any extra computational burden at the decoding end.
Drawings
Fig. 1 is a network structure diagram of a lightweight HEVC chrominance image quality enhancement model (Dec-CEN) according to the present invention.
Fig. 2 is a network structure diagram of a Recursive Block (RB) in the lightweight HEVC chroma quality enhancement model shown in fig. 1.
Fig. 3 is a network structure diagram of AU _1 units and AU _2 units in the recursive block shown in fig. 2.
Fig. 4 is a network structure diagram of a feature extraction module (FEB) in the lightweight HEVC chroma image quality enhancement model shown in fig. 1.
Fig. 5 is a schematic flow diagram of a lightweight HEVC chrominance image quality enhancement method based on online learning according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The embodiment provides a method for enhancing the quality of a lightweight HEVC chrominance image based on online learning, and a schematic flow diagram of the method is shown in fig. 5; the core of the method is as follows: the method comprises the steps that an HEVC chroma image quality enhancement model (Dec-CEN) at a coding end is subjected to online learning, updated network parameters are transmitted to a decoding end after the learning is finished, the parameters are loaded to the HEVC chroma image quality enhancement model (Dec-CEN) at the decoding end, and the decoding end can obtain a high-performance network model without learning; the method specifically comprises the following steps:
configuring the same light-weight HEVC chroma image quality enhancement model (Dec-CEN) at an encoding end and a decoding end respectively, wherein the light-weight HEVC chroma image quality enhancement model comprises an Adaptive Layer (AL);
At the encoding end, an original image is input and HEVC-compressed to obtain a compressed image (a compressed luminance image and a compressed chrominance image); the compressed image is taken as the input of the lightweight HEVC chrominance image quality enhancement model, a loss function and a learning rate are set, and online learning is performed on the adaptive layers of the model to obtain updated adaptive-layer parameters; the difference between the updated and the initial adaptive-layer parameters gives the adaptive-layer parameter residual Δ_param; Δ_param is compressed into a binary code stream and transmitted to the decoding end together with the compressed image;
At the decoding end, the adaptive-layer parameters of the lightweight HEVC chrominance image quality enhancement model are updated based on the adaptive-layer parameter residual Δ_param; the compressed image is input into the updated model, which outputs the quality-enhanced chrominance image.
Furthermore, the lightweight online-learning-based HEVC chrominance image quality enhancement method adopts a multi-frame shared-learning strategy: the encoding end performs online learning once every fixed number of frames. In this embodiment the interval is 50 frames, i.e. online learning is performed only once per 50 video frames, and the parameters obtained from one learning session are shared by those 50 frames; this strategy amortizes both the time consumed by online learning and the extra data volume caused by Δ_param over every frame, effectively saving encoder-side running time and bit rate.
Meanwhile, a "fast learning" strategy is adopted: learning stops once the number of online-learning iterations reaches a preset threshold. Considering that HEVC compression time is proportional to picture size, large pictures are given a longer online-learning budget (a larger preset threshold), obtaining better image-enhancement performance without noticeably increasing HEVC encoder time; conversely, small pictures are given a shorter online-learning budget (a smaller preset threshold).
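The multi-frame shared-learning schedule can be sketched as follows. This is a hypothetical helper, and zero-based frame indexing is an assumption; the learned parameters from each session are shared by the following `interval` frames.

```python
def online_learning_frames(num_frames, interval=50):
    # Frame indices at which the encoder runs one online-learning session;
    # the patent's embodiment uses an interval of 50 frames.
    return [f for f in range(num_frames) if f % interval == 0]
```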
The lightweight HEVC chroma enhancement model (Dec-CEN) is shown in fig. 1, where conv(k×k, n) denotes a convolutional layer with kernel size k×k and n output channels, with a default stride of 1; for example, conv(3×3, 128) denotes a convolutional layer with stride 1, kernel size 3×3, and 128 output channels; AL(n) denotes an adaptive layer (AL) with n channels. Specifically:
quality-impaired compressed chrominance image as inputx chroma The convolution layer conv (3 × 3,128), three Recursive Blocks (RB), and convolution layer conv (3 × 3,1) are obtainedx chroma , 1x chroma , 1Plus withx chroma Obtaining a preliminary quality-enhanced intermediate output x chroma , 2
Compressing a luminance image as inputx luma The features of the luminance image are obtained by a feature extraction module (FEB), a convolution layer conv (3 × 3,64) having a step size of 2, and an adaptive layer AL (64)x luma,1
Preliminary quality enhancement intermediate output x chroma , 2After convolution layer conv (3 × 3,64) and adaptive layer AL (64) are obtainedx chroma , 3x chroma , 3Andx luma,1 adding the convolution layer conv (3 × 3,64), the adaptive layer AL (64), the convolution layer conv (3 × 3,32), the convolution layer conv (3 × 3,16), the adaptive layer AL (16) and the convolution layer conv (3 × 3,1) to obtain a quality-enhanced chrominance image;
It should be noted that a stride of 2 means a convolution is computed at every other pixel, so the final output's width and height are half those of the input; since YUV420, the format commonly used with HEVC, specifies that the chrominance image is half the width and half the height of the luminance image, the invention uses a stride-2 convolution to align the image sizes.
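The size-halving of a stride-2 convolution can be checked with the standard output-size formula. A padding of 1 for the 3×3 kernel is an assumption made here to keep "same"-style spatial alignment; the patent does not state the padding.

```python
def conv_out_size(size, kernel=3, stride=2, pad=1):
    # Standard convolution output-size formula:
    # floor((size + 2*pad - kernel) / stride) + 1.
    return (size + 2 * pad - kernel) // stride + 1
```

A 1920×1080 luminance image then maps to 960×540, matching the YUV420 chrominance dimensions.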
More specifically:
the Recursion Block (RB) employs a "recursion" technique, which uses completely consistent parameters for a plurality of identical modules, and based on the technique, the network depth can be deepened without adding additional parameters, thereby improving the model effect, such as the recursion Block RB of 3 shared parameters shown in fig. 1;
the concrete structure of the recursive block RB is shown in fig. 2, and the "recursive" technique is also adopted, that is, 3 parameter-sharing AU _2 units are shown in fig. 2; the recursive block RB includes: AU _1 unit, 3 shared parameter AU _2 units, convolutional layer conv (3 × 3,64), specifically:
Figure 688648DEST_PATH_IMAGE005
wherein the content of the first and second substances,x RB for the input of the recursive block RB,x RB,1is the output of the AU _1 unit,x RB,2is the output of the first AU _2 unit,x RB,3is the output of the second AU _2 unit,x RB,4is the output of the third AU _2 unit,y RB for the output of the recursive block RB,βrepresents a Gate Coefficient (GC),convrepresents a convolution operation;
coefficient of the above doorβWhich is a parameter that can be learned, it should be noted that,βis the only unshared parameter in the recursive block RB of 3 shared parameters, i.e. in three RBsβThe values are different.
The AU_1 unit has the same structure as the AU_2 unit, as shown in fig. 3, and comprises 2 convolution layers conv(1×1, 128) and 1 convolution layer conv(3×3, 128), specifically:

[equation given as an image in the original: it maps the unit input x_AU to the unit output y_AU through the two conv(1×1, 128) layers, the conv(3×3, 128) layer, and the averaging operation Avg]

where x_AU is the input of the AU unit, y_AU is the output of the AU unit, and Avg denotes the 3×3 averaging operation, i.e. computing the average of the input values within a 3×3 window.
The kernel size of conv(1×1, 128) in the AU unit is only 1×1, which not only greatly saves model parameters but also keeps the computation small.
The feature extraction block (FEB) is shown in fig. 4 and comprises 4 convolution layers conv(3×3, 64), specifically:

[equation given as an image in the original: the four conv(3×3, 64) layers produce the intermediate features x_FEB,1 through x_FEB,4 from the input x_FEB and yield the output y_FEB]

where x_FEB is the input of the feature extraction block FEB, y_FEB is the output of the feature extraction block FEB, and x_FEB,1, x_FEB,2, x_FEB,3, and x_FEB,4 are the outputs of the first, second, third, and fourth convolution layers conv(3×3, 64), respectively.
Meanwhile, the invention uses two feature extraction blocks connected in sequence to achieve better performance.
The adaptive layer (AL) is a special grouped convolution layer with a kernel size of 1×1, used to multiply each input channel by a coefficient, specifically:

For an adaptive layer with n channels, let the input be x = (x_1, x_2, …, x_n) and its n parameters be w = (w_1, w_2, …, w_n); the adaptive layer output is then

y = (w_1·x_1, w_2·x_2, …, w_n·x_n).

In the HEVC chroma image quality enhancement model (Dec-CEN) of the invention, only the adaptive layers participate in online learning.
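A per-channel scaling layer of this kind (equivalent to a 1×1 convolution with groups equal to the channel count) can be sketched in plain Python on nested lists; a real implementation would use a grouped convolution primitive, but the arithmetic is simply:

```python
def adaptive_layer(channels, weights):
    # channels: list of n feature maps (2-D lists); weights: n scalars.
    # Each channel is multiplied element-wise by its single coefficient.
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(channels, weights)]
```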
Before the lightweight HEVC chroma image quality enhancement model (Dec-CEN) is configured to a coding end and a decoding end, offline pre-training needs to be performed, wherein the offline pre-training specifically comprises the following steps:
Constructing the training data set: in this embodiment, pictures from the Flickr2K picture set are compressed with the HEVC official reference software HM-16.7; the quantization parameter (QP) controls the quantization step size, with larger QP values giving larger picture quality loss and correspondingly smaller compressed size (QP = 27 is used in this embodiment); HEVC's All Intra mode is used, i.e. all pictures are intra-compressed. The compressed luminance and chrominance images are then cut into non-overlapping image blocks, luminance blocks of size 64×64 and chrominance blocks of size 32×32; the compressed luminance and chrominance blocks serve as the input and the original input image as the learning target, forming the training samples and hence the training data set;
Offline pre-training: in this embodiment, the PyTorch deep learning framework is used for training on an NVIDIA GeForce GTX 1080 Ti GPU; the batch size is set to 64 training samples and the loss function is the L1 norm:
loss = (1 / (W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} | ŷ(i, j) - y(i, j) |

where ŷ denotes the output of the lightweight HEVC chroma image quality enhancement model, y denotes the original input image, (i, j) denotes the pixel coordinates, and W and H respectively denote the width and height of the image;
setting the learning rate to be 1e-4, and training 40 periods to obtain the lightweight HEVC chroma image quality enhancement model after offline pre-training.
The lightweight HEVC chrominance image quality enhancement method based on online learning provided in this embodiment is tested as follows, and the test set is:
in this embodiment, a video test sequence in YUV420 format recommended by the official of HEVC is used, where the video test sequence includes 15 video sequences and is divided into four categories, and each category corresponds to a resolution, specifically:
class a, resolution is: 2560 × 1600, sequence name: A1.Traffic、A2.PeopleOnStreet
class B, resolution is: 1920 × 1080, sequence name: B1.Traffic、B2. PeopleOnStreet、B3. Cactus、B4. BQTerrace、B5. BasketballDrive
class C, resolution is: 832 × 480, sequence name: C1.RaceHorses、C2.BQMall、C3.PartyScene、C4.BasketballDrill
class D, resolution is: 416 × 240, sequence name: D1.RaceHorses、D2.BQSquare、D3.BlowingBubbles、D4.BasketballPass
the embodiment performs video compression by using HEVC under 4 QPs (22, 27, 32, 37), and takes a compressed video sequence as a test input of a model;
it should be noted that: in this embodiment, because the GPU has limited video memory, too large pictures cannot be directly sent to the model for testing; therefore, during online learning, a class A diagram with the resolution of 2560 × 1600 is divided into 16 parts of small diagrams (the width and the height of the small diagrams are respectively cut into 1/4 of the original diagram), a class B diagram with the resolution of 1920 × 1080 is divided into 4 parts of small diagrams, and the small diagrams are sequentially sent into a model to be tested, sequentially subjected to model processing and then spliced into the original diagram; in addition, in the present embodiment, the number of online learning times for A, B, C, D-class pictures is set to 2000, 1000, 100, and 100 times, respectively.
This embodiment adopts FQE-CNN ("Frame-wise CNN-based filtering for intra-frame quality enhancement of HEVC videos"), a recent scheme with excellent performance, as the comparative example, and compares the quality-enhancement effect, model parameter count, encoder/decoder-side running time, and other dimensions; the test results are as follows:
1. quality enhancement performance
According to the method, the BD-rate popular in the video coding industry is used as an evaluation index, the BD-rate can calculate the code rate which can be saved by a scheme compared with HEVC according to the image quality improvement result under 4 QP, in other words, if the BD-rate is a negative value, the scheme can bring gain; for example, the experimental result of a certain scheme is BD-rate = -10%, which indicates that the data amount required by the scheme under the same image quality is 10% less than that of HEVC;
Table 1 lists the average BD-rate for each of the four classes. The table shows that the present invention has better overall performance, saving 26.9% bitrate on average. It is particularly superior to FQE-CNN on the largest pictures, Class A; 2K video at 2560 × 1600 resolution is becoming ever more popular, with a growing user base. The present scheme is also clearly better on the second-largest Class B pictures; their 1080p resolution, i.e. 1920 × 1080, is currently the most heavily used size on phones and computers;
Table 1: average BD-rate results for each class
Meanwhile, Table 2 details the individual gain on each sequence in the high-resolution Classes A and B. As the table shows, the present invention brings a performance gain on every video sequence, demonstrating its robustness; the largest gain, a BD-rate of -38.9% (i.e. a 38.9% bitrate saving), is achieved on the sequence "PeopleOnStreet";
table 2: performance test of each video sequence BD-rate results
In conclusion, the Dec-CEN model of the present invention not only has better overall performance but also yields a more pronounced gain on higher-resolution video sequences, in line with the video industry's overall trend toward high-definition content.
2. Number of model parameters
Table 3 lists the parameter count of a single model and the total parameter count. The table shows that the parameter count of a single Dec-CEN model proposed by the present invention is less than half that of FQE-CNN. Moreover, to handle multiple types of picture input, most existing schemes must train several models offline: FQE-CNN, for example, requires a separate model for each QP and each resolution, i.e. 4 × 4 = 16 models in total for 4 QPs (22, 27, 32, 37) and 4 resolutions (2560 × 1600, 1920 × 1080, 832 × 480, 416 × 240). In the present invention, by contrast, the adaptive capability brought by online learning allows a single Dec-CEN model to handle all of these inputs. The present invention therefore reduces the total model parameter count by 96.98%, greatly relieving the storage pressure on decoding-end devices; it is very decoder-friendly and highly practical.
Table 3: number of model parameters (Unit: thousand k)
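The 96.98% figure follows from the model-count arithmetic above. A small consistency check, with the single-model parameter ratio assumed (the exact parameter counts appear in Table 3 of the original, which is reproduced there only as an image):

```python
# Assumed ratio of Dec-CEN parameters to one FQE-CNN model's parameters;
# any value below 0.5 matches "less than half" in the description.
r = 0.4832                      # hypothetical, chosen to reproduce 96.98%
fqe_models = 4 * 4              # one FQE-CNN model per (QP, resolution) pair
reduction = 1 - r / fqe_models  # one Dec-CEN model vs. 16 FQE-CNN models
print(f"{reduction:.2%}")       # 96.98%
```

Note the reduction comes from two multiplicative effects: the smaller single model (factor r) and replacing 16 offline-trained models with one online-adapted model (factor 1/16).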
3. Decoding-end running time
Table 4 lists the model running time required at the decoding end; the average running time of the present invention is clearly better than that of FQE-CNN. Moreover, FQE-CNN is more sensitive to resolution, slowing down markedly on large pictures.
Table 4: decoding end running time (unit: ms)
4. Encoding-end running time
Table 5 lists the running time Δt of the present invention at the encoding end relative to HEVC, defined as Δt = t0 / t, where t0 is the running time of the Dec-CEN model at the encoding end and t is the time HEVC originally requires for video encoding. As the table shows, the present invention adds only a small amount of encoding-end time and is superior to the comparison scheme FQE-CNN.
Table 5: relative running time (unit: percentage%) of encoding end
In summary, the method effectively enhances the quality of video frames, which in effect saves bitrate: measured by BD-rate, it saves 26.9% bitrate on average, reaching up to 31.3% at 2K resolution (2560 × 1600). Compared with the prior art, it reduces the model parameters by 96.98% and shortens the decoding-end running time by 85.7% on average, and by as much as 94.3% at 1080p resolution (1920 × 1080).
In conclusion, the present invention makes full use of the characteristics of the video coding/decoding task: under the online-learning framework, the decoding end obtains the updated network model directly, without training, and thereby achieves a better quality enhancement effect. The enhancement is stronger on higher-definition video, matching the current industry trend toward high-definition content. At the same time, on top of its better performance, the Dec-CEN model is small and of low complexity, making it very decoder-friendly and highly practical.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (7)

1. A lightweight HEVC chrominance image quality enhancement method based on online learning, characterized by comprising the following steps:
configuring the same lightweight HEVC chrominance image quality enhancement model at an encoding end and a decoding end respectively, wherein the lightweight HEVC chrominance image quality enhancement model comprises an adaptive layer;
at the encoding end, inputting an original image and obtaining a compressed image after HEVC compression; taking the compressed image as the input of the lightweight HEVC chrominance image quality enhancement model, setting a loss function and a learning rate, and performing online learning on the adaptive layer in the lightweight HEVC chrominance image quality enhancement model to obtain updated adaptive layer parameters; subtracting the initial adaptive layer parameters from the updated adaptive layer parameters to obtain the adaptive layer parameter residual Δparam; compressing the adaptive layer parameter residual Δparam into a binary code stream and transmitting it to the decoding end together with the compressed image;
at the decoding end, updating the adaptive layer parameters in the lightweight HEVC chrominance image quality enhancement model based on the adaptive layer parameter residual Δparam, inputting the compressed image into the updated lightweight HEVC chrominance image quality enhancement model, and outputting the quality-enhanced chrominance image from the updated model.
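The parameter-residual exchange of claim 1 can be illustrated with a minimal NumPy sketch. Here the online-learning update is simulated by a small perturbation and the entropy coding of Δparam into a binary code stream is omitted; layer size and values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Both ends start from identical adaptive-layer parameters (hypothetical 64-channel layer).
initial = rng.standard_normal(64).astype(np.float32)

# --- Encoding end: online learning updates the adaptive layer (simulated) ---
updated = initial + 0.01 * rng.standard_normal(64).astype(np.float32)
delta_param = updated - initial       # only this residual goes into the bitstream

# --- Decoding end: reconstruct the updated layer without any training ---
decoder_params = initial + delta_param

assert np.allclose(decoder_params, updated)
print(delta_param.nbytes, "bytes of side information")  # 64 float32 values = 256 bytes
```

Because only the adaptive-layer residual is transmitted, the side information is a few hundred bytes per update rather than the full model.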
2. The on-line learning-based lightweight HEVC chrominance image quality enhancement method as claimed in claim 1, wherein the encoding end performs one round of online learning at fixed frame intervals, and learning stops when the number of online-learning rounds reaches a preset threshold.
3. The on-line learning-based lightweight HEVC chroma image quality enhancement method of claim 1, wherein said loss function is an L1 norm loss function.
4. The on-line learning-based lightweight HEVC chrominance image quality enhancement method of claim 1, wherein said lightweight HEVC chrominance image quality enhancement model is specifically:
compressing a chrominance image as inputx chroma The convolution layer conv (3 × 3,128), three recursive blocks, and convolution layer conv (3 × 3,1) are obtainedx chroma , 1x chroma , 1Andx chroma adding to obtain a preliminary quality-enhanced intermediate output x chroma , 2
Compressing a luminance image as inputx luma The features of the luminance image are obtained by the feature extraction module, the convolution layer conv (3 × 3,64) with the step length of 2 and the adaptive layer AL (64)x luma,1
Preliminary quality enhancement intermediate output x chroma , 2After convolutional layer conv (3 × 3,64) and adaptive layer AL (64) are processed to obtainx chroma , 3x chroma , 3Andx luma,1after addition, obtaining a quality-enhanced chrominance image through a convolution layer conv (3 × 3,64), an adaptive layer AL (64), a convolution layer conv (3 × 3,32), a convolution layer conv (3 × 3,16), an adaptive layer AL (16) and a convolution layer conv (3 × 3, 1);
here, convolutional layer conv (k × k, n) represents a convolutional layer having a convolutional kernel size of k × k and the number of output channels of n, and adaptive layer al (n) represents an adaptive layer having the number of channels of n.
5. The on-line learning-based lightweight HEVC chroma image quality enhancement method of claim 4, wherein said recursive block comprises: an AU_1 unit, three parameter-sharing AU_2 units, and a convolutional layer conv(3 × 3, 64), specifically:
x_RB,1 = AU_1(x_RB)
x_RB,2 = AU_2(x_RB,1)
x_RB,3 = AU_2(x_RB,2)
x_RB,4 = AU_2(x_RB,3)
y_RB = x_RB + β · conv(x_RB,4)
wherein x_RB is the input of the recursive block RB, x_RB,1 is the output of the AU_1 unit, x_RB,2 is the output of the first AU_2 unit, x_RB,3 is the output of the second AU_2 unit, x_RB,4 is the output of the third AU_2 unit, y_RB is the output of the recursive block RB, β denotes the gate coefficient, and conv denotes the convolution operation;
the AU_1 unit and the AU_2 unit have the same structure, each comprising 2 convolutional layers conv(1 × 1, 128) and 1 convolutional layer conv(3 × 3, 128), specifically:
(AU unit equations given as an image in the original document)
wherein x_AU is the input of the AU unit, y_AU is the output of the AU unit, and Avg denotes the 3 × 3 averaging operation.
6. The on-line learning-based lightweight HEVC chroma image quality enhancement method of claim 4, wherein said feature extraction module comprises: 4 convolutional layers conv (3 × 3,64), specifically:
(feature extraction module equations given as an image in the original document)
wherein x_FEB is the input of the feature extraction module FEB, y_FEB is the output of the feature extraction module FEB, and x_FEB,1, x_FEB,2, x_FEB,3 and x_FEB,4 are the outputs of the first, second, third and fourth convolutional layers conv(3 × 3, 64), respectively.
7. The on-line learning-based lightweight HEVC chrominance image quality enhancement method of claim 4, wherein said adaptive layer is a special grouped convolutional layer with a 1 × 1 convolution kernel, specifically:
for an adaptation layer with n channels, the input is
Figure 285194DEST_PATH_IMAGE004
N parameters thereof are
Figure DEST_PATH_IMAGE005
Then the adaptive layer output is:
Figure 861669DEST_PATH_IMAGE006
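Interpreting claim 7's grouped 1 × 1 convolution as per-channel scaling — one group and one parameter per channel, an assumption consistent with the claim stating n parameters for n channels — the adaptive layer can be sketched as:

```python
import numpy as np

def adaptive_layer(x, w):
    """Adaptive layer AL(n): grouped 1x1 convolution with one group per
    channel, i.e. each of the n feature maps is scaled by its own parameter."""
    # x: (n, H, W) feature maps; w: (n,) per-channel parameters
    return w[:, None, None] * x

x = np.ones((4, 2, 2)) * np.arange(1, 5)[:, None, None]  # channels valued 1..4
w = np.array([0.5, 1.0, 2.0, 0.0])
y = adaptive_layer(x, w)
assert y.shape == x.shape
assert np.allclose(y[0], 0.5) and np.allclose(y[2], 6.0) and np.allclose(y[3], 0.0)
```

With only n scalar parameters per layer, the residual Δparam transmitted to the decoder stays tiny, which is what makes the online-learning scheme lightweight.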
CN202210097819.9A 2022-01-27 2022-01-27 Lightweight HEVC chrominance image quality enhancement method based on online learning Active CN114119789B (en)

Publications (2)

Publication Number Publication Date
CN114119789A CN114119789A (en) 2022-03-01
CN114119789B 2022-05-03




