CN113542780B - Method and device for removing compression artifacts of live webcast video - Google Patents

Method and device for removing compression artifacts of live webcast video Download PDF

Info

Publication number
CN113542780B
Authority
CN
China
Prior art keywords
video
module
compression
convolution
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110649651.3A
Other languages
Chinese (zh)
Other versions
CN113542780A (en)
Inventor
李嘉锋
高宇麒
张菁
郜征
徐晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110649651.3A priority Critical patent/CN113542780B/en
Publication of CN113542780A publication Critical patent/CN113542780A/en
Application granted granted Critical
Publication of CN113542780B publication Critical patent/CN113542780B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H04N19/865Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness with detection of the former encoding block subdivision in decompressed video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a method for removing compression artifacts from live webcast video, comprising the following steps: acquiring a compressed video; and inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video. The compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video. The invention can restore compressed video with a single network model even when the compression bitrate is unknown, thereby providing high-quality live webcast video.

Description

Method and device for removing compression artifacts of live webcast video
Technical Field
The present invention relates to the field of digital image/video signal processing, and more particularly, to a method and an apparatus for removing compression artifacts of live webcast video.
Background
In recent years, with the development of video capture devices, the resolution of webcast video has grown steadily, and the transmission bandwidth and storage space it occupies have grown with it. To reduce the transmission and storage costs of live video, high-quality video must be compressed, and video encoding generally relies on lossy compression algorithms such as the widely used H.264 codec.
The H.264 algorithm adjusts video compression quality by controlling the video bitrate; however, while a low bitrate greatly reduces file size, it also introduces compression artifacts such as blocking, ringing, blurring, and aliasing. These artifacts severely degrade video quality and harm the user's subjective viewing experience. Video with heavy compression artifacts also hampers subsequent intelligent analysis such as object detection, image classification, and image segmentation. It is therefore worthwhile to post-process video carrying the compression artifacts caused by high compression ratios in order to remove those artifacts effectively.
Existing compression artifact removal work includes methods that assume the compression bitrate is known and methods that do not. Known-bitrate methods generally outperform a single network trained on video of unknown bitrate, but they carry a major drawback: a separate network must be trained for each compression bitrate, which occupies substantial memory and introduces redundancy between models trained for similar bitrates. Some existing unknown-bitrate methods enlarge the receptive field by deepening the network so that it can adapt to artifacts caused by various compression bitrates; however, deeper networks cannot guarantee real-time processing of live video.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a method and an apparatus for removing compression artifacts from live webcast video, which can restore compressed video frame by frame with a single network model even when the compression bitrate is unknown, thereby providing high-quality live webcast video.
Specifically, the invention is realized by the following technical scheme:
In a first aspect, the present invention provides a method for removing compression artifacts of live webcast video, where the method includes: acquiring a compressed video; inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video; and the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
Further, the recurrent neural network includes: a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection.
Further, the recurrent module includes: N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group includes: three parallel dilated convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections.
Further, the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration of the recurrent neural network; the high-level feature information extracted by the recurrent module serves, respectively, as the input of the recurrent module in the next iteration and as the input of the image reconstruction module in the current iteration; the small-dilation convolution kernel ConvS in the recurrent module fully extracts image detail; the large-dilation convolution kernels ConvM and ConvL enlarge the receptive field of the recurrent neural network to capture local information; dense residual connections are used between every two dilated convolutions with the same dilation rate, so that image information present in the low-quality video propagates through the recurrent neural network; and the outputs of the three parallel convolutions, after being spliced along the channel direction, are fused by the multi-scale feature fusion module ConvF, which merges the information of the different receptive fields extracted by the three dilated convolutions to generate the output of a convolution group.
Further, the bottom-layer feature extraction module includes two convolution layers, Conv1 and Conv2; the shallow image features it extracts are spliced along the channel dimension with the high-level features extracted by the recurrent module in the previous iteration and input into the recurrent module, which further processes them to extract high-level semantic features of the image.
Further, the image reconstruction module includes a convolution layer Conv3, which reconstructs the features extracted by the recurrent module to obtain the residual information needed to restore the high-quality video.
Further, the skip connection adds the high-quality video residual information to the low-quality input video to obtain a high-quality restored video with the compression artifacts removed.
Further, all activation functions in the recurrent neural network use the PReLU activation function.
Further, the method further comprises training the compression artifact removal model, which includes: encoding and compressing the original video with the H.264 algorithm; converting the original video and the encoded, compressed video into video frames to form a paired video sample library; and training the compression artifact removal model based on the paired video sample library.
In a second aspect, the present invention provides an apparatus for removing compression artifacts of live webcast video, including: a compressed video acquisition unit, configured to acquire a compressed video; and a compression artifact removal unit, configured to input the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video, where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for removing compression artifacts in webcast video according to any one of the first aspect.
In a fourth aspect, the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for removing compression artifacts of live webcast video according to any one of the first aspects.
According to the invention, the compression artifact removal model is trained, the compressed video is input into the model, and the compressed video of unknown compression bitrate containing compression artifacts is restored by the recurrent neural network (RNN) and dilated convolution, yielding the high-quality restored video. Compressed video can therefore be restored with a single network model even when the compression bitrate is unknown, providing high-quality live webcast video more economically and effectively.
Drawings
In order to more clearly illustrate the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a method of compression artifact removal for live webcast video according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall architecture of a recurrent neural network, in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of the recurrent block of the recurrent neural network, according to one embodiment of the present invention;
FIG. 4 is a flow diagram of a method of training a compression artifact removal model according to one embodiment of the invention;
FIG. 5 is a schematic diagram of a compression artifact removal apparatus for webcast video according to another embodiment of the present invention; and
FIG. 6 is a schematic structural diagram of an electronic device according to still another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Fig. 1 is a flowchart of a compression artifact removal method for webcast video according to an embodiment of the present invention. Referring to fig. 1, the method may include the steps of:
step 101: acquiring a compressed video;
step 102: inputting the compressed video into a compression artifact removing model to obtain a high-quality recovered video which is output by the compression artifact removing model and corresponds to the compressed video,
and the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
Specifically, in this embodiment, it should be noted that, in step 101, the compressed video may be a compressed video with an unknown compression code rate and containing compression artifacts. Compression artifacts are created when video is compressed significantly, including compression blocking, ringing, blurring, aliasing, and the like. Compression artifacts can cause severe degradation in the quality of the video, affecting the user's subjective viewing experience. Meanwhile, videos with a large amount of compression artifacts influence subsequent intelligent analysis processing such as target detection, image classification and image segmentation.
In step 102, the compression artifact removal model may restore a compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution, so as to generate a high-quality restored video.
Specifically, the recurrent neural network comprises a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection; the network uses the recurrent module to implement an iterative mechanism of recurrent convolution. The overall structure of the network is shown in FIG. 2, and the parameters of each layer in the overall structure are listed in Table 1.
TABLE 1 Parameters of each layer in the overall network architecture (table rendered as an image in the original publication)
The network iterates T times, and the compressed image I_LQ is input at each iteration; the network ultimately outputs the restored images Î_HQ^(1), …, Î_HQ^(T), for a total of T images. All activation functions in the network use the PReLU (Parametric Rectified Linear Unit) activation function, shown in equation (1):

$$f(x) = \begin{cases} x, & x > 0 \\ a\,x, & x \le 0 \end{cases} \tag{1}$$

where the slope a applied when x ≤ 0 is a learnable parameter updated by the gradients of network training.
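For illustration, equation (1) corresponds directly to the PReLU activation built into common deep-learning frameworks; a minimal PyTorch sketch (the framework choice is ours, not the patent's):

```python
import torch
import torch.nn as nn

# nn.PReLU implements f(x) = x for x > 0 and f(x) = a*x for x <= 0,
# with the slope a registered as a learnable parameter (default init 0.25).
prelu = nn.PReLU(num_parameters=1, init=0.25)
x = torch.tensor([-2.0, 0.0, 3.0])
print(prelu(x))  # tensor([-0.5000, 0.0000, 3.0000], ...): negatives scaled by a
```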
Taking the t-th iteration as an example, the network input is the compressed image I_LQ of size H × W × 3, where H is the low-quality image height and W is the low-quality image width. Shallow features H_0 are first extracted from I_LQ by the bottom-layer feature extraction module, which comprises the two convolution layers Conv1 and Conv2. Conv1 applies 256 convolution kernels of size 3 × 3 with stride 1, followed by a PReLU activation, producing a feature map of size H × W × 256; Conv2 applies 64 kernels of size 1 × 1, followed by a PReLU activation, producing a feature map of size H × W × 64. The feature map H_0 extracted by these two convolutions is fed into the recurrent module. The input and output of the recurrent module can be represented by equation (2):

$$H_R^{(t)} = f_{RB}\left(\mathrm{concat}\left(H_0,\; H_R^{(t-1)}\right)\right) \tag{2}$$

where concat(·) denotes splicing along the channel direction and f_RB denotes feature extraction by the recurrent module. The output H_R^(t-1) produced by the recurrent module in the previous iteration is spliced with the shallow features H_0 extracted in the current iteration and then fed into the recurrent module; the output H_R^(t) of the recurrent module serves as its input in the next iteration. Note that when t = 1 there is no previous iteration, so H_R^(0) = H_0, namely:

$$H_R^{(1)} = f_{RB}\left(\mathrm{concat}\left(H_0,\; H_0\right)\right) \tag{3}$$

The output H_R^(t) of the recurrent module has size H × W × 64. The recurrent module retains the current output H_R^(t) for the (t+1)-th iteration and also passes it to the image reconstruction module, which contains a single convolution Conv3: after a 3 × 3 convolution with stride 1, a feature map of size H × W × 3 is obtained. Finally, this feature map is added as a residual to the low-quality image I_LQ to generate the final high-quality restored image Î_HQ^(t).
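To make the data flow above concrete, the following is a minimal PyTorch sketch of the described forward pass. It assumes the layer sizes given in the text (Conv1: 256 kernels of 3 × 3; Conv2: 64 kernels of 1 × 1; Conv3: a 3-channel 3 × 3 residual head), uses an illustrative default of T = 4 iterations (the text leaves T free), and relies on the RecurrentModule class sketched after the recurrent-module description below; it is an illustration of the structure, not the patented implementation.

```python
import torch
import torch.nn as nn

class ArtifactRemovalNet(nn.Module):
    def __init__(self, T: int = 4):
        super().__init__()
        self.T = T  # number of recurrent iterations (illustrative default)
        # Bottom-layer feature extraction: Conv1 (3x3, 256 ch) then Conv2 (1x1, 64 ch)
        self.conv1 = nn.Sequential(nn.Conv2d(3, 256, 3, stride=1, padding=1), nn.PReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(256, 64, 1, stride=1), nn.PReLU())
        self.recurrent = RecurrentModule()  # one module, weights shared across iterations
        # Image reconstruction: Conv3 (3x3) maps features to a 3-channel residual
        self.conv3 = nn.Conv2d(64, 3, 3, stride=1, padding=1)

    def forward(self, i_lq):
        h0 = self.conv2(self.conv1(i_lq))  # shallow features H_0, size HxWx64
        h = h0                             # H_R^(0) = H_0, so t = 1 splices H_0 with itself
        outputs = []
        for _ in range(self.T):
            h = self.recurrent(torch.cat([h0, h], dim=1))  # eq. (2)
            outputs.append(i_lq + self.conv3(h))           # residual skip connection
        return outputs  # T restored images
```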
As described above, the network first extracts shallow image features with the bottom-layer feature extraction module, which consists of the two convolution layers Conv1 and Conv2. The extracted shallow features are spliced along the channel dimension with the high-level features extracted by the recurrent module in the previous iteration and fed into the recurrent module, which processes them further to extract high-level semantic features of the image. The high-level features extracted by the recurrent module serve both as the input of the recurrent module in the next iteration and as the input of the image reconstruction module in the current iteration. The reconstruction module consists of the convolution layer Conv3 and reconstructs the features extracted by the recurrent module to obtain the residual information needed to restore a high-quality image. Finally, the network uses a skip connection to add the generated high-quality residual to the low-quality input image, yielding a high-quality restored image with the compression artifacts removed.
The recurrent module includes N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group includes three parallel dilated convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections. The structure of the recurrent module designed by the invention is shown in FIG. 3, and the parameters of each layer in the recurrent module are listed in Tables 2-1 to 2-4. Taking the recurrent module at the t-th iteration as an example, its inputs are the shallow features H_0 extracted by the network in the current iteration and the hidden state H_R^(t-1) generated by the recurrent module at iteration t-1. H_0 and H_R^(t-1) are spliced along the channel direction into a feature map of size H × W × 128, which is convolved by ConvA (64 kernels of size 1 × 1 with stride 1) and passed through a PReLU activation to obtain a feature map of size H × W × 64. This channel-adjusted feature map is fed into the N serial convolution groups; the feature map F_n^(t) output by each convolution group has size H × W × 64. The outputs of the N convolution groups are then spliced along the channel direction into a feature map of size H × W × (64 × N). Finally, the spliced feature map is convolved by ConvB (64 kernels of size 1 × 1 with stride 1) and passed through a PReLU activation to obtain a feature map of size H × W × 64, which is output as H_R^(t), the output of the recurrent module at the t-th iteration. The process is shown in equation (4):

$$H_R^{(t)} = f_{ConvB}\left(\mathrm{concat}\left(F_1^{(t)}, F_2^{(t)}, \ldots, F_N^{(t)}\right)\right) \tag{4}$$

Each convolution group extracts feature maps of different receptive fields using three dilated convolution layers ConvS, ConvM, and ConvL with dilation rates 1, 2, and 3, respectively, and includes a multi-scale feature fusion module ConvF; dense residual connections are added between convolutions of the same dilation rate across the convolution groups. Except for the first convolution group, each convolution group uses a 1 × 1 convolution (Conv1, Conv2, or Conv3) before each dilated convolution to integrate the channel count. Taking the n-th convolution group at the t-th iteration as an example, its input F_{n-1}^(t) is the output of the previous convolution group. When n = 1, the input of the convolution group is the output of ConvA, which fuses the shallow features H_0 extracted in the current iteration with the high-level features H_R^(t-1) retained by the recurrent module from the previous iteration through 64 convolution kernels of size 1 × 1, as shown in equation (5):

$$F_0^{(t)} = f_{ConvA}\left(\mathrm{concat}\left(H_0,\; H_R^{(t-1)}\right)\right) \tag{5}$$

Besides the output F_{n-1}^(t) of the (n-1)-th convolution group, the n-th convolution group also receives the outputs of the dilation-rate-1 convolution layers of the first n-1 groups, [S_1, S_2, …, S_{n-1}], the outputs of the dilation-rate-2 layers, [M_1, M_2, …, M_{n-1}], and the outputs of the dilation-rate-3 layers, [L_1, L_2, …, L_{n-1}]. In the first parallel branch, S_1, …, S_{n-1} and F_{n-1}^(t) are spliced along the channel direction, convolved by Conv1 (64 kernels of size 1 × 1 with stride 1), and passed through a PReLU activation to give a feature map of size H × W × 64; this feature map is then convolved by ConvS (64 kernels of size 3 × 3 with dilation rate 1 and stride 1) and passed through a PReLU activation to give the feature map S_n of size H × W × 64. In the second parallel branch, M_1, …, M_{n-1} and F_{n-1}^(t) are spliced along the channel direction, convolved by Conv2 (64 kernels of size 1 × 1 with stride 1), and activated by PReLU; the result is convolved by ConvM (64 kernels of size 3 × 3 with dilation rate 2 and stride 1) and activated by PReLU to give the feature map M_n of size H × W × 64. In the third parallel branch, L_1, …, L_{n-1} and F_{n-1}^(t) are spliced along the channel direction, convolved by Conv3 (64 kernels of size 1 × 1 with stride 1), and activated by PReLU; the result is convolved by ConvL (64 kernels of size 3 × 3 with dilation rate 3 and stride 1) and activated by PReLU to give the feature map L_n of size H × W × 64. Finally, the multi-scale feature fusion module fuses the features extracted by the three parallel branches: the three feature maps are spliced along the channel direction into a feature map of size H × W × (64 × 3), convolved by ConvF (64 kernels of size 1 × 1), and activated by PReLU to give the output F_n^(t) of the convolution group. The process is shown in equations (6) and (7):

$$S_n = f_{ConvS}\big(f_{Conv1}(\mathrm{concat}(S_1,\ldots,S_{n-1},F_{n-1}^{(t)}))\big),\;\; M_n = f_{ConvM}\big(f_{Conv2}(\mathrm{concat}(M_1,\ldots,M_{n-1},F_{n-1}^{(t)}))\big),\;\; L_n = f_{ConvL}\big(f_{Conv3}(\mathrm{concat}(L_1,\ldots,L_{n-1},F_{n-1}^{(t)}))\big) \tag{6}$$

$$F_n^{(t)} = f_{ConvF}\left(\mathrm{concat}\left(S_n, M_n, L_n\right)\right) \tag{7}$$

where the dilation rates of the convolution kernels in ConvS, ConvM, and ConvL are 1, 2, and 3, respectively.
TABLES 2-1 to 2-4 Parameters of each layer in the recurrent module (tables rendered as images in the original publication)
In summary, the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration; the convolution layer ConvA then adjusts the channel count. To enlarge the network's receptive field without increasing its parameter count, while still fully extracting image detail, the invention applies three parallel convolutions with different dilation rates (ConvS, ConvM, and ConvL) to the input of each convolution group. The small-dilation kernel ConvS fully extracts image detail, while the large-dilation kernels ConvM and ConvL enlarge the receptive field and capture local context. In all convolution groups, dense residual connections link every pair of dilated convolutions sharing the same dilation rate, ensuring that image information present in the low-quality frames can flow freely through the network; accordingly, except for the first group, each convolution group uses the convolutions Conv1, Conv2, and Conv3 to integrate the channel count before the dilated convolutions. The outputs of the three parallel convolutions are spliced along the channel direction and fused by the multi-scale feature fusion module ConvF, which merges the information of the three receptive fields to produce the output of the convolution group. The recurrent module comprises N such convolution groups; finally, the outputs of the N groups are spliced along the channel direction and the convolution layer ConvB adjusts the output channels to produce the output of the recurrent module.
FIG. 4 is a flow diagram of a method of training a compression artifact removal model according to one embodiment of the invention.
Step 401: encode and compress the original video using the H.264 algorithm;
Step 402: convert the original video and the encoded, compressed video into video frames to form a paired video sample library;
Step 403: train the compression artifact removal model based on the paired video sample library.
Specifically, the original high-definition videos in the paired video sample library (comprising the training, validation, and test sets) were obtained by crawling webcast video, and cover four categories of live content: indoor broadcasts, outdoor broadcasts, game broadcasts, and movie playback. To generate paired training and validation samples, the original dataset was H.264-encoded to produce low-quality compressed video. The original video resolution is 1280 × 720. Because compression artifacts are not obvious at bitrates above 1000 Kbps, the compression bitrate range for H.264 encoding was chosen as [128 Kbps, 1000 Kbps]; and because compressed videos with similar bitrates degrade similarly, the compression bitrate of each low-quality video was sampled randomly from {1000 Kbps, 512 Kbps, 256 Kbps, 128 Kbps}. In addition, since the invention does not exploit temporal correlation in the video, the original and encoded videos were converted into video frames and assembled into paired samples. The original high-definition material comprises 558 video segments, each corresponding to 4 compressed segments. 550 high-definition segments and their 2200 compressed counterparts were selected as the training set; for each category, 2 high-definition segments and their 8 compressed counterparts were selected as the validation set and the test set, respectively.
For the training set, the same 4 frames were randomly selected from each video, giving 8800 image pairs; for the validation set, the same 4 frames were randomly selected from each video, giving 32 image pairs; for the test set, the low-quality videos obtained by compressing the high-definition videos were converted directly into video frames.
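As an illustration of this sample-generation step, the sketch below re-encodes a clip with ffmpeg's libx264 encoder at a randomly sampled bitrate and dumps frames for pairing; the file paths, output pattern, and function names are illustrative assumptions, not taken from the patent:

```python
import random
import subprocess

BITRATES = ["1000k", "512k", "256k", "128k"]  # the bitrates named in the text

def compress_h264(src: str, dst: str) -> None:
    """Re-encode a clip with libx264 at a randomly sampled bitrate."""
    bitrate = random.choice(BITRATES)
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "libx264", "-b:v", bitrate,  # H.264 at the sampled bitrate
         "-an", dst],
        check=True)

def extract_frames(src: str, out_pattern: str) -> None:
    """Dump every frame as an image, e.g. out_pattern = 'frames/%05d.png'."""
    subprocess.run(["ffmpeg", "-y", "-i", src, out_pattern], check=True)

# The same frame index from the original clip and its compressed counterpart
# then forms one (high-quality, low-quality) training pair.
```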
The constructed recurrent convolutional neural network is trained with the training and validation sets from the paired video sample library. The network randomly draws low-quality video frames I_LQ from the training set and extracts shallow image features H_0 with the bottom-layer feature extraction module. By iterating, the recurrent structure deepens the network and improves its receptive field and nonlinear expressive power without adding parameters, so the whole network is cycled T times. In every cycle except the first, the input of the recurrent module is the splice of the shallow features H_0 extracted by the current bottom-layer feature extraction module and the output of the recurrent module from the previous cycle; in the first cycle, since no previous iteration exists, the input is the splice of H_0 with itself. Besides being used to reconstruct the image, the output of the recurrent module is carried into the next iteration to refine the underlying feature representation. Since the network cycles T times and produces one output per cycle, it generates T high-quality restored images in total, namely Î_HQ^(1), …, Î_HQ^(T).
The loss of each network-generated image is computed against the corresponding high-definition video frame, and the loss function is then minimized by gradient descent. The network uses the Adam optimizer with momentum set to 0.9 and the learning rate set to 0.0001; performance is checked on the validation set after each training epoch, the learning rate is halved every 200 epochs, and training stops when the preset maximum number of iterations (1000) is reached, finally yielding the compression artifact removal model.
Additionally, the present invention uses recurrent convolution so that the high-level image features extracted by the recurrent module optimize the underlying feature representation. During training, each output image is supervised by the corresponding lossless image GT, and the network uses the L2 loss as its loss function; the network loss designed by the invention, summed over the T outputs, is therefore given by equation (8):

$$L = \sum_{t=1}^{T} \left\| \hat{I}_{HQ}^{(t)} - I_{GT} \right\|_2^2 \tag{8}$$
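Putting the pieces together, a hedged sketch of the training loop implied by equation (8) and the schedule above (Adam with beta1 = 0.9, initial learning rate 1e-4 halved every 200 epochs, at most 1000 epochs); the train_loader yielding paired low-/high-quality frames and the ArtifactRemovalNet class from the earlier sketch are assumptions of this sketch:

```python
import torch

model = ArtifactRemovalNet(T=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

for epoch in range(1000):                 # preset maximum number of epochs
    for i_lq, i_gt in train_loader:       # paired low-/high-quality frames
        outputs = model(i_lq)             # T restored images per input
        # eq. (8): L2 loss summed over all T outputs against the lossless GT
        loss = sum(torch.mean((out - i_gt) ** 2) for out in outputs)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                      # halve the learning rate every 200 epochs
```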
after the network training is finished, the test data set constructed in the establishment of the H.264 coding paired video sample base is used for removing video compression artifacts. And sending the low-quality compressed video to be restored with the artifacts into a network in an image frame mode, wherein the output result is a high-quality restored image. In the process of removing the video compression artifacts, the iteration times of the network can be adjusted according to the calculation performance of hardware so as to achieve the balance between the network performance and the operation time, but the iteration times of the network during testing are required to be less than or equal to the iteration times of the network during training.
In summary, the invention removes compression artifacts from H.264-encoded video of unknown compression bitrate through a recurrent neural network and restores high-quality video. First, an end-to-end artifact-removal scheme for compressed video of unknown bitrate is built on a recurrent neural network and trained with paired low-quality and high-quality video frames. Second, the multi-scale feature maps extracted by the dilated convolutions enlarge the network's receptive field, allowing it to remove image artifacts over a wide area while still extracting fine detail, so that the network removes artifacts and preserves detail at the same time. This effectively improves network performance and generalization, and a single network can restore video encoded at any compression bitrate in the covered range. Moreover, the recurrent neural network adopted by the invention deepens the network without increasing its parameter count, effectively improving its expressive power and nonlinear modeling capability, and its number of cycles can be adjusted to hardware constraints to balance performance against running time, which makes the method convenient to apply in practice.
Fig. 5 is a schematic diagram of a compression artifact removal apparatus for webcast video according to another embodiment of the present invention.
In this embodiment, it should be noted that, referring to FIG. 5, the apparatus for removing compression artifacts of webcast video according to an embodiment of the present invention may include a compressed video acquisition unit 501, configured to acquire a compressed video; and a compression artifact removal unit 502, configured to input the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video,
and the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
Since the apparatus for removing the compression artifact of the live webcast video according to the embodiment of the present invention can be used to execute the method for removing the compression artifact of the live webcast video according to the embodiment, and the working principle and the beneficial effects are similar, detailed descriptions are omitted here, and specific contents can be found in the description of the embodiment.
In this embodiment, it should be noted that each unit in the apparatus according to the embodiment of the present invention may be integrated into one body, or may be separately disposed. The units may be combined into one unit, or further divided into a plurality of sub-units.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 6: a processor 601, a memory 602, a communication interface 603, and a communication bus 604; the processor 601, the memory 602, and the communication interface 603 complete communication with each other through the communication bus 604.
The processor 601 is configured to call a computer program in the memory 602; when executing the computer program, the processor implements all the steps of the above method for removing compression artifacts of live webcast video, for example: acquiring a compressed video; inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video,
where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, another embodiment of the present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program implements all the steps of the above method for removing compression artifacts of webcast video, for example: acquiring a compressed video; inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video,
where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, another embodiment of the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements all the steps of the above method for removing compression artifacts of webcast video, for example: acquiring a compressed video; and inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video, where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the foregoing technical solutions may be substantially or partially embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method for removing compression artifacts of webcast video according to various embodiments or some portions of the embodiments.
Moreover, in the present invention, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Furthermore, in the present disclosure, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for removing compression artifacts of live webcast video is characterized by comprising the following steps:
acquiring a compressed video;
inputting the compressed video into a compression artifact removing model to obtain a high-quality recovery video which is output by the compression artifact removing model and corresponds to the compressed video;
wherein the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video,
wherein the recurrent neural network comprises: a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection; the recurrent module comprises: N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group comprises: three parallel convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections; and
the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration of the recurrent neural network;
the high-level feature information extracted by the recurrent module serves, respectively, as the input of the recurrent module in the next iteration of the recurrent neural network and as the input of the image reconstruction module in the current iteration;
the small-dilation convolution kernel ConvS in the recurrent module fully extracts image detail; the large-dilation convolution kernels ConvM and ConvL in the recurrent module enlarge the receptive field of the recurrent neural network to obtain local information;
dense residual connections are used between every two dilated convolutions with the same dilation rate, so that image information present in the low-quality video propagates through the recurrent neural network;
and the outputs of the three parallel convolutions, after being spliced along the channel direction, are fused by the multi-scale feature fusion module ConvF, which merges the information of the different receptive fields extracted by the three dilated convolutions to generate the output of a convolution group.
2. The method of claim 1, wherein the bottom-layer feature extraction module comprises two convolution layers Conv1 and Conv2,
and wherein the shallow image features extracted by the bottom-layer feature extraction module are spliced along the channel dimension with the high-level features extracted by the recurrent module in the previous iteration and then input into the recurrent module, which further processes them to extract high-level semantic features of the image.
3. The method according to claim 1, wherein the image reconstruction module includes a convolution layer Conv3, which reconstructs the features extracted by the recurrent module to obtain residual information for restoring the high-quality video.
4. The method of claim 1, wherein the skip connection adds the high-quality video residual information to the low-quality input video to obtain a high-quality restored video with the compression artifacts removed.
5. The method for removing compression artifacts of live webcast video according to any of claims 1 to 4, wherein all activation functions in the recurrent neural network use a PReLU activation function.
6. The method of claim 1, wherein the method further comprises: the compression artifact removal model is trained to,
wherein said training said compression artifact removal model comprises:
the original video is coded and compressed by using an H.264 algorithm;
converting the original video and the video after coding compression into video frames to form a paired video sample library;
training the compression artifact removal model based on the paired video sample library.
7. A compression artifact removal apparatus for webcast video, comprising:
a compressed video acquisition unit for acquiring a compressed video; and
a compression artifact removal unit, configured to input the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video, wherein the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video, and wherein the recurrent neural network comprises: a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection; the recurrent module comprises: N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group comprises: three parallel convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections; the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration of the recurrent neural network; the high-level feature information extracted by the recurrent module serves, respectively, as the input of the recurrent module in the next iteration and as the input of the image reconstruction module in the current iteration; the small-dilation convolution kernel ConvS in the recurrent module fully extracts image detail; the large-dilation convolution kernels ConvM and ConvL enlarge the receptive field of the recurrent neural network to obtain local information; dense residual connections are used between every two dilated convolutions with the same dilation rate, so that image information present in the low-quality video propagates through the recurrent neural network; and the outputs of the three parallel convolutions, after being spliced along the channel direction, are fused by the multi-scale feature fusion module ConvF, which merges the information of the different receptive fields extracted by the three dilated convolutions to generate the output of a convolution group.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for compression artifact removal for live webcast video according to any of claims 1 to 6 when executing the computer program.
9. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for compression artifact removal for webcast video according to any of claims 1 to 6.
CN202110649651.3A 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video Active CN113542780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110649651.3A CN113542780B (en) 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110649651.3A CN113542780B (en) 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video

Publications (2)

Publication Number Publication Date
CN113542780A (en) 2021-10-22
CN113542780B (en) 2023-01-20

Family

ID=78124817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110649651.3A Active CN113542780B (en) 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video

Country Status (1)

Country Link
CN (1) CN113542780B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230188759A1 (en) * 2021-12-14 2023-06-15 Spectrum Optix Inc. Neural Network Assisted Removal of Video Compression Artifacts

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2418093A (en) * 2004-09-09 2006-03-15 Imagination Tech Ltd Method for removing blocky compression artefacts
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN111861926A (en) * 2020-07-24 2020-10-30 南京信息工程大学滨江学院 Image rain removing method based on airspace group enhancement mechanism and long-time and short-time memory network
CN112102176A (en) * 2020-07-27 2020-12-18 中山大学 Image rain removing method based on multi-scale intensive mixed attention neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-Scale Dilated Convolution Neural Network for Image Artifact Correction of Limited-Angle Tomography; Haichuan Zhou et al.; IEEE Access; 2019-12-24; full text *
Single Image Rain Removal Method Based on Multi-Channel Multi-Scale Convolutional Neural Network; Liu Changyuan et al.; Journal of Electronics & Information Technology; 2020-09-15 (No. 09); full text *
Single Image Rain Removal Method Using Multi-Scale Convolutional Neural Networks; Guo Jichang et al.; Journal of Harbin Institute of Technology; 2018-03-30 (No. 03); full text *

Also Published As

Publication number Publication date
CN113542780A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
KR102082816B1 (en) Method for improving the resolution of streaming files
CN110933429B (en) Video compression sensing and reconstruction method and device based on deep neural network
US20180130178A1 (en) Enhancing Visual Data Using Strided Convolutions
US20180139458A1 (en) Training end-to-end video processes
CN113658051A (en) Image defogging method and system based on cyclic generation countermeasure network
CN110222758B (en) Image processing method, device, equipment and storage medium
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
US11062210B2 (en) Method and apparatus for training a neural network used for denoising
JP7143529B2 (en) IMAGE RESTORATION METHOD AND DEVICE, ELECTRONIC DEVICE, AND STORAGE MEDIUM
CN116233445B (en) Video encoding and decoding processing method and device, computer equipment and storage medium
Wang et al. Semantic perceptual image compression with a laplacian pyramid of convolutional networks
CN111539290A (en) Video motion recognition method and device, electronic equipment and storage medium
CN111047543A (en) Image enhancement method, device and storage medium
US20220414838A1 (en) Image dehazing method and system based on cyclegan
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN116740204A (en) Method, device, equipment and storage medium for generating stylized image generation model
CN109949234A (en) Video restoration model training method and video restoration method based on depth network
Löhdefink et al. GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation
CN116309890A (en) Model generation method, stylized image generation method and device and electronic equipment
Fuoli et al. NTIRE 2020 challenge on video quality mapping: Methods and results
CN113542780B (en) Method and device for removing compression artifacts of live webcast video
CN114926336A (en) Video super-resolution reconstruction method and device, computer equipment and storage medium
CN113033616B (en) High-quality video reconstruction method, device, equipment and storage medium
CN113518229B (en) Method and device for training loop filter network, computer equipment and storage medium
CN116264606A (en) Method, apparatus and computer program product for processing video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant