CN113542780B - Method and device for removing compression artifacts of live webcast video - Google Patents

Method and device for removing compression artifacts of live webcast video Download PDF

Info

Publication number
CN113542780B
Authority
CN
China
Prior art keywords
video
module
compression
convolution
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110649651.3A
Other languages
Chinese (zh)
Other versions
CN113542780A (en)
Inventor
李嘉锋
高宇麒
张菁
郜征
徐晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110649651.3A priority Critical patent/CN113542780B/en
Publication of CN113542780A publication Critical patent/CN113542780A/en
Application granted granted Critical
Publication of CN113542780B publication Critical patent/CN113542780B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H04N19/865Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness with detection of the former encoding block subdivision in decompressed video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a method for removing compression artifacts from live webcast video, comprising the following steps: acquiring a compressed video; and inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video. The compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video. The invention can restore compressed video with a single network model even when the compression bitrate is unknown, thereby providing high-quality live webcast video.

Description

Method and device for removing compression artifacts of live webcast video
Technical Field
The present invention relates to the field of digital image/video signal processing, and more particularly, to a method and an apparatus for removing compression artifacts of live webcast video.
Background
In recent years, with the development of video capture devices, the resolution of webcast video has grown steadily, and the transmission bandwidth and storage space it occupies have grown with it. To reduce the transmission and storage costs of live video, high-quality video must be compressed, and video encoding generally relies on lossy compression algorithms such as the widely used H.264 codec.
The H.264 algorithm adjusts video compression quality by controlling the video bitrate; however, while a low bitrate greatly reduces file size, it also introduces compression artifacts such as blocking, ringing, blurring, and aliasing. These artifacts severely degrade video quality and harm the user's subjective viewing experience. Video with heavy compression artifacts also hampers subsequent intelligent analysis such as object detection, image classification, and image segmentation. It is therefore worthwhile to post-process video carrying the compression artifacts caused by high compression ratios in order to remove those artifacts effectively.
Existing compression artifact removal work includes methods that assume the compression bitrate is known and methods that do not. Known-bitrate methods generally outperform a single network trained on video of unknown bitrate, but they carry a major drawback: a separate network must be trained for each compression bitrate, which occupies substantial memory and introduces redundancy between models trained for similar bitrates. Some existing unknown-bitrate methods enlarge the receptive field by deepening the network so that it can adapt to artifacts caused by various compression bitrates; however, deeper networks cannot guarantee real-time processing of live video.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a method and an apparatus for removing compression artifacts from live webcast video, which can restore compressed video frame by frame with a single network model even when the compression bitrate is unknown, thereby providing high-quality live webcast video.
Specifically, the invention is realized by the following technical scheme:
In a first aspect, the present invention provides a method for removing compression artifacts of live webcast video, where the method includes: acquiring a compressed video; inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video; and the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
Further, the recurrent neural network includes: a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection.
Further, the recurrent module includes: N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group includes: three parallel dilated convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections.
Further, the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration of the recurrent neural network; the high-level feature information extracted by the recurrent module serves, respectively, as the input of the recurrent module in the next iteration and as the input of the image reconstruction module in the current iteration; the small-dilation convolution kernel ConvS in the recurrent module fully extracts image detail; the large-dilation convolution kernels ConvM and ConvL enlarge the receptive field of the recurrent neural network to capture local information; dense residual connections are used between every two dilated convolutions with the same dilation rate, so that image information present in the low-quality video propagates through the recurrent neural network; and the outputs of the three parallel convolutions, after being spliced along the channel direction, are fused by the multi-scale feature fusion module ConvF, which merges the information of the different receptive fields extracted by the three dilated convolutions to generate the output of a convolution group.
Further, the bottom-layer feature extraction module includes two convolution layers, Conv1 and Conv2; the shallow image features it extracts are spliced along the channel dimension with the high-level features extracted by the recurrent module in the previous iteration and input into the recurrent module, which further processes them to extract high-level semantic features of the image.
Further, the image reconstruction module includes a convolution layer Conv3, which reconstructs the features extracted by the recurrent module to obtain the residual information needed to restore the high-quality video.
Further, the skip connection adds the high-quality video residual information to the low-quality input video to obtain a high-quality restored video with the compression artifacts removed.
Further, all activation functions in the recurrent neural network use the PReLU activation function.
Further, the method further comprises training the compression artifact removal model, which includes: encoding and compressing the original video with the H.264 algorithm; converting the original video and the encoded, compressed video into video frames to form a paired video sample library; and training the compression artifact removal model based on the paired video sample library.
In a second aspect, the present invention provides an apparatus for removing compression artifacts of live webcast video, including: a compressed video acquisition unit, configured to acquire a compressed video; and a compression artifact removal unit, configured to input the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video, where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for removing compression artifacts in webcast video according to any one of the first aspect.
In a fourth aspect, the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for removing compression artifacts of live webcast video according to any one of the first aspects.
According to the invention, the compression artifact removal model is trained, the compressed video is input into the model, and the compressed video of unknown compression bitrate containing compression artifacts is restored by the recurrent neural network (RNN) and dilated convolution, yielding the high-quality restored video. Compressed video can therefore be restored with a single network model even when the compression bitrate is unknown, providing high-quality live webcast video more economically and effectively.
Drawings
In order to more clearly illustrate the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a method of compression artifact removal for live webcast video according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall architecture of a recurrent neural network, in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of the recurrent block of the recurrent neural network, according to one embodiment of the present invention;
FIG. 4 is a flow diagram of a method of training a compression artifact removal model according to one embodiment of the invention;
FIG. 5 is a schematic diagram of a compression artifact removal apparatus for webcast video according to another embodiment of the present invention; and
FIG. 6 is a schematic structural diagram of an electronic device according to still another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Fig. 1 is a flowchart of a compression artifact removal method for webcast video according to an embodiment of the present invention. Referring to fig. 1, the method may include the steps of:
step 101: acquiring a compressed video;
step 102: inputting the compressed video into a compression artifact removing model to obtain a high-quality recovered video which is output by the compression artifact removing model and corresponds to the compressed video,
and the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
Specifically, in this embodiment, it should be noted that, in step 101, the compressed video may be a compressed video with an unknown compression code rate and containing compression artifacts. Compression artifacts are created when video is compressed significantly, including compression blocking, ringing, blurring, aliasing, and the like. Compression artifacts can cause severe degradation in the quality of the video, affecting the user's subjective viewing experience. Meanwhile, videos with a large amount of compression artifacts influence subsequent intelligent analysis processing such as target detection, image classification and image segmentation.
In step 102, the compression artifact removal model may restore a compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution, so as to generate a high-quality restored video.
Specifically, the recurrent neural network comprises a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection; the network uses the recurrent module to implement an iterative mechanism of recurrent convolution. The overall structure of the network is shown in FIG. 2, and the parameters of each layer in the overall structure are listed in Table 1.
TABLE 1 Parameters of each layer in the overall network architecture (table rendered as an image in the original publication)
The network iterates T times, and the compressed image I_LQ is input at each iteration; the network ultimately outputs the restored images Î_HQ^(1), …, Î_HQ^(T), for a total of T images. All activation functions in the network use the PReLU (Parametric Rectified Linear Unit) activation function, shown in equation (1):

$$f(x) = \begin{cases} x, & x > 0 \\ a\,x, & x \le 0 \end{cases} \tag{1}$$

where the slope a applied when x ≤ 0 is a learnable parameter updated by the gradients of network training.
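For illustration, equation (1) corresponds directly to the PReLU activation built into common deep-learning frameworks; a minimal PyTorch sketch (the framework choice is ours, not the patent's):

```python
import torch
import torch.nn as nn

# nn.PReLU implements f(x) = x for x > 0 and f(x) = a*x for x <= 0,
# with the slope a registered as a learnable parameter (default init 0.25).
prelu = nn.PReLU(num_parameters=1, init=0.25)
x = torch.tensor([-2.0, 0.0, 3.0])
print(prelu(x))  # tensor([-0.5000, 0.0000, 3.0000], ...): negatives scaled by a
```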
Taking the t-th iteration as an example, the network input is the compressed image I_LQ of size H × W × 3, where H is the low-quality image height and W is the low-quality image width. Shallow features H_0 are first extracted from I_LQ by the bottom-layer feature extraction module, which comprises the two convolution layers Conv1 and Conv2. Conv1 applies 256 convolution kernels of size 3 × 3 with stride 1, followed by a PReLU activation, producing a feature map of size H × W × 256; Conv2 applies 64 kernels of size 1 × 1, followed by a PReLU activation, producing a feature map of size H × W × 64. The feature map H_0 extracted by these two convolutions is fed into the recurrent module. The input and output of the recurrent module can be represented by equation (2):

$$H_R^{(t)} = f_{RB}\left(\mathrm{concat}\left(H_0,\; H_R^{(t-1)}\right)\right) \tag{2}$$

where concat(·) denotes splicing along the channel direction and f_RB denotes feature extraction by the recurrent module. The output H_R^(t-1) produced by the recurrent module in the previous iteration is spliced with the shallow features H_0 extracted in the current iteration and then fed into the recurrent module; the output H_R^(t) of the recurrent module serves as its input in the next iteration. Note that when t = 1 there is no previous iteration, so H_R^(0) = H_0, namely:

$$H_R^{(1)} = f_{RB}\left(\mathrm{concat}\left(H_0,\; H_0\right)\right) \tag{3}$$

The output H_R^(t) of the recurrent module has size H × W × 64. The recurrent module retains the current output H_R^(t) for the (t+1)-th iteration and also passes it to the image reconstruction module, which contains a single convolution Conv3: after a 3 × 3 convolution with stride 1, a feature map of size H × W × 3 is obtained. Finally, this feature map is added as a residual to the low-quality image I_LQ to generate the final high-quality restored image Î_HQ^(t).
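To make the data flow above concrete, the following is a minimal PyTorch sketch of the described forward pass. It assumes the layer sizes given in the text (Conv1: 256 kernels of 3 × 3; Conv2: 64 kernels of 1 × 1; Conv3: a 3-channel 3 × 3 residual head), uses an illustrative default of T = 4 iterations (the text leaves T free), and relies on the RecurrentModule class sketched after the recurrent-module description below; it is an illustration of the structure, not the patented implementation.

```python
import torch
import torch.nn as nn

class ArtifactRemovalNet(nn.Module):
    def __init__(self, T: int = 4):
        super().__init__()
        self.T = T  # number of recurrent iterations (illustrative default)
        # Bottom-layer feature extraction: Conv1 (3x3, 256 ch) then Conv2 (1x1, 64 ch)
        self.conv1 = nn.Sequential(nn.Conv2d(3, 256, 3, stride=1, padding=1), nn.PReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(256, 64, 1, stride=1), nn.PReLU())
        self.recurrent = RecurrentModule()  # one module, weights shared across iterations
        # Image reconstruction: Conv3 (3x3) maps features to a 3-channel residual
        self.conv3 = nn.Conv2d(64, 3, 3, stride=1, padding=1)

    def forward(self, i_lq):
        h0 = self.conv2(self.conv1(i_lq))  # shallow features H_0, size HxWx64
        h = h0                             # H_R^(0) = H_0, so t = 1 splices H_0 with itself
        outputs = []
        for _ in range(self.T):
            h = self.recurrent(torch.cat([h0, h], dim=1))  # eq. (2)
            outputs.append(i_lq + self.conv3(h))           # residual skip connection
        return outputs  # T restored images
```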
As described above, the network first extracts shallow image features with the bottom-layer feature extraction module, which consists of the two convolution layers Conv1 and Conv2. The extracted shallow features are spliced along the channel dimension with the high-level features extracted by the recurrent module in the previous iteration and fed into the recurrent module, which processes them further to extract high-level semantic features of the image. The high-level features extracted by the recurrent module serve both as the input of the recurrent module in the next iteration and as the input of the image reconstruction module in the current iteration. The reconstruction module consists of the convolution layer Conv3 and reconstructs the features extracted by the recurrent module to obtain the residual information needed to restore a high-quality image. Finally, the network uses a skip connection to add the generated high-quality residual to the low-quality input image, yielding a high-quality restored image with the compression artifacts removed.
The recurrent module includes N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group includes three parallel dilated convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections. The structure of the recurrent module designed by the invention is shown in FIG. 3, and the parameters of each layer in the recurrent module are listed in Tables 2-1 to 2-4. Taking the recurrent module at the t-th iteration as an example, its inputs are the shallow features H_0 extracted by the network in the current iteration and the hidden state H_R^(t-1) generated by the recurrent module at iteration t-1. H_0 and H_R^(t-1) are spliced along the channel direction into a feature map of size H × W × 128, which is convolved by ConvA (64 kernels of size 1 × 1 with stride 1) and passed through a PReLU activation to obtain a feature map of size H × W × 64. This channel-adjusted feature map is fed into the N serial convolution groups; the feature map F_n^(t) output by each convolution group has size H × W × 64. The outputs of the N convolution groups are then spliced along the channel direction into a feature map of size H × W × (64 × N). Finally, the spliced feature map is convolved by ConvB (64 kernels of size 1 × 1 with stride 1) and passed through a PReLU activation to obtain a feature map of size H × W × 64, which is output as H_R^(t), the output of the recurrent module at the t-th iteration. The process is shown in equation (4):

$$H_R^{(t)} = f_{ConvB}\left(\mathrm{concat}\left(F_1^{(t)}, F_2^{(t)}, \ldots, F_N^{(t)}\right)\right) \tag{4}$$

Each convolution group extracts feature maps of different receptive fields using three dilated convolution layers ConvS, ConvM, and ConvL with dilation rates 1, 2, and 3, respectively, and includes a multi-scale feature fusion module ConvF; dense residual connections are added between convolutions of the same dilation rate across the convolution groups. Except for the first convolution group, each convolution group uses a 1 × 1 convolution (Conv1, Conv2, or Conv3) before each dilated convolution to integrate the channel count. Taking the n-th convolution group at the t-th iteration as an example, its input F_{n-1}^(t) is the output of the previous convolution group. When n = 1, the input of the convolution group is the output of ConvA, which fuses the shallow features H_0 extracted in the current iteration with the high-level features H_R^(t-1) retained by the recurrent module from the previous iteration through 64 convolution kernels of size 1 × 1, as shown in equation (5):

$$F_0^{(t)} = f_{ConvA}\left(\mathrm{concat}\left(H_0,\; H_R^{(t-1)}\right)\right) \tag{5}$$

Besides the output F_{n-1}^(t) of the (n-1)-th convolution group, the n-th convolution group also receives the outputs of the dilation-rate-1 convolution layers of the first n-1 groups, [S_1, S_2, …, S_{n-1}], the outputs of the dilation-rate-2 layers, [M_1, M_2, …, M_{n-1}], and the outputs of the dilation-rate-3 layers, [L_1, L_2, …, L_{n-1}]. In the first parallel branch, S_1, …, S_{n-1} and F_{n-1}^(t) are spliced along the channel direction, convolved by Conv1 (64 kernels of size 1 × 1 with stride 1), and passed through a PReLU activation to give a feature map of size H × W × 64; this feature map is then convolved by ConvS (64 kernels of size 3 × 3 with dilation rate 1 and stride 1) and passed through a PReLU activation to give the feature map S_n of size H × W × 64. In the second parallel branch, M_1, …, M_{n-1} and F_{n-1}^(t) are spliced along the channel direction, convolved by Conv2 (64 kernels of size 1 × 1 with stride 1), and activated by PReLU; the result is convolved by ConvM (64 kernels of size 3 × 3 with dilation rate 2 and stride 1) and activated by PReLU to give the feature map M_n of size H × W × 64. In the third parallel branch, L_1, …, L_{n-1} and F_{n-1}^(t) are spliced along the channel direction, convolved by Conv3 (64 kernels of size 1 × 1 with stride 1), and activated by PReLU; the result is convolved by ConvL (64 kernels of size 3 × 3 with dilation rate 3 and stride 1) and activated by PReLU to give the feature map L_n of size H × W × 64. Finally, the multi-scale feature fusion module fuses the features extracted by the three parallel branches: the three feature maps are spliced along the channel direction into a feature map of size H × W × (64 × 3), convolved by ConvF (64 kernels of size 1 × 1), and activated by PReLU to give the output F_n^(t) of the convolution group. The process is shown in equations (6) and (7):

$$S_n = f_{ConvS}\big(f_{Conv1}(\mathrm{concat}(S_1,\ldots,S_{n-1},F_{n-1}^{(t)}))\big),\;\; M_n = f_{ConvM}\big(f_{Conv2}(\mathrm{concat}(M_1,\ldots,M_{n-1},F_{n-1}^{(t)}))\big),\;\; L_n = f_{ConvL}\big(f_{Conv3}(\mathrm{concat}(L_1,\ldots,L_{n-1},F_{n-1}^{(t)}))\big) \tag{6}$$

$$F_n^{(t)} = f_{ConvF}\left(\mathrm{concat}\left(S_n, M_n, L_n\right)\right) \tag{7}$$

where the dilation rates of the convolution kernels in ConvS, ConvM, and ConvL are 1, 2, and 3, respectively.
TABLES 2-1 to 2-4 Parameters of each layer in the recurrent module (tables rendered as images in the original publication)
In summary, the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration; the convolution layer ConvA then adjusts the channel count. To enlarge the network's receptive field without increasing its parameter count, while still fully extracting image detail, the invention applies three parallel convolutions with different dilation rates (ConvS, ConvM, and ConvL) to the input of each convolution group. The small-dilation kernel ConvS fully extracts image detail, while the large-dilation kernels ConvM and ConvL enlarge the receptive field and capture local context. In all convolution groups, dense residual connections link every pair of dilated convolutions sharing the same dilation rate, ensuring that image information present in the low-quality frames can flow freely through the network; accordingly, except for the first group, each convolution group uses the convolutions Conv1, Conv2, and Conv3 to integrate the channel count before the dilated convolutions. The outputs of the three parallel convolutions are spliced along the channel direction and fused by the multi-scale feature fusion module ConvF, which merges the information of the three receptive fields to produce the output of the convolution group. The recurrent module comprises N such convolution groups; finally, the outputs of the N groups are spliced along the channel direction and the convolution layer ConvB adjusts the output channels to produce the output of the recurrent module.
FIG. 4 is a flow diagram of a method of training a compression artifact removal model according to one embodiment of the invention.
Step 401: encode and compress the original video using the H.264 algorithm;
Step 402: convert the original video and the encoded, compressed video into video frames to form a paired video sample library;
Step 403: train the compression artifact removal model based on the paired video sample library.
Specifically, the original high-definition videos in the paired video sample library (comprising the training, validation, and test sets) were obtained by crawling webcast video, and cover four categories of live content: indoor broadcasts, outdoor broadcasts, game broadcasts, and movie playback. To generate paired training and validation samples, the original dataset was H.264-encoded to produce low-quality compressed video. The original video resolution is 1280 × 720. Because compression artifacts are not obvious at bitrates above 1000 Kbps, the compression bitrate range for H.264 encoding was chosen as [128 Kbps, 1000 Kbps]; and because compressed videos with similar bitrates degrade similarly, the compression bitrate of each low-quality video was sampled randomly from {1000 Kbps, 512 Kbps, 256 Kbps, 128 Kbps}. In addition, since the invention does not exploit temporal correlation in the video, the original and encoded videos were converted into video frames and assembled into paired samples. The original high-definition material comprises 558 video segments, each corresponding to 4 compressed segments. 550 high-definition segments and their 2200 compressed counterparts were selected as the training set; for each category, 2 high-definition segments and their 8 compressed counterparts were selected as the validation set and the test set, respectively.
For the training set, the same 4 frames were randomly selected from each video, giving 8800 image pairs; for the validation set, the same 4 frames were randomly selected from each video, giving 32 image pairs; for the test set, the low-quality videos obtained by compressing the high-definition videos were converted directly into video frames.
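As an illustration of this sample-generation step, the sketch below re-encodes a clip with ffmpeg's libx264 encoder at a randomly sampled bitrate and dumps frames for pairing; the file paths, output pattern, and function names are illustrative assumptions, not taken from the patent:

```python
import random
import subprocess

BITRATES = ["1000k", "512k", "256k", "128k"]  # the bitrates named in the text

def compress_h264(src: str, dst: str) -> None:
    """Re-encode a clip with libx264 at a randomly sampled bitrate."""
    bitrate = random.choice(BITRATES)
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "libx264", "-b:v", bitrate,  # H.264 at the sampled bitrate
         "-an", dst],
        check=True)

def extract_frames(src: str, out_pattern: str) -> None:
    """Dump every frame as an image, e.g. out_pattern = 'frames/%05d.png'."""
    subprocess.run(["ffmpeg", "-y", "-i", src, out_pattern], check=True)

# The same frame index from the original clip and its compressed counterpart
# then forms one (high-quality, low-quality) training pair.
```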
The constructed recurrent convolutional neural network is trained with the training and validation sets from the paired video sample library. The network randomly draws low-quality video frames I_LQ from the training set and extracts shallow image features H_0 with the bottom-layer feature extraction module. By iterating, the recurrent structure deepens the network and improves its receptive field and nonlinear expressive power without adding parameters, so the whole network is cycled T times. In every cycle except the first, the input of the recurrent module is the splice of the shallow features H_0 extracted by the current bottom-layer feature extraction module and the output of the recurrent module from the previous cycle; in the first cycle, since no previous iteration exists, the input is the splice of H_0 with itself. Besides being used to reconstruct the image, the output of the recurrent module is carried into the next iteration to refine the underlying feature representation. Since the network cycles T times and produces one output per cycle, it generates T high-quality restored images in total, namely Î_HQ^(1), …, Î_HQ^(T).
The loss of each network-generated image is computed against the corresponding high-definition video frame, and the loss function is then minimized by gradient descent. The network uses the Adam optimizer with momentum set to 0.9 and the learning rate set to 0.0001; performance is checked on the validation set after each training epoch, the learning rate is halved every 200 epochs, and training stops when the preset maximum number of iterations (1000) is reached, finally yielding the compression artifact removal model.
Additionally, the present invention uses recurrent convolution so that the high-level image features extracted by the recurrent module optimize the underlying feature representation. During training, each output image is supervised by the corresponding lossless image GT, and the network uses the L2 loss as its loss function; the network loss designed by the invention, summed over the T outputs, is therefore given by equation (8):

$$L = \sum_{t=1}^{T} \left\| \hat{I}_{HQ}^{(t)} - I_{GT} \right\|_2^2 \tag{8}$$
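Putting the pieces together, a hedged sketch of the training loop implied by equation (8) and the schedule above (Adam with beta1 = 0.9, initial learning rate 1e-4 halved every 200 epochs, at most 1000 epochs); the train_loader yielding paired low-/high-quality frames and the ArtifactRemovalNet class from the earlier sketch are assumptions of this sketch:

```python
import torch

model = ArtifactRemovalNet(T=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

for epoch in range(1000):                 # preset maximum number of epochs
    for i_lq, i_gt in train_loader:       # paired low-/high-quality frames
        outputs = model(i_lq)             # T restored images per input
        # eq. (8): L2 loss summed over all T outputs against the lossless GT
        loss = sum(torch.mean((out - i_gt) ** 2) for out in outputs)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                      # halve the learning rate every 200 epochs
```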
after the network training is finished, the test data set constructed in the establishment of the H.264 coding paired video sample base is used for removing video compression artifacts. And sending the low-quality compressed video to be restored with the artifacts into a network in an image frame mode, wherein the output result is a high-quality restored image. In the process of removing the video compression artifacts, the iteration times of the network can be adjusted according to the calculation performance of hardware so as to achieve the balance between the network performance and the operation time, but the iteration times of the network during testing are required to be less than or equal to the iteration times of the network during training.
In summary, the invention removes compression artifacts from H.264-encoded video of unknown compression bitrate through a recurrent neural network and restores high-quality video. First, an end-to-end artifact-removal scheme for compressed video of unknown bitrate is built on a recurrent neural network and trained with paired low-quality and high-quality video frames. Second, the multi-scale feature maps extracted by the dilated convolutions enlarge the network's receptive field, allowing it to remove image artifacts over a wide area while still extracting fine detail, so that the network removes artifacts and preserves detail at the same time. This effectively improves network performance and generalization, and a single network can restore video encoded at any compression bitrate in the covered range. Moreover, the recurrent neural network adopted by the invention deepens the network without increasing its parameter count, effectively improving its expressive power and nonlinear modeling capability, and its number of cycles can be adjusted to hardware constraints to balance performance against running time, which makes the method convenient to apply in practice.
Fig. 5 is a schematic diagram of a compression artifact removal apparatus for webcast video according to another embodiment of the present invention.
In this embodiment, it should be noted that, referring to FIG. 5, the apparatus for removing compression artifacts of webcast video according to an embodiment of the present invention may include a compressed video acquisition unit 501, configured to acquire a compressed video; and a compression artifact removal unit 502, configured to input the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video,
and the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
Since the apparatus for removing the compression artifact of the live webcast video according to the embodiment of the present invention can be used to execute the method for removing the compression artifact of the live webcast video according to the embodiment, and the working principle and the beneficial effects are similar, detailed descriptions are omitted here, and specific contents can be found in the description of the embodiment.
In this embodiment, it should be noted that each unit in the apparatus according to the embodiment of the present invention may be integrated into one body, or may be separately disposed. The units may be combined into one unit, or further divided into a plurality of sub-units.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 6: a processor 601, a memory 602, a communication interface 603, and a communication bus 604; the processor 601, the memory 602, and the communication interface 603 complete communication with each other through the communication bus 604.
The processor 601 is configured to call a computer program in the memory 602; when executing the computer program, the processor implements all the steps of the above method for removing compression artifacts of live webcast video, for example: acquiring a compressed video; inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video,
where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, another embodiment of the present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program implements all the steps of the above method for removing compression artifacts of webcast video, for example: acquiring a compressed video; inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video,
where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
Based on the same inventive concept, another embodiment of the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements all the steps of the above method for removing compression artifacts of webcast video, for example: acquiring a compressed video; and inputting the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video, where the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video.
It will be appreciated that the detailed functions and extended functions that the computer program may perform may be as described with reference to the above embodiments.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the foregoing technical solutions may be substantially or partially embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the method for removing compression artifacts of webcast video according to various embodiments or some portions of the embodiments.
Moreover, in the present invention, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Furthermore, in the present disclosure, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for removing compression artifacts of live webcast video is characterized by comprising the following steps:
acquiring a compressed video;
inputting the compressed video into a compression artifact removing model to obtain a high-quality recovery video which is output by the compression artifact removing model and corresponds to the compressed video;
wherein the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video,
wherein the recurrent neural network comprises: a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection; the recurrent module comprises: N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group comprises: three parallel convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections; and
the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration of the recurrent neural network;
the high-level feature information extracted by the recurrent module serves, respectively, as the input of the recurrent module in the next iteration of the recurrent neural network and as the input of the image reconstruction module in the current iteration;
the small-dilation convolution kernel ConvS in the recurrent module fully extracts image detail; the large-dilation convolution kernels ConvM and ConvL in the recurrent module enlarge the receptive field of the recurrent neural network to obtain local information;
dense residual connections are used between every two dilated convolutions with the same dilation rate, so that image information present in the low-quality video propagates through the recurrent neural network;
and the outputs of the three parallel convolutions, after being spliced along the channel direction, are fused by the multi-scale feature fusion module ConvF, which merges the information of the different receptive fields extracted by the three dilated convolutions to generate the output of a convolution group.
2. The method of claim 1, wherein the bottom-layer feature extraction module comprises two convolution layers Conv1 and Conv2,
and wherein the shallow image features extracted by the bottom-layer feature extraction module are spliced along the channel dimension with the high-level features extracted by the recurrent module in the previous iteration and then input into the recurrent module, which further processes them to extract high-level semantic features of the image.
3. The method according to claim 1, wherein the image reconstruction module includes a convolution layer Conv3, which reconstructs the features extracted by the recurrent module to obtain residual information for restoring the high-quality video.
4. The method of claim 1, wherein the skip connection adds the high-quality video residual information to the low-quality input video to obtain a high-quality restored video with the compression artifacts removed.
5. The method for removing compression artifacts of live webcast video according to any of claims 1 to 4, wherein all activation functions in the recurrent neural network use a PReLU activation function.
6. The method of claim 1, wherein the method further comprises: the compression artifact removal model is trained to,
wherein said training said compression artifact removal model comprises:
the original video is coded and compressed by using an H.264 algorithm;
converting the original video and the video after coding compression into video frames to form a paired video sample library;
training the compression artifact removal model based on the paired video sample library.
7. A compression artifact removal apparatus for webcast video, comprising:
a compressed video acquisition unit for acquiring a compressed video; and
a compression artifact removal unit, configured to input the compressed video into a compression artifact removal model to obtain a high-quality restored video output by the model and corresponding to the compressed video, wherein the compression artifact removal model restores the compressed video of unknown compression bitrate containing compression artifacts by using a recurrent neural network (RNN) and dilated convolution to generate the high-quality restored video, and wherein the recurrent neural network comprises: a bottom-layer feature extraction module, a recurrent module, an image reconstruction module, and a skip connection; the recurrent module comprises: N convolution groups and two convolution layers, ConvA and ConvB, for adjusting the output channels, where N equals 4; each convolution group comprises: three parallel convolutions with different dilation rates, a multi-scale feature fusion layer, and dense residual connections; the input of the recurrent module is the output it generated in the previous iteration, spliced with the shallow image features extracted in the current iteration of the recurrent neural network; the high-level feature information extracted by the recurrent module serves, respectively, as the input of the recurrent module in the next iteration and as the input of the image reconstruction module in the current iteration; the small-dilation convolution kernel ConvS in the recurrent module fully extracts image detail; the large-dilation convolution kernels ConvM and ConvL enlarge the receptive field of the recurrent neural network to obtain local information; dense residual connections are used between every two dilated convolutions with the same dilation rate, so that image information present in the low-quality video propagates through the recurrent neural network; and the outputs of the three parallel convolutions, after being spliced along the channel direction, are fused by the multi-scale feature fusion module ConvF, which merges the information of the different receptive fields extracted by the three dilated convolutions to generate the output of a convolution group.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for compression artifact removal for live webcast video according to any of claims 1 to 6 when executing the computer program.
9. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for compression artifact removal for webcast video according to any of claims 1 to 6.
CN202110649651.3A 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video Active CN113542780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110649651.3A CN113542780B (en) 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110649651.3A CN113542780B (en) 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video

Publications (2)

Publication Number Publication Date
CN113542780A (en) 2021-10-22
CN113542780B (en) 2023-01-20

Family

ID=78124817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110649651.3A Active CN113542780B (en) 2021-06-10 2021-06-10 Method and device for removing compression artifacts of live webcast video

Country Status (1)

Country Link
CN (1) CN113542780B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230188759A1 (en) * 2021-12-14 2023-06-15 Spectrum Optix Inc. Neural Network Assisted Removal of Video Compression Artifacts

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2418093A (en) * 2004-09-09 2006-03-15 Imagination Tech Ltd Method for removing blocky compression artefacts
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN111861926A (en) * 2020-07-24 2020-10-30 南京信息工程大学滨江学院 Image rain removing method based on airspace group enhancement mechanism and long-time and short-time memory network
CN112102176A (en) * 2020-07-27 2020-12-18 中山大学 Image rain removing method based on multi-scale intensive mixed attention neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-Scale Dilated Convolution Neural Network for Image Artifact Correction of Limited-Angle Tomography; Haichuan Zhou et al.; IEEE Access; 2019-12-24; full text *
Single Image Rain Removal Method Based on Multi-Channel Multi-Scale Convolutional Neural Network; Liu Changyuan et al.; Journal of Electronics & Information Technology; 2020-09-15 (No. 09); full text *
Single Image Rain Removal Method Using Multi-Scale Convolutional Neural Networks; Guo Jichang et al.; Journal of Harbin Institute of Technology; 2018-03-30 (No. 03); full text *

Also Published As

Publication number Publication date
CN113542780A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
KR102082816B1 (en) Method for improving the resolution of streaming files
CN110933429B (en) Video compression sensing and reconstruction method and device based on deep neural network
US20180130178A1 (en) Enhancing Visual Data Using Strided Convolutions
US20180139458A1 (en) Training end-to-end video processes
CN113658051A (en) Image defogging method and system based on cyclic generation countermeasure network
CN110222758B (en) Image processing method, device, equipment and storage medium
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
US11062210B2 (en) Method and apparatus for training a neural network used for denoising
JP7143529B2 (en) IMAGE RESTORATION METHOD AND DEVICE, ELECTRONIC DEVICE, AND STORAGE MEDIUM
CN116233445B (en) Video encoding and decoding processing method and device, computer equipment and storage medium
Wang et al. Semantic perceptual image compression with a laplacian pyramid of convolutional networks
CN111539290A (en) Video motion recognition method and device, electronic equipment and storage medium
CN111047543A (en) Image enhancement method, device and storage medium
US20220414838A1 (en) Image dehazing method and system based on cyclegan
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN116740204A (en) Method, device, equipment and storage medium for generating stylized image generation model
CN109949234A (en) Video restoration model training method and video restoration method based on depth network
Löhdefink et al. GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation
CN116309890A (en) Model generation method, stylized image generation method and device and electronic equipment
Fuoli et al. NTIRE 2020 challenge on video quality mapping: Methods and results
CN113542780B (en) Method and device for removing compression artifacts of live webcast video
CN114926336A (en) Video super-resolution reconstruction method and device, computer equipment and storage medium
CN113033616B (en) High-quality video reconstruction method, device, equipment and storage medium
CN113518229B (en) Method and device for training loop filter network, computer equipment and storage medium
CN116264606A (en) Method, apparatus and computer program product for processing video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant