CN112019861A - Video compression method and device based on keyframe-guided super-resolution - Google Patents

Video compression method and device based on keyframe-guided super-resolution

Info

Publication number
CN112019861A
CN112019861A (application CN202010698136.XA; granted as CN112019861B)
Authority
CN
China
Prior art keywords
resolution
video
super
frame
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010698136.XA
Other languages
Chinese (zh)
Other versions
CN112019861B (en)
Inventor
鲁继文
周杰
马程
饶永铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010698136.XA
Publication of CN112019861A
Application granted
Publication of CN112019861B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/10: using adaptive coding
    • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: the unit being an image region, e.g. an object
    • H04N19/172: the region being a picture, frame or field

Abstract

The invention discloses a video compression method and device based on keyframe-guided super-resolution. The method comprises the following steps: inputting an input video, as a frame sequence, into a key-frame selection network to obtain a high-resolution key frame of the input video; down-sampling the frame sequence of the input video to obtain a low-resolution frame sequence; and inputting the high-resolution key frame and the low-resolution frame sequence into a generator to generate a super-resolution video. The method down-samples the high-definition input video before compression and, after decompression, uses a key frame selected from the video to guide video super-resolution, so that a high compression rate is achieved while a high-quality video can still be restored from the compressed video.

Description

Video compression method and device based on keyframe-guided super-resolution
Technical Field
The invention relates to the technical field of image processing, and in particular to a video compression method and device based on keyframe-guided super-resolution.
Background
In recent years, video processing techniques have achieved great success on two fundamental problems in computer vision: video compression and video super-resolution. Video compression improves storage efficiency on personal computers and makes online video browsing possible, while video super-resolution has important value in applications such as satellite imagery, surveillance, and high-definition television.
Many industry standards for video compression have been adopted worldwide, such as MPEG-4, H.264/AVC, and HEVC. However, these methods all trade reconstruction loss against compression rate, and at high compression rates the video quality degrades greatly. It is known that down-sampling a video before encoding and up-sampling it after decoding improves performance at high compression rates, but a good decompressed video cannot be obtained without a good up-sampling method. For the video compression problem, maintaining video quality while reducing storage space at a large scale therefore remains difficult. Meanwhile, with the development of deep neural networks, single-image super-resolution [4,5,6] has made good breakthroughs in recent years, and on this basis many video super-resolution methods [7,8,9] have been proposed, in which the reconstruction of each frame can be regarded as a combination of multiple single-image super-resolution problems. In addition, many methods explore inter-frame motion information to find temporal relationships between pixel pairs or block pairs across frames. However, both single-image and video super-resolution are highly ill-posed: as the inverse of down-sampling, super-resolution must recover multiple pixels from a single pixel, supplying detail information that is absent from the low-resolution input, which makes the problem underdetermined. The performance of super-resolution methods therefore depends heavily on the data distribution, which is an important bottleneck for the super-resolution problem.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a video compression method based on keyframe-guided super-resolution, which can achieve a high compression rate and recover a high-quality video from the compressed video.
Another object of the present invention is to provide a video compression apparatus for guiding super-resolution based on key frames.
In order to achieve the above object, an embodiment of the present invention provides a video compression method based on keyframe-guided super-resolution, including:
inputting an input video into a key frame selection network in a frame sequence form to obtain a high-resolution key frame of the input video;
down-sampling the input video by a frame sequence to obtain a low-resolution frame sequence of the input video;
and inputting the high-resolution key frame and the low-resolution frame sequence into a generator to generate the super-resolution video.
In order to achieve the above object, another embodiment of the present invention provides a video compression apparatus for guiding super-resolution based on key frames, including:
the selection module is used for inputting an input video to a key frame selection network in a frame sequence form to obtain a high-resolution key frame of the input video;
the compression module is used for down-sampling the input video by a frame sequence to obtain a low-resolution frame sequence of the input video;
and the decompression module is used for inputting the high-resolution key frame and the low-resolution frame sequence into a generator to generate the super-resolution video.
The video compression method and device based on keyframe-guided super-resolution of the embodiments of the invention have the following advantages: the high-definition input video is down-sampled before compression, and a key frame selected from it guides video super-resolution after decompression, so that a high compression rate is achieved while a high-quality video can still be restored from the compressed video.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a method for keyframe-based guided super-resolution video compression according to one embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for video compression based on keyframe-guided super-resolution according to an embodiment of the present invention;
FIG. 3 is a flow diagram and framework diagram of a key frame selector according to one embodiment of the invention;
FIG. 4 is a network framework diagram of a super-resolution video generator utilizing key-frame guidance according to one embodiment of the present invention;
FIG. 5 is a network structure diagram of a mutual attention layer according to one embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video compression apparatus for guiding super-resolution based on key frames according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a method and an apparatus for video compression based on keyframe-guided super-resolution according to an embodiment of the present invention with reference to the accompanying drawings.
A video compression method based on keyframe-guided super-resolution proposed according to an embodiment of the present invention will first be described with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for video compression based on keyframe-guided super-resolution according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a method for video compression based on keyframe-guided super-resolution according to an embodiment of the present invention.
As shown in fig. 1 and 2, the method for video compression based on keyframe-guided super resolution includes the following steps:
step S1, the input video is input to the key frame selection network in the form of a frame sequence, and a high resolution key frame of the input video is obtained.
Further, inputting the input video into the key-frame selection network in the form of a frame sequence to obtain the high-resolution key frame of the input video further comprises the following steps:
extracting a feature vector from each frame of the input video, and capturing the temporal relationship between video frames with a long short-term memory (LSTM) network;
converting the feature vectors into confidence scores through a fully connected layer and a softmax layer;
and taking the video frame with the highest confidence score as the high-resolution key frame.
Further, from the extracted per-frame feature vectors, a training score is computed using Gumbel-distributed sampling and a softmax layer; the training score serves as a weight for each input frame, from which a substitute frame is formed, and the substitute frame is used to train the key-frame selection network.
In the key-frame selection process, a deep neural network with a bidirectional LSTM is designed to extract relational features across the frame sequence, and the most representative frame is selected as the key frame by treating selection as a classification problem.
Specifically, as shown in fig. 3, the input of the key-frame selection network is a frame sequence of high-definition images, and one or more frames of the input video can be selected as key frames through the network. It should be noted that a preset selection rule may be used, for example selecting one key frame per 10 seconds of input video, so that a 60-second video yields 6 key frames; this can be set according to actual requirements and is not limited here.
Selecting one of multiple frames as the key frame can be regarded as a classification problem. Similar to [10,11], a feature vector is extracted with a pre-trained ResNet18 for each input frame, and a Long Short-Term Memory (LSTM) model then captures the temporal relationship between frames. The features are converted into real-valued confidence scores through the fully connected layer and the softmax layer, representing the probability that each frame is selected as the key frame. The frame with the largest score is selected as the key frame, and the other frames are down-sampled and used as input for the subsequent generator network G.
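As a concrete illustration of this selection step, the sketch below scores a batch of per-frame feature vectors with a single fully connected layer plus softmax and picks the arg-max frame. It is a minimal NumPy stand-in: the random features replace the ResNet18 and bidirectional-LSTM features of the actual network, and the weights `w`, `b` are placeholders for the trained fully connected layer, not values from the patent.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def select_keyframe(frame_features, w, b):
    """Score every frame and pick the arg-max as the key frame.

    frame_features: (T, D) array, one feature vector per frame
    (stand-ins for the ResNet18 + bidirectional-LSTM features).
    w (shape (D,)) and b (scalar) play the role of the final
    fully connected layer mapping each feature vector to one logit.
    Returns (confidence_scores, key_index).
    """
    logits = frame_features @ w + b      # (T,) one logit per frame
    scores = softmax(logits)             # confidence over the T frames
    return scores, int(np.argmax(scores))

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))     # 8 frames, 16-dim features
scores, key = select_keyframe(feats, rng.standard_normal(16), 0.0)
```

In the real pipeline the frame with the largest score becomes the key frame, while the remaining frames are down-sampled and passed to the generator G.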
However, the arg-max operation is non-differentiable, so this step alone provides no way to optimize the key-frame selection network; a gradient must be supplied to complete the optimization. In addition, to avoid mode collapse, the score is generated with Gumbel-Softmax instead of Softmax. With v_t and G_t denoting the output of the previous layer and a sample from the Gumbel distribution, respectively, the training score is computed as:
s_t = exp((v_t + G_t)/τ) / Σ_j exp((v_j + G_j)/τ)    (1)
In the forward pass, the arg-max operation decides which frame is selected as the key frame; in the backward pass, the score computed by the expression above serves as the weight of each input high-definition frame, forming a substitute frame I_Sub for backpropagation:
I_Sub = Σ_t s_t · I_t    (2)
Although I_Sub is not a real frame of the input video, it is a combination of the input frames weighted by the computed scores, so the parameters of the key-frame selector S can be updated according to the scores of the different input frames. Backpropagation of the gradient is therefore performed with the following relaxation:
∂L/∂v_t ≈ (∂L/∂I_Sub) · (∂I_Sub/∂s_t) · (∂s_t/∂v_t)    (3)
before using the key frame network, selecting a substitute frame by a back-transmission method to train the key frame selection network.
In step S2, the input video is down-sampled by a frame sequence to obtain a low resolution frame sequence of the input video.
Specifically, the high-definition input video is down-sampled to obtain a compressed low-resolution video.
During video compression, in addition to down-sampling the high-resolution video to low resolution, a representative high-resolution frame is selected as the key frame of the whole video. Both the key frames and the low-resolution video can be compressed with existing methods, and because the number of key frames is far smaller than the total number of frames, storage space is greatly reduced.
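A toy sketch of this compression step (keep one full-resolution key frame, down-sample every frame) follows; it assumes integer-factor average pooling as the down-sampling filter, which the patent does not specify.

```python
import numpy as np

def downsample(frame, factor):
    """Average-pool an (H, W, C) frame by an integer factor.

    A simple stand-in for whatever down-sampling filter the
    codec actually uses.
    """
    h, w, c = frame.shape
    return frame.reshape(h // factor, factor,
                         w // factor, factor, c).mean(axis=(1, 3))

def compress(frames, key_index, factor=4):
    """Keep one full-resolution key frame; down-sample the rest."""
    key = frames[key_index]
    low = np.stack([downsample(f, factor) for f in frames])
    return key, low

rng = np.random.default_rng(0)
video = rng.uniform(size=(6, 16, 16, 3))  # 6 frames, 16x16 RGB
key, low = compress(video, key_index=2)
```

The key frame and the low-resolution sequence can then each be encoded with an existing codec; the storage saving comes from the low-resolution frames dominating the stream.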
Step S3, the high resolution key frame and the low resolution frame sequence are input to a generator, and a super-resolution video is generated.
Further, step S3 further includes:
s31, increasing the resolution of the low resolution frame sequence by interpolation;
s32, extracting feature maps of the low-resolution frame sequence and the high-resolution key frame through the convolutional layer to obtain a low-definition feature map and a high-definition feature map;
s33, performing super-resolution on the low-definition feature map to obtain a first intermediate feature map, and fusing the information of the low-definition feature map and the high-definition feature map through a mutual attention layer to obtain a second feature map;
and S34, splicing the first feature map and the second feature map, and inputting all spliced feature maps into a recovery module to generate the super-resolution video.
Fusing the information of the low-definition feature map and the high-definition feature map through the mutual attention layer to obtain the second intermediate feature map comprises:
extracting corresponding feature maps from the low-definition and high-definition feature maps with a preset stride;
converting each corresponding feature map into a two-dimensional matrix, and obtaining a coefficient matrix through dot products of the feature vectors of the two matrices;
applying softmax to the coefficient matrix to obtain reconstruction coefficients, which serve as the weight of each block of the high-resolution key frame during reconstruction;
and cutting the high-resolution key frame into the corresponding number of blocks and weighting them by these block weights to obtain the reconstructed second intermediate feature map.
During decompression, high-quality reconstruction of the low-resolution video is completed under the guidance of the key frames by exploiting the temporal consistency between adjacent frames. Because the lost detail information can be recovered from the key frames, which were never down-sampled, this decompression is better posed than the traditional ill-posed super-resolution problem.
As shown in fig. 4, the key frame selected by the key-frame selection network S and the low-resolution video frames are input into the generator G simultaneously. The generator has two branches: one performs super-resolution on the features of the low-resolution image itself, and the other performs guided recovery of the low-resolution frame from the high-resolution key frame through an attention mechanism.
Specifically, within the generator the input is regarded simply as one high-definition key frame and one low-definition video frame, rather than a whole sequence of frames. To restore a high-quality super-resolution image, the relationship between the two frames is fully exploited to mine the detail information contained in the high-definition key frame. First, a high-resolution frame is recovered from the low-definition picture by interpolation; then a convolutional layer extracts a feature map from each of the two frames, in which the height and width are reduced but the number of channels is increased, so the original information is preserved while the computation is prepared for the next step. The two branches produce two intermediate feature maps: one obtained by super-resolving the low-definition map, the other generated by fusing the information of the two feature maps with the mutual attention layer. The two intermediate feature maps are then concatenated and input into a recovery module to generate the final super-resolution picture. The overall architecture of the network is shown in fig. 4.
Mutual attention layer: guidance of the low-definition frame to be recovered by the high-definition key frame is accomplished with a mutual attention layer, as shown in fig. 5. First, two further feature maps are extracted from the two feature maps with stride s, reducing the computational complexity. The number of channels is then taken as the feature length, and length times width as the number of features, so each three-dimensional feature map becomes a two-dimensional matrix. Dot products of the feature vectors of the two matrices yield the mutual relationship between the low-definition image and the key frame along the length and width dimensions, from which the key frame's guided reconstruction of the low-definition image is completed. First, the two matrices are multiplied to obtain a coefficient matrix, establishing the correspondence between the two images; second, softmax produces scores that serve as the weight of each key-frame block during reconstruction. After the weights are obtained, the original high-definition feature map is cut into blocks of equal size, and the reconstructed feature map is obtained from the computed score weights, completing the key-frame-guided super-resolution reconstruction in the mutual-attention branch.
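The mutual-attention computation described above (flatten both feature maps to two-dimensional matrices, form a coefficient matrix by dot products, softmax it into reconstruction weights, and blend the key-frame features accordingly) can be sketched in NumPy as below; the sizes are illustrative, and the real layer operates on strided convolutional features rather than random matrices.

```python
import numpy as np

def mutual_attention(low_feat, key_feat):
    """Reconstruct low-res features from key-frame features via attention.

    low_feat, key_feat: (N, C) matrices, where N is the number of
    spatial positions (length x width after the stride-s extraction)
    and C the number of channels, i.e. the 3-D feature maps already
    flattened to 2-D as the text describes.
    """
    coef = low_feat @ key_feat.T                 # (N, N) coefficient matrix
    coef -= coef.max(axis=1, keepdims=True)      # numerical stability
    w = np.exp(coef)
    w /= w.sum(axis=1, keepdims=True)            # softmax: reconstruction weights
    return w @ key_feat                          # weighted blend of key blocks

rng = np.random.default_rng(0)
low = rng.standard_normal((12, 8))   # 12 positions, 8 channels
key = rng.standard_normal((12, 8))
fused = mutual_attention(low, key)
```

Each output row is a convex combination of key-frame feature vectors, which is exactly what lets the branch import detail from the never-down-sampled key frame.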
It can be understood that, after the super-resolution video is obtained by decompression, in order to make the generated super-resolution images closer to the original high-definition images, a small neural network can be used as a discriminator to form an adversarial generative model, so that the network makes better use of the information in the key frames and generates images closer to the original high-definition video frames.
Specifically, to obtain better super-resolution performance and improve the stability of the training process, a discriminator D is built with a convolutional neural network, and original real high-resolution frames are distinguished from super-resolution frames using a Wasserstein GAN loss with gradient penalty, as follows:
max_D  E[D(I_HR)] − E[D(I_SR)]    (4)
As introduced above, the ultimate goal is to generate super-resolution frames close to the original high-definition frames, so the reconstruction loss is an important component; for each pair of high-definition frame and super-resolution frame, the following is minimized:
L_rec = ‖I_HR − I_SR‖²    (5)
When optimizing the discriminator D, the loss function is learned adversarially as follows:
L_D = E[D(I_SR)] − E[D(I_HR)] + λ · E[(‖∇_Î D(Î)‖₂ − 1)²]    (6)
the countermeasure loss and reconstruction loss also guide the generator during the training process, and unlike the discriminator which tries to distinguish between the high definition original frame and the generated super-resolution frame, the generator fools the discriminator by making the generated picture as close as possible to the original picture. The loss function of the generator is therefore as follows:
L_G = L_rec − α · E[D(I_SR)]    (7)
After backpropagation through the generator, we obtain the gradient with respect to the key frame; to optimize the parameters of the key-frame selector we use I_Sub instead, so the loss function of the key-frame selector is identical to that of the generator:
L_S = L_G (computed with I_Sub in place of the selected key frame)    (8)
equations (5) and (7) are used to optimize the generator, equation (6) is used to optimize the discriminator, and equation (8) is used to optimize the key frame selection network.
It will be appreciated that in the training, the key frame selection network, the generator and the discriminator are trained end-to-end, and in the testing, only the key frame selection network and the generator network are used.
According to the video compression method based on keyframe-guided super-resolution provided by the embodiments of the invention, the input video is input into a key-frame selection network as a frame sequence to obtain a high-resolution key frame; the frame sequence of the input video is down-sampled to obtain a low-resolution frame sequence; and the high-resolution key frame and the low-resolution frame sequence are input into a generator to generate the super-resolution video. The high-definition input video is thus down-sampled before compression, and a key frame selected from it guides video super-resolution after decompression, so that a high compression rate is achieved while a high-quality video can still be restored from the compressed video.
Next, a video compression apparatus for guiding super-resolution based on key frames according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 6 is a schematic structural diagram of a video compression apparatus for guiding super-resolution based on key frames according to an embodiment of the present invention.
As shown in fig. 6, the video compression apparatus for guiding super-resolution based on key frames includes: the device comprises a selection module 100, a compression module 200 and a decompression module 300.
The selecting module 100 is configured to input an input video to a key frame selecting network in a frame sequence form, so as to obtain a high-resolution key frame of the input video.
The compression module 200 is configured to down-sample the input video by a frame sequence to obtain a low resolution frame sequence of the input video.
And a decompression module 300, configured to input the high resolution key frame and the low resolution frame sequence into the generator, so as to generate the super-resolution video.
Further, in an embodiment of the present invention, the apparatus further includes a reinforcement module;
and the reinforcement module is used for building a discriminator with a convolutional network, and performing adversarial generation by simultaneously taking the super-resolution video and the input video as inputs of the discriminator.
Further, in an embodiment of the present invention, the apparatus further includes an optimization module;
and the optimization module is used for optimizing the key-frame selection network, the generator, and the discriminator through the loss functions.
It should be noted that the foregoing explanation on the embodiment of the video compression method for guiding super-resolution based on the key frame is also applicable to the apparatus of this embodiment, and is not repeated here.
According to the video compression device based on keyframe-guided super-resolution provided by the embodiments of the invention, the input video is input into a key-frame selection network as a frame sequence to obtain a high-resolution key frame; the frame sequence of the input video is down-sampled to obtain a low-resolution frame sequence; and the high-resolution key frame and the low-resolution frame sequence are input into a generator to generate the super-resolution video. The high-definition input video is thus down-sampled before compression, and a key frame selected from it guides video super-resolution after decompression, so that a high compression rate is achieved while a high-quality video can still be restored from the compressed video.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A video compression method based on super-resolution guided by key frames is characterized by comprising the following steps:
inputting an input video into a key frame selection network in a frame sequence form to obtain a high-resolution key frame of the input video;
down-sampling the input video by a frame sequence to obtain a low-resolution frame sequence of the input video;
and inputting the high-resolution key frame and the low-resolution frame sequence into a generator to generate the super-resolution video.
2. The method for video compression based on keyframe-guided super resolution as claimed in claim 1, further comprising:
and constructing a discriminator through a convolutional network, and performing adversarial generation by simultaneously taking the super-resolution video and the input video as inputs of the discriminator.
3. The method for video compression based on keyframe-guided super resolution of claim 1, wherein the inputting of the input video into the keyframe selection network in the form of a sequence of frames to obtain the high resolution keyframes of the input video further comprises:
extracting a feature vector of each frame of the input video, and obtaining the temporal relation between video frames through a long short-term memory (LSTM) network;
converting the feature vectors into confidence scores through a fully connected layer and a softmax layer;
and taking the video frame with the highest confidence score as the high-resolution key frame.
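The scoring step of claim 3 can be sketched in NumPy. This is a simplified illustration, not the patented network: `features` stands in for the LSTM's per-frame outputs, and `w`, `b` are assumed parameters of the fully connected layer.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def select_key_frame(features, w, b):
    """Map each frame's (temporally contextualised) feature vector to a
    confidence score via a linear layer + softmax, then pick the argmax."""
    logits = features @ w + b          # fully connected layer: one logit per frame
    scores = softmax(logits)           # confidence scores over frames, summing to 1
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 16))   # 5 frames, 16-dim features (illustrative)
w = rng.standard_normal(16)
b = 0.0
idx, scores = select_key_frame(feats, w, b)
```

Because the argmax in the last step is not differentiable, claim 4 replaces it with a Gumbel-based soft selection during training.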
4. The method for video compression with super-resolution guided based on key frames according to claim 3, wherein, from the extracted feature vector of each frame of the input video, a training score is calculated by sampling from a Gumbel distribution and applying a softmax layer; the training scores assign a weight to each frame picture, a substitute frame is then selected according to the weights, and the key frame selection network is trained with the substitute frame.
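A minimal NumPy sketch of the Gumbel-Softmax trick used in claim 4 (the temperature `tau` and the fixed seed are illustrative assumptions, not values from the patent): Gumbel noise is added to the per-frame logits and a temperature-scaled softmax yields soft weights, so gradients can flow through the otherwise non-differentiable frame selection.

```python
import numpy as np

def gumbel_softmax_weights(logits, tau=1.0, rng=None):
    """Sample per-frame weights with the Gumbel-Softmax trick: add Gumbel
    noise to the logits, then apply a temperature-scaled softmax."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise via the inverse-CDF: -log(-log(U)), U ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=logits.shape)))
    y = (logits + gumbel) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

logits = np.array([0.2, 1.5, -0.3, 0.9])  # illustrative per-frame scores
wts = gumbel_softmax_weights(logits, tau=0.5, rng=np.random.default_rng(42))
# the frame sequence weighted by `wts` acts as the differentiable substitute frame
```

As `tau` decreases, the weights approach a one-hot selection, matching the hard argmax used at inference in claim 3.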
5. The method of claim 1, wherein the inputting of the high-resolution key frame and the low-resolution frame sequence into a generator to generate the super-resolution video comprises:
increasing the resolution of the low-resolution frame sequence by means of interpolation;
extracting feature maps of the low-resolution frame sequence and the high-resolution key frame through a convolutional layer to obtain a low-definition feature map and a high-definition feature map;
performing super-resolution on the low-definition feature map to obtain a first feature map, and fusing information of the low-definition feature map and the high-definition feature map through a mutual attention layer to obtain a second feature map;
and splicing the first feature map and the second feature map, and inputting all spliced feature maps into a recovery module to generate a super-resolution video.
6. The method for video compression based on keyframe-guided super resolution as claimed in claim 5, wherein said fusing the information of the low-definition feature map and the high-definition feature map through a mutual attention layer to obtain a second feature map comprises:
respectively extracting corresponding feature maps from the low-definition feature map and the high-definition feature map by using a preset stride;
respectively converting the corresponding feature maps into two-dimensional matrices, and obtaining a coefficient matrix through dot products of the feature vectors between the two matrices;
obtaining a reconstruction coefficient by using softmax according to the coefficient matrix, wherein the reconstruction coefficient is used as the weight of each block of the high-resolution key frame in the reconstruction process;
and cutting the high-resolution key frame into a corresponding number of blocks, and weighting according to the weight of each block to obtain the reconstructed second feature map.
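The mutual-attention fusion of claim 6 can be sketched in NumPy on single-channel feature maps. This is a simplified illustration under assumed shapes (4×4 maps, 2×2 patches, stride 2), not the patented layer: patch-wise dot products give the coefficient matrix, softmax turns it into reconstruction weights, and each output patch is a weighted sum of key-frame patches.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract_patches(fmap, size, stride):
    """Slide a size x size window over a (H, W) feature map with the given
    stride and flatten each patch into a row vector."""
    h, w = fmap.shape
    rows = []
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            rows.append(fmap[i:i + size, j:j + size].ravel())
    return np.stack(rows)

def mutual_attention(low_feat, key_feat, size=2, stride=2):
    """Coefficient matrix from LR/HD patch dot products; softmax gives each
    key-frame block's weight; output patches are the weighted sums."""
    q = extract_patches(low_feat, size, stride)   # queries from the LR features
    k = extract_patches(key_feat, size, stride)   # keys/values from the key frame
    coeff = q @ k.T                               # coefficient matrix
    weights = softmax(coeff, axis=1)              # per-patch reconstruction weights
    return weights @ k                            # reconstructed (second) feature map

rng = np.random.default_rng(1)
lr_feat = rng.standard_normal((4, 4))
hd_feat = rng.standard_normal((4, 4))
out = mutual_attention(lr_feat, hd_feat)
print(out.shape)  # (4, 4): 4 patches, each a flattened 2x2 block
```

The rows of `out` correspond to the reconstructed blocks that are folded back into the second feature map before the splicing step of claim 5.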
7. The method for video compression based on keyframe-guided super resolution according to any one of claims 1 to 6, further comprising: optimizing the key frame selection network, the generator and the discriminator through a loss function.
8. A video compression apparatus for guiding super-resolution based on key frames, comprising:
the selection module is used for inputting an input video to a key frame selection network in a frame sequence form to obtain a high-resolution key frame of the input video;
the compression module is used for down-sampling the input video by a frame sequence to obtain a low-resolution frame sequence of the input video;
and the decompression module is used for inputting the high-resolution key frame and the low-resolution frame sequence into a generator to generate the super-resolution video.
9. The apparatus for video compression based on keyframe-guided super resolution of claim 8, further comprising: a reinforcement module;
and the enhancement module is used for constructing a discriminator through a convolutional network, and performing adversarial generation by simultaneously taking the super-resolution video and the input video as inputs of the discriminator.
10. The apparatus for video compression based on keyframe-guided super resolution according to claim 8 or 9, further comprising: an optimization module;
and the optimization module is used for optimizing the key frame selection network, the generator and the discriminator through a loss function.
CN202010698136.XA 2020-07-20 2020-07-20 Video compression method and device based on keyframe guidance super-resolution Active CN112019861B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010698136.XA CN112019861B (en) 2020-07-20 2020-07-20 Video compression method and device based on keyframe guidance super-resolution


Publications (2)

Publication Number Publication Date
CN112019861A true CN112019861A (en) 2020-12-01
CN112019861B (en) 2021-09-14

Family

ID=73498509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010698136.XA Active CN112019861B (en) 2020-07-20 2020-07-20 Video compression method and device based on keyframe guidance super-resolution

Country Status (1)

Country Link
CN (1) CN112019861B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101938656A (en) * 2010-09-27 2011-01-05 上海交通大学 Video coding and decoding system based on keyframe super-resolution reconstruction
US20180338159A1 (en) * 2017-05-17 2018-11-22 Samsung Electronics Co., Ltd. Super-resolution processing method for moving image and image processing apparatus therefor
CN109636721A (en) * 2018-11-29 2019-04-16 武汉大学 Video super-resolution method based on confrontation study and attention mechanism
US20190130530A1 (en) * 2017-10-31 2019-05-02 Disney Enterprises Inc. Video Super-Resolution Using An Artificial Neural Network
CN109819321A (en) * 2019-03-13 2019-05-28 中国科学技术大学 A kind of video super-resolution Enhancement Method
CN110062232A (en) * 2019-04-01 2019-07-26 杭州电子科技大学 A kind of video-frequency compression method and system based on super-resolution
WO2019192588A1 (en) * 2018-04-04 2019-10-10 华为技术有限公司 Image super resolution method and device
CN110852944A (en) * 2019-10-12 2020-02-28 天津大学 Multi-frame self-adaptive fusion video super-resolution method based on deep learning
CN111340711A (en) * 2020-05-21 2020-06-26 腾讯科技(深圳)有限公司 Super-resolution reconstruction method, device, equipment and storage medium
US10701394B1 (en) * 2016-11-10 2020-06-30 Twitter, Inc. Real-time video super-resolution with spatio-temporal networks and motion compensation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HENG SU ET AL.: "Single image super-resolution based on space structure learning", Pattern Recognition Letters *
SU HENG: "A survey of super-resolution image reconstruction methods" (超分辨率图像重建方法综述), Acta Automatica Sinica (自动化学报) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560760A (en) * 2020-12-24 2021-03-26 上海交通大学 Attention-assisted unsupervised video abstraction system
CN112560760B (en) * 2020-12-24 2023-03-10 上海交通大学 Attention-assisted unsupervised video abstraction system
CN113033616A (en) * 2021-03-02 2021-06-25 北京大学 High-quality video reconstruction method, device, equipment and storage medium
CN113033616B (en) * 2021-03-02 2022-12-02 北京大学 High-quality video reconstruction method, device, equipment and storage medium
WO2023020513A1 (en) * 2021-08-19 2023-02-23 Huawei Technologies Co., Ltd. Method, device, and medium for generating super-resolution video
US11778223B2 (en) 2021-08-19 2023-10-03 Huawei Technologies Co., Ltd. Method, device, and medium for generating super-resolution video
CN114827714A (en) * 2022-04-11 2022-07-29 咪咕文化科技有限公司 Video restoration method based on video fingerprints, terminal equipment and storage medium
CN114827714B (en) * 2022-04-11 2023-11-21 咪咕文化科技有限公司 Video fingerprint-based video restoration method, terminal equipment and storage medium
CN116523758A (en) * 2023-07-03 2023-08-01 清华大学 End cloud combined super-resolution video reconstruction method and system based on key frames
CN116523758B (en) * 2023-07-03 2023-09-19 清华大学 End cloud combined super-resolution video reconstruction method and system based on key frames

Also Published As

Publication number Publication date
CN112019861B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN112019861B (en) Video compression method and device based on keyframe guidance super-resolution
CN109903228B (en) Image super-resolution reconstruction method based on convolutional neural network
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN109068174B (en) Video frame rate up-conversion method and system based on cyclic convolution neural network
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN107396124A (en) Video-frequency compression method based on deep neural network
CN109949222B (en) Image super-resolution reconstruction method based on semantic graph
CN111787187B (en) Method, system and terminal for repairing video by utilizing deep convolutional neural network
RU2509439C2 (en) Method and apparatus for encoding and decoding signal, data medium and computer program product
CN112365422B (en) Irregular missing image restoration method and system based on deep aggregation network
CN109949217B (en) Video super-resolution reconstruction method based on residual learning and implicit motion compensation
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
Islam et al. Image compression with recurrent neural network and generalized divisive normalization
CN115131675A (en) Remote sensing image compression method and system based on reference image texture migration
CN113066022B (en) Video bit enhancement method based on efficient space-time information fusion
CN115936985A (en) Image super-resolution reconstruction method based on high-order degradation cycle generation countermeasure network
CN114202463B (en) Cloud fusion-oriented video super-resolution method and system
WO2023185284A1 (en) Video processing method and apparatuses
CN116668738A (en) Video space-time super-resolution reconstruction method, device and storage medium
CN115147274A (en) Method for acquiring super-resolution image, acquisition system device and storage medium
CN114677282A (en) Image super-resolution reconstruction method and system
CN114972024A (en) Image super-resolution reconstruction device and method based on graph representation learning
CN114663315A (en) Image bit enhancement method and device for generating countermeasure network based on semantic fusion
Yang et al. Blind VQA on 360° Video via Progressively Learning From Pixels, Frames, and Video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant