CN113344780A - Fully-known video super-resolution network, and video super-resolution reconstruction method and system - Google Patents
- Publication number
- CN113344780A (application CN202110549356.0A)
- Authority
- CN
- China
- Prior art keywords
- resolution
- network
- super
- fully
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T3/4076—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
Abstract
The invention discloses a fully-known video super-resolution network, a video super-resolution reconstruction method and a system. The fully-known video super-resolution network consists of two sub-networks: a precursor network and a successor network. First, a plurality of videos are selected as training samples; images are cropped from the same position in each video frame as high-resolution learning targets and downsampled by a factor of r to obtain low-resolution images used as the network input. Then, according to the type of the fully-known video super-resolution framework, the low-resolution frames are input into the precursor network in the forward direction (local fully-known type) or the backward direction (global fully-known type) to generate the hidden states and high-resolution structure information corresponding to all low-resolution frames. Next, the low-resolution frames and the hidden states obtained in the previous step are input into the successor network in sequence to further generate hidden states and high-resolution detail information. Finally, the high-resolution structure information and the detail information are added to obtain the finally reconstructed high-resolution video frames.
Description
Technical Field
The invention belongs to the technical field of digital image processing, and relates to a fully-known video super-resolution network, a video super-resolution reconstruction method and a system.
Background
In recent years, with the development of science and technology, video has become an increasingly important information carrier in people's lives. However, owing to hardware limitations and transmission bandwidth, high-resolution video has not yet been widely popularized. Video super-resolution technology can reconstruct a corresponding high-resolution video from a low-resolution video, and is widely applied in fields such as video surveillance, satellite remote sensing and video conferencing.
Most existing internationally leading video super-resolution methods focus on designing ever more complex network structures to better fit the mapping from the low-resolution space to the high-resolution space, while neglecting the design of the video super-resolution framework. Yet the framework is the foundation of a video super-resolution algorithm: for the same network model, a poor framework cannot fully exploit the model's potential, whereas a good framework can bring out its full performance.
Existing video super-resolution network frameworks can be summarized into three types: iterative network frameworks, recurrent network frameworks, and hybrid network frameworks. The iterative framework treats only low-resolution video frames as processing objects: it generates the high-resolution central frame from a given central frame and its surrounding video frames (usually 1 to 3 frames before and after), and processes the entire video sequence iteratively in a sliding-window fashion. The recurrent framework uses the past and current low-resolution frames, together with past super-resolution results, as information sources, while disregarding future low-resolution frames. The hybrid framework integrates the information sources of both, but still does not fully cover the information embedded in the video sequence.
Disclosure of Invention
In order to solve the technical problems, the invention provides a fully-known video super-resolution network, a video super-resolution reconstruction method and a system. The core idea is to take the past, present and future low-resolution video frames and the intermediate results (hidden states) generated in the super-resolution process as information sources to fully mine the time-space domain related information contained in the video sequence.
The invention provides a fully-known video super-resolution network which consists of two sub-networks, namely a precursor network and a successor network;
according to the processing directions of a precursor network and a successor network, the fully-known video super-resolution network is divided into a local fully-known video super-resolution network and a global fully-known video super-resolution network;
the processing directions of a precursor network and a successor network of the local fully-aware video super-resolution network are the same and are both forward; the processing process of the precursor network of the local fully-aware video super-resolution network comprises the following steps:
wherein the content of the first and second substances,a low resolution video frame representing a previous time instant, a current time instant and a next time instant,is a hidden state at the previous moment, NetpA representation of a precursor network is shown,is in a hidden state at the current moment,super-resolution video frame structure information generated for a precursor network at the current moment;
the processing directions of a precursor network and a successor network of the global fully-aware video super-resolution network are opposite, wherein the direction of the precursor network is backward, and the direction of the successor network is forward; the processing process of the pioneer network of the global fully-aware video super-resolution network comprises the following steps:
the subsequent network processing process comprises:
whereinIs a hidden state generated by the precursor network at the current time and the next time, andit is the last hidden state, Net, of the subsequent network itselfsThe representation of the subsequent network is shown,is a hidden state at the current time, andit is the super-resolution video frame detail information generated by the subsequent network at the current moment.
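The precursor/successor recurrences above can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: net_p and net_s are hypothetical toy stand-ins for the two sub-networks (the invention deliberately leaves their internal structure open), frames are plain NumPy arrays, and the spatial upscaling by the factor r is omitted for brevity.

```python
import numpy as np

def net_p(x_prev, x_cur, x_next, h):
    # Hypothetical precursor network: any model with this input/output
    # signature fits the framework; a toy linear update stands in here.
    h_new = 0.5 * (h + x_cur)
    sr_struct = x_cur + 0.25 * (x_prev + x_next)  # HR structure information
    return h_new, sr_struct

def net_s(x_prev, x_cur, x_next, h, hp_cur, hp_next):
    # Hypothetical successor network: also consumes the precursor hidden
    # states of the current and next instants.
    h_new = (h + hp_cur + hp_next) / 3.0
    sr_detail = x_cur - h_new                     # HR detail information
    return h_new, sr_detail

def omniscient_vsr(frames, mode="local"):
    """Run the precursor pass, then the successor pass, over a sequence.

    mode="local":  precursor forward, successor forward.
    mode="global": precursor backward, successor forward.
    """
    T = len(frames)
    at = lambda t: frames[min(max(t, 0), T - 1)]  # replicate border frames

    # Precursor pass: hidden states + structure information for all frames.
    order = range(T) if mode == "local" else range(T - 1, -1, -1)
    h = np.zeros_like(frames[0])
    hp, struct = [None] * T, [None] * T
    for t in order:
        h, struct[t] = net_p(at(t - 1), at(t), at(t + 1), h)
        hp[t] = h

    # Successor pass (always forward): inherits precursor hidden states,
    # then structure and detail are added to form the final output.
    h = np.zeros_like(frames[0])
    out = []
    for t in range(T):
        h, detail = net_s(at(t - 1), at(t), at(t + 1),
                          h, hp[t], hp[min(t + 1, T - 1)])
        out.append(struct[t] + detail)
    return out
```

Note that because the precursor pass finishes before the successor pass starts, the successor can freely use the precursor hidden state of the *next* instant, which is what distinguishes this framework from purely causal recurrent designs.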
The method adopts the technical scheme that: a video super-resolution reconstruction method comprises the following steps:
step 1: selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images, which serve as the input of the fully-known video super-resolution network;
step 2: if the fully-known video super-resolution network is a local fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the forward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
and step 3: inputting the low-resolution frames and the hidden states obtained in step 2 into the successor network in sequence to further generate hidden states and high-resolution detail information;
and step 4: adding the high-resolution structure information and the detail information generated in steps 2 and 3 to obtain the finally reconstructed high-resolution video frames.
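Step 1 of the method can be illustrated as follows. This is a hedged sketch under stated assumptions: the patent does not specify the downsampling kernel, so r×r average pooling stands in as one plausible choice, and the hypothetical crop parameters (top, left, size) simply fix the same region across all frames, as the text requires.

```python
import numpy as np

def make_training_pair(frames, top, left, size, r):
    """Crop the same region from every frame (HR targets) and
    downsample each crop by a factor of r (LR network inputs)."""
    hr, lr = [], []
    for f in frames:
        patch = f[top:top + size, left:left + size]
        # r x r average pooling as one illustrative downsampling choice.
        h, w = patch.shape[0] // r, patch.shape[1] // r
        small = patch[:h * r, :w * r].reshape(h, r, w, r).mean(axis=(1, 3))
        hr.append(patch)
        lr.append(small)
    return hr, lr
```

In practice a bicubic kernel is the common choice for building super-resolution training pairs; any fixed degradation works with the framework, since only the r-fold size relationship between targets and inputs matters here.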
The technical scheme adopted by the system of the invention is as follows: a video super-resolution reconstruction system, comprising the following modules:
the module 1 is used for selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images as the input of the fully-known video super-resolution network;
the module 2 is used for inputting the low-resolution frames into the precursor network in the forward direction if the fully-known video super-resolution network is a local fully-known video super-resolution network, generating the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
a module 3, configured to input the low-resolution frames and the hidden states obtained in module 2 into the successor network in sequence, further generating hidden states and high-resolution detail information;
and a module 4, configured to add the high-resolution structure information and the detail information generated in modules 2 and 3 to obtain the finally reconstructed high-resolution video frames.
The invention first uses the precursor network to perform a coarse pass over the video frames, generating the hidden states and high-resolution structure information corresponding to the low-resolution video frames. The successor network then inherits the hidden states generated by the precursor network and further generates the hidden states and high-resolution detail information corresponding to each low-resolution video frame. Finally, the high-resolution structure information and the high-resolution detail information are added to obtain the finally reconstructed high-resolution video frames.
Drawings
Fig. 1 is a fully-known video super-resolution network framework diagram according to an embodiment of the present invention.
FIG. 2 is a flow chart of a method according to an embodiment of the present invention.
Detailed Description
To facilitate understanding and implementation of the present invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described herein are merely illustrative and explanatory of the invention and are not restrictive of it.
Referring to fig. 1, the fully-known video super-resolution network provided by the present invention is composed of two sub-networks: a precursor network and a successor network;
according to the processing directions of a precursor network and a successor network, the fully-known video super-resolution network is divided into a local fully-known video super-resolution network and a global fully-known video super-resolution network;
the processing directions of a precursor network and a successor network of the local fully-known video super-resolution network are the same and are both forward; the processing process of the precursor network of the local fully-aware video super-resolution network comprises the following steps:
wherein the content of the first and second substances,a low resolution video frame representing a previous time instant, a current time instant and a next time instant,is a hidden state at the previous moment, NetpA representation of a precursor network is shown,is in a hidden state at the current moment,the current time is firstDriving super-resolution video frame structure information generated by a network;
the processing directions of a precursor network and a successor network of the global fully-known video super-resolution network are opposite, wherein the direction of the precursor network is backward, and the direction of the successor network is forward; the processing process of the pioneer network of the global fully-aware video super-resolution network comprises the following steps:
the subsequent network processing procedure is as follows:
whereinIs a hidden state generated by the precursor network at the current time and the next time, andit is the last hidden state, Net, of the subsequent network itselfsThe representation of the subsequent network is shown,is a hidden state at the current time, andit is the super-resolution video frame detail information generated by the subsequent network at the current moment.
It should be noted that the invention designs a fully-known video super-resolution network comprising two sub-networks, a precursor network and a successor network, but does not prescribe their specific internal structures: a network of any structure can serve as the precursor or successor network and be integrated into the fully-known video super-resolution network designed by the invention, as long as its input and output forms satisfy formulas (1), (2) and (3).
Referring to fig. 2, the method for reconstructing super-resolution video provided by the present invention includes the following steps:
step 1: selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images, which serve as the input of the fully-known video super-resolution network;
step 2: if the fully-known video super-resolution network is a local fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the forward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
the invention adopts a pioneer network firstly, and the pioneer network NetpThe information source of processing consecutive video frames in forward or backward order includes two aspects: low resolution video frames at previous, current and next momentsAnd the hidden state generated in the process of exceeding the mark at the last momentOrBased on these two types of information, the precursor network cyclically generates all the low resolution video framesCorresponding hidden stateAnd high resolution video frame structure information
and step 3: inputting the low-resolution frames and the hidden states obtained in step 2 into the successor network in sequence to further generate hidden states and high-resolution detail information;
The invention first adopts the precursor network Net_p, which processes consecutive video frames in forward or backward order. Its information sources include two aspects: the low-resolution video frames x_{t-1}, x_t and x_{t+1} at the previous, current and next instants, and the hidden state h_{t-1}^p (or h_{t+1}^p) generated during the super-resolution process at the adjacent instant. Based on these two types of information, the precursor network recurrently generates, for all low-resolution video frames x_t, the corresponding hidden states h_t^p and the high-resolution video frame structure information SR_t^p.
Because the precursor network has generated the hidden states h_t^p corresponding to all low-resolution video frames, the successor network can inherit this hidden-state information, so its information sources include three aspects: the low-resolution video frames x_{t-1}, x_t and x_{t+1} at the previous, current and next instants; the hidden state h_{t-1}^s generated by the successor network itself at the previous instant; and the hidden states h_t^p and h_{t+1}^p inherited from the precursor network at the current and next instants. Based on this information, the successor network further generates, in a refined manner, the hidden states h_t^s and the high-resolution video frame detail information SR_t^s corresponding to all low-resolution video frames. In summary, the successor network can use the low-resolution frames and hidden states of the previous, current and next instants, and can thus fully exploit the time-space domain information contained in the video sequence.
and step 4: adding the high-resolution structure information and the detail information generated in steps 2 and 3 to obtain the finally reconstructed high-resolution video frames.
The precursor network first processes the low-resolution video frames coarsely to generate the high-resolution structure information SR_t^p; the successor network then processes the low-resolution video frames in a refined manner to generate the high-resolution detail information SR_t^s, which contains the high-frequency details. Adding the two therefore yields the finally reconstructed high-resolution video frame SR_t.
The final super-resolution video frame output is:

SR_t = SR_t^p + SR_t^s    (4)

where SR_t is the final super-resolution video frame output, SR_t^s represents the high-resolution video frame detail information generated by the successor network, and SR_t^p represents the high-resolution video frame structure information generated by the precursor network.
In this embodiment, a loss function is further constructed to constrain the precursor network and the overall framework respectively, thereby optimizing the performance of the network model.
A variant of the L1 loss function is used to constrain the finally reconstructed high-resolution video frame SR_t to be close to the real high-resolution video frame HR_t, while also constraining the high-resolution video frame structure information SR_t^p to be close to the real high-resolution video frame; the relative weight is adjusted by a parameter α to balance the precursor network and the successor network.
The loss function constructed in this embodiment is:

L = Σ_{t=1}^{T} √(‖HR_t − SR_t‖² + ε²) + α · Σ_{t=1}^{T} √(‖HR_t − SR_t^p‖² + ε²)    (5)

where HR_t represents the real high-resolution video frame, SR_t represents the finally generated super-resolution video frame, and SR_t^p represents the high-resolution video frame structure information generated by the precursor network; T is the number of frames, ε is a small constant, typically set to 10^-3, and α is a weight adjusting the relative contribution of the precursor network.
The method can make full use of the intra-frame spatial correlation and the inter-frame temporal correlation contained in a video sequence, generating high-fidelity video while maintaining a high processing speed.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (5)
1. A fully-known video super-resolution network, characterized in that it consists of two sub-networks: a precursor network and a successor network;
according to the processing directions of a precursor network and a successor network, the fully-known video super-resolution network is divided into a local fully-known video super-resolution network and a global fully-known video super-resolution network;
the processing directions of a precursor network and a successor network of the local fully-aware video super-resolution network are the same and are both forward; the processing process of the precursor network of the local fully-aware video super-resolution network comprises the following steps:
wherein the content of the first and second substances,a low resolution video frame representing a previous time instant, a current time instant and a next time instant,is a hidden state at the previous moment, NetpA representation of a precursor network is shown,is in a hidden state at the current moment,super-resolution video frame structure information generated for a precursor network at the current moment;
the processing directions of a precursor network and a successor network of the global fully-aware video super-resolution network are opposite, wherein the direction of the precursor network is backward, and the direction of the successor network is forward; the processing process of the pioneer network of the global fully-aware video super-resolution network comprises the following steps:
the subsequent network processing process comprises:
whereinIs a hidden state generated by the precursor network at the current time and the next time, andit is the last hidden state, Net, of the subsequent network itselfsThe representation of the subsequent network is shown,is a hidden state at the current time, andit is the super-resolution video frame detail information generated by the subsequent network at the current moment.
2. A video super-resolution reconstruction method is characterized by comprising the following steps:
step 1: selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images, which serve as the input of the fully-known video super-resolution network;
step 2: if the fully-known video super-resolution network is a local fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the forward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
and step 3: inputting the low-resolution frames and the hidden states obtained in step 2 into the successor network in sequence to further generate hidden states and high-resolution detail information;
and step 4: adding the high-resolution structure information and the detail information generated in steps 2 and 3 to obtain the finally reconstructed high-resolution video frames.
3. The video super-resolution reconstruction method of claim 2, wherein in step 4 the super-resolution video frames generated by the precursor network and the successor network are added to obtain the final super-resolution video frame, the final super-resolution video frame output being:

SR_t = SR_t^p + SR_t^s

where SR_t^p is the high-resolution structure information generated by the precursor network and SR_t^s is the high-resolution detail information generated by the successor network.
4. The video super-resolution reconstruction method according to claim 2 or 3, characterized in that: a loss function is constructed to constrain the precursor network and the fully-known video super-resolution network respectively, so as to optimize the performance of the fully-known video super-resolution network;
the constructed loss function is:

L = Σ_{t=1}^{T} √(‖HR_t − SR_t‖² + ε²) + α · Σ_{t=1}^{T} √(‖HR_t − SR_t^p‖² + ε²)

where HR_t represents the real high-resolution video frame, SR_t represents the finally generated super-resolution video frame, and SR_t^p represents the high-resolution video frame structure information generated by the precursor network; T is the number of frames, ε is a small constant, and α is a weight adjusting the relative contribution of the precursor network.
5. The video super-resolution reconstruction system is characterized by comprising the following modules:
the module 1 is used for selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images as the input of the fully-known video super-resolution network;
the module 2 is used for inputting the low-resolution frames into the precursor network in the forward direction if the fully-known video super-resolution network is a local fully-known video super-resolution network, generating the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
a module 3, configured to input the low-resolution frames and the hidden states obtained in module 2 into the successor network in sequence, further generating hidden states and high-resolution detail information;
and a module 4, configured to add the high-resolution structure information and the detail information generated in modules 2 and 3 to obtain the finally reconstructed high-resolution video frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110549356.0A CN113344780A (en) | 2021-05-20 | 2021-05-20 | Fully-known video super-resolution network, and video super-resolution reconstruction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110549356.0A CN113344780A (en) | 2021-05-20 | 2021-05-20 | Fully-known video super-resolution network, and video super-resolution reconstruction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113344780A (en) | 2021-09-03
Family
ID=77469699
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110549356.0A Pending CN113344780A (en) | 2021-05-20 | 2021-05-20 | Fully-known video super-resolution network, and video super-resolution reconstruction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113344780A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180085002A1 (en) * | 2013-03-11 | 2018-03-29 | Carestream Dental Technology Topco Limited | Method and System For Three-Dimensional Imaging |
CN109102462A (en) * | 2018-08-01 | 2018-12-28 | 中国计量大学 | A kind of video super-resolution method for reconstructing based on deep learning |
CN110706155A (en) * | 2019-09-12 | 2020-01-17 | 武汉大学 | Video super-resolution reconstruction method |
- 2021-05-20 CN CN202110549356.0A patent/CN113344780A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180085002A1 (en) * | 2013-03-11 | 2018-03-29 | Carestream Dental Technology Topco Limited | Method and System For Three-Dimensional Imaging |
CN109102462A (en) * | 2018-08-01 | 2018-12-28 | 中国计量大学 | A kind of video super-resolution method for reconstructing based on deep learning |
CN110706155A (en) * | 2019-09-12 | 2020-01-17 | 武汉大学 | Video super-resolution reconstruction method |
Non-Patent Citations (1)
Title |
---|
PENG YI et al.: "Omniscient Video Super-Resolution", arXiv:2103.15683v1 [eess.IV] * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109636721B (en) | Video super-resolution method based on countermeasure learning and attention mechanism | |
CN111242883A (en) | Dynamic scene HDR reconstruction method based on deep learning | |
CN110942424A (en) | Composite network single image super-resolution reconstruction method based on deep learning | |
CN115115516B (en) | Real world video super-resolution construction method based on Raw domain | |
CN114245007A (en) | High frame rate video synthesis method, device, equipment and storage medium | |
CN112836652A (en) | Multi-stage human body posture estimation method based on event camera | |
CN116091288A (en) | Diffusion model-based image steganography method | |
Yang et al. | Image super-resolution reconstruction based on improved Dirac residual network | |
CN110634101A (en) | Unsupervised image-to-image conversion method based on random reconstruction | |
CN113344780A (en) | Fully-known video super-resolution network, and video super-resolution reconstruction method and system | |
CN112215140A (en) | 3-dimensional signal processing method based on space-time countermeasure | |
CN113610707A (en) | Video super-resolution method based on time attention and cyclic feedback network | |
CN112950498A (en) | Image defogging method based on countermeasure network and multi-scale dense feature fusion | |
CN113379606A (en) | Face super-resolution method based on pre-training generation model | |
CN116091337B (en) | Image enhancement method and device based on event signal nerve coding mode | |
CN104182931B (en) | Super resolution method and device | |
Sun et al. | ESinGAN: Enhanced single-image GAN using pixel attention mechanism for image super-resolution | |
CN116385265B (en) | Training method and device for image super-resolution network | |
CN116958192A (en) | Event camera image reconstruction method based on diffusion model | |
CN113538225A (en) | Model training method, image conversion method, device, equipment and storage medium | |
US20230186608A1 (en) | Method, device, and computer program product for video processing | |
CN106131567B (en) | Ultraviolet aurora up-conversion method of video frame rate based on Lattice Boltzmann | |
Hengl et al. | SAGA vs GRASS: a comparative analysis of the two open source desktop GIS for the automated analysis of elevation data | |
Zhou et al. | Mixed Attention Densely Residual Network for Single Image Super-Resolution. | |
Qin et al. | Remote sensing image super-resolution using multi-scale convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210903 ||