CN113487481B - Circular video super-resolution method based on information construction and multi-density residual block

Info

Publication number
CN113487481B
Authority
CN
China
Prior art keywords
resolution
information
convolution
output
super
Prior art date
Legal status
Expired - Fee Related
Application number
CN202110746815.4A
Other languages
Chinese (zh)
Other versions
CN113487481A (en)
Inventor
于明
王书韵
薛翠红
郭迎春
朱叶
于洋
师硕
阎刚
刘依
Assignee
Hebei University of Technology
Tianjin University of Technology
Priority date
Filing date
Publication date
Application filed by Hebei University of Technology, Tianjin University of Technology filed Critical Hebei University of Technology
Priority to CN202110746815.4A
Publication of CN113487481A
Application granted
Publication of CN113487481B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks


Abstract

The invention relates to a circular video super-resolution method based on information construction and multiple dense residual blocks. An introduced information construction module constructs and fills prior information for the initial recurrent neural network, so that the early recurrent steps have enough information to reconstruct the early frames; at the same time, the multiple dense residual blocks extract features representing deep information, which propagate through the recurrent network in the hidden state. The method comprises the following steps: preprocessing the video data to obtain the corresponding low-resolution frame sequence; constructing an information construction module which takes the first m frames of the low-resolution frame sequence as input and outputs the initial hidden information h_0 and the initial pre-output information o_0; and constructing a recurrent neural network of multiple dense residual blocks, into which the two outputs of the information construction module are fed to obtain the super-resolution frame sequence. The method is also capable of handling online super-resolution tasks.

Description

Circular video super-resolution method based on information construction and multi-density residual block
Technical Field
The technical scheme of the invention relates to super-resolution reconstruction of videos, in particular to a circular video super-resolution method based on information construction and multiple density residual blocks.
Background
With the advent of the 5G era, video has gradually replaced images as the mainstream information transmitted over the internet, and ever more ultra-high-definition video data is quietly changing people's lives. However, how to deliver this large amount of high-definition video to users quickly and clearly is undoubtedly a problem that application vendors must solve. High-quality video is usually compressed and encoded, transmitted over a channel, and then decoded at the terminal. This process is lossy: pixels may be lost during encoding and decoding, or during transmission, which greatly reduces the quality of the decoded video and harms the user experience. Super-resolution is a technique that restores a low-resolution video to a high-resolution video by computing and filling in the missing information; processing the decoded video with a super-resolution algorithm can therefore improve video quality and mitigate the quality loss incurred during encoding, decoding and transmission.
Unlike single-image super-resolution, video data consists of consecutive frames, and making full use of spatio-temporal information is the key to recovering video detail. Traditional reconstruction-based and learning-based super-resolution methods restore ultra-high-definition video unsatisfactorily, and existing unidirectional recurrent video super-resolution methods reconstruct the initial frames extremely poorly because rich detail information is not available when the early frames are reconstructed, which greatly harms the super-resolution effect and the user experience. Bidirectional recurrent methods cannot output reconstructed video frames promptly and are therefore severely limited in application fields such as video transmission. For example, CN111587447A discloses a frame-recurrent video super-resolution method that uses explicit motion estimation and optical flow to warp and motion-compensate the current frame, mining temporal information before recursion and iteration to achieve super-resolution reconstruction. Its disadvantage is that explicit motion compensation depends on the accuracy of the optical flow computed by motion estimation; once the optical flow deviates, severe artifacts and distortion arise, seriously degrading the super-resolution result. CN109102462A discloses a deep-learning video super-resolution reconstruction method that first uses a bidirectional recurrent network to extract forward and backward features and then integrates temporal and spatial information with deep 3D back-projection. Its disadvantages are: the bidirectional recurrent network must process all video frames before feeding the next module, so it cannot run online in a video super-resolution task; the large number of 3D convolutions used to integrate spatio-temporal information greatly increases computational complexity and burden; and the separated extract-then-integrate design cannot reuse the generated feature information, which greatly reduces computational efficiency, so that processing higher-resolution or longer videos in particular is slow and demands much memory. CN111260560A discloses a multi-frame video super-resolution method fused with an attention mechanism, which connects a 3D-convolution feature alignment module and a deformable-convolution feature alignment module together as an implicit alignment module and then reconstructs features with several residual blocks augmented with spatial and channel attention. Its disadvantages are: first, both the 3D-convolution and deformable-convolution alignment modules are extremely time-consuming, greatly increasing the parameter count and the super-resolution time and seriously harming the efficiency of the super-resolution process; second, when features are reconstructed with residual blocks, the local features generated by the blocks cannot be effectively reused, wasting computing resources, and the added attention mechanism contributes little to feature reconstruction while increasing the computational load.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the technical problem to be solved by the invention is as follows: a circular video super-resolution method based on information construction and multi-density residual block. The method uses an information construction module to construct and fill prior information for the initial recurrent neural network, so that the early recurrent steps have enough information to reconstruct the early frames; this overcomes the poor early-frame reconstruction of unidirectional recurrent methods while, compared with bidirectional recurrent networks, reducing memory occupation and computational cost, so the method can handle online super-resolution tasks. Multiple dense residual blocks extract and integrate features representing deep information, which propagate through the recurrent network in the hidden state, overcoming the insufficient integration of spatio-temporal information in the prior art.
The technical scheme adopted by the invention to solve the technical problem is as follows: an introduced information construction module constructs and fills prior information for the initial recurrent neural network, so that the early recurrent steps have enough information to reconstruct the early frames; at the same time, the multi-dense residual block extracts features representing deep information, which propagate through the recurrent network in the hidden state. The method comprises the following steps:
preprocessing the video data to obtain the corresponding low-resolution frame sequence I^LR = {i_1^LR, i_2^LR, …, i_n^LR};
constructing an information construction module: the information construction module comprises feature extraction and channel attention, a fast information architecture adjustment block and a structure simulation block; the input is the first m frames of the low-resolution frame sequence, and the outputs are the initial hidden information h_0 and the initial pre-output information o_0;
the process of feature extraction and channel attention is as follows: first, for each of the first m frames (1 ≤ m ≤ n) of the low-resolution frame sequence I^LR, features are extracted by convolution to obtain the shallow features f_1, f_2, …, f_m; the shallow features are then superimposed along the channel dimension and input into the channel attention module SE to obtain the shallow feature screening set K;
the specific structure of the fast information architecture adjustment block is as follows: the shallow feature screening set K passes through a convolution and ReLU activation function and is then input into serially stacked residual blocks for deep feature extraction, outputting deep information features; each residual block consists of two convolutions and a skip connection;
the specific operation of the structure simulation block is as follows: the deep information features output by the fast information architecture adjustment block are processed by a convolution layer and a ReLU activation function to obtain the initial hidden information; meanwhile, the deep information features are reduced in feature dimension by another convolution layer, fed into a sub-pixel convolution layer, and added to the low-resolution initial image upsampled by a factor of r to obtain the initial pre-output information;
constructing the recurrent neural network of the multi-dense residual block, into which the two outputs of the information construction module are input; through t loops, the t-th recurrent step outputs h_t, o_t and the super-resolution image i_t^SR; if t < n at this point, the (t+1)-th recurrent step proceeds; if t = n, the super-resolution computation of all n frames is complete and the super-resolution frame sequence I^SR = {i_1^SR, …, i_n^SR} is obtained.
The method comprises the following specific steps:
Firstly, the video data is preprocessed:
the video data is randomly rotated or flipped using a random function and a user-defined threshold, and frames are extracted from the rotated or flipped video data to obtain the video frame sequence V = (v_1, …, v_n), where n is the total number of frames of the video and v_1, v_n are the first and last frames of V, respectively; all video frames are average-cropped so that their width and height are unified to 256 × 256 pixels, obtaining the high-resolution frame sequence I^HR = {i_1^HR, …, i_n^HR} corresponding to the video frame sequence V;
the obtained high-resolution frame sequence I^HR is then subjected to Gaussian blur with variance σ = 1.6 and blur radius 3, and to downsampling with sampling factor r, obtaining the corresponding low-resolution frame sequence I^LR = {i_1^LR, …, i_n^LR};
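As an illustration of this first step, the following is a minimal Python/PyTorch sketch of the preprocessing, assuming torchvision's gaussian_blur and center_crop, an (n, c, H, W) tensor layout, and that blur radius 3 corresponds to a 7 × 7 kernel; the helper names blur_down and preprocess are hypothetical, not named in the patent.

import torch
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur, center_crop

def blur_down(hr_frames: torch.Tensor, r: int = 4) -> torch.Tensor:
    """hr_frames: (n, c, H, W) high-resolution frame sequence I^HR."""
    # Gaussian blur with sigma = 1.6; blur radius 3 -> 7x7 kernel (assumed mapping).
    blurred = gaussian_blur(hr_frames, kernel_size=7, sigma=1.6)
    # Downsample by the sampling factor r (bicubic is one common choice).
    return F.interpolate(blurred, scale_factor=1 / r, mode="bicubic",
                         align_corners=False)

def preprocess(video_frames: torch.Tensor, r: int = 4):
    """Returns (I^HR, I^LR) from a raw frame sequence V of shape (n, c, h, w)."""
    hr = center_crop(video_frames, [256, 256])   # "average cropping" to 256x256
    return hr, blur_down(hr, r)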
And secondly, using partial low-resolution frame sequences to carry out information construction:
the process of feature extraction and channel attention is as follows: firstly, obtaining an initial n-frame low-resolution frame sequence I in a first stepLRIn the previous m (m is more than or equal to 1 and less than or equal to n, m is 7 in the embodiment) frames, each frame is subjected to convolution processing to carry out feature extraction, and shallow layer features are obtained
Figure BDA0003144575400000032
Then, shallow features are overlapped on channel dimensions and input into a channel attention module (SE), and the operation process of the SE is divided into two steps of compression and excitation: compressing through a given feature map (shallow features)
Figure BDA0003144575400000033
) Performing global average pooling to obtain global compression characteristic quantity of the current characteristic diagram, exciting a two-layer fully-connected bottleneck layer structure with a ReLU activation function in the middle to obtain a weight of each channel in the characteristic diagram, and performing weighting on the characteristic diagram and the weight to output a shallow characteristic screening set K; as shown in the following equation (2):
Figure BDA0003144575400000034
in the formula (2), [, ] represents the superposition operation of the channel dimensions, and SE (-) is the channel attention operation;
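For illustration, a minimal PyTorch sketch of the channel attention of equation (2); the reduction ratio 16 and the final sigmoid gate follow the standard Squeeze-and-Excitation design and are assumptions, as the patent specifies only the pooling and the two-layer fully connected bottleneck with ReLU.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # compression: global average pooling
        self.fc = nn.Sequential(                     # excitation: two-layer FC bottleneck
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # weight the feature map per channel

# Shallow features f_1..f_m superimposed along the channel dimension, then screened:
# K = SEBlock(m * feat_channels)(torch.cat([f1, ..., fm], dim=1))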
Then the obtained shallow feature screening set K passes through the fast information architecture adjustment block to obtain deep information features, and the structure simulation block constructs from them the initial hidden information (h_0) and the initial pre-output information (o_0), wherein:
the specific process of the fast information architecture adjustment block is as follows: the shallow feature screening set K passes through a convolution and ReLU activation function and is then input into serially stacked residual blocks for deep feature extraction, outputting deep information features; each residual block consists of two convolutions and a skip connection.
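A minimal PyTorch sketch of the fast information architecture adjustment block as just described, assuming 128 feature channels and the 5 serially connected residual blocks mentioned for the embodiment of FIG. 3; the class names ResBlock and FIA are illustrative.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)      # two convolutions plus a skip connection

class FIA(nn.Module):
    def __init__(self, in_ch: int, ch: int = 128, num_blocks: int = 5):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(num_blocks)])

    def forward(self, k):
        return self.blocks(self.head(k))   # deep information features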
The specific operation of the structure simulation block is as follows: processing deep information features output by the rapid information architecture adjusting block through a convolution layer and a ReLU activation function to obtain initial hidden information, meanwhile, performing feature dimensionality reduction on the output deep information features through another convolution layer, feeding the output deep information features into a sub-pixel convolution layer, and adding the sub-pixel convolution layer and a low-resolution initial image subjected to r-time upsampling processing to obtain initial pre-output information;
h0expressed by equation (3):
h0=ReLU(Conv(FIA(K))) (3)
o0expressed by equation (4):
Figure BDA0003144575400000035
in formulae (3) and (4), Hspc(. H) is a sub-pixel convolution operation, FIA (. H) is a fast information frame adjustment block, + represents the pixel addition of the same channel, Hus() r times bilinear upsampling operation, or other upsampling methods, Conv is convolution operation, the convolution kernels of the convolution operations in the formula (3) and the formula (4) are the same in size, the convolution parameters of the convolution kernels are different, ReLU is activation function operation,
Figure BDA0003144575400000036
for a sequence of low resolution frames ILRThe first frame in (1);
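A minimal PyTorch sketch of the structure simulation block, equations (3) and (4), assuming 128 input channels and the embodiment values r = 4 and c = 3; PixelShuffle plays the role of the sub-pixel convolution layer H_spc, and the class name is illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureSimulation(nn.Module):
    def __init__(self, ch: int = 128, r: int = 4, c: int = 3):
        super().__init__()
        self.conv_h = nn.Conv2d(ch, ch, 3, padding=1)          # branch for h_0
        self.conv_o = nn.Conv2d(ch, c * r * r, 3, padding=1)   # feature dimension reduction for o_0
        self.spc = nn.PixelShuffle(r)                          # sub-pixel convolution layer
        self.r = r

    def forward(self, deep_feat, i1_lr):
        h0 = F.relu(self.conv_h(deep_feat))                    # eq. (3)
        up = F.interpolate(i1_lr, scale_factor=self.r,
                           mode="bilinear", align_corners=False)
        o0 = self.spc(self.conv_o(deep_feat)) + up             # eq. (4)
        return h0, o0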
Thirdly, shallow feature extraction in the recurrent neural network of the multi-dense residual block:
Because the ordinal of the recurrent step corresponds one-to-one with the index of the low-resolution frame to be super-resolved, t uniformly denotes the ordinal of the recurrent step or the index of the low-resolution frame to be super-resolved; the parameters of the recurrent steps corresponding to different ordinals are shared, and t traverses from 1 to n in turn, 1 ≤ t ≤ n. In the t-th recurrent step, the t-th and (t-1)-th frames i_t^LR and i_{t-1}^LR of the low-resolution frame sequence I^LR obtained in the first step, the hidden state h_{t-1} output by the (t-1)-th recurrent step, and the output result o_{t-1} downsampled by a factor of r are concatenated along the channel dimension; when t = 1, h_{t-1}, o_{t-1} and i_{t-1}^LR are initialized, respectively, to the h_0 and o_0 computed in the second step and the first frame i_1^LR of the low-resolution frame sequence I^LR from the first step. The concatenated sequence is input into the shallow feature extraction module of the recurrent neural network of the multi-dense residual block to obtain the shallow features F_t; the specific process is equation (5):
F_t = H_sfe([h_{t-1}, Down(o_{t-1}), i_{t-1}^LR, i_t^LR])   (5)
in equation (5), F_t are the shallow features, Down(·) is the downsampling operation, [·,·] denotes concatenation along the channel dimension, and H_sfe(·) is the shallow feature extraction module, whose purpose is feature extraction and which comprises a convolution layer, a nonlinear ReLU activation layer and a downsampling operation: the output o_{t-1} is first processed by the downsampling operation, then input into the convolution layer together with h_{t-1}, i_{t-1}^LR and i_t^LR, and the shallow features F_t are finally output through the nonlinear ReLU activation layer;
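A minimal PyTorch sketch of equation (5), assuming 3-channel LR frames, 128 hidden channels and bilinear downsampling for Down(·) (the patent does not fix the downsampling method); the class name is illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowFeatureExtractor(nn.Module):
    def __init__(self, lr_ch: int = 3, hid_ch: int = 128, r: int = 4):
        super().__init__()
        self.r = r
        in_ch = hid_ch + lr_ch + 2 * lr_ch       # h_{t-1}, Down(o_{t-1}), two LR frames
        self.conv = nn.Conv2d(in_ch, hid_ch, 3, padding=1)

    def forward(self, h_prev, o_prev, lr_prev, lr_cur):
        o_down = F.interpolate(o_prev, scale_factor=1 / self.r,
                               mode="bilinear", align_corners=False)  # Down(o_{t-1})
        x = torch.cat([h_prev, o_down, lr_prev, lr_cur], dim=1)       # [.,.] in eq. (5)
        return F.relu(self.conv(x))                                   # F_t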
Fourthly, depth detail information is extracted and integrated through the multi-dense residual block:
The shallow features F_t obtained in the third step are subjected to depth detail information extraction and integration through the multi-dense residual block. Each multi-dense residual block comprises p dense residual blocks (RDBs) connected in series (8 to 12 dense residual blocks are used in the experiments). Each dense residual block operates as follows: first the shallow features F_t are input into a two-dimensional convolutional neural network with a 3 × 3 kernel, and the features output by the two-dimensional convolution are activated by a ReLU function to obtain the intermediate features F_t^1; then the intermediate features F_t^1 and the shallow features F_t are concatenated along the channel dimension and input into the same convolution-plus-activation structure to obtain the intermediate features F_t^2; and so on, until the intermediate features F_t^1, …, F_t^{c-1} and the shallow features F_t, concatenated along the channel dimension, are input into the same convolution-plus-activation structure to obtain the intermediate features F_t^c (c denotes the number of convolution layers within one dense residual block). The intermediate features F_t^1, …, F_t^c and the shallow features F_t are concatenated along the channel dimension and input into a feature integration layer to obtain a feature integration mapping, which is added to the shallow features originally input into the dense residual block to obtain the output O_t^1 of the first dense residual block, see equation (6):
O_t^1 = H_ff([F_t, F_t^1, …, F_t^c]) + F_t   (6)
in equation (6), H_ff(·) is the feature integration layer, which consists of a 1 × 1 two-dimensional convolutional neural network and only constrains the channels;
O_t^1 is input into the second dense residual block to obtain the output O_t^2 of the second dense residual block; by analogy, the final output O_t^p of the multi-dense residual block is obtained, see equation (7):
O_t^p = RDB_p(RDB_{p-1}(…RDB_1(F_t)…))   (7)
in equation (7), RDB_1, RDB_2, …, RDB_p are dense residual blocks with the same structure but unshared parameters;
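A minimal PyTorch sketch of one dense residual block and the p-block chain of equations (6) and (7); the 128-channel width and 32-channel intermediate growth follow FIG. 5, while c = 4 convolutions per block is an assumption (the patent does not fix c).

import torch
import torch.nn as nn

class RDB(nn.Module):
    def __init__(self, ch: int = 128, growth: int = 32, num_convs: int = 4):
        super().__init__()
        self.convs = nn.ModuleList()
        in_ch = ch
        for _ in range(num_convs):
            # each step: channel concatenation followed by 3x3 conv + ReLU
            self.convs.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, 3, padding=1), nn.ReLU(inplace=True)))
            in_ch += growth
        self.fuse = nn.Conv2d(in_ch, ch, 1)       # H_ff: 1x1 feature integration layer

    def forward(self, f_t):
        feats = [f_t]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))   # F_t^1 ... F_t^c
        return self.fuse(torch.cat(feats, dim=1)) + f_t   # eq. (6)

# Equation (7): p serially connected RDBs with unshared parameters
# (p = 10 in the embodiment): multi_rdb = nn.Sequential(*[RDB() for _ in range(10)])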
the fifth step, hiding state htObtaining:
the output of the multi-density residual block obtained in the fourth step
Figure BDA00031445754000000415
Inputting the result into a hidden state generation module to obtain the hidden state output of the recurrent neural network of the t ordinal number, which is shown in formula (8):
Figure BDA00031445754000000416
h in the formula (8)hg() is a hidden state generating module, which is composed of a 3 × 3 two-dimensional convolution neural network and a ReLU activation function, and the convolution is used for outputting hidden states;
And sixthly, the final super-resolution image is obtained through the SR reconstruction module:
The output O_t^p of the multi-dense residual block obtained in the fourth step is simultaneously input into the SR reconstruction module to obtain the final super-resolution image. The process is as follows: the output O_t^p of the multi-dense residual block undergoes channel feature reduction and blending through a convolution layer and is then fed into a sub-pixel convolution layer, which rearranges all pixels of the H × W × r^2 c feature map into a super-resolution residual image of size rH × rW × c, where H and W both take the value 64, r is the sampling factor and c is the number of color channels; the super-resolution residual image is added to the low-resolution LR image i_t^LR upsampled by a factor of r to obtain the final output result o_t, i.e. the reconstructed super-resolution image i_t^SR, see equation (9):
i_t^SR = o_t = H_spc(Conv(O_t^p)) + H_us(i_t^LR)   (9)
the convolution layer in equation (9) uses a 3 × 3 convolution kernel, H_spc(·) is the sub-pixel convolution operation, and H_us(·) is the r-times bilinear upsampling operation;
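A minimal PyTorch sketch of the SR reconstruction module of equation (9), assuming the embodiment values c = 3 and r = 4; PixelShuffle implements the sub-pixel rearrangement from H × W × r^2 c to rH × rW × c, and the class name is illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SRReconstruction(nn.Module):
    def __init__(self, hid_ch: int = 128, c: int = 3, r: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(hid_ch, c * r * r, 3, padding=1)  # channel reduction/blending
        self.spc = nn.PixelShuffle(r)                           # sub-pixel convolution layer
        self.r = r

    def forward(self, o_tp, lr_cur):
        residual = self.spc(self.conv(o_tp))                    # SR residual image
        up = F.interpolate(lr_cur, scale_factor=self.r,
                           mode="bilinear", align_corners=False)  # H_us
        return residual + up                                    # o_t = i_t^SR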
Thus the outputs h_t, o_t and the super-resolution image i_t^SR of the t-th recurrent step are obtained. If t < n at this point, return to the third step to run the (t+1)-th recurrent step; if t = n, the super-resolution computation of all n frames is complete and the super-resolution frame sequence I^SR = {i_1^SR, …, i_n^SR} is obtained;
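Putting the third to sixth steps together, a minimal sketch of the recurrence over t = 1, …, n, using the illustrative modules sketched above; hidden_gen is assumed to be the 3 × 3 convolution of equation (8), followed here by the ReLU.

import torch
import torch.nn as nn
import torch.nn.functional as F

def run_recurrence(lr_frames, h0, o0, sfe, multi_rdb, hidden_gen, sr_rec):
    """lr_frames: list of n LR frames; returns the super-resolution sequence I^SR."""
    h_prev, o_prev, lr_prev = h0, o0, lr_frames[0]      # initialization for t = 1
    sr_frames = []
    for lr_cur in lr_frames:                            # t = 1 .. n, shared parameters
        f_t = sfe(h_prev, o_prev, lr_prev, lr_cur)      # eq. (5)
        o_tp = multi_rdb(f_t)                           # eqs. (6)-(7)
        h_t = F.relu(hidden_gen(o_tp))                  # eq. (8): 3x3 conv + ReLU
        o_t = sr_rec(o_tp, lr_cur)                      # eq. (9)
        sr_frames.append(o_t)
        h_prev, o_prev, lr_prev = h_t, o_t, lr_cur
    return sr_frames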
Seventhly, the loss for the input video frames is calculated:
The recurrent neural network model based on information construction and multi-dense residual blocks is built through the first to sixth steps; the first super-resolution frame is computed, and the subsequent super-resolution frames are obtained by looping through the third to sixth steps. The difference between the obtained final super-resolution frame sequence I^SR and the high-resolution frame sequence I^HR obtained in the first step is measured; the L1 loss function is adopted during training:
L = (1/n) · Σ_{t=1}^n || i_t^SR − i_t^HR ||_1   (10)
in equation (10), L is the value of the computed L1 loss function;
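A minimal sketch of the training loss of equation (10), assuming PyTorch's mean-reduced L1Loss applied over the stacked frame sequences; the helper name is illustrative.

import torch
import torch.nn as nn

l1 = nn.L1Loss()

def sequence_loss(sr_frames, hr_frames):
    """sr_frames, hr_frames: lists of n tensors of shape (b, c, rH, rW)."""
    return l1(torch.stack(sr_frames), torch.stack(hr_frames))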
And the eighth step, the super-resolution image results are combined into a video:
According to the original video frame rate, the super-resolved frame sequence I^SR is synthesized into a video at the corresponding frame rate, completing the process of circular video super-resolution processing based on information construction and the multi-dense residual block.
The sampling factor r is 4 and the number of color channels c is 3; m is 5 to 7, and p is 8 to 12.
Compared with the prior art, the invention has the beneficial effects that:
the outstanding substantive features of the invention are as follows:
(1) The method of the invention proposes a novel module, called the information construction module. By extracting information from the first few frames it enables the early frames to be reconstructed, so no backward pass over all frames is needed; this greatly reduces computational complexity and time overhead while strengthening the reconstruction of the early frames. Moreover, it is a transferable component that can be applied to any unidirectional recurrent neural network to improve its reconstruction quality. Extensive experiments show that the circular video super-resolution method based on information construction and multi-density residual block proposed by the invention surpasses all current models in reconstruction quality, verifying the effectiveness of the module.
(2) The method of the invention provides an improved recurrent neural network. The multi-dense residual block is embedded into the recurrent neural network; because it extracts depth detail information and transmits it between frames implicitly, the accuracy of video super-resolution is greatly improved. The multi-dense residual block performs deep feature extraction and fusion in real time, realizes feature reuse, and maintains a high signal-to-noise ratio: the peak signal-to-noise ratio on the Vid4 benchmark test set exceeds 28 dB.
The invention has the remarkable advantages that:
(1) Compared with the CN111587447A method, the method of the invention uses an improved recurrent neural network to propagate a high-dimensional implicit motion state. Its outstanding substantive feature and remarkable progress is that no optical flow needs to be computed and no explicit motion estimation or motion compensation is performed, so the super-resolution effect and accuracy are better and faster than those of the optical-flow method.
(2) Compared with the CN109102462A method, the method of the invention uses the multi-dense residual block to extract and integrate detail information. Its outstanding substantive feature and remarkable progress is that no large number of 3D convolution blocks is needed to integrate and extract spatio-temporal information; instead, a method based on multiple dense residual blocks extracts and integrates the detail information, greatly reducing computational complexity. The method adopts an improved unidirectional recurrent neural network, is end-to-end, and has a low memory occupancy rate.
(3) Compared with the CN111260560A method, the method of the invention uses a recurrent neural network to integrate inter-frame information. Its outstanding substantive feature and remarkable progress is that no large number of 3D convolutions and deformable convolutions is needed for implicit alignment; instead, a unidirectional recurrent neural network transfers and integrates the inter-frame information, greatly reducing time and space complexity. Dense residual blocks are used as the modules for feature extraction and reconstruction, so thanks to feature reuse the super-resolution effect and efficiency are better than with plain residual blocks.
(4) The method uses dense residual blocks to extract and fuse detail information, making full use of the hidden-layer feature maps; the reuse of features raises the value of the generated features, so depth detail information is restored better, and in particular no annoying artifacts appear for objects with a large motion range in the video, giving a better super-resolution effect. Compared with RLSP, the vanishing-gradient problem is avoided while the super-resolution accuracy is greatly improved. In addition, RLSP reconstructs the early frames poorly because it does no extra processing of the early-frame information, whereas the invention uses the information construction module to fully extract the detail and structure information of the early frames, greatly improving their reconstruction.
drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a block diagram showing the flow of the super-resolution method of the circular video based on information construction and multi-density residual block.
Fig. 2 is a schematic structural diagram of a circular video super-resolution method based on information construction and multi-density residual block in the present invention.
Fig. 3 is a schematic structural diagram of an information construction module in the present invention.
FIG. 4 is a schematic diagram of a recurrent neural network structure of multiple dense residual blocks in the present invention.
Fig. 5 is a schematic structural diagram of one dense residual block in multiple dense residual blocks in the present invention.
Detailed Description
The embodiment shown in fig. 1 shows that the process of the method for super-resolution of a circular video based on information construction and multi-density residual block according to the present invention is as follows:
obtaining video → preprocessing video data → constructing information by using partial low-resolution frame sequence → extracting shallow layer feature in a multi-density residual cyclic neural network → extracting and integrating depth detail information by using a multi-density residual block → obtaining hidden state → obtaining final super-resolution image by using an SR reconstruction module → calculating loss of input video frame → combining super-resolution image results into video → completing the cyclic video super-resolution method based on the multi-density residual block.
The embodiment shown in fig. 2 shows the circular video super-resolution method based on information construction and multiple dense residual blocks, where the number of hidden state channels is 128, the number of output channels of the hidden layer is 48, the number of input and output channels is 3, and m, the number of input frames, is initialized to 7.
The embodiment shown in fig. 3 shows a schematic structural diagram of an information construction module, where m is an input frame number and is set to 7, 5 serially connected residual blocks are used in a multi-residual block in a fast information architecture adjustment block, and K is a shallow feature screening set. The information construction module comprises a feature extraction and channel attention, a rapid information architecture adjustment block and a structure simulation block, and input 1-m frames output input hidden states required by the initial recurrent neural network through the feature extraction and channel attention, the rapid information architecture adjustment and the structure simulation.
The embodiment shown in FIG. 4 is a schematic diagram of the recurrent neural network structure of the multi-dense residual block, i.e. the recurrent neural network framework of the multi-dense residual block in fig. 2; F_t denotes the shallow features and O_t^p is the final output of the multi-dense residual block.
Fig. 5 is a schematic structural diagram of one dense residual block among the multiple dense residual blocks of the invention, wherein the channel sizes of the input F_t and of the block output are all 128; to ensure the time efficiency of the invention, the number of channels of each intermediate feature F_t^j is taken as 32. All convolutions except the first one are channel-concatenation-plus-convolution operations.
Example 1
The method for super-resolution of the circular video based on the information construction and the multi-density residual block comprises the following steps:
Firstly, the video data is preprocessed:
the video data is randomly rotated or flipped using a random function and a user-defined threshold, and frames are extracted from the rotated or flipped video data to obtain the video frame sequence V = (v_1, …, v_n), where n is the total number of frames of the video and v_1, v_n are the first and last frames of V, respectively; all video frames are average-cropped so that their width and height are unified to 256 × 256 pixels, obtaining the high-resolution frame sequence I^HR = {i_1^HR, …, i_n^HR} corresponding to the video frame sequence V;
the obtained high-resolution frame sequence I^HR is then subjected to Gaussian blur with variance σ = 1.6 and blur radius 3, and to downsampling with sampling factor 4, obtaining the corresponding low-resolution frame sequence I^LR = {i_1^LR, …, i_n^LR}, as in the following equation (1):
I^LR = BlurDown(AverageCrop(V))   (1)
in equation (1), BlurDown(·) is Gaussian-blur downsampling and AverageCrop(·) is average cropping;
And secondly, information construction is performed using part of the low-resolution frame sequence:
the process of feature extraction and channel attention is as follows: first, for each of the first m frames (1 ≤ m ≤ n; m = 7 in this embodiment) of the initial n-frame low-resolution frame sequence I^LR obtained in the first step, features are extracted by convolution to obtain the shallow features f_1, f_2, …, f_m;
the shallow features are then superimposed along the channel dimension and input into the channel attention module (SE), whose operation is divided into two steps, compression and excitation: compression performs global average pooling on the given feature map (the superimposed shallow features [f_1, …, f_m]) to obtain the global compressed feature quantity of the current feature map; excitation passes it through a two-layer fully connected bottleneck structure with a ReLU activation function in between to obtain a weight for each channel of the feature map; the feature map is then weighted by these weights to output the shallow feature screening set K, as in equation (2):
K = SE([f_1, f_2, …, f_m])   (2)
in equation (2), [·,·] denotes the superposition operation along the channel dimension and SE(·) is the channel attention operation;
Then the obtained shallow feature screening set K passes through the fast information architecture adjustment block to obtain deep information features, and the structure simulation block constructs from them the initial hidden information (h_0) and the initial pre-output information (o_0), wherein:
the specific process of the fast information architecture adjustment block is as follows: the shallow feature screening set K passes through a convolution and ReLU activation function and is then input into serially stacked residual blocks for deep feature extraction, outputting deep information features; each residual block consists of two convolutions and a skip connection.
The specific operation of the structure simulation block is as follows: the deep information features output by the fast information architecture adjustment block are processed by a convolution layer and a ReLU activation function to obtain the initial hidden information; meanwhile, the deep information features are reduced in feature dimension by another convolution layer, fed into a sub-pixel convolution layer, and added to the low-resolution initial image upsampled by a factor of r to obtain the initial pre-output information;
h_0 is expressed by equation (3):
h_0 = ReLU(Conv(FIA(K)))   (3)
o_0 is expressed by equation (4):
o_0 = H_spc(Conv(FIA(K))) + H_us(i_1^LR)   (4)
in equations (3) and (4), H_spc(·) is the sub-pixel convolution operation, FIA(·) is the fast information architecture adjustment block, + denotes pixel-wise addition on the same channel, H_us(·) is the r-times bilinear upsampling operation (other upsampling methods may also be used), Conv is a convolution operation (the convolution kernels in equations (3) and (4) have the same size but different parameters), ReLU is the activation function operation, and i_1^LR is the first frame of the low-resolution frame sequence I^LR;
Thirdly, shallow feature extraction in the recurrent neural network of the multi-dense residual block:
Because the ordinal of the recurrent step corresponds one-to-one with the index of the low-resolution frame to be super-resolved, t is defined as the ordinal of the recurrent step or the index of the low-resolution frame to be super-resolved; the parameters of the recurrent steps corresponding to different ordinals are shared, and t traverses from 1 to n in turn, 1 ≤ t ≤ n. In the t-th recurrent step, the t-th and (t-1)-th frames i_t^LR and i_{t-1}^LR of the low-resolution frame sequence I^LR obtained in the first step, the hidden state h_{t-1} output by the (t-1)-th recurrent step, and the output result o_{t-1} downsampled by a factor of r are concatenated along the channel dimension; when t = 1, h_{t-1}, o_{t-1} and i_{t-1}^LR are initialized, respectively, to the h_0 and o_0 computed in the second step and the first frame i_1^LR of the low-resolution frame sequence I^LR from the first step, i.e. initially let i_0^LR = i_1^LR. The concatenated sequence is input into the shallow feature extraction module of the recurrent neural network of the multi-dense residual block to obtain the shallow features F_t; the specific process is equation (5):
F_t = H_sfe([h_{t-1}, Down(o_{t-1}), i_{t-1}^LR, i_t^LR])   (5)
in equation (5), F_t are the shallow features, Down(·) is the downsampling operation, [·,·] denotes concatenation along the channel dimension, and H_sfe(·) is the shallow feature extraction module for feature extraction, comprising a convolution layer, a nonlinear ReLU activation layer and a downsampling operation: the output o_{t-1} is first processed by the downsampling operation, then input into the convolution layer together with h_{t-1}, i_{t-1}^LR and i_t^LR, and the shallow features F_t are finally output through the nonlinear ReLU activation layer;
Fourthly, depth detail information is extracted and integrated through the multi-dense residual block:
The shallow features F_t obtained in the third step are subjected to depth detail information extraction and integration through the multi-dense residual block. Each multi-dense residual block comprises p dense residual blocks (RDBs) connected in series. Each dense residual block operates as follows: first the shallow features F_t are input into a two-dimensional convolutional neural network with a 3 × 3 kernel, and the features output by the two-dimensional convolution are activated by a ReLU function to obtain the intermediate features F_t^1; then the intermediate features F_t^1 and the shallow features F_t are concatenated along the channel dimension and input into the same convolution-plus-activation structure to obtain the intermediate features F_t^2; and so on, until the intermediate features F_t^1, …, F_t^{c-1} and the shallow features F_t, concatenated along the channel dimension, are input into the same convolution-plus-activation structure to obtain the intermediate features F_t^c. The intermediate features F_t^1, …, F_t^c and the shallow features F_t are concatenated along the channel dimension and input into a feature integration layer to obtain a feature integration mapping, which is added to the shallow features originally input into the dense residual block to obtain the output O_t^1 of the first dense residual block, see equation (6):
O_t^1 = H_ff([F_t, F_t^1, …, F_t^c]) + F_t   (6)
in equation (6), H_ff(·) is the feature integration layer, which consists of a 1 × 1 two-dimensional convolutional neural network and only constrains the channels;
O_t^1 is input into the second dense residual block to obtain the output O_t^2 of the second dense residual block; by analogy, the final output O_t^p of the multi-dense residual block is obtained, see equation (7):
O_t^p = RDB_p(RDB_{p-1}(…RDB_1(F_t)…))   (7)
in equation (7), RDB_1, RDB_2, …, RDB_p are dense residual blocks with the same structure but unshared parameters; in this embodiment, 10 dense residual blocks are provided in total, i.e. p = 10.
The fifth step, obtaining the hidden state h_t:
The output O_t^p of the multi-dense residual block obtained in the fourth step is input into the hidden state generation module to obtain the hidden state output of the t-th recurrent step, see equation (8):
h_t = H_hg(O_t^p)   (8)
in equation (8), H_hg(·) is the hidden state generation module, which consists of a 3 × 3 two-dimensional convolutional neural network and a ReLU activation function; the convolution is used to output the hidden state;
And sixthly, the final super-resolution image is obtained through the SR reconstruction module:
The output O_t^p of the multi-dense residual block obtained in the fourth step is simultaneously input into the SR reconstruction module to obtain the final super-resolution image. The process is as follows: the output O_t^p of the multi-dense residual block undergoes channel feature reduction and blending through a convolution layer and is then fed into a sub-pixel convolution layer, which rearranges all pixels of the H × W × r^2 c feature map into a super-resolution residual image of size rH × rW × c, where H and W both take the value 64, r is the sampling factor and c is the number of color channels; the super-resolution residual image is added to the low-resolution LR image i_t^LR upsampled by a factor of 4 to obtain the final output result o_t, i.e. the reconstructed super-resolution image i_t^SR, see equation (9):
i_t^SR = o_t = H_spc(Conv(O_t^p)) + H_us(i_t^LR)   (9)
the convolution layer in equation (9) uses a 3 × 3 convolution kernel, H_spc(·) is the sub-pixel convolution operation, and H_us(·) is the r-times bilinear upsampling operation;
Thus the outputs h_t, o_t and the super-resolution image i_t^SR of the t-th recurrent step are obtained. If t < n at this point, return to the third step to run the (t+1)-th recurrent step; if t = n, the super-resolution computation of all n frames is complete and the super-resolution frame sequence I^SR = {i_1^SR, …, i_n^SR} is obtained;
Seventhly, the loss for the input video frames is calculated:
The recurrent neural network model based on information construction and multi-dense residual blocks is built through the first to sixth steps; the first super-resolution frame is computed, and the subsequent super-resolution frames are obtained by looping through the third to sixth steps. The difference between the obtained final super-resolution frame sequence I^SR and the high-resolution frame sequence I^HR obtained in the first step is measured; the L1 loss function is adopted during training:
L = (1/n) · Σ_{t=1}^n || i_t^SR − i_t^HR ||_1   (10)
in equation (10), L is the value of the computed L1 loss function;
And the eighth step, the super-resolution image results are combined into a video:
According to the original video frame rate, the super-resolved frame sequence I^SR is synthesized into a video at the corresponding frame rate, completing the process of circular video super-resolution processing based on information construction and the multi-dense residual block.
In the circular video super-resolution method based on information construction and multiple dense residual blocks, RDB is the abbreviation of Residual Dense Block, SE (channel attention) is the abbreviation of Squeeze-and-Excitation, and ReLU (linear rectification function) is the abbreviation of Rectified Linear Unit; BlurDown downsampling and the like are all well known in the technical field.
Anything not described in detail in this specification belongs to the prior art known to those skilled in the art.

Claims (5)

1. A circular video super-resolution method based on information construction and multi-density residual block, characterized in that: an introduced information construction module constructs and fills prior information for the initial recurrent neural network, so that the early recurrent steps have enough information to reconstruct the early frames, while the multi-dense residual block extracts features representing deep information and propagates them through the recurrent network in the hidden state; the method comprises the following steps:
preprocessing the video data to obtain the corresponding low-resolution frame sequence I^LR = {i_1^LR, …, i_n^LR};
constructing an information construction module: the information construction module comprises feature extraction and channel attention, a fast information architecture adjustment block and a structure simulation block; the input is the first m frames of the low-resolution frame sequence, and the outputs are the initial hidden information h_0 and the initial pre-output information o_0;
the process of feature extraction and channel attention is as follows: first, for each of the first m frames of the low-resolution frame sequence I^LR (1 ≤ m ≤ n), features are extracted by convolution to obtain the shallow features f_1, …, f_m; the shallow features are then superimposed along the channel dimension and input into the channel attention module SE to obtain the shallow feature screening set K;
the specific structure of the fast information architecture adjustment block is as follows: the shallow feature screening set K passes through a convolution and ReLU activation function and is then input into serially stacked residual blocks for deep feature extraction, outputting deep information features; each residual block consists of two convolutions and a skip connection;
the specific operation of the structure simulation block is as follows: the deep information features output by the fast information architecture adjustment block are processed by a convolution layer and a ReLU activation function to obtain the initial hidden information; meanwhile, the deep information features are reduced in feature dimension by another convolution layer, fed into a sub-pixel convolution layer, and added to the low-resolution initial image upsampled by a factor of r to obtain the initial pre-output information;
constructing the recurrent neural network of the multi-dense residual block, into which the two outputs of the information construction module are input; through t loops, the t-th recurrent step outputs h_t, o_t and the super-resolution image i_t^SR; if t < n at this point, the (t+1)-th recurrent step proceeds; if t = n, the super-resolution computation of all n frames is complete and the super-resolution frame sequence I^SR = {i_1^SR, …, i_n^SR} is obtained.
2. The circular video super-resolution method based on information construction and multi-density residual block as claimed in claim 1, wherein the operation of the channel attention module SE is divided into two steps, compression and excitation: compression performs global average pooling on the given shallow features [f_1, …, f_m] to obtain the global compressed feature quantity of the current feature map; excitation passes it through a two-layer fully connected bottleneck structure with a ReLU activation function in between to obtain a weight for each channel of the feature map; the feature map is then weighted by these weights to output the shallow feature screening set K, expressed by equation (2):
K = SE([f_1, f_2, …, f_m])   (2)
in equation (2), [·,·] denotes the superposition operation along the channel dimension and SE(·) is the channel attention operation.
3. The circular video super-resolution method based on information construction and multi-density residual block as claimed in claim 1, wherein the recurrent neural network of the multi-dense residual block comprises a shallow feature extraction module, a multi-dense residual block, a hidden state generation module and an SR reconstruction module,
the multi-dense residual block is used to extract and integrate depth detail information from the shallow output F_t of the shallow feature extraction module; each multi-dense residual block comprises p dense residual blocks, p being an integer not less than 2, and each dense residual block operates as follows: first the shallow features F_t are input into a two-dimensional convolutional neural network with a 3 × 3 kernel, and the features output by the two-dimensional convolution are activated by a ReLU function to obtain the intermediate features F_t^1 (a convolution-plus-activation structure is formed by a 3 × 3 two-dimensional convolutional neural network and a ReLU activation function); then the intermediate features F_t^1 and the shallow features F_t are concatenated along the channel dimension and input into the same convolution-plus-activation structure to obtain the intermediate features F_t^2; and so on, until the intermediate features F_t^1, …, F_t^{c-1} and the shallow features F_t, concatenated along the channel dimension, are input into the same convolution-plus-activation structure to obtain the intermediate features F_t^c; the intermediate features F_t^1, …, F_t^c and the shallow features F_t are concatenated along the channel dimension and input into a feature integration layer to obtain a feature integration mapping, which is added to the shallow features originally input into the dense residual block to obtain the output O_t^1 of the first dense residual block;
O_t^1 is taken as the input of the second dense residual block, which, operating as above, gives the output O_t^2 of the second dense residual block; by analogy, the final output O_t^p of the multi-dense residual block is obtained;
the final output O_t^p of the multi-dense residual block is input into the hidden state generation module to obtain the hidden state output h_t of the t-th recurrent step, and simultaneously the final output O_t^p of the multi-dense residual block is input into the SR reconstruction module to obtain the final super-resolution image i_t^SR;
the feature integration layer consists of a 1 × 1 two-dimensional convolutional neural network; the hidden state generation module consists of a 3 × 3 two-dimensional convolutional neural network and a ReLU activation function.
4. A circular video super-resolution method based on information construction and multi-density residual block comprises the following steps:
firstly, video data is preprocessed:
the video data is rotated or flipped using a random function and a self-defined threshold, and frames are extracted from the rotated or flipped video data to obtain a video frame sequence V = (v_1, …, v_n), where n is the total number of frames of the video and v_1 and v_n are the first and last frames of V, respectively; all video frames are subjected to average cropping so that their width and height are unified to 256 × 256 pixels, obtaining the high-resolution frame sequence I^{HR} = (I_1^{HR}, …, I_n^{HR}) corresponding to the video frame sequence V;
then the obtained high-resolution frame sequence I^{HR} is subjected to Gaussian blur processing with variance σ = 1.6 and blur radius 3, followed by downsampling with sampling multiple r, obtaining the corresponding low-resolution frame sequence I^{LR} = (I_1^{LR}, …, I_n^{LR});
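The degradation in this step can be sketched with OpenCV; a 7 × 7 kernel for blur radius 3 and bicubic interpolation for the r-times downsampling are assumptions, since the claim fixes only σ = 1.6, the radius, and the factor r.

```python
import cv2

def degrade(hr_frame, r=4):
    # Gaussian blur: sigma = 1.6; blur radius 3 -> 7x7 kernel (assumption)
    blurred = cv2.GaussianBlur(hr_frame, ksize=(7, 7), sigmaX=1.6)
    h, w = blurred.shape[:2]
    # downsample by the sampling multiple r (bicubic is our assumption)
    return cv2.resize(blurred, (w // r, h // r),
                      interpolation=cv2.INTER_CUBIC)
```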
secondly, information construction is carried out using a partial low-resolution frame sequence:
the process of feature extraction and channel attention is as follows: first, the first m frames of the initial n-frame low-resolution frame sequence I^{LR} obtained in the first step are taken, where 1 ≤ m ≤ n, and each frame is subjected to convolution processing to extract features, obtaining one shallow feature map per frame; the m shallow feature maps are then concatenated in the channel dimension and input into a channel attention module SE, whose operation is divided into two steps, squeeze and excitation: the squeeze step performs global average pooling on the given feature map, i.e. the concatenated shallow features, to obtain the global compressed feature quantity of the current feature map; the excitation step passes it through a two-layer fully-connected bottleneck structure with a ReLU activation function in the middle to obtain a weight for each channel of the feature map; weighting the feature map by these weights outputs the shallow feature screening set K;
then, the obtained shallow feature screening set K passes through a fast information architecture (FIA) adjusting block to obtain deep information features; the deep information features are used to construct the output information through a structure simulation block, which outputs the initial hidden information h_0 and the initial pre-output information o_0, wherein:
the specific operation of the fast information architecture adjusting block is as follows: the shallow feature screening set K is passed through a convolution and a ReLU activation function and then input into serially stacked residual blocks for deep feature extraction, outputting the deep information features, where each residual block consists of two convolutions and a skip connection;
the specific operation of the structure simulation block is as follows: the deep information features output by the fast information architecture adjusting block are processed by a convolution layer and a ReLU activation function to obtain the initial hidden information; meanwhile, the output deep information features undergo feature dimensionality reduction through another convolution layer and are fed into a sub-pixel convolution layer, whose output is added to the low-resolution initial image upsampled by a factor of r to obtain the initial pre-output information;
h_0 is expressed by formula (3):
h_0 = ReLU(Conv(FIA(K)))   (3)
o_0 is expressed by formula (4):
o_0 = H_spc(Conv(FIA(K))) + H_us(I_1^{LR})   (4)
in formulas (3) and (4), H_spc(·) is the sub-pixel convolution operation, FIA(·) is the fast information architecture adjusting block, + represents pixel-wise addition on the same channel, H_us(·) is the r-times bilinear upsampling operation, and Conv is the convolution operation, where the convolution kernels in formulas (3) and (4) have the same size but different parameters; ReLU is the activation function operation, and I_1^{LR} is the first frame of the low-resolution frame sequence I^{LR};
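Formulas (3) and (4) map directly onto a small module; in this sketch the channel width, the PixelShuffle realization of the sub-pixel convolution, and all names are illustrative assumptions, and fia_out stands for FIA(K):

```python
import torch.nn as nn
import torch.nn.functional as F

class StructureSimulation(nn.Module):
    def __init__(self, channels=64, colors=3, r=4):
        super().__init__()
        self.r = r
        self.hidden_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.reduce = nn.Conv2d(channels, colors * r * r, 3, padding=1)
        self.shuffle = nn.PixelShuffle(r)  # sub-pixel convolution H_spc

    def forward(self, fia_out, i1_lr):
        h0 = F.relu(self.hidden_conv(fia_out))                   # formula (3)
        up = F.interpolate(i1_lr, scale_factor=self.r,
                           mode='bilinear', align_corners=False)  # H_us
        o0 = self.shuffle(self.reduce(fia_out)) + up             # formula (4)
        return h0, o0
```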
thirdly, shallow feature extraction in the recurrent neural network of the multi-dense residual blocks:
since the ordinal number of the recurrent neural network corresponds one-to-one to the subscript of the low-resolution frame to be super-resolved, t is uniformly defined as the ordinal number of the recurrent neural network or the subscript of the low-resolution frame to be super-resolved, and the parameters of the recurrent neural networks corresponding to different ordinal numbers are shared; t traverses from 1 to n in turn, with 1 ≤ t ≤ n; in the recurrent neural network of the t-th ordinal number, the t-th and (t−1)-th frames I_t^{LR} and I_{t−1}^{LR} of the low-resolution frame sequence I^{LR} obtained in the first step, the hidden state h_{t−1} output by the (t−1)-th recurrent neural network, and the output result o_{t−1} downsampled by a factor of r are connected in series in the channel dimension; when t = 1, h_{t−1}, o_{t−1} and I_{t−1}^{LR} are initialized, respectively, to h_0 and o_0 computed in the second step and to the frame I_1^{LR} of the low-resolution frame sequence I^{LR} from the first step;
the series-connected sequence is input into the shallow feature extraction module of the recurrent neural network of the multi-dense residual blocks to obtain the shallow feature F_t; the specific process is formula (5):
F_t = H_sfe([I_{t−1}^{LR}, I_t^{LR}, h_{t−1}, Down(o_{t−1})])   (5)
in formula (5), F_t is the shallow feature, Down(·) is the downsampling operation, [·] denotes series connection in the channel dimension, and H_sfe(·) is the shallow feature extraction module, used for feature extraction and comprising a convolution layer and a nonlinear ReLU activation layer; after the output o_{t−1} is downsampled, it is input into the convolution layer together with h_{t−1}, I_{t−1}^{LR} and I_t^{LR}, and the shallow feature F_t is finally obtained through the nonlinear ReLU activation layer;
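A sketch of formula (5), assuming a 64-channel hidden state and bilinear interpolation for Down(·) (the claim says only "downsampling"); all names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowFeatureExtraction(nn.Module):
    def __init__(self, colors=3, hidden=64, r=4):
        super().__init__()
        self.r = r
        # inputs: I_{t-1}^LR, I_t^LR, h_{t-1}, Down(o_{t-1})
        in_ch = colors * 3 + hidden
        self.conv = nn.Conv2d(in_ch, hidden, 3, padding=1)

    def forward(self, lr_prev, lr_cur, h_prev, o_prev):
        o_down = F.interpolate(o_prev, scale_factor=1 / self.r,
                               mode='bilinear', align_corners=False)
        x = torch.cat([lr_prev, lr_cur, h_prev, o_down], dim=1)
        return F.relu(self.conv(x))  # shallow feature F_t
```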
fourthly, depth detail information is extracted and integrated through the multi-dense residual blocks:
the shallow feature F_t obtained in the third step is subjected to depth detail information extraction and integration through the multi-dense residual blocks, where each multi-dense residual block comprises p dense residual blocks connected in series in turn, and the specific operation of each dense residual block is as follows: first, the shallow feature F_t is input into a two-dimensional convolutional neural network with a 3 × 3 convolution kernel, and the features output by the two-dimensional convolution are activated by a ReLU activation function to obtain the intermediate feature F_t^1; then the intermediate feature F_t^1 and the shallow feature F_t are connected in series in the channel dimension and input into the same convolution-and-activation structure to obtain the intermediate feature F_t^2; and so on, until the intermediate features F_t^1, F_t^2, …, F_t^{q−1} and the shallow feature F_t, connected in series in the channel dimension, are input into the same convolution-and-activation structure to obtain the last intermediate feature F_t^q; the intermediate features F_t^1, F_t^2, …, F_t^q and the shallow feature F_t are connected in series in the channel dimension and input into a feature integration layer to obtain a feature integration mapping, and the obtained feature integration mapping is added to the shallow feature originally input into the dense residual block to obtain the output F_t^{RDB_1} of the first dense residual block, see formula (6):
F_t^{RDB_1} = H_ff([F_t, F_t^1, F_t^2, …, F_t^q]) + F_t   (6)
in formula (6), H_ff(·) is the feature integration layer, which consists of a 1 × 1 two-dimensional convolutional neural network;
F_t^{RDB_1} is input into the second dense residual block to obtain the output F_t^{RDB_2} of the second dense residual block; by analogy, the final output F_t^{RDB_p} of the multi-dense residual blocks is obtained, see formula (7):
F_t^{RDB_p} = RDB_p(RDB_{p−1}(…RDB_1(F_t)…))   (7)
in formula (7), RDB_1, RDB_2, …, RDB_p are all dense residual blocks with the same structure but unshared parameters;
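Formula (7) is a plain composition of p structurally identical blocks with unshared parameters; a sketch reusing the hypothetical DenseResidualBlock class given after claim 3, with p = 10 assumed from the middle of the range recited in claim 5:

```python
import torch.nn as nn

class MultiDenseResidualBlocks(nn.Module):
    def __init__(self, channels=64, p=10, q=3):
        super().__init__()
        # p blocks of identical structure; parameters are not shared
        self.blocks = nn.Sequential(
            *[DenseResidualBlock(channels, q) for _ in range(p)])

    def forward(self, f_t):
        return self.blocks(f_t)  # F_t^{RDB_p}, formula (7)
```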
fifthly, the hidden state h_t is obtained:
the output F_t^{RDB_p} of the multi-dense residual blocks obtained in the fourth step is input into the hidden state generation module to obtain the hidden state output of the recurrent neural network of the t-th ordinal number, see formula (8):
h_t = H_hg(F_t^{RDB_p})   (8)
in formula (8), H_hg(·) is the hidden state generation module, which consists of a 3 × 3 two-dimensional convolutional neural network and a ReLU activation function, the convolution being used for outputting the hidden state;
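Formula (8) reduces to a single 3 × 3 convolution followed by a ReLU; a minimal sketch with an assumed 64-channel width:

```python
import torch.nn as nn

# H_hg in formula (8): 3x3 convolution then ReLU
hidden_state_gen = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True))
# usage: h_t = hidden_state_gen(f_rdb_p)
```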
sixthly, the final super-resolution image is obtained through the SR reconstruction module:
at the same time, the output F_t^{RDB_p} of the multi-dense residual blocks obtained in the fourth step is input into the SR reconstruction module to obtain the final super-resolution image; the process is as follows: the output F_t^{RDB_p} of the multi-dense residual blocks undergoes channel feature reduction and blending through a convolution layer and is then fed into a sub-pixel convolution layer, which rearranges all pixels of the H × W × r²c feature map into a super-resolution residual image of size rH × rW × c, where H and W both take the value 64, r is the sampling multiple, and c is the number of color channels; the super-resolution residual image is added to the low-resolution LR image I_t^{LR} upsampled by a factor of r to obtain the final output result o_t, i.e. the reconstructed super-resolution image I_t^{SR}, see formula (9):
o_t = I_t^{SR} = H_spc(Conv(F_t^{RDB_p})) + H_us(I_t^{LR})   (9)
the convolution layer in formula (9) uses a convolution kernel of size 3 × 3, H_spc(·) is the sub-pixel convolution operation, and H_us(·) is the r-times bilinear upsampling operation;
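The reconstruction path of formula (9) in PyTorch: PixelShuffle performs exactly the H × W × r²c to rH × rW × c rearrangement described above, while the 64-channel input width and the names are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SRReconstruction(nn.Module):
    def __init__(self, channels=64, colors=3, r=4):
        super().__init__()
        self.r = r
        # channel reduction/blending conv, then sub-pixel layer H_spc
        self.reduce = nn.Conv2d(channels, colors * r * r, 3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, f_rdb_p, lr_cur):
        residual = self.shuffle(self.reduce(f_rdb_p))
        up = F.interpolate(lr_cur, scale_factor=self.r,
                           mode='bilinear', align_corners=False)  # H_us
        return residual + up  # o_t = I_t^SR, formula (9)
```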
thus the outputs h_t and o_t of the recurrent neural network of the t-th ordinal number and the super-resolution image I_t^{SR} are obtained; if t < n at this point, the procedure returns to the third step to operate the recurrent neural network of the (t + 1)-th ordinal number; if t = n, the super-resolution calculation of all n frames is complete, yielding the super-resolution frame sequence I^{SR} = (I_1^{SR}, …, I_n^{SR});
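The recurrence of the third to sixth steps can be sketched as a plain loop over t; the helper names and m = 5 are illustrative assumptions, with the modules standing for the sketches above:

```python
def super_resolve(lr_frames, info_construction, sfe, mdrb, hgen, rec):
    # lr_frames: list of n low-resolution tensors I_1^LR .. I_n^LR
    h, o = info_construction(lr_frames[:5])   # h_0, o_0 (m = 5 assumed)
    prev_lr = lr_frames[0]                    # I_0^LR initialized to I_1^LR
    sr_frames = []
    for lr in lr_frames:                      # t = 1 .. n
        f_t = sfe(prev_lr, lr, h, o)          # formula (5)
        f_deep = mdrb(f_t)                    # formulas (6)-(7)
        h = hgen(f_deep)                      # formula (8)
        o = rec(f_deep, lr)                   # formula (9): o_t = I_t^SR
        sr_frames.append(o)
        prev_lr = lr
    return sr_frames
```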
Seventh, the loss for the input video frame is calculated:
constructing a circular neural network model based on information construction and multiple density residual blocks through the first step to the sixth step, calculating a first frame with super-resolution, acquiring a subsequent super-resolution frame sequence through the circulation from the third step to the sixth step, and measuring the acquired final super-resolution frame sequence ISRWith the high-resolution frame sequence I obtained in the first stepHRThe difference between the two, the L1 loss function is adopted during training,
Figure FDA00034705992100000416
l in equation (10) is the value of the calculated L1 loss function;
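Formula (10) in PyTorch, assuming the frames are held as lists of tensors:

```python
import torch
import torch.nn.functional as F

def l1_video_loss(sr_frames, hr_frames):
    # L = (1/n) * sum_t || I_t^HR - I_t^SR ||_1, formula (10)
    return torch.stack(
        [F.l1_loss(sr, hr) for sr, hr in zip(sr_frames, hr_frames)]).mean()
```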
eighthly, the super-resolution image results are combined into a video:
according to the original video frame rate, the super-resolved frame sequence I^{SR} is synthesized into a video with the corresponding frame rate, completing the process of circular video super-resolution processing based on information construction and multi-density residual blocks.
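The eighth step can be sketched with OpenCV's VideoWriter; the mp4v codec is an assumption, as the claim fixes only the frame rate:

```python
import cv2

def frames_to_video(frames, path, fps):
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*'mp4v'),
                             fps, (w, h))
    for frame in frames:
        writer.write(frame)  # frames must be uint8 BGR images
    writer.release()
```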
5. The circular video super-resolution method based on information construction and multi-density residual block according to claim 4, wherein the sampling multiple r is 4 and the number of color channels c is 3; m is 5–7 and n is 8–12; the number p of dense residual blocks is 8–12.