CN111062867A - Video super-resolution reconstruction method - Google Patents
- Publication number
- CN111062867A CN111062867A CN201911150376.XA CN201911150376A CN111062867A CN 111062867 A CN111062867 A CN 111062867A CN 201911150376 A CN201911150376 A CN 201911150376A CN 111062867 A CN111062867 A CN 111062867A
- Authority
- CN
- China
- Prior art keywords
- resolution
- low
- reconstructed
- image
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T3/4046 — Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T7/90 — Image analysis; determination of colour characteristics
- G06T2207/10016 — Image acquisition modality: video; image sequence
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
Abstract
The application discloses a video super-resolution reconstruction method, comprising the following steps: obtaining a plurality of temporally consecutive low-resolution training image blocks and a high-resolution training image block corresponding to at least one of the low-resolution training image blocks; training a convolutional neural network model with the plurality of low-resolution training image blocks as input data and the high-resolution training image block as the truth label; and performing super-resolution reconstruction on a low-resolution video to be reconstructed with the trained convolutional neural network model. Because the temporal characteristics of video are fully considered during model training, the unnecessary computation overhead of frame-by-frame processing is effectively avoided in the subsequent super-resolution reconstruction.
Description
Technical Field
The application relates to the technical field of video processing, in particular to a video super-resolution reconstruction method.
Background
Super-resolution reconstruction is a technique that uses a computer to process a low-resolution image or video into a high-resolution one. It recovers more detail than traditional interpolation and can therefore greatly improve image or video quality.
In the related art, most super-resolution algorithms target a single image, i.e., they process frame by frame. A video super-resolution algorithm typically splits the video into individual frames, performs super-resolution reconstruction on each whole frame, and joins the per-frame results back into a video, which incurs unnecessary computation overhead.
In addition, for the higher-resolution videos found in surveillance scenes, performing whole-image super-resolution on a single video frame can overflow the video memory of an embedded device, making real-time processing impossible.
Disclosure of Invention
The application mainly provides a video super-resolution reconstruction method, aiming to solve the high computational cost of existing frame-by-frame processing.
To solve this technical problem, the application adopts the following technical scheme: a video super-resolution reconstruction method is provided, comprising the following steps: obtaining a plurality of temporally consecutive low-resolution training image blocks and a high-resolution training image block corresponding to at least one of the low-resolution training image blocks; training a convolutional neural network model with the plurality of low-resolution training image blocks as input data and the high-resolution training image block as the truth label; and performing super-resolution reconstruction on the low-resolution video to be reconstructed with the trained convolutional neural network model.
The beneficial effect of this application is: different from the situation of the prior art, the convolutional neural network model is trained by taking a plurality of low-resolution training image blocks which are continuous in time as input data, and then the time-related characteristics of videos are fully considered in the training process of the model, so that the unnecessary calculation overhead caused by frame-by-frame processing can be effectively avoided in the subsequent super-resolution reconstruction.
Drawings
For a clearer illustration of the embodiments of the present application or of the prior art, the drawings needed in their description are briefly introduced below. The drawings described here represent only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an embodiment of a super-resolution video reconstruction method provided by the present application;
FIG. 2 is a schematic flow chart of S10 in FIG. 1;
FIG. 3 is a schematic flow chart of S12 in FIG. 2;
FIG. 4 is a schematic flow chart of S20 in FIG. 1;
FIG. 5 is a schematic flow chart of S30 in FIG. 1;
FIG. 6 is a schematic flow chart of S31 in FIG. 5.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it should be understood that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" includes at least two, e.g., two, three, etc., unless specifically limited otherwise. Furthermore, the terms "including" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In this application, "high resolution" (as in the high-resolution training image blocks, the high-resolution training video, and the high-resolution output image blocks) simply denotes a resolution higher than that of the corresponding "low resolution" counterparts (the low-resolution training video, the low-resolution training image blocks, the low-resolution video to be reconstructed, the low-resolution images to be reconstructed, and the low-resolution image blocks to be reconstructed).
Referring to fig. 1, a flow chart of an embodiment of a video super-resolution reconstruction method provided by the present application is schematically illustrated.
S10: obtaining a plurality of temporally consecutive low-resolution training image blocks and a high-resolution training image block corresponding to at least one of the low-resolution training image blocks.
Specifically, referring to fig. 2, in one embodiment, S10 may be performed as follows.
S11: converting the low resolution training video into a plurality of low resolution training images that are continuous in time; converting a high resolution training video corresponding to the low resolution training video into a plurality of high resolution training images that are consecutive in time. The low resolution training video and the high resolution training video may be obtained from an open video training set, or may be obtained by capturing the same scene using a low resolution camera and a high resolution camera that are pre-calibrated.
The operation of converting a video into individual pictures is well established; commonly used tools include ffmpeg and mencoder. All training videos, and all videos to be reconstructed with the method of this scheme, can therefore be converted into temporally continuous frame images using such tools.
S12: performing tile partitioning on the low resolution training image and the high resolution training image, respectively.
To enable real-time super-resolution reconstruction of video and to prevent video-memory overflow on embedded devices from degrading the result when whole frames are reconstructed, the present application divides the high-resolution and low-resolution training images into a plurality of smaller, temporally continuous image blocks during the network training stage.
Specifically, referring to fig. 3, step S12 may be performed as follows.
S121: and respectively converting the low-resolution training image and the high-resolution training image into YUV images.
YUV is a color space adopted by European television systems, with variants including YUV444, YUV422 and YUV420P; the YUV format adopted in the present application is not particularly limited. Y denotes the luminance of the color and U and V denote the chrominance. YUV is easy to compress, convenient to transmit and process, and can reduce or eliminate color-conversion processing, thereby greatly accelerating image display. The Y, U and V values can be computed from the R, G, B values of the corresponding pixels in RGB format, for example with the BT.601-style conversion: Y = 0.299R + 0.587G + 0.114B, U = -0.147R - 0.289G + 0.436B, V = 0.615R - 0.515G - 0.100B.
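As a sketch, the conversion can be applied per pixel with numpy. The exact coefficients vary by YUV variant; the BT.601-style weights below are one common choice, assumed here for illustration.

```python
import numpy as np

def rgb_to_yuv(rgb):
    """Convert an H x W x 3 RGB array to YUV using BT.601-style weights
    (one assumed instance of the conversion formula described above)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b    # luminance
    u = -0.147 * r - 0.289 * g + 0.436 * b   # chrominance (blue difference)
    v = 0.615 * r - 0.515 * g - 0.100 * b    # chrominance (red difference)
    return np.stack([y, u, v], axis=-1)

# A gray pixel (R = G = B) keeps its gray level in Y and has zero chroma.
yuv = rgb_to_yuv(np.full((1, 1, 3), 100.0))
```

Note that the U and V weights each sum to zero, which is what makes the chroma of a gray pixel vanish.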
S122: obtaining a Y-channel image from the YUV image.
Specifically, the split and merge functions in opencv are a pair of inverse operations: the channels of the YUV image can be separated with the split function, the U and V channels set to zero, and the Y channel merged back with the zeroed U and V channels using the merge function, yielding a Y-channel picture.
S123: and carrying out block division on the Y-channel image.
Alternatively, a number of pixels along the length of the low-resolution training image may be taken as the tile length x, and a number of pixels along its width as the tile width y, giving a training tile of size x × y. Here x and y may be equal or unequal, and the number of tiles cut from each training image may be chosen as appropriate; neither is limited.
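A minimal sketch of this tiling step, assuming for simplicity that the image dimensions are exact multiples of the tile size (the overlapping case for images that do not divide evenly is handled later, in S313):

```python
import numpy as np

def split_into_tiles(image, x, y):
    """Split a 2-D image (e.g. the Y channel) into tiles of width x and
    height y. Assumes the image dimensions are exact multiples of the
    tile size; the function name is illustrative."""
    h, w = image.shape
    return [image[i:i + y, j:j + x]
            for i in range(0, h, y)
            for j in range(0, w, x)]

img = np.arange(64, dtype=np.float32).reshape(8, 8)
tiles = split_into_tiles(img, 4, 4)  # four 4x4 tiles in row-major order
```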
S13: the method comprises the steps of taking a plurality of low-resolution training images, respectively obtaining a plurality of low-resolution training image blocks at a first position of each image, and obtaining a high-resolution training image block from a first position of at least one high-resolution training image.
The same method as in step S123 is used to segment a plurality of low-resolution training images that are continuous in the time dimension. The number of the obtained plurality of temporally continuous low resolution training pattern blocks is an odd number not less than 3, the high resolution training pattern block corresponds to a low resolution training pattern block arranged in the middle of the plurality of low resolution training pattern blocks, and the low resolution training pattern block arranged in the middle of the plurality of low resolution training pattern blocks is subjected to hyper-segmentation reconstruction by using the plurality of low resolution training pattern blocks as a whole input network. For example, in one embodiment, a series of 5 low resolution training patches are obtained from a first position in five temporally successive low resolution training images as a set of low resolution training patches, and a high resolution training patch is obtained from a first position in a high resolution training image corresponding to a third low resolution training image.
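The grouping above can be sketched as follows. The function name and the equal LR/HR tile sizes are illustrative simplifications; in practice the HR crop would be larger by the magnification factor.

```python
import numpy as np

def make_training_sample(lr_frames, hr_frames, top, left, tile, window=5):
    """Build one training sample: a stack of `window` temporally
    consecutive LR tiles cut at the same position, plus the HR tile of
    the middle frame as the truth label (illustrative sketch)."""
    assert window % 2 == 1 and window >= 3  # odd count, at least 3
    lr_stack = np.stack([f[top:top + tile, left:left + tile]
                         for f in lr_frames[:window]])
    mid = window // 2                        # middle frame index
    hr_label = hr_frames[mid][top:top + tile, left:left + tile]
    return lr_stack, hr_label

lr_frames = [np.full((8, 8), float(i)) for i in range(5)]
hr_frames = [np.full((8, 8), float(10 + i)) for i in range(5)]
lr_stack, hr_label = make_training_sample(lr_frames, hr_frames, 0, 0, 4)
```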
S20: and training the convolutional neural network model by taking the plurality of low-resolution training pattern blocks as input data and taking the corresponding high-resolution training pattern blocks as truth labels.
The convolutional neural network model in the present application comprises:
(1) an input layer: a plurality of low-resolution image blocks are input to the subsequent layers as a whole along the time dimension;
(2) convolution layers: used for feature extraction, dimension reduction, nonlinear mapping, dimension raising and similar processing of the low-resolution image blocks. Besides ordinary convolution layers, at least one deformable convolution layer is added; it shifts and adjusts the spatial sampling positions within the low-resolution training image blocks and aligns the features extracted from the same object across the blocks, so that the network adapts better to geometric deformation of objects;
(3) excitation layers: a ReLU activation function follows each convolution layer, introducing nonlinearity;
(4) a deconvolution layer: upsamples the dimension-reduced feature image block to obtain an enlarged feature image block;
(5) an output layer: outputs the high-resolution image block.
Referring to fig. 4, step S20 may be performed as follows.
S21: performing convolution operations on the plurality of low-resolution training image blocks as a whole, so as to extract features from them and perform dimension reduction.
Specifically, a deformable convolution operation is performed on the input low-resolution training image blocks as a whole. A deformable convolution layer adds an offset variable Δs_n to the position of each sampling point s_n in the convolution kernel. With these offsets, the kernel can sample freely around the current position rather than being restricted to the regular lattice, so the size and shape of the receptive field are no longer fixed; the sampling position changes from s_n to s_n + Δs_n. Because jitter may exist between the frames of a video, the deformable convolution shifts and adjusts the spatial sampling positions within the low-resolution training image blocks and aligns the features extracted from the same object across the blocks, so that the network adapts better to geometric deformation. In the deformable convolution layer of this embodiment, the value extracted by the convolution kernel is:

y(s_0) = Σ_n w(s_n) · x(s_0 + s_n + Δs_n)

where y(s_0) is the output value at position s_0 of the training image blocks, w(s_n) is the convolution weight at the corresponding position, s_n is the position of a sampling point in the kernel, x(s_0 + s_n + Δs_n) is the pixel value at the corresponding location of the training image blocks, and Δs_n is the offset variable. Optionally, the offset Δs_n may be rounded so that each sampling point falls on an actual pixel of the target image, whose value is then read directly; this avoids the case where a high-precision fractional Δs_n places the sampling point between pixels. Alternatively, Δs_n may be left fractional and the pixel value at the sampling point obtained by an interpolation algorithm such as bilinear interpolation; the application is not limited in this respect. Ordinary convolution layers then perform dimension reduction, nonlinear mapping, dimension raising and similar processing.
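The sampling rule can be illustrated with a toy numpy routine. The dictionary-based kernel representation is purely illustrative, and offsets are rounded to whole pixels, which is one of the two options described above.

```python
import numpy as np

def deformable_sample(x, s0, weights, offsets):
    """Evaluate y(s0) = sum_n w(s_n) * x(s0 + s_n + ds_n) for a small
    kernel, rounding each offset ds_n to whole pixels (bilinear
    interpolation would handle fractional offsets instead)."""
    r0, c0 = s0
    y = 0.0
    for (dr, dc), w in weights.items():           # s_n over the kernel grid
        odr, odc = offsets.get((dr, dc), (0, 0))  # ds_n, default: no shift
        y += w * x[r0 + dr + round(odr), c0 + dc + round(odc)]
    return y

x = np.arange(25, dtype=np.float32).reshape(5, 5)
grid = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
mean_kernel = {s: 1.0 / 9.0 for s in grid}
# With zero offsets this reduces to an ordinary 3x3 mean filter.
center_mean = deformable_sample(x, (2, 2), mean_kernel, {})
```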
S22: performing a deconvolution operation on the dimension-reduced feature image block of the current frame using a deconvolution layer to obtain a high-resolution output image block.
Deconvolution, also known as transposed convolution, is generally used to enlarge the feature map. In this application the deconvolution layer is placed after the ordinary convolution layers, so that the dimension-reduced feature image blocks are upsampled and enlarged.
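A 1-D toy version of this upsampling: zero-insertion followed by an ordinary convolution, which is one standard way to realize a transposed convolution (the stride and kernel below are arbitrary illustrative choices).

```python
import numpy as np

def transposed_conv_1d(x, k, stride=2):
    """1-D transposed convolution sketch: insert (stride - 1) zeros
    between input samples, then convolve, enlarging the signal."""
    up = np.zeros(len(x) * stride)
    up[::stride] = x                        # zero-insertion upsampling
    return np.convolve(up, k, mode='same')  # ordinary convolution

# A triangular kernel turns zero-insertion into linear interpolation.
out = transposed_conv_1d(np.array([1.0, 2.0]), np.array([0.5, 1.0, 0.5]))
```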
S23: a penalty cost is calculated between the high resolution output tile and the high resolution training tile using a penalty function.
Specifically, the mean square error (MSE) may be used as the loss function:

J(θ) = (1 / 2m) · Σ_{i=1..m} ( h_θ(x_i) - y_i )²

where J(θ) is the loss function, m is the number of low-resolution image-block groups in the training set, h_θ(x_i) is the output of the convolutional neural network, x_i is the i-th low-resolution training image-block group, and y_i is the corresponding high-resolution training image block used as the truth label.
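A direct transcription of this loss, assuming the common 1/(2m) scaling:

```python
import numpy as np

def mse_loss(outputs, labels):
    """J(theta) = (1/2m) * sum_i ||h_theta(x_i) - y_i||^2, the mean
    squared error between network outputs and truth-label tiles."""
    m = len(outputs)
    return sum(np.sum((o - y) ** 2) for o, y in zip(outputs, labels)) / (2 * m)

loss = mse_loss([np.array([1.0, 2.0])], [np.array([0.0, 0.0])])
```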
S24: and updating the model parameters of the convolutional neural network model by using a gradient back propagation algorithm according to the loss cost so that the loss cost tends to be minimized.
Specifically, training the parameters by gradient back-propagation requires a gradient descent algorithm. The one used in the present invention is stochastic gradient descent; of course, other gradient descent algorithms may be used as appropriate, and no specific limitation is made here.
In the gradient descent algorithm, the loss function is the MSE above:

J(θ) = (1 / 2m) · Σ_{i=1..m} ( h_θ(x_i) - y_i )²

and its gradient is:

∂J(θ)/∂θ = (1 / m) · Σ_{i=1..m} ( h_θ(x_i) - y_i ) · ∂h_θ(x_i)/∂θ

After the gradient of the loss function is obtained, the parameters θ of the convolution layers can be updated by:

θ ← θ - α · ∂J(θ)/∂θ

where α is the learning rate, J(θ) is the loss function, m is the number of low-resolution image-block groups in the training set, h_θ(x_i) is the network output, x_i is the i-th low-resolution training image-block group, and y_i is the corresponding high-resolution truth-label image block. The parameters to be learned in the convolutional neural network are the offset variables Δs_n, the weight parameters w, and the bias parameters b. In particular, the offset parameters Δs_n of the deformable convolution layer are updated by gradient back-propagation according to the loss, so that the features the layer samples from each low-resolution training image block tend to become spatially aligned.
S30: and performing super-resolution reconstruction on the low-resolution video to be reconstructed by using the trained convolutional neural network model.
Specifically, referring to fig. 5, step S30 may be performed as follows.
S31: and acquiring a plurality of low-resolution blocks to be reconstructed continuously in time from the low-resolution video to be reconstructed. Referring to fig. 6, S31 may be performed as follows.
S311: and acquiring a plurality of temporally continuous low-resolution images to be reconstructed from the low-resolution video to be reconstructed.
The operation of converting the low-resolution video to be reconstructed into temporally continuous frame pictures has already been described in step S11 and is not repeated here.
S312: and judging whether the size of the low-resolution image to be reconstructed is larger than the input size of the convolutional neural network model or not. Wherein the input size is the size of the low resolution training tile.
S313: and if the size of the low-resolution image to be reconstructed is larger than the input size of the convolutional neural network model, segmenting the low-resolution image to be reconstructed to obtain temporally continuous low-resolution image blocks to be reconstructed.
Because the scheme runs on embedded devices whose hardware chips do not support variable input sizes, the small image blocks cut from the original image must all be the same size. During segmentation, the last block in each direction is inevitably smaller than the network input size; padding it with black or white borders would distort the super-resolution result and cause edge ringing. To solve this problem, the scheme cuts the original image into overlapping blocks and adjusts the overlap so that every block exactly matches the network input size. The low-resolution image to be reconstructed can be segmented according to the following constraint:
the method comprises the steps that s is a network input size in a preset direction, m is a size of an overlapped part of a low-resolution block to be reconstructed in the preset direction, n is a size of a non-overlapped part of the low-resolution block to be reconstructed with the overlapped part on two sides in the preset direction, w is a size of an image to be reconstructed with a low resolution in the preset direction, p is the number of blocks of the low-resolution block to be reconstructed, which are cut from the image to be reconstructed with the low resolution in the preset direction, and is an integer not smaller than an upper integer value of w/s, w and p are known quantities, and m and n are quantities to be solved.
S314: and if the size of the low-resolution image to be reconstructed is smaller than or equal to the network input size, performing mirror image edge-filling on the low-resolution image to be reconstructed so as to enable the size of the low-resolution image to be reconstructed after edge-filling to be equal to the network input size.
In this scheme, mirror edge-filling of the low-resolution image to be reconstructed may be implemented with the padarray function in MATLAB, specifically by passing 'symmetric' as the padding method, which expands the image by mirror reflection across its boundary.
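The same mirror padding is available in numpy via np.pad with mode='symmetric', which, like padarray's 'symmetric' option, reflects pixels across the border; the helper name and padding-on-two-sides convention are illustrative.

```python
import numpy as np

def mirror_pad_to(img, target_h, target_w):
    """Symmetric (mirror) edge-filling: reflect pixels across the
    bottom and right borders until the tile reaches the network
    input size (assumes img is no larger than the target)."""
    pad_h = target_h - img.shape[0]
    pad_w = target_w - img.shape[1]
    return np.pad(img, ((0, pad_h), (0, pad_w)), mode='symmetric')

small = np.array([[1, 2], [3, 4]])
padded = mirror_pad_to(small, 3, 3)  # last row/column mirrored outward
```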
S32: and inputting the plurality of low-resolution blocks to be reconstructed as a whole into the trained convolutional neural network to generate high-resolution reconstructed blocks.
The image block to be reconstructed of the current frame is stacked with the corresponding image blocks of several preceding and following frames, yielding a stack of blocks along the time dimension.
This stacked input is fed into the trained convolutional neural network, where convolution, dimension reduction and related operations perform multi-stage feature extraction and transformation along the time dimension, fusing the temporal information into a feature image block for the current frame.
Once the feature image block fused with the current-frame information is obtained, it is upsampled by deconvolution to produce the final high-resolution reconstructed image block.
S33: a high resolution reconstructed video is generated using the high resolution reconstructed tiles.
If the size of the low-resolution image to be reconstructed is larger than the input size of the convolutional neural network model, then once the super-resolution reconstruction of all image blocks has finished, the high-resolution reconstructed blocks belonging to the same frame are stitched into a high-resolution reconstructed image according to their splicing coordinates. A splicing coordinate is the corresponding segmentation coordinate multiplied by the super-resolution scale u, where the segmentation coordinate is the length of the part of each block to be reconstructed that remains after its overlapped portion is removed, and u is the magnification factor of the original image. For example, if an input low-resolution block of 10 × 10 becomes 100 × 100 after super-resolution reconstruction, then u = 100/10 = 10.
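A 1-D sketch of the stitching step: the first tile is kept whole and each later tile contributes only its non-overlapped tail. In 2-D the splice coordinates would additionally be multiplied by the scale u; here the tiles are left at unit scale for clarity.

```python
import numpy as np

def stitch_1d(tiles, m):
    """Stitch overlapping tiles back together along one axis: keep the
    first tile whole, then drop the m-pixel overlap from the front of
    every later tile (illustrative 1-D sketch of the splicing step)."""
    parts = [tiles[0]] + [t[m:] for t in tiles[1:]]
    return np.concatenate(parts)

row = np.arange(10)
tiles = [row[0:6], row[4:10]]       # tile size s = 6, overlap m = 2
restored = stitch_1d(tiles, 2)      # recovers the original row
```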
If the size of the low-resolution image to be reconstructed is smaller than or equal to the input size of the convolutional neural network model, then once the super-resolution reconstruction of all image blocks has finished, the edge area of each high-resolution reconstructed block that corresponds to the mirror-padded region of the low-resolution image is cut off.
In this scheme, when the low-resolution image is smaller than or equal to the model input size, mirror edge-filling is performed with the MATLAB padarray function; accordingly, once the super-resolution reconstruction of all image blocks has finished, the mirror-padded portion is automatically trimmed so that the image returns to its original proportions.
Compared with the prior art, the technical scheme of the scheme has the advantages that a plurality of low-resolution training image blocks which are continuous in time are used as input data to train the convolutional neural network model, so that the time-related characteristics of videos are fully considered in the training process of the model, and unnecessary calculation overhead caused by frame-by-frame processing can be effectively avoided in subsequent super-resolution reconstruction.
Furthermore, a deformable convolution layer is added to the convolutional neural network to shift-adjust the positions of the same features across temporally consecutive low-resolution image blocks, so that adjacent frames are aligned inside the network. This enables end-to-end video super-resolution training and avoids a separate motion-compensation preprocessing step on the low-resolution image blocks before input.
In addition, because the input size of the convolutional network on the embedded device is fixed, the image to be reconstructed is segmented or edge-padded so that every block matches the network input size. This avoids out-of-memory failures on oversized images, greatly reduces processing time, and enables real-time super-resolution reconstruction of high-resolution video.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.
Claims (11)
1. A method for reconstructing super-resolution video, the method comprising:
obtaining a plurality of temporally consecutive low-resolution training tiles and a high-resolution training tile corresponding to at least one of the low-resolution training tiles;
training a convolutional neural network model with the plurality of low resolution training tiles as input data and the high resolution training tiles as truth labels;
and performing super-resolution reconstruction on the low-resolution video to be reconstructed by using the trained convolutional neural network model.
2. The method of claim 1, wherein the step of obtaining a plurality of temporally successive low resolution training tiles and a high resolution training tile corresponding to at least one of the low resolution training tiles comprises:
converting the low resolution training video into a plurality of low resolution training images that are continuous in time; converting a high resolution training video corresponding to the low resolution training video into a plurality of high resolution training images that are continuous in time;
performing tile division on the low-resolution training image and the high-resolution training image respectively;
the low resolution training patches are obtained from first locations of the plurality of low resolution training images, respectively, and the high resolution training patches are obtained from first locations of at least one of the high resolution training images.
3. The method of claim 2, wherein the step of tiling on the low-resolution training images and the high-resolution training images, respectively, comprises:
respectively converting the low-resolution training image and the high-resolution training image into YUV images;
and carrying out tile block division on the Y-channel image of the YUV image.
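A minimal sketch of claims 2-3's preparation step, assuming BT.601 luma weights for the RGB-to-Y conversion and non-overlapping square tiles (both choices are illustrative; the claims do not fix them):

```python
import numpy as np

def rgb_to_y(rgb):
    """Extract the luma (Y) channel using BT.601 weights; training tiles
    are then cut from this channel only, as in claim 3."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def cut_tiles(img, tile):
    """Divide a 2-D image into non-overlapping tile x tile blocks
    (edges that do not fill a full tile are dropped in this sketch)."""
    h, w = img.shape
    return [img[i:i + tile, j:j + tile]
            for i in range(0, h - tile + 1, tile)
            for j in range(0, w - tile + 1, tile)]

frame = np.random.rand(64, 96, 3)            # one RGB training frame
tiles = cut_tiles(rgb_to_y(frame), 32)
assert len(tiles) == 6                       # (64/32) * (96/32) tiles
assert all(t.shape == (32, 32) for t in tiles)
```

Taking tiles from the same spatial position ("first location") across consecutive frames then yields one training sample.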
4. The method of claim 1, wherein the number of the plurality of low resolution training tiles is an odd number not less than 3, and the high resolution training tile corresponds to the low resolution training tile located at the center of the plurality of low resolution training tiles, so that the centrally located low resolution training tile is super-resolution reconstructed by inputting the plurality of low resolution training tiles into the network as a whole.
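Claim 4's odd-sized group of consecutive tiles centered on the frame to be reconstructed might be selected as follows; the boundary handling (repeating the edge frame) is an assumption, not something the claim specifies:

```python
def temporal_window(frames, center, radius=1):
    """Return 2*radius+1 consecutive frames centred on `center`,
    clamping at sequence boundaries by repeating the edge frame."""
    n = len(frames)
    idx = [min(max(center + d, 0), n - 1) for d in range(-radius, radius + 1)]
    return [frames[i] for i in idx]

frames = ["f0", "f1", "f2", "f3"]
assert temporal_window(frames, 2) == ["f1", "f2", "f3"]
assert temporal_window(frames, 0) == ["f0", "f0", "f1"]   # edge clamped
```

With `radius=1` the window size is 3, the smallest odd number the claim permits; larger odd windows follow by raising `radius`.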
5. The method of claim 1, wherein the step of training a convolutional neural network model with the plurality of low resolution training tiles as input data and the high resolution training tiles as truth labels comprises:
performing convolution operation on the plurality of low-resolution training image blocks as a whole to extract features from the plurality of low-resolution training image blocks and perform dimensionality reduction processing;
carrying out deconvolution operation on the dimensionality reduced features by using a deconvolution layer so as to obtain a high-resolution output image block;
calculating a loss cost between the high resolution output tile and the high resolution training tile using a loss function;
and updating the model parameters of the convolutional neural network model by using a gradient back propagation algorithm according to the loss cost so that the loss cost tends to be minimized.
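The convolution, deconvolution, loss, and back-propagation steps of claim 5 can be sketched as one PyTorch training step; the network below is a deliberately tiny stand-in, and all channel counts, the L1 loss choice, and the x2 scale factor are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinySRNet(nn.Module):
    """Minimal stand-in for the claimed model: fuse T stacked
    low-resolution frames, then upscale with a transposed convolution."""
    def __init__(self, t_frames=3, scale=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(t_frames, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),  # dimensionality reduction
        )
        # deconvolution layer producing the high-resolution output tile
        self.upscale = nn.ConvTranspose2d(16, 1, scale * 2, stride=scale,
                                          padding=scale // 2)

    def forward(self, x):
        return self.upscale(self.features(x))

model = TinySRNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
lr_stack = torch.rand(8, 3, 32, 32)   # batch of 3 consecutive LR tiles
hr_label = torch.rand(8, 1, 64, 64)   # HR truth label for the centre frame

sr = model(lr_stack)                  # forward: convolution + deconvolution
loss = nn.functional.l1_loss(sr, hr_label)  # loss cost vs. the truth label
opt.zero_grad()
loss.backward()                       # gradient back-propagation
opt.step()                            # update toward minimising the cost
assert sr.shape == hr_label.shape
```

With kernel size `2*scale`, stride `scale`, and padding `scale // 2`, the transposed convolution maps a 32x32 tile to exactly 64x64.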
6. The method of claim 5, wherein the step of separately convolving the plurality of low resolution training tiles comprises:
performing convolution operation on the plurality of low-resolution training image blocks by using a deformable convolution layer respectively;
the step of updating the model parameters of the convolutional neural network model using a back propagation algorithm according to the loss cost comprises:
updating the offset parameters of the deformable convolutional layers by using a back propagation algorithm according to the loss cost, so that the features extracted by the deformable convolutional layers from the same target in each low-resolution training image block tend to be aligned in space.
7. The method according to claim 1, wherein the step of performing super-resolution reconstruction on the low-resolution video to be reconstructed by using the trained convolutional neural network model comprises:
obtaining a plurality of low-resolution blocks to be reconstructed continuously in time from the low-resolution video to be reconstructed;
inputting the plurality of low-resolution blocks to be reconstructed as a whole into the trained convolutional neural network to generate high-resolution reconstructed blocks;
generating a high resolution reconstructed video using the high resolution reconstructed tiles.
8. The method according to claim 7, wherein the step of obtaining a plurality of temporally successive low resolution blocks to be reconstructed from the low resolution video to be reconstructed comprises:
acquiring a plurality of temporally continuous low-resolution images to be reconstructed from the low-resolution video to be reconstructed;
comparing the size of the low-resolution image to be reconstructed with the input size of the convolutional neural network model;
preprocessing the low-resolution image to be reconstructed according to the comparison result so as to obtain a low-resolution image block to be reconstructed;
wherein the input size is a size of the low resolution training tile.
9. The method according to claim 8, wherein the step of preprocessing the low resolution image to be reconstructed according to the comparison result comprises:
if the size of the low-resolution image to be reconstructed is larger than the input size, segmenting the low-resolution image to be reconstructed by adopting the following formula so that the size of each segmented low-resolution block to be reconstructed is equal to the input size,
wherein s is the input size in a preset direction, m is the size of the overlapping part of the low-resolution block to be reconstructed in the preset direction, n is the size of the non-overlapping part of the low-resolution block to be reconstructed with the overlapping part on both sides in the preset direction, w is the size of the low-resolution image to be reconstructed in the preset direction, p is the number of blocks of the low-resolution block to be reconstructed cut out of the low-resolution image to be reconstructed in the preset direction and is an integer not less than an upper integer of w/s, w and p are known quantities, and m and n are quantities to be solved;
and if the size of the low-resolution image to be reconstructed is smaller than the input size, performing mirror image edge repairing on the low-resolution image to be reconstructed so that the size of the low-resolution image to be reconstructed after edge repairing is equal to the input size.
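The formula referred to in claim 9 is not reproduced in this text, so the sketch below rests on a stated assumption: that the p blocks of size s cover the width w with a uniform overlap m between neighbours, i.e. w = p·s − (p−1)·m, and that the non-overlapping core is n = s − 2m. Under that assumption, m and n follow directly from the known w, s, and p:

```python
import math

def tile_overlap(w, s, p=None):
    """Solve for the overlap m and non-overlapping core n when p tiles of
    size s cover a length-w image; assumes w = p*s - (p-1)*m and
    n = s - 2*m (the patent's exact formula is not reproduced here)."""
    if p is None:
        p = math.ceil(w / s)            # fewest tiles that can cover w
    if p == 1:
        return 1, 0, s                  # single tile, no overlap needed
    m = (p * s - w) / (p - 1)           # uniform overlap between neighbours
    n = s - 2 * m                       # core not shared with either neighbour
    return p, m, n

p, m, n = tile_overlap(w=100, s=64)
assert (p, m, n) == (2, 28, 8)
```

The constraint that p be at least the ceiling of w/s, stated in the claim, guarantees the computed overlap m is non-negative.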
10. The method of claim 7, wherein generating a high resolution reconstructed video using the high resolution reconstructed tiles comprises:
stitching a plurality of the high resolution reconstruction tiles corresponding to the same frame into a high resolution reconstructed image.
11. The method of claim 7,
the step of generating a high resolution reconstructed video using the high resolution reconstructed tiles comprises:
and cutting off an edge region of the high-resolution reconstruction image block, which corresponds to the mirror image edge-filling region of the low-resolution image to be reconstructed.
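Claims 10 and 11 together describe stitching the high-resolution tiles of one frame back into an image and cutting away the border that came from mirror padding; a sketch assuming equally sized, non-overlapping tiles laid out in row-major order:

```python
import numpy as np

def stitch_tiles(tiles, rows, cols):
    """Re-assemble equally sized HR tiles of one frame, row-major order."""
    return np.block([[tiles[r * cols + c] for c in range(cols)]
                     for r in range(rows)])

def trim_padding(frame, true_h, true_w):
    """Cut off the region corresponding to the mirror-padded border."""
    return frame[:true_h, :true_w]

tiles = [np.full((2, 2), i) for i in range(4)]   # four 2x2 HR tiles
frame = stitch_tiles(tiles, 2, 2)                # 4x4 reconstructed frame
assert frame.shape == (4, 4)
assert frame[0, 0] == 0 and frame[3, 3] == 3
trimmed = trim_padding(frame, 3, 3)              # padded border removed
assert trimmed.shape == (3, 3)
```

Tiles produced with the overlapping segmentation of claim 9 would additionally need their shared borders blended or discarded before stitching; this sketch covers only the non-overlapping case.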
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911150376.XA CN111062867A (en) | 2019-11-21 | 2019-11-21 | Video super-resolution reconstruction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111062867A true CN111062867A (en) | 2020-04-24 |
Family
ID=70298083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911150376.XA Pending CN111062867A (en) | 2019-11-21 | 2019-11-21 | Video super-resolution reconstruction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111062867A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108734660A (en) * | 2018-05-25 | 2018-11-02 | 上海通途半导体科技有限公司 | A kind of image super-resolution rebuilding method and device based on deep learning |
CN108734659A (en) * | 2018-05-17 | 2018-11-02 | 华中科技大学 | A kind of sub-pix convolved image super resolution ratio reconstruction method based on multiple dimensioned label |
CN108765291A (en) * | 2018-05-29 | 2018-11-06 | 天津大学 | Super resolution ratio reconstruction method based on dense neural network and two-parameter loss function |
CN109325915A (en) * | 2018-09-11 | 2019-02-12 | 合肥工业大学 | A kind of super resolution ratio reconstruction method for low resolution monitor video |
CN109862370A (en) * | 2017-11-30 | 2019-06-07 | 北京大学 | Video super-resolution processing method and processing device |
CN110070511A (en) * | 2019-04-30 | 2019-07-30 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic equipment and storage medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111667410A (en) * | 2020-06-10 | 2020-09-15 | 腾讯科技(深圳)有限公司 | Image resolution improving method and device and electronic equipment |
CN113034380A (en) * | 2021-02-09 | 2021-06-25 | 浙江大学 | Video space-time super-resolution method and device based on improved deformable convolution correction |
CN113034380B (en) * | 2021-02-09 | 2022-06-10 | 浙江大学 | Video space-time super-resolution method and device based on improved deformable convolution correction |
CN113077385A (en) * | 2021-03-30 | 2021-07-06 | 上海大学 | Video super-resolution method and system based on countermeasure generation network and edge enhancement |
CN113344792A (en) * | 2021-08-02 | 2021-09-03 | 浙江大华技术股份有限公司 | Image generation method and device and electronic equipment |
WO2023010701A1 (en) * | 2021-08-02 | 2023-02-09 | Zhejiang Dahua Technology Co., Ltd. | Image generation method, apparatus, and electronic device |
CN116385391A (en) * | 2023-04-03 | 2023-07-04 | 深圳智现未来工业软件有限公司 | Multi-target detection method for wafer map defects |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111062867A (en) | Video super-resolution reconstruction method | |
US10970600B2 (en) | Method and apparatus for training neural network model used for image processing, and storage medium | |
US10542249B2 (en) | Stereoscopic video generation method based on 3D convolution neural network | |
EP2209090B1 (en) | Image restoring apparatus and method thereof | |
US6959117B2 (en) | Method and apparatus for deblurring and re-blurring image segments | |
US7260274B2 (en) | Techniques and systems for developing high-resolution imagery | |
US8494302B2 (en) | Importance filtering for image retargeting | |
US8520009B1 (en) | Method and apparatus for filtering video data using a programmable graphics processor | |
CN109462747B (en) | DIBR system cavity filling method based on generation countermeasure network | |
CA2702165C (en) | Image generation method and apparatus, program therefor, and storage medium which stores the program | |
CN112543317B (en) | Method for converting high-resolution monocular 2D video into binocular 3D video | |
WO2019071990A1 (en) | Image processing method and apparatus | |
JP2011095861A (en) | Image processing apparatus, control method and program | |
CN109643462B (en) | Real-time image processing method based on rendering engine and display device | |
JP2019029938A (en) | Error calculator and its program | |
WO2022033230A1 (en) | Image display method, electronic device and computer-readable storage medium | |
CN111582268B (en) | License plate image processing method and device and computer storage medium | |
RU2310911C1 (en) | Method for interpolation of images | |
JP2009147970A (en) | Reduction method of color blurring artifact in digital image | |
US12003859B2 (en) | Brightness adjustment method, and apparatus thereof | |
JP6902425B2 (en) | Color information magnifiers and color information estimators, and their programs | |
CN109754370B (en) | Image denoising method and device | |
KR101715349B1 (en) | Image correction method using bilateral interpolation | |
WO2023102189A2 (en) | Iterative graph-based image enhancement using object separation | |
CN117135466A (en) | Full-scene self-adaptive brightness correction fusion method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200424 |