CN111062867A - Video super-resolution reconstruction method - Google Patents

Video super-resolution reconstruction method

Info

Publication number
CN111062867A
CN111062867A
Authority
CN
China
Prior art keywords
resolution
low
reconstructed
image
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911150376.XA
Other languages
Chinese (zh)
Inventor
任馨怡
王枫
熊剑平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN201911150376.XA
Publication of CN111062867A
Legal status: Pending (Current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a video super-resolution reconstruction method, which comprises the following steps: obtaining a plurality of temporally continuous low-resolution training image blocks and a high-resolution training image block corresponding to at least one of the low-resolution training image blocks; training a convolutional neural network model with the plurality of low-resolution training image blocks as input data and the high-resolution training image blocks as truth labels; and performing super-resolution reconstruction on a low-resolution video to be reconstructed using the trained convolutional neural network model. Because the temporal characteristics of the video are fully considered during model training, the unnecessary computation overhead caused by frame-by-frame processing can be effectively avoided in the subsequent super-resolution reconstruction.

Description

Video super-resolution reconstruction method
Technical Field
The application relates to the technical field of video processing, in particular to a video super-resolution reconstruction method.
Background
Super-resolution reconstruction refers to processing a low-resolution image or video with a computer to obtain a high-resolution image or video. It recovers more detail than traditional interpolation methods and can therefore greatly improve image or video quality.
In the related art, most super-resolution algorithms are designed for single images, that is, they process video frame by frame. A typical video super-resolution reconstruction pipeline extracts the video into individual frame pictures, performs super-resolution reconstruction on each whole frame, and then joins the per-frame results back into a video, which incurs unnecessary computation overhead.
In addition, for the relatively high-resolution videos found in surveillance scenes, performing whole-image super-resolution on a full video frame can overflow the video memory of an embedded device, so real-time processing cannot be achieved.
Disclosure of Invention
The application mainly provides a video super-resolution reconstruction method, aiming to solve the problem that existing single-frame processing incurs high computation cost.
In order to solve this technical problem, the application adopts the following technical solution: a video super-resolution reconstruction method is provided, comprising: obtaining a plurality of temporally continuous low-resolution training image blocks and a high-resolution training image block corresponding to at least one of the low-resolution training image blocks; training a convolutional neural network model with the plurality of low-resolution training image blocks as input data and the high-resolution training image blocks as truth labels; and performing super-resolution reconstruction on a low-resolution video to be reconstructed using the trained convolutional neural network model.
The beneficial effect of the application is as follows: unlike the prior art, the convolutional neural network model is trained with a plurality of temporally continuous low-resolution training image blocks as input data, so the temporal characteristics of the video are fully considered during model training, and the unnecessary computation overhead caused by frame-by-frame processing can be effectively avoided in the subsequent super-resolution reconstruction.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an embodiment of a super-resolution video reconstruction method provided by the present application;
FIG. 2 is a schematic flow chart of S10 in FIG. 1;
FIG. 3 is a schematic flow chart of S12 in FIG. 2;
FIG. 4 is a schematic flow chart of S20 in FIG. 1;
FIG. 5 is a schematic flow chart of S30 in FIG. 1;
FIG. 6 is a schematic flow chart of S31 in FIG. 5;
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it should be understood that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" includes at least two, e.g., two, three, etc., unless specifically limited otherwise. Furthermore, the terms "including" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In this application, the resolution of the "high-resolution" items (the high-resolution training image blocks, the high-resolution training video, the high-resolution training images, and the high-resolution output image blocks) is higher than the resolution of the corresponding "low-resolution" items (the low-resolution training video, the low-resolution training image blocks, the low-resolution video to be reconstructed, the low-resolution images to be reconstructed, and the low-resolution image blocks to be reconstructed).
Referring to fig. 1, a flow chart of an embodiment of a video super-resolution reconstruction method provided by the present application is schematically illustrated.
S10: a plurality of low resolution training patterns that are temporally consecutive and a high resolution training pattern corresponding to at least one of the low resolution training patterns are obtained.
Specifically, referring to fig. 2, in one embodiment, S10 may be performed as follows.
S11: converting the low resolution training video into a plurality of low resolution training images that are continuous in time; converting a high resolution training video corresponding to the low resolution training video into a plurality of high resolution training images that are consecutive in time. The low resolution training video and the high resolution training video may be obtained from an open video training set, or may be obtained by capturing the same scene using a low resolution camera and a high resolution camera that are pre-calibrated.
The operation of converting video into pictures is already a very mature prior art, and the commonly used software is ffmpeg, mencoder and the like. Therefore, all training videos and all to-be-reconstructed videos suitable for video reconstruction by using the method in the scheme can be converted into temporally continuous frame images by using the software ffmpeg and mencoder.
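For illustration, the following is a minimal sketch (assuming OpenCV's Python bindings, cv2; the patent itself only names ffmpeg and mencoder) of decoding a video into temporally consecutive frame images.

```python
# Minimal sketch: decode a video into temporally consecutive frame images.
# Assumption: OpenCV (cv2) is available; ffmpeg/mencoder achieve the same result.
import cv2

def video_to_frames(video_path, out_pattern="frame_%05d.png"):
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()          # frames are returned in temporal order
        if not ok:
            break
        cv2.imwrite(out_pattern % idx, frame)
        idx += 1
    cap.release()
    return idx                          # number of frames written
```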
S12: performing tile partitioning on the low resolution training image and the high resolution training image, respectively.
In order to realize the real-time super-resolution reconstruction of a video and prevent the video memory overflow of an embedded device from influencing the reconstruction effect in the process of reconstructing the whole frame image. The method and the device divide the high-resolution training image and the low-resolution training image into a plurality of smaller image blocks which are continuous in time in a network training stage.
Specifically, referring to fig. 3, step S12 may be performed as follows.
S121: Convert the low-resolution training images and the high-resolution training images into YUV images, respectively.
YUV is a color space adopted by European television systems; its formats include YUV444, YUV422, YUV420P and so on, and the YUV format adopted in the present application is not particularly limited. Y denotes the brightness (luminance) of a color, while U and V denote the chrominance. YUV is easy to compress, convenient to transmit and process, and can reduce or eliminate color-conversion processing, thereby greatly accelerating image display. The Y, U and V values in YUV format can be computed from the R, G and B values of the corresponding pixel in RGB format. Optionally, they are calculated by the following conversion formula:
[RGB-to-YUV conversion formula, shown as an image in the original document.]
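One commonly used conversion of this kind (the analogue BT.601 form; this is an assumption, since the patent's own formula is reproduced only as an image) is:

Y = 0.299 R + 0.587 G + 0.114 B
U = 0.492 (B - Y) \approx -0.147 R - 0.289 G + 0.436 B
V = 0.877 (R - Y) \approx 0.615 R - 0.515 G - 0.100 B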
s122: y-channel image obtained by YUV image
Specifically, a split function and a merge function in opencv are a pair of reciprocal operations, each channel of the YUV image can be separated by using the split function, then U, V two channels are cleared, and finally the Y channel and the U, V cleared channel are merged by using the merge function, so that a Y channel picture can be obtained.
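A minimal sketch of this split / clear / merge procedure, assuming cv2 and a BGR input image (OpenCV's default channel order), follows.

```python
# Minimal sketch: obtain a Y-channel image via OpenCV split/merge, as described above.
import cv2

def y_channel_image(bgr_image):
    yuv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YUV)
    y, u, v = cv2.split(yuv)            # separate the three channels
    u[:] = 0                            # clear the U channel
    v[:] = 0                            # clear the V channel
    return cv2.merge((y, u, v))         # merge back into a Y-only image
```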
S123: Perform image block division on the Y-channel image.
Optionally, a number of pixels may be specified from the edge along the length direction of the low-resolution training image as the block length x, and a number of pixels from the edge along its width direction as the block width y, so that the size of a training image block is x × y. Here x and y may be equal or unequal, which is not limited. The number of blocks into which each training image is divided may also be chosen as appropriate and is not limited.
S13: Obtain a plurality of low-resolution training image blocks from a first position of each of the plurality of low-resolution training images, respectively, and obtain a high-resolution training image block from the first position of at least one high-resolution training image.
The plurality of temporally continuous low-resolution training images are divided in the same way as in step S123. The number of temporally continuous low-resolution training image blocks obtained is an odd number not less than 3, the high-resolution training image block corresponds to the low-resolution training image block located in the middle of the sequence, and the plurality of low-resolution training image blocks are input into the network as a whole to perform super-resolution reconstruction of that middle block. For example, in one embodiment, a set of 5 low-resolution training image blocks is obtained from a first position in five temporally consecutive low-resolution training images as a group of low-resolution training image blocks, and a high-resolution training image block is obtained from the first position in the high-resolution training image corresponding to the third low-resolution training image, as sketched below.
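For illustration, a minimal sketch of assembling one such training sample is given here; it assumes NumPy arrays of Y-channel frames, and the helper name, block size and scale are hypothetical.

```python
# Minimal sketch: one training sample = five temporally consecutive low-resolution
# blocks from the same spatial position, paired with the high-resolution block of
# the middle (third) frame. Block size and scale are illustrative assumptions.
import numpy as np

def make_sample(lr_frames, hr_frames, t, y0, x0, blk=32, scale=4):
    lr_stack = np.stack(
        [lr_frames[t + k][y0:y0 + blk, x0:x0 + blk] for k in range(-2, 3)],
        axis=0,                                  # shape: (5, blk, blk)
    )
    hy0, hx0 = y0 * scale, x0 * scale
    hr_block = hr_frames[t][hy0:hy0 + blk * scale, hx0:hx0 + blk * scale]
    return lr_stack, hr_block                    # input data, truth label
```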
S20: Train the convolutional neural network model with the plurality of low-resolution training image blocks as input data and the corresponding high-resolution training image block as the truth label.
The convolutional neural network model structure in the present application includes:
(1) an input layer: the plurality of low-resolution image blocks are stacked along the time dimension and input into the subsequent layers as a whole;
(2) convolution layers: used to perform feature extraction, dimension reduction, non-linear mapping, dimension raising and other processing on the plurality of low-resolution image blocks. In addition to ordinary convolution layers, the application adds at least one deformable convolution layer, which shifts and adjusts the spatial sampling positions within the plurality of low-resolution training image blocks and aligns the features extracted from the same object across the blocks, so that the network can better adapt to geometric deformation of objects;
(3) excitation layers: a ReLU activation function is added after each convolution layer to introduce non-linearity;
(4) a deconvolution layer: upsamples the dimension-reduced feature blocks to obtain enlarged feature blocks;
(5) an output layer: outputs the high-resolution image block. A sketch of such a model is given after this list.
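The following is a minimal sketch of this kind of model, assuming PyTorch and torchvision's DeformConv2d (the patent does not specify a framework); the layer widths, kernel sizes and the offset-prediction branch are illustrative assumptions, not the patented network.

```python
# Minimal sketch: temporally stacked low-resolution blocks -> ordinary and deformable
# convolutions with ReLU -> transposed convolution for upsampling.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class VideoSRNet(nn.Module):
    def __init__(self, num_frames=5, scale=4, feat=64):
        super().__init__()
        # Input layer: the temporal stack of Y-channel blocks is treated as channels.
        self.head = nn.Conv2d(num_frames, feat, 3, padding=1)
        # Offsets for the deformable convolution: 2 values per sampling point of a 3x3 kernel.
        self.offset = nn.Conv2d(feat, 2 * 3 * 3, 3, padding=1)
        self.deform = DeformConv2d(feat, feat, 3, padding=1)
        # Ordinary convolutions for non-linear mapping / dimension handling.
        self.body = nn.Sequential(
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Deconvolution (transposed convolution) for upsampling to the output block.
        self.up = nn.ConvTranspose2d(feat, 1, kernel_size=scale, stride=scale)

    def forward(self, x):              # x: (batch, num_frames, H, W)
        f = torch.relu(self.head(x))
        f = torch.relu(self.deform(f, self.offset(f)))
        f = self.body(f)
        return self.up(f)              # (batch, 1, H*scale, W*scale)
```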
Referring to fig. 4, step S20 may be performed as follows.
S21: Perform convolution operations on the plurality of low-resolution training image blocks as a whole, so as to extract features from them and perform dimension reduction.
Specifically, a deformable convolution operation is performed on the input plurality of low-resolution training image blocks as a whole using a deformable convolution layer. A deformable convolution layer adds an offset variable Δs_n to the position of each sampling point of the convolution kernel. With these offsets, the kernel can sample anywhere around the current position and is no longer confined to the regular grid points, so the size and shape of the receptive field are no longer regular. The position of a sampling point thus changes from the original s_n to s_n + Δs_n. Because jitter may exist between successive video frames, the deformable convolution shifts and adjusts the spatial sampling positions within the low-resolution training image blocks and aligns the features extracted from the same object across the blocks, so that the network better adapts to geometric deformation of objects. Accordingly, in the deformable convolution layer of this embodiment, the value extracted by the convolution kernel is:
y(s_0) = \sum_{s_n} w(s_n)\, x(s_0 + s_n + \Delta s_n)
where y(s_0) is the output value at position s_0, w(s_n) is the convolution weight at the corresponding position of the kernel, s_n is the position of a sampling point within the convolution kernel, x(s_0 + s_n + Δs_n) is the pixel value at the corresponding position of the training image blocks, and Δs_n is the offset variable. Optionally, the application may round the offset variable Δs_n so that each sampling point falls exactly on a pixel of the target image and the pixel value at that point is read directly; this avoids the situation where a high-precision fractional Δs_n places the sampling point between pixels of the training image block. Of course, Δs_n need not be rounded; the pixel values at the corresponding sampling points can instead be obtained by an interpolation algorithm such as bilinear interpolation, and the application is not limited in this respect. Ordinary convolution layers then perform dimension reduction, non-linear mapping, dimension raising and other processing.
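For illustration, a minimal NumPy sketch of this deformable sampling with rounded offsets follows; the 3 x 3 kernel size and the handling of out-of-range samples are assumptions.

```python
# Minimal sketch: deformable sampling with the offsets rounded so that every
# sampling point falls exactly on a pixel of the input block.
import numpy as np

def deform_conv_pixel(x, weights, offsets, s0):
    """Value of the output at position s0 = (row, col).
    x:       input block, 2-D array
    weights: 3x3 convolution kernel
    offsets: 3x3x2 array of learned offsets (dy, dx) per sampling point
    """
    h, w = x.shape
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = np.rint(offsets[i, j]).astype(int)    # rounded offsets
            r = s0[0] + (i - 1) + dy                       # s0 + s_n + delta s_n
            c = s0[1] + (j - 1) + dx
            if 0 <= r < h and 0 <= c < w:                  # skip out-of-range samples
                out += weights[i, j] * x[r, c]
    return out
```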
S22: Perform a deconvolution operation on the dimension-reduced feature block of the current frame using a deconvolution layer, so as to obtain a high-resolution output image block.
Deconvolution, also known as transposed convolution, is generally used to enlarge the size of a feature map. In this application the deconvolution layer is placed after the ordinary convolution layers, so that the dimension-reduced feature blocks are upsampled and enlarged.
S23: Calculate the loss cost between the high-resolution output image block and the high-resolution training image block using a loss function.
Specifically, the mean square error (MSE) may be used as the loss function, computed as:
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)^2
where J(θ) is the loss function, m is the number of low-resolution image blocks in the training set, h_θ(x_i) is the output of the convolutional neural network, x_i is the i-th low-resolution training image block, and y_i is the high-resolution training image block corresponding to the i-th low-resolution training image block, used as the truth label.
S24: Update the model parameters of the convolutional neural network model using a gradient back-propagation algorithm according to the loss cost, so that the loss cost tends towards its minimum.
Specifically, training the parameters with gradient back-propagation requires a gradient descent algorithm. The gradient descent algorithm used in the present application is stochastic gradient descent; of course, other gradient descent algorithms may also be used as appropriate, which is not specifically limited here.
In the gradient descent algorithm, the loss function is set as:
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)^2
and the gradient of the loss function is:
\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x_i) - y_i\bigr)\,\frac{\partial h_\theta(x_i)}{\partial \theta}
After the gradient of the loss function is obtained, the parameter θ in the convolution layers can be updated by:
\theta \leftarrow \theta - \alpha\,\frac{\partial J(\theta)}{\partial \theta}
where α is the learning rate.
In the above formulas, J(θ) is the loss function, m is the number of low-resolution image blocks in the training set, h_θ(x_i) is the output of the convolutional neural network, x_i is the i-th low-resolution training image block, and y_i is the high-resolution training image block corresponding to the i-th low-resolution training image block, used as the truth label. The parameters to be determined in the convolutional neural network include the offset variables Δs_n, the weight parameters w, and the bias parameters b. The offset parameters Δs_n of the deformable convolution layer are updated by the gradient back-propagation algorithm according to the loss cost, so that the features extracted by the deformable convolution layer from the same target in each low-resolution training image block tend to be aligned in space. A minimal training-loop sketch follows.
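The sketch below assumes PyTorch, the VideoSRNet sketch above, and a DataLoader yielding (lr_stack, hr_block) pairs; the epoch count and learning rate are placeholders, not values from the patent.

```python
# Minimal sketch: MSE loss, gradient back-propagation and stochastic gradient descent.
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-3, device="cuda"):
    model.to(device)
    criterion = nn.MSELoss()                              # loss cost J(theta)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for lr_stack, hr_block in loader:
            lr_stack = lr_stack.to(device).float()        # (B, 5, h, w)
            hr_block = hr_block.to(device).float().unsqueeze(1)   # add channel dim
            out = model(lr_stack)                         # high-resolution output block
            loss = criterion(out, hr_block)
            optimizer.zero_grad()
            loss.backward()                               # gradient back-propagation
            optimizer.step()                              # update weights, biases, offsets
    return model
```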
S30: Perform super-resolution reconstruction on the low-resolution video to be reconstructed using the trained convolutional neural network model.
Specifically, referring to fig. 5, step S30 may be performed as follows.
S31: Obtain a plurality of temporally continuous low-resolution image blocks to be reconstructed from the low-resolution video to be reconstructed. Referring to fig. 6, S31 may be performed as follows.
S311: Obtain a plurality of temporally continuous low-resolution images to be reconstructed from the low-resolution video to be reconstructed.
The operation of converting the low-resolution video to be reconstructed into temporally continuous frame images has already been described in step S11 and is not repeated here.
S312: Determine whether the size of the low-resolution image to be reconstructed is larger than the input size of the convolutional neural network model, where the input size is the size of the low-resolution training image blocks.
S313: If the size of the low-resolution image to be reconstructed is larger than the input size of the convolutional neural network model, segment the low-resolution image to be reconstructed to obtain temporally continuous low-resolution image blocks to be reconstructed.
Because this solution is deployed on embedded devices whose hardware chips do not support a variable network input size, the small image blocks obtained by segmenting the original image must all have the same size. During segmentation, some blocks at the image border would inevitably be smaller than the network input size; padding them with black or white borders would affect the super-resolution result and cause edge ringing. To solve this problem, the solution segments the original image so that adjacent image blocks overlap, and adjusts the overlap so that every block matches the network input size. The low-resolution image to be reconstructed can be segmented using the following formula:
[Segmentation formula, shown as an image in the original document.]
where s is the network input size in a preset direction, m is the size of the overlapping part of a low-resolution image block to be reconstructed in that direction, n is the size of the non-overlapping part of a block that has overlapping parts on both sides in that direction, w is the size of the low-resolution image to be reconstructed in that direction, and p is the number of low-resolution image blocks cut from the low-resolution image to be reconstructed in that direction, an integer not less than the ceiling of w/s; here w and p are known quantities, and m and n are the quantities to be solved. One possible segmentation scheme is sketched below.
S314: If the size of the low-resolution image to be reconstructed is smaller than or equal to the network input size, perform mirror edge-padding on the low-resolution image to be reconstructed so that the size of the padded image equals the network input size.
In this solution, mirror edge-padding of the low-resolution image to be reconstructed may be implemented with the padarray function in MATLAB; specifically, the padval parameter of padarray is set to 'symmetric', which means the image is expanded by mirror reflection about its boundary. A Python counterpart is sketched below.
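A minimal NumPy sketch of such mirror (symmetric) edge-padding follows; padding only on the bottom and right sides is an assumption for simplicity.

```python
# Minimal sketch: symmetric (mirror) padding so a small block reaches the fixed
# network input size, analogous to MATLAB's padarray(..., 'symmetric').
import numpy as np

def mirror_pad(block, input_size):
    pad_h = max(0, input_size - block.shape[0])
    pad_w = max(0, input_size - block.shape[1])
    # Reflect the image content about its borders to fill the missing area.
    return np.pad(block, ((0, pad_h), (0, pad_w)), mode="symmetric")
```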
S32: Input the plurality of low-resolution image blocks to be reconstructed as a whole into the trained convolutional neural network to generate high-resolution reconstructed image blocks.
The block to be reconstructed of the current frame is stacked with the corresponding blocks of several frames before and after it, giving a stack of blocks along the time dimension.
The stacked blocks are fed into the trained convolutional neural network, where multi-stage feature extraction and transformation along the time dimension through convolution, dimension reduction and other operations fuses the temporal information, finally producing a feature block fused with the information of the current frame.
After the feature block fused with the current-frame information is obtained, it is upsampled by deconvolution to obtain the final high-resolution reconstructed image block.
S33: Generate a high-resolution reconstructed video using the high-resolution reconstructed image blocks.
If the size of the low-resolution image to be reconstructed is larger than the input size of the convolutional neural network model, then after the super-resolution reconstruction of all image blocks is finished, the high-resolution reconstructed blocks corresponding to the same frame are stitched into a high-resolution reconstructed image according to their stitching coordinates. A stitching coordinate is the segmentation coordinate multiplied by the super-resolution scale u, where the segmentation coordinate is the length of the remaining part of each block to be reconstructed after its overlapping part is removed, and the super-resolution scale u is the magnification factor of the original image. For example, if the resolution of an input low-resolution block is 10 × 10 and becomes 100 × 100 after super-resolution reconstruction, then u = 100/10 = 10. A stitching sketch is given below.
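For illustration, a minimal stitching sketch follows; it assumes the split_positions helper from the segmentation sketch above and simply lets later blocks overwrite the overlap, which is a simplification of the coordinate scheme described in the patent.

```python
# Minimal sketch: place each reconstructed block at its original segmentation
# coordinate multiplied by the super-resolution scale u.
import numpy as np

def stitch_frame(blocks, ys, xs, u, out_h, out_w):
    frame = np.zeros((out_h, out_w), dtype=blocks[0].dtype)
    i = 0
    for y in ys:
        for x in xs:
            blk = blocks[i]
            hy, hx = y * u, x * u       # stitching coordinate = split coordinate * u
            frame[hy:hy + blk.shape[0], hx:hx + blk.shape[1]] = blk
            i += 1
    return frame
```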
If the size of the low-resolution image to be reconstructed is smaller than or equal to the input size of the convolutional neural network model, then after the super-resolution reconstruction of all image blocks is finished, the edge regions of the high-resolution reconstructed blocks that correspond to the mirror-padded regions of the low-resolution image to be reconstructed are cropped off.
In this solution, when the size of the low-resolution image to be reconstructed is smaller than or equal to the input size of the convolutional neural network model, mirror edge-padding is implemented with the padarray function in MATLAB, and the padded area is cropped back to the original size afterwards. Therefore, once the super-resolution reconstruction of all image blocks is finished, the mirror-padded parts are automatically cropped off.
Compared with the prior art, the technical solution of this application trains the convolutional neural network model with a plurality of temporally continuous low-resolution training image blocks as input data, so the temporal characteristics of the video are fully considered during model training, and unnecessary computation overhead caused by frame-by-frame processing can be effectively avoided in subsequent super-resolution reconstruction.
Furthermore, a deformable convolution layer is added to the convolutional neural network to shift and adjust the positions of the same features across the temporally continuous low-resolution image blocks, so adjacent frames can be aligned inside the network; this enables end-to-end video super-resolution training and avoids a preprocessing step of motion compensation on the low-resolution image blocks before input.
In addition, exploiting the fact that the input size of the convolutional network on the embedded device is fixed, the image to be reconstructed is segmented or edge-padded so that the block size is consistent with the network input size. This solves the problems of an over-large image to be reconstructed and insufficient video memory, greatly reduces time consumption, and achieves real-time super-resolution reconstruction of high-resolution video.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (11)

1. A method for reconstructing super-resolution video, the method comprising:
obtaining a plurality of temporally continuous low resolution training image blocks and a high resolution training image block corresponding to at least one low resolution training image block;
training a convolutional neural network model with the plurality of low resolution training image blocks as input data and the high resolution training image blocks as truth labels;
and performing super-resolution reconstruction on the low-resolution video to be reconstructed by using the trained convolutional neural network model.
2. The method of claim 1, wherein the step of obtaining a plurality of temporally successive low resolution training tiles and a high resolution training tile corresponding to at least one of the low resolution training tiles comprises:
converting the low resolution training video into a plurality of low resolution training images that are continuous in time; converting a high resolution training video corresponding to the low resolution training video into a plurality of high resolution training images that are continuous in time;
performing tile division on the low-resolution training image and the high-resolution training image respectively;
the low resolution training patches are obtained from first locations of the plurality of low resolution training images, respectively, and the high resolution training patches are obtained from first locations of at least one of the high resolution training images.
3. The method of claim 2, wherein the step of tiling on the low-resolution training images and the high-resolution training images, respectively, comprises:
respectively converting the low-resolution training image and the high-resolution training image into YUV images;
and carrying out tile block division on the Y-channel image of the YUV image.
4. The method of claim 1, wherein the number of the plurality of low resolution training tiles is an odd number not less than 3, and the high resolution training tile corresponds to the low resolution training tile located in the middle of the plurality of low resolution training tiles, so that the low resolution training tile located in the middle is super-resolution reconstructed by inputting the plurality of low resolution training tiles into the network as a whole.
5. The method of claim 1, wherein the step of training a convolutional neural network model with the plurality of low resolution training tiles as input data and the high resolution training tiles as truth labels comprises:
performing convolution operation on the plurality of low-resolution training image blocks as a whole to extract features from the plurality of low-resolution training image blocks and perform dimensionality reduction processing;
carrying out deconvolution operation on the dimensionality reduced features by using a deconvolution layer so as to obtain a high-resolution output image block;
calculating a loss cost between the high resolution output tile and the high resolution training tile using a loss function;
and updating the model parameters of the convolutional neural network model by using a gradient back propagation algorithm according to the loss cost so that the loss cost tends to be minimized.
6. The method of claim 5, wherein the step of separately convolving the plurality of low resolution training tiles comprises:
performing convolution operation on the plurality of low-resolution training image blocks by using a deformable convolution layer respectively;
the step of updating the model parameters of the convolutional neural network model using a back propagation algorithm according to the loss cost comprises:
updating the offset parameters of the deformable convolutional layers by using a back propagation algorithm according to the loss cost, so that the features extracted by the deformable convolutional layers from the same target in each low-resolution training image block tend to be aligned in space.
7. The method according to claim 1, wherein the step of performing super-resolution reconstruction on the low-resolution video to be reconstructed by using the trained convolutional neural network model comprises:
obtaining a plurality of low-resolution blocks to be reconstructed continuously in time from the low-resolution video to be reconstructed;
inputting the plurality of low-resolution blocks to be reconstructed as a whole into the trained convolutional neural network to generate high-resolution reconstructed blocks;
generating a high resolution reconstructed video using the high resolution reconstructed tiles.
8. The method according to claim 7, wherein the step of obtaining a plurality of temporally successive low resolution blocks to be reconstructed from the low resolution video to be reconstructed comprises:
acquiring a plurality of temporally continuous low-resolution images to be reconstructed from the low-resolution video to be reconstructed;
comparing the size of the low-resolution image to be reconstructed with the input size of the convolutional neural network model;
preprocessing the low-resolution image to be reconstructed according to the comparison result so as to obtain a low-resolution image block to be reconstructed;
wherein the input size is a size of the low resolution training tile.
9. The method according to claim 8, wherein the step of preprocessing the low resolution image to be reconstructed according to the comparison result comprises:
if the size of the low resolution image to be reconstructed is larger than the input size, segmenting the low resolution image to be reconstructed by using the following formula, so that the size of each low resolution block to be reconstructed is equal to the input size,
[Segmentation formula, shown as an image in the original document.]
wherein s is the input size in a preset direction, m is the size of the overlapping part of a low resolution block to be reconstructed in the preset direction, n is the size of the non-overlapping part of a low resolution block to be reconstructed that has overlapping parts on both sides in the preset direction, w is the size of the low resolution image to be reconstructed in the preset direction, p is the number of low resolution blocks to be reconstructed cut out of the low resolution image to be reconstructed in the preset direction and is an integer not less than the ceiling of w/s, w and p are known quantities, and m and n are quantities to be solved;
and if the size of the low resolution image to be reconstructed is smaller than the input size, performing mirror edge-padding on the low resolution image to be reconstructed so that the size of the padded low resolution image to be reconstructed is equal to the input size.
10. The method of claim 7, wherein generating a high resolution reconstructed video using the high resolution reconstructed tiles comprises:
stitching a plurality of the high resolution reconstruction tiles corresponding to the same frame into a high resolution reconstructed image.
11. The method of claim 7,
the step of generating a high resolution reconstructed video using the high resolution reconstructed tiles comprises:
and cutting off an edge region of the high-resolution reconstruction image block, which corresponds to the mirror image edge-filling region of the low-resolution image to be reconstructed.
CN201911150376.XA 2019-11-21 2019-11-21 Video super-resolution reconstruction method Pending CN111062867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911150376.XA CN111062867A (en) 2019-11-21 2019-11-21 Video super-resolution reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911150376.XA CN111062867A (en) 2019-11-21 2019-11-21 Video super-resolution reconstruction method

Publications (1)

Publication Number Publication Date
CN111062867A true CN111062867A (en) 2020-04-24

Family

ID=70298083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911150376.XA Pending CN111062867A (en) 2019-11-21 2019-11-21 Video super-resolution reconstruction method

Country Status (1)

Country Link
CN (1) CN111062867A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667410A (en) * 2020-06-10 2020-09-15 腾讯科技(深圳)有限公司 Image resolution improving method and device and electronic equipment
CN113034380A (en) * 2021-02-09 2021-06-25 浙江大学 Video space-time super-resolution method and device based on improved deformable convolution correction
CN113077385A (en) * 2021-03-30 2021-07-06 上海大学 Video super-resolution method and system based on countermeasure generation network and edge enhancement
CN113344792A (en) * 2021-08-02 2021-09-03 浙江大华技术股份有限公司 Image generation method and device and electronic equipment
CN116385391A (en) * 2023-04-03 2023-07-04 深圳智现未来工业软件有限公司 Multi-target detection method for wafer map defects

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734660A (en) * 2018-05-25 2018-11-02 上海通途半导体科技有限公司 A kind of image super-resolution rebuilding method and device based on deep learning
CN108734659A (en) * 2018-05-17 2018-11-02 华中科技大学 A kind of sub-pix convolved image super resolution ratio reconstruction method based on multiple dimensioned label
CN108765291A (en) * 2018-05-29 2018-11-06 天津大学 Super resolution ratio reconstruction method based on dense neural network and two-parameter loss function
CN109325915A (en) * 2018-09-11 2019-02-12 合肥工业大学 A kind of super resolution ratio reconstruction method for low resolution monitor video
CN109862370A (en) * 2017-11-30 2019-06-07 北京大学 Video super-resolution processing method and processing device
CN110070511A (en) * 2019-04-30 2019-07-30 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862370A (en) * 2017-11-30 2019-06-07 北京大学 Video super-resolution processing method and processing device
CN108734659A (en) * 2018-05-17 2018-11-02 华中科技大学 A kind of sub-pix convolved image super resolution ratio reconstruction method based on multiple dimensioned label
CN108734660A (en) * 2018-05-25 2018-11-02 上海通途半导体科技有限公司 A kind of image super-resolution rebuilding method and device based on deep learning
CN108765291A (en) * 2018-05-29 2018-11-06 天津大学 Super resolution ratio reconstruction method based on dense neural network and two-parameter loss function
CN109325915A (en) * 2018-09-11 2019-02-12 合肥工业大学 A kind of super resolution ratio reconstruction method for low resolution monitor video
CN110070511A (en) * 2019-04-30 2019-07-30 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667410A (en) * 2020-06-10 2020-09-15 腾讯科技(深圳)有限公司 Image resolution improving method and device and electronic equipment
CN113034380A (en) * 2021-02-09 2021-06-25 浙江大学 Video space-time super-resolution method and device based on improved deformable convolution correction
CN113034380B (en) * 2021-02-09 2022-06-10 浙江大学 Video space-time super-resolution method and device based on improved deformable convolution correction
CN113077385A (en) * 2021-03-30 2021-07-06 上海大学 Video super-resolution method and system based on countermeasure generation network and edge enhancement
CN113344792A (en) * 2021-08-02 2021-09-03 浙江大华技术股份有限公司 Image generation method and device and electronic equipment
WO2023010701A1 (en) * 2021-08-02 2023-02-09 Zhejiang Dahua Technology Co., Ltd. Image generation method, apparatus, and electronic device
CN116385391A (en) * 2023-04-03 2023-07-04 深圳智现未来工业软件有限公司 Multi-target detection method for wafer map defects

Similar Documents

Publication Publication Date Title
CN111062867A (en) Video super-resolution reconstruction method
US10970600B2 (en) Method and apparatus for training neural network model used for image processing, and storage medium
US10542249B2 (en) Stereoscopic video generation method based on 3D convolution neural network
EP2209090B1 (en) Image restoring apparatus and method thereof
US6959117B2 (en) Method and apparatus for deblurring and re-blurring image segments
US7260274B2 (en) Techniques and systems for developing high-resolution imagery
US8494302B2 (en) Importance filtering for image retargeting
US8520009B1 (en) Method and apparatus for filtering video data using a programmable graphics processor
CN109462747B (en) DIBR system cavity filling method based on generation countermeasure network
CA2702165C (en) Image generation method and apparatus, program therefor, and storage medium which stores the program
CN112543317B (en) Method for converting high-resolution monocular 2D video into binocular 3D video
WO2019071990A1 (en) Image processing method and apparatus
JP2011095861A (en) Image processing apparatus, control method and program
CN109643462B (en) Real-time image processing method based on rendering engine and display device
JP2019029938A (en) Error calculator and its program
WO2022033230A1 (en) Image display method, electronic device and computer-readable storage medium
CN111582268B (en) License plate image processing method and device and computer storage medium
RU2310911C1 (en) Method for interpolation of images
JP2009147970A (en) Reduction method of color blurring artifact in digital image
US12003859B2 (en) Brightness adjustment method, and apparatus thereof
JP6902425B2 (en) Color information magnifiers and color information estimators, and their programs
CN109754370B (en) Image denoising method and device
KR101715349B1 (en) Image correction method using bilateral interpolation
WO2023102189A2 (en) Iterative graph-based image enhancement using object separation
CN117135466A (en) Full-scene self-adaptive brightness correction fusion method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200424