CN113822801A - Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network - Google Patents

Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network

Info

Publication number
CN113822801A
CN113822801A (application CN202110718467.XA)
Authority
CN
China
Prior art keywords
image
branch
matrix
network
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110718467.XA
Other languages
Chinese (zh)
Other versions
CN113822801B (en)
Inventor
Chen Weigang (陈卫刚)
Zhou Di (周迪)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Uniview Technologies Co Ltd
Zhejiang Gongshang University
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd, Zhejiang Gongshang University filed Critical Zhejiang Uniview Technologies Co Ltd
Priority to CN202110718467.XA priority Critical patent/CN113822801B/en
Publication of CN113822801A publication Critical patent/CN113822801A/en
Application granted granted Critical
Publication of CN113822801B publication Critical patent/CN113822801B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06T3/4046: Scaling of whole images or parts thereof using neural networks
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/593: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network. For each frame to be processed, the method searches, block by block, for approximate blocks in nearby intra-coded frames, assembles these approximate blocks into a predicted image for the current frame, feeds the predicted image and the frame to be processed into separate branch networks, and fuses the outputs of the branches into the final high-resolution reconstruction. By effectively exploiting the inter-frame redundancy of the video sequence, and in particular the characteristic that intra-coded frames in compressed video have better visual quality, the method yields reconstructed super-resolution images of better quality.

Description

Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network
Technical Field
The invention relates to the field of computer vision, in particular to a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network.
Background
With the increasing popularity of high-resolution display devices and the continued emergence of new video applications, market demand for ultra-high-definition video such as 4K and 8K keeps growing. At the same time, the growth of network bandwidth, a shared resource, has not kept pace with the demand for transmitting high-quality video. Against this background, super-resolution reconstruction of video images can operate as an image enhancement technique at the decoding end, offering a feasible way to ease this conflict.
Chinese patent CN101345870B discloses a method in which the encoder uses pre-decoding closed-loop feedback with super-resolution reconstruction to construct a small auxiliary code stream for super-resolution reconstruction, and a human-eye region-of-interest analysis module in the encoder further guides and corrects the super-resolution reconstruction at the decoding end, improving the resolution and subjective quality of the decoded video output. Chinese patent CN103475876B discloses a learning-based super-resolution reconstruction method for low-bit-rate compressed images: the offline part classifies low-resolution images by their degree of distortion to build a sample library and trains a super-resolution model for each class of samples; the online part determines the distortion class of the input image and selects the corresponding model for super-resolution reconstruction. Chinese patent CN101605260B discloses a compressed video super-resolution reconstruction method based on maximum a posteriori (MAP) estimation, which defines the MAP reconstruction cost function as a reconstruction error term, a regularization term containing the distribution parameters of the DCT coefficients before quantization, and a general constraint term, and improves the quality of compressed video super-resolution reconstruction by introducing a DCT coefficient distribution model.
Unlike super-resolution reconstruction of single images or uncompressed video, a compressed-video super-resolution reconstruction system takes images with compression loss as input. The quantization stage of a lossy video coding system introduces quantization errors, which in the frequency domain appear mainly as loss of high-frequency components, so compressed images exhibit detail loss, edge blurring and similar artifacts. Reconstructing high-resolution images from such defective low-resolution inputs poses additional challenges for the super-resolution reconstruction system.
Disclosure of Invention
The invention aims to provide a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network that fully exploits the inter-frame redundancy of the video sequence, and in particular the characteristic that intra-coded frames in compressed video have better visual quality.
The technical scheme adopted by the invention is as follows: a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network, comprising the following specific steps:
(1) The multi-branch convolutional neural network for compressed video super-resolution reconstruction comprises three branches; the second branch network Sub-B and the third branch network Sub-C take the current decoded frame I of the compressed video as input; a reference image is selected, according to the number of frames separating them from frame I, from the two intra-coded frames located before and after frame I; in block-processing form, the block with the greatest similarity in the reference image is searched for each image block of the current decoded frame I, and these similar blocks are assembled into a reconstructed image that serves as the input of the first branch network Sub-A;
(2) The first branch network and the second branch network have the same structure: in the order of forward data flow, the input first passes through a convolutional layer containing 32 3×3 convolution kernels with stride 1, followed by N sequentially connected residual blocks; the output feature map of the last residual block of the first branch network and that of the second branch network are concatenated along the channel dimension into a feature map with 2N_C channels, where N_C is the number of channels of the output feature map of each of the first and second branch networks;
(3) The feature map formed by the channel concatenation in step (2) passes through a convolutional layer containing r² convolution kernels of size 3×3 with stride 1, and the output generated by the convolution operation is rearranged by periodic screening to obtain the up-sampled image H1, where r is the upsampling factor;
(4) The input of the third branch network passes through a convolutional layer containing r² convolution kernels of size 3×3 with stride 1, and the output of the convolutional layer is rearranged by periodic screening to obtain the up-sampled image H2, where r is the upsampling factor;
(5) The up-sampled images H1 and H2 are summed pixel by pixel, and the generated output is taken as the result image, i.e., the super-resolution reconstruction of the compressed video frame.
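For illustration, the following PyTorch sketch shows one way to realize the three-branch structure of steps (1) to (5). It is a minimal reconstruction under stated assumptions, not the patented implementation: the class names, the choices N = 12, N_C = 32 and r = 2, and the single-channel (luminance) input are assumptions made for the example; the residual block follows the structure described further below.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # conv(128 kernels, 3x3, stride 1) -> ReLU -> conv(32 kernels, 3x3, stride 1), plus skip
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, channels, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, x):
        return self.body(x) + x  # output is F(x) + x

class Branch(nn.Module):
    # Shared structure of Sub-A and Sub-B: a 32-kernel 3x3 conv followed by N residual blocks.
    def __init__(self, in_ch=1, n_blocks=12, channels=32):
        super().__init__()
        self.head = nn.Conv2d(in_ch, channels, kernel_size=3, stride=1, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])

    def forward(self, x):
        return self.blocks(self.head(x))

class MultiBranchSR(nn.Module):
    def __init__(self, in_ch=1, n_blocks=12, channels=32, r=2):
        super().__init__()
        self.sub_a = Branch(in_ch, n_blocks, channels)  # input: predicted image Ip
        self.sub_b = Branch(in_ch, n_blocks, channels)  # input: current decoded frame I
        # Fusion: conv with r^2 kernels over the 2*N_C concatenated channels, then periodic screening.
        self.fuse = nn.Conv2d(2 * channels, r * r * in_ch, kernel_size=3, stride=1, padding=1)
        # Sub-C: a single conv with r^2 kernels applied directly to the decoded frame.
        self.sub_c = nn.Conv2d(in_ch, r * r * in_ch, kernel_size=3, stride=1, padding=1)
        self.screen = nn.PixelShuffle(r)  # periodic screening (see the description below)

    def forward(self, decoded, predicted):
        feat = torch.cat([self.sub_a(predicted), self.sub_b(decoded)], dim=1)  # 2*N_C channels
        h1 = self.screen(self.fuse(feat))      # up-sampled image H1
        h2 = self.screen(self.sub_c(decoded))  # up-sampled image H2
        return h1 + h2                         # pixel-wise sum: final reconstruction
```

Here nn.PixelShuffle performs exactly the channel-to-space rearrangement that the periodic screening step describes.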
Further, searching, in block-processing form, for the block with the greatest similarity in the reference image for each image block of the current decoded frame, and forming from these similar blocks a reconstructed image that serves as the input of the first branch network Sub-A, comprises:
2.1 Let the height and width of the current decoded image be H and W, respectively; initialize the reconstructed image Ip with size H × W and all pixel values 0, and initialize the weight matrix C with size H × W and all element values 0;
2.2 With s1 and s2 as the scanning step sizes, scan the reference image and the current decoded image, respectively, from left to right and from top to bottom at equal intervals; at each scanning position (u, v), extract the image block of size √d × √d whose upper-left corner is at that position; subtract the gray-level mean from each image block and convert it into a row vector of d elements in row-major order; add each row vector from the reference image to the matrix T as a row of T, and add each row vector from the current decoded image to the matrix Q as a row of Q;
2.3 For each row vector q in the matrix Q, use the Euclidean distance as the similarity measure and search for the most similar row vector in T with the k-nearest-neighbor algorithm, denoted t; if the Euclidean distance between t and q is smaller than a preset threshold e, take √d elements of t at a time, in order, as one row of a matrix, so that √d such rows form a √d × √d matrix, which is taken as the target block; otherwise take √d elements of q at a time, in order, as one row, so that √d rows form a √d × √d matrix taken as the target block; add the gray-level mean corresponding to q to each pixel value of the target block;
2.4 Let (u, v) be the scanning position of the image block corresponding to the row vector q in the matrix Q, and let b be the target block obtained in step 2.3; in the reconstructed image Ip, add to each pixel of the √d × √d region whose upper-left corner is (u, v) the value of the corresponding element of the target block, and in the weight matrix C add 1 to each element of the √d × √d region whose upper-left corner is (u, v);
2.5 Repeat steps 2.3 and 2.4 for all row vectors in the matrix Q to obtain the reconstructed image Ip;
2.6 Divide the value of each pixel in the reconstructed image Ip by the value of the corresponding element in the weight matrix C to obtain the final reconstructed image.
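As a concrete illustration of steps 2.1 to 2.6, the NumPy sketch below assembles the predicted image, with a brute-force nearest-neighbor search standing in for a tuned k-nearest-neighbor implementation; the default values d = 36, s1 = 1, s2 = 3 and the threshold e, as well as the single-channel gray-level input, are assumptions made for the example.

```python
import numpy as np

def build_predicted_image(ref, cur, d=36, s1=1, s2=3, e=50.0):
    # Steps 2.1-2.6: assemble a predicted image for `cur` from blocks of `ref`.
    # `ref` and `cur` are 2-D float arrays of the same size; block side is sqrt(d).
    b = int(np.sqrt(d))                      # block side length
    H, W = cur.shape
    Ip = np.zeros((H, W), dtype=np.float64)  # reconstructed image, step 2.1
    C = np.zeros((H, W), dtype=np.float64)   # weight matrix, step 2.1

    def scan(img, step):
        pos, rows, means = [], [], []
        for u in range(0, img.shape[0] - b + 1, step):
            for v in range(0, img.shape[1] - b + 1, step):
                blk = img[u:u + b, v:v + b].astype(np.float64)
                m = blk.mean()
                pos.append((u, v)); means.append(m)
                rows.append((blk - m).reshape(-1))  # row-major vector of d elements
        return pos, np.array(rows), np.array(means)

    _, T, _ = scan(ref, s1)                  # rows from the reference image, step 2.2
    posQ, Q, meansQ = scan(cur, s2)          # rows from the current decoded image

    for q, (u, v), m in zip(Q, posQ, meansQ):
        dist = np.linalg.norm(T - q, axis=1)   # Euclidean distances (1-NN), step 2.3
        t = T[np.argmin(dist)]
        src = t if dist.min() < e else q       # fall back to q itself above threshold
        target = src.reshape(b, b) + m         # add back the gray-level mean
        Ip[u:u + b, v:v + b] += target         # step 2.4: accumulate block ...
        C[u:u + b, v:v + b] += 1.0             # ... and block weights
    # Step 2.6: normalize; the guard avoids dividing margins never covered by a block.
    return Ip / np.maximum(C, 1.0)
```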
Further, the N sequentially connected residual blocks in the first branch network and the second branch network each have the same structure, comprising two convolutional layers and a ReLU layer; in the order of forward data flow these are a convolutional layer containing 128 3×3 convolution kernels with stride 1, a ReLU layer, and a convolutional layer containing 32 3×3 convolution kernels with stride 1; let x be the input of any residual block, the two convolutional layers and the ReLU layer map this input to F(x), and finally F(x) + x is taken as the output of the residual block.
Further, obtaining the up-sampled image from the output of the convolutional layer by periodic screening comprises: let the output of the convolutional layer be a feature map of size H × W × r²; for each coordinate position (x, y), take the r² elements across all channels at that position to form a vector, take r elements of the vector at a time as one row of a matrix so that the rows form an r × r matrix, and place this matrix at position (rx, ry) in the up-sampled image; the above process is repeated for all coordinate positions of the feature map to form the up-sampled image.
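The following NumPy sketch makes the periodic screening rearrangement explicit; the function name is illustrative, and the loop form is chosen for clarity rather than speed. The result coincides with the channel-to-space rearrangement computed by PyTorch's nn.PixelShuffle.

```python
import numpy as np

def periodic_screening(feat, r):
    # feat: H x W x r^2 feature map; returns the (rH) x (rW) up-sampled image.
    H, W, C = feat.shape
    assert C == r * r
    up = np.zeros((r * H, r * W), dtype=feat.dtype)
    for x in range(H):
        for y in range(W):
            # The r^2 channel values at (x, y) form an r x r tile, r elements per row ...
            tile = feat[x, y].reshape(r, r)
            # ... placed with its upper-left corner at (rx, ry) in the up-sampled image.
            up[r * x:r * x + r, r * y:r * y + r] = tile
    return up
```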
Further, the parameters of each layer of the multi-branch convolutional neural network are determined by learning, comprising the following steps:
A. Preparing training samples: let I be the currently decoded frame in the compressed video, Io the corresponding original image that has not been compression-coded, and Ip the constructed reconstructed image; the ith sample in the sample set used for training the multi-branch convolutional neural network has the form (x_i, x_i^p, y_i), where x_i, x_i^p and y_i are image blocks taken from I, Ip and Io, respectively, at the same position and of the same size;
B. Training: the samples in the training sample set are loaded in batches; x_i is input to the second and third branch networks, x_i^p is input to the first branch network, and the optimal network parameters are sought through the following optimization:

θ* = arg min_θ Σ_i ‖F(x_i, x_i^p; θ) − y_i‖₁

where F(x_i, x_i^p; θ) is the output produced by the multi-branch convolutional neural network for the input (x_i, x_i^p), and ‖·‖₁ denotes the 1-norm; during training, the weights of all layers of the network are updated by the Adam optimization algorithm, and the learning rate is adjusted in a piecewise-descending manner: the total number of training epochs is divided into four stages, and the learning rate of each stage is half that of the previous stage.
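A minimal training-loop sketch under the stated scheme (L1 loss, Adam, four learning-rate stages each at half the previous rate): the epoch count, batch handling and the initial rate of 0.001 (the detailed description later suggests a value between 0.001 and 0.005) are illustrative assumptions, and `model` is the MultiBranchSR sketch given earlier.

```python
import torch
import torch.nn as nn

def train(model, loader, total_epochs=80, lr0=1e-3, device="cpu"):
    # loader yields (x, x_p, y): decoded block, predicted block, original block.
    model.to(device)
    criterion = nn.L1Loss()  # the 1-norm reconstruction error
    optimizer = torch.optim.Adam(model.parameters(), lr=lr0)
    # Piecewise-descending schedule: four stages, each at half the previous learning rate.
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=total_epochs // 4, gamma=0.5)
    for _ in range(total_epochs):
        for x, x_p, y in loader:
            x, x_p, y = x.to(device), x_p.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x, x_p), y)  # || F(x_i, x_i^p; theta) - y_i ||_1
            loss.backward()
            optimizer.step()
        scheduler.step()
```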
The beneficial technical effects of the invention are as follows: the compressed video super-resolution reconstruction method using a multi-branch convolutional neural network can effectively exploit the inter-frame redundancy of the video sequence, and in particular the characteristic that intra-coded frames in compressed video have better visual quality, so that the reconstructed super-resolution image is of better quality.
Drawings
FIG. 1 is a schematic diagram of a multi-branch convolutional neural network structure according to the present invention;
fig. 2 is a schematic diagram of a residual block network structure.
Detailed Description
The invention is further described below in conjunction with the drawings and the specific embodiments so that those skilled in the art can better understand the essence of the invention.
As shown in fig. 1, the invention provides a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network, which comprises the following specific steps:
(1) The multi-branch convolutional neural network for compressed video super-resolution reconstruction comprises three branches; the second branch network Sub-B and the third branch network Sub-C take the current decoded frame I of the compressed video as input; a reference image is selected, according to the number of frames separating them from frame I, from the two intra-coded frames located before and after frame I; in block-processing form, the block with the greatest similarity in the reference image is searched for each image block of the current decoded frame, and these similar blocks are assembled into a reconstructed image that serves as the input of the first branch network Sub-A;
Searching, in block-processing form, for the block with the greatest similarity in the reference image for each image block of the current decoded frame, and forming from these similar blocks a reconstructed image that serves as the input of the first branch network Sub-A, comprises:
Step 1A: let the height and width of the current decoded image be H and W, respectively; initialize the reconstructed image Ip with size H × W and all pixel values 0, and initialize the weight matrix C with size H × W and all element values 0;
Step 1B: with s1 and s2 as the scanning step sizes, scan the reference image and the current decoded image, respectively, from left to right and from top to bottom at equal intervals; at each scanning position (u, v), extract the image block of size √d × √d whose upper-left corner is at that position; subtract the gray-level mean from each image block and convert it into a row vector of d elements in row-major order; add each row vector from the reference image to the matrix T as a row of T, and add each row vector from the current decoded image to the matrix Q as a row of Q; here d may be 36 or 64, s1 may be 1 or 2, and s2 may be √d/2;
Step 1C, regarding the row vector Q in the matrix Q, using Euclidean distance as similarity measurement, using k-nearest neighbor algorithm to search the most similar row vector in T, and recording the most similar row vector as T, if the Euclidean distance between the vector T and the vector Q is equal to TIf the distance is less than a preset threshold e, the values in t are taken
Figure BDA0003135970480000051
The elements being one row of the matrix
Figure BDA0003135970480000052
The rows form one
Figure BDA0003135970480000053
Taking the matrix of the size as a target block, otherwise, sequentially taking the matrix in q
Figure BDA0003135970480000054
The elements being one row of the matrix
Figure BDA0003135970480000055
The rows form one
Figure BDA0003135970480000056
The matrix of the size is used as a target block; adding a gray average value corresponding to q to each pixel value in the target block;
Step 1D: let (u, v) be the scanning position of the image block corresponding to the row vector q in the matrix Q, and let b be the target block obtained in step 1C; in the reconstructed image Ip, add to each pixel of the √d × √d region whose upper-left corner is (u, v) the value of the corresponding element of the target block b, and in the weight matrix C add 1 to each element of the √d × √d region whose upper-left corner is (u, v);
Step 1E: repeat steps 1C and 1D for all row vectors in the matrix Q;
Step 1F: divide the value of each pixel in the reconstructed image Ip by the value of the corresponding element in the weight matrix C to obtain the final reconstructed image.
(2) The first branch network and the second branch network have the same structure: in the order of forward data flow, the input first passes through a convolutional layer containing 32 3×3 convolution kernels with stride 1, followed by N sequentially connected residual blocks, where N may be an integer greater than 10 and smaller than 18; the output feature map of the last residual block of the first branch network and that of the second branch network are concatenated along the channel dimension into a feature map with 2N_C channels, where N_C is the number of channels of the output feature map of each of the first and second branch networks;
Each of the N sequentially connected residual blocks has the same structure, comprising two convolutional layers and a ReLU layer; in the order of forward data flow these are a convolutional layer containing 128 3×3 convolution kernels with stride 1, a ReLU layer, and a convolutional layer containing 32 3×3 convolution kernels with stride 1; let x be the input of any residual block, the two convolutional layers and the ReLU layer map this input to F(x), and finally F(x) + x is taken as the output of the residual block.
(3) The feature map formed by the channel concatenation in the previous step passes through a convolutional layer containing r² convolution kernels of size 3×3 with stride 1, and the output generated by the convolution operation is rearranged by periodic screening to obtain the up-sampled image H1, where r is the upsampling factor;
To obtain the up-sampled image from the output of the convolution operation by periodic screening: let the output of the convolutional layer be a feature map of size H × W × r²; for each coordinate position (x, y), take the r² elements across all channels at that position to form a vector, take r elements of the vector at a time as one row of a matrix so that the rows form an r × r matrix, and place this matrix at position (rx, ry) in the up-sampled image; repeat the above process for all coordinate positions of the feature map to form the up-sampled image;
(4) The input of the third branch passes through a convolutional layer containing r² convolution kernels of size 3×3 with stride 1, and the output of the convolutional layer is rearranged by periodic screening to obtain the up-sampled image H2, where r is the upsampling factor;
(5) The up-sampled images H1 and H2 are summed over corresponding pixels one by one, and the generated output is the result image.
In the technical scheme of the invention, the parameters of each layer of the multi-branch convolutional neural network are determined by learning, comprising the following steps:
5A. Preparing training samples: let I be a frame in the compressed video, Io the corresponding original image that has not been compression-coded, and Ip the reconstructed image constructed as described in steps 1A to 1F; the ith sample in the sample set used to train the multi-branch convolutional neural network model has the form (x_i, x_i^p, y_i), where x_i, x_i^p and y_i are image blocks taken from I, Ip and Io, respectively, at the same position and of the same size;
5B. Training: the samples in the training sample set are loaded in batches; x_i is input to the second and third branch networks, x_i^p is input to the first branch network, and the optimal network parameters are sought through the following optimization:

θ* = arg min_θ Σ_i ‖F(x_i, x_i^p; θ) − y_i‖₁

where F(x_i, x_i^p; θ) is the output produced by the multi-branch convolutional neural network model for the input (x_i, x_i^p), and ‖·‖₁ denotes the 1-norm; during training, the weights of each layer of the network are updated by the Adam optimization algorithm; optionally, the initial value of the learning rate may be set between 0.001 and 0.005, and the learning rate is adjusted in a piecewise-descending manner: the total number of training epochs is divided into four stages, and the learning rate of each stage is half that of the previous stage.
The method provided by the embodiment of the invention was tested on video coded with HEVC. The HEVC reference software HM16.0 was used as the compression tool; the test videos at their original size, and versions reduced to 1/2 of the original size in both width and height, were compression-coded with quantization parameters QP 27, 32, 37 and 42. The interval between intra-coded frames was set to 32, the QP offset of intra-coded frames was set to -7, and the remaining parameters kept the defaults of the encoder_lowdelay_P_main.cfg configuration file. For the compressed video at original size, the bit rate and the peak signal-to-noise ratio (PSNR) with respect to the original uncompressed video were recorded; for the reduced videos, the bit rate after compression coding was recorded, the video was reconstructed to its original size with the model provided by the embodiment of the invention, and the PSNR of the reconstructed video with respect to the uncompressed video was computed. Taking the video compressed at its original size as the anchor, the BD-rate criterion gives the bit-rate saving of the method at equal objective quality, and the BD-PSNR criterion gives the PSNR gain at equal bit rate; the results are listed in Table 1. As can be seen from the table, the proposed method saves about 14% of the bit rate on average at the same objective quality, and provides an average PSNR gain of about 0.77 dB at the same bit rate.
Table 1: Experimental results of the embodiment of the invention (the table is reproduced as an image in the original publication).
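The BD-rate and BD-PSNR figures quoted above follow the standard Bjøntegaard procedure: fit a third-order polynomial to each rate-distortion curve over the logarithm of the bit rate, then average the gap between the curves over the overlapping rate range. The sketch below is a common implementation of BD-PSNR, given here for reference rather than taken from the patent.

```python
import numpy as np

def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    # Bjontegaard delta PSNR: average PSNR gain of the test curve over the anchor
    # curve across their overlapping bitrate range (rates in kbps, PSNR in dB).
    la, lt = np.log10(rate_anchor), np.log10(rate_test)
    pa = np.polyint(np.polyfit(la, psnr_anchor, 3))  # integral of the cubic fit
    pt = np.polyint(np.polyfit(lt, psnr_test, 3))
    lo, hi = max(la.min(), lt.min()), min(la.max(), lt.max())
    avg_a = (np.polyval(pa, hi) - np.polyval(pa, lo)) / (hi - lo)
    avg_t = (np.polyval(pt, hi) - np.polyval(pt, lo)) / (hi - lo)
    return avg_t - avg_a
```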
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any modification or replacement within the spirit and principle of the present invention should be covered within the scope of the present invention.

Claims (5)

1. A compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network, characterized by comprising the following specific steps:
(1) The multi-branch convolutional neural network for compressed video super-resolution reconstruction comprises three branches; the second branch network Sub-B and the third branch network Sub-C take the current decoded frame I of the compressed video as input; a reference image is selected, according to the number of frames separating them from frame I, from the two intra-coded frames located before and after frame I; in block-processing form, the block with the greatest similarity in the reference image is searched for each image block of the current decoded frame I, and these similar blocks are assembled into a reconstructed image that serves as the input of the first branch network Sub-A;
(2) The first branch network and the second branch network have the same structure: in the order of forward data flow, the input first passes through a convolutional layer containing 32 3×3 convolution kernels with stride 1, followed by N sequentially connected residual blocks; the output feature map of the last residual block of the first branch network and that of the second branch network are concatenated along the channel dimension into a feature map with 2N_C channels, where N_C is the number of channels of the output feature map of each of the first and second branch networks;
(3) The feature map formed by the channel concatenation in step (2) passes through a convolutional layer containing r² convolution kernels of size 3×3 with stride 1, and the output generated by the convolution operation is rearranged by periodic screening to obtain the up-sampled image H1, where r is the upsampling factor;
(4) The input of the third branch network passes through a convolutional layer containing r² convolution kernels of size 3×3 with stride 1, and the output of the convolutional layer is rearranged by periodic screening to obtain the up-sampled image H2, where r is the upsampling factor;
(5) The up-sampled images H1 and H2 are summed pixel by pixel, and the generated output is taken as the result image, i.e., the super-resolution reconstruction of the compressed video image.
2. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network of claim 1, wherein searching, in block-processing form, for the block with the greatest similarity in the reference image for each image block of the current decoded frame, and forming from these similar blocks a reconstructed image that serves as the input of the first branch network Sub-A, specifically comprises:
2.1 Let the height and width of the current decoded image be H and W, respectively; initialize the reconstructed image Ip with size H × W and all pixel values 0, and initialize the weight matrix C with size H × W and all element values 0;
2.2 With s1 and s2 as the scanning step sizes, scan the reference image and the current decoded image, respectively, from left to right and from top to bottom at equal intervals; at each scanning position (u, v), extract the image block of size √d × √d whose upper-left corner is at that position; subtract the gray-level mean from each image block and convert it into a row vector of d elements in row-major order; add each row vector from the reference image to the matrix T as a row of T, and add each row vector from the current decoded image to the matrix Q as a row of Q;
2.3 For each row vector q in the matrix Q, use the Euclidean distance as the similarity measure and search for the most similar row vector in T with the k-nearest-neighbor algorithm, denoted t; if the Euclidean distance between t and q is smaller than a preset threshold e, take √d elements of t at a time, in order, as one row of a matrix, so that √d such rows form a √d × √d matrix, which is taken as the target block; otherwise take √d elements of q at a time, in order, as one row, so that √d rows form a √d × √d matrix taken as the target block; add the gray-level mean corresponding to q to each pixel value of the target block;
2.4 Let (u, v) be the scanning position of the image block corresponding to the row vector q in the matrix Q, and let b be the target block obtained in step 2.3; in the reconstructed image Ip, add to each pixel of the √d × √d region whose upper-left corner is (u, v) the value of the corresponding element of the target block, and in the weight matrix C add 1 to each element of the √d × √d region whose upper-left corner is (u, v);
2.5 Repeat steps 2.3 and 2.4 for all row vectors in the matrix Q to obtain the reconstructed image Ip;
2.6 Divide the value of each pixel in the reconstructed image Ip by the value of the corresponding element in the weight matrix C to obtain the final reconstructed image.
3. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network of claim 1, wherein the N sequentially connected residual blocks in the first branch network and the second branch network each have the same structure, comprising two convolutional layers and a ReLU layer; in the order of forward data flow these are a convolutional layer containing 128 3×3 convolution kernels with stride 1, a ReLU layer, and a convolutional layer containing 32 3×3 convolution kernels with stride 1; let x be the input of any residual block, the two convolutional layers and the ReLU layer map this input to F(x), and finally F(x) + x is taken as the output of the residual block.
4. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network of claim 1, wherein obtaining the up-sampled image from the output of the convolutional layer by periodic screening comprises: let the output of the convolutional layer be a feature map of size H × W × r²; for each coordinate position (x, y), take the r² elements across all channels at that position to form a vector, take r elements of the vector at a time as one row of a matrix so that the rows form an r × r matrix, and place this matrix at position (rx, ry) in the up-sampled image; the above process is repeated for all coordinate positions of the feature map to form the up-sampled image.
5. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network of claim 1, wherein the parameters of each layer of the multi-branch convolutional neural network are determined by learning, comprising the following steps:
A. Preparing training samples: let I be the currently decoded frame in the compressed video, Io the corresponding original image that has not been compression-coded, and Ip the constructed reconstructed image; the ith sample in the sample set used for training the multi-branch convolutional neural network has the form (x_i, x_i^p, y_i), where x_i, x_i^p and y_i are image blocks taken from I, Ip and Io, respectively, at the same position and of the same size;
B. Training: the samples in the training sample set are loaded in batches; x_i is input to the second and third branch networks, x_i^p is input to the first branch network, and the optimal network parameters are sought through the following optimization:

θ* = arg min_θ Σ_i ‖F(x_i, x_i^p; θ) − y_i‖₁

where F(x_i, x_i^p; θ) is the output produced by the multi-branch convolutional neural network for the input (x_i, x_i^p), and ‖·‖₁ denotes the 1-norm; during training, the weights of all layers of the network are updated by the Adam optimization algorithm, and the learning rate is adjusted in a piecewise-descending manner: the total number of training epochs is divided into four stages, and the learning rate of each stage is half that of the previous stage.
CN202110718467.XA 2021-06-28 2021-06-28 Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network Active CN113822801B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110718467.XA CN113822801B (en) 2021-06-28 2021-06-28 Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110718467.XA CN113822801B (en) 2021-06-28 2021-06-28 Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network

Publications (2)

Publication Number Publication Date
CN113822801A true CN113822801A (en) 2021-12-21
CN113822801B CN113822801B (en) 2023-08-18

Family

ID=78924108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110718467.XA Active CN113822801B (en) 2021-06-28 2021-06-28 Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network

Country Status (1)

Country Link
CN (1) CN113822801B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913072A (en) * 2022-05-16 2022-08-16 中国第一汽车股份有限公司 Image processing method and device, storage medium and processor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100272184A1 (en) * 2008-01-10 2010-10-28 Ramot At Tel-Aviv University Ltd. System and Method for Real-Time Super-Resolution
CN108012157A (en) * 2017-11-27 2018-05-08 上海交通大学 Construction method for the convolutional neural networks of Video coding fractional pixel interpolation
CN109862370A (en) * 2017-11-30 2019-06-07 北京大学 Video super-resolution processing method and processing device
CN111866521A (en) * 2020-07-09 2020-10-30 浙江工商大学 Video image compression artifact removing method combining motion compensation and generation type countermeasure network

Also Published As

Publication number Publication date
CN113822801B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
Chen et al. Learning for video compression
JP7047119B2 (en) Methods and equipment for residual code prediction in the conversion region
US6438168B2 (en) Bandwidth scaling of a compressed video stream
Brunello et al. Lossless compression of video using temporal information
CN107105278A (en) The coding and decoding video framework that motion vector is automatically generated
CN115956363A (en) Content adaptive online training method and device for post filtering
CN1695381A (en) Sharpness enhancement in post-processing of digital video signals using coding information and local spatial features
CN109903351B (en) Image compression method based on combination of convolutional neural network and traditional coding
JP2010534015A (en) Image processing method and corresponding electronic device
CN113066022B (en) Video bit enhancement method based on efficient space-time information fusion
EP1389875A2 (en) Method for motion estimation adaptive to DCT block content
CN115131675A (en) Remote sensing image compression method and system based on reference image texture migration
KR20090079286A (en) Method and apparatus for estimating motion vector of moving images using fast full search block matching algorithm
JP2024513693A (en) Configurable position of auxiliary information input to picture data processing neural network
CN111669588A (en) Ultra-high definition video compression coding and decoding method with ultra-low time delay
Lin et al. Multiple hypotheses based motion compensation for learned video compression
CN113822801B (en) Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network
CN115665413A (en) Method for estimating optimal quantization parameter of image compression
CN116012272A (en) Compressed video quality enhancement method based on reconstructed flow field
JP2004511978A (en) Motion vector compression
CN108833920A (en) A kind of DVC side information fusion method based on light stream and Block- matching
KR20240024921A (en) Methods and devices for encoding/decoding image or video
US20200128240A1 (en) Video encoding and decoding using an epitome
Li et al. You Can Mask More For Extremely Low-Bitrate Image Compression
CN115358954B (en) Attention-guided feature compression method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant