CN113822801A - Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network - Google Patents
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06N3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06T3/4046 — Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
- H04N19/42 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, characterised by implementation details or hardware specially adapted for video compression or decompression
- H04N19/593 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
Abstract
The invention discloses a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network. For each frame to be processed, the method searches, block by block, for approximate blocks in nearby intra-coded frames, assembles these approximate blocks into a predicted image for the current frame, feeds the predicted image and the frame to be processed into separate branch networks, and fuses the branch outputs into the final high-resolution reconstruction result. By effectively exploiting the inter-frame redundancy of the video sequence, and in particular the higher visual quality of intra-coded frames in compressed video, the method yields reconstructed super-resolution images of better quality.
Description
Technical Field
The invention relates to the field of computer vision, in particular to a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network.
Background
With the increasing popularity of high-resolution display devices and the continual emergence of new video applications, market demand for ultra-high-definition video, such as 4K or 8K, keeps growing. At the same time, growth in network bandwidth, a shared resource, has not kept pace with the demand for transmitting high-quality video. Against this background, super-resolution reconstruction of video images can operate as an image-enhancement technique at the decoding end, offering a feasible solution to this contradiction.
Chinese patent CN101345870B discloses a method in which the encoder, using pre-decoding closed-loop feedback with super-resolution reconstruction, constructs a small auxiliary code stream for super-resolution reconstruction, and a human-eye region-of-interest analysis module in the encoder further guides and corrects super-resolution reconstruction at the decoding end, so as to improve the resolution and subjective quality of the decoded video output. Chinese patent CN103475876B discloses a learning-based super-resolution reconstruction method for low-bit-rate compressed images: the offline part of the method classifies low-resolution images by their degree of distortion to build a sample library and trains a separate super-resolution model for each class of samples, while the online part determines the distortion class of the input image and selects the corresponding model to perform super-resolution reconstruction. Chinese patent CN101605260B discloses a compressed video super-resolution reconstruction method based on maximum a posteriori (MAP) estimation, which defines the MAP reconstruction cost function as a reconstruction-error term, a regularization term containing the pre-quantization DCT coefficient distribution parameters, and a general constraint term, and improves the quality of compressed-video super-resolution reconstruction by introducing a DCT coefficient distribution model.
Unlike super-resolution reconstruction of single images or of uncompressed video, a compressed-video super-resolution reconstruction system takes images with compression loss as input. The quantization stage of a lossy video coding system introduces quantization errors that manifest in the frequency domain chiefly as loss of high-frequency components, so compressed images exhibit lost detail, blurred edges, and similar degradations. Reconstructing high-resolution images from such defective low-resolution inputs poses greater challenges to the super-resolution reconstruction system.
Disclosure of Invention
The invention aims to provide a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network by fully utilizing interframe redundancy information of a video sequence, in particular utilizing the characteristic that an intraframe coding frame in a compressed video has better visual quality.
The technical scheme adopted by the invention is as follows: a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network, comprising the following specific steps:
(1) The multi-branch convolutional neural network for compressed video super-resolution reconstruction comprises three branches. The second branch network Sub-B and the third branch network Sub-C take the current decoded frame I of the compressed video as input. A reference image is selected, according to the number of intervening frames, from the two intra-coded frames located before and after frame I; for each block of the current decoded frame I, the block with the greatest similarity is searched for in the reference image, in block-processing fashion, and these similar blocks are assembled into a reconstructed image that serves as the input of the first branch network Sub-A;
(2) The first branch network and the second branch network have the same structure. Following the direction of forward data flow, the input data first passes through a convolutional layer containing 32 convolution kernels of size 3 × 3 with convolution stride 1, followed by N sequentially connected residual blocks. The output feature map of the last residual block of the first branch network and the output feature map of the last residual block of the second branch network are merged along the channel dimension into a feature map with 2N_C channels, where N_C is the number of channels of the output feature map of each of the first and second branch networks;
(3) The feature map formed by the channel merging in step (2) is passed through a convolutional layer containing r² convolution kernels of size 3 × 3 with convolution stride 1, and the output generated by this convolution operation is rearranged by periodic screening to obtain an up-sampled image H1, where r is the up-sampling factor;
(4) The input of the third branch network is passed through a convolutional layer containing r² convolution kernels of size 3 × 3 with convolution stride 1, and the output of this layer is rearranged by periodic screening to obtain an up-sampled image H2, where r is the up-sampling factor;
(5) The up-sampled images H1 and H2 are summed pixel by pixel, and the resulting output is taken as the result image, i.e., the super-resolution reconstruction of the compressed video frame.
Further, searching the reference image, in block-processing fashion, for the block with the greatest similarity to each block of the current decoded frame, and assembling these similar blocks into the reconstructed image that serves as the input of the first branch network Sub-A, comprises:
2.1 Let H and W be the height and width of the current decoded image. Initialize the reconstructed image I_p to size H × W with all pixel values set to 0, and initialize the weight matrix C to size H × W with all element values set to 0;
2.2 Scan the reference image and the current decoded image from left to right and from top to bottom at equal intervals, with scanning strides s1 and s2 respectively. At each scanning position (u, v), extract the image block of size √d × √d whose upper-left corner is at that position; subtract from each image block its own gray-level mean, and convert the block in row-major order into a row vector containing d elements. Add each row vector from the reference image to the matrix T as one of its rows, and add each row vector from the current decoded image to the matrix Q as one of its rows;
2.3 For each row vector q in the matrix Q, using the Euclidean distance as the similarity measure, find the most similar row vector in T with the k-nearest-neighbor algorithm and denote it t. If the Euclidean distance between the vector t and the vector q is smaller than a preset threshold e, take the d elements of t in sequence, √d elements per row, to form a matrix of size √d × √d as the target block; otherwise take the d elements of q in the same way to form a matrix of size √d × √d as the target block. Add to each pixel value of the target block the gray-level mean corresponding to q;
2.4 Let (u, v) be the scanning position of the image block corresponding to the row vector q in Q, and let b be the target block obtained in step 2.3. In the reconstructed image I_p, add to each pixel of the √d × √d region whose upper-left corner is (u, v) the value of the corresponding element of the target block; in the weight matrix C, add 1 to each element of the √d × √d region whose upper-left corner is (u, v);
2.5 Repeat steps 2.3 and 2.4 for all row vectors in the matrix Q to obtain the reconstructed image I_p;
2.6 Divide the value of each pixel of the reconstructed image I_p by the value of the corresponding element of the weight matrix C to obtain the final reconstructed image.
Furthermore, each of the N sequentially connected residual blocks in the first and second branch networks has the same structure, comprising two convolutional layers and a ReLU layer. Following the direction of forward data flow, these are, in order: a convolutional layer containing 128 convolution kernels of size 3 × 3 with convolution stride 1, a ReLU layer, and a convolutional layer containing 32 convolution kernels of size 3 × 3 with convolution stride 1. Let x be the input of any residual block; the two convolutional layers and the ReLU layer map this input to f(x), and finally f(x) + x is taken as the output of the residual block.
Further, obtaining the up-sampled image from the output of the convolutional layer by periodic screening comprises: let the output of the convolutional layer be a feature map of size H × W × r². At each coordinate position (x, y), take the r² elements across all channels to form a vector; taking r elements of this vector at a time as one row yields, over r rows in total, a matrix of size r × r, which is placed at position (rx, ry) of the up-sampled image. Repeat this process for all coordinate positions of the feature map to form the up-sampled image.
Further, the parameters of each layer of the multi-branch convolutional neural network are determined by learning, comprising the following steps:
A. Preparing training samples: let I be the current decoded frame of the compressed video, I_o the uncompressed original image corresponding to I, and I_p the constructed reconstructed image. The i-th sample of the sample set used to train the multi-branch convolutional neural network has the form (x_i, x_i^p, y_i), where x_i, x_i^p and y_i are image blocks of the same size taken from the same position of I, I_p and I_o respectively;
B. Training: the samples of the training sample set are loaded in batches; x_i is input to the second and third branch networks and x_i^p is input to the first branch network, and the optimal network parameters θ* are sought according to the following optimization:

θ* = argmin_θ Σ_i ‖F(x_i, x_i^p; θ) − y_i‖₁

where F(x_i, x_i^p; θ) is the output produced by the multi-branch convolutional neural network for the corresponding sample, and ‖·‖₁ denotes the 1-norm. During training, the weights of every layer of the network are updated with the Adam optimization algorithm, and the learning rate is adjusted in a piecewise-decreasing manner: specifically, the total number of training epochs is divided into four stages, and the learning rate of each stage is one half of that of the preceding stage.
The invention has the beneficial technical effects that: the compressed video super-resolution reconstruction method adopting the multi-branch convolutional neural network can effectively utilize the inter-frame redundancy information of the video sequence, and particularly utilizes the characteristic that an intra-frame coding frame in the compressed video has better visual quality, so that the reconstructed super-resolution image has better quality.
Drawings
FIG. 1 is a schematic diagram of a multi-branch convolutional neural network structure according to the present invention;
FIG. 2 is a schematic diagram of the residual block network structure.
Detailed Description
The invention is further described below in conjunction with the drawings and the specific embodiments so that those skilled in the art can better understand the essence of the invention.
As shown in fig. 1, the invention provides a compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network, which comprises the following specific steps:
(1) The multi-branch convolutional neural network for compressed video super-resolution reconstruction comprises three branches. The second branch network Sub-B and the third branch network Sub-C take the current decoded frame I of the compressed video as input. A reference image is selected, according to the number of intervening frames, from the two intra-coded frames located before and after frame I; for each block of the current decoded frame, the block with the greatest similarity is searched for in the reference image, in block-processing fashion, and these similar blocks are assembled into a reconstructed image that serves as the input of the first branch network Sub-A;
Searching the reference image, in block-processing fashion, for the block with the greatest similarity to each block of the current decoded frame, and assembling these similar blocks into the reconstructed image that serves as the input of the first branch network Sub-A, comprises:
Step 1A, let H and W be the height and width of the current decoded image. Initialize the reconstructed image I_p to size H × W with all pixel values set to 0, and initialize the weight matrix C to size H × W with all element values set to 0;
Step 1B, scan the reference image and the current decoded image from left to right and from top to bottom at equal intervals, with scanning strides s1 and s2 respectively. At each scanning position (u, v), extract the image block of size √d × √d whose upper-left corner is at that position; subtract from each image block its own gray-level mean, and convert the block in row-major order into a row vector containing d elements. Add each row vector from the reference image to the matrix T as one of its rows, and add each row vector from the current decoded image to the matrix Q as one of its rows; where d may be 36 or 64, s1 may be 1 or 2, and s2 may be …
Step 1C, for each row vector q in the matrix Q, using the Euclidean distance as the similarity measure, find the most similar row vector in T with the k-nearest-neighbor algorithm and denote it t. If the Euclidean distance between the vector t and the vector q is smaller than a preset threshold e, take the d elements of t in sequence, √d elements per row, to form a matrix of size √d × √d as the target block; otherwise take the d elements of q in the same way to form a matrix of size √d × √d as the target block. Add to each pixel value of the target block the gray-level mean corresponding to q;
Step 1D, let (u, v) be the scanning position of the image block corresponding to the row vector q in Q, and let b be the target block obtained in Step 1C. In the reconstructed image I_p, add to each pixel of the √d × √d region whose upper-left corner is (u, v) the value of the corresponding element of the target block b; in the weight matrix C, add 1 to each element of the √d × √d region whose upper-left corner is (u, v);
Step 1E, repeat Steps 1C and 1D for all row vectors in the matrix Q;
Step 1F, divide the value of each pixel of the reconstructed image I_p by the value of the corresponding element of the weight matrix C to obtain the final reconstructed image.
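Steps 1A to 1F can be sketched as the following minimal NumPy program. This is an illustrative reading of the text, not code from the patent: a 1-nearest-neighbor search stands in for the k-nearest-neighbor algorithm, and the block size, strides, and threshold are hypothetical example values.

```python
import numpy as np

def build_predicted_image(ref, cur, block=8, s1=2, s2=4, thresh=50.0):
    """Construct the predicted image I_p for the current decoded frame `cur`
    by matching each of its blocks against blocks of the intra-coded
    reference frame `ref` (Steps 1A-1F; parameter values are illustrative)."""
    H, W = cur.shape
    Ip = np.zeros((H, W))            # Step 1A: reconstructed image, all zeros
    C = np.zeros((H, W))             # Step 1A: weight matrix, all zeros

    def patches(img, step):
        # Step 1B: scan with the given stride; keep (position, mean,
        # zero-mean row-major vector) for every block
        out = []
        for u in range(0, img.shape[0] - block + 1, step):
            for v in range(0, img.shape[1] - block + 1, step):
                p = img[u:u + block, v:v + block].astype(float)
                m = p.mean()
                out.append(((u, v), m, (p - m).reshape(-1)))
        return out

    T = patches(ref, s1)             # rows of matrix T (reference frame)
    Q = patches(cur, s2)             # rows of matrix Q (current frame)
    Tvecs = np.stack([row[2] for row in T])

    for (u, v), mean_q, q in Q:
        # Step 1C: nearest neighbour under Euclidean distance
        dist = np.linalg.norm(Tvecs - q, axis=1)
        k = int(np.argmin(dist))
        src = Tvecs[k] if dist[k] < thresh else q  # fall back to q itself
        b = src.reshape(block, block) + mean_q     # re-add q's gray-level mean
        # Step 1D: accumulate the target block and the coverage count
        Ip[u:u + block, v:v + block] += b
        C[u:u + block, v:v + block] += 1.0

    C[C == 0] = 1.0                  # guard pixels never covered by a block
    return Ip / C                    # Step 1F: normalise by coverage
```

Because each block is de-meaned before matching and q's own mean is restored afterwards, matching a frame against itself reproduces the frame exactly wherever blocks cover it.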
(2) The first branch network and the second branch network have the same structure. Following the direction of forward data flow, the input data first passes through a convolutional layer containing 32 convolution kernels of size 3 × 3 with convolution stride 1, followed by N sequentially connected residual blocks, where N may be an integer greater than 10 and less than 18. The output feature map of the last residual block of the first branch network and the output feature map of the last residual block of the second branch network are merged along the channel dimension into a feature map with 2N_C channels, where N_C is the number of channels of the output feature map of each of the first and second branch networks;
Each of the N sequentially connected residual blocks has the same structure, comprising two convolutional layers and a ReLU layer. Following the direction of forward data flow, these are, in order: a convolutional layer containing 128 convolution kernels of size 3 × 3 with convolution stride 1, a ReLU layer, and a convolutional layer containing 32 convolution kernels of size 3 × 3 with convolution stride 1. Let x be the input of any residual block; the two convolutional layers and the ReLU layer map this input to f(x), and finally f(x) + x is taken as the output of the residual block.
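The residual block just described can be sketched in NumPy as below; this is a minimal, unoptimized illustration (the convolution helper and zero-weight demonstration are not part of the patent), but it follows the stated structure: conv with 128 kernels, ReLU, conv with 32 kernels, then the skip connection f(x) + x.

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution, stride 1. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    C_in, H, W = x.shape
    C_out = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))       # zero padding keeps H x W
    y = np.zeros((C_out, H, W))
    for i in range(3):
        for j in range(3):
            # accumulate the contribution of kernel tap (i, j)
            y += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return y

def residual_block(x, w1, w2):
    """Residual block of the text: conv(128 kernels) -> ReLU -> conv(32 kernels),
    followed by the identity skip connection f(x) + x."""
    f = conv3x3(x, w1)            # 32 -> 128 channels
    f = np.maximum(f, 0.0)        # ReLU
    f = conv3x3(f, w2)            # 128 -> 32 channels
    return f + x                  # skip connection
```

One useful property of the skip connection is visible directly: with all-zero weights, f(x) = 0 and the block is the identity map, which is part of why deep stacks of residual blocks remain trainable.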
(3) The feature map formed by the channel merging in the previous step is passed through a convolutional layer containing r² convolution kernels of size 3 × 3 with convolution stride 1, and the output generated by this convolution operation is rearranged by periodic screening to obtain an up-sampled image H1, where r is the up-sampling factor;
The output generated by the convolution operation is converted into an up-sampled image by periodic screening as follows: let the output of the convolutional layer be a feature map of size H × W × r². At each coordinate position (x, y), take the r² elements across all channels to form a vector; taking r elements of this vector at a time as one row yields, over r rows in total, a matrix of size r × r, which is placed at position (rx, ry) of the up-sampled image. Repeat this process for all coordinate positions of the feature map to form the up-sampled image;
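The periodic screening above is a direct description of the sub-pixel (pixel-shuffle) rearrangement; a NumPy transcription of the text, assuming a channel-last (H, W, r²) layout, is sketched below.

```python
import numpy as np

def periodic_screen(feat, r):
    """Periodic screening: feat has shape (H, W, r*r). The r*r channel
    values at each (x, y) are reshaped row-first into an r x r tile that is
    placed at (r*x, r*y) of the up-sampled image."""
    H, W, C = feat.shape
    assert C == r * r
    up = np.zeros((H * r, W * r), dtype=feat.dtype)
    for x in range(H):
        for y in range(W):
            up[r * x:r * x + r, r * y:r * y + r] = feat[x, y].reshape(r, r)
    return up
```

An H × W × r² feature map thus becomes an rH × rW image, so the spatial up-scaling is done entirely by channel rearrangement rather than by interpolation.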
(4) The input of the third branch network is passed through a convolutional layer containing r² convolution kernels of size 3 × 3 with convolution stride 1, and the output of this layer is rearranged by periodic screening to obtain an up-sampled image H2, where r is the up-sampling factor;
(5) The up-sampled images H1 and H2 are summed pixel by corresponding pixel, and the resulting output is the result image.
In the technical scheme of the invention, the parameters of each layer of the multi-branch convolutional neural network are determined by learning, comprising the following steps:
5A, preparing training samples: let I be a frame of the compressed video, I_o the uncompressed original image corresponding to that frame, and I_p the reconstructed image constructed as in Steps 1A to 1F. The i-th sample of the sample set used to train the multi-branch convolutional neural network model has the form (x_i, x_i^p, y_i), where x_i, x_i^p and y_i are image blocks of the same size taken from the same position of I, I_p and I_o respectively;
5B, training: the samples of the training sample set are loaded in batches; x_i is input to the second and third branch networks and x_i^p is input to the first branch network, and the optimal network parameters θ* are sought according to the following optimization:

θ* = argmin_θ Σ_i ‖F(x_i, x_i^p; θ) − y_i‖₁

where F(x_i, x_i^p; θ) is the output produced by the multi-branch convolutional neural network model for the corresponding sample, and ‖·‖₁ denotes the 1-norm. During training, the weights of every layer of the network are updated with the Adam optimization algorithm; optionally, the initial learning rate may be set to a value between 0.001 and 0.005. The learning rate is adjusted in a piecewise-decreasing manner: specifically, the total number of training epochs is divided into four stages, and the learning rate of each stage is one half of that of the preceding stage.
The method provided by the embodiment of the invention was tested on HEVC-coded video. The HEVC reference software HM16.0 was used as the compression tool; test videos at their original size, and versions whose width and height were each reduced to 1/2 of the original, were compression-coded with quantization parameters QP 27, 32, 37 and 42 respectively. The intra-coded frame interval was set to 32, the QP offset of intra-coded frames was set to −7, and the remaining parameters retained the settings of the encoder_lowdelay_P_main.cfg configuration file. For the original-size compressed video, the bitrate and the peak signal-to-noise ratio (PSNR) with respect to the original uncompressed video were recorded; for the down-scaled, compression-coded video, the bitrate was recorded, the video was reconstructed to the original size with the model provided by the embodiment of the invention, and the PSNR of the reconstructed video with respect to the uncompressed video was computed. Taking the video compressed at the original size as the anchor, BD-rate was used as the measurement criterion to give the bitrate saving of the method at the same objective quality, and BD-PSNR was used as the measurement criterion to give the PSNR gain of the method at the same bitrate; the results are listed in Table 1. As the table shows, the method saves about 14% of the bitrate on average at the same objective quality, and provides about 0.77 dB of PSNR gain on average at the same bitrate.
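BD-rate, the measurement criterion used here, is not defined in the patent itself; the sketch below follows the standard Bjøntegaard calculation (a cubic fit of log-bitrate against PSNR, integrated over the overlapping quality range) and is an assumption about the metric, not code from the patent.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta-rate: average bitrate difference (%) between two
    rate-distortion curves at equal quality. Negative values mean the test
    codec needs less bitrate than the anchor."""
    lr1 = np.log10(rate_anchor)
    lr2 = np.log10(rate_test)
    p1 = np.polyfit(psnr_anchor, lr1, 3)   # log-rate as a cubic in PSNR
    p2 = np.polyfit(psnr_test, lr2, 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR range
    hi = min(max(psnr_anchor), max(psnr_test))
    int1, int2 = np.polyint(p1), np.polyint(p2)
    avg1 = (np.polyval(int1, hi) - np.polyval(int1, lo)) / (hi - lo)
    avg2 = (np.polyval(int2, hi) - np.polyval(int2, lo)) / (hi - lo)
    return (10 ** (avg2 - avg1) - 1) * 100.0
```

For example, a test curve with exactly twice the anchor's bitrate at every quality level yields a BD-rate of +100%, and an identical curve yields 0%.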
Table 1 experimental results of examples of the present invention
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any modification or replacement within the spirit and principles of the present invention shall be covered by the scope of protection of the present invention.
Claims (5)
1. A compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network, characterized by comprising the following specific steps:
(1) The multi-branch convolutional neural network for compressed video super-resolution reconstruction comprises three branches. The second branch network Sub-B and the third branch network Sub-C take the current decoded frame I of the compressed video as input. A reference image is selected, according to the number of intervening frames, from the two intra-coded frames located before and after frame I; for each block of the current decoded frame I, the block with the greatest similarity is searched for in the reference image, in block-processing fashion, and these similar blocks are assembled into a reconstructed image that serves as the input of the first branch network Sub-A;
(2) The first branch network and the second branch network have the same structure. Following the direction of forward data flow, the input data first passes through a convolutional layer containing 32 convolution kernels of size 3 × 3 with convolution stride 1, followed by N sequentially connected residual blocks. The output feature map of the last residual block of the first branch network and the output feature map of the last residual block of the second branch network are merged along the channel dimension into a feature map with 2N_C channels, where N_C is the number of channels of the output feature map of each of the first and second branch networks;
(3) The feature map formed by the channel merging in step (2) is passed through a convolutional layer containing r² convolution kernels of size 3 × 3 with convolution stride 1, and the output generated by this convolution operation is rearranged by periodic screening to obtain an up-sampled image H1, where r is the up-sampling factor;
(4) The input of the third branch network is passed through a convolutional layer containing r² convolution kernels of size 3 × 3 with convolution stride 1, and the output of this layer is rearranged by periodic screening to obtain an up-sampled image H2, where r is the up-sampling factor;
(5) The up-sampled images H1 and H2 are summed pixel by pixel, and the resulting output is taken as the result image, i.e., the super-resolution reconstruction of the compressed video image.
2. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network according to claim 1, wherein searching the reference image, in block-processing fashion, for the block with the greatest similarity to each block of the current decoded frame, and assembling these similar blocks into the reconstructed image that serves as the input of the first branch network Sub-A, specifically comprises:
2.1 Let H and W be the height and width of the current decoded image. Initialize the reconstructed image Ip with size H×W and all pixel values 0, and initialize the weight matrix C with size H×W and all element values 0;
2.2 Scan the reference image and the current decoded image from left to right and from top to bottom at equal intervals, with scanning steps s1 and s2 respectively. At each scanning position (u, v), extract the image block of size √d×√d with that position as the upper-left corner, subtract the gray-level mean of the block from each pixel, and convert the block into a row vector of d elements in row-first order. Each row vector from the reference image is added to the matrix T as one of its rows, and each row vector from the current decoded image is added to the matrix Q as one of its rows;
2.3 For each row vector q in the matrix Q, with Euclidean distance as the similarity measure, search T for the most similar row vector using the k-nearest-neighbor algorithm and denote it t. If the Euclidean distance between the vectors t and q is smaller than a preset threshold e, take the d elements of t in sequence, √d elements per row, to form a matrix of size √d×√d as the target block; otherwise, take the d elements of q in the same way to form a matrix of size √d×√d as the target block. Add the gray-level mean corresponding to q to each pixel value of the target block;
2.4 Let (u, v) be the scanning position of the image block corresponding to the row vector q in the matrix Q, and let b be the target block obtained in step 2.3. In the reconstructed image Ip, add to each pixel of the √d×√d region with (u, v) as its upper-left corner the value of the corresponding element of the target block b, and in the weight matrix C, add 1 to each element of the √d×√d region with (u, v) as its upper-left corner;
2.5 Repeat steps 2.3 and 2.4 for all row vectors in the matrix Q to obtain the reconstructed image Ip;
2.6 Divide the value of each pixel of the reconstructed image Ip by the value of the corresponding element of the weight matrix C to obtain the final reconstructed image.
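The reconstruction of claim 2 (steps 2.1-2.6) can be sketched as below. The block size, scanning steps, and threshold `e` are illustrative choices, not values fixed by the claim, and the k-nearest-neighbor search is reduced to a brute-force 1-nearest-neighbor scan for brevity.

```python
import numpy as np

def reconstruct(ref, cur, block=4, s1=2, s2=4, e=10.0):
    """For every block of the current decoded frame, find its nearest
    mean-removed match in the reference frame; overlapping target blocks are
    averaged via the weight matrix C (steps 2.2-2.6)."""
    H, W = cur.shape
    # step 2.2: mean-removed reference blocks as row vectors of matrix T
    T = []
    for u in range(0, H - block + 1, s1):
        for v in range(0, W - block + 1, s1):
            p = ref[u:u + block, v:v + block].astype(float)
            T.append((p - p.mean()).ravel())
    T = np.array(T)
    Ip = np.zeros((H, W))   # step 2.1: reconstructed image, all zeros
    C = np.zeros((H, W))    # step 2.1: weight matrix, all zeros
    for u in range(0, H - block + 1, s2):
        for v in range(0, W - block + 1, s2):
            p = cur[u:u + block, v:v + block].astype(float)
            m = p.mean()
            q = (p - m).ravel()
            # step 2.3: most similar row vector under Euclidean distance
            dist = np.linalg.norm(T - q, axis=1)
            src = T[dist.argmin()] if dist.min() < e else q
            b = src.reshape(block, block) + m   # add back the gray-level mean
            # step 2.4: accumulate the target block and the overlap count
            Ip[u:u + block, v:v + block] += b
            C[u:u + block, v:v + block] += 1
    # step 2.6: normalise by the per-pixel overlap count
    return Ip / np.maximum(C, 1)
```

When the reference and current frames are identical, every block finds an exact match (distance 0), so the reconstruction reproduces the input.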
3. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network of claim 1, wherein each of the N sequentially connected residual blocks in the first branch network and the second branch network has the same structure, comprising two convolutional layers and a ReLU layer; following the data flow in the forward pass, these are a convolutional layer containing 128 convolution kernels of size 3×3 with a convolution stride of 1, the ReLU layer, and a convolutional layer containing 32 convolution kernels of size 3×3 with a convolution stride of 1. Let x be the input of any residual block; the two convolutional layers and the ReLU layer map this input to F(x), and F(x) + x is taken as the output of the residual block.
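The residual structure of claim 3 can be sketched as follows. The two convolutions are passed in as callables (stand-ins for the 128- and 32-kernel 3×3 layers) so that only the skip connection F(x) + x, the part the claim specifies, is shown.

```python
import numpy as np

def relu(x):
    """Rectified linear unit applied between the two convolutions."""
    return np.maximum(x, 0.0)

def residual_block(x, conv1, conv2):
    """conv -> ReLU -> conv computes F(x); the block outputs F(x) + x.
    conv1/conv2 are hypothetical placeholders for the learned convolutions."""
    fx = conv2(relu(conv1(x)))
    return fx + x
```

Note that with identity convolutions the block still modifies its input wherever the ReLU clips negatives, while the skip path preserves the original signal.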
4. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network of claim 1, wherein the output of the convolutional layer is turned into an up-sampled image by periodic screening as follows: let the output of the convolutional layer be a feature map of size H×W×r^2. At each coordinate position (x, y), take the r^2 elements across all channels to form a vector; take r elements of this vector at a time, in sequence, as one row, forming a matrix of size r×r; place this matrix at position (rx, ry) of the up-sampled image. Repeat the above process for all coordinate positions of the feature map to form the up-sampled image.
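The periodic screening of claim 4 (commonly known as pixel shuffle) can be transcribed directly, one position at a time:

```python
import numpy as np

def periodic_screening(feat, r):
    """For every position (x, y) of an H x W x r^2 feature map, reshape the r^2
    channel values into an r x r tile (r elements per row, in sequence) and
    place the tile with its upper-left corner at (r*x, r*y) of the output."""
    H, W, C = feat.shape
    assert C == r * r
    out = np.empty((H * r, W * r), dtype=feat.dtype)
    for y in range(H):
        for x in range(W):
            out[r * y:r * (y + 1), r * x:r * (x + 1)] = feat[y, x].reshape(r, r)
    return out
```

For a 1×2×4 feature map and r = 2, each position's four channel values become one 2×2 tile of the 2×4 output.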
5. The compressed video super-resolution reconstruction method based on a multi-branch convolutional neural network of claim 1, wherein the parameters of each layer of the multi-branch convolutional neural network are determined by learning, as follows:
A. Preparing training samples: let I be the currently decoded frame in the compressed video, Io the uncompressed original image corresponding to I, and Ip the constructed reconstructed image. The i-th sample in the sample set used for training the multi-branch convolutional neural network has the form (x_i, x_i^p, y_i), where x_i, x_i^p and y_i are image blocks of the same position and size taken from I, Ip and Io respectively;
B. training: batch loading of samples in a training sample set willInput to the second and third branch networks,inputting the network parameters into the first branch network, and searching for the optimal network parameters according to the following optimization process:
whereinFor correspondences produced by multi-branch convolutional neural networksI.e. | non-calculation of the luminance1Represents a norm of 1; in the training process, the weight values of all layers of the network are updated by an Adam optimization algorithm, the learning rate is adjusted in a piecewise descending mode, specifically, the total training period number is divided into four stages, and the learning rate of the next stage is equal to one half of the learning rate of the previous stage.
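The training recipe of claim 5 — an L1 objective and a learning rate halved across four equal stages — can be sketched as below. The base learning rate of 1e-4 is an illustrative assumption; the claim does not fix its value.

```python
import numpy as np

def staged_lr(epoch, total_epochs, base_lr=1e-4):
    """Piecewise-descending schedule: the run is split into four equal stages
    and each stage uses half the previous stage's learning rate."""
    stage = min(epoch * 4 // total_epochs, 3)
    return base_lr / (2 ** stage)

def l1_loss(pred, target):
    """The 1-norm objective minimised during training."""
    return np.abs(pred - target).sum()
```

Over a 100-epoch run this yields rates of 1e-4, 5e-5, 2.5e-5 and 1.25e-5 for epochs 0-24, 25-49, 50-74 and 75-99 respectively.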
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110718467.XA CN113822801B (en) | 2021-06-28 | 2021-06-28 | Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113822801A true CN113822801A (en) | 2021-12-21 |
CN113822801B CN113822801B (en) | 2023-08-18 |
Family
ID=78924108
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110718467.XA Active CN113822801B (en) | 2021-06-28 | 2021-06-28 | Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822801B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114913072A (en) * | 2022-05-16 | 2022-08-16 | 中国第一汽车股份有限公司 | Image processing method and device, storage medium and processor |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100272184A1 (en) * | 2008-01-10 | 2010-10-28 | Ramot At Tel-Aviv University Ltd. | System and Method for Real-Time Super-Resolution |
CN108012157A (en) * | 2017-11-27 | 2018-05-08 | 上海交通大学 | Construction method for the convolutional neural networks of Video coding fractional pixel interpolation |
CN109862370A (en) * | 2017-11-30 | 2019-06-07 | 北京大学 | Video super-resolution processing method and processing device |
CN111866521A (en) * | 2020-07-09 | 2020-10-30 | 浙江工商大学 | Video image compression artifact removing method combining motion compensation and generation type countermeasure network |
Also Published As
Publication number | Publication date |
---|---|
CN113822801B (en) | 2023-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | Learning for video compression | |
JP7047119B2 (en) | Methods and equipment for residual code prediction in the conversion region | |
US6438168B2 (en) | Bandwidth scaling of a compressed video stream | |
Brunello et al. | Lossless compression of video using temporal information | |
CN107105278A (en) | The coding and decoding video framework that motion vector is automatically generated | |
CN115956363A (en) | Content adaptive online training method and device for post filtering | |
CN1695381A (en) | Sharpness enhancement in post-processing of digital video signals using coding information and local spatial features | |
CN109903351B (en) | Image compression method based on combination of convolutional neural network and traditional coding | |
JP2010534015A (en) | Image processing method and corresponding electronic device | |
CN113066022B (en) | Video bit enhancement method based on efficient space-time information fusion | |
EP1389875A2 (en) | Method for motion estimation adaptive to DCT block content | |
CN115131675A (en) | Remote sensing image compression method and system based on reference image texture migration | |
KR20090079286A (en) | Method and apparatus for estimating motion vector of moving images using fast full search block matching algorithm | |
JP2024513693A (en) | Configurable position of auxiliary information input to picture data processing neural network | |
CN111669588A (en) | Ultra-high definition video compression coding and decoding method with ultra-low time delay | |
Lin et al. | Multiple hypotheses based motion compensation for learned video compression | |
CN113822801B (en) | Compressed video super-resolution reconstruction method based on multi-branch convolutional neural network | |
CN115665413A (en) | Method for estimating optimal quantization parameter of image compression | |
CN116012272A (en) | Compressed video quality enhancement method based on reconstructed flow field | |
JP2004511978A (en) | Motion vector compression | |
CN108833920A (en) | A DVC side information fusion method based on optical flow and block matching |
KR20240024921A (en) | Methods and devices for encoding/decoding image or video | |
US20200128240A1 (en) | Video encoding and decoding using an epitome | |
Li et al. | You Can Mask More For Extremely Low-Bitrate Image Compression | |
CN115358954B (en) | Attention-guided feature compression method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||