CN113393377B - Single-frame image super-resolution method based on video coding - Google Patents

Single-frame image super-resolution method based on video coding

Info

Publication number
CN113393377B
Authority
CN
China
Prior art keywords
network
sub-blocks
super-resolution
Prior art date
Legal status
Active
Application number
CN202110541900.7A
Other languages
Chinese (zh)
Other versions
CN113393377A (en)
Inventor
吴庆波
李鹏飞
李宏亮
孟凡满
许林峰
潘力立
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110541900.7A
Publication of CN113393377A
Application granted
Publication of CN113393377B
Legal status: Active
Anticipated expiration

Classifications

    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06N3/045 Combinations of networks
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T7/40 Analysis of texture
    • G06T9/002 Image coding using neural networks
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a single-frame image super-resolution method based on video coding. Prior information obtained directly from video coding is used to process the sub-blocks in different parts of an image in a targeted manner: sub-blocks with more complex textures are processed by a more complex network, and an adaptive convolution module is designed to process sub-blocks with different coding modes, so that the network is more targeted, recovers different detail information for different textures, and improves the accuracy of the super-resolution result. The invention shares the parameters of the few-channel network into the many-channel network, i.e. the super-resolution of the whole picture is realized by different layers of one backbone network; the relatively simple, shallow, few-channel network processes the relatively large sub-blocks with smoother textures, which reduces the time required by the super-resolution process.

Description

Single-frame image super-resolution method based on video coding
Technical Field
The invention relates to the technical field of image processing, in particular to a single-frame image super-resolution method based on video coding.
Background
Image super-resolution is the process of converting an input low-resolution visual image into a high-resolution visual image. One important concern of recent super-resolution work has been to propose networks that accelerate the inference process. One branch uses fewer parameters to realize efficient super-resolution at higher speed. For example, the early FSRCNN performs feature extraction directly on the input image and then passes the feature map through an up-sampling network to construct the super-resolution image. Similarly, the recent work CARN designs a residual network with group convolution to process input pictures quickly. Another branch increases the complexity of the network model and the number of model branches and trains them separately for different kinds of input, such as ClassSR.
ClassSR trains and infers on low-resolution input regions of different complexity with neural networks of different complexity. Because most areas of an image only need to pass through a network with a relatively small amount of computation, this improves the running speed of the inference stage to a certain extent. Specifically, the method divides a picture into small blocks of 32 × 32 pixels and, with a pre-trained classification network, sorts the small blocks into three classes according to their texture complexity: simple, medium and difficult. The different classes of blocks are fed to backbone networks with different numbers of channels.
In a traditional super-resolution network, the feature map is extracted directly from the whole picture, so the network cannot learn the distinct characteristics of each region well; the same convolution kernels are applied to all regions, and the texture details of the recovered image therefore deviate from those of the real image. Moreover, because the texture complexity differs across image regions, applying complex processing to low-detail regions only adds unnecessary computation. On the other hand, the classify-then-process design proposed by ClassSR uses three networks whose parameters are not shared, which costs considerable training time and computing power and increases the complexity of the network. Besides these disadvantages, most current super-resolution methods ignore the help that the prior information originally carried with the image can provide to the super-resolution process. A super-resolution method is therefore needed that has a small amount of network computation and improves the consistency between the recovered texture details and the real image.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a single-frame image super-resolution method based on video coding, which addresses the problems mentioned in the background art.
In order to achieve the above object, the invention provides the following technical solution: a single-frame image super-resolution method based on video coding, comprising the following steps:
S1, using the prior information of each frame of the video, the low-resolution image I^LR in the video is divided into sub-blocks of 4×4, 8×8, 16×16 and 32×32 pixels according to the H.265 video coding information; for the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8}, the corresponding coding prediction mode M_pre can be obtained, and a corresponding Gaussian distribution model G_m is generated according to the coding mode (an illustrative sub-block grouping sketch is given after step S8);
S2, the 16×16 and 32×32 pixel sub-blocks I^LR_{16,32} are used to train a channel-adaptive backbone network CAB; each convolution block in the CAB is divided into two channel groups, conv1 and conv2, and in each iteration forward and backward propagation uses only the parameters of conv1 while the parameters of conv2 are not used; the perceptual loss L_per and the MSE loss L_mse are minimized to obtain the final super-resolution output I^SR_{16,32};
S3, the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8} are used to train the channel-adaptive backbone network CAB, and forward propagation uses the parameters of both conv1 and conv2; since conv1 has already learned a feature extraction mode for smooth information during the training on I^LR_{16,32}, the parameters of conv1 are fixed during back propagation and only the parameters of conv2 are updated; the perceptual loss L_per and the MSE loss L_mse are minimized to obtain the final super-resolution output I^SR_{4,8};
S4, on the basis of S2 and S3, the whole network is trained; the parameters of the channel-adaptive backbone network CAB are fixed during training, the perceptual loss L_per and the MSE loss L_mse are minimized and the remaining network parameters are updated; the feature extraction module of the branch corresponding to I^LR_{4,8} is trained first and preliminarily extracts the features of I^LR_{4,8};
S5, for the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8}, when the corresponding branch network is trained, the sub-blocks are input to the network in the relative order of their index i (i = 0, 1, 2, …, 15); each sub-block is denoted I^LR_i, and the adjacent sub-blocks of the same size that form a group of four with it are marked by their indices, where i denotes the sub-block number;
S6, taking (0,0) as the centre, the Gaussian model generated in step S1 is sampled over the height and width of the convolution kernel to obtain a matrix W_m with the same width and height as the kernel; W_m is point-multiplied (element-wise) with the convolution layer Conv in the adaptive convolution module ACB to weight the kernel, i.e. Conv' = Conv ⊙ W_m; the weighted kernel Conv' is then used in an ordinary convolution with the input sub-blocks I^LR_{4,8}, and after the ACB module a feature map that focuses more on the image texture features is obtained;
S7, after every four adjacent sub-blocks pass through the adaptive texture processing module, they are spliced according to their positions in the original image and passed to the backbone network, giving a feature map whose width and height are twice those of a single sub-block, i.e. the four sub-block feature maps are arranged as a 2×2 block matrix according to their positions in the original image;
S8, the network is further fine-tuned by minimizing L_total, completing the super-resolution process of the picture.
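As a non-limiting illustration of the grouping performed in step S1, the following Python sketch assumes that an H.265 decoder has already exported, for each coding unit, its position, size and intra prediction mode; this export format (the cu_info tuples and field names) is an assumption made for illustration and is not part of a standard decoder API.

```python
# Illustrative sketch only: cu_info is a hypothetical per-coding-unit export
# of (x, y, size, mode) from an H.265 decoder.
import numpy as np

def group_sub_blocks(lr_image: np.ndarray, cu_info):
    """Split I_LR into the two branch inputs used in steps S1-S3.

    cu_info: iterable of (x, y, size, mode), where size is 4, 8, 16 or 32 and
    mode is the intra prediction mode of that coding unit.
    """
    small, large = [], []                 # 4x4/8x8 blocks vs. 16x16/32x32 blocks
    for x, y, size, mode in cu_info:
        block = lr_image[y:y + size, x:x + size]
        if size in (4, 8):
            small.append((block, mode))   # keep M_pre so G_m can be built per sub-block
        else:
            large.append((block, None))   # smooth blocks: the mode is not needed
    return small, large
```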
Preferably, the coding prediction mode M_pre of step S1 includes a DC prediction mode, a planar prediction mode and angular prediction modes.
Preferably, the coding prediction mode M_pre controls the covariance matrix C of G_m:
G_m = Gauss(C, θ | M_pre)
The covariance matrix is adjusted so that the maximum of the generated Gaussian model coincides with the texture angle of the mode, adaptively focusing on the image texture features. When M_pre is the DC mode or the planar mode, a Gaussian model with a unit covariance matrix is set; for a sub-block whose M_pre is an angular mode with angle θ, an initial covariance matrix C is set and rotated by the angle θ, which is expressed as:
G_m = A(θ) C A(θ)^T
where A(θ) is the two-dimensional rotation matrix
A(θ) = [[cos θ, −sin θ], [sin θ, cos θ]]
and A(θ)^T denotes the transpose of the matrix A(θ).
Preferably, the fine-tuning in step S8 is specifically as follows:
The MSE loss L_mse is used to minimize the difference between the super-resolution result of the input low-resolution image and the true high-resolution image:
L_mse = (1/N) Σ ||I^SR − I^HR||²
where N denotes the number of pixels and I^SR denotes the outputs of the different branches, each computed against the true high-resolution image I^HR of the corresponding branch. A perceptual loss term is added to the loss function so that the L2 distance between the feature values of the generated picture and of the target picture after passing through a CNN network is as small as possible, making the generated picture semantically more similar to the target picture:
L_per = ||f(I^SR) − f(I^HR)||_2
where f denotes the CNN network, specifically a VGG-16 network.
A larger loss weight ω2 is used for the 4×4 and 8×8 sub-blocks, and a smaller weight ω1 is used for the larger, smoother 16×16 and 32×32 sub-blocks.
The loss function L_total is expressed as:
L_total = ω1 (L_mse^{16,32} + L_per^{16,32}) + ω2 (L_mse^{4,8} + L_per^{4,8})
where ω1 is 0.5 and ω2 is 1.
The invention has the beneficial effects that:
1) The invention uses prior information that can be obtained directly from video coding to process different sub-blocks of an image in a targeted manner; sub-blocks with more complex textures are processed by a more complex network, and an adaptive convolution module is designed to process sub-blocks with different coding modes in a targeted manner, so that the network is more targeted, recovers different detail information for different textures, and improves the accuracy of the super-resolution result.
2) The invention shares the parameters of the few-channel network into the many-channel network, i.e. the super-resolution of the whole picture is realized by different layers of one backbone network; the relatively simple, shallow, few-channel network processes the relatively large sub-blocks with smoother textures, which reduces the time required by the super-resolution process.
Drawings
FIG. 1 is a schematic diagram of a network structure according to an embodiment of the present invention;
FIG. 2 is a block diagram of a network adaptive texture processing module according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an input sequence for training 4 × 4 and 8 × 8 pixel sub-blocks according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to figs. 1-3, the present invention provides the following technical solution: a single-frame image super-resolution method based on video coding, whose network structure is shown in fig. 1, comprising the following steps:
S1, using the prior information of each frame of the video, the low-resolution image I^LR in the video is divided into sub-blocks of 4×4, 8×8, 16×16 and 32×32 pixels according to the H.265 video coding information; for the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8}, the corresponding coding prediction mode M_pre can be obtained; the coding prediction mode M_pre includes a DC prediction mode, a planar prediction mode and angular prediction modes, and a corresponding Gaussian distribution model G_m is generated according to the coding mode.
The coding prediction mode M_pre controls the covariance matrix C of G_m:
G_m = Gauss(C, θ | M_pre)
The covariance matrix is adjusted so that the maximum of the generated Gaussian model coincides with the texture angle of the mode, so that the model adaptively focuses on the image texture features. When M_pre is the DC mode or the planar mode, a Gaussian model with a unit covariance matrix is set; for a sub-block whose M_pre is an angular mode with angle θ, an initial covariance matrix C is set and rotated by the angle θ, which is expressed as:
G_m = A(θ) C A(θ)^T
where A(θ) is the two-dimensional rotation matrix
A(θ) = [[cos θ, −sin θ], [sin θ, cos θ]]
and A(θ)^T denotes the transpose of the matrix A(θ).
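A minimal numerical sketch of this Gaussian model follows, assuming a diagonal initial covariance for the angular modes; the σ values, function names and the 3×3 kernel size are illustrative assumptions rather than values specified by this embodiment.

```python
# Sketch of G_m and of its sampling on a kernel-sized grid (used later in step S6).
import numpy as np

def gaussian_model(mode: str, theta: float = 0.0, sigmas=(2.0, 0.5)) -> np.ndarray:
    """Return the 2x2 covariance of G_m for a coding prediction mode M_pre."""
    if mode in ("DC", "planar"):
        return np.eye(2)                      # unit covariance matrix
    # angular mode: rotate an anisotropic covariance by theta
    c, s = np.cos(theta), np.sin(theta)
    A = np.array([[c, -s], [s, c]])           # two-dimensional rotation matrix A(theta)
    C = np.diag(np.square(sigmas))            # initial covariance matrix C (assumed diagonal)
    return A @ C @ A.T                        # G_m = A(theta) C A(theta)^T

def sample_gaussian(cov: np.ndarray, k: int = 3) -> np.ndarray:
    """Sample the Gaussian density on a k x k grid centred at (0, 0)."""
    coords = np.arange(k) - (k - 1) / 2.0     # e.g. [-1, 0, 1] for k = 3
    xx, yy = np.meshgrid(coords, coords)
    pts = np.stack([xx, yy], axis=-1)         # (k, k, 2) grid of sample positions
    inv = np.linalg.inv(cov)
    expo = -0.5 * np.einsum("...i,ij,...j->...", pts, inv, pts)
    w = np.exp(expo)
    return w / w.max()                        # peak-normalised weighting matrix W_m
```

The matrix returned by sample_gaussian plays the role of the kernel-sized weighting matrix used by the adaptive convolution module in step S6.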
S2, the 16×16 and 32×32 pixel sub-blocks I^LR_{16,32} are used to train the channel-adaptive backbone network (CAB) in fig. 1. In order to process different kinds of input efficiently, each convolution block in the CAB is divided into two channel groups, conv1 and conv2; in each iteration, forward and backward propagation uses only the parameters of conv1 and the parameters of conv2 are not used; the perceptual loss L_per and the MSE loss L_mse are minimized to obtain the final super-resolution output I^SR_{16,32}.
S3, the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8} are used to train the channel-adaptive backbone network CAB in fig. 1. Since complex texture information has to be processed with more network parameters, the parameters of both conv1 and conv2 are used for forward propagation; conv1 has already learned a feature extraction mode for smooth information during the training on I^LR_{16,32}, so the parameters of conv1 are fixed during back propagation and only the parameters of conv2 are updated; the perceptual loss L_per and the MSE loss L_mse are again minimized to obtain the final super-resolution output I^SR_{4,8}.
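The following PyTorch sketch shows one possible realisation of a single channel-adaptive convolution block, assuming the two channel groups are implemented as parallel convolutions whose outputs are concatenated; the channel counts and module names are illustrative assumptions, and the subsequent layers of the CAB would have to be channel-adaptive in the same way.

```python
# Sketch of one channel-adaptive convolution block (conv1 / conv2 split).
import torch
import torch.nn as nn

class ChannelAdaptiveConv(nn.Module):
    def __init__(self, in_ch: int, ch1: int = 16, ch2: int = 48):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, ch1, 3, padding=1)   # always-used "shallow" channels
        self.conv2 = nn.Conv2d(in_ch, ch2, 3, padding=1)    # extra channels for 4x4/8x8 blocks
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, use_conv2: bool) -> torch.Tensor:
        feats = [self.conv1(x)]
        if use_conv2:                        # detailed (4x4/8x8) sub-blocks
            feats.append(self.conv2(x))
        return self.act(torch.cat(feats, dim=1))

# Step S2: smooth blocks -> forward/backward through conv1 only (use_conv2=False).
# Step S3: detailed blocks -> use_conv2=True, with conv1 frozen, e.g.:
#   for p in block.conv1.parameters():
#       p.requires_grad = False
```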
S4, on the basis of S2 and S3, the whole network is trained. The parameters of the channel-adaptive backbone network CAB are fixed during training, only the perceptual loss L_per and the MSE loss L_mse are minimized, and the remaining network parameters are updated; the feature extraction module of the branch corresponding to I^LR_{4,8} is trained first and preliminarily extracts the features of I^LR_{4,8}.
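A minimal sketch of this training schedule, assuming `cab` is the backbone trained in S2/S3 and `branch_fe` is the feature-extraction module of the 4×4/8×8 branch; both names and the learning rate are illustrative assumptions.

```python
# Step S4: freeze the CAB, optimise only the remaining (branch) parameters.
import torch

def build_s4_optimizer(cab: torch.nn.Module, branch_fe: torch.nn.Module, lr: float = 1e-4):
    for p in cab.parameters():          # CAB parameters stay fixed in S4
        p.requires_grad = False
    # only the branch feature-extraction parameters are updated with L_per + L_mse
    return torch.optim.Adam(branch_fe.parameters(), lr=lr)
```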
S5, for the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8}, when the corresponding branch network is trained, the sub-blocks are input to the network in the relative order of their index i (i = 0, 1, 2, …, 15); each sub-block is denoted I^LR_i, and the adjacent sub-blocks of the same size that form a group of four with it are marked by their indices, where i denotes the sub-block number (as shown in fig. 3, when the sub-block with i = 5 is input, its adjacent sub-blocks are those with i = 6, 7 and 8).
S6, taking (0,0) as the centre, the Gaussian model generated in step S1 is sampled over the height and width of the convolution kernel to obtain a matrix W_m with the same width and height as the kernel. W_m is point-multiplied (element-wise) with the convolution layer Conv in the adaptive convolution module ACB (shown in fig. 2) to weight the kernel, i.e. Conv' = Conv ⊙ W_m. The weighted kernel Conv' is then convolved with the input sub-blocks I^LR_{4,8} in an ordinary convolution, and after the ACB module a feature map that focuses more on the image texture features is obtained.
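A minimal PyTorch sketch of such an adaptive convolution follows; the parameter names, kernel size and weight initialisation are illustrative assumptions.

```python
# Sketch of the adaptive convolution module ACB: the sampled Gaussian matrix is
# broadcast over the kernel weights before an ordinary convolution is applied.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACB(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x: torch.Tensor, gauss: torch.Tensor) -> torch.Tensor:
        # gauss: (k, k) matrix sampled from G_m; the point (element-wise) multiplication
        # weights the kernel towards the texture direction implied by the coding mode.
        w = self.weight * gauss.view(1, 1, *gauss.shape)
        return F.conv2d(x, w, self.bias, padding=self.weight.shape[-1] // 2)
```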
S7, after every four adjacent sub-blocks pass through the adaptive texture processing module, they are spliced according to their positions in the original image and passed to the backbone network, giving a feature map whose width and height are twice those of a single sub-block, i.e. the four sub-block feature maps are arranged as a 2×2 block matrix according to their positions in the original image.
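A minimal sketch of this splice, assuming the four feature maps are laid out as top-left, top-right, bottom-left and bottom-right in the original image; the argument names are illustrative assumptions.

```python
# Tile four sub-block feature maps into one map with twice the height and width.
import torch

def splice_2x2(f_tl: torch.Tensor, f_tr: torch.Tensor,
               f_bl: torch.Tensor, f_br: torch.Tensor) -> torch.Tensor:
    top = torch.cat([f_tl, f_tr], dim=-1)      # concatenate along width
    bottom = torch.cat([f_bl, f_br], dim=-1)
    return torch.cat([top, bottom], dim=-2)    # concatenate along height
```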
S8, in order to focus more on the detail information, the network is further fine-tuned by minimizing L_total, completing the super-resolution process of the picture.
In the training process described above, the MSE loss L_mse is used to minimize the difference between the super-resolution result of the input low-resolution image and the true high-resolution image:
L_mse = (1/N) Σ ||I^SR − I^HR||²
where N denotes the number of pixels and I^SR denotes the outputs of the different branches, each computed against the true high-resolution image I^HR of the corresponding branch. Because the pixel-by-pixel MSE loss differs from real visual perception, a perceptual loss term is added to the loss function so that the distance between the feature values of the generated picture and of the target picture after passing through a CNN network is as small as possible, which makes the generated picture semantically more similar to the target picture (relative to a pixel-level loss function):
L_per = ||f(I^SR) − f(I^HR)||_2
Here the CNN network denoted by f is chosen to be VGG-16.
Furthermore, since the super-resolution quality of an image is most apparent in its details, more attention is paid to the reconstruction of the texture-complex parts, i.e. the 4×4 and 8×8 sub-blocks; these sub-blocks are therefore given the larger loss weight ω2, while the larger, smoother 16×16 and 32×32 sub-blocks use the smaller weight ω1.
The loss function L_total is therefore expressed as:
L_total = ω1 (L_mse^{16,32} + L_per^{16,32}) + ω2 (L_mse^{4,8} + L_per^{4,8})
where ω1 is 0.5 and ω2 is 1.
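The following sketch computes this loss with PyTorch, assuming torchvision's pretrained VGG-16 features (cut at relu3_3) for f and 3-channel inputs; the layer cut-off and the exact feature distance are illustrative assumptions, since the embodiment only specifies that f is a VGG-16 network.

```python
# Sketch of L_total = w1 * (L_mse + L_per) for smooth blocks + w2 * (...) for detailed blocks.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

_vgg = vgg16(pretrained=True).features[:16].eval()   # up to relu3_3, weights frozen
for p in _vgg.parameters():
    p.requires_grad = False

def branch_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    l_mse = F.mse_loss(sr, hr)                  # pixel-wise term
    l_per = F.mse_loss(_vgg(sr), _vgg(hr))      # squared L2 distance of VGG-16 features
    return l_mse + l_per

def total_loss(sr_large, hr_large, sr_small, hr_small,
               w1: float = 0.5, w2: float = 1.0) -> torch.Tensor:
    # w1 weights the smooth 16x16/32x32 branch, w2 the detailed 4x4/8x8 branch
    return w1 * branch_loss(sr_large, hr_large) + w2 * branch_loss(sr_small, hr_small)
```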
The invention uses prior information that can be obtained directly from video coding to process different sub-blocks of an image in a targeted manner: sub-blocks with more complex textures are processed by a more complex network, and an adaptive convolution module is designed to process sub-blocks with different coding modes in a targeted manner, so that the network is more targeted, recovers different detail information for different textures, and improves the accuracy of the super-resolution result. The invention shares the parameters of the few-channel network into the many-channel network, i.e. the super-resolution of the whole picture is realized by different layers of one backbone network; the relatively simple, shallow, few-channel network processes the relatively large sub-blocks with smoother textures, which reduces the time required by the super-resolution process.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the described embodiments may still be modified, or some of their features may be replaced by equivalents, without departing from the spirit and scope of the invention.

Claims (4)

1. A single-frame image super-resolution method based on video coding, characterized by comprising the following steps:
S1, using the prior information of each frame of the video, dividing the low-resolution image I^LR in the video into sub-blocks of 4×4, 8×8, 16×16 and 32×32 pixels according to the H.265 video coding information; for the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8}, obtaining the corresponding coding prediction mode M_pre and generating a corresponding Gaussian distribution model G_m according to the coding mode;
S2, using the 16×16 and 32×32 pixel sub-blocks I^LR_{16,32} to train a channel-adaptive backbone network CAB, each convolution block in the CAB being divided into two channel groups conv1 and conv2; in each iteration, performing forward and backward propagation using only the parameters of conv1 and not the parameters of conv2, and minimizing the perceptual loss L_per and the MSE loss L_mse to obtain the final super-resolution output I^SR_{16,32};
S3, using the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8} to train the channel-adaptive backbone network CAB, performing forward propagation using the parameters of conv1 and conv2, wherein conv1 has already learned a feature extraction mode for smooth information during the training on I^LR_{16,32}; fixing the parameters of conv1 during back propagation and updating only the parameters of conv2, and minimizing the perceptual loss L_per and the MSE loss L_mse to obtain the final super-resolution output I^SR_{4,8};
S4, on the basis of S2 and S3, training the whole network, fixing the parameters of the channel-adaptive backbone network CAB during training, minimizing the perceptual loss L_per and the MSE loss L_mse to update the remaining network parameters, and first training the feature extraction module of the branch corresponding to I^LR_{4,8}, which preliminarily extracts the features of I^LR_{4,8};
S5, for the 4×4 and 8×8 pixel sub-blocks I^LR_{4,8}, when the corresponding branch network is trained, inputting the sub-blocks into the network in the relative order of their index i, i = 0, 1, 2, …, 15, each sub-block being denoted I^LR_i and the adjacent sub-blocks of the same size that form a group of four with it being marked by their indices, wherein i denotes the sub-block number;
S6, taking (0,0) as the centre, sampling the Gaussian model generated in step S1 over the height and width of the convolution kernel to obtain a matrix with the same width and height as the kernel, performing a point multiplication with the convolution layer Conv in the adaptive convolution module ACB to weight the kernel, then performing an ordinary convolution of the weighted convolution kernel with the input sub-blocks I^LR_{4,8}, and obtaining, after the ACB module, a feature map that focuses more on the image texture features;
S7, after every four adjacent sub-blocks pass through the adaptive texture processing module, splicing them according to their positions in the original image and passing them to the backbone network to obtain a feature map whose width and height are twice those of a single sub-block, the four sub-block feature maps being arranged as a 2×2 block matrix;
S8, further fine-tuning the network by minimizing L_total to complete the super-resolution process of the picture.
2. The single-frame image super-resolution method based on video coding according to claim 1, wherein the coding prediction mode M_pre of step S1 includes a DC prediction mode, a planar prediction mode and angular prediction modes.
3. The single-frame image super-resolution method based on video coding according to claim 1, wherein the coding prediction mode M_pre controls the covariance matrix C of G_m:
G_m = Gauss(C, θ | M_pre)
the covariance matrix is adjusted so that the maximum of the generated Gaussian model coincides with the texture angle of the mode, adaptively focusing on the image texture features; when M_pre is the DC mode or the planar mode, a Gaussian model with a unit covariance matrix is set; for a sub-block whose M_pre is an angular mode with angle θ, an initial covariance matrix C is set and rotated by the angle θ, which is expressed as:
G_m = A(θ) C A(θ)^T
wherein A(θ) is the two-dimensional rotation matrix
A(θ) = [[cos θ, −sin θ], [sin θ, cos θ]]
and A(θ)^T denotes the transpose of the matrix A(θ).
4. The single-frame image super-resolution method based on video coding according to claim 1, wherein the fine-tuning in step S8 is specifically as follows:
the MSE loss L_mse is used to minimize the difference between the super-resolution result of the input low-resolution image and the true high-resolution image:
L_mse = (1/N) Σ ||I^SR − I^HR||²
wherein N denotes the number of pixels and I^SR denotes the outputs of the different branches, each computed against the true high-resolution image I^HR of the corresponding branch; a perceptual loss term is added to the loss function so that the L2 distance between the feature values of the generated picture and of the target picture after passing through a CNN network is as small as possible, making the generated picture semantically more similar to the target picture:
L_per = ||f(I^SR) − f(I^HR)||_2
wherein f denotes the CNN network, the CNN network being a VGG-16 network;
a larger loss weight ω2 is used for the 4×4 and 8×8 sub-blocks, and a smaller weight ω1 is used for the larger, smoother 16×16 and 32×32 sub-blocks;
the loss function L_total is expressed as:
L_total = ω1 (L_mse^{16,32} + L_per^{16,32}) + ω2 (L_mse^{4,8} + L_per^{4,8})
wherein ω1 is 0.5 and ω2 is 1.
CN202110541900.7A (filed 2021-05-18, priority 2021-05-18): Single-frame image super-resolution method based on video coding; Active; granted as CN113393377B (en)

Priority Applications (1)

Application Number: CN202110541900.7A; Priority Date: 2021-05-18; Filing Date: 2021-05-18; Title: Single-frame image super-resolution method based on video coding

Applications Claiming Priority (1)

Application Number: CN202110541900.7A; Priority Date: 2021-05-18; Filing Date: 2021-05-18; Title: Single-frame image super-resolution method based on video coding

Publications (2)

Publication Number Publication Date
CN113393377A CN113393377A (en) 2021-09-14
CN113393377B (en) 2022-02-01

Family

ID=77617993

Family Applications (1)

Application Number: CN202110541900.7A (Active, granted as CN113393377B); Title: Single-frame image super-resolution method based on video coding

Country Status (1)

Country Link
CN (1) CN113393377B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115512B (en) * 2022-06-13 2023-10-03 荣耀终端有限公司 Training method and device for image superdivision network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102835105A (en) * 2010-02-19 2012-12-19 斯凯普公司 Data compression for video
CN110956671A (en) * 2019-12-12 2020-04-03 电子科技大学 Image compression method based on multi-scale feature coding
CN112449140A (en) * 2019-08-29 2021-03-05 华为技术有限公司 Video super-resolution processing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969577B (en) * 2019-11-29 2022-03-11 北京交通大学 Video super-resolution reconstruction method based on deep double attention network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102835105A (en) * 2010-02-19 2012-12-19 斯凯普公司 Data compression for video
CN112449140A (en) * 2019-08-29 2021-03-05 华为技术有限公司 Video super-resolution processing method and device
CN110956671A (en) * 2019-12-12 2020-04-03 电子科技大学 Image compression method based on multi-scale feature coding

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A2RMNet: Adaptively Aspect Ratio Multi-Scale Network for Object Detection in Remote Sensing Images; Heqian Qiu et al.; Remote Sensing; 2019-07-04; pp. 2-23 *
Region Adaptive Two-Shot Network for Single Image Dehazing; Hui Li et al.; IEEE Xplore; 2020-06-19; pp. 1-6 *
Sparse representation moving target tracking based on compressed features; Zhang Hongmei et al.; Journal of Zhengzhou University (Engineering Science); 2016-06-03, No. 03; pp. 24-29 *
Fast coding unit partition algorithm for intra prediction in high efficiency video coding; Qi Meibin et al.; Journal of Electronics & Information Technology; 2014-07-31; Vol. 36, No. 7; pp. 1699-1704 *

Also Published As

Publication number Publication date
CN113393377A (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN112435282B (en) Real-time binocular stereo matching method based on self-adaptive candidate parallax prediction network
CN110610526B (en) Method for segmenting monocular image and rendering depth of field based on WNET
CN109949221B (en) Image processing method and electronic equipment
CN110569851A (en) real-time semantic segmentation method for gated multi-layer fusion
CN113989129A (en) Image restoration method based on gating and context attention mechanism
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN113255813A (en) Multi-style image generation method based on feature fusion
CN112184582B (en) Attention mechanism-based image completion method and device
CN112288630A (en) Super-resolution image reconstruction method and system based on improved wide-depth neural network
CN116958534A (en) Image processing method, training method of image processing model and related device
CN113344869A (en) Driving environment real-time stereo matching method and device based on candidate parallax
CN113393377B (en) Single-frame image super-resolution method based on video coding
CN116580184A (en) YOLOv 7-based lightweight model
CN114841859A (en) Single-image super-resolution reconstruction method based on lightweight neural network and Transformer
Tang et al. AutoEnhancer: Transformer on U-Net architecture search for underwater image enhancement
CN110110775A (en) A kind of matching cost calculation method based on hyper linking network
CN112200752B (en) Multi-frame image deblurring system and method based on ER network
CN113554032A (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN117237190A (en) Lightweight image super-resolution reconstruction system and method for edge mobile equipment
CN116596822A (en) Pixel-level real-time multispectral image fusion method based on self-adaptive weight and target perception
CN116485892A (en) Six-degree-of-freedom pose estimation method for weak texture object
CN114627293A (en) Image matting method based on multi-task learning
CN114638870A (en) Indoor scene monocular image depth estimation method based on deep learning
CN113436094A (en) Gray level image automatic coloring method based on multi-view attention mechanism

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant