CN111696038A - Image super-resolution method, device, equipment and computer-readable storage medium - Google Patents

Image super-resolution method, device, equipment and computer-readable storage medium

Info

Publication number
CN111696038A
CN111696038A (application CN202010454468.3A)
Authority
CN
China
Prior art keywords
module
channel
image
feature
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010454468.3A
Other languages
Chinese (zh)
Inventor
曹朝辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd filed Critical New H3C Big Data Technologies Co Ltd
Priority to CN202010454468.3A
Publication of CN111696038A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The present disclosure provides an image super-resolution method, apparatus, device, and computer-readable storage medium, which address the technical problem of low quality and efficiency in super-resolution processing of images. The disclosed deep-learning-based image super-resolution device adopts a fully convolutional network structure, so color images of arbitrary size can be input, providing a new approach for small objects in tasks such as face recognition and object detection. The method and device require no additional preprocessing of the input image and have good generalization capability. Compared with the existing RCAN model, the disclosure adds an attention mechanism over the feature map, adds adaptive max-pooling feature extraction to the inter-channel attention mechanism, and adds a directly upsampled copy of the input image as a basis for the output image, reducing the learning difficulty of the model.

Description

Image super-resolution method, device, equipment and computer-readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence and computer image processing technologies, and in particular, to an image super-resolution method, apparatus, device, and computer-readable storage medium.
Background
Image super resolution (Image Super Resolution) is a technique for restoring a low-resolution image or image sequence into a high-resolution image. High-resolution images provide more detail about the depicted content, which is essential in many practical applications. Image super-resolution has long been one of the most challenging tasks in the field of computer vision, and it also has wide industrial application, for example in image compression, medical imaging, remote sensing imaging, and public security.
At present, in the field of image super-resolution, the open-source deep-learning model RCAN (Residual Channel Attention Network) achieves good results. However, the model requires considerable additional preprocessing of the input image, generalizes poorly, and is difficult to train.
Disclosure of Invention
The present disclosure provides an image super-resolution method, apparatus, device and computer-readable storage medium for solving the technical problem of low quality and efficiency of super-resolution processing of images.
Based on an aspect of the embodiments of the present disclosure, the present disclosure provides an image super-resolution device, including: an input module, a first convolution module, K serially connected residual group modules, an amplification group module, an up-sampling module, and an output module, wherein each residual group module includes k residual feature attention modules and at least one second convolution module, the amplification group module includes a plurality of convolution modules and N/2 pixel recombination modules, and each convolution layer in the input module, the convolution modules, and the residual group modules is followed by an activation layer;
the input module is used for preprocessing an input image to obtain a characteristic vector of the input image;
the first convolution module is used for performing feature amplification on the feature vector output by the input module;
the residual error group module is used for extracting image characteristics of the characteristic vectors output by the first convolution module, and the extracted image characteristics comprise intra-channel characteristics and inter-channel characteristics of the image;
the amplification group module is used for performing feature extraction on the feature vector output by the residual group modules, enlarging the spatial size of the image features, and then compressing them into a 3-channel feature vector;
the up-sampling module is used for directly carrying out linear interpolation on the characteristic vector output by the input module, and the characteristic vector output by the up-sampling module is added with the characteristic vector output by the amplifying group module to be used as the input of the output module;
the output module is used for outputting an image with the resolution of the image enlarged by N times;
the residual feature attention module includes: a feature attention module and at least one convolution module;
the convolution module is used for extracting the features;
the characteristic attention module is used for selectively distributing the importance of the image characteristics through an inter-channel attention mechanism and an intra-channel attention mechanism, and enhancing the extraction capability aiming at the image characteristics; the feature vector output by the feature attention module is added with the input feature vector of the residual feature attention module to be used as the output feature vector of the residual feature attention module;
wherein K and K are positive integers, and N is the image magnification of the model.
Further, the feature attention module includes:
the same channel average module is used for respectively summing all element values of each channel of the input feature vectors and then averaging to respectively obtain an average feature vector in each channel;
the same-channel maximum value module is used for acquiring the maximum element value of each channel of the input feature vectors to obtain the intra-channel maximum value feature vector of each channel;
the first superposition module is used for superposing the average characteristic vector in the channel and the maximum characteristic vector in the channel;
the in-channel feature extraction module is used for extracting features of the in-channel feature vectors and activating function processing;
the inter-channel average module is used for summing all element values at the same position on each channel of the input feature vector and then averaging to obtain an inter-channel average feature vector;
the inter-channel maximum value module is used for acquiring the maximum element value of all elements at the same element position on each channel of the input feature vector to obtain an inter-channel maximum value feature vector;
the second superposition module is used for superposing the inter-channel average feature vector and the inter-channel maximum feature vector;
the inter-channel feature extraction module is used for extracting features of the inter-channel feature vectors and activating function processing;
and the attention output module is used for multiplying the input feature vector of the feature attention module element-wise by the output feature vector of the intra-channel feature extraction module, and then multiplying the result element-wise by the output feature vector of the inter-channel feature extraction module, to obtain an H×W×C-dimensional feature vector as the output feature vector of the whole feature attention module, wherein H is the pixel height of the image, W is the pixel width of the image, and C is the number of channels.
Based on the embodiment of the present disclosure, the present disclosure further provides an image super-resolution method, including:
preprocessing an input image to obtain a first feature vector of the input image, and performing feature amplification on the first feature vector to obtain a second feature vector;
extracting image features of the second feature vector by using a plurality of residual error groups to obtain a third feature vector, wherein the third feature vector comprises intra-channel features and inter-channel features of the image;
performing feature extraction, image feature size amplification and recompression processing on the third feature vector by using a plurality of pixel recombination modules and convolution layers to obtain a fourth feature vector;
and performing linear interpolation up-sampling processing on the first feature vector to obtain a fifth feature vector, and adding the fifth feature vector and the fourth feature vector to output an image with the resolution increased by N times.
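The four method steps above can be traced at the tensor-shape level. The following NumPy sketch is illustrative only: identity mappings and random projections stand in for the learned stages, and nearest-neighbor repetition stands in for the linear interpolation, so only the data flow and the final addition of the interpolated input to the learned residual are shown.

```python
import numpy as np

def upsample(img, n):
    # Nearest-neighbor stand-in for the linear-interpolation upsampling
    # that produces the fifth feature vector; factor n per spatial axis.
    return img.repeat(n, axis=0).repeat(n, axis=1)

H, W, C, N = 4, 4, 3, 2
rng = np.random.default_rng(0)
x = rng.random((H, W, C))                     # input image, 3 channels

first = x                                     # step 1: preprocessing -> first feature vector
second = first @ rng.random((C, 64))          # step 1: feature amplification to 64 channels
third = second                                # step 2: residual groups (identity stand-in)
fourth = upsample(third @ rng.random((64, C)), N)  # step 3: enlarge, compress to 3 channels
fifth = upsample(first, N)                    # step 4: directly interpolate the input
out = fifth + fourth                          # step 4: add -> N-times-enlarged image
assert out.shape == (H * N, W * N, C)
```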
Further, in the step of extracting image features of the second feature vector by using a plurality of residual groups to obtain a third feature vector:
in each residual group, extracting image characteristics through k residual characteristic attention modules, performing convolutional layer processing, and adding the processed image characteristics with input characteristic vectors of the residual group to obtain output characteristic vectors of the residual group;
in each residual error feature attention module, after feature extraction is carried out on input feature vectors through a plurality of convolutional layers, intra-channel features and inter-channel features of the image are extracted through the feature attention module, and finally the feature vectors output by the feature attention module are added with the input feature vectors of the residual error feature attention module and then output.
Further, the processing step of the feature attention module extracting the intra-channel features and the inter-channel features of the image is as follows:
after feature extraction is performed on the input feature vector using a plurality of convolution layers, summing all element values of each channel respectively and then averaging, to obtain the intra-channel average feature vector of each channel;
acquiring the maximum element value of each channel of the input feature vector to obtain the intra-channel maximum value feature vector of each channel;
superposing the average characteristic vector in the channel and the maximum characteristic vector in the channel;
carrying out feature extraction and activation function processing on the feature vectors in the channel;
summing all element values at the same position on each channel of the input feature vector, and then averaging to obtain an inter-channel average value feature vector;
acquiring maximum element values in all elements at the same element position on each channel of the input feature vector to obtain an inter-channel maximum value feature vector;
superposing the inter-channel average value feature vector and the inter-channel maximum value feature vector;
the system is used for extracting the features of the inter-channel feature vectors and activating function processing;
multiplying the input feature vector of the feature attention module element-wise by the output feature vector of the intra-channel feature extraction module, and then multiplying the result element-wise by the output feature vector of the inter-channel feature extraction module, to obtain an H×W×C-dimensional feature vector as the output feature vector of the whole feature attention module, wherein H is the pixel height of the image, W is the pixel width of the image, and C is the number of channels.
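The feature attention steps above can be sketched compactly in NumPy. This is an assumed simplification: the superposition followed by the 1×1 convolution stacks is collapsed into a fixed average of the pooled vectors, so only the pooling, gating, and element-wise multiplication structure of the module is shown.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def feature_attention(x):
    """x has shape (H, W, C); returns a gated tensor of the same shape."""
    # intra-channel branch: per-channel average and maximum -> gate of C values
    avg_c = x.mean(axis=(0, 1))                   # same-channel average, shape (C,)
    max_c = x.max(axis=(0, 1))                    # same-channel maximum, shape (C,)
    gate_c = sigmoid((avg_c + max_c) / 2)         # stand-in for superpose + convs + Sigmoid
    # inter-channel branch: per-position average and maximum -> gate of HxW values
    avg_s = x.mean(axis=2, keepdims=True)         # inter-channel average, (H, W, 1)
    max_s = x.max(axis=2, keepdims=True)          # inter-channel maximum, (H, W, 1)
    gate_s = sigmoid((avg_s + max_s) / 2)         # stand-in for superpose + convs + Sigmoid
    # element-wise products yield the H x W x C output feature vector
    return x * gate_c * gate_s

x_demo = np.random.default_rng(2).random((6, 5, 8))
y = feature_attention(x_demo)
assert y.shape == x_demo.shape
```

Since both gates lie strictly between 0 and 1, the output never exceeds the (non-negative) input, which is the selective-importance behavior the module is meant to provide.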
Based on an embodiment of the present disclosure, the present disclosure also provides an image super-resolution device, which includes a processor for executing the aforementioned image super-resolution method.
Based on an embodiment of the present disclosure, the present disclosure also provides a computer-readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the aforementioned image super-resolution method.
Based on an embodiment of the present disclosure, the present disclosure further provides a training method for the image super-resolution device. When performing model training on the image super-resolution device, the loss function used is the sum of an L1 loss function and an Ls loss function, calculated as follows:
L1(x, y) = |x − y|
Ls(x, y) = 1 − SSIM(x, y) = 1 − [(2μ_x μ_y + c1)(2σ_xy + c2)] / [(μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2)]
wherein μ_x and μ_y respectively represent the means of the predicted image x and the label image y, σ_x and σ_y respectively represent their standard deviations, σ_x² and σ_y² respectively represent their variances, σ_xy represents the covariance of the predicted image x and the label image y, and c1 and c2 are small constants that stabilize the division.
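The combined loss can be checked numerically. The following sketch is an assumed global-statistics variant: whole-image rather than windowed statistics, and the conventional SSIM constants c1 and c2 are choices of this sketch, not given by the text.

```python
import numpy as np

def l1_loss(x, y):
    # L1(x, y) = mean absolute difference between prediction and label
    return np.abs(x - y).mean()

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Ls(x, y) = 1 - SSIM(x, y) computed from global image statistics
    # (the constants c1, c2 and the non-windowed form are assumptions).
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim

def total_loss(x, y):
    # training loss: sum of the L1 and Ls terms
    return l1_loss(x, y) + ssim_loss(x, y)

x_img = np.random.default_rng(3).random((8, 8))
assert abs(total_loss(x_img, x_img)) < 1e-9   # identical images incur zero loss
```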
Further, an Adam training algorithm is adopted when the model of the image super-resolution device is trained.
The present disclosure provides a deep-learning-based image super-resolution device whose network adopts a fully convolutional structure, so color images of arbitrary size can be input, providing a new approach for small objects in tasks such as face recognition and object detection. The method and device require no additional preprocessing of the input image and have good generalization capability. Compared with the existing RCAN model, the disclosure adds an attention mechanism over the feature map, adds adaptive max-pooling feature extraction to the inter-channel attention mechanism, and adds a directly upsampled copy of the input image as a basis for the output image, reducing the learning difficulty of the model.
The feature attention module FA of the embodiment of the present disclosure adds an inter-channel attention mechanism and an intra-channel attention mechanism to the channel attention of the RCAN model, thereby enhancing the feature extraction capability of the model. It adds max pooling within the same channel, enhancing the model's feature selection among channels. It also adds max-pooling and average-pooling feature extraction across channels, obtaining two H×W×1 feature maps that are superposed, after which a 1×1 convolution compresses the channels to 1, enhancing the model's feature selection over different regions of the same feature map.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art from these drawings.
Fig. 1 is a schematic structural diagram of an image super-resolution device according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a residual feature attention module according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a feature attention FA module according to an embodiment of the present disclosure;
fig. 4 is a flowchart illustrating an image super-resolution method according to an embodiment of the present disclosure.
Detailed Description
The terminology used in the embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present disclosure. As used in the disclosed embodiments and claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information in the embodiments of the present disclosure, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of embodiments of the present disclosure. Depending on the context, the word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining".
Image super-resolution technology can restore a high-resolution image from a low-resolution image or image sequence. In image object detection and face recognition, large numbers of small targets constantly appear, and because of their small size they are difficult to detect in these tasks. Image super-resolution can enlarge small targets in an image, thereby reducing the difficulty of target recognition and improving its accuracy. A deep-learning-based image super-resolution pipeline first extracts image features with a convolutional neural network, up-samples those features, and adds the up-sampled features to the features obtained by directly applying linear interpolation to the original image, producing the final output image. The present disclosure optimizes the structure of the RCAN image super-resolution model.
The present disclosure provides an improved image super-resolution model that reduces the difficulty of training and improves the quality of the super-resolved images. For a supervised deep-learning neural network model, training and test samples must be prepared before the model is applied to an actual service scenario; the model is trained and tested, its parameters are determined through these processes, and an applicable model is finally obtained. For example, to obtain super-resolved images at 4x magnification from a model, low-resolution training samples and annotated high-resolution label samples are prepared first. The low-resolution samples are then input to the model, the model output is compared with the corresponding annotated high-resolution sample, and the model parameters are adjusted through a preset loss function. After iterative training over a large number of samples, once the prediction accuracy reaches a preset target threshold or a preset iteration count is reached, the model parameters are fixed and a model file is generated. In the testing stage, new images of any size can be input and super-resolved by the model to obtain a predicted high-resolution image; when the model passes testing against its design target, it can be deployed, or solidified in hardware, for a specific application scenario. For a model that does not meet the design requirements, reorganizing the training data or adjusting the model structure may need to be considered, followed by renewed training and testing, and a model meeting the requirements of practical application is finally determined through multiple iterations.
Fig. 1 is a schematic structural diagram of an image super-resolution device provided in an embodiment of the present disclosure, which, from a logical-function perspective, is equivalent to the image super-resolution model provided in the present disclosure. The image super-resolution device includes: an input module, a first convolution module, K serially connected residual group modules, an amplification group module, an up-sampling module, and an output module. Each residual group module includes k Residual Feature Attention Blocks (RFAB) and a second convolution module. The amplification group module includes a plurality of convolution modules and N/2 pixel recombination (PixelShuffle) modules. Here K and k are positive integers, N is the image magnification of the model and is an even number greater than 1, and n is the convolution kernel size. It is understood that the number K of residual groups, the number k of RFAB modules, the number N/2 of PixelShuffle modules, and the convolution kernel size need to be determined after comprehensively considering factors such as the expected prediction quality, model efficiency, and available resources; the numbers preset in this embodiment do not limit the scope of the present disclosure, and those skilled in the art can choose the numbers of ResidualGroup, RFAB, and PixelShuffle modules and the convolution kernel size according to practical application requirements.
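The amplification path of the structure above can be sketched in NumPy. Random matrices stand in for the convolution modules (an assumption of this sketch), and the pixel-recombination step follows the usual channel-to-space rearrangement, applied N/2 times for an N-times enlargement.

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Rearrange (H, W, C*r*r) into (H*r, W*r, C): the PixelShuffle step."""
    H, W, Crr = x.shape
    C = Crr // (r * r)
    x = x.reshape(H, W, r, r, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H * r, W * r, C)

H, W, N = 5, 4, 4                              # N = 4x magnification -> N/2 = 2 stages
rng = np.random.default_rng(1)
feat = rng.random((H, W, 64))                  # feature vector entering the amplification group
for _ in range(N // 2):
    feat = feat @ rng.random((feat.shape[-1], 64 * 4))  # conv stand-in: expand channels 4x
    feat = pixel_shuffle(feat, r=2)                     # trade channels for 2x spatial size
feat = feat @ rng.random((64, 3))              # compress to a 3-channel feature vector
assert feat.shape == (H * N, W * N, 3)
```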
And the input module is used for preprocessing the input image to obtain the characteristic vector of the input image. For example, the input image is a 3-channel RGB color image, and each channel is converted into an expression form of a feature vector by the input module.
And the first convolution module is used for performing feature amplification on the feature vector output by the input module. In an embodiment of the present disclosure, this module is implemented by an n×n, C-channel Conv convolution layer, where n×n is the convolution kernel size and C is the number of output channels; that is, a 3×3 convolution kernel performs a convolution operation on the feature vector of the input image and outputs a 64-channel feature vector.
And the residual group modules are used for extracting image features from the feature vector output by the first convolution module, so that the network gradient does not vanish as depth increases. In the embodiment of the present disclosure, the model includes K residual group modules (ResidualGroup) connected sequentially in series: the feature vector output by the first convolution module is the input of the first residual group, the output of the first residual group is the input of the second residual group, and so on, and the output of the K-th residual group is the input of the third convolution module.
In an embodiment of the present disclosure, each residual group module includes k residual feature attention modules RFAB and at least one second convolution module. The k residual feature attention modules are connected in series. The feature vector output by the first convolution module is input to the first residual feature attention module; the feature vector output by the k-th residual feature attention module is input to the second convolution module; the feature vector output by the second convolution module is added element-wise to the feature vector output by the first convolution module (or by the previous residual group module), and the result serves as the input of the next residual group module; the feature vector output by the K-th residual group module serves as the input of the third convolution module. Addition here means summing the elements at corresponding positions of two vectors. For example, given V1 = [[1,2],[3,4]] and V2 = [[1,1],[1,2]], V1 + V2 = [[2,3],[4,6]].
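The element-wise addition used by all the residual connections in the device can be checked directly:

```python
import numpy as np

# the worked example from the description: elements at corresponding
# positions of the two feature vectors are summed
V1 = np.array([[1, 2], [3, 4]])
V2 = np.array([[1, 1], [1, 2]])
assert (V1 + V2 == np.array([[2, 3], [4, 6]])).all()
```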
Fig. 2 is a schematic structural diagram of residual feature attention modules (RFAB modules), according to an embodiment of the disclosure, each of the residual feature attention modules includes: at least one convolution module and a Feature Attention (FA) module. In an embodiment of the present disclosure, for better extracting image features, the residual feature attention module uses two convolution modules, including a sixth convolution module and a seventh convolution module.
And the sixth convolution module is used for extracting the features, so that the gradient of the network does not disappear along with the increase of the depth. The feature vector input to the RFAB module can be determined by the location of the RFAB module, and can be the feature vector output by the first convolution module, the feature vector output by the previous RFAB module, or the feature vector output by the previous residual group module.
The seventh convolution module is also used for extracting features, so that the network is not degraded along with the increase of the depth. The input of the seventh convolution module is the feature vector output by the sixth convolution module, and the feature vector output by the seventh convolution module is the input of the feature attention module. If the residual feature attention module comprises only one layer of convolution module, the feature vector output by the convolution module is directly input to the feature attention module.
The characteristic attention module is used for selectively distributing the importance of the image characteristics through an inter-channel attention mechanism and an intra-channel attention mechanism, and enhancing the extraction capability aiming at the image characteristics; and adding the feature vector output by the feature attention module and the input feature vector of the residual feature attention module to obtain an output feature vector of the residual feature attention module.
Fig. 3 is a schematic structural diagram of the feature attention FA module provided in an embodiment of the present disclosure. Unlike the channel attention module in the RCAN model network structure, the feature attention FA module additionally provides a same-channel maximum module, an inter-channel average module, and an inter-channel maximum module on the basis of the same-channel average module.
And the same channel average value module is used for respectively summing all element values of each channel of the input feature vector and then averaging to respectively obtain the average feature vector in each channel. The module can extract the comprehensive importance of the features in the same channel.
And the same channel maximum value module is used for acquiring the maximum element value of each channel of the input feature vector to obtain the intra-channel maximum value feature vector of each channel. The module can extract important features in the same channel.
And the first superposition module is used for superposing the intra-channel average feature vector and the intra-channel maximum feature vector to generate the intra-channel feature vector, that is, it superposes, along the channel axis, the feature vector output by the same-channel average module with the feature vector output by the same-channel maximum module. For example, if each of those outputs is a 1×1×C feature vector, superposing them generates a 1×1×2C feature vector. Superposition means concatenating two feature vectors; for example, superposing the vector [1,2,3,4] with [2,3,4,5] gives [1,2,3,4,2,3,4,5], which is equivalent to expanding the channel count of the feature vector from C to 2C.
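The superposition step is plain channel-axis concatenation, which can be verified on the shapes described above:

```python
import numpy as np

C = 64
avg = np.random.default_rng(4).random((1, 1, C))   # output of the same-channel average module
mx = np.random.default_rng(5).random((1, 1, C))    # output of the same-channel maximum module
stacked = np.concatenate([avg, mx], axis=2)        # superposition along the channel axis
assert stacked.shape == (1, 1, 2 * C)              # 1x1xC + 1x1xC -> 1x1x2C

# the flat-vector example from the text behaves the same way
assert np.concatenate([[1, 2, 3, 4], [2, 3, 4, 5]]).tolist() == [1, 2, 3, 4, 2, 3, 4, 5]
```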
The in-channel feature extraction module is used for extracting features of the in-channel feature vectors and activating function processing;
in an embodiment of the present disclosure, the in-channel feature extraction module includes two convolution layers and one activation function layer, which are a ninth convolution module and a tenth convolution module.
And the ninth convolution module is used for compressing the features. In an embodiment of the present disclosure, the convolution kernel used by the ninth convolution module is 1×1 and the number of output channels is C/D, where D is the channel-compression multiple.
And the tenth convolution module is used for extracting the features. In an embodiment of the present disclosure, a convolution kernel used by the tenth convolution module is 1 × 1, and the number of channels is C.
And the in-channel activation function module is used for processing the feature vector output by the tenth convolution module by using an activation function and changing the feature data into a range from 0 to 1. The activation function may be Sigmoid, tanh, etc., and the disclosure is not limited thereto.
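Because the inputs are 1×1 spatially, the ninth and tenth 1×1 convolutions act on the superposed 2C-channel vector as plain matrix products. The following sketch assumes a compression multiple D = 16 and a ReLU between the two convolutions; both are assumptions of this sketch, not specified by the text.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

C, D = 64, 16                                  # channel count; compression multiple D is assumed
rng = np.random.default_rng(6)
w9 = rng.standard_normal((2 * C, C // D)) * 0.1   # ninth conv: 1x1, compresses 2C -> C/D
w10 = rng.standard_normal((C // D, C)) * 0.1      # tenth conv: 1x1, restores C channels

v = rng.random(2 * C)                          # superposed 1x1x2C intra-channel vector
gate = sigmoid(np.maximum(v @ w9, 0) @ w10)    # conv -> ReLU (assumed) -> conv -> Sigmoid
assert gate.shape == (C,)                      # one gating value in (0, 1) per channel
```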
And the inter-channel average module is used for summing all element values at the same position on each channel of the input feature vector and then averaging to obtain the inter-channel average feature vector. The module can extract the comprehensive importance of the features among the channels. And the inter-channel maximum value module is used for acquiring the maximum element value of all elements at the same element position on each channel of the input feature vector to obtain the inter-channel maximum value feature vector. The module can extract important features among channels.
And the second superposition module is used for superposing the inter-channel average feature vector and the inter-channel maximum feature vector to generate the inter-channel feature vector. This module superposes, along the channel axis, the feature vector output by the inter-channel average module with the feature vector output by the inter-channel maximum module. For example, each of those outputs is an H×W×1 feature vector, and superposing them generates an H×W×2 feature vector.
The inter-channel feature extraction module is used for extracting features of the inter-channel feature vectors and activating function processing;
in an embodiment of the present disclosure, the inter-channel feature extraction module includes two convolution layers, an eleventh convolution module and a twelfth convolution module, and one activation function layer.
And the eleventh convolution module is used for carrying out channel expansion on the input features and then carrying out feature extraction.
And the twelfth convolution module is used for extracting the features. In an embodiment of the present disclosure, the twelfth convolution module is a 1 × 1 convolution layer, through which the 4-channel image feature vector is compressed into a 1-channel image feature vector, so that the information of the 4 channels is compressed onto one channel.
And the activation function module is used for processing the feature vector output by the twelfth convolution module by using the activation function and changing the feature data into the range from 0 to 1. The activation function may be Sigmoid, tanh, etc., and the disclosure is not limited thereto.
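As an illustrative aid (not part of the patent disclosure), the inter-channel branch — per-position mean and maximum over channels, superposition into an H × W × 2 map, expansion by the eleventh convolution, compression to one channel by the twelfth convolution, then Sigmoid — can be sketched in NumPy. The 2 → 4 expansion is assumed (consistent with the twelfth module compressing 4 channels to 1), and ws1, ws2 are hypothetical 1 × 1 convolution weights:

```python
import numpy as np

def inter_channel_attention(x, ws1, ws2):
    # x: (C, H, W) feature map
    avg_map = x.mean(axis=0)                    # mean over channels -> (H, W)
    max_map = x.max(axis=0)                     # max over channels -> (H, W)
    z = np.stack([avg_map, max_map])            # superposition -> (2, H, W)
    # 1x1 convolutions expressed as channel-mixing einsums
    h = np.maximum(np.einsum('oc,chw->ohw', ws1, z), 0.0)  # expand -> (4, H, W)
    s = np.einsum('oc,chw->ohw', ws2, h)        # compress -> (1, H, W)
    return 1.0 / (1.0 + np.exp(-s))             # Sigmoid: weights in (0, 1)

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 5, 6))
ws1 = rng.standard_normal((4, 2)) * 0.1         # hypothetical weights
ws2 = rng.standard_normal((1, 4)) * 0.1         # hypothetical weights
mask = inter_channel_attention(x, ws1, ws2)
```

The returned map holds one weight per spatial position, later broadcast over all channels of the input feature vector.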
An attention output module, which is used for multiplying the input feature vector of the feature attention module by the feature vector output by the in-channel feature extraction module, and then multiplying the result by the feature vector output by the inter-channel feature extraction module, to obtain a feature vector of H × W × C dimensions as the output feature vector of the whole feature attention module. The ⊗ symbol in FIG. 3 denotes multiplication of elements at corresponding positions in two feature vectors; for example, multiplying the vector [1,2,3,4] by the vector [2,3,4,5] yields [2,6,12,20].
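The element-wise product and its broadcast over an H × W × C feature vector can be checked numerically (a NumPy sketch, not part of the disclosure; the attention-weight values are arbitrary):

```python
import numpy as np

# Element-wise product of two vectors, matching the example above.
a = np.array([1, 2, 3, 4])
b = np.array([2, 3, 4, 5])
prod = a * b                               # -> [2, 6, 12, 20]

# The same operation applied to the attention output: per-channel weights
# and per-position weights broadcast over an H x W x C feature vector.
x = np.ones((8, 8, 16))                    # H x W x C input features
ca = np.full(16, 0.5)                      # intra-channel attention weights (C,)
sa = np.full((8, 8, 1), 0.25)              # inter-channel attention weights (H, W, 1)
out = x * ca * sa                          # broadcasts to H x W x C, value 0.125
```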
The feature attention module FA of the embodiment of the present disclosure adds an inter-channel attention mechanism and an enhanced intra-channel attention mechanism to the CA (Channel Attention) mechanism of the RCAN model, thereby strengthening the feature extraction capability of the model. Within the channels, maximum pooling of each channel is added: the per-channel maximum features are superimposed with the per-channel average features on the channel dimension to form a 1 × 1 × 2C vector, which is then compressed to C channels by a 1 × 1 convolution operation, enhancing the feature selection capability of the model among the channels. Between the channels, maximum pooling and average pooling are likewise added: two H × W × 1 feature maps are obtained and superimposed, and a 1 × 1 convolution operation compresses the channels to 1; this operation enhances the feature selection of the model over different regions of the same feature map.
And the amplifying group module is used for carrying out feature extraction on the feature vector output by the residual group module, amplifying the image feature size and then compressing the image feature size into a 3-channel feature vector.
In an embodiment of the present disclosure, the amplification group module includes N/2 pixel recombination modules, a third convolution module, and a fifth convolution module, where N is the image super-resolution magnification factor.
And the third convolution module is used for extracting the image characteristics.
And the pixel recombination module is used for enlarging the characteristic size of the image.
Each pixel reorganization module comprises at least one fourth convolution module and a first pixel reorganization unit. And the feature vector output by the third convolution module or the feature vector output by the previous-stage pixel recombination module is used as the input of a fourth convolution module, and the feature vector output by the fourth convolution module is used as the input of the first pixel recombination unit. And the feature vector output by the N/2 th pixel recombination module is used as the input of a fifth convolution module, wherein N represents the magnification of the model to the input image.
And the fifth convolution module is used for compressing the image features into 3-channel characteristic vectors to prepare for output.
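As an illustrative aid (not part of the patent disclosure), the pixel recombination unit can be sketched as a pixel-shuffle rearrangement that trades channels for spatial size; the learned fourth convolution module that precedes it is omitted here:

```python
import numpy as np

def pixel_shuffle(x, r):
    # Rearranges a (C*r*r, H, W) feature vector into (C, H*r, W*r),
    # trading channel depth for spatial resolution.
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)          # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# A 4-channel 1x1 input becomes a 1-channel 2x2 output.
out = pixel_shuffle(np.arange(4.0).reshape(4, 1, 1), 2)   # -> [[[0,1],[2,3]]]
```

Chaining two such ×2 units (the N/2 = 2 case) yields the ×4 enlargement of the feature size.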
And the up-sampling module is used for directly performing linear interpolation on the input image, and the feature vector obtained by adding the feature vector output by the up-sampling module and the feature vector output by the amplifying group module is used as the input of the output module.
And the output module is used for outputting the final image data, i.e., the high-resolution image with the image resolution enlarged by N times.
It should be noted that, in the embodiment of the present disclosure, a ReLU activation layer is connected after each convolution layer in the input module, the convolution modules, and the residual group modules. However, to keep the model structure concise, these ReLU activation layers are omitted from the drawings.
In an embodiment of the present disclosure, when training the image super-resolution model provided by the present disclosure, the loss function used includes two parts: an L1 loss function (denoted by L1) and a structural similarity SSIM (Structural Similarity Index) loss function (denoted by Ls):
L=L1+Ls
wherein the calculation formulas of L1 and Ls are as follows:
L1(x, y) = |x − y|
Ls(x, y) = 1 − SSIM(x, y) = 1 − [(2μxμy + C1)(2σxy + C2)] / [(μx² + μy² + C1)(σx² + σy² + C2)]
wherein μx and μy respectively represent the means of the predicted image x and the label image y; σx and σy respectively represent the standard deviations of the predicted image x and the label image y; σx² and σy² respectively represent the variances of the predicted image x and the label image y; σxy represents the covariance of the predicted image x and the label image y; and C1 and C2 are small constants that stabilize the division.
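A single-window NumPy sketch of the combined loss, assuming the conventional 1 − SSIM form and the standard stabilizing constants c1, c2 (in practice SSIM is usually computed over local windows; this global version is for illustration only and is not part of the disclosure):

```python
import numpy as np

def l1_ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # x, y: predicted and label images with values in [0, 1].
    l1 = np.abs(x - y).mean()                          # L1 term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                    # variances sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()          # covariance sigma_xy
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    ls = 1.0 - ssim                                    # SSIM loss term
    return l1 + ls

rng = np.random.default_rng(2)
pred = rng.random((16, 16))
label = rng.random((16, 16))
loss = l1_ssim_loss(pred, label)
zero_loss = l1_ssim_loss(label, label)   # identical images -> loss 0
```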
Compared with the RCAN model, the loss function of the model provided by the disclosure adds the SSIM loss term, which strengthens the optimization of the model in terms of structural similarity and makes the output closer to pictures of real scenes.
When the image super-resolution model provided by the disclosure is trained, the Adam algorithm is adopted. Adam is an improved variant of the stochastic gradient descent algorithm and can promote faster convergence of the model. The learning rate is set to 0.0001, and L2 regularization is employed to reduce overfitting of the model.
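A minimal NumPy sketch of one Adam update with L2 regularization folded into the gradient (not part of the disclosure; the weight-decay coefficient 1e-4 is a hypothetical value, since the disclosure specifies only that L2 regularization is used):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999,
              eps=1e-8, weight_decay=1e-4):
    # One Adam update; the L2 regularization term is added to the gradient.
    grad = grad + weight_decay * w
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 from w = 1: the iterate should move toward 0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
```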
Fig. 4 is a schematic flowchart of an image super-resolution method according to an embodiment of the present disclosure, please refer to the image super-resolution model structure of fig. 1, where the method includes:
step 410, preprocessing the input image to obtain a first feature vector of the input image, and performing feature amplification on the first feature vector to obtain a second feature vector.
This step corresponds to the operation of the input module and the first convolution module.
Step 420, extracting image features of the second feature vector by using a plurality of residual error groups to obtain a third feature vector, wherein the third feature vector comprises intra-channel features and inter-channel features of the image;
this step corresponds to the function of K concatenated residual group modules.
Step 430, performing feature extraction and image feature size amplification on the third feature vector by using a plurality of pixel recombination modules and convolution layers, and then performing recompression processing to obtain a fourth feature vector;
this step corresponds to the function of the amplification group module. The compression in this step refers to channel compression: the number of channels of the image feature vector is finally compressed to 3.
Step 440, performing linear interpolation up-sampling processing on the first feature vector to obtain a fifth feature vector, and adding the fifth feature vector and the fourth feature vector to output an image with the resolution increased by N times;
this step corresponds to the processing step from the up-sampling module to the output module.
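The four steps above can be sketched end to end (a structural illustration only, not part of the disclosure): nearest-neighbor upsampling stands in for the linear interpolation of step 440, and the learned residual path of steps 420-430 is stubbed with zeros when no trained network is supplied.

```python
import numpy as np

def upsample_nearest(x, n):
    # Stand-in for the linear-interpolation up-sampling branch (step 440).
    return x.repeat(n, axis=0).repeat(n, axis=1)

def super_resolve(img, n=2, residual_path=None):
    feat1 = img.astype(np.float64)               # step 410: preprocessing (stub)
    feat5 = upsample_nearest(feat1, n)           # step 440: interpolated branch
    # Steps 420-430 (residual groups + amplification group) are learned;
    # a zero stub is used here when no trained network is supplied.
    feat4 = residual_path(feat1) if residual_path else np.zeros_like(feat5)
    return feat5 + feat4                         # step 440: global skip sum

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])                     # single-channel for brevity
sr = super_resolve(img, n=2)                     # -> 4 x 4 output
```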
In an embodiment of the present disclosure, step 420 further includes: in each residual group, extracting image features through k residual feature attention modules, processing the extracted features with a convolution layer, and adding the result to the input feature vector of the residual group to obtain the output feature vector of the residual group;
in an embodiment of the present disclosure, in each residual feature attention module, after feature extraction is performed on an input feature vector through a plurality of convolutional layers, intra-channel features and inter-channel features of an image are extracted through the feature attention module, and finally, a feature vector output by the feature attention module is added to an input feature vector of the residual feature attention module and then output.
In an embodiment of the present disclosure, the processing steps of the feature attention module in each residual feature attention module for extracting the intra-channel features and the inter-channel features of the image are as follows:
step 421, after feature extraction is performed on the input feature vector by using a plurality of convolution layers, averaging after summing all element values of each channel respectively to obtain an average feature vector in each channel respectively;
step 422, obtaining the maximum element value of each channel of the input feature vector to obtain the intra-channel maximum value feature vector of each channel;
step 423, superposing the average characteristic vector in the channel and the maximum characteristic vector in the channel;
step 424, feature extraction and function activation processing are carried out on the feature vectors in the channel;
step 425, summing all element values at the same position on each channel of the input feature vector, and then averaging to obtain an inter-channel average feature vector;
Step 426, acquiring the maximum element value among all elements at the same element position on each channel of the input feature vector to obtain an inter-channel maximum value feature vector;
Step 427, superposing the inter-channel average value feature vector and the inter-channel maximum value feature vector;
Step 428, performing feature extraction and activation function processing on the inter-channel feature vector;
step 429, multiplying the input feature vector of the feature attention module by the output feature vector of the intra-channel feature extraction module, and then multiplying the multiplied input feature vector by the feature vector output by the inter-channel feature extraction module to obtain an H × W × C-dimensional feature vector as the output feature vector of the whole feature attention module, where H is the image pixel height, W is the image pixel width, and C is the number of channels.
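Steps 421-429 can be condensed into one NumPy sketch (not part of the disclosure); all weight matrices are hypothetical stand-ins for the 1 × 1 convolutions, with an assumed reduction factor D = 4 and an assumed inter-channel expansion 2 → 4:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_attention(x, wc1, wc2, ws1, ws2):
    # x: (H, W, C) input feature vector
    # Steps 421-424: intra-channel attention over per-channel statistics.
    desc = np.concatenate([x.mean(axis=(0, 1)), x.max(axis=(0, 1))])   # (2C,)
    ca = sigmoid(wc2 @ np.maximum(wc1 @ desc, 0.0))                    # (C,)
    # Steps 425-428: inter-channel attention over per-position statistics.
    sp = np.stack([x.mean(axis=2), x.max(axis=2)], axis=2)             # (H, W, 2)
    h = np.maximum(np.einsum('hwc,oc->hwo', sp, ws1), 0.0)             # (H, W, 4)
    sa = sigmoid(np.einsum('hwc,oc->hwo', h, ws2))                     # (H, W, 1)
    # Step 429: element-wise products, broadcast over H x W x C.
    return x * ca * sa

rng = np.random.default_rng(3)
x = rng.standard_normal((3, 3, 8))
out = feature_attention(
    x,
    rng.standard_normal((2, 16)) * 0.1,   # wc1: 2C -> C/D (D = 4)
    rng.standard_normal((8, 2)) * 0.1,    # wc2: C/D -> C
    rng.standard_normal((4, 2)) * 0.1,    # ws1: 2 -> 4
    rng.standard_normal((1, 4)) * 0.1)    # ws2: 4 -> 1
```

Because both attention weights lie in (0, 1), every output element is attenuated relative to the input, which is what "selectively distributing the importance of the image features" amounts to numerically.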
The above description is only an example of the present disclosure and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the scope of the claims of the present disclosure.

Claims (9)

1. An image super-resolution device, comprising: an input module, a first convolution module, K residual group modules connected in series, an amplification group module, an up-sampling module and an output module, wherein each residual group module comprises k residual feature attention modules and at least one second convolution module, the amplification group module comprises a plurality of convolution modules and N/2 pixel recombination modules, and an activation layer is connected after each convolution layer in the input module, the convolution modules and the residual group modules;
the input module is used for preprocessing an input image to obtain a characteristic vector of the input image;
the first convolution module is used for performing feature amplification on the feature vector output by the input module;
the residual error group module is used for extracting image characteristics of the characteristic vectors output by the first convolution module, and the extracted image characteristics comprise intra-channel characteristics and inter-channel characteristics of the image;
the amplifying group module is used for amplifying the image characteristic size after extracting the characteristics of the characteristic vector output by the residual group module and then compressing the image characteristic size into a characteristic vector of 3 channels;
the up-sampling module is used for directly carrying out linear interpolation on the characteristic vector output by the input module, and the characteristic vector output by the up-sampling module is added with the characteristic vector output by the amplifying group module to be used as the input of the output module;
the output module is used for outputting an image with the resolution of the image enlarged by N times;
the residual feature attention module includes: a feature attention module and at least one convolution module;
the convolution module is used for extracting the features;
the characteristic attention module is used for selectively distributing the importance of the image characteristics through an inter-channel attention mechanism and an intra-channel attention mechanism, and enhancing the extraction capability aiming at the image characteristics; the feature vector output by the feature attention module is added with the input feature vector of the residual feature attention module to be used as the output feature vector of the residual feature attention module;
wherein K and K are positive integers, and N is the image magnification of the model.
2. The apparatus of claim 1, wherein the feature attention module comprises:
the same channel average module is used for respectively summing all element values of each channel of the input feature vectors and then averaging to respectively obtain an average feature vector in each channel;
the same-channel maximum value module is used for acquiring the maximum element value of each channel of the input feature vectors to obtain the intra-channel maximum value feature vector of each channel;
the first superposition module is used for superposing the average characteristic vector in the channel and the maximum characteristic vector in the channel;
the in-channel feature extraction module is used for extracting features of the in-channel feature vectors and activating function processing;
the inter-channel average module is used for summing all element values at the same position on each channel of the input feature vector and then averaging to obtain an inter-channel average feature vector;
the inter-channel maximum value module is used for acquiring the maximum element value of all elements at the same element position on each channel of the input feature vector to obtain an inter-channel maximum value feature vector;
the second superposition module is used for superposing the inter-channel average value eigenvector and the inter-channel maximum value eigenvector;
the inter-channel feature extraction module is used for extracting features of the inter-channel feature vectors and activating function processing;
and the attention output module is used for multiplying the input characteristic vector of the characteristic attention module by the output characteristic vector of the in-channel characteristic extraction module and then multiplying the multiplied input characteristic vector by the output characteristic vector of the inter-channel characteristic extraction module to obtain an H multiplied by W multiplied by C dimension characteristic vector as the output characteristic vector of the whole characteristic attention module, wherein H is the pixel height of the image, W is the pixel width of the image and C is the number of channels.
3. An image super-resolution method, characterized in that the method comprises:
preprocessing an input image to obtain a first feature vector of the input image, and performing feature amplification on the first feature vector to obtain a second feature vector;
extracting image features of the second feature vector by using a plurality of residual error groups to obtain a third feature vector, wherein the third feature vector comprises intra-channel features and inter-channel features of the image;
performing feature extraction, image feature size amplification and recompression processing on the third feature vector by using a plurality of pixel recombination modules and convolution layers to obtain a fourth feature vector;
and performing linear interpolation up-sampling processing on the first feature vector to obtain a fifth feature vector, and adding the fifth feature vector and the fourth feature vector to output an image with the resolution increased by N times.
4. The method of claim 3, wherein the step of extracting the image feature of the second feature vector by using a plurality of residual groups to obtain a third feature vector comprises:
in each residual group, extracting image characteristics through k residual characteristic attention modules, performing convolutional layer processing, and adding the processed image characteristics with input characteristic vectors of the residual group to obtain output characteristic vectors of the residual group;
in each residual error feature attention module, after feature extraction is carried out on input feature vectors through a plurality of convolutional layers, intra-channel features and inter-channel features of the image are extracted through the feature attention module, and finally the feature vectors output by the feature attention module are added with the input feature vectors of the residual error feature attention module and then output.
5. The method of claim 4, wherein the processing step of the feature attention module extracting the intra-channel features and the inter-channel features of the image is:
after feature extraction is carried out on the input feature vector by using a plurality of convolution layers, averaging after all element values of each channel are respectively summed, and obtaining an average feature vector in each channel;
acquiring the maximum element value of each channel of the input feature vector to obtain the intra-channel maximum value feature vector of each channel;
superposing the average characteristic vector in the channel and the maximum characteristic vector in the channel;
carrying out feature extraction and activation function processing on the feature vectors in the channel;
summing all element values at the same position on each channel of the input feature vector, and then averaging to obtain an inter-channel average value feature vector;
acquiring maximum element values in all elements at the same element position on each channel of the input feature vector to obtain an inter-channel maximum value feature vector;
superposing the inter-channel average value feature vector and the inter-channel maximum value feature vector;
the system is used for extracting the features of the inter-channel feature vectors and activating function processing;
multiplying the input characteristic vector of the characteristic attention module by the output characteristic vector of the in-channel characteristic extraction module, and then multiplying the multiplied input characteristic vector by the output characteristic vector of the inter-channel characteristic extraction module to obtain an H multiplied by W multiplied by C dimension characteristic vector as the output characteristic vector of the whole characteristic attention module, wherein H is the pixel height of the image, W is the pixel width of the image, and C is the number of channels.
6. An image super-resolution device characterized by comprising a processor for performing the method of claim 3, 4 or 5.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of claim 3, 4 or 5.
8. A training method for the image super-resolution device, wherein when the model training is performed for the image super-resolution device of claim 1 or 2, the loss function used is the sum of the L1 loss function and the Ls loss function, and the calculation formula of the L1 loss function and the Ls loss function is as follows:
L1(x, y) = |x − y|
Ls(x, y) = 1 − SSIM(x, y) = 1 − [(2μxμy + C1)(2σxy + C2)] / [(μx² + μy² + C1)(σx² + σy² + C2)]
wherein μx and μy respectively represent the means of the predicted image x and the label image y; σx and σy respectively represent the standard deviations of the predicted image x and the label image y; σx² and σy² respectively represent the variances of the predicted image x and the label image y; and σxy represents the covariance of the predicted image x and the label image y.
9. The training method of claim 8, wherein an Adam training algorithm is used in model training of the image super-resolution device.
CN202010454468.3A 2020-05-26 2020-05-26 Image super-resolution method, device, equipment and computer-readable storage medium Pending CN111696038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010454468.3A CN111696038A (en) 2020-05-26 2020-05-26 Image super-resolution method, device, equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111696038A true CN111696038A (en) 2020-09-22

Family

ID=72478321


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419201A (en) * 2020-12-04 2021-02-26 珠海亿智电子科技有限公司 Image deblurring method based on residual error network
CN113674153A (en) * 2021-08-10 2021-11-19 Oppo广东移动通信有限公司 Image processing chip, electronic device, image processing method, and storage medium
CN114049254A (en) * 2021-10-29 2022-02-15 华南农业大学 Low-pixel ox-head image reconstruction and identification method, system, equipment and storage medium
TWI799265B (en) * 2022-05-12 2023-04-11 瑞昱半導體股份有限公司 Super resolution device and method
CN117291846A (en) * 2023-11-27 2023-12-26 北京大学第三医院(北京大学第三临床医学院) OCT system applied to throat microsurgery and image denoising method
CN117291846B (en) * 2023-11-27 2024-02-27 北京大学第三医院(北京大学第三临床医学院) OCT system applied to throat microsurgery and image denoising method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination