CN112330539A - Super-resolution image reconstruction method, device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112330539A
Authority
CN
China
Prior art keywords
feature
sequence
image
image sequence
deep
Prior art date
Legal status
Withdrawn
Application number
CN202011080553.4A
Other languages
Chinese (zh)
Inventor
赵元
任文琦
牛犇
温伟磊
王培阳
沈海峰
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202011080553.4A
Publication of CN112330539A
Status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a super-resolution image reconstruction method, a super-resolution image reconstruction device, a storage medium and electronic equipment. A shallow feature image sequence is determined by inputting a low-resolution image into a first convolution layer, and a plurality of corresponding intermediate feature image sequences are determined by inputting the shallow feature image sequence into a plurality of sequentially connected feature extraction modules. An intermediate feature matrix determined according to each intermediate feature image sequence is input into a first attention module to obtain a first deep feature image sequence, and a first intermediate image sequence is input into a second attention module to obtain a second deep feature image sequence. Finally, a high-resolution image is determined according to the shallow feature image sequence and each deep feature image sequence. The embodiment of the invention can convert a low-resolution image into a high-resolution image through super-resolution image reconstruction. Meanwhile, the importance of intermediate-layer features is distinguished and utilized in the image reconstruction process, so that the accuracy of the output high-resolution image is improved.

Description

Super-resolution image reconstruction method, device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a super-resolution image reconstruction method and apparatus, a storage medium, and an electronic device.
Background
At present, image technology is widely applied in various fields. However, when an image is acquired in a harsh acquisition environment or under similar conditions, its resolution may be too low for direct use. The low-resolution image then needs to be converted into a high-resolution image before use.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a super-resolution image reconstruction method, apparatus, storage medium, and electronic device, which aim to convert a low-resolution image into a high-resolution image by using the super-resolution image reconstruction method.
In a first aspect, an embodiment of the present invention provides a super-resolution image reconstruction method, where the method includes:
inputting the low-resolution image into a first convolution layer to determine a shallow feature image sequence;
inputting the shallow feature images into a plurality of feature extraction modules which are connected in sequence to determine an intermediate feature image sequence corresponding to each feature extraction module;
determining an intermediate feature matrix and a first intermediate image sequence according to each intermediate feature image sequence;
inputting the intermediate feature matrix into a first attention module to determine a first sequence of deep feature images;
inputting the first sequence of intermediate images into a second attention module to determine a second sequence of deep feature images;
and determining a high-resolution image according to the shallow feature image sequence, the first deep feature image sequence and the second deep feature image sequence.
In a second aspect, an embodiment of the present invention provides a super-resolution image reconstruction apparatus, including:
a first feature extraction unit, configured to input the low-resolution image into the first convolution layer to determine a shallow feature image sequence;
the second feature extraction unit is used for inputting the shallow feature images into a plurality of feature extraction modules which are connected in sequence so as to determine an intermediate feature image sequence corresponding to each feature extraction module;
an intermediate image determining unit configured to determine an intermediate feature matrix and a first intermediate image sequence from each of the intermediate feature image sequences;
a first deep feature determination unit for inputting the intermediate feature matrix into a first attention module to determine a first sequence of deep feature images;
a second deep feature determination unit for inputting the first sequence of intermediate images into a second attention module for determining a second sequence of deep feature images;
a high resolution image determination unit for determining a high resolution image from the shallow feature image sequence, the first deep feature image sequence and the second deep feature image sequence.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer program instructions, which when executed by a processor implement the method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, the memory being configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to the first aspect.
According to the embodiment of the invention, a shallow feature image sequence is determined by inputting a low-resolution image into a first convolution layer, and a plurality of corresponding intermediate feature image sequences are determined by inputting the shallow feature image sequence into a plurality of sequentially connected feature extraction modules. An intermediate feature matrix determined according to each intermediate feature image sequence is input into the first attention module to obtain a first deep feature image sequence, and the first intermediate image sequence is input into the second attention module to obtain a second deep feature image sequence. Finally, a high-resolution image is determined according to the shallow feature image sequence and each deep feature image sequence. The embodiment of the invention can convert a low-resolution image into a high-resolution image through super-resolution image reconstruction. Meanwhile, the importance of intermediate-layer features is distinguished and utilized in the image reconstruction process, so that the accuracy of the output high-resolution image is improved.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of a super-resolution image reconstruction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a super-resolution image reconstruction method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an intermediate feature image sequence determination process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a first deep feature image sequence determination process according to an embodiment of the invention;
FIG. 5 is a diagram illustrating a second sequence of deep feature images determination process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a super-resolution image reconstruction apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The super-resolution image reconstruction method according to the embodiment of the invention can be implemented by a terminal device or a server on which a pre-trained image reconstruction framework is installed; that is, the terminal device or the server inputs the low-resolution image to be processed into the image reconstruction framework to obtain the corresponding high-resolution image. The image reconstruction framework is used for executing the super-resolution image reconstruction method and comprises a plurality of convolution layers, a plurality of feature extraction modules, a first attention module and a second attention module. The terminal device may be a general data processing terminal capable of running a computer program and having a communication function, such as a smart phone, a tablet computer, or a notebook computer. The server may be a single server or a cluster of servers configured in a distributed manner. The low-resolution image can be acquired by an image acquisition device arranged on the terminal device or connected to the server, or can be transmitted by other equipment to the terminal device or server holding the image reconstruction framework, so that the image reconstruction is carried out by the terminal device or server.
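As a rough orientation, the data flow through this framework can be sketched in NumPy. Everything below is a hypothetical placeholder skeleton: the real first convolution layer, feature extraction modules, and attention modules are pre-trained networks, and the stand-in functions here only preserve the wiring and array shapes, not the actual computations.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_layer(x):
    # Stand-in for the pre-trained first convolution layer (shallow features).
    return np.maximum(x, 0.0)

def feature_extraction_module(x):
    # Stand-in for one residual feature extraction module.
    return x + 0.1 * np.tanh(x)

def layer_attention(stack):
    # Stand-in for the first attention module over the N module outputs.
    return stack.mean(axis=0)

def channel_spatial_attention(x):
    # Stand-in for the second attention module.
    return x

def reconstruct(low_res, n_modules=4):
    shallow = conv_layer(low_res)
    feats, x = [], shallow
    for _ in range(n_modules):          # sequentially connected modules
        x = feature_extraction_module(x)
        feats.append(x)                 # one intermediate sequence per module
    deep1 = layer_attention(np.stack(feats))      # first deep feature sequence
    deep2 = channel_spatial_attention(feats[-1])  # second deep feature sequence
    return shallow + deep1 + deep2      # fusion before final upsampling

hr = reconstruct(rng.standard_normal((8, 8, 3)))
```

The real framework would end with an upsampling step to reach the higher resolution; it is omitted here since the description has not yet reached that step at this point.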
Fig. 1 is a flowchart of a super-resolution image reconstruction method according to an embodiment of the present invention. As shown in fig. 1, the super-resolution image reconstruction method includes the following steps:
and step S100, inputting the low-resolution image into the first convolution layer to determine a shallow feature image sequence.
Specifically, the low-resolution image may be directly acquired by an image acquisition device installed in or connected to the terminal device, or by an image acquisition device connected to the server that performs the super-resolution image reconstruction. For example, when the terminal device is a notebook computer, a low-resolution image may be acquired by the camera built into the notebook computer or by a connected camera device. Alternatively, the low-resolution image may be transmitted by another device to the terminal device or server that performs the super-resolution image reconstruction processing. For example, a stored low-resolution image may be transmitted, through an image storage apparatus with a communication function, to the terminal device or server for super-resolution image reconstruction. The following description takes as an example super-resolution image reconstruction of the low-resolution image by a server with an image reconstruction framework.
After determining the low-resolution image, the server inputs it into the first convolution layer (CNN) of the image reconstruction framework to perform preliminary feature extraction and obtain a shallow feature image sequence. The first convolution layer is a convolutional neural network obtained through pre-training; the shallow feature image corresponding to each convolution kernel is determined by sliding each of a plurality of convolution kernels over the low-resolution image, and the shallow feature image sequence is finally determined from the shallow feature images.
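To make the sliding-convolution step concrete, here is a minimal NumPy illustration of how each convolution kernel slides over a single-channel image to produce one shallow feature image. The kernels shown are arbitrary examples for illustration, not the pre-trained ones.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide one kernel over a single-channel image (valid padding)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)      # toy "low-resolution" image
kernels = [np.ones((3, 3)) / 9.0, np.eye(3) / 3.0]  # two example kernels
shallow_sequence = [conv2d_valid(img, k) for k in kernels]  # one map per kernel
```

Each kernel yields one shallow feature image, and the list of all such images forms the shallow feature image sequence.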
Step S200, inputting the shallow feature images into a plurality of feature extraction modules which are connected in sequence to determine an intermediate feature image sequence corresponding to each feature extraction module.
Specifically, the image reconstruction framework includes a plurality of feature extraction modules connected in sequence, and is configured to perform further feature extraction on each shallow feature image. In this embodiment of the present invention, the further feature extraction process specifically includes performing multiple feature extractions through each of the feature extraction modules in an iterative manner by using the shallow feature image sequence as an initial input, and determining an intermediate feature image sequence corresponding to each of the feature extraction modules. For example, when N feature extraction modules are included in the image reconstruction framework, N intermediate feature image sequences may be extracted. And the intermediate characteristic image sequence determined in each iteration process is also used as the input of a characteristic extraction module in the next iteration process. Optionally, the feature extraction module includes a plurality of sequentially connected residual channel attention layers and a second convolutional layer (CNN). Therefore, the process of determining the intermediate feature image sequence corresponding to each of the feature extraction modules may include the following steps:
and step S210, sequentially inputting the input of the current feature extraction module into each residual channel attention layer and the second convolution layer in sequence.
Specifically, when the current feature extraction module is the one performing the first iteration, its input is the shallow feature image sequence. When the current feature extraction module is not the one performing the first iteration, its input is the intermediate feature image sequence output by the feature extraction module of the previous iteration. After the shallow feature image sequence or intermediate feature image sequence is input into the current feature extraction module, it passes sequentially through the plurality of sequentially connected residual channel attention layers and is then input into the second convolution layer for feature extraction.
In the embodiment of the present invention, the feature extraction processing of the residual channel attention layers takes the input of the current feature extraction module as the initial input and performs feature extraction sequentially through each residual channel attention layer in an iterative manner, the result of each feature extraction being used as the input of the next residual channel attention layer. Each of the residual channel attention layers further comprises a first sub-convolution layer and a second sub-convolution layer. After the shallow/intermediate feature image sequence or the output image sequence of the previous residual channel attention layer is input into the residual channel attention layer, convolution is performed by the first sub-convolution layer, and the convolution result is linearly rectified to obtain a first convolution image sequence. The first convolution image sequence is then convolved by the second sub-convolution layer, and the convolution result is linearly rectified and input into the pooling layer to obtain a second convolution image sequence. Logistic regression processing is performed on the second convolution image sequence through a logistic regression function (softmax), the result is multiplied by the first convolution image sequence, and the product is added to the input of the residual channel attention layer to obtain the output of the current residual channel attention layer.
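The per-layer computation just described — two sub-convolutions with rectification, pooling, a softmax over the pooled channels, reweighting, and a residual add — can be sketched as follows. This is a hypothetical simplification in which the sub-convolutions are replaced by 1×1 channel-mixing matrices (`w1`, `w2`) so the sketch stays self-contained; the real layers are pre-trained convolutions.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def rca_layer(x, w1, w2):
    """Sketch of one residual channel attention layer.

    x: (H, W, C) input sequence; w1, w2: (C, C) stand-ins for the two
    sub-convolution layers.
    """
    f1 = np.maximum(x @ w1, 0.0)   # first sub-convolution + rectification
    f2 = np.maximum(f1 @ w2, 0.0)  # second sub-convolution + rectification
    pooled = f2.mean(axis=(0, 1))  # pooling layer: one value per channel
    weights = softmax(pooled)      # logistic regression (softmax) step
    return f1 * weights + x        # reweight f1 and add the layer input

out = rca_layer(np.ones((2, 2, 3)), np.eye(3), np.eye(3))
```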
Step S220, determining a corresponding intermediate feature image sequence according to the input of the current feature extraction module and the output of the second convolution layer.
Specifically, after the input of the current feature extraction module sequentially passes through each residual channel attention layer and the second convolution layer for feature extraction, the obtained output is added to the input of the current feature extraction module to obtain the corresponding intermediate feature image sequence. The dimension of the intermediate feature image sequence is a preset H × W × C, where H is the image height, W is the image width, and C is the number of channels. The channels may include color channels, for example the three color channels of red, green, and blue, and an alpha channel, which characterizes the degree of transparency or translucency of an image. When the embodiment of the invention performs feature extraction through N sequentially connected feature extraction modules, N intermediate feature image sequences with dimension H × W × C are obtained, that is, an intermediate feature image sequence group with dimension N × H × W × C is finally obtained.
Step S300, determining an intermediate feature matrix and a first intermediate image sequence according to the intermediate feature image sequences.
Specifically, after feature extraction is performed by a plurality of feature extraction modules to determine a plurality of corresponding intermediate feature image sequences, an intermediate feature matrix and a first intermediate image sequence are determined according to each intermediate feature image sequence. The intermediate feature matrix comprises information of each intermediate feature image sequence, and is used for inputting the information into the first attention module to adaptively emphasize the importance of the intermediate feature image sequence output by each feature extraction module. The first intermediate image sequence only comprises information of the intermediate feature image sequence corresponding to the last feature extraction module, and the information is used for inputting a second attention module to model the relationship of different channels and different positions in the first intermediate image sequence. The first attention module and the second attention module are pre-trained with different training sets to correspond to different attention mechanisms. The attention mechanism is used to focus on part of the information in the image and ignore irrelevant information. The attention mechanism corresponding to the first attention module is used for paying attention to the importance of different intermediate feature image sequences extracted by the low-resolution image features, and the attention mechanism corresponding to the second attention module is used for paying attention to the spatial relationships of different channels, positions and the like corresponding to the low-resolution images.
Thus, in an embodiment of the present invention, the process of determining the intermediate feature matrix and the first intermediate image sequence further includes the following steps:
and S310, determining an intermediate characteristic matrix according to each intermediate characteristic image sequence.
In particular, the intermediate feature matrix may be determined by stitching the intermediate feature image sequences: the intermediate feature image sequences are stitched to determine at least one intermediate matrix, and the intermediate matrices are processed through a reshape function to readjust their numbers of rows, columns, and dimensions, obtaining an intermediate feature matrix with a preset dimension. For example, when N intermediate feature image sequences with dimension H × W × C are determined in step S200, each intermediate feature image sequence is tiled to obtain a corresponding intermediate matrix of size H × WC, and the intermediate matrices are processed by a reshape function to obtain a two-dimensional matrix of size N × HWC as the intermediate feature matrix.
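The tiling-and-reshape step maps directly onto NumPy's `reshape`; a short sketch with small, assumed dimensions:

```python
import numpy as np

N, H, W, C = 4, 6, 5, 3   # assumed sizes for illustration

# N intermediate feature image sequences stacked into an N x H x W x C group.
stack = np.arange(N * H * W * C, dtype=float).reshape(N, H, W, C)

# Tile each H x W x C sequence into one row, giving the N x HWC
# two-dimensional intermediate feature matrix described above.
FG = stack.reshape(N, H * W * C)
```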
Step S320, determining a first intermediate image sequence according to the intermediate feature image sequence corresponding to the last feature extraction module.
Specifically, the determining process of the first intermediate image sequence may be to acquire an intermediate feature image sequence output by a last feature extraction module of the plurality of feature extraction modules connected in sequence, and input the intermediate feature image sequence into a third convolution layer to obtain the first intermediate image sequence. The third convolutional layer is a convolutional neural network obtained by pre-training so as to further perform feature extraction to determine a first intermediate image sequence. The first intermediate image sequence has the same dimensions as the intermediate feature image sequence input to the third convolutional layer. For example, when the dimension of the intermediate feature image sequence input to the third convolution layer is H × W × C, the dimension of the first intermediate image sequence is also H × W × C.
Step S400, inputting the intermediate feature matrix into a first attention module to determine a first deep feature image sequence.
Specifically, an intermediate feature matrix determined according to each intermediate feature image sequence is input into the first attention module, so that the importance of the intermediate feature image sequences output by each feature extraction module is adaptively distinguished, and a corresponding first deep-layer feature image sequence is determined. The attention mechanism of the first attention module is used to decide which part of the input information needs attention and to allocate limited information processing resources to the important part that needs attention. In an embodiment of the present invention, the first attention module is a layer attention module, and is configured to determine importance degrees of intermediate feature image sequences output by different feature extraction modules, so as to determine deep features corresponding to the low-resolution images according to the importance degrees of the intermediate feature image sequences, so as to obtain a first deep feature image sequence. The process of determining a first sequence of deep feature images by the first attention module comprises the steps of:
step S410, performing a logistic regression process on the value obtained by multiplying the intermediate feature matrix by the transposed intermediate feature matrix to determine a correlation matrix.
Specifically, after the intermediate feature matrix is input into the first attention module, it is transposed. For example, when the intermediate feature matrix is a two-dimensional matrix of size N × HWC, a matrix of size HWC × N is obtained after transposition. The product of each row of the intermediate feature matrix and the transposed intermediate feature matrix is calculated in turn, and logistic regression processing is performed on the products through a logistic regression function (softmax) to finally determine a correlation matrix representing the correlations between different intermediate feature image sequences. Optionally, before calculating the product of the intermediate feature matrix and the transposed intermediate feature matrix, both may be converted into matrices of predetermined dimensions by a reshape function. Each element of the correlation matrix represents the correlation between two intermediate feature image sequences.
For example, when the intermediate feature matrix is F_G, the correlation matrix is:

ω_{j,i} = δ(φ(F_G)_i · φ(F_G)_j^T), i, j = 1, ..., N

where φ is the reshape function, δ is the softmax function, and N is the number of intermediate feature image sequences. ω_{j,i} represents the correlation coefficient between the i-th and j-th intermediate feature image sequences.
Step S420, determining a plurality of second intermediate image sequences according to the product of the correlation matrix and the intermediate feature matrix.
Specifically, after the correlation matrix is determined, the products of the correlation matrix and each row of the intermediate feature matrix are calculated, the products are summed and scaled by a coefficient obtained by pre-training, and reshape processing is then carried out, so that a plurality of second intermediate image sequences are determined. For example, when the intermediate feature matrix is F_G, the correlation matrix is ω_{j,i}, α is the pre-trained scaling coefficient, and N is the number of intermediate feature image sequences, the calculation is:

α · Σ_{i=1}^{N} ω_{j,i} · F_{G,i}, j = 1, ..., N

where F_{G,i} is the i-th row of F_G, and the calculation result is dimension-converted through a reshape function to obtain the corresponding second intermediate image sequences.
Step S430, calculating a sum of each second intermediate image sequence and each intermediate feature image sequence to determine a first deep feature image sequence.
Specifically, after a plurality of second intermediate image sequences are determined, the sum of each second intermediate image sequence and each intermediate feature image sequence is calculated, and then dimension conversion is performed on the calculation result through a reshape function to obtain a first deep feature image sequence. The following description will take an example in which N intermediate feature image sequences and N second intermediate image sequences are determined, and dimensions of the second intermediate image sequences and the intermediate feature image sequences are both H × W × C. And calculating the sum of each second intermediate image sequence and each intermediate characteristic image sequence to obtain N image sequences with dimension H multiplied by W multiplied by C, and performing dimension conversion on the image sequences through a reshape function to obtain a first deep characteristic image sequence with dimension H multiplied by W multiplied by NC. The first deep feature image sequence is used for characterizing the image deep features of the importance of each intermediate feature image sequence.
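Steps S410 through S430 amount to a row-wise softmax of the product of the intermediate feature matrix with its transpose, a weighted recombination of its rows scaled by the coefficient, and a residual add. A compact NumPy sketch, with the scaling coefficient left as a plain parameter rather than a trained value:

```python
import numpy as np

def row_softmax(m):
    e = np.exp(m - m.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def layer_attention(FG, alpha=1.0):
    """FG: (N, HWC) intermediate feature matrix; alpha: scaling coefficient."""
    corr = row_softmax(FG @ FG.T)   # (N, N) correlation matrix (step S410)
    second = alpha * (corr @ FG)    # N second intermediate sequences (step S420)
    return second + FG              # residual add -> first deep features (S430)

rng = np.random.default_rng(1)
FG = rng.standard_normal((4, 90))
FL = layer_attention(FG)
```

Setting `alpha=0` recovers the input unchanged, which makes the residual structure easy to verify.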
Step S500, inputting the first intermediate image sequence into a second attention module to determine a second deep characteristic image sequence.
Specifically, in an embodiment of the present invention, the second attention module further includes a fourth convolution layer. The fourth convolutional layer is a convolutional neural network obtained by pre-training so as to further perform feature extraction on the first intermediate image sequence. And inputting the first intermediate image sequence determined according to each intermediate characteristic image sequence into a second attention module so as to model the relationship of different channels and different positions in the first intermediate image sequence. In an embodiment of the present invention, the second attention module is a channel space attention module, and the process of determining the second deep feature image sequence by the second attention module includes the following steps:
step S510, inputting the first intermediate image sequence into a fourth convolution layer to determine a spatial attention image.
In particular, the fourth convolution layer is a 3-dimensional convolution layer with a 3-dimensional convolution kernel, used to determine a spatial attention image by capturing joint channel and spatial features. After the first intermediate image sequence is input into the fourth convolution layer, the 3-dimensional convolution kernel within the fourth convolution layer is convolved with a cube constructed from a plurality of adjacent channels of the first intermediate image sequence to obtain the corresponding spatial attention image.
Step S520, performing logistic regression processing on the spatial attention image and then multiplying it by the first intermediate image sequence in a weighted manner to determine a weighted feature image sequence.
Specifically, after the spatial attention image is determined, logistic regression processing is performed on the spatial attention image through a softmax function, the result of the logistic regression processing is multiplied by the first intermediate image sequence, and the product is then multiplied by a scale factor obtained by pre-training to determine the weighted feature image sequence. For example, when the spatial attention image is W_csa, the first intermediate image sequence is F_N, the scale factor is β and σ denotes the softmax function, the weighted feature image sequence obtained by calculation is βσ(W_csa)·F_N.
Step S530, calculating a sum of the first intermediate image sequence and the weighted feature image sequence to determine a second deep feature image sequence.
In particular, after the weighted feature image sequence is determined, the second deep feature image sequence can be determined by calculating the sum of the first intermediate image sequence and the weighted feature image sequence. The second deep feature image sequence is used for characterizing deep image features at the image channel and spatial levels.
Still taking the spatial attention image W_csa, the first intermediate image sequence F_N, the scale factor β and the softmax function σ as an example, the second deep feature image sequence F_CS is:

F_CS = βσ(W_csa)·F_N + F_N
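Steps S510–S530 amount to F_CS = βσ(W_csa)·F_N + F_N. A minimal NumPy sketch follows; the pre-trained fourth (3-D) convolution layer is replaced here by a precomputed attention array `w_csa`, and the softmax axis is assumed to be the channel axis — both are assumptions, not details stated in the text:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the chosen axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_spatial_attention(f_n, w_csa, beta=0.1):
    """F_CS = beta * softmax(W_csa) * F_N + F_N (steps S520/S530).
    f_n, w_csa: arrays of shape H x W x C; beta stands in for the
    pre-trained scale factor."""
    weighted = beta * softmax(w_csa, axis=-1) * f_n   # weighted feature image sequence
    return weighted + f_n                             # residual sum, step S530

f_n = np.random.rand(8, 8, 16)
w_csa = np.random.rand(8, 8, 16)   # stand-in for the 3-D conv output
f_cs = channel_spatial_attention(f_n, w_csa)
print(f_cs.shape)                  # (8, 8, 16)
```

Because of the residual term, setting β = 0 returns F_N unchanged, which is consistent with the sum in step S530.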
Step S600, determining a high-resolution image according to the shallow feature image sequence, the first deep feature image sequence and the second deep feature image sequence.
Specifically, after the first deep feature image sequence (characterizing deep image features that reflect the importance of each intermediate feature image sequence) and the second deep feature image sequence (characterizing deep image features at the image channel and spatial levels) are determined through steps S400 and S500, a high-resolution image is determined from the shallow feature image sequence, the first deep feature image sequence and the second deep feature image sequence. In an embodiment of the present invention, the process of determining the high-resolution image includes the following steps:
and S610, splicing the first deep characteristic image sequence and the second deep characteristic image sequence to determine a target deep characteristic image sequence.
Specifically, the first deep feature image sequence and the second deep feature image sequence are stitched first, and the stitching result is then converted into a target deep feature image sequence of a preset dimension through a reshape function. The stitching process may process the two sequences through a stitching (concat) function so that they are joined for feature fusion. For example, when the dimensions of the first deep feature image sequence and the second deep feature image sequence are both H × W × C, an image sequence of dimension H × W × 2C is obtained after stitching, and a target deep feature image sequence of dimension H × W × C is obtained after conversion. The target deep feature image sequence characterizes both the importance feature of each intermediate feature image sequence and the deep image features at the channel and spatial levels.
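The stitching and conversion of step S610 can be sketched as follows. Note that a pure reshape cannot reduce H × W × 2C to H × W × C (the element counts differ), so this sketch assumes a learned linear projection over the channel axis performs the conversion; the weight `w` is a hypothetical stand-in, not a parameter named in the text:

```python
import numpy as np

def fuse_deep_features(f1, f2, w):
    """Stitch (concat) two H x W x C deep feature image sequences along
    the channel axis, then project 2C -> C channels with weight w of
    shape (2C, C) — an assumed realization of the dimension conversion."""
    cat = np.concatenate([f1, f2], axis=-1)   # H x W x 2C after stitching
    return cat @ w                            # H x W x C target deep feature sequence

h, wdt, c = 8, 8, 16
f1 = np.random.rand(h, wdt, c)                # first deep feature image sequence
f2 = np.random.rand(h, wdt, c)                # second deep feature image sequence
proj = np.random.rand(2 * c, c) / c           # hypothetical projection weight
target = fuse_deep_features(f1, f2, proj)
print(target.shape)                           # (8, 8, 16)
```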
Step S620, adding the shallow feature image sequence and the target deep feature image sequence to determine a target feature image sequence.
Specifically, after the target deep feature image sequence used for characterizing the deep features of the image is determined, the shallow feature image sequence and the target deep feature image sequence are added for feature fusion, yielding a target feature image sequence that includes both the deep and the shallow features of the low-resolution image. The shallow feature image sequence and the target deep feature image sequence have the same length; the sequence addition adds the shallow feature image and the deep feature image at the same position in the two sequences to obtain each target feature image, and the target feature image sequence is determined from the target feature images.
Step S630, sequentially inputting the target feature image sequence into an upsampling layer and a fifth convolution layer to determine a high-resolution image.
Specifically, the target feature image sequence is input sequentially into the upsampling layer and the fifth convolution layer to complete the super-resolution image reconstruction process, restoring the target feature image sequence into a high-resolution image corresponding to the input low-resolution image.
Fig. 2 is a schematic diagram of a super-resolution image reconstruction method process according to an embodiment of the present invention. As shown in fig. 2, the super-resolution image reconstruction process is to input a low-resolution image 20 into a first convolution layer 21 to obtain a shallow feature image sequence. And then, the shallow feature image sequence is subjected to feature extraction sequentially through N feature extraction modules 22, and N corresponding intermediate feature image sequences are determined. An intermediate feature matrix and a first intermediate image sequence are determined from the N intermediate feature image sequences, such that the intermediate feature matrix is input to a first attention module 23 and the first intermediate image sequence is input to a second attention module 24 for deep feature extraction. Merging the first deep feature image sequence extracted by the first attention module 23 and the second deep feature image sequence extracted by the second attention module 24, and performing feature fusion with the shallow feature image sequence to determine a target feature image sequence including deep and shallow image features. The target feature image sequence is sequentially input to the upsampling layer 25 and the fifth convolution layer 26 to determine and output a high resolution image 27.
Fig. 3 is a schematic diagram of an intermediate feature image sequence determination process according to an embodiment of the present invention, which is used for characterizing the process of feature extraction performed by each of the feature extraction modules in fig. 2. As shown in fig. 3, the intermediate feature image sequence determination process includes sequentially inputting an input feature image sequence 30 into M sequentially connected residual channel attention layers 31 and then into a second convolution layer 32. An intermediate feature image sequence 33 is determined from the output of the second convolution layer 32 and the input feature image sequence 30. The input feature image sequence 30 is the shallow feature image sequence or the intermediate feature image sequence output by the feature extraction module in the previous iteration.
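The internals of each residual channel attention layer 31 are not spelled out in this passage; a typical squeeze-and-excite style channel attention with a residual connection can be sketched as below. The convolutions of the real layer are omitted, and the weight matrices `w1`/`w2` are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """One residual channel attention layer, sketched: squeeze an
    H x W x C map to per-channel statistics, excite through two small
    matrices, rescale the channels, and add the residual input."""
    s = x.mean(axis=(0, 1))        # channel descriptor, shape (C,)
    g = sigmoid(s @ w1 @ w2)       # per-channel weights in (0, 1)
    return x * g + x               # rescale and residual add

c, red = 16, 4                     # channels and reduction ratio (assumed)
x = np.random.rand(8, 8, c)
w1 = np.random.rand(c, c // red) / c
w2 = np.random.rand(c // red, c)
y = channel_attention(x, w1, w2)
print(y.shape)                     # (8, 8, 16)
```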
Fig. 4 is a schematic diagram of a first deep feature image sequence determination process according to an embodiment of the present invention, which is used for characterizing the process of determining the first deep feature image sequence by the first attention module in fig. 2. As shown in fig. 4, after a plurality of intermediate feature image sequences 40 are extracted, an intermediate feature matrix 41 is determined from the intermediate feature image sequences 40. The intermediate feature matrix 41 is multiplied by its transpose, and logistic regression processing is performed through a softmax function 42 to determine a correlation matrix. The product of the correlation matrix and the intermediate feature matrix 41 is then calculated, and dimension conversion is performed through a reshape function 43 to obtain a plurality of second intermediate image sequences. The second intermediate image sequences and the intermediate feature image sequences are added, and dimension conversion is performed again through a reshape function 44 to obtain the first deep feature image sequence 45.
Fig. 5 is a schematic diagram of a second deep feature image sequence determination process according to an embodiment of the present invention, which is used for characterizing the process of determining the second deep feature image sequence by the second attention module in fig. 2. As shown in fig. 5, after N intermediate feature image sequences are extracted, the intermediate feature image sequence corresponding to the Nth feature extraction module 50 is input into the third convolution layer 51 to determine the first intermediate image sequence. The first intermediate image sequence is then input into the fourth convolution layer 52 to determine the spatial attention image. The spatial attention image is subjected to logistic regression processing by a softmax function 53 and then multiplied by the first intermediate image sequence in a weighted manner to determine a weighted feature image sequence; finally, the second deep feature image sequence 54 is determined by calculating the sum of the first intermediate image sequence and the weighted feature image sequence.
The embodiment of the invention can convert the low-resolution image into the high-resolution image through super-resolution image reconstruction. Meanwhile, the importance of the intermediate layer features is distinguished and utilized in the image reconstruction process, and super-resolution image reconstruction is performed on the basis of the shallow layer features and the deep layer image features including the importance features, the channel features and the spatial features, so that the accuracy of the output high-resolution image is improved.
Fig. 6 is a schematic diagram of a super-resolution image reconstruction apparatus according to an embodiment of the present invention. As shown in fig. 6, the super-resolution image reconstruction apparatus includes a first feature extraction unit 60, a second feature extraction unit 61, an intermediate image determination unit 62, a first deep feature determination unit 63, a second deep feature determination unit 64, and a high-resolution image determination unit 65.
Specifically, the first feature extraction unit 60 is configured to input a low-resolution image into the first convolution layer to determine a shallow feature image sequence. The second feature extraction unit 61 is configured to input the shallow feature images into a plurality of feature extraction modules connected in sequence, so as to determine an intermediate feature image sequence corresponding to each feature extraction module. The intermediate image determining unit 62 is configured to determine an intermediate feature matrix and a first intermediate image sequence from each of the intermediate feature image sequences. The first deep feature determination unit 63 is configured to input the intermediate feature matrix into a first attention module for determining a first sequence of deep feature images. The second deep feature determination unit 64 is configured to input the first sequence of intermediate images into a second attention module to determine a second sequence of deep feature images. The high resolution image determination unit 65 is configured to determine a high resolution image from the sequence of shallow feature images, the sequence of first deep feature images and the sequence of second deep feature images.
Further, the second feature extraction unit specifically is:
the characteristic extraction subunit is used for taking the shallow characteristic image sequence as initial input, performing characteristic extraction for multiple times through each characteristic extraction module in an iterative mode, and determining an intermediate characteristic image sequence corresponding to each characteristic extraction module;
and the intermediate characteristic image sequence output by each characteristic extraction module is used as the input of the next characteristic extraction module.
Further, the feature extraction module comprises a plurality of residual channel attention layers and a second convolution layer which are sequentially connected;
the feature extraction subunit includes:
the first extraction module is used for sequentially inputting the input of the current feature extraction module into each residual channel attention layer and the second convolution layer, wherein the input of the current feature extraction module is the shallow feature image sequence or the intermediate feature image sequence output by the previous feature extraction module;
and the second extraction module is used for determining a corresponding intermediate feature image sequence according to the input of the current feature extraction module and the output of the second convolution layer.
Further, the intermediate image determination unit includes:
a first image determining subunit, configured to determine an intermediate feature matrix according to each of the intermediate feature image sequences;
and the second image determining subunit is used for determining the first intermediate image sequence according to the intermediate feature image sequence corresponding to the last feature extraction module.
Further, the first image determination subunit specifically is:
and the matrix determining module is used for splicing the intermediate characteristic image sequences to determine the intermediate characteristic matrix.
Further, the second image determination subunit specifically is:
and the image sequence determining module is used for inputting the intermediate characteristic image sequence corresponding to the last characteristic extracting module into the third convolution layer so as to determine the first intermediate image sequence.
Further, the first deep feature determination unit includes:
a matrix determination subunit, configured to perform a logistic regression process on a value obtained by multiplying the intermediate feature matrix by the transposed intermediate feature matrix to determine a correlation matrix;
a sequence determining subunit, configured to determine a plurality of second intermediate image sequences according to a product of the correlation matrix and the intermediate feature matrix;
a first deep feature determination subunit for calculating a sum of each of the second sequences of intermediate images and each of the sequences of intermediate feature images to determine a first sequence of deep feature images.
Further, a fourth convolutional layer is included in the second attention module;
the second deep feature determination unit includes:
a convolution subunit for inputting the first sequence of intermediate images into a fourth convolution layer to determine a spatial attention image;
the weighting subunit is used for performing logistic regression processing on the spatial attention image and then multiplying the spatial attention image by the first intermediate image sequence in a weighting manner to determine a weighted characteristic image sequence;
a second deep feature determination subunit to compute a sum of the first sequence of intermediate images and the sequence of weighted feature images to determine a second sequence of deep feature images.
Further, the high resolution image determination unit includes:
the image splicing subunit is used for splicing the first deep characteristic image sequence and the second deep characteristic image sequence to determine a target deep characteristic image sequence;
an image adding subunit, configured to add the shallow feature image sequence and the target deep feature image sequence to determine a target feature image sequence;
and the high-resolution image determining subunit is used for sequentially inputting the target characteristic image sequence into the upsampling layer and the fifth convolution layer so as to determine a high-resolution image.
The embodiment of the invention can convert the low-resolution image into the high-resolution image through super-resolution image reconstruction. Meanwhile, the importance of the intermediate layer features is distinguished and utilized in the image reconstruction process, and super-resolution image reconstruction is performed on the basis of the shallow layer features and the deep layer image features including the importance features, the channel features and the spatial features, so that the accuracy of the output high-resolution image is improved.
Fig. 7 is a schematic diagram of an electronic device according to an embodiment of the invention. As shown in fig. 7, the electronic device is a general-purpose computing device with a general computer hardware structure comprising at least a processor 70 and a memory 71. The processor 70 and the memory 71 are connected by a bus 72. The memory 71 is adapted to store instructions or programs executable by the processor 70. The processor 70 may be a stand-alone microprocessor or a collection of one or more microprocessors. Thus, the processor 70 implements the processing of data and the control of other devices by executing the instructions stored in the memory 71, so as to perform the method flows of the embodiments of the present invention described above. The bus 72 also connects the above components to a display controller 73, a display device, and input/output (I/O) devices 74. The input/output (I/O) devices 74 may be a mouse, keyboard, modem, network interface, touch input device, motion sensing input device, printer, or other devices known in the art. Typically, the input/output devices 74 are connected to the system through input/output (I/O) controllers 75.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device) or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may employ a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows.
Another embodiment of the invention is directed to a non-transitory storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as can be understood by those skilled in the art, all or part of the steps in the methods of the embodiments described above may be accomplished by instructing the relevant hardware through a program; the program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A super-resolution image reconstruction method, comprising:
inputting the low-resolution image into a first convolution layer to determine a shallow feature image sequence;
inputting the shallow feature images into a plurality of feature extraction modules which are connected in sequence to determine an intermediate feature image sequence corresponding to each feature extraction module;
determining an intermediate characteristic matrix and a first intermediate image sequence according to each intermediate characteristic image sequence;
inputting the intermediate feature matrix into a first attention module to determine a first sequence of deep feature images;
inputting the first sequence of intermediate images into a second attention module to determine a second sequence of deep feature images;
and determining a high-resolution image according to the shallow feature image sequence, the first deep feature image sequence and the second deep feature image sequence.
2. The method according to claim 1, wherein the inputting the shallow feature image into a plurality of feature extraction modules connected in sequence to determine an intermediate feature image sequence corresponding to each feature extraction module specifically comprises:
taking the shallow feature image sequence as initial input, performing feature extraction for multiple times through each feature extraction module in an iterative mode, and determining an intermediate feature image sequence corresponding to each feature extraction module;
and the intermediate characteristic image sequence output by each characteristic extraction module is used as the input of the next characteristic extraction module.
3. The method of claim 2, wherein the feature extraction module comprises a plurality of sequentially connected residual channel attention layers and a second convolution layer;
the extracting of the features by each of the feature extraction modules includes:
sequentially inputting the input of a current feature extraction module into each residual channel attention layer and a second convolution layer in sequence, wherein the input of the current feature extraction module is the shallow feature image sequence or the intermediate feature image sequence output by the previous feature extraction module;
and determining a corresponding intermediate feature image sequence according to the input of the current feature extraction module and the output of the second convolution layer.
4. The method of claim 1, wherein determining an intermediate feature matrix and a first intermediate image sequence from each of the intermediate feature image sequences comprises:
determining an intermediate characteristic matrix according to each intermediate characteristic image sequence;
and determining a first intermediate image sequence according to the intermediate feature image sequence corresponding to the last feature extraction module.
5. The method according to claim 4, wherein the determining an intermediate feature matrix from each of the intermediate feature image sequences is specifically:
and splicing the intermediate characteristic image sequences to determine the intermediate characteristic matrix.
6. The method according to claim 4, wherein the determining the first intermediate image sequence according to the intermediate feature image sequence corresponding to the last feature extraction module specifically comprises:
and inputting the intermediate characteristic image sequence corresponding to the last characteristic extraction module into the third convolution layer to determine a first intermediate image sequence.
7. The method of claim 1, wherein inputting the intermediate feature matrix into a first attention module to determine a first sequence of deep feature images comprises:
performing logistic regression processing on a value obtained by multiplying the intermediate characteristic matrix by the transposed intermediate characteristic matrix to determine a correlation matrix;
determining a plurality of second intermediate image sequences according to the product of the correlation matrix and the intermediate feature matrix;
the sum of each of the second intermediate image sequences and each of the intermediate feature image sequences is calculated to determine a first deep feature image sequence.
8. The method of claim 1, wherein the second attention module includes a fourth convolutional layer therein;
the inputting the first sequence of intermediate images into a second attention module to determine a second sequence of deep feature images comprises:
inputting the first sequence of intermediate images into a fourth convolution layer to determine a spatial attention image;
performing logistic regression processing on the spatial attention image, and then multiplying the spatial attention image by the first intermediate image sequence in a weighting manner to determine a weighted characteristic image sequence;
a sum of the first sequence of intermediate images and the sequence of weighted feature images is computed to determine a second sequence of deep feature images.
9. The method of claim 1, wherein determining a high resolution image from the sequence of shallow feature images, the sequence of first deep feature images, and the sequence of second deep feature images comprises:
stitching the first deep feature image sequence and the second deep feature image sequence to determine a target deep feature image sequence;
adding the shallow feature image sequence to the target deep feature image sequence to determine a target feature image sequence;
and sequentially inputting the target characteristic image sequence into an upsampling layer and a fifth convolution layer to determine a high-resolution image.
10. A super-resolution image reconstruction apparatus, characterized in that the apparatus comprises:
a first feature extraction unit, configured to input the low-resolution image into the first convolution layer to determine a shallow feature image sequence;
the second feature extraction unit is used for inputting the shallow feature images into a plurality of feature extraction modules which are connected in sequence so as to determine an intermediate feature image sequence corresponding to each feature extraction module;
an intermediate image determining unit configured to determine an intermediate feature matrix and a first intermediate image sequence from each of the intermediate feature image sequences;
a first deep feature determination unit for inputting the intermediate feature matrix into a first attention module to determine a first sequence of deep feature images;
a second deep feature determination unit for inputting the first sequence of intermediate images into a second attention module for determining a second sequence of deep feature images;
a high resolution image determination unit for determining a high resolution image from the shallow feature image sequence, the first deep feature image sequence and the second deep feature image sequence.
11. A computer readable storage medium storing computer program instructions, which when executed by a processor implement the method of any one of claims 1-9.
12. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-9.
CN202011080553.4A 2020-10-10 2020-10-10 Super-resolution image reconstruction method, device, storage medium and electronic equipment Withdrawn CN112330539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011080553.4A CN112330539A (en) 2020-10-10 2020-10-10 Super-resolution image reconstruction method, device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN112330539A true CN112330539A (en) 2021-02-05

Family

ID=74313431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011080553.4A Withdrawn CN112330539A (en) 2020-10-10 2020-10-10 Super-resolution image reconstruction method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112330539A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852948A (en) * 2019-11-01 2020-02-28 鹏城实验室 Image super-resolution method based on feature correlation, storage medium and terminal equipment
CN111223163A (en) * 2020-01-07 2020-06-02 苏州瑞派宁科技有限公司 Image reconstruction method, device, equipment, system and computer readable storage medium
AU2020100200A4 (en) * 2020-02-08 2020-06-11 Huang, Shuying DR Content-guide Residual Network for Image Super-Resolution


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BEN NIU ET AL.: "Single Image Super-Resolution via a Holistic Attention Network", 《ECCV 2020》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950471A (en) * 2021-02-26 2021-06-11 杭州朗和科技有限公司 Video super-resolution processing method and device, super-resolution reconstruction model and medium
CN112990053A (en) * 2021-03-29 2021-06-18 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN112990053B (en) * 2021-03-29 2023-07-25 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN114038067A (en) * 2022-01-07 2022-02-11 深圳市海清视讯科技有限公司 Coal mine personnel behavior detection method, equipment and storage medium
CN114038067B (en) * 2022-01-07 2022-04-22 深圳市海清视讯科技有限公司 Coal mine personnel behavior detection method, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112330539A (en) Super-resolution image reconstruction method, device, storage medium and electronic equipment
CN108876792B (en) Semantic segmentation method, device and system and storage medium
CN108615072B (en) Performing average pooling in hardware
US11507800B2 (en) Semantic class localization digital environment
US20170372202A1 (en) Tensor processing using low precision format
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
WO2020228522A1 (en) Target tracking method and apparatus, storage medium and electronic device
CN111476719B (en) Image processing method, device, computer equipment and storage medium
CN115885289A (en) Modeling dependency with global self-attention neural networks
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN109801275B (en) Potato disease detection method and system based on image recognition
Sang et al. PCANet: Pyramid convolutional attention network for semantic segmentation
WO2018064591A1 (en) Generating video frames using neural networks
CN110633421A (en) Feature extraction, recommendation, and prediction methods, devices, media, and apparatuses
WO2021142904A1 (en) Video analysis method and related model training method, device and apparatus therefor
EP4088226A1 (en) Radioactive data generation
CN111222453B (en) Remote sensing image change detection method based on dense connection and geometric structure constraint
CN114419406A (en) Image change detection method, training method, device and computer equipment
CN114694158A (en) Extraction method of structured information of bill and electronic equipment
CN116229066A (en) Portrait segmentation model training method and related device
CN117496352A (en) Remote sensing change detection method, device and equipment based on gradual fusion of adjacent features
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium
CN117058235A (en) Visual positioning method crossing various indoor scenes
CN111401335A (en) Key point detection method and device and storage medium
US11875559B2 (en) Systems and methodologies for automated classification of images of stool in diapers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210205