CN116630152A - Image resolution reconstruction method and device, storage medium and electronic equipment


Info

Publication number: CN116630152A
Application number: CN202310436642.5A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 吕少卿, 俞鸣园, 王克彦, 曹亚曦, 孙俊伟
Assignee (current and original): Zhejiang Huachuang Video Signal Technology Co Ltd


Classifications

    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4076: Scaling based on super-resolution, using the original low-resolution images to iteratively correct the high-resolution images
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06T 2207/20221: Image fusion; image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image resolution reconstruction method, an image resolution reconstruction device, a storage medium and an electronic device. The image resolution reconstruction method comprises: inputting an image to be reconstructed into a recursion unit layer of a trained resolution reconstruction model; performing feature extraction on the image to be reconstructed with the recursion unit layer to obtain a feature map; fusing the feature map and the image to be reconstructed to output a fused image; and iteratively calling the recursion unit layer, where during the iterative calling the fused image output by the current iteration of the recursion unit layer serves as the input to the next iteration. A reconstructed image is obtained based on the fused image output after the iterative calling. Higher reconstructed-image resolution is achieved by increasing the number of recursions, so the complexity of the model is better controlled; and because the resolution of the image is increased gradually over multiple recursions rather than mapping the low-resolution image to a high-resolution image in one step, a more accurate high-resolution image can be generated.

Description

Image resolution reconstruction method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and apparatus for reconstructing image resolution, a storage medium, and an electronic device.
Background
Super-resolution (SR) reconstruction refers to a family of techniques for recovering a high-resolution (HR) image from a single low-resolution (LR) image or from multiple LR frames (a sequence). Super-resolution reconstruction overcomes the limitations of hardware such as imaging equipment and remote transmission equipment, and provides an ideal high-resolution picture at greatly reduced cost.
Traditional image super-resolution methods include interpolation algorithms, sampling-based methods and statistics-based methods, but these cannot handle complex image information and structure, and their reconstruction effect is limited. Other examples are dictionary-learning-based, sparse-representation-based and pixel-pool-based methods; however, the performance of these methods depends on manually designed features and models, and it is difficult for them to extract details close to the real image.
Disclosure of Invention
The application provides at least an image resolution reconstruction method, an image resolution reconstruction device, a computer readable storage medium and an electronic device.
The first aspect of the present application provides an image resolution reconstruction method, comprising: inputting an image to be reconstructed into a recursion unit layer of a trained resolution reconstruction model; performing feature extraction on the image to be reconstructed with the recursion unit layer to obtain a feature map, and fusing the feature map and the image to be reconstructed to output a fused image; iteratively calling the recursion unit layer, where during the iterative calling the fused image output by the current iteration of the recursion unit layer serves as the input to the next iteration of the recursion unit layer; and obtaining a reconstructed image based on the fused image output after the iterative calling of the recursion unit layer, the resolution of the reconstructed image being higher than that of the image to be reconstructed.
In an embodiment, performing feature extraction on the image to be reconstructed with the recursion unit layer to obtain a feature map, and fusing the feature map and the image to be reconstructed to output a fused image, comprises: performing feature extraction on the image to be reconstructed to obtain a feature map; performing difference processing on the feature map and the image to be reconstructed to obtain a residual map; weighting the residual map based on an attention mechanism to obtain a weighted residual map; and fusing the weighted residual map and the feature map to obtain a fused image.
In an embodiment, performing feature extraction on the image to be reconstructed to obtain a feature map comprises: downsampling the image to be reconstructed to obtain a downsampled feature map; performing a convolution operation on the downsampled feature map to obtain a convolution feature map; weighting the convolution feature map based on an attention mechanism to obtain a weighted feature map; and upsampling the weighted feature map to obtain the feature map.
In one embodiment, the recursion unit layer contains a plurality of feature span blocks, and performing feature extraction on the image to be reconstructed to obtain a feature map comprises: performing feature extraction on the image to be reconstructed to obtain an initial feature map; performing span convolution on the initial feature map with each feature span block to obtain a plurality of intermediate feature maps, where the span convolution of each feature span block uses different spatial scale information; and concatenating the plurality of intermediate feature maps to obtain the feature map.
In an embodiment, obtaining a reconstructed image based on the fused image output after the iterative calling of the recursion unit layer comprises: collecting the fused image obtained by the recursion unit layer in each iterative call to obtain an output set corresponding to the recursion unit layer; and fusing each fused image in the output set to obtain the reconstructed image.
In one embodiment, fusing each fused image in the output set to obtain the reconstructed image comprises: obtaining a weight parameter corresponding to each fused image in the output set; and performing weighted fusion on the fused images in the output set based on the weight parameters to obtain the reconstructed image.
In one embodiment, the resolution reconstruction model comprises a plurality of sequentially connected recursion unit layers, and obtaining the reconstructed image based on the fused image output after the iterative calling of the recursion unit layer comprises the following steps: if the recursion unit layer called in the current iteration is detected to satisfy a preset iteration end condition, inputting the fused image output by that recursion unit layer into the recursion unit layer called in the next iteration for image reconstruction; and traversing each recursion unit layer, and obtaining the reconstructed image based on the fused images output after each recursion unit layer has been iteratively called.
A second aspect of the present application provides an image resolution reconstruction apparatus comprising: the input module is used for inputting the image to be reconstructed into a recursion unit layer of the resolution reconstruction model after training; the fusion module is used for extracting features of the image to be reconstructed by using the recursion unit layer to obtain a feature image, and fusing the feature image and the image to be reconstructed to output a fused image; the recursion module is used for iteratively calling the recursion unit layer, and taking a fusion image output by the current iteration of the recursion unit layer as the input of the next iteration of the recursion unit layer in the iterative calling process; the result acquisition module is used for acquiring a reconstructed image based on the fusion image output after the recursion unit layer iteration call, and the resolution of the reconstructed image is higher than that of the image to be reconstructed.
A third aspect of the present application provides an electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the above-described image resolution reconstruction method.
A fourth aspect of the present application provides a computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the above-described image resolution reconstruction method.
According to the above scheme, the image to be reconstructed is input into the recursion unit layer of the trained resolution reconstruction model; the recursion unit layer performs feature extraction on the image to be reconstructed to obtain a feature map, and the feature map and the image to be reconstructed are fused to output a fused image; the recursion unit layer is then called iteratively, with the fused image output by the current iteration serving as the input to the next iteration, so that image details are recovered gradually. A reconstructed image is obtained based on the fused image output after the iterative calling. Higher reconstructed-image resolution is achieved by increasing the number of recursions without increasing the depth or parameter count of the network, so the complexity of the model is better controlled; and the resolution of the image is increased gradually over multiple recursions, rather than mapping the low-resolution image directly to a high-resolution image in one step, which helps generate a more accurate high-resolution image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic illustration of an environment in which an exemplary embodiment of an image resolution reconstruction method of the present application is implemented;
FIG. 2 is a flow chart of an exemplary embodiment of an image resolution reconstruction method of the present application;
FIG. 3 is a flow chart illustrating a training resolution reconstruction model according to an exemplary embodiment of the present application;
FIG. 4 is a block diagram of an image resolution reconstruction apparatus according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of an electronic device shown in an exemplary embodiment of the application;
FIG. 6 is a schematic diagram of the structure of a computer-readable storage medium according to an exemplary embodiment of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Referring to fig. 1, a schematic diagram of an operation environment of an image resolution reconstruction method according to an embodiment of the application is shown. The operating environment may include: a terminal 10 and a server 20.
The terminal 10 includes, but is not limited to, a cell phone, a computer, an intelligent audio interaction device, an intelligent home appliance, a vehicle-mounted terminal, an aircraft, a game console, an electronic book reader, a multimedia playing device, a wearable device, and the like.
The client in the terminal 10 may install an application program, which may be any application program capable of providing an image resolution reconstruction service. Optionally, the application program includes, but is not limited to, a video type application program, a shopping type application program, a content sharing type application program, and the like, which is not limited by the embodiment of the present application. In addition, for different applications, the corresponding image content and the corresponding functions may be different, which may be configured in advance according to the actual requirements, which is not limited by the embodiment of the present application. Optionally, a client of the above application program is running in the terminal 10.
The server 20 is used to provide background services for clients of applications in the terminal 10. For example, the server 20 may be a background server of the application program described above. The server 20 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), basic cloud computing services such as big data and an artificial intelligence platform. Alternatively, the server 20 provides background services for applications in a plurality of terminals 10 at the same time.
Alternatively, the terminal 10 and the server 20 may communicate with each other via the network 30. The terminal 10 and the server 20 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto.
Alternatively, it may be that the server 20 performs the primary image resolution reconstruction work and the terminal 10 performs the secondary image resolution reconstruction work; alternatively, the server 20 performs the secondary image resolution reconstruction, and the terminal 10 performs the primary image resolution reconstruction; alternatively, the server 20 or the terminal 10, respectively, may take over the image resolution reconstruction work alone.
It can be understood that the specific embodiments of the present application involve data such as the image to be reconstructed, user information and the like. When the above embodiments of the present application are applied to a specific product or technology, user permission or consent must be obtained, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
Referring to fig. 2, a flowchart of an image resolution reconstruction method according to an embodiment of the present application is shown, where the method may be applied to a computer device, and the computer device refers to an electronic device with data computing and processing capabilities, and the execution subject of each step may be the terminal 10 or the server 20 in the operating environment shown in fig. 1. It should be understood that the method may be adapted to other exemplary implementation environments and be specifically executed by devices in other implementation environments, and the implementation environments to which the method is adapted are not limited by the present embodiment.
The image resolution reconstruction method according to the embodiment of the present application will be described in detail below with the server as a specific execution subject.
As shown in fig. 2, in an exemplary embodiment, the image resolution reconstruction method at least includes steps S210 to S240, which are described in detail below:
Step S210: and inputting the image to be reconstructed into a recursion unit layer of the trained resolution reconstruction model.
The image to be reconstructed refers to a low-resolution image which needs to be reconstructed in super-resolution, and the image to be reconstructed can be an image acquired in real time or a historical image. The image to be reconstructed needs super resolution to improve the image quality and definition.
Illustratively, the server is deployed with a trained resolution reconstruction model, and the server can directly acquire the image to be reconstructed from the database, and input the image to be reconstructed into a recursive unit layer of the trained resolution reconstruction model. The server can also acquire the image to be reconstructed uploaded by the terminal, and then input the image to be reconstructed into a recursion unit layer of the trained resolution reconstruction model.
Step S220: and carrying out feature extraction on the image to be reconstructed by using the recursion unit layer to obtain a feature image, and carrying out fusion on the feature image and the image to be reconstructed to output a fusion image.
The feature extraction network in the recursion unit layer performs feature extraction on the image to be reconstructed to obtain a feature map, and the feature map and the image to be reconstructed are fused through the fusion network in the recursion unit layer to output a fused image.
The feature extraction network can be called to extract the image features of the image to be reconstructed to obtain a feature map of the image to be reconstructed. The feature map comprises feature parameters (such as feature values) extracted from the image to be reconstructed by the feature extraction network and can be expressed as a feature matrix. The feature values extracted from the image to be reconstructed represent its key feature information, and the feature map of the image to be reconstructed can be generated from this key feature information.
Illustratively, the feature extraction network is implemented based on a convolutional neural network (CNN). For example, the feature extraction network may include a convolution layer followed by a rectified linear unit (ReLU) activation function, where the convolution layer may have a feature depth of 1×64 and a convolution kernel size of 3×3; after the image to be reconstructed is input to the convolution layer of the recursion unit layer and processed by the ReLU function, a set of 64 feature maps of the image to be reconstructed can be extracted. It should be understood that the composition of the feature extraction network is presented here by way of example, and the application is not limited thereto.
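As a rough illustration of such a feature extraction head, the following PyTorch sketch builds a 3×3 convolution with 64 output channels followed by a ReLU, matching the example configuration above; the single input channel and module names are assumptions for illustration, not a prescribed implementation.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Minimal sketch of the feature extraction network: one 3x3 conv + ReLU."""
    def __init__(self, in_channels: int = 1, num_features: int = 64):
        super().__init__()
        # 3x3 convolution producing 64 feature maps; padding keeps spatial size
        self.conv = nn.Conv2d(in_channels, num_features, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.conv(x))

# Example: a 1-channel 64x64 image to be reconstructed yields 64 feature maps
fe = FeatureExtractor()
feat = fe(torch.randn(1, 1, 64, 64))
print(feat.shape)  # torch.Size([1, 64, 64, 64])
```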
For example, the extracted feature map may be feature-enhanced, and then the feature-enhanced feature map and the image to be reconstructed may be added to obtain a fused image.
Step S230: and iteratively calling the recursion unit layer, and taking a fusion image output by the current iteration of the recursion unit layer as the input of the next iteration of the recursion unit layer in the iterative calling process.
The recursion unit layer is called iteratively and repeatedly, and during the iterative calling the fused image output by the current iteration of the recursion unit layer is taken as the input of the next iteration of the recursion unit layer.
Repeatedly invoking the recursion unit layer achieves higher reconstructed-image resolution by increasing the number of recursions, without increasing the depth or parameter count of the network, so the complexity of the model is better controlled; and the resolution of the image is increased gradually over multiple recursions instead of mapping the low-resolution image directly to the high-resolution image at once, which helps generate a more accurate high-resolution image.
In some embodiments, the formulas associated with the recursion unit layer include:

F_i = FEM(K_i(X_i), W_f)

K_i = CKGM(X_i, W_i)

X_{i+1} = X_i + K_i(X_i) · F_i(X_i; W_f)

where X_i denotes the input image of the i-th layer (the i-th iteration); F_i denotes the feature enhancement result of the i-th layer, implemented by the feature enhancement module FEM, with W_f the model parameters of the feature enhancement module; K_i denotes the convolution kernel of the i-th convolution layer, generated by the convolution kernel generation module CKGM, with W_i the model parameters of the i-th layer's convolution kernel generation module; K_i(X_i) denotes performing a convolution operation on X_i with kernel K_i to obtain a feature map; F_i(X_i; W_f) denotes applying the feature enhancement module to the feature map corresponding to X_i to obtain a feature-enhanced feature map, after which the feature map corresponding to X_i and the feature-enhanced feature map are fused (for example by element-wise multiplication) to obtain the feature fusion result; and X_{i+1} denotes the fused image output by the recursion unit layer, which is the sum of the input image X_i of the i-th layer and the feature fusion result. Gradual recovery of image details is thereby achieved by recursively extracting and enhancing features layer by layer.
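A rough PyTorch sketch of the update X_{i+1} = X_i + K_i(X_i) · F_i(X_i; W_f) follows. The fixed convolution standing in for the dynamically generated kernel of the CKGM, and the small conv stack standing in for the FEM, are assumptions made so the sketch stays self-contained; they are not the patent's modules.

```python
import torch
import torch.nn as nn

class RecursionUnitLayer(nn.Module):
    """Sketch of one recursion unit layer: X_{i+1} = X_i + K_i(X_i) * F_i(X_i; W_f)."""
    def __init__(self, channels: int = 1):
        super().__init__()
        # stand-in for K_i = CKGM(X_i, W_i); the patent generates this kernel per input
        self.kernel_branch = nn.Conv2d(channels, channels, 3, padding=1)
        # stand-in for the feature enhancement module FEM(., W_f)
        self.enhance_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = self.kernel_branch(x)    # feature map K_i(X_i)
        f = self.enhance_branch(x)   # feature enhancement F_i(X_i; W_f)
        return x + k * f             # element-wise fusion plus the residual input

layer = RecursionUnitLayer()
x = torch.randn(1, 1, 32, 32)
for _ in range(4):                   # iterative calls: each output feeds the next iteration
    x = layer(x)
print(x.shape)
```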
Step S240: and obtaining a reconstructed image based on the fusion image output after the recursion unit layer iteration calling, wherein the resolution of the reconstructed image is higher than that of the image to be reconstructed.
The reconstructed image is a high-resolution image relative to the original image to be reconstructed: its pixel density is higher, so it provides more details, which are essential in many practical applications. The method provided by the embodiments of the present disclosure can therefore be applied to different scenarios.
For example, in a security monitoring system, hardware limitations can leave the pictures shot by a camera unclear, and upgrading hardware such as the cameras increases cost; super-resolution reconstruction can instead sharpen the captured footage in software. For another example, in medical imaging, the image resolution reconstruction method provided by the present application can raise the resolution of medical images; high-resolution medical images help doctors locate a patient's lesions, speeding up diagnosis and alleviating the diagnostic difficulty caused by unclear medical images. For another example, satellite images play an important role in geological exploration, military reconnaissance and the like, and applying the image resolution reconstruction method provided by the present application to images acquired by satellites yields satellite images with rich texture details. As another example, the method can be applied to preprocessing for related machine vision tasks (detection, tracking and identification of targets); providing high-resolution images greatly improves the performance of pattern recognition in computer vision.
According to the image resolution reconstruction method provided by the present application, the image to be reconstructed is input into the recursion unit layer of the trained resolution reconstruction model; the recursion unit layer performs feature extraction on the image to be reconstructed to obtain a feature map, and the feature map and the image to be reconstructed are fused to output a fused image; the recursion unit layer is then called iteratively, with the fused image output by the current iteration serving as the input to the next iteration, so that image details are recovered gradually. A reconstructed image is obtained based on the fused image output after the iterative calling. Higher reconstructed-image resolution is achieved by increasing the number of recursions without increasing the depth or parameter count of the network, so the complexity of the model is better controlled; and the resolution of the image is increased gradually over multiple iterations rather than mapping the low-resolution image directly to the high-resolution image, so a more accurate high-resolution image can be generated.
In some embodiments, performing feature extraction on the image to be reconstructed with the recursion unit layer to obtain a feature map, and fusing the feature map and the image to be reconstructed to output a fused image, comprises: performing feature extraction on the image to be reconstructed to obtain a feature map; performing difference processing on the feature map and the image to be reconstructed to obtain a residual map; weighting the residual map based on an attention mechanism to obtain a weighted residual map; and fusing the weighted residual map and the feature map to obtain a fused image.
The residual map is used to characterize the feature differences between the feature map and the image to be reconstructed.
Illustratively, performing difference processing on the feature map and the image to be reconstructed to obtain a residual map comprises: determining the feature variation between the image to be reconstructed and the feature map; and generating a residual map between the image to be reconstructed and the feature map according to the feature variation.
The feature variation refers to the feature differences between corresponding pixel points or matched feature points in the two maps.
For example, feature point pairs matched with each other in the image to be reconstructed and the feature map are determined, then feature variation between two feature points in each feature point pair is calculated, feature variation corresponding to each pair of feature points is obtained, and a residual map is generated according to the feature variation corresponding to each pair of feature points.
Further, the residual map is weighted based on an attention mechanism to obtain a weighted residual map, and the weighted residual map and the feature map are fused to obtain a fused image. The attention mechanism raises the weight of key features so that they become more salient in the resulting weighted residual map. The weighted residual map and the feature map are then fused to restore image details to the image to be reconstructed.
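The following sketch illustrates this difference, attention-weighting and fusion sequence; the 1×1 convolution plus sigmoid producing the attention weights is an assumed stand-in for the attention mechanism, not a prescribed implementation.

```python
import torch
import torch.nn as nn

class ResidualAttentionFusion(nn.Module):
    """Sketch: residual map -> attention weighting -> fusion with the feature map."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor, img: torch.Tensor) -> torch.Tensor:
        residual = feat - img                      # difference between feature map and input image
        weighted = self.attn(residual) * residual  # attention-weighted residual map
        return feat + weighted                     # fuse the weighted residual into the feature map

fusion = ResidualAttentionFusion()
img = torch.randn(1, 1, 32, 32)
feat = torch.randn(1, 1, 32, 32)
print(fusion(feat, img).shape)  # torch.Size([1, 1, 32, 32])
```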
In some embodiments, performing feature extraction on the image to be reconstructed to obtain a feature map comprises: downsampling the image to be reconstructed to obtain a downsampled feature map; performing a convolution operation on the downsampled feature map to obtain a convolution feature map; weighting the convolution feature map based on an attention mechanism to obtain a weighted feature map; and upsampling the weighted feature map to obtain the feature map.
In the embodiment of the application, a downsampling network is included in a recursive unit layer of the resolution reconstruction model. After the image to be reconstructed is input into a recursion unit layer of the resolution reconstruction model, the downsampling network downsamples the image to be reconstructed to obtain a downsampled feature map of the image to be reconstructed.
The embodiment of the application does not limit the structure, the size and the like of the downsampling network. Illustratively, the downsampling network may be a convolutional layer with a step size (Stride) of s, where s is a positive integer greater than 1, such as s=2. Alternatively, the downsampling network includes at least two convolution layers connected in series, the step size of any one convolution layer is a positive integer greater than 1, and the step sizes of any two convolution layers can be the same or different. Alternatively, the downsampling network performs downsampling on the image to be reconstructed in a pooling manner, and at this time, the downsampling network layer may be a pooling layer. Alternatively, the downsampling network layer may perform downsampling processing on the image to be reconstructed by bilinear interpolation or the like.
It should be noted that, the downsampling process is performed on the image to be reconstructed by using the downsampling network, so that the feature scale can be reduced, for example, when the downsampling network is a convolution layer with a step length of 2, the feature scale can be reduced to one half of the original size, so as to reduce the computational complexity.
Then, a convolution operation is performed on the downsampled feature map to obtain a convolution feature map. The convolution operation may be performed with a depthwise separable convolution network or another convolution network, and the application is not limited in this respect.
The depthwise separable convolution decomposes the standard convolution operation into a depthwise convolution and a pointwise convolution, which can significantly reduce the amount of computation and the number of parameters. The depthwise convolution processes only one channel of the input feature map with each convolution kernel; the size of this kernel is (k, k, 1), where k is the kernel size and 1 indicates that only one channel's feature information is processed. After the depthwise convolution, one output feature map is obtained per input channel. In the pointwise convolution, a kernel of size (1, 1, d) is applied to the feature maps output by the depthwise convolution, where d is the number of channels output by the depthwise convolution; the purpose of the pointwise convolution is to compress the depthwise output into one channel, reducing the amount of computation and the number of parameters.
The computational cost of the depthwise separable convolution is much lower than that of a conventional convolution, because the convolution kernels in both the depthwise and pointwise stages are relatively small, and the pointwise convolution typically uses only one kernel. Depthwise separable convolution therefore effectively reduces the amount of computation and the number of parameters in the resolution reconstruction model, improving the model's running speed and efficiency.
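A minimal sketch of a depthwise separable convolution, with a parameter-count comparison against a standard convolution; the channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) conv followed
    by a 1x1 (pointwise) conv that mixes channels."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # groups=in_ch => each kernel sees exactly one input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Parameter comparison against a standard conv with the same in/out channels
std = nn.Conv2d(64, 64, 3, padding=1)
sep = DepthwiseSeparableConv(64, 64)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(std), count(sep))  # the separable version uses far fewer parameters
```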
Further, the convolution feature map is weighted based on an attention mechanism to obtain a weighted feature map, improving the salience of key features, and the weighted feature map is upsampled to obtain the final feature map. Upsampling increases the size of the image, and the pixel values in the enlarged image are determined by sampling, for example by bicubic interpolation. Upsampling the weighted feature map may be implemented with nearest-neighbour interpolation, bilinear interpolation, bicubic interpolation, deconvolution and the like, which the present application does not limit.
Illustratively, the weighting of the feature map is performed using a self-attention mechanism. The self-attention mechanism compares each position of the feature map with all other positions to generate a weight vector for each position, thereby weighting the feature map. The self-attention mechanism can be expressed as:

y_i = (1/n) Σ_j f(x_i, x_j) g(x_j)

where x_i denotes the i-th position in the feature map, n denotes the total number of positions in the feature map, g(x_j) denotes the feature vector at position x_j, and f(x_i, x_j) is a function computing the correlation between x_i and x_j, typically a dot-product operation. For each position x_i, the formula computes its correlation with all positions x_j, generates a weight for each position through normalization, and finally sums the weighted feature vectors to obtain the output feature vector y_i of that position.
When the self-attention mechanism is used, three linear transformations are first applied to the input feature map to obtain the key, value and query vectors; a similarity matrix and a softmax function then yield a weight vector for each pixel; finally the weight vector is multiplied by the value vector to obtain the final self-attention feature map.
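A sketch of this query/key/value scheme over feature-map positions follows, assuming 1×1 convolutions for the three linear transformations and an illustrative inner dimension; it is one common realisation, not the patent's required one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    """Self-attention over feature-map positions: 1x1 convs produce query/key/value,
    a softmax over dot-product similarities weights the values."""
    def __init__(self, channels: int, inner: int = 16):
        super().__init__()
        self.q = nn.Conv2d(channels, inner, 1)
        self.k = nn.Conv2d(channels, inner, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (b, h*w, inner)
        k = self.k(x).flatten(2)                   # (b, inner, h*w)
        v = self.v(x).flatten(2).transpose(1, 2)   # (b, h*w, c)
        attn = F.softmax(q @ k, dim=-1)            # similarity matrix -> per-position weights
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out

sa = SpatialSelfAttention(64)
print(sa(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 64, 16, 16])
```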
In some implementations, the recursion unit layer contains a plurality of feature span blocks, and performing feature extraction on the image to be reconstructed to obtain a feature map comprises: performing feature extraction on the image to be reconstructed to obtain an initial feature map; performing span convolution on the initial feature map with each feature span block to obtain a plurality of intermediate feature maps, where the span convolution of each feature span block uses different spatial scale information; and concatenating the plurality of intermediate feature maps to obtain the feature map.
The feature span block is used for enhancing the space perception capability of the feature map so as to improve the effect of super-resolution reconstruction.
Illustratively, the feature span block comprises span convolution and channel attention. The span convolution introduces information from different spatial scales to each pixel in the feature map, enhancing the spatial perception of the feature map; it can be implemented in different ways, such as multi-scale convolution on the feature map or skip convolution in the span direction. Channel attention weights the different channels of the feature map to raise the response to useful information and suppress the response to useless information, and can be implemented by methods such as SE (Squeeze-and-Excitation) attention or CBAM (Convolutional Block Attention Module) attention.
A channel is a component of an image and records most of its information. For example, in RGB (Red, Green, Blue) color mode, an image is composed of a red channel, a green channel and a blue channel superimposed; in grayscale mode, the image contains only one channel. For another example, in YUV color mode, the image is composed of a "Y" channel, a "U" channel and a "V" channel.
The implementation of the feature span block is described next:
Channel expansion of the initial feature map using subpixel convolution:

Y = F_PSConv(X)

where X denotes the initial feature map, X ∈ R^(C×H×W); C denotes the number of channels of the initial feature map, and H and W denote its height and width, respectively; Y denotes the channel-expanded feature map output by the subpixel convolution, Y ∈ R^(C'×rH×rW), where C' is the number of output channels and r is the upsampling scale factor.

The channel-expanded feature map is restored to the original size using an upsampling operation:

Z = F_Upsample(Y)

where Z denotes the feature map after upsampling, Z ∈ R^(C'×H×W).

The initial feature map is downsampled with a downsampling operation so as to span a different receptive field range, such that the resulting downsampled feature map has lower resolution but a larger receptive field:

W = F_Downsample(X)

where W denotes the downsampled feature map, W ∈ R^(C×(H/s)×(W/s)), and s is the downsampling scale factor.

The upsampled and downsampled feature maps are concatenated to obtain feature maps spanning different receptive field ranges:

F_FSB(X) = Concat(Z, W)

where Concat denotes the concatenation operation on feature maps and F_FSB denotes the output of the feature span block.
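A sketch of this feature span block follows, using PixelShuffle for the subpixel convolution and a strided convolution for the downsampling path. Resizing both paths back to the input size before concatenation is an assumption made here so the tensors align; the patent does not specify how the spatial sizes are matched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureSpanBlock(nn.Module):
    """Sketch: subpixel conv expands channels then pixel-shuffles to (C, rH, rW);
    one path is resized back to (H, W), a strided path downsamples for a larger
    receptive field, and the two paths are concatenated: F_FSB(X) = Concat(Z, W)."""
    def __init__(self, channels: int = 16, r: int = 2, s: int = 2):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * r * r, 3, padding=1)  # Y = F_PSConv(X)
        self.shuffle = nn.PixelShuffle(r)   # (C*r*r, H, W) -> (C, rH, rW)
        self.down = nn.Conv2d(channels, channels, 3, stride=s, padding=1)  # W path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        z = self.shuffle(self.expand(x))
        z = F.interpolate(z, size=(h, w), mode='bilinear', align_corners=False)   # Z at H x W
        wmap = F.interpolate(self.down(x), size=(h, w), mode='bilinear', align_corners=False)
        return torch.cat([z, wmap], dim=1)  # Concat(Z, W)

fsb = FeatureSpanBlock()
print(fsb(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 32, 32, 32])
```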
In some embodiments, a residual attention module (RAM) is inserted between the recursion unit layers to adaptively weight the features of different layers. The input of the residual attention module is a feature map processed by a recursion unit layer, and the output is a weighted feature map; the formulas of the related steps can be expressed as follows:
Computing the weighted mask:

M_c = σ(W_c ⊙ F_c)

where F_c denotes the feature map of the c-th channel, c indexes the channels, W_c denotes the weight matrix of the c-th channel, σ denotes the sigmoid function, and ⊙ denotes element-wise multiplication.
Weighted computation:
A=F⊙M
where A represents the weighted feature map, F represents the original feature map, and M represents the calculated weighted mask.
Outputting residual attention:
R=F+A
where R represents the output of the residual attention module, F represents the original feature map, and a represents the weighted feature map.
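A minimal sketch of the module, assuming a grouped 1×1 convolution as a stand-in for the per-channel weight matrices W_c; the sigmoid gate follows the mask formula above.

```python
import torch
import torch.nn as nn

class ResidualAttentionModule(nn.Module):
    """Sketch of the residual attention module: M = sigmoid(W ⊙ F), A = F ⊙ M, R = F + A."""
    def __init__(self, channels: int):
        super().__init__()
        # grouped 1x1 conv: one learnable weight (and bias) per channel, standing in for W_c
        self.weight = nn.Conv2d(channels, channels, 1, groups=channels)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        m = self.sigmoid(self.weight(f))  # weighted mask M
        a = f * m                         # weighted feature map A = F ⊙ M
        return f + a                      # residual attention output R = F + A

ram = ResidualAttentionModule(64)
print(ram(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 64, 16, 16])
```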
In some embodiments, obtaining the reconstructed image based on the fused image output after the iterative calling of the recursion unit layer comprises: collecting the fused image obtained by the recursion unit layer in each iterative call to obtain an output set corresponding to the recursion unit layer; and fusing each fused image in the output set to obtain the reconstructed image.
The output set contains the images output by the recursion unit layer in each iteration, and the fused images in the output set are fused. For example, the values at the same pixel position across the images can be summed or averaged to obtain the target value of that pixel position in the reconstructed image; computing the target value of every pixel position yields a reconstructed image of higher fineness and accuracy.
Illustratively, fusing each fused image in the output set to obtain the reconstructed image comprises: obtaining the weight parameter corresponding to each fused image in the output set; and performing weighted fusion on the fused images in the output set based on the weight parameters to obtain the reconstructed image.
The weight parameter corresponding to each fused image is computed through an attention mechanism, so that the fused images in the output set are weight-fused according to the weight parameters to obtain the reconstructed image; the resulting reconstructed image is more accurate, and the image resolution reconstruction effect is better.
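A minimal sketch of this weighted fusion of the output set, assuming PyTorch tensors; the uniform default weights here are a placeholder for the attention-derived weight parameters.

```python
import torch

def fuse_output_set(outputs: list[torch.Tensor], weights: torch.Tensor | None = None) -> torch.Tensor:
    """Weighted fusion of the per-iteration fused images into one reconstructed image.
    With no weights given this reduces to a plain average over the output set."""
    stacked = torch.stack(outputs, dim=0)               # (T, B, C, H, W)
    if weights is None:
        weights = torch.full((len(outputs),), 1.0 / len(outputs))
    w = weights.view(-1, 1, 1, 1, 1)                    # broadcast one weight per image
    return (stacked * w).sum(dim=0)

outs = [torch.randn(1, 1, 32, 32) for _ in range(4)]    # fused images from 4 iterations
recon = fuse_output_set(outs)
print(recon.shape)  # torch.Size([1, 1, 32, 32])
```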
In some embodiments, the resolution reconstruction model is composed of an input layer, one or more recursion blocks, an output layer and the like. A recursion block is composed of a plurality of recursion layers or feature span blocks; a recursion layer is implemented by stacking a plurality of recursion unit layers; and a recursion unit layer is composed of a plurality of recursion units. The number of recursion units contained in each recursion unit layer can differ, and the number of recursion unit layers in the whole network determines the depth of the network; gradual recovery of image details can be achieved through the plurality of recursion unit layers. Within each recursion unit layer the structure of the recursion units is the same, but their network parameters may differ.
The input layer receives the image to be reconstructed as input and preprocesses it, for example by normalization, cropping, rotation and the like, to facilitate the subsequent image reconstruction.
Illustratively, the recursive block contains a pyramid structure consisting of a plurality of feature span blocks, where each feature span block has a different span and channel attention to capture detailed information at different spatial scales through superposition of multi-level feature map pyramids while utilizing channel attention to enhance responsiveness to useful information.
When there are multiple recursion blocks, the input of the n-th recursion block is the output X_{n-1} of the previous recursion block, and the output of the recursion block can be computed as:

X_n = R_n(X_{n-1}) = f(X_{n-1}, W) + X_{n-1}

where R_n is the n-th recursion layer, representing the function of the recursion layer operation; f(X_{n-1}, W) denotes the operation of the recursion layer; W denotes the parameters of the recursion layer; and X_n is the output of the recursion block.
The output of a recursion block can serve as the input of the next-level recursion block, forming a multi-level feature-map pyramid that captures detail information at different spatial scales while using channel attention to improve responsiveness to useful information.
Pyramid pooling can be introduced to improve the model's ability to recognize images at different scales. The specific steps are: divide the feature map into blocks according to different proportions, perform a pooling operation on the features within each block, and concatenate all the pooling results as the output feature map.
Optionally, in the image resolution reconstruction task, pyramid pooling is used to fuse features of the image to be reconstructed at different scales to improve the model's recognition ability. The implementation is as follows: divide the original feature map corresponding to the image to be reconstructed into a plurality of blocks in proportion; pool the features within each block, possibly with different pooling methods such as max pooling or average pooling; and concatenate all the pooling results as the output feature map.
Assuming the input feature map has size H×W×C, pyramid pooling divides it into N blocks, the i-th block having size h_i × w_i. The pyramid pooling formula can be:

f_out = [f_pool(f_in, h_1 × w_1), f_pool(f_in, h_2 × w_2), ..., f_pool(f_in, h_N × w_N)]

where f_in denotes the input original feature map, f_pool denotes the pooling operation, h_i × w_i denotes the size of the i-th block, [ ] denotes the concatenation operation, and f_out denotes the output feature map. In a specific implementation, the sizes of the different blocks may be obtained by proportional scaling, or set according to a certain rule such as a pyramid-shaped decrease, which the present application does not limit.
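A sketch of pyramid pooling follows, assuming average pooling over adaptive grids and the PSPNet-style choice of including the input map in the concatenation; both choices are illustrative, not mandated by the text.

```python
import torch
import torch.nn.functional as F

def pyramid_pooling(f_in: torch.Tensor, grid_sizes=(1, 2, 4)) -> torch.Tensor:
    """Pool the feature map at several grid resolutions, resize each pooled result
    back to the input size, and concatenate everything along the channel axis."""
    h, w = f_in.shape[-2:]
    pooled = [f_in]                                            # keep the original map too
    for g in grid_sizes:
        p = F.adaptive_avg_pool2d(f_in, output_size=g)         # average pooling per block
        pooled.append(F.interpolate(p, size=(h, w), mode='bilinear', align_corners=False))
    return torch.cat(pooled, dim=1)                            # [f_pool(...), ...] concatenation

x = torch.randn(1, 8, 32, 32)
print(pyramid_pooling(x).shape)  # torch.Size([1, 32, 32, 32])
```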
Alternatively, parameters may be shared between the recursion layers, and the same weights and biases may be shared among different feature span blocks, reducing the number of parameters and the computational complexity.
Illustratively, the plurality of recursion unit layers contained in the resolution reconstruction model are sequentially connected. Obtaining the reconstructed image based on the fused image output after the iterative calling of the recursion unit layer, the resolution of the reconstructed image being higher than that of the image to be reconstructed, comprises the following steps: if the recursion unit layer called in the current iteration is detected to satisfy a preset iteration end condition, inputting the fused image output by that recursion unit layer into the recursion unit layer called in the next iteration for image reconstruction; and traversing each recursion unit layer, and obtaining the reconstructed image based on the fused images output after each recursion unit layer has been iteratively called.
The preset iteration end condition may be that the iteration number of the recursion unit layer reaches the preset number, or may be that the fused image obtained by the current iteration meets the preset requirement, for example, the definition of the fused image obtained by the current iteration meets the preset requirement, which is not limited in the present application.
If the recursion unit layer called in the current iteration is detected to satisfy the preset iteration end condition, the fused image it outputs is input to the recursion unit layer called in the next iteration for image reconstruction. Parameters differ between the recursion unit layers, for example the number of recursion units or the scale factors used for upsampling or downsampling.
The recursion unit layer is composed of a plurality of sequentially connected recursion units, which are the main execution units performing image resolution reconstruction within the recursion unit layer.
For example, the output of the recursion unit is expressed as:

SR_t = F_θ(SR_{t-1}, LR),  t = 1, 2, ..., T

where F_θ denotes the recursion unit function with network parameters θ, SR_t denotes the output of the t-th recursion unit, T denotes the total number of recursion units in the recursion unit layer, and LR denotes the input image of the recursion unit.

The specific calculation formula of the output of the recursion unit can be:

SR_t = φ(Conv(γ_{1,t}(SR_{t-1}) + γ_{2,t}(LR))) + γ_{3,t}(SR_{t-1})

where Conv denotes a convolution operation; γ_{1,t}(·), γ_{2,t}(·) and γ_{3,t}(·) are learnable parameters; and φ denotes an activation function, such as a ReLU or PReLU activation function.
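A sketch of this recursion-unit update follows. Modeling the γ parameters as per-unit learnable scalars is an assumption, since the text only calls them learnable; the channel count is likewise illustrative.

```python
import torch
import torch.nn as nn

class RecursionUnit(nn.Module):
    """Sketch of SR_t = phi(Conv(g1·SR_{t-1} + g2·LR)) + g3·SR_{t-1}."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.PReLU()
        self.g1 = nn.Parameter(torch.ones(1))   # gamma_{1,t}
        self.g2 = nn.Parameter(torch.ones(1))   # gamma_{2,t}
        self.g3 = nn.Parameter(torch.ones(1))   # gamma_{3,t}

    def forward(self, sr_prev: torch.Tensor, lr: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(self.g1 * sr_prev + self.g2 * lr)) + self.g3 * sr_prev

units = nn.ModuleList([RecursionUnit() for _ in range(3)])  # T = 3 recursion units
lr = torch.randn(1, 1, 32, 32)
sr = lr
for unit in units:           # each SR_t depends on SR_{t-1} and the LR input
    sr = unit(sr, lr)
print(sr.shape)  # torch.Size([1, 1, 32, 32])
```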
For example, the recursion unit is composed of a series of convolutions, deconvolutions, activation functions, upsampling modules, feature fusion modules, attention modules, feature enhancement modules, convolution kernel generation modules, recursion modules, feature span modules and the like; the feature fusion module can fuse the feature map obtained by upsampling with the feature map of the input image, for example through skip connections, to improve the richness and stability of the feature representation.
In some embodiments, the output layer of the resolution reconstruction model may directly output the result of the last recursion block as the final reconstructed image, or may output the final reconstructed image by weighted addition of the fused images output by each recursion unit layer in each recursion block.
In some embodiments, the present application may use cross-attention for image fusion, which can interact with and align the feature maps of two images. For example, the cross-attention mechanism first computes a similarity matrix between the feature maps of the two images and then performs feature mapping based on that similarity matrix.
In some embodiments, non-local blocks are used to compute the similarity between each pixel and all pixels in the entire picture, and these similarities are weighted and summed, which can reduce noise during image reconstruction while retaining more detail. When computing the non-local block, downsampling or upsampling can be used to reduce the amount of computation. Meanwhile, to avoid information loss caused by the small similarity between distant pixels, an attention mechanism can be introduced into the similarity computation to better capture the interrelationship between pixels.
In addition, the attention mechanisms contemplated by the present application include, but are not limited to, self-attention, cross-attention, spatial attention, channel attention, temporal attention and the like, and different attention mechanisms may be concatenated to form an attention module sequence, which is then inserted into the resolution reconstruction model. For example, assuming the resolution reconstruction task requires n different attention mechanisms A1, A2, ..., An, these can be concatenated into an attention module sequence Am.
The sequence of attention modules Am may optionally be inserted into a position between the convolution layer and the deconvolution layer to improve the feature extraction and feature reconstruction capabilities of the network. For example, the resolution reconstruction model includes a plurality of convolution layers Conv and deconvolution layers Deconv, where a position between the ith convolution layer and the ith deconvolution layer is pi, and the attention module sequence Am may be inserted into a position of each pi to form an attention-enhanced resolution reconstruction model, which is denoted as SRnet:
SRnet=[Conv1,Am1,Deconv1,Conv2,Am2,Deconv2,...,ConvN,AmN,DeconvN]
where Conv1 to ConvN represent all the convolution layers in the resolution reconstruction model, deconv1 to DeconvN represent all the deconvolution layers in the resolution reconstruction model, and Am1 to AmN represent the sequence of attention modules inserted between Conv1 to ConvN and Deconv1 to DeconvN, respectively.
In addition, when multiple attention mechanisms are used in combination, a parallel manner may be adopted, that is, different attention mechanisms are applied in parallel to different branches in the network, so as to increase the richness and complexity of the model. In this case, the attention mechanisms in the different branches may share other layers in the network to reduce the number of parameters and the amount of computation.
The following describes the training process of the resolution reconstruction model:
referring to fig. 3, the training process of the resolution reconstruction model specifically includes:
1. Preparing a sample dataset: the sample data set comprises high-resolution image samples and, for each high-resolution image sample, corresponding low-resolution image samples, which can be obtained by downsampling the high-resolution image samples.
2. Dividing a training sample set and a test sample set: the sample data set is divided into a training sample set and a test sample set according to a certain proportion, wherein the training sample set is used for training a model to be trained, and the test sample set is used for evaluating the performance of the model to be trained.
3. Data preprocessing: preprocess the images of the training sample set and the test sample set, including normalization, cropping, rotation, mirror flipping and the like, to increase the diversity of the sample data set and improve the generalization ability of the model.
4. Defining a model to be trained: according to the task demand and the characteristics of the sample data set, parameters such as the layer number, the convolution kernel size, the step length, the activation function and the like of the network model are defined.
5. Training a model to be trained: training the model to be trained by using the training sample set, and optimizing parameters of the model to be trained through a back propagation algorithm.
The processing manner of the model to be trained on the input low resolution image sample can be referred to the above steps S210 to S240, and will not be described herein.
In some embodiments, residual connections can be added to improve the convergence speed and stability of the network, while alleviating the vanishing-gradient problem caused by network depth.
The residual connection may be added in the recursive unit layer or in the output layer, which is not limited in the present application.
For example, the formula for the residual connection may be:
H_out = F_RC(F_RB(F_DC(L))) + H
where L is the input low-resolution image sample, H is the high-resolution image sample corresponding to the low-resolution image sample, F_DC denotes the downsampling module, F_RB denotes the recursive module, F_RC denotes the upsampling module, and H_out is the output reconstructed image.
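The sketch below wraps the three blocks with a global residual connection. As printed, the skip term H is the ground-truth high-resolution sample, which would be unavailable at inference time; the sketch therefore substitutes a bicubic upsampling of the input L for the skip, a common form of global residual connection. That substitution, like the block interfaces and the scale factor, is an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualRecursiveNet(nn.Module):
    """H_out = F_RC(F_RB(F_DC(L))) + skip, with the skip taken as upsampled L."""
    def __init__(self, f_dc, f_rb, f_rc, scale=4):
        super().__init__()
        self.f_dc, self.f_rb, self.f_rc = f_dc, f_rb, f_rc  # down / recursive / up blocks
        self.scale = scale

    def forward(self, L):  # L: (B, C, h, w) low-resolution batch
        skip = F.interpolate(L, scale_factor=self.scale,
                             mode="bicubic", align_corners=False)
        return self.f_rc(self.f_rb(self.f_dc(L))) + skip  # residual connection
```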
Then, the difference between the reconstructed image output by the model to be trained and the real high-resolution image sample is calculated through a loss function, and the parameters of the model to be trained are optimized from this calculated difference via the back propagation algorithm. The loss function includes, but is not limited to, the mean squared error (MSE), the L1 norm loss function, the perceptual loss function, and the like.
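One training step under this scheme might look as follows; the optimizer interface and the restriction to MSE/L1 (omitting the perceptual loss) are assumptions:

```python
import torch.nn as nn

def train_step(model, optimizer, lr_batch, hr_batch, loss_name="l1"):
    """Forward pass, loss against the real HR sample, back propagation, update."""
    criterion = nn.MSELoss() if loss_name == "mse" else nn.L1Loss()
    optimizer.zero_grad()
    sr = model(lr_batch)            # reconstructed image output by the model
    loss = criterion(sr, hr_batch)  # difference to the real high-resolution sample
    loss.backward()                 # back propagation algorithm
    optimizer.step()                # parameter optimization
    return loss.item()
```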
If the current model to be trained meets the model training completion condition, the current model is taken as the trained resolution reconstruction model.
6. Evaluating model performance: the trained resolution reconstruction model is evaluated using the test sample set, and indexes such as the prediction accuracy, error, and loss of the model are calculated so as to further optimize the model.
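An evaluation pass over the test sample set might be sketched as below; PSNR as the concrete error index is an assumption, since the text names accuracy, error, and loss only in general terms:

```python
import torch

@torch.no_grad()
def evaluate(model, test_pairs):
    """Average MSE and PSNR over (lr, hr) tensor pairs shaped (1, C, h, w) / (1, C, H, W)."""
    model.eval()
    total_mse = 0.0
    for lr_img, hr_img in test_pairs:
        sr = model(lr_img).clamp(0.0, 1.0)
        total_mse += torch.mean((sr - hr_img) ** 2).item()
    mse = total_mse / len(test_pairs)
    psnr = 10.0 * torch.log10(torch.tensor(1.0 / mse)).item()  # peak value 1.0 assumed
    return mse, psnr
```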
7. Model application: the optimized resolution reconstruction model is applied to an actual image super-resolution reconstruction task; an image to be reconstructed is input into the resolution reconstruction model, and a high-resolution reconstructed image is obtained through the model's prediction, thereby realizing image super-resolution reconstruction.
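Application then reduces to a single forward pass, as in this sketch (the tensor layout and value range are assumptions):

```python
import torch

@torch.no_grad()
def super_resolve(model, lr_image):
    """lr_image: (C, h, w) tensor in [0, 1]; returns the high-resolution reconstruction."""
    model.eval()
    sr = model(lr_image.unsqueeze(0))      # add a batch dimension and run the model
    return sr.squeeze(0).clamp(0.0, 1.0)   # reconstructed image
```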
According to the image resolution reconstruction method provided by the application, the image to be reconstructed is input into the recursion unit layer of the trained resolution reconstruction model; the recursion unit layer performs feature extraction on the image to be reconstructed to obtain a feature map, and the feature map and the image to be reconstructed are fused to output a fused image. The recursion unit layer is then iteratively called, with the fused image output by the current iteration serving as the input of the next iteration, and the reconstructed image is obtained from the fused image output after the iterative calls. Higher reconstructed-image resolution is achieved by increasing the number of recursions rather than the depth or parameter count of the network, so the complexity of the model is better controlled; and because the resolution is increased gradually through multiple recursions instead of mapping the low-resolution image directly to the high-resolution image in a single pass, a more accurate high-resolution image can be generated.
Fig. 4 is a block diagram of an image resolution reconstruction apparatus according to an exemplary embodiment of the present application. As shown in fig. 4, the exemplary image resolution reconstruction apparatus 400 includes: an input module 410, a fusion module 420, a recursion module 430, and a result acquisition module 440. Specifically:
an input module 410, configured to input an image to be reconstructed into a recursion unit layer of the trained resolution reconstruction model;
the fusion module 420 is configured to perform feature extraction on an image to be reconstructed by using the recursion unit layer to obtain a feature map, and fuse the feature map and the image to be reconstructed to output a fused image;
the recursion module 430 is configured to iteratively invoke the recursion unit layer, and in the iterative invocation process, use a fused image output by a current iteration of the recursion unit layer as an input of a next iteration of the recursion unit layer;
the result obtaining module 440 is configured to obtain a reconstructed image based on the fusion image output after the recursive unit layer iteration call, where the resolution of the reconstructed image is higher than that of the image to be reconstructed.
In the above exemplary image resolution reconstruction apparatus, the image to be reconstructed is input into the recursion unit layer of the trained resolution reconstruction model; the recursion unit layer performs feature extraction to obtain a feature map, and the feature map and the image to be reconstructed are fused to output a fused image. The recursion unit layer is then iteratively called, with the fused image output by the current iteration serving as the input of the next iteration, so that image details are recovered gradually, and the reconstructed image is obtained from the fused image output after the iterative calls. Higher reconstructed-image resolution is achieved by increasing the number of recursions without increasing the depth or parameter count of the network, so that the complexity of the model is better controlled; and increasing the resolution gradually over multiple passes, instead of mapping the low-resolution image directly to the high-resolution image in one step, helps to generate a more accurate high-resolution image.
The functions of each module are described in the embodiments of the image resolution reconstruction method above and are not repeated here.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device 500 comprises a memory 501 and a processor 502, the processor 502 being arranged to execute program instructions stored in the memory 501 so as to implement the steps of any of the image resolution reconstruction method embodiments described above. In one particular implementation scenario, the electronic device 500 may include, but is not limited to, mobile devices such as a notebook computer and a tablet computer, which is not limited herein.
In particular, the processor 502 is configured to control itself and the memory 501 to implement the steps in any of the image resolution reconstruction method embodiments described above. The processor 502 may also be referred to as a CPU (Central Processing Unit). The processor 502 may be an integrated circuit chip with signal processing capabilities. The processor 502 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 502 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of a computer readable storage medium according to the present application. The computer readable storage medium 600 stores program instructions 610 that can be executed by a processor, the program instructions 610 being configured to implement the steps of any of the embodiments of the image resolution reconstruction method described above.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of the various embodiments is intended to highlight the differences between them; for parts that are the same or similar, the embodiments may be referred to one another, and details are not repeated herein for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of modules or units is merely a logical functional division, and there may be other divisions in actual implementation; for example, units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, and may be in electrical, mechanical, or other forms.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units. The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A method of image resolution reconstruction, the method comprising:
inputting an image to be reconstructed into a recursion unit layer of a resolution reconstruction model after training is completed;
extracting features of the image to be reconstructed by using the recursion unit layer to obtain a feature map, and fusing the feature map and the image to be reconstructed to output a fused image;
iteratively calling the recursion unit layer, wherein in the iterative calling process, a fusion image output by the current iteration of the recursion unit layer is used as the input of the next iteration of the recursion unit layer;
and obtaining a reconstructed image based on the fusion image output after the recursion unit layer iteration calling, wherein the resolution of the reconstructed image is higher than that of the image to be reconstructed.
2. The method according to claim 1, wherein the performing feature extraction on the image to be reconstructed by using the recursive unit layer to obtain a feature map, and performing fusion on the feature map and the image to be reconstructed to output a fused image, includes:
extracting features of the image to be reconstructed to obtain a feature map;
performing differential processing on the feature map and the image to be reconstructed to obtain a residual image;
weighting the residual image based on an attention mechanism to obtain a weighted residual image;
and performing fusion processing on the weighted residual image and the feature map to obtain a fusion image.
3. The method according to claim 2, wherein the feature extraction of the image to be reconstructed to obtain a feature map includes:
downsampling the image to be reconstructed to obtain a downsampled feature map;
performing convolution operation on the downsampled feature map to obtain a convolution feature map;
weighting the convolution feature map based on an attention mechanism to obtain a weighted feature map;
and up-sampling the weighted feature map to obtain a feature map.
4. The method of claim 2, wherein the recursive unit layer contains a plurality of feature span blocks; the step of extracting the features of the image to be reconstructed to obtain a feature map comprises the following steps:
extracting features of the image to be reconstructed to obtain an initial feature map;
performing span convolution on the initial feature map based on each feature span block, respectively, to obtain a plurality of intermediate feature maps; the span convolution of each feature span block adopts different spatial scale information;
and splicing the plurality of intermediate feature maps to obtain a feature map.
5. The method according to claim 1, wherein obtaining the reconstructed image based on the fused image output after the recursive unit layer iteration call comprises:
acquiring a fusion image obtained by the recursion unit layer in each iteration call to obtain an output set corresponding to the recursion unit layer;
and fusing each fused image in the output set to obtain a reconstructed image.
6. The method of claim 5, wherein fusing each fused image in the output set to obtain a reconstructed image comprises:
acquiring a weight parameter corresponding to each fusion image in the output set;
and carrying out weighted fusion on each fusion image in the output set based on the weight parameters to obtain a reconstructed image.
7. The method according to any one of claims 1 to 6, wherein the resolution reconstruction model contains a plurality of sequentially connected recursion unit layers; and the obtaining a reconstructed image based on the fusion image output after the recursion unit layer iteration call, wherein the resolution of the reconstructed image is higher than that of the image to be reconstructed, comprises the following steps:
if the recursion unit layer called by the current iteration is detected to meet the preset iteration ending condition, inputting the fusion image output by the recursion unit layer called by the current iteration to the recursion unit layer called by the next iteration for image reconstruction;
traversing each recursion unit layer, and obtaining a reconstructed image based on the fusion image output after each recursion unit layer is iteratively invoked.
8. An image resolution reconstruction apparatus, comprising:
the input module is used for inputting the image to be reconstructed into a recursion unit layer of the resolution reconstruction model after training;
the fusion module is used for performing feature extraction on the image to be reconstructed by using the recursion unit layer to obtain a feature map, and fusing the feature map and the image to be reconstructed to output a fused image;
the recursion module is used for iteratively calling the recursion unit layer, and taking a fusion image output by the current iteration of the recursion unit layer as the input of the next iteration of the recursion unit layer in the iteration calling process;
and the result acquisition module is used for acquiring a reconstructed image based on the fusion image output after the recursion unit layer iteration call, wherein the resolution of the reconstructed image is higher than that of the image to be reconstructed.
9. A computer readable storage medium having stored thereon program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.
10. An electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the method of any one of claims 1 to 7.
CN202310436642.5A 2023-04-17 2023-04-17 Image resolution reconstruction method and device, storage medium and electronic equipment Pending CN116630152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310436642.5A CN116630152A (en) 2023-04-17 2023-04-17 Image resolution reconstruction method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116630152A true CN116630152A (en) 2023-08-22

Family

ID=87596358

Country Status (1)

Country Link
CN (1) CN116630152A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115061A (en) * 2023-09-11 2023-11-24 北京理工大学 Multi-mode image fusion method, device, equipment and storage medium
CN117115061B (en) * 2023-09-11 2024-04-09 北京理工大学 Multi-mode image fusion method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination