CN111179177B - Image reconstruction model training method, image reconstruction method, device and medium

Info

Publication number: CN111179177B (application number CN201911409903.4A)
Authority: CN (China)
Prior art keywords: image, low, network, resolution, frequency
Legal status: Active (application granted)
Other languages: Chinese (zh)
Other versions: CN111179177A
Inventors: 王汝欣, 邱亚军, 陶大鹏
Assignee (original and current): Shenzhen Union Vision Innovation Technology Co ltd
Application filed by Shenzhen Union Vision Innovation Technology Co ltd; priority to CN201911409903.4A
Publications: CN111179177A (application), CN111179177B (grant)

Classifications

    • G06T 3/4053: Physics; Computing; Image data processing or generation, in general; Geometric image transformation in the plane of the image; Scaling the whole image or part thereof; Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 3/4046: Physics; Computing; Image data processing or generation, in general; Geometric image transformation in the plane of the image; Scaling the whole image or part thereof using neural networks
    • G06N 3/045: Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/084: Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Learning methods; Backpropagation, e.g. using gradient descent
    • G06V 10/44: Physics; Computing; Image or video recognition or understanding; Arrangements for image or video recognition or understanding; Extraction of image or video features; Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses an image reconstruction model training method, an image reconstruction method, a device and a medium. The method comprises the following steps: acquiring an original high-resolution image and an original low-resolution image; inputting the original low-resolution image into a first generation network for image super-resolution reconstruction to obtain a pseudo high-resolution image; inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result; inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to obtain a perception loss value; and updating model parameters of the first generation network and the first discrimination network based on the perception loss value and the first discrimination result to acquire a target generation network based on super-resolution reconstruction. The target generation network can reconstruct higher-resolution images that contain texture features and structural features of different scales and therefore have higher perceptual quality.

Description

Image reconstruction model training method, image reconstruction method, device and medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to an image reconstruction model training method, an image reconstruction method, a device, and a medium.
Background
In the technical field of image processing, image resolution is one of the main technical indicators characterizing the level of detail an image can convey. Image resolution generally refers to the spatial resolution of an image: the higher the image resolution, the finer the scene details the image can reflect, and the more informative the image. Because of the limitations of objective conditions, high-quality images are often unavailable in real application environments. Therefore, researching advanced image super-resolution reconstruction technology is critical to improving the recognition capability and recognition accuracy of images.
Image super-resolution reconstruction refers to the technique of converting a lower-resolution image into a higher-resolution image. It has important application value in surveillance equipment, video communication, satellite imagery, medical imaging and the like; for example, it has high application value in scenes such as face-based super-resolution reconstruction, visual experience optimization of video images, vehicle identification, industrial equipment fault detection, remote sensing image processing for moving-object acquisition, and video and image quality evaluation.
Current image super-resolution reconstruction methods mostly adopt convolutional neural networks (Convolutional Neural Networks, abbreviated as CNN). In the model training process, the pixel loss of the training images is used directly to optimize the model parameters of the CNN. This approach can obtain a higher peak signal-to-noise ratio (Peak Signal to Noise Ratio, abbreviated as PSNR) and structural similarity (SSIM), but because the optimization only minimizes the loss between pixels, the convolutional neural network does not consider the perceptual quality of the images: the reconstructed images are overly smooth, score poorly on natural image quality evaluation (Natural Image Quality Evaluator, abbreviated as NIQE), and do not conform to the preference of the human visual system (Human Visual System, abbreviated as HVS) at the subjective perception level.
Disclosure of Invention
The embodiments of the invention provide an image reconstruction model training method, an image reconstruction method, a device and a medium, which are used to solve the problem that images generated by current image super-resolution reconstruction have low perceptual quality.
The embodiment of the invention provides an image reconstruction model training method, which comprises the following steps:
acquiring an original high-resolution image and an original low-resolution image corresponding to the original high-resolution image;
inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and obtaining a pseudo high-resolution image corresponding to the original low-resolution image;
inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result;
inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to obtain a perception loss value;
and updating model parameters of the first generation network and the first discrimination network based on the perception loss value and the first discrimination result to acquire a target generation network based on super-resolution reconstruction.
Preferably, the inputting the original low-resolution image into a first generation network for image super-resolution reconstruction and obtaining a pseudo high-resolution image corresponding to the original low-resolution image includes:
extracting features of the original low-resolution image to obtain an original feature map;
inputting the original feature map into a multi-band block residual generation network to perform frequency-division feature extraction, and obtaining a target high-frequency feature map and at least two low-frequency feature maps; the multi-band block residual generation network comprises at least two block residual networks connected in series; the current block residual network is adopted to perform feature extraction on an input feature map, output the low-frequency feature map and the block high-frequency feature map corresponding to the current block residual network, and input the block high-frequency feature map to the next block residual network; the input feature map comprises the original feature map or the block high-frequency feature map output by the previous block residual network; the target high-frequency feature map is the block high-frequency feature map output by the last block residual network;
and carrying out image reconstruction based on the target high-frequency feature map and the at least two low-frequency feature maps, and obtaining the pseudo high-resolution image corresponding to the original low-resolution image.
Preferably, the current block residual network comprises a low-frequency feature separation sub-network and a high-frequency feature separation sub-network;
the step of performing feature extraction on the input feature map by adopting the current block residual network and outputting the low-frequency feature map and the block high-frequency feature map corresponding to the current block residual network comprises the following steps:
according to the current frequency band, adopting the low-frequency feature separation sub-network to perform feature extraction on the input feature map, obtaining low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting the low-frequency feature map;
and adopting the high-frequency feature separation sub-network to perform high-frequency feature separation on the low-frequency feature information, and outputting the block high-frequency feature map.
Preferably, the performing feature extraction on the input feature map by using the low-frequency feature separation sub-network according to the current frequency band, obtaining low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting a low-frequency feature map includes:
performing up-sampling processing on the input feature map by adopting an up-sampling unit to obtain an up-sampled feature map;
performing feature extraction on the up-sampled feature map by adopting a first convolution unit to obtain the low-frequency feature information;
and adopting a second convolution unit to perform low-frequency feature separation on the low-frequency feature information and output the low-frequency feature map.
Preferably, the adopting the high-frequency feature separation sub-network to perform high-frequency feature separation on the low-frequency feature information and output a block high-frequency feature map includes:
mapping the low-frequency feature information to a low-resolution space by adopting a third convolution unit to obtain a first feature map;
performing a subtraction operation based on the input feature map and the first feature map, and outputting high-frequency feature information;
performing feature extraction on the high-frequency feature information by adopting a fourth convolution unit to obtain a second feature map;
and performing an addition operation based on the first feature map and the second feature map, and outputting the block high-frequency feature map.
Preferably, the acquiring an original high resolution image and an original low resolution image corresponding to the original high resolution image includes:
acquiring an original training image and determining the image resolution of the original training image;
if the image resolution of the original training image is larger than a first resolution threshold, determining the original training image as an original high-resolution image;
and carrying out downsampling processing on the original high-resolution image to obtain an original low-resolution image corresponding to the original high-resolution image.
Preferably, after the acquiring of the original high-resolution image and the original low-resolution image corresponding to the original high-resolution image, the image reconstruction model training method includes:
inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and obtaining a pseudo high-resolution image corresponding to the original low-resolution image; inputting the pseudo high-resolution image into a second generation network for image low-resolution reconstruction, and obtaining a pseudo low-resolution image;
inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result; inputting the original low-resolution image and the pseudo low-resolution image into a second discrimination network for discrimination to obtain a second discrimination result;
inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to perform perception loss calculation, and obtaining a first perception loss; inputting the original low-resolution image and the pseudo low-resolution image into the perception loss network to perform perception loss calculation, and obtaining a second perception loss;
and updating model parameters of the first generation network, the second generation network, the first discrimination network and the second discrimination network based on the first perception loss, the second perception loss, the first discrimination result and the second discrimination result, to acquire a target generation network based on super-resolution reconstruction. A minimal code sketch of this alternative embodiment follows.
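As a minimal illustration of this alternative embodiment, the following PyTorch-style sketch shows how the four networks and the two perception losses fit together; `gen1`/`gen2`, `disc1`/`disc2` and the perception loss network `phi` are assumed to be defined elsewhere, and all names are illustrative rather than the patent's implementation.

```python
import torch.nn.functional as F

def dual_forward(gen1, gen2, disc1, disc2, phi, lr_img, hr_img):
    fake_hr = gen1(lr_img)                       # image super-resolution reconstruction
    fake_lr = gen2(fake_hr)                      # image low-resolution reconstruction
    d1 = disc1(fake_hr)                          # first discrimination result
    d2 = disc2(fake_lr)                          # second discrimination result
    p1 = F.mse_loss(phi(fake_hr), phi(hr_img))   # first perception loss
    p2 = F.mse_loss(phi(fake_lr), phi(lr_img))   # second perception loss
    # Model parameters of all four networks are then updated based on
    # p1, p2, d1 and d2 (optimizer steps omitted for brevity).
    return p1, p2, d1, d2
```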
The embodiment of the invention also provides an image reconstruction method, which comprises the following steps:
acquiring an image to be processed, and determining the image resolution of the image to be processed;
if the image resolution of the image to be processed is smaller than a second resolution threshold, determining the image to be processed as an image to be reconstructed;
and carrying out image super-resolution reconstruction on the image to be reconstructed by using the target generation network acquired by the image reconstruction model training method, and acquiring a target reconstruction image, as sketched below.
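A minimal sketch of this flow, assuming a trained target generation network and an illustrative value for the second resolution threshold:

```python
import torch

SECOND_RESOLUTION_THRESHOLD = 256  # pixels on the shorter side (assumed value)

def reconstruct(target_gen, img):
    """img: a (1, C, H, W) tensor; target_gen: the trained target generation
    network obtained by the image reconstruction model training method."""
    if min(img.shape[-2:]) < SECOND_RESOLUTION_THRESHOLD:  # image to be reconstructed
        with torch.no_grad():
            return target_gen(img)                         # target reconstruction image
    return img  # resolution already meets the threshold; no reconstruction needed
```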
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the image reconstruction model training method when executing the computer program or realizes the image reconstruction method when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and is characterized in that the computer program realizes the image reconstruction model training method when being executed by a processor, or realizes the image reconstruction method when being executed by the processor.
The image reconstruction model training method, the image reconstruction method, the device and the medium adopt a generative adversarial network as the basic network for model training, and perform image super-resolution reconstruction on an original low-resolution image by using a first generation network to acquire a pseudo high-resolution image. A perception loss network is used to calculate the perception loss value of the original high-resolution image and the pseudo high-resolution image; the model parameters of the first generation network are updated based on the perception loss value, and the model parameters of the first discrimination network are updated through a back propagation algorithm to complete the training process of the generative adversarial network model, after which the first generation network with updated model parameters is determined to be the target generation network. The target generation network keeps the texture features and structural features of the pseudo high-resolution image consistent with those of the original high-resolution image at different scales, so that when the trained target generation network subsequently performs image super-resolution reconstruction on lower-resolution images, it can generate higher-resolution images containing texture features and structural features of different scales; the generated images therefore have higher perceptual quality and better conform to human visual system preference at the subjective perception level.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for training an image reconstruction model in an embodiment of the invention;
FIG. 2 is another flow chart of a method for training an image reconstruction model in accordance with an embodiment of the present invention;
FIG. 3 is another flow chart of a method of training an image reconstruction model in an embodiment of the present invention;
FIG. 4 is another flow chart of a method of training an image reconstruction model in accordance with an embodiment of the present invention;
FIG. 5 is another flow chart of a method of training an image reconstruction model in an embodiment of the present invention;
FIG. 6 is another flow chart of a method of training an image reconstruction model in an embodiment of the present invention;
FIG. 7 is another flow chart of a method of training an image reconstruction model in accordance with an embodiment of the present invention;
FIG. 8 is another flow chart of an image reconstruction method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The image reconstruction model training method provided by the embodiment of the invention can be applied to computer equipment and is used for training a network model capable of realizing image super-resolution reconstruction, so that the target generation network capable of realizing image super-resolution reconstruction, which is obtained by training by using the image reconstruction model training method, can be used for carrying out super-resolution reconstruction on an image with lower resolution to obtain an image with higher resolution.
In one embodiment, as shown in fig. 1, an image reconstruction model training method is provided, and the image reconstruction model training method is applied to a computer device for illustration, and the image reconstruction model training method includes the following steps:
S101: an original high-resolution image and an original low-resolution image corresponding to the original high-resolution image are acquired.
The original high-resolution image is a higher-resolution image used for model training, and the original low-resolution image is a lower-resolution image used for model training. It will be appreciated that the original low-resolution image is the image that requires image super-resolution reconstruction; correspondingly, the original high-resolution image is the image used to verify the image obtained by performing image super-resolution reconstruction on the original low-resolution image. In this example, the original low-resolution image corresponding to the original high-resolution image means that the two images have the same image content, so that it is feasible to use the original high-resolution image to verify the image obtained after image super-resolution reconstruction of the original low-resolution image, which guarantees that the target generation network trained by the image reconstruction model training method can reconstruct super-resolution images.
S102: and inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and obtaining a pseudo high-resolution image corresponding to the original low-resolution image.
The first generation network is the Generator in a generative adversarial network (Generative Adversarial Nets, GAN); a generator is a network that captures the latent distribution of real data samples in order to generate new pseudo data samples. In this example, the first generation network is the generation network on which subsequent model training is performed, specifically a generation network constructed using super-resolution technology that can reconstruct a lower-resolution image into a higher-resolution image. The pseudo high-resolution image is the image obtained after super-resolution reconstruction from the original low-resolution image.
In this example, an original low-resolution image is input into a first generation network constructed based on a super-resolution technology, and image super-resolution reconstruction processing is performed on the original low-resolution image by using the first generation network, so as to obtain a pseudo high-resolution image with higher resolution. It can be appreciated that, since the original low resolution image corresponds to the original high resolution image, and the pseudo high resolution image is an image obtained after super resolution reconstruction from the original low resolution image, the pseudo high resolution image also corresponds to the image content of the original high resolution image, ensuring the feasibility of subsequent discrimination processing and perception loss calculation.
S103: and inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination, and obtaining a first discrimination result.
The first discrimination network is the Discriminator in the generative adversarial network (Generative Adversarial Nets, GAN), i.e., the network that discriminates real data samples from generated pseudo data samples; the first discrimination network is the discrimination network on which model training is to be performed. The first discrimination result is the output of the first discrimination network after it discriminates between the original high-resolution image and the pseudo high-resolution image: the probability that the pseudo high-resolution image is an original high-resolution image, which can be understood as the similarity between the original high-resolution image and the pseudo high-resolution image. In this example, the first discrimination network may be the discrimination network of an existing SRGAN, which is not described in detail here. As an example, the first discrimination result output by the first discrimination network is a result used to discriminate whether the input image is the original high-resolution image or the pseudo high-resolution image.
In the process of training the reconstruction model, the first generation network takes the original low-resolution image as input and the pseudo high-resolution image as output; the first discrimination network takes the original high-resolution image and the pseudo high-resolution image as inputs and the first discrimination result as output; the first generation network and the first discrimination network oppose each other, and the model parameters are updated through a back propagation algorithm.
S104: and inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to obtain a perception loss value.
The perception loss network is a network for calculating the perceptual loss (Perceptual Losses) between two input images. In general, a perception loss network comprises a plurality of branch networks, each of which employs convolution kernels of a different size to extract feature information of a different scale from an image, and the gap between the two input images is measured by defining a perceptual loss function. The perception loss value is the output value of the perception loss network; specifically, in this example, it is the loss value determined after the perception loss network performs the perceptual loss calculation on the original high-resolution image and the pseudo high-resolution image.
In this example, the original high-resolution image and the pseudo high-resolution image are input into the perception loss network together, and feature extraction is performed on them using the convolution kernels in different layers of the perception loss network to extract the scale feature information corresponding to each convolution kernel, obtaining a feature map corresponding to the original high-resolution image and a feature map corresponding to the pseudo high-resolution image. The perception loss value for the two images is then calculated based on these feature maps; because the perception loss value contains information of multiple scales from the two images, it can effectively guide the generation network to generate more detailed texture information.
In a specific embodiment, the original high-resolution image and the pseudo high-resolution image are respectively input into the perception loss network, where the perception loss network is provided with at least one basic module, each basic module comprises a plurality of branches, and each branch adopts convolution kernels of a different size. For example, an Inception module is used as the basic module, where the Inception module comprises four branches, namely a first branch, a second branch, a third branch and a fourth branch from left to right. The first branch comprises a Conv-1 convolution layer employing a 1×1×64 convolution kernel. The second branch comprises a Conv-3 convolution layer employing a 3×3×128 convolution kernel and a Conv-1 convolution layer employing a 1×1×96 convolution kernel. The third branch comprises a Conv-5 convolution layer employing a 5×5×32 convolution kernel and a Conv-1 convolution layer employing a 1×1×16 convolution kernel. The fourth branch comprises a Conv-1 convolution layer and an MP-3 pooling layer, where the Conv-1 convolution layer adopts a 1×1×31 convolution kernel and the MP-3 pooling layer is a 3×3 max-pooling layer. Since large convolution kernels can extract the global structural information of an image while small convolution kernels are better at capturing its texture detail information, the perception loss network can extract texture features and structural features of different scales by using convolution kernels of different sizes, and the gap between the pseudo high-resolution image and the original high-resolution image is measured by defining a perceptual loss function. In one example, the perceptual loss function is:
$$ l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y} \right)^2 $$

where $W_{i,j}$ and $H_{i,j}$ are the width and height of the feature map output by each convolution kernel in the perception loss network; $\phi_{i,j}(I^{HR})_{x,y}$ denotes the feature map of the j-th convolution before the i-th max-pooling layer for the original high-resolution image; $\phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y}$ denotes the corresponding feature map for the pseudo high-resolution image generated from the original low-resolution image via the first generation network; LR is the original low-resolution image, HR is the original high-resolution image, SR is the pseudo high-resolution image; and x and y are the x and y coordinates of a pixel in the feature map.
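To make this concrete, the following is a minimal PyTorch sketch of one such multi-branch basic module and the feature-map loss above. All class and function names are illustrative assumptions, and the placement of the 1×1 reduction convolutions before the 3×3 and 5×5 kernels follows common Inception practice rather than anything stated in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionBlock(nn.Module):
    """One basic module of the perception loss network: four parallel branches
    with convolution kernels of different sizes, concatenated channel-wise."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)    # Conv-1, 1x1x64
        self.branch2 = nn.Sequential(                         # Conv-1 + Conv-3
            nn.Conv2d(in_ch, 96, kernel_size=1),
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.branch3 = nn.Sequential(                         # Conv-1 + Conv-5
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.branch4 = nn.Sequential(                         # MP-3 + Conv-1
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 31, kernel_size=1))

    def forward(self, x):
        # Large kernels capture global structure, small kernels capture texture;
        # the multi-scale feature maps are joined along the channel axis.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

def perception_loss(phi, fake_hr, real_hr):
    """Mean squared error between the feature maps of the pseudo and original
    high-resolution images, matching the formula above up to normalization."""
    return F.mse_loss(phi(fake_hr), phi(real_hr))
```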
S105: and updating model parameters of the first generation network and the first discrimination network based on the perception loss value and the first discrimination result to acquire a target generation network based on super-resolution reconstruction.
The target generation network is the generation network formed after the model parameters of the first generation network, constructed based on super-resolution technology, have been updated through model training. The target discrimination network is the discrimination network formed after the model parameters of the first discrimination network have been updated through model training.
In this example, the model parameters of the first generation network are optimized using the perception loss value output by the perception loss network, and the model parameters of the first discrimination network are updated using a back propagation algorithm. These steps are repeated until the first discrimination result output by the first discrimination network shows that it can no longer tell whether the input image is the original high-resolution image or the pseudo high-resolution image, at which point the model training of the generative adversarial network is determined to be complete. The generation network whose model parameter update has been completed in the trained generative adversarial network model is determined to be the target generation network, and the discrimination network whose model parameter update has been completed is determined to be the target discrimination network.
Compared with updating model parameters using the pixel loss employed in CNN training, adversarial training forces the generation network to make the generated pseudo high-resolution image follow the learned distribution so that it is indistinguishable from a real high-resolution image; the discrimination network, however, ignores the relationships between samples, so the recovered images may be excessively similar to one another or lose important visual features. Updating the model parameters with the perception loss value, i.e., continuously adjusting and optimizing the model parameters of the first generation network through the perception loss value, keeps the texture features and structural features of the pseudo high-resolution image consistent with those of the original high-resolution image at different scales. Accordingly, when the trained target generation network subsequently performs image super-resolution reconstruction on a lower-resolution image, it can generate a higher-resolution image containing texture features and structural features of different scales, so that the generated image has higher perceptual quality and better conforms to human visual system preference at the subjective perception level.
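Putting steps S102 to S105 together, the following is a minimal PyTorch-style sketch of one training iteration. It is an illustration under stated assumptions, not the patent's implementation: `gen`, `disc` and `phi` stand for the first generation network, the first discrimination network (assumed to end in a sigmoid) and the perception loss network, and the binary cross-entropy adversarial loss and the 1e-3 adversarial weight are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, phi, opt_g, opt_d, lr_img, hr_img):
    # S102: image super-resolution reconstruction of the original low-resolution image.
    fake_hr = gen(lr_img)

    # S103: discriminate the original and pseudo high-resolution images.
    d_real = disc(hr_img)
    d_fake = disc(fake_hr.detach())
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()          # back propagation for the discrimination network
    opt_d.step()

    # S104: perception loss between the original and pseudo high-resolution images.
    p_loss = F.mse_loss(phi(fake_hr), phi(hr_img))

    # S105: update the generation network from the perception loss value and the
    # first discrimination result (adversarial term; the 1e-3 weight is assumed).
    d_fake = disc(fake_hr)
    g_loss = p_loss + 1e-3 * F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return g_loss.item(), d_loss.item()
```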
In the image reconstruction model training method provided by this embodiment, a generative adversarial network is adopted as the basic network for model training, and the first generation network is used to perform super-resolution reconstruction on the original low-resolution image to obtain a pseudo high-resolution image. The perception loss network is used to calculate the perception loss value of the original high-resolution image and the pseudo high-resolution image; the model parameters of the first generation network are updated based on the perception loss value, and the model parameters of the first discrimination network are updated through a back propagation algorithm to complete the training process of the generative adversarial network model, after which the first generation network with updated model parameters is determined to be the target generation network. The target generation network keeps the texture features and structural features of the pseudo high-resolution image consistent with those of the original high-resolution image at different scales, so that when the trained target generation network subsequently performs image super-resolution reconstruction on lower-resolution images, it can generate higher-resolution images containing texture features and structural features of different scales; the generated images therefore have higher perceptual quality and better conform to human visual system preference at the subjective perception level.
In one embodiment, as shown in fig. 2, step S101, namely, acquiring an original high resolution image and an original low resolution image corresponding to the original high resolution image, specifically includes the following steps:
S201: acquiring an original training image, and determining the image resolution of the original training image.
Wherein the raw training image is an unprocessed image acquired by the computer device.
As an example, the computer device may obtain the original training image from an image database and then identify the original training image using a resolution identification technique to determine an image resolution of the original training image. The resolution identification technology is the prior art, and is not described in detail herein for the sake of avoiding redundancy.
S202: and if the image resolution of the original training image is greater than the first resolution threshold, determining the original training image as an original high-resolution image.
The first resolution threshold is a threshold set in advance for evaluating whether the image resolution reaches the standard for being recognized as an original high-resolution image. As an example, the computer device compares the image resolution of each original training image with the preset first resolution threshold. If the image resolution of the original training image is greater than the first resolution threshold, the original training image has reached the standard for being recognized as higher resolution, and it is determined to be an original high-resolution image; this ensures the accuracy of the processing results when the original high-resolution image is input into the first discrimination network and the perception loss network, so that the trained target generation network reconstructs lower-resolution images into higher-resolution images of guaranteed quality. Correspondingly, if the image resolution of the original training image is not greater than the first resolution threshold, the original training image has not reached that standard; if it were nevertheless determined to be an original high-resolution image, the resolution of images reconstructed by the trained target generation network would be low and could not meet the requirements of specific scenes.
S203: and performing downsampling processing on the original high-resolution image to obtain an original low-resolution image corresponding to the original high-resolution image.
Downsampling (subsampling), also called down-sampling, is a process for reducing an image, for example so that it fits the size of a display area, or for generating a low-resolution counterpart of a high-resolution image. As an example, if an original high-resolution image I has a resolution of M×N, s-times downsampling yields an original low-resolution image with a resolution of (M/s)×(N/s), where s is a common divisor of M and N. In this embodiment, methods such as nearest-neighbor interpolation, bilinear interpolation, mean interpolation and median interpolation may be used in the downsampling process.
In this example, after the original training image whose image resolution is greater than the first resolution threshold is determined to be the original high-resolution image, the original high-resolution image is downsampled so that the obtained original low-resolution image has the same image content as the original high-resolution image, making it feasible to use the original high-resolution image to verify the image obtained after performing image super-resolution reconstruction on the original low-resolution image.
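A minimal sketch of the s-times downsampling in S203, assuming bilinear interpolation (the other interpolation methods named above work analogously) and a PyTorch image tensor; the function name is illustrative:

```python
import torch.nn.functional as F

def downsample(hr_img, s):
    """s-times downsampling of a (batch, channels, M, N) image tensor, where s
    is a common divisor of M and N, giving an (M/s) x (N/s) result."""
    return F.interpolate(hr_img, scale_factor=1.0 / s, mode='bilinear',
                         align_corners=False)
```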
In one embodiment, step S203, namely, performing downsampling processing on an original high-resolution image to obtain an original low-resolution image corresponding to the original high-resolution image, specifically includes the following steps: and performing image low-resolution reconstruction on the original high-resolution image by adopting a second generation network, and acquiring the original low-resolution image corresponding to the original high-resolution image. Wherein the second generation network is a generation network for converting the higher resolution image into a lower resolution image.
In one embodiment, as shown in fig. 3, step S102, namely inputting the original low-resolution image into a first generation network for image super-resolution reconstruction and obtaining a pseudo high-resolution image corresponding to the original low-resolution image, specifically includes the following steps:
S301: extracting features of the original low-resolution image to obtain an original feature map.
The original feature map is the feature map obtained by performing feature extraction on the original low-resolution image. In this example, features are extracted from the original low-resolution image using multiple convolution layers; for example, three convolution layers may be used, each employing a 3×3 convolution kernel with a step size of 1 and a padding of 1, with 128, 64 and 64 convolution kernels respectively, and the feature map output by the last convolution layer is determined to be the original feature map, which contains structure information and texture information of a plurality of different frequency bands.
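A minimal PyTorch sketch of this shallow feature-extraction stage; the 3-channel input and the variable name are assumptions:

```python
import torch.nn as nn

# Three 3x3 convolution layers (stride 1, padding 1) with 128, 64 and 64
# kernels; the 3-channel input assumes an RGB image.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),  # -> original feature map
)
```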
S302: inputting the original feature map into a multi-band block residual generation network to perform frequency-division feature extraction, and obtaining a target high-frequency feature map and at least two low-frequency feature maps. The multi-band block residual generation network comprises at least two block residual networks connected in series; the current block residual network is adopted to perform feature extraction on the input feature map, output the low-frequency feature map and the block high-frequency feature map corresponding to the current block residual network, and input the block high-frequency feature map to the next block residual network. The input feature map comprises the original feature map or the block high-frequency feature map output by the previous block residual network. The target high-frequency feature map is the block high-frequency feature map output by the last block residual network.
The multi-band block residual generation network is a network for realizing image super-resolution reconstruction, formed from at least two block residual networks connected in series. The block residual network is the basic module for realizing image super-resolution reconstruction; each block residual network extracts the image features of one frequency band, namely the structural features and texture features of that band. Correspondingly, the multi-band block residual generation network performs frequency-division feature extraction: at least two block residual networks perform feature extraction on the original feature map according to different frequency bands, so as to extract the structural features and texture features corresponding to each band. The low-frequency feature map is the feature map output by the current block residual network for the current frequency band and comprises the structural features and texture features corresponding to the current frequency band, where the current frequency band is the frequency band of the image features to be acquired by the current block residual network. The block high-frequency feature map is the higher-frequency feature map corresponding to the current frequency band output by the current block residual network, and comprises the structural features and texture features that the current block residual network outputs to the next block residual network for further feature extraction.
For convenience of description, the block residual network currently performing feature extraction is defined in this example as the current block residual network; the block residual network connected before it is defined as the previous block residual network; and the block residual network connected after it is defined as the next block residual network. In this example, the 1st block residual network performs feature extraction on the original feature map, obtains the low-frequency feature map and the block high-frequency feature map corresponding to the frequency band it is to extract, outputs the low-frequency feature map to the image reconstruction network, and outputs the block high-frequency feature map to the next block residual network (i.e., the 2nd block residual network). The nth (n ≥ 2) block residual network performs feature extraction on the block high-frequency feature map input by its previous block residual network (i.e., the (n-1)th block residual network), obtains the low-frequency feature map and the block high-frequency feature map corresponding to the frequency band it is to extract, outputs the low-frequency feature map to the image reconstruction network, and outputs the block high-frequency feature map to the next block residual network (i.e., the (n+1)th block residual network), and so on. The block high-frequency feature map output by the last block residual network is defined as the target high-frequency feature map and is output to the image reconstruction network together with the low-frequency feature maps. It will be appreciated that in each block residual network the frequency of the output low-frequency feature map is lower than that of the block high-frequency feature map. Because the input feature map of each nth (n ≥ 2) block residual network is the block high-frequency feature map output by the previous block residual network, the frequency of the low-frequency feature map output by the nth block residual network is higher than that output by the (n-1)th block residual network; the frequencies of the at least two low-frequency feature maps therefore increase in sequence, and low-frequency feature maps at different scales are extracted by the different block residual networks, making the feature separation finer.
Because a deep neural network extracts image features through multi-layer convolution kernel operations during learning, the shallow layers (i.e., shallow convolution kernels, such as the first and second convolution layers in a deep neural network) learn mostly simple low-frequency feature information such as edges, corner points, textures, geometric shapes and surfaces, and learn less of the complex, abstract high-frequency feature information; the deep layers (i.e., deep convolution kernels, such as the Kth convolution layer in a deep neural network, K ≥ 3) learn the more complex and abstract high-frequency feature information. From the frequency perspective, the low-frequency feature information can easily be fitted completely by the shallow convolution kernels, and feature conversion is then carried out on the basis of the low-frequency feature information to obtain a low-frequency feature map, so outputting a low-frequency feature map from each block residual network is feasible; the block high-frequency feature map can be understood as the feature map formed after separating the low-frequency features from the input feature map. In this example, each block residual network outputs a low-frequency feature map and a block high-frequency feature map; only the block high-frequency feature map is transmitted to the next block residual network, all the block residual networks are connected in cascade, and all the low-frequency feature maps and the target high-frequency feature map are joined through a concat operation as the output of the multi-band block residual generation network. This local residual learning mode effectively improves the flow of information and gradients through the whole network, makes the frequency separation of the image finer, and facilitates the reconstruction of a pseudo high-resolution image containing more detailed texture features and structural features.
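A minimal PyTorch sketch of this cascade, assuming two block residual networks and a `BlockResidualNet` module as sketched after the sub-network descriptions below; the final resampling of the target high-frequency feature map before the concat is an assumption made to give all maps a common size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBandGenerator(nn.Module):
    """Block residual networks in series: each emits a low-frequency feature
    map and passes its block high-frequency feature map to the next block."""
    def __init__(self, num_blocks=2, channels=64):
        super().__init__()
        self.blocks = nn.ModuleList(
            BlockResidualNet(channels) for _ in range(num_blocks))

    def forward(self, original_feature_map):
        x, low_maps = original_feature_map, []
        for block in self.blocks:
            low_map, x = block(x)   # x becomes the block high-frequency map
            low_maps.append(low_map)
        # x is now the target high-frequency feature map; it stays in the
        # low-resolution space, so it is resampled to the low-frequency maps'
        # size before the concat (this resampling is an assumption).
        x = F.interpolate(x, size=low_maps[0].shape[-2:], mode='nearest')
        return torch.cat(low_maps + [x], dim=1)
```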
S303: performing image reconstruction based on the target high-frequency feature map and the at least two low-frequency feature maps to obtain a pseudo high-resolution image corresponding to the original low-resolution image.
In this example, an image reconstruction network may be used to perform image reconstruction on the target high-frequency feature map and the at least two low-frequency feature maps to obtain the pseudo high-resolution image corresponding to the original low-resolution image. The image reconstruction network is a network for implementing the image reconstruction process: it performs image reconstruction on the target high-frequency feature map and the at least two low-frequency feature maps output by the multi-band block residual generation network, and outputs the pseudo high-resolution image corresponding to the original low-resolution image. As an example, the image reconstruction network may combine the at least two low-frequency feature maps and the target high-frequency feature map to construct the pseudo high-resolution image; specifically, a single layer of 3×3 convolution kernels (step size 1, padding 1, 64 convolution kernels) performs feature transformation on the target high-frequency feature map and the at least two low-frequency feature maps to output the pseudo high-resolution image.
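A minimal sketch of this reconstruction step, assuming two low-frequency feature maps plus the target high-frequency feature map as input; the final projection to a 3-channel image is an assumption, since the text specifies only the 3×3 fusion convolution:

```python
import torch.nn as nn

num_maps = 3  # two low-frequency maps + the target high-frequency map (assumed)
reconstruction_head = nn.Sequential(
    # Fusion: one layer of 3x3 convolution kernels, stride 1, padding 1, 64 kernels.
    nn.Conv2d(64 * num_maps, 64, kernel_size=3, stride=1, padding=1),
    # Projection to a 3-channel image (an assumption; the text only specifies
    # the fusion convolution above).
    nn.Conv2d(64, 3, kernel_size=3, stride=1, padding=1),
)
```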
In the image reconstruction model training method provided by this embodiment, the multi-band block residual generation network is adopted to perform frequency-division feature extraction on the original feature map, and local residual learning is used so that the different block residual networks output the low-frequency feature maps and the target high-frequency feature map corresponding to their respective frequency bands; the image reconstruction network then performs image reconstruction on the at least two low-frequency feature maps and the target high-frequency feature map, so that the generated pseudo high-resolution image contains feature information of different scales, namely structural features and texture features corresponding to different scales, giving the generated pseudo high-resolution image higher perceptual quality.
In an embodiment, the current block residual network comprises a low-frequency feature separation sub-network and a high-frequency feature separation sub-network. Correspondingly, as shown in fig. 4, performing feature extraction on the input feature map by adopting the current block residual network and outputting the low-frequency feature map and the block high-frequency feature map corresponding to the current block residual network specifically includes the following steps:
S401: according to the current frequency band, performing feature extraction on the input feature map by adopting the low-frequency feature separation sub-network, obtaining low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting the low-frequency feature map.
The low-frequency feature separation sub-network is a network for performing low-frequency feature separation on the input feature map. The current frequency band is the frequency range over which the current block residual network extracts image features.
In this example, according to the current frequency band corresponding to the current block residual network, the low-frequency feature separation sub-network is adopted to perform feature separation on the input feature map and extract low-frequency feature information such as edges, corner points, textures, geometric shapes and surfaces from it; convolution kernels are then used to perform low-frequency feature separation on the low-frequency feature information to obtain the low-frequency feature map.
Because a deep neural network extracts image features through multi-layer convolution kernel operations during learning, the shallow layers (i.e., shallow convolution kernels, such as the first and second convolution layers in a deep neural network) learn mostly simple low-frequency feature information such as edges, corner points, textures, geometric shapes and surfaces and only a little of the complex, abstract high-frequency feature information, while the deep layers (i.e., deep convolution kernels, such as the Kth convolution layer, K ≥ 3) learn the more complex and abstract high-frequency feature information. Therefore, the low-frequency feature information can be understood as the feature information that can be directly fitted by the low-frequency feature separation sub-network corresponding to the current frequency band, and the low-frequency feature map can be understood as the feature map determined after low-frequency feature separation based on that low-frequency feature information.
S402: adopting the high-frequency feature separation sub-network to perform high-frequency feature separation on the low-frequency feature information, and outputting the block high-frequency feature map.
The high-frequency feature separation sub-network is a network for back-projecting the low-frequency feature information. In this example, the high-frequency feature separation sub-network performs high-frequency feature separation on the low-frequency feature information fitted and separated by the low-frequency feature separation sub-network, filtering the low-frequency feature information out of the input feature map and thereby forming a block high-frequency feature map with the low-frequency feature information removed.
In the image reconstruction model training method provided by this embodiment, the low-frequency feature separation sub-network separates the low-frequency feature map corresponding to the low-frequency feature information from the input feature map, so that the low-frequency feature map effectively reflects information such as the structural features and texture features corresponding to the current frequency band. The high-frequency feature separation sub-network then performs high-frequency feature separation on the low-frequency feature information to obtain a block high-frequency feature map with the low-frequency feature information filtered out, which serves as the input feature map of the next block residual network. Detail texture features and structural features corresponding to different frequency bands are thus extracted in a local residual manner, so that the subsequently reconstructed pseudo high-resolution image can reflect the texture features and structural features of different frequency bands and has better perceptual quality.
In an embodiment, as shown in fig. 5, step S401, namely performing feature extraction on the input feature map by using the low-frequency feature separation sub-network according to the current frequency band, obtaining low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting the low-frequency feature map, includes the following steps:
s501: and carrying out up-sampling processing on the input feature map by adopting an up-sampling unit to obtain an up-sampling feature map.
The up-sampling unit is a unit for up-sampling an image to enlarge the image. As an example, the input feature map may be upsampled using an upsampling sub-network, which is specifically a 1-layer deconvolution kernel layer, specifically 64 6×6 deconvolution kernels, a step size of 2, and a padding (padding) of 2, with which the input feature map of the low resolution space may be mapped to the high resolution space to obtain the upsampled feature map. As another example, the up-sampling of the input feature map to obtain an up-sampled feature map may be implemented by sub-pixel convolution or using interpolation to scale the input feature map to a target size, thereby eliminating the need to provide an up-sampling sub-network in the low frequency feature separation sub-network.
S502: extract features from the up-sampled feature map using a first convolution unit to obtain low-frequency feature information.
The first convolution unit is a processing unit that extracts features from the up-sampled feature map to fit the low-frequency feature information. The low-frequency feature information is the main shallow feature information extracted from the up-sampled feature map, including edges, corner points, textures, geometric shapes, and surfaces. It should be understood that the low-frequency feature information fitted by the first convolution unit corresponds to the current frequency band of the current block residual network; that is, only the shallow features (edges, corner points, textures, geometric shapes, surfaces, and the like) of the current frequency band are fitted, and feature information outside the current frequency band cannot be fitted.
In this example, the first convolution unit is shallow: it may use 2 convolution layers, each with 3×3 convolution kernels, a stride of 1, a padding of 1, and 64 kernels. The first convolution unit thus extracts mostly low-frequency feature information and little high-frequency feature information from the up-sampled feature map, and feeds the resulting low-frequency feature information to both the second convolution unit and the high-frequency feature separation sub-network.
S503: perform low-frequency feature separation on the low-frequency feature information using a second convolution unit, and output a low-frequency feature map.
The second convolution unit is a processing unit that performs low-frequency feature separation on the low-frequency feature information to extract the lower-frequency structural and texture features useful for recovering the high-resolution image, forming a low-frequency feature map. That is, the low-frequency feature map is formed from the lower-frequency structural and texture features in the low-frequency feature information. For example, the second convolution unit may use a single convolution layer with 3×3 kernels, a stride of 1, a padding of 1, and 64 kernels.
In the image reconstruction model training method provided by this embodiment, the up-sampling unit first maps the input feature map from the low-resolution space to an up-sampled feature map in the high-resolution space, so that the first convolution unit can effectively fit the low-frequency feature information of the current frequency band from the up-sampled feature map; the second convolution unit then performs low-frequency feature separation on that information to obtain a low-frequency feature map that effectively reflects the lower-frequency structural and texture features of the input feature map.
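Combining steps S501 to S503, a minimal sketch of the low-frequency feature separation sub-network under the layer configurations given above; the class name, the ReLU activations, and the 64-channel width are assumptions made for illustration:

```python
import torch.nn as nn

class LowFreqSeparation(nn.Module):
    """Sketch of the low-frequency feature separation sub-network (S501-S503)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # S501: up-sampling unit, maps the input feature map to the high-resolution space.
        self.upsample = nn.ConvTranspose2d(channels, channels, 6, stride=2, padding=2)
        # S502: first convolution unit, 2 shallow 3x3 layers fitting low-frequency information.
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # S503: second convolution unit, 1 layer separating the low-frequency feature map.
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        up = self.upsample(x)           # up-sampled feature map
        low_info = self.conv1(up)       # low-frequency feature information
        low_map = self.conv2(low_info)  # low-frequency feature map
        return low_map, low_info        # low_info also feeds the high-frequency sub-network
```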
In an embodiment, as shown in fig. 6, step S402, that is, performing high-frequency feature separation on the low-frequency feature information using the high-frequency feature separation sub-network and outputting a block high-frequency feature map, specifically includes the following steps:
S601: map the low-frequency feature information to the low-resolution space using a third convolution unit to obtain a first feature map.
The third convolution unit is a processing unit that maps the low-frequency feature information to the low-resolution space, and the first feature map is its output. For example, the third convolution unit may use a single convolution layer with 3×3 kernels, a stride of 1, a padding of 1, and 64 kernels. In this example, the third convolution unit takes the low-frequency feature information produced by the first convolution unit and maps it to the low-resolution space to obtain the first feature map. In other words, the up-sampling unit maps the input feature map from the low-resolution space to the high-resolution space, the first convolution unit fits the up-sampled feature map to obtain the low-frequency feature information, and the third convolution unit maps that information back to the low-resolution space, realizing a back-projection. This back-and-forth mapping effectively separates the structural and texture features of the current frequency band from the input feature information and helps guarantee the perceptual quality of the image produced by super-resolution reconstruction.
S602: subtract the first feature map from the input feature map, and output high-frequency feature information.
In this example, the input feature map is the feature map fed into the current block residual network, which may be the original feature map or the block high-frequency feature map output by the previous block residual network. For the current block residual network, the input feature map is the unprocessed feature map, while the first feature map is the map formed by fitting low-frequency feature information to it. Subtracting the first feature map from the input feature map therefore removes the low-frequency feature information and yields the high-frequency feature information, that is, the feature information that the shallow network cannot fit directly.
S603: extract features from the high-frequency feature information using a fourth convolution unit to obtain a second feature map.
The fourth convolution unit is a processing unit that extracts features from the high-frequency feature information, and the second feature map is its output. For example, the fourth convolution unit may use 3 or more convolution layers, each with 3×3 kernels, a stride of 1, a padding of 1, and 64 kernels; extracting features from the high-frequency feature information with multiple convolution layers ensures that the resulting second feature map reflects more structural and texture features.
S604: add the first feature map and the second feature map, and output the block high-frequency feature map.
In this example, adding the first feature map, formed from the low-frequency feature information, to the second feature map lets the output block high-frequency feature map reflect more detailed structural and texture features. The block high-frequency feature map serves as the input feature map of the next block residual network, so that the different block residual networks are cascaded through these block high-frequency feature maps; this separates the image frequencies more finely and helps recover a pseudo high-resolution image containing more detail information.
In the image reconstruction model training method provided by this embodiment, the third convolution unit maps the low-frequency feature information fitted from the up-sampled feature map in the high-resolution space back to the low-resolution space, so that the first feature map effectively separates the structural and texture features of the current frequency band from the input feature information. Subtracting the first feature map from the input feature map then extracts the high-frequency feature information that the low-frequency feature separation sub-network cannot fit. Finally, features are extracted from the high-frequency feature information, and the resulting second feature map is added to the first feature map, so that the output block high-frequency feature map reflects finer structural and texture features; this helps guarantee the image quality of the reconstructed pseudo high-resolution image, making it closer to the original high-resolution image.
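A corresponding sketch of the high-frequency feature separation sub-network of steps S601 to S604. One assumption is worth flagging: for the subtraction in S602 the first feature map must match the input feature map's low-resolution size, so the third convolution unit below uses a stride of 2 to undo the 2× up-sampling, although the text above specifies a stride of 1; the names and ReLU activations are likewise illustrative:

```python
import torch.nn as nn

class HighFreqSeparation(nn.Module):
    """Sketch of the high-frequency feature separation sub-network (S601-S604)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # S601: third convolution unit, maps the low-frequency information back to the
        # low-resolution space (stride 2 assumed so shapes match the input feature map).
        self.conv3 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        # S603: fourth convolution unit, 3 conv layers extracting high-frequency features.
        self.conv4 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, input_map, low_info):
        first_map = self.conv3(low_info)    # S601: back-projected first feature map
        high_info = input_map - first_map   # S602: filter out low-frequency information
        second_map = self.conv4(high_info)  # S603: second feature map
        return first_map + second_map       # S604: block high-frequency feature map
```

Chaining the two sub-networks gives one block residual network; cascading several of them, each feeding its block high-frequency feature map to the next as the input feature map, realizes the multi-band block residual generation network.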
In an embodiment, the image reconstruction model training method may further use a cyclic adversarial generation network as the base network of the image reconstruction model. The cyclic adversarial generation network is essentially two mirror-symmetric generative adversarial networks arranged in a ring structure, composed of two generation networks and two discrimination networks. The two generation networks are the first generation network G and the second generation network F, and the two discrimination networks are the first discrimination network D_Y and the second discrimination network D_X, where the first generation network G reconstructs low-resolution images into high-resolution images, the second generation network F reconstructs high-resolution images into low-resolution images, the first discrimination network D_Y discriminates high-resolution images, and the second discrimination network D_X discriminates low-resolution images. In this embodiment, the image reconstruction model training method includes the following steps:
S701: acquire an original high-resolution image and an original low-resolution image corresponding to the original high-resolution image.
Step S701 is implemented in the same way as step S101; to avoid repetition, it is not described in detail here.
S702: input the original low-resolution image into the first generation network for image super-resolution reconstruction to obtain a pseudo high-resolution image corresponding to the original low-resolution image; and input the pseudo high-resolution image into the second generation network for image low-resolution reconstruction to obtain a pseudo low-resolution image.
S703: input the original high-resolution image and the pseudo high-resolution image into the first discrimination network for discrimination to obtain a first discrimination result; and input the original low-resolution image and the pseudo low-resolution image into the second discrimination network for discrimination to obtain a second discrimination result.
S704: input the original high-resolution image and the pseudo high-resolution image into the perceptual loss network for perceptual loss calculation to obtain a first perceptual loss; and input the original low-resolution image and the pseudo low-resolution image into the perceptual loss network for perceptual loss calculation to obtain a second perceptual loss.
S705: update the model parameters of the first generation network, the second generation network, the first discrimination network, and the second discrimination network based on the first perceptual loss, the second perceptual loss, the first discrimination result, and the second discrimination result, and obtain a target generation network based on super-resolution reconstruction.
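Before detailing these steps, the ring structure can be made concrete with a minimal sketch of one training-time pass around the cycle; `G` and `F` are assumed to be any modules with matching input and output shapes, and the variable names are illustrative:

```python
import torch.nn as nn

def cycle_forward(G: nn.Module, F: nn.Module, x_lr, y_hr):
    """One pass around the ring: X -> Y' -> X_hat and Y -> X' -> Y_hat."""
    y_fake = G(x_lr)      # second pseudo high-resolution image Y'
    x_cycled = F(y_fake)  # second pseudo low-resolution image X_hat
    x_fake = F(y_hr)      # first pseudo low-resolution image X'
    y_cycled = G(x_fake)  # first pseudo high-resolution image Y_hat
    return y_fake, x_cycled, x_fake, y_cycled
```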
In an example, step S701 obtains an original low-resolution image X and an original high-resolution image Y; the original high-resolution image Y may be processed by methods such as, but not limited to, nearest-neighbor interpolation, bilinear interpolation, mean interpolation, and median interpolation to obtain the original low-resolution image X. In step S702, the original high-resolution image Y undergoes image low-resolution reconstruction via the second generation network F to obtain a first pseudo low-resolution image X′; the first pseudo low-resolution image X′ then undergoes image super-resolution reconstruction via the first generation network G to obtain a first pseudo high-resolution image Ŷ. Meanwhile, the original low-resolution image X undergoes image super-resolution reconstruction via the first generation network G to obtain a second pseudo high-resolution image Y′; the second pseudo high-resolution image Y′ undergoes image low-resolution reconstruction via the second generation network F to obtain a second pseudo low-resolution image X̂. Accordingly, in step S703, the first discrimination network D_Y obtains the discrimination results between the original high-resolution image Y and the first pseudo high-resolution image Ŷ, and between Y and the second pseudo high-resolution image Y′; the second discrimination network D_X obtains the discrimination results between the original low-resolution image X and the first pseudo low-resolution image X′, and between X and the second pseudo low-resolution image X̂. When the first discrimination result and/or the second discrimination result has not reached the point where the two input images cannot be distinguished, the perceptual loss is calculated as follows:
(A) In the X→Y′ and X′→Ŷ processes, the first perceptual loss between the original high-resolution image Y and the first pseudo high-resolution image Ŷ, or between the original high-resolution image Y and the second pseudo high-resolution image Y′, is

$$l_{perc}^{1}=\frac{1}{W_{m,n}H_{m,n}}\sum_{x=1}^{W_{m,n}}\sum_{y=1}^{H_{m,n}}\Big(\phi_{m,n}\big(I^{HR}\big)_{x,y}-\phi_{m,n}\big(G_{\theta_{G1}}(I^{LR})\big)_{x,y}\Big)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are the width and height of the feature map of the $n$-th convolution layer before the $m$-th max-pooling layer in the perceptual loss network, $\phi_{m,n}(I^{HR})_{x,y}$ denotes the feature map of the $n$-th convolution before the $m$-th max-pooling layer of the original high-resolution image Y, and $\phi_{m,n}(G_{\theta_{G1}}(I^{LR}))_{x,y}$ denotes the corresponding feature map of the pseudo high-resolution image generated via the first generation network G, the pseudo high-resolution image being the first pseudo high-resolution image Ŷ or the second pseudo high-resolution image Y′.
(B) In the Y→X′ and Y′→X̂ processes, the second perceptual loss between the original low-resolution image X and the first pseudo low-resolution image X′, or between the original low-resolution image X and the second pseudo low-resolution image X̂, is

$$l_{perc}^{2}=\frac{1}{W_{m,n}H_{m,n}}\sum_{x=1}^{W_{m,n}}\sum_{y=1}^{H_{m,n}}\Big(\phi_{m,n}\big(I^{LR}\big)_{x,y}-\phi_{m,n}\big(F_{\theta_{G2}}(I^{HR})\big)_{x,y}\Big)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are as above, $\phi_{m,n}(I^{LR})_{x,y}$ denotes the feature map of the $n$-th convolution before the $m$-th max-pooling layer of the original low-resolution image X, and $\phi_{m,n}(F_{\theta_{G2}}(I^{HR}))_{x,y}$ denotes the corresponding feature map of the pseudo low-resolution image reconstructed via the second generation network F, the pseudo low-resolution images including the first pseudo low-resolution image X′ and the second pseudo low-resolution image X̂.
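The perceptual loss network is not further specified here; a common concrete choice is a fixed, pre-trained VGG network. The following PyTorch sketch computes the loss form given above under that assumption; the choice of VGG-19 and of the conv5_4 feature layer (the 4th convolution before the 5th max-pooling layer) are illustrative, not the patent's prescription:

```python
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """MSE between feature maps phi_{m,n} of a fixed loss network (assumed: VGG-19)."""

    def __init__(self, cut: int = 35):  # layers 0..34 end at conv5_4, i.e. phi_{5,4}
        super().__init__()
        self.phi = vgg19(weights="IMAGENET1K_V1").features[:cut].eval()
        for p in self.phi.parameters():
            p.requires_grad = False  # the loss network is fixed, never trained

    def forward(self, generated, target):
        # Inputs are assumed to be 3-channel, ImageNet-normalized images. The mean
        # over the W_{m,n} x H_{m,n} feature map implements the 1/(W*H) normalization
        # of the formulas above.
        return nn.functional.mse_loss(self.phi(generated), self.phi(target))
```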
As a further improvement of the above example, in step S701 the original high-resolution image Y undergoes image low-resolution reconstruction via the second generation network F to obtain the original low-resolution image X. Correspondingly, in step S702 the original low-resolution image X undergoes image super-resolution reconstruction via the first generation network G to obtain a third pseudo high-resolution image Y″, and the third pseudo high-resolution image Y″ undergoes image low-resolution reconstruction via the second generation network F to obtain a third pseudo low-resolution image X″. Compared with the image reconstruction process of the previous example, this saves one pass of super-resolution reconstruction of a low-resolution image through the first generation network G, reducing processing time and system overhead. The first discrimination network D_Y discriminates between the original high-resolution image Y and the third pseudo high-resolution image Y″ to produce the first discrimination result, and the second discrimination network D_X discriminates between the original low-resolution image X and the third pseudo low-resolution image X″ to produce the second discrimination result. When the first discrimination result and/or the second discrimination result has not reached the point where the two input images cannot be distinguished, the perceptual loss is calculated as follows:
(A) The first perceptual loss between the original high-resolution image Y and the third pseudo high-resolution image Y″ is

$$l_{perc}^{1}=\frac{1}{W_{m,n}H_{m,n}}\sum_{x=1}^{W_{m,n}}\sum_{y=1}^{H_{m,n}}\Big(\phi_{m,n}\big(I^{HR}\big)_{x,y}-\phi_{m,n}\big(G_{\theta_{G1}}(I^{LR})\big)_{x,y}\Big)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are the width and height of the feature map of the $n$-th convolution layer before the $m$-th max-pooling layer in the perceptual loss network, $\phi_{m,n}(I^{HR})_{x,y}$ denotes the feature map of the $n$-th convolution before the $m$-th max-pooling layer of the original high-resolution image Y, and $\phi_{m,n}(G_{\theta_{G1}}(I^{LR}))_{x,y}$ denotes the corresponding feature map of the third pseudo high-resolution image Y″ generated via the first generation network G.
(B) The second perceptual loss between the original low-resolution image X and the third pseudo low-resolution image X″ is

$$l_{perc}^{2}=\frac{1}{W_{m,n}H_{m,n}}\sum_{x=1}^{W_{m,n}}\sum_{y=1}^{H_{m,n}}\Big(\phi_{m,n}\big(I^{LR}\big)_{x,y}-\phi_{m,n}\big(F_{\theta_{G2}}(I^{HR})\big)_{x,y}\Big)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are as above, $\phi_{m,n}(I^{LR})_{x,y}$ denotes the feature map of the $n$-th convolution before the $m$-th max-pooling layer of the original low-resolution image X, and $\phi_{m,n}(F_{\theta_{G2}}(I^{HR}))_{x,y}$ denotes the corresponding feature map of the third pseudo low-resolution image X″ generated via the second generation network F.
In this example, the model parameters of the first generation network G and the second generation network F in the cyclic adversarial generation network are continuously optimized according to the obtained first and second perceptual losses, and the model parameters of the first discrimination network D_Y and the second discrimination network D_X are updated through a back-propagation algorithm, so as to complete the training process of the cyclic adversarial generation network model. The first generation network with the updated model parameters is determined as the target generation network, which constrains the super-resolution results to have realistic detail and to conform to the perceptual characteristics of the human visual system. In addition, because the perceptual loss computed on the cyclic adversarial generation network is tied to the specific input image, the restored image details can remain faithful to the original picture as far as possible, making the method suitable for applications that pursue detail authenticity.
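The parameter-update step can be sketched as one training iteration combining the adversarial and perceptual terms; the least-squares adversarial loss, the weighting factor `lam`, and the split of parameters between a generator optimizer `opt_g` (covering both G and F) and a discriminator optimizer `opt_d` are assumptions made for illustration, not the patent's prescribed objective:

```python
def train_step(G, F, D_Y, D_X, perc_loss, opt_g, opt_d, x_lr, y_hr, lam=1.0):
    """One hedged training iteration of the cyclic adversarial generation network."""
    y_fake, x_fake = G(x_lr), F(y_hr)  # pseudo high-res Y' and pseudo low-res X'

    # Generator update: fool both discriminators, plus the two perceptual losses.
    opt_g.zero_grad()
    g_loss = ((D_Y(y_fake) - 1) ** 2).mean() + ((D_X(x_fake) - 1) ** 2).mean() \
        + lam * (perc_loss(y_fake, y_hr) + perc_loss(x_fake, x_lr))
    g_loss.backward()
    opt_g.step()

    # Discriminator update: push real images toward 1 and generated images toward 0.
    opt_d.zero_grad()
    d_loss = ((D_Y(y_hr) - 1) ** 2).mean() + (D_Y(y_fake.detach()) ** 2).mean() \
        + ((D_X(x_lr) - 1) ** 2).mean() + (D_X(x_fake.detach()) ** 2).mean()
    d_loss.backward()
    opt_d.step()
    return g_loss.item(), d_loss.item()
```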
In one embodiment, as shown in fig. 8, an image reconstruction method is provided; taking its application to a computer device as an example for description, the image reconstruction method includes the following steps:
S801: acquire an image to be processed, and determine the image resolution of the image to be processed.
The image to be processed is an image that requires image recognition or other processing. Generally, depending on the application scenario, it is the image directly acquired in that scenario. For example, in an application scenario for recognizing offending vehicles, the image to be processed may be an image containing vehicle information captured by the image capture apparatuses installed on both sides of a road.
S802: if the image resolution of the image to be processed is smaller than a second resolution threshold, determine the image to be processed as an image to be reconstructed.
The second resolution threshold is a preset threshold for judging whether an image's resolution is high enough that super-resolution reconstruction is unnecessary. The image to be reconstructed is an image that requires super-resolution reconstruction.
In this example, the image resolution of the image to be processed is compared with the second resolution threshold. If the resolution is smaller than the threshold, the image resolution is low, and using the image directly for image recognition or other processing could reduce the accuracy of the recognition or processing result; the image to be processed is therefore determined as an image to be reconstructed. If the resolution is not smaller than the threshold, the image resolution is high enough for image recognition or subsequent processing to proceed directly, without image reconstruction; this preserves the accuracy of the result while keeping the processing efficient.
S803: perform image super-resolution reconstruction on the image to be reconstructed using the target generation network obtained by the image reconstruction model training method, and obtain a target reconstructed image.
In this example, the image to be reconstructed is input into the target generation network trained by the image reconstruction model training method of the above embodiments, and the target generation network performs image super-resolution reconstruction on it; the reconstruction process is the same as steps S301-S303 and, to avoid repetition, is not described again here. The result is the target reconstructed image, that is, the image obtained after the target generation network performs image super-resolution reconstruction on the image to be reconstructed.
In the image reconstruction method provided by this embodiment, the target generation network performs image super-resolution reconstruction on a lower-resolution image to be reconstructed, generating a higher-resolution target reconstructed image that contains texture and structural features at different scales; the generated image therefore has higher perceptual quality, and its subjective appearance better matches the preferences of the human visual system.
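A minimal end-to-end sketch of steps S801 to S803; the threshold value 256 and the function and variable names are hypothetical placeholders rather than values from the patent:

```python
import torch

SECOND_RESOLUTION_THRESHOLD = 256  # hypothetical threshold for the shorter image side

def reconstruct_if_needed(image: torch.Tensor, target_g: torch.nn.Module) -> torch.Tensor:
    """S801-S803: super-resolve the image only when its resolution is too low."""
    # S801: determine the image resolution (here taken as the shorter spatial side).
    resolution = min(image.shape[-2:])
    # S802: below the second resolution threshold, the image must be reconstructed.
    if resolution < SECOND_RESOLUTION_THRESHOLD:
        # S803: image super-resolution reconstruction with the target generation network.
        with torch.no_grad():
            return target_g(image.unsqueeze(0)).squeeze(0)
    return image  # resolution already sufficient; no reconstruction needed
```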
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the data adopted or generated while executing the image reconstruction model training method or the image reconstruction method. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements an image reconstruction model training method or an image reconstruction method.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor. When executing the computer program, the processor implements the image reconstruction model training method of the above embodiments, for example S101-S104 shown in fig. 1 or the steps shown in figs. 2-7, which are not repeated here. Alternatively, when executing the computer program, the processor implements the image reconstruction method of the above embodiments, for example S801-S803 shown in fig. 8; to avoid repetition, this is not described again here.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the image reconstruction model training method of the above embodiments, for example S101-S104 shown in fig. 1 or the steps shown in figs. 2-7, which are not repeated here. Alternatively, when executed by a processor, the computer program implements the image reconstruction method of the above embodiments, for example S801-S803 shown in fig. 8; to avoid repetition, this is not described again here.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-volatile computer-readable storage medium; when executed, the program may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. An image reconstruction model training method, comprising:
acquiring an original high-resolution image and an original low-resolution image corresponding to the original high-resolution image;
extracting features from the original low-resolution image to obtain an original feature map; inputting the original feature map into a multi-band block residual generation network for frequency-division feature extraction to obtain a target high-frequency feature map and at least two low-frequency feature maps; wherein the multi-band block residual generation network comprises at least two block residual networks connected in series in sequence, a current block residual network performs feature extraction on an input feature map, outputs the low-frequency feature map and the block high-frequency feature map corresponding to the current block residual network, and inputs the block high-frequency feature map to the next block residual network; the input feature map comprises the original feature map or the block high-frequency feature map output by the previous block residual network; the target high-frequency feature map is the block high-frequency feature map output by the last block residual network; the low-frequency feature map is the lower-frequency feature map corresponding to the current frequency band output by the current block residual network, and the block high-frequency feature map is the higher-frequency feature map corresponding to the current frequency band output by the current block residual network; performing image reconstruction based on the target high-frequency feature map and the at least two low-frequency feature maps to obtain a pseudo high-resolution image corresponding to the original low-resolution image, comprising: inputting the original low-resolution image into a first generation network for image super-resolution reconstruction to obtain the pseudo high-resolution image corresponding to the original low-resolution image; and inputting the pseudo high-resolution image into a second generation network for image low-resolution reconstruction to obtain a pseudo low-resolution image;
Inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result; inputting the original low-resolution image and the pseudo low-resolution image into a second discrimination network for discrimination to obtain a second discrimination result;
inputting the original high-resolution image and the pseudo high-resolution image into a perceptual loss network to obtain a perceptual loss value, comprising: inputting the original high-resolution image and the pseudo high-resolution image into the perceptual loss network for perceptual loss calculation to obtain a first perceptual loss; and inputting the original low-resolution image and the pseudo low-resolution image into the perceptual loss network for perceptual loss calculation to obtain a second perceptual loss;
updating model parameters of the first generation network and the first discrimination network based on the perceptual loss value and the first discrimination result to obtain a target generation network based on super-resolution reconstruction, comprising: updating model parameters of the first generation network, the second generation network, the first discrimination network, and the second discrimination network based on the first perceptual loss, the second perceptual loss, the first discrimination result, and the second discrimination result, to obtain the target generation network based on super-resolution reconstruction.
2. The image reconstruction model training method of claim 1, wherein the current block residual network comprises a low-frequency feature separation sub-network and a high-frequency feature separation sub-network;
the step of extracting the characteristics of the input characteristic diagram by adopting the current block residual error network, and outputting the low-frequency characteristic diagram and the block high-frequency characteristic diagram corresponding to the current block residual error network comprises the following steps:
according to the current frequency band, the low-frequency characteristic separation sub-network is adopted to conduct characteristic extraction on the input characteristic diagram, low-frequency characteristic information is obtained, low-frequency characteristic separation is conducted on the basis of the low-frequency characteristic information, and the low-frequency characteristic diagram is output;
and adopting the high-frequency characteristic separation sub-network to perform high-frequency characteristic separation on the low-frequency characteristic information, and outputting a block high-frequency characteristic diagram.
3. The image reconstruction model training method of claim 2, wherein the step of performing feature extraction on the input feature map using the low-frequency feature separation sub-network according to the current frequency band, obtaining low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting a low-frequency feature map comprises:
up-sampling the input feature map using an up-sampling unit to obtain an up-sampled feature map;
performing feature extraction on the up-sampled feature map using a first convolution unit to obtain low-frequency feature information;
and performing low-frequency feature separation on the low-frequency feature information using a second convolution unit, and outputting a low-frequency feature map.
4. The image reconstruction model training method of claim 2, wherein said performing high-frequency feature separation on the low-frequency feature information using the high-frequency feature separation sub-network and outputting a block high-frequency feature map comprises:
mapping the low-frequency feature information to a low-resolution space using a third convolution unit to obtain a first feature map;
performing a subtraction operation based on the input feature map and the first feature map, and outputting high-frequency feature information;
performing feature extraction on the high-frequency feature information using a fourth convolution unit to obtain a second feature map;
and performing an addition operation based on the first feature map and the second feature map, and outputting a block high-frequency feature map.
5. The image reconstruction model training method of claim 1, wherein said acquiring an original high-resolution image and an original low-resolution image corresponding to the original high-resolution image comprises:
acquiring an original training image and determining the image resolution of the original training image;
if the image resolution of the original training image is larger than a first resolution threshold, determining the original training image as the original high-resolution image;
and down-sampling the original high-resolution image to obtain the original low-resolution image corresponding to the original high-resolution image.
6. An image reconstruction method, comprising:
acquiring an image to be processed, and determining the image resolution of the image to be processed;
if the image resolution of the image to be processed is smaller than a second resolution threshold, determining the image to be processed as an image to be reconstructed;
performing image super-resolution reconstruction on the image to be reconstructed using the target generation network obtained by the image reconstruction model training method according to any one of claims 1 to 5, and obtaining a target reconstructed image.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the image reconstruction model training method according to any one of claims 1 to 5, or implements the image reconstruction method according to claim 6.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the image reconstruction model training method according to any one of claims 1 to 5, or implements the image reconstruction method according to claim 6.
GR01 Patent grant