CN111179177A - Image reconstruction model training method, image reconstruction method, device and medium - Google Patents


Info

Publication number
CN111179177A
CN111179177A (application CN201911409903.4A)
Authority
CN
China
Prior art keywords
image
low
resolution
network
original
Prior art date
Legal status
Granted
Application number
CN201911409903.4A
Other languages
Chinese (zh)
Other versions
CN111179177B (en)
Inventor
王汝欣
邱亚军
陶大鹏
Current Assignee
Shenzhen Union Vision Innovation Technology Co ltd
Original Assignee
Shenzhen Union Vision Innovation Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Union Vision Innovation Technology Co ltd
Priority to CN201911409903.4A
Publication of CN111179177A
Application granted
Publication of CN111179177B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components


Abstract

The invention discloses an image reconstruction model training method, an image reconstruction method, a device and a medium. The training method comprises the following steps: acquiring an original high-resolution image and a corresponding original low-resolution image; inputting the original low-resolution image into a first generation network for image super-resolution reconstruction to obtain a pseudo high-resolution image; inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result; inputting the original high-resolution image and the pseudo high-resolution image into a perceptual loss network to obtain a perceptual loss value; and updating the model parameters of the first generation network and the first discrimination network based on the perceptual loss value and the first discrimination result, acquiring a target generation network for super-resolution reconstruction. The target generation network can reconstruct higher-resolution images containing texture features and structural features at different scales, giving them higher perceptual quality.

Description

Image reconstruction model training method, image reconstruction method, device and medium
Technical Field
The invention relates to the technical field of image processing, and in particular to an image reconstruction model training method, an image reconstruction method, a device, and a medium.
Background
In the field of image processing technology, image resolution is one of the main technical indicators of the level of detail observable in an image. Image resolution generally refers to the spatial resolution of an image: the higher the resolution, the finer the scene details the image can reflect and the richer the information it can provide. Due to the limitations of objective conditions, high-quality images often cannot be obtained in real application environments. Therefore, researching advanced image super-resolution reconstruction technology is critical to improving image recognition capability and recognition accuracy.
Image super-resolution reconstruction technology converts an image with a lower resolution into an image with a higher resolution. It has important application value in surveillance equipment, video communication, satellite imagery, medical imaging, and the like, and high application value in scenarios such as face-based super-resolution reconstruction, optimization of the visual experience of video images, vehicle identification, industrial equipment fault detection, processing of remote-sensing images acquired from moving objects, and video and image quality assessment.
Current image super-resolution reconstruction methods mostly adopt a Convolutional Neural Network (CNN), and in the process of model training the pixel loss of the training images in the CNN is directly used to optimize the model parameters of the CNN. A high Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) can be obtained this way, but because the optimization minimizes the loss between pixels, the convolutional neural network does not consider the perceptual quality of the image: the reconstructed image is often overly smooth, its perceptual quality score (e.g., as measured by the Natural Image Quality Evaluator, NIQE) is poor, and it does not match the preference of the Human Visual System (HVS) at the subjective perception level.
Disclosure of Invention
The embodiments of the invention provide an image reconstruction model training method, an image reconstruction method, a device, and a medium, which are used to solve the problem of the low perceptual quality of images generated by current image super-resolution reconstruction.
The embodiment of the invention provides an image reconstruction model training method, which comprises the following steps:
acquiring an original high-resolution image and an original low-resolution image corresponding to the original high-resolution image;
inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image;
inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result;
inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to obtain a perception loss value;
updating model parameters of the first generation network and the first discrimination network based on the perception loss value and the first discrimination result, and acquiring a target generation network based on super-resolution reconstruction.
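For concreteness, a minimal sketch of this training step in PyTorch follows. The class and variable names, the binary cross-entropy adversarial objective, and the loss weight lambda_p are illustrative assumptions rather than the patent's reference implementation:

import torch
import torch.nn as nn

def train_step(G, D, perc_net, opt_G, opt_D, lr_img, hr_img, lambda_p=1.0):
    # G: first generation network; D: first discrimination network (sigmoid output assumed)
    bce = nn.BCELoss()
    real = torch.ones(hr_img.size(0), 1)
    fake = torch.zeros(hr_img.size(0), 1)

    pseudo_hr = G(lr_img)                # image super-resolution reconstruction

    # first discrimination result on the original and pseudo high-resolution images
    opt_D.zero_grad()
    d_loss = bce(D(hr_img), real) + bce(D(pseudo_hr.detach()), fake)
    d_loss.backward()                    # update the first discrimination network
    opt_D.step()

    # perceptual loss between the original and pseudo high-resolution images
    opt_G.zero_grad()
    g_loss = lambda_p * perc_net(hr_img, pseudo_hr) + bce(D(pseudo_hr), real)
    g_loss.backward()                    # update the first generation network
    opt_G.step()
    return g_loss.item(), d_loss.item()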
Preferably, the inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image includes:
extracting the features of the original low-resolution image to obtain an original feature map;
inputting the original characteristic diagram into a multi-band block residual error generation network for sub-band characteristic extraction, and acquiring a target high-frequency characteristic diagram and at least two low-frequency characteristic diagrams; the multi-band block residual error generating network comprises at least two block residual error networks which are sequentially connected in series, the current block residual error network is adopted to carry out feature extraction on an input feature map, the low-frequency feature map and the block high-frequency feature map which correspond to the current block residual error network are output, and the block high-frequency feature map is input into the next block residual error network; the input feature map comprises an original feature map or a block high-frequency feature map output by a last block of residual error network; the target high-frequency feature map is a block high-frequency feature map output by the last current block residual error network;
and carrying out image reconstruction based on the target high-frequency characteristic diagram and at least two low-frequency characteristic diagrams to obtain a pseudo high-resolution image corresponding to the original low-resolution image.
Preferably, the current block residual network comprises a low-frequency feature separation sub-network and a high-frequency feature separation sub-network;
the method for extracting the characteristics of the input characteristic graph by adopting the current block residual error network and outputting the low-frequency characteristic graph and the block high-frequency characteristic graph corresponding to the current block residual error network comprises the following steps:
according to the current frequency band, the low-frequency feature separation sub-network is adopted to perform feature extraction on the input feature map to obtain low-frequency feature information, low-frequency feature separation is performed on the basis of the low-frequency feature information, and a low-frequency feature map is output;
and performing high-frequency characteristic separation on the low-frequency characteristic information by adopting the high-frequency characteristic separation sub-network, and outputting a block high-frequency characteristic diagram.
Preferably, the performing, according to the current frequency band, feature extraction on the input feature map by using the low-frequency feature separation sub-network to obtain low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting a low-frequency feature map includes:
an up-sampling unit is adopted to perform up-sampling processing on the input characteristic diagram to obtain an up-sampling characteristic diagram;
performing feature extraction on the up-sampling feature map by adopting a first convolution unit to acquire low-frequency feature information;
and carrying out low-frequency characteristic separation on the low-frequency characteristic information by adopting a second convolution unit, and outputting a low-frequency characteristic diagram.
Preferably, the performing high-frequency feature separation on the low-frequency feature information by using the high-frequency feature separation sub-network, and outputting a block high-frequency feature map includes:
mapping the low-frequency characteristic information to a low-resolution space by adopting a third convolution unit to obtain a first characteristic diagram;
performing subtraction operation based on the input feature map and the first feature map, and outputting high-frequency feature information;
performing feature extraction on the high-frequency feature information by adopting a fourth convolution unit to obtain a second feature map;
and performing addition operation based on the first feature map and the second feature map, and outputting a block high-frequency feature map.
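The data flow of this high-frequency feature separation can be sketched as follows (PyTorch). The use of a stride-2 convolution as the third convolution unit that maps the upsampled low-frequency features back to the low-resolution space, and all layer shapes and names, are assumptions:

import torch.nn as nn

class HighFreqSeparation(nn.Module):
    # Sketch of the high-frequency feature separation sub-network.
    def __init__(self, ch=64):
        super().__init__()
        self.conv3 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # third convolution unit
        self.conv4 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)  # fourth convolution unit

    def forward(self, input_fmap, low_freq_info):
        first_fmap = self.conv3(low_freq_info)     # map to low-resolution space -> first feature map
        high_freq_info = input_fmap - first_fmap   # subtraction operation -> high-frequency information
        second_fmap = self.conv4(high_freq_info)   # feature extraction -> second feature map
        return first_fmap + second_fmap            # addition -> block high-frequency feature map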
Preferably, the acquiring an original high resolution image and an original low resolution image corresponding to the original high resolution image includes:
acquiring an original training image, and determining the image resolution of the original training image;
if the image resolution of the original training image is larger than a first resolution threshold, determining the original training image as an original high-resolution image;
and carrying out downsampling processing on the original high-resolution image to obtain an original low-resolution image corresponding to the original high-resolution image.
Preferably, after the acquiring the original high-resolution image and the original low-resolution image corresponding to the original high-resolution image, the image reconstruction model training method includes:
inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image; inputting the pseudo high-resolution image into a second generation network to carry out image low-resolution reconstruction, and acquiring a pseudo low-resolution image;
inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result; inputting the original low-resolution image and the pseudo low-resolution image into a second judgment network for judgment to obtain a second judgment result;
inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network for perception loss calculation to obtain a first perception loss; inputting the original low-resolution image and the pseudo low-resolution image into a perception loss network for perception loss calculation to obtain a second perception loss;
updating model parameters of the first generation network, the second generation network, the first discrimination network and the second discrimination network based on the first perception loss, the second perception loss, the first discrimination result and the second discrimination result, and acquiring a target generation network based on super-resolution reconstruction.
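As an illustration, a minimal sketch of how the generator-side losses of this dual embodiment could be combined follows; the equal weighting of the four terms and all names are assumptions:

import torch
import torch.nn as nn

def dual_generator_loss(G1, G2, D1, D2, perc_net, lr_img, hr_img):
    # G1: first generation network (LR -> pseudo HR)
    # G2: second generation network (pseudo HR -> pseudo LR)
    bce = nn.BCELoss()
    real = torch.ones(hr_img.size(0), 1)
    pseudo_hr = G1(lr_img)                 # image super-resolution reconstruction
    pseudo_lr = G2(pseudo_hr)              # image low-resolution reconstruction
    perc1 = perc_net(hr_img, pseudo_hr)    # first perception loss
    perc2 = perc_net(lr_img, pseudo_lr)    # second perception loss
    adv1 = bce(D1(pseudo_hr), real)        # term from the first discrimination result
    adv2 = bce(D2(pseudo_lr), real)        # term from the second discrimination result
    return perc1 + perc2 + adv1 + adv2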
The embodiment of the invention also provides an image reconstruction method, which comprises the following steps:
acquiring an image to be processed, and determining the image resolution of the image to be processed;
if the image resolution of the image to be processed is smaller than a second resolution threshold, determining the image to be processed as an image to be reconstructed;
and performing image super-resolution reconstruction on the image to be reconstructed by adopting a target generation network acquired by an image reconstruction model training method to acquire a target reconstruction image.
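A minimal sketch of this inference procedure follows; the second resolution threshold value is an assumption:

import torch

@torch.no_grad()
def reconstruct(image, target_G, second_resolution_threshold=256):
    # image: (N, C, H, W) tensor; target_G: the trained target generation network
    h, w = image.shape[-2:]
    if min(h, w) < second_resolution_threshold:   # image to be reconstructed
        return target_G(image)                    # image super-resolution reconstruction
    return image                                  # resolution already sufficient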
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above-mentioned image reconstruction model training method when executing the computer program, or implements the above-mentioned image reconstruction method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to implement the above image reconstruction model training method, or the computer program is executed by the processor to implement the above image reconstruction method.
The image reconstruction model training method, image reconstruction method, device and medium adopt a generative adversarial network as the basic network for model training, and use the first generation network to perform image super-resolution reconstruction on the original low-resolution image to acquire a pseudo high-resolution image. A perceptual loss network calculates the perceptual loss value between the original high-resolution image and the pseudo high-resolution image; the model parameters of the first generation network are updated based on the perceptual loss value, and those of the first discrimination network are updated by a back-propagation algorithm, completing the training of the generative adversarial network model. The first generation network with updated model parameters is determined to be the target generation network, so that the target generation network makes the texture features and structural features of the pseudo high-resolution image consistent with those of the original high-resolution image across different scales. Therefore, when the trained target generation network performs image super-resolution reconstruction on a lower-resolution image, it can generate a higher-resolution image containing texture and structural features at different scales, so that the generated image has higher perceptual quality and better matches the preference of the human visual system at the subjective perception level.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a flowchart of an image reconstruction model training method according to an embodiment of the present invention;
FIG. 2 is another flowchart of a method for training an image reconstruction model according to an embodiment of the present invention;
FIG. 3 is another flowchart of a method for training an image reconstruction model according to an embodiment of the present invention;
FIG. 4 is another flowchart of a method for training an image reconstruction model according to an embodiment of the present invention;
FIG. 5 is another flowchart of a method for training an image reconstruction model according to an embodiment of the present invention;
FIG. 6 is another flowchart of a method for training an image reconstruction model according to an embodiment of the present invention;
FIG. 7 is another flowchart of a method for training an image reconstruction model according to an embodiment of the invention;
FIG. 8 is a flowchart of an image reconstruction method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The image reconstruction model training method provided by the embodiment of the invention can be applied to computer equipment and is used for training a network model capable of realizing image super-resolution reconstruction, so that a target generation network capable of realizing image super-resolution reconstruction and obtained by utilizing the image reconstruction model training method can be used for carrying out super-resolution reconstruction on an image with lower resolution to obtain an image with higher resolution.
In an embodiment, as shown in fig. 1, an image reconstruction model training method is provided, which is described by taking an example of application of the image reconstruction model training method to a computer device, and the image reconstruction model training method includes the following steps:
s101: an original high resolution image and an original low resolution image corresponding to the original high resolution image are acquired.
The original high-resolution image is a higher-resolution image used for model training, and the original low-resolution image is a lower-resolution image used for model training. Understandably, the original low-resolution image is the image on which image super-resolution reconstruction needs to be performed; accordingly, the original high-resolution image is used to verify the image acquired after performing image super-resolution reconstruction on the original low-resolution image. In this example, the original low-resolution image corresponding to the original high-resolution image means that the two images have the same image content, so that using the original high-resolution image to verify the image obtained after performing image super-resolution reconstruction on the original low-resolution image is feasible, which ensures that the target generation network can reconstruct super-resolution images after the image reconstruction model is trained.
S102: and inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image.
The first generation network is the generator in a Generative Adversarial Network (GAN): a network that captures the latent distribution of real data samples in order to generate new pseudo data samples. In this example, the first generation network is the generation network to be trained in the subsequent model training, specifically a generation network constructed using super-resolution techniques, i.e., a generation network that can reconstruct a lower-resolution image into a higher-resolution image. The pseudo high-resolution image is the image obtained after super-resolution reconstruction based on the original low-resolution image.
In this example, the original low-resolution image is input to a first generation network constructed based on a super-resolution technique, and image super-resolution reconstruction processing is performed on the original low-resolution image using the first generation network to acquire a pseudo high-resolution image with a higher resolution. As can be understood, since the original low-resolution image corresponds to the original high-resolution image, and the pseudo high-resolution image is an image obtained by performing super-resolution reconstruction on the original low-resolution image, the pseudo high-resolution image also corresponds to the image content of the original high-resolution image, and feasibility of performing discrimination processing and perceptual loss calculation subsequently is ensured.
S103: and inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result.
The first discrimination network is the discriminator in the Generative Adversarial Network (GAN), used to distinguish real data samples from pseudo data samples; it is the discrimination network that requires model training. The first discrimination result is the probability, output by the first discrimination network after discriminating the original high-resolution image and the pseudo high-resolution image, that the pseudo high-resolution image is an original high-resolution image, which can be understood as the similarity between the original high-resolution image and the pseudo high-resolution image. In this example, the first discrimination network may adopt the discrimination network of SRGAN, which is not described in detail here. As an example, the first discrimination result output by the first discrimination network is a result for discriminating whether the input image is an original high-resolution image or a pseudo high-resolution image.
In the process of reconstruction model training, the first generation network takes the original low-resolution image as input and the pseudo high-resolution image as output; the first discrimination network takes the original high-resolution image and the pseudo high-resolution image as input and the first discrimination result as output. The first generation network and the first discrimination network oppose each other, and the model parameters are updated by a back-propagation algorithm.
S104: and inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to obtain a perception loss value.
The perceptual loss network is a network for calculating the perceptual loss between two input images. Generally, the perceptual loss network comprises several branch networks, each branch adopting convolution kernels of a different size to extract feature information at a different scale in the image, and the gap between the two input images is measured by a defined perceptual loss function. The perceptual loss value is the output value of the perceptual loss network, specifically the loss value determined by the perceptual loss network after performing the perceptual loss calculation on the original high-resolution image and the pseudo high-resolution image.
In this example, the original high-resolution image and the pseudo high-resolution image are input into the perceptual loss network together. Feature extraction is performed on each image using convolution kernels in different layers of the perceptual loss network to extract the scale feature information corresponding to each convolution kernel, obtaining a feature map corresponding to the original high-resolution image and a feature map corresponding to the pseudo high-resolution image. The perceptual loss value corresponding to the two images is then calculated based on these feature maps. Because the perceptual loss value covers information at multiple scales in the two images, it can effectively guide the generation network to generate more detailed texture information.
In one embodiment, the original high-resolution image and the pseudo high-resolution image are respectively input into a perceptual loss network provided with at least one basic module, each basic module comprising a plurality of branches, each branch adopting convolution kernels of different sizes. For example, an Inception module is used as the basic module, comprising four branches, namely a first, second, third and fourth branch from left to right. The first branch includes a Conv-1 convolutional layer employing a 1 × 1 × 64 convolution kernel. The second branch includes a Conv-3 convolutional layer employing a 3 × 3 × 128 convolution kernel and a Conv-1 convolutional layer employing a 1 × 1 × 96 convolution kernel. The third branch includes a Conv-5 convolutional layer employing a 5 × 5 × 32 convolution kernel and a Conv-1 convolutional layer employing a 1 × 1 × 16 convolution kernel. The fourth branch includes a Conv-1 convolutional layer employing a 1 × 1 × 31 convolution kernel and an MP-3 pooling layer, which is a 3 × 3 maximum pooling layer. Since a large convolution kernel can extract the global structure information of an image while a small convolution kernel is better at capturing its texture details, a perceptual loss network using convolution kernels of different sizes can extract texture features and structural features at different scales, and the difference between the pseudo high-resolution image and the original high-resolution image is measured by a defined perceptual loss function. In one example, the perceptual loss function is:
$$
l_{perc} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\left(I^{HR}\right)_{x,y} - \phi_{i,j}\left(G_{\theta_G}\left(I^{LR}\right)\right)_{x,y} \right)^{2}
$$

wherein $W_{i,j}$ and $H_{i,j}$ are the width and height of the feature map output by each convolution kernel in the perceptual loss network; $\phi_{i,j}(I^{HR})_{x,y}$ denotes the feature map of the $j$th convolution before the $i$th max-pooling layer for the original high-resolution image; $\phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y}$ denotes the corresponding feature map of the pseudo high-resolution image generated from the original low-resolution image by the first generation network; $LR$ denotes the original low-resolution image, $HR$ the original high-resolution image, and $SR$ the pseudo high-resolution image; and $x$ and $y$ are the x- and y-coordinates of a pixel in the feature map.
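As an illustration, a sketch of such a four-branch basic module and the feature-space loss follows (PyTorch). The channel counts follow the paragraph above (including the 1 × 1 × 31 figure as printed); the ordering of the 1 × 1 reduction before the larger kernels and all names are assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionBlock(nn.Module):
    # Four-branch basic module of the perceptual loss network (sketch).
    def __init__(self, in_ch=64):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, 1)                              # Conv-1: 1 x 1 x 64
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),               # Conv-1: 1 x 1 x 96
                                nn.Conv2d(96, 128, 3, padding=1))      # Conv-3: 3 x 3 x 128
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),               # Conv-1: 1 x 1 x 16
                                nn.Conv2d(16, 32, 5, padding=2))       # Conv-5: 5 x 5 x 32
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),  # MP-3: 3 x 3 max pooling
                                nn.Conv2d(in_ch, 31, 1))               # Conv-1: 1 x 1 x 31
    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

def perceptual_loss(phi, hr_img, sr_img):
    # phi extracts the feature maps; the mean squared difference realizes the
    # 1/(W*H) normalized sum of squared differences in the formula above.
    return F.mse_loss(phi(sr_img), phi(hr_img))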
S105: updating model parameters of the first generation network and the first discrimination network based on the perception loss value and the first discrimination result, and acquiring a target generation network based on super-resolution reconstruction.
The target generation network is a generation network formed by a first generation network constructed based on a super-resolution technology and subjected to model training to update model parameters. The target discrimination network is formed by the first discrimination network through model training to update model parameters.
In this example, the model parameters of the first generation network are optimized using the perceptual loss value output by the perceptual loss network, and the model parameters of the first discrimination network are updated by a back-propagation algorithm. The above steps are repeated until the first discrimination network can no longer distinguish whether an input image is an original high-resolution image or a pseudo high-resolution image, at which point the generative adversarial network model training is determined to be complete. The generation network whose model parameter updating is complete is determined to be the target generation network, and the discrimination network whose model parameter updating is complete is determined to be the target discrimination network.
Compared with the pixel loss employed in the CNN training process, updating the model parameters adversarially forces the generated pseudo high-resolution image to follow a learned distribution so as to make it inseparable from a true high-resolution image; however, the discrimination network ignores the relationships between samples, so its restored images may be excessively similar or lose important visual features. Updating the model parameters with the perceptual loss value, i.e., continuously adjusting and optimizing the model parameters of the first generation network through the perceptual loss value, guides the texture features and structural features of the pseudo high-resolution image to be consistent with those of the original high-resolution image across different scales. Therefore, when the trained target generation network subsequently performs image super-resolution reconstruction on a lower-resolution image, it can generate a higher-resolution image containing texture and structural features at different scales, so that the generated image has higher perceptual quality and better matches the preference of the human visual system at the subjective perception level.
In the image reconstruction model training method provided by this embodiment, a generative adversarial network is used as the basic network for model training, and the first generation network performs image super-resolution reconstruction on the original low-resolution image to obtain a pseudo high-resolution image. The perceptual loss network calculates the perceptual loss value between the original high-resolution image and the pseudo high-resolution image; the model parameters of the first generation network are updated based on the perceptual loss value, and those of the first discrimination network are updated by a back-propagation algorithm, completing the training of the generative adversarial network model. The first generation network with updated model parameters is determined to be the target generation network, which makes the texture features and structural features of the pseudo high-resolution image consistent with those of the original high-resolution image across different scales. Therefore, when the trained target generation network performs image super-resolution reconstruction on a lower-resolution image, it can generate a higher-resolution image containing texture and structural features at different scales, so that the generated image has higher perceptual quality and better matches the preference of the human visual system at the subjective perception level.
In an embodiment, as shown in fig. 2, step S101, namely acquiring an original high resolution image and an original low resolution image corresponding to the original high resolution image, specifically includes the following steps:
s201: and acquiring an original training image, and determining the image resolution of the original training image.
Wherein the original training image is an unprocessed image acquired by the computer device.
As an example, a computer device may obtain an original training image from a database of images and identify the original training image using a resolution identification technique to determine an image resolution of the original training image. The resolution recognition technology is the prior art, and is not described in detail here to avoid redundancy.
S202: and if the image resolution of the original training image is greater than the first resolution threshold, determining the original training image as an original high-resolution image.
The first resolution threshold is a preset threshold for evaluating whether the resolution of an image meets the standard required of an original high-resolution image. As an example, the computer device compares the image resolution of each original training image to the preset first resolution threshold. If the image resolution of the original training image is greater than the first resolution threshold, it has reached the standard regarded as higher resolution, and the original training image is determined to be an original high-resolution image. This ensures the accuracy of the processing results when the original high-resolution image is input into the first discrimination network and the perceptual loss network respectively, so that the trained target generation network reconstructs lower-resolution images into higher-resolution images of guaranteed image quality. Correspondingly, if the image resolution of the original training image is not greater than the first resolution threshold, it has not reached the standard regarded as higher resolution; in that case, if the original training image were directly determined to be an original high-resolution image, the resolution of the image formed after image super-resolution reconstruction by the trained target generation network would be lower and could not meet the requirements of specific scenarios.
S203: and carrying out downsampling processing on the original high-resolution image to obtain an original low-resolution image corresponding to the original high-resolution image.
Down-sampling (also called subsampling) is a process for reducing an image so that it fits a display area of a given size or a lower target resolution. As an example, if the resolution of an original high-resolution image I is M × N, performing s-fold down-sampling on I yields an original low-resolution image with a resolution of (M/s) × (N/s), where s is a common divisor of M and N. In this embodiment, methods such as nearest-neighbor interpolation, bilinear interpolation, mean interpolation, and median interpolation may be used in the down-sampling process.
In this example, after the original training image whose image resolution is greater than the first resolution threshold is determined to be the original high-resolution image, the original high-resolution image is down-sampled so that the obtained original low-resolution image has the same image content as the original high-resolution image, making it feasible to use the original high-resolution image to verify the image obtained after performing image super-resolution reconstruction on the original low-resolution image.
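As an illustration, a minimal sketch of this training-pair preparation follows; the threshold value, the scale factor s, and the choice of bilinear interpolation among the listed methods are assumptions:

import torch.nn.functional as F

def make_training_pair(img, first_resolution_threshold=512, s=4):
    # img: (N, C, M, N') image tensor; returns (original HR, original LR) or None
    h, w = img.shape[-2:]
    if min(h, w) <= first_resolution_threshold:
        return None                          # resolution too low to serve as an original HR image
    lr = F.interpolate(img, size=(h // s, w // s), mode='bilinear',
                       align_corners=False)  # s-fold down-sampling to (M/s) x (N/s)
    return img, lr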
In an embodiment, the step S203 of performing downsampling on the original high-resolution image to obtain an original low-resolution image corresponding to the original high-resolution image specifically includes the following steps: and performing image low-resolution reconstruction on the original high-resolution image by adopting a second generation network to obtain the original low-resolution image corresponding to the original high-resolution image. Wherein the second generating network is a generating network for converting a higher resolution image into a lower resolution image.
In an embodiment, as shown in fig. 3, step S102, that is, inputting the original low-resolution image into a first generation network for image super-resolution reconstruction and acquiring a pseudo high-resolution image corresponding to the original low-resolution image, includes the following steps:
s301: and extracting the features of the original low-resolution image to obtain an original feature map.
The original feature map is a feature map obtained by extracting features from the original low-resolution image. In this example, feature extraction is performed on the original low-resolution image using multiple convolutional layers; for example, three convolutional layers may be used, each employing 3 × 3 convolution kernels with a step size of 1 and padding of 1, with 128, 64 and 64 convolution kernels respectively, and the feature map output by the last convolutional layer is determined to be the original feature map, which contains structure information and texture information of multiple different frequency bands.
S302: inputting the original characteristic diagram into a multi-band block residual error generation network for extracting sub-band characteristics to obtain a target high-frequency characteristic diagram and at least two low-frequency characteristic diagrams; the multi-band block residual error generating network comprises at least two block residual error networks which are sequentially connected in series, the current block residual error network is adopted to carry out feature extraction on an input feature map, a low-frequency feature map and a block high-frequency feature map which correspond to the current block residual error network are output, and the block high-frequency feature map is input into the next block residual error network; the input feature map comprises an original feature map or a block high-frequency feature map output by a last block of residual error network; and the target high-frequency feature map is a block high-frequency feature map output by the last current block residual error network.
The multi-band block residual generation network is a network formed by connecting at least two block residual networks in series, used to realize image super-resolution reconstruction. The block residual network is the basic module for realizing image super-resolution reconstruction; each block residual network extracts the image features of one frequency band, i.e., the structural features and texture features of that band. Correspondingly, the multi-band block residual generation network performs sub-band feature extraction: at least two block residual networks perform feature extraction on the original feature map over different frequency bands, so as to extract the structural and texture features corresponding to each band. The low-frequency feature map is the lower-frequency feature map output by the current block residual network for the current frequency band, comprising the structural and texture features corresponding to the current frequency band, where the current frequency band is the band of image features to be acquired by the current block residual network. The block high-frequency feature map is the higher-frequency feature map output by the current block residual network for the current frequency band, comprising the structural and texture features that the current block residual network outputs to the next block residual network for further feature extraction.
For convenience of description, the block residual network currently performing feature extraction is defined in this example as the current block residual network; the previous block residual network connected to it is defined as the last block residual network; and the next block residual network connected to it is defined as the next block residual network. In this example, the 1st current block residual network performs feature extraction on the original feature map, obtains the low-frequency feature map and block high-frequency feature map corresponding to the current frequency band to be extracted by the 1st current block residual network, outputs the low-frequency feature map to the image reconstruction network, and outputs the block high-frequency feature map to the next block residual network (i.e., the 2nd block residual network). The nth (n ≥ 2) current block residual network performs feature extraction on the block high-frequency feature map input by its last block residual network (i.e., the (n-1)th current block residual network), obtains the low-frequency feature map and block high-frequency feature map corresponding to the current frequency band to be extracted by the nth current block residual network, outputs the low-frequency feature map to the image reconstruction network, and outputs the block high-frequency feature map to the next block residual network (i.e., the (n+1)th block residual network), and so on. The block high-frequency feature map output by the last current block residual network is defined as the target high-frequency feature map, and the target high-frequency feature map together with the low-frequency feature maps output by each current block residual network is output to the image reconstruction network for image reconstruction. Understandably, the frequency of the low-frequency feature map output in each block residual network is lower than that of its block high-frequency feature map. Except for the 1st current block residual network, the input feature map of the nth (n ≥ 2) current block residual network is the block high-frequency feature map output by the previous block residual network; the nth current block residual network processes this input and further extracts features from it, so that the low-frequency feature map output by the nth current block residual network has a higher frequency than that output by the (n-1)th. The frequencies of the at least two low-frequency feature maps therefore increase in sequence, low-frequency feature maps at different scales are extracted by the different current block residual networks, and the feature separation is more detailed.
In the learning process, a deep neural network extracts image features through multiple layers of convolution operations. Shallow layers (i.e., shallow convolution kernels, such as the first and second convolutional layers of a deep neural network) learn mostly simple low-frequency feature information such as edges, corners, textures, geometric shapes and surfaces, and learn little of the complex, abstract high-frequency feature information; deep layers (i.e., deep convolution kernels, such as the Kth convolutional layer of a deep neural network, K ≥ 3) learn the more complex and abstract high-frequency feature information. In this learning mode, from a frequency perspective, the shallow convolution kernels can easily and completely fit the low-frequency feature information, and the low-frequency feature map is then obtained by feature conversion based on that information, so outputting a low-frequency feature map from each current block residual network is feasible; the block high-frequency feature map can be understood as the feature map formed by separating the low-frequency feature map out of the input feature map. In this example, each current block residual network outputs a low-frequency feature map and a block high-frequency feature map; only the block high-frequency feature map is passed into the next block residual network. Finally, all the current block residual networks are connected in cascade, and all the low-frequency feature maps and the target high-frequency feature map are concatenated (concat) as the output of the multi-band block residual generation network. This local residual learning mode can effectively improve the information and gradient flow of the whole network, making the image frequency separation more detailed, which is more conducive to reconstructing a pseudo high-resolution image containing more detailed texture and structural features.
S303: and reconstructing an image based on the target high-frequency characteristic diagram and at least two low-frequency characteristic diagrams, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image.
In this example, an image reconstruction network may be used to perform image reconstruction on the target high-frequency feature map and the at least two low-frequency feature maps, and obtain a pseudo high-resolution image corresponding to the original low-resolution image. The image reconstruction network is a network for implementing image reconstruction processing. The image reconstruction network is a network which carries out image reconstruction on a target high-frequency characteristic diagram and at least two low-frequency characteristic diagrams output by the multi-band block residual error generation network so as to output a pseudo high-resolution image corresponding to an original low-resolution image. As an example, the image reconstruction network may perform an addition operation on the at least two low-frequency feature maps and the target high-frequency feature map to construct a pseudo high-resolution image, specifically, perform feature transformation on the target high-frequency feature map and the at least two low-frequency feature maps by using a layer of 3 × 3 convolution kernels (step size is 1, padding (padding) is 1, and the number of convolution kernels is 64) to output the pseudo high-resolution image.
In the image reconstruction model training method provided by this embodiment, a multi-band block residual generation network is used to perform sub-band feature extraction on the original feature map; in a local residual learning manner, the low-frequency feature maps corresponding to the respective frequency bands and the target high-frequency feature map output by the different block residual networks are extracted; and the image reconstruction network performs image reconstruction on the at least two low-frequency feature maps and the target high-frequency feature map, so that the generated pseudo high-resolution image contains feature information at different scales, i.e., structural and texture features corresponding to different scales, giving the generated pseudo high-resolution image higher perceptual quality.
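Putting the pieces of this embodiment together, a sketch of the first generation network follows (PyTorch). The BlockResidualNetwork here is a simplified stand-in whose internals are detailed in the following embodiments; the number of blocks, the 3-channel input and final projection, and the alignment of the target high-frequency map to the low-frequency maps before concatenation are assumptions:

import torch
import torch.nn as nn

class BlockResidualNetwork(nn.Module):
    # Simplified stand-in for one block residual network (see later embodiments).
    def __init__(self, ch=64):
        super().__init__()
        self.up = nn.ConvTranspose2d(ch, ch, 6, stride=2, padding=2)  # x2 up-sampling
        self.low = nn.Conv2d(ch, ch, 3, 1, 1)   # separates the low-frequency feature map
        self.down = nn.Conv2d(ch, ch, 3, 2, 1)  # maps back to low-resolution space
    def forward(self, x):
        u = self.up(x)
        return self.low(u), x - self.down(u)    # (low-freq map, block high-freq map)

class MultiBandGenerator(nn.Module):
    # Feature-extraction head, cascaded block residual networks, concat, reconstruction.
    def __init__(self, n_blocks=3, ch=64):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 128, 3, 1, 1),
                                  nn.Conv2d(128, 64, 3, 1, 1),
                                  nn.Conv2d(64, 64, 3, 1, 1))
        self.blocks = nn.ModuleList(BlockResidualNetwork(ch) for _ in range(n_blocks))
        self.fuse = nn.Conv2d(ch * (n_blocks + 1), 64, 3, 1, 1)  # 3x3, stride 1, padding 1, 64 kernels
        self.to_img = nn.Conv2d(64, 3, 3, 1, 1)  # assumed projection to image channels

    def forward(self, lr_img):
        fmap = self.head(lr_img)                 # original feature map
        lows = []
        for block in self.blocks:                # cascade: pass only the high-freq map on
            low, fmap = block(fmap)
            lows.append(low)
        target_hf = nn.functional.interpolate(fmap, scale_factor=2,
                                              mode='bilinear', align_corners=False)
        fused = torch.cat(lows + [target_hf], dim=1)  # concat all low-freq maps + target HF map
        return self.to_img(self.fuse(fused))          # pseudo high-resolution image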
In an embodiment, the current block residual network includes a low frequency feature separation sub-network and a high frequency feature separation sub-network. Correspondingly, as shown in fig. 4, the above-mentioned extracting the features of the input feature map by using the current block residual network, and outputting the low-frequency feature map and the block high-frequency feature map corresponding to the current block residual network specifically includes the following steps:
s401: and according to the current frequency band, performing feature extraction on the input feature map by adopting a low-frequency feature separation sub-network to obtain low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting the low-frequency feature map.
The low-frequency feature separation sub-network is a network for realizing low-frequency feature separation of the input feature map. And the current frequency segment is the frequency range of the image characteristics extracted by the residual error network of the current block.
In the present example, according to the current frequency segment corresponding to the residual error network of the current block, a low-frequency feature separation sub-network is adopted to perform feature separation on an input feature map, and low-frequency feature information such as edges, corners, textures, geometric shapes and surfaces is extracted from the input feature map; and then carrying out low-frequency characteristic separation on the low-frequency characteristic information by adopting convolution kernel to obtain a low-frequency characteristic diagram.
Because a deep neural network extracts image features through multiple layers of convolution operations during learning, shallow layers (i.e., shallow convolution kernels, such as the first and second convolutional layers of a deep neural network) learn mostly simple low-frequency feature information such as edges, corners, textures, geometric shapes and surfaces and learn little complex, abstract high-frequency feature information, while deep layers (i.e., deep convolution kernels, such as the Kth convolutional layer of a deep neural network, K ≥ 3) learn the more complex and abstract high-frequency feature information. Therefore, the low-frequency feature information can be understood as the feature information that can be directly fitted by the low-frequency feature separation sub-network corresponding to the current frequency band, and the low-frequency feature map can be understood as the feature map determined after low-frequency feature separation based on the low-frequency feature information.
S402: and performing high-frequency characteristic separation on the low-frequency characteristic information by adopting a high-frequency characteristic separation sub-network, and outputting a block high-frequency characteristic diagram.
Wherein the high-frequency feature separation sub-network is a network for back-projecting the low-frequency feature information. In this example, the high-frequency feature separation sub-network is used to perform high-frequency feature separation on the low-frequency feature information obtained by fitting and separating the low-frequency feature separation sub-network, and the low-frequency feature information is filtered from the input feature map, so as to form a block high-frequency feature map with the low-frequency feature information filtered.
In the image reconstruction model training method provided by this embodiment, a low-frequency feature separation sub-network is used to separate a low-frequency feature map corresponding to low-frequency feature information from an input feature map, so that the low-frequency feature map can effectively reflect information such as structural features and texture features corresponding to a current frequency band; and then, performing high-frequency characteristic separation on the low-frequency characteristic information of the low-frequency characteristic separation sub-network by using the high-frequency characteristic separation sub-network to obtain a block high-frequency characteristic diagram with low-frequency characteristic information filtered out, taking the block high-frequency characteristic diagram as an input characteristic diagram of a next residual error network, and extracting detail texture characteristics and structural characteristics corresponding to different frequency segments by adopting a local residual error mode, so that a subsequently reconstructed pseudo high-resolution image can reflect the texture characteristics and the structural characteristics of the different frequency segments and has better perception quality.
In an embodiment, as shown in fig. 5, step S401 (performing feature extraction on the input feature map by using the low-frequency feature separation sub-network according to the current frequency band to obtain low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting the low-frequency feature map) specifically includes the following steps:
S501: Perform upsampling processing on the input feature map by using an upsampling unit to obtain an upsampled feature map.
The upsampling unit is a unit that performs upsampling processing on an image to enlarge it. As an example, the input feature map may be upsampled by an upsampling sub-network consisting of a single deconvolution layer with 64 deconvolution kernels of size 6 × 6, a stride of 2, and a padding of 2; this sub-network maps the input feature map from the low-resolution space to the high-resolution space to obtain the upsampled feature map. As another example, the input feature map may be enlarged to the target size by sub-pixel convolution or by interpolation, so that no upsampling sub-network needs to be arranged in the low-frequency feature separation sub-network.
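As an illustrative sketch (not part of the patent text), the upsampling unit with the parameters given above might be written in PyTorch as follows; the 64-channel input is an assumption matching the 64 deconvolution kernels, and the bicubic mode of the interpolation alternative is likewise an assumption:

```python
import torch
import torch.nn as nn

# Sketch of the deconvolution-based upsampling unit described above:
# a single transposed-convolution layer with 64 kernels of size 6 x 6,
# a stride of 2, and a padding of 2, which doubles the spatial size.
upsample_unit = nn.ConvTranspose2d(in_channels=64, out_channels=64,
                                   kernel_size=6, stride=2, padding=2)

# Sketch of the parameter-free alternative: enlarge by interpolation,
# so no upsampling sub-network is needed.
def interpolate_upsample(x: torch.Tensor, scale: int = 2) -> torch.Tensor:
    return nn.functional.interpolate(x, scale_factor=scale,
                                     mode="bicubic", align_corners=False)

x = torch.randn(1, 64, 32, 32)   # input feature map in the low-resolution space
print(upsample_unit(x).shape)    # torch.Size([1, 64, 64, 64])
```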
S502: Perform feature extraction on the upsampled feature map by using a first convolution unit to obtain low-frequency feature information.
The first convolution unit is a processing unit that performs feature extraction on the upsampled feature map so as to fit the low-frequency feature information. The low-frequency feature information is the main shallow-layer feature information extracted from the upsampled feature map, including edges, corner points, textures, geometric shapes, surfaces, and the like. It can be understood that the low-frequency feature information fitted by the first convolution unit corresponds to the current frequency band of the current block residual network; that is, only the shallow-layer feature information (edges, corner points, textures, geometric shapes, surfaces) corresponding to the current frequency band is fitted, and feature information outside the current frequency band cannot be fitted.
In this example, the first convolution unit is shallow; specifically, 2 convolutional layers may be adopted, each with 64 convolution kernels of size 3 × 3, a stride of 1, and a padding of 1. The first convolution unit extracts mostly low-frequency feature information and only a small amount of high-frequency feature information from the upsampled feature map, and inputs the obtained low-frequency feature information to the second convolution unit and to the high-frequency feature separation sub-network, respectively.
S503: Perform low-frequency feature separation on the low-frequency feature information by using a second convolution unit, and output the low-frequency feature map.
The second convolution unit is a processing unit that performs low-frequency feature separation on the low-frequency feature information so as to extract the lower-frequency structural and texture features useful for recovering a high-resolution image, forming the low-frequency feature map. That is, the low-frequency feature map is a feature map formed from the structural and texture features in the low-frequency feature information. For example, the second convolution unit may use 1 convolutional layer with 64 convolution kernels of size 3 × 3, a stride of 1, and a padding of 1.
In the image reconstruction model training method provided by this embodiment, the upsampling unit first maps the input feature map from the low-resolution space to an upsampled feature map in the high-resolution space, so that the first convolution unit can effectively fit, from the upsampled feature map, the low-frequency feature information corresponding to the current frequency band; the second convolution unit then performs low-frequency feature separation on this information to obtain the low-frequency feature map, which therefore effectively reflects the lower-frequency structural and texture features in the input feature map.
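Combining the three units of steps S501 to S503, a hedged PyTorch sketch of the low-frequency feature separation sub-network could look as follows; the 64-channel width and layer parameters come from the text, while the ReLU activations are an assumption, since the patent does not specify activation functions:

```python
import torch.nn as nn

class LowFreqSeparation(nn.Module):
    """Sketch of the low-frequency feature separation sub-network:
    upsampling unit -> first convolution unit (2 layers) ->
    second convolution unit (1 layer)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Upsampling unit: deconvolution, 6x6 kernels, stride 2, padding 2.
        self.upsample = nn.ConvTranspose2d(channels, channels, 6, stride=2, padding=2)
        # First convolution unit: 2 layers, 3x3 kernels, stride 1, padding 1.
        self.first_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.ReLU(inplace=True),
        )
        # Second convolution unit: 1 layer, 3x3 kernels, stride 1, padding 1.
        self.second_conv = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, x):
        up = self.upsample(x)                           # S501: to high-resolution space
        low_freq_info = self.first_conv(up)             # S502: low-frequency information
        low_freq_map = self.second_conv(low_freq_info)  # S503: low-frequency feature map
        return low_freq_info, low_freq_map
```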
In an embodiment, as shown in fig. 6, step S402 (performing high-frequency feature separation on the low-frequency feature information by using the high-frequency feature separation sub-network and outputting the block high-frequency feature map) specifically includes the following steps:
S601: Map the low-frequency feature information to the low-resolution space by using a third convolution unit to obtain a first feature map.
The third convolution unit is a processing unit for mapping the low-frequency feature information to the low-resolution space, and the first feature map is its output. For example, the third convolution unit may use 1 convolutional layer with 64 convolution kernels of size 3 × 3, a stride of 1, and a padding of 1. In this example, the third convolution unit takes the low-frequency feature information output by the first convolution unit and maps it to the low-resolution space to obtain the first feature map. That is, the upsampling unit maps the input feature map from the low-resolution space to the high-resolution space; the first convolution unit fits the low-frequency feature information from the upsampled feature map in the high-resolution space; and the third convolution unit maps that information back to the low-resolution space, realizing back-projection and yielding the first feature map. This back-and-forth mapping effectively separates the structural and texture features of the current frequency band from the input feature information, which helps guarantee the perceptual quality of the image formed by super-resolution reconstruction.
S602: Perform a subtraction operation based on the input feature map and the first feature map, and output high-frequency feature information.
In this example, the input feature map is the feature map input to the current block residual network; it may be the original feature map or the block high-frequency feature map output by the previous block residual network. For the current block residual network, the input feature map is the unprocessed feature map, while the first feature map is formed from the low-frequency feature information fitted from the input feature map. Therefore, subtracting the first feature map from the input feature map excludes the low-frequency feature information, outputting the high-frequency feature information that the shallow network cannot directly fit.
S603: Perform feature extraction on the high-frequency feature information by using a fourth convolution unit to obtain a second feature map.
The fourth convolution unit is a processing unit for performing feature extraction on the high-frequency feature information, and the second feature map is its output. For example, the fourth convolution unit may adopt 3 or more convolutional layers, each with 64 convolution kernels of size 3 × 3, a stride of 1, and a padding of 1; using multiple convolutional layers for feature extraction on the high-frequency feature information helps ensure that the resulting second feature map reflects more structural and texture features.
S604: Perform an addition operation based on the first feature map and the second feature map, and output the block high-frequency feature map.
In this example, the first feature map, formed from the low-frequency feature information, is added to the second feature map so that the output block high-frequency feature map reflects more detailed structural and texture features. The block high-frequency feature map is used as the input feature map of the next block residual network, so that the different block residual networks are cascaded through the block high-frequency feature maps, the image frequencies are separated more finely, and a pseudo high-resolution image containing more detail information can be recovered.
In the image reconstruction model training method provided by this embodiment, the third convolution unit maps the low-frequency feature information fitted from the upsampled feature map in the high-resolution space back to the low-resolution space, so that the first feature map effectively separates the structural and texture features of the current frequency band from the input feature information. A subtraction operation based on the input feature map and the first feature map then extracts the high-frequency feature information that the low-frequency feature separation sub-network cannot fit. Finally, feature extraction is performed on the high-frequency feature information, and the resulting second feature map is added to the first feature map, so that the output block high-frequency feature map reflects more detailed structural and texture features, which helps ensure that the reconstructed pseudo high-resolution image is of high quality and closer to the original high-resolution image.
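Putting the two sub-networks together, here is a hedged sketch of one block residual network (steps S601 to S604 plus the low-frequency branch sketched earlier). One caveat: the text gives the third convolution unit a stride of 1, which would leave the first feature map in the high-resolution space; the stride-2, 6 × 6 convolution used below is an assumption made here so that the subtraction in S602 is shape-consistent with the low-resolution input feature map:

```python
import torch.nn as nn

class BlockResidualNetwork(nn.Module):
    """Sketch of one block residual network: low-frequency feature
    separation followed by high-frequency feature separation."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.low_freq = LowFreqSeparation(channels)  # sub-network sketched earlier
        # Third convolution unit: back-projection to the low-resolution space
        # (stride 2 assumed here so the spatial sizes match for subtraction).
        self.third_conv = nn.Conv2d(channels, channels, 6, stride=2, padding=2)
        # Fourth convolution unit: 3 layers, 3x3 kernels, stride 1, padding 1.
        self.fourth_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
        )

    def forward(self, x):
        low_freq_info, low_freq_map = self.low_freq(x)
        first_map = self.third_conv(low_freq_info)      # S601: first feature map
        high_freq_info = x - first_map                  # S602: remove low frequencies
        second_map = self.fourth_conv(high_freq_info)   # S603: second feature map
        block_high_freq_map = first_map + second_map    # S604: block high-freq map
        return low_freq_map, block_high_freq_map
```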
In an embodiment, the image reconstruction model training method may further adopt a cycle adversarial generative network as the base network of the image reconstruction model. The cycle adversarial generative network is essentially two mirror-symmetric generative adversarial networks arranged in a ring structure, and is composed of two generation networks and two discrimination networks. The two generation networks are a first generation network G and a second generation network F, and the two discrimination networks are a first discrimination network $D_Y$ and a second discrimination network $D_X$, where the first generation network G reconstructs a low-resolution image into a high-resolution image, the second generation network F reconstructs a high-resolution image into a low-resolution image, the first discrimination network $D_Y$ discriminates high-resolution images, and the second discrimination network $D_X$ discriminates low-resolution images.
S701: an original high resolution image and an original low resolution image corresponding to the original high resolution image are acquired.
Step S701 is the same as the step S101, and is not repeated here to avoid repetition.
S702: inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image; and inputting the pseudo high-resolution image into a second generation network for image low-resolution reconstruction to obtain a pseudo low-resolution image.
S703: inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result; and inputting the original low-resolution image and the pseudo low-resolution image into a second judgment network for judgment to obtain a second judgment result.
S704: inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to perform perception loss calculation, and acquiring a first perception loss; and inputting the original low-resolution image and the pseudo low-resolution image into a perception loss network for perception loss calculation to obtain a second perception loss.
S705: updating model parameters of the first generation network, the second generation network, the first judgment network and the second judgment network based on the first perception loss, the second perception loss, the first judgment result and the second judgment result, and acquiring a target generation network based on super-resolution reconstruction.
In an example, step S701 acquires the original low-resolution image X and the original high-resolution image Y; the original low-resolution image X may be obtained by processing the original high-resolution image Y with, but not limited to, nearest-neighbor interpolation, bilinear interpolation, mean interpolation, or median interpolation. In step S702, the original high-resolution image Y is subjected to image low-resolution reconstruction via the second generation network F to obtain a first pseudo low-resolution image X'; the first pseudo low-resolution image X' is then subjected to image super-resolution reconstruction via the first generation network G to obtain a first pseudo high-resolution image $\hat{Y}$. Meanwhile, the original low-resolution image X is subjected to image super-resolution reconstruction via the first generation network G to obtain a second pseudo high-resolution image Y'; the second pseudo high-resolution image Y' is subjected to image low-resolution reconstruction via the second generation network F to obtain a second pseudo low-resolution image $\hat{X}$. Accordingly, in step S703, the first discrimination network $D_Y$ acquires the discrimination results between the original high-resolution image Y and the first pseudo high-resolution image $\hat{Y}$, and between Y and the second pseudo high-resolution image Y'; the second discrimination network $D_X$ acquires the discrimination results between the original low-resolution image X and the first pseudo low-resolution image X', and between X and the second pseudo low-resolution image $\hat{X}$. When the first discrimination result and/or the second discrimination result have not yet reached the condition that the two input images cannot be distinguished, the perceptual losses are calculated, specifically comprising:
(A) In the X → Y' and X' → $\hat{Y}$ processes, the first perceptual loss between the original high-resolution image Y and the first pseudo high-resolution image $\hat{Y}$, or between the original high-resolution image Y and the second pseudo high-resolution image Y', is

$$L_{percep}^{1} = \frac{1}{W_{m,n} H_{m,n}} \sum_{x=1}^{W_{m,n}} \sum_{y=1}^{H_{m,n}} \left( \phi_{m,n}(I^{HR})_{x,y} - \phi_{m,n}(G_{\theta_{G1}}(I^{LR}))_{x,y} \right)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are the width and height of the feature map of the n-th convolutional layer before the m-th max-pooling layer in the perceptual loss network, $\phi_{m,n}(I^{HR})_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the original high-resolution image Y, and $\phi_{m,n}(G_{\theta_{G1}}(I^{LR}))_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the pseudo high-resolution image generated by the first generation network G, the pseudo high-resolution image being the first pseudo high-resolution image $\hat{Y}$ or the second pseudo high-resolution image Y'.
(B) In the Y → X' and Y' → $\hat{X}$ processes, the second perceptual loss between the original low-resolution image X and the first pseudo low-resolution image X', or between the original low-resolution image X and the second pseudo low-resolution image $\hat{X}$, is

$$L_{percep}^{2} = \frac{1}{W_{m,n} H_{m,n}} \sum_{x=1}^{W_{m,n}} \sum_{y=1}^{H_{m,n}} \left( \phi_{m,n}(I^{LR})_{x,y} - \phi_{m,n}(F_{\theta_{G2}}(I^{HR}))_{x,y} \right)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are the width and height of the feature map of the n-th convolutional layer before the m-th max-pooling layer in the perceptual loss network, $\phi_{m,n}(I^{LR})_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the original low-resolution image X, and $\phi_{m,n}(F_{\theta_{G2}}(I^{HR}))_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the pseudo low-resolution image reconstructed by the second generation network F, the pseudo low-resolution image comprising the first pseudo low-resolution image X' and the second pseudo low-resolution image $\hat{X}$.
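As an illustrative sketch of the losses above (the patent does not fix the perceptual loss network, so the use of torchvision's pretrained VGG19 and the particular feature layer index are assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """Sketch of the perceptual loss: mean squared difference between
    feature maps phi_{m,n} of a fixed, pretrained loss network evaluated
    on the reference image and on the generated (pseudo) image."""

    def __init__(self, feature_layer: int = 35):  # layer index is an assumption
        super().__init__()
        features = vgg19(weights="DEFAULT").features[:feature_layer].eval()
        for p in features.parameters():
            p.requires_grad_(False)               # the loss network is never trained
        self.phi = features

    def forward(self, reference: torch.Tensor, generated: torch.Tensor) -> torch.Tensor:
        # Averages the squared feature difference over all positions, matching
        # the 1 / (W_{m,n} H_{m,n}) normalization up to the channel dimension.
        return nn.functional.mse_loss(self.phi(generated), self.phi(reference))
```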
As a further improvement of the above example, in step S701 the original high-resolution image Y is subjected to image low-resolution reconstruction via the second generation network F to acquire the original low-resolution image X. Correspondingly, in step S702, the original low-resolution image X is subjected to image super-resolution reconstruction via the first generation network G to obtain a third pseudo high-resolution image Y''. Compared with the image reconstruction process of the previous example, one pass of super-resolution reconstruction of a low-resolution image by the first generation network G is saved, reducing processing time and system overhead. The first discrimination network $D_Y$ computes a first discrimination result between the original high-resolution image Y and the third pseudo high-resolution image Y''; the second discrimination network $D_X$ computes a second discrimination result between the original low-resolution image X and the third pseudo low-resolution image X'' (the third pseudo low-resolution image X'' being generated via the second generation network F). When the first discrimination result and/or the second discrimination result have not yet reached the condition that the two input images cannot be distinguished, the perceptual losses are calculated, specifically comprising:
(A) The first perceptual loss between the original high-resolution image Y and the third pseudo high-resolution image Y'' is

$$L_{percep}^{1} = \frac{1}{W_{m,n} H_{m,n}} \sum_{x=1}^{W_{m,n}} \sum_{y=1}^{H_{m,n}} \left( \phi_{m,n}(I^{HR})_{x,y} - \phi_{m,n}(G_{\theta_{G1}}(I^{LR}))_{x,y} \right)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are the width and height of the feature map of the n-th convolutional layer before the m-th max-pooling layer in the perceptual loss network, $\phi_{m,n}(I^{HR})_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the original high-resolution image Y, and $\phi_{m,n}(G_{\theta_{G1}}(I^{LR}))_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the third pseudo high-resolution image Y'' generated via the first generation network G.
(B) The second perceptual loss between the original low-resolution image X and the third pseudo low-resolution image X'' is

$$L_{percep}^{2} = \frac{1}{W_{m,n} H_{m,n}} \sum_{x=1}^{W_{m,n}} \sum_{y=1}^{H_{m,n}} \left( \phi_{m,n}(I^{LR})_{x,y} - \phi_{m,n}(F_{\theta_{G2}}(I^{HR}))_{x,y} \right)^{2}$$

where $W_{m,n}$ and $H_{m,n}$ are the width and height of the feature map of the n-th convolutional layer before the m-th max-pooling layer in the perceptual loss network, $\phi_{m,n}(I^{LR})_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the original low-resolution image X, and $\phi_{m,n}(F_{\theta_{G2}}(I^{HR}))_{x,y}$ denotes the feature map of the n-th convolution before the m-th max-pooling layer of the third pseudo low-resolution image X'' generated via the second generation network F.
In this example, the model parameters of the first generation network G and the second generation network F in the cycle adversarial generative network are continuously optimized according to the obtained first perceptual loss and second perceptual loss, and the model parameters of the first discrimination network $D_Y$ and the second discrimination network $D_X$ are updated through a back-propagation algorithm, completing the training process of the cycle adversarial network model; the first generation network with updated model parameters is determined as the target generation network. This constrains the super-resolution result to have very realistic detail and to conform to the rules of human visual perception of images. In addition, computing the perceptual loss based on the cycle adversarial generative network establishes a relationship with the specific input image, so the recovered image detail can remain as faithful as possible to the original image, which suits applications that pursue detail authenticity.
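A hedged sketch of one training iteration covering steps S702 to S705; the binary cross-entropy adversarial loss, the equal loss weights, and the optimizer split between generators and discriminators are assumptions, not details fixed by the patent:

```python
import torch
import torch.nn as nn

adv = nn.BCEWithLogitsLoss()  # assumed adversarial loss form

def train_step(x_lr, y_hr, G, F, D_Y, D_X, perceptual_loss, opt_g, opt_d):
    # S702: pseudo high-resolution and pseudo low-resolution reconstruction.
    y_fake = G(x_lr)       # pseudo high-resolution image
    x_fake = F(y_fake)     # pseudo low-resolution image

    # S703/S704 for the generators: fool both discriminators and minimize
    # the first and second perceptual losses.
    opt_g.zero_grad()
    dy_fake, dx_fake = D_Y(y_fake), D_X(x_fake)
    g_loss = (adv(dy_fake, torch.ones_like(dy_fake))
              + adv(dx_fake, torch.ones_like(dx_fake))
              + perceptual_loss(y_hr, y_fake)      # first perceptual loss
              + perceptual_loss(x_lr, x_fake))     # second perceptual loss
    g_loss.backward()
    opt_g.step()  # S705: update model parameters of G and F

    # S703 for the discriminators: real images scored 1, pseudo images 0.
    opt_d.zero_grad()
    dy_real, dx_real = D_Y(y_hr), D_X(x_lr)
    dy_f, dx_f = D_Y(y_fake.detach()), D_X(x_fake.detach())
    d_loss = (adv(dy_real, torch.ones_like(dy_real))
              + adv(dy_f, torch.zeros_like(dy_f))
              + adv(dx_real, torch.ones_like(dx_real))
              + adv(dx_f, torch.zeros_like(dx_f)))
    d_loss.backward()
    opt_d.step()  # S705: update model parameters of D_Y and D_X
```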
In an embodiment, as shown in fig. 8, an image reconstruction method is provided, described by taking its application to a computer device as an example; the image reconstruction method includes the following steps:
S801: Acquire an image to be processed, and determine the image resolution of the image to be processed.
The image to be processed is an image that requires image recognition or other processing. Generally, the image to be processed is an image directly acquired in a specific application scenario that, depending on the scenario, needs image recognition or other processing. For example, in an application scenario of illegal-vehicle identification, the images to be processed may be images containing vehicle information captured by image acquisition devices disposed on both sides of a road.
S802: If the image resolution of the image to be processed is smaller than the second resolution threshold, determine the image to be processed as an image to be reconstructed.
The second resolution threshold is a preset threshold used to evaluate whether the image resolution meets the standard at which super-resolution reconstruction is unnecessary. The image to be reconstructed is an image that requires image super-resolution reconstruction.
In this example, the image resolution of the image to be processed is compared with the second resolution threshold. If the image resolution is smaller than the second resolution threshold, the resolution is low, and directly using the image for recognition or other processing may compromise the accuracy of the recognition or processing result, so the image to be processed is determined as an image to be reconstructed. If the image resolution is not smaller than the second resolution threshold, the resolution is high enough that image recognition or subsequent processing can proceed directly without image reconstruction, which preserves both the accuracy of the result and the efficiency of the processing.
S803: Perform image super-resolution reconstruction on the image to be reconstructed by using the target generation network acquired by the image reconstruction model training method, to acquire a target reconstruction image.
In this example, the image to be reconstructed is input into the target generation network trained by the image reconstruction model training method of the above embodiments, and the target generation network performs super-resolution reconstruction on it to obtain the target reconstruction image corresponding to the image to be reconstructed; the reconstruction process is the same as steps S301 to S303 and is not repeated here to avoid repetition. The target reconstruction image is the image obtained after the target generation network performs image super-resolution reconstruction on the image to be reconstructed.
In the image reconstruction method provided by this embodiment, the target generation network performs image super-resolution reconstruction on the low-resolution image to be reconstructed and generates a higher-resolution target reconstruction image containing texture and structural features at different scales, so the generated target reconstruction image has higher perceptual quality and, at the subjective perception level, better matches the preferences of the human visual system.
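A minimal sketch of steps S801 to S803, assuming the trained target generation network was saved as a complete PyTorch module; the file name and the threshold value are hypothetical placeholders:

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

SECOND_RESOLUTION_THRESHOLD = 256 * 256  # hypothetical threshold, in pixels

def reconstruct_if_needed(image_path: str,
                          model_path: str = "target_generator.pt") -> Image.Image:
    image = Image.open(image_path).convert("RGB")
    # S801: determine the image resolution of the image to be processed.
    if image.width * image.height >= SECOND_RESOLUTION_THRESHOLD:
        return image  # resolution meets the standard; no reconstruction needed
    # S802: the image is determined as an image to be reconstructed.
    generator = torch.load(model_path, map_location="cpu",
                           weights_only=False).eval()  # assumes a pickled module
    with torch.no_grad():
        # S803: super-resolution reconstruction via the target generation network.
        sr = generator(to_tensor(image).unsqueeze(0)).squeeze(0).clamp(0, 1)
    return to_pil_image(sr)
```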
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data adopted or generated in the process of executing the image reconstruction model training method or the image reconstruction method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image reconstruction model training method, or to implement an image reconstruction method.
In an embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, the image reconstruction model training method in the foregoing embodiments is implemented, for example, S101 to S104 shown in fig. 1 or the steps shown in figs. 2 to 7, which are not repeated here to avoid repetition. Alternatively, when the processor executes the computer program, the image reconstruction method in the foregoing embodiments is implemented, for example, S801 to S803 shown in fig. 8, and details are not repeated here to avoid repetition.
In an embodiment, a computer-readable storage medium is provided, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements an image reconstruction model training method in the foregoing embodiments, for example, the image reconstruction model training method shown in S101-S104 in fig. 1, or shown in fig. 2 to 7, which is not repeated here to avoid repetition. Alternatively, the computer program is executed by a processor to implement the image reconstruction method in the above embodiments, for example, S801 to S803 shown in fig. 8, and details are not repeated here to avoid repetition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. An image reconstruction model training method is characterized by comprising the following steps:
acquiring an original high-resolution image and an original low-resolution image corresponding to the original high-resolution image;
inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image;
inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result;
inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network to obtain a perception loss value;
updating model parameters of the first generation network and the first discrimination network based on the perception loss value and the first discrimination result, and acquiring a target generation network based on super-resolution reconstruction.
2. The training method of image reconstruction model according to claim 1, wherein the inputting the original low resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high resolution image corresponding to the original low resolution image comprises:
extracting the features of the original low-resolution image to obtain an original feature map;
inputting the original characteristic diagram into a multi-band block residual error generation network for sub-band characteristic extraction, and acquiring a target high-frequency characteristic diagram and at least two low-frequency characteristic diagrams; the multi-band block residual error generating network comprises at least two block residual error networks which are sequentially connected in series, the current block residual error network is adopted to carry out feature extraction on an input feature map, the low-frequency feature map and the block high-frequency feature map which correspond to the current block residual error network are output, and the block high-frequency feature map is input into the next block residual error network; the input feature map comprises an original feature map or a block high-frequency feature map output by a last block of residual error network; the target high-frequency feature map is a block high-frequency feature map output by the last current block residual error network;
and carrying out image reconstruction based on the target high-frequency characteristic diagram and at least two low-frequency characteristic diagrams to obtain a pseudo high-resolution image corresponding to the original low-resolution image.
3. The method of claim 2, wherein the current block residual network comprises a low frequency feature separation sub-network and a high frequency feature separation sub-network;
the method for extracting the characteristics of the input characteristic graph by adopting the current block residual error network and outputting the low-frequency characteristic graph and the block high-frequency characteristic graph corresponding to the current block residual error network comprises the following steps:
according to the current frequency band, the low-frequency feature separation sub-network is adopted to perform feature extraction on the input feature map to obtain low-frequency feature information, low-frequency feature separation is performed on the basis of the low-frequency feature information, and a low-frequency feature map is output;
and performing high-frequency characteristic separation on the low-frequency characteristic information by adopting the high-frequency characteristic separation sub-network, and outputting a block high-frequency characteristic diagram.
4. The training method of image reconstruction models according to claim 3, wherein the extracting features of the input feature map by using the low-frequency feature separation sub-network according to the current frequency band to obtain low-frequency feature information, performing low-frequency feature separation based on the low-frequency feature information, and outputting a low-frequency feature map includes:
an up-sampling unit is adopted to perform up-sampling processing on the input characteristic diagram to obtain an up-sampling characteristic diagram;
performing feature extraction on the up-sampling feature map by adopting a first convolution unit to acquire low-frequency feature information;
and carrying out low-frequency characteristic separation on the low-frequency characteristic information by adopting a second convolution unit, and outputting a low-frequency characteristic diagram.
5. The training method of image reconstruction models according to claim 3, wherein the outputting the block high-frequency feature map by performing the high-frequency feature separation on the low-frequency feature information using the high-frequency feature separation sub-network comprises:
mapping the low-frequency characteristic information to a low-resolution space by adopting a third convolution unit to obtain a first characteristic diagram;
performing subtraction operation based on the input feature map and the first feature map, and outputting high-frequency feature information;
performing feature extraction on the high-frequency feature information by adopting a fourth convolution unit to obtain a second feature map;
and performing addition operation based on the first feature map and the second feature map, and outputting a block high-frequency feature map.
6. The method for training an image reconstruction model according to claim 1, wherein the acquiring an original high resolution image and an original low resolution image corresponding to the original high resolution image comprises:
acquiring an original training image, and determining the image resolution of the original training image;
if the image resolution of the original training image is larger than a first resolution threshold, determining the original training image as an original high-resolution image;
and carrying out downsampling processing on the original high-resolution image to obtain an original low-resolution image corresponding to the original high-resolution image.
7. The image reconstruction model training method according to claim 1, wherein after the acquiring of the original high resolution image and the original low resolution image corresponding to the original high resolution image, the image reconstruction model training method comprises:
inputting the original low-resolution image into a first generation network for image super-resolution reconstruction, and acquiring a pseudo high-resolution image corresponding to the original low-resolution image; inputting the pseudo high-resolution image into a second generation network to carry out image low-resolution reconstruction, and acquiring a pseudo low-resolution image;
inputting the original high-resolution image and the pseudo high-resolution image into a first discrimination network for discrimination to obtain a first discrimination result; inputting the original low-resolution image and the pseudo low-resolution image into a second judgment network for judgment to obtain a second judgment result;
inputting the original high-resolution image and the pseudo high-resolution image into a perception loss network for perception loss calculation to obtain a first perception loss; inputting the original low-resolution image and the pseudo low-resolution image into a perception loss network for perception loss calculation to obtain a second perception loss;
updating model parameters of the first generation network, the second generation network, the first judgment network and the second judgment network based on the first perception loss, the second perception loss, the first judgment result and the second judgment result, and acquiring a target generation network based on super-resolution reconstruction.
8. An image reconstruction method, comprising:
acquiring an image to be processed, and determining the image resolution of the image to be processed;
if the image resolution of the image to be processed is smaller than a second resolution threshold, determining the image to be processed as an image to be reconstructed;
and performing image super-resolution reconstruction on the image to be reconstructed by adopting the target generation network acquired by the image reconstruction model training method of any one of claims 1 to 7 to acquire a target reconstruction image.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the image reconstruction model training method according to any one of claims 1 to 7 when executing the computer program or the processor implements the image reconstruction method according to claim 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the image reconstruction model training method according to one of claims 1 to 7, or which, when being executed by a processor, carries out the image reconstruction method according to claim 8.
CN201911409903.4A 2019-12-31 2019-12-31 Image reconstruction model training method, image reconstruction method, device and medium Active CN111179177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911409903.4A CN111179177B (en) 2019-12-31 2019-12-31 Image reconstruction model training method, image reconstruction method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911409903.4A CN111179177B (en) 2019-12-31 2019-12-31 Image reconstruction model training method, image reconstruction method, device and medium

Publications (2)

Publication Number Publication Date
CN111179177A true CN111179177A (en) 2020-05-19
CN111179177B CN111179177B (en) 2024-03-26

Family

ID=70650614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911409903.4A Active CN111179177B (en) 2019-12-31 2019-12-31 Image reconstruction model training method, image reconstruction method, device and medium

Country Status (1)

Country Link
CN (1) CN111179177B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
CN110136063A (en) * 2019-05-13 2019-08-16 南京信息工程大学 A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition
CN110349085A (en) * 2019-06-28 2019-10-18 西安工程大学 A kind of single image super-resolution feature Enhancement Method based on generation confrontation network
CN110458758A (en) * 2019-07-29 2019-11-15 武汉工程大学 A kind of image super-resolution rebuilding method, system and computer storage medium
CN110599401A (en) * 2019-08-19 2019-12-20 中国科学院电子学研究所 Remote sensing image super-resolution reconstruction method, processing device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAJUN QIU: ""Embedded Block Residual Network: A Recursive Restoration Model for Single-Image Super-Resolution"", 《IEEE》, pages 1 - 10 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754405B (en) * 2020-06-22 2023-08-08 北京大学深圳研究生院 Image resolution reduction and restoration method, device and readable storage medium
CN111754406B (en) * 2020-06-22 2024-02-23 北京大学深圳研究生院 Image resolution processing method, device, equipment and readable storage medium
CN111754406A (en) * 2020-06-22 2020-10-09 北京大学深圳研究生院 Image resolution processing method, device and equipment and readable storage medium
CN111754405A (en) * 2020-06-22 2020-10-09 北京大学深圳研究生院 Image resolution reduction and restoration method, equipment and readable storage medium
CN111488865B (en) * 2020-06-28 2020-10-27 腾讯科技(深圳)有限公司 Image optimization method and device, computer storage medium and electronic equipment
CN111598968B (en) * 2020-06-28 2023-10-31 腾讯科技(深圳)有限公司 Image processing method and device, storage medium and electronic equipment
CN111598968A (en) * 2020-06-28 2020-08-28 腾讯科技(深圳)有限公司 Image processing method and device, storage medium and electronic equipment
CN111488865A (en) * 2020-06-28 2020-08-04 腾讯科技(深圳)有限公司 Image optimization method and device, computer storage medium and electronic equipment
CN112116527A (en) * 2020-09-09 2020-12-22 北京航空航天大学杭州创新研究院 Image super-resolution method based on cascade network framework and cascade network
CN112116527B (en) * 2020-09-09 2024-02-23 北京航空航天大学杭州创新研究院 Image super-resolution method based on cascade network frame and cascade network
CN112508782A (en) * 2020-09-10 2021-03-16 浙江大华技术股份有限公司 Network model training method, face image super-resolution reconstruction method and equipment
CN112508782B (en) * 2020-09-10 2024-04-26 浙江大华技术股份有限公司 Training method of network model, and super-resolution reconstruction method and device of face image
CN111932437A (en) * 2020-10-10 2020-11-13 深圳云天励飞技术股份有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN112419151A (en) * 2020-11-19 2021-02-26 北京有竹居网络技术有限公司 Image degradation processing method, device, storage medium and electronic equipment
CN112529058A (en) * 2020-12-03 2021-03-19 北京百度网讯科技有限公司 Image generation model training method and device and image generation method and device
CN112561816A (en) * 2020-12-10 2021-03-26 厦门美图之家科技有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN112508787A (en) * 2020-12-14 2021-03-16 磐基技术有限公司 Target detection method based on image super-resolution
CN112734638A (en) * 2020-12-24 2021-04-30 桂林理工大学 Remote sensing image super-resolution reconstruction method and device and storage medium
CN112734638B (en) * 2020-12-24 2022-08-05 桂林理工大学 Remote sensing image super-resolution reconstruction method and device and storage medium
CN112949592A (en) * 2021-03-31 2021-06-11 云南大学 Hyperspectral image classification method and device and electronic equipment
CN113077456B (en) * 2021-04-20 2022-01-04 北京大学 Training method and device for constructing network model based on functional magnetic resonance imaging
CN113077456A (en) * 2021-04-20 2021-07-06 北京大学 Training method and device for constructing network model based on functional magnetic resonance imaging, computer equipment and storage medium
WO2023051344A1 (en) * 2021-09-30 2023-04-06 Subtle Medical, Inc. Ultra-high resolution ct reconstruction using gradient guidance
CN113628116A (en) * 2021-10-12 2021-11-09 腾讯科技(深圳)有限公司 Training method and device for image processing network, computer equipment and storage medium
CN113643183A (en) * 2021-10-14 2021-11-12 湖南大学 Non-matching remote sensing image weak supervised learning super-resolution reconstruction method and system
CN113643183B (en) * 2021-10-14 2021-12-21 湖南大学 Non-matching remote sensing image weak supervised learning super-resolution reconstruction method and system
CN113989763B (en) * 2021-12-30 2022-04-15 江西省云眼大视界科技有限公司 Video structured analysis method and analysis system
CN113989763A (en) * 2021-12-30 2022-01-28 江西省云眼大视界科技有限公司 Video structured analysis method and analysis system
CN114359053B (en) * 2022-01-07 2023-06-20 中国电信股份有限公司 Image processing method, device, equipment and storage medium
CN114359053A (en) * 2022-01-07 2022-04-15 中国电信股份有限公司 Image processing method, device, equipment and storage medium
CN114419339A (en) * 2022-03-30 2022-04-29 南方电网数字电网研究院有限公司 Method and device for training data reconstruction model based on electric power portrait
CN115358927A (en) * 2022-08-22 2022-11-18 重庆理工大学 Image super-resolution reconstruction method combining space self-adaption and texture conversion
CN115358927B (en) * 2022-08-22 2023-12-26 重庆理工大学 Image super-resolution reconstruction method combining space self-adaption and texture conversion
CN116071478A (en) * 2023-04-06 2023-05-05 腾讯科技(深圳)有限公司 Training method of image reconstruction model and virtual scene rendering method
CN116071478B (en) * 2023-04-06 2023-06-30 腾讯科技(深圳)有限公司 Training method of image reconstruction model and virtual scene rendering method

Also Published As

Publication number Publication date
CN111179177B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN111179177B (en) Image reconstruction model training method, image reconstruction method, device and medium
CN111047516B (en) Image processing method, image processing device, computer equipment and storage medium
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN110580680B (en) Face super-resolution method and device based on combined learning
CN111598779A (en) Image super-resolution processing method and device, electronic device and storage medium
CN111028153A (en) Image processing and neural network training method and device and computer equipment
CN113538246B (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
CN111951195A (en) Image enhancement method and device
CN116485741A (en) No-reference image quality evaluation method, system, electronic equipment and storage medium
CN113269722A (en) Training method for generating countermeasure network and high-resolution image reconstruction method
CN113706388A (en) Image super-resolution reconstruction method and device
CN116485934A (en) Infrared image colorization method based on CNN and ViT
CN114581347B (en) Optical remote sensing spatial spectrum fusion method, device, equipment and medium without reference image
KR20120000736A (en) A method for pan-sharpening of high-spatial resolution satellite image by using parameter reflecting spectral and spatial characteristics of image
CN110335228B (en) Method, device and system for determining image parallax
CN114897711A (en) Method, device and equipment for processing images in video and storage medium
CN113628115A (en) Image reconstruction processing method and device, electronic equipment and storage medium
CN113256519A (en) Image restoration method, apparatus, storage medium, and program product
CN115082322B (en) Image processing method and device, and training method and device of image reconstruction model
CN116912148A (en) Image enhancement method, device, computer equipment and computer readable storage medium
CN115587955A (en) Image fusion method and apparatus, storage medium, and electronic apparatus
Chouteau et al. Joint Super-Resolution and Image Restoration for PLÉIADES NEO Imagery
CN113362338A (en) Rail segmentation method, device, computer equipment and rail segmentation processing system
CN114782256B (en) Image reconstruction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant