CN113837942A - Super-resolution image generation method, apparatus, device, and storage medium based on SRGAN

Super-resolution image generation method, apparatus, device, and storage medium based on SRGAN

Info

Publication number: CN113837942A
Application number: CN202111129143.9A
Authority: CN (China)
Prior art keywords: image, resolution, layer, data, model
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 司世景, 王健宗
Assignee: Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd; priority to CN202111129143.9A; publication of CN113837942A

Classifications

    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/08: Neural network learning methods


Abstract

The application relates to the technical field of artificial intelligence and discloses a super-resolution image generation method, apparatus, device, and storage medium based on SRGAN. The method includes: obtaining an image to be processed; inputting the image to be processed into a pre-trained generative adversarial network model, and performing convolution in a first preprocessing layer of the generative model within the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed; weighting each channel in the first image feature, according to the channel attention layer in the generative model and based on the dependency relationship of each channel in the first image feature, to obtain a second image feature; and, based on a sampling layer in the generative model, increasing the resolution of the second image feature to obtain a third image feature, then convolving the third image feature to obtain and output a super-resolution image corresponding to the image to be processed. The application also relates to blockchain technology, in which the super-resolution image data may be stored. The method and apparatus improve the quality of super-resolution image generation and avoid the generation of artifacts.

Description

Super-resolution image generation method, apparatus, device, and storage medium based on SRGAN
Technical Field
The present application relates to the field of artificial intelligence, and in particular to an SRGAN-based super-resolution image generation method, apparatus, device, and storage medium.
Background
At present, super-resolution image generation plays an increasingly important role in fields such as industry and the military. In the prior art, based on a generative adversarial network (GAN), a generator can produce a high-resolution picture while a discriminator measures how realistic the picture is by computing its difference from a real high-resolution picture. Super-resolution reconstruction can overcome the intrinsic resolution limits of an imaging system and improves most performance measures in image processing. However, the conventional generative network represented by the Super-Resolution Generative Adversarial Network (SRGAN) yields super-resolution images of poor quality, because unsupervised learning cannot learn the detailed parts of an image well, and SRGAN often produces artifacts on high-frequency details. How to improve the quality of generated super-resolution images and eliminate artifacts has therefore become an urgent problem.
Disclosure of Invention
The application provides an SRGAN-based super-resolution image generation method, apparatus, device, and storage medium, aiming to solve the problem of poor image quality in prior-art super-resolution image generation.
In order to solve the above problem, the present application provides an SRGAN-based super-resolution image generation method, including:
acquiring an image to be processed;
inputting the image to be processed into a pre-trained generative adversarial network model, and performing convolution in a first preprocessing layer of the generative model within the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed;
weighting each channel in the first image feature, according to the channel attention layer in the generative model and based on the dependency relationship of each channel in the first image feature, to obtain a second image feature;
and increasing the resolution of the second image feature based on the sampling layer in the generative model to obtain a third image feature, and convolving the third image feature to obtain and output a super-resolution image corresponding to the image to be processed.
Further, before the inputting of the image to be processed into the pre-trained generative adversarial network model, the method further includes:
acquiring a low-resolution picture set and a high-resolution picture set corresponding to the low-resolution picture set;
inputting the data in the low-resolution picture set into the generative model of the generative adversarial network model to obtain corresponding pseudo high-definition images;
and inputting the data in the high-resolution picture set and the corresponding pseudo high-definition images into the discriminative model of the generative adversarial network model for training, until the loss function of the generative adversarial network model converges.
Further, before the inputting of the data in the high-resolution picture set and the corresponding pseudo high-definition images into the discriminative model of the generative adversarial network model for training, the method further includes:
performing data augmentation on each piece of picture data in the high-resolution picture set to obtain several similar data corresponding to each piece, taking the similar data corresponding to each piece as an augmented picture set, and storing the augmented picture set in the high-resolution picture set.
Further, the discriminative model comprises a second preprocessing layer, a hidden-layer feature extraction layer, and a fully connected layer; the inputting of the data in the high-resolution picture set and the corresponding pseudo high-definition images into the discriminative model of the generative adversarial network model for training comprises:
taking the pseudo high-definition images as negative samples and the image data in the augmented picture set as positive samples, and performing shallow feature extraction on the positive and negative samples through the second preprocessing layer to obtain a fourth image feature;
performing hidden-layer feature extraction on the fourth image feature through the hidden-layer feature extraction layer to obtain feature vectors;
mapping the feature vectors through the fully connected layer to obtain the similarity between the pseudo high-definition images and the corresponding image data in the first picture set;
and adjusting the parameters of the discriminative model and calculating the adversarial loss function and the sample loss function among the loss functions, based on the similarity, the pseudo high-definition images, and the image data in the augmented picture set.
Further, the hidden-layer feature extraction layer is combined with a contrastive learning algorithm, and the calculating of the sample loss function among the loss functions includes:
calculating the sample loss function of the hidden-layer feature extraction layer using the noise contrastive estimation function and the inner product function of the contrastive learning algorithm.
Further, the acquiring of the low-resolution picture set and the corresponding high-resolution picture set includes:
sending a call request to a database, the call request carrying a signature-verification token;
and receiving a signature-verification result returned by the database, and, when verification passes, retrieving from the database the low-resolution picture set and the high-resolution picture set corresponding to it.
Further, the weighting of each channel in the first image feature according to the channel attention layer in the generative model and based on the dependency relationship of each channel in the first image feature, to obtain a second image feature, includes:
processing the first image feature through the feature extraction layer in the channel attention layer to obtain the spatial attention of an n-dimensional feature map;
processing the spatial attention through a first pooling layer and a second pooling layer in the channel attention layer to obtain a first attention vector and a second attention vector, respectively;
adding the first attention vector and the second attention vector element-wise to obtain a pooled feature F;
converting the pooled feature F into an attention map through a sigmoid function;
point-multiplying the attention map with the spatial attention to obtain a residual block;
and adding the first image feature and the residual block element-wise to obtain the second image feature.
In order to solve the above-mentioned problems, the present application also provides an SRGAN-based super-resolution image generation apparatus, the apparatus including:
an acquisition module for acquiring an image to be processed;
a convolution module for inputting the image to be processed into a pre-trained generative adversarial network model and performing convolution in a first preprocessing layer of the generative model within the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed;
a channel attention module for weighting each channel in the first image feature, according to the channel attention layer in the generative model and based on the dependency relationship of each channel in the first image feature, to obtain a second image feature;
and an enhancement module for increasing the resolution of the second image feature based on the sampling layer in the generative model to obtain a third image feature, and convolving the third image feature to obtain and output a super-resolution image corresponding to the image to be processed.
In order to solve the above problem, the present application also provides a computer device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the SRGAN-based super-resolution image generation method described above.
In order to solve the above problem, the present application also provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the SRGAN-based super-resolution image generation method described above.
Compared with the prior art, the SRGAN-based super-resolution image generation method, apparatus, device, and storage medium of the present application have at least the following advantages:
an image to be processed is obtained and input into a pre-trained generative adversarial network model; convolution in the first preprocessing layer of the generative model yields a first image feature corresponding to the image to be processed; each channel in the first image feature is then weighted, according to the channel attention layer in the generative model and based on the dependency relationship of each channel, to obtain a second image feature, so that important channels, i.e. high-frequency features, receive larger weights while channels that contribute little to picture quality receive smaller ones; the importance of each feature is thus captured and the quality of the subsequently generated picture improves. The resolution of the second image feature is then increased by the sampling layer in the generative model to obtain a third image feature, which is convolved to obtain and output a super-resolution image corresponding to the image to be processed. The quality of the super-resolution image is thereby greatly improved and the generation of artifacts is avoided.
Drawings
In order to illustrate the solution of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a super-resolution image generation method based on SRGAN according to an embodiment of the present application;
fig. 2 is a block diagram of a super-resolution image generation apparatus based on SRGAN according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly or implicitly, that the embodiments described herein can be combined with other embodiments.
The application provides an SRGAN-based super-resolution image generation method. Referring to fig. 1, fig. 1 is a schematic flowchart of an SRGAN-based super-resolution image generation method according to an embodiment of the present application.
In this embodiment, the SRGAN-based super-resolution image generation method includes:
S1, acquiring an image to be processed;
specifically, the image to be processed is obtained from a database, or is received directly as input from a user; the image to be processed is, for example, a low-resolution image.
When the image to be processed is obtained from the database, the database is encrypted for security. A call request must therefore carry a signature-verification token, and only when verification passes is the image to be processed retrieved from the database.
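As an illustrative sketch only (the text above does not specify a database interface), such a token-carrying call might look as follows; the HTTP transport, endpoint, and field names are all assumptions:

```python
import requests  # assumed HTTP-fronted database; endpoint and field names are hypothetical

def fetch_image_to_process(db_url: str, verify_token: str):
    """Send a call request carrying a signature-verification token; the image
    to be processed is returned only if the database reports that verification passed."""
    resp = requests.post(f"{db_url}/call", json={"verify_token": verify_token})
    resp.raise_for_status()
    body = resp.json()
    if body.get("verify_result") != "passed":  # hypothetical response field
        raise PermissionError("signature verification failed")
    return body["image"]
```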
S2, inputting the image to be processed into a pre-trained generative adversarial network model, and performing convolution in a first preprocessing layer of the generative model within the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed;
specifically, the image to be processed is input into the preset generative adversarial network model, which comprises a generative model and a discriminative model. The image to be processed is first input into the generative model of the generative adversarial network model. Feature extraction is performed by the first preprocessing layer of the generative model, a combination of a convolution layer and an activation function layer, to obtain the corresponding first image feature.
Further, before the inputting of the image to be processed into the pre-trained generative adversarial network model, the method further includes:
acquiring a low-resolution picture set and a high-resolution picture set corresponding to the low-resolution picture set;
inputting the data in the low-resolution picture set into the generative model of the generative adversarial network model to obtain corresponding pseudo high-definition images;
and inputting the data in the high-resolution picture set and the corresponding pseudo high-definition images into the discriminative model of the generative adversarial network model for training, until the loss function of the generative adversarial network model converges.
Specifically, a low-resolution picture set and a corresponding high-resolution picture set are obtained, each containing a large amount of picture data. The generative model produces a pseudo high-definition image from the data in the low-resolution picture set, and a content loss function l_x is calculated from the pseudo high-definition image and the corresponding high-resolution image. The pseudo high-definition image and the corresponding high-resolution image are then input into the discriminative model, which is trained to judge whether the pseudo high-definition image output by the generative model is a real image; an adversarial loss function l_gen is calculated from the pseudo high-definition image and the output of the discriminative model. The generative model and the discriminative model are trained against each other continuously until the loss function, which comprises the adversarial loss function l_gen, the content loss function l_x, and the sample loss, converges, yielding the final generative adversarial network model.
In other embodiments of the present application, the generative model may first be pre-trained alone; after pre-training, it is combined with the discriminative model for joint training.
The generative adversarial network model is trained to obtain the final model used to process the data to be processed. Training improves the processing capability of the generative model, so that high-quality super-resolution images can be obtained.
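The alternating training described above could be sketched as follows; the optimizer, learning rate, and loss weighting are assumptions, and the contrastive sample loss on the discriminator side is omitted for brevity:

```python
import torch

def train_gan(generator, discriminator, loader, content_loss_fn,
              epochs: int = 100, lr: float = 1e-4, device: str = "cpu"):
    """Alternating adversarial training sketch; module and loss names are placeholders."""
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    for _ in range(epochs):
        for lr_img, hr_img in loader:            # paired low-/high-resolution pictures
            lr_img, hr_img = lr_img.to(device), hr_img.to(device)
            # discriminator step: real high-resolution images vs. pseudo HD images
            fake = generator(lr_img).detach()
            d_real, d_fake = discriminator(hr_img), discriminator(fake)
            d_loss = bce(d_real, torch.ones_like(d_real)) + \
                     bce(d_fake, torch.zeros_like(d_fake))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # generator step: content loss l_x plus adversarial loss l_gen
            fake = generator(lr_img)
            g_loss = content_loss_fn(fake, hr_img) \
                     - torch.log(discriminator(fake) + 1e-8).mean()
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```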
Still further, before the inputting of the data in the high-resolution picture set and the corresponding pseudo high-definition images into the discriminative model of the generative adversarial network model for training, the method further includes:
performing data augmentation on each piece of picture data in the high-resolution picture set to obtain several similar data corresponding to each piece, taking the similar data corresponding to each piece as an augmented picture set, and storing the augmented picture set in the high-resolution picture set.
Specifically, each piece of picture data in the high-resolution picture set is subjected to one or more of horizontal/vertical flipping, rotation, scaling, cropping, shearing, translation, contrast adjustment, color jittering, noise, and the like to obtain similar picture data, which are stored together with the corresponding high-resolution pictures. Only the data in the high-resolution picture set are augmented; the low-resolution picture set is not. Augmenting the high-resolution data ensures that, during training, each low-resolution picture is fully compared in the discriminative model against the multiple augmented high-resolution pictures.
Each piece of picture data is augmented in several ways to obtain several similar data, which facilitates subsequent model training; training on a large amount of data produces a better model.
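A sketch of such an augmentation step, assuming torchvision transforms; the particular transform set and parameters below are illustrative, not prescribed by the text above:

```python
import torchvision.transforms as T

# illustrative transform set covering several of the augmentation modes listed above
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=15),
    T.RandomResizedCrop(size=96, scale=(0.8, 1.0)),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
])

def augment_hr_picture(hr_image, n_views: int = 2):
    """Return several similar pictures derived from one high-resolution picture,
    to be stored alongside it as part of the augmented picture set."""
    return [augment(hr_image) for _ in range(n_views)]
```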
Still further, the discriminative model comprises a second preprocessing layer, a hidden-layer feature extraction layer, and a fully connected layer; the inputting of the data in the high-resolution picture set and the corresponding pseudo high-definition images into the discriminative model for training comprises:
taking the pseudo high-definition images as negative samples and the image data in the first picture set as positive samples, and performing shallow feature extraction on the positive and negative samples through the second preprocessing layer of the discriminative model to obtain a fourth image feature;
performing hidden-layer feature extraction on the fourth image feature through the hidden-layer feature extraction layer of the discriminative model to obtain feature vectors;
mapping the feature vectors through the fully connected layer of the discriminative model to obtain the similarity between the pseudo high-definition images and the corresponding image data in the first picture set;
and adjusting the parameters of the discriminative model and calculating the adversarial loss function and the sample loss function among the loss functions, based on the similarity, the pseudo high-definition images, and the image data in the augmented picture set.
Specifically, after the discriminative model receives an input image, the fourth image feature is first obtained through the convolution layer and activation function of the second preprocessing layer. The fourth image feature then passes through several identical hidden-layer feature extraction layers, each comprising a convolution layer, a BN layer, and an activation function layer, which produce the feature vectors. Because a contrastive learning algorithm is introduced for these feature vectors, the feature vectors corresponding to positive samples attract one another, the feature vectors corresponding to negative samples attract one another, and the feature vectors of positive and negative samples move away from each other. To simplify computation and avoid introducing a max pooling layer into the network structure, the model uses Leaky ReLU as the activation function. Finally, mapping and activation are performed through two fully connected layers, yielding the similarity between the pseudo high-definition image and the corresponding image data in the first picture set. Before the feature vectors are mapped by the fully connected layers, mean pooling is applied to reduce features and parameters. The discriminative model and the generative model compete continuously until the value produced by the discriminative model approaches stability. To avoid vanishing gradients and enhance generalization, a BatchNorm (BN) layer is used in the discriminative network after each use of the Leaky ReLU function. After the similarity is obtained, the adversarial loss function l_gen is calculated from the pseudo high-definition image and the similarity, the sample loss function is calculated from the pseudo high-definition image and the image data in the augmented picture set, and finally the model parameters are adjusted based on the pseudo high-definition image and the similarity.
By adding a contrastive learning algorithm to the training of the hidden-layer feature extraction layer, the feature vectors extracted by that layer represent the images better.
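A sketch of a discriminator with this shape (second preprocessing layer, stacked convolution + BN + Leaky ReLU hidden-layer blocks, mean pooling, and two fully connected layers) follows; channel widths, strides, and depth are assumptions:

```python
import torch
import torch.nn as nn

class HiddenFeatureBlock(nn.Module):
    """Hidden-layer feature extraction: convolution + BN + Leaky ReLU,
    as described above (channel widths and strides are assumed)."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # second preprocessing layer: convolution + activation (no BN)
        self.pre = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.LeakyReLU(0.2, True))
        self.hidden = nn.Sequential(
            HiddenFeatureBlock(64, 64, stride=2), HiddenFeatureBlock(64, 128),
            HiddenFeatureBlock(128, 128, stride=2), HiddenFeatureBlock(128, 256),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # mean pooling before the FC layers
        self.fc = nn.Sequential(nn.Linear(256, 1024), nn.LeakyReLU(0.2, True),
                                nn.Linear(1024, 1), nn.Sigmoid())

    def forward(self, x):
        v = self.pool(self.hidden(self.pre(x))).flatten(1)  # feature vector
        return self.fc(v)                                   # similarity in (0, 1)
```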
Still further, the hidden-layer feature extraction layer is combined with a contrastive learning algorithm, and the calculating of the sample loss function among the loss functions includes:
calculating the sample loss function of the hidden-layer feature extraction layer using the noise contrastive estimation function and the inner product function of the contrastive learning algorithm.
Specifically, the sample loss function of the hidden-layer feature extraction layer is calculated using the noise contrastive estimation function and the inner product function of the contrastive learning algorithm, so that the feature vectors corresponding to positive samples gather in the embedding space and the feature vectors corresponding to negative samples gather in the embedding space.
In particular, the sample loss is calculated by combining the noise contrastive estimation function L_NCE with the inner product function S_SimCLR, i.e.

s(u, w) = S_SimCLR(u, w) = u·w / (‖u‖ ‖w‖)

L_NCE = -(1/N) Σ_{i=1}^{N} log [ exp(s(v_i^(1), v_i^(2))) / ( Σ_{j=1}^{N} exp(s(v_i^(1), v_j^(2))) + Σ_{v ∈ v_{-i}^(1)} exp(s(v_i^(1), v)) ) ]
The feature vectors corresponding to the positive samples gather together and the feature vectors corresponding to the negative samples gather together, so that the feature vectors of positive and negative samples stay far from each other. The positive samples are the several images augmented from the same high-definition picture; the negative samples are images augmented from different high-definition pictures.
Here I_H denotes a high-resolution picture and I_L a low-resolution picture, i.e. the picture to be processed; D(I_H) is the discrimination result of the discriminative model on the high-resolution picture; G(I_L) is the pseudo high-definition image obtained by passing the low-resolution picture through the generative model; D(G(I_L)) is the discrimination result of the discriminative model on the pseudo high-definition image; v_i^(1) is the i-th datum of the first augmentation; v^(2) is the data set of the second augmentation; s is the similarity measure; v_j^(2) is the j-th datum of the second augmentation; v_i^(2) is the i-th datum of the second augmentation; exp() is the exponential function with base e; v_{-i}^(1) is the data set of the first augmentation with the i-th datum removed; v^(1) is the data set of the first augmentation; and i, j = 1, 2, ..., N.
By training the discriminative model with a contrastive learning algorithm and calculating the loss function, positive samples attract each other in the embedding space while negative samples stay far apart, which optimizes the model.
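A common way to realize such a sample loss is the InfoNCE form sketched below, where views of the same high-definition picture attract and all others repel; the cosine normalization and the temperature are assumptions, not necessarily the exact formula above:

```python
import torch
import torch.nn.functional as F

def sample_loss(v1: torch.Tensor, v2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """InfoNCE-style sample loss over two augmented batches v1, v2 of shape (N, d):
    feature vectors of views of the same HR picture attract, all others repel."""
    v1, v2 = F.normalize(v1, dim=1), F.normalize(v2, dim=1)
    logits = v1 @ v2.t() / temperature                    # s(v_i^(1), v_j^(2)) for all i, j
    targets = torch.arange(v1.size(0), device=v1.device)  # positive pairs on the diagonal
    return F.cross_entropy(logits, targets)

# usage: two batches of discriminator feature vectors for the two augmentations
loss = sample_loss(torch.randn(8, 256), torch.randn(8, 256))
```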
Still further, the acquiring of the low-resolution picture set and the corresponding high-resolution picture set includes:
sending a call request to a database, the call request carrying a signature-verification token;
and receiving a signature-verification result returned by the database, and, when verification passes, retrieving from the database the low-resolution picture set and the high-resolution picture set corresponding to it.
Specifically, since the low-resolution picture set and the corresponding high-resolution picture set are edited autonomously, they are stored in a preset database to avoid leakage. When they are retrieved, the database therefore performs a signature-verification step to ensure data security and avoid problems such as data leakage.
Signature verification ensures the security of the database contents and prevents data leakage and the like.
S3, weighting each channel in the first image feature, according to the channel attention layer in the generative model and based on the dependency relationship of each channel in the first image feature, to obtain a second image feature;
specifically, the dependency relationship of each channel in the first image feature is obtained through the channel attention layer, and each channel in the first image feature is weighted based on that dependency to obtain the second image feature. The dependency of each channel indicates its importance. Several channel attention layers are arranged in the generative model.
Further, the weighting of each channel in the first image feature according to the channel attention layer in the generative model and based on the dependency relationship of each channel in the first image feature, to obtain a second image feature, includes:
processing the first image feature through the feature extraction layer in the channel attention layer to obtain the spatial attention (SA) of an n-dimensional feature map;
processing the spatial attention through a first pooling layer and a second pooling layer in the channel attention layer to obtain a first attention vector and a second attention vector, respectively;
adding the first attention vector and the second attention vector element-wise to obtain a pooled feature F;
converting the pooled feature F into an attention map through a sigmoid function;
point-multiplying the attention map with the spatial attention to obtain a residual block;
and adding the first image feature and the residual block element-wise to obtain the second image feature.
Specifically, after the first image feature passes through a convolution layer, an activation function layer, and another convolution layer in the feature extraction layer, an n-dimensional feature map composed of n features, the spatial attention (SA), is obtained. A first attention vector Fcavg is then produced by the first pooling layer, which comprises an average pooling layer, two convolution layers, and an activation function layer; the spatial attention is also processed by the second pooling layer, comprising a max pooling layer, two convolution layers, and an activation function layer, to obtain a second attention vector Fcmax. Adding Fcavg and Fcmax element-wise yields the pooled feature F, which a sigmoid function converts into an attention map (AM) of numbers between 0 and 1 representing the weights of the different features. Point-multiplying the attention map with the spatial attention generates a residual block, and finally the residual block is added to the first image feature to obtain the second image feature.
Processing the image through multiple channel attention layers captures the importance of each channel and assigns weights accordingly, raising the weight of high-frequency features, so the final super-resolution image looks better and artifacts can be removed.
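The steps above correspond closely to a CBAM-style channel attention block; a sketch under that assumption follows, with the channel count and reduction ratio as placeholders:

```python
import torch
import torch.nn as nn

class ChannelAttentionLayer(nn.Module):
    """Channel attention sketch following the steps above: spatial attention SA,
    average- and max-pooling branches, sigmoid attention map, residual add."""
    def __init__(self, channels: int = 64, reduction: int = 16):
        super().__init__()
        self.features = nn.Sequential(                    # feature extraction layer
            nn.Conv2d(channels, channels, 3, padding=1), nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

        def branch(pool):                                 # pooling + two convs + activation
            return nn.Sequential(
                pool,
                nn.Conv2d(channels, channels // reduction, 1), nn.PReLU(),
                nn.Conv2d(channels // reduction, channels, 1))

        self.avg_branch = branch(nn.AdaptiveAvgPool2d(1))  # first pooling layer
        self.max_branch = branch(nn.AdaptiveMaxPool2d(1))  # second pooling layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sa = self.features(x)                             # spatial attention SA
        f = self.avg_branch(sa) + self.max_branch(sa)     # pooled feature F (Fcavg + Fcmax)
        am = torch.sigmoid(f)                             # attention map AM in (0, 1)
        residual = am * sa                                # point-multiply AM with SA
        return x + residual                               # add the first image feature
```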
S4, increasing the resolution of the second image feature based on the sampling layer in the generative model to obtain a third image feature, and convolving the third image feature to obtain and output a super-resolution image corresponding to the image to be processed.
Specifically, the sampling layer comprises a convolution layer, two PixelShuffle layers, and an activation function layer. Two groups of sampling layers are arranged in the generative model; they increase the resolution of the second image feature to obtain the third image feature, and a final convolution produces and outputs the high-definition image corresponding to the image to be processed. Before the second image feature enters the sampling layers, it is convolved and normalized, and the result is added to the first image feature before being input to the sampling layers.
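A sketch of one such sampling layer, assuming PyTorch's nn.PixelShuffle and a 4x overall upscale; channel widths are placeholders:

```python
import torch
import torch.nn as nn

class SamplingLayer(nn.Module):
    """Upsampling sketch: one convolution, two PixelShuffle stages, and an
    activation, quadrupling spatial resolution (channel widths assumed)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels * 16, 3, padding=1),
            nn.PixelShuffle(2),   # (C*16, H, W) -> (C*4, 2H, 2W)
            nn.PixelShuffle(2),   # (C*4, 2H, 2W) -> (C, 4H, 4W)
            nn.PReLU(),
        )

    def forward(self, x):
        return self.net(x)

# a 64-channel 24x24 feature map becomes a 64-channel 96x96 feature map
out = SamplingLayer()(torch.randn(1, 64, 24, 24))  # shape (1, 64, 96, 96)
```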
For the overall SRGAN model architecture: a low-resolution picture I_L and its corresponding high-resolution picture I_H are taken; the low-resolution picture I_L is input into the generative model G to obtain a pseudo high-definition image G(I_L); the high-resolution picture I_H is data-augmented to obtain augmented data I_H^(1) and I_H^(2); and the pseudo high-definition image G(I_L) together with the augmented data I_H^(1) and I_H^(2) is input into the discriminative model D to obtain the discrimination result, i.e. the similarity.
A channel attention module is introduced into the generative model G, and a contrastive learning algorithm is introduced into the training of the discriminative model D. The content loss function l_x is calculated from the pseudo high-definition image G(I_L) and the high-resolution picture I_H, and the adversarial loss function L_Gen^SR is calculated from the pseudo high-definition image G(I_L) and the discrimination result:
l_x = (1 / (r^2 W H)) Σ_{x=1}^{rW} Σ_{y=1}^{rH} ( (I_H)_{x,y} - G(I_L)_{x,y} )^2

L_Gen^SR = Σ_{n=1}^{N} -log D(G(I_L))

where (I_H)_{x,y} is the high-resolution image, G(I_L)_{x,y} is the pseudo high-definition image generated by the generative model, and D(G(I_L)) is the probability that the image produced by the generator is a natural image, i.e. the similarity between the pseudo high-definition image and the high-resolution image; r, W, and H denote the scaling factor and the width and height of the pictures. Training yields the final generative adversarial network model.
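Under the common SRGAN reading of these formulas, the two losses could be computed as follows (a sketch, with the pixel-wise MSE standing in for l_x):

```python
import torch

def content_loss(hr: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """Pixel-wise MSE between the high-resolution image I_H and the pseudo
    high-definition image G(I_L); the mean realizes the 1/(r^2 W H) factor."""
    return torch.mean((hr - fake) ** 2)

def adversarial_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """L_Gen^SR: sum over the batch of -log D(G(I_L)); eps avoids log(0)."""
    return torch.sum(-torch.log(d_fake + 1e-8))
```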
It should be emphasized that, to further ensure the privacy and security of the data, all super-resolution image data may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated by cryptographic methods, each block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
The basic technologies of artificial intelligence generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
An image to be processed is obtained and input into a pre-trained generative adversarial network model; convolution in the first preprocessing layer of the generative model yields a first image feature corresponding to the image to be processed; each channel in the first image feature is then weighted, according to the channel attention layer in the generative model and based on the dependency relationship of each channel, to obtain a second image feature, so that important channels, i.e. high-frequency features, receive larger weights while channels that contribute little to picture quality receive smaller ones; the importance of each feature is thus captured and the quality of the subsequently generated picture improves. The resolution of the second image feature is then increased by the sampling layer in the generative model to obtain a third image feature, which is convolved to obtain and output a super-resolution image corresponding to the image to be processed. The quality of the super-resolution image is thereby greatly improved and the generation of artifacts is avoided.
Fig. 2 is a functional block diagram of the SRGAN-based super-resolution image generation apparatus of the present application.
The SRGAN-based super-resolution image generation apparatus 100 described herein may be installed in an electronic device. According to the implemented functions, the apparatus 100 may include an acquisition module 101, a convolution module 102, a channel attention module 103, and an enhancement module 104. A module, which may also be referred to as a unit in this application, is a series of computer program segments stored in the memory of the electronic device that can be executed by its processor and that perform a fixed function.
In this embodiment, the functions of the respective modules/units are as follows:
an acquisition module 101 for acquiring an image to be processed;
specifically, the acquisition module 101 obtains the image to be processed from a database, or directly receives an image to be processed input by a user; the image to be processed is a low-resolution image.
A convolution module 102 for inputting the image to be processed into a pre-trained generative adversarial network model and performing convolution in a first preprocessing layer of the generative model within the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed;
specifically, the convolution module 102 inputs the image to be processed into the preset generative adversarial network model, which comprises a generative model and a discriminative model. The image to be processed is first input into the generative model of the generative adversarial network model. Feature extraction is performed by the first preprocessing layer of the generative model, a combination of a convolution layer and an activation function layer, to obtain the corresponding first image feature.
Further, the SRGAN-based super-resolution image generation apparatus 100 further comprises a training data acquisition module, a generation module, and a discrimination module;
the training data acquisition module is used for acquiring a low-resolution picture set and a high-resolution picture set corresponding to the low-resolution picture set;
the generation module is used for inputting the data in the low-resolution picture set into the generative model to obtain corresponding pseudo high-definition images;
and the discrimination module is used for inputting the data in the high-resolution picture set and the corresponding pseudo high-definition images into the discriminative model of the generative adversarial network model for training, until the loss function of the generative adversarial network model converges.
Specifically, the training data acquisition module first acquires a low-resolution picture set and a corresponding high-resolution picture set, each containing a large amount of picture data. Through the generative model, the generation module obtains a pseudo high-definition image from the data in the low-resolution picture set, and a content loss function l_x is calculated from the pseudo high-definition image and the corresponding high-resolution image. The discrimination module then inputs the pseudo high-definition image and the corresponding high-resolution image into the discriminative model for training, to judge whether the pseudo high-definition image output by the generative model is a real image, and calculates the adversarial loss function l_gen from the pseudo high-definition image and the output of the discriminative model; the loss function comprises the adversarial loss function l_gen, the content loss function l_x, and the sample loss. Through continuous adversarial training of the generative network and the discriminative network, the loss function converges, yielding the final generative adversarial network model.
The generative adversarial network model is trained through the cooperation of the training data acquisition module, the generation module, and the discrimination module to obtain the final model used to process the data to be processed; training improves the processing capability of the generative model, so that high-quality super-resolution images can be obtained.
Still further, the SRGAN-based super-resolution image generation apparatus 100 further comprises an augmentation module;
the augmentation module is used for performing data augmentation on each piece of picture data in the high-resolution picture set to obtain several similar data corresponding to each piece, and storing the similar data corresponding to each piece in the high-resolution picture set as an augmented picture set.
Specifically, the augmentation module performs one or more of horizontal/vertical flipping, rotation, scaling, cropping, shearing, translation, contrast adjustment, color jittering, noise, and the like on each piece of picture data in the high-resolution picture set to obtain similar picture data, which are stored together with the corresponding high-resolution pictures.
The augmentation module augments each piece of picture data in several ways to obtain several similar data, which facilitates subsequent model training; training on a large amount of data produces a better model.
Still further, the discriminative model comprises a second preprocessing layer, a hidden-layer feature extraction layer, and a fully connected layer; the discrimination module comprises a shallow feature extraction submodule, a hidden-layer feature extraction submodule, a similarity discrimination submodule, and a training adjustment submodule;
the shallow feature extraction submodule is used for taking the pseudo high-definition images as negative samples and the image data in the first picture set as positive samples, and performing shallow feature extraction on the positive and negative samples through the second preprocessing layer of the discriminative model to obtain a fourth image feature;
the hidden-layer feature extraction submodule is used for performing hidden-layer feature extraction on the fourth image feature through the hidden-layer feature extraction layer of the discriminative model to obtain feature vectors;
the similarity discrimination submodule is used for mapping the feature vectors through the fully connected layer of the discriminative model to obtain the similarity between the pseudo high-definition images and the corresponding image data in the first picture set;
and the training adjustment submodule is used for adjusting the parameters of the discriminative model and calculating the adversarial loss function and the sample loss function among the loss functions, based on the similarity, the pseudo high-definition images, and the image data in the augmented picture set.
Specifically, after the discriminative model receives an input image, the shallow feature extraction submodule obtains the fourth image feature through the convolution layer and activation function of the second preprocessing layer. The fourth image feature then passes through the several identical hidden-layer feature extraction layers of the hidden-layer feature extraction submodule, each comprising a convolution layer, a BN layer, and an activation function layer, which produce the feature vectors. Because a contrastive learning algorithm is introduced for these feature vectors, the feature vectors corresponding to positive samples attract one another, the feature vectors corresponding to negative samples attract one another, and the feature vectors of positive and negative samples move away from each other. Finally, the similarity discrimination submodule performs mapping and activation through two fully connected layers to obtain the similarity between the pseudo high-definition image and the corresponding image data in the first picture set. Before the feature vectors are mapped by the fully connected layers, mean pooling is applied to reduce features and parameters. After the similarity is obtained, the training adjustment submodule calculates the adversarial loss function l_gen from the pseudo high-definition image and the similarity, calculates the sample loss function from the pseudo high-definition image and the image data in the augmented picture set, and finally adjusts the model parameters based on the pseudo high-definition image and the similarity. The discriminative model and the generative model compete continuously until the value produced by the discriminative model, i.e. the loss function, approaches stability.
Through the cooperation of the shallow feature extraction submodule, the hidden-layer feature extraction submodule, the similarity discrimination submodule, and the training adjustment submodule, a contrastive learning algorithm is added to the training of the hidden-layer feature extraction layer, so that the feature vectors it extracts represent the images better.
Still further, the hidden-layer feature extraction layer is combined with a contrastive learning algorithm, and the training adjustment submodule comprises a sample loss unit;
the sample loss unit is used for calculating the sample loss function of the hidden-layer feature extraction layer using the noise contrastive estimation function and the inner product function of the contrastive learning algorithm.
The sample loss unit calculates the loss function of the hidden-layer feature extraction layer using the noise contrastive estimation function and the inner product function of the contrastive learning algorithm, so that the feature vectors corresponding to positive samples gather in the embedding space and the feature vectors corresponding to negative samples gather in the embedding space.
In particular, the sample loss unit calculates the sample loss by combining the noise contrastive estimation function L_NCE with the inner product function S_SimCLR, i.e.

s(u, w) = S_SimCLR(u, w) = u·w / (‖u‖ ‖w‖)

L_NCE = -(1/N) Σ_{i=1}^{N} log [ exp(s(v_i^(1), v_i^(2))) / ( Σ_{j=1}^{N} exp(s(v_i^(1), v_j^(2))) + Σ_{v ∈ v_{-i}^(1)} exp(s(v_i^(1), v)) ) ]
The feature vectors corresponding to the positive samples gather together and the feature vectors corresponding to the negative samples gather together, so that the feature vectors of positive and negative samples stay far from each other.
Here I_H denotes a high-resolution picture and I_L a low-resolution picture, i.e. the picture to be processed; D(I_H) is the discrimination result of the discriminative model on the high-resolution picture; G(I_L) is the pseudo high-definition image obtained by passing the low-resolution picture through the generative model; D(G(I_L)) is the discrimination result of the discriminative model on the pseudo high-definition image; v_i^(1) is the i-th datum of the first augmentation; v^(2) is the data set of the second augmentation; s is the similarity measure; v_j^(2) is the j-th datum of the second augmentation; v_i^(2) is the i-th datum of the second augmentation; exp() is the exponential function with base e; v_{-i}^(1) is the data set of the first augmentation with the i-th datum removed; v^(1) is the data set of the first augmentation; and i, j = 1, 2, ..., N.
Through the sample loss unit, the discriminative model is trained with a contrastive learning algorithm and the loss function is calculated, so that positive samples attract each other in the embedding space while negative samples stay far apart, which optimizes the model.
Still further, the training data acquisition module comprises a request sending submodule and a result receiving submodule;
the request sending submodule is used for sending a call request to a database, the call request carrying a signature-verification token;
and the result receiving submodule is used for receiving the signature-verification result returned by the database and, when verification passes, retrieving from the database the low-resolution picture set and the high-resolution picture set corresponding to it.
Through the cooperation of the request sending submodule and the result receiving submodule in signature verification, the security of the database contents is ensured and data leakage and the like are avoided.
A channel attention module 103 for weighting each channel in the first image feature, according to the channel attention layer in the generative model and based on the dependency relationship of each channel in the first image feature, to obtain a second image feature;
specifically, the channel attention module 103 obtains the dependency relationship of each channel in the first image feature through the channel attention layer, and weights each channel in the first image feature based on that dependency to obtain the second image feature. The dependency of each channel indicates its importance. Several channel attention layers are arranged in the generative model.
Further, the channel attention module 103 further includes a feature extraction sub-module, a pooling sub-module, a first corresponding addition sub-module, an activation sub-module, a point multiplication sub-module, and a second corresponding addition sub-module;
the characteristic extraction submodule is used for processing the first image characteristic through a characteristic extraction layer in the channel Attention layer to obtain the Space Attention (SA) of the n-dimensional map;
the pooling submodule is used for processing the space attention through a first pooling layer and a second pooling layer in the channel attention layer respectively to obtain a first attention vector and a second attention vector respectively;
the first corresponding addition submodule is used for correspondingly adding the first attention vector and the second attention vector to obtain pooling characteristics F;
the activation submodule is used for converting the pooling characteristics F into an attention map through a sigmoid function;
the point multiplication sub-module is used for carrying out point multiplication on the attention diagram and the space attention to obtain a residual block;
and the second corresponding addition submodule is used for correspondingly adding the first image characteristic and the residual block to obtain a second image characteristic.
Specifically, after the first image feature passes through a convolution layer, an activation function layer and a convolution layer in a feature extraction layer of the feature extraction submodule, an n-dimensional atlas Space Attention (SA) formed by n features is obtained. Then, a first attention vector Fcavg is obtained through the processing of a first pooling layer of a pooling submodule, wherein the first pooling layer comprises an average pooling layer, two convolution layers and an activation function layer; and the spatial attention is processed by a second pooling layer to obtain a second attention vector Fcmax, wherein the second pooling layer comprises a maximum pooling layer, two convolution layers and an activation function layer. The first corresponding addition submodule correspondingly adds the first attention vector Fcavg and the second attention vector Fcmax to obtain a pooled feature F, the pooled feature F is converted into an attention map AM (attention map) consisting of numbers between 0 and 1 by activating a sigmoid function of the submodule to represent weights of different features, the point multiplication submodule performs point multiplication on the attention map and the space attention to generate a residual block, and finally the second corresponding addition submodule adds the residual block and the first image feature to obtain a second image feature.
Through the cooperation of the feature extraction submodule, the pooling submodule, the first corresponding addition submodule, the activation submodule, the point multiplication submodule and the second corresponding addition submodule, the image is processed by multiple channel attention layers, so that the importance of each channel is obtained and weights are assigned accordingly. Raising the weight of high-frequency features gives the final super-resolution image a better effect and allows artifacts to be removed.
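To make the data flow above concrete, here is a minimal PyTorch sketch of one such channel attention layer; the channel width, reduction ratio, kernel sizes and PReLU activation are illustrative assumptions, as the embodiment does not fix them.

```python
import torch
import torch.nn as nn


class ChannelAttentionLayer(nn.Module):
    """Sketch of one channel attention layer: feature extraction to SA,
    average- and max-pooling branches, corresponding addition to F,
    sigmoid to an attention map AM, point multiplication with SA, and a
    residual addition back to the input feature."""

    def __init__(self, channels: int = 64, reduction: int = 16):
        super().__init__()
        # Feature extraction layer: convolution -> activation -> convolution.
        self.extract = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

        def branch(pool: nn.Module) -> nn.Sequential:
            # Each pooling branch: pooling, two convolutions, one activation.
            return nn.Sequential(
                pool,
                nn.Conv2d(channels, channels // reduction, 1),
                nn.PReLU(),
                nn.Conv2d(channels // reduction, channels, 1),
            )

        self.avg_branch = branch(nn.AdaptiveAvgPool2d(1))  # -> Fcavg
        self.max_branch = branch(nn.AdaptiveMaxPool2d(1))  # -> Fcmax
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sa = self.extract(x)                           # spatial attention SA
        f = self.avg_branch(sa) + self.max_branch(sa)  # pooled feature F
        am = self.sigmoid(f)                           # attention map in (0, 1)
        residual = am * sa                             # point multiplication with SA
        return x + residual                            # add the input feature back
```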
The enhancement module 104 is configured to raise the resolution of the second image feature based on the sampling layers in the generative model to obtain a third image feature, and to convolve the third image feature to obtain and output a super-resolution image corresponding to the image to be processed;
Specifically, each sampling layer in the enhancement module 104 comprises a convolution layer, a ×2 PixelShuffle layer and an activation function layer, and two groups of sampling layers are arranged in the generative model. The resolution of the second image feature is raised by the sampling layers to obtain a third image feature, which is finally convolved to obtain and output the high-definition image corresponding to the image to be processed. Before the second image feature is input into the sampling layers, it is convolved and normalized, and the result is added to the first image feature before entering the sampling layers.
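Under the reading above (each sampling layer being a convolution, a ×2 PixelShuffle and an activation, with two groups giving 4× upscaling), a minimal PyTorch sketch of the enhancement path could look as follows; the channel width and the PReLU activation are assumptions.

```python
import torch.nn as nn


def make_sampling_layers(channels: int = 64, groups: int = 2) -> nn.Sequential:
    """Two sampling-layer groups, each raising height and width by 2x,
    followed by the output convolution that yields the 3-channel image.
    Channel width and activation choice are illustrative assumptions."""
    layers = []
    for _ in range(groups):
        layers += [
            nn.Conv2d(channels, channels * 4, 3, padding=1),  # expand channels for the shuffle
            nn.PixelShuffle(2),                               # rearrange channels to double H and W
            nn.PReLU(),
        ]
    layers.append(nn.Conv2d(channels, 3, 9, padding=4))       # final convolution to an RGB image
    return nn.Sequential(*layers)
```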
With this arrangement, the SRGAN-based super-resolution image generation device 100 greatly improves the quality of the generated super-resolution image and avoids artifacts through the cooperation of the acquisition module 101, the convolution module 102, the channel attention module 103 and the enhancement module 104.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 3, fig. 3 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it is to be understood that not all of the shown components are required to be implemented; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palmtop computer, a cloud server or another computing device. The computer device can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, a voice control device or the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of the SRGAN-based super-resolution image generation method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the SRGAN-based super-resolution image generation method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
In this embodiment, when the processor executes the computer readable instructions stored in the memory, the SRGAN-based super-resolution image generation method of the above embodiment is implemented: an image to be processed is obtained and input into a pre-trained generative adversarial network model; convolution processing is performed through the first preprocessing layer of the generative model in the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed; according to the channel attention layer in the generative model, each channel in the first image feature is weighted based on its dependency relationship to obtain a second image feature, so that important channels, i.e. high-frequency features, receive larger weights while channels that contribute little to picture quality receive smaller weights; the importance of each feature is thereby obtained, improving the quality of the subsequently generated picture; based on the sampling layers in the generative model, the resolution of the second image feature is raised to obtain a third image feature, and the third image feature is convolved to obtain and output the super-resolution image corresponding to the image to be processed. The quality of the generated super-resolution image is thus greatly improved and artifacts are avoided.
The present application further provides another embodiment: a computer-readable storage medium storing computer-readable instructions executable by at least one processor, so as to cause the at least one processor to perform the steps of the above SRGAN-based super-resolution image generation method: obtaining an image to be processed; inputting the image to be processed into a pre-trained generative adversarial network model; performing convolution processing through the first preprocessing layer of the generative model in the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed; weighting each channel in the first image feature based on its dependency relationship according to the channel attention layer in the generative model to obtain a second image feature, so that important channels, i.e. high-frequency features, receive larger weights while channels that contribute little to picture quality receive smaller weights, thereby obtaining the importance of each feature and improving the quality of the subsequently generated picture; raising the resolution of the second image feature based on the sampling layers in the generative model to obtain a third image feature; and convolving the third image feature to obtain and output the super-resolution image corresponding to the image to be processed. The quality of the generated super-resolution image is thus greatly improved and artifacts are avoided.
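Putting the pieces together, the following sketch wires the preprocessing convolution, the channel attention stack, the pre-sampling convolution-plus-normalization with its skip connection, and the sampling layers into one generator forward pass. It reuses ChannelAttentionLayer and make_sampling_layers from the sketches above, and the block count, channel width and BatchNorm choice are assumptions rather than details fixed by the embodiment.

```python
import torch
import torch.nn as nn


class SRGANGenerator(nn.Module):
    """Sketch of the generative model's forward pass: first image feature,
    second image feature, skip connection, third image feature, and the
    output super-resolution image. Builds on the earlier sketches."""

    def __init__(self, channels: int = 64, attention_layers: int = 8):
        super().__init__()
        # First preprocessing layer: convolution producing the first image feature.
        self.preprocess = nn.Sequential(nn.Conv2d(3, channels, 9, padding=4), nn.PReLU())
        # Stack of channel attention layers producing the second image feature.
        self.attention = nn.Sequential(
            *[ChannelAttentionLayer(channels) for _ in range(attention_layers)]
        )
        # Convolution and normalization applied before the sampling layers.
        self.mid = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # Sampling layers plus the final output convolution.
        self.upsample = make_sampling_layers(channels)

    def forward(self, lr_image: torch.Tensor) -> torch.Tensor:
        first = self.preprocess(lr_image)   # first image feature
        second = self.attention(first)      # second image feature
        second = self.mid(second) + first   # skip connection back to the first feature
        return self.upsample(second)        # third image feature -> super-resolution image
```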
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be thorough. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the foregoing embodiments may be modified, or some of their features may be replaced by equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. An SRGAN-based super-resolution image generation method, characterized by comprising the following steps:
acquiring an image to be processed;
inputting the image to be processed into a pre-trained generative adversarial network model, and performing convolution processing through a first preprocessing layer of the generative model in the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed;
weighting each channel in the first image feature based on the dependency relationship of each channel in the first image feature according to the channel attention layer in the generative model to obtain a second image feature;
and raising the resolution of the second image feature based on the sampling layers in the generative model to obtain a third image feature, and convolving the third image feature to obtain and output a super-resolution image corresponding to the image to be processed.
2. The SRGAN-based super-resolution image generation method according to claim 1, characterized in that before said inputting the image to be processed into a pre-trained generative adversarial network model, the method further comprises:
acquiring a low-resolution picture set and a high-resolution picture set corresponding to the low-resolution picture set;
inputting the data in the low-resolution picture set into the generative model in the generative adversarial network model to obtain corresponding pseudo high-definition images;
and inputting the data in the high-resolution picture set and the pseudo high-definition images corresponding to the data into a discriminant model in the generative adversarial network model for training until the loss function of the generative adversarial network model converges.
3. The SRGAN-based super-resolution image generation method according to claim 2, characterized in that before inputting the data in the high-resolution picture set and the pseudo high-definition images corresponding to the data into the discriminant model in the generative adversarial network model for training, the method further comprises:
and performing data augmentation on each picture data in the high-resolution picture set to obtain a plurality of similar data corresponding to each picture data, taking the plurality of similar data corresponding to each picture data as an augmented picture set, and storing the augmented picture set in the high-resolution picture set.
4. The SRGAN-based super-resolution image generation method according to claim 3, characterized in that the discriminant model comprises a second preprocessing layer, a hidden layer feature extraction layer and a fully connected layer; the inputting the data in the high-resolution picture set and the pseudo high-definition images corresponding to the data into the discriminant model in the generative adversarial network model for training comprises:
taking the pseudo high-definition image as a negative example sample, taking the image data in the augmented picture set as a positive example sample, and performing shallow feature extraction processing on the positive example sample and the negative example sample through the second preprocessing layer to obtain a fourth image feature;
performing hidden layer feature extraction on the fourth image feature according to the hidden layer feature extraction layer to obtain a feature vector;
mapping the feature vector through the fully connected layer to obtain the similarity between the pseudo high-definition image and the corresponding image data in the augmented picture set;
and adjusting parameters in the discriminant model and calculating an adversarial loss function and a sample loss function in the loss function based on the similarity, the pseudo high-definition image and the image data in the augmented picture set.
5. The SRGAN-based super-resolution image generation method according to claim 4, characterized in that the hidden layer feature extraction layer is combined with a contrastive learning algorithm, and said calculating the sample loss function of the loss functions comprises:
and calculating the sample loss function of the hidden layer feature extraction layer by using a noise contrastive estimation function and an inner product function in the contrastive learning algorithm.
6. The method according to claim 2, characterized in that the acquiring a low-resolution picture set and a high-resolution picture set corresponding to the low-resolution picture set comprises:
sending a call request to a database, wherein the call request carries a signature verification token;
and receiving a signature verification result returned by the database, and, when the signature verification passes, calling the low-resolution picture set and the high-resolution picture set corresponding to the low-resolution picture set in the database.
7. The method according to claim 1, characterized in that the weighting each channel in the first image feature based on the dependency relationship of each channel in the first image feature according to the channel attention layer in the generative model to obtain a second image feature comprises:
processing the first image feature through a feature extraction layer in the channel attention layer to obtain the spatial attention, an n-dimensional feature map;
processing the spatial attention through a first pooling layer and a second pooling layer in the channel attention layer to obtain a first attention vector and a second attention vector respectively;
correspondingly adding the first attention vector and the second attention vector to obtain a pooled feature F;
converting the pooled feature F into an attention map through a sigmoid function;
performing point multiplication on the attention map and the spatial attention to obtain a residual block;
and correspondingly adding the first image feature and the residual block to obtain the second image feature.
8. An apparatus for generating a super-resolution image based on SRGAN, the apparatus comprising:
the acquisition module is used for acquiring an image to be processed;
the convolution module is used for inputting the image to be processed into a pre-trained generative adversarial network model, and performing convolution processing through a first preprocessing layer of the generative model in the pre-trained generative adversarial network model to obtain a first image feature corresponding to the image to be processed;
a channel attention module, configured to weight, according to a channel attention layer in the generative model, each channel in the first image feature based on a dependency relationship of each channel in the first image feature, to obtain a second image feature;
and the enhancement module is used for raising the resolution of the second image feature based on the sampling layers in the generative model to obtain a third image feature, and convolving the third image feature to obtain and output a super-resolution image corresponding to the image to be processed.
9. A computer device, characterized in that the computer device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer readable instructions which, when executed by the processor, implement the SRGAN-based super-resolution image generation method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, characterized in that the computer-readable instructions, when executed by a processor, implement the SRGAN-based super-resolution image generation method according to any one of claims 1 to 7.
CN202111129143.9A 2021-09-26 2021-09-26 Super-resolution image generation method, device, equipment and storage medium based on SRGAN Pending CN113837942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111129143.9A CN113837942A (en) 2021-09-26 2021-09-26 Super-resolution image generation method, device, equipment and storage medium based on SRGAN

Publications (1)

Publication Number Publication Date
CN113837942A true CN113837942A (en) 2021-12-24

Family

ID=78970398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111129143.9A Pending CN113837942A (en) 2021-09-26 2021-09-26 Super-resolution image generation method, device, equipment and storage medium based on SRGAN

Country Status (1)

Country Link
CN (1) CN113837942A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671018A (en) * 2018-12-12 2019-04-23 华东交通大学 A kind of image conversion method and system based on production confrontation network and ResNets technology
CN112529775A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Image processing method and device
CN111985405A (en) * 2020-08-21 2020-11-24 南京理工大学 Face age synthesis method and system
CN113379598A (en) * 2021-05-20 2021-09-10 山东省科学院自动化研究所 Terahertz image reconstruction method and system based on residual channel attention network
CN113378697A (en) * 2021-06-08 2021-09-10 安徽大学 Method and device for generating speaking face video based on convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAOZHONG LIU ET AL: "A Super Resolution Algorithm Based on Attention Mechanism and SRGAN Network", IEEE, 26 July 2021 (2021-07-26), pages 1 - 9 *
TING CHEN ET AL: "A Simple Framework for Contrastive Learning of Visual Representations", PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 31 December 2020 (2020-12-31), pages 1 - 11 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113869398A (en) * 2021-09-26 2021-12-31 平安科技(深圳)有限公司 Unbalanced text classification method, device, equipment and storage medium
CN114693831A (en) * 2022-05-31 2022-07-01 深圳市海清视讯科技有限公司 Image processing method, device, equipment and medium
CN114693831B (en) * 2022-05-31 2022-09-02 深圳市海清视讯科技有限公司 Image processing method, device, equipment and medium
CN114723611A (en) * 2022-06-10 2022-07-08 季华实验室 Image reconstruction model training method, reconstruction method, device, equipment and medium
CN115063876A (en) * 2022-08-17 2022-09-16 季华实验室 Image recognition rate improving method and device, electronic equipment and storage medium
CN115063876B (en) * 2022-08-17 2022-11-18 季华实验室 Image recognition rate improving method and device, electronic equipment and storage medium
CN116863016A (en) * 2023-05-31 2023-10-10 北京长木谷医疗科技股份有限公司 Medical image reconstruction method and device for generating countermeasure network based on deep learning

Similar Documents

Publication Publication Date Title
CN113837942A (en) Super-resolution image generation method, device, equipment and storage medium based on SRGAN
CN108509915B (en) Method and device for generating face recognition model
JP7490141B2 (en) IMAGE DETECTION METHOD, MODEL TRAINING METHOD, IMAGE DETECTION APPARATUS, TRAINING APPARATUS, DEVICE, AND PROGRAM
CN113313657A (en) Unsupervised learning method and system for low-illumination image enhancement
CN111767906B (en) Face detection model training method, face detection device and electronic equipment
CN109977832B (en) Image processing method, device and storage medium
WO2022166797A1 (en) Image generation model training method, generation method, apparatus, and device
WO2023035531A1 (en) Super-resolution reconstruction method for text image and related device thereof
CN114241459B (en) Driver identity verification method and device, computer equipment and storage medium
CN113569740B (en) Video recognition model training method and device, and video recognition method and device
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN114612688B (en) Countermeasure sample generation method, model training method, processing method and electronic equipment
CN112001285A (en) Method, device, terminal and medium for processing beautifying image
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
CN114299304A (en) Image processing method and related equipment
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN116758379B (en) Image processing method, device, equipment and storage medium
US20230110393A1 (en) System and method for image transformation
CN116311451A (en) Multi-mode fusion human face living body detection model generation method and device and electronic equipment
CN113780555B (en) Model training method, device, equipment and storage medium based on data enhancement
CN112529081B (en) Real-time semantic segmentation method based on efficient attention calibration
CN111695470B (en) Visible light-near infrared pedestrian re-identification method based on depth feature orthogonal decomposition
CN111815658B (en) Image recognition method and device
CN114399495A (en) Image definition calculation method, device, equipment and storage medium
CN116704588B (en) Face image replacing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination