CN113724139A - Unsupervised infrared single-image super-resolution based on a dual-discriminator generative adversarial network - Google Patents


Info

Publication number
CN113724139A
CN113724139A · Application CN202111287047.7A · Granted publication CN113724139B
Authority
CN
China
Prior art keywords
image
discriminator
hyper
resolution
images
Prior art date
Legal status
Granted
Application number
CN202111287047.7A
Other languages
Chinese (zh)
Other versions
CN113724139B (en)
Inventor
冯琳
张毅
陈霄宇
滕之杰
李怡然
何丰郴
魏驰恒
张靖远
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202111287047.7A
Publication of CN113724139A
Application granted
Publication of CN113724139B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4076 — Super-resolution using the original low-resolution images to iteratively correct the high-resolution images
    • G06T 3/4007 — Scaling based on interpolation, e.g. bilinear interpolation
    • G06T 7/11 — Region-based segmentation (image analysis; segmentation; edge detection)
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/08 — Learning methods for neural networks
    • G06T 2207/10048 — Infrared image (image acquisition modality)
    • G06T 2207/20021 — Dividing image into blocks, subimages or windows
    • G06T 2207/20132 — Image cropping (image segmentation details)


Abstract

The invention relates to unsupervised infrared single-image super-resolution based on a dual-discriminator generative adversarial network, comprising: 1. constructing a learning framework — an unpaired, unsupervised learning framework comprising a generative adversarial network and a content-constraint module, the generative adversarial network comprising a generator and a discriminator, the framework extracting infrared features from images to generate realistic super-resolved infrared images; 2. building a dual-discriminator structure, combining the modules, and creating a data set. The invention gives the super-resolved image low noise, clear edge texture and high contrast; low-frequency information is preserved by constraining a degraded version of the super-resolved image against the image enlarged in advance by interpolation; a style-texture dual discriminator reconstructs the high-frequency information of the image while keeping the overall style harmonious and unified and avoiding abnormal pixels; and the discriminator structure is improved, using a pixel-level real/fake discrimination matrix to strengthen the reconstruction of texture detail.

Description

Unsupervised infrared single-image super-resolution based on a dual-discriminator generative adversarial network
Technical Field
The invention relates to an unsupervised infrared single-image super-resolution method based on a dual-discriminator generative adversarial network, belonging to the technical field of image processing.
Background
With the progress of computer technology, the demand for high-quality infrared images is growing in many fields, such as video surveillance, medical diagnostics and remote sensing. However, owing to limitations of infrared sensor technology, such as infrared optical diffraction and non-uniform sensor response, the images acquired by infrared imaging systems often have low resolution, heavy noise and poor contrast. In addition, an infrared imaging system is far more expensive than a visible-light imaging system of comparable resolution and is therefore difficult to deploy widely. Hardware improvements, such as increasing the photosensor size and decreasing the pixel size, are the most fundamental and direct way to improve image quality, but they require long technological accumulation and cannot escape the physical limits of the sensor. With the recent development of vision-enhancement technology, super-resolution algorithms can raise image resolution and improve image quality; compared with the route of improving the sensor, they optimize imaging quality at the data level, enhance the visual quality of the data more directly, and hold great potential. Designing a super-resolution reconstruction algorithm for infrared images, to magnify infrared resolution and supplement high-frequency information, can therefore extend the limiting performance of the imaging system and effectively reduce infrared imaging cost, which is of great significance.
Single-image super-resolution is an algorithm that inversely solves a high-resolution image from a single low-resolution image, used to improve the resolution of a spatial sensor. Considering a common technique in infrared imaging systems — non-uniformity correction, which irregularly changes the imaging characteristics — single-image super-resolution is better suited to infrared reconstruction than multi-image super-resolution, which solves a high-resolution image from several low-resolution images. Neural-network-based single-image super-resolution of visible-light images has developed quite richly, but because the imaging principle and characteristics of infrared images differ from those of visible-light images, those results cannot be applied directly to infrared images. Moreover, the scarcity of infrared data further aggravates the lack of research on infrared single-image super-resolution. Because infrared images suffer from low resolution, heavy noise and poor contrast, overcoming these problems is even more urgent than the research challenges of visible-light super-resolution.
At present, most neural-network-based super-resolution builds an end-to-end supervised network that depends on a paired data set: the high-resolution original sensor image and a low-resolution image generated by bicubic-interpolation downsampling form a matched pair, and network training is supervised and constrained by a pixel-level loss. However, such supervised training strategies relying on paired data sets are not convincing. First, pixel-level supervision drives the generated pixels toward averages, so the super-resolved image is over-smoothed and unnatural to the human eye. Second, the correspondence between high- and low-resolution images is a rather complex nonlinear relation; a paired data set formed by bicubic downsampling makes the trained network prone to overfitting, and its super-resolution performance on real images is poor. Addressing the over-smoothing caused by pixel-level supervision, Christian et al. introduced a generative adversarial network and made the super-resolved image closer to a real image through joint constraints such as pixel, adversarial and feature constraints. Later work continued this idea, optimizing the super-resolved image by improving the constraint scheme or the network model. However, the super-resolved image still differs markedly from a natural image — for example, it contains many abnormal pixels — so the constraining effect of the discriminator needs further improvement.
Addressing the overfitting caused by paired data sets, the mainstream strategy is essentially to design more complex or more realistic low-resolution counterparts of the high-resolution images, enriching the composition of the paired data set and improving the characteristics of the super-resolved image produced when a real low-resolution image is fed into the network. It is therefore natural to search for unsupervised super-resolution algorithms that do not rely on paired data sets. Yuan Yuan et al. proposed the CinCGAN network and Zhi-Song Liu et al. proposed the dSRVAE network to investigate this problem. However, CinCGAN relies on an existing supervised super-resolution model for fine-tuning, and the constraint in dSRVAE, similar to iterative back-projection, may blur the image. In addition, both works target visible-light images, and their visual and metric gains partly come from a denoising module placed before the super-resolution module, which is unsuitable for infrared images with low resolution, poor contrast and low signal-to-noise ratio. Unsupervised infrared super-resolution algorithms independent of paired data sets therefore still leave room for research and breakthroughs.
Disclosure of Invention
To solve the above technical problems, the invention provides unsupervised infrared single-image super-resolution based on a dual-discriminator generative adversarial network, with the following specific technical scheme:
Unsupervised infrared single-image super-resolution based on a dual-discriminator generative adversarial network, comprising the following steps:
Step 1: construct the learning framework. Build an unpaired, unsupervised learning framework comprising a generative adversarial network and a content-constraint module, the generative adversarial network comprising a generator and a discriminator; the framework extracts infrared features from images to generate realistic super-resolved infrared images.
Step 2: build the dual-discriminator structure. Construct the dual-discriminator module, improving the constraining ability of the discriminator and strengthening both the global structural characteristics and the local high-frequency details in the generation of the super-resolved image. The dual discriminator consists of a style discriminator and a texture discriminator: the style discriminator reads a large image block and gives a global real/fake judgment, aiming to suppress the generation of abnormal pixels and maintain the unified coordination of the image's overall visual perception; the texture discriminator reads a small image block and gives a local real/fake judgment, aiming to improve the reconstruction of texture details. In addition, the style discriminator uses the absolute probability, as shown in equations (1) and (2):
D(L_r) = σ(C_style(L_r))    (1)

D(L_f) = σ(C_style(L_f))    (2)
where L_r is a large image block cropped from a real image, L_f is a large image block cropped from a super-resolved image, D(·) denotes the final output of the style discriminator, C_style denotes the style-discriminator network, and σ denotes the sigmoid function. Unlike the style discriminator, the texture discriminator uses the relative average probability to guide the generator toward more realistic texture information, as shown in equations (3) and (4):
D_Ra(S_r, S_f) = σ(C_texture(S_r) − E[C_texture(S_f)])    (3)

D_Ra(S_f, S_r) = σ(C_texture(S_f) − E[C_texture(S_r)])    (4)
where S_r is a small image block cropped from a real image, S_f is a small image block cropped from a super-resolved image, D_Ra(·) denotes the final output of the texture discriminator, C_texture denotes the texture-discriminator network, and E[·] denotes the mean.
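As a concrete illustration of the two output rules above, the following NumPy sketch computes the absolute probability of the style discriminator and the relative average probability of the texture discriminator from raw (pre-sigmoid) scores. The score arrays stand in for the outputs of the convolutional networks C_style and C_texture and are illustrative values, not taken from the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def style_probability(c_scores):
    # Absolute probability, eqs (1)-(2): D(L) = sigmoid(C_style(L)).
    return sigmoid(c_scores)

def texture_probability(c_this, c_other):
    # Relative average probability, eqs (3)-(4):
    # D_Ra(S_a, S_b) = sigmoid(C_texture(S_a) - E[C_texture(S_b)]).
    return sigmoid(c_this - np.mean(c_other))

# Illustrative raw scores for three cropped patches each.
real_scores = np.array([2.0, 1.5, 2.5])    # patches cropped from real images
fake_scores = np.array([-1.0, -0.5, 0.0])  # patches cropped from super-resolved images

p_abs = style_probability(real_scores)
p_rel = texture_probability(real_scores, fake_scores)
```

Each real patch is thus judged not in isolation but relative to the average score of the opposing (super-resolved) patches.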
thus, the generation of the style discriminator counteracts the loss function as shown in equations (5) and (6)
Figure 118858DEST_PATH_IMAGE005
(5)
Figure 879003DEST_PATH_IMAGE006
(6)
where R denotes the real-image domain and F denotes the super-resolved-image domain.
generation of texture discriminator against loss function is shown in equations (7) and (8)
Figure 332987DEST_PATH_IMAGE007
(7)
Figure 469571DEST_PATH_IMAGE008
(8);
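The adversarial losses (5) through (8) can be sketched numerically as follows. This is a hedged reconstruction assuming the standard GAN log-loss form for the style branch and the relativistic-average form for the texture branch; the raw score arrays again stand in for the discriminator networks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def style_losses(c_real, c_fake):
    # Eqs (5)-(6): standard GAN losses on absolute probabilities.
    d_real, d_fake = sigmoid(c_real), sigmoid(c_fake)
    loss_d = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))
    loss_g = -np.mean(np.log(d_fake))
    return float(loss_d), float(loss_g)

def texture_losses(c_real, c_fake):
    # Eqs (7)-(8): relativistic average adversarial losses.
    d_rf = sigmoid(c_real - np.mean(c_fake))  # real patches relative to fakes
    d_fr = sigmoid(c_fake - np.mean(c_real))  # fake patches relative to reals
    loss_d = -np.mean(np.log(d_rf)) - np.mean(np.log(1.0 - d_fr))
    loss_g = -np.mean(np.log(d_fr)) - np.mean(np.log(1.0 - d_rf))
    return float(loss_d), float(loss_g)
```

When the discriminator separates real and fake scores well, its own loss is small while the generator loss is large, which is what drives the generator update.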
Step 3: combine the modules. The content-constraint module and the style-texture discriminator module are combined to constrain network convergence. The content-constraint module keeps the low-frequency information intact and unomitted during super-resolution, preserving the basic content of the image: the low-frequency information of the super-resolved image is constrained to be consistent with the image enlarged in advance by interpolation. The constraint between the image containing only the low-frequency information of the super-resolved image and the pre-interpolated image is imposed by minimizing the mean squared error, as shown in equation (9):
L_content = (1 / (W·H)) · Σ_{i=1}^{W} Σ_{j=1}^{H} ( ψ(f_θ(x))_{i,j} − U(x)_{i,j} )²    (9)

where L_content denotes the content constraint, x is the sensor image, r is a pixel-based r-order estimate, U is the interpolation function, W and H are the target sizes, ψ is the function extracting the low-frequency information of the super-resolved image, and f_θ is the super-resolution network. The total loss function is shown in equation (10):
L_total = α·L_content + β·L_style + γ·L_texture    (10)

where α, β and γ denote the weights of the respective constraints, L_style denotes the style-discriminator constraint, and L_texture denotes the texture-discriminator constraint.
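A minimal sketch of the weighted combination in equation (10). The default weight values below are illustrative assumptions, not values specified by the patent.

```python
def total_loss(l_content, l_style, l_texture,
               alpha=1.0, beta=0.05, gamma=0.05):
    """Eq. (10): weighted sum of the content, style-discriminator and
    texture-discriminator constraints. Default weights are illustrative
    assumptions, not values from the patent."""
    return alpha * l_content + beta * l_style + gamma * l_texture
```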
and 4, step 4: the method comprises the steps of creating a data set, creating an infrared super-resolution data set and providing a non-reference evaluation method for evaluating infrared hyper-resolution images, wherein the data set comprises a simulation data set and an infrared image data set, the simulation data set carries out quantitative and qualitative analysis on an ablation experiment, and the infrared image data set compares the reconstruction effect of a hyper-resolution algorithm on a real image.
Further, the unpaired unsupervised learning framework of step 1 is provided with an unsupervised super-resolution network for converting low-resolution images into super-resolved images; the real images undergo no downsampling and enter the unsupervised super-resolution network directly to obtain super-resolved images.
Further, the content-constraint module keeps the low-frequency information intact and unomitted during super-resolution, preserving the content information of the image.
Further, the simulation data set comprises a training set and a test set whose data do not overlap; after size scaling, the pixels of the test set's images at different resolutions correspond to each other exactly.
Further, the training set comprises 400 original images, and the test set comprises 100 original images and 100 high-definition images.
The beneficial effects of the invention are:
The invention gives the super-resolved image low noise, clear edge texture and high contrast; low-frequency information is preserved by constraining a degraded version of the super-resolved image against the image enlarged in advance by interpolation; the style-texture dual discriminator reconstructs the high-frequency information of the image while keeping the overall style harmonious and unified and avoiding abnormal pixels; and the discriminator structure is improved, using a pixel-level real/fake discrimination matrix to strengthen the reconstruction of texture-detail information.
Drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the supervised training method;
Fig. 3 is a schematic diagram of the unsupervised training method of the invention;
Fig. 4 is a schematic diagram of the infrared super-resolution algorithm based on a generative adversarial network;
Fig. 5 is a simplified structural diagram of the DRRN;
Fig. 6 shows simulated images from the simulation data set;
Fig. 7 shows infrared images from the infrared image data set;
Fig. 8 is a qualitative assessment of different framework configurations on the simulation data set;
Fig. 9 is a schematic illustration of narrow-band extraction;
Fig. 10 is a PSNR quantitative evaluation for narrow bands of different widths;
Fig. 11 is schematic illustration I of a super-resolved image;
Fig. 12 is schematic illustration II of a super-resolved image;
Fig. 13 is schematic illustration III of a super-resolved image.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematics illustrating only the basic structure of the invention, and thus show only the parts relevant to it.
As shown in fig. 1, the invention performs unsupervised infrared single-image super-resolution based on a dual-discriminator generative adversarial network. The proposed network model is mainly inspired by PatchGAN, EnlightenGAN and U-Net GAN. PatchGAN proposed that a discriminator can better constrain high-frequency information on the basis of image blocks (each far smaller than the original image): concretely, the discriminator traverses every N×N block of the image, judges the real/fake probability of each block, and constrains high-frequency information with the average block probability. EnlightenGAN proposed a global-local discriminator structure, in which the global discriminator enhances the brightness of the original image over its whole extent while small image blocks adaptively enhance local regions. U-Net GAN built a discriminator with a U-Net structure: an encoder progressively downsamples the input and extracts global information for judgment, then a decoder progressively upsamples it to a probability matrix of the original input size for pixel-level judgment, so the discriminator constrains both the global and the local characteristics of the image.
The training and evaluation of a supervised super-resolution model can be summarized as: take real images as the high-resolution images, downsample them to obtain low-resolution images, and build a paired data set. During training, the super-resolution model generates a super-resolved image from the low-resolution image, and the generation is constrained by the corresponding real high-resolution image; during evaluation, the low-resolution image is fed into the model to generate a super-resolved image, which is compared with the corresponding original image to give a reference-based metric. Obviously, a model obtained by this mode of training and evaluation cannot guarantee its super-resolution effect on real images. To improve performance on real images, the invention proposes a new mode of training and evaluation: no paired data set is built; model training performs pair-free unsupervised learning directly on the original images; and model evaluation does not rely on reference metrics but directly performs no-reference visual evaluation on the super-resolution results of real images. The differences from the supervised scheme are explained in three respects: data-set composition, network training and network evaluation. Assume a real image set HR^{W×H×C} = {Y_i | i = 1, 2, 3, …, k} of k real images, where W and H are the image sizes and C is the number of image channels. Supervised super-resolution training relies on a paired data set S = {(X_i, Y_i) | i = 1, 2, 3, …, k}, where the high-resolution images are the real images themselves and each low-resolution image X_i ∈ LR^{W/α × H/α × C} is generated from the corresponding real image by bicubic-interpolation downsampling, α being the magnification factor. In contrast, the method proposed here does not rely on the paired data set S but directly on the real image set HR^{W×H×C}; throughout training and testing, the low-resolution images X_i ∈ LR^{W/α × H/α × C} never appear. Model training aims to obtain, from the training data, a network model parameterized by a parameter set θ that realizes super-resolution reconstruction:

Ŷ = f_θ(X)
where the parameter set θ = {W_{1:L}, b_{1:L}} represents the network weights and biases of each layer of an L-layer network, and θ is constrained by a loss function L_SR. As shown in fig. 2, the traditional super-resolution training method is a supervised learning process relying on a paired data set, performing super-resolution reconstruction from a low-resolution image x to a real image y; concretely, θ is constrained by the paired data set S according to:

θ̂ = argmin_θ (1/k) Σ_{i=1}^{k} L_SR( f_θ(X_i), Y_i )

On the contrary, the super-resolution training method proposed by the invention, shown in fig. 3, performs super-resolution reconstruction directly on the real images and is unsupervised learning independent of the paired data set; concretely, θ is constrained by the real image set HR^{W×H×C} according to:

θ̂ = argmin_θ (1/k) Σ_{i=1}^{k} L_SR( f_θ(Y_i), Ỹ_i )

where Ỹ_i is any randomly selected image of the real image set HR^{W×H×C} other than Y_i.
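The unpaired constraint above can be sketched as a single training step: the generator super-resolves a real image directly, and the loss compares the result with a different, randomly chosen real image. `f_theta` and `loss_fn` are placeholders for the generator network and the joint loss, so this is a structural sketch of the training mode, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(f_theta, loss_fn, real_images):
    """One unpaired, unsupervised step: a real image enters the
    super-resolution network directly (no downsampling), and the loss
    compares the output with a different, randomly selected real image.
    real_images plays the role of the set HR; needs at least 2 images."""
    i = int(rng.integers(len(real_images)))
    j = int(rng.integers(len(real_images)))
    while j == i:  # pick a real image other than Y_i
        j = int(rng.integers(len(real_images)))
    sr = f_theta(real_images[i])  # super-resolved output
    return loss_fn(sr, real_images[i], real_images[j])
```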
Supervised super-resolution learning is a reconstruction process from a low-resolution image to a real image, so its performance evaluation usually relies on reference-based metrics such as PSNR and SSIM. Concretely, for a real image outside the real image set, the low-resolution input to the model is obtained by bicubic-interpolation downsampling of that real image, the model produces a super-resolved image, and the evaluation result is obtained by comparing it with the real image. Differently, the super-resolution model of the invention learns the reconstruction of real images themselves and has no reference image for evaluation, so the traditional no-reference evaluation methods of the infrared field are chosen to assess the model. In general, the low-frequency information of an image forms its basic gray levels and main edge structure, determining the basic structure, while the high-frequency information forms the edges and details, further enriching the content. According to the theory and practice of residual learning in super-resolution, the high- and low-resolution images share the same low-frequency information, and the high-resolution image contains more high-frequency information; hence super-resolution reconstruction is the process of recovering, from a low-resolution image carrying the low-frequency information, a high-resolution image containing more high-frequency information.
The invention proposes an infrared super-resolution algorithm based on a generative adversarial network, as shown in fig. 4. The generator is dedicated to restoring and generating high-frequency information, constrained jointly by a content-constraint module that preserves low-frequency information and a style-texture dual-discriminator module that supervises the generation of high-frequency information. Training is unsupervised learning in which only the original images participate, independent of any paired data set. This is the general structure of the proposed UISR: the original image is pre-interpolated to the target size before being fed into the generator; the low-frequency information is constrained between the super-resolved image and the interpolated image; and the high-frequency information is constrained by the style-texture dual discriminator.
In view of the difficulty of unsupervised learning, the invention enlarges the image size by pre-upsampling, reducing the generation difficulty and letting the generator concentrate on learning high-frequency information; in other words, the original image is already interpolated to the target size before entering the generator. Any generator can therefore be used, as long as its input and output sizes are the same and it can restore high-frequency information. The DRRN is taken here as an example generator for practical tests. Fig. 5 shows a simplified structure of the DRRN: in practice the recursive module outlined by the red dashed box uses 25 residual units outlined by green dashed boxes, while the simplified structure in the figure contains 2 such units. Within the recursive module, the corresponding convolutional layers of the residual units share the same weights; ⊕ denotes element-wise addition.
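The shared-weight recursion of the simplified DRRN can be illustrated with a 1-D toy stand-in, where a single scalar weight plays the role of the shared convolutional layers. This is an assumption-laden sketch of the recursion pattern only (shared weights, residual addition back to the block input), not the actual network.

```python
import numpy as np

def drrn_recursive_block(x, w, n_units=2):
    """Toy 1-D stand-in for the simplified DRRN recursion: n_units
    residual units all share the same weight w (standing in for the
    shared convolutional layers), and each unit adds its activation
    back to the block input x (element-wise addition)."""
    h = x
    for _ in range(n_units):
        h = x + np.maximum(0.0, w * h)  # shared weight across all units
    return h
```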
The content-constraint module exploits the fact that the low-frequency information shared by the high- and low-resolution images determines the basic frame and composition of the image, i.e. its basic content; it keeps this low-frequency information intact and unomitted during super-resolution, preserving the basic content of the image. Concretely, the low-frequency information of the super-resolved image is constrained to be consistent with the image enlarged in advance by interpolation. The low-frequency information of the super-resolved image is extracted by Fourier transform: the super-resolved image is mapped from the spatial domain to the frequency domain by Fourier transform, the low-frequency information is moved to the centre of the frequency domain by a frequency shift, the low-frequency information within a certain range of the centre is kept, and inverse Fourier transform yields an image containing only the low-frequency information. The constraint between this low-frequency-only image and the pre-interpolated image is imposed by minimizing the mean squared error:
$$\mathcal{L}_{content}=\frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\Big(\psi\big(f_{\theta}(x)\big)_{i,j}-U(x)_{i,j}\Big)^{2}$$
where x is the sensor image, r is the magnification factor of the super-resolution, U is the interpolation function, W and H are the target sizes, ψ is the function extracting the low-frequency information of the super-resolved image, and f<sub>θ</sub> is the super-resolution network.
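The Fourier-transform extraction of low-frequency information and the mean-square-error constraint can be sketched as follows (NumPy; the window radius `keep` and the function names are illustrative assumptions, not values fixed by the patent):

```python
import numpy as np

def low_frequency(img: np.ndarray, keep: int) -> np.ndarray:
    """psi: extract the low-frequency content of an image.
    FFT -> shift low frequencies to the centre -> keep a (2*keep x 2*keep)
    central window -> inverse FFT, as described in the text."""
    F = np.fft.fftshift(np.fft.fft2(img))
    mask = np.zeros_like(F)
    cy, cx = F.shape[0] // 2, F.shape[1] // 2
    mask[cy - keep:cy + keep, cx - keep:cx + keep] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def content_loss(sr: np.ndarray, upsampled: np.ndarray, keep: int = 8) -> float:
    """Mean-squared error between the low-frequency part of the super-resolved
    image and the pre-interpolated image (the content constraint)."""
    diff = low_frequency(sr, keep) - upsampled
    return float(np.mean(diff ** 2))
```

A smooth image passed as both arguments yields a near-zero loss, since its energy lies inside the retained low-frequency window.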
Style-texture dual discriminator. The common practice of generative adversarial networks in super-resolution is to randomly crop a small image block (of size 48 × 48 multiplied by the magnification factor) from the real image and from the super-resolved image, obtain from each a single probability that it is a real sample, and apply the adversarial constraint through these probabilities. This approach has two problems: first, constraining only small image blocks may cause style differences at the global level of the image; second, a small image block corresponds to only one probability value, which may make the constraint on texture detail features too weak. Aiming at these two problems, the invention builds a dual-scale discriminator structure to realize cooperative constraint of the large-scale overall style and the small-scale texture details, and improves the discriminator structure to strengthen the constraint on texture detail information. The dual discriminator is composed of a style discriminator and a texture discriminator: the style discriminator reads large image blocks and gives a global real/fake judgment, aiming to suppress the generation of abnormal pixels and keep the overall visual perception of the image uniform and coordinated; the texture discriminator reads small image blocks and gives local real/fake judgments, aiming to improve the reconstruction of texture details. Inspired by PatchGAN, the texture discriminator structure is optimized: after an image block is input, an equal-size probability matrix is output, i.e., instead of a single probability that the block is a real sample, a probability that each pixel is real is given, thereby strengthening the texture detail constraint.
The style-texture dual discriminator can effectively ensure that the overall style of the super-resolved image is coordinated and unified without abnormal regions, while enhancing edges and texture patterns to make the image clearer. The style discriminator uses absolute probabilities, given by:
$$D(L_r)=\sigma\big(C_{style}(L_r)\big)$$
$$D(L_f)=\sigma\big(C_{style}(L_f)\big)$$
where L<sub>r</sub> is a large image block cropped from a real image, L<sub>f</sub> is a large image block cropped from a super-resolved image, D(·) denotes the final output of the style discriminator, C<sub>style</sub> denotes the style discriminator network, and σ denotes the sigmoid function. Unlike the style discriminator, the texture discriminator uses the recently proposed relative average probability, which estimates the probability that a real image is more realistic than a super-resolved image and that a super-resolved image is more fake than a real image; this better guides the generator to generate realistic texture information. The specific formulas are as follows:
$$D_{Ra}(S_r,S_f)=\sigma\big(C_{texture}(S_r)-\mathbb{E}_{S_f}[C_{texture}(S_f)]\big)$$
$$D_{Ra}(S_f,S_r)=\sigma\big(C_{texture}(S_f)-\mathbb{E}_{S_r}[C_{texture}(S_r)]\big)$$
where S<sub>r</sub> is a small image block cropped from the real image, S<sub>f</sub> is a small image block cropped from the super-resolved image, D<sub>Ra</sub>(·) denotes the final output of the texture discriminator, C<sub>texture</sub> denotes the texture discriminator network, E[·] denotes the mean, and σ denotes the sigmoid function. Thus, the adversarial loss functions of the style discriminator may be defined as follows:
$$\mathcal{L}_{D}^{style}=-\mathbb{E}_{L_r\sim R}\big[\log D(L_r)\big]-\mathbb{E}_{L_f\sim F}\big[\log\big(1-D(L_f)\big)\big]$$
$$\mathcal{L}_{G}^{style}=-\mathbb{E}_{L_f\sim F}\big[\log D(L_f)\big]$$
where R denotes the real-image domain and F denotes the super-resolved-image domain. Similarly, the adversarial loss functions of the texture discriminator can be defined as follows:
$$\mathcal{L}_{D}^{texture}=-\mathbb{E}_{S_r\sim R}\big[\log D_{Ra}(S_r,S_f)\big]-\mathbb{E}_{S_f\sim F}\big[\log\big(1-D_{Ra}(S_f,S_r)\big)\big]$$
$$\mathcal{L}_{G}^{texture}=-\mathbb{E}_{S_r\sim R}\big[\log\big(1-D_{Ra}(S_r,S_f)\big)\big]-\mathbb{E}_{S_f\sim F}\big[\log D_{Ra}(S_f,S_r)\big]$$
where R denotes the real-image domain and F denotes the super-resolved-image domain. The content constraint module and the style-texture discriminator module jointly constrain network convergence; the total loss function can be summarized as:
$$\mathcal{L}_{total}=\alpha\,\mathcal{L}_{G}^{style}+\beta\,\mathcal{L}_{G}^{texture}+\gamma\,\mathcal{L}_{content}$$
where α, β and γ represent the weights of the respective constraints.
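The discriminator outputs and the generator-side adversarial losses above can be sketched numerically as follows (NumPy; the non-saturating and relativistic-average (RaGAN-style) instantiations, the numerical epsilon, and the function names are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def style_prob(c_patch):
    """Style discriminator: absolute probability D(L) = sigmoid(C_style(L)).
    `c_patch` is the raw score the style network assigns to a large block."""
    return sigmoid(c_patch)

def texture_prob(c_a, c_b):
    """Texture discriminator: relativistic average probability
    D_Ra(a, b) = sigmoid(C_texture(a) - E[C_texture(b)]); `c_a` and `c_b` are
    per-pixel score maps (PatchGAN-style equal-size matrices)."""
    return sigmoid(c_a - np.mean(c_b))

def g_loss_style(c_fake):
    """Generator-side style adversarial loss: -E[log D(L_f)]."""
    return float(-np.mean(np.log(style_prob(c_fake) + 1e-12)))

def g_loss_texture(c_real, c_fake):
    """Generator-side relativistic average loss:
    -E[log(1 - D_Ra(S_r, S_f))] - E[log D_Ra(S_f, S_r)]."""
    return float(-np.mean(np.log(1.0 - texture_prob(c_real, c_fake) + 1e-12))
                 - np.mean(np.log(texture_prob(c_fake, c_real) + 1e-12)))

def total_loss(l_style, l_texture, l_content,
               alpha=0.001, beta=0.001, gamma=1.0):
    """Weighted sum of the constraints, with the weights used in the experiments."""
    return alpha * l_style + beta * l_texture + gamma * l_content
```

With zero scores everywhere each sigmoid returns 0.5, so each adversarial term reduces to -log(0.5) per expectation, which is a convenient sanity check.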
To verify the reliability and effectiveness of the method, the invention designs a simulated data set and performs an ablation experiment, collects real infrared images, and compares the super-resolution effect with other super-resolution methods. Data sets and implementation details. The experimental part designs and produces two groups of image data: one is a manually produced simulated data set used for quantitative and qualitative evaluation in the ablation experiment; the other is a real infrared image data set collected by an infrared sensor, used to compare the reconstruction effect of super-resolution algorithms on real images. Since the invention performs super-resolution directly on the original image, there is no reference image for quantitative evaluation; although other no-reference indices could be used, for rigor and completeness of scheme verification the invention designs a simulated data set to overcome the problem of quantitative evaluation without a reference image. The simulated data set consists of two parts: the training set contains only 400 original images (640 × 640 resolution), and the test set contains 100 original images (640 × 640 resolution) and 100 high-definition images (1280 × 1280 resolution). Part of the simulated data set is shown in fig. 6. The training-set and test-set data do not overlap, and because the data are generated by a program, accurate pixel correspondence after size scaling can be achieved between the test-set images of different resolutions.
The real infrared image data set was collected by the same commercial uncooled long-wave infrared camera in different scenes; the scene environments are rich, including a large number of people, streets, trees, buildings and so on, and part of the real infrared image data set is shown in fig. 7. The original images (640 × 512 resolution) are divided into two parts, 900 as the training set and 100 as the test set. The simulated data set and the real infrared data set use the same implementation scheme during training: large image blocks of 256 × 256 resolution are randomly cropped from the training-set originals and fed to the network, and within the network 6 small image blocks of 96 × 96 resolution are randomly cropped from each large block; the large and small blocks together form the entire data source of the training part. The weights of the total loss function are set as α = 0.001, β = 0.001, γ = 1, η = 0.001. The network experiments were implemented on an NVIDIA 2080Ti GPU. An Adam optimizer was used, with the learning rate of each layer set to 0.0001, the batch size to 6, and the number of epochs to 500.
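The patch-sampling scheme just described (one random 256 × 256 block per image, then six random 96 × 96 blocks inside it) can be sketched as follows (NumPy; the function name and the fixed seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility of the sketch

def sample_patches(img: np.ndarray, large: int = 256,
                   small: int = 96, n_small: int = 6):
    """Crop one random `large` x `large` block from the original image, then
    crop `n_small` random `small` x `small` blocks from inside it.
    Returns (large_patch, list_of_small_patches)."""
    H, W = img.shape
    y = rng.integers(0, H - large + 1)
    x = rng.integers(0, W - large + 1)
    big = img[y:y + large, x:x + large]
    smalls = []
    for _ in range(n_small):
        sy = rng.integers(0, large - small + 1)
        sx = rng.integers(0, large - small + 1)
        smalls.append(big[sy:sy + small, sx:sx + small])
    return big, smalls
```

The large block feeds the style discriminator and the small blocks feed the texture discriminator, matching the dual-scale constraint described earlier.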
Ablation experiments. To verify the reliability and effectiveness of the scheme herein, this section designs four comparison structures to be compared on the simulated images. The first structure is a supervised network relying on a paired data set; the network model is the same as the generator herein. Low-resolution images are generated from the training-set originals by bicubic-interpolation down-sampling to form the paired data set; the low-resolution images are super-resolved by the network, and convergence is constrained by minimizing the mean square error between the super-resolved images and the originals. The second structure is a super-resolution network that depends neither on a paired data set nor on a generative adversarial network; the network model is the same as the generator. The original image is super-resolved into a high-resolution image, low-frequency information is extracted from it to generate a degraded image, and convergence is constrained by minimizing the mean square error between the degraded image and the original. The third structure is a super-resolution network independent of a paired data set that performs adversarial generation only on small image blocks, i.e., the style discriminator is removed from the proposed structure; only small image blocks randomly cropped from the original image are fed to the network, and the same combination of content constraint, feature constraint and adversarial constraint is used to constrain convergence.
The fourth structure is a super-resolution network independent of a paired data set in which the texture discriminator gives only a single real/fake judgment, i.e., on the basis of the proposed structure the output of the texture discriminator is changed to a single judgment, with the other constraints unchanged, jointly constraining network convergence. After the proposed model and the four structures are trained to convergence, the super-resolution effect is tested with the original images of the test set and evaluated against the high-definition data of the test set; the specific evaluation results are shown in table 1.
TABLE 1 quantitative evaluation of different structures with respect to PSNR index
| Structure   | Supervision | Style discriminator | Texture discriminator | PSNR      |
|-------------|-------------|---------------------|-----------------------|-----------|
| Structure 1 | √           | ×                   | ×                     | 45.586581 |
| Structure 2 | ×           | ×                   | ×                     | 41.823485 |
| Structure 3 | ×           | ×                   | √                     | 42.336435 |
| Structure 4 | ×           | √                   | ×                     | 43.290939 |
| Ours        | ×           | √                   | √                     | 43.944832 |
Meanwhile, partial details of some images are given in fig. 8 for visual comparison. It can be seen that, compared with the complete structure proposed herein: Structure 1 is a typical supervised super-resolution network relying on a paired data set; although it has an advantage in the reference index, its actual visual effect is not ideal, and regions of abrupt gradient change tend to be smooth and blurred. Structure 2 does not rely on adversarial learning, cannot reasonably constrain the high-frequency information of the image, and its reference index is too low. Structure 3 lacks the overall style constraint; abnormal pixels may appear in local areas, making the overall style of the image non-uniform. Structure 4 does not control image details accurately enough, and texture information of part of the image is lost. To further verify the advantage of the complete structure over Structures 3 and 4, narrow bands of different widths are selected along regions of abrupt grey-level change, and the reconstruction effect is compared within these bands; the comparison results in figs. 9 and 10 show that the proposed structure has stronger high-frequency recovery performance for texture detail information, i.e., regions of abrupt grey-level change. The proposed model is also compared with a number of typical competitive methods, including conventional bicubic interpolation, DRRN, WDSR, SRGAN and dSRVAE.
The comparison networks use official source code with some adjustments as required: the DRRN model reconstructs the Y channel of visible-light images and is modified to a single channel for reconstructing infrared images; the dSRVAE source code is adjusted according to its paper so that only the unsupervised super-resolution module is called. After each model is trained, it is tested with the test-set images of the real infrared data set. Since there is no reference image, classical no-reference evaluation indices in the infrared imaging field are selected for qualitative analysis of each method's results: 1. Variance (V): reflects the amount of high-frequency content in an image; the larger the variance, the higher the contrast. 2. Edge strength (ES): reflects the sharpness of edges in an image, i.e., the gradient magnitude at edge points. 3. Information entropy (IE): measures the amount of information contained in an image; the larger the entropy, the richer the information and the better the enhancement effect. The specific evaluation results are shown in table 2.
Table 2 quantitative evaluation of different hyper-resolution methods of infrared images according to variance, edge intensity and information entropy
| Metric              | BICUBIC     | DRRN        | WDSR        | SRGAN       | DSRVAE      | Ours        |
|---------------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Variance            | 2654.619948 | 2695.399883 | 2697.640322 | 2703.637317 | 2634.992147 | 2741.004498 |
| Edge strength       | 60.68146667 | 69.58330345 | 70.0297023  | 74.14867241 | 60.57164368 | 71.25566207 |
| Information entropy | 7.125904598 | 7.145049425 | 7.140949425 | 7.148155172 | 7.127842529 | 7.145855172 |
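The three no-reference indices can be sketched as follows (NumPy; the Sobel operator for edge strength and the 8-bit grey-level histogram for entropy are assumed implementations, since the patent does not fix the exact operators):

```python
import numpy as np

def variance(img: np.ndarray) -> float:
    """Variance (V): spread of grey levels; larger means higher contrast."""
    return float(np.var(img))

def edge_strength(img: np.ndarray) -> float:
    """Edge strength (ES): mean gradient magnitude, sketched with Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    def conv(a, k):
        # valid-mode 3x3 correlation without external dependencies
        H, W = a.shape
        out = np.zeros((H - 2, W - 2))
        for i in range(3):
            for j in range(3):
                out += k[i, j] * a[i:i + H - 2, j:j + W - 2]
        return out
    gx, gy = conv(img, kx), conv(img, ky)
    return float(np.mean(np.hypot(gx, gy)))

def information_entropy(img: np.ndarray) -> float:
    """Information entropy (IE) of an 8-bit image: -sum(p * log2 p) over the
    grey-level histogram; richer images score higher."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```

A constant image scores zero on all three indices, while an image using all 256 grey levels equally reaches the maximum entropy of 8 bits.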
From the indices it can be found that SRGAN and the proposed method are the most competitive; however, judging from the visual impression of the real images below, the edge strength and information entropy of SRGAN are to a great extent caused by the network model artificially amplifying image noise, while the proposed method obtains good scores on every no-reference index without introducing additional artificial pixels. Partial super-resolved images of each method can be viewed in figs. 11, 12 and 13. It can be seen that the bicubic-interpolation result is blurred with poor visual impression, and dSRVAE, which couples iterative back-projection enhancement with interpolation, is likewise blurred; the super-resolved images of DRRN and WDSR have similar visual effects, with problems such as noise amplification and image ghosting; although parts of the SRGAN results sometimes appear sharper, the images show many mosaic artifacts overall, giving a messy, low-quality impression; the proposed method suppresses image noise to a great extent, keeps the image appearance realistic, has clearer texture and detail, improves contrast to a certain extent, and achieves the best visual effect.
The invention provides an unsupervised infrared image super-resolution method independent of a paired data set. The method aims to improve the super-resolution effect on real infrared images, so that the super-resolved image has low noise, clear edge texture and high contrast. The success of the approach rests on three keys: 1. low-frequency information is preserved by constraining the degraded version of the super-resolved image against the pre-interpolated image; 2. the style-texture dual discriminator reconstructs the high-frequency information of the image while keeping the overall style harmonious and unified and avoiding abnormal pixels; 3. the discriminator structure is improved, using a pixel-level real/fake judgment matrix to strengthen the reconstruction of texture detail information. The ablation experiment verifies that the scheme is effective; compared with other super-resolution methods on real infrared images, the proposed method obtains better evaluation results, and its results have good visual quality. In the field of infrared imaging, the method has strong prospects for technical application and commercialization.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (5)

1. An unsupervised infrared single-image super-resolution method based on a dual-discriminator generative adversarial network, characterized by comprising the following steps:
step 1: constructing a learning framework: an unpaired unsupervised learning framework is constructed, comprising a generative adversarial network and a content constraint module, the generative adversarial network comprising a generator and a discriminator; the unpaired unsupervised learning framework extracts infrared features from images to generate realistic super-resolved infrared images;
step 2: building a dual-discriminator structure: a dual-discriminator module structure is built and the constraint capability of the discriminator is improved, enhancing global structural features and local high-frequency details during generation of the super-resolved image; the dual discriminator is composed of a style discriminator and a texture discriminator; the style discriminator reads large image blocks and gives a global real/fake judgment, aiming to suppress the generation of abnormal pixels and maintain unified coordination of the overall visual perception of the image; the texture discriminator reads small image blocks and gives local real/fake judgments, aiming to improve the reconstruction of texture details; in addition, the style discriminator uses absolute probability, as shown in formulas (1) and (2)
$$D(L_r)=\sigma\big(C_{style}(L_r)\big)\quad(1)$$
$$D(L_f)=\sigma\big(C_{style}(L_f)\big)\quad(2)$$
in the formulas, L<sub>r</sub> is a large image block cropped from a real image, L<sub>f</sub> is a large image block cropped from a super-resolved image, D(·) represents the final output of the style discriminator, C<sub>style</sub> represents the style discriminator network, and σ represents the sigmoid function; unlike the style discriminator, the texture discriminator uses the relative average probability to guide the generator to generate more realistic texture information, as shown in formulas (3) and (4)
$$D_{Ra}(S_r,S_f)=\sigma\big(C_{texture}(S_r)-\mathbb{E}_{S_f}[C_{texture}(S_f)]\big)\quad(3)$$
$$D_{Ra}(S_f,S_r)=\sigma\big(C_{texture}(S_f)-\mathbb{E}_{S_r}[C_{texture}(S_r)]\big)\quad(4)$$
in the formulas, S<sub>r</sub> is a small image block cropped from a real image, S<sub>f</sub> is a small image block cropped from a super-resolved image, D<sub>Ra</sub>(·) represents the final output of the texture discriminator, C<sub>texture</sub> represents the texture discriminator network, and E[·] represents the mean;
thus, the generation of the style discriminator counteracts the loss function as shown in equations (5) and (6)
$$\mathcal{L}_{D}^{style}=-\mathbb{E}_{L_r\sim R}\big[\log D(L_r)\big]-\mathbb{E}_{L_f\sim F}\big[\log\big(1-D(L_f)\big)\big]\quad(5)$$
$$\mathcal{L}_{G}^{style}=-\mathbb{E}_{L_f\sim F}\big[\log D(L_f)\big]\quad(6)$$
Wherein R represents the real image domain, and F represents the super-resolution image domain;
generation of texture discriminator against loss function is shown in equations (7) and (8)
$$\mathcal{L}_{D}^{texture}=-\mathbb{E}_{S_r\sim R}\big[\log D_{Ra}(S_r,S_f)\big]-\mathbb{E}_{S_f\sim F}\big[\log\big(1-D_{Ra}(S_f,S_r)\big)\big]\quad(7)$$
$$\mathcal{L}_{G}^{texture}=-\mathbb{E}_{S_r\sim R}\big[\log\big(1-D_{Ra}(S_r,S_f)\big)\big]-\mathbb{E}_{S_f\sim F}\big[\log D_{Ra}(S_f,S_r)\big]\quad(8);$$
and step 3: module combination: the content constraint module and the style-texture discriminator module jointly constrain network convergence; the content constraint module keeps the low-frequency information from being damaged or omitted during super-resolution and retains the basic content information of the image; the low-frequency information of the super-resolved image is constrained to be consistent with the image magnified in advance by interpolation, and the constraint between the image containing only the low-frequency information of the super-resolved image and the pre-interpolated magnified image is applied by minimizing the mean square error, as shown in formula (9)
$$\mathcal{L}_{content}=\frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\Big(\psi\big(f_{\theta}(x)\big)_{i,j}-U(x)_{i,j}\Big)^{2}\quad(9)$$
in the formula, $\mathcal{L}_{content}$ represents the content constraint, x is the sensor image, r is the magnification factor of the super-resolution, U is the interpolation function, W and H are the target sizes, ψ is the function extracting the low-frequency information of the super-resolved image, and f<sub>θ</sub> is the super-resolution network; the total loss function is shown in formula (10)
$$\mathcal{L}_{total}=\alpha\,\mathcal{L}_{G}^{style}+\beta\,\mathcal{L}_{G}^{texture}+\gamma\,\mathcal{L}_{content}\quad(10)$$
wherein α, β and γ represent the weights of the respective constraints, $\mathcal{L}_{G}^{style}$ represents the style discriminator constraint, and $\mathcal{L}_{G}^{texture}$ represents the texture discriminator constraint;
and 4, step 4: the method comprises the steps of creating a data set, creating an infrared super-resolution data set and providing a non-reference evaluation method for evaluating infrared hyper-resolution images, wherein the data set comprises a simulation data set and an infrared image data set, the simulation data set carries out quantitative and qualitative analysis on an ablation experiment, and the infrared image data set compares the reconstruction effect of a hyper-resolution algorithm on a real image.
2. The unsupervised infrared single-image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, characterized in that: the unpaired unsupervised learning framework in step 1 is provided with an unsupervised super-resolution network for converting low-resolution images into super-resolved images; the real images do not undergo down-sampling and directly enter the unsupervised super-resolution network to obtain super-resolved images.
3. The unsupervised infrared single-image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, characterized in that: the content constraint module keeps the low-frequency information from being damaged or omitted when the image is super-resolved, and the content information of the image is retained.
4. The unsupervised infrared single-image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, characterized in that: the simulated data set comprises a training set and a test set whose data do not overlap, and the pixels of test-set images of different resolutions correspond accurately after size scaling.
5. The unsupervised infrared single-image super-resolution method based on a dual-discriminator generative adversarial network according to claim 1, characterized in that: the training set comprises 400 original images, and the test set comprises 100 original images and 100 high-definition images.
CN202111287047.7A 2021-11-02 2021-11-02 Unsupervised infrared single-image super-resolution method for generating countermeasure network based on double discriminators Active CN113724139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111287047.7A CN113724139B (en) 2021-11-02 2021-11-02 Unsupervised infrared single-image super-resolution method for generating countermeasure network based on double discriminators

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111287047.7A CN113724139B (en) 2021-11-02 2021-11-02 Unsupervised infrared single-image super-resolution method for generating countermeasure network based on double discriminators

Publications (2)

Publication Number Publication Date
CN113724139A true CN113724139A (en) 2021-11-30
CN113724139B CN113724139B (en) 2022-03-15

Family

ID=78686448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111287047.7A Active CN113724139B (en) 2021-11-02 2021-11-02 Unsupervised infrared single-image super-resolution method for generating countermeasure network based on double discriminators

Country Status (1)

Country Link
CN (1) CN113724139B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111028146A (en) * 2019-11-06 2020-04-17 武汉理工大学 Image super-resolution method for generating countermeasure network based on double discriminators
CN111583113A (en) * 2020-04-30 2020-08-25 电子科技大学 Infrared image super-resolution reconstruction method based on generation countermeasure network
CN111815516A (en) * 2020-07-08 2020-10-23 北京航空航天大学 Super-resolution reconstruction method for weak supervision infrared remote sensing image


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JINCHAO HUANG: "Image super-resolution reconstruction based on generative adversarial network model with double discriminators", 《MULTIMEDIA TOOLS AND APPLICATIONS》 *
PIAOYI YUAN等: "Dual Discriminator Generative Adversarial Network for Single Image Super-Resolution", 《2019 12TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI)》 *
YUAN MA等: "Perception-oriented Single Image Super-Resolution via Dual Relativistic Average Generative Adversarial Networks", 《PREPRINT SUBMITTED TO JOURNAL OF LATEX TEMPLATES》 *
XING Zhiyong et al.: "Infrared image super-resolution reconstruction with a dual-discriminator generative adversarial network", Journal of Chinese Computer Systems *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693753A (en) * 2022-03-24 2022-07-01 北京理工大学 Three-dimensional ultrasonic elastic registration method and device based on texture keeping constraint
CN114693753B (en) * 2022-03-24 2024-05-03 北京理工大学 Three-dimensional ultrasonic elastic registration method and device based on texture retention constraint

Also Published As

Publication number Publication date
CN113724139B (en) 2022-03-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant