CN117934275A - Super-resolution reconstruction method and device based on detection network feedback


Info

Publication number
CN117934275A
CN117934275A (application number CN202410065111.4A)
Authority
CN
China
Prior art keywords
image
super
resolution
generator
loss function
Prior art date: 2024-01-16
Legal status: Pending
Application number
CN202410065111.4A
Other languages
Chinese (zh)
Inventor
赵小明
董磊
欧彤
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date: 2024-01-16
Filing date: 2024-01-16
Publication date: 2024-04-26
Application filed by Xidian University
Priority to CN202410065111.4A
Publication of CN117934275A on 2024-04-26


Abstract

The invention discloses a super-resolution reconstruction method and device based on detection network feedback. The method comprises: acquiring an image to be processed with a first resolution; and inputting the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution. The super-resolution reconstruction model is a generator, from a generative adversarial network, trained in advance based on target detection results that a detection module included in the network obtains by performing target detection on first images output by the generator. In the training process of the generative adversarial network, the numerical target detection result of the detection module is used as feedback information, and a preset loss function is constructed by fusing the global image detection rate with the missed-detection rate, so that the detection module and the super-resolution generator are tightly coupled and the detection performance guides the training of the generator, which improves the super-resolution quality of single-frame images. In addition, the invention has strong generalization ability.

Description

Super-resolution reconstruction method and device based on detection network feedback
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a super-resolution reconstruction method and device based on detection network feedback.
Background
Target detection is an important research topic in computer vision, with broad application prospects in face recognition, intelligent transportation, industrial inspection, medical diagnosis, and other areas. Although target detection has matured at the level of technical methods, poor performance on small targets remains a common problem: for existing target detection algorithms there is a huge gap between small-target and large-target detection performance. Evaluation results of representative detection models on the COCO dataset show that the AP for small-target detection is only 12%, versus 51% for large-target detection.
Small-target detection is an important branch of target detection, with significant application value in scenarios such as security surveillance, unmanned-aerial-vehicle reconnaissance, and traffic-sign detection for autonomous driving. Because small targets offer little usable feature information, demand high localization accuracy, and suffer from sample imbalance, small-target detection remains a technical problem to be solved in the field.
Currently, methods for improving small-target detectability fall into four main categories: multi-scale representation, use of context information, image super-resolution reconstruction, and region candidates. Multi-scale representation combines the image details of high-resolution shallow features, used for target localization, with the semantic information of low-resolution deep features, used for target classification, to complete target detection. Context information exploits the relationship between a small target's feature region and its surroundings to improve detection accuracy. Region candidates replace sliding-window traversal with preset anchor boxes to improve detection efficiency. Image super-resolution reconstruction addresses the root difficulty that a small target covers few pixels: the detection task is performed on the higher-resolution image obtained after super-resolution reconstruction.
Because image degradation characteristics differ across imaging scenes and the texture degradation factors of target objects are inconsistent, it is difficult to design a detector universally applicable to small-target detection in every scene. Research teams at home and abroad have therefore turned to super-resolution reconstruction algorithms for specific scenes, which upsample the image and reconstruct the textures of target objects in small-target scenes. MTGAN, for example, first performs baseline detection on the input image, then separates the targets from the background, and finally applies super-resolution reconstruction to the identified targets. However, this approach does not fundamentally improve the detection network's performance on small targets, since targets missed by the baseline detection never benefit from super-resolution; moreover, many target LR (Low Resolution) patches must be processed at a time, which greatly increases the computational complexity.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a super-resolution reconstruction method and device based on detection network feedback. The technical problems to be solved by the invention are addressed through the following technical solutions:
in a first aspect, the present invention provides a super-resolution reconstruction method based on detection network feedback, including:
Acquiring an image to be processed with a first resolution;
inputting the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution; wherein the second resolution is higher than the first resolution, and the super-resolution reconstruction model is a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the generative adversarial network performs target detection on first images output by the generator; the generative adversarial network further comprises a discriminator.
In one embodiment of the invention, the super-resolution reconstruction model is obtained through the following training steps:
obtaining training samples from a data set, wherein each training sample comprises a real image and a degraded image produced by processing the real image with a degradation model, the real image containing calibration information;
inputting a preset number of degraded images into the generator of a generative adversarial network to be trained, and training the generator based on the preset number of degraded images, a first preset loss function, the real images, and the current images obtained by the generator through super-resolution reconstruction of the degraded images, until the loss value of the first preset loss function meets a preset condition, thereby obtaining a generator after one round of training;
inputting the current image finally output by the generator after one round of training, as a first image, into a detection module, so that the detection module performs target detection on the first image to obtain a target detection result, the target detection result being an image carrying Bounding box information on the basis of the first image;
inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, and judging the authenticity of the first image based on the target detection result, the calibration information, a second preset loss function and a third preset loss function, until the discriminator judges the first image to be true, whereupon the trained generator is determined to be the super-resolution reconstruction model; the second preset loss function is used for evaluating, through the target detection result, how finely the target textures of the first image are reconstructed, and the third preset loss function is used for evaluating the fidelity of the first image.
In one embodiment of the present invention, the first preset loss function is the sum of a pixel loss Loss_pixel, an edge loss Loss_edge, a perceptual loss Loss_perception and a texture loss Loss_texture.
In one embodiment of the present invention,

$Loss_{pixel} = L_{pixel\_L1} + L_{pixel\_L2}$

with $L_{pixel\_L1} = \frac{1}{hwc}\sum_{i=1}^{h}\sum_{j=1}^{w}\sum_{k=1}^{c}\left|I_{i,j,k} - I'_{i,j,k}\right|$ and $L_{pixel\_L2} = \frac{1}{hwc}\sum_{i=1}^{h}\sum_{j=1}^{w}\sum_{k=1}^{c}\left(I_{i,j,k} - I'_{i,j,k}\right)^{2}$,

where $I_{i,j,k}$ and $I'_{i,j,k}$ respectively denote the pixel values of pixel (i, j) in channel k of the real image and the current image, and h, w and c denote the height, width and number of channels of the current/real image;

$Loss_{edge} = \frac{1}{hw}\sum_{i=1}^{h}\sum_{j=1}^{w} E_{i,j}\left|I_{i,j} - I'_{i,j}\right|$

where $I_{i,j}$ and $I'_{i,j}$ respectively denote pixel (i, j) in the real image and the current image, and $E_{i,j}$ denotes the edge feature extracted at (i, j);

$Loss_{perception} = \frac{1}{h_{l} w_{l} c_{l}}\left\|\phi^{(l)}(I) - \phi^{(l)}(I')\right\|_{2}^{2}$

where $\phi^{(l)}(I)$ and $\phi^{(l)}(I')$ respectively denote the feature maps extracted by the l-th convolution layer when the generator convolves the real image and the current image, and $h_{l}$, $w_{l}$ and $c_{l}$ respectively denote the height, width and number of channels of the feature maps extracted by the l-th convolution layer;

$Loss_{texture} = \frac{1}{c_{l}^{2}}\sum_{u=1}^{c_{l}}\sum_{v=1}^{c_{l}}\left(G_{u,v}^{(l)}(I) - G_{u,v}^{(l)}(I')\right)^{2}$

where $G_{u,v}^{(l)}(I)$ and $G_{u,v}^{(l)}(I')$ respectively denote the inner products of the vectorized feature maps u and v in the features extracted by the l-th convolution layer when the generator convolves the real image and the current image.
In one embodiment of the present invention, inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, and judging the authenticity of the first image based on the target detection result, the calibration information, the second preset loss function and the third preset loss function until the discriminator judges the first image to be true, and determining the trained generator as the super-resolution reconstruction model, comprises:
inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, so that the discriminator performs the following steps:
calculating a loss value of the second preset loss function according to the target detection result, the calibration information contained in the real image, and the second preset loss function;
calculating a loss value of the third preset loss function according to the first image and the third preset loss function;
judging whether the loss value of the second preset loss function and the loss value of the third preset loss function meet a preset precision; if yes, judging the first image to be true and taking the trained generator as the super-resolution reconstruction model; if not, adjusting the parameters of the generator and returning to the step of inputting the first image, the target detection result and the real image corresponding to the degraded image into the discriminator of the generative adversarial network to be trained.
In one embodiment of the present invention, the calibration information includes the ground truth calibration boxes of the respective targets, and the Bounding box information includes the Bounding box of each target in the first image output by the detection module;
the second preset loss function is:

$Loss_{2} = \frac{(B)_{n}}{(A)_{n}}\cdot\frac{1}{(A)_{n}}\sum_{r=1}^{(A)_{n}}\frac{|A_{r} \cap B_{r}|}{|A_{r} \cup B_{r}|}$

where $(A)_{n}$ denotes the total number of ground truth calibration boxes, $(B)_{n}$ denotes the total number of Bounding boxes output by the detection module, $A_{r} \cap B_{r}$ denotes the overlap region of the ground truth calibration box $A_{r}$ of the r-th target and the Bounding box $B_{r}$ of the r-th target, and $A_{r} \cup B_{r}$ denotes their union region.
In one embodiment of the present invention, the third preset loss function is:

$Loss_{3} = \mathbb{E}_{I\sim pdata(I)}[\log D(I)] + \mathbb{E}_{I'\sim pg(I')}[\log(1 - D(I'))] + BCE(I, I')$, with $BCE(I, I') = -\frac{1}{N}\sum_{q=1}^{N}\left[I_{q}\log I'_{q} + (1 - I_{q})\log(1 - I'_{q})\right]$

where D(I) denotes the discriminator's judgment of whether the real image I is true, and $\mathbb{E}_{I\sim pdata(I)}[\log D(I)]$ denotes the expectation of log D(I), the real image I belonging to the data set pdata; D(I') denotes the discriminator's judgment of whether the first image I' is true, and $\mathbb{E}_{I'\sim pg(I')}[\log(1 - D(I'))]$ denotes the expectation of log(1 - D(I')), the first image I' belonging to the set pg of all first images output by the generator after one round of training; BCE denotes binary cross entropy, $I_{q}$ and $I'_{q}$ respectively denote the q-th pixel of the real image I and the first image I', and N is the total number of pixels in the real image I and the first image I'.
In a second aspect, the present invention further provides a super-resolution reconstruction device based on detection network feedback, including:
the acquisition module is used for acquiring the image to be processed with the first resolution;
the super-resolution reconstruction module is used for inputting the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution; wherein the second resolution is higher than the first resolution, and the super-resolution reconstruction model is a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the generative adversarial network performs target detection on first images output by the generator; the generative adversarial network further comprises a discriminator.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a super-resolution reconstruction method and device based on detection network feedback, comprising: acquiring an image to be processed with a first resolution; and inputting the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution, the super-resolution reconstruction model being a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the network performs target detection on first images output by the generator. In the training process of the generative adversarial network, the numerical target detection result of the detection module is used as feedback information, and the second preset loss function is constructed by fusing the global image detection rate with the missed-detection rate, so that the detection module and the super-resolution generator are tightly coupled and the detection performance guides the training of the generator. Furthermore, this second preset loss function evaluates how finely the target textures of the first image are reconstructed, and a third preset loss function evaluates the fidelity of the first image; together they help improve the reconstruction quality of small-target texture features in single-frame images and thus the overall super-resolution quality of single-frame images.
In addition, the method generalizes well: it can be combined with different detection algorithms, and the super-resolution reconstruction improves the detection performance of the subsequent detection network.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
Fig. 1 is a flowchart of a super-resolution reconstruction method based on detection network feedback according to an embodiment of the present invention;
FIG. 2 is a training schematic diagram of a super-resolution reconstruction model provided by an embodiment of the present invention;
FIG. 3 is a diagram of an example of a ground truth calibration frame and Bounding box provided in an embodiment of the invention;
FIG. 4 is a schematic diagram of detection performance calculation of a single target provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating global detection performance calculation provided by an embodiment of the present invention;
FIG. 6a is a low resolution image provided by an embodiment of the present invention;
FIG. 6b is an image reconstructed using the ESRGAN super-resolution network;
FIG. 6c is an image reconstructed using the super-resolution reconstruction method provided by the present invention;
FIG. 6d is a raw high resolution image provided by an embodiment of the present invention;
FIG. 7a is the target detection result of a low-resolution image provided by an embodiment of the present invention;
FIG. 7b is the target detection result of an image reconstructed using the ESRGAN super-resolution network;
FIG. 7c is the target detection result of an image reconstructed using the super-resolution reconstruction method provided by the invention;
fig. 8 is a schematic structural diagram of a super-resolution reconstruction device based on detection network feedback according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Fig. 1 is a flowchart of a super-resolution reconstruction method based on detection network feedback according to an embodiment of the present invention. Referring to fig. 1, an embodiment of the present invention provides a super-resolution reconstruction method based on detection network feedback, including:
S101, acquiring an image to be processed with a first resolution;
S102, inputting the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution; wherein the second resolution is higher than the first resolution, and the super-resolution reconstruction model is a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the generative adversarial network performs target detection on first images output by the generator; the generative adversarial network further comprises a discriminator.
It should be understood that the super-resolution reconstruction model used in this embodiment is a pre-trained generator from a generative adversarial network. Fig. 2 is a training schematic diagram of the super-resolution reconstruction model provided in this embodiment. As shown in fig. 2, the generative adversarial network comprises a generator, a discriminator, and a detection module introduced at the front end of the discriminator. In this embodiment the training process is tightly coupled to the detection performance of the detection module, and the detection results fed back by the detection module guide the training, so that the resulting generator is better at recovering the texture features of small targets.
The detection module may optionally use a YOLO-series network; in other embodiments of the present application, the detection module may equally use any existing target detection network, which is not limited here.
Specifically, the super-resolution reconstruction model is obtained through the following training steps:
S201, obtaining training samples from a data set, wherein each training sample comprises a real image and a degraded image produced by processing the real image with a degradation model, the real image containing calibration information;
S202, inputting a preset number of degraded images into the generator of a generative adversarial network to be trained, and training the generator based on the preset number of degraded images, a first preset loss function, the real images, and the current images obtained by the generator through super-resolution reconstruction of the degraded images, until the loss value of the first preset loss function meets a preset condition, thereby obtaining a generator after one round of training;
S203, inputting the current image finally output by the generator after one round of training, as a first image, into the detection module, so that the detection module performs target detection on the first image to obtain a target detection result, the target detection result being an image carrying Bounding box information on the basis of the first image;
S204, inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, and judging the authenticity of the first image based on the target detection result, the calibration information, a second preset loss function and a third preset loss function, until the discriminator judges the first image to be true, whereupon the trained generator is determined to be the super-resolution reconstruction model; the second preset loss function is used for evaluating, through the target detection result, how finely the target textures of the first image are reconstructed, and the third preset loss function is used for evaluating the fidelity of the first image. A minimal sketch of this training flow is given below.
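The following is a minimal sketch of how the training flow of steps S201-S204 could be orchestrated. It assumes a PyTorch-style setup; `generator`, `detector`, `discriminator`, the three loss helpers and the threshold parameters are all placeholder names rather than the patent's actual implementation, and the convergence checks stand in for the patent's unspecified "preset condition" and "preset precision".

```python
# Sketch of the two-phase training flow of steps S201-S204 (PyTorch-style).
import torch

def train_sr_model(generator, detector, discriminator, loader,
                   loss1_fn, loss2_fn, loss3_fn, g_opt,
                   loss1_threshold=0.01, precision=0.05):
    while True:
        # Phase 1 (S202): train the generator until the first preset loss converges.
        for lr_img, hr_img, gt_boxes in loader:
            sr_img = generator(lr_img)          # current image (super-resolved)
            loss1 = loss1_fn(sr_img, hr_img)    # pixel + edge + perception + texture
            g_opt.zero_grad()
            loss1.backward()
            g_opt.step()
            if loss1.item() < loss1_threshold:  # "preset condition" met
                break
        # Phase 2 (S203-S204): detection feedback judged by the discriminator.
        with torch.no_grad():
            first_img = generator(lr_img)       # final output of this training round
            det_boxes = detector(first_img)     # Bounding box information
            loss2 = loss2_fn(gt_boxes, det_boxes)                      # fineness
            loss3 = loss3_fn(discriminator, first_img, hr_img).item()  # fidelity
        # Judged "true" when both losses meet the preset precision (assumed checks:
        # loss2 should approach 1, loss3 should approach 0).
        if abs(1.0 - loss2) < precision and abs(loss3) < precision:
            return generator
        # Otherwise the generator parameters are adjusted by another round (loop back).
```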
In this embodiment, the data set contains a plurality of training samples, each consisting of a real image and its corresponding degraded image. The real image contains calibration information; the degraded image is a Low-Resolution (LR) image, which may be obtained by degrading the real image with an existing degradation model so as to simulate the degradation of a small-target imaging scene.
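As one concrete possibility for step S201, the sketch below degrades a real image by Gaussian blurring, bicubic downsampling and additive noise. The patent does not specify its degradation model, so the operations and parameters here are assumptions chosen to simulate low-resolution small-target imagery.

```python
# Assumed degradation sketch for step S201: Gaussian blur + bicubic
# downsampling + additive noise to simulate a small-target imaging scene.
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def degrade(hr_img: torch.Tensor, scale: int = 4,
            sigma: float = 1.5, noise_std: float = 0.01) -> torch.Tensor:
    """hr_img: (B, C, H, W) real image in [0, 1]; returns the degraded LR image."""
    blurred = TF.gaussian_blur(hr_img, kernel_size=7, sigma=sigma)
    lr = F.interpolate(blurred, scale_factor=1.0 / scale,
                       mode="bicubic", align_corners=False)
    lr = lr + noise_std * torch.randn_like(lr)   # simulated sensor noise
    return lr.clamp(0.0, 1.0)
```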
Considering that the invention is aimed at improving small-target detection performance, the loss functions of existing GAN networks for super-resolution reconstruction are incomplete for this purpose: they are poorly oriented toward recovering the texture information and feature details of small targets in the image, and there is currently no explicit loss function to guide the training of the generator or to give the discriminator a criterion for judging that GAN training is complete. In view of this, in step S202 the first preset loss function adopted for training the generator is the sum of the pixel loss Loss_pixel, the edge loss Loss_edge, the perceptual loss Loss_perception and the texture loss Loss_texture.
By way of example,

$Loss_{pixel} = L_{pixel\_L1} + L_{pixel\_L2}$ (1)

with $L_{pixel\_L1} = \frac{1}{hwc}\sum_{i=1}^{h}\sum_{j=1}^{w}\sum_{k=1}^{c}\left|I_{i,j,k} - I'_{i,j,k}\right|$ and $L_{pixel\_L2} = \frac{1}{hwc}\sum_{i=1}^{h}\sum_{j=1}^{w}\sum_{k=1}^{c}\left(I_{i,j,k} - I'_{i,j,k}\right)^{2}$,

where $I_{i,j,k}$ and $I'_{i,j,k}$ respectively denote the pixel values of pixel (i, j) in channel k of the real image and the current image, and h, w and c respectively denote the height, width and number of channels of the current/real image.
It should be appreciated that the pixel loss Loss_pixel measures the difference between the pixels of two images. As shown in equation (1), it consists of two parts: the L1 loss and the L2 loss, i.e. the mean absolute error and the mean squared error. The L1 loss typically converges better than the L2 loss, while the L2 loss penalizes large errors more heavily and is more tolerant of small errors. Since the definition of PSNR (Peak Signal-to-Noise Ratio) is highly correlated with the pixel-wise difference, and minimizing the pixel loss directly maximizes PSNR, pixel loss has become a widely used loss function in the super-resolution field.
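A direct implementation of equation (1) is straightforward; the sketch below assumes image tensors of shape (B, C, H, W) with matching sizes.

```python
# Pixel loss of equation (1): mean absolute error (L1) plus mean squared
# error (L2) over all pixels and channels.
import torch

def pixel_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """sr, hr: (B, C, H, W) current and real images."""
    l1 = (sr - hr).abs().mean()      # L_pixel_L1 = 1/(hwc) * sum |I - I'|
    l2 = ((sr - hr) ** 2).mean()     # L_pixel_L2 = 1/(hwc) * sum (I - I')^2
    return l1 + l2
```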
However, pixel loss does not actually account for image quality, such as perceptual quality and texture, and it typically lacks high-frequency detail, resulting in overly smooth textures. This embodiment therefore further introduces an edge loss and a texture loss into the first preset loss function to compensate for these deficiencies of the pixel loss.
Optionally,

$Loss_{edge} = \frac{1}{hw}\sum_{i=1}^{h}\sum_{j=1}^{w} E_{i,j}\left|I_{i,j} - I'_{i,j}\right|$ (2)

where $I_{i,j}$ and $I'_{i,j}$ respectively denote pixel (i, j) in the real image and the current image, and $E_{i,j}$ denotes the edge feature extracted at (i, j).

Referring to formula (2), $\left|I_{i,j} - I'_{i,j}\right|$ is the absolute error between the current image output by the generator and the real image; because this embodiment is more concerned with edge information, the error is weighted by the edge feature E, increasing the contribution of edge regions, so that Loss_edge better optimizes the edge information of the reconstruction.
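The sketch below implements the edge-weighted absolute error of formula (2). The patent does not name its edge extractor, so a Sobel gradient magnitude computed on the real image is assumed for the edge feature E.

```python
# Edge loss of formula (2): absolute error weighted by an edge feature map E.
# A Sobel gradient magnitude of the real image is assumed as the edge extractor.
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """img: (B, C, H, W); returns per-pixel gradient magnitudes E."""
    c = img.shape[1]
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    gx = F.conv2d(img, kx, padding=1, groups=c)   # horizontal gradients
    gy = F.conv2d(img, ky, padding=1, groups=c)   # vertical gradients
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    e = sobel_edges(hr)                           # E_{i,j}: emphasizes edge regions
    return (e * (sr - hr).abs()).mean()           # edge-weighted absolute error
```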
Further,

$Loss_{perception} = \frac{1}{h_{l} w_{l} c_{l}}\left\|\phi^{(l)}(I) - \phi^{(l)}(I')\right\|_{2}^{2}$ (4)

where $\phi^{(l)}(I)$ and $\phi^{(l)}(I')$ respectively denote the feature maps extracted by the l-th convolution layer when the generator convolves the real image and the current image, and $h_{l}$, $w_{l}$ and $c_{l}$ respectively denote the height, width and number of channels of the feature maps extracted by the l-th convolution layer.

The pixel loss Loss_pixel gives the super-resolution result a high PSNR, but high-frequency information is missing and overly smooth textures appear. As shown in formula (4), the perceptual loss Loss_perception compares the feature map obtained by convolving the real calibrated picture with the feature map obtained by convolving the generated current image; computing the loss on these extracted features recovers further image detail.
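Below is a sketch of the perceptual loss of formula (4). The patent takes the features from the generator's own convolution layers; a frozen pre-trained VGG19 feature extractor is substituted here as a common stand-in, and the cut-off layer index is an assumption.

```python
# Perceptual loss of formula (4): MSE between layer-l feature maps of the
# real and current images. A frozen VGG19 is assumed in place of the
# generator's own convolution features.
import torch
import torch.nn.functional as F
import torchvision

class PerceptualLoss(torch.nn.Module):
    def __init__(self, n_layers: int = 35):       # assumed cut-off layer
        super().__init__()
        vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features
        self.phi = torch.nn.Sequential(*list(vgg.children())[:n_layers]).eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        # 1/(h_l * w_l * c_l) * sum (phi_l(I) - phi_l(I'))^2
        return F.mse_loss(self.phi(sr), self.phi(hr))
```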
Furthermore, it should be appreciated that, since the reconstructed current image should have the same style as the real image, e.g. the same color, texture and contrast, the texture of an image can be regarded as the correlation between different feature channels (represented by matrix inner products). A Gram matrix G is therefore defined:

$G_{u,v}^{(l)}(I) = \left\langle \mathrm{vec}\!\left(\phi_{u}^{(l)}(I)\right),\ \mathrm{vec}\!\left(\phi_{v}^{(l)}(I)\right) \right\rangle$ (5)

where vec(·) denotes the vectorization operation, and $G_{u,v}^{(l)}(I)$ and $G_{u,v}^{(l)}(I')$ respectively denote the inner products of the vectorized feature maps u and v extracted by the l-th convolution layer when the generator convolves the real image and the current image.

The texture loss is then the mean squared error of the correlation matrices of the finally generated current image and the real image:

$Loss_{texture} = \frac{1}{c_{l}^{2}}\sum_{u=1}^{c_{l}}\sum_{v=1}^{c_{l}}\left(G_{u,v}^{(l)}(I) - G_{u,v}^{(l)}(I')\right)^{2}$

As equation (5) shows, the texture of an image is captured by the Gram matrix formed after the feature maps of the different channels are converted into vectors; the loss is then computed between the Gram matrices of the current image and the real image. With the texture loss, realistic textures can be created, producing a more pleasing visual effect.
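The Gram matrix of equation (5) and the texture loss can be sketched as follows; `features` stands for whichever fixed layer-l feature extractor is used, e.g. the one sketched above.

```python
# Gram matrix of equation (5) and the texture loss built on it.
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, H, W) -> (B, C, C); G_{u,v} = <vec(f_u), vec(f_v)>."""
    b, c, h, w = feat.shape
    v = feat.view(b, c, h * w)                    # vec(.) per channel
    return torch.bmm(v, v.transpose(1, 2))        # channel-wise inner products

def texture_loss(features, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    g_sr = gram_matrix(features(sr))
    g_hr = gram_matrix(features(hr))
    return ((g_sr - g_hr) ** 2).mean()            # MSE of correlation matrices
```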
In step S203, the current image finally output during the training of the generator after one round of training is taken as the first image and input into the detection module; the detection module performs target detection on the first image and then outputs the target detection result.
In step S204, inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, and judging the authenticity of the first image based on the target detection result, the calibration information, the second preset loss function and the third preset loss function until the discriminator judges the first image to be true, and determining the trained generator as the super-resolution reconstruction model, comprises:
inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, so that the discriminator performs the following steps:
calculating a loss value of the second preset loss function according to the target detection result, the calibration information contained in the real image, and the second preset loss function;
calculating a loss value of the third preset loss function according to the first image and the third preset loss function;
judging whether the loss value of the second preset loss function and the loss value of the third preset loss function meet a preset precision; if yes, judging the first image to be true and taking the trained generator as the super-resolution reconstruction model; if not, adjusting the parameters of the generator and returning to the step of inputting the first image, the target detection result and the real image corresponding to the degraded image into the discriminator of the generative adversarial network to be trained.
As shown in fig. 2, after the first image, the target detection result and the real image corresponding to the degraded image are input into the discriminator of the generative adversarial network to be trained, the discriminator evaluates, respectively, the target detection effect (i.e. the fineness) and the image quality (i.e. the fidelity) of the first image produced by the generator's super-resolution.
Specifically, the calibration information includes the ground truth calibration boxes of the respective targets, and the Bounding box information includes the Bounding box of each target in the first image output by the detection module.
The second preset loss function for evaluating the detection effect is:

$Loss_{2} = \frac{(B)_{n}}{(A)_{n}}\cdot\frac{1}{(A)_{n}}\sum_{r=1}^{(A)_{n}}\frac{|A_{r} \cap B_{r}|}{|A_{r} \cup B_{r}|}$

where $(A)_{n}$ denotes the total number of ground truth calibration boxes, $(B)_{n}$ denotes the total number of Bounding boxes output by the detection module, $A_{r} \cap B_{r}$ denotes the overlap region of the ground truth calibration box $A_{r}$ of the r-th target and the Bounding box $B_{r}$ of the r-th target, and $A_{r} \cup B_{r}$ denotes their union region.
Designed from the standpoint of global detection performance, the second preset loss function undergoes two-stage normalization: the first stage normalizes each individual detection box, and the second stage normalizes over the global set of detection boxes; taking the missed-detection performance as a weighting factor reflects the detection performance better. The closer the final value of the second loss function is to 1, the better the detection performance.
Fig. 3 is an example diagram of a ground truth calibration box and a Bounding box provided in an embodiment of the present invention; fig. 4 is a schematic diagram of the detection-performance calculation for a single target; fig. 5 is a schematic diagram of the global detection-performance calculation. Referring to figs. 3-5, rectangular box A represents the ground truth calibration box, and rectangular box B represents the Bounding box output after detection by the detection module.
First-stage normalization: the ratio of the overlap region of the ground truth calibration box and the Bounding box output by the detection module to their union region serves as the detection-performance measure of a single detection box and as its numerical normalization. That is, as shown in fig. 4, the detection performance of a single small target is expressed as:

$P_{r} = \frac{|A_{r} \cap B_{r}|}{|A_{r} \cup B_{r}|}$
Second-stage normalization: viewed globally, a small-target scene contains more than one small target, so the single-target detection performances must be aggregated over the whole image. The design concept of this embodiment is as follows: let $(A)_{n}$ denote the total number of ground truth calibration boxes; the numerator accumulates the detection performance of each single target, with missed targets contributing 0. Then, as shown in fig. 5, the global detection performance is calculated as:

$P_{global} = \frac{1}{(A)_{n}}\sum_{r=1}^{(A)_{n}} P_{r}$
In addition, a missed-detection-rate performance weight $\frac{(B)_{n}}{(A)_{n}}$ is introduced to further balance the global detection-performance index, i.e. the number of Bounding boxes output by the detection module is compared against the number of ground truth calibration boxes. The second preset loss function for evaluating detection performance is therefore designed as:

$Loss_{2} = \frac{(B)_{n}}{(A)_{n}}\cdot\frac{1}{(A)_{n}}\sum_{r=1}^{(A)_{n}}\frac{|A_{r} \cap B_{r}|}{|A_{r} \cup B_{r}|}$

A minimal sketch of this loss is given below.
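In the sketch, boxes are assumed to be (x1, y1, x2, y2) tuples, and matching each ground truth box to its best-overlapping detection is a simplification the patent does not spell out; missed targets contribute 0, as described above.

```python
# Sketch of the second preset loss: two-stage normalized IoU with a
# missed-detection weight (B)_n / (A)_n.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-8)                 # first-stage normalization

def detection_feedback_loss(gt_boxes, det_boxes):
    """gt_boxes: (A)_n ground truth boxes; det_boxes: (B)_n detector outputs."""
    an, bn = len(gt_boxes), len(det_boxes)
    if an == 0:
        return 0.0
    # Second-stage normalization: missed targets contribute 0.
    total = sum(max((iou(a, b) for b in det_boxes), default=0.0)
                for a in gt_boxes)
    return (bn / an) * (total / an)               # approaches 1 as detection improves
```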
On the other hand, the third preset loss function is:

$Loss_{3} = \mathbb{E}_{I\sim pdata(I)}[\log D(I)] + \mathbb{E}_{I'\sim pg(I')}[\log(1 - D(I'))] + BCE(I, I')$, with $BCE(I, I') = -\frac{1}{N}\sum_{q=1}^{N}\left[I_{q}\log I'_{q} + (1 - I_{q})\log(1 - I'_{q})\right]$

where D(I) denotes the discriminator's judgment of whether the real image I is true, and $\mathbb{E}_{I\sim pdata(I)}[\log D(I)]$ denotes the expectation of log D(I), the real image I belonging to the data set pdata; D(I') denotes the discriminator's judgment of whether the first image I' is true, and $\mathbb{E}_{I'\sim pg(I')}[\log(1 - D(I'))]$ denotes the expectation of log(1 - D(I')), the first image I' belonging to the set pg of all first images output by the generator after one round of training; BCE denotes binary cross entropy, $I_{q}$ and $I'_{q}$ respectively denote the q-th pixel of the real image I and the first image I', and N is the total number of pixels in the real image I and the first image I'.
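As a rough sketch of this third preset loss, the adversarial expectations are approximated by batch means and the per-pixel BCE is computed between the first image and the real image. How the patent weights and combines these terms is not fully specified, so the simple sum below is an assumption, as is the requirement that the discriminator output probabilities in (0, 1).

```python
# Rough sketch of the third preset loss: adversarial terms approximated by
# batch means, plus per-pixel binary cross entropy between I and I'.
# Images are expected in [0, 1]; the additive combination is an assumption.
import torch
import torch.nn.functional as F

def fidelity_loss(discriminator, first_img: torch.Tensor,
                  real_img: torch.Tensor) -> torch.Tensor:
    d_real = discriminator(real_img)              # D(I), assumed in (0, 1)
    d_fake = discriminator(first_img)             # D(I')
    adversarial = (torch.log(d_real + 1e-8).mean()
                   + torch.log(1.0 - d_fake + 1e-8).mean())
    # BCE = -(1/N) * sum_q [I_q log I'_q + (1 - I_q) log(1 - I'_q)]
    bce = F.binary_cross_entropy(first_img.clamp(1e-6, 1 - 1e-6), real_img)
    return adversarial + bce
```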
Specifically, the training process of the GAN network as a whole can be described as solving a minimax problem over a two-player objective:

$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{I\sim pdata(I)}[\log D(I)] + \mathbb{E}_{z\sim p_{z}(z)}[\log(1 - D(G(z)))]$

Optimizing the adversarial loss V(D, G) achieves two goals: first, the generator generates more realistic samples; second, the discriminator better distinguishes real samples from generated samples.
The goal of the generator is for the generated samples to be close to real samples; its loss is:

$L_{G} = \mathbb{E}_{z\sim p_{z}(z)}[\log(1 - D(G(z)))]$

where z denotes the noise input to the generator. This shows that the generator's minimization loss $L_{G}$ in the GAN can be written in the form of minimizing a binary cross-entropy (BCE) loss, with the generated samples labeled as real.
The loss function for training the discriminator is the full adversarial loss V(D, G) over the samples I and I':

$L_{D} = \mathbb{E}_{I\sim pdata(I)}[\log D(I)] + \mathbb{E}_{I'\sim pg(I')}[\log(1 - D(I'))]$

The goal of the discriminator is to better distinguish generated samples from real samples, which is expressed mathematically as maximizing this loss: $\max_{D} L_{D}$.
Fig. 6a is a low-resolution image provided by an embodiment of the present invention, fig. 6b is an image reconstructed using the ESRGAN super-resolution network, fig. 6c is an image reconstructed using the super-resolution reconstruction method provided by the invention, and fig. 6d is the original high-resolution image. Compared with the traditional ESRGAN network, the super-resolution reconstruction method provided by the invention has better texture-recovery capability, and the recovered detail textures are closer to the real image.
Fig. 7a is the target detection result of the low-resolution image, fig. 7b is the target detection result of the image reconstructed using the ESRGAN super-resolution network, and fig. 7c is the target detection result of the image reconstructed using the super-resolution reconstruction method provided by the invention. As shown in figs. 7a-7c, target detection was performed with a YOLO-series detector on the low-resolution image, on the image reconstructed by the ESRGAN network, and on the image reconstructed by the method of the invention; the super-resolution reconstruction model adopted by the invention provides positive feedback to the detection results of the detection module, effectively improving the detection confidence of small targets and enabling small targets previously ignored by the detection network to be detected.
Fig. 8 is a schematic structural diagram of a super-resolution reconstruction device based on detection network feedback according to an embodiment of the present invention. As shown in fig. 8, an embodiment of the present invention further provides a super-resolution reconstruction device based on detection network feedback, comprising:
An acquiring module 810, configured to acquire an image to be processed with a first resolution;
the super-resolution reconstruction module 820, configured to input the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution; wherein the second resolution is higher than the first resolution, and the super-resolution reconstruction model is a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the generative adversarial network performs target detection on first images output by the generator; the generative adversarial network further comprises a discriminator.
According to the above embodiments, the beneficial effects of the invention are as follows:
The invention provides a super-resolution reconstruction method and device based on detection network feedback, comprising: acquiring an image to be processed with a first resolution; and inputting the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution, the super-resolution reconstruction model being a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the network performs target detection on first images output by the generator. In the training process of the generative adversarial network, the numerical target detection result of the detection module is used as feedback information, and the second preset loss function is constructed by fusing the global image detection rate with the missed-detection rate, so that the detection module and the super-resolution generator are tightly coupled and the detection performance guides the training of the generator. Furthermore, this second preset loss function evaluates how finely the target textures of the first image are reconstructed, and a third preset loss function evaluates the fidelity of the first image; together they help improve the reconstruction quality of small-target texture features in single-frame images and thus the overall super-resolution quality of single-frame images.
In addition, the method generalizes well: it can be combined with different detection algorithms, and the super-resolution reconstruction improves the detection performance of the subsequent detection network.
In the description of the present invention, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means two or more, unless explicitly defined otherwise.
The description of the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, those skilled in the art may join and combine the different embodiments or examples described in this specification.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (8)

1. A super-resolution reconstruction method based on detection network feedback, characterized by comprising:
Acquiring an image to be processed with a first resolution;
inputting the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution; wherein the second resolution is higher than the first resolution, and the super-resolution reconstruction model is a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the generative adversarial network performs target detection on first images output by the generator; the generative adversarial network further comprises a discriminator.
2. The super-resolution reconstruction method based on detection network feedback according to claim 1, wherein the super-resolution reconstruction model is obtained through the following training steps:
obtaining training samples from a data set, wherein each training sample comprises a real image and a degraded image produced by processing the real image with a degradation model, the real image containing calibration information;
inputting a preset number of degraded images into the generator of a generative adversarial network to be trained, and training the generator based on the preset number of degraded images, a first preset loss function, the real images, and the current images obtained by the generator through super-resolution reconstruction of the degraded images, until the loss value of the first preset loss function meets a preset condition, thereby obtaining a generator after one round of training;
inputting the current image finally output by the generator after one round of training, as a first image, into a detection module, so that the detection module performs target detection on the first image to obtain a target detection result, the target detection result being an image carrying Bounding box information on the basis of the first image;
inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, and judging the authenticity of the first image based on the target detection result, the calibration information, a second preset loss function and a third preset loss function, until the discriminator judges the first image to be true, whereupon the trained generator is determined to be the super-resolution reconstruction model; the second preset loss function is used for evaluating, through the target detection result, how finely the target textures of the first image are reconstructed, and the third preset loss function is used for evaluating the fidelity of the first image.
3. The super-resolution reconstruction method based on detection network feedback according to claim 2, wherein the first preset loss function is the sum of a pixel loss Loss_pixel, an edge loss Loss_edge, a perceptual loss Loss_perception and a texture loss Loss_texture.
4. The super-resolution reconstruction method based on detection network feedback according to claim 3, wherein

$Loss_{pixel} = L_{pixel\_L1} + L_{pixel\_L2}$

with $L_{pixel\_L1} = \frac{1}{hwc}\sum_{i=1}^{h}\sum_{j=1}^{w}\sum_{k=1}^{c}\left|I_{i,j,k} - I'_{i,j,k}\right|$ and $L_{pixel\_L2} = \frac{1}{hwc}\sum_{i=1}^{h}\sum_{j=1}^{w}\sum_{k=1}^{c}\left(I_{i,j,k} - I'_{i,j,k}\right)^{2}$,

where $I_{i,j,k}$ and $I'_{i,j,k}$ respectively denote the pixel values of pixel (i, j) in channel k of the real image and the current image, and h, w and c denote the height, width and number of channels of the current/real image;

$Loss_{edge} = \frac{1}{hw}\sum_{i=1}^{h}\sum_{j=1}^{w} E_{i,j}\left|I_{i,j} - I'_{i,j}\right|$

where $I_{i,j}$ and $I'_{i,j}$ respectively denote pixel (i, j) in the real image and the current image, and $E_{i,j}$ denotes the edge feature extracted at (i, j);

$Loss_{perception} = \frac{1}{h_{l} w_{l} c_{l}}\left\|\phi^{(l)}(I) - \phi^{(l)}(I')\right\|_{2}^{2}$

where $\phi^{(l)}(I)$ and $\phi^{(l)}(I')$ respectively denote the feature maps extracted by the l-th convolution layer when the generator convolves the real image and the current image, and $h_{l}$, $w_{l}$ and $c_{l}$ respectively denote the height, width and number of channels of the feature maps extracted by the l-th convolution layer;

$Loss_{texture} = \frac{1}{c_{l}^{2}}\sum_{u=1}^{c_{l}}\sum_{v=1}^{c_{l}}\left(G_{u,v}^{(l)}(I) - G_{u,v}^{(l)}(I')\right)^{2}$

where $G_{u,v}^{(l)}(I)$ and $G_{u,v}^{(l)}(I')$ respectively denote the inner products of the vectorized feature maps u and v in the features extracted by the l-th convolution layer when the generator convolves the real image and the current image.
5. The super-resolution reconstruction method based on detection network feedback according to claim 2, wherein inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, and judging the authenticity of the first image based on the target detection result, the calibration information, the second preset loss function and the third preset loss function until the discriminator judges the first image to be true, and determining the trained generator as the super-resolution reconstruction model, comprises:
inputting the first image, the target detection result and the real image with calibration information into the discriminator of the generative adversarial network to be trained, so that the discriminator performs the following steps:
calculating a loss value of the second preset loss function according to the target detection result, the calibration information contained in the real image, and the second preset loss function;
calculating a loss value of the third preset loss function according to the first image and the third preset loss function;
judging whether the loss value of the second preset loss function and the loss value of the third preset loss function meet a preset precision; if yes, judging the first image to be true and taking the trained generator as the super-resolution reconstruction model; if not, adjusting the parameters of the generator and returning to the step of inputting the first image, the target detection result and the real image corresponding to the degraded image into the discriminator of the generative adversarial network to be trained.
6. The super-resolution reconstruction method based on detection network feedback according to claim 2, wherein the calibration information includes the ground truth calibration boxes of the respective targets, and the Bounding box information includes the Bounding box of each target in the first image output by the detection module;
the second preset loss function is:

$Loss_{2} = \frac{(B)_{n}}{(A)_{n}}\cdot\frac{1}{(A)_{n}}\sum_{r=1}^{(A)_{n}}\frac{|A_{r} \cap B_{r}|}{|A_{r} \cup B_{r}|}$

where $(A)_{n}$ denotes the total number of ground truth calibration boxes, $(B)_{n}$ denotes the total number of Bounding boxes output by the detection module, $A_{r} \cap B_{r}$ denotes the overlap region of the ground truth calibration box $A_{r}$ of the r-th target and the Bounding box $B_{r}$ of the r-th target, and $A_{r} \cup B_{r}$ denotes their union region.
7. The super-resolution reconstruction method based on detection network feedback according to claim 2, wherein the third preset loss function is:

$Loss_{3} = \mathbb{E}_{I\sim pdata(I)}[\log D(I)] + \mathbb{E}_{I'\sim pg(I')}[\log(1 - D(I'))] + BCE(I, I')$, with $BCE(I, I') = -\frac{1}{N}\sum_{q=1}^{N}\left[I_{q}\log I'_{q} + (1 - I_{q})\log(1 - I'_{q})\right]$

where D(I) denotes the discriminator's judgment of whether the real image I is true, and $\mathbb{E}_{I\sim pdata(I)}[\log D(I)]$ denotes the expectation of log D(I), the real image I belonging to the data set pdata; D(I') denotes the discriminator's judgment of whether the first image I' is true, and $\mathbb{E}_{I'\sim pg(I')}[\log(1 - D(I'))]$ denotes the expectation of log(1 - D(I')), the first image I' belonging to the set pg of all first images output by the generator after one round of training; BCE denotes binary cross entropy, $I_{q}$ and $I'_{q}$ respectively denote the q-th pixel of the real image I and the first image I', and N is the total number of pixels in the real image I and the first image I'.
8. A super-resolution reconstruction device based on detection network feedback, characterized by comprising:
the acquisition module is used for acquiring the image to be processed with the first resolution;
the super-resolution reconstruction module, configured to input the image to be processed into a super-resolution reconstruction model to obtain a super-resolution reconstructed image with a second resolution; wherein the second resolution is higher than the first resolution, and the super-resolution reconstruction model is a generator, from a generative adversarial network, trained in advance based on target detection results obtained after a detection module included in the generative adversarial network performs target detection on first images output by the generator; the generative adversarial network further comprises a discriminator.
