CN114998124A - Image sharpening processing method for target detection - Google Patents

Image sharpening processing method for target detection

Info

Publication number: CN114998124A
Authority: CN (China)
Prior art keywords: image, loss, generator, domain, degraded
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202210567780.2A
Original language: Chinese (zh)
Inventors: 胡海苗, 赵子琛, 黎世豪
Assignee (current and original): Beihang University
Application filed by Beihang University
Priority/filing date: 2022-05-23
Publication of CN114998124A: 2022-09-02, pending legal status

Classifications

    • G06T5/73
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T5/00 Image enhancement or restoration
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An image sharpening processing method for target detection comprises the following steps: a cycle-consistent generative adversarial network is trained to obtain generator G1 from the degraded image domain to the clear image domain (S1); the converged generator's parameters are loaded into a new generator G3, and a generative adversarial network is trained on degraded images carrying target annotation information together with clear image samples to obtain the final generator (S2). In this second stage, the generator's model parameters are constrained by the detection loss of a detector and by an adversarial loss on the features the detector extracts, realizing domain migration at the target semantic level; they are further constrained by losses on low-level visual features such as color, scattering and gradient, realizing restoration of the degraded image's low-level visual features. The invention not only improves image visual quality but also markedly improves the performance of subsequent target detection algorithms; it can serve as a preprocessing plug-in of a video surveillance system, supplying clear images as input to the downstream target detection algorithm.

Description

Image sharpening processing method for target detection
Technical Field
The invention relates to an image sharpening processing method for target detection, and belongs to the technical field of digital image processing.
Background
With the wide deployment of video surveillance, massive volumes of video data are produced every day, surveillance systems are steadily becoming more intelligent, and high-level computer vision tasks such as target detection play an ever larger assistive role. However, intelligent algorithms such as target detection generally require high-quality input images, while the imaging quality of outdoor surveillance is easily affected by many degradation factors. At night, for example, ambient illumination is low and the light reflected from the scene is weak, so imaging contrast drops; in foggy weather, the scene signal is attenuated by atmospheric particle scattering during propagation and mixed with atmospheric light, again reducing imaging contrast. This limits the scenarios in which intelligent algorithms can be applied effectively. Image sharpening methods therefore aim to enhance or restore degraded images so as to provide sufficient information for the surveillance system.
Existing image sharpening methods fall mainly into physical-model-based and data-driven approaches. Physical-model-based methods depend on explicit modeling of specific degradation factors: the characteristic components of the physical model are solved and the model is inverted to remove the degradation. For example, at night the illumination component is solved according to retinex theory and the brightness is then remapped; in dense fog, the illumination and transmissivity components are solved and the image is recovered by inverting the atmospheric scattering model. Because these methods encode deep knowledge of the degradation process, they achieve good visual results, but many degradation factors still cannot be modeled. Data-driven methods rely chiefly on generative models: the degradation is removed in a supervised or semi-supervised data-driven manner, with the degradation process modeled implicitly through the design of the loss function.
Existing image sharpening methods thus mainly restore or enhance weakened low-level visual features such as color, edges and texture in order to improve visual quality for the human eye, but they neglect the restoration of high-level semantic features and therefore bring limited gains to subsequent high-level vision tasks such as target detection. As a result, the performance of intelligent algorithms such as target detection on various degraded images remains limited, weakening the intelligence of the whole video surveillance system.
Disclosure of Invention
Against this background, the invention takes the performance of the target detection task into account and restores degraded images at the levels of scene perception, target semantics and low-level visual features. It thereby provides an image sharpening processing method for target detection that clarifies images captured in adverse environments such as dense fog and low illumination, improves target detection performance in those scenes, and is of practical significance for intelligent surveillance systems.
The aim of the invention is to improve target detection performance through image sharpening. The proposed method recasts the problem of clarifying images degraded by dense fog, low illumination and similar factors as a domain adaptation problem from a degraded image domain A to a clear image domain B. The network model is constrained by designed loss functions so that the degraded image is restored at both the low-level visual feature and the high-level semantic feature levels, improving the performance of the subsequent target detection task on the clarified result.
According to one aspect of the present invention, there is provided an image sharpening processing method for target detection, comprising:
A) A cycle-consistent generative adversarial network is trained to obtain the scene domain adaptation generator G1 from the degraded image domain A to the clear image domain B. The cycle GAN model comprises two structurally identical generators G1 and G2 and two structurally identical discriminators D1 and D2, and the step comprises:
A1) selecting sample images from the degraded image domain A and the clear image domain B;
A2) adversarially training the pair G1/D1 and the pair G2/D2; multiple rounds of iterative training yield a converged generator G1 that performs domain migration of the degraded image at the scene perception level.
B) A generative adversarial network model is trained: network parameters are loaded from the converged generator G1, and then, for a given detector, generator G3 is trained with degraded-domain sample images annotated with target positions and categories and with clear-domain sample images. The model comprises a generator G3 and a discriminator D3, and the step comprises:
B1) generator G3 takes a sample image a_o from the degraded image domain A as input and generates an image a_2; a_2 is passed through the target detector to obtain its prediction-box loss value and classification loss value, and the head feature F of the detector's feature extraction module is fed to discriminator D3 to obtain an adversarial loss; the prediction-box loss, classification loss and adversarial loss together constrain G3 at the semantic level;
B2) generator G3 takes the sample image a_o from the degraded image domain A as input and generates an image a_2; low-level visual features including color, scattering and gradient are extracted from a_2 to obtain loss function values that constrain G3 at the low-level feature level, guaranteeing the human visual quality of the generated image;
B3) sample images are selected from the degraded image domain A and the clear image domain B to train generator G3 and discriminator D3; multiple rounds of iteration yield a converged generator G3 that restores the degraded image at the levels of target semantics and low-level visual features.
C) In the testing phase, a degraded image is used as the input of generator G3 to obtain the sharpened result J of the invention.
Drawings
Fig. 1 is a schematic network structure diagram of an image sharpening processing method for target detection according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the network structure of generators G1, G2 and G3 according to one embodiment of the present invention; all generators share the same lightweight network structure.
FIG. 3 is a schematic diagram of the network structure of discriminators D1, D2 and D3 according to one embodiment of the present invention; all discriminators share the same lightweight network structure.
FIGS. 4(a) to 4(c) compare the image sharpening method of an embodiment of the present invention with typical prior-art sharpening methods on test images; the boxes overlaid on the images mark the target detection results of a Faster RCNN detector. FIG. 4(a) shows a dense fog image and a low-illumination image; FIG. 4(b) shows the dense fog image clarified by the de-scattering algorithm CEP and the low-illumination image restored by the low-light algorithm KinD, respectively; FIG. 4(c) shows the sharpened images obtained by the method of the present invention.
Detailed Description
According to one aspect of the invention, an image sharpening processing method for target detection is provided. The method improves the performance of the target detection algorithm through the sharpening process; the sharpened image it produces offers both good human visual quality and accurate machine target detection.
To realize image sharpening oriented to target detection, the invention provides an image sharpening processing method characterized by the following steps:
A) A cycle-consistent generative adversarial network is trained to obtain the scene domain adaptation generator G1 from the degraded image domain A to the clear image domain B, where the cycle GAN model comprises two structurally identical generators G1 and G2 and two structurally identical discriminators D1 and D2:
A1) sample images are selected from the degraded image domain A and the clear image domain B;
A2) the pair G1/D1 and the pair G2/D2 are adversarially trained; multiple rounds of iterative training yield a converged generator G1 that performs domain migration of the degraded image at the scene perception level.
B) A generative adversarial network model is trained: network parameters are loaded from the converged generator G1, and then, for a given detector, generator G3 is trained with degraded-domain and clear-domain sample images annotated with target positions and categories, where the model comprises a generator G3 and a discriminator D3:
B1) generator G3 takes a sample image a_o from the degraded image domain A as input and generates an image a_2; a_2 is passed through the target detector to obtain its prediction-box loss value and classification loss value, and the head feature F of the feature extraction module is fed to discriminator D3 to obtain an adversarial loss; the prediction-box loss, classification loss and adversarial loss together constrain G3 at the semantic level;
B2) generator G3 takes a_o as input and generates a_2; low-level visual features including color, scattering and gradient are extracted from a_2 to obtain loss values that constrain G3 at the low-level feature level, guaranteeing the human visual quality of the generated image;
B3) sample images are selected from the degraded image domain A and the clear image domain B to train G3 and D3; multiple rounds of iteration yield a converged generator G3 that restores the degraded image at the levels of target semantics and low-level visual features.
C) In the testing phase, the degraded image is used as the input of generator G3 to obtain the sharpened result J of the invention.
According to a further embodiment of the invention, in step A) above, the two input image domains consist of unpaired degraded and clear images. Training the cycle GAN yields the generator G1 from the degraded image domain to the clear image domain; its main role is to close the feature gap at the overall scene perception level and to serve as the pre-training weights for the subsequent domain migration at the target semantic level.
According to a further embodiment of the invention, in step A) above, as shown in FIG. 2, generators G1 and G2 share the same network structure: only 2 convolutional layers and 6 residual convolutional layers, with kernel size 3, 64 channels and ReflectionPad2d padding. The last convolutional layer is activated by Tanh and carries no normalization layer; all other convolutional layers use LeakyReLU activations with normalization. Generator G1 takes a sample image a_o from the degraded image domain A as input and generates an image a_1 approximating the clear image domain B; generator G2 takes a sample image b_o from the clear image domain B as input and generates an image b_1 approximating the degraded image domain A. As shown in FIG. 3, discriminators D1 and D2 share the same network structure: only 6 convolutional layers with kernel size 4, the first 4 with stride 2 and the last 2 with stride 1. Discriminator D1 takes the image a_1 generated by G1 and a clear-domain image b_o as input; each pixel value of its single-channel output indicates whether the image belongs to the clear image domain B. Discriminator D2 takes the image b_1 generated by G2 and a degraded-domain image a_o as input; each pixel value of its single-channel output indicates whether the image belongs to the degraded image domain A.
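As a concrete illustration of the structures just described, the following is a minimal PyTorch sketch of the generator and discriminator. The layer counts, kernel sizes, strides, 64-channel width, ReflectionPad2d padding and the Tanh/LeakyReLU activation pattern follow the text; the specific normalization type (InstanceNorm2d) and the discriminator's channel widths are assumptions, since the text only says "normalization" and gives no discriminator channel counts.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """64-channel residual block: 3x3 conv with reflection padding."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(ch, ch, kernel_size=3),
            nn.InstanceNorm2d(ch),              # "normalization" layer; type assumed
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """G1/G2/G3: 2 plain convs + 6 residual convs, kernel 3, 64 channels."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(3, 64, kernel_size=3),
            nn.InstanceNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.blocks = nn.Sequential(*[ResidualBlock(64) for _ in range(6)])
        # Last conv: Tanh activation, no normalization (as stated in the text).
        self.tail = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(64, 3, kernel_size=3),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.tail(self.blocks(self.head(x)))

class Discriminator(nn.Module):
    """D1/D2/D3: 6 convs, kernel 4; stride 2 for the first 4, stride 1 for
    the last 2. Output is a single-channel map whose pixels score domain
    membership. Channel widths are assumed."""
    def __init__(self, in_ch=3):
        super().__init__()
        chs = [in_ch, 64, 128, 256, 512, 512]
        layers = []
        for i in range(4):                       # first 4 convs, stride 2
            layers += [nn.Conv2d(chs[i], chs[i + 1], 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers += [nn.Conv2d(chs[4], chs[5], 4, stride=1, padding=1),
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.Conv2d(chs[5], 1, 4, stride=1, padding=1)]  # 1-channel map
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)
```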
According to a further embodiment of the present invention, in step A2) above, the two generator/discriminator pairs optimize their network parameters through adversarial training losses: the discriminator's optimization goal is to distinguish accurately whether an input image belongs to the clear image domain B or the degraded image domain A, while the generator's goal is to produce images that confuse the discriminator. Equation (1) lists the adversarial training losses of generator G1 and discriminator D1; the closer D1's judgment of the generated image is to the clear image domain B, the better G1's generation and the smaller its loss function value, and correspondingly the larger D1's loss function value, indicating insufficient discrimination accuracy:

$$\mathrm{Loss}_{g1} = \mathbb{E}_{z \sim A}\big[\log\big(D_1(G_1(z))\big)\big] \tag{1}$$
$$\mathrm{Loss}_{d1} = \mathbb{E}_{x \sim B}\big[\log D_1(x)\big] + \mathbb{E}_{z \sim A}\big[\log\big(1 - D_1(G_1(z))\big)\big]$$

where B and A denote the clear image set and the degraded image set, respectively. Clear image samples in the training data come from the VOC2007 dataset, dense-fog degraded samples from the RTTS dataset, and low-illumination degraded samples from the DarkFace dataset. The model parameter optimizer is Adam with learning rate 1e-4, and model parameters are initialized with the Xavier scheme. Through adversarial learning and mutual constraint between generator and discriminator, the generator produces images ever closer to the target domain while the discriminator judges domain membership ever more accurately; the model converges after about 50 rounds of iterative training, yielding a generator G1 that performs domain migration of the degraded image at the perception level.
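The adversarial objective in equation (1) can be written out as a short training step. This is a sketch under stated assumptions: a sigmoid is applied to the patch discriminator's single-channel output so the log terms are well defined, and the generator minimizes the non-saturating form -log D1(G1(z)) (the usual practical counterpart of the Loss_g1 written above); Generator and Discriminator are the modules from the previous sketch.

```python
import torch

# One adversarial update for the G1/D1 pair, following equation (1).
# a and b are image batches from the degraded domain A and clear domain B.
g1, d1 = Generator(), Discriminator()
opt_g = torch.optim.Adam(g1.parameters(), lr=1e-4)   # lr 1e-4 as stated
opt_d = torch.optim.Adam(d1.parameters(), lr=1e-4)

def train_step(a: torch.Tensor, b: torch.Tensor):
    eps = 1e-8
    fake_b = g1(a)

    # Discriminator: maximize E[log D1(b)] + E[log(1 - D1(G1(a)))],
    # i.e. minimize the negative of Loss_d1.
    d_real = torch.sigmoid(d1(b))
    d_fake = torch.sigmoid(d1(fake_b.detach()))
    loss_d1 = -(torch.log(d_real + eps).mean()
                + torch.log(1 - d_fake + eps).mean())
    opt_d.zero_grad(); loss_d1.backward(); opt_d.step()

    # Generator: non-saturating form of Loss_g1 = E[log D1(G1(z))];
    # minimizing -log D1(G1(z)) pushes generated images toward domain B.
    loss_g1 = -torch.log(torch.sigmoid(d1(fake_b)) + eps).mean()
    opt_g.zero_grad(); loss_g1.backward(); opt_g.step()
    return loss_g1.item(), loss_d1.item()
```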
According to a further embodiment of the present invention, in step B) above, the two input image domains consist of degraded images annotated with target positions and categories and of clear images, and the loss function constraint covers two aspects: the target semantic feature constraint introduced through the Faster RCNN detector, and the unsupervised constraint on low-level visual features. As shown in equation (2), these two loss terms form generator G3's optimization objective from the two aspects of target semantics and visual quality. Although the invention is explained here with Faster RCNN as the detection module, it is not restricted to that one detector and also brings notable performance gains to different detectors such as Yolo.

$$\mathrm{Loss} = \mathrm{Loss}_{con} + \mathrm{Loss}_{low} \tag{2}$$

where Loss_con is the target semantic loss and Loss_low is the low-level visual feature loss.
According to a further embodiment of the invention, in step B) above, the network structure of generator G3 is shown in FIG. 2: its input is a degraded image and its output is the generated clarified image. The network structure of discriminator D3 is shown in FIG. 3: its input is the head feature map, produced by the Faster RCNN feature extraction module, of the image generated by G3 and of clear-domain image samples; each pixel value of its single-channel output indicates whether the image belongs to the clear image domain B.
According to a further embodiment of the present invention, in step B1) above, the domain migration at the semantic level is trained using degraded sample images with target position annotations, and the loss function is defined as in equation (3):

$$\mathrm{Loss}_{con} = \mathrm{Loss}_{detect} + \mathrm{Loss}_{g3} \tag{3}$$

where Loss_detect and Loss_g3 are the detector's detection loss and the adversarial loss on the detector features, respectively, defined as follows:

$$\mathrm{Loss}_{detect} = \mathrm{Loss}_{cls}\big(G_3(z)\big) + \mathrm{Loss}_{loc}\big(G_3(z)\big) \tag{4}$$

where G_3(z) is the output image of generator G3 given a degraded image as input, and Loss_cls and Loss_loc are respectively the target classification loss and the prediction-box offset loss of Faster RCNN.

$$\mathrm{Loss}_{g3} = \mathbb{E}_{z \sim A}\big[\log\big(D_3(F(G_3(z)))\big)\big] \tag{5}$$

where F denotes the layer-7 feature map of the Faster RCNN feature extraction module, of dimension 512.
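A sketch of the semantic constraint of equations (3)-(5) is given below. How the Faster RCNN exposes its loss dictionary and its layer-7 feature map is an assumption for illustration (the accessors `detector(...)` returning a loss dict and `detector.backbone_layer7(...)` are hypothetical); the patent specifies only the loss terms themselves and that F is the 512-dimensional layer-7 feature map.

```python
import torch

def semantic_loss(g3, d3, detector, z, targets):
    """Semantic-level constraint, equations (3)-(5), as a sketch.

    `detector` stands in for a Faster RCNN that exposes its loss dict and
    its layer-7 backbone feature map; the accessor names used here are
    hypothetical.
    """
    a2 = g3(z)                                  # clarified image G3(z)

    # Eq. (4): Loss_detect = Loss_cls + Loss_loc from the detector.
    det_losses = detector(a2, targets)          # e.g. a torchvision-style loss dict
    loss_detect = det_losses["loss_classifier"] + det_losses["loss_box_reg"]

    # Eq. (5): adversarial loss on the detector head feature F(G3(z)).
    feat = detector.backbone_layer7(a2)         # hypothetical accessor for F (512-d)
    d_out = torch.sigmoid(d3(feat))
    loss_g3 = -torch.log(d_out + 1e-8).mean()   # push features toward the clear domain

    return loss_detect + loss_g3                # Eq. (3): Loss_con
```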
According to a further embodiment of the present invention, in step B2) above, visual quality is restored mainly through constraints on low-level visual features such as color, scattering and gradient, so that the generator's result better matches human visual quality. The loss function is defined as in equation (6):

$$\mathrm{loss}_{low} = \mathrm{loss}_{recon} + \alpha \cdot \mathrm{loss}_{color} + \beta \cdot \mathrm{loss}_{haze} + \gamma \cdot \mathrm{loss}_{edge} \tag{6}$$

where loss_recon, loss_color, loss_haze and loss_edge constrain the image restoration process in terms of structure, color, scattering and edge-gradient features, respectively; α is 8 and β and γ are both 0.1. The individual loss functions are defined as follows:

$$\mathrm{loss}_{recon} = \big\| \max_{c}\, a_o - \max_{c}\, G_3(z) \big\|_1 \tag{7}$$

where max_c a_o and max_c G_3(z) denote, for the degraded image and the image generated by G3 respectively, the maximum over the three channels at each pixel position; the L1 loss constrains the network's result to be structurally consistent with the original image.

$$\mathrm{loss}_{color} = \max_{c}\, \mu_c\big(G_3(z)\big) - \min_{c}\, \mu_c\big(G_3(z)\big) \tag{8}$$

where μ_c(G_3(z)) is the pixel mean of the generated image on color channel c. By the gray-world assumption the three channel means of a white-balanced image are approximately equal, so the difference between the largest and smallest channel mean constrains the color cast of the whole scene.

$$\mathrm{loss}_{haze} = -\mathrm{var}\Big(\big(\mathrm{Gray}(G_3(z)) - \mathrm{Gray}_{mean}\big)/\mathrm{Gray}_{std}\Big) \tag{9}$$

where Gray(G_3(z)) is the grayscale conversion of the generated image, Gray_mean and Gray_std are the mean and standard deviation of that grayscale image, (Gray(G_3(z)) − Gray_mean)/Gray_std normalizes it, and var denotes the variance over the whole image. This loss constrains texture: the richer the texture, the smaller the loss.

$$\mathrm{loss}_{edge} = -\mathrm{mean}\big(\big|\mathrm{grad}\big(G_3(z)\big)\big|\big) \tag{10}$$

where grad denotes the image gradient; constraining the mean of the gradient keeps the edges of the generated image sharp.
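Equations (6)-(10) translate into the following loss sketch. The channel-max, channel-mean and normalized-variance terms follow the definitions above; the grayscale conversion (plain channel mean) and the forward-difference gradient operator are assumed implementation details the patent does not fix.

```python
import torch

def low_level_loss(z, out, alpha=8.0, beta=0.1, gamma=0.1, eps=1e-6):
    """loss_low of equation (6): structure, color, texture and edge terms.
    `z` is the degraded input, `out` = G3(z); both are (N, 3, H, W) tensors."""
    # (7) loss_recon: L1 between per-pixel channel maxima of input and output.
    loss_recon = (z.max(dim=1).values - out.max(dim=1).values).abs().mean()

    # (8) loss_color: spread of the three channel means (gray-world assumption).
    mu = out.mean(dim=(2, 3))                          # (N, 3) channel means
    loss_color = (mu.max(dim=1).values - mu.min(dim=1).values).mean()

    # (9) loss_haze: negative variance of the normalized gray image;
    # richer texture -> larger variance -> smaller loss.
    gray = out.mean(dim=1)                             # simple grayscale conversion
    g_norm = (gray - gray.mean(dim=(1, 2), keepdim=True)) / (
        gray.std(dim=(1, 2), keepdim=True) + eps)
    loss_haze = -g_norm.var(dim=(1, 2)).mean()

    # (10) loss_edge: reward strong mean gradients to keep edges sharp.
    dx = (out[..., :, 1:] - out[..., :, :-1]).abs().mean()
    dy = (out[..., 1:, :] - out[..., :-1, :]).abs().mean()
    loss_edge = -(dx + dy)

    return loss_recon + alpha * loss_color + beta * loss_haze + gamma * loss_edge
```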
According to a further embodiment of the invention, in step B3), generator G3 optimizes its model parameters through the target semantic loss and the low-level visual feature loss, while the discriminator optimizes its parameters through the adversarial loss on detector features. Clear image samples in the training data come from the VOC2007 dataset, dense-fog degraded samples from the RTTS dataset, and low-illumination degraded samples from the DarkFace dataset. The model parameter optimizer is Adam with learning rate 2e-5; generator G3's parameters are loaded directly from generator G1, while discriminator D3's parameters are initialized with the Xavier scheme. The model converges after about 80 rounds of iterative training, yielding a generator G3 that clarifies the degraded image at the levels of target semantics and low-level visual features.
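A sketch of the phase-2 initialization just described, assuming the Generator/Discriminator modules from the earlier sketch and an assumed checkpoint filename:

```python
import torch

# Phase-2 setup: G3 starts from the converged G1 weights, D3 is freshly
# Xavier-initialized, and both use Adam with lr 2e-5 as stated.
g3 = Generator()
g3.load_state_dict(torch.load("g1_converged.pth"))  # G3 loads G1's parameters

d3 = Discriminator(in_ch=512)   # D3 consumes the 512-d detector feature map F

def xavier_init(m):
    if isinstance(m, torch.nn.Conv2d):
        torch.nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            torch.nn.init.zeros_(m.bias)

d3.apply(xavier_init)

opt_g3 = torch.optim.Adam(g3.parameters(), lr=2e-5)
opt_d3 = torch.optim.Adam(d3.parameters(), lr=2e-5)
```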
According to a further embodiment of the invention, in step C), the testing phase feeds the degraded image directly into generator G3 and takes the model's inference result as the sharpened image J.
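Test-phase usage then reduces to a single forward pass. A minimal sketch, assuming the checkpoint and file names and that images are mapped into the generator's Tanh range [-1, 1]:

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

# The degraded image goes straight through G3; the inference result is J.
g3 = Generator()
g3.load_state_dict(torch.load("g3_converged.pth", map_location="cpu"))
g3.eval()

with torch.no_grad():
    z = to_tensor(Image.open("degraded.jpg").convert("RGB")).unsqueeze(0)
    j = g3(z * 2 - 1)                # map [0, 1] input into the Tanh range
    j = ((j + 1) / 2).clamp(0, 1)    # back to [0, 1] for saving
to_pil_image(j.squeeze(0)).save("clarified.jpg")
```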
Therefore, the invention provides an image sharpening processing method for target detection. The method realizes domain adaptation from the degraded image domain to the clear image domain at the scene perception and target semantic levels, closing the gap between target features in degraded and clear images and improving the performance of the target detection task. At the same time, the low-level visual feature constraints give the clarified image good human visual quality. The method therefore suits degraded images under a variety of complex illumination and scattering conditions and can serve as a preprocessing module that supplies clear images for the subsequent target detection algorithm to call, improving its performance.
The image sharpening processing method for target detection provided by the invention constitutes a substantial improvement over existing image sharpening methods. Compared with the prior art, it has the beneficial effects of narrowing the gap between degraded and clear images at the scene perception and target semantic levels and of markedly improving the performance of tasks such as target detection.
FIG. 4 compares the test results of the sharpening method of the present invention with those of existing sharpening methods (note that the original figures shown in FIGS. 4(a) to 4(c) are in color); the boxes overlaid on the images are the detection results of the Faster RCNN target detector. FIG. 4(a) shows a dense fog image and a low-illumination image in which object edges and other features are severely attenuated by strong scattering and weak signals, badly hurting detector performance. FIG. 4(b) shows the dense fog image clarified by the de-scattering algorithm CEP and the low-illumination image restored by the low-light algorithm KinD; each improves detection accuracy, but only to a very limited extent. FIG. 4(c) shows the image sharpening result of the present invention: many more objects are detected, showing that the method significantly improves target detection performance.
It is to be understood that the above disclosure only illustrates specific embodiments of the invention. Variations that a person of ordinary skill in the art can conceive according to the technical idea provided by the invention shall fall within the protection scope of the invention.

Claims (3)

1. An image sharpening processing method for target detection, characterized by comprising the following steps:
A) training a cycle-consistent generative adversarial network to obtain the scene domain adaptation generator G1 from the degraded image domain A to the clear image domain B, wherein the cycle GAN model comprises two structurally identical generators G1, G2 and two structurally identical discriminators D1, D2, comprising:
A1) selecting sample images from the degraded image domain A and the clear image domain B;
A2) adversarially training generator G1 with discriminator D1 and generator G2 with discriminator D2, obtaining through multiple rounds of iterative training a converged generator G1 that performs domain migration of the degraded image at the scene perception level;
B) training a generative adversarial network model by loading network parameters from the converged generator G1 and then, for a given detector, training generator G3 with degraded-domain and clear-domain sample images annotated with target positions and categories, wherein the model comprises a generator G3 and a discriminator D3, comprising:
B1) taking a sample image a_o of the degraded image domain A as input to generator G3 to generate an image a_2; passing a_2 through the target detector to obtain its prediction-box loss value and classification loss value, and feeding the head feature F of the feature extraction module to discriminator D3 to obtain an adversarial loss; the prediction-box loss, classification loss and adversarial loss together constraining G3 at the semantic level;
B2) taking the sample image a_o of the degraded image domain A as input to generator G3 to generate an image a_2, and extracting from a_2 low-level visual features including color, scattering and gradient to obtain loss function values constraining G3 at the low-level feature level so as to guarantee the human visual quality of the generated image;
B3) selecting sample images from the degraded image domain A and the clear image domain B to train generator G3 and discriminator D3, obtaining through multiple rounds of iteration a converged generator G3 that restores the degraded image at the levels of target semantics and low-level visual features;
C) in the testing phase, using the degraded image as the input of generator G3 to obtain the sharpened result J.
2. The image sharpening processing method according to claim 1, wherein:
in step A), the two input image domains consist of unpaired degraded and clear images, and training the cycle GAN yields the generator G1 from the degraded image domain to the clear image domain; generators G1 and G2 share the same network structure, comprising 2 convolutional layers and 6 residual convolutional layers with kernel size 3, 64 channels and ReflectionPad2d padding, the last convolutional layer being activated by Tanh without a normalization layer while all other convolutional layers use LeakyReLU activations with normalization; generator G1 takes a sample image a_o of the degraded image domain A as input and generates an image a_1 approximating the clear image domain B; generator G2 takes a sample image b_o of the clear image domain B as input and generates an image b_1 approximating the degraded image domain A; discriminators D1 and D2 share the same network structure, comprising only 6 convolutional layers with kernel size 4, the first 4 with stride 2 and the last 2 with stride 1; discriminator D1 takes the image a_1 generated by G1 and a clear-domain image b_o as input, each pixel value of its output single-channel image indicating whether the image belongs to the clear image domain B; discriminator D2 takes the image b_1 generated by G2 and a degraded-domain image a_o as input, each pixel value of its output single-channel image indicating whether the image belongs to the degraded image domain A;
in step A2), the two generator/discriminator pairs optimize their network parameters through adversarial training losses: the discriminator's optimization goal is to distinguish accurately whether the input image belongs to the clear image domain B or the degraded image domain A, and the generator's goal is to generate images that confuse the discriminator; equation (1) lists the adversarial training losses of generator G1 and discriminator D1:

$$\mathrm{Loss}_{g1} = \mathbb{E}_{z \sim A}\big[\log\big(D_1(G_1(z))\big)\big] \tag{1}$$
$$\mathrm{Loss}_{d1} = \mathbb{E}_{x \sim B}\big[\log D_1(x)\big] + \mathbb{E}_{z \sim A}\big[\log\big(1 - D_1(G_1(z))\big)\big]$$

where B and A denote the clear image set and the degraded image set, respectively; clear image samples in the training data come from the VOC2007 dataset, dense-fog degraded samples from the RTTS dataset, and low-illumination degraded samples from the DarkFace dataset; the model parameter optimizer is Adam with learning rate 1e-4 and Xavier initialization; the model converges after about 50 rounds of iterative training, yielding a generator G1 that performs domain migration of the degraded image at the perception level;
in step B), the two input image domains consist of degraded images (such as dense fog or low illumination) annotated with target positions and categories and of clear images, and the loss function constraint covers two aspects: the target semantic feature constraint introduced through the Faster RCNN detector and the unsupervised constraint on low-level visual features; as shown in equation (2), these form generator G3's optimization objective from the two aspects of target semantics and visual quality (it is noted that although Faster RCNN serves as the detection module in this description, the invention is not restricted to that one detector and also brings significant performance gains to different detectors such as Yolo):

$$\mathrm{Loss} = \mathrm{Loss}_{con} + \mathrm{Loss}_{low} \tag{2}$$

where Loss_con is the target semantic loss and Loss_low is the low-level visual feature loss; generator G3's input is a degraded image and its output is the generated clarified image; discriminator D3's input is the head feature map, extracted by the Faster RCNN feature extraction module, of the image generated by G3 and of clear-domain image samples, each pixel value of its output single-channel image indicating whether the image belongs to the clear image domain B;
in step B1), the domain migration at the semantic level is trained using degraded sample images with target position annotations:

$$\mathrm{Loss}_{con} = \mathrm{Loss}_{detect} + \mathrm{Loss}_{g3} \tag{3}$$

where Loss_detect and Loss_g3 are the detection loss and the adversarial loss on detector features, respectively:

$$\mathrm{Loss}_{detect} = \mathrm{Loss}_{cls}\big(G_3(z)\big) + \mathrm{Loss}_{loc}\big(G_3(z)\big) \tag{4}$$

where G_3(z) is the output image of generator G3 with a degraded image as input, and Loss_cls and Loss_loc are respectively the target classification loss and the prediction-box offset loss of Faster RCNN;

$$\mathrm{Loss}_{g3} = \mathbb{E}_{z \sim A}\big[\log\big(D_3(F(G_3(z)))\big)\big] \tag{5}$$

where F denotes the layer-7 feature map of the Faster RCNN feature extraction module, of dimension 512;
in step B2), the restoration of visual quality relies on constraints on low-level visual features such as color, scattering and gradient, so that the generator's result better matches human visual quality, with the loss function defined as follows:

$$\mathrm{loss}_{low} = \mathrm{loss}_{recon} + \alpha \cdot \mathrm{loss}_{color} + \beta \cdot \mathrm{loss}_{haze} + \gamma \cdot \mathrm{loss}_{edge} \tag{6}$$

where loss_recon, loss_color, loss_haze and loss_edge constrain the image restoration process in terms of structure, color, scattering and edge-gradient features respectively, α is 8 and β and γ are both 0.1, and the individual loss functions are defined as follows:

$$\mathrm{loss}_{recon} = \big\| \max_{c}\, a_o - \max_{c}\, G_3(z) \big\|_1 \tag{7}$$

where max_c a_o and max_c G_3(z) denote, for the degraded image and the image generated by G3 respectively, the per-pixel maximum over the three channels, the L1 loss constraining the network's result to be structurally consistent with the original image;

$$\mathrm{loss}_{color} = \max_{c}\, \mu_c\big(G_3(z)\big) - \min_{c}\, \mu_c\big(G_3(z)\big) \tag{8}$$

where μ_c(G_3(z)) is the pixel mean of the generated image on color channel c; by the gray-world assumption the three channel means of a white-balanced image are approximately equal, and the difference between the largest and smallest channel mean constrains the color cast of the whole scene;

$$\mathrm{loss}_{haze} = -\mathrm{var}\Big(\big(\mathrm{Gray}(G_3(z)) - \mathrm{Gray}_{mean}\big)/\mathrm{Gray}_{std}\Big) \tag{9}$$

where Gray(G_3(z)) is the grayscale conversion of the generated image, Gray_mean and Gray_std are the mean and standard deviation of the grayscale image, (Gray(G_3(z)) − Gray_mean)/Gray_std normalizes it, and var denotes the variance over the whole image; this loss constrains texture, and the richer the texture, the smaller the loss;

$$\mathrm{loss}_{edge} = -\mathrm{mean}\big(\big|\mathrm{grad}\big(G_3(z)\big)\big|\big) \tag{10}$$

where grad denotes the image gradient, and constraining the mean of the gradient keeps the edges of the result image sharp;
in step B3), generator G3 optimizes its model parameters through the target semantic loss and the low-level visual feature loss while the discriminator optimizes its parameters through the adversarial loss on detector features; clear image samples in the training data come from the VOC2007 dataset, dense-fog degraded samples from the RTTS dataset, and low-illumination degraded samples from the DarkFace dataset; the model parameter optimizer is Adam with learning rate 2e-5; generator G3's parameters are loaded directly from generator G1 while discriminator D3's parameters are initialized with the Xavier scheme; the model converges after about 80 rounds of iterative training, yielding a generator G3 that clarifies the degraded image at the levels of target semantics and low-level visual features;
in step C), the testing phase feeds the degraded image directly into generator G3 and takes the model's inference result as the sharpened image J.
3. The image sharpening processing method for target detection according to claim 2, characterized in that:
steps A2), B1) and B2) each design different loss functions to constrain the network model parameters, realizing, respectively, domain adaptation at the scene perception level, domain adaptation at the target semantic level, and restoration at the low-level visual feature level.
CN202210567780.2A — Priority date: 2022-05-23 — Filing date: 2022-05-23 — Title: Image sharpening processing method for target detection — Status: Pending — Publication: CN114998124A (en)

Priority Applications (1)

Application number: CN202210567780.2A — Priority/filing date: 2022-05-23 — Title: Image sharpening processing method for target detection

Publications (1)

Publication number: CN114998124A — Publication date: 2022-09-02

Family

ID=83027367

Country Status (1)

Country: CN — CN (1) CN114998124A (en)

Cited By (1)

* Cited by examiner, † Cited by third party

CN115424119A * — Priority date: 2022-11-04 — Publication date: 2022-12-02 — Assignee: 之江实验室 — Title: Semantic fractal-based interpretable GAN image generation training method and device


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination