CN116542864A - Unmanned aerial vehicle image defogging method based on global and local double-branch network - Google Patents

Unmanned aerial vehicle image defogging method based on global and local double-branch network

Info

Publication number
CN116542864A
CN116542864A (application CN202310037485.0A)
Authority
CN
China
Prior art keywords
image
defogging
convolution
global
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310037485.0A
Other languages
Chinese (zh)
Inventor
李红光
龙飞宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202310037485.0A priority Critical patent/CN116542864A/en
Publication of CN116542864A publication Critical patent/CN116542864A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an unmanned aerial vehicle image defogging method based on a global and local double-branch network, and belongs to the field of image processing. The method comprises the following steps: firstly, foggy images with different haze concentrations are generated from existing unmanned aerial vehicle images by using an atmospheric scattering model, and an image defogging network model based on global and local double branches is constructed; then, each foggy image in the data set is input into the global and local branches simultaneously to obtain a global structure feature map and a local detail feature map respectively, which are then input into a feature fusion module to obtain the defogging parameter corresponding to each image; next, a mixed loss function for image defogging training is constructed; the image defogging network model is trained with image pairs formed by the fog-free and foggy images, and its parameters are optimized through the loss function; finally, a new single foggy image is input into the optimal image defogging network model, and the defogged image is output. The invention can process images of various sizes, with a fast processing speed and a wide application range.

Description

Unmanned aerial vehicle image defogging method based on global and local double-branch network
Technical Field
The invention belongs to the field of image processing, and particularly relates to an unmanned aerial vehicle image defogging method based on global and local double-branch networks.
Background
In recent years, industrial development has caused serious atmospheric pollution, and severe weather such as haze occurs frequently. Images photographed in hazy weather often suffer from blurring, color shift, reduced contrast, and loss of detail. Haze therefore seriously interferes with outdoor photography and image processing, and limits the performance of tasks such as security monitoring and autonomous driving.
Because an unmanned aerial vehicle flies at high altitude and is far from its targets, its images are more easily affected by haze than ground images. The haze concentration in an image is related to the distance between the shooting point and the target: the greater the distance, the higher the haze concentration. An unmanned aerial vehicle often views the ground obliquely during flight, so near and far regions coexist in the same image, which leads to an uneven distribution of haze concentration, as shown in FIG. 1. Uneven haze adversely affects subsequent unmanned aerial vehicle image tasks such as target detection, recognition, tracking and positioning.
Conventional image defogging methods are usually aimed at uniformly distributed haze, so a method is needed that can effectively defog and sharpen unmanned aerial vehicle images with unevenly distributed haze. Existing image defogging data sets include D-HAZY, NH-HAZE, RESIDE and the like, but most of them consist of images shot on the ground, so a method for synthesizing a non-uniform foggy image data set under the unmanned aerial vehicle view angle is also needed.
Existing image defogging algorithms are divided into methods based on traditional image processing and methods based on deep learning. The first class mostly relies on the atmospheric scattering model and prior information, deriving the fog-free image from the foggy image by estimating the scene depth map and the atmospheric ambient light. Such methods are limited by the atmospheric scattering model and by prior assumptions that hold only under ideal conditions, so they have great limitations on complex foggy images.
The second class can be further divided into methods based on the atmospheric scattering model and methods based on image conversion. The former uses the atmospheric scattering model and estimates the scene depth map and atmospheric ambient light with a neural network; the latter ignores the atmospheric scattering model and generates the haze-free image by directly converting the hazy image. Deep-learning-based methods outperform traditional image-processing-based methods in processing speed, defogging effect and other respects.
Disclosure of Invention
Aiming at the problems that the haze distribution is uneven under the unmanned aerial vehicle view angle, that existing methods cannot effectively restore such images, and that training easily falls into a local optimum, the invention provides an unmanned aerial vehicle image defogging method based on a global and local double-branch network that extracts image detail information and overall structure information separately. The method does not depend on complex prior assumptions, and a clear defogged image can be obtained from a single input foggy image.
The method comprises the following specific steps:
Step one, using existing unmanned aerial vehicle images, estimating the depth map of each image with a depth estimation model, and generating foggy images with different haze concentrations using the atmospheric scattering model;
each fog-free image is used to synthesize a plurality of foggy images, and each foggy image is combined with the corresponding fog-free image into an image pair;
step two, constructing an image defogging network model based on global and local double branches;
the image defogging network comprises a global structure branch, a local detail branch and a feature fusion module;
the global structure branch acquires the overall haze structure information of the image using a Transformer structure; the local detail branch acquires the detail information of the image using depth-separable convolutions, which also reduces the amount of computation; and the feature fusion module performs weighted fusion on the feature maps of the two branches;
the local detail branch consists of 9 convolution layers and 4 pixel attention modules:
convolution layers 1, 3, 5, 7 and 9 are PW convolutions with 3 convolution kernels; convolution layer 2 is a DW convolution with a 3×3 kernel size and 3 kernels; convolution layer 4 is a DW convolution with a 5×5 kernel size and 6 kernels; convolution layer 6 is a DW convolution with a 7×7 kernel size and 9 kernels; convolution layer 8 is a DW convolution with a 3×3 kernel size and 12 kernels;
each pixel attention module consists of one DW convolution and two PW convolutions, with a ReLU activation function after the first PW convolution layer and a Sigmoid activation function after the second PW convolution layer; ReLU activation functions are also set after convolution layers 1, 3, 5, 7 and 9. The feature map size remains unchanged throughout the local detail branch.
The global structure branch consists of a pooling downsampling layer, a convolutional encoding layer, a Transformer module and an upsampling layer.
The pooling downsampling layer is adaptive pooling and downsamples the feature map by a factor of 8; the convolutional encoding layer is a convolution with a 3×3 kernel size and 16 kernels.
Each Transformer module contains a multi-head attention module and an MLP structure with skip connections; the LayerNorm normalization method and the GELU activation function are used in the module.
The feature fusion module takes two features x_1 and x_2 as input. First, x_1 is projected with a linear layer; then global average pooling GAP(·), an MLP layer F_MLP(·), Softmax and a splitting operation are used to obtain the corresponding fusion weights a_1 and a_2 and the output y, i.e., the weighted fusion y = a_1·x_1 + a_2·x_2.
Step three, inputting each foggy image in the data set into the global branch and the local branch simultaneously to obtain a global structure feature map and a local detail feature map respectively;
Step four, inputting the global structure feature map and the local detail feature map into the feature fusion module to obtain the defogging parameter K(x) corresponding to each image;
the feature fusion module calculates the global and local feature weights respectively, performs weighted fusion, and convolves the fused features to obtain the defogging parameter K(x).
Step five, constructing an image defogging training loss function;
the loss function is a weighted combination of an L1 loss function, a structural similarity SSIM loss function and a contrastive regularization loss function;
the L1 loss function is expressed as:
F_L1 = (1/N) Σ_{i=1}^{N} |J_i - GT_i|
wherein J_i is the i-th pixel of the image output by the defogging network, GT_i is the i-th pixel of the corresponding real haze-free image, and N is the total number of pixels of the image;
the structural similarity SSIM loss is expressed as:
F_SSIM = 1 - SSIM(J, GT), with SSIM(J, GT) = ((2·μ_J·μ_GT + C_1)(2·σ_JGT + C_2)) / ((μ_J^2 + μ_GT^2 + C_1)(σ_J^2 + σ_GT^2 + C_2))
wherein J is the image output by the defogging network, GT is the corresponding real fog-free image, μ_J and μ_GT are the means of the network output image and the fog-free image within a window, σ_J and σ_GT are their standard deviations within the window, σ_JGT is their covariance within the window, the window size is 11×11, and C_1 and C_2 are constants;
the contrastive regularization loss F_CR represents the ratio, in the feature space, of the distance between the network output image and the corresponding real fog-free image to the distance between the network output image and the corresponding foggy image;
The total loss function F_loss is: F_loss = ω_1·F_L1 + ω_2·F_SSIM + ω_3·F_CR, wherein ω_1, ω_2 and ω_3 are the corresponding weights.
Step six, training the image defogging network model with the image pairs formed by the fog-free and foggy images, and optimizing the parameters of the image defogging network model through the loss function;
training involves the following process:
1) For the foggy images in the image pairs, performing data enhancement by vertical flipping, horizontal flipping and random cropping;
2) Inputting the data-enhanced foggy images into the image defogging network model to obtain the defogging parameter corresponding to each foggy image, and calculating the defogged image J(x);
the formula is: J(x) = K(x)·I(x) - K(x)
wherein I(x) denotes the currently input foggy image;
3) Calculating the loss between the output image J(x) and the fog-free image in the image pair through the loss function, feeding it back to the image defogging network model, and updating the model weights;
4) Repeating step 2) until the preset number of iterations is reached, and saving the last updated parameters as the optimal parameters of the image defogging model;
step seven, inputting a new single image to be defogged into an optimal image defogging network model, and directly outputting defogged images;
the invention has the advantages that:
(1) The unmanned aerial vehicle image defogging method based on the global and local double-branch network is trained with a large amount of data and achieves a better defogging effect than traditional methods;
(2) After the image defogging network model is trained, defogging can be performed by inputting only a foggy image, and the fog-free image is output without any other information;
(3) The unmanned aerial vehicle image defogging method based on the global and local double-branch network has no requirement on image size, can process images of various sizes, and has a fast processing speed and a wide application range.
Drawings
FIG. 1 is a view image of an unmanned aerial vehicle with non-uniform haze concentration distribution in the prior art;
FIG. 2 is a flow chart of a defogging method for an image of a unmanned aerial vehicle based on global and local dual-branch networks according to the present invention;
FIG. 3 is a schematic diagram of the structure of an image defogging network model employed in the present invention;
FIG. 4 is a schematic diagram of a partial detail branching architecture employed in the present invention;
FIG. 5 is a schematic diagram of the pixel attention structure employed in the present invention;
FIG. 6 is an example of a foggy image of an unmanned aerial vehicle and its defogging results employed in the present invention;
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention discloses an unmanned aerial vehicle image defogging method based on a global and local double-branch network. Because the unmanned aerial vehicle platform may have limited on-board computing power, ordinary convolutions are replaced by depth-separable convolutions in the local detail branch, and in the global structure branch the input is downsampled before being fed into the Transformer module, which greatly reduces the amount of computation and the number of parameters while preserving the defogging effect of the model. In order to better extract image features, a pixel attention module is used to strengthen the features of key areas in the feature map, and the global structure branch feature map and the local detail branch feature map are effectively fused by weighted fusion. In order to prevent the network from falling into a local optimum during training and failing to reach the best result, the method uses a mixed loss function, i.e., a weighted sum of multiple loss functions. Simulation results show that the method has a good defogging effect on unmanned aerial vehicle images with uneven haze distribution.
As shown in fig. 2, the specific steps are as follows:
Step one, establishing an unmanned aerial vehicle foggy image data set from existing fog-free unmanned aerial vehicle images combined with the depth map of each image;
for a number of unmanned aerial vehicle images, the depth map of each image is estimated with a depth estimation model, an atmospheric scattering coefficient and atmospheric ambient light are randomly selected, and foggy images with different haze concentrations are generated with the atmospheric scattering model;
In a real hazy image the haze distribution is generally uneven, and the haze concentration is closely related to distance. Most current methods for synthesizing foggy images either require the image itself to carry depth information, i.e., an RGBD image, or fix the depth value d(x) during synthesis. The former greatly limits the range of images on which fog can be synthesized, since ordinary RGB images cannot be used; the haze distribution in images synthesized by the latter is completely uniform, which is inconsistent with reality.
The method does not require the image to contain depth information: a depth estimation model is used to acquire the image depth map, which reflects the distances in the scene, so a foggy image consistent with reality can be conveniently synthesized from it. The application range is therefore wider and the synthesis effect better.
For example, foggy images with various haze concentrations can be synthesized from images in the unmanned aerial vehicle target detection dataset VisDrone 2019 using the atmospheric scattering model of haze;
specifically, the MiDaS model is used to obtain the depth map d(x) of the current fog-free image, and the foggy image corresponding to the fog-free image is synthesized using Equation 1;
Equation 1: I(x) = J(x)·e^(-βd(x)) + A·(1 - e^(-βd(x)))
wherein J(x) is the fog-free image, I(x) is the foggy image, β is the atmospheric scattering coefficient with value range [0.2, 1], and A is the atmospheric ambient light with value range [0.7, 0.9].
In the implementation process, each pair of defogging images is synthesized into 10 foggy images, and each foggy image and the corresponding defogging image are combined into an image pair;
Step two, constructing an image defogging network model based on global and local double branches;
As shown in FIG. 3, the image defogging network includes two branches, a local detail branch and a global structure branch, followed by a feature fusion module that fuses the features of the two branches;
the global structure branch acquires the overall haze structure information of the image using a Transformer structure; the local detail branch acquires the detail information of the image using several depth-separable convolutions, which reduces the amount of computation; and the feature fusion module performs weighted fusion on the feature maps of the two branches;
1) Local detail branching structure
As shown in FIG. 4, it consists of 9 convolution layers and 4 pixel attention modules:
convolution layers 1, 3, 5, 7 and 9 are PW convolutions with 3 convolution kernels; convolution layer 2 is a DW convolution with a 3×3 kernel size and 3 kernels; convolution layer 4 is a DW convolution with a 5×5 kernel size and 6 kernels; convolution layer 6 is a DW convolution with a 7×7 kernel size and 9 kernels; convolution layer 8 is a DW convolution with a 3×3 kernel size and 12 kernels;
each pixel attention module consists of one DW convolution and two PW convolutions, with a ReLU activation function after the first PW convolution layer and a Sigmoid activation function after the second PW convolution layer; ReLU activation functions are also set after convolution layers 1, 3, 5, 7 and 9. The feature map size remains unchanged throughout the local detail branch.
The local detail branch designed in this embodiment uses depth-separable convolutions, which reduces the number of parameters and the amount of computation of the network. A depth-separable convolution consists of a DW convolution and a PW convolution: each kernel of the DW convolution is responsible for only one channel, and each channel is convolved by only one kernel, so the DW convolution significantly reduces computation compared with ordinary convolution. The PW convolution is computed like an ordinary convolution with a kernel size of 1×1×M, where M is the number of channels of the previous feature map; it performs a weighted summation of the previous feature maps along the depth direction to generate new feature maps, fusing information between different channels and compensating for the limitation of the DW convolution.
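As a minimal PyTorch sketch of the depth-separable convolution described above (a DW convolution followed by a 1×1 PW convolution), the module below is illustrative; the class name and channel numbers are not taken from the patent.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # DW convolution: one kernel per channel (groups = in_ch), spatial filtering only
        self.dw = nn.Conv2d(in_ch, in_ch, kernel_size,
                            padding=kernel_size // 2, groups=in_ch)
        # PW convolution: 1x1 kernel that mixes information across channels
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pw(self.dw(x))

x = torch.randn(1, 3, 64, 64)
print(DepthwiseSeparableConv(3, 6)(x).shape)    # torch.Size([1, 6, 64, 64])
```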
This embodiment adds a pixel attention mechanism to the depth-separable convolutions so that the network focuses more on important areas in the image. For an input feature F, three convolution layers are used to transform the feature dimension from C×H×W to 1×H×W: the first convolution layer is a DW convolution with a 3×3 kernel size, the second convolution layer is a PW convolution that reduces the number of channels, and the third convolution layer is a PW convolution with a single kernel. A feature of dimension C×H×W keeps its dimension after the first convolution layer, has its channel dimension reduced by the second convolution layer, and becomes 1×H×W after the third convolution layer. The structure of the pixel attention is shown in FIG. 5; the process can be expressed as:
Equation 2: PA = σ(Conv(δ(Conv(F))))
wherein Conv(·) denotes a convolution layer, δ(·) denotes the ReLU activation function, and σ(·) denotes the Sigmoid activation function;
the computed PA feature is multiplied element-wise with the input feature F to obtain the output of the pixel attention module.
2) Global structure branching
It consists of a pooling downsampling layer, a convolutional encoding layer, a Transformer module and an upsampling layer.
The pooling downsampling layer is adaptive pooling and downsamples the feature map by a factor of 8; the convolutional encoding layer is a convolution with a 3×3 kernel size and 16 kernels.
In this embodiment, adaptive average pooling downsampling is performed first, followed by a block-flattening operation, which reduces the feature dimension input to the Transformer module and the amount of computation.
Each Transformer module contains a multi-head attention module and an MLP structure with skip connections; the LayerNorm normalization method and the GELU activation function are used in the module. Since the Transformer input contains no position information, a one-dimensional learnable position embedding is used to preserve the position information.
3) Feature fusion module
The feature fusion module takes two features x_1 and x_2 as input. First, x_1 is projected with a linear layer; then global average pooling GAP(·), an MLP layer F_MLP(·), Softmax and a splitting operation are used to obtain the corresponding fusion weights a_1 and a_2 and the output y, i.e., the weighted fusion y = a_1·x_1 + a_2·x_2.
Compared with directly adding the different features, weighted fusion better fuses the features from the two branches and further improves the feature extraction capability of the network.
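A hedged sketch of this weighted fusion is shown below. The exact formula in the patent figure is not reproduced; the sketch follows the textual description only (projection of x_1, global average pooling, MLP, Softmax, split into a_1 and a_2, weighted sum), and pooling the sum of the two features before the MLP is an assumption.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, ch_global: int, ch_local: int):
        super().__init__()
        self.proj = nn.Conv2d(ch_global, ch_local, kernel_size=1)    # linear projection of x1
        self.mlp = nn.Sequential(nn.Linear(ch_local, ch_local),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(ch_local, 2 * ch_local))  # produces both weight sets

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        x1 = self.proj(x1)                              # match the channel count of x2
        g = torch.mean(x1 + x2, dim=(2, 3))             # GAP over the (assumed) summed features
        w = self.mlp(g).view(x1.size(0), 2, -1, 1, 1)   # split into two per-channel weight maps
        a = torch.softmax(w, dim=1)                     # fusion weights a1, a2
        return a[:, 0] * x1 + a[:, 1] * x2              # weighted fusion y
```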
Step three, inputting each foggy image in the data set into a global double branch and a local double branch at the same time to respectively obtain a global structural feature map and a local detail feature map;
after the foggy image is input into the global structure branch, the foggy image is firstly encoded through a convolution layer 1 to obtain a feature image 1, the feature image 1 is subjected to self-adaptive average pooling to obtain a feature image 2 through 8 times downsampling, the feature image 2 is flattened in a wide-high dimension to obtain a feature image 3, the feature image 3 is subjected to a transducer module to generate a feature image 4, and the feature image 4 is subjected to 8 times upsampling to generate the global structure feature image.
The method can utilize the long-distance perception characteristic of the transducer to acquire the global structural information of the image, and the parameter number and the calculated amount can be greatly reduced by inputting the acquired global structural information into the transducer module after downsampling, so that the real-time operation of a subsequent model is facilitated.
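A hedged PyTorch sketch of this global structure branch is shown below: 3×3 convolutional encoding to 16 channels, adaptive average pooling for 8× downsampling, flattening into a token sequence, a standard Transformer encoder (multi-head attention and MLP with LayerNorm and GELU), and 8× upsampling back to the input resolution. The number of Transformer layers and attention heads are assumptions, and the learnable position embedding mentioned in the text is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalBranch(nn.Module):
    def __init__(self, in_ch: int = 3, dim: int = 16, depth: int = 4, heads: int = 4):
        super().__init__()
        self.encode = nn.Conv2d(in_ch, dim, kernel_size=3, padding=1)   # convolutional encoding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           activation="gelu", batch_first=True,
                                           norm_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        feat = self.encode(x)                                    # B x 16 x H x W
        small = F.adaptive_avg_pool2d(feat, (h // 8, w // 8))    # 8x adaptive downsampling
        tokens = small.flatten(2).transpose(1, 2)                # B x (H/8 * W/8) x 16
        tokens = self.transformer(tokens)                        # global structure modelling
        small = tokens.transpose(1, 2).reshape(b, -1, h // 8, w // 8)
        return F.interpolate(small, size=(h, w),                 # 8x upsampling
                             mode="bilinear", align_corners=False)
```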
After a foggy image is input into the local detail branch, it passes through convolution layer 1 and convolution layer 2 to generate feature map 1; feature map 1 passes through pixel attention module 1 to generate feature map 2; feature map 1 is then processed by convolution layer 3 and convolution layer 4 to generate feature map 3; feature map 3 is concatenated with feature map 1 along the channel direction to obtain feature map 4; feature map 4 passes through pixel attention module 2 to generate feature map 5; feature map 5 passes through convolution layer 5 and convolution layer 6 to generate feature map 6; feature map 6 is concatenated with feature map 3 and feature map 1 along the channel direction to obtain feature map 7; feature map 7 passes through pixel attention module 3 to generate feature map 8, which then passes through convolution layer 7 and convolution layer 8 to generate feature map 9; feature map 9 is concatenated with feature map 6, feature map 3 and feature map 1 along the channel direction to obtain feature map 10; feature map 10 is processed by convolution layer 8 and convolution layer 9 to generate feature map 11; and feature map 11 passes through pixel attention module 4 to obtain the local detail branch feature map.
Throughout the local detail branch, the size of the feature maps remains unchanged, preserving the detail information of the image to the greatest extent.
Step four, inputting the global structure feature map and the local detail feature map into the feature fusion module to obtain the defogging parameter K(x) corresponding to each image;
the feature fusion module calculates the global and local feature weights respectively, performs weighted fusion, and convolves the fused features to obtain the defogging parameter K(x).
Step five, constructing an image defogging training loss function;
The loss function is a weighted combination of an L1 loss function, a structural similarity SSIM loss function and a contrastive regularization loss function. Compared with the L1 or L2 loss commonly used alone by other methods, this combination improves the performance of the network and prevents it from falling into a local optimum.
The image output by the defogging network is compared with the corresponding real fog-free image to calculate the loss; the loss function adopts the L1 loss, the structural similarity SSIM loss and the contrastive regularization loss, with the specific expressions as follows:
the L1 loss function is expressed as:
F_L1 = (1/N) Σ_{i=1}^{N} |J_i - GT_i|
wherein J_i is the i-th pixel of the image output by the defogging network, GT_i is the i-th pixel of the corresponding real haze-free image, and N is the total number of pixels of the image;
Compared with the L2 loss function, the L1 loss function performs better and is less prone to falling into a local optimum.
The structural similarity SSIM loss is expressed as:
F_SSIM = 1 - SSIM(J, GT), with SSIM(J, GT) = ((2·μ_J·μ_GT + C_1)(2·σ_JGT + C_2)) / ((μ_J^2 + μ_GT^2 + C_1)(σ_J^2 + σ_GT^2 + C_2))
wherein J is the image output by the defogging network, GT is the corresponding real fog-free image, μ_J and μ_GT are the means of the network output image and the fog-free image within a window, σ_J and σ_GT are their standard deviations within the window, σ_JGT is their covariance within the window, and the window size is 11×11; C_1 and C_2 are constants, set to 0.0001 and 0.0009 respectively.
The structural similarity measures the similarity between two images and is closely related to human visual perception. A larger SSIM value indicates that the two images are more similar, and the SSIM value is 1 when the two images are identical, so the SSIM loss function is chosen as 1 - SSIM.
The contrastive regularization loss F_CR represents the ratio, in the feature space, of the distance between the network output image and the corresponding real fog-free image to the distance between the network output image and the corresponding foggy image; its role is to pull the network output image closer to the corresponding real fog-free image and push it away from the corresponding foggy image.
The method adopts a VGG-19 network as the feature extraction network for contrastive regularization and pre-trains it with foggy and fog-free images to further enlarge the distance between foggy and fog-free images in the feature space. The contrastive regularization output is obtained by calculating the distances among the three input images at different feature layers and taking their weighted sum. Contrastive regularization is used only during training, guiding the training of the network through the loss function, so it does not increase the parameters or computation of the network and does not affect inference with the trained model.
The total loss function F_loss is:
F_loss = ω_1·F_L1 + ω_2·F_SSIM + ω_3·F_CR
wherein ω_1, ω_2 and ω_3 are the corresponding weights, set to 1, 1 and 0.8 respectively.
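A hedged sketch of this mixed loss is given below. The SSIM term relies on the third-party pytorch_msssim package and the contrastive term on VGG-19 features from torchvision, both used as stand-ins for the implementations in the patent; the chosen feature layers and the absence of per-layer weights are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from pytorch_msssim import ssim      # pip install pytorch-msssim

class MixedDehazeLoss(nn.Module):
    def __init__(self, w_l1=1.0, w_ssim=1.0, w_cr=0.8, layers=(3, 8, 17)):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)                   # frozen feature extractor
        self.vgg, self.layers = vgg, set(layers)
        self.w = (w_l1, w_ssim, w_cr)

    def _feats(self, x):
        out = []
        for i, m in enumerate(self.vgg):
            x = m(x)
            if i in self.layers:
                out.append(x)
        return out

    def forward(self, pred, clean, hazy):
        loss_l1 = F.l1_loss(pred, clean)
        loss_ssim = 1.0 - ssim(pred, clean, data_range=1.0)
        # Contrastive regularization: pull the output towards the clean anchor
        # and push it away from the hazy negative in VGG feature space.
        fp, fc, fh = self._feats(pred), self._feats(clean), self._feats(hazy)
        loss_cr = sum(F.l1_loss(a, b) / (F.l1_loss(a, c) + 1e-7)
                      for a, b, c in zip(fp, fc, fh))
        return self.w[0] * loss_l1 + self.w[1] * loss_ssim + self.w[2] * loss_cr
```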
Step six, training the image defogging network model with image pairs formed by fog-free images and foggy images of different haze concentrations, and optimizing the parameters of the image defogging network model through the loss function;
training involves the following process:
1) For the foggy images in the image pairs, performing data enhancement by vertical flipping, horizontal flipping and random cropping; the size of the randomly cropped images is 512×512;
2) Inputting the data-enhanced foggy images into the image defogging network model to obtain the defogging parameter corresponding to each foggy image, and calculating the defogged image J(x);
the formula is: J(x) = K(x)·I(x) - K(x)
wherein I(x) denotes the currently input foggy image;
3) Calculating the loss between the output image J(x) and the fog-free image in the image pair through the loss function, feeding it back to the image defogging network model, and updating the model weights;
4) Repeating step 2) until the preset number of iterations is reached, and saving the last updated parameters as the optimal parameters of the image defogging model;
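One training iteration of steps 1) to 4) can be sketched as follows, assuming a model that maps a hazy image to the defogging parameter K(x), the mixed loss sketched earlier, and a DataLoader that already applies the flip and 512×512 crop augmentation; all names here are placeholders.

```python
import torch

def train_one_epoch(model, loader, criterion, optimizer, device="cuda"):
    model.train()
    for hazy, clean in loader:
        hazy, clean = hazy.to(device), clean.to(device)
        k = model(hazy)                      # estimated defogging parameter K(x)
        pred = k * hazy - k                  # J(x) = K(x) * I(x) - K(x)
        loss = criterion(pred, clean, hazy)  # mixed L1 + SSIM + CR loss
        optimizer.zero_grad()
        loss.backward()                      # feed the loss back to the network
        optimizer.step()                     # update the model weights
```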
Compared with methods that perform image defogging by estimating the two parameters of atmospheric ambient light and transmittance, the method only needs to estimate the single defogging parameter K(x), which greatly reduces the accumulated error caused by estimating two parameters and further simplifies the network structure.
Step seven, inputting a new single image to be defogged into an optimal image defogging network model, and directly outputting defogged images;
the defogging step can realize defogging of the image only by inputting the foggy image and without inputting other information, and can adapt to the input images with different sizes, and the size of the output image is consistent with that of the input image.
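A minimal inference sketch for this step is shown below; the model class and image path are placeholders, and the only input is the hazy image itself.

```python
import torch
from torchvision.io import read_image

@torch.no_grad()
def dehaze(model, image_path: str, device: str = "cuda") -> torch.Tensor:
    hazy = read_image(image_path).float().div(255.0).unsqueeze(0).to(device)
    k = model.eval().to(device)(hazy)                 # defogging parameter K(x)
    return (k * hazy - k).clamp(0.0, 1.0).squeeze(0)  # output has the input's size
```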
Simulation verification
The training data uses the foggy and fog-free image pairs synthesized from the static images in the unmanned aerial vehicle target detection dataset VisDrone 2019. The training set is built from the VisDrone 2019 static-image training set, the validation set from the VisDrone 2019 static-image validation set, and the test set from 10 images in the VisDrone 2019 static-image test set; each image is synthesized into 10 hazy images with different haze concentrations, giving a total of 64710 training image pairs, 5480 validation image pairs and 100 test image pairs.
The training images are input into the constructed image defogging network. The optimizer is Adam with an initial learning rate of 0.0001; the learning-rate schedule is cosine annealing, which gradually reduces the learning rate to 0.01 of the initial value during training; and the maximum number of iterations is set to 300 epochs.
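The optimizer and learning-rate schedule described above can be set up as in the following sketch, assuming the model and the training function from the earlier sketches; CosineAnnealingLR with eta_min equal to 0.01 of the initial rate reproduces the stated decay.

```python
import torch

def configure_optimization(model, epochs: int = 300, lr: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=lr * 0.01)   # decay to 0.01x the initial lr
    return optimizer, scheduler
```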
During the iterations, the loss function value gradually decreases until it stabilizes; when the loss has dropped to its lowest value and levels off, the network is sufficiently fitted. After each epoch, the validation set is used to evaluate the performance of the network, and if the result is better than the best of the previous epochs, the current epoch's weights are saved.
The peak signal-to-noise ratio (PSNR) of the model on the validation set is 20.81 and the structural similarity (SSIM) is 0.8211; on the test set the PSNR is 21.21 and the SSIM is 0.9296. Defogging results for some test-set images are shown in FIG. 6.

Claims (6)

1. The unmanned aerial vehicle image defogging method based on the global and local double-branch networks is characterized by comprising the following specific steps:
firstly, using existing unmanned aerial vehicle images, estimating the depth map of each image with a depth estimation model, and generating foggy images with different haze concentrations using the atmospheric scattering model; constructing an image defogging network model based on global and local double branches;
the image defogging network comprises a global structure branch, a local detail branch and a feature fusion module;
the local detail branch consists of 9 convolution layers and 4 pixel attention modules; the global structure branch consists of a pooling downsampling layer, a convolutional encoding layer, a Transformer module and an upsampling layer; the feature fusion module takes two features x_1 and x_2 as input: first, x_1 is projected with a linear layer, and then global average pooling GAP(·), an MLP layer F_MLP(·), Softmax and a splitting operation are used to obtain the corresponding fusion weights a_1 and a_2 and the output y, i.e., the weighted fusion y = a_1·x_1 + a_2·x_2;
then, inputting each foggy image in the data set into the global and local branches simultaneously to obtain a global structure feature map and a local detail feature map respectively, and inputting them into the feature fusion module to obtain the defogging parameter K(x) corresponding to each image;
then, constructing an image defogging training loss function, which is a weighted combination of an L1 loss function, a structural similarity SSIM loss function and a contrastive regularization loss function;
wherein the L1 loss function is expressed as:
F_L1 = (1/N) Σ_{i=1}^{N} |J_i - GT_i|
in which J_i is the i-th pixel of the image output by the defogging network, GT_i is the i-th pixel of the corresponding real haze-free image, and N is the total number of pixels of the image;
the structural similarity SSIM loss is expressed as:
F_SSIM = 1 - SSIM(J, GT), with SSIM(J, GT) = ((2·μ_J·μ_GT + C_1)(2·σ_JGT + C_2)) / ((μ_J^2 + μ_GT^2 + C_1)(σ_J^2 + σ_GT^2 + C_2))
wherein J is the image output by the defogging network, GT is the corresponding real fog-free image, μ_J and μ_GT are the means of the network output image and the fog-free image within a window, σ_J and σ_GT are their standard deviations within the window, σ_JGT is their covariance within the window, and C_1 and C_2 are constants;
the contrastive regularization loss F_CR represents the ratio, in the feature space, of the distance between the network output image and the corresponding real fog-free image to the distance between the network output image and the corresponding foggy image;
the total loss function F_loss is:
F_loss = ω_1·F_L1 + ω_2·F_SSIM + ω_3·F_CR
wherein ω_1, ω_2 and ω_3 are the corresponding weights;
finally, training the image defogging network model with image pairs formed by the fog-free and foggy images, and optimizing the parameters of the image defogging network model through the loss function; and for a new single image to be defogged, inputting it into the optimal image defogging network model and directly outputting the defogged image.
2. The unmanned aerial vehicle image defogging method based on the global and local double-branch network according to claim 1, wherein each fog-free image is used to synthesize a plurality of foggy images, and each foggy image is combined with the corresponding fog-free image into an image pair.
3. The unmanned aerial vehicle image defogging method based on the global and local double-branch network according to claim 1, wherein, in the local detail branch, convolution layers 1, 3, 5, 7 and 9 are PW convolutions with 3 convolution kernels; convolution layer 2 is a DW convolution with a 3×3 kernel size and 3 kernels; convolution layer 4 is a DW convolution with a 5×5 kernel size and 6 kernels; convolution layer 6 is a DW convolution with a 7×7 kernel size and 9 kernels; convolution layer 8 is a DW convolution with a 3×3 kernel size and 12 kernels;
each pixel attention module consists of one DW convolution and two PW convolutions, with a ReLU activation function after the first PW convolution layer and a Sigmoid activation function after the second PW convolution layer; ReLU activation functions are also set after convolution layers 1, 3, 5, 7 and 9; the feature map size remains unchanged throughout the local detail branch.
4. The unmanned aerial vehicle image defogging method based on the global and local double-branch network according to claim 1, wherein, in the global structure branch, the pooling downsampling layer is adaptive pooling and downsamples the feature map by a factor of 8; the convolutional encoding layer is a convolution with a 3×3 kernel size and 16 kernels;
each Transformer module contains a multi-head attention module and an MLP structure with skip connections, and the LayerNorm normalization method and the GELU activation function are used in the module.
5. The unmanned aerial vehicle image defogging method based on the global and local double-branch network according to claim 1, wherein the feature fusion module calculates the global and local feature weights respectively, performs weighted fusion, and convolves the fused features to obtain the defogging parameter K(x).
6. The unmanned aerial vehicle image defogging method based on the global and local double-branch network according to claim 1, wherein training the image defogging network model comprises the following procedure:
1) For the foggy images in the image pairs, performing data enhancement by vertical flipping, horizontal flipping and random cropping;
2) Inputting the data-enhanced foggy images into the image defogging network model to obtain the defogging parameter corresponding to each foggy image, and calculating the defogged image J(x);
the formula is: J(x) = K(x)·I(x) - K(x)
wherein I(x) denotes the currently input foggy image;
3) Calculating the loss between the output image J(x) and the fog-free image in the image pair through the loss function, feeding it back to the image defogging network model, and updating the model weights;
4) Repeating step 2) until the preset number of iterations is reached, and saving the last updated parameters as the optimal parameters of the image defogging model.
CN202310037485.0A 2023-01-09 2023-01-09 Unmanned aerial vehicle image defogging method based on global and local double-branch network Pending CN116542864A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310037485.0A CN116542864A (en) 2023-01-09 2023-01-09 Unmanned aerial vehicle image defogging method based on global and local double-branch network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310037485.0A CN116542864A (en) 2023-01-09 2023-01-09 Unmanned aerial vehicle image defogging method based on global and local double-branch network

Publications (1)

Publication Number Publication Date
CN116542864A true CN116542864A (en) 2023-08-04

Family

ID=87449428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310037485.0A Pending CN116542864A (en) 2023-01-09 2023-01-09 Unmanned aerial vehicle image defogging method based on global and local double-branch network

Country Status (1)

Country Link
CN (1) CN116542864A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977651A (en) * 2023-08-28 2023-10-31 河北师范大学 Image denoising method based on double-branch and multi-scale feature extraction
CN116977651B (en) * 2023-08-28 2024-02-23 河北师范大学 Image denoising method based on double-branch and multi-scale feature extraction
CN117576536A (en) * 2024-01-18 2024-02-20 佛山科学技术学院 Foggy image fusion model and method
CN117576536B (en) * 2024-01-18 2024-04-23 佛山科学技术学院 Foggy image fusion model and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination