CN115439363A - Video defogging device and method based on contrastive learning - Google Patents

Video defogging device and method based on contrastive learning

Info

Publication number
CN115439363A
Authority
CN
China
Prior art keywords
image
defogging
video
foggy
images
Prior art date
Legal status
Pending
Application number
CN202211078484.2A
Other languages
Chinese (zh)
Inventor
赵佳
杨子龙
王宇
杨颖
余正涛
郭晨靓
Current Assignee
Hefei University of Technology
Fuyang Normal University
Original Assignee
Hefei University of Technology
Fuyang Normal University
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology, Fuyang Normal University filed Critical Hefei University of Technology
Priority to CN202211078484.2A
Publication of CN115439363A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a video defogging device and method based on contrastive learning, which adopt an end-to-end image defogging approach comprising the following steps: acquiring experimental data; designing a defogging network model; analyzing the atmospheric scattering model and jointly learning its transmittance t(x) and atmospheric light A parameters; then computing the fog-free image; and using the foggy image and the clear image as the negative and positive samples respectively, so that the defogged image output by the network is pulled close to the clear image and pushed away from the foggy image in the representation space. At inference time, foggy data are acquired in foggy weather and the captured video is transmitted to a processing terminal through a WIFI module; the foggy video is split frame by frame into single foggy images, each image is passed through the trained defogging network model to obtain a fog-free image, and the fog-free images are finally stitched back into a fog-free video. The invention improves image contrast, enhances image detail, and shows good defogging performance.

Description

Video defogging device and method based on contrastive learning
Technical Field
The application belongs to the field of computer image processing, and particularly relates to a video defogging device and a defogging method based on contrastive learning.
Background
In hazy weather, the scattering of light by suspended particles in the atmosphere degrades the quality of captured images: images become distorted, detail information is lost, the field of view is blurred, visibility drops, and sharpness decreases. People therefore cannot obtain effective information from foggy images, which poses a threat to many computer vision tasks such as object detection, video surveillance, and autonomous driving. Video defogging algorithms build on single-image defogging: each frame of the video is defogged independently, and the processed frames are then re-assembled in order to reconstruct the video. Video defogging is of great importance for autonomous driving and intelligent surveillance, yet research on it is still limited, and both the defogging quality and the real-time performance of existing methods are relatively poor. Improving the visual quality and real-time performance of video defogging has therefore become a key research problem in this field and is of great significance for the development of computer vision tasks.
At present there is no highly effective technique for defogging long video sequences; research on video defogging is still at an early stage, and defogging a single image at a time remains the mainstream approach. Acquiring and analyzing weather and scene information is still the largest source of interference for video processing, and haze blurs video images, which hinders the extraction of image information. Because of the limitations of the current video defogging field, continuous videos cannot be defogged directly; processing is still performed frame by frame, and video defogging is essentially an extension of single-frame image defogging.
Image defogging algorithms are mainly divided into traditional algorithms and deep-learning-based algorithms, and traditional algorithms can be further divided into image-enhancement-based and image-restoration-based methods. Image enhancement mainly improves image contrast, reduces noise, and highlights useful information; it has some defogging effect, but it does not consider how foggy images are actually formed and can cause loss of image detail. Image-restoration-based methods, such as the dark channel prior and the color attenuation prior, use the atmospheric scattering model to explain how foggy images are generated: the foggy image is fed into the inverted atmospheric scattering model to obtain the defogged image. The most classical of these is the dark channel prior: by observing a large number of outdoor fog-free and foggy images, it was found that in the non-sky regions of most fog-free images some pixels have very low intensity values, with the lowest values close to 0. Although the dark-channel-prior-based algorithm is the most classical defogging algorithm, the images it produces tend to be dark, and because the defogging results differ between sky and non-sky regions, it easily causes color distortion in sky regions. Deep-learning-based defogging overcomes these shortcomings of traditional methods: features can be learned from training data, the defogging network is trained on a large number of foggy and fog-free image pairs to learn the mapping between them, and a foggy image fed into the trained model directly yields a fog-free output.
Most existing deep-learning-based defogging methods use only clear images as positive samples to guide the training of the defogging network and neglect the effective use of negative samples. By taking the foggy image and the corresponding clear image as the negative and positive samples respectively, contrastive learning can pull the output fog-free image closer to the positive sample and push it away from the negative sample. Training can therefore be supervised with both positive and negative sample information through contrastive learning, further improving the defogging performance of the network.
Disclosure of Invention
Based on the above analysis of the research status of single-image and video defogging, and in order to overcome the shortcomings of the prior art, the invention provides a video defogging device and a defogging method based on contrastive learning.
The purpose of the invention can be realized by the following technical scheme:
the invention provides a video defogging device based on contrastive learning, comprising a data acquisition module, a WIFI module, and a processing terminal. In foggy weather, the data acquisition module (a camera device) collects foggy data, and the captured video is transmitted to the processing terminal through the WIFI module for processing. The processing terminal comprises a video processing module, a preprocessing module, a grid network, a post-processing module, and a clear-image generation module. The video processing module selects each frame of the video data as an image to be processed. The preprocessing module consists of a convolutional layer and a residual dense block (RDB): the foggy image passes through the convolutional layer to obtain 16 feature maps as input, and the feature fusion inside the RDB allows more adaptively learnable features to be fused. The grid network is a multi-scale feature-fusion grid network combined with an attention mechanism. The post-processing module is structurally symmetric to the preprocessing module and prevents image distortion or artifacts. The foggy image passes through the preprocessing module, the grid network, and the post-processing module to obtain the parameter K(x); the clear-image generation module then outputs the fog-free image, and finally image fusion is performed to output the fog-free video.
The invention also provides a video defogging method based on contrastive learning, which comprises the following steps:
S1, acquiring and processing image data: using the paired foggy and fog-free images of the RESIDE dataset as the original training data, and cropping the original dataset to a preset image size to obtain the training set;
S2, analyzing the atmospheric scattering model: defogging with the atmospheric scattering model requires estimating two parameters, the transmittance t(x) and the atmospheric light A, and estimating them separately causes errors to accumulate or even be amplified; the model is therefore reformulated so that t(x) and A are unified into a single parameter K(x), reducing the reconstruction error between the output image and the real fog-free image;
S3, in the training stage, building a K(x) estimation module to estimate a more accurate intermediate transmission map;
S4, in the training stage, treating the reformulated atmospheric scattering model from step S2 as an image restoration problem and using the transmission map obtained in step S3 as input to obtain the defogged image;
S5, constructing a contrastive loss with a contrastive learning strategy and performing multiple rounds of training, taking the foggy image and the corresponding clear image as the negative and positive samples respectively, so that the image obtained in step S4 is pulled closer to the clear image and pushed away from the foggy image in the representation space;
S6, in the testing stage, capturing a foggy video with the camera and processing it frame by frame to obtain a set of single foggy images;
S7, in the testing stage, inputting each single image obtained in the previous step into the trained defogging network model to obtain a fog-free image;
S8, in the testing stage, fusing the images and outputting the defogged video.
Further, in step S1 a clear image is input into the atmospheric scattering model to generate the corresponding foggy image. The atmospheric scattering model is:
I(x)=J(x)t(x)+A(1-t(x)),
where I(x) is the hazy image, J(x) is the haze-free image, A is the global atmospheric light value, and t(x) is the transmittance, defined as:
t(x) = e^(-βd(x))
where β is the atmospheric scattering coefficient and d(x) is the scene depth.
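For illustration only, the following short Python sketch applies this model to synthesize a foggy image from a clear image and a depth map; the particular values of A and beta are arbitrary example choices, not values prescribed by the invention.

```python
import numpy as np

def synthesize_fog(J, depth, A=0.9, beta=1.0):
    """Apply the atmospheric scattering model I = J*t + A*(1 - t), t = exp(-beta*d).

    J     : clear image, float array in [0, 1], shape (H, W, 3)
    depth : scene depth map, float array, shape (H, W)
    A     : global atmospheric light value (example value, assumed)
    beta  : atmospheric scattering coefficient (example value, assumed)
    """
    t = np.exp(-beta * depth)          # transmittance t(x) = e^(-beta * d(x))
    t = t[..., None]                   # broadcast over the 3 color channels
    I = J * t + A * (1.0 - t)          # hazy image
    return np.clip(I, 0.0, 1.0)
```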
Step S2 specifically includes the following. The conventional image defogging algorithm based on the atmospheric scattering model has three main steps: estimate the transmission map t(x) from the hazy image I(x) with a complex depth model, estimate the atmospheric light A with empirical methods, and finally recover the defogged image with the atmospheric model. Estimating the atmospheric light and the transmittance separately, however, amplifies errors, so the two parameters are replaced here by a single parameter K(x), and formula (1) is transformed into:
J(x) = K(x)I(x) - K(x) + b
K(x) = ((I(x) - A)/t(x) + (A - b)) / (I(x) - 1)
where b is a bias with default value 1. Since t(x) and A are integrated into K(x) and K(x) depends on the input foggy image, the error between the generated image and the original image can be reduced.
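As a minimal illustration of this reformulation (not the invention's own implementation), the sketch below computes K(x) from a known t(x) and A and then recovers J(x) = K(x)I(x) - K(x) + b; in the method itself K(x) is predicted by the estimation module built in step S3.

```python
import numpy as np

def k_from_t_A(I, t, A, b=1.0, eps=1e-6):
    # K(x) = ((I(x) - A)/t(x) + (A - b)) / (I(x) - 1)
    den = I - 1.0
    den = np.where(np.abs(den) < eps, -eps, den)   # guard against division by zero
    return ((I - A) / np.maximum(t, eps) + (A - b)) / den

def restore_from_k(I, K, b=1.0):
    # Reformulated model: J(x) = K(x) * I(x) - K(x) + b
    J = K * I - K + b
    return np.clip(J, 0.0, 1.0)
```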
Further, the K(x) estimation module constructed in step S3 consists of a preprocessing module, a grid network, and a post-processing module.
The preprocessing module of the K(x) estimation module consists of a convolutional layer and a residual dense block (RDB). The foggy image passes through the convolutional layer to obtain 16 feature maps as input, and the feature fusion inside the RDB allows more adaptively learnable features to be fused. Each RDB consists of 5 convolutional layers: the first four increase the number of feature maps, the last fuses them, and the output of the RDB is then combined with its input.
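A minimal PyTorch sketch of such a residual dense block is given below; the growth rate and the use of a 1x1 fusion layer are illustrative assumptions consistent with the description, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class RDB(nn.Module):
    """Residual dense block: 4 densely connected conv layers grow the number of
    feature maps, a final layer fuses them, and the result is added to the input."""
    def __init__(self, channels=16, growth=16):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(4):                      # first four layers increase feature maps
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, 3, padding=1), nn.ReLU(inplace=True)))
            in_ch += growth                     # dense connections concatenate features
        self.fuse = nn.Conv2d(in_ch, channels, 1)   # last layer fuses the feature maps

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))   # combine output with RDB input
```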
The grid network is a multi-scale feature-fusion grid network combined with an attention mechanism. Each row consists of 5 RDB blocks; the up-sampling and down-sampling structures are the same, and each column obtains feature maps at different scales through up-sampling or down-sampling. After a down-sampling block the number of channels of the feature map increases and its spatial size is halved; up-sampling does the opposite. Each RDB block is feature-fused with the up-sampled or down-sampled result using a channel attention mechanism. A ReLU activation follows each convolutional layer, and the numbers of features at the three scales are set to 16, 32, and 64, respectively.
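The channel-attention fusion between an RDB output and the up-/down-sampled branch might look like the following sketch; the squeeze-and-excitation style design and the reduction ratio are assumptions made for illustration, not a structure spelled out in the patent.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuse two feature maps of equal shape using per-channel attention weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                               # global average pooling
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, rdb_feat, resampled_feat):
        x = rdb_feat + resampled_feat      # combine the RDB branch and the resampled branch
        w = self.attn(x)                   # one trainable weight per channel
        return x * w                       # treat channels (features) unequally
```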
the post-processing module is a post-processing module which is symmetrical to the preprocessing structure because an image directly obtained through a mesh network can be distorted or generate an artifact.
Further, step S4 specifically includes:
and inputting the foggy image into a K (x) estimation module, outputting a more accurate intermediate transmission diagram, inputting the intermediate transmission diagram into an improved atmosphere scattering model formula, and outputting a defogged image.
Further, step S5 specifically includes:
the comparison learning aims at distinguishing data, so that the distance between the training result and the positive sample is shortened, and the distance between the training result and the negative sample is enlarged. The positive sample and the negative sample are respectively composed of a clear image and a synthesized foggy image, a common feature space is selected from the pre-training model VGG-19, and the contrast loss can be expressed as:
Figure BDA0003832713690000071
where J denotes a fog-free image as a positive sample, I denotes a composite fog-free image as phi (I, w) is a fog-free image generated by a defogging model, and G j Representing the extraction of features from different layers of pre-training, D (x, y) is the L1 distance between the two, w j Are the weight coefficients.
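A hedged PyTorch sketch of such a contrastive regularization term is shown below; the specific VGG-19 layer indices and per-layer weights are illustrative assumptions (the patent only states that a common feature space is taken from pre-trained VGG-19 and that D is the L1 distance), and the torchvision weights API assumes a recent torchvision version.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ContrastiveLoss(nn.Module):
    """L_g = sum_j w_j * D(G_j(J), G_j(out)) / D(G_j(I), G_j(out)),
    where J is the clear positive, I the hazy negative, out the network output."""
    def __init__(self, layer_ids=(1, 6, 11, 20, 29),
                 weights=(1/32, 1/16, 1/8, 1/4, 1.0)):        # assumed layers/weights
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg, self.layer_ids, self.weights = vgg, layer_ids, weights
        self.l1 = nn.L1Loss()

    def _features(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, out, clear, hazy):
        f_out, f_pos, f_neg = map(self._features, (out, clear, hazy))
        loss = 0.0
        for w, o, p, n in zip(self.weights, f_out, f_pos, f_neg):
            # numerator pulls toward the positive, denominator pushes from the negative
            loss = loss + w * self.l1(o, p) / (self.l1(o, n) + 1e-7)
        return loss
```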
Compared with the prior art, the method has the following advantages:
1. The invention unifies the transmittance and the global atmospheric light value of the atmospheric scattering model into a single parameter K(x), which depends on the input foggy image and minimizes the error between the output image and the real scene; a channel attention mechanism in the K(x) estimation module generates a different weight for each channel, so that different features and pixel regions can be treated unequally.
2. The invention adopts a contrastive learning strategy to guide the training of the defogging network, using both positive and negative samples as supervision so that the defogged image is pulled closer to the clear image serving as the positive sample and pushed away from the foggy image serving as the negative sample, further improving the defogging effect.
3. The trained defogging network model obtains better objective and subjective evaluation results on public test sets; FIG. 5 shows the experimental comparison between the present method and the comparison methods on the defogging dataset.
Drawings
Fig. 1 is a flow chart of a video defogging method.
FIG. 2 shows a K (x) estimation block used in the method of the present invention.
FIG. 3 is a diagram of an RDB module used in the present invention.
Fig. 4 is a schematic structural diagram of a video defogging device based on comparative learning according to an embodiment of the present invention.
FIG. 5 is a quantitative comparison of the inventive and comparison methods on the defogging dataset, evaluated with the PSNR and SSIM metrics.
FIG. 6 is a qualitative comparison of the inventive and comparison methods on hazy images, showing from left to right: the hazy image, dark channel prior defogging, MSCNN, DehazeNet, CAP, AOD-Net, GCANet, MSBDN, the defogged image of the present method, and the corresponding clear image.
Fig. 7 is a comparison of several frames of a foggy video, the first row being the original video sequence and the second row the defogged video sequence.
Detailed Description
In order to explain the contents of the present invention more clearly, the present invention will be further explained with reference to the accompanying drawings.
The invention provides a video defogging method based on contrastive learning, which specifically comprises the following steps:
S1, acquiring and processing image data: using the paired foggy and fog-free images of the RESIDE dataset as the original training data, and cropping the original dataset to a preset image size to obtain the training set;
S2, analyzing the atmospheric scattering model: defogging with the atmospheric scattering model requires estimating two parameters, the transmittance t(x) and the atmospheric light A, and estimating them separately causes errors to accumulate or even be amplified; the model is therefore reformulated so that t(x) and A are unified into a single parameter K(x), reducing the reconstruction error between the output image and the real fog-free image;
S3, in the training stage, building a K(x) estimation module to estimate a more accurate intermediate transmission map;
S4, in the training stage, treating the reformulated atmospheric scattering model from step S2 as an image restoration problem and using the transmission map obtained in step S3 as input to obtain the defogged image;
S5, constructing a contrastive loss with a contrastive learning strategy, taking the foggy image and the corresponding clear image as the negative and positive samples respectively, so that the image obtained in step S4 is pulled closer to the clear image and pushed away from the foggy image in the representation space;
S6, in the testing stage, capturing a foggy video with the camera and processing it frame by frame to obtain a set of single foggy images;
S7, in the testing stage, inputting each single image obtained in step S6 into the trained defogging network model to obtain a fog-free image;
S8, in the testing stage, fusing the images and outputting the defogged video.
The image defogging method requires paired training with fog-free images and their corresponding foggy images; because real paired data are difficult to collect, the foggy images are synthesized through the atmospheric scattering model.
The mathematical model of foggy-day imaging is:
I(x)=J(x)t(x)+A(1-t(x)),
where I(x) is the hazy image, J(x) is the haze-free image, A is the global atmospheric light value, and t(x) is the transmittance.
In this embodiment the RESIDE dataset is used, including the large-scale Indoor Training Set (ITS) and the Outdoor Training Set (OTS). The indoor training set generates 13,990 foggy images from 1,399 clear images and their corresponding depth maps using the atmospheric scattering model, with atmospheric light value A ∈ [0.7, 1.0] and scattering coefficient β ∈ [0.6, 1.8]. The outdoor training set consists of synthesized outdoor foggy images and their corresponding clear images, with A ∈ [0.8, 1] and β ∈ [0.04, 0.2]. The test set is the Synthetic Objective Testing Set (SOTS), which comprises 500 indoor pairs and 500 outdoor pairs.
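For illustration, a minimal PyTorch dataset that loads hazy/clear pairs and applies an identical random crop to both might look as follows; the directory layout, the ITS-style file naming, and the 240x240 crop size are assumptions, since the patent only states that images are cropped to a preset size.

```python
import os, random
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class ResidePairs(Dataset):
    """Paired hazy/clear images cropped to a preset patch size (layout assumed)."""
    def __init__(self, hazy_dir, clear_dir, crop=240):
        self.hazy_dir, self.clear_dir, self.crop = hazy_dir, clear_dir, crop
        self.names = sorted(os.listdir(hazy_dir))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        hazy = Image.open(os.path.join(self.hazy_dir, name)).convert("RGB")
        # assumed ITS-style naming: hazy "1_1_0.90179.png" pairs with clear "1.png"
        clear_name = name.split("_")[0] + ".png"
        clear = Image.open(os.path.join(self.clear_dir, clear_name)).convert("RGB")
        # identical random crop for both images (images assumed >= crop size)
        i = random.randint(0, hazy.height - self.crop)
        j = random.randint(0, hazy.width - self.crop)
        box = (j, i, j + self.crop, i + self.crop)
        return self.to_tensor(hazy.crop(box)), self.to_tensor(clear.crop(box))
```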
In this embodiment the K(x) estimation module consists of a preprocessing module, a grid network, and a post-processing module. The preprocessing module comprises a convolutional layer and a residual dense block (RDB): the foggy image passes through the convolutional layer to obtain 16 feature maps as input, and the feature fusion inside the RDB allows more adaptively learnable features to be fused; each RDB consists of 5 convolutional layers, the first four increasing the number of feature maps and the last fusing them, after which the output is combined with the input of the RDB. The grid network is a multi-scale feature-fusion grid network combined with an attention mechanism: each row consists of 5 RDB blocks, the up-sampling and down-sampling structures are the same, and each column obtains feature maps at different scales through up-sampling or down-sampling; after a down-sampling block the number of channels of the feature map increases and its spatial size is halved, and up-sampling does the opposite; each RDB block is feature-fused with the up-sampled or down-sampled result using a channel attention mechanism; a ReLU activation follows each convolutional layer, and the numbers of features at the three scales are set to 16, 32, and 64, respectively. Because an image obtained directly from the grid network may be distorted or contain artifacts, a post-processing module that is structurally symmetric to the preprocessing module is used.
In this embodiment, the two parameters of the atmospheric scattering model, the transmittance and the atmospheric light value, are unified into K(x), and the transformed atmospheric scattering model is:
J(x) = K(x)I(x) - K(x) + b,
K(x) = ((I(x) - A)/t(x) + (A - b)) / (I(x) - 1)
where b is a bias with default value 1. Since t(x) and A are integrated into K(x) and K(x) depends on the input foggy image, the error between the generated image and the original image can be reduced.
The foggy image is input into the K(x) estimation module, which outputs a more accurate intermediate transmission map; this map is then fed into the reformulated atmospheric scattering model to output the defogged image.
In this embodiment the network is optimized with the Adam optimizer at an initial learning rate of 0.001. On the indoor dataset, training runs for 100 epochs and the learning rate is halved every 20 epochs; on the outdoor dataset, training runs for 10 epochs and the learning rate is halved every 2 epochs. The defogging network model is built under the PyTorch 1.9.0 framework on an NVIDIA GeForce RTX 2080Ti GPU.
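A minimal sketch of this training configuration with PyTorch's Adam optimizer and a step learning-rate schedule is shown below; model, train_loader, and total_loss are placeholders standing for the defogging network, the data loader, and the combined loss of this embodiment, not code provided by the patent.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)      # initial lr 0.001
# Indoor (ITS) schedule: 100 epochs, halve the learning rate every 20 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
# (For the outdoor set the embodiment uses 10 epochs with step_size=2 instead.)

for epoch in range(100):
    for hazy, clear in train_loader:
        optimizer.zero_grad()
        dehazed = model(hazy)
        loss = total_loss(dehazed, clear, hazy)   # smooth L1 + contrastive + perceptual
        loss.backward()
        optimizer.step()
    scheduler.step()
```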
In this embodiment the defogging effect is evaluated with two metrics, PSNR and SSIM. A larger PSNR indicates less image distortion; SSIM measures image similarity in terms of luminance, contrast, and structure, and a larger SSIM means the output defogged image retains more of the original information.
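As an illustrative sketch (not part of the patent), PSNR and SSIM can be computed with scikit-image as follows; the channel_axis argument assumes a recent scikit-image version, and the images are assumed to be uint8 RGB arrays of identical shape.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(dehazed, ground_truth):
    """dehazed, ground_truth: uint8 RGB arrays of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(ground_truth, dehazed, data_range=255)
    ssim = structural_similarity(ground_truth, dehazed, channel_axis=2, data_range=255)
    return psnr, ssim
```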
In this embodiment a foggy video is input and processed frame by frame to obtain a set of single foggy images; each image is fed into the trained defogging network model to obtain a fog-free image, and finally the images are fused to output the defogged video.
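A minimal OpenCV sketch of this frame-by-frame pipeline is given below; dehaze_frame is an assumed placeholder standing for the trained defogging network applied to a single BGR frame.

```python
import cv2

def defog_video(src_path, dst_path, dehaze_frame):
    """Split a foggy video into frames, defog each frame, and write the result."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break                          # end of the video
        out.write(dehaze_frame(frame))     # defog one frame and append it
    cap.release()
    out.release()
```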
In this embodiment the loss function of the defogging network consists of the smooth L1 loss, the perceptual loss, and the contrastive loss, with the specific formula:
L = L_S + L_g + λL_p,
where L_g denotes the contrastive loss, L_S the smooth L1 loss, and L_p the perceptual loss; λ is a parameter that adjusts the relative weight of the perceptual component and is set to 0.04 in this embodiment.
The contrastive loss is formulated as follows:
L_g = Σ_j w_j · D(G_j(J), G_j(φ(I, w))) / D(G_j(I), G_j(φ(I, w)))
where J denotes the fog-free image used as the positive sample, I denotes the synthetic foggy image used as the negative sample, φ(I, w) is the fog-free image generated by the defogging model, G_j denotes feature extraction from the j-th layer of the pre-trained model, D(x, y) is the L1 distance between x and y, and w_j are the weight coefficients.
The smooth L1 loss measures the difference between the defogged image and the ground-truth clear image; it converges quickly when far from the optimal solution, produces small gradients as the optimum is approached, and effectively prevents gradient explosion. The smooth L1 loss is:
L_S = (1/M) Σ_{i=1..3} Σ_{x=1..M} F_S(Ĵ_i(x) - J_i(x))
where M is the total number of pixels, Ĵ_i(x) and J_i(x) denote the intensity of the i-th color channel of pixel x in the defogged image and the ground-truth clear image, respectively, and F_S(x) is defined as:
F_S(x) = 0.5x², if |x| < 1; |x| - 0.5, otherwise.
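For illustration, the piecewise function above corresponds to PyTorch's built-in smooth L1 loss with beta = 1, as in this short sketch (averaged over all elements, which differs from the per-channel sum above only by a constant factor).

```python
import torch.nn.functional as F

def smooth_l1(dehazed, clear):
    # F_S applied element-wise and averaged over all pixels and channels
    return F.smooth_l1_loss(dehazed, clear, beta=1.0)
```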
the nature of the perceptual loss is that two pictures are matched at the depth characteristic level, and the perceptual loss formula is as follows:
Figure BDA0003832713690000131
in this example, a VGG16 pre-trained on ImageNet is used as the loss network, and features are extracted from the last layer of each of the first three stages, j represents the jth layer of the network,
Figure BDA0003832713690000132
and phi j (J) Feature maps representing the defogged image and the real clear image, respectively, C j H j W j The dimensions of the jth layer profile are shown.
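A hedged sketch of this perceptual loss is given below; the torchvision layer indices chosen for "the last layer of each of the first three stages" (relu1_2, relu2_2, relu3_3) are an assumption, as is the recent torchvision weights API.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class PerceptualLoss(nn.Module):
    """Sum over the first three VGG16 stages of the mean squared feature difference,
    i.e. (1 / (C_j * H_j * W_j)) * ||phi_j(dehazed) - phi_j(clear)||_2^2."""
    def __init__(self, layer_ids=(3, 8, 15)):     # relu1_2, relu2_2, relu3_3 (assumed)
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg, self.layer_ids = vgg, layer_ids

    def _features(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, dehazed, clear):
        loss = 0.0
        for f_d, f_c in zip(self._features(dehazed), self._features(clear)):
            loss = loss + torch.mean((f_d - f_c) ** 2)   # mean = squared L2 / (C*H*W)
        return loss
```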
As shown in FIG. 4, an embodiment of the present invention provides a video defogging device based on contrastive learning, comprising a data acquisition module, a WIFI module, and a processing terminal. In foggy weather a camera device collects foggy data, and the captured video is transmitted to the terminal through the WIFI module for processing. The processing terminal comprises a video processing module, a preprocessing module, a grid network, a post-processing module, and a clear-image generation module. The video processing module selects each frame of the video data as an image to be processed. The preprocessing module consists of a convolutional layer and a residual dense block (RDB): the foggy image passes through the convolutional layer to obtain 16 feature maps as input, and the feature fusion inside the RDB allows more adaptively learnable features to be fused. The grid network is a multi-scale feature-fusion grid network combined with an attention mechanism. The post-processing module is structurally symmetric to the preprocessing module and prevents image distortion or artifacts. The foggy image passes through the preprocessing module, the grid network, and the post-processing module to obtain the parameter K(x); the clear-image generation module then outputs the fog-free image, and finally image fusion is performed to output the fog-free video.
The video defogging device and the video defogging method based on contrastive learning provided by this embodiment belong to the same inventive concept; the specific implementation process is described in the method embodiment, and the beneficial effects are the same as those of the method embodiment.

Claims (7)

1. A video defogging device based on contrastive learning, characterized by comprising a data acquisition module, a WIFI module, and a processing terminal; in foggy weather, the data acquisition module collects foggy data; the processing terminal comprises a video processing module, a preprocessing module, a grid network, a post-processing module, and a clear-image generation module, wherein the video processing module selects each frame of the video data as an image to be processed; the preprocessing module consists of a convolutional layer and a residual dense block, the foggy image passing through the convolutional layer to obtain 16 feature maps as input, and the feature fusion inside the residual dense block fusing more adaptively learnable features; the grid network is a multi-scale feature-fusion grid network combined with an attention mechanism; the post-processing module is structurally symmetric to the preprocessing module and prevents image distortion or artifacts; the foggy image passes through the preprocessing module, the grid network, and the post-processing module to obtain a parameter K(x), the clear-image generation module then outputs the fog-free image, and finally image fusion is performed to output the fog-free video.
2. A video defogging method using the video defogging device based on contrastive learning as claimed in claim 1, characterized by comprising the following steps:
S1, acquiring and processing image data: using the paired foggy and fog-free images of the RESIDE dataset as the original training data, and cropping the original dataset to a preset image size for training;
S2, analyzing the atmospheric scattering model: defogging with the atmospheric scattering model requires estimating two parameters, the transmittance t(x) and the atmospheric light A, and estimating them separately causes errors to accumulate or even be amplified; the model is therefore reformulated so that t(x) and A are unified into a single parameter K(x), reducing the reconstruction error between the output image and the real fog-free image;
S3, in the training stage, building a K(x) estimation module to estimate a more accurate intermediate transmission map;
S4, in the training stage, treating the reformulated atmospheric scattering model from step S2 as an image restoration problem and using the transmission map obtained in step S3 as input to obtain the defogged image;
S5, constructing a contrastive loss with a contrastive learning strategy, taking the foggy image and the corresponding clear image as the negative and positive samples respectively, so that the image obtained in step S4 is pulled closer to the clear image and pushed away from the foggy image in the representation space;
S6, in the testing stage, inputting a foggy video and processing it frame by frame to obtain a set of single foggy images;
S7, in the testing stage, inputting each single image obtained in the previous step into the trained defogging network model to obtain a fog-free image;
S8, in the testing stage, fusing the images and outputting the defogged video.
3. The video defogging method based on contrastive learning as claimed in claim 2, wherein step S1 specifically comprises: acquiring indoor and outdoor fog-free images and generating the corresponding foggy images according to the atmospheric scattering model, the mathematical model of foggy-day imaging being:
I(x)=J(x)t(x)+A(1-t(x))
where I(x) is the hazy image, J(x) is the haze-free image, A is the global atmospheric light value, and t(x) is the transmittance, defined as:
t(x) = e^(-βd(x))
where β is the atmospheric scattering coefficient and d(x) is the scene depth.
4. The video defogging method based on contrastive learning as claimed in claim 2, wherein step S2 specifically comprises: the conventional image defogging algorithm based on the atmospheric scattering model has three main steps, namely estimating the transmission map t(x) from the hazy image I(x) with a complex depth model, estimating the atmospheric light with empirical methods, and finally obtaining the defogged image with the atmospheric model; since estimating the atmospheric light and the transmittance separately amplifies errors, the two parameters are replaced by K(x) and the atmospheric scattering model is transformed to obtain:
J(x) = K(x)I(x) - K(x) + b
wherein:
K(x) = ((I(x) - A)/t(x) + (A - b)) / (I(x) - 1)
b is a bias with default value 1; t(x) and A are integrated into K(x), and since K(x) depends on the input foggy image, the error between the generated image and the original image can be reduced.
5. The video defogging method according to claim 2, wherein step S3 specifically comprises: the K(x) estimation module mainly consists of a preprocessing module, a grid network, and a post-processing module; the preprocessing module of the K(x) estimation module consists of a convolutional layer and a residual dense block (RDB), the foggy image passing through the convolutional layer to obtain 16 feature maps as input, the feature fusion inside the RDB fusing more adaptively learnable features, each RDB consisting of 5 convolutional layers, the first four increasing the number of feature maps, the last fusing them, and the output of the RDB then being combined with its input; the grid network of the K(x) estimation module is a multi-scale feature-fusion grid network combined with an attention mechanism; each row consists of 5 RDB blocks, the up-sampling and down-sampling structures are the same, and each column obtains feature maps at different scales through up-sampling or down-sampling; after a down-sampling block the number of channels of the feature map increases and its spatial size is halved, and up-sampling does the opposite; each RDB block is feature-fused with the up-sampled or down-sampled result using a channel attention mechanism; a ReLU activation follows each convolutional layer; the numbers of features at the three scales are set to 16, 32, and 64, respectively; the post-processing module of the K(x) estimation module, structurally symmetric to the preprocessing module, is introduced because the image obtained directly from the grid network may be distorted or contain artifacts.
6. The video defogging method according to claim 2, wherein, since feature maps at different scales may not be equally important, a channel attention mechanism is integrated into the grid network to generate trainable weights for feature fusion; a different weight value is generated for each channel, and different features and pixel regions are treated unequally based on these weights.
7. The video defogging method based on contrastive learning as claimed in claim 2, wherein contrastive learning aims to discriminate between data, shortening the distance between the training result and the positive sample and enlarging its distance from the negative sample, so that the model generalizes better and the quality of the generated fog-free image is higher; the positive and negative samples are the clear image and the synthesized foggy image, respectively, a common feature space is taken from the pre-trained VGG-19 model, and the contrastive loss can be expressed as:
L_g = Σ_j w_j · D(G_j(J), G_j(φ(I, w))) / D(G_j(I), G_j(φ(I, w)))
where J denotes the fog-free image used as the positive sample, I denotes the synthetic foggy image used as the negative sample, φ(I, w) is the fog-free image generated by the defogging model, G_j denotes feature extraction from the j-th layer of the pre-trained model, D(x, y) is the L1 distance between x and y, and w_j are the weight coefficients.
CN202211078484.2A 2022-09-05 2022-09-05 Video defogging device and method based on comparison learning Pending CN115439363A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211078484.2A CN115439363A (en) 2022-09-05 2022-09-05 Video defogging device and method based on comparison learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211078484.2A CN115439363A (en) 2022-09-05 2022-09-05 Video defogging device and method based on comparison learning

Publications (1)

Publication Number Publication Date
CN115439363A true CN115439363A (en) 2022-12-06

Family

ID=84247388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211078484.2A Pending CN115439363A (en) 2022-09-05 2022-09-05 Video defogging device and method based on comparison learning

Country Status (1)

Country Link
CN (1) CN115439363A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343144A (en) * 2023-05-24 2023-06-27 武汉纺织大学 Real-time target detection method integrating visual perception and self-adaptive defogging
CN116343144B (en) * 2023-05-24 2023-08-11 武汉纺织大学 Real-time target detection method integrating visual perception and self-adaptive defogging

Similar Documents

Publication Publication Date Title
Santra et al. Learning a patch quality comparator for single image dehazing
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN112184577B (en) Single image defogging method based on multiscale self-attention generation countermeasure network
CN110070517B (en) Blurred image synthesis method based on degradation imaging mechanism and generation countermeasure mechanism
CN109410135B (en) Anti-learning image defogging and fogging method
CN111275638B (en) Face repairing method for generating confrontation network based on multichannel attention selection
CN114782298B (en) Infrared and visible light image fusion method with regional attention
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN116757986A (en) Infrared and visible light image fusion method and device
CN117197624A (en) Infrared-visible light image fusion method based on attention mechanism
CN115439363A (en) Video defogging device and method based on comparison learning
CN115861094A (en) Lightweight GAN underwater image enhancement model fused with attention mechanism
CN117274759A (en) Infrared and visible light image fusion system based on distillation-fusion-semantic joint driving
CN116468645A (en) Antagonistic hyperspectral multispectral remote sensing fusion method
CN116757988A (en) Infrared and visible light image fusion method based on semantic enrichment and segmentation tasks
Wang et al. Multiscale supervision-guided context aggregation network for single image dehazing
Singh et al. Visibility enhancement and dehazing: Research contribution challenges and direction
CN113628143A (en) Weighted fusion image defogging method and device based on multi-scale convolution
CN116664448B (en) Medium-high visibility calculation method and system based on image defogging
CN116863320B (en) Underwater image enhancement method and system based on physical model
CN115631428B (en) Unsupervised image fusion method and system based on structural texture decomposition
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN116542865A (en) Multi-scale real-time defogging method and device based on structural re-parameterization
CN116385293A (en) Foggy-day self-adaptive target detection method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination