CN116703752A - Image defogging method and device of near-infrared fused Transformer structure - Google Patents

Image defogging method and device of near-infrared fused Transformer structure

Info

Publication number
CN116703752A
CN116703752A (application number CN202310524524.XA)
Authority
CN
China
Prior art keywords
image
visible light
feature
near infrared
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310524524.XA
Other languages
Chinese (zh)
Inventor
张佳
艾欣
白永强
陈杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202310524524.XA
Publication of CN116703752A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image defogging method and device of a near-infrared fused Transformer structure, which uses additional near-infrared features as supplementary information and adopts a Transformer deep neural network structure. Near-infrared and visible light images of the same scene are captured in correspondence to form a data set. The near-infrared image and the visible light image are input into a pre-trained Transformer-structure image defogging model. The model encodes and characterizes the visible light and near-infrared images to obtain their respective feature vectors, which are fused into an interaction feature vector; the two feature vectors are then decoded into visible light and near-infrared image sequences, which are processed and combined with the interaction feature vector, after which the combined result undergoes channel reorganization and convolution to output the defogged image result.

Description

Image defogging method and device of near-infrared fused Transformer structure
Technical Field
The invention relates to the technical field of digital image processing, and in particular to an image defogging method and device of a near-infrared fused Transformer structure, which is suitable as pre-processing for computer vision applications and can be widely applied in fields such as traffic target detection and road monitoring.
Background
Visibility is low in hazy weather, and captured images and videos often suffer degradation. To improve image quality and sharpness, defogging must be applied to the image.
Single-image defogging aims to recover a fog-free image from a degraded one. It is a classical image processing problem and has been an important research topic in the vision field over the last decade. Defogging algorithms fall into two categories: traditional methods and deep-learning-based methods. Traditional methods are mainly of two types: image enhancement and image restoration. Image enhancement highlights image details and improves contrast; it has a wide application range and can effectively improve the contrast of foggy images. Restoration-based defogging studies the physical process by which images degrade in fog: a large number of foggy images are observed and summarized, and the degradation process is inverted to obtain an estimate of the clear image.
Traditional algorithms restore foggy images well in certain scenes, but their application range is narrow, their recovery of image details is not fine enough, and their computation is complex, so they cannot meet the real-time processing now required in many settings. Deep-learning methods are of two kinds: one combines physical models or hand-crafted features with neural networks, and the other is end-to-end and involves no physical model. These techniques require pairs of foggy images and their corresponding true clear images to train the model. Conventional convolutional neural network algorithms still generalize poorly and cannot correct noise and distortion.
With the rapid development of attention mechanisms, the self-attention Transformer structure has achieved strong results in vision tasks such as image processing and target detection. How to use a self-attention Transformer structure for image defogging, so as to obtain defogged images with high resolution, high fidelity, and rich texture details, remains an unsolved problem in the prior art.
Disclosure of Invention
In view of this, the invention provides an image defogging method and device of a near-infrared fused Transformer structure, which uses additional near-infrared features as supplementary information and adopts a Transformer deep neural network structure, giving stronger feature extraction capability than conventional convolutional neural networks and producing defogged images with high resolution, high fidelity, and prominent texture details.
The technical scheme for realizing the invention is as follows. The image defogging method of the near-infrared fused Transformer structure comprises the following steps:
S1: capture near-infrared images and visible light images of the same scene in correspondence to form a data set.
S2: input the near-infrared image and the visible light image into a pre-trained Transformer-structure image defogging model. The model first encodes and characterizes the visible light image and the near-infrared image to obtain a feature vector of the visible light image and a feature vector of the near-infrared image respectively, and fuses the two to obtain an interaction feature vector; it then decodes the two feature vectors respectively to obtain a visible light image sequence and a near-infrared image sequence, combines the processed sequences with the interaction feature vector, performs channel reorganization and convolution processing on the combined result, and outputs the defogged image result.
Further, the Transformer-structure image defogging model comprises a visible light image feature encoding module, a near-infrared image feature encoding module, a feature interaction module, a visible light feature decoding module, a near-infrared feature decoding module, and a feature fusion module.
The flow executed by each module is as follows:
The visible light image feature encoding module characterizes the visible light image and obtains three groups of visible light image feature vectors according to the RGB channels; it first comprises a downsampling convolution layer and a global average pooling layer, followed by three Transformer coding units in series.
The near-infrared feature encoding module characterizes the near-infrared image to obtain one group of near-infrared image feature vectors; it comprises a downsampling convolution layer and a global average pooling layer, followed by two Transformer coding units in series.
The feature interaction module adds the feature vectors of the visible light image and the near-infrared image as feature maps and passes the sum through a deconvolution layer and a ReLU activation layer to obtain the interaction feature vector.
The visible light feature decoding module converts the three groups of visible light image feature vectors into a visible light image sequence; it comprises three Transformer coding units and an upsampling convolution layer.
The near-infrared feature decoding module converts the group of near-infrared image feature vectors into a near-infrared image sequence; it comprises two Transformer coding units and an upsampling convolution layer.
The feature fusion module applies global average pooling to the visible light sequence followed by a 3×3 convolution, applies two 1×1 convolutions to the near-infrared image sequence, combines the two results with the interaction feature vector by feature-map addition, performs channel reorganization on the combined result, and finally applies a 5×5 depthwise separable convolution and a resize convolution to obtain the defogged clear color image result.
Further, after the defogged image result is output, it is calibrated with an image discriminator; the image discriminator adopts two block structures comprising a convolution layer, a normalization layer, a ReLU activation layer, and a pooling layer, examines deep information of the image by mixing channel attention and spatial attention, and judges the real/fake probabilities of the image through a Softmax function.
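To make the discriminator description concrete, the following is a minimal PyTorch sketch: two conv/norm/ReLU/pool blocks, a CBAM-style mix of channel and spatial attention, and a softmax head. All channel widths, kernel sizes, and the reduction ratio are illustrative assumptions, not values specified in the patent.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Channel attention followed by spatial attention (a CBAM-like mix)."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)                               # channel re-weighting
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * self.spatial(torch.cat([avg, mx], dim=1))  # spatial re-weighting

def block(in_ch, out_ch):
    """One of the two block structures: conv -> normalization -> ReLU -> pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(block(3, 32), block(32, 64),
                                      ChannelSpatialAttention(64))
        self.head = nn.Linear(64, 2)   # two logits: real vs. fake

    def forward(self, img):
        f = self.features(img).mean(dim=(2, 3))    # global average over space
        return torch.softmax(self.head(f), dim=1)  # real/fake probabilities

# usage: probs = Discriminator()(torch.rand(1, 3, 256, 256))
```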
Further, the pre-trained Transformer-structure image defogging model is pre-trained as follows:
the training of the Transformer-structure image defogging model uses three loss constraints, the first loss being a Charbonnier loss;
the second loss is a perceptual loss, computed from the feature map output by the 14th layer of a pre-trained VGG16 network;
the third loss is the real/fake probability loss obtained from the image discriminator.
The invention also provides an image defogging device of a near-infrared fused Transformer structure, comprising a data acquisition module and a Transformer-structure image defogging model module.
The data acquisition module captures and acquires visible light images and near-infrared images of the same scene in correspondence.
The Transformer-structure image defogging model module encodes and characterizes the visible light image and the near-infrared image to obtain a feature vector of each, and fuses the two to obtain an interaction feature vector; it then decodes the feature vector of the visible light image and the feature vector of the near-infrared image respectively to obtain a visible light image sequence and a near-infrared image sequence, combines the processed sequences with the interaction feature vector, performs channel reorganization and convolution processing on the combined result, and outputs the defogged image result.
Preferably, the Transformer-structure image defogging model comprises a visible light image feature encoding module, a near-infrared image feature encoding module, a feature interaction module, a visible light feature decoding module, a near-infrared feature decoding module, and a feature fusion module.
The visible light image feature encoding module characterizes the visible light image and obtains three groups of visible light feature vectors according to the RGB channels; it comprises a downsampling convolution layer and a global average pooling layer followed by three Transformer coding units in series.
The near-infrared feature encoding module characterizes the near-infrared image to obtain one group of near-infrared feature vectors; it comprises a downsampling convolution layer and a global average pooling layer, followed by two Transformer coding units in series.
The feature interaction module adds the feature vectors of the visible light image and the near-infrared image as feature maps and passes the sum through a deconvolution layer and a ReLU activation layer to obtain the interaction feature vector.
The visible light feature decoding module converts the three groups of visible light image feature vectors into a visible light image sequence; it comprises three Transformer coding units and an upsampling convolution layer.
The near-infrared feature decoding module converts the group of near-infrared image feature vectors into a near-infrared image sequence; it comprises two Transformer coding units and an upsampling convolution layer.
The feature fusion module applies global average pooling to the visible light sequence followed by a 3×3 convolution, applies two 1×1 convolutions to the near-infrared image sequence, combines the two results with the interaction feature vector by feature-map addition, performs channel reorganization on the combined result, and finally applies a 5×5 depthwise separable convolution and a resize convolution to obtain the defogged clear color image result.
The beneficial effects are that:
1. The image defogging method of the near-infrared fused Transformer structure provided by the invention defogs with a Transformer-structure image defogging model; the Transformer structure overcomes the small local receptive field of conventional convolution and its loss of detail in deep layers. The near-infrared image can supplement the texture information of the visible light image. Compared with traditional single-image defogging, the method jointly solves near-infrared/visible light image fusion and visible light image defogging.
2. Compared with traditional losses, the training strategy provided by the invention for the Transformer-structure image defogging model keeps the gradient from vanishing and makes the generated images more vivid; the proposed method achieves better evaluation results on the data set.
Drawings
Fig. 1 is a schematic flow chart of the image defogging method of a near-infrared fused Transformer structure according to an embodiment of the present invention;
Fig. 2 is a block diagram of the Transformer-structure image defogging model provided by an embodiment of the present invention;
Fig. 3 is a block diagram of the image discriminator according to an embodiment of the invention;
Fig. 4 is a block diagram of the coding units in the Transformer-structure image defogging model according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
Example 1:
This embodiment provides an image defogging method of a near-infrared fused Transformer structure that uses a near-infrared image to supplement the texture information of a visible light image. Visible light images offer rich color and strong realism, while near-infrared images have lower resolution but stronger penetration and can be processed more quickly by machine vision. Near-infrared imaging also has strong anti-interference capability and provides rich target and background information. A result fused from near-infrared and visible light images therefore retains the rich detail of the visible light image while improving the contrast between target and background, benefiting both human vision and subsequent image processing. As shown in Fig. 1, the method is realized by the following steps:
S1: capture near-infrared images and visible light images of the same scene in correspondence to form a data set.
The infrared image and the visible light image are correspondingly input into the constructed Transformer-structure image defogging model.
S2: obtain the defogged clear color picture result from the paired near-infrared picture data and the Transformer-structure image defogging model. In the embodiment of the invention, the Transformer-structure image defogging model comprises a visible light image feature encoding module, a near-infrared image feature encoding module, a feature interaction module, a visible light feature decoding module, a near-infrared feature decoding module, and a feature fusion module; its composition is shown in Fig. 2.
Step S2 may include the following steps S21-S26.
S21: characterize the visible light image through the visible light image feature encoding module and obtain three groups of visible light feature vectors according to the three different channels; the encoding module comprises a downsampling convolution layer and a global average pooling layer followed by three Transformer coding units in series. The structure of the Transformer coding unit is shown in Fig. 4.
The visible light image is characterized and the three groups of feature vectors are obtained, one per channel, as follows:
S211.1: separate the three color channels of the visible light picture using a Python image processing library to obtain the R, G, and B matrices;
S211.2: send the R, G, and B matrices to the Transformer coding units respectively. Taking one matrix as an example, a matrix X is first obtained by reflection padding; then the query vector Q, key vector K, and value vector V of the matrix are computed through two linear connection layers. According to a fixed-parameter partition, Q and K are divided into groups Q_i and K_i; each Q_i is multiplied by the transpose of K_i, normalized, and multiplied by V_i, giving the output Z_i:

Z_i = softmax(Q_i K_i^T / √d_k) V_i

where d_k denotes the dimension of Q_i and K_i, and softmax is the normalized exponential function.
Further, the outputs Z_i are spliced together to form the matrix Z, the multi-head attention output; together with the matrix X' obtained after the convolution layer, it passes through a residual structure formed by a linear connection layer and a multi-layer perceptron, finally outputting Y, the result of each channel matrix after the Transformer coding unit.
The three R, G, B matrices yield the output vectors Y_R, Y_G, and Y_B: three image feature vectors fused with the self-attention mechanism, representing the information worth transferring on the different channels. They are sent to the subsequent decoding module.
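The following is a minimal PyTorch sketch of one such coding unit, implementing the multi-head attention formula above followed by the residual MLP. The head count, embedding dimension, and the pre-norm arrangement are illustrative assumptions; the patent does not specify them.

```python
import math
import torch
import torch.nn as nn

class CodingUnit(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.to_qk = nn.Linear(dim, 2 * dim)   # one linear layer yields Q and K
        self.to_v = nn.Linear(dim, dim)        # a second linear layer yields V
        self.proj = nn.Linear(dim, dim)        # projection after splicing the Z_i
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                      # x: (B, N, dim) token matrix X
        B, N, D = x.shape
        h = self.norm1(x)
        q, k = self.to_qk(h).chunk(2, dim=-1)
        v = self.to_v(h)
        # split into heads: (B, heads, N, dk), i.e. the grouped Q_i, K_i, V_i
        q, k, v = (t.view(B, N, self.heads, self.dk).transpose(1, 2)
                   for t in (q, k, v))
        # Z_i = softmax(Q_i K_i^T / sqrt(d_k)) V_i
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.dk), dim=-1)
        z = (attn @ v).transpose(1, 2).reshape(B, N, D)   # splice the Z_i into Z
        x = x + self.proj(z)                   # residual connection around attention
        return x + self.mlp(self.norm2(x))     # residual MLP, giving the output Y

# usage: y = CodingUnit()(torch.rand(2, 256, 64))
```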
S22: characterize the near-infrared image through the near-infrared feature encoding module to obtain one group of near-infrared feature vectors; this encoding module first comprises a downsampling convolution layer and a global average pooling layer, followed by two Transformer coding units in series. The coding units are the same as in S211.2, and the output Y_NIR is the result of the near-infrared channel feature matrix after the Transformer coding units.
S23: through the feature interaction module, in order to deeply fuse the features of the two images at the feature level and exploit their complementarity, the encoded visible light output vectors Y_R, Y_G, Y_B and the encoded near-infrared output vector Y_NIR are added as feature maps and passed through a deconvolution layer and a ReLU activation layer to obtain the interaction feature vector:

o = ReLU(Deconv(Y_R + Y_G + Y_B + Y_NIR))

where Deconv denotes a deconvolution operation and ReLU denotes numerical activation. The resulting vector o represents the fused visible light and near-infrared features of the encoding stage; it is sent to the subsequent feature fusion module to supplement the feature relations obtained at each stage.
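As a concrete illustration of this step, the following is a minimal PyTorch sketch of the feature interaction: element-wise addition of the four encoded feature maps, then a deconvolution and a ReLU. The deconvolution kernel, stride, and channel width are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class FeatureInteraction(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(ch, ch, kernel_size=4, stride=2, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, y_r, y_g, y_b, y_nir):
        # o = ReLU(Deconv(Y_R + Y_G + Y_B + Y_NIR))
        return self.act(self.deconv(y_r + y_g + y_b + y_nir))

# usage: o = FeatureInteraction()(*(torch.rand(1, 64, 16, 16) for _ in range(4)))
```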
S24: through the visible light feature decoding module, convert the three groups of encoded visible light output vectors Y_R, Y_G, Y_B into a visible light image sequence M_RGB. The decoding module comprises three Transformer coding units and an upsampling convolution layer; the Transformer units in the decoder are designed identically to those in the encoder, using the same number of attention heads and the same encoder settings. The decoder output matches the dimensions of the original input image. During training, the network weights are updated by computing the loss between the decoder output and the encoder input using the Charbonnier function as the loss function. After model convergence, the output of the visible light feature decoder is stored as M_RGB.
S25: through the near-infrared feature decoding module, convert the group of near-infrared image feature vectors into a near-infrared image sequence M_NIR. The decoding module comprises two Transformer coding units and an upsampling convolution layer, with coding units the same as in S24. The output of the near-infrared feature decoding is stored as M_NIR.
S26: through the feature fusion module, apply global average pooling to the visible light sequence M_RGB followed by a 3×3 convolution; apply two 1×1 convolutions to the near-infrared image sequence M_NIR; combine the two results with the interaction feature vector o by feature-map addition; perform channel reorganization on the combined result; and finally apply a 5×5 depthwise separable convolution and a resize convolution to obtain the defogged clear color image result:

Middle_1 = Conv_3×3(GAP(M_RGB)) + Conv_1×1(Conv_1×1(M_NIR)) + o
Middle_2 = ChannelShuffle(Middle_1)
O = ResizeConv(DepthConv_5×5(Middle_2))

where GAP denotes global average pooling; ChannelShuffle denotes the channel reorganization strategy, which comprehensively captures the information of the vectors; and ResizeConv denotes the resize convolution, which eliminates the checkerboard effect in generated images. The final output is the image O.
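The following is a minimal PyTorch sketch of the feature fusion module following the three formulas above. The channel counts, shuffle group number, and resize factor are illustrative assumptions; broadcasting adds the pooled RGB descriptor to the NIR feature map.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, ch=64, groups=4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                        # GAP
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1)              # Conv_3x3
        self.conv1a = nn.Conv2d(ch, ch, 1)                        # first Conv_1x1
        self.conv1b = nn.Conv2d(ch, ch, 1)                        # second Conv_1x1
        self.shuffle = nn.ChannelShuffle(groups)                  # channel reorganization
        self.depth5 = nn.Conv2d(ch, ch, 5, padding=2, groups=ch)  # depthwise 5x5
        self.point = nn.Conv2d(ch, ch, 1)                         # pointwise half of the separable conv
        self.resize = nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                                    nn.Conv2d(ch, 3, 3, padding=1))  # resize convolution

    def forward(self, m_rgb, m_nir, o):
        # Middle_1 = Conv_3x3(GAP(M_RGB)) + Conv_1x1(Conv_1x1(M_NIR)) + o
        mid1 = self.conv3(self.gap(m_rgb)) + self.conv1b(self.conv1a(m_nir)) + o
        mid2 = self.shuffle(mid1)                                 # Middle_2
        return self.resize(self.point(self.depth5(mid2)))         # defogged image O

# usage: out = FeatureFusion()(torch.rand(1, 64, 32, 32),
#                              torch.rand(1, 64, 32, 32),
#                              torch.rand(1, 64, 32, 32))
```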
S3: the generated defogged clear color image result O is calibrated by an image discriminator. The image discriminator adopts two block structures comprising a convolution layer, a normalization layer, a ReLU activation layer, and a pooling layer; deep information of the image is examined by mixing channel attention and spatial attention, and finally the real/fake probabilities of the image are judged through a softmax function. The composition of the image discriminator is shown in Fig. 3.
The embodiment of the invention further includes training the Transformer-structure image defogging model with three loss constraints. The first loss is the Charbonnier loss:

L_c = √((y − y′)² + ε²)

where y denotes the clear image estimated by the network, y′ denotes the true clear image, and ε denotes a constant that keeps L_c non-zero and prevents unexpected termination of training;
the second loss is the perceptual loss, computed from the feature map output by the 14th layer of a pre-trained VGG16 network:

L_per = ‖Φ(y) − Φ(y′)‖

where Φ(z) denotes the layer-14 feature map of the VGG16 network;
the third loss is the real/fake probability loss obtained from the image discriminator:

L_adv = −(1/N) Σ_{n=1}^{N} log D(G(x_n))

where N denotes the number of images, D denotes the image discriminator, G denotes the Transformer-structure image defogging model, and x_n denotes the n-th foggy input.
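The following is a minimal sketch of the three losses in PyTorch, assuming torchvision's pretrained VGG16 for the perceptual term. The epsilon value, the slice index used to realize "layer 14", and the assumption that column 0 of the discriminator softmax is the "real" probability are all illustrative, not taken from the patent.

```python
import torch
from torchvision.models import vgg16, VGG16_Weights

def charbonnier(y, y_true, eps=1e-3):
    # L_c = sqrt((y - y')^2 + eps^2), averaged over pixels; eps keeps L_c non-zero
    return torch.sqrt((y - y_true) ** 2 + eps ** 2).mean()

# frozen VGG16 trunk up to its 14th layer (the slice index is one plausible reading)
vgg_phi = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:15].eval()
for p in vgg_phi.parameters():
    p.requires_grad_(False)

def perceptual(y, y_true):
    # L_per = || phi(y) - phi(y') ||
    return torch.norm(vgg_phi(y) - vgg_phi(y_true))

def adversarial(d_probs):
    # L_adv = -(1/N) sum log D(G(x)); d_probs are the discriminator's softmax outputs
    return -torch.log(d_probs[:, 0] + 1e-8).mean()
```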
In the embodiment of the invention, the training samples are 280 groups of collected reference RGB images, foggy RGB images, and near-infrared images. All three image types were shot with an A7C camera: the reference and foggy RGB images through a B+W 486 (RGB) filter, and the near-infrared images through a 093 (near-infrared) filter. All three were shot from fixed positions: the foggy RGB and near-infrared images at a fixed location in foggy weather, and the reference RGB image at the same fixed location in clear weather; the three captures were later registered by similarity transformation. For each sample group, the foggy RGB image and the near-infrared image serve as input, and the reference RGB image serves as the label. Because the data volume is small, data enhancement techniques such as affine transformation and horizontal flipping are used to expand the data set to 400 groups.
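One point worth making concrete is that paired augmentation must apply the same random transform to all three images of a group. The following is a minimal sketch using torchvision; the rotation range and flip probability are illustrative assumptions.

```python
import random
import torchvision.transforms.functional as TF

def augment_triple(fog_rgb, nir, ref_rgb):
    """Apply one randomly sampled flip/affine identically to all three tensors."""
    if random.random() < 0.5:                              # horizontal flip
        fog_rgb, nir, ref_rgb = (TF.hflip(t) for t in (fog_rgb, nir, ref_rgb))
    angle = random.uniform(-10.0, 10.0)                    # small affine rotation
    fog_rgb, nir, ref_rgb = (TF.affine(t, angle=angle, translate=[0, 0],
                                       scale=1.0, shear=0.0)
                             for t in (fog_rgb, nir, ref_rgb))
    return fog_rgb, nir, ref_rgb
```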
In the training experiment, the data set is divided into 320 groups as the training set and the remaining 80 groups as the test set. A PyTorch 1.10 framework is used. The optimizer is Adam with a learning rate of 0.0001, a batch size of 4, and 400 iterations. The foggy RGB image and near-infrared image are input into the Transformer structure, the output defogged RGB image is computed and compared with the corresponding reference RGB image of the same group, and the Charbonnier loss, perceptual loss, and real/fake probability loss are calculated. The loss function is differentiated with respect to each parameter to obtain gradients, the network parameters in the Transformer structure are updated by gradient descent, and the foggy and near-infrared images are input again for the next iteration, until the loss function converges and the iteration rounds are exhausted, completing training.
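The following is a minimal sketch of one such training loop under the settings above; `model`, `disc`, and `loader` stand for the hypothetical defogging model, discriminator, and paired data loader, and the loss helpers are those sketched after the loss formulas. The symmetric discriminator update is omitted for brevity.

```python
import torch

def train(model, disc, loader, epochs=400, lr=1e-4):
    """Generator-side training sketch: Adam, lr 1e-4, 400 iterations."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for fog_rgb, nir, ref_rgb in loader:   # batches of size 4 in the experiment
            out = model(fog_rgb, nir)                    # defogged estimate
            loss = (charbonnier(out, ref_rgb)            # pixel-level term
                    + perceptual(out, ref_rgb)           # VGG16 feature term
                    + adversarial(disc(out)))            # real/fake probability term
            opt.zero_grad()
            loss.backward()                              # backpropagate gradients
            opt.step()                                   # gradient descent update
```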
Compared with traditional losses, the training strategy provided by the invention keeps the gradient from vanishing and makes the generated images more vivid; the proposed method achieves better evaluation results on the data set.
Example 2:
The image defogging device of the near-infrared fused Transformer structure provided by this embodiment comprises a data acquisition module and a Transformer-structure image defogging model module.
The data acquisition module captures and acquires visible light images and near-infrared images of the same scene in correspondence.
The Transformer-structure image defogging model module encodes and characterizes the visible light image and the near-infrared image to obtain a feature vector of each, and fuses the two to obtain an interaction feature vector; it then decodes the feature vector of the visible light image and the feature vector of the near-infrared image respectively to obtain a visible light image sequence and a near-infrared image sequence, combines the processed sequences with the interaction feature vector, performs channel reorganization and convolution processing on the combined result, and outputs the defogged image result.
In the embodiment of the invention, a data processing module and a data division module may be added between these two modules.
The data processing module performs data augmentation on the near-infrared image data set to expand its size, including data enhancement of the near-infrared data set and tone-level adjustment of low-contrast regions.
The data division module performs a division operation on the public data set and the near-infrared data set to obtain a training set, which can be used to train the Transformer-structure image defogging model.
In the embodiment of the invention, the Transformer-structure image defogging model comprises a visible light image feature encoding module, a near-infrared image feature encoding module, a feature interaction module, a visible light feature decoding module, a near-infrared feature decoding module, and a feature fusion module; the composition of the Transformer-structure image defogging model is shown in Fig. 2.
Further, obtaining the defogged clear color image result from the paired near-infrared image data and the Transformer-structure image defogging model includes:
characterizing the visible light image through the visible light image feature encoding module and obtaining three groups of visible light feature vectors according to the RGB channels, the encoding module comprising a downsampling convolution layer and a global average pooling layer followed by three Transformer coding units in series;
characterizing the near-infrared image through the near-infrared feature encoding module to obtain one group of near-infrared feature vectors, the encoding module first comprising a downsampling convolution layer and a global average pooling layer, followed by two Transformer coding units in series;
obtaining the interaction feature vector through the feature interaction module by adding the feature vectors of the visible light image and the near-infrared image as feature maps and passing the sum through a deconvolution layer and a ReLU activation layer;
converting the three groups of visible light image feature vectors into a visible light image sequence through the visible light feature decoding module, the decoding module comprising three Transformer coding units and an upsampling convolution layer;
converting the group of near-infrared image feature vectors into a near-infrared image sequence through the near-infrared feature decoding module, the decoding module comprising two Transformer coding units and an upsampling convolution layer;
and, through the feature fusion module, applying global average pooling to the visible light sequence followed by a 3×3 convolution, applying two 1×1 convolutions to the near-infrared image sequence, combining the two results with the interaction feature vector by feature-map addition, performing channel reorganization on the combined result, and finally applying a 5×5 depthwise separable convolution and a resize convolution to obtain the defogged clear color image result.
Example 3:
Fig. 5 shows an electronic device 500 according to an embodiment of the present invention; electronic devices 500 may differ greatly in configuration or performance. The electronic device 500 is provided with a near-infrared filter slide 501 with a selectable band range of 700 nm-1500 nm for acquiring near-infrared pictures; in use, it is mounted or removed to acquire paired visible light and near-infrared images while ensuring the other components do not move. The electronic device 500 includes one or more photographing lenses 502 and one or more CMOS sensors 503; the lenses 502 can shoot different scenes at different focal lengths, such as telephoto, short focus, and wide angle, and the CMOS sensors 503 must support operation without high-pass filtering. The electronic device further comprises one or more processors (central processing units, CPU) 504 and one or more memories 505, where the memory 505 stores at least one instruction that is loaded and executed by the processor 504 to implement the steps of the above image defogging method of the near-infrared fused Transformer structure.
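As one possible reading of what those stored instructions do at inference time, the following is a minimal sketch of loading a trained model and defogging one visible/NIR pair. The model class, checkpoint format, and preprocessing are illustrative assumptions, not details from the patent.

```python
import torch
from torchvision.io import read_image

def defog_pair(model_path, rgb_path, nir_path, device="cpu"):
    model = torch.load(model_path, map_location=device)  # hypothetical trained model
    model.eval()
    rgb = read_image(rgb_path).float().unsqueeze(0) / 255.0   # (1, 3, H, W) in [0, 1]
    nir = read_image(nir_path).float().unsqueeze(0) / 255.0   # (1, 1, H, W) in [0, 1]
    with torch.no_grad():
        return model(rgb.to(device), nir.to(device))     # defogged RGB estimate
```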
In summary, the above embodiments are only preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (6)

1. An image defogging method of a near-infrared fused Transformer structure, characterized by comprising the following steps:
S1: capturing near-infrared images and visible light images of the same scene in correspondence to form a data set;
S2: inputting the near-infrared image and the visible light image into a pre-trained Transformer-structure image defogging model;
wherein the Transformer-structure image defogging model first encodes and characterizes the visible light image and the near-infrared image to obtain a feature vector of the visible light image and a feature vector of the near-infrared image respectively, and fuses the two to obtain an interaction feature vector;
then decodes the feature vector of the visible light image and the feature vector of the near-infrared image respectively to obtain a visible light image sequence and a near-infrared image sequence, and combines the two processed sequences with the interaction feature vector;
and then performs channel reorganization and convolution processing on the combined result and outputs the defogged image result.
2. The image defogging method of a near-infrared fused Transformer structure according to claim 1, wherein: the Transformer-structure image defogging model comprises a visible light image feature encoding module, a near-infrared image feature encoding module, a feature interaction module, a visible light feature decoding module, a near-infrared feature decoding module, and a feature fusion module;
the flow executed by each module is as follows:
the visible light image feature encoding module characterizes the visible light image and obtains three groups of visible light image feature vectors according to the RGB channels; it comprises a downsampling convolution layer and a global average pooling layer, followed by three Transformer coding units in series;
the near-infrared feature encoding module characterizes the near-infrared image to obtain one group of near-infrared image feature vectors; it first comprises a downsampling convolution layer and a global average pooling layer, followed by two Transformer coding units in series;
the feature interaction module adds the feature vectors of the visible light image and the near-infrared image as feature maps and passes the sum through a deconvolution layer and a ReLU activation layer to obtain the interaction feature vector;
the visible light feature decoding module converts the three groups of visible light image feature vectors into a visible light image sequence; it comprises three Transformer coding units and an upsampling convolution layer;
the near-infrared feature decoding module converts the group of near-infrared image feature vectors into a near-infrared image sequence; it comprises two Transformer coding units and an upsampling convolution layer;
the feature fusion module applies global average pooling to the visible light sequence followed by a 3×3 convolution, applies two 1×1 convolutions to the near-infrared image sequence, combines the two results with the interaction feature vector by feature-map addition, performs channel reorganization on the combined result, and finally applies a 5×5 depthwise separable convolution and a resize convolution to obtain the defogged clear color image result.
3. The image defogging method of a near-infrared fused Transformer structure according to claim 1 or 2, wherein after outputting the defogged image result, the method further comprises calibrating the defogged image result with an image discriminator; the image discriminator adopts two block structures comprising a convolution layer, a normalization layer, a ReLU activation layer, and a pooling layer, then examines deep information of the image by mixing channel attention and spatial attention, and finally judges the real/fake probabilities of the image through a Softmax function.
4. The image defogging method of a near-infrared fused Transformer structure according to claim 2, wherein: the pre-trained Transformer-structure image defogging model is pre-trained as follows:
the training of the Transformer-structure image defogging model uses three loss constraints, the first loss being a Charbonnier loss;
the second loss is a perceptual loss, computed from the feature map output by the 14th layer of a pre-trained VGG16 network;
the third loss is the real/fake probability loss obtained from the image discriminator.
5. An image defogging device of a near-infrared fused Transformer structure, characterized by comprising a data acquisition module and a Transformer-structure image defogging model module;
the data acquisition module captures and acquires visible light images and near-infrared images of the same scene in correspondence;
the Transformer-structure image defogging model module encodes and characterizes the visible light image and the near-infrared image to obtain a feature vector of the visible light image and a feature vector of the near-infrared image respectively, and fuses the two to obtain an interaction feature vector; it then decodes the feature vector of the visible light image and the feature vector of the near-infrared image respectively to obtain a visible light image sequence and a near-infrared image sequence, combines the processed sequences with the interaction feature vector, performs channel reorganization and convolution processing on the combined result, and outputs the defogged image result.
6. The image defogging device of a near-infrared fused Transformer structure according to claim 5, wherein the Transformer-structure image defogging model comprises a visible light image feature encoding module, a near-infrared image feature encoding module, a feature interaction module, a visible light feature decoding module, a near-infrared feature decoding module, and a feature fusion module;
the visible light image feature encoding module characterizes the visible light image and obtains three groups of visible light feature vectors according to the RGB channels; it comprises a downsampling convolution layer and a global average pooling layer, followed by three Transformer coding units in series;
the near-infrared feature encoding module characterizes the near-infrared image to obtain one group of near-infrared feature vectors; it first comprises a downsampling convolution layer and a global average pooling layer, followed by two Transformer coding units in series;
the feature interaction module adds the feature vectors of the visible light image and the near-infrared image as feature maps and passes the sum through a deconvolution layer and a ReLU activation layer to obtain the interaction feature vector;
the visible light feature decoding module converts the three groups of visible light image feature vectors into a visible light image sequence; it comprises three Transformer coding units and an upsampling convolution layer;
the near-infrared feature decoding module converts the group of near-infrared image feature vectors into a near-infrared image sequence; it comprises two Transformer coding units and an upsampling convolution layer;
the feature fusion module applies global average pooling to the visible light sequence followed by a 3×3 convolution, applies two 1×1 convolutions to the near-infrared image sequence, combines the two results with the interaction feature vector by feature-map addition, performs channel reorganization on the combined result, and finally applies a 5×5 depthwise separable convolution and a resize convolution to obtain the defogged clear color image result.
CN202310524524.XA 2023-05-10 2023-05-10 Image defogging method and device of near-infrared fused Transformer structure Pending CN116703752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310524524.XA CN116703752A (en) Image defogging method and device of near-infrared fused Transformer structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310524524.XA CN116703752A (en) Image defogging method and device of near-infrared fused Transformer structure

Publications (1)

Publication Number Publication Date
CN116703752A true CN116703752A (en) 2023-09-05

Family

ID=87828382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310524524.XA Pending CN116703752A (en) 2023-05-10 2023-05-10 Image defogging method and device of near infrared fused transducer structure

Country Status (1)

Country Link
CN (1) CN116703752A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419392A (en) * 2022-01-19 2022-04-29 北京理工大学重庆创新中心 Hyperspectral snapshot image recovery method, device, equipment and medium
CN114820408A (en) * 2022-05-12 2022-07-29 中国地质大学(武汉) Infrared and visible light image fusion method based on self-attention and convolutional neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LONGBIN YAN et al.: "Cascaded transformer U-net for image restoration", Signal Processing, vol. 206, 5 January 2023 (2023-01-05), article 108902 *
WANG, JIE et al.: "Unidirectional RGB-T salient object detection with intertwined driving of encoding and fusion", Engineering Applications of Artificial Intelligence, vol. 114, 11 July 2022 (2022-07-11), pages 115648, XP087140677, DOI: 10.1016/j.engappai.2022.105162 *
吴佳佳: "Research on RGB-D visual salient object detection algorithms based on feature fusion", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 01, 15 January 2023 (2023-01-15), pages 138-65 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078920A (en) * 2023-10-16 2023-11-17 昆明理工大学 Infrared-visible light target detection method based on deformable attention mechanism
CN117078920B (en) * 2023-10-16 2024-01-23 昆明理工大学 Infrared-visible light target detection method based on deformable attention mechanism
CN117347306A (en) * 2023-11-06 2024-01-05 沈阳农业大学 Nondestructive testing method and system for fruit quality
CN117347306B (en) * 2023-11-06 2024-08-30 沈阳农业大学 Nondestructive testing method and system for fruit quality
CN117726920A (en) * 2023-12-20 2024-03-19 广州丽芳园林生态科技股份有限公司 Knowledge-graph-based plant disease and pest identification method, system, equipment and storage medium
CN117726920B (en) * 2023-12-20 2024-06-07 广州丽芳园林生态科技股份有限公司 Knowledge-graph-based plant disease and pest identification method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113658051B (en) Image defogging method and system based on cyclic generation countermeasure network
CN109360171B (en) Real-time deblurring method for video image based on neural network
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN116703752A (en) Image defogging method and device of near-infrared fused Transformer structure
CN112233038A (en) True image denoising method based on multi-scale fusion and edge enhancement
Hu et al. Underwater image restoration based on convolutional neural network
CN112767279B (en) Underwater image enhancement method for generating countermeasure network based on discrete wavelet integration
CN110544213A (en) Image defogging method based on global and local feature fusion
CN110136057B (en) Image super-resolution reconstruction method and device and electronic equipment
CN109191366B (en) Multi-view human body image synthesis method and device based on human body posture
CN113284061B (en) Underwater image enhancement method based on gradient network
CN114170286B (en) Monocular depth estimation method based on unsupervised deep learning
CN116152120A (en) Low-light image enhancement method and device integrating high-low frequency characteristic information
CN115546505A (en) Unsupervised monocular image depth estimation method based on deep learning
CN115035011B (en) Low-illumination image enhancement method of self-adaption RetinexNet under fusion strategy
CN116957931A (en) Method for improving image quality of camera image based on nerve radiation field
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN111553856A (en) Image defogging method based on depth estimation assistance
CN115511708A (en) Depth map super-resolution method and system based on uncertainty perception feature transmission
CN114565539A (en) Image defogging method based on online knowledge distillation
CN113628143A (en) Weighted fusion image defogging method and device based on multi-scale convolution
CN113379606A (en) Face super-resolution method based on pre-training generation model
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
CN111353982B (en) Depth camera image sequence screening method and device
CN111292251B (en) Image color cast correction method, device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination