CN114841907A - Multi-scale generative adversarial fusion network method for infrared and visible light images - Google Patents

Multi-scale generative adversarial fusion network method for infrared and visible light images

Info

Publication number
CN114841907A
CN114841907A (application CN202210599873.3A)
Authority
CN
China
Prior art keywords
image
network
layer
generator
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210599873.3A
Other languages
Chinese (zh)
Inventor
王文卿
张纪乾
刘涵
李余兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202210599873.3A
Publication of CN114841907A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration using local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20024 Filtering details
    • G06T 2207/20028 Bilateral filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale generative adversarial fusion network method for infrared and visible light images. Several infrared and visible light image pairs are selected from a standard training set and input to an edge-preserving filter to obtain a base layer and a detail layer; the base layer is then input to a gradient filter to obtain a gradient map and a new base layer, and the gradient map is added to the original detail layer to obtain a new detail layer. The generator and discriminator network parameters are obtained by computing the corresponding loss functions and training, and the output of the trained generator is the final fused image. The fused image retains the target information and texture information of the source images to the greatest extent, improves the quality of the fused image, and provides more convenient preconditions for subsequent target detection and recognition.

Description

Multi-scale generative adversarial fusion network method for infrared and visible light images
Technical Field
The invention belongs to the technical field of image decomposition and image fusion in digital image processing, and particularly relates to a multi-scale generative adversarial fusion network method for infrared and visible light images.
Background
Image fusion is a branch of the information fusion field; it is a cross-disciplinary research topic involving sensor imaging, image preprocessing, computer vision, artificial intelligence and related fields. With the rapid development of multiple types of imaging sensors, the limitation that a single sensor provides only limited target information in an image can be effectively overcome. For the same scene, fusing two or more source images from the same or different imaging sensors yields a fused image that is rich in information and high in clarity. A visible light sensor images by the light reflected from objects, and the resulting image has high resolution and abundant detail; under poor illumination, however, the obtained image is less clear. An infrared sensor images the thermal radiation emitted by the target and has stronger penetrating ability, which compensates for the poor imaging of visible light sensors under insufficient illumination or occlusion; it can still detect targets when lighting conditions are poor, but the resulting image lacks detail and contrast. Infrared and visible light image fusion can therefore combine the complementary advantages of the two modalities, so that the final fused image contains thermal radiation information, contrast information and detail information, the target information of the image can be better understood, and all-weather operation of the system can ultimately be achieved.
In recent years, multi-scale image fusion techniques have made important progress. In general, infrared and visible image fusion schemes based on multi-scale transforms comprise three steps: each source image is first decomposed into a series of multi-scale representations, the multi-scale representations of the source images are then fused according to a given fusion rule, and finally the corresponding multi-scale inverse transform is applied to the fused representation. Meanwhile, with the rapid development of deep learning, unsupervised deep learning has been extended to the fusion field and has achieved certain results. Although such methods are better suited to multi-source image fusion without reference images, they place higher demands on the design of the network structure and the loss function. Consequently, unsupervised fusion methods based on generative adversarial networks have gradually attracted the attention of researchers.
Disclosure of Invention
The invention aims to provide a multi-scale generative adversarial fusion network method for infrared and visible light images.
The technical solution adopted by the invention is a multi-scale generative adversarial fusion network method for infrared and visible light images, implemented according to the following steps:
step 1, selecting several infrared and visible light image pairs from a standard training set and inputting the image pairs to an edge-preserving filter to obtain a base layer and a detail layer;
step 2, inputting the base layer obtained in step 1 to a gradient filter to obtain a gradient map and a new base layer, and adding the gradient map to the detail layer obtained in step 1 to obtain a new detail layer;
step 3, inputting the base layers and detail layers obtained in step 2 to a generator network G to obtain the fused image corresponding to the source image pair, calculating the generator loss function L_G and updating the parameters of the generator network G to obtain the final generator network parameters; inputting the source images and the fused image to the discriminator networks D for classification, calculating the discriminator loss functions L_D and updating the discriminator network parameters to obtain the final discriminator network parameters;
step 4, starting to train the network and judging whether the iteration is finished, i.e. whether the current iteration count has reached the set number of iterations, and taking the network parameters obtained when the iteration count reaches the set number of iterations as the final network parameters and saving them;
and step 5, loading the generator network parameters obtained in step 4 into the generator network of a test network, carrying out multi-scale decomposition of the test infrared and visible light source images, i.e. the filtering operations of steps 1 and 2, splicing the corresponding base layers and detail layers obtained from the decomposition as the input of the test network, the obtained output being the final fused image.
The present invention is also characterized in that,
the filtering formula in step 1 is as follows:
Figure BDA0003666431360000031
wherein:
Figure BDA0003666431360000032
in the formula (1) I q In order to input an image, the image is,
Figure BDA0003666431360000033
for the filtered image, q is I q S is a set of q pixels, p is a pixel in the q domain,
Figure BDA0003666431360000034
is a part of the input image block,
Figure BDA0003666431360000035
is that
Figure BDA0003666431360000036
The peripheral image blocks are displayed on the display unit,
Figure BDA0003666431360000037
is a spatial filter kernel that is a spatial filter kernel,
Figure BDA0003666431360000038
is the distance filter kernel, both the spatial kernel and the distance kernel are usually represented in gaussian fashion;
Figure BDA0003666431360000039
in the formula (3) I d0 For detail layers obtained by bilateral filtering, I b0 Is the base layer obtained.
In step 2, the gradient value of every pixel in the image is obtained by the following formula:
G = sqrt(G_X^2 + G_Y^2)    (5)
where G_X and G_Y are the horizontal and vertical Sobel responses of the image. A threshold Gmax is then defined; if the gradient value of a pixel is larger than the threshold the pixel is set to white, otherwise it is set to black, which yields the gradient map I_G.
The base layer I_b0 obtained in step 1 is gradient-filtered to obtain the gradient map I_G; the gradient map is then subtracted from the original base layer I_b0 to obtain the new base layer I_b, and the gradient map I_G is added to the detail layer I_d0 from the same source to obtain the new detail layer I_d.
In step 3, the generator network structure consists of a dual-stream network and a convolutional neural network connected behind it. The upper and lower networks of the dual-stream network have the same structure, each being a six-layer convolutional neural network: the first four layers share the same structure, namely a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last two layers share the same structure and consist of a 5×5 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU. The network following the dual-stream network consists of a 1×1 convolutional layer and an activation layer whose activation function is tanh, and the output of this convolutional neural network is the final fused image.
The generator loss function L_G in step 3 is:
L_G = λL_content + L_Gen,    (6)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(HW)) ( ||I_f − I_b||_2^2 + ξ||∇I_f − ∇I_d||_2^2 )    (7)
where H and W are the height and width of the image input to the generator, ||·||_2 denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 − D_V(G(I_b, I_d)))] + E[log(1 − D_I(G(I_b, I_d)))]    (8)
where G(I_b, I_d) denotes the fused image generated by the generator, D_I(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the infrared image or the fused image, and D_V(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the visible light image or the fused image.
While computing the generator loss function L_G, the network parameters are updated with SGD to achieve optimization and obtain the generator network parameters.
The two discriminator networks D_I and D_V in step 3 have the same structure: 3×3 convolutional layers, each followed by a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last layer is a fully connected layer that outputs the classification result of the input, thereby predicting whether the input is a fused image or a source image;
the discriminator loss function whose inputs are the infrared image and the fused image is:
L_{D_I} = E[−log D_I(I_I)] + E[−log(1 − D_I(G(I_b, I_d)))]    (9)
where D_I(I_I) is the discrimination value of the discriminator with the infrared image as input and D_I(G(I_b, I_d)) is the discrimination value of the discriminator with the fused image as input;
the discriminator loss function whose inputs are the visible light image and the fused image is:
L_{D_V} = E[−log D_V(I_V)] + E[−log(1 − D_V(G(I_b, I_d)))]    (10)
where D_V(I_V) is the discrimination value of the discriminator with the visible light image as input and D_V(G(I_b, I_d)) is the discrimination value of the discriminator with the fused image as input;
a threshold is set for the discriminator output; while the discriminator output is larger than the preset threshold the network parameters are updated continuously, until the output is smaller than the preset threshold. In this process the inputs pass through the discriminators D_I and D_V, the corresponding discriminator loss functions L_{D_I} and L_{D_V} are computed, and the network parameters are updated by the SGD optimization method, finally yielding the discriminator network parameters.
The multi-scale generative adversarial fusion network method for infrared and visible light images of the invention combines multi-scale decomposition with a generative adversarial network: it not only optimizes the source images but also applies a neural network with good fusion performance to the fusion process. The base layer and detail layer of each image are obtained by an edge-preserving filter and gradient filtering, so that the resulting image components retain the required information to the greatest extent. The two branch networks of the generator in the adversarial network fuse the base layers (structure information) and detail layers (detail information) obtained by the multi-scale decomposition respectively, the generated base-layer image and detail-layer image are combined to obtain the final fused image, and the two discriminator structures of the adversarial network classify and discriminate between the two source images and the fused image. The fused image obtained by the invention retains the target information and texture information of the source images to the greatest extent, improves the quality of the fused image, and provides more convenient conditions for subsequent target detection and recognition.
Drawings
FIG. 1 is the overall flowchart of the multi-scale generative adversarial fusion network method for infrared and visible light images of the present invention;
FIG. 2 shows the base layer and detail layer obtained after bilateral filtering of a source image in the present invention;
FIG. 3 shows the new base layer and detail layer obtained after gradient filtering of the bilaterally filtered base layer in the present invention;
FIG. 4 is the network structure diagram of the generator in the generative adversarial network of the present invention;
FIG. 5 is the network structure diagram of the discriminator in the generative adversarial network of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a multi-scale generative adversarial fusion network method for infrared and visible light images. The source images are decomposed by an edge-preserving filter and gradient filtering to obtain the base layer and detail layer of each image; the base and detail layers are input to the generator network of a generative adversarial network for fusion, and the fused image and the two source images are input to the discriminators for discrimination so as to optimize the network parameters, finally yielding the fused image and achieving image fusion. The overall network structure of the algorithm is shown in FIG. 1. The infrared and visible image fusion process of the multi-scale-decomposition-based generative adversarial network is divided into the following three stages:
1) Source image multi-scale decomposition
The multi-scale decomposition of a source image consists of three steps: first the image is input to an edge-preserving filter (bilateral filter) to obtain a base layer and a detail layer, as shown in FIG. 2; then the base layer is gradient-filtered to obtain a gradient map and a new base layer; finally the gradient map is added to the detail layer to serve as the new detail layer, so that a new base layer and a new detail layer are obtained, as shown in FIG. 3. The principles of bilateral filtering and gradient filtering are as follows:
Bilateral filtering is an edge-preserving filter that can preserve edges while reducing noise and smoothing the image. Like other filters, the bilateral filter uses a weighted average: the intensity of a pixel is represented by a weighted average of the intensity values of the surrounding pixels, with Gaussian-distributed weights. Most importantly, the weights of the bilateral filter consider not only the Euclidean distance between pixels (as in ordinary Gaussian low-pass filtering, which only considers the influence of position on the central pixel) but also the radiometric differences within the pixel's range domain (for example the similarity between a pixel and the central pixel of the convolution kernel in colour intensity, depth distance, and so on); both weights are taken into account when the central pixel is computed. The filter formula is as follows:
Î_q = (1 / W_q) Σ_{p∈S} G_{σ_s}(‖p − q‖) G_{σ_r}(|I_p − I_q|) I_p    (1)
W_q = Σ_{p∈S} G_{σ_s}(‖p − q‖) G_{σ_r}(|I_p − I_q|)    (2)
In formula (1), I_q is the input image and Î_q is the filtered image.
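As a concrete illustration of this decomposition (not code from the patent), the sketch below uses OpenCV's bilateral filter to split a source image into base and detail layers; the kernel diameter and the two Gaussian sigmas are assumed values, since the patent does not specify them.

```python
import cv2
import numpy as np

def bilateral_decompose(img, d=9, sigma_color=75.0, sigma_space=75.0):
    """Step 1: base layer = bilateral-filtered image (I_b0),
    detail layer = input minus base (I_d0 = I_q - I_b0).
    Filter parameters are illustrative, not specified by the patent."""
    img = img.astype(np.float32)
    base = cv2.bilateralFilter(img, d, sigma_color, sigma_space)
    detail = img - base
    return base, detail

# Example on an infrared / visible pair (file names are placeholders).
ir = cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE)
vis = cv2.imread("vis.png", cv2.IMREAD_GRAYSCALE)
ir_b0, ir_d0 = bilateral_decompose(ir)
vis_b0, vis_d0 = bilateral_decompose(vis)
```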
In step 2, the base layer obtained in step 1 is input to a gradient filter to obtain a gradient map and a new base layer, and the gradient map is added to the detail layer obtained in step 1 to obtain a new detail layer. The gradient filtering principle is as follows:
A gradient is simply a derivative. Three different filters are commonly used: Sobel, Scharr and Laplacian. Sobel and Scharr compute first-order derivatives, Scharr being an optimization of Sobel, while Laplacian computes the second-order derivative. The Sobel filter is adopted here so that high frequencies pass and low frequencies are blocked, which makes edges more prominent and thereby enhances the image. The specific principle is as follows:
The Sobel operator is a discrete difference operator that computes an approximation of the gradient of the image brightness function. Applying this operator at any point of the image yields the corresponding gradient vector or its normal vector.
The operator comprises two 3×3 kernels, one for the horizontal and one for the vertical direction:
G_x = [[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]],    G_y = [[+1, +2, +1], [0, 0, 0], [-1, -2, -1]]
These kernels are convolved with the image in the plane to obtain the horizontal and vertical brightness-difference approximations. If A denotes the original image and G_X and G_Y denote the images obtained by horizontal and vertical edge detection respectively, the formula is:
G_X = G_x * A and G_Y = G_y * A    (4)
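Continuing the sketch above, the gradient-filtering step could be realized as follows; the 3×3 Sobel kernel size and the 255/0 white-black encoding of the gradient map are assumptions, while the threshold value 150 follows the example given later in the text.

```python
import cv2
import numpy as np

def gradient_refine(base, detail, g_max=150.0):
    """Step 2: compute the Sobel gradient magnitude of the base layer, threshold it at
    Gmax to get the gradient map I_G (white where G > Gmax, black elsewhere), then form
    the new base layer I_b = I_b0 - I_G and the new detail layer I_d = I_d0 + I_G."""
    gx = cv2.Sobel(base, cv2.CV_32F, 1, 0, ksize=3)   # G_X = G_x * A
    gy = cv2.Sobel(base, cv2.CV_32F, 0, 1, ksize=3)   # G_Y = G_y * A
    mag = np.sqrt(gx ** 2 + gy ** 2)                  # G = sqrt(G_X^2 + G_Y^2)
    grad_map = np.where(mag > g_max, 255.0, 0.0).astype(np.float32)
    return base - grad_map, detail + grad_map, grad_map

# Continues the decomposition sketch above.
ir_b, ir_d, ir_g = gradient_refine(ir_b0, ir_d0)
vis_b, vis_d, vis_g = gradient_refine(vis_b0, vis_d0)
```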
2) Generative adversarial network parameter acquisition
Acquiring the generator network parameters: the base layers and the detail layers obtained in stage 1) are spliced pairwise along the image channel dimension and used as the input of the generator. The generator network structure is shown in FIG. 4 and consists of a dual-stream network followed by a convolution block. The upper and lower networks of the dual-stream network are identical, each consisting of a six-layer convolutional neural network: the first four layers share the same structure, namely a 3×3 convolutional layer, a batch normalization layer and an activation layer (activation function Leaky ReLU); the last two layers share the same structure and consist of a 5×5 convolutional layer, a batch normalization layer and an activation layer (activation function Leaky ReLU). The network following the dual-stream network consists of a 1×1 convolutional layer and an activation layer (activation function tanh); the respective fusion results of the two branches (the fused base layer and the fused detail layer) are spliced and this layer outputs the final fused image. After passing through the generator G, the generator loss function L_G is computed and the network parameters are updated with SGD (stochastic gradient descent) for optimization, yielding the generator network parameters.
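A minimal PyTorch sketch consistent with this description is given below; the channel widths, padding choices and the way the two stream outputs are merged before the 1×1 convolution are assumptions, since the patent only fixes the kernel sizes, layer counts and activation functions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, k):
    # Conv + BatchNorm + LeakyReLU, the repeating unit described for the generator streams.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Stream(nn.Module):
    """One branch of the dual-stream generator: four 3x3 blocks followed by two 5x5
    blocks. The channel widths are illustrative assumptions; the patent does not list them."""
    def __init__(self, in_ch=2, widths=(32, 32, 64, 64, 32, 16)):
        super().__init__()
        kernel_sizes = (3, 3, 3, 3, 5, 5)
        layers, prev = [], in_ch
        for w, k in zip(widths, kernel_sizes):
            layers.append(conv_block(prev, w, k))
            prev = w
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

class Generator(nn.Module):
    """Dual-stream generator: one stream fuses the two base layers, the other the two
    detail layers; the spliced stream outputs pass through a 1x1 convolution with tanh
    to produce the fused image."""
    def __init__(self):
        super().__init__()
        self.base_stream = Stream(in_ch=2)     # input: IR base + VIS base, spliced channel-wise
        self.detail_stream = Stream(in_ch=2)   # input: IR detail + VIS detail, spliced channel-wise
        self.head = nn.Sequential(nn.Conv2d(32, 1, kernel_size=1), nn.Tanh())

    def forward(self, base_pair, detail_pair):
        fb = self.base_stream(base_pair)
        fd = self.detail_stream(detail_pair)
        return self.head(torch.cat([fb, fd], dim=1))
```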
Acquiring the discriminator network parameters: since the source images form a pair, two discriminators are used, one to obtain the probability P_I that the fused image is an infrared image and the other to obtain the probability P_V that the fused image is a visible light image. The two discriminator networks have the same structure, shown in FIG. 5: a five-layer convolutional neural network in which the first four layers share the same structure, namely a 3×3 convolutional layer, a batch normalization layer and an activation layer (activation function Leaky ReLU); the last layer is a fully connected layer whose output is the classification result of the input. A source image and the fused image are input to obtain the two probabilities; while a probability is larger than the preset threshold the network parameters are updated continuously, until it is smaller than the preset threshold. In this process the inputs pass through the discriminators D_I and D_V, the corresponding discriminator loss functions L_{D_I} and L_{D_V} are computed, and the network parameters are updated with SGD (stochastic gradient descent), finally yielding the discriminator network parameters.
The loss functions comprise the generator loss function L_G and the loss functions of the two discriminators, L_{D_I} and L_{D_V}, which are designed as follows:
the purpose of the generator loss function is to save more source image information, which consists of two parts, namely content loss and countermeasure loss:
L_G = λL_content + L_Gen,    (5)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(HW)) ( ||I_f − I_b||_2^2 + ξ||∇I_f − ∇I_d||_2^2 )    (6)
where H and W are the height and width of the image input to the generator, ||·||_2 denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 − D_V(G(I_b, I_d)))] + E[log(1 − D_I(G(I_b, I_d)))]    (7)
where G(I_b, I_d) denotes the fused image generated by the generator, D_I(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the infrared image or the fused image, and D_V(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the visible light image or the fused image.
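Assuming the loss forms written out above, and treating λ and ξ (which the patent leaves as unspecified constants) as tunable hyper-parameters, a PyTorch sketch of the generator loss could look like this; the Sobel realization of the gradient operator ∇ is also an assumption.

```python
import torch
import torch.nn.functional as F

def sobel_gradient(img):
    """Gradient operator realized with Sobel kernels (one possible choice);
    img is a single-channel tensor of shape (N, 1, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def generator_loss(fused, base, detail, d_i_fake, d_v_fake, lam=100.0, xi=5.0):
    """L_G = lam * L_content + L_Gen. `fused`, `base`, `detail` are single-channel
    tensors of the same shape; lam and xi values are illustrative assumptions."""
    h, w = fused.shape[-2:]
    l_content = (F.mse_loss(fused, base, reduction="sum")
                 + xi * F.mse_loss(sobel_gradient(fused), sobel_gradient(detail),
                                   reduction="sum")) / (h * w)
    # Adversarial term E[log(1 - D_V(G))] + E[log(1 - D_I(G))], minimized by the generator.
    l_gen = torch.log(1.0 - d_v_fake + 1e-8).mean() + torch.log(1.0 - d_i_fake + 1e-8).mean()
    return lam * l_content + l_gen
```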
The two discriminators are used to effectively reduce the information loss of the fusion result; their role is to force the generator to preserve more source image information. They are defined as follows:
L_{D_V} = E[−log D_V(I_V)] + E[−log(1 − D_V(G(I_b, I_d)))]    (8)
L_{D_I} = E[−log D_I(I_I)] + E[−log(1 − D_I(G(I_b, I_d)))]    (9)
where D_V(I_V) is the discrimination value of the discriminator with the visible light image as input, D_V(G(I_b, I_d)) is its discrimination value with the fused image as input, D_I(I_I) is the discrimination value of the discriminator with the infrared image as input, and D_I(G(I_b, I_d)) is its discrimination value with the fused image as input.
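A corresponding sketch of the discriminator losses, under the standard-GAN form assumed in the reconstruction of equations (8)-(9) above:

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """-E[log D(source)] - E[log(1 - D(fused))]; used for both D_I and D_V.
    d_real / d_fake are the discriminator outputs (probabilities) on the source
    image and on the fused image. The exact loss form is an assumption."""
    return -(torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean())

# L_{D_I}: discriminator_loss(D_I(infrared), D_I(fused))
# L_{D_V}: discriminator_loss(D_V(visible),  D_V(fused))
```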
3) Fusion test network
The generator network parameters obtained in stage 2) are loaded into the generator network; the test image pair is decomposed at multiple scales, the corresponding base layers and detail layers obtained from the decomposition are spliced and input to the generator network, and the output of the generator is the final fused image.
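As an illustration of this test stage, the sketch below reuses bilateral_decompose, gradient_refine and the Generator sketch given earlier; it loads trained generator weights and fuses one test pair. The weight-file name and tensor layout are assumptions.

```python
import torch

def fuse_test_pair(generator, ir, vis, weights_path="generator.pth"):
    """Load trained generator parameters, apply the two-stage decomposition
    (steps 1-2) to a test IR/VIS pair, splice the base and detail layers
    channel-wise, and return the generator output as the fused image."""
    generator.load_state_dict(torch.load(weights_path))
    generator.eval()
    ir_b0, ir_d0 = bilateral_decompose(ir)
    vis_b0, vis_d0 = bilateral_decompose(vis)
    ir_b, ir_d, _ = gradient_refine(ir_b0, ir_d0)
    vis_b, vis_d, _ = gradient_refine(vis_b0, vis_d0)
    to_t = lambda a: torch.from_numpy(a).float().unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
    base_pair = torch.cat([to_t(ir_b), to_t(vis_b)], dim=1)     # channel-wise splicing
    detail_pair = torch.cat([to_t(ir_d), to_t(vis_d)], dim=1)
    with torch.no_grad():
        fused = generator(base_pair, detail_pair)
    return fused.squeeze().numpy()
```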
The multi-scale generative adversarial fusion network method for infrared and visible light images of the invention, whose flowchart is shown in FIG. 1, is implemented according to the following steps:
With reference to FIG. 2, in step 1 several infrared and visible light image pairs are selected from a standard training set and input to an edge-preserving filter (bilateral filter) to obtain a base layer and a detail layer.
the filtering formula in step 1 is as follows:
Figure BDA0003666431360000103
wherein:
Figure BDA0003666431360000104
in the formula (1) I q In order to input an image, the image is,
Figure BDA0003666431360000105
for the filtered image, q is I q S is a set of q pixels, p is a pixel in the q domain,
Figure BDA0003666431360000106
is a part of the input image block,
Figure BDA0003666431360000107
is that
Figure BDA0003666431360000108
The peripheral image blocks are displayed on the display unit,
Figure BDA0003666431360000109
is a spatial filter kernel that is a spatial filter kernel,
Figure BDA00036664313600001010
is the distance filter kernel, both the spatial kernel and the distance kernel are usually represented in gaussian fashion;
Figure BDA0003666431360000111
in the formula (3) I d0 For detail layers obtained by bilateral filtering, I b0 Is the base layer obtained.
With reference to fig. 3, step 2, inputting the base layer obtained in step 1 to a gradient filter to obtain a gradient map and a new base layer, and adding the gradient map and the detail layer in step 1 to obtain a new detail layer;
in step 2
The gradient filtering principle is as follows:
gradient filtering is simply the derivation of an image, and includes three different filters: sobel, Scharr and Laplacian; sobel and Scharr are first order derivatives, Scharr is an optimization of Sobel, Laplacian is second order derivatives. The Sobel filter is adopted here, and the purpose is to allow high-frequency information to pass through and block low-frequency information, so that the edge is more obvious to achieve the purpose of enhancing the image. It is composed of
The specific principle is as follows:
the Sobel operator is a discrete difference operator for calculating the approximate value of the gray scale of the image brightness function. Using this operator at any point in the image will produce the corresponding gray scale vector or its normal vector.
Figure BDA0003666431360000112
The operator comprises two sets of 3x3 matrices, horizontal and vertical, respectively, which are then subjected to planar convolution with the image to obtain horizontal and vertical luminance difference approximations. If A represents the original image, Gx and Gy represent the gray level of the image detected by the horizontal and vertical edges respectively, the formula is as follows:
G X =G x *A and G y =G y *A (4)
and solving gradient values of all pixel points in the image by the following formula:
Figure BDA0003666431360000113
then, a threshold value Gmax (defined as 150 here) is defined, if the gradient value of the pixel is larger than the threshold value, the pixel is set to be white, otherwise, the pixel is set to be black, and thus, a gradient map I is obtained G
The base layer I obtained in the step 1 b0 Gradient filtering is carried out to obtain a gradient map I G Then the original base layer I b0 Subtracting the gradient map to obtain a new base layer I b Gradient map I G Layer of homologous detail I d0 Adding up to obtain a new detail layer I d
With reference to fig. 4 and 5, step 3, inputting the base layer and the detail layer obtained in step 2 into a generator network G, obtaining a fused image corresponding to the source image pair after passing through the generator network G, and calculating a generator loss function L G Updating the generator network G parameters to obtain final generator network parameters, inputting the source images and the fusion images into a discriminator network D for classification, and calculating a discriminator loss function L D Updating the network parameters of the discriminator to obtain the final network parameters of the discriminator;
in the step 3, a generator network structure consists of a double-current network and a convolutional neural network connected behind the double-current network, wherein the upper network and the lower network in the double-current network have the same structure and are all six layers of convolutional neural networks, the first four layers have the same structure, the network structures are a 3x3 convolutional layer, a batch normalization layer and an activation layer, and the activation function of the activation layer is Leaky Relu; the latter two layers have the same structure and are composed of a 5 × 5 convolutional layer, a batch normalization layer and an active layer, the active function of the active layer is Leaky Relu, the network structure following the dual-flow network is composed of a 1 × 1 convolutional layer and an active layer, the active function of the active layer is tanh, and the output of the convolutional neural network is the final fusion image.
The generator loss function L_G in step 3 is:
L_G = λL_content + L_Gen,    (6)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(HW)) ( ||I_f − I_b||_2^2 + ξ||∇I_f − ∇I_d||_2^2 )    (7)
where H and W are the height and width of the image input to the generator, ||·||_2 denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 − D_V(G(I_b, I_d)))] + E[log(1 − D_I(G(I_b, I_d)))]    (8)
where G(I_b, I_d) denotes the fused image generated by the generator, D_I(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the infrared image or the fused image, and D_V(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the visible light image or the fused image.
While computing the generator loss function L_G, the network parameters are updated with SGD (stochastic gradient descent) to achieve optimization and obtain the generator network parameters.
The two discriminator networks D_I and D_V in step 3 have the same structure: 3×3 convolutional layers, each followed by a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last layer is a fully connected layer whose output is the classification result of the input, thereby predicting whether the input is a fused image or a source image (there are two possible source images, an infrared image and a visible light image: D_I refers to the discriminator for infrared images and D_V to the discriminator for visible light images).
The discriminator loss function whose inputs are the infrared image and the fused image is:
L_{D_I} = E[−log D_I(I_I)] + E[−log(1 − D_I(G(I_b, I_d)))]    (9)
where D_I(I_I) is the discrimination value of the discriminator with the infrared image as input and D_I(G(I_b, I_d)) is the discrimination value of the discriminator with the fused image as input;
the discriminator loss function whose inputs are the visible light image and the fused image is:
L_{D_V} = E[−log D_V(I_V)] + E[−log(1 − D_V(G(I_b, I_d)))]    (10)
where D_V(I_V) is the discrimination value of the discriminator with the visible light image as input and D_V(G(I_b, I_d)) is the discrimination value of the discriminator with the fused image as input;
a threshold is set for the discriminator output; while the discriminator output is larger than the preset threshold the network parameters are updated continuously, until the output is smaller than the preset threshold. In this process the inputs pass through the discriminators D_I and D_V, the corresponding discriminator loss functions L_{D_I} and L_{D_V} are computed, and the network parameters are updated with SGD (stochastic gradient descent), finally yielding the discriminator network parameters.
Step 4, starting to train the network, judging whether iteration is finished or not, namely whether the current iteration number reaches the set iteration number or not, taking the network parameter obtained when the iteration number reaches the set iteration number as the final network parameter, and storing the network parameter;
and step 5, loading the generator network parameters obtained in step 4 into the generator network of a test network, carrying out multi-scale decomposition of the test infrared and visible light source images, i.e. the filtering operations of steps 1 and 2, splicing the corresponding base layers and detail layers obtained from the decomposition as the input of the test network, the obtained output being the final fused image.
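Putting the pieces together, the sketch below shows one way the alternating training of steps 3 and 4 could be organized, reusing the Generator, Discriminator and loss sketches given earlier; the learning rate, discriminator threshold, safety cap and data-loader layout are assumptions, since the patent only specifies SGD updates, a discriminator output threshold and a fixed iteration count.

```python
import torch

def train(generator, d_i, d_v, loader, n_iters, d_thresh=0.8, lr=1e-3, max_d_steps=5):
    """Alternating training sketch for steps 3-4, reusing the Generator, Discriminator,
    generator_loss and discriminator_loss sketches above. Hyper-parameter values and
    the batch layout (base_pair, detail_pair, ir, vis) are illustrative assumptions."""
    opt_g = torch.optim.SGD(generator.parameters(), lr=lr)
    opt_i = torch.optim.SGD(d_i.parameters(), lr=lr)
    opt_v = torch.optim.SGD(d_v.parameters(), lr=lr)
    it = 0
    while it < n_iters:
        for base_pair, detail_pair, ir, vis in loader:        # tensors shaped (N, C, H, W)
            fused = generator(base_pair, detail_pair)
            # Discriminator updates: keep updating while the output on the fused image
            # stays above the preset threshold (with a safety cap on the inner loop).
            for disc, opt, real in ((d_i, opt_i, ir), (d_v, opt_v, vis)):
                for _ in range(max_d_steps):
                    if disc(fused.detach()).mean() <= d_thresh:
                        break
                    loss_d = discriminator_loss(disc(real), disc(fused.detach()))
                    opt.zero_grad(); loss_d.backward(); opt.step()
            # Generator update; here the infrared base/detail channels are used as the
            # content-loss targets, an assumption about the unspecified I_b, I_d targets.
            loss_g = generator_loss(fused, base_pair[:, :1], detail_pair[:, :1],
                                    d_i(fused), d_v(fused))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
            it += 1
            if it >= n_iters:
                break
    return generator.state_dict(), d_i.state_dict(), d_v.state_dict()
```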

Claims (6)

1. A multi-scale generative adversarial fusion network method for infrared and visible light images, characterized by comprising the following steps:
step 1, selecting several infrared and visible light image pairs from a standard training set and inputting the image pairs to an edge-preserving filter to obtain a base layer and a detail layer;
step 2, inputting the base layer obtained in step 1 to a gradient filter to obtain a gradient map and a new base layer, and adding the gradient map to the detail layer obtained in step 1 to obtain a new detail layer;
step 3, inputting the base layers and detail layers obtained in step 2 to a generator network G to obtain the fused image corresponding to the source image pair, calculating the generator loss function L_G and updating the parameters of the generator network G to obtain the final generator network parameters; inputting the source images and the fused image to the discriminator networks D for classification, calculating the discriminator loss functions L_D and updating the discriminator network parameters to obtain the final discriminator network parameters;
step 4, starting to train the network and judging whether the iteration is finished, i.e. whether the current iteration count has reached the set number of iterations, and taking the network parameters obtained when the iteration count reaches the set number of iterations as the final network parameters and saving them;
and step 5, loading the generator network parameters obtained in step 4 into the generator network of a test network, carrying out multi-scale decomposition of the test infrared and visible light source images, i.e. the filtering operations of steps 1 and 2, splicing the corresponding base layers and detail layers obtained from the decomposition as the input of the test network, the obtained output being the final fused image.
2. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 1, characterized in that the filtering formula in step 1 is as follows:
Î_q = (1 / W_q) Σ_{p∈S} G_{σ_s}(‖p − q‖) G_{σ_r}(|I_p − I_q|) I_p    (1)
wherein:
W_q = Σ_{p∈S} G_{σ_s}(‖p − q‖) G_{σ_r}(|I_p − I_q|)    (2)
in formula (1), I_q is the value of the input image at pixel q, Î_q is the filtered image value at q, S is the neighbourhood block of pixels around q, p is a pixel in that neighbourhood and I_p its value, G_{σ_s} is the spatial filter kernel and G_{σ_r} is the range (intensity-distance) filter kernel, both the spatial kernel and the range kernel usually being Gaussian;
I_d0 = I_q − I_b0    (3)
in formula (3), I_b0 is the base layer obtained by bilateral filtering and I_d0 is the corresponding detail layer.
3. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 2, characterized in that in step 2 the gradient value of every pixel in the image is obtained by the following formula:
G = sqrt(G_X^2 + G_Y^2)    (5)
where G_X and G_Y are the horizontal and vertical Sobel responses of the image; a threshold Gmax is then defined; if the gradient value of a pixel is larger than the threshold the pixel is set to white, otherwise it is set to black, which yields the gradient map I_G;
the base layer I_b0 obtained in step 1 is gradient-filtered to obtain the gradient map I_G, the gradient map is then subtracted from the original base layer I_b0 to obtain the new base layer I_b, and the gradient map I_G is added to the detail layer I_d0 from the same source to obtain the new detail layer I_d.
4. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 3, characterized in that the generator network structure in step 3 consists of a dual-stream network and a convolutional neural network connected behind it; the upper and lower networks of the dual-stream network have the same structure, each being a six-layer convolutional neural network in which the first four layers share the same structure, namely a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last two layers share the same structure and consist of a 5×5 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the network following the dual-stream network consists of a 1×1 convolutional layer and an activation layer whose activation function is tanh, and the output of this convolutional neural network is the final fused image.
5. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 4, characterized in that the generator loss function L_G in step 3 is:
L_G = λL_content + L_Gen,    (6)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(HW)) ( ||I_f − I_b||_2^2 + ξ||∇I_f − ∇I_d||_2^2 )    (7)
where H and W are the height and width of the image input to the generator, ||·||_2 denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 − D_V(G(I_b, I_d)))] + E[log(1 − D_I(G(I_b, I_d)))]    (8)
where G(I_b, I_d) denotes the fused image generated by the generator, D_I(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the infrared image or the fused image, and D_V(G(I_b, I_d)) is the discrimination value of the discriminator whose inputs are the visible light image or the fused image;
while computing the generator loss function L_G, the network parameters are updated with SGD to achieve optimization and obtain the generator network parameters.
6. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 5, characterized in that the two discriminator networks D_I and D_V in step 3 have the same structure: 3×3 convolutional layers, each followed by a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last layer is a fully connected layer that outputs the classification result of the input, thereby predicting whether the input is a fused image or a source image;
the discriminator loss function whose inputs are the infrared image and the fused image is:
L_{D_I} = E[−log D_I(I_I)] + E[−log(1 − D_I(G(I_b, I_d)))]    (9)
where D_I(I_I) is the discrimination value of the discriminator with the infrared image as input and D_I(G(I_b, I_d)) is the discrimination value of the discriminator with the fused image as input;
the discriminator loss function whose inputs are the visible light image and the fused image is:
L_{D_V} = E[−log D_V(I_V)] + E[−log(1 − D_V(G(I_b, I_d)))]    (10)
where D_V(I_V) is the discrimination value of the discriminator with the visible light image as input and D_V(G(I_b, I_d)) is the discrimination value of the discriminator with the fused image as input;
a threshold is set for the discriminator output; while the discriminator output is larger than the preset threshold the network parameters are updated continuously, until the output is smaller than the preset threshold; in this process the inputs pass through the discriminators D_I and D_V, the corresponding discriminator loss functions L_{D_I} and L_{D_V} are computed, and the network parameters are updated by the SGD optimization method, finally yielding the discriminator network parameters.
CN202210599873.3A 2022-05-27 2022-05-27 Multi-scale generative adversarial fusion network method for infrared and visible light images Pending CN114841907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210599873.3A CN114841907A (en) Multi-scale generative adversarial fusion network method for infrared and visible light images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210599873.3A CN114841907A (en) Multi-scale generative adversarial fusion network method for infrared and visible light images

Publications (1)

Publication Number Publication Date
CN114841907A true CN114841907A (en) 2022-08-02

Family

ID=82572920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210599873.3A Pending CN114841907A (en) 2022-05-27 2022-05-27 Method for generating countermeasure fusion network in multiple scales facing infrared and visible light images

Country Status (1)

Country Link
CN (1) CN114841907A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117934869A (en) * 2024-03-22 2024-04-26 中铁大桥局集团有限公司 Target detection method, system, computing device and medium


Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN107680054B (en) Multi-source image fusion method in haze environment
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN103886344B (en) A kind of Image Fire Flame recognition methods
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN109948566B (en) Double-flow face anti-fraud detection method based on weight fusion and feature selection
CN111369605B (en) Infrared and visible light image registration method and system based on edge features
CN107705288A (en) Hazardous gas spillage infrared video detection method under pseudo- target fast-moving strong interferers
CN109034184B (en) Grading ring detection and identification method based on deep learning
CN106485651B (en) The image matching method of fast robust Scale invariant
CN111179208B (en) Infrared-visible light image fusion method based on saliency map and convolutional neural network
CN116704273A (en) Self-adaptive infrared and visible light dual-mode fusion detection method
CN111079518A (en) Fall-down abnormal behavior identification method based on scene of law enforcement and case handling area
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN114841907A (en) Multi-scale generative adversarial fusion network method for infrared and visible light images
Rekik et al. Review of satellite image segmentation for an optimal fusion system based on the edge and region approaches
CN113205494A (en) Infrared small target detection method and system based on adaptive scale image block weighting difference measurement
CN112184608A (en) Infrared and visible light image fusion method based on feature transfer
CN114842235B (en) Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation
CN116189160A (en) Infrared dim target detection method based on local contrast mechanism
Zhu et al. Infrared and visible image fusion using threshold segmentation and weight optimization
CN113920087A (en) Micro component defect detection system and method based on deep learning
CN113963178A (en) Method, device, equipment and medium for detecting infrared dim and small target under ground-air background
CN113052833A (en) Non-vision field imaging method based on infrared thermal radiation
Chen et al. GADO-Net: an improved AOD-Net single image dehazing algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination