CN114841907A - Multi-scale generative adversarial fusion network method for infrared and visible light images - Google Patents
Multi-scale generative adversarial fusion network method for infrared and visible light images
- Publication number
- CN114841907A (application number CN202210599873.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- layer
- generator
- discriminator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
- G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
- G06T5/20: Image enhancement or restoration using local operators
- G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06T2207/10048: Image acquisition modality; infrared image
- G06T2207/20028: Special algorithmic details; bilateral filtering
- G06T2207/20221: Special algorithmic details; image fusion; image merging
Abstract
The invention discloses a multi-scale generative adversarial fusion network method for infrared and visible light images. Several infrared and visible light image pairs are selected from a standard training set and input into an edge-preserving filter to obtain a base layer and a detail layer; the base layer is then input into a gradient filter to obtain a gradient map and a new base layer, and the gradient map is added to the original detail layer to obtain a new detail layer. The generator and discriminator network parameters are computed and the network is trained, and the output finally obtained is the fused image. The fused image retains the target information and texture information of the source images to the greatest extent, improves the quality of the fused image, and provides a more convenient basis for subsequent target detection and recognition.
Description
Technical Field
The invention belongs to the technical field of image decomposition and image fusion in digital image processing, and in particular relates to a multi-scale generative adversarial fusion network method for infrared and visible light images.
Background
Image fusion is a branch of information fusion and a cross-disciplinary research topic involving sensor imaging, image preprocessing, computer vision, artificial intelligence and other fields. With the rapid development of multiple types of imaging sensors, the problem that a single sensor provides only limited target information can be effectively alleviated. For the same scene, fusing two or more source images from the same or different imaging sensors yields a fused image that is rich in information and high in clarity. A visible light sensor images the light reflected by objects, and the resulting image has high resolution and abundant detail; however, under poor lighting conditions the image becomes less sharp. An infrared sensor images the thermal radiation emitted by targets and has stronger penetrating power, so it compensates for the poor imaging of visible light sensors under insufficient illumination or occlusion; it can still detect targets when lighting conditions are poor, but the resulting images lack detail and contrast. Infrared and visible image fusion exploits the complementary advantages of the two modalities and ensures that the final fused image contains thermal radiation information, contrast information and detail information, so that the target information in the image can be better understood and all-weather operation of the system can ultimately be achieved. In recent years, multi-scale image fusion techniques have made important progress. In general, multi-scale-transform-based infrared and visible image fusion consists of three steps: each source image is first decomposed into a series of multi-scale representations, the multi-scale representations of the source images are then fused according to a given fusion rule, and finally the corresponding multi-scale inverse transform is applied to obtain the fused image. Meanwhile, with the rapid development of deep learning, unsupervised deep learning has been extended to the fusion field and has achieved promising results. Although such methods are better suited to multi-source image fusion without reference images, they place higher demands on the design of the network structure and the loss function. Therefore, fusion methods based on unsupervised generative adversarial networks have gradually attracted the attention of researchers.
Disclosure of Invention
The object of the invention is to provide a multi-scale generative adversarial fusion network method for infrared and visible light images.
The technical scheme adopted by the invention is a multi-scale generative adversarial fusion network method for infrared and visible light images, implemented according to the following steps:
step 1, selecting several infrared and visible light image pairs from a standard training set and inputting the image pairs into an edge-preserving filter to obtain a base layer and a detail layer;
step 2, inputting the base layer obtained in step 1 into a gradient filter to obtain a gradient map and a new base layer, and adding the gradient map to the detail layer obtained in step 1 to obtain a new detail layer;
step 3, inputting the base layer and the detail layer obtained in step 2 into a generator network G to obtain the fused image corresponding to the source image pair, computing the generator loss function L_G and updating the parameters of the generator network G to obtain the final generator network parameters; inputting the source images and the fused image into a discriminator network D for classification, computing the discriminator loss function L_D and updating the discriminator network parameters to obtain the final discriminator network parameters;
step 4, training the network and judging whether the iteration is finished, i.e. whether the current iteration number reaches the set iteration number; the network parameters obtained when the iteration number reaches the set number are taken as the final network parameters and saved;
step 5, loading the generator network parameters obtained in step 4 into the generator network of the test network, performing multi-scale decomposition on the test infrared and visible light source images, namely the filtering operations of steps 1 and 2, concatenating the corresponding base layers and detail layers obtained by decomposition as the input of the test network, and the obtained output is the final fused image.
The present invention is also characterized in that,
the filtering formula in step 1 is as follows:
wherein:
in the formula (1) I q In order to input an image, the image is,for the filtered image, q is I q S is a set of q pixels, p is a pixel in the q domain,is a part of the input image block,is thatThe peripheral image blocks are displayed on the display unit,is a spatial filter kernel that is a spatial filter kernel,is the distance filter kernel, both the spatial kernel and the distance kernel are usually represented in gaussian fashion;
in the formula (3) I d0 For detail layers obtained by bilateral filtering, I b0 Is the base layer obtained.
In step 2, the gradient values of all pixels in the image are solved by the following formulas:
G_X = G_x * A, G_Y = G_y * A (4)
G = √(G_X² + G_Y²) (5)
where A is the base layer image, G_x and G_y are the horizontal and vertical Sobel kernels, * denotes planar convolution, G_X and G_Y are the horizontal and vertical edge-detection results, and G is the gradient value of a pixel. A threshold Gmax is then defined: if the gradient value of a pixel is larger than the threshold, the pixel is set to white, otherwise it is set to black, thus obtaining the gradient map I_G.
The base layer I_b0 obtained in step 1 is gradient-filtered to obtain the gradient map I_G; the gradient map is then subtracted from the original base layer I_b0 to obtain the new base layer I_b, and the gradient map I_G is added to the original detail layer I_d0 to obtain the new detail layer I_d.
In step 3, the generator network consists of a two-stream network and a convolutional neural network connected behind it. The upper and lower branches of the two-stream network have the same structure, each being a six-layer convolutional neural network. The first four layers have the same structure, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last two layers have the same structure, each consisting of a 5×5 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU. The network following the two-stream network consists of a 1×1 convolutional layer and an activation layer whose activation function is tanh, and the output of this convolutional neural network is the final fused image.
The generator loss function L_G in step 3 is:
L_G = λ·L_content + L_Gen (6)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(H·W)) (‖I_f - I_b‖₂² + ξ·‖∇I_f - ∇I_d‖₂²) (7)
where H and W are the height and width of the image input to the generator, ‖·‖₂ denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 - D_V(G(I_b, I_d)))] + E[log(1 - D_I(G(I_b, I_d)))] (8)
where G(I_b, I_d) denotes the fused image generated by the generator, D_V(G(I_b, I_d)) denotes the output of the visible-light discriminator with the fused image as input, and D_I(G(I_b, I_d)) denotes the output of the infrared discriminator with the fused image as input.
While computing the generator loss function L_G, SGD (stochastic gradient descent) is used to update the network parameters for optimization, yielding the generator network parameters.
The two discriminator networks D_I and D_V in step 3 have the same structure: a five-layer convolutional neural network whose first four layers each consist of a 3×3 convolutional layer, a batch normalization layer and an activation layer with Leaky ReLU activation; the last layer is a fully connected layer that outputs the classification result of the input, thereby predicting whether the input is a fused image or a source image;
the discriminator loss function input for the infrared image and the fused image is:
wherein D is I (I I ) Representing the discriminators discrimination value, D, with infrared images as input I (G(I b ,I d ) Represents a discriminator discrimination value having the fused image as an input;
the discriminator loss function with the input of the visible image and the fused image is:
wherein D is V (I V ) Indicating a discriminator discrimination value, D, using a visible light image as input V (G(I b ,I d ) Represents a discriminator discrimination value having the fused image as an input;
setting a threshold value for the output of the discriminator, and continuously updating the network parameters when the output value of the discriminator is larger than the preset threshold value until the output value is smaller than the preset threshold value, wherein the output value passes through the discriminator D in the process I And D V Then, corresponding discriminator loss function is calculatedAndthe optimization method for updating the network parameters is that the SGD finally obtains the network parameters of the discriminator.
The multi-scale generative adversarial fusion network method for infrared and visible light images combines multi-scale decomposition with a generative adversarial network: it not only optimizes the source images but also applies a neural network with good fusion performance to the fusion process. The base layer and detail layer of an image are obtained by an edge-preserving filter and gradient filtering, so that the obtained image components retain the required information to the greatest extent. The two branch networks of the generator in the adversarial network fuse the base layers (structure information) and detail layers (detail information) obtained by multi-scale decomposition respectively, the generated base-layer image and detail-layer image are added to obtain the final fused image, and the two discriminators in the generative adversarial network classify and discriminate the two source images and the fused image. The image obtained by the fusion of the invention retains the target information and texture information of the source images to the greatest extent, improves the quality of the fused image, and provides more convenient conditions for subsequent target detection and recognition.
Drawings
FIG. 1 is the overall flowchart of the multi-scale generative adversarial fusion network method for infrared and visible light images of the present invention;
FIG. 2 shows the base layer and detail layer obtained after bilateral filtering of a source image in the present invention;
FIG. 3 shows the new base layer and detail layer obtained after gradient filtering of the bilaterally filtered base layer in the present invention;
FIG. 4 is the network structure diagram of the generator in the generative adversarial network of the present invention;
FIG. 5 is the network structure diagram of the discriminator in the generative adversarial network of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a multi-scale generative adversarial fusion network method for infrared and visible light images. The source images are decomposed by an edge-preserving filter and gradient filtering to obtain the base layer and the detail layer of each image; the base layer and the detail layer are input into the generator network of a generative adversarial network for fusion; the fused image and the two source images are then input into the discriminators for discrimination so as to optimize the network parameters, and the final fused image is obtained, achieving image fusion. The overall network structure of the algorithm is shown in FIG. 1. The fusion process of infrared and visible light images with a generative adversarial network based on multi-scale decomposition is mainly divided into the following three stages:
1) Multi-scale decomposition of the source images
The multi-scale decomposition of a source image is divided into three steps: first, the image is input into an edge-preserving filter (bilateral filter) to obtain a base layer and a detail layer, as shown in FIG. 2; then the base layer is gradient-filtered to obtain a gradient map and a new base layer; finally, the gradient map and the detail layer are added to form a new detail layer, so that a new base layer and a new detail layer are obtained, as shown in FIG. 3. The principles of bilateral filtering and gradient filtering are as follows:
Bilateral filtering is an edge-preserving filter that can preserve edges while reducing noise and smoothing. Like other filters, the bilateral filter uses a weighted average: the intensity of a pixel is represented by a weighted average of the intensity values of the surrounding pixels, with the weights based on a Gaussian distribution. Most importantly, the weights of the bilateral filter consider not only the Euclidean distance between pixels (as in ordinary Gaussian low-pass filtering, which only considers the influence of position on the central pixel) but also the radiometric difference within the pixel's range domain (such as the similarity between a pixel and the central pixel of the convolution kernel, color intensity, depth distance, etc.); both weights are taken into account when computing the central pixel. The filter formula is given in formulas (1) to (3) of step 1 below.
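As an illustration only (not part of the patent text), a minimal sketch of this edge-preserving decomposition using OpenCV's bilateral filter might look as follows; the kernel diameter and the two Gaussian sigmas are assumed values, not specified by the invention:

```python
import cv2
import numpy as np

def bilateral_decompose(image, d=9, sigma_color=75, sigma_space=75):
    """Split a grayscale image into a base layer (edge-preserving smoothed)
    and a detail layer (residual), as in step 1 of the method."""
    image = image.astype(np.float32)
    # Base layer I_b0: bilateral filtering keeps edges while smoothing flat regions.
    base = cv2.bilateralFilter(image, d, sigma_color, sigma_space)
    # Detail layer I_d0: everything the bilateral filter removed.
    detail = image - base
    return base, detail

# Example usage with an infrared/visible pair (file names are placeholders).
ir = cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE)
vis = cv2.imread("vis.png", cv2.IMREAD_GRAYSCALE)
ir_base, ir_detail = bilateral_decompose(ir)
vis_base, vis_detail = bilateral_decompose(vis)
```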
In step 2, the base layer obtained in step 1 is input into a gradient filter to obtain a gradient map and a new base layer, and the gradient map is added to the detail layer obtained in step 1 to obtain a new detail layer. The principle of gradient filtering is as follows:
The gradient is simply the derivative. There are three common filters: Sobel, Scharr and Laplacian. Sobel and Scharr compute the first derivative (Scharr is an optimization of Sobel), while Laplacian computes the second derivative. The Sobel filter is adopted here so that high frequencies pass and low frequencies are blocked, making the edges more prominent and thus enhancing the image. The specific principle is as follows:
The Sobel operator is a discrete difference operator used to compute an approximation of the gradient of the image brightness function. Applying this operator at any point of the image produces the corresponding gradient vector or its normal vector.
The operator contains two 3×3 matrices, one horizontal and one vertical, which are convolved with the image in the plane to obtain the horizontal and vertical brightness difference approximations. If A denotes the original image, G_x and G_y the horizontal and vertical Sobel kernels, and G_X and G_Y the resulting horizontal and vertical edge-detection images, the formula is as follows:
G_X = G_x * A and G_Y = G_y * A (4)
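For illustration only, a small sketch of this gradient filtering step, under the assumption that the Sobel magnitude is thresholded into a binary gradient map (the threshold value 150 follows the embodiment described below):

```python
import cv2
import numpy as np

def gradient_decompose(base, detail, g_max=150.0):
    """Refine the base/detail layers with a thresholded Sobel gradient map,
    as in step 2 of the method."""
    # Horizontal and vertical Sobel responses: G_X = G_x * A, G_Y = G_y * A.
    gx = cv2.Sobel(base, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(base, cv2.CV_32F, 0, 1, ksize=3)
    grad = np.sqrt(gx ** 2 + gy ** 2)               # gradient magnitude per pixel
    grad_map = np.where(grad > g_max, 255.0, 0.0)   # white above threshold, black otherwise
    new_base = base - grad_map                      # I_b  = I_b0 - I_G
    new_detail = detail + grad_map                  # I_d  = I_d0 + I_G
    return grad_map, new_base, new_detail

# Using ir_base / ir_detail from the previous sketch.
grad_map, ir_base_new, ir_detail_new = gradient_decompose(ir_base, ir_detail)
```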
2) Acquisition of the generative adversarial network parameters
Acquiring the generator network parameters: the base layers and detail layers obtained in stage 1) are concatenated pairwise along the image channel dimension and used as the input of the generator. The network structure of the generator is shown in FIG. 4; it consists of a two-stream network and a convolution block. The upper and lower branches of the two-stream network are identical, each consisting of a six-layer convolutional neural network. The first four layers have the same structure: a 3×3 convolutional layer, a batch normalization layer and an activation layer (the activation function is Leaky ReLU); the last two layers have the same structure, consisting of a 5×5 convolutional layer, a batch normalization layer and an activation layer (the activation function is Leaky ReLU). The network following the two-stream network consists of a 1×1 convolutional layer and an activation layer (the activation function is tanh); the fusion results of the two branches (the fused base-layer image and the fused detail-layer image) are concatenated and this final layer outputs the final fused image. After passing through the generator G, the generator loss function L_G is computed, and the network parameters are updated with SGD (stochastic gradient descent) for optimization, yielding the generator network parameters.
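Purely as an illustrative sketch (layer widths are assumed, since the patent does not state channel counts), the generator described above could be written in PyTorch roughly as follows:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, k):
    # Convolution + batch normalization + Leaky ReLU, as used in the generator branches.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Branch(nn.Module):
    """One stream of the two-stream generator: four 3x3 blocks then two 5x5 blocks."""
    def __init__(self, in_ch=2):
        super().__init__()
        chans = [in_ch, 32, 32, 32, 32, 16, 8]  # assumed channel widths
        layers = [conv_block(chans[i], chans[i + 1], 3) for i in range(4)]
        layers += [conv_block(chans[i], chans[i + 1], 5) for i in range(4, 6)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Two-stream network (base-layer branch + detail-layer branch) followed by
    a 1x1 convolution with tanh that outputs the fused image."""
    def __init__(self):
        super().__init__()
        self.base_branch = Branch(in_ch=2)    # concatenated IR/VIS base layers
        self.detail_branch = Branch(in_ch=2)  # concatenated IR/VIS detail layers
        self.head = nn.Sequential(nn.Conv2d(16, 1, 1), nn.Tanh())

    def forward(self, base_pair, detail_pair):
        fb = self.base_branch(base_pair)
        fd = self.detail_branch(detail_pair)
        return self.head(torch.cat([fb, fd], dim=1))
```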
Acquiring the discriminator network parameters: since the source images come in pairs, two discriminators are used, one to obtain the probability P_I that the fused image is an infrared image and the other to obtain the probability P_V that the fused image is a visible light image. The two discriminator networks have the same structure, as shown in FIG. 5: a five-layer convolutional neural network whose first four layers have the same structure, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer (the activation function is Leaky ReLU); the last layer is a fully connected layer whose output is the classification result of the input. A source image and the fused image are input to obtain two probabilities, and the network parameters are updated continuously while the probability is larger than a preset threshold, until it is smaller than the preset threshold. In this process, after passing through the discriminators D_I and D_V, the corresponding discriminator loss functions L_{D_I} and L_{D_V} are computed; the optimization method for updating the network parameters is SGD (stochastic gradient descent), finally yielding the discriminator network parameters.
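Again as an assumed sketch (stride, channel widths and input size are not specified in the patent and are chosen here for illustration), one of the two identical discriminators could look like this:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Five-layer discriminator: four 3x3 conv + BN + Leaky ReLU blocks,
    then a fully connected layer producing the classification output."""
    def __init__(self, in_ch=1, image_size=128):
        super().__init__()
        chans = [in_ch, 16, 32, 64, 128]  # assumed channel widths
        blocks = []
        for i in range(4):
            blocks += [
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.BatchNorm2d(chans[i + 1]),
                nn.LeakyReLU(0.2, inplace=True),
            ]
        self.features = nn.Sequential(*blocks)
        feat_size = image_size // 16      # four stride-2 layers halve the size each time
        self.classifier = nn.Linear(128 * feat_size * feat_size, 1)

    def forward(self, x):
        f = self.features(x)
        return torch.sigmoid(self.classifier(f.flatten(1)))

# One discriminator for infrared images, one for visible light images.
d_ir, d_vis = Discriminator(), Discriminator()
```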
The loss functions comprise the generator loss function L_G and the loss functions L_{D_I} and L_{D_V} of the two discriminators, designed as follows:
The purpose of the generator loss function is to preserve more source image information; it consists of two parts, the content loss and the adversarial loss:
L_G = λ·L_content + L_Gen (5)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(H·W)) (‖I_f - I_b‖₂² + ξ·‖∇I_f - ∇I_d‖₂²) (6)
where H and W are the height and width of the image input to the generator, ‖·‖₂ denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 - D_V(G(I_b, I_d)))] + E[log(1 - D_I(G(I_b, I_d)))] (7)
where G(I_b, I_d) denotes the fused image generated by the generator, D_V(G(I_b, I_d)) denotes the output of the visible-light discriminator with the fused image as input, and D_I(G(I_b, I_d)) denotes the output of the infrared discriminator with the fused image as input.
The two discriminators are used to effectively reduce the information loss of the fusion result; their role is to force the generator to preserve more source image information. They are defined as follows:
L_{D_I} = E[-log D_I(I_I)] + E[-log(1 - D_I(G(I_b, I_d)))] (8)
L_{D_V} = E[-log D_V(I_V)] + E[-log(1 - D_V(G(I_b, I_d)))] (9)
where D_I(I_I) denotes the discriminator output with the infrared image as input, D_V(I_V) denotes the discriminator output with the visible light image as input, and D_I(G(I_b, I_d)) and D_V(G(I_b, I_d)) denote the discriminator outputs with the fused image as input.
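To make the training objective concrete, here is a hedged PyTorch sketch of these losses; the exact discriminator loss form, the gradient operator, and the constants λ and ξ are assumptions where the patent describes them only qualitatively:

```python
import torch
import torch.nn.functional as F

def image_gradient(x):
    # Simple finite-difference gradient operator (one possible choice for the ∇ operator).
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return F.pad(dx, (0, 1, 0, 0)), F.pad(dy, (0, 0, 0, 1))

def generator_loss(fused, base, detail, d_ir_fake, d_vis_fake, lam=100.0, xi=5.0):
    # lam and xi are illustrative constants, not values taken from the patent.
    h, w = fused.shape[-2:]
    gfx, gfy = image_gradient(fused)
    gdx, gdy = image_gradient(detail)
    # Content loss: fused image close to the base layer, its gradients close to the detail layer.
    l_content = ((fused - base).pow(2).sum() +
                 xi * ((gfx - gdx).pow(2).sum() + (gfy - gdy).pow(2).sum())) / (h * w)
    # Adversarial loss: push both discriminators toward classifying the fused image as real.
    l_gen = torch.log(1 - d_vis_fake + 1e-8).mean() + torch.log(1 - d_ir_fake + 1e-8).mean()
    return lam * l_content + l_gen

def discriminator_loss(d_real, d_fake):
    # Standard binary cross-entropy form, one call per discriminator.
    return -(torch.log(d_real + 1e-8).mean() + torch.log(1 - d_fake + 1e-8).mean())
```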
3) Fusion test network
The generator network parameters obtained in stage 2) are loaded into the generator network; the test image pair is decomposed at multiple scales, and the corresponding base layers and detail layers obtained by decomposition are concatenated and input into the generator network; the output of the generator is the final fused image.
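A minimal sketch of this test stage, assuming the decomposition helpers and Generator class sketched above and an assumed checkpoint file name:

```python
import torch

def fuse_pair(ir, vis, generator, device="cpu"):
    """Decompose a test IR/VIS pair, concatenate the layers and run the trained generator."""
    ir_b, ir_d = bilateral_decompose(ir)
    vis_b, vis_d = bilateral_decompose(vis)
    _, ir_b, ir_d = gradient_decompose(ir_b, ir_d)
    _, vis_b, vis_d = gradient_decompose(vis_b, vis_d)

    # Assumed normalization into tensors of shape (1, 1, H, W).
    to_tensor = lambda a: torch.from_numpy(a).float().unsqueeze(0).unsqueeze(0) / 255.0
    base_pair = torch.cat([to_tensor(ir_b), to_tensor(vis_b)], dim=1).to(device)
    detail_pair = torch.cat([to_tensor(ir_d), to_tensor(vis_d)], dim=1).to(device)

    generator.eval()
    with torch.no_grad():
        fused = generator(base_pair, detail_pair)
    return fused.squeeze().cpu().numpy()

generator = Generator()
generator.load_state_dict(torch.load("generator_final.pth", map_location="cpu"))
fused = fuse_pair(ir, vis, generator)
```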
The multi-scale generative adversarial fusion network method for infrared and visible light images of the invention, whose flowchart is shown in FIG. 1, is implemented according to the following steps:
With reference to FIG. 2, in step 1, several infrared and visible light image pairs are selected from a standard training set and input into an edge-preserving filter (bilateral filter) to obtain a base layer and a detail layer.
the filtering formula in step 1 is as follows:
wherein:
in the formula (1) I q In order to input an image, the image is,for the filtered image, q is I q S is a set of q pixels, p is a pixel in the q domain,is a part of the input image block,is thatThe peripheral image blocks are displayed on the display unit,is a spatial filter kernel that is a spatial filter kernel,is the distance filter kernel, both the spatial kernel and the distance kernel are usually represented in gaussian fashion;
in the formula (3) I d0 For detail layers obtained by bilateral filtering, I b0 Is the base layer obtained.
With reference to fig. 3, step 2, inputting the base layer obtained in step 1 to a gradient filter to obtain a gradient map and a new base layer, and adding the gradient map and the detail layer in step 1 to obtain a new detail layer;
In step 2, the gradient filtering principle is as follows:
Gradient filtering is simply taking the derivative of the image. It includes three different filters: Sobel, Scharr and Laplacian. Sobel and Scharr are first-order derivatives (Scharr is an optimization of Sobel), and Laplacian is the second derivative. The Sobel filter is adopted here, the purpose being to let high-frequency information pass and block low-frequency information, so that the edges become more prominent and the image is enhanced. The specific principle is as follows:
The Sobel operator is a discrete difference operator used to compute an approximation of the gradient of the image brightness function. Applying this operator at any point of the image produces the corresponding gradient vector or its normal vector.
The operator contains two 3×3 matrices, one horizontal and one vertical, which are convolved with the image in the plane to obtain the horizontal and vertical brightness difference approximations. If A denotes the original image, G_x and G_y the horizontal and vertical Sobel kernels, and G_X and G_Y the resulting horizontal and vertical edge-detection images, the formula is as follows:
G_X = G_x * A and G_Y = G_y * A (4)
The gradient values of all pixels in the image are solved by the following formula:
G = √(G_X² + G_Y²) (5)
A threshold Gmax (here set to 150) is then defined: if the gradient value of a pixel is larger than the threshold, the pixel is set to white, otherwise it is set to black, thus obtaining the gradient map I_G.
The base layer I_b0 obtained in step 1 is gradient-filtered to obtain the gradient map I_G; the gradient map is then subtracted from the original base layer I_b0 to obtain the new base layer I_b, and the gradient map I_G is added to the original detail layer I_d0 to obtain the new detail layer I_d.
With reference to fig. 4 and 5, step 3, inputting the base layer and the detail layer obtained in step 2 into a generator network G, obtaining a fused image corresponding to the source image pair after passing through the generator network G, and calculating a generator loss function L G Updating the generator network G parameters to obtain final generator network parameters, inputting the source images and the fusion images into a discriminator network D for classification, and calculating a discriminator loss function L D Updating the network parameters of the discriminator to obtain the final network parameters of the discriminator;
In step 3, the generator network consists of a two-stream network and a convolutional neural network connected behind it. The upper and lower branches of the two-stream network have the same structure, each being a six-layer convolutional neural network. The first four layers have the same structure, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last two layers have the same structure, each consisting of a 5×5 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU. The network following the two-stream network consists of a 1×1 convolutional layer and an activation layer whose activation function is tanh, and the output of this convolutional neural network is the final fused image.
The generator loss function L_G in step 3 is:
L_G = λ·L_content + L_Gen (6)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(H·W)) (‖I_f - I_b‖₂² + ξ·‖∇I_f - ∇I_d‖₂²) (7)
where H and W are the height and width of the image input to the generator, ‖·‖₂ denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 - D_V(G(I_b, I_d)))] + E[log(1 - D_I(G(I_b, I_d)))] (8)
where G(I_b, I_d) denotes the fused image generated by the generator, D_V(G(I_b, I_d)) denotes the output of the visible-light discriminator with the fused image as input, and D_I(G(I_b, I_d)) denotes the output of the infrared discriminator with the fused image as input.
While computing the generator loss function L_G, SGD (stochastic gradient descent) is used to update the network parameters for optimization, yielding the generator network parameters.
The two discriminator networks D_I and D_V in step 3 have the same structure: the first four layers each consist of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last layer is a fully connected layer whose output is the classification result of the input, thereby predicting whether the input is a fused image or a source image (the source image has two possibilities, an infrared image or a visible light image, hence the two discriminators: D_I handles infrared images and D_V handles visible light images);
The discriminator loss function with the infrared image and the fused image as inputs is:
L_{D_I} = E[-log D_I(I_I)] + E[-log(1 - D_I(G(I_b, I_d)))] (9)
where D_I(I_I) denotes the discriminator output with the infrared image as input and D_I(G(I_b, I_d)) denotes the discriminator output with the fused image as input;
The discriminator loss function with the visible light image and the fused image as inputs is:
L_{D_V} = E[-log D_V(I_V)] + E[-log(1 - D_V(G(I_b, I_d)))] (10)
where D_V(I_V) denotes the discriminator output with the visible light image as input and D_V(G(I_b, I_d)) denotes the discriminator output with the fused image as input;
A threshold is set for the discriminator output, and the network parameters are updated continuously while the discriminator output value is larger than the preset threshold, until it is smaller than the preset threshold. In this process, after passing through the discriminators D_I and D_V, the corresponding discriminator loss functions L_{D_I} and L_{D_V} are computed; the optimization method for updating the network parameters is SGD (stochastic gradient descent), finally yielding the discriminator network parameters.
In step 4, training of the network starts, and whether the iteration is finished is judged, i.e. whether the current iteration number reaches the set iteration number; the network parameters obtained when the iteration number reaches the set number are taken as the final network parameters and saved.
In step 5, the generator network parameters obtained in step 4 are loaded into the generator network of the test network, the test infrared and visible light source images are decomposed at multiple scales (the filtering operations of steps 1 and 2), the corresponding base layers and detail layers obtained by decomposition are concatenated and used as the input of the test network, and the obtained output is the final fused image.
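For orientation only, a compact training-loop sketch under the assumptions made in the earlier code sketches (alternating discriminator and generator SGD updates; the batch composition, learning rate, iteration count and the choice of which base/detail channel enters the content loss are illustrative, not taken from the patent):

```python
import torch

def train(generator, d_ir, d_vis, loader, iterations=10000, lr=1e-3, device="cpu"):
    """Alternately update the two discriminators and the generator with SGD,
    stopping when the set iteration number is reached (step 4)."""
    g_opt = torch.optim.SGD(generator.parameters(), lr=lr)
    d_opt = torch.optim.SGD(list(d_ir.parameters()) + list(d_vis.parameters()), lr=lr)
    step = 0
    while step < iterations:
        for ir, vis, base_pair, detail_pair in loader:  # assumed loader output format
            ir, vis = ir.to(device), vis.to(device)
            base_pair, detail_pair = base_pair.to(device), detail_pair.to(device)

            # Discriminator update: source images are real, the fused image is fake.
            fused = generator(base_pair, detail_pair).detach()
            d_loss = (discriminator_loss(d_ir(ir), d_ir(fused)) +
                      discriminator_loss(d_vis(vis), d_vis(fused)))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()

            # Generator update: content loss plus adversarial loss against both discriminators.
            fused = generator(base_pair, detail_pair)
            g_loss = generator_loss(fused, base_pair[:, :1], detail_pair[:, :1],
                                    d_ir(fused), d_vis(fused))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()

            step += 1
            if step >= iterations:
                break
    torch.save(generator.state_dict(), "generator_final.pth")
```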
Claims (6)
1. A multi-scale generative adversarial fusion network method for infrared and visible light images, characterized by comprising the following steps:
step 1, selecting several infrared and visible light image pairs from a standard training set and inputting the image pairs into an edge-preserving filter to obtain a base layer and a detail layer;
step 2, inputting the base layer obtained in step 1 into a gradient filter to obtain a gradient map and a new base layer, and adding the gradient map to the detail layer obtained in step 1 to obtain a new detail layer;
step 3, inputting the base layer and the detail layer obtained in step 2 into a generator network G to obtain the fused image corresponding to the source image pair, computing the generator loss function L_G and updating the parameters of the generator network G to obtain the final generator network parameters; inputting the source images and the fused image into a discriminator network D for classification, computing the discriminator loss function L_D and updating the discriminator network parameters to obtain the final discriminator network parameters;
step 4, training the network and judging whether the iteration is finished, i.e. whether the current iteration number reaches the set iteration number; the network parameters obtained when the iteration number reaches the set number are taken as the final network parameters and saved;
step 5, loading the generator network parameters obtained in step 4 into the generator network of the test network, performing multi-scale decomposition on the test infrared and visible light source images, namely the filtering operations of steps 1 and 2, concatenating the corresponding base layers and detail layers obtained by decomposition as the input of the test network, and the obtained output is the final fused image.
2. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 1, characterized in that the filtering formula in step 1 is as follows:
I_b0(q) = (1/W_q) Σ_{p∈S} f(‖p - q‖) g(|I_p - I_q|) I_p (1)
W_q = Σ_{p∈S} f(‖p - q‖) g(|I_p - I_q|) (2)
I_d0 = I - I_b0 (3)
wherein in formula (1), I is the input image, I_b0 is the filtered image, q is a pixel of I, S is the set of pixels in the neighborhood of q, p is a pixel in that neighborhood, I_q and I_p are the values of the input image at pixel q and at the surrounding pixels p, f(·) is the spatial filter kernel and g(·) is the range (distance) filter kernel, and both the spatial kernel and the range kernel are usually Gaussian; in formula (3), I_d0 is the detail layer obtained by bilateral filtering and I_b0 is the base layer obtained.
3. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 2, characterized in that, in step 2,
the gradient values of all pixels in the image are solved by the following formulas:
G_X = G_x * A, G_Y = G_y * A (4)
G = √(G_X² + G_Y²) (5)
where A is the base layer image, G_x and G_y are the horizontal and vertical Sobel kernels, * denotes planar convolution, G_X and G_Y are the horizontal and vertical edge-detection results, and G is the gradient value of a pixel; a threshold Gmax is then defined, and if the gradient value of a pixel is larger than the threshold the pixel is set to white, otherwise it is set to black, thus obtaining the gradient map I_G;
the base layer I_b0 obtained in step 1 is gradient-filtered to obtain the gradient map I_G; the gradient map is then subtracted from the original base layer I_b0 to obtain the new base layer I_b, and the gradient map I_G is added to the original detail layer I_d0 to obtain the new detail layer I_d.
4. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 3, characterized in that the generator network in step 3 consists of a two-stream network and a convolutional neural network connected behind it; the upper and lower branches of the two-stream network have the same structure, each being a six-layer convolutional neural network; the first four layers have the same structure, each consisting of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last two layers have the same structure, each consisting of a 5×5 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the network following the two-stream network consists of a 1×1 convolutional layer and an activation layer whose activation function is tanh, and the output of this convolutional neural network is the final fused image.
5. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 4, characterized in that the generator loss function L_G in step 3 is:
L_G = λ·L_content + L_Gen (6)
where L_content is the content loss obtained by comparing the generator input and output, L_Gen is the adversarial loss between the generator and the discriminators, and λ is a constant;
L_content = (1/(H·W)) (‖I_f - I_b‖₂² + ξ·‖∇I_f - ∇I_d‖₂²) (7)
where H and W are the height and width of the image input to the generator, ‖·‖₂ denotes the two-norm, I_f is the generator output, i.e. the fused image, I_b is the base layer input to the generator, I_d is the detail layer input to the generator, ∇ is the gradient operator, and ξ is a constant;
L_Gen = E[log(1 - D_V(G(I_b, I_d)))] + E[log(1 - D_I(G(I_b, I_d)))] (8)
where G(I_b, I_d) denotes the fused image generated by the generator, D_V(G(I_b, I_d)) denotes the output of the visible-light discriminator with the fused image as input, and D_I(G(I_b, I_d)) denotes the output of the infrared discriminator with the fused image as input;
while computing the generator loss function L_G, SGD is used to update the network parameters for optimization, yielding the generator network parameters.
6. The multi-scale generative adversarial fusion network method for infrared and visible light images according to claim 5, characterized in that in step 3 there are two discriminator networks D_I and D_V; the first four layers of each network consist of a 3×3 convolutional layer, a batch normalization layer and an activation layer whose activation function is Leaky ReLU; the last layer is a fully connected layer whose output is the classification result of the input, thereby predicting whether the input is a fused image or a source image;
the discriminator loss function with the infrared image and the fused image as inputs is:
L_{D_I} = E[-log D_I(I_I)] + E[-log(1 - D_I(G(I_b, I_d)))] (9)
where D_I(I_I) denotes the discriminator output with the infrared image as input and D_I(G(I_b, I_d)) denotes the discriminator output with the fused image as input;
the discriminator loss function with the visible light image and the fused image as inputs is:
L_{D_V} = E[-log D_V(I_V)] + E[-log(1 - D_V(G(I_b, I_d)))] (10)
where D_V(I_V) denotes the discriminator output with the visible light image as input and D_V(G(I_b, I_d)) denotes the discriminator output with the fused image as input;
a threshold is set for the discriminator output, and the network parameters are updated continuously while the discriminator output value is larger than the preset threshold, until it is smaller than the preset threshold; in this process, after passing through the discriminators D_I and D_V, the corresponding discriminator loss functions L_{D_I} and L_{D_V} are computed, and the optimization method for updating the network parameters is SGD, finally yielding the discriminator network parameters.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210599873.3A (CN114841907A) | 2022-05-27 | 2022-05-27 | Multi-scale generative adversarial fusion network method for infrared and visible light images

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210599873.3A (CN114841907A) | 2022-05-27 | 2022-05-27 | Multi-scale generative adversarial fusion network method for infrared and visible light images
Publications (1)

Publication Number | Publication Date
---|---
CN114841907A | 2022-08-02

Family
ID=82572920

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210599873.3A (CN114841907A, pending) | Multi-scale generative adversarial fusion network method for infrared and visible light images | 2022-05-27 | 2022-05-27

Country Status (1)

Country | Link
---|---
CN | CN114841907A (en)

2022
- 2022-05-27: CN application CN202210599873.3A filed; patent CN114841907A (en), status: active, Pending

Cited By (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117934869A | 2024-03-22 | 2024-04-26 | 中铁大桥局集团有限公司 | Target detection method, system, computing device and medium
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination