CN117058507B - Fourier convolution-based visible light and infrared image multi-scale feature fusion method
- Publication number: CN117058507B (application CN202311037544.0A)
- Authority: CN (China)
- Prior art keywords: convolution, fusion, feature map, feature, characteristic
- Prior art date: 2023-08-17
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06V10/806 — Fusion of extracted features (combining data from various sources at the sensor, preprocessing, feature extraction or classification level)
- G06V10/40 — Extraction of image or video features
- G06V10/7715 — Feature extraction, e.g. by transforming the feature space
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/084 — Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a Fourier convolution-based visible light and infrared image multi-scale feature fusion method comprising the following steps: A. acquiring an RGB image and an infrared image to be fused; B. extracting deep semantic information from the RGB image and the infrared image through a multi-scale feature extractor to obtain an RGB image and an infrared image carrying deep semantic information; C. performing multi-source information fusion on the RGB image and the infrared image carrying deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map; D. fusing features of different layers in the multi-source information fusion feature map with a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map; E. processing the multi-scale feature fusion feature map with a covariance pooling module using global covariance pooling to obtain a comprehensive fusion feature map. The invention fuses the infrared image and the RGB image together effectively, yielding more comprehensive and more accurate image data.
Description
Technical Field
The invention relates to an image feature extraction and processing method, and in particular to a Fourier convolution-based multi-scale feature fusion method for visible light and infrared images.
Background
In the military and security fields, infrared and visible light detection images are widely used for target detection and identification. An infrared detection image captures infrared radiation outside the visible spectrum, radiation that the human eye cannot perceive. Infrared radiation penetrates some substances and environments strongly and can pass through smoke, haze, cloud layers and other visual barriers, so an infrared image can still provide effective image information under severe weather conditions, which benefits observation and monitoring in complex environments. However, infrared images have limitations, such as relatively low resolution and imaging quality that is affected by environmental factors. A visible light detection image has weaker penetrability but higher resolution and good imaging quality. Combining the infrared image with the RGB image can overcome their respective limitations and improve the comprehensiveness and usability of the imagery. There is therefore a need to fuse infrared images and RGB images effectively to obtain more comprehensive and more accurate image data.
Disclosure of Invention
The invention aims to provide a Fourier convolution-based visible light and infrared image multi-scale feature fusion method that fuses the infrared image and the RGB image together effectively, yielding more comprehensive and more accurate image data.
The technical scheme of the invention is as follows. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method comprises the following steps:
A. acquiring an RGB image and an infrared image to be fused;
B. extracting deep semantic information from the RGB image and the infrared image through a multi-scale feature extractor to obtain an RGB image and an infrared image carrying deep semantic information;
C. carrying out multi-source information fusion processing on the RGB image and the infrared image with deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map;
D. fusing features of different layers in the multi-source information fusion feature map with a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map;
E. processing the multi-scale feature fusion feature map with a covariance pooling module using global covariance pooling to obtain a comprehensive fusion feature map.
In the Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific process by which the fast Fourier convolution module performs multi-source information fusion is as follows:
C1, representing the RGB image with deep semantic information as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the number of bands and $r \times c$ the pixel height and width; and representing the infrared image with deep semantic information as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the number of bands and $r \times c$ the pixel height and width;
C2, explicitly decomposing $X_r$ and $X_i$ along the channel dimension with the fast Fourier convolution module, producing a feature map $Y_l^{H\to H}$ for the mapping of high-frequency branch $H$ to high-frequency branch $H$, a feature map $Y_l^{H\to L}$ for the mapping of high-frequency branch $H$ to low-frequency branch $L$, a feature map $Y_h^{L\to H}$ for the mapping of low-frequency branch $L$ to high-frequency branch $H$, and a feature map $Y_h^{L\to L}$ for the mapping of low-frequency branch $L$ to low-frequency branch $L$;
C3, connecting $Y_l^{H\to H}$ with $Y_h^{L\to H}$ in series, and $Y_l^{H\to L}$ with $Y_h^{L\to L}$ in series, to obtain two series feature maps $X \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively;
C4, splitting each series feature map $X$ along the feature-channel dimension with the fast Fourier convolution module, i.e. $X = \{X_l, X_g\}$, where the local part $X_l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ learns from local neighborhoods, the global part $X_g \in \mathbb{R}^{H \times W \times \alpha_{in}C}$ captures long-range context, and $\alpha_{in} \in [0,1]$ denotes the percentage of feature channels assigned to the global part;
C5, taking $Y \in \mathbb{R}^{H \times W \times C}$ as the output tensor, letting $Y = \{Y_l, Y_g\}$, and updating it with equation (1):

$$Y_l = Y^{l\to l} + Y^{g\to l} = f_l(X_l) + f_{g\to l}(X_g)$$
$$Y_g = Y^{g\to g} + Y^{l\to g} = f_g(X_g) + f_{l\to g}(X_l) \qquad (1)$$
C6, applying a 3×3 convolution to $Y_l$ and $Y_g$ and then fusing the two to obtain the output tensor $Y$, i.e. the multi-source information fusion feature map.
in the method for fusing the multi-scale features of the visible light and the infrared image based on Fourier convolution, the specific processing procedure of the multi-scale feature fusion module is as follows: sequentially carrying out bottleneck processing on the multisource information fusion feature map through a plurality of bottleneck blocks which are connected in series to obtain a multiscale feature fusion feature map;
The convolution-window stride of a bottleneck block has two modes, 1 and 2. When the stride is 1, the bottleneck block first applies a 1×1 convolution, then extracts features through a depthwise convolution, and finally applies a point convolution; when the stride is 2, the bottleneck block first applies a 1×1 convolution, then extracts features through a multi-scale convolution, and finally applies a point convolution.
In the Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific processing procedure of the multi-scale convolution is as follows: the input feature map is divided equally into s groups along the channel dimension; a 3×3 convolution then extracts features from the first group; the extracted output of the first group is passed to the second group, added to the second group's input, and the sum is fed to the second group's 3×3 convolution; this is repeated until the last group of the feature map has been processed; finally, all extracted feature outputs are concatenated along the channel dimension and a 1×1 point convolution performs information fusion.
In the Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific processing procedure of the covariance pooling module is as follows:

E1, first converting the multi-scale feature fusion feature map of size h×w×d into a feature map of size n×d, where n = h×w, h and w are the height and width of the feature map, and d is the size of its third (channel) dimension;
E2, computing the covariance matrix $\Sigma = X^{\top}\,\bar{I}\,X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, $I$ is the n×n identity matrix, $\mathbf{1}$ is the n×n matrix whose elements are all 1, and $X$ denotes the original input feature map fed to the covariance pooling module;
E3, pre-normalizing the covariance matrix $\Sigma$ by the formula $A = (1/\mathrm{tr}(\Sigma))\,\Sigma$;
E4, iterating with the Newton-Schulz iterative formula;

E5, performing post-compensation and splicing in sequence.
In the Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the Newton-Schulz iterative formula is

$$Y_k = \tfrac{1}{2}\,Y_{k-1}\left(3I - Z_{k-1}Y_{k-1}\right), \qquad Z_k = \tfrac{1}{2}\left(3I - Z_{k-1}Y_{k-1}\right)Z_{k-1}$$

where $I$ denotes the identity matrix; $Y_{k-1}$ is the result obtained after k−1 iterations starting from the matrix $A$ (i.e. $Y_0 = A$), and likewise $Y_k$ is the result after k iterations; $Z_{k-1}$ is the result obtained after k−1 iterations starting from the identity matrix $I$ (i.e. $Z_0 = I$), and likewise $Z_k$ is the result after k iterations.
In the Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the calculation formula of the post-compensation processing is $C = (\mathrm{tr}(\Sigma))^{1/2}\,Y_N$, where $\mathrm{tr}(\Sigma)$ is the trace of the covariance matrix and $Y_N$ is the result obtained after N iterations.
In the Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the specific process of the splicing treatment is to splice the upper triangular part of the symmetric matrix obtained by post-compensation into a d(d−1)/2-dimensional vector, obtaining the comprehensive fusion feature map.
Compared with the prior art, the invention sequentially applies, to the RGB image and the infrared image, feature extraction by the multi-scale feature extractor, multi-source information fusion by the fast Fourier convolution module, fusion of different-layer features by the multi-scale feature fusion module, and global covariance pooling by the covariance pooling module. The RGB image and the infrared image are thereby fused effectively, capturing thermal information and color information at the same time, realizing efficient feature fusion and more comprehensive target analysis and feature extraction, and yielding more comprehensive and more accurate image data that provides strong support for analysis and research in many fields. For example, in the military and security fields, combining infrared images with RGB images enables more accurate target detection and identification and improves night-vision and target-tracking capabilities.
Specifically, the multi-scale feature extractor extracts deep semantic information from the images; the fast Fourier convolution module performs multi-source information fusion while retaining discriminative information; the multi-scale feature fusion module fuses features of different layers in the feature map; and global covariance pooling replaces global average pooling, extracting higher-order information from the RGB and infrared images to obtain richer deep-feature statistics.
In summary, the invention effectively fuses the infrared image and the RGB image together to obtain more comprehensive and more accurate image data.
Extensive experiments on the benchmark dataset show that fusion improves classification accuracy by 2.036% and 1.926% compared with using only infrared images or only RGB images, respectively.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of the overall structure of the present invention;
FIG. 3 is a schematic diagram of a fast Fourier convolution module according to the present invention;
FIG. 4 is a schematic structural diagram of the fast Fourier convolution layer in the fast Fourier convolution module of the present invention, where (a) is the overall diagram of the Fourier convolution module and (b) is the specific structure of the Spectral Transformer branch in (a);
FIG. 5 is a schematic diagram of a bottleneck block according to the present invention;
FIG. 6 is a schematic diagram of a multi-scale convolution in a bottleneck block according to an embodiment of the present invention;
fig. 7 is a schematic flow chart of a covariance pooling module according to an embodiment of the invention.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not intended to be limiting.
Example 1. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method comprises the following steps:
A. acquiring an RGB image and an infrared image to be fused;
B. extracting deep semantic information from the RGB image and the infrared image through a multi-scale feature extractor to obtain an RGB image and an infrared image carrying deep semantic information;
C. carrying out multi-source information fusion processing on the RGB image and the infrared image with deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map;
D. fusing features of different layers in the multi-source information fusion feature map with a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map;
E. processing the multi-scale feature fusion feature map with a covariance pooling module using global covariance pooling to obtain a comprehensive fusion feature map.
The specific process of the fast Fourier convolution module for carrying out multi-source information fusion processing is as follows:
C1, representing the RGB image with deep semantic information as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the number of bands and $r \times c$ the pixel height and width; and representing the infrared image with deep semantic information as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the number of bands and $r \times c$ the pixel height and width;
C2, explicitly decomposing $X_r$ and $X_i$ along the channel dimension with the fast Fourier convolution module, producing a feature map $Y_l^{H\to H}$ for the mapping of high-frequency branch $H$ to high-frequency branch $H$, a feature map $Y_l^{H\to L}$ for the mapping of high-frequency branch $H$ to low-frequency branch $L$, a feature map $Y_h^{L\to H}$ for the mapping of low-frequency branch $L$ to high-frequency branch $H$, and a feature map $Y_h^{L\to L}$ for the mapping of low-frequency branch $L$ to low-frequency branch $L$;
C3, connecting $Y_l^{H\to H}$ with $Y_h^{L\to H}$ in series, and $Y_l^{H\to L}$ with $Y_h^{L\to L}$ in series, to obtain two series feature maps $X \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively;
C4, splitting each series feature map $X$ along the feature-channel dimension with the fast Fourier convolution module, i.e. $X = \{X_l, X_g\}$, where the local part $X_l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ learns from local neighborhoods, the global part $X_g \in \mathbb{R}^{H \times W \times \alpha_{in}C}$ captures long-range context, and $\alpha_{in} \in [0,1]$ denotes the percentage of feature channels assigned to the global part;
C5, taking $Y \in \mathbb{R}^{H \times W \times C}$ as the output tensor, letting $Y = \{Y_l, Y_g\}$, and updating it with equation (1):
$$Y_l = Y^{l\to l} + Y^{g\to l} = f_l(X_l) + f_{g\to l}(X_g)$$
$$Y_g = Y^{g\to g} + Y^{l\to g} = f_g(X_g) + f_{l\to g}(X_l) \qquad (1)$$
C6, applying a 3×3 convolution to $Y_l$ and $Y_g$ and then fusing the two to obtain the output tensor $Y$, i.e. the multi-source information fusion feature map.
the specific processing procedure of the multi-scale feature fusion module is as follows: sequentially carrying out bottleneck processing on the multisource information fusion feature map through a plurality of bottleneck blocks which are connected in series to obtain a multiscale feature fusion feature map;
the convolution window moving stride of the bottleneck block comprises two modes, namely 1 and 2; when the convolution window moving step of the bottleneck block is 1, firstly carrying out feature extraction by using 1X 1 convolution processing on the bottleneck block through depth convolution, and finally carrying out point convolution processing; when the convolution window moving step of the bottleneck block is 2, the bottleneck block firstly uses 1×1 convolution processing, then uses multi-scale convolution to extract features, and finally carries out point convolution processing.
The specific processing procedure of the multi-scale convolution is as follows: the input feature map is divided equally into s groups along the channel dimension; a 3×3 convolution then extracts features from the first group; the extracted output of the first group is passed to the second group, added to the second group's input, and the sum is fed to the second group's 3×3 convolution; this is repeated until the last group of the feature map has been processed; finally, all extracted feature outputs are concatenated along the channel dimension and a 1×1 point convolution performs information fusion.
The specific processing procedure of the covariance pooling module is as follows:
E1, first converting the multi-scale feature fusion feature map of size h×w×d into a feature map of size n×d, where n = h×w, h and w are the height and width of the feature map, and d is the size of its third (channel) dimension;

E2, computing the covariance matrix $\Sigma = X^{\top}\,\bar{I}\,X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, $I$ is the n×n identity matrix, $\mathbf{1}$ is the n×n matrix whose elements are all 1, and $X$ denotes the original input feature map fed to the covariance pooling module;
E3, pre-normalizing the covariance matrix $\Sigma$ by the formula $A = (1/\mathrm{tr}(\Sigma))\,\Sigma$;
E4, iterating with the Newton-Schulz iterative formula;

E5, performing post-compensation and splicing in sequence.
The Newton-Schulz iterative formula is

$$Y_k = \tfrac{1}{2}\,Y_{k-1}\left(3I - Z_{k-1}Y_{k-1}\right), \qquad Z_k = \tfrac{1}{2}\left(3I - Z_{k-1}Y_{k-1}\right)Z_{k-1}$$

where $I$ denotes the identity matrix; $Y_{k-1}$ is the result obtained after k−1 iterations starting from the matrix $A$ (i.e. $Y_0 = A$), and likewise $Y_k$ is the result after k iterations; $Z_{k-1}$ is the result obtained after k−1 iterations starting from the identity matrix $I$ (i.e. $Z_0 = I$), and likewise $Z_k$ is the result after k iterations.
The calculation formula of the post-compensation processing is $C = (\mathrm{tr}(\Sigma))^{1/2}\,Y_N$, where $\mathrm{tr}(\Sigma)$ is the trace of the covariance matrix and $Y_N$ is the result obtained after N iterations.
The specific process of the splicing treatment is to splice the upper triangular part of the symmetric matrix obtained by post-compensation into a d(d−1)/2-dimensional vector, obtaining the comprehensive fusion feature map.
Example 2. For this Fourier convolution-based visible light and infrared image multi-scale feature fusion method, the framework of the invention is designed for pixel-level classification by fusing multi-source remote sensing images, as shown in FIG. 2. It consists mainly of two parts: 1) multi-source frequency decomposition and fusion based on the fast Fourier convolution module (FFCN) (first part); 2) feature extraction by the multi-scale layer feature fusion module and the covariance pooling module (GCP module) (second part).
A multi-scale feature fusion covariance network with a fast Fourier convolution module (F²MCN) is constructed, focusing on efficient feature fusion and comprehensive feature extraction. First, the FFCN adopts fast Fourier convolution layers to fuse multi-source information while retaining discriminative information. Then, a multi-scale feature fusion (MF²) module fuses the features of different layers in the F²MCN. Finally, global covariance pooling (GCP) replaces global average pooling (GAP), extracting higher-order information from the RGB and infrared images to obtain richer deep-feature statistics.
FIG. 3 shows the fast Fourier convolution (FFC Conv) layer in the lower half of FIG. 2, a more efficient convolution layer that has previously been used for visible-image classification. Simple feature concatenation or superposition very easily produces redundant, overlapping information. Fusing visible light and infrared image information with classical feature extraction and fusion methods removes part of this redundancy, but redundancy still remains in the low-frequency part. The present application therefore first uses a Fourier convolution layer to decompose the input image into a multi-resolution representation, which makes spatial redundancy easier to reduce.
In this step, the visible light image (RGB image) is represented as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the number of bands and $r \times c$ the pixel height and width. The infrared image is represented as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the number of bands and $r \times c$ the pixel height and width.
The fast Fourier convolution (FFC) explicitly decomposes $X_r$ and $X_i$ along the channel dimension, where $Y_l^{H\to H}$ denotes the feature map for the high-frequency branch (H) to high-frequency branch (H) mapping, $Y_l^{H\to L}$ the feature map for the high-frequency branch (H) to low-frequency branch (L) mapping, $Y_h^{L\to H}$ the feature map for the low-frequency branch (L) to high-frequency branch (H) mapping, and $Y_h^{L\to L}$ the feature map for the low-frequency branch (L) to low-frequency branch (L) mapping.
The FFC architecture is shown in FIG. 4(a), and FIG. 4(b) is a block diagram of the Spectral Transformer. Conceptually, the FFC consists of two interconnected paths: a spatial (or local) path that applies ordinary convolutions to part of the input feature channels, and a spectral (or global) path that operates in the spectral domain. Each path captures complementary information with a different receptive field, and information is exchanged between the two paths internally.
Formally, let $X \in \mathbb{R}^{H \times W \times C}$ be the input feature map of the FFC, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively. At the FFC input, $X$ is first split along the feature-channel dimension, i.e. $X = \{X_l, X_g\}$. The local part $X_l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ learns from local neighborhoods, while the global part $X_g \in \mathbb{R}^{H \times W \times \alpha_{in}C}$ is intended to capture long-range context; $\alpha_{in} \in [0,1]$ denotes the percentage of feature channels assigned to the global part. To simplify the network, the output is assumed to have the same size as the input. Let $Y \in \mathbb{R}^{H \times W \times C}$ be the output tensor and, analogously, $Y = \{Y_l, Y_g\}$ its local-global partition; the global-part proportion of the output tensor is controlled by the hyperparameter $\alpha_{out} \in [0,1]$. The internal update of the FFC is described by the following formula:

$$Y_l = Y^{l\to l} + Y^{g\to l} = f_l(X_l) + f_{g\to l}(X_g)$$
$$Y_g = Y^{g\to g} + Y^{l\to g} = f_g(X_g) + f_{l\to g}(X_l) \qquad (1)$$
The component $Y^{l\to l}$ captures small-scale information using conventional convolution. Likewise, the two cross-path components $Y^{g\to l}$ and $Y^{l\to g}$, obtained by inter-path conversion, are also implemented with conventional convolutions so as to fully exploit the multi-scale receptive field. The main complexity lies in computing $Y^{g\to g}$; for clarity of description, $f_g$ is called the Spectral Transformer, shown in FIG. 4(b).
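To make the two-path structure concrete, a minimal PyTorch sketch of the channel split and the equation (1) update follows. The channel ratio `alpha`, the reduced Spectral Transformer (a single 1×1 convolution on stacked real/imaginary parts), and all layer sizes are illustrative assumptions, not the exact patented architecture.

```python
import torch
import torch.nn as nn

class SpectralTransform(nn.Module):
    """Global path f_g: a pointwise convolution applied in the Fourier domain."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(2 * channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        _, _, h, w = x.shape
        f = torch.fft.rfft2(x, norm="ortho")          # complex, (b, c, h, w//2+1)
        f = torch.cat([f.real, f.imag], dim=1)        # stack real/imag channels
        f = self.conv(f)                              # mix in the spectral domain
        real, imag = f.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

class FFC(nn.Module):
    """Local/global split and the cross-path update of equation (1)."""
    def __init__(self, channels, alpha=0.5):
        super().__init__()
        c_g = int(channels * alpha)                   # global (spectral) channels
        c_l = channels - c_g                          # local (spatial) channels
        self.c_l = c_l
        self.f_l = nn.Conv2d(c_l, c_l, 3, padding=1)      # Y^{l->l}
        self.f_l2g = nn.Conv2d(c_l, c_g, 3, padding=1)    # Y^{l->g}
        self.f_g2l = nn.Conv2d(c_g, c_l, 3, padding=1)    # Y^{g->l}
        self.f_g = SpectralTransform(c_g)                 # Y^{g->g}

    def forward(self, x):
        x_l, x_g = x[:, :self.c_l], x[:, self.c_l:]
        y_l = self.f_l(x_l) + self.f_g2l(x_g)         # equation (1), local output
        y_g = self.f_g(x_g) + self.f_l2g(x_l)         # equation (1), global output
        return torch.cat([y_l, y_g], dim=1)

# y = FFC(64, alpha=0.5)(torch.randn(2, 64, 32, 32))  # -> (2, 64, 32, 32)
```

Because the pointwise convolution in the spectral path acts on Fourier coefficients, every output position in the global branch has a receptive field covering the whole image, which is what lets the FFC capture the long-range context discussed above.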
The bottleneck blocks in FIG. 3 use two stride modes, 1 and 2. When the stride is set to 1, the bottleneck block has the structure shown in the left half of FIG. 5: the dimension is first lifted with a 1×1 convolution, features are extracted with a depthwise convolution (DW Conv), and finally a point convolution is applied. When the stride is set to 2, the bottleneck block has the structure shown in the right half of FIG. 5: the dimension is lifted with a 1×1 convolution, features are then extracted with a multi-scale convolution (MS Conv, whose detailed structure is shown in FIG. 6), and finally a point convolution is applied.
FIG. 6 shows the MS Conv structure with s = 4. The input feature map is divided equally into s groups along the channel dimension. A 3×3 convolution extracts features from the first group. The output of the first group is then sent to the second group and added to the second group's input; the sum is fed to the second group's 3×3 convolution. This process repeats until the last group of the feature map has been processed. Finally, all outputs are concatenated along the channel dimension and a 1×1 point convolution performs information fusion.
Compared with the bottleneck block of ResNet, the first 1×1 convolution of the bottleneck block in the present application lifts the dimension of the input feature map, which provides enough channels for MS Conv to perform multi-scale feature extraction. Because the output of the previous group in MS Conv is added to the input of the current group, the feature-map sizes must match, so MS Conv is used only where the stride is 1.
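A minimal PyTorch sketch of the MS Conv grouping described above follows; the default group count s = 4, the absence of normalization layers, and the class name `MSConv` are simplifying assumptions for illustration.

```python
import torch
import torch.nn as nn

class MSConv(nn.Module):
    """Multi-scale convolution: s channel groups with cascaded 3x3 convolutions."""
    def __init__(self, channels, s=4):
        super().__init__()
        assert channels % s == 0, "channels must split evenly into s groups"
        self.s = s
        width = channels // s
        # One 3x3 convolution per group; group i also receives group i-1's output.
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1) for _ in range(s)
        )
        self.fuse = nn.Conv2d(channels, channels, 1)  # final 1x1 point convolution

    def forward(self, x):
        groups = torch.chunk(x, self.s, dim=1)        # equal split along channels
        outs, prev = [], None
        for g, conv in zip(groups, self.convs):
            inp = g if prev is None else g + prev     # add previous group's output
            prev = conv(inp)                          # 3x3 feature extraction
            outs.append(prev)
        return self.fuse(torch.cat(outs, dim=1))      # concat, then 1x1 fusion

# y = MSConv(64, s=4)(torch.randn(1, 64, 32, 32))     # -> (1, 64, 32, 32)
```

The cascaded additions give later groups an effectively larger receptive field than earlier ones, which is how a single MS Conv layer mixes several scales.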
The architecture of GCP is shown in FIG. 7. The feature map of size h×w×d output by the multi-scale feature fusion is converted into a feature map of size n×d, where n = h×w. First, the covariance matrix is computed as $\Sigma = X^{\top}\,\bar{I}\,X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, $I$ is the n×n identity matrix and $\mathbf{1}$ is the n×n matrix with all elements equal to 1.

Then, in a pre-normalization step, the covariance matrix $\Sigma$ is divided by its trace, $A = (1/\mathrm{tr}(\Sigma))\,\Sigma$, where $\mathrm{tr}(\cdot)$ is the matrix trace. This is done so that the subsequent Newton-Schulz iterations converge; its effect on scale is eliminated later by post-compensation. The iterative formula is

$$Y_k = \tfrac{1}{2}\,Y_{k-1}\left(3I - Z_{k-1}Y_{k-1}\right), \qquad Z_k = \tfrac{1}{2}\left(3I - Z_{k-1}Y_{k-1}\right)Z_{k-1}$$

where $I$ denotes the identity matrix; $Y_{k-1}$ is the result obtained after k−1 iterations starting from the matrix $A$ (i.e. $Y_0 = A$), and likewise $Y_k$ is the result after k iterations; $Z_{k-1}$ is the result obtained after k−1 iterations starting from the identity matrix $I$ (i.e. $Z_0 = I$).

In the post-compensation, the result $Y_N$ obtained after N iterations is multiplied by the square root of the trace of the covariance matrix, $C = (\mathrm{tr}(\Sigma))^{1/2}\,Y_N$, which eliminates the adverse effect of pre-normalization. Finally, the upper triangular part of the symmetric matrix $C$ obtained by post-compensation is spliced into a d(d−1)/2-dimensional vector and passed to the FC layer.
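The forward pass of the whole GCP pipeline (steps E1 through post-compensation and splicing) can be sketched as follows; the iteration count N = 5 and the batch-free tensor shapes are assumptions for illustration, not the patented implementation.

```python
import torch

def gcp(feat, n_iter=5):
    """Global covariance pooling of an (h, w, d) feature map."""
    h, w, d = feat.shape
    n = h * w
    X = feat.reshape(n, d)                            # E1: n x d matrix

    # E2: Sigma = X^T I_bar X, with I_bar = (1/n)(I - (1/n) * ones).
    I_bar = (torch.eye(n) - torch.ones(n, n) / n) / n
    sigma = X.t() @ I_bar @ X                         # d x d covariance matrix

    # E3: pre-normalize by the trace so the Newton-Schulz iteration converges.
    tr = sigma.trace()
    A = sigma / tr

    # E4: coupled Newton-Schulz iteration toward the matrix square root of A.
    I = torch.eye(d)
    Y, Z = A, I
    for _ in range(n_iter):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z                           # Y_k -> A^{1/2}, Z_k -> A^{-1/2}

    # Post-compensation restores the scale removed by pre-normalization.
    C = tr.sqrt() * Y

    # Splicing: upper triangle of the symmetric matrix as a d(d-1)/2-dim vector.
    iu = torch.triu_indices(d, d, offset=1)
    return C[iu[0], iu[1]]

# v = gcp(torch.randn(7, 7, 32))                      # -> 32*31/2 = 496 values
```

The coupled iteration avoids any eigendecomposition, using only matrix multiplications, which is consistent with the GPU-friendliness claimed for the GCP module below.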
The FC layer, also known as the fully connected layer (Fully Connected Layer), is a common layer type in deep learning neural networks. In the FC layer, each neuron is connected to all neurons of the previous layer to form a fully connected structure. Thus, each neuron of the FC layer has a weight connection with all input neurons of the previous layer.
The main function of the fully connected layer is to map the feature representation of the previous layer to the final output space. It can learn complex nonlinear relations among the input features, applying a linear combination through its weight parameters followed by an activation function to produce the output. In deep learning, the fully connected layer is typically used for the final classification or regression task.
The present application can also back-propagate through the final result data (the parameters in the GCP module), which facilitates learning and training of the Fourier convolution-based visible light and infrared image multi-scale feature fusion model. In back propagation, the partial derivatives of the loss function l with respect to the covariance-layer input matrices are obtained from the gradients associated with the network structure in the matrix back-propagation algorithm: a first-order Taylor approximation establishes the chain rule for general matrix functions, from which the corresponding gradients are computed. From the chain rule of matrix back propagation and the Newton-Schulz iteration, the gradients for k = N, …, 2 can be derived through a series of operations, and the gradient for the pre-normalization step follows in turn. The gradient of the loss function l with respect to Σ must then be combined with the gradient back-propagated through the post-compensation layer, from which the gradient of the loss function l with respect to the input matrix X can finally be deduced. The parameters in the GCP module can thus be updated by the back-propagation formulas. GCP retains semantic information better than GAP and, most importantly, the GCP module is well suited to parallel operation on GPUs.
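Since every operation in the GCP sketch above is differentiable matrix arithmetic, an autograd framework can realize the matrix back-propagation described here without hand-derived gradients; the following minimal check (an assumption about tooling, not the patent's closed-form derivation) verifies that gradients reach the input feature map:

```python
import torch

feat = torch.randn(7, 7, 32, requires_grad=True)
v = gcp(feat)                  # gcp() from the sketch above
loss = v.sum()                 # stand-in for the loss function l
loss.backward()                # gradients flow through Newton-Schulz back to X
print(feat.grad.shape)         # torch.Size([7, 7, 32])
```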
Claims (6)
1. A Fourier convolution-based visible light and infrared image multi-scale feature fusion method, characterized by comprising the following steps:
A. acquiring an RGB image and an infrared image to be fused;
B. extracting deep semantic information from the RGB image and the infrared image through a multi-scale feature extractor to obtain an RGB image and an infrared image carrying deep semantic information;
C. carrying out multi-source information fusion processing on the RGB image and the infrared image with deep semantic information through a fast Fourier convolution module to obtain a multi-source information fusion feature map;
D. fusing features of different layers in the multi-source information fusion feature map with a multi-scale feature fusion module to obtain a multi-scale feature fusion feature map;
E. processing the multi-scale feature fusion feature map with a covariance pooling module using global covariance pooling to obtain a comprehensive fusion feature map;
the specific process of the fast Fourier convolution module for carrying out multi-source information fusion processing is as follows:
C1, representing the RGB image with deep semantic information as $X_r \in \mathbb{R}^{b_r \times r \times c}$, where $b_r$ denotes the number of bands and $r \times c$ the pixel height and width; and representing the infrared image with deep semantic information as $X_i \in \mathbb{R}^{b_i \times r \times c}$, where $b_i$ denotes the number of bands and $r \times c$ the pixel height and width;
C2, explicitly decomposing $X_r$ and $X_i$ along the channel dimension with the fast Fourier convolution module, producing a feature map $Y_l^{H\to H}$ for the mapping of high-frequency branch $H$ to high-frequency branch $H$, a feature map $Y_l^{H\to L}$ for the mapping of high-frequency branch $H$ to low-frequency branch $L$, a feature map $Y_h^{L\to H}$ for the mapping of low-frequency branch $L$ to high-frequency branch $H$, and a feature map $Y_h^{L\to L}$ for the mapping of low-frequency branch $L$ to low-frequency branch $L$;
C3, connecting $Y_l^{H\to H}$ with $Y_h^{L\to H}$ in series, and $Y_l^{H\to L}$ with $Y_h^{L\to L}$ in series, to obtain two series feature maps $X \in \mathbb{R}^{H \times W \times C}$, where $H \times W$ and $C$ denote the spatial resolution and the number of channels, respectively;
C4, splitting each series feature map $X$ along the feature-channel dimension with the fast Fourier convolution module, i.e. $X = \{X_l, X_g\}$, where the local part $X_l \in \mathbb{R}^{H \times W \times (1-\alpha_{in})C}$ learns from local neighborhoods, the global part $X_g \in \mathbb{R}^{H \times W \times \alpha_{in}C}$ captures long-range context, and $\alpha_{in} \in [0,1]$ denotes the percentage of feature channels assigned to the global part;
C5, taking $Y \in \mathbb{R}^{H \times W \times C}$ as the output tensor, letting $Y = \{Y_l, Y_g\}$, and updating it with equation (1):

$$Y_l = Y^{l\to l} + Y^{g\to l} = f_l(X_l) + f_{g\to l}(X_g)$$
$$Y_g = Y^{g\to g} + Y^{l\to g} = f_g(X_g) + f_{l\to g}(X_l) \qquad (1)$$
C6, applying a 3×3 convolution to $Y_l$ and $Y_g$ and then fusing the two to obtain the output tensor $Y$, i.e. the multi-source information fusion feature map;
the specific processing procedure of the multi-scale feature fusion module is as follows: the multi-source information fusion feature map is processed sequentially by a plurality of bottleneck blocks connected in series to obtain the multi-scale feature fusion feature map;
the convolution-window stride of a bottleneck block has two modes, 1 and 2; when the stride is 1, the bottleneck block first applies a 1×1 convolution, then extracts features through a depthwise convolution, and finally applies a point convolution; when the stride is 2, the bottleneck block first applies a 1×1 convolution, then extracts features through a multi-scale convolution, and finally applies a point convolution.
2. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 1, characterized in that the specific processing procedure of the multi-scale convolution is as follows: the input feature map is divided equally into s groups along the channel dimension; a 3×3 convolution then extracts features from the first group; the extracted output of the first group is passed to the second group, added to the second group's input, and the sum is fed to the second group's 3×3 convolution; this is repeated until the last group of the feature map has been processed; finally, all extracted feature outputs are concatenated along the channel dimension and a 1×1 point convolution performs information fusion.
3. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 1, characterized in that the specific processing procedure of the covariance pooling module is as follows:
e1, firstly converting a multi-scale feature fusion feature map with the size of h multiplied by w multiplied by d into a feature map with the size of n multiplied by d, wherein n=h multiplied by w;
h and w represent the height and width of the feature map, respectively, and d represents the size of the third dimensional channel of the feature map;
E2, computing the covariance matrix $\Sigma = X^{\top}\,\bar{I}\,X$, where $\bar{I} = \frac{1}{n}\left(I - \frac{1}{n}\mathbf{1}\right)$, $I$ is the n×n identity matrix, $\mathbf{1}$ is the n×n matrix whose elements are all 1, and $X$ denotes the original input feature map fed to the covariance pooling module;
E3, pre-normalizing the covariance matrix $\Sigma$ by the formula $A = (1/\mathrm{tr}(\Sigma))\,\Sigma$;

E4, iterating with the Newton-Schulz iterative formula;

E5, performing post-compensation and splicing in sequence.
4. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 3, characterized in that the Newton-Schulz iterative formula is

$$Y_k = \tfrac{1}{2}\,Y_{k-1}\left(3I - Z_{k-1}Y_{k-1}\right), \qquad Z_k = \tfrac{1}{2}\left(3I - Z_{k-1}Y_{k-1}\right)Z_{k-1}$$

wherein $I$ denotes the identity matrix; $Y_{k-1}$ is the result obtained after k−1 iterations with the matrix $A$ as the starting value, and likewise $Y_k$ is the result obtained after k iterations; $Z_{k-1}$ is the result obtained after k−1 iterations with the identity matrix $I$ as the starting value, and likewise $Z_k$ is the result obtained after k iterations.
5. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 3, characterized in that the calculation formula of the post-compensation processing is $C = (\mathrm{tr}(\Sigma))^{1/2}\,Y_N$, where $\mathrm{tr}(\Sigma)$ is the trace of the covariance matrix and $Y_N$ is the result obtained after N iterations.
6. The Fourier convolution-based visible light and infrared image multi-scale feature fusion method according to claim 3, characterized in that the specific process of the splicing treatment is to splice the upper triangular part of the symmetric matrix obtained by post-compensation into a d(d−1)/2-dimensional vector, obtaining the comprehensive fusion feature map.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311037544.0A | 2023-08-17 | 2023-08-17 | Fourier convolution-based visible light and infrared image multi-scale feature fusion method |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN117058507A | 2023-11-14 |
| CN117058507B | 2024-03-19 |
Family ID: 88658487
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311037544.0A | CN117058507B (Active) | 2023-08-17 | 2023-08-17 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118570845B (en) * | 2024-08-01 | 2024-09-27 | 江南大学 | Frequency-adaptive cross-mode pedestrian retrieval method and device |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401373A (en) * | 2020-03-04 | 2020-07-10 | 武汉大学 | Efficient semantic segmentation method based on packet asymmetric convolution |
CN111738314A (en) * | 2020-06-09 | 2020-10-02 | 南通大学 | Deep learning method of multi-modal image visibility detection model based on shallow fusion |
CN111899206A (en) * | 2020-08-11 | 2020-11-06 | 四川警察学院 | Medical brain image fusion method based on convolutional dictionary learning |
CN111899209A (en) * | 2020-08-11 | 2020-11-06 | 四川警察学院 | Visible light infrared image fusion method based on convolution matching pursuit dictionary learning |
CN111899207A (en) * | 2020-08-11 | 2020-11-06 | 四川警察学院 | Visible light and infrared image fusion method based on local processing convolution dictionary learning |
CN112801040A (en) * | 2021-03-08 | 2021-05-14 | 重庆邮电大学 | Lightweight unconstrained facial expression recognition method and system embedded with high-order information |
WO2021120404A1 (en) * | 2019-12-17 | 2021-06-24 | 大连理工大学 | Infrared and visible light fusing method |
CN113159067A (en) * | 2021-04-13 | 2021-07-23 | 北京工商大学 | Fine-grained image identification method and device based on multi-grained local feature soft association aggregation |
CN114005046A (en) * | 2021-11-04 | 2022-02-01 | 长安大学 | Remote sensing scene classification method based on Gabor filter and covariance pooling |
CN114445430A (en) * | 2022-04-08 | 2022-05-06 | 暨南大学 | Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion |
CN115019132A (en) * | 2022-06-14 | 2022-09-06 | 哈尔滨工程大学 | Multi-target identification method for complex background ship |
CN115100301A (en) * | 2022-07-19 | 2022-09-23 | 重庆七腾科技有限公司 | Image compression sensing method and system based on fast Fourier convolution and convolution filtering flow |
CN115688040A (en) * | 2022-11-08 | 2023-02-03 | 西安交通大学 | Mechanical equipment fault diagnosis method, device, equipment and readable storage medium |
CN116310688A (en) * | 2023-03-16 | 2023-06-23 | 城云科技(中国)有限公司 | Target detection model based on cascade fusion, and construction method, device and application thereof |
CN116486251A (en) * | 2023-03-01 | 2023-07-25 | 中国矿业大学 | Hyperspectral image classification method based on multi-mode fusion |
CN116486288A (en) * | 2023-04-23 | 2023-07-25 | 东南大学 | Aerial target counting and detecting method based on lightweight density estimation network |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |