CN116342455B - Efficient multi-source image fusion method, system and medium - Google Patents
- Publication number
- CN116342455B CN116342455B CN202310614277.2A CN202310614277A CN116342455B CN 116342455 B CN116342455 B CN 116342455B CN 202310614277 A CN202310614277 A CN 202310614277A CN 116342455 B CN116342455 B CN 116342455B
- Authority
- CN
- China
- Prior art keywords
- fusion
- image
- network
- weight
- source image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000007500 fusion method Methods 0.000 title claims abstract description 28
- 230000004927 fusion Effects 0.000 claims abstract description 169
- 238000000034 method Methods 0.000 claims abstract description 76
- 238000000605 extraction Methods 0.000 claims abstract description 34
- 238000004364 calculation method Methods 0.000 claims abstract description 19
- 238000012549 training Methods 0.000 claims abstract description 17
- 230000006870 function Effects 0.000 claims description 96
- 230000004913 activation Effects 0.000 claims description 14
- 238000004590 computer program Methods 0.000 claims description 11
- 238000003860 storage Methods 0.000 claims description 6
- 239000000284 extract Substances 0.000 claims description 5
- 238000005259 measurement Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000009191 jumping Effects 0.000 claims 1
- 230000008901 benefit Effects 0.000 abstract description 6
- 230000000007 visual effect Effects 0.000 abstract description 5
- 230000014759 maintenance of location Effects 0.000 abstract description 3
- 238000010586 diagram Methods 0.000 description 12
- 230000000694 effects Effects 0.000 description 9
- 238000012545 processing Methods 0.000 description 8
- 238000013527 convolutional neural network Methods 0.000 description 4
- 238000013135 deep learning Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 3
- 238000012423 maintenance Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 238000012795 verification Methods 0.000 description 2
- BQCADISMDOOEFD-UHFFFAOYSA-N Silver Chemical compound [Ag] BQCADISMDOOEFD-UHFFFAOYSA-N 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000004321 preservation Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 229910052709 silver Inorganic materials 0.000 description 1
- 239000004332 silver Substances 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an efficient multi-source image fusion method, system and medium. The method comprises the following training of an efficient multi-source image fusion network consisting of two feature extraction networks D and one feature reconstruction network F: the two source images of a sample pair are respectively input into the feature extraction networks D to extract their high-dimensional image features, which are spliced into a fusion feature and passed through the feature fusion network F to generate a fused image; the high-dimensional image features are also respectively input into the intermediate information layer L_i to generate weights that guide the calculation of the value of a loss function Loss designed according to the similarity between the weighted fused image and the weighted source images, thereby completing the training of the feature extraction networks D and the feature fusion network F. The invention has the advantages of high fusion speed, good visual effect, clear texture information, strong structure retention and broad applicability.
Description
Technical Field
The invention relates to the technical field of efficient multi-source image fusion, in particular to a method, a system and a medium for efficient multi-source image fusion.
Background
Efficient multi-source image fusion aims to quickly integrate the complementary, dominant information of different input source images into a single image, so as to improve the efficiency of subsequent tasks such as image interpretation, target recognition and scene classification. This type of fusion is widely used in video surveillance, camouflage analysis, photography and similar applications. It mainly covers multi-modal fusion, multi-focus fusion and multi-exposure fusion.
Typically this type of fusion is divided into two broad categories: traditional image fusion and non-traditional image fusion. Traditional pixel-level unified image fusion essentially consists of three manually designed steps, namely image feature extraction, fusion and reconstruction. The most common feature extraction methods include sparse representation (SR), multi-scale transformation (MST) and dictionary learning, while the fusion rules include maximum, minimum, addition, the L1 norm (L1-norm) and so on. Traditional pixel-level unified image fusion relies heavily on expert knowledge to extract features and formulate fusion strategies, which not only results in poor generalization of the model but also severely limits the fusion effect. In addition, these methods generally require parameter tuning for each fusion type, and the tuning process is complicated and time-consuming. In recent years, non-traditional image fusion, dominated by convolutional neural networks, has made great breakthroughs in fusion performance; it can be roughly classified by network type into methods controlled by loss functions, by prior information, by manual operators, and so on. U2Fusion (a unified unsupervised image fusion network, see: Xu, Han, et al. "U2Fusion: A unified unsupervised image fusion network." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.1 (2020): 502-518), FusionDN (a unified densely connected network for image fusion, see: Xu, Han, et al. "FusionDN: A unified densely connected network for image fusion." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 07. 2020) and PMGI (a fast unified image fusion network based on proportional maintenance of gradient and intensity, see: Zhang, Hao, et al. "Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 07. 2020) achieve image fusion by calculating the similarity between the source images and the fusion result. Furthermore, the weights of the loss function are usually set according to the task or the source images: FusionDN uses a manual operator to determine the weights, U2Fusion achieves more accurate fusion by determining the weights with a pre-trained feature extraction network, and PMGI adapts to different fusion tasks by manually designing weights for the different intensity and gradient loss functions. A representative method based on prior-information control is IFCNN (a general image fusion framework based on a convolutional neural network, see: Zhang, Yu, et al. "IFCNN: A general image fusion framework based on convolutional neural network." Information Fusion (2020): 99-118), which learns image fusion from multi-focus source images and their ground truth, so that the fusion network can reveal the relationship between the source images and the ground truth. Methods based on manual operators were first explored in DeepFuse (a deep unsupervised network for exposure fusion with extreme exposure image pairs, see: Ram Prabhakar, K., V. Sai Srikar, and R. Venkatesh Babu. "DeepFuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs." Proceedings of the IEEE International Conference on Computer Vision. 2017), which uses a maximum fusion rule.
This approach addresses the lack of training data and ground truth in the fusion process. To improve on DeepFuse, DenseFuse (an infrared and visible image fusion network, see: Li, Hui, and Xiao-Jun Wu. "DenseFuse: A fusion approach to infrared and visible images." IEEE Transactions on Image Processing 28.5 (2018): 2614-2623) extends the fusion rule to the L1 norm and adds dense connections to the network.
However, deep-learning-based fusion networks still have shortcomings. Some adopt redundant fusion network structures, which makes their computational cost high and leaves them ill-suited to tasks that demand high efficiency. Meanwhile, existing methods do not fully consider or exploit the differences between source images, particularly in the design of the loss function.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems of the prior art, the invention provides an efficient multi-source image fusion method, system and medium whose fusion speed meets efficiency requirements, which can accurately estimate differences between the source images such as texture features and image characteristics, generate images with excellent visual quality, enhance the texture detail of the source images, and effectively handle information such as illumination, while producing no obvious artifacts and placing few restrictions on the input images, giving it strong generality.
In order to solve the technical problems, the invention adopts the following technical scheme:
The efficient multi-source image fusion method comprises the following training of an efficient multi-source image fusion network consisting of two feature extraction networks D and one feature reconstruction network F:

S101, establishing a sample set of sample pairs of the two types of source images to be fused;

S102, feeding the two source images of a sample pair into the feature extraction networks D respectively to extract their high-dimensional image features;

S103, concatenating the two high-dimensional image features into a fusion feature, and feeding the fusion feature into the feature fusion network F to generate a fused image; feeding the two high-dimensional image features respectively into the intermediate information layer L_i to generate the two weights;

S104, multiplying each weight by the fused image to obtain two weighted fused images; multiplying the first weight by the first source image and the second weight by the second source image to obtain two weighted source images;

S105, calculating the value of the loss function Loss designed according to the similarity between the weighted fused images and the weighted source images; if the value of the loss function Loss satisfies a preset convergence condition, the feature extraction networks D and the feature fusion network F are judged to be trained; otherwise, the network parameters of the feature extraction networks D and the feature fusion network F are adjusted and the procedure jumps back to step S102.
Optionally, the loss function Loss designed in step S103 is a weighted sum of a gradient loss function and an intensity loss function, wherein the gradient loss function is itself a weighted sum of a gradient loss function based on the structural similarity index SSIM and a gradient loss function based on the mean square error MSE, the structural similarity covering both the similarity between the weighted fused image and the weighted source image of the first source and the similarity between the weighted fused image and the weighted source image of the second source.
Optionally, the SSIM-based gradient loss function measures, for each of the two source images to be fused, the structural similarity SSIM between the weighted source image and the correspondingly weighted fused image, where W1 and W2 denote the weight matrices and the weighting is an element-wise multiplication.
Optionally, the MSE-based gradient loss function is defined analogously in terms of the mean square error MSE between each weighted source image and the correspondingly weighted fused image, again using weight matrices and element-wise multiplication.
Optionally, the intensity loss function is likewise defined in terms of the mean square error MSE between each weighted source image and the correspondingly weighted fused image, using its own pair of weight matrices and element-wise multiplication.
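Purely as a hedged illustration, and not as the exact formulas of the disclosure, the loss terms described above could take a form such as the following, using I_1 and I_2 for the source images, I_f for the fused image, W_1 to W_4 for weight matrices, the symbol ⊙ for element-wise multiplication, and coefficients λ_1 and λ_2 for the weighted sums; the assignment of weight matrices to individual terms is an assumption.

```latex
% Speculative sketch only; weight-matrix assignment and lambda placement are assumptions.
\begin{aligned}
L_{\mathrm{grad}}^{\mathrm{SSIM}} &= \bigl(1-\mathrm{SSIM}(W_1\odot I_f,\;W_1\odot I_1)\bigr)
  + \bigl(1-\mathrm{SSIM}(W_2\odot I_f,\;W_2\odot I_2)\bigr) \\
L_{\mathrm{grad}}^{\mathrm{MSE}} &= \mathrm{MSE}(W_1\odot I_f,\;W_1\odot I_1)
  + \mathrm{MSE}(W_2\odot I_f,\;W_2\odot I_2) \\
L_{\mathrm{int}} &= \mathrm{MSE}(W_3\odot I_f,\;W_3\odot I_1)
  + \mathrm{MSE}(W_4\odot I_f,\;W_4\odot I_2) \\
\mathrm{Loss} &= \bigl(L_{\mathrm{grad}}^{\mathrm{SSIM}} + \lambda_1\,L_{\mathrm{grad}}^{\mathrm{MSE}}\bigr)
  + \lambda_2\,L_{\mathrm{int}}
\end{aligned}
```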
Optionally, the weight matrices are computed from the tensors of the two high-dimensional image features: a gradient operator fgrad is applied to each feature tensor, the result is scaled by a constant coefficient and normalized with the sigmoid function, fmean computes the mean of the resulting tensor, and two intermediate variables derived from these quantities, whose calculation involves a square root calculation sqrt, determine the final weight matrices.
Optionally, the feature reconstruction performed by the feature reconstruction network F to obtain the fused image includes: the fusion feature is passed through a 3×3 convolution layer to obtain a tensor with 64 channels, an activated tensor is generated through the LeakyReLU activation function, the number of channels of the activated tensor is reduced to 1 through another 3×3 convolution layer, and finally the final fused image is obtained through the LeakyReLU activation function.
Optionally, the extraction of high-dimensional image features by the feature extraction network D includes: the source image is passed through a 3×3 convolution layer to obtain a tensor with 32 channels, an activated tensor is generated through the LeakyReLU activation function, the number of channels is raised to 64 through another 3×3 convolution layer, and finally the high-dimensional image feature corresponding to the source image is obtained through the LeakyReLU activation function.
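As an illustrative sketch only, the two networks described above might be written in PyTorch as follows; the class names, the padding of 1 for the 3×3 convolutions, the default LeakyReLU slope, and the assumption of single-channel inputs with a 128-channel concatenated fusion feature are not stated in the text and are assumptions.

```python
import torch.nn as nn

class FeatureExtractionD(nn.Module):
    """Feature extraction network D: 3x3 conv -> LeakyReLU -> 3x3 conv -> LeakyReLU,
    mapping a 1-channel source image to a 64-channel high-dimensional feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class FeatureReconstructionF(nn.Module):
    """Feature reconstruction network F: the concatenated fusion feature is reduced
    to 64 channels and then to the 1-channel fused image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, fused_feature):
        return self.net(fused_feature)
```

With this layout, D maps a single-channel image to a 64-channel feature, and F maps the 128-channel concatenation of two such features back to a single-channel fused image.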
In addition, the invention also provides a multi-source image efficient fusion system which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the multi-source image efficient fusion method.
Furthermore, the present invention provides a computer readable storage medium having stored therein a computer program for programming or configuring by a microprocessor to perform the multi-source image efficient fusion method.
Compared with the prior art, the invention has the following advantages:
1. The efficient multi-source image fusion network formed by the two feature extraction networks D and the feature reconstruction network F can effectively realize efficient multi-source image fusion. The high-dimensional image features are respectively fed into the intermediate information layer L_i to generate weights; multiplying the weights by the fused image yields two weighted fused images, and multiplying the weights by the source images of the sample pair yields two weighted source images. The value of the loss function Loss designed according to the similarity between the weighted fused images and the weighted source images is then calculated, realizing an image fusion loss function guided by intermediate information. The gradient and intensity information of the source images is well preserved, the generated fused image has good visual quality, and the texture and intensity characteristics of the fused images are better retained.
2. The efficient multi-source image fusion network formed by the two feature extraction networks D and the feature reconstruction network F is a lightweight network that avoids redundant structural design; its computation and training cost is extremely low, making it suitable for the production and research of light and small products.

3. The efficient multi-source image fusion network formed by the two feature extraction networks D and the feature reconstruction network F can adapt to different types of pixel-level image fusion tasks with only a single trained model.

4. The invention realizes an image fusion loss function guided by intermediate information, and the proposed way of guiding image generation with an intermediate information layer can be extended to other network applications to enhance a network's ability to extract the information of interest.
5. The invention has the capability of processing data of different sources, and can realize different types of image fusion, including infrared visible light image fusion, multi-focus image fusion and multi-exposure image fusion.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a multi-source image efficient fusion network according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of training principle of a multi-source image efficient fusion network in an embodiment of the invention.
Fig. 5 is a comparison of infrared and visible light image fusion results in an embodiment of the present invention.

Fig. 6 is a comparison of multi-focus image fusion results in an embodiment of the present invention.

Fig. 7 is a comparison of multi-exposure image fusion results in an embodiment of the present invention.
Detailed Description
As shown in fig. 1 and fig. 2, the efficient multi-source image fusion method of this embodiment includes the following training of an efficient multi-source image fusion network composed of two feature extraction networks D and one feature reconstruction network F:

S101, establishing a sample set of sample pairs of the two types of source images to be fused;

S102, feeding the two source images of a sample pair into the feature extraction networks D respectively to extract their high-dimensional image features;

S103, concatenating the two high-dimensional image features into a fusion feature, and feeding the fusion feature into the feature fusion network F to generate a fused image; feeding the two high-dimensional image features respectively into the intermediate information layer L_i to generate the two weights;

S104, multiplying each weight by the fused image to obtain two weighted fused images; multiplying the first weight by the first source image and the second weight by the second source image to obtain two weighted source images;

S105, calculating the value of the loss function Loss designed according to the similarity between the weighted fused images and the weighted source images; if the value of the loss function Loss satisfies a preset convergence condition, the feature extraction networks D and the feature fusion network F are judged to be trained; otherwise, the network parameters of the feature extraction networks D and the feature fusion network F are adjusted and the procedure jumps back to step S102.
After this training, when the efficient multi-source image fusion network operates, each of the two feature extraction networks D receives one source image and extracts its high-dimensional image features, and the feature reconstruction network F then fuses the high-dimensional image features extracted by the two feature extraction networks D to obtain a pixel-level image fusion result.
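For reference, a minimal inference-time sketch under the assumption of trained networks and registered single-channel source tensors follows; the function name fuse_pair is hypothetical.

```python
import torch

@torch.no_grad()
def fuse_pair(net_d: torch.nn.Module, net_f: torch.nn.Module,
              img1: torch.Tensor, img2: torch.Tensor) -> torch.Tensor:
    """Fuse two registered single-channel source images of shape [1, 1, H, W]
    with a trained feature extraction network D and reconstruction network F."""
    phi1 = net_d(img1)                               # high-dimensional features of source 1
    phi2 = net_d(img2)                               # high-dimensional features of source 2 (same D weights)
    fused_feature = torch.cat([phi1, phi2], dim=1)   # channel-wise concatenation
    return net_f(fused_feature)                      # pixel-level fused image
```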
In this embodiment, the loss function Loss designed in step S103 is a weighted sum of a gradient loss function and an intensity loss function, where the gradient loss function is itself a weighted sum of an SSIM-based gradient loss and an MSE-based gradient loss, and the structural similarity covers both the similarity between the weighted fused image and the weighted source image of the first source and the similarity between the weighted fused image and the weighted source image of the second source. As a specific embodiment, the gradient loss and the intensity loss are combined with a weighting coefficient whose typical value range is 0.1 to 1 and which is set to 0.4 in this embodiment.
The gradient loss function is the weighted sum of the SSIM-based gradient loss and the MSE-based gradient loss, combined with a weighting coefficient whose typical value range is 10⁻¹ to 10³ and which is set to 20 in this embodiment.
In this embodiment, the SSIM-based gradient loss function measures, for each of the two source images to be fused, the structural similarity SSIM between the weighted source image and the correspondingly weighted fused image, where W1 and W2 are the weight matrices and the weighting is an element-wise multiplication.
In this embodiment, the MSE-based gradient loss function is defined analogously in terms of the mean square error MSE between each weighted source image and the correspondingly weighted fused image, again using weight matrices and element-wise multiplication.
In this embodiment, the intensity loss function is likewise defined in terms of the mean square error MSE between each weighted source image and the correspondingly weighted fused image, using its own pair of weight matrices and element-wise multiplication.
In this embodiment, the weight matrices are computed from the tensors of the two high-dimensional image features: a gradient operator fgrad is applied to each feature tensor, the result is scaled by a constant coefficient and normalized with the sigmoid function, fmean computes the mean of the resulting tensor, and two intermediate variables derived from these quantities, whose calculation involves a square root calculation sqrt, determine the final weight matrices.
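Since the exact expressions are not reproduced here, the following Python sketch only illustrates the general recipe described above (a gradient operator, a constant coefficient, sigmoid normalization and tensor means); the Sobel-style choice for fgrad, the complementary-weight construction and the constant c = 10.0 are assumptions.

```python
import torch
import torch.nn.functional as F

def fgrad(x: torch.Tensor) -> torch.Tensor:
    """Gradient-magnitude operator (Sobel-style stand-in for the patent's fgrad)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    ky = kx.t()
    kx = kx.reshape(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    ky = ky.reshape(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=x.shape[1])
    gy = F.conv2d(x, ky, padding=1, groups=x.shape[1])
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def guidance_weights(phi1: torch.Tensor, phi2: torch.Tensor, c: float = 10.0):
    """Speculative reconstruction of the weight-matrix computation: gradient
    responses of the two feature tensors are averaged over channels (fmean),
    scaled by a constant coefficient c and turned into complementary weights
    with a sigmoid normalization."""
    g1 = fgrad(phi1).mean(dim=1, keepdim=True)   # channel-wise mean of gradient response
    g2 = fgrad(phi2).mean(dim=1, keepdim=True)
    w1 = torch.sigmoid(c * (g1 - g2))            # sigmoid normalization
    w2 = 1.0 - w1                                # complementary weight map
    return w1, w2
```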
The key point of the method of this embodiment is the joint training scheme for the feature reconstruction network F and the feature extraction network D in the efficient multi-source image fusion network; F and D themselves may adopt any suitable existing deep neural network structure as required. For example, as an optional implementation shown in fig. 3, the feature reconstruction performed by the feature reconstruction network F to obtain the fused image includes: the fusion feature is passed through a 3×3 convolution layer to obtain a tensor with 64 channels, an activated tensor is generated through the LeakyReLU activation function, the number of channels of the activated tensor is reduced to 1 through another 3×3 convolution layer, and finally the final fused image is obtained through the LeakyReLU activation function.
For example, as an optional implementation shown in fig. 3, the extraction of high-dimensional image features by the feature extraction network D includes: the source image is passed through a 3×3 convolution layer to obtain a tensor with 32 channels, an activated tensor is generated through the LeakyReLU activation function, the number of channels is raised to 64 through another 3×3 convolution layer, and finally the high-dimensional image feature corresponding to the source image is obtained through the LeakyReLU activation function.
Corresponding to different source images, the method of the embodiment uses the feature extraction network D with the same set of network parameters to perform feature extraction to generate high-dimensional features, and then the high-dimensional features are combined and input into the feature reconstruction network F to generate a fusion image. The network structures of the feature reconstruction network F and the feature extraction network D are very simple.
Fig. 4 is a schematic diagram of the training principle of the efficient multi-source image fusion network in this embodiment. As shown in fig. 4, this embodiment feeds the two high-dimensional image features respectively into the intermediate information layer L_i to generate the two weights, thereby extracting gradient and intensity information from the high-dimensional features, and uses this information to build a weight-guided loss function that measures the similarity between the source images and the fused image; the intermediate information layer is formed by combining a 1×1 convolution layer with a Tanh activation function.
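A minimal sketch of such an intermediate information layer, assuming a 64-channel input feature and a single-channel output weight map (the output width is not stated in the text), might look as follows.

```python
import torch.nn as nn

class IntermediateInformationLayer(nn.Module):
    """Intermediate information layer L_i: a 1x1 convolution followed by Tanh,
    mapping a high-dimensional feature to a weight map."""
    def __init__(self, in_channels: int = 64, out_channels: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.act = nn.Tanh()

    def forward(self, feature):
        return self.act(self.conv(feature))
```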
In this embodiment, the multi-focus fusion dataset (Lytro) is augmented and used as the training data for the network parameters. Because handling multi-channel versus single-channel images has little influence on the structure of the efficient multi-source image fusion network of this embodiment, the dataset is converted to single-channel grayscale images and cropped into 128×128 patches as input, with 10% of the total input data held out as the validation set. If a multi-channel input or multi-channel output network needs to be trained, the number of network input channels (n_channels) and the number of categories (n_categories) must be changed to appropriate values. The learning rate is 1e-4 and the parameters are updated with a callback function (ReduceLROnPlateau). The batch size during training is set to 32.
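The training configuration described above might be wired up roughly as follows; net_d, net_f, inter_layer and train_step refer to the earlier sketches, while the Adam optimizer, the scheduler arguments, and the evaluate, train_loader, val_loader and num_epochs names are placeholders and assumptions rather than details from the text.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

BATCH_SIZE = 32        # batch size stated in the text
LEARNING_RATE = 1e-4   # learning rate stated in the text
CROP_SIZE = 128        # 128x128 single-channel training crops
VAL_FRACTION = 0.1     # 10% of the input data held out for validation

params = (list(net_d.parameters()) + list(net_f.parameters())
          + list(inter_layer.parameters()))
optimizer = torch.optim.Adam(params, lr=LEARNING_RATE)        # optimizer choice assumed
scheduler = ReduceLROnPlateau(optimizer, mode="min",          # callback-style LR update
                              factor=0.5, patience=5)         # factor/patience assumed

for epoch in range(num_epochs):
    for img1, img2 in train_loader:      # loader yields 128x128 grayscale sample pairs
        train_step(net_d, net_f, inter_layer, optimizer, img1, img2)
    val_loss = evaluate(net_d, net_f, inter_layer, val_loader)  # hypothetical validation helper
    scheduler.step(val_loss)             # reduce LR when the validation loss plateaus
```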
To further verify the effectiveness of the method of this embodiment, three typical fusion methods (U2Fusion, IFCNN and DenseFuse) were used as baselines and compared with the method of this embodiment in efficient multi-source image fusion experiments. The experiments were run on an NVIDIA Quadro RTX 6000 GPU and a 2.10 GHz Intel Xeon Silver 4216 CPU, using the multi-focus fusion dataset Lytro as training data. To verify the effectiveness of the unified fusion of this embodiment, three different types of fusion tasks were selected for comparative verification: multi-modal fusion, multi-focus fusion and multi-exposure fusion.
For multi-modal fusion, this embodiment selects the most representative visible and infrared fusion dataset TNO, from which 21 pairs of typical images covering the whole range of scenes were selected; the experimental results obtained are shown in Table 1 and Fig. 5.
Table 1 objective performance metrics for the present example method and three exemplary multi-modal fusion methods.
Further, the present example verifies multi-focus fusion by using 13 pairs of images from MFFW dataset, and the experimental results obtained are shown in table 2 and fig. 6.
Table 2 objective performance metrics for the present example method and three exemplary multi-focus fusion methods.
For multi-exposure fusion, this embodiment selects 20 suitable image pairs from the EMPA-HDR dataset. To accommodate device limitations, four downsampling operations were performed on the EMPA-HDR data; the experimental results obtained are shown in Table 3 and Fig. 7.
Table 3 objective performance metrics for the present example method and three typical multi-exposure fusion methods.
In Tables 1 to 3, the EI index reflects the gradient information extraction capability of the corresponding method; the larger the value, the better. The entropy (EN) index measures the detail content of the image; larger is better. The mutual information (MI) index measures the similarity between images; larger is better. The TIME index reflects the time in seconds required by the different methods to fuse; smaller is better. The FPS index reflects the number of image frames that can be generated per second and thus the efficiency of the algorithm; larger is better. As can be seen from Table 1, all objective evaluation indexes of the method proposed in this embodiment are superior to those of the other methods. In Tables 2 and 3 the EI index is slightly lower than that of IFCNN, while the other indexes remain clearly ahead. This is because the intermediate-information-layer-guided loss function used by the network greatly strengthens the network's ability to handle intensity and gradient information, so that image feature extraction and image reconstruction can be achieved with only a very small number of convolutions, realizing efficient and accurate pixel-level image fusion.
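As a small illustration of how one of these objective metrics can be computed, the following sketch gives a common definition of the entropy (EN) index as the Shannon entropy of the grayscale histogram; the exact metric implementations used in the experiments are not given in the text, so this is an assumption.

```python
import numpy as np

def image_entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (EN) of a grayscale image with values in [0, 255]."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    p = hist.astype(np.float64) / hist.sum()   # normalize histogram to probabilities
    p = p[p > 0]                               # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))
```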
Figs. 5, 6 and 7 compare four methods in total, namely the three typical fusion methods U2Fusion, IFCNN and DenseFuse and the method of this embodiment, on multi-modal fusion, multi-focus fusion and multi-exposure fusion respectively.
In fig. 5, A is the infrared source image, B is the visible light source image, C is the fused image obtained by U2Fusion, D is the fused image obtained by IFCNN, E is the fused image obtained by DenseFuse, and F is the fused image obtained by the method proposed in this embodiment. As can be seen from fig. 5, the three typical fusion methods U2Fusion, IFCNN and DenseFuse preserve the texture and intensity of the background and the target poorly, while the fused image produced by the method proposed in this embodiment has the best quality.
In fig. 6, A is the far-focus source image, B is the near-focus source image, C is the fused image obtained by U2Fusion, D is the fused image obtained by IFCNN, E is the fused image obtained by DenseFuse, and F is the fused image obtained by the method proposed in this embodiment. As can be seen from fig. 6, the fused image obtained by the method of this embodiment preserves edges best and can combine the complementary gradient information of the different source images, whereas the fusion results of the three typical methods U2Fusion, IFCNN and DenseFuse are only average.
In fig. 7, A is the high-exposure source image, B is the low-exposure source image, C is the fused image obtained by U2Fusion, D is the fused image obtained by IFCNN, E is the fused image obtained by DenseFuse, and F is the fused image obtained by the method proposed in this embodiment. As can be seen from fig. 7, the images produced by the three typical fusion methods U2Fusion, IFCNN and DenseFuse are only average in terms of image restoration and texture preservation and show obvious flaws, whereas the fused image obtained by the method of this embodiment performs well in both respects and has the best quality.
In summary, the method of this embodiment uses a convolutional neural network to extract features from the input images, applies intermediate-information-guided optimization to the extracted features to generate the corresponding weights, and feeds these weights into the network's loss function. The source images to be fused are input into the network to generate high-dimensional features for the different sources; these features are then combined and used for image reconstruction, while their intensity and gradient information is extracted and incorporated into the loss function to measure the similarity between the generated image and the source images. The reconstructed image is the fused image. The method does not require extensive training: training on a multi-focus dataset alone is sufficient, and the resulting model is applicable to different types of fusion tasks. Compared with other advanced fusion methods, the fused images generated by the method of this embodiment achieve higher objective performance indexes and better visual quality; more importantly, the method is extremely efficient and highly general and robust. The method of this embodiment therefore offers high fusion speed, good visual effect, clear texture information, strong structure retention and broad applicability.
In addition, the embodiment also provides a multi-source image efficient fusion system, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the multi-source image efficient fusion method.
Furthermore, the present embodiment also provides a computer-readable storage medium having stored therein a computer program for programming or configuring by a microprocessor to perform the multi-source image efficient fusion method.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.
Claims (5)
1. An efficient multi-source image fusion method, characterized by comprising the following training of an efficient multi-source image fusion network consisting of two feature extraction networks D and one feature reconstruction network F:
S101, establishing a sample set of sample pairs of the two types of source images to be fused;

S102, feeding the two source images of a sample pair into the feature extraction networks D respectively to extract their high-dimensional image features;

S103, concatenating the two high-dimensional image features into a fusion feature, and feeding the fusion feature into the feature fusion network F to generate a fused image; feeding the two high-dimensional image features respectively into the intermediate information layer L_i to generate the two weights, the intermediate information layer L_i being formed by combining a 1×1 convolution layer and a Tanh activation function;

S104, multiplying each weight by the fused image to obtain two weighted fused images; multiplying the first weight by the first source image and the second weight by the second source image to obtain two weighted source images;

S105, calculating the value of the loss function Loss designed according to the similarity between the weighted fused images and the weighted source images; if the value of the loss function Loss satisfies a preset convergence condition, judging that the feature extraction networks D and the feature fusion network F are trained; otherwise, adjusting the network parameters of the feature extraction networks D and the feature fusion network F and jumping to step S102;
the loss function Loss designed in step S103 is a weighted sum of a gradient loss function and an intensity loss function, wherein the gradient loss function is itself a weighted sum of a gradient loss function based on the structural similarity index SSIM and a gradient loss function based on the mean square error MSE, the structural similarity comprising the similarity between the weighted fused image and the weighted source image of the first source and the similarity between the weighted fused image and the weighted source image of the second source; the SSIM-based gradient loss function is computed from the structural similarity SSIM between each weighted source image and the correspondingly weighted fused image, where W1 and W2 are weight matrices and the weighting is an element-wise multiplication; the MSE-based gradient loss function is computed analogously from the mean square error MSE between each weighted source image and the correspondingly weighted fused image using weight matrices and element-wise multiplication; the intensity loss function is likewise computed from the mean square error MSE between each weighted source image and the correspondingly weighted fused image using weight matrices and element-wise multiplication; and the weight matrices are computed from the tensors of the two high-dimensional image features using a gradient operator fgrad, a constant coefficient, a sigmoid normalization function, a function fmean that computes the mean of a tensor, and intermediate variables whose calculation involves a square root calculation sqrt.
2. The method of claim 1, wherein the feature reconstruction performed by the feature reconstruction network F to obtain the fused image comprises: passing the fusion feature through a 3×3 convolution layer to obtain a tensor with 64 channels, generating an activated tensor through the LeakyReLU activation function, reducing the number of channels of the activated tensor to 1 through another 3×3 convolution layer, and finally obtaining the final fused image through the LeakyReLU activation function.
3. The method of claim 1, wherein the extraction of high-dimensional image features by the feature extraction network D comprises: passing the source image through a 3×3 convolution layer to obtain a tensor with 32 channels, generating an activated tensor through the LeakyReLU activation function, raising the number of channels to 64 through another 3×3 convolution layer, and finally obtaining the high-dimensional image feature corresponding to the source image through the LeakyReLU activation function.
4. A multi-source image efficient fusion system comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the multi-source image efficient fusion method of any one of claims 1-3.
5. A computer readable storage medium having a computer program stored therein, wherein the computer program is for programming or configuring by a microprocessor to perform the multi-source image efficient fusion method of any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310614277.2A CN116342455B (en) | 2023-05-29 | 2023-05-29 | Efficient multi-source image fusion method, system and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310614277.2A CN116342455B (en) | 2023-05-29 | 2023-05-29 | Efficient multi-source image fusion method, system and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116342455A CN116342455A (en) | 2023-06-27 |
CN116342455B true CN116342455B (en) | 2023-08-08 |
Family
ID=86886215
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310614277.2A Active CN116342455B (en) | 2023-05-29 | 2023-05-29 | Efficient multi-source image fusion method, system and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116342455B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668648A (en) * | 2020-12-29 | 2021-04-16 | 西安电子科技大学 | Infrared and visible light fusion identification method based on symmetric fusion network |
CN114331931A (en) * | 2021-11-26 | 2022-04-12 | 西安邮电大学 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
CN114529794A (en) * | 2022-04-20 | 2022-05-24 | 湖南大学 | Infrared and visible light image fusion method, system and medium |
CN116109538A (en) * | 2023-03-23 | 2023-05-12 | 广东工业大学 | Image fusion method based on simple gate unit feature extraction |
-
2023
- 2023-05-29 CN CN202310614277.2A patent/CN116342455B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668648A (en) * | 2020-12-29 | 2021-04-16 | 西安电子科技大学 | Infrared and visible light fusion identification method based on symmetric fusion network |
CN114331931A (en) * | 2021-11-26 | 2022-04-12 | 西安邮电大学 | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
CN114529794A (en) * | 2022-04-20 | 2022-05-24 | 湖南大学 | Infrared and visible light image fusion method, system and medium |
CN116109538A (en) * | 2023-03-23 | 2023-05-12 | 广东工业大学 | Image fusion method based on simple gate unit feature extraction |
Non-Patent Citations (1)
Title |
---|
Han Xu et al. U2Fusion: A Unified Unsupervised Image Fusion Network. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020, 502-518. *
Also Published As
Publication number | Publication date |
---|---|
CN116342455A (en) | 2023-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107977932B (en) | Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network | |
JP6980958B1 (en) | Rural area classification garbage identification method based on deep learning | |
CN110287846B (en) | Attention mechanism-based face key point detection method | |
CN111738363B (en) | Alzheimer disease classification method based on improved 3D CNN network | |
CN113628249B (en) | RGBT target tracking method based on cross-modal attention mechanism and twin structure | |
TW202141358A (en) | Method and apparatus for image restoration, storage medium and terminal | |
CN109005398B (en) | Stereo image parallax matching method based on convolutional neural network | |
CN112819910A (en) | Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network | |
CN115018727B (en) | Multi-scale image restoration method, storage medium and terminal | |
CN114283158A (en) | Retinal blood vessel image segmentation method and device and computer equipment | |
CN114511798B (en) | Driver distraction detection method and device based on transformer | |
CN111860528B (en) | Image segmentation model based on improved U-Net network and training method | |
CN107871103B (en) | Face authentication method and device | |
CN115311186B (en) | Cross-scale attention confrontation fusion method and terminal for infrared and visible light images | |
CN113066065B (en) | No-reference image quality detection method, system, terminal and medium | |
CN113870286B (en) | Foreground segmentation method based on multi-level feature and mask fusion | |
CN116757986A (en) | Infrared and visible light image fusion method and device | |
CN118379288B (en) | Embryo prokaryotic target counting method based on fuzzy rejection and multi-focus image fusion | |
CN117456330A (en) | MSFAF-Net-based low-illumination target detection method | |
CN115861094A (en) | Lightweight GAN underwater image enhancement model fused with attention mechanism | |
CN109086806A (en) | A kind of IOT portable device visual identity accelerated method based on low resolution, compressed image | |
CN116152128A (en) | High dynamic range multi-exposure image fusion model and method based on attention mechanism | |
CN118212240A (en) | Automobile gear production defect detection method | |
CN114972753A (en) | Lightweight semantic segmentation method and system based on context information aggregation and assisted learning | |
CN117830900A (en) | Unsupervised video object segmentation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |