CN116630763A - Multi-scale context awareness-based multi-focus image fusion method - Google Patents

Multi-scale context awareness-based multi-focus image fusion method

Info

Publication number
CN116630763A
CN116630763A
Authority
CN
China
Prior art keywords
feature
convolution
stage
scale
conv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310767148.7A
Other languages
Chinese (zh)
Inventor
刘羽
齐争争
成娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202310767148.7A priority Critical patent/CN116630763A/en
Publication of CN116630763A publication Critical patent/CN116630763A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-focus image fusion method based on multi-scale context awareness, which comprises the following steps: (1) data preparation and preprocessing, and construction of a multi-scale context-aware network comprising an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module, wherein the encoder is a vision transformer used for multi-scale feature extraction, the coarse positioning decoder comprises convolution and activation functions for coarse decoding of the multi-scale features, the receptive field enhancement module comprises convolution and activation functions for enlarging the receptive field of the features, and the multi-scale feature interaction module comprises convolution and activation functions for multi-level feature fusion; (2) fusing the input multi-focus images, comprising network training and multi-focus image fusion. The invention can make full use of the complementary and redundant information in differently defocused images to fuse a fully focused image of better quality, provide a better-quality image for human observation, and at the same time support computer vision tasks such as image recognition and segmentation.

Description

Multi-scale context awareness-based multi-focus image fusion method
Technical Field
The invention relates to the technical field of multi-focus image fusion, in particular to a multi-focus image fusion method based on multi-scale context awareness.
Background
Because the imaging capability of a camera is limited, it is often difficult to capture an image in which all objects are in focus. In particular, a camera is constrained by its depth of field: only objects within the depth of field remain in focus, while objects outside it become blurred. Consequently, the information contained in a single captured image is partial and incomplete; different images carry different contour, texture and other feature information, which makes analysis of the image content inconvenient, whereas combining this information allows a more comprehensive and clearer understanding of the targets in the imaged scene. The goal of multi-focus image fusion is therefore to extract and integrate this important information into a single image. To this end, several images with different focus areas can be fused to obtain a fully focused image, overcoming the limitation of finite depth of field; this technique is called multi-focus image fusion.
Multi-focus image fusion is an important branch of image fusion. Its main purpose is to fuse several images of the same scene with different focus areas into a single fully focused image in which all regions are sharp. Existing methods have two problems. First, traditional methods rely mainly on hand-crafted features and fusion rules; they are strongly constrained, lack robustness, perform poorly across different scenes, and may misjudge the focused area and introduce artifacts, which degrades the final fused image. Second, deep-learning-based methods depend heavily on the design of the network structure; most earlier networks rely mainly on convolution, focus only on local feature information while ignoring global information, and lack feature fusion and interaction across multiple scales.
Disclosure of Invention
The invention provides a multi-focus image fusion method based on multi-scale context awareness, which aims to make full use of the complementary and redundant information of images with different focus areas, provide better image feature representations, and reconstruct a fully focused image of higher quality, thereby offering better-quality images for human observation and supporting computer vision tasks such as image recognition, classification and segmentation, which in turn assists human inspection and computer analysis.
To solve the above problems, the invention adopts the following technical scheme:
The invention discloses a multi-focus image fusion method based on multi-scale context awareness, which is characterized by comprising the following steps:
Step 1: acquire P pairs of RGB multi-focus images and convert them into gray-scale images, which serve as the training set; the p-th pair of gray-scale images consists of a foreground-focused image and a background-focused image; take the ground-truth mask corresponding to the p-th pair of gray-scale images as the p-th label, denoted G_p, thereby constructing the label set {G_1, G_2, …, G_p, …, G_P} of the P pairs of RGB multi-focus images;
Step 2: construct a multi-scale context-aware network comprising an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module;
Step 2.1: the encoder comprises one first convolution block Conv3×3 for adjusting the number of channels and Y vision transformers, where Conv3×3 denotes one convolution layer with a 3×3 kernel followed by one ReLU activation function;
the p-th pair of gray-scale images is concatenated along the channel dimension and input into the multi-scale context-aware network; the first convolution block Conv3×3 of the encoder produces the p-th input feature I_p, which is then processed by the Y vision transformers in turn to obtain the Y primary feature maps corresponding to the p-th pair of gray-scale images, the y-th of which is the y-th primary feature map;
Step 2.2: the coarse positioning decoder consists of several second convolution blocks Conv3×3 connected across scales in multiple stages and one first convolution block Conv1×1; it decodes the Y primary feature maps over M_R stages to obtain the p-th coarse positioning decoder feature and the p-th initial decision map, where Conv1×1 denotes one convolution layer with a 1×1 kernel;
Step 2.3: the receptive field enhancement module consists of 4 receptive field enhancement branches with the same structure but different parameters k and r, 5 second convolution blocks Conv1×1 and one ReLU activation function, where each receptive field enhancement branch is a sequential stack of one asymmetric convolution block Conv1×k, one asymmetric convolution block Convk×1 and one first stride convolution block Convk×k,r; Conv1×k denotes one asymmetric convolution layer with a 1×k kernel, Convk×1 denotes one asymmetric convolution layer with a k×1 kernel, and Convk×k,r denotes one symmetric convolution layer with a k×k kernel and step length r;
the first Y-1 primary feature maps are input to the receptive field enhancement module in parallel; the y-th primary feature map is first passed through 5 second convolution blocks Conv1×1 to adjust its channels, giving 5 output feature maps; the latter 4 output feature maps are fed into the 4 receptive field enhancement branches respectively to obtain 4 receptive-field-enhanced branch feature maps, which are concatenated along the channel dimension with the first output feature map to give the y-th fused feature map; this fused feature map is passed through a further second convolution block Conv1×1 to adjust its channels, giving the adjusted y-th feature map, which is added to the y-th primary feature map and processed by the ReLU activation function to obtain the finally output y-th receptive-field-enhanced feature map; in this way Y-1 receptive-field-enhanced feature maps are obtained;
Step 2.4: the multi-scale feature interaction module consists of a preprocessing module, a multi-scale feature pyramid module and a third convolution block Conv3×3; it processes the p-th coarse positioning decoder feature and the p-th initial decision map stage by stage to obtain, for the 1st to (Y-1)-th receptive-field-enhanced feature maps, the corresponding p-th series of multi-scale interaction feature maps and the p-th series of decision maps, whose k-th elements are the k-th-stage feature map and the k-th-stage decision map respectively;
Step 2.5: the p-th down-sampled decision map is up-sampled to obtain the 1st fused decision map, which is added stage by stage to the p-th series of decision maps to obtain the p-th series of fused decision maps, the k-th of which is the k-th fused decision map of the p-th pair of gray-scale images;
Step 2.6: the p-th series of fused decision maps is up-sampled and processed by the Sigmoid activation function to obtain the p-th multi-level output decision maps of the p-th pair of gray-scale images, the last of which is taken as the final decision map of the p-th pair of gray-scale images;
Step 3: construct the loss function using formula (1):
L = L_wBCE + L_wIoU   (1)
In formula (1), L_wBCE denotes the weighted binary cross-entropy loss and L_wIoU denotes the weighted intersection-over-union loss;
construct the total loss function L_total of the multi-scale context-aware network using formula (2);
Step 4: based on the training set, train the multi-scale context-aware network with the back-propagation algorithm, computing the total loss function L_total to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale context-aware network;
Step 5: invert the final decision map to obtain the reverse decision map; multiply the final decision map and the reverse decision map pixel by pixel with the two gray-scale images of the p-th pair respectively to obtain two partially sharp images; add the two partially sharp images pixel by pixel to obtain the fused image of the p-th pair of gray-scale images.
The multi-focus image fusion method based on multi-scale context awareness of the invention is also characterized in that the step 2.2 comprises:
Step 2.2.1: when r=1, at the M_r-th stage the coarse positioning decoder performs two up-sampling operations with different weights on the Y-th primary feature map so that the two results have the same size, giving the r-th and (r+1)-th up-sampled feature maps; these are fed into the r-th and (r+1)-th second convolution blocks Conv3×3 respectively to obtain the r-th and (r+1)-th feature maps; the two feature maps are multiplied, concatenated along the channel dimension with the feature map of the corresponding scale, and then passed in turn through the (r+2)-th and (r+3)-th second convolution blocks Conv3×3 to obtain the output feature of the M_r-th stage;
When r=2, at the M_r-th stage the coarse positioning decoder up-samples the Y-th primary feature map and the (Y-1)-th primary feature map separately so that both results have the same size, giving the (r+1)-th and (r+2)-th up-sampled feature maps; these are fed into the (r+3)-th and (r+4)-th second convolution blocks Conv3×3 respectively to obtain the (r+1)-th and (r+2)-th feature maps; the two feature maps are multiplied, concatenated along the channel dimension with the feature map of the corresponding scale, and then passed in turn through the (r+5)-th and (r+6)-th second convolution blocks Conv3×3 to obtain the output feature of the M_r-th stage;
When r=3, 4, …, R-1, at the M_r-th stage the coarse positioning decoder applies the same processing to the Y-th down to the (Y-r+1)-th primary feature maps, giving the output features of the 3rd to (R-1)-th stages;
When r=R, the output feature of the (R-1)-th stage is passed through the last 2 second convolution blocks Conv3×3 of the R-th stage to obtain the output feature map, i.e. the p-th coarse positioning decoder feature map finally output by the coarse positioning decoder; this feature map is input into the first convolution block Conv1×1 to obtain the p-th initial decision map.
The multi-scale feature pyramid module in step 2.4 is composed of 4 multi-scale feature extraction branches with the same structure but different parameters k and r, 5 third convolution blocks Conv1×1 and one ReLU activation function, where each multi-scale feature extraction branch is a sequential stack of one symmetric convolution block Convk×k and one second stride convolution block Convk×k,r, and Convk×k denotes one symmetric convolution layer with a k×k kernel followed by one ReLU activation function;
Step 2.4.0: define the current stage as k and initialize k=1; take the p-th initial decision map as the p-th decision map of stage k-1;
Step 2.4.1: the preprocessing module down-samples the p-th decision map of stage k-1 so that its size matches that of the receptive-field-enhanced feature map used at this stage, giving the p-th down-sampled decision map of stage k-1, and then applies the Sigmoid activation function to obtain the p-th weight map of stage k-1; meanwhile, the p-th coarse positioning decoder feature is down-sampled in the same way to obtain the p-th feature map;
Step 2.4.2: the preprocessing module subtracts the p-th weight map from the constant "1" to obtain the p-th reverse weight map of stage k-1; the weight map and the reverse weight map are then multiplied with the p-th feature map respectively to obtain the p-th forward feature map and the p-th reverse feature map of stage k-1;
Step 2.4.3: the multi-scale feature pyramid module processes the forward feature map and the reverse feature map to obtain two output feature maps, which are short-connected with the p-th feature map respectively to obtain the p-th forward short feature map and the p-th reverse short feature map;
Step 2.4.3.1: the input feature map is passed through 4 third convolution blocks Conv1×1 to adjust its channels, giving the 4 channel-adjusted output feature maps of stage k-1;
Step 2.4.3.2: the 4 channel-adjusted output feature maps of stage k-1 are fed into the 4 multi-scale feature extraction branches respectively; after the symmetric convolution block Convk×k, the 4 symmetric convolution feature maps of stage k-1 are obtained, and after the second stride convolution block Convk×k,r, the 4 stride convolution feature maps of stage k-1 are obtained;
Step 2.4.3.3: following the output order of the 4 multi-scale feature extraction branches, the 4 channel-adjusted output feature maps, the 4 symmetric convolution feature maps and the 4 stride convolution feature maps of stage k-1 are concatenated along the channel dimension to obtain the 4 multi-scale fused feature maps of stage k-1;
Step 2.4.3.4: each of the 4 multi-scale fused feature maps of stage k-1 is superposed with the multi-scale fused feature maps preceding it, giving the 4 superposed feature maps of stage k-1;
Step 2.4.3.5: the 4 superposed feature maps of stage k-1 are concatenated again along the channel dimension to obtain one concatenated feature map of stage k-1, which is passed through the 5th third convolution block Conv1×1 to adjust its channels, giving the adjusted feature map of stage k-1;
Step 2.4.3.6: the adjusted feature map is added to the forward feature map of stage k-1 and processed by the ReLU activation function to obtain the p-th multi-scale feature pyramid forward feature map;
Step 2.4.3.7: the reverse feature map is processed in the same way as in steps 2.4.3.1 to 2.4.3.6 to obtain the p-th multi-scale feature pyramid reverse feature map;
Step 2.4.4: the p-th forward short feature map and the p-th reverse short feature map are multiplied by the two self-learning parameters "alpha" and "beta" respectively to obtain the p-th pair of self-learning feature maps;
Step 2.4.5: the p-th feature map is passed through a third convolution block Conv3×3 and an up-sampling operation to obtain the p-th up-sampled feature map;
Step 2.4.6: one of the p-th self-learning feature maps and the p-th up-sampled feature map are subtracted to obtain the p-th fused feature F_p;
Step 2.4.7: the p-th fused feature F_p and the other p-th self-learning feature map are added to obtain the p-th multi-scale feature interaction feature map of stage k;
Step 2.4.8: judge whether k=K holds; if so, the p-th series of multi-scale interaction feature maps of the K stages corresponding to the 1st to (Y-1)-th receptive-field-enhanced feature maps and the p-th series of decision maps of the K stages are obtained; otherwise, go to step 2.4.9;
Step 2.4.9: the multi-scale feature interaction feature map of stage k is passed through 3 third convolution blocks Conv3×3 to obtain the p-th decision map of stage k;
Step 2.4.10: the multi-scale feature interaction feature map of stage k and the p-th decision map D_k^p of stage k are up-sampled separately so that their sizes match that of the next receptive-field-enhanced feature map, giving the p-th up-sampled decision map and the p-th up-sampled feature map of stage k; these, together with the receptive-field-enhanced feature map, are input into the multi-scale feature interaction module, k+1 is assigned to k, and the procedure returns to step 2.4.1 for sequential execution.
The electronic device of the invention comprises a memory and a processor, wherein the memory is used for storing a program for supporting the processor to execute the multi-focus image fusion method, and the processor is configured to execute the program stored in the memory.
The invention relates to a computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, performs the steps of the multi-focus image fusion method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a unified network framework for the multi-focus image fusion task, which makes full use of the redundant and complementary information between the source images to fuse a high-quality image. Compared with the prior art, the method simulates the mechanism of the human visual system by learning multi-scale context-aware features that combine local and global information, which better guides the learning of the network and thereby yields image fusion results of higher quality.
2. The invention designs multi-scale features that combine CNN and Transformer, which effectively extract local and global features; by introducing a Transformer model, long-range dependencies are established and multi-scale information in the image is extracted, while the receptive field enhancement module strengthens the shallow feature information; this gives features of different scales stronger semantic information, and integrating low-level details with high-level semantic information brings better detail expression to the fusion result.
3. The invention designs a coarse positioning decoder that progressively aggregates the multi-scale features extracted by the Transformer backbone network; the aggregated features contain context information and yield coarse features and a decision map that guide the subsequent steps. In addition, the invention designs a multi-scale feature interaction module that attends to the focused region and the defocused region simultaneously and promotes interaction between the two kinds of information, so as to enrich local details and suppress misclassified regions. The multi-scale feature interaction module realizes the fusion and interaction of multi-scale information between the shallow features and the decoder features, refines the coarse features and decision map obtained from the decoder, guides the network to learn detail features better, achieves more accurate defocused-region detection, and enhances the quality of the fused image.
Drawings
FIG. 1 is a flow chart of a multi-scale context aware multi-focus image fusion method of the present invention;
FIG. 2 is a schematic diagram of a network architecture according to the present invention;
FIG. 3 is a schematic diagram of a fusion structure according to the present invention;
FIG. 4 is a schematic diagram of a coarse positioning decoder according to the present invention;
FIG. 5 is a schematic view of a receptive field enhancement module of the invention;
FIG. 6a is a schematic diagram of a multi-scale feature interaction module structure of the present invention;
FIG. 6b is a schematic diagram of a multi-scale feature pyramid module architecture of the present invention.
Detailed Description
In this embodiment, a multi-focus image fusion method based on multi-scale context awareness, as shown in fig. 1, includes the following steps:
Step 1: acquire P pairs of RGB multi-focus images and convert them into gray-scale images, which serve as the training set; the p-th pair of gray-scale images consists of a foreground-focused image and a background-focused image; take the ground-truth mask corresponding to the p-th pair of gray-scale images as the p-th label, denoted G_p, thereby constructing the label set {G_1, G_2, …, G_p, …, G_P} of the P pairs of RGB multi-focus images;
Step 2: construct a multi-scale context-aware network as shown in fig. 2, comprising an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module; the S and + in the circular symbols of fig. 2 represent Sigmoid activation processing and pixel-level addition respectively;
Step 2.1: the encoder comprises one first convolution block Conv3×3 for adjusting the number of channels and Y vision transformers, where Conv3×3 denotes one convolution layer with a 3×3 kernel followed by one ReLU activation function;
the p-th pair of gray-scale images is concatenated along the channel dimension and input into the multi-scale context-aware network; the first convolution block Conv3×3 of the encoder produces the p-th input feature I_p, which is then processed by the Y vision transformers in turn to obtain the Y primary feature maps corresponding to the p-th pair of gray-scale images, the y-th of which is the y-th primary feature map;
In this embodiment, as shown in fig. 2, Y=4; the p-th pair of gray-scale images is a pair of 512×512×1 images, which are concatenated along the channel dimension and input into the multi-scale context-aware network; the first convolution block Conv3×3 adjusts the number of channels to give the p-th input feature map I_p of size 512×512×3; after processing by the 4 vision transformers, the primary deep feature maps have sizes 128×128×64, 64×64×128, 32×32×320 and 16×16×512 respectively, where each vision transformer is a standard vision transformer;
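As a reading aid, the following is a minimal PyTorch sketch of the encoder front end of step 2.1, not the patented implementation itself. The 3×3 stem convolution and the four output sizes follow the embodiment; the `backbone` argument stands in for the "standard vision transformer", which the patent does not name (the 64/128/320/512 channel progression resembles PVT-style hierarchical transformers, but that reading is an assumption).

```python
import torch
import torch.nn as nn

class EncoderStem(nn.Module):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        # first convolution block Conv3x3: adjusts the 2-channel concatenated
        # gray pair to the 3 channels expected by the transformer backbone
        self.stem = nn.Sequential(
            nn.Conv2d(2, 3, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.backbone = backbone  # assumed to return a list of 4 primary feature maps

    def forward(self, fg_gray, bg_gray):
        # fg_gray, bg_gray: (B, 1, 512, 512) foreground/background-focused pair
        x = torch.cat([fg_gray, bg_gray], dim=1)   # (B, 2, 512, 512)
        x = self.stem(x)                           # (B, 3, 512, 512)
        feats = self.backbone(x)
        # expected sizes per the embodiment:
        # feats[0]: (B, 64, 128, 128), feats[1]: (B, 128, 64, 64),
        # feats[2]: (B, 320, 32, 32),  feats[3]: (B, 512, 16, 16)
        return feats
```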
Step 2.2: the coarse positioning decoder is formed by several second convolution blocks Conv3×3 connected across scales in multiple stages and one first convolution block Conv1×1; it decodes the Y primary feature maps over M_R stages to obtain the p-th coarse positioning decoder feature and the p-th initial decision map, where Conv1×1 denotes one convolution layer with a 1×1 kernel;
In the present embodiment, as shown in fig. 4, M_R=4; the × and C in the circular symbols represent pixel-level multiplication and concatenation along the channel dimension respectively;
Step 2.2.1: when r=1, at the M_r-th stage the coarse positioning decoder performs two up-sampling operations with different weights on the Y-th primary feature map so that the two results have the same size, giving the r-th and (r+1)-th up-sampled feature maps; these are fed into the r-th and (r+1)-th second convolution blocks Conv3×3 respectively to obtain the r-th and (r+1)-th feature maps; the two feature maps are multiplied, concatenated along the channel dimension with the feature map of the corresponding scale, and then passed in turn through the (r+2)-th and (r+3)-th second convolution blocks Conv3×3 to obtain the output feature of the M_r-th stage;
When r=2, at the M_r-th stage the coarse positioning decoder up-samples the Y-th primary feature map and the (Y-1)-th primary feature map separately so that both results have the same size, giving the (r+1)-th and (r+2)-th up-sampled feature maps; these are fed into the (r+3)-th and (r+4)-th second convolution blocks Conv3×3 respectively to obtain the (r+1)-th and (r+2)-th feature maps; the two feature maps are multiplied, concatenated along the channel dimension with the feature map of the corresponding scale, and then passed in turn through the (r+5)-th and (r+6)-th second convolution blocks Conv3×3 to obtain the output feature of the M_r-th stage;
When r=3, 4, …, R-1, at the M_r-th stage the coarse positioning decoder applies the same processing to the Y-th down to the (Y-r+1)-th primary feature maps, giving the output features of the 3rd to (R-1)-th stages;
When r=R, the output feature of the (R-1)-th stage is passed through the last 2 second convolution blocks Conv3×3 of the R-th stage to obtain the output feature map, i.e. the p-th coarse positioning decoder feature map finally output by the coarse positioning decoder; this feature map is input into the first convolution block Conv1×1 to obtain the p-th initial decision map;
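The sketch below illustrates one decoding stage of the kind described in step 2.2.1. The exact cross-scale wiring follows fig. 4 and cannot be fully recovered from the text, so the pairing of the deep and shallow inputs is an assumption; only the named operations (two weighted up-samplings, 3×3 convolution blocks, element-wise multiplication, channel concatenation, and two further convolution blocks) are taken from the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, k=3):
    # one "second convolution block": k x k convolution followed by ReLU
    return nn.Sequential(nn.Conv2d(cin, cout, k, padding=k // 2), nn.ReLU(inplace=True))

class CoarseDecoderStage(nn.Module):
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.up_a = conv_block(deep_ch, shallow_ch)   # first "weighted" up-sampling path
        self.up_b = conv_block(deep_ch, shallow_ch)   # second "weighted" up-sampling path
        self.fuse = nn.Sequential(conv_block(2 * shallow_ch, out_ch),
                                  conv_block(out_ch, out_ch))

    def forward(self, deep, shallow):
        size = shallow.shape[-2:]
        a = self.up_a(F.interpolate(deep, size=size, mode='bilinear', align_corners=False))
        b = self.up_b(F.interpolate(deep, size=size, mode='bilinear', align_corners=False))
        gated = a * b                            # element-wise multiplication
        x = torch.cat([gated, shallow], dim=1)   # concatenate with the shallower feature
        return self.fuse(x)

# In the embodiment, the final stage would add two more Conv3x3 blocks and a
# Conv1x1 head mapping the 128x128x64 coarse feature to the 128x128x1 initial
# decision map.
```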
As shown in the structure diagram of fig. 2, the primary deep feature maps are input into the coarse positioning decoder to obtain the coarse positioning feature map and the initial decision map, whose sizes are 128×128×64 and 128×128×1 respectively;
Step 2.3: the receptive field enhancement module consists of 4 receptive field enhancement branches with the same structure but different parameters k and r, 5 second convolution blocks Conv1×1 and one ReLU activation function, where each receptive field enhancement branch is a sequential stack of one asymmetric convolution block Conv1×k, one asymmetric convolution block Convk×1 and one first stride convolution block Convk×k,r; Conv1×k denotes one asymmetric convolution layer with a 1×k kernel, Convk×1 denotes one asymmetric convolution layer with a k×1 kernel, and Convk×k,r denotes one symmetric convolution layer with a k×k kernel and step length r;
In this embodiment, as shown in fig. 5, the parameters k and r of the 4 receptive field enhancement branches are {3, 5, 7, 9}; the + and C in the circular symbols represent pixel-level addition and concatenation along the channel dimension respectively;
The first Y-1 primary feature maps are input to the receptive field enhancement module in parallel; the y-th primary feature map is first passed through 5 second convolution blocks Conv1×1 to adjust its channels, giving 5 output feature maps; the latter 4 output feature maps are fed into the 4 receptive field enhancement branches respectively to obtain 4 receptive-field-enhanced branch feature maps, which are concatenated along the channel dimension with the first output feature map to give the y-th fused feature map; this fused feature map is passed through a further second convolution block Conv1×1 to adjust its channels, giving the adjusted y-th feature map, which is added to the y-th primary feature map and processed by the ReLU activation function to obtain the finally output y-th receptive-field-enhanced feature map; in this way Y-1 receptive-field-enhanced feature maps are obtained;
As shown in the structure diagram of fig. 2, the primary shallow feature maps are input to the receptive field enhancement module in parallel, and the enhanced feature maps have sizes 128×128×64, 64×64×128 and 32×32×320 respectively;
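A hedged sketch of one receptive field enhancement branch and the surrounding module of step 2.3 follows, with k = 3, 5, 7, 9 as in the embodiment. The text calls r a step length, but here it is interpreted as a dilation rate so that the spatial size is preserved for the residual addition; that interpretation is an assumption.

```python
import torch
import torch.nn as nn

class RFEBranch(nn.Module):
    def __init__(self, ch, k, r):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2)),          # asymmetric 1xk
            nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0)),          # asymmetric kx1
            nn.Conv2d(ch, ch, k, padding=r * (k // 2), dilation=r),  # kxk with rate r
        )

    def forward(self, x):
        return self.body(x)

class ReceptiveFieldEnhancement(nn.Module):
    def __init__(self, in_ch, ch, ks=(3, 5, 7, 9), rs=(3, 5, 7, 9)):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(in_ch, ch, 1) for _ in range(5)])
        self.branches = nn.ModuleList([RFEBranch(ch, k, r) for k, r in zip(ks, rs)])
        self.adjust = nn.Conv2d(5 * ch, in_ch, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        outs = [conv(x) for conv in self.reduce]           # 5 channel-adjusted copies
        enhanced = [b(o) for b, o in zip(self.branches, outs[1:])]
        fused = torch.cat([outs[0]] + enhanced, dim=1)     # concat with the first copy
        return self.relu(x + self.adjust(fused))           # residual addition + ReLU
```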
Step 2.4: the multi-scale feature interaction module consists of a preprocessing module, a multi-scale feature pyramid module and a third convolution block Conv3×3; the multi-scale feature pyramid module is composed of 4 multi-scale feature extraction branches with the same structure but different parameters k and r, 5 third convolution blocks Conv1×1 and one ReLU activation function, where each multi-scale feature extraction branch is a sequential stack of one symmetric convolution block Convk×k and one second stride convolution block Convk×k,r, and Convk×k denotes one symmetric convolution layer with a k×k kernel followed by one ReLU activation function;
In this embodiment, as shown in figs. 6a and 6b, the +, - and × in the circular symbols of fig. 6a represent pixel-level addition, pixel-level subtraction and pixel-level multiplication respectively, and the 1, α and β in the square symbols represent the constant "1" and the two self-learning parameters "α" and "β"; the parameters k and r of the 4 multi-scale feature extraction branches in fig. 6b are {1, 3, 5, 7} and {1, 2, 4, 8} respectively, and the + and C in the circular symbols represent pixel-level addition and concatenation along the channel dimension respectively;
Step 2.4.0: define the current stage as k and initialize k=1; take the p-th initial decision map as the p-th decision map of stage k-1;
Step 2.4.1: the preprocessing module down-samples the p-th decision map of stage k-1 so that its size matches that of the receptive-field-enhanced feature map used at this stage, giving the p-th down-sampled decision map, and then applies the Sigmoid activation function to obtain the p-th weight map; meanwhile, the p-th coarse positioning decoder feature is down-sampled in the same way to obtain the p-th feature map;
Step 2.4.2: the preprocessing module subtracts the p-th weight map from the constant "1" to obtain the p-th reverse weight map; the weight map and the reverse weight map are then multiplied with the p-th feature map respectively to obtain the p-th forward feature map and the p-th reverse feature map;
Step 2.4.3: multiple scale feature pyramid module pairsAnd->After processing, an output characteristic diagram is obtained>And->Then respectively and p-th characteristic diagram->After short ligation, the p-th forward short feature map +.>And p-th inverse short profile->
Step 2.4.3.1:through 4 third convolution blocks Conv 1×1 After the adjustment of the channels, 4 channel output profiles are obtained +.>
Step 2.4.3.2: 4-channel output characteristic diagramRespectively inputting into 4 multi-scale feature extraction branches, and performing a symmetric convolution block Conv k×k Obtaining 4 symmetrical convolution characteristic graphsThen pass through a second stride convolution block Conv k×k,r After the processing of (2), 4 stride convolution feature maps are obtained, respectively + ->
Step 2.4.3.3: outputting characteristic diagram of 4 channels4 symmetrical convolution characteristic diagrams->And 4 stride convolution profiles +.>After being spliced in the channel dimension along the output sequence of the 4 multi-scale feature extraction branches, 4 multi-scale fusion feature graphs are correspondingly obtained
Step 2.4.3.4: 4 multiscale fusion feature mapsRespectively superposing the first plurality of multi-scale fusion features to correspondingly obtain 4 superposition feature graphs
Step 2.4.3.5: after the 4 overlapped feature images are spliced in the channel dimension again, 1 spliced feature image is obtainedWill->Input 5 th third convolution block Conv 1×1 After the channel is adjusted, an adjusted characteristic diagram +.>
Step 2.4.3.6: will beAnd->After addition, the p-th multi-scale feature pyramid forward feature map ++is obtained after the treatment of the ReLU activation function>
Step 2.4.3.7: the characteristic diagram is mapped according to the process from step 2.4.3.1 to step 2.4.3.6After the same treatment, the p-th multi-scale characteristic pyramid inverse characteristic diagram +.>
Step 2.4.4: p-th forward short feature map featureAnd the p-th inverse short feature mapAfter multiplying two self-learning parameters 'alpha' and 'beta', the p-th self-learning characteristic diagram pair +_>And->
Step 2.4.5: will p-th feature mapThrough a third convolution block Conv 3×3 After the up-sampling operation, the p up-sampling characteristic diagram is obtained>
Step 2.4.6: map the p-th self-learning featureAnd p-th upsampling feature->After subtraction, the p-th fusion characteristic F is obtained p
Step 2.4.7: fusing feature F with p pAnd p-th self-learning feature map->After addition, the p-th multiscale characteristic interaction characteristic diagram is obtained>
Step 2.4.8: judging whether k=k is satisfied, if so, obtaining a characteristic diagram with 1-Y-1 receptive field enhancementCorresponding p-th series multiscale interaction feature map +.>Decision map with p-th series->Wherein (1)>A characteristic diagram showing the enhancement of the y-th receptive field,representing the kth stage feature map in the p-th series of multiscale interaction feature maps, ++>Representing a kth phase decision graph in the p-th series of decision graphs; otherwise, go to step 2.4.9; in this embodiment, k=3;
step 2.4.9: will beAfter 3 times of third convolution block Conv 3×3 After the treatment of (a) a p-th and k-th stage decision diagram is obtained>
Step 2.4.10: will beDecision map +.>The up-sampling operations are performed separately so thatAnd->Is the same in size and correspondingly gets the p-th up-sampling decision diagram +.>And p-th upsampling feature map->And with the characteristic pattern of receptive field enhancement->Inputting the k+1 values into a multi-scale feature interaction module together, and returning to the step 2.4.1 for sequential execution after assigning k;
as shown in the structure diagram of FIG. 2, the p-th coarse positioning feature mapAnd p-th initial decision diagram->And enhanced feature map->Step by step, inputting the p series multi-scale interaction features into a multi-scale feature interaction module to obtain p series multi-scale interaction features ∈10>The sizes are 128×128×64, 64×64×128, 32×32×320, and p-th series decision diagram>The sizes are 128×128×1, 64×64×1, 32×32×1 respectively;
Step 2.5: the p-th down-sampled decision map is up-sampled to obtain the 1st fused decision map, which is added stage by stage to the p-th series of decision maps to obtain the p-th series of fused decision maps, the k-th of which is the k-th fused decision map of the p-th pair of gray-scale images;
As shown in the structure diagram of fig. 2, the p-th initial decision map is processed by up-sampling to obtain the 1st fused decision map, which is added stage by stage to the p-th series of decision maps to obtain the p-th series of fused decision maps, with sizes 32×32×1, 64×64×1 and 128×128×1 respectively;
Step 2.6: the p-th series of fused decision maps is up-sampled and processed by the Sigmoid activation function to obtain the p-th multi-level output decision maps of the p-th pair of gray-scale images, the last of which is taken as the final decision map of the p-th pair of gray-scale images;
As shown in the structure diagram of fig. 2, the p-th series of fused decision maps is up-sampled so that its size stays consistent with that of the input images, namely 512×512×1, and then processed by the Sigmoid activation function to obtain the p-th multi-level output decision maps, among which the last is the final decision map of the p-th pair of gray-scale images;
Step 3: construct the loss function using formula (1):
L = L_wBCE + L_wIoU   (1)
In formula (1), L_wBCE denotes the weighted binary cross-entropy loss and L_wIoU denotes the weighted intersection-over-union loss;
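For orientation, the sketch below shows one common realisation of a weighted BCE plus weighted IoU loss of the kind named in formula (1). The per-pixel weighting scheme is not specified in the patent; the boundary-emphasising weight used here is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_bce_iou_loss(logits, gt):
    # logits: (B, 1, H, W) raw decision-map scores; gt: (B, 1, H, W) binary ground-truth mask
    # assumed weighting: pixels whose neighbourhood disagrees with the mask get larger weight
    weight = 1.0 + 5.0 * torch.abs(
        F.avg_pool2d(gt, kernel_size=31, stride=1, padding=15) - gt)
    bce = F.binary_cross_entropy_with_logits(logits, gt, reduction='none')
    wbce = (weight * bce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))
    prob = torch.sigmoid(logits)
    inter = (weight * prob * gt).sum(dim=(2, 3))
    union = (weight * (prob + gt)).sum(dim=(2, 3))
    wiou = 1.0 - (inter + 1.0) / (union - inter + 1.0)
    return (wbce + wiou).mean()

# The total loss L_total of formula (2) would accumulate this loss over the
# multi-level output decision maps against the label G_p; its exact weights are
# not stated in the text.
```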
construct the total loss function L_total of the multi-scale context-aware network using formula (2);
Step 4: based on the training set, training the multi-scale context awareness network by adopting a back propagation algorithm, and calculating a total loss function L total The network parameters are adjusted until the maximum iteration times are reached, so that a trained multi-scale context sensing network is obtained, and in the embodiment, an Adam optimizer is adopted to carry out optimization solution on the total loss;
Step 5: invert the final decision map to obtain the reverse decision map; multiply the final decision map and the reverse decision map pixel by pixel with the two gray-scale images of the p-th pair respectively to obtain two partially sharp images; add the two partially sharp images pixel by pixel to obtain the fused image of the p-th pair of gray-scale images.
In this embodiment, as shown in the structure diagram of fig. 3, x and + in the circular symbols are shown as pixel level multiplication processing and pixel level addition processing, respectively; using final decision graphsAfter the inversion, a reverse decision diagram is obtained>The final decision diagram is respectively->And->Reverse directionDecision diagram->And->After pixel-by-pixel multiplication, a partially sharp image is obtained>And->Partial clear image +.>And->Pixel-by-pixel addition is performed to obtain a p-th pair of gray-scale images +>And->Is +.>
In this embodiment, an electronic device includes a memory for storing a program supporting the processor to execute the above method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the method described above.

Claims (5)

1. A multi-focus image fusion method based on multi-scale context awareness, characterized by comprising the following steps:
Step 1: acquire P pairs of RGB multi-focus images and convert them into gray-scale images, which serve as the training set; the p-th pair of gray-scale images consists of a foreground-focused image and a background-focused image; take the ground-truth mask corresponding to the p-th pair of gray-scale images as the p-th label, denoted G_p, thereby constructing the label set {G_1, G_2, …, G_p, …, G_P} of the P pairs of RGB multi-focus images;
Step 2: construct a multi-scale context-aware network comprising an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module;
Step 2.1: the encoder comprises one first convolution block Conv3×3 for adjusting the number of channels and Y vision transformers, where Conv3×3 denotes one convolution layer with a 3×3 kernel followed by one ReLU activation function;
the p-th pair of gray-scale images is concatenated along the channel dimension and input into the multi-scale context-aware network; the first convolution block Conv3×3 of the encoder produces the p-th input feature I_p, which is then processed by the Y vision transformers in turn to obtain the Y primary feature maps corresponding to the p-th pair of gray-scale images, the y-th of which is the y-th primary feature map;
Step 2.2: the coarse positioning decoder consists of several second convolution blocks Conv3×3 connected across scales in multiple stages and one first convolution block Conv1×1; it decodes the Y primary feature maps over M_R stages to obtain the p-th coarse positioning decoder feature and the p-th initial decision map, where Conv1×1 denotes one convolution layer with a 1×1 kernel;
Step 2.3: the receptive field enhancement module consists of 4 receptive field enhancement branches with the same structure but different parameters k and r, 5 second convolution blocks Conv1×1 and one ReLU activation function, where each receptive field enhancement branch is a sequential stack of one asymmetric convolution block Conv1×k, one asymmetric convolution block Convk×1 and one first stride convolution block Convk×k,r; Conv1×k denotes one asymmetric convolution layer with a 1×k kernel, Convk×1 denotes one asymmetric convolution layer with a k×1 kernel, and Convk×k,r denotes one symmetric convolution layer with a k×k kernel and step length r;
the first Y-1 primary feature maps are input to the receptive field enhancement module in parallel; the y-th primary feature map is first passed through 5 second convolution blocks Conv1×1 to adjust its channels, giving 5 output feature maps; the latter 4 output feature maps are fed into the 4 receptive field enhancement branches respectively to obtain 4 receptive-field-enhanced branch feature maps, which are concatenated along the channel dimension with the first output feature map to give the y-th fused feature map; this fused feature map is passed through a further second convolution block Conv1×1 to adjust its channels, giving the adjusted y-th feature map, which is added to the y-th primary feature map and processed by the ReLU activation function to obtain the finally output y-th receptive-field-enhanced feature map; in this way Y-1 receptive-field-enhanced feature maps are obtained;
Step 2.4: the multi-scale feature interaction module consists of a preprocessing module, a multi-scale feature pyramid module and a third convolution block Conv3×3; it processes the p-th coarse positioning decoder feature and the p-th initial decision map stage by stage to obtain, for the 1st to (Y-1)-th receptive-field-enhanced feature maps, the corresponding p-th series of multi-scale interaction feature maps and the p-th series of decision maps, whose k-th elements are the k-th-stage feature map and the k-th-stage decision map respectively;
Step 2.5: the p-th down-sampled decision map is up-sampled to obtain the 1st fused decision map, which is added stage by stage to the p-th series of decision maps to obtain the p-th series of fused decision maps, the k-th of which is the k-th fused decision map of the p-th pair of gray-scale images;
Step 2.6: the p-th series of fused decision maps is up-sampled and processed by the Sigmoid activation function to obtain the p-th multi-level output decision maps of the p-th pair of gray-scale images, the last of which is taken as the final decision map of the p-th pair of gray-scale images;
Step 3: construct the loss function using formula (1):
L = L_wBCE + L_wIoU   (1)
In formula (1), L_wBCE denotes the weighted binary cross-entropy loss and L_wIoU denotes the weighted intersection-over-union loss;
construct the total loss function L_total of the multi-scale context-aware network using formula (2);
Step 4: based on the training set, train the multi-scale context-aware network with the back-propagation algorithm, computing the total loss function L_total to adjust the network parameters until the maximum number of iterations is reached, thereby obtaining the trained multi-scale context-aware network;
Step 5: invert the final decision map to obtain the reverse decision map; multiply the final decision map and the reverse decision map pixel by pixel with the two gray-scale images of the p-th pair respectively to obtain two partially sharp images; add the two partially sharp images pixel by pixel to obtain the fused image of the p-th pair of gray-scale images.
2. The multi-focus image fusion method based on multi-scale context awareness according to claim 1, wherein said step 2.2 comprises:
Step 2.2.1: when r=1, at the M_r-th stage the coarse positioning decoder performs two up-sampling operations with different weights on the Y-th primary feature map so that the two results have the same size, giving the r-th and (r+1)-th up-sampled feature maps; these are fed into the r-th and (r+1)-th second convolution blocks Conv3×3 respectively to obtain the r-th and (r+1)-th feature maps; the two feature maps are multiplied, concatenated along the channel dimension with the feature map of the corresponding scale, and then passed in turn through the (r+2)-th and (r+3)-th second convolution blocks Conv3×3 to obtain the output feature of the M_r-th stage;
When r=2, at the M_r-th stage the coarse positioning decoder up-samples the Y-th primary feature map and the (Y-1)-th primary feature map separately so that both results have the same size, giving the (r+1)-th and (r+2)-th up-sampled feature maps; these are fed into the (r+3)-th and (r+4)-th second convolution blocks Conv3×3 respectively to obtain the (r+1)-th and (r+2)-th feature maps; the two feature maps are multiplied, concatenated along the channel dimension with the feature map of the corresponding scale, and then passed in turn through the (r+5)-th and (r+6)-th second convolution blocks Conv3×3 to obtain the output feature of the M_r-th stage;
When r=3, 4, …, R-1, at the M_r-th stage the coarse positioning decoder applies the same processing to the Y-th down to the (Y-r+1)-th primary feature maps, giving the output features of the 3rd to (R-1)-th stages;
When r=R, the output feature of the (R-1)-th stage is passed through the last 2 second convolution blocks Conv3×3 of the R-th stage to obtain the output feature map, i.e. the p-th coarse positioning decoder feature map finally output by the coarse positioning decoder; this feature map is input into the first convolution block Conv1×1 to obtain the p-th initial decision map.
3. The multi-focus image fusion method based on multi-scale context awareness according to claim 2, wherein the multi-scale feature pyramid module in step 2.4 is composed of 4 multi-scale feature extraction branches with the same structure but different parameters k and r, 5 third convolution blocks Conv1×1 and one ReLU activation function, where each multi-scale feature extraction branch is a sequential stack of one symmetric convolution block Convk×k and one second stride convolution block Convk×k,r, and Convk×k denotes one symmetric convolution layer with a k×k kernel followed by one ReLU activation function;
Step 2.4.0: define the current stage as k and initialize k=1; take the p-th initial decision map as the p-th decision map of stage k-1;
Step 2.4.1: the preprocessing module down-samples the p-th decision map of stage k-1 so that its size matches that of the receptive-field-enhanced feature map used at this stage, giving the p-th down-sampled decision map of stage k-1, and then applies the Sigmoid activation function to obtain the p-th weight map of stage k-1; meanwhile, the p-th coarse positioning decoder feature is down-sampled in the same way to obtain the p-th feature map;
Step 2.4.2: the preprocessing module subtracts the p-th weight map from the constant "1" to obtain the p-th reverse weight map of stage k-1; the weight map and the reverse weight map are then multiplied with the p-th feature map respectively to obtain the p-th forward feature map and the p-th reverse feature map of stage k-1;
Step 2.4.3: the multi-scale feature pyramid module processes the forward feature map and the reverse feature map to obtain two output feature maps, which are short-connected with the p-th feature map respectively to obtain the p-th forward short feature map and the p-th reverse short feature map;
Step 2.4.3.1: the input feature map is passed through 4 third convolution blocks Conv1×1 to adjust its channels, giving the 4 channel-adjusted output feature maps of stage k-1;
Step 2.4.3.2: the 4 channel-adjusted output feature maps of stage k-1 are fed into the 4 multi-scale feature extraction branches respectively; after the symmetric convolution block Convk×k, the 4 symmetric convolution feature maps of stage k-1 are obtained, and after the second stride convolution block Convk×k,r, the 4 stride convolution feature maps of stage k-1 are obtained;
Step 2.4.3.3: following the output order of the 4 multi-scale feature extraction branches, the 4 channel-adjusted output feature maps, the 4 symmetric convolution feature maps and the 4 stride convolution feature maps of stage k-1 are concatenated along the channel dimension to obtain the 4 multi-scale fused feature maps of stage k-1;
Step 2.4.3.4: each of the 4 multi-scale fused feature maps of stage k-1 is superposed with the multi-scale fused feature maps preceding it, giving the 4 superposed feature maps of stage k-1;
Step 2.4.3.5: the 4 superposed feature maps of stage k-1 are concatenated again along the channel dimension to obtain one concatenated feature map of stage k-1, which is passed through the 5th third convolution block Conv1×1 to adjust its channels, giving the adjusted feature map of stage k-1;
Step 2.4.3.6: the adjusted feature map is added to the forward feature map of stage k-1 and processed by the ReLU activation function to obtain the p-th multi-scale feature pyramid forward feature map;
Step 2.4.3.7: the reverse feature map is processed in the same way as in steps 2.4.3.1 to 2.4.3.6 to obtain the p-th multi-scale feature pyramid reverse feature map;
Step 2.4.4: the p-th forward short feature map and the p-th reverse short feature map are multiplied by the two self-learning parameters "alpha" and "beta" respectively to obtain the p-th pair of self-learning feature maps;
Step 2.4.5: the p-th feature map is passed through a third convolution block Conv3×3 and an up-sampling operation to obtain the p-th up-sampled feature map;
Step 2.4.6: one of the p-th self-learning feature maps and the p-th up-sampled feature map are subtracted to obtain the p-th fused feature F_p;
Step 2.4.7: the p-th fused feature F_p and the other p-th self-learning feature map are added to obtain the p-th multi-scale feature interaction feature map of stage k;
Step 2.4.8: judge whether k=K holds; if so, the p-th series of multi-scale interaction feature maps of the K stages corresponding to the 1st to (Y-1)-th receptive-field-enhanced feature maps and the p-th series of decision maps of the K stages are obtained; otherwise, go to step 2.4.9;
Step 2.4.9: the multi-scale feature interaction feature map of stage k is passed through 3 third convolution blocks Conv3×3 to obtain the p-th decision map of stage k;
Step 2.4.10: the multi-scale feature interaction feature map of stage k and the p-th decision map D_k^p of stage k are up-sampled separately so that their sizes match that of the next receptive-field-enhanced feature map, giving the p-th up-sampled decision map and the p-th up-sampled feature map of stage k; these, together with the receptive-field-enhanced feature map, are input into the multi-scale feature interaction module, k+1 is assigned to k, and the procedure returns to step 2.4.1 for sequential execution.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor to perform the multi-focus image fusion method of any one of claims 1-3, the processor being configured to execute the program stored in the memory.
5. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the multi-focus image fusion method of any of claims 1-3.
CN202310767148.7A 2023-06-27 2023-06-27 Multi-scale context awareness-based multi-focus image fusion method Pending CN116630763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310767148.7A CN116630763A (en) 2023-06-27 2023-06-27 Multi-scale context awareness-based multi-focus image fusion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310767148.7A CN116630763A (en) 2023-06-27 2023-06-27 Multi-scale context awareness-based multi-focus image fusion method

Publications (1)

Publication Number Publication Date
CN116630763A true CN116630763A (en) 2023-08-22

Family

ID=87613610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310767148.7A Pending CN116630763A (en) 2023-06-27 2023-06-27 Multi-scale context awareness-based multi-focus image fusion method

Country Status (1)

Country Link
CN (1) CN116630763A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593619A (en) * 2024-01-18 2024-02-23 腾讯科技(深圳)有限公司 Image processing method, device, electronic equipment and storage medium
CN117593619B (en) * 2024-01-18 2024-05-14 腾讯科技(深圳)有限公司 Image processing method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109410239B (en) Text image super-resolution reconstruction method based on condition generation countermeasure network
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN113158862B (en) Multitasking-based lightweight real-time face detection method
US20230080693A1 (en) Image processing method, electronic device and readable storage medium
CN111754438B (en) Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN111754446A (en) Image fusion method, system and storage medium based on generation countermeasure network
CN112927209B (en) CNN-based significance detection system and method
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN111931857B (en) MSCFF-based low-illumination target detection method
Cun et al. Defocus blur detection via depth distillation
CN112528782A (en) Underwater fish target detection method and device
CN116229056A (en) Semantic segmentation method, device and equipment based on double-branch feature fusion
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN113392711A (en) Smoke semantic segmentation method and system based on high-level semantics and noise suppression
Liu et al. Griddehazenet+: An enhanced multi-scale network with intra-task knowledge transfer for single image dehazing
CN110135446A (en) Method for text detection and computer storage medium
CN116630763A (en) Multi-scale context awareness-based multi-focus image fusion method
CN113393434A (en) RGB-D significance detection method based on asymmetric double-current network architecture
CN112270366A (en) Micro target detection method based on self-adaptive multi-feature fusion
WO2024109336A1 (en) Image repair method and apparatus, and device and medium
CN115331024A (en) Intestinal polyp detection method based on deep supervision and gradual learning
CN113297956A (en) Gesture recognition method and system based on vision
CN116645592A (en) Crack detection method based on image processing and storage medium
CN116152128A (en) High dynamic range multi-exposure image fusion model and method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination