CN116630763A - Multi-scale context awareness-based multi-focus image fusion method - Google Patents
Multi-scale context awareness-based multi-focus image fusion method
- Publication number
- CN116630763A CN116630763A CN202310767148.7A CN202310767148A CN116630763A CN 116630763 A CN116630763 A CN 116630763A CN 202310767148 A CN202310767148 A CN 202310767148A CN 116630763 A CN116630763 A CN 116630763A
- Authority
- CN
- China
- Prior art keywords
- feature
- convolution
- stage
- scale
- conv
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000007500 overflow downdraw method Methods 0.000 title claims abstract description 16
- 230000004927 fusion Effects 0.000 claims abstract description 42
- 230000006870 function Effects 0.000 claims abstract description 37
- 230000003993 interaction Effects 0.000 claims abstract description 31
- 230000004913 activation Effects 0.000 claims abstract description 26
- 238000000605 extraction Methods 0.000 claims abstract description 14
- 238000007781 pre-processing Methods 0.000 claims abstract description 10
- 238000012549 training Methods 0.000 claims abstract description 10
- 238000010586 diagram Methods 0.000 claims description 106
- 238000012545 processing Methods 0.000 claims description 42
- 238000005070 sampling Methods 0.000 claims description 22
- 238000000034 method Methods 0.000 claims description 11
- 238000004590 computer program Methods 0.000 claims description 5
- 230000008569 process Effects 0.000 claims description 3
- 238000003860 storage Methods 0.000 claims description 3
- 230000003213 activating effect Effects 0.000 claims description 2
- 238000013507 mapping Methods 0.000 claims description 2
- 239000000203 mixture Substances 0.000 claims description 2
- 241000282414 Homo sapiens Species 0.000 abstract description 5
- 230000000007 visual effect Effects 0.000 abstract description 4
- 230000000295 complement effect Effects 0.000 abstract description 3
- 230000011218 segmentation Effects 0.000 abstract description 2
- 238000010276 construction Methods 0.000 abstract 1
- 238000002360 preparation method Methods 0.000 abstract 1
- 238000013461 design Methods 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 2
- 230000004931 aggregating effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a multi-scale context awareness-based multi-focus image fusion method, which comprises the following steps: 1, data preparation and preprocessing, and construction of a multi-scale context-aware network comprising an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module, wherein the encoder is a vision Transformer used for multi-scale feature extraction, the decoder comprises convolutions and activation functions for coarse decoding of multi-scale features, the receptive field enhancement module comprises convolutions and activation functions for enhancing the receptive field of features, and the multi-scale feature interaction module comprises convolutions and activation functions for multi-level feature fusion; 2, fusing the input multi-focus images, comprising network training and multi-focus image fusion. The invention can fully utilize the complementary and redundant information in differently defocused images to fuse a higher-quality fully focused image, provide better-quality images for human observation, and simultaneously support computer vision tasks such as image recognition and segmentation.
Description
Technical Field
The invention relates to the technical field of multi-focus image fusion, in particular to a multi-focus image fusion method based on multi-scale context awareness.
Background
Because of a camera's limited imaging capability, it is often difficult to capture an image in which all objects are in focus. In particular, a camera is limited by its depth of field: only content inside the depth of field remains in focus, while content outside it becomes blurred. Consequently, in current production and daily life, the information contained in the collected images is incomplete and varies from image to image; each image carries different contour, texture and other characteristic information, which makes the analysis of image features inconvenient, whereas comprehensive information enables a more complete and clearer understanding of the targets in a scene. The goal is therefore to extract this important information and integrate it into a single image. To this end, several images with different focused regions can be fused to obtain a fully focused image, thereby overcoming the limitation of depth of field; this technique is called multi-focus image fusion.
Multi-focus image fusion is an important branch of image fusion; its main purpose is to fuse several images of the same scene with different focused regions into a fully focused image in which all regions are clear. Existing methods have two problems. First, traditional methods mainly rely on manually designed features and rules; they are strongly constrained, poorly robust and perform badly across different scenes, and they can misjudge the focused region and produce artifacts, which degrades the final fused image. Second, deep-learning-based methods depend heavily on the design of the network structure; earlier designs mainly use convolution, focus only on exploiting local feature information while ignoring global information, and lack feature fusion and interaction across multiple scales.
Disclosure of Invention
The invention provides a multi-focus image fusion method based on multi-scale context awareness, which aims to fully utilize the complementary and redundant information of images with different focused regions to obtain better image feature representations and reconstruct a higher-quality fully focused image, thereby providing better-quality images for human observation while supporting computer vision tasks such as image recognition, classification and segmentation, and in turn assisting research on human visual interpretation, computer analysis and the like.
The invention adopts the following technical scheme for solving the problems:
the invention discloses a multi-scale context awareness-based multi-focus image fusion method which is characterized by comprising the following steps of:
Step 1: acquiring P pairs of RGB multi-focus images and converting them into gray images, which serve as the training set, wherein the p-th pair of gray images consists of a foreground-focused image and a background-focused image; taking the ground-truth mask corresponding to the p-th pair of gray images as the p-th label, denoted G_p, thereby constructing the label set {G_1, G_2, …, G_p, …, G_P} of the P pairs of RGB multi-focus images;
Step 2: constructing a multi-scale context aware network, comprising: the system comprises an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module;
Step 2.1: the encoder comprises 1 first convolution block Conv3×3 for adjusting the number of channels and Y vision Transformers, wherein Conv3×3 represents 1 convolution layer with a 3×3 convolution kernel followed by 1 ReLU activation function;
The p-th pair of gray images is spliced in the channel dimension and input into the multi-scale context-aware network; after processing by the first convolution block Conv3×3 of the encoder, the p-th input feature I_p is obtained, and after successive processing by the Y vision Transformers, the Y primary feature maps corresponding to the p-th pair of gray images are obtained, the y-th of these being referred to as the y-th primary feature map;
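As a concrete illustration of step 2.1, the following PyTorch sketch shows the channel-adjusting Conv3×3 block followed by Y Transformer stages. It is a minimal sketch under the assumption that each vision Transformer stage is supplied as a module producing one feature scale (the patent does not fix a particular Transformer implementation), and the 2-channel input reflects the two spliced gray images.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv3x3 block: one 3x3 convolution followed by one ReLU activation."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x))

class Encoder(nn.Module):
    """Step 2.1 sketch: channel-adjusting Conv3x3, then Y vision-Transformer stages."""
    def __init__(self, transformer_stages):
        super().__init__()
        self.adjust = ConvBlock(2, 3)            # two spliced gray images -> 3 channels (assumption)
        self.stages = nn.ModuleList(transformer_stages)

    def forward(self, a_fore, a_back):
        x = self.adjust(torch.cat([a_fore, a_back], dim=1))   # splice in the channel dimension
        primary_feats = []                       # the Y primary feature maps
        for stage in self.stages:
            x = stage(x)
            primary_feats.append(x)
        return primary_feats
```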
Step 2.2: the coarse positioning decoder consists of a plurality of multi-stage, cross-scale connected second convolution blocks Conv3×3 and 1 first convolution block Conv1×1, and performs feature decoding of the Y primary feature maps over M_R stages to obtain the p-th coarse positioning decoder feature and the p-th initial decision map, wherein Conv1×1 represents 1 convolution layer with a 1×1 convolution kernel;
Step 2.3: the receptive field enhancement module consists of 4 receptive field enhancement branches with the same structure but different parameters k and r, 5 second convolution blocks Conv1×1, and 1 ReLU activation function, wherein each receptive field enhancement branch is formed by sequentially stacking 1 asymmetric convolution block Conv1×k, 1 asymmetric convolution block Convk×1 and 1 first stride convolution block Convk×k,r; here Conv1×k represents 1 asymmetric convolution layer with a 1×k convolution kernel, Convk×1 represents 1 asymmetric convolution layer with a k×1 convolution kernel, and Convk×k,r represents 1 symmetric convolution layer with a k×k convolution kernel and step length r;
The Y-1 primary feature maps are input in parallel to the receptive field enhancement module. Each of them is adjusted by the 5 second convolution blocks Conv1×1 to obtain 5 output features; the latter 4 output feature maps are correspondingly input into the 4 receptive field enhancement branches to obtain 4 receptive field enhancement branch feature maps, which are spliced with the first output feature in the channel dimension to obtain the y-th fused feature map; this is input into a further second convolution block Conv1×1 for channel adjustment to obtain the adjusted y-th feature map, which is added to the corresponding input feature map and processed by the ReLU activation function to obtain the finally output y-th receptive-field-enhanced feature map, thereby obtaining Y-1 receptive-field-enhanced feature maps;
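A minimal sketch of one receptive field enhancement branch and the surrounding Conv1×1 blocks is given below. The interpretation of the "step length r" as a dilation rate, the residual addition back to the module input, and the branch parameters k, r ∈ {3, 5, 7, 9} (taken from the embodiment) are assumptions about details the text leaves implicit.

```python
import torch
import torch.nn as nn

class RFEBranch(nn.Module):
    """One receptive field enhancement branch: Conv1xk -> Convkx1 -> Convkxk with rate r."""
    def __init__(self, ch, k, r):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2)),
            nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0)),
            # "step length r" is read here as a dilation rate (assumption)
            nn.Conv2d(ch, ch, k, padding=r * (k // 2), dilation=r),
        )

    def forward(self, x):
        return self.branch(x)

class RFEModule(nn.Module):
    """Receptive field enhancement: five Conv1x1 blocks, four branches, splice, adjust, residual, ReLU."""
    def __init__(self, ch, ks=(3, 5, 7, 9), rs=(3, 5, 7, 9)):
        super().__init__()
        self.adjust = nn.ModuleList([nn.Conv2d(ch, ch, 1) for _ in range(5)])
        self.branches = nn.ModuleList([RFEBranch(ch, k, r) for k, r in zip(ks, rs)])
        self.fuse = nn.Conv2d(5 * ch, ch, 1)      # channel adjustment after splicing

    def forward(self, feat):
        outs = [conv(feat) for conv in self.adjust]                    # 5 output features
        enhanced = [b(o) for b, o in zip(self.branches, outs[1:])]     # 4 branch feature maps
        fused = self.fuse(torch.cat([outs[0]] + enhanced, dim=1))      # splice with the first output
        return torch.relu(fused + feat)                                # residual add + ReLU (assumption)
```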
Step 2.4: the multi-scale feature interaction module comprises a preprocessing module, a multi-scale feature pyramid module and a third convolution block Conv 3×3 Composition and sequence of p-th coarse positioning decoder featuresAnd p-th initial phase decision diagram->After treatment, 1-Y-1 characteristic patterns of enhanced receptive fields are obtained +.>Corresponding p-th series multiscale interaction feature map +.>Decision map with p-th series->Wherein (1)>Characteristic map showing enhancement of the y-th receptive field, < >>Representing the kth stage feature map in the p-th series of multiscale interaction feature maps, ++>Representing a kth phase decision graph in the p-th series of decision graphs;
Step 2.5: the p-th downsampled decision map is up-sampled to obtain the 1st fusion decision map, which is then added step by step with the p-th series of decision maps to obtain the p-th series of fusion decision maps, wherein the k-th one is the k-th fusion decision map of the p-th pair of gray images;
step 2.6:after up-sampling operation and processing of Sigmoid activation function, the p-th multi-level output decision diagram is obtained>Wherein (1)>Representing p-th pair of gray-scale images->And->Is to be +.>As p-th pair gray scale image->And->Is a final decision graph of (1);
step 3: constructing a loss function using (1)
In formula (1), L_wBCE represents the weighted binary cross-entropy loss and L_wIOU represents the weighted intersection-over-union (IoU) loss;
The total loss function L_total of the multi-scale context-aware network is constructed using formula (2):
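Since the bodies of formulas (1) and (2) are not reproduced in this text, the sketch below uses a widely used boundary-weighted combination of binary cross-entropy and IoU loss as a stand-in for L_wBCE + L_wIOU; the specific weighting scheme is an assumption, and `pred` is assumed to be a pre-Sigmoid decision map.

```python
import torch
import torch.nn.functional as F

def weighted_bce_iou_loss(pred, mask):
    """Hedged sketch of L = L_wBCE + L_wIOU for one decision map (pred: logits, mask: ground truth)."""
    # Boundary-aware pixel weights (assumed form; the patent's exact weights are in formula (1))
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    prob = torch.sigmoid(pred)
    inter = (prob * mask * weit).sum(dim=(2, 3))
    union = ((prob + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)

    return (wbce + wiou).mean()
```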
Step 4: training the multi-scale context-aware network by adopting a back propagation algorithm based on the training set, and calculating the total loss functionNumber L total Adjusting network parameters until the maximum iteration times are reached, so as to obtain a trained multi-scale context awareness network;
Step 5: the final decision map is inverted to obtain the reverse decision map; the final decision map and the reverse decision map are respectively multiplied pixel-by-pixel with the corresponding gray images to obtain two partially sharp images, which are then added pixel-by-pixel to obtain the fused image of the p-th pair of gray images.
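Step 5 reduces to a pixel-wise blend driven by the final decision map. The sketch below assumes the decision map marks the sharp region of the foreground-focused image; the pairing of each gray image with the decision map or its reverse follows the convention of the ground-truth mask and is therefore an assumption.

```python
def fuse_with_decision(a_fore, a_back, decision):
    """Step 5 sketch: blend the pair using the final decision map D and its reverse 1 - D."""
    reverse = 1.0 - decision                 # reverse decision map
    clear_fore = a_fore * decision           # partially sharp image from the foreground-focused input
    clear_back = a_back * reverse            # partially sharp image from the background-focused input
    return clear_fore + clear_back           # pixel-by-pixel addition -> fused full-focus image
```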
The multi-scale context awareness-based multi-focus image fusion method of the invention is also characterized in that the step 2.2 comprises:
step 2.2.1, when r=1, the coarse positioning decoder is at mth r Stage and for the Y-th primary feature mapUp-sampling operations of different weights are performed twice, respectively, so that +.>And->Is the same in size and gives the r up-sampled feature map +.>And (r+1th upsample feature map +.>Then, the (r) th and (r+1) th second convolution blocks Conv are respectively input 3×3 And to obtain the r and r+1 feature maps +.>And->Will->And->Multiplying by +.>Splicing in the channel dimension, and sequentially passing through the (r+2) th and (r+3) th second convolution blocks Conv 3×3 After the treatment of (1) to obtain the Mth r Individual phase output feature->
When r=2, the coarse positioning decoder is at mth r Stage Y primary feature mapAnd Y-1 th primary profile +.>The upsampling operations are performed separately so that +.>And->Size and->Is the same in size and gives the (r+1) th up-sampled feature map +.>And (r+2) th upsampling feature map +.>Then, the (r+3) th and (r+4) th second convolution blocks Conv are input 3×3 And to obtain the (r+1) th and (r+2) th characteristic maps->And->Will beAnd->Multiplying by +.>Splicing in the channel dimension, and sequentially passing through the (r+5) th and (r+6) th second convolution blocks Conv 3×3 After the treatment of (1) to obtain the Mth r Individual phase output feature->
When r=3, 4, …, R-1, the coarse positioning decoder is at mth r Stage and for the Y-th primary feature mapUp to Y-r+1 primary profile +.>After the same treatment, the output characteristics +.1 from the (R) th to the (R-1) th stages are obtained>
When r=r, the output characteristics of the R-1 stage are calculatedInput the last 2 second convolution blocks Conv 3×3 The processing of the R phase to obtain an output characteristic diagram +.>The p-th rough positioning decoder characteristic diagram finally output by the rough positioning decoder is input into a first convolution block Conv 1×1 After that, the p-th initial stage decision diagram is obtained>
The multi-scale feature pyramid module in the step 2.4 is composed of 4 multi-scale feature extraction branches with the same structure but different parameters k and r and 5 third convolution blocks Conv 1×1 1 ReLU activation function, wherein each multi-scale feature extraction branch is composed of 1 symmetrical convolution block Conv k×k And 1 second stride convolution block Conv k×k,r Sequentially stacked, wherein Conv k×k Representing 1 symmetric convolution layer with a convolution kernel of k×k and 1 ReLU activation function;
step 2.4.0, defining the current stage as k, and initializing k=1; will make the p-th initial stage decision diagramP decision diagram as k-1 stage->
Step 2.4.1: the preprocessing module makes a decision on the p decision diagram of the k-1 stageDownsampling operations are performed such thatAnd->Is the same in size and gives a p-th downsampling decision map in stage k-1 +.>Then carrying out Sigmoid activation function operation to obtain the p weight figure in the k-1 stage +.>Meanwhile, the p-th coarse positioning decoder feature +.>After the same downsampling operation, the p-th feature map is obtained>
Step 2.4.2: the preprocessing module subtracts the p weight figure from' 1After that, the p-th inverse weight map of the k-1 stage is obtained>Then will->And->Respectively with p-th feature map->After multiplication, the p-th forward feature map of the k-1 stage is obtained correspondingly +.>And the p-th inverse characteristic diagram of the k-1 stage
Step 2.4.3: multiple scale feature pyramid module pairsAnd->After processing, an output characteristic diagram is obtained>And->Then respectively and p-th characteristic diagram->After short ligation, the p-th forward short feature map +.>And p-th inverse short profile->
Step 2.4.3.1:through 4 third convolution blocks Conv 1×1 After the adjustment of the channel, a 4-channel output profile of the k-1 stage is obtained +.>
Step 2.4.3.2: phase k-1 4 channel output profileRespectively inputting into 4 multi-scale feature extraction branches, and performing Conv by the symmetrical convolution blocks k×k Obtaining 4 symmetrical convolution characteristic graphs in the k-1 stageThen pass through a second stride convolution block Conv k×k,r After the processing of (2), 4 stride convolution feature maps of the k-1 stage are obtained as +.>
Step 2.4.3.3: outputting characteristic diagram of 4 channels in k-1 stage4 symmetrical convolution characteristic diagrams +.>And 4 stride convolutions characteristic map +.k-1 stage>After splicing in the channel dimension along the output sequence of the 4 multi-scale feature extraction branches, 4 multi-scale fusion feature graphs in the k-1 stage are correspondingly obtained>
Step 2.4.3.4: mapping the k-1 stage 4 multiscale fusion features
Respectively superposing a plurality of multi-scale fusion characteristics before the k-1 stage to correspondingly obtain 4 superposition characteristic diagrams of the k-1 stage>
Step 2.4.3.5: the 4 overlapped feature images in the k-1 stage are spliced again in the channel dimension to obtain 1 spliced feature image in the k-1 stageWill->Input 5 th third convolution block Conv 1×1 After the channel is adjusted, a characteristic diagram after the k-1 phase adjustment is obtained>
Step 2.4.3.6: will beAnd->After addition, the p-th multi-scale feature pyramid forward feature map ++is obtained after the treatment of the ReLU activation function>
Step 2.4.3.7: the characteristic diagram is mapped according to the process from step 2.4.3.1 to step 2.4.3.6After the same treatment, the p-th multi-scale characteristic pyramid inverse characteristic diagram +.>
Step 2.4.4: p-th forward short feature map featureAnd the p-th inverse short feature mapAfter multiplying two self-learning parameters 'alpha' and 'beta', the p-th self-learning characteristic diagram pair +_>And->
Step 2.4.5: will p-th feature mapThrough a third convolution block Conv 3×3 After the up-sampling operation, the p up-sampling characteristic diagram is obtained>
Step 2.4.6: map the p-th self-learning featureAnd p-th upsampling feature->After subtraction, the p-th fusion characteristic F is obtained p ;
Step 2.4.7: fusing feature F with p p 、And p-th self-learning feature map->After addition, the p-th multiscale feature interaction feature map ++in the k-th stage is obtained>
Step 2.4.8: judging whether k=k is satisfied, if so, obtaining a characteristic diagram with 1-Y-1 receptive field enhancementCorresponding K-stage p-th series multiscale interaction feature mapDecision diagram +.p series with K stages>Otherwise, go to step 2.4.9;
step 2.4.9: stage kAfter 3 times of third convolution block Conv 3×3 After the processing of (a) a kth phase p decision diagram is obtained +.>
Step 2.4.10: will beAnd the kth stage p decision diagram D k p The up-sampling operations are performed separately so thatAnd->Is the same in size and correspondingly gets the kth phase p up-sampling decision diagram +.>And kth phase p upsampling feature map +.>And with the characteristic pattern of receptive field enhancement->And (3) inputting the k+1 values into a multi-scale feature interaction module together, and returning to the step (2.4.1) for sequential execution after the k+1 values are assigned to the k.
The electronic device of the invention comprises a memory and a processor, wherein the memory is used for storing a program for supporting the processor to execute the multi-focus image fusion method, and the processor is configured to execute the program stored in the memory.
The invention relates to a computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, performs the steps of the multi-focus image fusion method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a unified network framework for the multi-focus image fusion task, which fully utilizes the redundant and complementary information among the source images and fuses high-quality images. Compared with the prior art, the proposed method simulates the mechanism of the human visual system by learning multi-scale context-aware features that combine local and global information, which better guides the learning of the network and thereby yields higher-quality image fusion results.
2. The invention designs a multi-scale feature extraction scheme that combines CNN and Transformer and effectively extracts local and global features. By introducing a Transformer model, long-range dependencies are established and multi-scale information in the image is extracted, while shallow feature information is enhanced by the receptive field enhancement module; this gives the features at different scales stronger semantic information. Low-level details and high-level semantic information are integrated, bringing better detail expression to the fusion result.
3. The invention designs a coarse positioning decoder that progressively aggregates the multi-scale features extracted by the Transformer backbone network; the aggregated features contain context information, and coarse features and decision maps are generated to guide the subsequent steps. In addition, the invention designs a multi-scale feature interaction module that attends to the focused region and the defocused region simultaneously and promotes interaction between the two types of information, so as to enrich local details and suppress misclassified regions. This module realizes fusion and interaction of multi-scale information between the shallow features and the decoder features, refines the coarse features and decision maps obtained from the decoder, guides the network to learn detail features better, achieves more accurate defocused-region detection, and enhances the quality of the fused image.
Drawings
FIG. 1 is a flow chart of a multi-scale context aware multi-focus image fusion method of the present invention;
FIG. 2 is a schematic diagram of a network architecture according to the present invention;
FIG. 3 is a schematic diagram of a fusion structure according to the present invention;
FIG. 4 is a schematic diagram of a coarse positioning decoder according to the present invention;
FIG. 5 is a schematic view of a receptive field enhancement module of the invention;
FIG. 6a is a schematic diagram of a multi-scale feature interaction module structure of the present invention;
FIG. 6b is a schematic diagram of a multi-scale feature pyramid module architecture of the present invention.
Detailed Description
In this embodiment, a multi-scale context-aware multi-focus image fusion method, as shown in fig. 1, includes the following steps:
step 1: acquiring a P-to-RGB multi-focus image and converting the multi-focus image into a gray image, and recording the gray image asAndand as training set, wherein->And->Respectively representing a foreground focusing image and a background focusing image in the p-th pair of gray level images; taking the p-th real ground mask corresponding to the p-th gray level image as the p-th label, and marking as G p Thereby constructing a label set { G ] of the P-to-RGB multi-focus image 1 ,G 2 ,…,G p ,…G P };
Step 2: constructing a multi-scale context aware network using as shown in fig. 2, comprising: the system comprises an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module; s and plus in the circular symbols shown in FIG. 2 represent Sigmoid activation function processing and pixel level addition processing, respectively;
step 2.1: the encoder comprises 1 first convolution block Conv for adjusting the number of channels 3×3 And Y vision converters, wherein Conv 3×3 Representing 1 convolution layer with a convolution kernel of 3×3 with 1 ReLU activation function;
will p-th pair gray scale imageAnd->After being spliced in the channel dimension, the data are input into a multi-scale context awareness network and pass through a first convolution block Conv of an encoder 3×3 After processing to obtain the p-th input feature I p Then after being processed by Y vision converters in turn, Y primary feature images corresponding to the p-th pair of gray images are correspondingly obtained>Wherein (1)>Representing a y-th primary feature map;
In this embodiment, as shown in the structure diagram of Fig. 2, Y=4; the p-th pair of gray images is a pair of 512×512×1 images, which are spliced in the channel dimension and input into the multi-scale context-aware network; the first convolution block Conv3×3 adjusts the number of channels to obtain the p-th input feature map I_p of size 512×512×3, and the four primary deep feature maps obtained by the vision Transformers have sizes 128×128×64, 64×64×128, 32×32×320 and 16×16×512, wherein the vision Transformer is a standard vision Transformer;
step 2.2: the coarse positioning decoder is formed by a plurality of multistage cross-scale connected second convolution blocks Conv 3×3 And 1 first convolution block Conv 1×1 Make up and map Y primary featuresProceeding with M R Feature decoding at the individual stage to obtain the p-th coarse positioning decoder feature +.>And p-th initial decision diagram->Wherein Conv 1×1 Representing 1 convolution layer with 1 x 1 convolution kernel;
in the present embodiment, as shown in the structure diagram of FIG. 4, M R =4, x and C in the circular symbol shown, respectively representing pixel-level multiplication processing and splicing processing in the channel dimension;
step 2.2.1, coarse positioning decoder is at mth when r=1 r Stage and for the Y-th primary feature mapUp-sampling operations of different weights are performed twice, respectively, so that +.>And->Is the same in size and gives the r up-sampled feature map +.>And (r+1th upsample feature map +.>Then, the (r) th and (r+1) th second convolution blocks Conv are respectively input 3×3 And to obtain the r and r+1 feature maps +.>And->Will->And->Multiplying and then withSplicing in the channel dimension, and sequentially passing through the (r+2) th and (r+3) th second convolution blocks Conv 3×3 After the treatment of (1) to obtain the Mth r Individual phase output feature->
When r=2, the coarse positioning decoder is at mth r Stage Y primary feature mapAnd Y-1 th primary profile +.>The upsampling operations are performed separately so that +.>And->Size and->Is the same in size and gives the (r+1) th up-sampled feature map +.>And (r+2) th upsampling feature map +.>Then, the (r+3) th and (r+4) th second convolution blocks Conv are input 3×3 And to obtain the (r+1) th and (r+2) th characteristic maps->And->Will be And->Multiplying by +.>Splicing in the channel dimension, and sequentially passing through the (r+5) th and (r+6) th second convolution blocks Conv 3×3 After the treatment of (1) to obtain the Mth r Individual phase output feature->
When r=3, 4, …, R-1, the coarse positioning decoder is at mth r Stage and for the Y-th primary feature mapUp to Y-r+1 primary profile +.>After the same treatment, the output characteristics from the (R) th stage to the (R-1) th stage are obtained
When r=r, the output characteristics of the R-1 stage are calculatedInput the last 2 second convolution blocks Conv 3×3 The processing of the R phase to obtain an output characteristic diagram +.>The p-th rough positioning decoder characteristic diagram finally output by the rough positioning decoder is input into a first convolution block Conv 1×1 After that, the p-th initial stage decision diagram is obtained>
As shown in the structure diagram of FIG. 2, the primary deep feature mapInputting to a coarse positioning decoder to obtain a coarse positioning characteristic map +.>And initial decision diagram->The sizes are 128×128×64 and 128×128×1, respectively;
step 2.3: the receptive field enhancement module consists of 4 receptive field enhancement branches with the same structure but different parameters k and r and 5 second convolution blocks Conv 1×1 1 ReLU activation function, wherein each receptive field enhancement branch consists of 1 asymmetric convolution block Conv 1×k 1 asymmetric convolution block Conv k×1 And 1 first stride convolution block Conv k×k,r Sequentially stacking to form; wherein Conv 1×k Representing 1 asymmetric convolution layer with convolution kernel of 1 xk, conv k×1 Representing 1 asymmetric convolution layer with a convolution kernel of kx1, conv k×k,r Representing 1 symmetric convolution layer with convolution kernel k x k and step length r;
in this embodiment, as shown in the structure diagram of fig. 5, parameters k and r in 4 receptive field enhancement branches are {3,5,7,9}, and +, C in the circular symbol shown, respectively represent pixel-level addition processing and splicing processing in the channel dimension;
y-1 primary feature mapsParallel input to receptive field enhancement module, wherein ∈>By 5 second convolution blocks Conv 1×1 After the adjustment of the channel, 5 output features are obtained>The latter 4 output profiles +.>Respectively and correspondingly inputting into 4 receptive field enhancement branches for processing to obtain 4 receptive field enhancement branch characteristic diagrams +.>Then and->After the channel dimension is spliced, a y fusion characteristic diagram is obtained>And input the second convolution block Conv again 1×1 After the channel has been set, an adjusted y-th characteristic map is obtained +.>Will->And->Adding and processing the active function ReLU to obtain the final output y-th receptive field increaseStrong feature map->Thus obtaining Y-1 characteristic patterns enhanced in receptive field +.>
As shown in the structure diagram of FIG. 2, the primary shallow feature mapParallel input to receptive field enhancement module, and the enhanced characteristic images are marked as +.>The sizes are 128×128×64, 64×64×128, 32×32×320 respectively;
step 2.4: the multi-scale feature interaction module comprises a preprocessing module, a multi-scale feature pyramid module and a third convolution block Conv 3×3 The multi-scale feature pyramid module consists of 4 multi-scale feature extraction branches with the same structure and different parameters k and r and 5 third convolution blocks Conv 1×1 1 ReLU activation function, wherein each multi-scale feature extraction branch is composed of 1 symmetrical convolution block Conv k×k And 1 second stride convolution block Conv k×k,r Sequentially stacked, wherein Conv k×k Representing 1 symmetric convolution layer with a convolution kernel of k×k and 1 ReLU activation function;
in this embodiment, as shown in the block diagrams of fig. 6a and 6b, wherein +, -, × in the circular symbol shown in fig. 6a represent pixel-level addition processing, pixel-level subtraction processing, and pixel-level multiplication processing, respectively, and 1, α, β in the square symbol shown represent a constant "1" and two self-learning parameters "α" and "β", respectively; parameters k and r in 4 multi-scale feature extraction branches in fig. 6b are {1,3,5,7}, {1,2,4,8}, respectively, and the +, C in the circular symbols shown, respectively represent pixel-level addition processing and splicing processing in the channel dimension;
step 2.4.0 defines the current phase as k,and initializing k=1; will make the p-th initial stage decision diagramAs a decision diagram of the k-1 stage->
Step 2.4.1: the preprocessing module makes a decision on the p-th initial stageDownsampling operation is performed such that +.>Andis the same in size and gives the p-th downsampling decision map +.>Then carrying out Sigmoid activation function operation to obtain the p weight figure +.>Meanwhile, the p-th coarse positioning decoder feature +.>After the same downsampling operation, the p-th feature map is obtained>
Step 2.4.2: the preprocessing module subtracts the p weight figure from' 1After that, the p-th inverse weight map is obtained>Then will->And->Respectively with p-th feature map->After multiplication, the p-th forward feature map is obtained correspondingly>And p-th inverse characteristic map->
Step 2.4.3: multiple scale feature pyramid module pairsAnd->After processing, an output characteristic diagram is obtained>And->Then respectively and p-th characteristic diagram->After short ligation, the p-th forward short feature map +.>And p-th inverse short profile->
Step 2.4.3.1:through 4 third convolution blocks Conv 1×1 After the adjustment of the channels, 4 channel output profiles are obtained +.>
Step 2.4.3.2: 4-channel output characteristic diagramRespectively inputting into 4 multi-scale feature extraction branches, and performing a symmetric convolution block Conv k×k Obtaining 4 symmetrical convolution characteristic graphsThen pass through a second stride convolution block Conv k×k,r After the processing of (2), 4 stride convolution feature maps are obtained, respectively + ->
Step 2.4.3.3: outputting characteristic diagram of 4 channels4 symmetrical convolution characteristic diagrams->And 4 stride convolution profiles +.>After being spliced in the channel dimension along the output sequence of the 4 multi-scale feature extraction branches, 4 multi-scale fusion feature graphs are correspondingly obtained
Step 2.4.3.4: 4 multiscale fusion feature mapsRespectively superposing the first plurality of multi-scale fusion features to correspondingly obtain 4 superposition feature graphs
Step 2.4.3.5: after the 4 overlapped feature images are spliced in the channel dimension again, 1 spliced feature image is obtainedWill->Input 5 th third convolution block Conv 1×1 After the channel is adjusted, an adjusted characteristic diagram +.>
Step 2.4.3.6: will beAnd->After addition, the p-th multi-scale feature pyramid forward feature map ++is obtained after the treatment of the ReLU activation function>
Step 2.4.3.7: the characteristic diagram is mapped according to the process from step 2.4.3.1 to step 2.4.3.6After the same treatment, the p-th multi-scale characteristic pyramid inverse characteristic diagram +.>
Step 2.4.4: p-th forward short feature map featureAnd the p-th inverse short feature mapAfter multiplying two self-learning parameters 'alpha' and 'beta', the p-th self-learning characteristic diagram pair +_>And->
Step 2.4.5: will p-th feature mapThrough a third convolution block Conv 3×3 After the up-sampling operation, the p up-sampling characteristic diagram is obtained>
Step 2.4.6: map the p-th self-learning featureAnd p-th upsampling feature->After subtraction, the p-th fusion characteristic F is obtained p ;
Step 2.4.7: fusing feature F with p p 、And p-th self-learning feature map->After addition, the p-th multiscale characteristic interaction characteristic diagram is obtained>
Step 2.4.8: judging whether k=k is satisfied, if so, obtaining a characteristic diagram with 1-Y-1 receptive field enhancementCorresponding p-th series multiscale interaction feature map +.>Decision map with p-th series->Wherein (1)>A characteristic diagram showing the enhancement of the y-th receptive field,representing the kth stage feature map in the p-th series of multiscale interaction feature maps, ++>Representing a kth phase decision graph in the p-th series of decision graphs; otherwise, go to step 2.4.9; in this embodiment, k=3;
step 2.4.9: will beAfter 3 times of third convolution block Conv 3×3 After the treatment of (a) a p-th and k-th stage decision diagram is obtained>
Step 2.4.10: will beDecision map +.>The up-sampling operations are performed separately so thatAnd->Is the same in size and correspondingly gets the p-th up-sampling decision diagram +.>And p-th upsampling feature map->And with the characteristic pattern of receptive field enhancement->Inputting the k+1 values into a multi-scale feature interaction module together, and returning to the step 2.4.1 for sequential execution after assigning k;
as shown in the structure diagram of FIG. 2, the p-th coarse positioning feature mapAnd p-th initial decision diagram->And enhanced feature map->Step by step, inputting the p series multi-scale interaction features into a multi-scale feature interaction module to obtain p series multi-scale interaction features ∈10>The sizes are 128×128×64, 64×64×128, 32×32×320, and p-th series decision diagram>The sizes are 128×128×1, 64×64×1, 32×32×1 respectively;
and 2, step 2.5: map the p-th downsampling decisionAfter up-sampling processing, a 1 st fusion decision graph is obtainedRespectively with the p-th series decision diagram->After stepwise addition, the p-th series fusion decision diagram +.>Wherein (1)>Representing p-th pair of gray-scale images->And->Is the kth fusion decision graph of (2);
as shown in the structure diagram of FIG. 2, the p-th initial decision diagram is obtainedAfter up-sampling processing, the 1 st fusion decision diagram is obtained>Respectively with the p-th series decision diagram->Step-by-step addition to obtain a p-th series fusion decision diagram +.>The sizes are respectively 32×32×1, 64×64×1 and 128×128×1;
step 2.6:after up-sampling operation and processing of Sigmoid activation function, the p-th multi-level output decision diagram is obtained>Wherein (1)>Representing p-th pair of gray-scale images->And->Is to be +.>As p-th pair gray scale image->And->Is a final decision graph of (1);
as shown in the structure diagram of FIG. 2, the p-th series fusion decision diagramThrough up-sampling operation, the size of the input image is kept consistent with that of the input image, the sizes are 512 multiplied by 1, and after Sigmoid activation function processing, a p-th multistage output decision diagram->Wherein->A final decision graph of the p-th pair of gray images;
step 3: constructing a loss function using (1)
In formula (1), L_wBCE represents the weighted binary cross-entropy loss and L_wIOU represents the weighted intersection-over-union (IoU) loss;
The total loss function L_total of the multi-scale context-aware network is constructed using formula (2):
Step 4: based on the training set, training the multi-scale context awareness network by adopting a back propagation algorithm, and calculating a total loss function L total The network parameters are adjusted until the maximum iteration times are reached, so that a trained multi-scale context sensing network is obtained, and in the embodiment, an Adam optimizer is adopted to carry out optimization solution on the total loss;
step 5: using final decision graphsAfter the inversion, a reverse decision diagram is obtained>Respectively, will make the final decision diagramAnd->Reverse decision diagram->And->After pixel-by-pixel multiplication, a partially sharp image is obtained>And->Partial clear image +.>And->Pixel-by-pixel addition is performed to obtain a p-th pair of gray-scale images +>And->Is +.>
In this embodiment, as shown in the structure diagram of fig. 3, x and + in the circular symbols are shown as pixel level multiplication processing and pixel level addition processing, respectively; using final decision graphsAfter the inversion, a reverse decision diagram is obtained>The final decision diagram is respectively->And->Reverse directionDecision diagram->And->After pixel-by-pixel multiplication, a partially sharp image is obtained>And->Partial clear image +.>And->Pixel-by-pixel addition is performed to obtain a p-th pair of gray-scale images +>And->Is +.>
In this embodiment, an electronic device includes a memory for storing a program supporting the processor to execute the above method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the method described above.
Claims (5)
1. The multi-focus image fusion method based on multi-scale context awareness is characterized by comprising the following steps of:
step 1: acquiring a P-to-RGB multi-focus image and converting the multi-focus image into a gray image, and recording the gray image asAndand as training set, wherein->And->Respectively representing a foreground focusing image and a background focusing image in the p-th pair of gray level images; taking the p-th real ground mask corresponding to the p-th gray level image as the p-th label, and marking as G p Thereby constructing a label set { G ] of the P-to-RGB multi-focus image 1 ,G 2 ,…,G p ,…G P };
Step 2: constructing a multi-scale context aware network, comprising: the system comprises an encoder, a coarse positioning decoder, a receptive field enhancement module and a multi-scale feature interaction module;
step 2.1: the encoder comprises 1 first convolution block Conv for adjusting the number of channels 3×3 And Y vision converters, wherein Conv 3×3 Representing 1 convolution layer with a convolution kernel of 3×3 with 1 ReLU activation function;
will p-th pair gray scale imageAnd->After being spliced in the channel dimension, the data are input into a multi-scale context awareness network and pass through a first convolution block Conv of an encoder 3×3 After processing to obtain the p-th input feature I p Then sequentially pass through Y vision turnsAfter the processing of the converter, Y primary feature maps corresponding to the p-th pair of gray images are correspondingly obtained>Wherein (1)>Representing a y-th primary feature map;
step 2.2: the coarse positioning decoder consists of a plurality of multistage cross-scale connected second convolution blocks Conv 3×3 And 1 first convolution block Conv 1×1 Make up and map Y primary featuresProceeding with M R Feature decoding at the individual stage to obtain the p-th coarse positioning decoder feature +.>And p-th initial decision diagram->Wherein Conv 1×1 Representing 1 convolution layer with 1 x 1 convolution kernel;
step 2.3: the receptive field enhancement module consists of 4 receptive field enhancement branches with the same structure but different parameters k and r and 5 second convolution blocks Conv 1×1 1 ReLU activation function, wherein each receptive field enhancement branch consists of 1 asymmetric convolution block Conv 1×k 1 asymmetric convolution block Conv k×1 And 1 first stride convolution block Conv k×k,r Sequentially stacking to form; wherein Conv 1×k Representing 1 asymmetric convolution layer with convolution kernel of 1 xk, conv k×1 Representing 1 asymmetric convolution layer with a convolution kernel of kx1, conv k×k,r Representing 1 symmetric convolution layer with convolution kernel k x k and step length r;
y-1 primary feature mapsParallel input to receptive field enhancement module, wherein ∈>By 5 second convolution blocks Conv 1×1 After the adjustment of the channel, 5 output features are obtained>The latter 4 output profiles +.>Respectively and correspondingly inputting into 4 receptive field enhancement branches for processing to obtain 4 receptive field enhancement branch characteristic diagrams +.>Then and->After the channel dimension is spliced, a y fusion characteristic diagram is obtained>And input the second convolution block Conv again 1×1 After the channel has been set, an adjusted y-th characteristic map is obtained +.>Will->And->Adding and processing by activating function ReLU to obtain final output y-th receptive field enhanced characteristic diagram +.>Thus obtaining Y-1 characteristic patterns enhanced in receptive field +.>
Step 2.4: the multi-scale feature interaction module comprises a preprocessing module, a multi-scale feature pyramid module and a third convolution block Conv 3×3 Composition and sequence of p-th coarse positioning decoder featuresAnd p-th initial phase decision diagram->After treatment, 1-Y-1 characteristic patterns of enhanced receptive fields are obtained +.>Corresponding p-th series multiscale interaction feature map +.>Decision map with p-th series->Wherein (1)>Characteristic map showing enhancement of the y-th receptive field, < >>Representing the kth stage feature map in the p-th series of multiscale interaction feature maps, ++>Representing a kth phase decision graph in the p-th series of decision graphs;
step 2.5: map the p-th downsampling decisionAfter up-sampling processing, a 1 st fusion decision graph is obtainedRespectively with the p-th series decision diagram->After stepwise addition, the p-th series fusion decision diagram +.>Wherein (1)>Representing p-th pair of gray-scale images->And->Is the kth fusion decision graph of (2);
step 2.6:after up-sampling operation and processing of Sigmoid activation function, the p-th multi-level output decision diagram is obtained>Wherein (1)>Representing p-th pair of gray-scale images->And->Is to be +.>As p-th pair gray scale image->And->Is a final decision graph of (1);
step 3: constructing a loss function using (1)
In the formula (1), L wBCE Representing weighted binary cross entropy loss, L wIOU Representing weighted cross-ratio loss;
constructing a total loss function L of a multi-scale context-aware network using (2) total :
Step 4: based on the training set, training the multi-scale context-aware network by adopting a back propagation algorithm, and calculating the total loss function L total Adjusting network parameters until the maximum iteration times are reached, so as to obtain a trained multi-scale context awareness network;
step 5: using final decision graphsAfter the inversion, a reverse decision diagram is obtained>The final decision diagram is respectively->And (3) withReverse decision diagram->And->After pixel-by-pixel multiplication, a partially sharp image is obtained>And->Partial clear image +.>And->Pixel-by-pixel addition is performed to obtain a p-th pair of gray-scale images +>And->Is +.>
2. The multi-scale context aware multi-focus image fusion method according to claim 1, wherein said step 2.2 comprises:
step 2.2.1, when r=1, the coarse positioning decoder is at mth r Stage and for the Y-th primary feature mapUp-sampling operations of different weights are performed twice, respectively, so that +.>And->Is the same in size and gives the r up-sampled feature map +.>And (r+1th upsample feature map +.>Then, the (r) th and (r+1) th second convolution blocks Conv are respectively input 3×3 And to obtain the r and r+1 feature maps +.>And->Will->And->Multiplying and then withSplicing in the channel dimension, and sequentially passing through the (r+2) th and (r+3) th second convolution blocks Conv 3×3 After the treatment of (1) to obtain the Mth r Individual phase output feature->
When r=2, the coarse positioning decoder is at mth r Stage Y primary feature mapAnd Y-1 th primary profile +.>The upsampling operations are performed separately so that +.>And->Size and->Is the same in size and gives the (r+1) th up-sampled feature map +.>And (r+2) th upsampling feature map +.>Then, the (r+3) th and (r+4) th second convolution blocks Conv are input 3×3 And to obtain the (r+1) th and (r+2) th characteristic maps->And->Will beAnd->Multiplying by +.>Splicing in the channel dimension, and sequentially passing through the (r+5) th and (r+6) th second convolution blocks Conv 3×3 After the treatment of (1) to obtain the Mth r Individual phase output feature->
When r=3, 4, …, R-1, the coarse positioning decoder is at mth r Stage and for the Y-th primary feature mapUp to Y-r+1 primary profile +.>After the same treatment, the output characteristics from the (R) th stage to the (R-1) th stage are obtained
When r=r, the output characteristics of the R-1 stage are calculatedInput the last 2 second convolution blocks Conv 3×3 The processing of the R phase to obtain an output characteristic diagram +.>The p-th rough positioning decoder characteristic diagram finally output by the rough positioning decoder is input into a first convolution block Conv 1×1 After that, the p-th initial stage decision diagram is obtained>
3. The multi-scale context aware-based multi-focus image fusion method according to claim 2, wherein the multi-scale feature pyramid module in step 2.4 is composed of 4 multi-scale feature extraction branches with the same structure but different parameters k and r and 5 third convolution blocks Conv 1×1 1 ReLU activation function, wherein each multi-scale feature extraction branch is composed of 1 symmetrical convolution block Conv k×k And 1 second stride convolution block Conv k×k,r Sequentially stacked, wherein Conv k×k Representing 1 symmetric convolution layer with a convolution kernel of k×k and 1 ReLU activation function;
step 2.4.0, defining the current stage as k, and initializing k=1; will make the p-th initial stage decision diagramP decision diagram as k-1 stage->
Step 2.4.1: the preprocessing module makes a decision on the p decision diagram of the k-1 stageDownsampling operation is performed such that +.>And->Is the same in size and gives a p-th downsampling decision map in stage k-1 +.>Then carrying out Sigmoid activation function operation to obtain the p weight figure in the k-1 stage +.>Meanwhile, the p-th coarse positioning decoder feature +.>After the same downsampling operation, the p-th feature map is obtained>
Step 2.4.2: the preprocessing module subtracts the p weight figure from' 1After that, the p-th inverse weight map of the k-1 stage is obtained>Then will->And->Respectively with p-th feature map->After multiplication, the p-th forward feature map of the k-1 stage is obtained correspondingly +.>And the k-1 th orderSegment p-th inverse profile->
Step 2.4.3: multiple scale feature pyramid module pairsAnd->After processing, an output characteristic diagram is obtained>And->Then respectively and p-th characteristic diagram->After short ligation, the p-th forward short feature map +.>And p-th inverse short profile->
Step 2.4.3.1:through 4 third convolution blocks Conv 1×1 After the adjustment of the channel, a 4-channel output profile of the k-1 stage is obtained +.>
Step 2.4.3.2: phase k-1 4 channel output profileRespectively inputting into 4 multi-scale feature extraction branches, and performing Conv by the symmetrical convolution blocks k×k Obtaining 4 symmetrical convolution characteristic diagrams in the k-1 stage +.>Then pass through a second stride convolution block Conv k×k,r After the processing of (a), 4 stride convolution characteristic diagrams of the k-1 stage are obtained respectively
Step 2.4.3.3: outputting characteristic diagram of 4 channels in k-1 stage4 symmetrical convolution characteristic diagrams +.>And 4 stride convolutions characteristic map +.k-1 stage>After splicing in the channel dimension along the output sequence of the 4 multi-scale feature extraction branches, 4 multi-scale fusion feature graphs in the k-1 stage are correspondingly obtained>
Step 2.4.3.4: mapping the k-1 stage 4 multiscale fusion featuresRespectively superposing the two multi-scale fusion characteristics before the k-1 stage to correspondingly obtain 4 superposition characteristic diagrams of the k-1 stage
Step 2.4.3.5: the 4 overlapped feature images in the k-1 stage are spliced again in the channel dimension to obtain 1 spliced feature image in the k-1 stageWill->Input 5 th third convolution block Conv 1×1 After the channel is adjusted, a characteristic diagram after the k-1 phase adjustment is obtained>
Step 2.4.3.6: will beAnd->After addition, the p-th multi-scale feature pyramid forward feature map ++is obtained after the treatment of the ReLU activation function>
Step 2.4.3.7: the characteristic diagram is mapped according to the process from step 2.4.3.1 to step 2.4.3.6After the same treatment, the p-th multi-scale characteristic pyramid inverse characteristic diagram +.>
Step 2.4.4: p-th forward short feature map featureAnd p-th inverse short profile->After multiplying two self-learning parameters 'alpha' and 'beta', the p-th self-learning characteristic diagram pair +_>And->
Step 2.4.5: will p-th feature mapThrough a third convolution block Conv 3×3 After the up-sampling operation, the p up-sampling characteristic diagram is obtained>
Step 2.4.6: map the p-th self-learning featureAnd p-th upsampling feature->After subtraction, the p-th fusion characteristic F is obtained p ;
Step 2.4.7: fusing feature F with p p 、And p-th self-learning feature map->After addition, the p-th multiscale feature interaction feature map ++in the k-th stage is obtained>
Step 2.4.8: judging whether k=k is satisfied, if so, obtaining a characteristic diagram with 1-Y-1 receptive field enhancementCorresponding K-stage p-th series multiscale interaction feature map->Decision diagram +.p series with K stages>Otherwise, go to step 2.4.9;
step 2.4.9: stage kAfter 3 times of third convolution block Conv 3×3 After the processing of (a) a kth phase p decision diagram is obtained +.>
Step 2.4.10: will beAnd the kth stage p decision diagram D k p The upsampling operations are performed separately so that +.>Andis the same in size and correspondingly gets the kth phase p up-sampling decision diagram +.>And kth phase p upsamplingFeature map->And with the characteristic pattern of receptive field enhancement->And (3) inputting the k+1 values into a multi-scale feature interaction module together, and returning to the step (2.4.1) for sequential execution after the k+1 values are assigned to the k.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor to perform the multi-focus image fusion method of any one of claims 1-3, the processor being configured to execute the program stored in the memory.
5. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the multi-focus image fusion method of any of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310767148.7A CN116630763A (en) | 2023-06-27 | 2023-06-27 | Multi-scale context awareness-based multi-focus image fusion method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310767148.7A CN116630763A (en) | 2023-06-27 | 2023-06-27 | Multi-scale context awareness-based multi-focus image fusion method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116630763A true CN116630763A (en) | 2023-08-22 |
Family
ID=87613610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310767148.7A Pending CN116630763A (en) | 2023-06-27 | 2023-06-27 | Multi-scale context awareness-based multi-focus image fusion method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116630763A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117593619A (en) * | 2024-01-18 | 2024-02-23 | 腾讯科技(深圳)有限公司 | Image processing method, device, electronic equipment and storage medium |
-
2023
- 2023-06-27 CN CN202310767148.7A patent/CN116630763A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117593619A (en) * | 2024-01-18 | 2024-02-23 | 腾讯科技(深圳)有限公司 | Image processing method, device, electronic equipment and storage medium |
CN117593619B (en) * | 2024-01-18 | 2024-05-14 | 腾讯科技(深圳)有限公司 | Image processing method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109410239B (en) | Text image super-resolution reconstruction method based on condition generation countermeasure network | |
CN109299274B (en) | Natural scene text detection method based on full convolution neural network | |
CN111950453B (en) | Random shape text recognition method based on selective attention mechanism | |
CN113158862B (en) | Multitasking-based lightweight real-time face detection method | |
US20230080693A1 (en) | Image processing method, electronic device and readable storage medium | |
CN111754438B (en) | Underwater image restoration model based on multi-branch gating fusion and restoration method thereof | |
CN111754446A (en) | Image fusion method, system and storage medium based on generation countermeasure network | |
CN112927209B (en) | CNN-based significance detection system and method | |
CN116309648A (en) | Medical image segmentation model construction method based on multi-attention fusion | |
CN111931857B (en) | MSCFF-based low-illumination target detection method | |
Cun et al. | Defocus blur detection via depth distillation | |
CN112528782A (en) | Underwater fish target detection method and device | |
CN116229056A (en) | Semantic segmentation method, device and equipment based on double-branch feature fusion | |
CN113449691A (en) | Human shape recognition system and method based on non-local attention mechanism | |
CN113392711A (en) | Smoke semantic segmentation method and system based on high-level semantics and noise suppression | |
Liu et al. | Griddehazenet+: An enhanced multi-scale network with intra-task knowledge transfer for single image dehazing | |
CN110135446A (en) | Method for text detection and computer storage medium | |
CN116630763A (en) | Multi-scale context awareness-based multi-focus image fusion method | |
CN113393434A (en) | RGB-D significance detection method based on asymmetric double-current network architecture | |
CN112270366A (en) | Micro target detection method based on self-adaptive multi-feature fusion | |
WO2024109336A1 (en) | Image repair method and apparatus, and device and medium | |
CN115331024A (en) | Intestinal polyp detection method based on deep supervision and gradual learning | |
CN113297956A (en) | Gesture recognition method and system based on vision | |
CN116645592A (en) | Crack detection method based on image processing and storage medium | |
CN116152128A (en) | High dynamic range multi-exposure image fusion model and method based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |