CN113159236A - Multi-focus image fusion method and device based on multi-scale transformation - Google Patents

Multi-focus image fusion method and device based on multi-scale transformation

Info

Publication number: CN113159236A
Application number: CN202110581448.7A
Authority: CN (China)
Prior art keywords: image, fusion, scale, fused, feature
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 田赛赛, 老伟雄, 苏喆, 高佩忻
Current assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Original assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Application filed by Industrial and Commercial Bank of China Ltd (ICBC), priority to CN202110581448.7A


Classifications

    • G06F 18/253 - Pattern recognition; fusion techniques of extracted features
    • G06F 18/2415 - Pattern recognition; classification techniques based on parametric or probabilistic models
    • G06N 3/045 - Neural networks; combinations of networks
    • G06N 3/048 - Neural networks; activation functions
    • G06N 3/08 - Neural networks; learning methods
    • G06V 10/44 - Image or video recognition; local feature extraction (edges, contours, corners, strokes), connectivity analysis
    • G06V 10/56 - Image or video recognition; extraction of features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides a multi-focus image fusion method and device based on multi-scale transformation, relating to the fields of artificial intelligence and computer vision. The method comprises the following steps: acquiring two input images of the same imaging scene, captured at different depths; extracting k levels of multi-scale image features at different depths from each input image with a k-level context feature extraction model; preliminarily fusing the same-scale image features of the two images at each level to obtain preliminary fusion features; fusing the preliminary fusion feature at each scale with the inverse-transformed preliminary fusion feature of the previous (coarser) scale to obtain refined fusion features; reconstructing the refined fusion features with an image reconstruction model to obtain a fused image; and training the multi-focus image fusion network model using the input images and their fused image as training data.

Description

Multi-focus image fusion method and device based on multi-scale transformation
Technical Field
The present disclosure relates to the field of artificial intelligence or computer vision, and in particular, to a multi-focus image fusion method and apparatus based on multi-scale transformation.
Background
The multi-focus image fusion technology aims to fuse a plurality of images of the same scene captured under different focus settings into a single all-in-focus image with more complete information content; the resulting all-in-focus image facilitates subsequent computer vision tasks such as recognition and surveillance. To obtain a good fusion effect, researchers have proposed a variety of image fusion methods, which can be broadly divided into two categories according to their underlying principles: conventional image fusion methods and deep-learning-based image fusion methods.
Conventional image fusion methods can be further divided into spatial-domain methods and transform-domain methods. However, these multi-focus image fusion methods require hand-crafted activity-level measurements and fusion rules, which greatly increases the difficulty of algorithm design. In recent years, with the wide application of deep learning, deep-learning-based multi-focus image fusion has developed rapidly; compared with conventional methods, the algorithm complexity is greatly reduced and the fusion performance is substantially improved.
At present, most multi-focus image fusion algorithms design a convolutional neural network as a classifier and assign a label to each pair of training samples, which can lead to misclassification at the boundary between focused and defocused regions. In addition, the network only performs focus detection, and the remaining steps still require manually designed decision criteria, which increases the complexity of the algorithm. Moreover, the existing training schemes make it difficult for the network to learn the boundaries between focused and defocused regions, and the regularly shaped masks commonly defined for this purpose are not sufficient to simulate real-world situations.
Disclosure of Invention
In view of the above-mentioned deficiencies of existing multi-focus image fusion technology, the present disclosure provides a multi-focus image fusion method and apparatus based on multi-scale transformation, so as to solve the problem that existing multi-focus image fusion techniques fuse the boundary between focused and defocused regions inaccurately.
One aspect of the present disclosure provides a multi-focus image fusion method based on multi-scale transformation, including: acquiring two input images A_n (n = 1, 2) of the same imaging scene, captured at different depths; extracting, with a k-level context feature extraction model, k levels of multi-scale image features at different depths from each input image, denoted Φ_n^d (d = 1, ..., k); preliminarily fusing the same-scale image features Φ_1^d and Φ_2^d at each level to obtain a preliminary fusion feature U_d; fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous scale to obtain a refined fusion feature U'_d; reconstructing the refined fusion feature U'_d with an image reconstruction model to obtain a fused image F_recon; and training the multi-focus image fusion network model using the input images A_n and their fused image F_recon as training data.
According to an embodiment of the present disclosure, the two input images are two multi-focus images to be fused and pre-registered.
According to the embodiment of the disclosure, each level of the context feature extraction model consists of 3 parallel convolution modules with different receptive fields and produces the image feature Φ_n^d containing context information at that scale, wherein: the 3 parallel convolution modules with different receptive fields comprise an original-image-feature branch, a receptive-field-expansion branch and an attention-weight branch; the receptive-field-expansion branch enlarges the receptive field with hole (dilated) convolution to capture relatively global image information, and the attention-weight branch applies a self-attention mechanism to weight the receptive-field-expansion branch.
According to an embodiment of the present disclosure, in the k-level context feature extraction model, the d-th-scale feature Φ_n^d of an input image A_n takes as input the previous-scale feature Φ_n^{d-1} of A_n, where Φ_n^0 denotes the input image itself; the self-attention mechanism converts Φ_n^{d-1} into a transitional image feature T_n^d, and a Sigmoid function is adopted to assign a weight to each pixel of the transitional image feature T_n^d.
According to embodiments of the present disclosure, converting Φ_n^{d-1} into the transitional image feature T_n^d with the self-attention mechanism comprises: performing a convolution operation on Φ_n^{d-1} with a filter of size 1 that outputs 32 channels to obtain the transitional image feature T_n^d; and adopting a Sigmoid function to assign a weight to each pixel of the transitional image feature T_n^d, the weight being calculated according to the following formula:
W_n^d(i, j) = Sigmoid(T_n^d(i, j)) = 1 / (1 + exp(−T_n^d(i, j))), 1 ≤ i ≤ H, 1 ≤ j ≤ W
wherein (i, j) denote the row and column coordinates, respectively; H and W denote the pixel height and width of the image feature, respectively; and W_n^d(i, j) denotes the weight assigned to each pixel of the transitional image feature T_n^d.
According to an embodiment of the present disclosure, the receptive-field-expansion branch produces two partial image sub-features, obtained from two consecutive hole convolutions, wherein the two partial image sub-features are calculated according to the following formulas:
Φ_n^{d,1} = Θ(H_1(W_n^d ⊙ T_n^d; θ_{H1}))
Φ_n^{d,2} = Θ(H_2(Φ_n^{d,1}; θ_{H2}))
wherein Φ_n^{d,1} and Φ_n^{d,2} are the first and second partial image sub-features, respectively; H_1(·) and H_2(·) denote the first and second hole convolution operations, respectively; θ_{H1} and θ_{H2} denote the parameter sets of the filters corresponding to the first and second hole convolution operations, respectively; Θ denotes the Relu activation function; and W_n^d ⊙ T_n^d denotes the converted (transitional) feature whose pixels have been assigned the weights W_n^d.
According to an embodiment of the present disclosure, the d-th-scale feature Φ_n^d of the input image A_n is calculated according to the following formula:
Φ_n^d = Pooling(Conv_1(Cat(Φ_n^{d-1}, Φ_n^{d,1}, Φ_n^{d,2}); θ_{C1}))
wherein Conv_1(·) denotes the first convolution operation and θ_{C1} denotes the parameter set of the filter corresponding to the first convolution operation; Cat(·) denotes the cascade operation; and Pooling(·) denotes a pooling operation with a pooling stride of 2.
According to the embodiment of the disclosure, preliminarily fusing the k levels of multi-scale image features Φ_n^d in a same-scale fusion manner to obtain the preliminary fusion feature U_d comprises: cascading the same-scale image features Φ_1^d and Φ_2^d from the two input images and passing the result through a softmax layer to obtain weight maps; and multiplying each d-th-scale feature Φ_n^d by its corresponding weight map and adding the products to obtain the preliminary fusion feature U_d at the d-th scale.
According to an embodiment of the present disclosure, the preliminary fusion feature U_d is calculated according to the following formulas:
(map_1, map_2) = softmax(Cat(Φ_1^d, Φ_2^d))
U_d = map_1 ⊗ Φ_1^d + map_2 ⊗ Φ_2^d
wherein map_n (n = 1, 2) denotes the weight map obtained through the softmax layer; Cat(·) denotes the cascade operation; ⊗ denotes pixel-wise multiplication; and + denotes pixel-wise addition.
According to the embodiment of the disclosure, fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous scale to obtain the refined fusion feature U'_d comprises: adopting a scale-by-scale backward-transfer manner, in which the inverse-transformed preliminary fusion feature of the previous scale relative to the d-th scale is the preliminary fusion feature U_{d+1}; and upsampling the previous-scale preliminary fusion feature U_{d+1} and fusing it with the preliminary fusion feature U_d to obtain the refined fusion feature U'_d.
According to an embodiment of the present disclosure, the refined fusion feature U'_d is calculated according to the following formula:
U'_d = Θ(Conv_2(Cat(U_d, upsample(U_{d+1})); θ_{C2}))
wherein Conv_2(·) denotes the second convolution operation and θ_{C2} denotes the parameter set of the filter corresponding to the second convolution operation; Θ denotes the Relu activation function; Cat(·) denotes the cascade operation; and upsample(·) denotes the upsampling operation.
According to an embodiment of the disclosure, reconstructing the refined fusion feature U'_d with the image reconstruction model to obtain the fused image F_recon comprises: calculating the fused image F_recon according to the following formula:
F_recon = Θ(conv(U'_d; θ_recon))
wherein conv(·; θ_recon) denotes the third convolution operation and θ_recon denotes the parameter set of the filter corresponding to the third convolution operation; Θ denotes the Relu activation function.
According to an embodiment of the present disclosure, two input images are constructed by: inputting an image training set containing a plurality of source images, setting a first mask and a second mask with complementary scenes, and acquiring a target region in the source images through the first mask; performing dot multiplication operation on each source image by using a first mask and a second mask respectively to obtain a target image and a background image; continuously and repeatedly blurring the target image and the background image through a blurring filter to obtain a plurality of groups of target blurred images and background blurred images with different blurring degrees; and respectively adding the target blurred image and the background blurred image which have the same degree of blurring in each group to obtain a plurality of groups of artificially synthesized multi-focus images.
According to an embodiment of the present disclosure, the blur filter is a gaussian filter with a sliding window size of 7 × 7 and a standard deviation of 2.
According to an embodiment of the disclosure, the method further comprises: determining a predicted fused image of one of the plurality of input images with the multi-focus image fusion network model; calculating, with a joint loss function, a loss value between the predicted fused image determined by the multi-focus image fusion network model and the fused image F_recon of the input image; and judging whether the loss value meets a preset loss threshold, and if not, adjusting the parameters of the multi-focus image fusion network model according to the loss value and returning to the step of determining a fused output image with the multi-focus image fusion network model for another input image among the plurality of input images.
According to an embodiment of the present disclosure, the joint loss function is constructed jointly from a mean square error loss function and a structural similarity loss function, wherein the mean square error loss function L_MSE and the structural similarity loss function L_SSIM are calculated according to the following formulas:
L_MSE = (1 / (H · W)) Σ_{i=1}^{H} Σ_{j=1}^{W} ||G(i, j) − P(i, j)||_2^2
SSIM(G, P) = ((2 μ_G μ_P + C_1)(2 σ_GP + C_2)) / ((μ_G^2 + μ_P^2 + C_1)(σ_G^2 + σ_P^2 + C_2))
L_SSIM = 1 − SSIM(G, P)
wherein H and W denote the pixel height and width of the image, respectively; (i, j) denote the row and column coordinates in the image; G(i, j) and P(i, j) denote the color values of the ground-truth image and the predicted fused image at the corresponding pixel coordinates, respectively; ||·||_2 denotes the two-norm operation; μ_G and μ_P denote the color means of the fused image F_recon and the predicted fused image, respectively; C_1 and C_2 are two constants used to prevent division-by-zero errors, set to 0.01 and 0.03 in training; σ_G^2 and σ_P^2 denote the color variances of the fused image F_recon and the predicted fused image, respectively; and σ_GP denotes the covariance between the fused image F_recon and the predicted fused image.
According to an embodiment of the present disclosure, before the step of training the multi-focus image fusion network model using the input images A_n and the fused image F_recon of the input images as training data, the method further comprises: scaling the input images A_n to a preset size.
Another aspect of the present disclosure provides a multi-focus image fusion apparatus based on multi-scale transformation, including: an image acquisition module for acquiring two input images A_n (n = 1, 2) of the same imaging scene, captured at different depths; a feature extraction module for extracting k levels of multi-scale image features Φ_n^d at different depths from each input image with a k-level context feature extraction model; a preliminary fusion module for preliminarily fusing the k levels of multi-scale image features Φ_n^d in a same-scale fusion manner to obtain preliminary fusion features U_d; an inverse-transform fusion module for fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous scale to obtain refined fusion features U'_d; an image reconstruction module for reconstructing the refined fusion features U'_d with an image reconstruction model to obtain a fused image F_recon; and a network training module for training the multi-focus image fusion network model using the input images A_n and their fused image F_recon as training data.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a storage device to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
Compared with the prior art, the multi-focus image fusion method and device based on multi-scale transformation provided by the disclosure at least have the following beneficial effects:
(1) the method requires no hand-crafted feature extraction or fusion rules, achieves accurate fusion of multi-focus images, and simulation results show that the fused image contains rich information;
(2) the method uses the base network to extract multi-level, multi-scale deep image features containing rich detail and context information, and can effectively use the context information in the image to help the network identify the focused regions in the multi-focus images, thereby improving feature extraction;
(3) the method constructs a feature fusion module that first fuses same-scale image features from different branches, then fuses the fused features at the current scale with the fused features obtained by inverse multi-scale transformation, refining the features step by step in a backward-transfer manner and thereby effectively enriching the detail information in the fused image.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates a flow chart of a multi-focus image fusion method based on multi-scale transformation according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates an operational flow diagram of a multi-focus image fusion method based on multi-scale transformation according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram for manual synthesis of an input image according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a sequential multi-pass blur processing procedure according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a process of attention weight branching according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a process of preliminary fusion feature processing according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a process of refining a fused feature according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a flow diagram for training a multi-focus image fusion network model with training data, in accordance with an embodiment of the present disclosure;
fig. 9 schematically illustrates a block diagram of a multi-focus image fusion apparatus based on multi-scale transformation according to an embodiment of the present disclosure; and
FIG. 10 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
Before describing in detail specific embodiments of the present disclosure, technical terms are first explained to facilitate a better understanding of the present disclosure.
Convolutional Neural Network (CNN): a supervised deep learning model with a trainable multi-layer structure that learns multi-level feature representations of the input data; each level of features comprises several feature maps. The coefficients in the feature maps are called neurons, and feature maps are connected through several different types of computations, such as convolution, nonlinear activation and spatial pooling. Generally, nonlinear processing is applied immediately after the convolutional layer, typically with nonlinear functions such as the Sigmoid function, the Relu activation function or the tanh function, which speeds up convergence during network training.
Receptive field: the size of the region of the original image to which a pixel in the feature map output by each layer of a convolutional neural network is mapped. Studying the theory and methods of the receptive field and quantifying the receptive-field size of each layer in a convolutional neural network can provide a reliable optimization direction for image processing tasks such as object detection, and is important for improving detection accuracy.
Multi-scale: sampling a signal at different granularities; different features can be observed at different scales, so that different tasks can be accomplished. Generally, finer and denser sampling reveals more detail, while coarser and sparser sampling reveals the overall trend.
Hole convolution (dilated convolution): by convolving at sparsely sampled locations, the convolution kernel is enlarged while keeping the original weights, thereby increasing the receptive field without adding extra cost. Moreover, hole convolution can integrate a large amount of context information in semantic segmentation.
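As a purely illustrative aside (not part of the patented method), the following minimal PyTorch sketch shows how dilation enlarges the receptive field of a 3×3 kernel without adding parameters; the tensor sizes are arbitrary:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # a dummy single-channel feature map

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)                  # receptive field 3x3
dilated = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2)   # receptive field 5x5

# Both layers have the same number of weights (3x3 = 9),
# but the dilated kernel samples pixels two apart, covering a 5x5 area.
print(conv(x).shape, dilated(x).shape)  # both keep the 32x32 spatial size
```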
softmax layer: used for classification; the softmax function converts the output of the neural network into a probability distribution, and the item with the largest probability is taken as the classification result. The softmax layer is typically the last layer in a neural network, used for final classification and normalization.
Image masking: refers to the control of the area or process of image processing by occluding (wholly or partially) the processed image with selected images, graphics or objects. In digital image processing, masks may be used to mask certain areas of the image from processing or from processing parameter calculations, or to process or count only the masked areas.
The embodiment of the disclosure provides a multi-focus image fusion method based on multi-scale transformation, which comprises the following steps: acquiring two input images A_n (n = 1, 2) of the same imaging scene, captured at different depths; extracting k levels of multi-scale image features Φ_n^d at different depths from each input image with a k-level context feature extraction model; preliminarily fusing the same-scale image features Φ_1^d and Φ_2^d to obtain preliminary fusion features U_d; fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous scale to obtain refined fusion features U'_d; reconstructing the refined fusion features U'_d with an image reconstruction model to obtain a fused image F_recon; and training the multi-focus image fusion network model using the input images A_n and their fused image F_recon as training data.
Fig. 1 schematically shows a flowchart of a multi-focus image fusion method based on multi-scale transformation according to an embodiment of the present disclosure. Fig. 2 schematically illustrates an operational flow diagram of a multi-focus image fusion method based on multi-scale transformation according to an embodiment of the present disclosure.
Referring to fig. 2, the method shown in fig. 1 will be described in detail. The multi-focus image fusion method based on multi-scale transformation according to an embodiment of the present disclosure may include the following operations S101 to S106.
In operation S101, two input images A_n (n = 1, 2) of the same imaging scene, captured at different depths, are acquired.
In operation S102, k levels of multi-scale image features Φ_n^d at different depths are extracted from each input image with the k-level context feature extraction model.
In operation S103, the k levels of multi-scale image features Φ_n^d are preliminarily fused in a same-scale fusion manner to obtain the preliminary fusion features U_d.
In operation S104, the preliminary fusion feature U_d of each scale is fused with the inverse-transformed preliminary fusion feature U_{d+1} of the previous scale to obtain the refined fusion feature U'_d.
In operation S105, the refined fusion features U'_d are reconstructed with the image reconstruction model to obtain the fused image F_recon.
In operation S106, the multi-focus image fusion network model is trained using the input images A_n and their fused image F_recon as training data.
According to the embodiment of the disclosure, multi-level, multi-scale image features are extracted from each multi-focus image with the k-level context feature extraction model, and features at different scales are fused after inverse transformation, so that features at different levels and scales are combined. Accurate fusion of multi-focus images is thus achieved without hand-crafted feature extraction or fusion rules, a large amount of objective and accurate training data is exploited, and the training efficiency and accuracy of the multi-focus image fusion network model are improved.
According to the embodiment of the disclosure, the preliminary fusion features are obtained by same-scale fusion of the k levels of multi-scale image features obtained in operation S102, and the fusion features are then refined in a coarse-to-fine manner.
In the embodiment of the present disclosure, the two input images are two multi-focus images to be fused and pre-registered.
Because of the limited depth of field of the imaging device, and because a specific target is usually focused on during actual image acquisition, the target in the focused region appears relatively sharp while the rest of the scene appears relatively blurred. To better simulate the properties of objects in the focused region and to strengthen the network's learning of the boundary between focused and defocused regions, the present disclosure artificially synthesizes the input images, for example using the salient object detection data set MSRA10K, which contains 10,000 images, as the base data set. It should be noted that, in other embodiments, both the source and the image size of the image training set may be chosen according to the actual training process, which is not limited by the present disclosure.
Fig. 3 schematically illustrates a flow diagram for manual synthesis of an input image according to an embodiment of the present disclosure. FIG. 4 schematically illustrates a sequential multi-pass blur processing procedure according to an embodiment of the present disclosure.
Referring to fig. 4, the process of artificially synthesizing the input image shown in fig. 3 will be further described. In some embodiments, the two input images may be constructed by the following sub-operations S110-S140:
in operation S110, an image training set including a plurality of source images is input, a first mask and a second mask having complementary scenes are set, and a target region in the source images is obtained through the first mask.
The target area refers to the area where the target object is located, and the target area is selected by acting on the source image through the first mask. For example, the source image may be a face image and the target region may be a face region.
In the embodiment of the present disclosure, after the first mask is set, the second mask can be obtained by inverting the first mask.
In operation S120, a dot product operation is performed on each source image by using the first mask and the second mask, so as to obtain a target image and a background image.
The target image is obtained as the first mask acts on the source image to select the target area. The second mask and the first mask have complementary scenes, and the second mask acts on the source image through dot product operation to obtain the remaining region of the target region, namely the background image.
In operation S130, a plurality of sets of target blurred images and background blurred images with different degrees of blur are obtained by performing a plurality of consecutive blurring processes on the target image and the background image respectively through a blurring filter.
For example, the blurring filter may be a gaussian filter having a sliding window size of 7 × 7 and a standard deviation of 2, and the consecutive multi-pass blurring process may be a gaussian blurring process that operates 5 times in succession.
As shown in fig. 4, 5 blurred images of the target with different blurring degrees can be obtained by continuously operating the selected target part 5 times with a gaussian filter with a window size of 7 and a parameter of 2. Accordingly, 5 background blurred images with different blurring degrees can be obtained by performing the same continuous blurring processing on the background image.
In operation S140, each group of target blurred images and background blurred images with the same blur degree are added to obtain a plurality of artificially synthesized multi-focus images.
Superimposing target blurred images and background blurred images with the same degree of blur yields multi-focus images with different degrees of blur, so that the input images have different depths. It should also be noted that, to fit the image processing context in computer vision, the "depth" discussed below in the embodiments of the present disclosure refers to this degree of blur.
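For illustration only, a possible Python sketch of sub-operations S110–S140 is given below, using OpenCV and NumPy; the file paths, the binary mask source and the pairing of each sharp region with the complementarily blurred region at the same blur level are assumptions about the translated description rather than details fixed by the patent:

```python
import cv2
import numpy as np

def synthesize_multifocus_pairs(source_path, mask_path, passes=5):
    """Build multi-focus image pairs from one source image and its object mask (S110-S140)."""
    src = cv2.imread(source_path).astype(np.float32)
    mask1 = (cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) > 127).astype(np.float32)[..., None]
    mask2 = 1.0 - mask1                  # second mask is the complement of the first (S110)

    target = src * mask1                 # target image via dot-multiplication (S120)
    background = src * mask2             # background image via the complementary mask

    pairs = []
    blurred_t, blurred_b = target, background
    for _ in range(passes):              # 5 consecutive Gaussian blurs, 7x7 window, sigma = 2 (S130)
        blurred_t = cv2.GaussianBlur(blurred_t, (7, 7), 2)
        blurred_b = cv2.GaussianBlur(blurred_b, (7, 7), 2)
        focused_on_target = target + blurred_b       # sharp target, blurred background
        focused_on_background = blurred_t + background  # blurred target, sharp background (S140)
        pairs.append((focused_on_target, focused_on_background))
    return pairs
```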
In this embodiment of the disclosure, in operation S102, each level of the context feature extraction model consists of 3 parallel convolution modules with different receptive fields and produces the image feature Φ_n^d containing context information at that scale, wherein the 3 parallel convolution modules comprise three branches: an original-image-feature branch, a receptive-field-expansion branch and an attention-weight branch; the receptive-field-expansion branch enlarges the receptive field with hole convolution to capture relatively global image information, and the attention-weight branch weights the receptive-field-expansion branch with a self-attention mechanism.
Each level of context feature extraction model is composed of 3 parallel convolution modules with different receptive fields, and image features containing context information under a single scale are obtained by fusing image features of different levels together.
In the original-image-feature branch, the d-th-scale feature Φ_n^d of the input image A_n takes as input the previous-scale feature Φ_n^{d-1} of A_n, where Φ_n^0 denotes the input image itself.
For ease of understanding, assume k = 4; then the d-th-scale feature Φ_n^d of an arbitrary input image A_n can be expressed as:
Φ_n^d = f_cfe(Φ_n^{d-1}), d = 1, 2, 3, 4
wherein Φ_n^{d-1} denotes the (d−1)-th-scale feature of the input image A_n, Φ_n^0 denotes the input image, and f_cfe(·) denotes the feature extraction function in the context feature extraction model.
Thus, the previous-scale feature Φ_n^{d-1} that is fed in undergoes a series of processing, which is equivalent to converting the computation input into the computation output, namely the d-th-scale feature Φ_n^d, by means of the feature extraction function.
Fig. 5 schematically illustrates a process of attention weight branching according to an embodiment of the present disclosure.
As shown in fig. 5, the process of the attention weight branch may include the following sub-operations S210-S220.
In operation S210, the self-attention mechanism is used to convert Φ_n^{d-1} into the transitional image feature T_n^d.
For example, the self-attention mechanism may perform a convolution operation on Φ_n^{d-1} with a filter of size 1 that outputs 32 channels to obtain the transitional image feature T_n^d.
In operation S220, a Sigmoid function is adopted to assign a weight to each pixel of the transitional image feature T_n^d.
Specifically, the weight assigned to each pixel of the transitional image feature T_n^d is calculated according to the following formula:
W_n^d(i, j) = Sigmoid(T_n^d(i, j)) = 1 / (1 + exp(−T_n^d(i, j))), 1 ≤ i ≤ H, 1 ≤ j ≤ W
wherein (i, j) denote the row and column coordinates, respectively; H and W denote the pixel height and width of the image feature, respectively; and W_n^d(i, j) denotes the weight assigned to each pixel of the transitional image feature T_n^d.
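A minimal PyTorch sketch of this attention-weight branch (a filter of size 1 producing 32 channels followed by an element-wise Sigmoid) is given below; the class name and the decision to return both the transitional feature and the weights are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentionWeightBranch(nn.Module):
    """Sub-operations S210-S220: 1x1 convolution to 32 channels, then per-pixel Sigmoid weights."""

    def __init__(self, in_channels):
        super().__init__()
        # filter of size 1 that outputs 32 channels (S210)
        self.transition = nn.Conv2d(in_channels, 32, kernel_size=1)

    def forward(self, prev_feature):
        t = self.transition(prev_feature)  # transitional image feature T_n^d
        w = torch.sigmoid(t)               # per-pixel weights W_n^d(i, j) in (0, 1) (S220)
        return t, w
```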
In the embodiment of the present disclosure, the receptive-field-expansion branch produces two partial image sub-features obtained from two consecutive hole convolutions, wherein the two partial image sub-features are calculated according to the following formulas:
Φ_n^{d,1} = Θ(H_1(W_n^d ⊙ T_n^d; θ_{H1}))
Φ_n^{d,2} = Θ(H_2(Φ_n^{d,1}; θ_{H2}))
wherein Φ_n^{d,1} and Φ_n^{d,2} are the first and second partial image sub-features, respectively; H_1(·) and H_2(·) denote the first and second hole convolution operations, respectively; θ_{H1} and θ_{H2} denote the parameter sets of the filters corresponding to the first and second hole convolution operations, respectively; Θ denotes the Relu activation function; and W_n^d ⊙ T_n^d denotes the converted (transitional) feature whose pixels have been assigned the weights W_n^d.
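The receptive-field-expansion branch may be sketched as two consecutive hole (dilated) convolutions applied to the weighted feature, as below; the kernel size, dilation rate and channel count are assumptions, since the translated text does not specify them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReceptiveFieldBranch(nn.Module):
    """Two consecutive hole convolutions yielding the two partial image sub-features."""

    def __init__(self, channels=32, dilation=2):
        super().__init__()
        pad = dilation  # keeps the spatial size for a 3x3 kernel
        self.hole1 = nn.Conv2d(channels, channels, kernel_size=3, padding=pad, dilation=dilation)
        self.hole2 = nn.Conv2d(channels, channels, kernel_size=3, padding=pad, dilation=dilation)

    def forward(self, weighted_feature):
        # first partial sub-feature from the weighted (converted) feature
        p1 = F.relu(self.hole1(weighted_feature))
        # second partial sub-feature from the first one
        p2 = F.relu(self.hole2(p1))
        return p1, p2
```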
Thus, integrating the three branches contained in the parallel convolution module, the computation output, namely the d-th-scale feature Φ_n^d, can be calculated according to the following formula:
Φ_n^d = Pooling(Conv_1(Cat(Φ_n^{d-1}, Φ_n^{d,1}, Φ_n^{d,2}); θ_{C1}))
wherein Conv_1(·) denotes the first convolution operation and θ_{C1} denotes the parameter set of the filter corresponding to the first convolution operation; Cat(·) denotes the cascade operation; and Pooling(·) denotes a pooling operation with a pooling stride of 2.
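Putting the three branches together, one level of the context feature extraction model might be organized as in the following sketch, which reuses the AttentionWeightBranch and ReceptiveFieldBranch classes sketched above; the way the per-pixel weights are applied (multiplying the transitional feature) and the channel counts are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFeatureExtractor(nn.Module):
    """One level of the context feature extraction model: three parallel branches,
    cascade, first convolution Conv_1, and pooling with stride 2."""

    def __init__(self, in_channels, out_channels=32):
        super().__init__()
        self.attention = AttentionWeightBranch(in_channels)       # sketched above
        self.receptive_field = ReceptiveFieldBranch(channels=32)  # sketched above
        # first convolution Conv_1 over the cascaded branches
        self.conv1 = nn.Conv2d(in_channels + 2 * 32, out_channels, kernel_size=3, padding=1)

    def forward(self, prev_feature):
        t, w = self.attention(prev_feature)        # transitional feature and per-pixel weights
        p1, p2 = self.receptive_field(w * t)       # weighted feature fed to the hole convolutions
        cascaded = torch.cat([prev_feature, p1, p2], dim=1)  # original branch + two sub-features
        out = self.conv1(cascaded)
        return F.max_pool2d(out, kernel_size=2, stride=2)    # pooling with stride 2
```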
FIG. 6 schematically illustrates a process of preliminary fusion feature processing according to an embodiment of the disclosure.
As shown in fig. 6, preliminarily fusing the k levels of multi-scale image features Φ_n^d in a same-scale fusion manner to obtain the preliminary fusion feature U_d may include the following sub-operations S310-S320.
In operation S310, the image features Φ_1^d and Φ_2^d of the same scale from the two input images are cascaded, and a weight map is then obtained through a softmax layer.
In operation S320, each d-th-scale feature Φ_n^d is multiplied by its corresponding weight map and the products are added to obtain the preliminary fusion feature U_d at the d-th scale.
Therefore, the embodiment of the disclosure cascades the same-scale image features from the two branches, obtains pixel-level weight information through the softmax layer, performs pixel-level multiplication with the original image features respectively, and then obtains the preliminary fusion feature through pixel-level addition.
Specifically, the preliminary fusion feature U_d is calculated according to the following formulas:
(map_1, map_2) = softmax(Cat(Φ_1^d, Φ_2^d))
U_d = map_1 ⊗ Φ_1^d + map_2 ⊗ Φ_2^d
wherein map_n (n = 1, 2) denotes the weight map obtained through the softmax layer; Cat(·) denotes the cascade operation; ⊗ denotes pixel-wise multiplication; and + denotes pixel-wise addition.
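A sketch of this same-scale preliminary fusion is given below; the 1×1 convolution used to reduce the cascaded features to two score maps before the softmax layer is an assumption about how the weight maps are produced:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SameScaleFusion(nn.Module):
    """Preliminary fusion U_d of the same-scale features from the two input images."""

    def __init__(self, channels=32):
        super().__init__()
        # maps the cascaded features to two score maps (one per input image)
        self.score = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, f1, f2):
        scores = self.score(torch.cat([f1, f2], dim=1))
        maps = F.softmax(scores, dim=1)        # map_1, map_2 sum to 1 at every pixel
        map1, map2 = maps[:, 0:1], maps[:, 1:2]
        return map1 * f1 + map2 * f2           # pixel-wise weighted sum -> U_d
```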
FIG. 7 schematically illustrates a process of refining a fused feature according to an embodiment of the disclosure.
As shown in fig. 7, fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous scale to obtain the refined fusion feature U'_d may include the following sub-operations S410-S420.
In operation S410, a scale-by-scale backward-transfer manner is adopted, in which the inverse-transformed preliminary fusion feature of the previous scale relative to the d-th scale is the preliminary fusion feature U_{d+1}.
In operation S420, the previous-scale preliminary fusion feature U_{d+1} is upsampled and then fused with the preliminary fusion feature U_d to obtain the refined fusion feature U'_d.
Specifically, the refined fusion feature U'_d can be calculated according to the following formula:
U'_d = Θ(Conv_2(Cat(U_d, upsample(U_{d+1})); θ_{C2}))
wherein Conv_2(·) denotes the second convolution operation and θ_{C2} denotes the parameter set of the filter corresponding to the second convolution operation; Θ denotes the Relu activation function; Cat(·) denotes the cascade operation; and upsample(·) denotes the upsampling operation.
The refined fusion feature U'_d obtained in step S420 is input into the image reconstruction model to obtain the fused image F_recon, which can be calculated according to the following formula:
F_recon = Θ(conv(U'_d; θ_recon))
wherein conv(·; θ_recon) denotes the third convolution operation and θ_recon denotes the parameter set of the filter corresponding to the third convolution operation; Θ denotes the Relu activation function.
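The scale-by-scale refinement and the final reconstruction can be sketched as follows; the upsampling mode, kernel sizes and the number of output channels are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineFusion(nn.Module):
    """Refined fusion U'_d = Relu(Conv_2(Cat(U_d, upsample(U_{d+1}))))."""

    def __init__(self, channels=32):
        super().__init__()
        self.conv2 = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, u_d, u_prev):
        up = F.interpolate(u_prev, size=u_d.shape[-2:], mode='bilinear', align_corners=False)
        return F.relu(self.conv2(torch.cat([u_d, up], dim=1)))

class ImageReconstruction(nn.Module):
    """F_recon = Relu(conv(U'_d)) applied to the refined feature at the finest scale."""

    def __init__(self, channels=32, out_channels=3):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, out_channels, kernel_size=3, padding=1)

    def forward(self, refined):
        return F.relu(self.conv3(refined))
```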
Therefore, the embodiment of the disclosure performs inverse transformation on the obtained preliminary fusion features through a depth supervision mechanism and a back propagation rule, and realizes fusion of image features of different scales by adopting a back transmission mode.
In the embodiment of the present disclosure, the multi-focus image fusion network model may be a MobileNet-series neural network model or a ResNet-series neural network model. The MobileNet-series models are neural network models based on depthwise separable convolution, and the ResNet-series models are neural network models based on residual connections.
In some embodiments, the number of input images may be greater than two. For more than two input multi-focus images, two of the multi-focus images may be fused first by performing the above steps S101 to S106, and the same fusion step is then repeated with the remaining multi-focus images until all of them are fused.
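For completeness, the module sketches above can be assembled into one network whose forward pass follows operations S101 to S106; k = 4, the channel counts and the final upsampling back to the input resolution are assumptions carried over from the earlier sketches:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFocusFusionNet(nn.Module):
    """Illustrative assembly of the modules sketched above (k = 4 scales)."""

    def __init__(self, k=4, channels=32):
        super().__init__()
        self.k = k
        self.extractors = nn.ModuleList(
            [ContextFeatureExtractor(3 if d == 0 else channels, channels) for d in range(k)]
        )
        self.fusions = nn.ModuleList([SameScaleFusion(channels) for _ in range(k)])
        self.refiners = nn.ModuleList([RefineFusion(channels) for _ in range(k - 1)])
        self.reconstruct = ImageReconstruction(channels)

    def forward(self, a1, a2):
        f1, f2 = a1, a2
        feats1, feats2 = [], []
        for extractor in self.extractors:                 # S102: k levels of multi-scale features
            f1, f2 = extractor(f1), extractor(f2)
            feats1.append(f1)
            feats2.append(f2)
        u = [fuse(x, y) for fuse, x, y in zip(self.fusions, feats1, feats2)]  # S103
        refined = u[-1]
        for d in range(self.k - 2, -1, -1):               # S104: coarse-to-fine backward transfer
            refined = self.refiners[d](u[d], refined)
        # return to the input resolution before reconstruction (assumption)
        refined = F.interpolate(refined, size=a1.shape[-2:], mode='bilinear', align_corners=False)
        return self.reconstruct(refined)                  # S105: fused image F_recon

# usage sketch:
# fused = MultiFocusFusionNet()(torch.randn(1, 3, 180, 180), torch.randn(1, 3, 180, 180))
```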
Fig. 8 schematically illustrates a flow diagram for training a multi-focus image fusion network model with training data according to an embodiment of the present disclosure. In this example, the training data includes a plurality of input images and a fused image F_recon for each input image.
In operation S610, a predicted fused image of one input image among the plurality of input images is determined with the multi-focus image fusion network model.
In operation S620, a loss value between the predicted fused image determined with the multi-focus image fusion network model and the fused image F_recon of the input image is calculated with a joint loss function.
In operation S630, it is judged whether the loss value meets a preset loss threshold; if not, the parameters of the multi-focus image fusion network model are adjusted according to the loss value, and the process returns to the step of determining a fused output image with the multi-focus image fusion network model for another input image among the plurality of input images.
In the embodiment of the present disclosure, the joint loss function in operation S620 is constructed jointly from a mean square error loss function and a structural similarity loss function, wherein the mean square error loss function L_MSE and the structural similarity loss function L_SSIM are calculated according to the following formulas:
L_MSE = (1 / (H · W)) Σ_{i=1}^{H} Σ_{j=1}^{W} ||G(i, j) − P(i, j)||_2^2
SSIM(G, P) = ((2 μ_G μ_P + C_1)(2 σ_GP + C_2)) / ((μ_G^2 + μ_P^2 + C_1)(σ_G^2 + σ_P^2 + C_2))
L_SSIM = 1 − SSIM(G, P)
wherein H and W denote the pixel height and width of the image, respectively; (i, j) denote the row and column coordinates in the image; G(i, j) and P(i, j) denote the color values of the ground-truth image and the predicted fused image at the corresponding pixel coordinates, respectively; ||·||_2 denotes the two-norm operation; μ_G and μ_P denote the color means of the fused image F_recon and the predicted fused image, respectively; C_1 and C_2 are two constants used to prevent division-by-zero errors, set to 0.01 and 0.03 in training; σ_G^2 and σ_P^2 denote the color variances of the fused image F_recon and the predicted fused image, respectively; and σ_GP denotes the covariance between the fused image F_recon and the predicted fused image.
Therefore, the joint loss function constructed by the embodiment of the disclosure comprises an image-block-level loss (the structural similarity loss function) and a pixel-level loss (the mean square error loss function); by optimizing this loss function, the quality of image fusion is improved, the training of the network is completed, the network model parameters are obtained, and accurate reconstruction of the image is achieved.
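A sketch of such a joint loss is given below, computing the mean square error pixel-wise and the structural similarity term from global image statistics as in the formulas above; combining the two terms by a simple sum is an assumption, since the translated text does not give the weighting coefficient:

```python
import torch

def joint_loss(pred, truth, c1=0.01, c2=0.03):
    """L = L_MSE + L_SSIM, with SSIM computed from global means/variances (illustrative)."""
    l_mse = torch.mean((truth - pred) ** 2)

    mu_g, mu_p = truth.mean(), pred.mean()
    var_g, var_p = truth.var(), pred.var()
    cov_gp = ((truth - mu_g) * (pred - mu_p)).mean()

    ssim = ((2 * mu_g * mu_p + c1) * (2 * cov_gp + c2)) / (
        (mu_g ** 2 + mu_p ** 2 + c1) * (var_g + var_p + c2)
    )
    l_ssim = 1 - ssim
    return l_mse + l_ssim
```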
In some embodiments, before the step of training the multi-focus image fusion network model using the input images A_n and the fused image F_recon of the input images as training data, the method further comprises: scaling the input images A_n to a preset size.
Since the images in the training data set have arbitrary sizes, they are uniformly transformed to a preset size, for example 180 × 180 pixels, so as to fit the multi-focus image fusion network model used subsequently.
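An illustrative training loop tying these pieces together is sketched below: the inputs are scaled to the preset 180 × 180 size, the predicted fused image is scored against the reference fused image with the joint_loss sketch above, and the parameters are adjusted until the loss meets a preset threshold; the optimizer, learning rate and data loader format are assumptions:

```python
import torch
import torch.nn.functional as F

def train(model, data_loader, epochs=10, loss_threshold=1e-3, lr=1e-4):
    """Sketch of operations S610-S630: predict, score with the joint loss, adjust parameters."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for a1, a2, reference in data_loader:        # two inputs and their reference fused image
            a1 = F.interpolate(a1, size=(180, 180), mode='bilinear', align_corners=False)
            a2 = F.interpolate(a2, size=(180, 180), mode='bilinear', align_corners=False)
            reference = F.interpolate(reference, size=(180, 180), mode='bilinear', align_corners=False)

            predicted = model(a1, a2)                # S610: predicted fused image
            loss = joint_loss(predicted, reference)  # S620: joint loss value

            if loss.item() < loss_threshold:         # S630: stop adjusting once the threshold is met
                return model
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```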
Fig. 9 schematically shows a block diagram of a multi-focus image fusion apparatus based on multi-scale transformation according to an embodiment of the present invention.
As shown in fig. 9, the multi-focus image fusion apparatus 900 based on multi-scale transformation may include an image acquisition module 910, a feature extraction module 920, a preliminary fusion module 930, an inverse transformation fusion module 940, an image reconstruction module 950, and a network training module 960.
an image acquisition module 910 for acquiring two input images A_n (n = 1, 2) of the same imaging scene, captured at different depths;
a feature extraction module 920 for extracting k levels of multi-scale image features Φ_n^d at different depths from each input image with a k-level context feature extraction model;
a preliminary fusion module 930 for preliminarily fusing the k levels of multi-scale image features Φ_n^d in a same-scale fusion manner to obtain preliminary fusion features U_d;
an inverse-transform fusion module 940 for fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous scale to obtain refined fusion features U'_d;
an image reconstruction module 950 for reconstructing the refined fusion features U'_d with an image reconstruction model to obtain a fused image F_recon; and
a network training module 960 for training the multi-focus image fusion network model using the input images A_n and their fused image F_recon as training data.
It should be noted that the apparatus part of the embodiment of the present disclosure corresponds to the method part of the embodiment of the present disclosure, and the description of the multi-focus image fusion apparatus part based on multi-scale transformation specifically refers to the multi-focus image fusion method part based on multi-scale transformation, and is not described herein again.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the image acquisition module 910, the feature extraction module 920, the preliminary fusion module 930, the inverse transform fusion module 940, the image reconstruction module 950, and the network training module 960 may be combined into one module/unit/sub-unit to be implemented, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the image acquisition module 910, the feature extraction module 920, the preliminary fusion module 930, the inverse transform fusion module 940, the image reconstruction module 950, and the network training module 960 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or by a suitable combination of any of them. Alternatively, at least one of the image acquisition module 910, the feature extraction module 920, the preliminary fusion module 930, the inverse transform fusion module 940, the image reconstruction module 950 and the network training module 960 may be at least partially implemented as a computer program module, which when executed, may perform corresponding functions.
FIG. 10 schematically shows a block diagram of an electronic device according to an embodiment of the invention. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 includes a processor 1010, a computer-readable storage medium 1020. The electronic device 1000 may perform a multi-focus image fusion method based on multi-scale transformation according to an embodiment of the present disclosure.
In particular, processor 1010 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 1010 may also include on-board memory for caching purposes. Processor 1010 may be a single processing unit or multiple processing units for performing different acts of a method flow according to embodiments of the disclosure.
The computer-readable storage medium 1020 may be, for example, a non-volatile computer-readable storage medium; specific examples include, but are not limited to: magnetic storage devices, such as magnetic tape or hard disk drives (HDDs); optical storage devices, such as compact discs (CD-ROMs); memories, such as random access memory (RAM) or flash memory; and so on.
The computer-readable storage medium 1020 may comprise a computer program 1021, which computer program 1021 may comprise code/computer-executable instructions that, when executed by the processor 1010, cause the processor 1010 to perform a method according to an embodiment of the disclosure, or any variant thereof.
The computer program 1021 may contain computer program code, for example comprising computer program modules. For example, in an example embodiment, the code in the computer program 1021 may include one or more program modules, such as module 1021A, module 1021B, and so on. It should be noted that the division and number of the modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, and when these program modules are executed by the processor 1010, the processor 1010 may perform the method according to the embodiments of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the image acquisition module 910, the feature extraction module 920, the preliminary fusion module 930, the inverse transform fusion module 940, the image reconstruction module 950 and the network training module 960 may be implemented as a computer program module as described with reference to fig. 10, which, when executed by the processor 1010, may implement the respective operations described above.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a multi-focus image fusion method based on multi-scale transformation according to an embodiment of the present disclosure.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined without departing from the spirit or teaching of the present disclosure, and all such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (20)

1. A multi-focus image fusion method based on multi-scale transformation, comprising the following steps:
acquiring two input images A_n (n = 1, 2) with different focus depths under the same imaging scene;
extracting image features φ_n^d (d = 1, ..., k) of k scales at different depths from the input images, respectively, using a k-level context feature extraction model;
performing preliminary fusion on the image features φ_n^d of each scale using a same-level scale fusion mode to obtain a preliminary fusion feature U_d;
fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous level to obtain a refined fusion feature U'_d;
reconstructing the refined fusion feature U'_d using an image reconstruction model to obtain a fused image F_recon; and
training a multi-focus image fusion network model using the input images A_n and the fused image F_recon of the input images as training data.
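For orientation only, the following self-contained PyTorch-style sketch traces the flow recited in claim 1 (k-scale feature extraction, same-level preliminary fusion, coarse-to-fine refinement, reconstruction). All class and variable names, layer sizes, and the reduction of the context feature extraction model to a plain convolution-plus-pooling stub are illustrative assumptions, not part of the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFusionNet(nn.Module):
    def __init__(self, k=3, channels=32):
        super().__init__()
        # one extractor per scale level (stub standing in for the k-level context model)
        self.extractors = nn.ModuleList(
            [nn.Conv2d(1 if d == 0 else channels, channels, 3, padding=1) for d in range(k)]
        )
        self.refine = nn.ModuleList(
            [nn.Conv2d(2 * channels, channels, 3, padding=1) for _ in range(k - 1)]
        )
        self.recon = nn.Conv2d(channels, 1, 3, padding=1)
        self.k = k

    def forward(self, a1, a2):
        feats = []                                   # per-scale features of both inputs
        x1, x2 = a1, a2
        for d in range(self.k):                      # k-scale feature extraction
            x1 = F.relu(self.extractors[d](x1))
            x2 = F.relu(self.extractors[d](x2))
            feats.append((x1, x2))
            x1, x2 = F.avg_pool2d(x1, 2), F.avg_pool2d(x2, 2)

        # same-level (sibling-scale) preliminary fusion: softmax-weighted sum
        prelim = []
        for f1, f2 in feats:
            w = torch.softmax(torch.stack([f1, f2], dim=0), dim=0)
            prelim.append(w[0] * f1 + w[1] * f2)     # U_d

        # coarse-to-fine refinement: upsample U_{d+1} and fuse with U_d
        u = prelim[-1]
        for d in range(self.k - 2, -1, -1):
            up = F.interpolate(u, size=prelim[d].shape[-2:], mode="bilinear",
                               align_corners=False)
            u = F.relu(self.refine[d](torch.cat([prelim[d], up], dim=1)))  # U'_d

        return torch.sigmoid(self.recon(u))          # fused image F_recon


# usage with random grayscale tensors standing in for the two multi-focus images
a1, a2 = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
fused = TinyFusionNet()(a1, a2)
print(fused.shape)  # torch.Size([1, 1, 64, 64])
```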
2. The method of claim 1, wherein the two input images are two multi-focus images to be fused and pre-registered.
3. The method of claim 1, wherein each level of the context feature extraction model is composed of 3 parallel convolution modules with different receptive fields, and obtains the image feature φ_n^d containing context information at each scale, wherein:
the 3 parallel convolution modules with different receptive fields comprise: an original image feature branch, a receptive field expansion branch, and an attention weight branch; and
the receptive field expansion branch expands the receptive field using hole (dilated) convolution to acquire relatively global information of the image, and the attention weight branch applies weights to the receptive field expansion branch using a self-attention mechanism.
4. The method of claim 3, wherein, in the k-level context feature extraction model, the d-th scale feature φ_n^d of the input image A_n takes as input the previous-scale feature φ_n^{d-1} of the input image A_n, with φ_n^0 representing the input image itself;
the previous-scale feature φ_n^{d-1} is converted into a transition image feature ψ_n^d by means of a self-attention mechanism; and
a Sigmoid function is adopted to assign a weight to each pixel point of the transition image feature ψ_n^d.
5. The method of claim 4, wherein converting the previous-scale feature φ_n^{d-1} into the transition image feature ψ_n^d by means of the self-attention mechanism comprises:
performing a convolution operation on φ_n^{d-1} using a filter of size 1 that outputs 32 channels, to obtain the transition image feature ψ_n^d; and
adopting the Sigmoid function to assign a weight to each pixel point of the transition image feature ψ_n^d, calculated according to the following formula:
w_n^d(i, j) = Sigmoid(ψ_n^d(i, j)),  1 ≤ i ≤ H, 1 ≤ j ≤ W
wherein (i, j) denote the row coordinate and the column coordinate, respectively; H and W denote the pixel width and height of the image feature, respectively; and w_n^d(i, j) denotes the weight assigned to each pixel point of the transition image feature ψ_n^d.
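As an illustration of claims 4 and 5, the sketch below shows one possible attention-weight branch: a convolution with a filter of size 1 producing 32 channels yields the transition feature, and a Sigmoid assigns each pixel a weight. The class name AttentionWeightBranch and the in_channels value are assumptions.

```python
import torch
import torch.nn as nn


class AttentionWeightBranch(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # "filter of size 1, outputting 32 channels"
        self.transition = nn.Conv2d(in_channels, 32, kernel_size=1)

    def forward(self, prev_scale_feature):
        transition = self.transition(prev_scale_feature)  # transition image feature
        weights = torch.sigmoid(transition)               # per-pixel weights in (0, 1)
        return weights


branch = AttentionWeightBranch(in_channels=32)
w = branch(torch.rand(1, 32, 64, 64))
print(w.shape, float(w.min()), float(w.max()))  # weights lie in (0, 1)
```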
6. The method of claim 3, wherein the receptive field expansion branch comprises two partial image sub-features obtained from two successive hole convolutions, wherein:
the two partial image sub-features are calculated according to the following formulas, respectively:
φ_{n,1}^d = Θ(D_1(φ_n^{d-1}; θ_{D1}^d)) ⊙ w_n^d
φ_{n,2}^d = Θ(D_2(φ_{n,1}^d; θ_{D2}^d)) ⊙ w_n^d
wherein φ_{n,1}^d and φ_{n,2}^d denote the first partial image sub-feature and the second partial image sub-feature, respectively; D_1(·) and D_2(·) denote the first hole convolution operation and the second hole convolution operation, respectively; θ_{D1}^d and θ_{D2}^d denote the parameter sets of the filters corresponding to the first and second hole convolution operations, respectively; Θ denotes the ReLU activation function; ⊙ denotes multiplication at the pixel level; and w_n^d denotes the weights assigned to the corresponding pixel points of the converted features.
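The sketch below gives one possible reading of claim 6: two successive hole (dilated) convolutions produce the two partial image sub-features, each passed through ReLU and weighted pixel-wise by the attention weights. Feeding the first sub-feature into the second convolution, reusing the same weight map, and the dilation rate of 2 are assumptions not fixed by the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReceptiveFieldExpansionBranch(nn.Module):
    def __init__(self, channels=32, dilation=2):
        super().__init__()
        pad = dilation  # keeps the spatial size for a 3x3 kernel
        self.hole_conv1 = nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation)
        self.hole_conv2 = nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation)

    def forward(self, feature, weights):
        sub1 = F.relu(self.hole_conv1(feature)) * weights  # first partial sub-feature
        sub2 = F.relu(self.hole_conv2(sub1)) * weights     # second partial sub-feature
        return sub1, sub2


branch = ReceptiveFieldExpansionBranch()
x = torch.rand(1, 32, 64, 64)
w = torch.rand(1, 32, 64, 64)  # weights from the attention branch
s1, s2 = branch(x, w)
print(s1.shape, s2.shape)
```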
7. The method of claim 6, wherein the d-th scale feature φ_n^d of the input image A_n is calculated according to the following formula:
φ_n^d = Pooling(conv_1(Cat(φ_{n,1}^d, φ_{n,2}^d); θ_1^d))
wherein conv_1(·; θ_1^d) denotes a first convolution operation, with θ_1^d denoting the parameter set of the corresponding filter; Cat(·) denotes the cascade operation; and Pooling(·) denotes a pooling operation, with the pooling step size set to 2.
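A minimal sketch of claim 7 under stated assumptions: the branch outputs are cascaded, passed through the first convolution, and pooled with step size 2. Exactly which branch outputs enter the cascade is not recoverable from the filing; concatenating only the two partial sub-features here is an illustrative choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleFeatureHead(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # first convolution operation conv_1(.; theta_1^d)
        self.conv1 = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, sub1, sub2):
        cascaded = torch.cat([sub1, sub2], dim=1)   # Cat(.)
        feat = self.conv1(cascaded)
        return F.avg_pool2d(feat, kernel_size=2)    # pooling step size set to 2


head = ScaleFeatureHead()
s1, s2 = torch.rand(1, 32, 64, 64), torch.rand(1, 32, 64, 64)
phi_d = head(s1, s2)
print(phi_d.shape)  # torch.Size([1, 32, 32, 32]), halved spatial resolution
```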
8. The method of claim 1, wherein performing preliminary fusion on the image features φ_n^d of each of the k scales using the same-level scale fusion mode to obtain the preliminary fusion feature U_d comprises:
cascading the image features φ_1^d and φ_2^d of the same scale from the two input images, and then obtaining weight value maps through a softmax layer; and
multiplying the d-th scale features φ_n^d by the corresponding weight value maps of the d-th scale and adding the products to obtain the preliminary fusion feature U_d at the d-th scale.
9. The method of claim 8, wherein the preliminary fusion feature U_d is calculated according to the following formulas:
(map_1, map_2) = softmax(Cat(φ_1^d, φ_2^d))
U_d = map_1 * φ_1^d + map_2 * φ_2^d
wherein map_n (n = 1, 2) denotes the weight map obtained through the softmax layer operation; Cat(·) denotes the cascade operation; * denotes multiplication at the pixel level; and + denotes addition at the pixel level.
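The following sketch illustrates the same-level scale fusion of claims 8 and 9: the two same-scale features are combined, a softmax yields one weight map per input, and the weighted features are added pixel-wise to give U_d. Applying the softmax across the two inputs (rather than across channels) is an assumption.

```python
import torch


def sibling_scale_fuse(f1, f2):
    # softmax over the "which input" axis yields map_1, map_2 with map_1 + map_2 = 1
    stacked = torch.stack([f1, f2], dim=0)
    maps = torch.softmax(stacked, dim=0)
    u_d = maps[0] * f1 + maps[1] * f2      # pixel-level multiply and add
    return u_d


f1 = torch.rand(1, 32, 64, 64)
f2 = torch.rand(1, 32, 64, 64)
u = sibling_scale_fuse(f1, f2)
print(u.shape)
```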
10. The method of claim 1, wherein fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous level to obtain the refined fusion feature U'_d comprises:
adopting a level-by-level reverse scale transfer mode, in which the preliminary fusion feature of the previous-level scale, obtained after inverse transformation with respect to the d-th scale preliminary fusion feature U_d, is the preliminary fusion feature U_{d+1}; and
upsampling the previous-level preliminary fusion feature U_{d+1} and then fusing it with the preliminary fusion feature U_d to obtain the refined fusion feature U'_d.
11. The method of claim 10, wherein the refined fusion feature U'_d is calculated according to the following formula:
U'_d = Θ(conv_2(Cat(U_d, Upsample(U_{d+1})); θ_2))
wherein conv_2(·; θ_2) denotes the second convolution operation, with θ_2 denoting the parameter set of the corresponding filter; Θ denotes the ReLU activation function; Cat(·) denotes the cascade operation; and Upsample(·) denotes the upsampling operation.
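A sketch of the refinement step of claims 10 and 11, under illustrative layer sizes: the coarser preliminary feature U_{d+1} is upsampled to the resolution of U_d, the two are cascaded, and a convolution with ReLU produces the refined feature U'_d.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefineFusion(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # second convolution operation conv_2(.; theta_2)
        self.conv2 = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, u_d, u_coarser):
        up = F.interpolate(u_coarser, size=u_d.shape[-2:], mode="bilinear",
                           align_corners=False)           # upsample U_{d+1}
        refined = F.relu(self.conv2(torch.cat([u_d, up], dim=1)))
        return refined                                     # U'_d


refine = RefineFusion()
u_d = torch.rand(1, 32, 64, 64)
u_next = torch.rand(1, 32, 32, 32)
print(refine(u_d, u_next).shape)  # torch.Size([1, 32, 64, 64])
```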
12. The method of claim 1, wherein reconstructing the refined fusion feature U'_d using the image reconstruction model to obtain the fused image F_recon comprises:
calculating the fused image F_recon according to the following formula:
F_recon = Θ(conv_3(U'_d; θ_recon))
wherein conv_3(·; θ_recon) denotes a third convolution operation, θ_recon denotes the parameter set of the filter corresponding to the third convolution operation, and Θ denotes the ReLU activation function.
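A minimal sketch of the reconstruction of claim 12: a third convolution followed by ReLU maps the refined fusion feature to the fused image. Using the finest-scale feature and a single output channel here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

recon_conv = nn.Conv2d(32, 1, kernel_size=3, padding=1)  # conv_3(.; theta_recon)
u_refined = torch.rand(1, 32, 64, 64)                    # refined fusion feature U'_d
f_recon = F.relu(recon_conv(u_refined))                  # fused image F_recon
print(f_recon.shape)
```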
13. The method of claim 1, wherein the two input images are constructed by:
inputting an image training set containing a plurality of source images, setting a first mask and a second mask with complementary scenes, and acquiring a target region in the source images through the first mask;
performing dot multiplication operation on each source image by using a first mask and a second mask respectively to obtain a target image and a background image;
continuously and repeatedly blurring the target image and the background image through a blurring filter to obtain a plurality of groups of target blurred images and background blurred images with different blurring degrees;
and respectively adding the target blurred image and the background blurred image which have the same degree of blurring in each group to obtain a plurality of groups of artificially synthesized multi-focus images.
14. The method of claim 13, wherein the blur filter is a gaussian filter with a sliding window size of 7 x 7 and a standard deviation of 2.
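The sketch below illustrates one possible construction of training pairs per claims 13 and 14: complementary masks split the source image into target and background, each part is blurred repeatedly with a 7 x 7 Gaussian filter of standard deviation 2, and parts with matching blur levels are recombined. The use of OpenCV, the half-and-half mask, and the exact way the blurred parts are recombined into the two differently focused inputs are assumptions for illustration only.

```python
import numpy as np
import cv2


def synthesize_multi_focus_pairs(source, mask, num_levels=3):
    mask = mask.astype(np.float32)
    inv_mask = 1.0 - mask                  # second, scene-complementary mask
    target = source * mask                 # dot multiplication with the first mask
    background = source * inv_mask

    pairs = []
    blurred_t, blurred_b = target.copy(), background.copy()
    for _ in range(num_levels):
        # repeatedly blur with a 7x7 Gaussian filter, standard deviation 2
        blurred_t = cv2.GaussianBlur(blurred_t, (7, 7), 2)
        blurred_b = cv2.GaussianBlur(blurred_b, (7, 7), 2)
        img_focus_on_target = target + blurred_b          # background out of focus
        img_focus_on_background = background + blurred_t  # target out of focus
        pairs.append((img_focus_on_target, img_focus_on_background))
    return pairs


source = np.random.rand(128, 128).astype(np.float32)
mask = np.zeros((128, 128), dtype=np.float32)
mask[:, :64] = 1.0                         # illustrative complementary split
pairs = synthesize_multi_focus_pairs(source, mask)
print(len(pairs), pairs[0][0].shape)
```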
15. The method of claim 1, wherein the method further comprises:
determining a predictive fused image of one of the plurality of input images using a multi-focus image fusion network model;
calculating, using a joint loss function, a loss value between the predicted fused image determined using the multi-focus image fusion network model and the fused image F_recon of the input image;
and judging whether the loss value meets a preset loss threshold value, if not, adjusting parameters of the multi-focus image fusion network model according to the loss value, and returning to the step of determining a fusion output image by using the multi-focus image fusion network model for another input image in the plurality of input images.
16. The method of claim 15, wherein the joint loss function is jointly constructed from a structural similarity loss function and a mean square error loss function, wherein:
the mean square error loss function L_MSE and the structural similarity loss function L_SSIM are respectively calculated according to the following formulas:
L_MSE = (1 / (H·W)) · Σ_{i=1..H} Σ_{j=1..W} ||G(i, j) − P(i, j)||_2^2
SSIM(G, P) = [(2·μ_G·μ_P + C_1)(2·σ_GP + C_2)] / [(μ_G^2 + μ_P^2 + C_1)(σ_G^2 + σ_P^2 + C_2)]
L_SSIM = 1 − SSIM(G, P)
wherein H and W denote the pixel height and width of the image, respectively; (i, j) denote the row and column coordinates in the image; G(i, j) and P(i, j) denote the color values of the true value image and the predicted fused image at the corresponding pixel coordinates, respectively; ||·||_2 denotes the two-norm operation;
μ_G and μ_P denote the color means of the fused image F_recon and the predicted fused image, respectively; C_1 and C_2 denote two constants used to prevent division by zero, set to 0.01 and 0.03 in training, respectively;
σ_G^2 and σ_P^2 denote the color variances of the fused image F_recon and the predicted fused image, respectively; and σ_GP denotes the covariance between the fused image F_recon and the predicted fused image.
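For illustration, the sketch below computes one possible form of the joint loss of claims 15 and 16: a mean square error term plus L_SSIM = 1 − SSIM(G, P). Computing SSIM from whole-image statistics and summing the two terms with equal weight are simplifying assumptions; C_1 = 0.01 and C_2 = 0.03 follow the claim.

```python
import torch


def joint_loss(ground_truth, prediction, c1=0.01, c2=0.03):
    # mean square error over all pixels
    l_mse = torch.mean((ground_truth - prediction) ** 2)

    # global SSIM from image-level means, variances and covariance
    mu_g, mu_p = ground_truth.mean(), prediction.mean()
    var_g, var_p = ground_truth.var(), prediction.var()
    cov_gp = ((ground_truth - mu_g) * (prediction - mu_p)).mean()
    ssim = ((2 * mu_g * mu_p + c1) * (2 * cov_gp + c2)) / \
           ((mu_g ** 2 + mu_p ** 2 + c1) * (var_g + var_p + c2))
    l_ssim = 1 - ssim

    return l_mse + l_ssim


g = torch.rand(1, 1, 64, 64)   # fused image used as ground truth
p = torch.rand(1, 1, 64, 64)   # predicted fused image
print(float(joint_loss(g, p)))
```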
17. The method of claim 1, wherein, before the step of training the multi-focus image fusion network model using the input images A_n and the fused image F_recon of the input images as training data, the method further comprises:
scaling the input images A_n to a preset size.
18. A multi-focus image fusion device based on multi-scale transformation, comprising:
an image acquisition module for acquiring two input images A_n (n = 1, 2) with different focus depths under the same imaging scene;
a feature extraction module for extracting image features φ_n^d (d = 1, ..., k) of k scales at different depths from the input images, respectively, using a k-level context feature extraction model;
a preliminary fusion module for performing preliminary fusion on the image features φ_n^d of each scale using a same-level scale fusion mode to obtain a preliminary fusion feature U_d;
an inverse transform fusion module for fusing the preliminary fusion feature U_d of each scale with the inverse-transformed preliminary fusion feature U_{d+1} of the previous level to obtain a refined fusion feature U'_d;
an image reconstruction module for reconstructing the refined fusion feature U'_d using an image reconstruction model to obtain a fused image F_recon; and
a network training module for training a multi-focus image fusion network model using the input images A_n and the fused image F_recon of the input images as training data.
19. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-17.
20. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 17 when executed.
CN202110581448.7A 2021-05-26 2021-05-26 Multi-focus image fusion method and device based on multi-scale transformation Pending CN113159236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110581448.7A CN113159236A (en) 2021-05-26 2021-05-26 Multi-focus image fusion method and device based on multi-scale transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110581448.7A CN113159236A (en) 2021-05-26 2021-05-26 Multi-focus image fusion method and device based on multi-scale transformation

Publications (1)

Publication Number Publication Date
CN113159236A true CN113159236A (en) 2021-07-23

Family

ID=76877704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110581448.7A Pending CN113159236A (en) 2021-05-26 2021-05-26 Multi-focus image fusion method and device based on multi-scale transformation

Country Status (1)

Country Link
CN (1) CN113159236A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705675A (en) * 2021-08-27 2021-11-26 合肥工业大学 Multi-focus image fusion method based on multi-scale feature interaction network
CN113705675B (en) * 2021-08-27 2022-10-04 合肥工业大学 Multi-focus image fusion method based on multi-scale feature interaction network
CN113763300A (en) * 2021-09-08 2021-12-07 湖北工业大学 Multi-focus image fusion method combining depth context and convolution condition random field
CN113762484A (en) * 2021-09-22 2021-12-07 辽宁师范大学 Multi-focus image fusion method for deep distillation
WO2024027146A1 (en) * 2022-08-01 2024-02-08 五邑大学 Array-type facial beauty prediction method, and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination