CN111861880B - Image super-fusion method based on regional information enhancement and block self-attention - Google Patents

Image super-fusion method based on regional information enhancement and block self-attention

Info

Publication number
CN111861880B
CN111861880B
Authority
CN
China
Prior art keywords
super
fusion
resolution
block
source image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010506835.XA
Other languages
Chinese (zh)
Other versions
CN111861880A (en)
Inventor
李华锋
岑悦亮
余正涛
张亚飞
原铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202010506835.XA priority Critical patent/CN111861880B/en
Publication of CN111861880A publication Critical patent/CN111861880A/en
Application granted granted Critical
Publication of CN111861880B publication Critical patent/CN111861880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The invention relates to an image super-fusion method based on regional information enhancement and block self-attention, and belongs to the technical field of digital image processing. The method comprises a source image super-resolution branch and a fusion and super-resolution branch. In the source image super-resolution branch, feature extraction blocks are applied iteratively to extract source image feature maps, and dense connections are used to make full use of the feature map information before and after each block. The output of each feature extraction block also passes through a region information enhancement block that explores the region of each object in the source image, and this information assists the fusion and super-resolution branch in accurately predicting the fusion decision map. In the fusion and super-resolution branch, the two source images are concatenated as input, and fusion blocks based on a block self-attention mechanism are applied iteratively, combined with the region-enhanced source image information from the source image super-resolution branch, so as to better distinguish focused and unfocused regions. Finally, each branch applies sub-pixel convolution to generate the super-resolution source images and the fused image.

Description

Image super-fusion method based on regional information enhancement and block self-attention
Technical Field
The invention relates to an image super-fusion method based on regional information enhancement and block self-attention, and belongs to the technical field of image information processing.
Background
The purpose of image fusion is to fuse the information of two or more source images captured by different cameras in the same scene into one image while ensuring that the information of each source image is preserved. Image fusion is widely applied in fields such as security surveillance imaging, medical imaging and satellite remote sensing imaging. In recent years, many studies have achieved good fusion results, but existing methods are usually designed for and evaluated on high-resolution multi-focus source image datasets, whereas the images obtained by real-world imaging systems are not necessarily of high resolution. When low-resolution source images are fused, the fused image is also of low resolution, and may even be blurred and lack detail information, which reduces the utility of image fusion techniques. To feed low-resolution source images into traditional fusion methods, bicubic interpolation or nearest-neighbor interpolation is generally adopted as the up-sampling operation to unify the resolution of the source images. However, these interpolation methods are too simple, are not adapted to different data, and introduce erroneous information that reduces the accuracy of image texture details, resulting in poor fusion results; in addition, for the multi-focus image fusion task, the accuracy of the fusion decision map is also reduced. Therefore, to address these shortcomings and make the low-resolution image fusion task more effective, a method capable of accurately super-resolving and fusing images is urgently needed.
In recent years, many image fusion methods based on deep learning have been proposed; they have a greater ability to extract texture and detail than fusion methods based on the transform domain or the spatial domain. One class of methods uses an encoder-decoder network, extracting the features of the source images with an encoder, fusing the features with a decoding network, and gradually enlarging them to obtain the fused image. Another class of methods uses a pre-trained classification convolutional network and feeds image blocks into the network to predict whether each block is in focus, thereby generating a fusion decision map. Another class of methods decomposes a source image into a base layer containing large-scale contours or intensity variations and a detail layer containing important textures, and fuses the two layers separately. Still other methods are based on generative adversarial networks, producing the fused image with the generator while the discriminator is used only to distinguish the difference between the fused image and the visible image, so that more texture is extracted from the visible image. Although innovative and successful, these methods still suffer from two major drawbacks: 1) when the resolution of the source images is low, the resolution of the fused image is also low and texture details are lacking; 2) the regional extent of the salient features in the image cannot be accurately estimated, so the salient features of the source images contained in the fusion result are not complete enough.
To overcome the two drawbacks above, some works combine super-resolution with the image fusion task. Dictionary-learning-based methods learn a set of multi-scale dictionaries from high-resolution images and then fuse low-resolution image blocks using sparse coefficients based on local information content, but these methods must store the dictionaries mapping from low-resolution to high-resolution images, which consumes memory. Some methods fuse images via compressed sensing; however, these methods require two separate steps, i.e. they decompose the task into super-resolution and fusion of the images, which is very time consuming. Some methods use structure tensors, fractional-order differentiation and variational techniques to integrate image fusion and super-resolution into one step, but they can only perform super-resolution at integer scale factors, are not flexible or practical enough, and their fusion results are not good enough.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an image super-resolution and fusion method based on region information enhancement and block self-attention, so as to solve the image fusion problem when the resolution of the source images is low and to improve the quality of the fusion result.
The technical scheme adopted by the invention is as follows: an image super-resolution and fusion method based on region information enhancement and block self-attention, taking low-resolution multi-focus image fusion as an example, with the flow chart shown in fig. 1; the method specifically comprises the following steps:
step1, in the task of super-resolving and fusing multi-focus images, as shown in FIG. 1, the two low-resolution source images I_A^LR and I_B^LR are each input into the source image super-resolution branch; at the same time, I_A^LR and I_B^LR are concatenated along the channel dimension and input into the fusion and super-resolution branch. A 3 × 3 convolutional layer is arranged at the beginning of both the source image super-resolution branch and the fusion and super-resolution branch for preliminarily extracting features. After that, the source image super-resolution branch contains 17 feature extraction blocks and 17 region information enhancement blocks, and the fusion and super-resolution branch contains 17 fusion blocks based on a block self-attention mechanism. The 17 feature extraction blocks, the 17 region information enhancement blocks and the 17 fusion blocks based on the block self-attention mechanism correspond to one another one by one, and i (0 ≤ i ≤ 16) denotes the i-th feature extraction block / region information enhancement block / fusion block based on the block self-attention mechanism.
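The two-branch layout described in Step1 to Step3 can be organized as in the following PyTorch-style sketch. It is only an illustration under assumptions: class names (TwoBranchNet, FeatureExtractionBlock, etc.), channel counts and the single-channel inputs are invented here, the per-stage blocks are simplified stand-ins (their detailed structure is sketched after the corresponding steps below), and the dense connections of Step2 are omitted for brevity.

```python
import torch
import torch.nn as nn

class FeatureExtractionBlock(nn.Module):
    """Simplified stand-in: three 3x3 convolutions with a residual connection (fig. 2(a))."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class RegionInfoEnhancement(nn.Module):
    """Placeholder: the real block uses learned offsets and a deformable convolution (see Step2)."""
    def __init__(self, c):
        super().__init__()
        self.conv = nn.Conv2d(2 * c, c, 3, padding=1)
    def forward(self, fa, fb):
        return self.conv(torch.cat([fa, fb], dim=1))

class BlockAttentionFusion(nn.Module):
    """Placeholder: the real block uses 7x7 block self-attention (see Step3)."""
    def __init__(self, c):
        super().__init__()
        self.merge = nn.Conv2d(2 * c, c, 1)
    def forward(self, ff, region_info):
        return self.merge(torch.cat([ff, region_info], dim=1))

class TwoBranchNet(nn.Module):
    """Shared source-image SR branch and fusion-and-SR branch, each opened by a 3x3 conv
    and followed by 17 mutually corresponding blocks (dense connections omitted here)."""
    def __init__(self, c=64, num_blocks=17):
        super().__init__()
        self.head_sr = nn.Conv2d(1, c, 3, padding=1)     # assumes single-channel source images
        self.head_fuse = nn.Conv2d(2, c, 3, padding=1)   # the two sources stacked along channels
        self.feat = nn.ModuleList(FeatureExtractionBlock(c) for _ in range(num_blocks))
        self.region = nn.ModuleList(RegionInfoEnhancement(c) for _ in range(num_blocks))
        self.fuse = nn.ModuleList(BlockAttentionFusion(c) for _ in range(num_blocks))

    def forward(self, img_a, img_b):
        fa, fb = self.head_sr(img_a), self.head_sr(img_b)        # both sources share one SR branch
        ff = self.head_fuse(torch.cat([img_a, img_b], dim=1))    # channel-wise concatenation
        for feat_blk, region_blk, fuse_blk in zip(self.feat, self.region, self.fuse):
            fa, fb = feat_blk(fa), feat_blk(fb)
            ff = fuse_blk(ff, region_blk(fa, fb))                # region info guides the fusion block
        return fa, fb, ff                                        # fed to the up-sampling tails of Step4
```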
Step2, in the source image super-resolution branch, the initial feature map passes through 17 feature extraction blocks, and the 17 feature extraction blocks are connected in a dense connection mode. Output of the i-1 th feature extraction Block
Figure GDA0003687926730000024
Besides being continuously input into the ith feature extraction block to construct a super-resolution source image, the super-resolution source image is also input into the ith region information enhancement block to assist in fusion and super-resolution branch acquisition decision weight map. The region information enhancement block will enhance the information of the salient feature region, especially the feature information of the focus region. The information output by the region information enhancement block is input into the ith fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
step3, in the fusion and super-resolution branch, the initial feature map passes through the 17 fusion blocks based on the block self-attention mechanism, which fully extract the features and adaptively fuse the information;
step4, after the 17 feature extraction blocks in the source image super-resolution branch, and after the 17 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, there is a 1 × 1 convolutional layer followed by a sub-pixel convolution layer. The 1 × 1 convolution reduces the number of channels of f_A^16 and f_B^16 (the outputs of the 17th feature extraction block of the source image super-resolution branch for I_A^LR and I_B^LR) and of f_F^16 (the output of the 17th fusion block based on the block self-attention mechanism) to the square of the magnification factor r. The sub-pixel convolution then up-samples the output of the 1 × 1 convolutional layer to the target size H × W, where H and W respectively denote the height and width of the target size. After the sub-pixel convolution, the source image super-resolution branch yields the super-resolution results I_A^SR and I_B^SR of I_A^LR and I_B^LR. In the fusion and super-resolution branch, normalization is carried out through a Sigmoid function, and the decision weight map W_SR for multi-focus image fusion is obtained through threshold division; finally, the super-resolution fusion result image I_F^SR is obtained by combining with the source images.
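A minimal sketch of the tail described in Step4, assuming PyTorch: a 1 × 1 convolution reduces the channels to r² and nn.PixelShuffle performs the sub-pixel rearrangement to the target size; the module and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class UpsampleTail(nn.Module):
    """1x1 convolution to r*r channels followed by sub-pixel (pixel-shuffle) up-sampling."""
    def __init__(self, channels, scale):
        super().__init__()
        self.reduce = nn.Conv2d(channels, scale * scale, kernel_size=1)  # to r^2 channels
        self.shuffle = nn.PixelShuffle(scale)                            # rearranges to r times the spatial size

    def forward(self, x):
        return self.shuffle(self.reduce(x))   # single-channel map at the target size

# Example: a 64-channel 60x60 feature map up-sampled by r = 2 to a 1x120x120 output.
tail = UpsampleTail(channels=64, scale=2)
out = tail(torch.randn(1, 64, 60, 60))        # -> torch.Size([1, 1, 120, 120])
```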
Step5, obtaining the network parameter through Step4 in the network parameter training process
Figure GDA0003687926730000038
Super-resolution results of
Figure GDA0003687926730000039
Figure GDA00036879267300000310
And a decision weight graph W SR Super-resolution fusion result image
Figure GDA00036879267300000311
And then, calculating the loss between the label and the label, and minimizing the loss by using an optimizer based on a gradient descent method, thereby optimizing the parameters of the network, finishing the network training when the loss gradually decreases to be flat, and obtaining a high-quality super-resolution and fusion result by testing.
Specifically, the dense connection manner proposed in Step2 means that the initial feature map f_0 output by the first convolutional layer in the source image super-resolution branch and the outputs of the previous i-1 feature extraction blocks are taken together as the input of the i-th feature extraction block. Finally, f_0 and the outputs of all the blocks are concatenated, and dimension reduction and information integration are performed through a 1 × 1 convolution. The structure of the feature extraction block is shown in fig. 2(a); it consists of three 3 × 3 convolutional layers and uses residual learning to alleviate the degradation problem caused by deep networks;
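As a hedged illustration of the dense connection described above, the sketch below (PyTorch, with assumed names) keeps f_0 and every block output, concatenates them, and applies 1 × 1 convolutions for dimension reduction; the exact reduction scheme per block is an assumption, while the residual feature extraction block follows the three-convolution structure of fig. 2(a).

```python
import torch
import torch.nn as nn

class FeatureExtractionBlock(nn.Module):
    """Three 3x3 convolutional layers with a residual connection (fig. 2(a))."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)   # residual learning eases deep-network degradation

class DenselyConnectedBranch(nn.Module):
    """f_0 and the outputs of all previous blocks feed the i-th block; at the end every
    collected feature map is concatenated and reduced by a 1x1 convolution. Reducing the
    concatenation before each block with its own 1x1 conv is an assumption of this sketch."""
    def __init__(self, c=64, num_blocks=17):
        super().__init__()
        self.blocks = nn.ModuleList(FeatureExtractionBlock(c) for _ in range(num_blocks))
        self.reducers = nn.ModuleList(nn.Conv2d(c * (i + 1), c, 1) for i in range(num_blocks))
        self.final_reduce = nn.Conv2d(c * (num_blocks + 1), c, 1)

    def forward(self, f0):
        collected = [f0]
        for block, reduce in zip(self.blocks, self.reducers):
            x = reduce(torch.cat(collected, dim=1))   # dense input: f_0 plus previous outputs
            collected.append(block(x))
        return self.final_reduce(torch.cat(collected, dim=1))
```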
specifically, the region information enhancement block proposed in Step2 is shown in fig. 2 (c). Firstly, a layer of convolution layer acts on an input characteristic diagram, and the dimension of an output characteristic diagram is 2 times that of the input characteristic diagram; the output characteristic diagram is sliced according to the channel to obtain two characteristic diagrams with the same dimension, and the two characteristic diagrams are the offset of the input characteristic diagram in the horizontal direction and the vertical direction; namely, the convolutional layer learns the offset of each position of the input feature map in the horizontal and vertical directions, and the horizontal and vertical offsets and the input feature map are input into the deformable convolution, so that the feature map closer to the shape and the size of the object is obtained. Definition of
Figure GDA00036879267300000312
Are respectively as
Figure GDA00036879267300000313
The amount of deviation in the horizontal and vertical directions of (1),
Figure GDA00036879267300000314
are respectively as
Figure GDA00036879267300000315
Is offset in the horizontal and vertical directions, wherein
Figure GDA00036879267300000316
Are respectively
Figure GDA00036879267300000317
The output of the ith feature extraction block of the super-resolution branch of the source image,
Figure GDA0003687926730000041
And outputting the ith characteristic extraction block of the source image super-resolution branch. Therefore, the feature map of the salient object region information input to the super-resolution and fusion branch i-th time
Figure GDA0003687926730000042
The calculation method is as follows:
Figure GDA0003687926730000043
Figure GDA0003687926730000044
Figure GDA0003687926730000045
Figure GDA0003687926730000046
where split (-) is the channel slicing operation, DConv (-) represents the deformable convolution, Conv (-) represents the convolution layer with a convolution kernel size k of 3, and LeakyRelu (-) is a commonly used nonlinear activation function with a slope s set to 0.2.
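The following sketch illustrates one possible realization of the region information enhancement block with torchvision's deformable convolution. It is an assumption-laden illustration: torchvision.ops.DeformConv2d expects one (vertical, horizontal) offset pair per kernel sample, so the channel-doubled offset map predicted here (as in the text above) is collapsed to a single pair per position and repeated across the 3 × 3 kernel samples; names and channel sizes are invented.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class RegionInfoEnhancementBlock(nn.Module):
    """Predicts horizontal/vertical offsets from the input feature map (channel dim doubled,
    then split), and applies a deformable convolution guided by those offsets."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        # Channel dimension doubled, then split into horizontal / vertical offset maps.
        self.offset_conv = nn.Conv2d(channels, 2 * channels, kernel_size, padding=kernel_size // 2)
        self.act = nn.LeakyReLU(0.2)
        self.dconv = DeformConv2d(channels, channels, kernel_size, padding=kernel_size // 2)

    def forward(self, feat):
        off = self.act(self.offset_conv(feat))                    # (B, 2C, H, W)
        off_h, off_v = torch.chunk(off, 2, dim=1)                 # channel slicing, each (B, C, H, W)
        # Adaptation for torchvision (assumption): collapse to one (vertical, horizontal) pair per
        # position and repeat it for every kernel sample, since DeformConv2d expects (B, 2*k*k, H, W).
        pair = torch.stack([off_v.mean(dim=1), off_h.mean(dim=1)], dim=1)   # (B, 2, H, W)
        offset_full = pair.repeat(1, self.k * self.k, 1, 1)                 # (B, 2*k*k, H, W)
        return self.dconv(feat, offset_full)

# e_A^i / e_B^i would be obtained by applying this block to f_A^i / f_B^i respectively.
block = RegionInfoEnhancementBlock(channels=64)
enhanced = block(torch.randn(1, 64, 32, 32))                     # -> torch.Size([1, 64, 32, 32])
```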
Specifically, the block self-attention mechanism proposed in Step3 means that, when the local characteristics of a pixel are considered, attention should be paid to the pixels that have a large influence on it. In the present invention, the feature relationship between each position and its 7 × 7 neighborhood is explored. For a position p in the input feature map, define N(p) as the 7 × 7 neighborhood range with p as the center point; x_{N(p)} are the feature values corresponding to the region N(p); δ(·) fuses the information within the neighborhood range together; Sigmoid(·) is an intra-block normalization function used to calculate the weight of the features at the other positions in the neighborhood with respect to the feature at the center point p. After the block self-attention mechanism, the feature value y_p of position p can be calculated as:

y_p = BatchNormalize( Σ_{q ∈ N(p)} Sigmoid( δ(x_p, x_q) ) · x_q )

where δ(x_p, x_q) = x_p^T x_q, i.e. the correlation between the feature vector x_p of position p and the feature vector x_q of position q is calculated by transposed multiplication, and BatchNormalize(·) is a batch normalization operation.
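A hedged sketch of the 7 × 7 block self-attention, assuming PyTorch: torch.nn.functional.unfold gathers each position's neighborhood, the correlation x_p^T x_q is taken by matrix multiplication, Sigmoid supplies the intra-block weights, and batch normalization is applied at the end. Class and variable names, and the exact placement of the normalization, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockSelfAttention(nn.Module):
    """Each position is re-weighted by the sigmoid-normalized correlation between its feature
    vector and the feature vectors in its 7x7 neighborhood, then batch-normalized."""
    def __init__(self, channels, window=7):
        super().__init__()
        self.window = window
        self.norm = nn.BatchNorm2d(channels)   # the BatchNormalize(.) step

    def forward(self, x):
        b, c, h, w = x.shape
        k, pad = self.window, self.window // 2
        # Gather the 7x7 neighborhood of every position: (b, c*k*k, h*w) -> (b, h*w, k*k, c)
        neigh = F.unfold(x, kernel_size=k, padding=pad)
        neigh = neigh.view(b, c, k * k, h * w).permute(0, 3, 2, 1)
        # Center feature vector x_p for every position: (b, h*w, c, 1)
        center = x.view(b, c, h * w).permute(0, 2, 1).unsqueeze(-1)
        # delta(x_p, x_q) = x_p^T x_q, then sigmoid weights inside each block
        weights = torch.sigmoid(torch.matmul(neigh, center))      # (b, h*w, k*k, 1)
        # y_p = sum over the neighborhood of sigmoid(delta) * x_q
        out = (weights * neigh).sum(dim=2)                        # (b, h*w, c)
        out = out.permute(0, 2, 1).view(b, c, h, w)
        return self.norm(out)

attn = BlockSelfAttention(channels=64)
y = attn(torch.randn(1, 64, 32, 32))                             # -> torch.Size([1, 64, 32, 32])
```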
The fusion block based on the block self-attention mechanism proposed in Step3 means that the fusion feature map output by the preceding stage is concatenated with the feature map of salient focused regions input from the source image super-resolution branch; after information integration through a 1 × 1 convolution and several 3 × 3 convolutional layers, the self-attention mechanism over the block range is used to highlight the extent of the salient objects more accurately.
Specifically, the normalization by the Sigmoid function in Step4 means:

S(m, n) = 1 / (1 + e^(-F(m, n)))

where F denotes the result of the sub-pixel convolution in the fusion and super-resolution branch; this feature map is single-channel and of the target size, and (m, n) denotes a coordinate position. A decision weight map for multi-focus image fusion is then obtained by dividing with a threshold t. The invention sets t to 0.5, and the decision weight map W_SR can be obtained by the following formula:

W_SR(m, n) = 1 if S(m, n) > 0.5, and W_SR(m, n) = 0 otherwise.

Then, the fusion result I_F^SR can be obtained from the decision weight map W_SR:

I_F^SR = W_SR ⊙ I_A^SR + (1 - W_SR) ⊙ I_B^SR

where ⊙ denotes element-wise multiplication.
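The Sigmoid normalization, thresholding and decision-map fusion above can be written compactly; the sketch below assumes PyTorch, that the hard 0/1 map is formed with a simple comparison, and that the super-resolved sources I_A^SR and I_B^SR are the images being combined. Variable names are illustrative.

```python
import torch

def fuse_with_decision_map(fusion_logits, sr_a, sr_b, t=0.5):
    """Sigmoid-normalize the fusion branch output, threshold at t to get W_SR,
    then combine the two super-resolved source images element-wise."""
    prob = torch.sigmoid(fusion_logits)        # S(m, n) in [0, 1]
    w_sr = (prob > t).float()                  # binary decision weight map W_SR
    fused = w_sr * sr_a + (1.0 - w_sr) * sr_b  # I_F^SR
    return fused, w_sr

# Example with a 1x1x120x120 fusion-branch output and two super-resolved sources.
logits = torch.randn(1, 1, 120, 120)
sr_a, sr_b = torch.rand(1, 1, 120, 120), torch.rand(1, 1, 120, 120)
fused, w_sr = fuse_with_decision_map(logits, sr_a, sr_b)
```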
specifically, the loss calculation proposed in Step5 uses the L1 norm with better convex optimization properties to calculate the loss and uses Adam optimizer to minimize the loss value. Definition of
Figure GDA0003687926730000055
Are tag values, respectively
Figure GDA0003687926730000056
Corresponding high resolution image,
Figure GDA0003687926730000057
Corresponding high-resolution image, high-resolution fused image, W SR 、W HR If the decision weight graph and the high-resolution label fusion decision graph are respectively used, the loss is calculated as follows:
Figure GDA0003687926730000058
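One training step could look like the following sketch, assuming PyTorch, an assumed model interface that returns the four outputs, and a soft (sigmoid) decision map during training since the hard threshold is not differentiable; all names are illustrative.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def training_step(model, optimizer, lr_a, lr_b, hr_a, hr_b, w_hr, hr_fused):
    # Assumed interface: the network returns both SR results, the decision-map logits
    # and the fused image for the two low-resolution inputs.
    sr_a, sr_b, w_logits, fused = model(lr_a, lr_b)
    w_sr = torch.sigmoid(w_logits)             # soft map during training (assumption)
    loss = (l1(sr_a, hr_a) + l1(sr_b, hr_b)
            + l1(w_sr, w_hr) + l1(fused, hr_fused))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, as stated above
```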
specifically, Relu is used as the nonlinear activation function after all convolutional layers, except where specifically noted; the convolutional layers are all SAME type convolution, namely the input and output of the convolutional layers are consistent in size, and all source images share one source image super-resolution branch.
The invention has the beneficial effects that: the method comprises a source image super-resolution branch and a fusion and super-resolution branch, where the source image super-resolution branch assists the fusion and super-resolution branch in obtaining an accurate fusion decision map. In the source image super-resolution branch, feature extraction blocks are applied iteratively to extract source image feature maps, and dense connections are used to make full use of the feature map information before and after each block. The output of each feature extraction block passes through a region information enhancement block that explores the extent and region of each object in the source image, and this information is passed to the fusion and super-resolution branch to accurately predict the fusion decision weight map. In the fusion and super-resolution branch, the two source images are concatenated as input, and fusion blocks based on the block self-attention mechanism are applied iteratively, combined with the region-enhanced source image information from the source image super-resolution branch, so that focused and unfocused regions are better distinguished. Finally, sub-pixel convolution is used as the up-sampling layer to generate the super-resolution source images and the fused image.
Drawings
FIG. 1 is an overall architecture diagram of the present invention incorporating an embodiment;
fig. 2 is a structure diagram of each sub-module: (a) is the structure diagram of the feature extraction block in the source image super-resolution branch; (b) is the structure diagram of the fusion block based on the block self-attention mechanism in the fusion and super-resolution branch; (c) is the structure diagram of the region information enhancement block.
Detailed Description
The embodiments are described in detail below; the specific example and flow diagram are shown in FIG. 1. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following examples do not represent all embodiments consistent with the present application; they are merely examples of systems and methods consistent with certain aspects of the application, as recited in the claims.
Example 1: fig. 1 is a schematic diagram of the steps of the image super-resolution and fusion method based on region information enhancement and block self-attention according to the present application, in which the input source images and the output result image of a specific example are also shown. As shown in fig. 1, the present application consists of a source image super-resolution branch and a fusion and super-resolution branch, and provides an image super-resolution and fusion method based on region information enhancement and block self-attention, including:
step1, in the task of super-resolving and fusing multi-focus images, as shown in FIG. 1, the two low-resolution source images I_A^LR and I_B^LR are each input into the source image super-resolution branch; at the same time, I_A^LR and I_B^LR are concatenated along the channel dimension and input into the fusion and super-resolution branch. A 3 × 3 convolutional layer is arranged at the beginning of both the source image super-resolution branch and the fusion and super-resolution branch for preliminarily extracting features. After that, the source image super-resolution branch contains 17 feature extraction blocks and 17 region information enhancement blocks, and the fusion and super-resolution branch contains 17 fusion blocks based on a block self-attention mechanism. The 17 feature extraction blocks, the 17 region information enhancement blocks and the 17 fusion blocks based on the block self-attention mechanism correspond to one another one by one, and i (0 ≤ i ≤ 16) denotes the i-th feature extraction block / region information enhancement block / fusion block based on the block self-attention mechanism.
Step2, in the source image super-resolution branch, the initial feature map passes through 17 feature extraction blocks, and the 17 feature extraction blocks are connected in a dense connection mode. Output of the i-1 th feature extraction Block
Figure GDA0003687926730000064
Besides being continuously input into the ith feature extraction block to construct a super-resolution source image, the super-resolution source image is also input into the ith region information enhancement block to assist in fusion and super-resolution branch acquisition decision weight map. The region information enhancement block will enhance the information of the salient feature region, especially the feature information of the focus region. The information output by the region information enhancement block is input into the ith fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
step3, in the fusion and super-resolution branch, the initial feature map passes through the 17 fusion blocks based on the block self-attention mechanism, which fully extract the features and adaptively fuse the information;
step4, after the 17 feature extraction blocks in the source image super-resolution branch, and after the 17 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, there is a 1 × 1 convolutional layer followed by a sub-pixel convolution layer. The 1 × 1 convolution reduces the number of channels of f_A^16 and f_B^16 (the outputs of the 17th feature extraction block of the source image super-resolution branch for I_A^LR and I_B^LR) and of f_F^16 (the output of the 17th fusion block based on the block self-attention mechanism) to the square of the magnification factor r. The sub-pixel convolution then up-samples the output of the 1 × 1 convolutional layer to the target size H × W, where H and W respectively denote the height and width of the target size. After the sub-pixel convolution, the source image super-resolution branch yields the super-resolution results I_A^SR and I_B^SR of I_A^LR and I_B^LR. In the fusion and super-resolution branch, normalization is carried out through a Sigmoid function, and the decision weight map W_SR for multi-focus image fusion is obtained through threshold division; finally, the super-resolution fusion result image I_F^SR is obtained by combining with the source images.
Step5, obtaining the network parameter through Step4 in the network parameter training process
Figure GDA0003687926730000078
Super-resolution result of
Figure GDA0003687926730000079
Figure GDA00036879267300000710
And a decision weight graph W SR Super-resolution fusion result image
Figure GDA00036879267300000711
Thereafter, the losses between them and the label will be calculated and the basis usedAnd (3) minimizing the loss by an optimizer of a gradient descent method so as to optimize parameters of the network, finishing network training when the loss gradually decreases to be flat, and testing to obtain high-quality super-resolution and fusion results.
Furthermore, in Step2, the dense connection manner means that the initial feature map f_0 output by the first convolutional layer in the source image super-resolution branch and the outputs of the previous i-1 feature extraction blocks are taken together as the input of the i-th feature extraction block. Finally, f_0 and the outputs of all the blocks are concatenated, and dimension reduction and information integration are performed through a 1 × 1 convolution. The structure of the feature extraction block is shown in fig. 2(a); it consists of three 3 × 3 convolutional layers and uses residual learning to alleviate the degradation problem caused by deep networks;
further, in Step2, the proposed region information enhancement block is shown in fig. 2(c). First, a convolutional layer acts on the input feature map, and the channel dimension of the output feature map is twice that of the input; the output feature map is sliced along the channel dimension into two feature maps of the same dimension, which are the offsets of the input feature map in the horizontal and vertical directions. That is, the convolutional layer learns the offset of each position of the input feature map in the horizontal and vertical directions, and the horizontal and vertical offsets together with the input feature map are fed into a deformable convolution, yielding a feature map closer to the shape and size of the objects. Define Δ_{A,h}^i and Δ_{A,v}^i as the horizontal and vertical offsets of f_A^i, and Δ_{B,h}^i and Δ_{B,v}^i as the horizontal and vertical offsets of f_B^i, where f_A^i and f_B^i are the outputs of the i-th feature extraction block of the source image super-resolution branch for I_A^LR and I_B^LR respectively. The feature maps of salient object region information input to the fusion and super-resolution branch for the i-th time, e_A^i and e_B^i, are calculated as follows:

Δ_{A,h}^i, Δ_{A,v}^i = split(LeakyRelu(Conv(f_A^i)))

Δ_{B,h}^i, Δ_{B,v}^i = split(LeakyRelu(Conv(f_B^i)))

e_A^i = DConv(f_A^i, Δ_{A,h}^i, Δ_{A,v}^i)

e_B^i = DConv(f_B^i, Δ_{B,h}^i, Δ_{B,v}^i)

where split(·) is the channel slicing operation, DConv(·) denotes the deformable convolution, Conv(·) denotes a convolutional layer with convolution kernel size k of 3, and LeakyRelu(·) is a commonly used nonlinear activation function with slope s set to 0.2.
Further, in Step3, the block self-attention mechanism means that, when the local characteristics of a pixel are considered, attention should be paid to the pixels that have a large influence on it. In the present invention, the feature relationship between each position and its 7 × 7 neighborhood is explored. For a position p in the input feature map, define N(p) as the 7 × 7 neighborhood range with p as the center point; x_{N(p)} are the feature values corresponding to the region N(p); δ(·) fuses the information within the neighborhood range together; Sigmoid(·) is an intra-block normalization function used to calculate the weight of the features at the other positions in the neighborhood with respect to the feature at the center point p. After the block self-attention mechanism, the feature value y_p of position p can be calculated as:

y_p = BatchNormalize( Σ_{q ∈ N(p)} Sigmoid( δ(x_p, x_q) ) · x_q )

where δ(x_p, x_q) = x_p^T x_q, i.e. the correlation between the feature vector x_p of position p and the feature vector x_q of position q is calculated by transposed multiplication, and BatchNormalize(·) is a batch normalization operation.
Further, in Step3, the fusion block based on the block self-attention mechanism means that the fusion feature map output by the preceding stage is concatenated with the feature map of salient focused regions input from the source image super-resolution branch; after information integration through a 1 × 1 convolution and several 3 × 3 convolutional layers, the self-attention mechanism over the block range is used to highlight the extent of the salient objects more accurately.
In Step4, the normalization by the Sigmoid function means:

S(m, n) = 1 / (1 + e^(-F(m, n)))

where F denotes the result of the sub-pixel convolution in the fusion and super-resolution branch; this feature map is single-channel and of the target size, and (m, n) denotes a coordinate position. A decision weight map for multi-focus image fusion is then obtained by dividing with a threshold t. The invention sets t to 0.5, and the decision weight map W_SR can be obtained by the following formula:

W_SR(m, n) = 1 if S(m, n) > 0.5, and W_SR(m, n) = 0 otherwise.

Then, the fusion result I_F^SR can be obtained from the decision weight map W_SR:

I_F^SR = W_SR ⊙ I_A^SR + (1 - W_SR) ⊙ I_B^SR

where ⊙ denotes element-wise multiplication.
further, in Step5, regarding the loss calculation, the present invention calculates the loss using the L1 norm with better convex optimization properties and uses an Adam optimizer to minimize the loss value. Definition of
Figure GDA0003687926730000094
Are tag values, respectively
Figure GDA0003687926730000095
Corresponding high resolution image,
Figure GDA0003687926730000096
Corresponding high resolution image, high resolution fused image, W SR 、W HR If the decision weight graph and the high-resolution label fusion decision graph are respectively used, the loss is calculated as follows:
Figure GDA0003687926730000097
in Step5, the input test image, i.e. the two low-resolution source images on the left side in fig. 1, is the input low-resolution source image of the specific example, and the intermediate image on the right side in fig. 1 is the fusion result image of the specific example, it can be seen that the super-resolution fusion result contains abundant texture detail information of the two low-resolution source images, which indicates that the method can capture information in the low-resolution source images deeply and further generate natural high-quality details. The focused boundary and the non-focused boundary are accurately estimated, which shows that the region information enhancement block of the invention has the effect of accurately estimating the object contour, the fusion block based on the block attention mechanism has the effect of accurately estimating the focused region, and the combination of the two blocks ensures the information fusion of the focused regions of the two source images.
Further, unless otherwise specified, Relu is used as the nonlinear activation function after all convolutional layers; all convolutions are of SAME type, i.e. the spatial size of the input and output of each convolutional layer is kept consistent, and all source images share one source image super-resolution branch.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (5)

1. An image super-fusion method based on region information enhancement and block self-attention is characterized in that: the method comprises the following specific steps:
step1, in the task of super-resolving and fusing multi-focus images, the low-resolution source images I_A^LR and I_B^LR are each input into the source image super-resolution branch, and at the same time, I_A^LR and I_B^LR are concatenated along the channel dimension and input into the fusion and super-resolution branch; a 3 × 3 convolutional layer is arranged at the beginning of both the source image super-resolution branch and the fusion and super-resolution branch for preliminarily extracting features; after that, the source image super-resolution branch contains 17 feature extraction blocks and 17 region information enhancement blocks, the fusion and super-resolution branch contains 17 fusion blocks based on a block self-attention mechanism, the 17 feature extraction blocks, the 17 region information enhancement blocks and the 17 fusion blocks based on the block self-attention mechanism correspond to one another one by one, and i, 0 ≤ i ≤ 16, denotes the i-th feature extraction block or region information enhancement block or fusion block based on the block self-attention mechanism;

step2, in the source image super-resolution branch, the initial feature map passes through the 17 feature extraction blocks, which are connected in a densely connected manner; the output of the (i-1)-th feature extraction block is not only input into the i-th feature extraction block to construct the super-resolution source image, but also input into the i-th region information enhancement block to assist the fusion and super-resolution branch in obtaining the decision weight map; the region information enhancement block enhances the information of salient feature regions, especially the feature information of focused regions, and the information output by the region information enhancement block is input into the i-th fusion block based on the block self-attention mechanism in the fusion and super-resolution branch;
step3, in the fusion and super-resolution branch, the initial feature map passes through the 17 fusion blocks based on the block self-attention mechanism, which fully extract the features and adaptively fuse the information;
step4, after the 17 feature extraction blocks in the source image super-resolution branch, and after the 17 fusion blocks based on the block self-attention mechanism in the fusion and super-resolution branch, there is a 1 × 1 convolutional layer followed by a sub-pixel convolution layer; the 1 × 1 convolution reduces the number of channels of f_A^16 and f_B^16, the outputs of the 17th feature extraction block of the source image super-resolution branch for I_A^LR and I_B^LR, and of f_F^16, the output of the 17th fusion block based on the block self-attention mechanism of the fusion and super-resolution branch, to the square of the magnification factor r; the sub-pixel convolution up-samples the output of the 1 × 1 convolutional layer to the target size H × W, where H and W respectively denote the height and width of the target size; after the sub-pixel convolution, the source image super-resolution branch yields the super-resolution results I_A^SR and I_B^SR of I_A^LR and I_B^LR; in the fusion and super-resolution branch, normalization is carried out through a Sigmoid function and the decision weight map W_SR for multi-focus image fusion is obtained through threshold division; finally, the super-resolution fusion result image I_F^SR is obtained by combining with the source images;
Step5, obtaining the network parameter through Step4 in the network parameter training process
Figure FDA0003687926720000022
Super-resolution results of
Figure FDA0003687926720000023
Figure FDA0003687926720000024
And a decision weight graph W SR And super-resolution fusion result image
Figure FDA0003687926720000025
Then, calculating the loss between the label and the label, and minimizing the loss by using an optimizer based on a gradient descent method, thereby optimizing the parameters of the network, finishing the network training when the loss gradually decreases and tends to be flat, and obtaining a high-quality super-resolution and fusion result through testing;
the region information enhancement block proposed in Step2 is: first, a convolutional layer acts on the input feature map, and the channel dimension of the output feature map is twice that of the input; the output feature map is sliced along the channel dimension into two feature maps of the same dimension, which are the offsets of the input feature map in the horizontal and vertical directions; that is, the convolutional layer learns the offset of each position of the input feature map in the horizontal and vertical directions, and the horizontal and vertical offsets together with the input feature map are input into a deformable convolution, so as to obtain a feature map closer to the shape and size of the objects; define Δ_{A,h}^i and Δ_{A,v}^i as the horizontal and vertical offsets of f_A^i, and Δ_{B,h}^i and Δ_{B,v}^i as the horizontal and vertical offsets of f_B^i, where f_A^i and f_B^i are the outputs of the i-th feature extraction block of the source image super-resolution branch for I_A^LR and I_B^LR respectively; the feature maps of salient object region information input to the fusion and super-resolution branch for the i-th time, e_A^i and e_B^i, are calculated as follows:

Δ_{A,h}^i, Δ_{A,v}^i = split(LeakyRelu(Conv(f_A^i)))

Δ_{B,h}^i, Δ_{B,v}^i = split(LeakyRelu(Conv(f_B^i)))

e_A^i = DConv(f_A^i, Δ_{A,h}^i, Δ_{A,v}^i)

e_B^i = DConv(f_B^i, Δ_{B,h}^i, Δ_{B,v}^i)

where split(·) is the channel slicing operation, DConv(·) denotes the deformable convolution, Conv(·) denotes a convolutional layer with convolution kernel size k of 3, and LeakyRelu(·) is a commonly used nonlinear activation function with slope s set to 0.2;
the block self-attention mechanism proposed in Step3 means that, when the local characteristics of a pixel are considered, attention should be paid to the pixels that have a large influence on it, and the feature relationship between each position and its 7 × 7 neighborhood range is explored; for a position p in the input feature map, define N(p) as the 7 × 7 neighborhood range with p as the center point; x_{N(p)} are the feature values corresponding to the region N(p); δ(·) fuses the information within the neighborhood range together; Sigmoid(·) is an intra-block normalization function used to calculate the weight of the features at the other positions in the neighborhood with respect to the feature at the center point p; after the block self-attention mechanism, the feature value y_p of position p can be calculated as:

y_p = BatchNormalize( Σ_{q ∈ N(p)} Sigmoid( δ(x_p, x_q) ) · x_q )

where δ(x_p, x_q) = x_p^T x_q, i.e. the correlation between the feature vector x_p of position p and the feature vector x_q of position q is calculated by transposed multiplication, and BatchNormalize(·) is a batch normalization operation;

the fusion block based on the block self-attention mechanism proposed in Step3 means that the fusion feature map output by the preceding stage is concatenated with the feature map of salient focused regions input from the source image super-resolution branch, information integration is carried out through a 1 × 1 convolution and several 3 × 3 convolutional layers, and then the self-attention mechanism over the block range is used to highlight the extent of the salient objects more accurately.
2. The image super-fusion method based on region information enhancement and block self-attention of claim 1, wherein:
the dense connection manner proposed in Step2 is: the initial feature map f_0 output by the first convolutional layer in the source image super-resolution branch and the outputs of the previous i-1 feature extraction blocks are taken as the input of the i-th feature extraction block, and finally, f_0 and the outputs of all the blocks are concatenated together, and dimension reduction and information integration are carried out through a 1 × 1 convolution; the feature extraction block consists of three 3 × 3 convolutional layers and uses residual learning to alleviate the degradation problem caused by deep networks.
3. The image super-fusion method based on region information enhancement and block self-attention of claim 1, wherein: the normalization by the Sigmoid function in Step4 means:

S(m, n) = 1 / (1 + e^(-F(m, n)))

where F denotes the result of the sub-pixel convolution in the fusion and super-resolution branch, the feature map being single-channel and of the target size; (m, n) denotes a coordinate position, a decision weight map for multi-focus image fusion is then obtained by dividing with a threshold t, t is set to 0.5, and the decision weight map W_SR can be obtained by the following formula:

W_SR(m, n) = 1 if S(m, n) > 0.5, and W_SR(m, n) = 0 otherwise;

then, the fusion result I_F^SR can be obtained from the decision weight map W_SR:

I_F^SR = W_SR ⊙ I_A^SR + (1 - W_SR) ⊙ I_B^SR

where ⊙ denotes element-wise multiplication.
4. The image super-fusion method based on region information enhancement and block self-attention of claim 1, wherein: the loss calculation proposed in Step5 uses the L1 norm, which has good convex optimization properties, to calculate the loss, and uses the Adam optimizer to minimize the loss value; define I_A^HR and I_B^HR as the label values, i.e. the high-resolution images corresponding to I_A^LR and I_B^LR, I_F^HR as the high-resolution fused image, and W_SR and W_HR as the decision weight map and the high-resolution label fusion decision map respectively; the loss is calculated as:

L = ‖I_A^SR - I_A^HR‖_1 + ‖I_B^SR - I_B^HR‖_1 + ‖W_SR - W_HR‖_1 + ‖I_F^SR - I_F^HR‖_1
5. The image super-fusion method based on region information enhancement and block self-attention according to any one of claims 1-4, characterized in that: Relu is used as the nonlinear activation function after all convolutional layers; all convolutions are of SAME type, i.e. the spatial size of the input and output of each convolutional layer is kept consistent, and all source images share one source image super-resolution branch.
CN202010506835.XA 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention Active CN111861880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010506835.XA CN111861880B (en) 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010506835.XA CN111861880B (en) 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention

Publications (2)

Publication Number Publication Date
CN111861880A CN111861880A (en) 2020-10-30
CN111861880B true CN111861880B (en) 2022-08-30

Family

ID=72986067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010506835.XA Active CN111861880B (en) 2020-06-05 2020-06-05 Image super-fusion method based on regional information enhancement and block self-attention

Country Status (1)

Country Link
CN (1) CN111861880B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418163B (en) * 2020-12-09 2022-07-12 北京深睿博联科技有限责任公司 Multispectral target detection blind guiding system
CN112784909B (en) * 2021-01-28 2021-09-28 哈尔滨工业大学 Image classification and identification method based on self-attention mechanism and self-adaptive sub-network
CN113094972B (en) * 2021-03-15 2022-08-02 西南大学 Bedrock depth prediction method and system based on generation of confrontation network and environmental element data
CN113537246A (en) * 2021-08-12 2021-10-22 浙江大学 Gray level image simultaneous coloring and hyper-parting method based on counterstudy
CN113705675B (en) * 2021-08-27 2022-10-04 合肥工业大学 Multi-focus image fusion method based on multi-scale feature interaction network
CN113837946B (en) * 2021-10-13 2022-12-06 中国电子技术标准化研究院 Lightweight image super-resolution reconstruction method based on progressive distillation network
CN113963009B (en) * 2021-12-22 2022-03-18 中科视语(北京)科技有限公司 Local self-attention image processing method and system based on deformable block

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109714592A (en) * 2019-01-31 2019-05-03 天津大学 Stereo image quality evaluation method based on binocular fusion network
CN109859106A (en) * 2019-01-28 2019-06-07 桂林电子科技大学 A kind of image super-resolution rebuilding method based on the high-order converged network from attention
CN110033410A (en) * 2019-03-28 2019-07-19 华中科技大学 Image reconstruction model training method, image super-resolution rebuilding method and device
CN110322402A (en) * 2019-04-30 2019-10-11 武汉理工大学 Medical image super resolution ratio reconstruction method based on dense mixing attention network
CN110334765A (en) * 2019-07-05 2019-10-15 西安电子科技大学 Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
CN111179167A (en) * 2019-12-12 2020-05-19 天津大学 Image super-resolution method based on multi-stage attention enhancement network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10140719B2 (en) * 2016-12-22 2018-11-27 TCL Research America Inc. System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles
US10671918B2 (en) * 2017-10-24 2020-06-02 International Business Machines Corporation Attention based sequential image processing
US20190156220A1 (en) * 2017-11-22 2019-05-23 Microsoft Technology Licensing, Llc Using machine comprehension to answer a question

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859106A (en) * 2019-01-28 2019-06-07 桂林电子科技大学 A kind of image super-resolution rebuilding method based on the high-order converged network from attention
CN109714592A (en) * 2019-01-31 2019-05-03 天津大学 Stereo image quality evaluation method based on binocular fusion network
CN110033410A (en) * 2019-03-28 2019-07-19 华中科技大学 Image reconstruction model training method, image super-resolution rebuilding method and device
CN110322402A (en) * 2019-04-30 2019-10-11 武汉理工大学 Medical image super resolution ratio reconstruction method based on dense mixing attention network
CN110334765A (en) * 2019-07-05 2019-10-15 西安电子科技大学 Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
CN111179167A (en) * 2019-12-12 2020-05-19 天津大学 Image super-resolution method based on multi-stage attention enhancement network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Face Super-Resolution Reconstruction Based on Self-Attention Residual Network";Y QING-MING LIU;《IEEE Access》;20200108;全文 *
"MCFNet: multi-layer concatenation fusion network for medical images fusion";Liang X C et al.;《IEEE Sensors Journal》;20190425;全文 *
"卷积稀疏表示图像融合与超分辨率联合实现";杨默远等;《光学技术》;20200430;全文 *
"基于深度学习的图像描述算法研究";朱欣鑫;《信息科技》;20190815;全文 *

Also Published As

Publication number Publication date
CN111861880A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN111861880B (en) Image super-fusion method based on regional information enhancement and block self-attention
Islam et al. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN111325794B (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN109791697B (en) Predicting depth from image data using statistical models
Engin et al. Cycle-dehaze: Enhanced cyclegan for single image dehazing
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN110427968B (en) Binocular stereo matching method based on detail enhancement
US10353271B2 (en) Depth estimation method for monocular image based on multi-scale CNN and continuous CRF
CN113657388B (en) Image semantic segmentation method for super-resolution reconstruction of fused image
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN110381268B (en) Method, device, storage medium and electronic equipment for generating video
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
Cheng et al. Zero-shot image super-resolution with depth guided internal degradation learning
CN111696035A (en) Multi-frame image super-resolution reconstruction method based on optical flow motion estimation algorithm
CN115713679A (en) Target detection method based on multi-source information fusion, thermal infrared and three-dimensional depth map
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN111932594B (en) Billion pixel video alignment method and device based on optical flow and medium
CN116645598A (en) Remote sensing image semantic segmentation method based on channel attention feature fusion
CN116563103A (en) Remote sensing image space-time fusion method based on self-adaptive neural network
CN112950653B (en) Attention image segmentation method, device and medium
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning
CN114565764A (en) Port panorama sensing system based on ship instance segmentation
CN114693951A (en) RGB-D significance target detection method based on global context information exploration
CN113850719A (en) RGB image guided depth map super-resolution method based on joint implicit image function

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant